Ordinary Least Squares

## Ordinary Least Squares Regression

The general form of the linear regression equation considers the relationship between a dependent variable and several explanatory variables. This is demonstrated with the Theil textile data set. Consider estimating the relationship between the dependent variable `CONSUME` and the explanatory variables `INCOME` and `PRICE`. The linear regression equation is:

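$$\text{CONSUME}_t = \beta_0 + \beta_1\,\text{INCOME}_t + \beta_2\,\text{PRICE}_t + \varepsilon_t$$

where $\varepsilon_t$ is a random error term. Ordinary least squares estimates of the parameters $\beta_0$, $\beta_1$ and $\beta_2$ can be obtained with the following SHAZAM commands.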

```
SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
OLS CONSUME INCOME PRICE
STOP
```

The `OLS` command takes a list of variables. The dependent variable must be listed first; all variable names that follow are explanatory variables. SHAZAM automatically includes an intercept in the regression equation. Note: if there is a compelling reason to exclude the intercept, specify the `NOCONSTANT` option on the `OLS` command. This gives a regression through the origin.

With the `OLS` command the explanatory variables can be listed in any order. On the `READ` command the variables must be listed in the order that they appear in the data file. This does not need to be the order that is used on the `OLS` command. So the `OLS` command:

 `OLS CONSUME INCOME PRICE `

is equivalent to the command:

 `OLS CONSUME PRICE INCOME `
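The equivalence of the two orderings can be checked numerically. The following is a minimal Python sketch, not SHAZAM code: it solves the least squares normal equations on a small made-up data set (the numbers are illustrative and are not the Theil data). The helper names `solve_linear_system` and `ols` are hypothetical. The intercept is appended as the last coefficient and labelled `CONSTANT` to mirror the SHAZAM convention.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix; rows are copied so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest pivot into place for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(y, regressors, names):
    """Least squares with an intercept; `names` fixes the regressor order."""
    rows = [[regressors[name][i] for name in names] + [1.0]
            for i in range(len(y))]
    k = len(names) + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return dict(zip(list(names) + ["CONSTANT"], beta))


# Made-up data, built as y = 10 + 2*INCOME - 3*PRICE with no noise.
data = {"INCOME": [1.0, 2.0, 3.0, 4.0, 5.0],
        "PRICE": [2.0, 1.0, 4.0, 3.0, 6.0]}
y = [6.0, 11.0, 4.0, 9.0, 2.0]

fit_a = ols(y, data, ["INCOME", "PRICE"])
fit_b = ols(y, data, ["PRICE", "INCOME"])
for name in ("INCOME", "PRICE", "CONSTANT"):
    print(name, round(fit_a[name], 6), round(fit_b[name], 6))
```

Each coefficient stays attached to its own variable whichever order is used; only the order in which the estimates are listed changes.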

The SHAZAM OLS estimation results are below.

```
|_SAMPLE 1 17
|_READ (THEIL.txt) YEAR CONSUME INCOME PRICE
UNIT 88 IS NOW ASSIGNED TO: THEIL.txt
   4 VARIABLES AND       17 OBSERVATIONS STARTING AT OBS       1

|_OLS CONSUME INCOME PRICE

 OLS ESTIMATION
       17 OBSERVATIONS     DEPENDENT VARIABLE = CONSUME
...NOTE..SAMPLE RANGE SET TO:    1,   17

 R-SQUARE =    .9513     R-SQUARE ADJUSTED =    .9443
VARIANCE OF THE ESTIMATE-SIGMA**2 =   30.951
STANDARD ERROR OF THE ESTIMATE-SIGMA =   5.5634
SUM OF SQUARED ERRORS-SSE=   433.31
MEAN OF DEPENDENT VARIABLE =   134.51
LOG OF THE LIKELIHOOD FUNCTION = -51.6471

VARIABLE   ESTIMATED   STANDARD    T-RATIO           PARTIAL STANDARDIZED ELASTICITY
  NAME    COEFFICIENT    ERROR       14 DF   P-VALUE  CORR.  COEFFICIENT   AT MEANS
INCOME      1.0617      .2667        3.981     .001    .729     .2387       .8129
PRICE      -1.3830      .8381E-01   -16.50     .000   -.975    -.9893      -.7846
CONSTANT    130.71      27.09        4.824     .000    .790     .0000       .9718
|_STOP
```

The intercept estimate (assigned the name `CONSTANT`) is listed as the final coefficient estimate. The estimated equation can be written as:

```
CONSUME = 130.7 + 1.06 INCOME - 1.38 PRICE + ê
```

where ê is the estimated residual.
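As an illustration of how the estimated equation is used, the Python sketch below computes a fitted value of `CONSUME` from the coefficient estimates reported in the SHAZAM output. The function name `fitted_consume` and the input values (`INCOME` = 100, `PRICE` = 100) are hypothetical and chosen only for the example.

```python
# Coefficient estimates copied from the SHAZAM output above.
COEF = {"INCOME": 1.0617, "PRICE": -1.3830, "CONSTANT": 130.71}


def fitted_consume(income, price):
    """Fitted value of CONSUME; the residual is not part of the prediction."""
    return (COEF["CONSTANT"]
            + COEF["INCOME"] * income
            + COEF["PRICE"] * price)


# Hypothetical inputs, not observations from the data set.
example = fitted_consume(income=100.0, price=100.0)
print(round(example, 2))

# Effect of raising PRICE by one unit while holding INCOME fixed:
# the change equals the PRICE coefficient.
price_effect = fitted_consume(100.0, 101.0) - fitted_consume(100.0, 100.0)
print(round(price_effect, 4))
```

Each slope coefficient measures the change in fitted `CONSUME` for a one-unit change in that variable with the other variable held constant.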

The OLS estimation output for the model with 2 or more explanatory variables can be interpreted in a similar way to the estimation results that are obtained for the model with 1 explanatory variable. That is, the `T-RATIO` gives the t-statistic for a test of the null hypothesis that the coefficient is zero. The `P-VALUE` gives the associated p-value for a two-sided test.
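The relationship between these columns can be sketched in Python. The t-ratio is the coefficient estimate divided by its standard error, and the two-sided p-value is the probability that a t-distributed variable with 14 degrees of freedom (17 observations minus 3 estimated coefficients) exceeds the t-ratio in absolute value. This is an illustrative sketch rather than the method SHAZAM uses internally: the function names are hypothetical, and the p-value is approximated by integrating the t density with Simpson's rule. Small differences from the printed t-ratios are expected because the inputs are the rounded values shown in the output.

```python
import math


def t_ratio(coefficient, standard_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error


def t_density(x, df):
    """Probability density of Student's t distribution."""
    norm = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """P(|T| > |t|), using Simpson's rule on [-|t|, |t|]; steps must be even."""
    a, b = -abs(t), abs(t)
    h = (b - a) / steps
    total = t_density(a, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(a + i * h, df)
    central_mass = total * h / 3
    return max(0.0, 1.0 - central_mass)


# (coefficient, standard error) pairs copied from the SHAZAM output.
estimates = {
    "INCOME": (1.0617, 0.2667),
    "PRICE": (-1.3830, 0.08381),
    "CONSTANT": (130.71, 27.09),
}
DF = 17 - 3  # observations minus estimated coefficients

for name, (coef, se) in estimates.items():
    t = t_ratio(coef, se)
    print(name, round(t, 3), round(two_sided_p_value(t, DF), 3))
```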

Note that the R-square is .9513. This means that 95.13% of the variation in the dependent variable `CONSUME` is explained by the regression equation, which suggests a very good fit. However, high R-square values are typical of models that use time series data, because economic time series often share a tendency to follow an upward or downward trend. When working with time series data, it is important to test for serial correlation in the residuals. This is discussed later in this guide.
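The summary statistics in the output are linked by standard textbook formulas, which the Python sketch below applies to the printed values. It is a consistency check under the assumption that SHAZAM uses the usual degrees-of-freedom corrections; the variable names are illustrative. Results may differ from the printed figures in the last digit because the inputs are rounded.

```python
import math

# Values copied from the SHAZAM output.
n_obs = 17        # observations
n_coef = 3        # INCOME, PRICE and CONSTANT
r_square = 0.9513
sse = 433.31      # sum of squared errors

df = n_obs - n_coef  # degrees of freedom, shown as "14 DF" in the output

# Adjusted R-square penalizes R-square for the number of coefficients.
r_square_adjusted = 1 - (1 - r_square) * (n_obs - 1) / df

# Estimated error variance and its square root.
sigma2 = sse / df
sigma = math.sqrt(sigma2)

# Total variation in CONSUME implied by R-square = 1 - SSE/SST.
sst = sse / (1 - r_square)

print(round(r_square_adjusted, 4))
print(round(sigma2, 3))
print(round(sigma, 3))
```

The share of unexplained variation, SSE/SST, is 1 - .9513 = .0487, which is the complement of the 95.13% quoted above.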
