Ordinary Least Squares Regression


The general form of the linear regression equation considers the relationship between a dependent variable and several explanatory variables. This is demonstrated with the Theil textile data set. Consider estimating the relationship between the dependent variable CONSUME and the explanatory variables INCOME and PRICE. The linear regression equation is:

      CONSUME_t = beta0 + beta1 INCOME_t + beta2 PRICE_t + e_t

where e_t is a random error term and the subscript t indexes the observations. Ordinary least squares estimates of the parameters can be obtained with the following SHAZAM commands.

SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
OLS CONSUME INCOME PRICE
STOP

The OLS command contains a list of variables. The dependent variable must be listed first; all variable names that follow are treated as explanatory variables. SHAZAM automatically includes an intercept in the regression equation. Note: if there is a compelling reason to exclude the intercept, specify the NOCONSTANT option on the OLS command. This gives a regression through the origin.
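
As a sketch of the NOCONSTANT option, the command file below repeats the textile regression without an intercept. It assumes the usual SHAZAM convention that options are listed after a slash (/) at the end of the command, and that a line beginning with an asterisk is a comment; the data file and variable names are the ones used in this section.

```
SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* Regression through the origin: no CONSTANT term is estimated
OLS CONSUME INCOME PRICE / NOCONSTANT
STOP
```

With this option the coefficient table should list INCOME and PRICE only, with no CONSTANT row.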

On the READ command, the variables must be listed in the order in which they appear in the data file. On the OLS command, by contrast, the explanatory variables can be listed in any order, provided that the dependent variable comes first. So the OLS command:

OLS CONSUME INCOME PRICE

is equivalent to the command:

OLS CONSUME PRICE INCOME

The SHAZAM OLS estimation results are below.

 |_SAMPLE 1 17
 |_READ (THEIL.txt) YEAR CONSUME INCOME PRICE
 
 UNIT 88 IS NOW ASSIGNED TO: THEIL.txt
    4 VARIABLES AND       17 OBSERVATIONS STARTING AT OBS       1
 
 |_OLS CONSUME INCOME PRICE
 
  OLS ESTIMATION
       17 OBSERVATIONS     DEPENDENT VARIABLE = CONSUME
 ...NOTE..SAMPLE RANGE SET TO:    1,   17
 
  R-SQUARE =    .9513     R-SQUARE ADJUSTED =    .9443
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   30.951
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   5.5634
 SUM OF SQUARED ERRORS-SSE=   433.31
 MEAN OF DEPENDENT VARIABLE =   134.51
 LOG OF THE LIKELIHOOD FUNCTION = -51.6471
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      14 DF   P-VALUE CORR. COEFFICIENT  AT MEANS 
 INCOME     1.0617      .2667       3.981      .001  .729      .2387      .8129
 PRICE     -1.3830      .8381E-01  -16.50      .000 -.975     -.9893     -.7846
 CONSTANT   130.71      27.09       4.824      .000  .790      .0000      .9718
 |_STOP

The intercept estimate (assigned the name CONSTANT) is listed as the final coefficient estimate. The estimated equation can be written as:

         CONSUME = 130.7 + 1.06 INCOME - 1.38 PRICE + ê

where ê is the estimated residual.

The OLS estimation output for a model with two or more explanatory variables is interpreted in the same way as the output for a model with one explanatory variable. The T-RATIO column gives the t-statistic for a test of the null hypothesis that the coefficient is zero. The heading 14 DF shows the degrees of freedom for the test: 17 observations less 3 estimated coefficients. The P-VALUE column gives the associated p-value for a two-sided test.
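
The T-RATIO column can be checked by hand, since each t-statistic is the estimated coefficient divided by its standard error. The commands below are a minimal sketch of that arithmetic, assuming the GEN1 command for scalar calculations and the PRINT command. The names TINCOME and TPRICE are made up for this illustration, and the numbers are copied from the estimation output above.

```
* t-ratio = estimated coefficient / standard error
GEN1 TINCOME=1.0617/.2667
GEN1 TPRICE=-1.3830/.08381
PRINT TINCOME TPRICE
```

The results should agree, up to rounding, with the values 3.981 and -16.50 reported in the T-RATIO column.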

Note that the R-square is .9513. What does this mean? It says that 95.13% of the variation in the dependent variable CONSUME is explained by the regression equation. This suggests a very "good fit". However, high R-square values are typical of models that use time series data, because economic time series often share a tendency to follow an upward or downward trend. When working with time series data, it is important to test for the presence of serial correlation in the residuals. This is discussed later in this guide.
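
As a preview of that check, the sketch below re-runs the regression and requests residual diagnostics. It assumes the RSTAT option of the OLS command, which reports residual summary statistics including the Durbin-Watson statistic; the formal tests are left to the later discussion of serial correlation.

```
SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* RSTAT adds residual statistics (including Durbin-Watson) to the OLS output
OLS CONSUME INCOME PRICE / RSTAT
STOP
```

A Durbin-Watson value close to 2 is consistent with no first-order serial correlation, while a value well below 2 points to positive serial correlation.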

