*****************************************************************************
* CHAPTER 14 - STATISTICS FOR BUSINESS & ECONOMICS, 4th Ed., by Paul Newbold*
*****************************************************************************
*
* Lagged Dependent Variables on page 542.
*
* The SAMPLE command specifies the sample range of the data to be read.  The
* READ command inputs the data and assigns variable names.  In this case,
* the variables are Year, YEAR, and Local Advertising Expenditure per
* Household, ADVERT.
*
* A second READ command reads in the data in Table 12.5.
*
SAMPLE 1 23
READ YEAR ADVERT 
 0        115.80
 1        117.66
 2        115.62
 3        110.79
 4        119.22
 5        120.78
 6        110.20
 7        110.86
 8        114.06
 9        120.87
10        127.03
11        132.08
12        132.27
13        134.69
14        138.62
15        136.15
16        144.17
17        154.03
18        161.39
19        157.72
20        145.37
21        152.73
22        155.70
*
* The Table 12.5 data are read into observations 2 to 23 rather than 1 to 22
* because the first observation in Table 14.2 is dropped when the OLS
* regression is estimated with the lagged value of Local Advertising
* Expenditure per Household, LAGADVT.
*
SAMPLE 2 23
READ YYEAR DINCOME RSALES 
 1     9098    5492
 2     9138    5540
 3     9094    5305
 4     9282    5507
 5     9229    5418
 6     9347    5320
 7     9525    5538
 8     9756    5692
 9    10282    5871
10    10662    6157
11    11019    6342
12    11307    5907
13    11432    6124
14    11449    6186
15    11697    6224
16    11871    6496
17    12018    6718
18    12523    6921
19    12053    6471
20    12088    6394
21    12215    6555
22    12494    6755
*
* Because advertisers may be unwilling or unable to adjust their plans to
* sudden changes in the level of retail sales, the value of local advertising
* expenditure per household in the previous year is added to the model.  The
* GENR command uses the LAG function to create LAGADVT, the value of ADVERT
* lagged one year.
*
SAMPLE 1 23
GENR LAGADVT=LAG(ADVERT)
PRINT YEAR ADVERT LAGADVT 
*
* The sample range must be changed to 2 to 23 because one observation is
* lost when ADVERT is lagged one year: LAGADVT has no valid value for
* observation 1.  If the sample range were left at 1 to 23, the regression
* estimates would be incorrect.
*
* The COEF= option on the OLS command saves the regression estimates in the
* vector COEF.  These coefficients are used by the GEN1 commands to calculate
* the expected impact of a change in retail sales on Local Advertising
* Expenditure per Household.  The estimated coefficient for RSALES is stored
* in Row 1 of COEF (COEF:1), the coefficient for LAGADVT in Row 2 (COEF:2)
* and the regression constant in Row 3 (COEF:3).
*
* The CONFID command estimates 95% confidence intervals for the listed
* coefficients.  It must follow a regression estimation, so the expected
* impact on Local Advertising Expenditure per Household is calculated after
* the confidence intervals.
*
* The TEST command tests the Null Hypothesis that the coefficient on the
* lagged Local Advertising per Household, LAGADVT, is equal to 0.
*
SAMPLE 2 23
OLS ADVERT RSALES LAGADVT / COEF=COEF
CONFID RSALES LAGADVT
TEST LAGADVT=0
GEN1 INCR1=COEF:1*COEF:2
GEN1 INCR2=(COEF:2**2)*COEF:1
GEN1 TOTAL=COEF:1/(1-COEF:2)
PRINT INCR1 INCR2 TOTAL
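*
* A minimal sketch, not part of the textbook example.  In the fitted model
*     ADVERT(t) = a + b*RSALES(t) + g*LAGADVT(t)
* with b = COEF:1 and g = COEF:2, a one-time, one-unit increase in RSALES
* raises ADVERT by b in the same year, by b*g one year later (INCR1), by
* b*(g**2) two years later (INCR2), and so on.  When g lies between -1 and 1
* these effects sum to b/(1-g), which is TOTAL.  The names INCR0 and SUM3
* are illustrative assumptions; SUM3 is the partial sum over the first three
* years, for comparison with TOTAL.
*
GEN1 INCR0=COEF:1
GEN1 SUM3=INCR0+INCR1+INCR2
PRINT INCR0 SUM3 TOTAL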
*
DELETE / ALL
*
*----------------------------------------------------------------------------
* Specification Bias example on page 557.
*
* Annual data are read on the Percentage Profit Margin, Y, of savings and
* loan associations, their Percentage Net Revenues per Deposit Dollar, X1,
* and the Number of Offices, X2.
*
SAMPLE 1 25
READ YEAR X1 X2 Y 
 1     3.92    7298    0.75
 2     3.61    6855    0.71
 3     3.32    6636    0.66
 4     3.07    6506    0.61
 5     3.06    6450    0.70
 6     3.11    6402    0.72
 7     3.21    6368    0.77
 8     3.26    6340    0.74
 9     3.42    6349    0.90
10     3.42    6352    0.82
11     3.45    6361    0.75
12     3.58    6369    0.77
13     3.66    6546    0.78
14     3.78    6672    0.84
15     3.82    6890    0.79
16     3.97    7115    0.70
17     4.07    7327    0.68
18     4.25    7546    0.72
19     4.41    7931    0.55
20     4.49    8097    0.63
21     4.70    8468    0.56
22     4.58    8717    0.41
23     4.69    8991    0.51
24     4.71    9179    0.47
25     4.78    9318    0.32
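*
* Estimate the regression of Profit Margin on Net Revenue and Number of
* Offices.
*
OLS Y X1 X2
*
* Estimate the regression of Profit Margin on Net Revenue alone.  Omitting
* Number of Offices, X2, illustrates specification bias.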
*
OLS Y X1
*
*----------------------------------------------------------------------------
* Heteroskedasticity example on page 568
*
* The LIST option on the OLS command lists and plots the residuals and
* predicted values of the dependent variable, together with residual
* statistics.  The predicted values of the dependent variable, Y, are saved
* with the PREDICT= option and the regression residuals are saved with the
* RESID= option.
*
OLS Y X1 X2 / LIST PREDICT=YHAT RESID=E
*
* The PLOT command plots the regression residuals against the first
* independent variable, Percentage Net Revenues per Deposit Dollar, X1.
*
* Figure 14.9 (a) on page 569.
*
PLOT E X1
*
* Next, the regression residuals are plotted against the Number of Offices,
* X2.
*
* Figure 14.9 (b) on page 569.
*
PLOT E X2
*
* Finally, the regression residuals are plotted against the predicted values
* of the dependent variable, Percentage of Profit Margin.
*
* Figure 14.10 on page 569.
*
PLOT E YHAT
*
* The GENR command is used to calculate the squared residuals, E2.
*
GENR E2=E**2
*
* The PRINT command is used to replicate Table 14.5 on page 570.
*
PRINT E2 YHAT
*
* The OLS command estimates the least squares regression of the squared
* residuals, E2, on the predicted values, YHAT.  The regression estimates in
* SHAZAM differ from those listed at the top of page 571 of the textbook.
* It is not clear why, since the data used in SHAZAM for the regression are
* the same as in Table 14.5 on page 570 of the textbook.
*
* The TEST command tests the Null Hypothesis that the coefficient on YHAT is
* equal to 0.  The GEN1 command then computes the test statistic STAT as the
* number of observations, $N, times the R-squared, $R2.
*
OLS E2 YHAT
TEST YHAT=0
GEN1 STAT=$N*$R2
PRINT STAT
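*
* A hedged cross-check, not taken from the textbook.  Under the null
* hypothesis of constant error variance, a statistic of the form n*R-squared
* from this auxiliary regression is compared with a chi-square distribution
* with 1 degree of freedom, whose 5% critical value is 3.84.  SHAZAM's
* DIAGNOS command with the HET option reports a set of heteroskedasticity
* tests, including one based on regressing E**2 on YHAT.  DIAGNOS must
* follow the estimation it refers to, so the original regression is run
* again first.
*
OLS Y X1 X2
DIAGNOS / HET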
*
*----------------------------------------------------------------------------
* Autocorrelated Errors on page 576
*
* The Durbin-Watson statistic is available in the temporary variable $DW
* after an OLS regression if the RSTAT, LIST or MAX options are specified.
* The PRINT command is used to print the Durbin-Watson statistic.
*
OLS Y X1 X2 / RSTAT
PRINT $DW
*
DELETE / ALL
*
*----------------------------------------------------------------------------
* Example 14.2, page 579
*
SAMPLE 1 30
READ Y X1 X2 X3 X4 X5 
2.03   25.4   9.9   17    1    0
2.22   26.7   4.7   18    1    0
2.27   29.1   1.9   23    1    0
2.12   29.2   1.2   28    1    0
2.04   29.2   1.9   30    1    0
2.41   27.8   3.9   27    0    0
2.66   27.4   3.9   24    0    1
2.49   28.0   3.8   23    0    1
2.45   28.3   5.9   25    0    1
2.41   28.8   5.3   23    1    0
2.49   29.3   3.3   24    1    0
2.51   29.4   3.0   25    1    0
2.50   29.2   2.9   25    1    0
2.53   29.4   5.5   25    0    0
2.50   30.2   4.4   25    0    1
2.52   31.0   4.1   24    0    1
2.53   31.2   4.3   25    0    1
2.45   31.5   6.8   25    0    0
2.40   31.7   5.5   26    0    0
2.37   32.3   5.5   27    0    0
2.33   32.6   6.7   26    0    0
2.24   32.7   5.5   26    0    0
2.17   33.2   5.7   26    0    0
2.10   33.6   5.2   26    0    0 
1.94   34.0   4.5   27    1    0
1.84   34.6   3.8   27    1    0
1.78   35.1   3.8   27    1    0
1.75   35.5   3.6   28    1    0
1.78   36.3   3.5   30    1    0
1.84   36.7   4.9   33    1    0
OLS Y X1 X2 X3 X4 X5 / RSTAT
*
* The regression R-squared is stored in the temporary variable $R2 after an
* OLS regression.
*
PRINT $R2 $DW 
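*
* A minimal sketch, not from the textbook's listing.  The Durbin-Watson
* statistic d is approximately 2*(1-r), where r is the first-order
* autocorrelation of the residuals, so r can be approximated by 1-d/2.
* The name RAPPROX is an illustrative assumption.
*
GEN1 RAPPROX=1-$DW/2
PRINT RAPPROX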
*
*----------------------------------------------------------------------------
* Example 14.3, page 582
*
* The textbook computes the autocorrelation parameter rho, R, by hand when
* estimating the regression with autocorrelated errors.  The AUTO command in
* SHAZAM estimates rho as part of fitting a model with autocorrelated errors
* and is the preferred way.
*
* The general format of the AUTO command is:
*
*     AUTO depvar indeps / options
*
* where:  depvar  = the dependent variable
*         indeps  = list of independent variables
*         options = list of desired options
*
* Note:  The regression results below do not match those found in the
*        textbook.
*
AUTO Y X1 X2 X3 X4 X5 
*
* The TEST command tests the Null Hypothesis that, all else being equal, the
* percentage of women in the labor force, X1, does not influence the birth
* rate.
*
TEST X1=0
*
DELETE / ALL
*
*----------------------------------------------------------------------------
*
STOP