Chapter 12 - STATISTICS FOR BUSINESS & ECONOMICS by Paul Newbold
*****************************************************************************
* CHAPTER 12 - STATISTICS FOR BUSINESS & ECONOMICS, 4th Ed., by Paul Newbold*
*****************************************************************************
*
* Read in Table 12.1 data on the cost of advertisement and circulation (X) and
* return-on-inquiry cost (Y) on page 431.
*
* The SAMPLE command is used to specify the sample range of the data to be
* read.  The READ command inputs the data and assigns variable names.  In
* this case, the two data columns are assigned the names X and Y.  The
* LIST option on the READ command lists all the data read.
*
SAMPLE 1 17
READ X Y / LIST  
 4.07    17.41
 2.51    22.25
 1.25   106.84
14.67    14.41
16.02    24.18
 3.81    29.73
 9.87    35.95
 1.27    61.81
 1.80    48.36
 1.50    78.74
 1.68    66.42
 2.72   121.95
 1.61    21.93
 1.52    31.29
 3.10    88.31
 3.32    92.70
 3.07    59.06
*
* Plot Figure 12.3 on page 431.
*
PLOT Y X
*
* The GENR command is used to calculate the squares and cross-products needed
* for the sample correlation and the PRINT command prints the variables
* specified to replicate Table 12.2 on page 432.
*
GEN1 N=17
GENR X2=X**2
GENR Y2=Y**2
GENR XY=X*Y
PRINT X Y X2 Y2 XY
*
* The MEAN= option on the STAT command is used to store the mean of
* the variable specified in a constant.  The SUMS= option stores the sum of
* the variable specified in a constant.
*
STAT X / MEAN=XBAR SUMS=SUMX
STAT Y / MEAN=YBAR SUMS=SUMY
STAT XY / SUMS=SUMXY
STAT X2 / SUMS=SUMX2
STAT Y2 / SUMS=SUMY2
*
* Print the value for the number of observations, the mean of X, and the
* mean of Y.
*
PRINT N XBAR YBAR
*
* Replicate the summation of X, Y, X2, Y2, XY in Table 12.2.
*
PRINT SUMX SUMY SUMX2 SUMY2 SUMXY
*
* Calculate the Sample Correlation.
*
GEN1 R=(SUMXY-N*XBAR*YBAR)/(SQRT((SUMX2-N*(XBAR**2))*(SUMY2-N*(YBAR**2))))
PRINT R
*
* The Sample Correlation of variables X and Y can easily be calculated
* in SHAZAM with the PCOR option on the STAT command.  This method of
* calculating the sample correlation is shorter than the method described
* in the textbook.
*
STAT X Y / PCOR
*
* Test the null hypothesis of no population correlation against the two-sided
* alternative.  The test statistic is calculated using the GEN1 command.
*
GEN1 T=R/(SQRT((1-R**2)/(N-2)))
PRINT T
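*
* As a rough check on the test, the two-sided p-value can be obtained from
* the t distribution with N-2 = 15 degrees of freedom.  This is only a
* sketch: it assumes the DISTRIB command with the TYPE=, DF= and CDF=
* options is available in the SHAZAM version in use, and TABS, CDFT and
* PVAL are names introduced here for illustration.  The sample range is
* set to 1 1 because T is a constant, then restored for later commands.
*
SAMPLE 1 1
GEN1 TABS=ABS(T)
DISTRIB TABS / TYPE=T DF=15 CDF=CDFT
GEN1 PVAL=2*(1-CDFT)
PRINT PVAL
SAMPLE 1 17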
*
*-----------------------------------------------------------------------------
* Example 12.1, page 435
*
* The number of countries surveyed in a political risk project is defined as
* N and the sample correlation between political risk and inflation for these
* countries is defined as R.
*
GEN1 N=49
GEN1 R=0.43
*
* The null hypothesis is that there is no population correlation between
* political riskiness and inflation.  The alternative hypothesis is that
* there is positive correlation.  The GEN1 command is used to calculate the
* test statistic.
*
GEN1 STAT=R/(SQRT((1-R**2)/(N-2)))
PRINT STAT
*
*-----------------------------------------------------------------------------
* Example on page 437
*
* Spearman's rank correlation is calculated when the PRANKCOR option
* is specified on the STAT command.
*
STAT X Y / PRANKCOR
*
DELETE / ALL
*
*----------------------------------------------------------------------------
* 12.4 Least Squares Estimation, page 448
*
* The SAMPLE command is used to specify the sample range of the data to be
* read.  The READ command inputs the data and assigns variable names.  In
* this case, the disposable income is assigned X and retail sales is Y.
* The LIST option on the READ command lists all the data read.
* 
SAMPLE 1 22
READ YEAR X Y / LIST
 1     9098    5492
 2     9138    5540
 3     9094    5305
 4     9282    5507
 5     9229    5418
 6     9347    5320
 7     9525    5538
 8     9756    5692
 9    10282    5871
10    10662    6157
11    11019    6342
12    11307    5907
13    11432    6124
14    11449    6186
15    11697    6224
16    11871    6496
17    12018    6718
18    12523    6921
19    12053    6471
20    12088    6394
21    12215    6555
22    12494    6755
*
* Plot Figure 12.4 on page 443.
*
PLOT Y X
*
* The GENR command is used to calculate XY and X2.
*
GEN1 N=22
GENR XY=X*Y
GENR X2=X**2
*
* Replicate Table 12.6, page 449.
*
PRINT X Y XY X2
*
* The MEAN= option on the STAT command is used to store the mean of
* the variable specified in a constant.  The SUMS= option stores the sum of
* the variable specified in a constant.
*
STAT X / MEAN=XBAR SUMS=SUMX
STAT Y / MEAN=YBAR SUMS=SUMY
STAT XY / SUMS=SUMXY
STAT X2 / SUMS=SUMX2
PRINT SUMX SUMY SUMXY SUMX2
PRINT XBAR YBAR
GEN1 B=(SUMXY-(N*XBAR*YBAR))/(SUMX2-(N*XBAR**2))
GEN1 A=YBAR-(B*XBAR)
PRINT B A
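*
* The slope can equivalently be written in deviation form, as the sum of
* (X-XBAR)*(Y-YBAR) divided by the sum of (X-XBAR)**2.  The lines below are
* a sketch of that check, not part of the textbook table; DX, DY, DXDY,
* DX2, SXY, SXX and BDEV are names introduced here for illustration.
* BDEV should agree with B up to rounding.
*
GENR DX=X-XBAR
GENR DY=Y-YBAR
GENR DXDY=DX*DY
GENR DX2=DX**2
STAT DXDY / SUMS=SXY
STAT DX2 / SUMS=SXX
GEN1 BDEV=SXY/SXX
PRINT BDEV B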
*
* The above least squares regression can be easily estimated in SHAZAM
* using the OLS command as follows:
*
OLS Y X
*
*----------------------------------------------------------------------------
* 12.7 The Explanatory Power of a Linear Regression Equation, page 453
*
GENR YHAT=A+B*X
GENR E=Y-YHAT
GENR YYBAR=Y-YBAR
GENR YHATYBAR=YHAT-YBAR
PRINT Y YHAT E YYBAR YHATYBAR
*
* The Error Sum of Squares (SSE), Total Sum of Squares (SST) and Coefficient
* of Determination (R2) can be calculated using a combination of the
* GENR, STAT and GEN1 commands.
*
GENR E2=E**2
STAT E2 / SUMS=SSE
GENR YYBAR2=YYBAR**2 
STAT YYBAR2 / SUMS=SST
GEN1 R2=1-(SSE/SST)
PRINT SSE SST R2
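*
* The same R2 can be reached from the regression sum of squares, since
* SST = SSR + SSE for a least squares fit with an intercept.  The lines
* below are a sketch of that decomposition; YHYB2, SSR, SSTCHK and R2CHK
* are names introduced here for illustration.  SSTCHK should match SST and
* R2CHK should match R2 up to rounding.
*
GENR YHYB2=YHATYBAR**2
STAT YHYB2 / SUMS=SSR
GEN1 SSTCHK=SSR+SSE
GEN1 R2CHK=SSR/SST
PRINT SSR SSTCHK R2CHK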
*
* The COEF= option on the OLS command stores the regression coefficients in
* the vector called COEF.  This vector will be required in a following
* example.
*
* The SSE, SST and R2 values are automatically calculated in SHAZAM when
* an Ordinary Least Squares regression is estimated.  The "?" preceding
* the OLS command suppresses the OLS output.  The SSE, SST and R2 are stored
* in the temporary variables $SSE, $SST and $R2 after an OLS regression.
* The PRINT command is used to print out these temporary variables.
*
?OLS Y X / COEF=COEF
PRINT $SSE $SST $R2 
*
*----------------------------------------------------------------------------
* 12.8 Confidence Intervals and Hypothesis Tests, page 459
*
* The common variance of the error term can be calculated with the GEN1
* command.  The temporary variables $SSE and $N are available following
* an OLS regression.
*
GEN1 SE2=$SSE/($N-2)
*
* Recall in Table 12.5 the values for X-bar and the sum of X-squared were
* saved in the variable XBAR and SUMX2 respectively.
*
PRINT SE2 XBAR SUMX2
*
* The unbiased estimate of variance of Beta is calculated:
*
GEN1 SIGMAB=SE2/(SUMX2-($N*(XBAR**2)))
PRINT SIGMAB
*
* The estimated standard deviation of the least squares estimator of the
* slope is:
*
GEN1 SB=SQRT(SIGMAB)
PRINT SB
*
*-----------------------------------------------------------------------------
* Confidence Intervals for the Population Regression Slope, page 461
*
* Recall that the COEF= option on the OLS command saved the regression
* coefficients in the vector COEF.  In Row 1 of this vector the estimate
* for the slope coefficient is stored and Row 2 stores the estimate for the
* constant.
*
* The 99% Confidence Interval for the retail sales on disposable income
* regression is:
*
GEN1 B=COEF:1
GEN1 LOWER=B-2.845*SB
GEN1 UPPER=B+2.845*SB
PRINT LOWER UPPER
*
*-----------------------------------------------------------------------------
* Tests of the Population Regression Slope, page 463
*
GEN1 B0=0
GEN1 T=(B-B0)/SB
PRINT T
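*
* The hand-calculated SB and T can be compared with the values SHAZAM
* computes itself.  This sketch assumes the STDERR= and TRATIO= options on
* the OLS command, which save the coefficient standard errors and t-ratios
* in vectors; SEOLS and TOLS are names introduced here for illustration.
* As with COEF, the first element refers to the slope and the second to
* the constant.
*
?OLS Y X / STDERR=SEOLS TRATIO=TOLS
PRINT SEOLS TOLS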
*
*-----------------------------------------------------------------------------
* Confidence Intervals for Prediction, page 465
*
* The results for Equations 12.9.3 and 12.9.4 found on page 465 can be
* calculated using the GEN1 commands in SHAZAM.  This is a long and tedious
* process but it is done to illustrate the procedure to achieve the end
* results.
*
GEN1 X=12000
GEN1 A=1923
GEN1 B=0.3815
GEN1 YHAT=A+B*X
GEN1 N=22
GEN1 XBAR=10799
GEN1 SUMX2=2599715*1000
GEN1 SE2=21789.95
GEN1 FCSE=SQRT((1+(1/N)+((X-XBAR)**2)/(SUMX2-N*(XBAR**2)))*SE2)
GEN1 LAST=(X-XBAR)**2/(SUMX2-N*(XBAR**2))
GEN1 MEANPRED=SQRT(((1/N)+(X-XBAR)**2/(SUMX2-N*(XBAR**2)))*SE2)
PRINT FCSE MEANPRED
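*
* The two standard errors are linked: the variance of an individual
* prediction equals the variance of the mean prediction plus the error
* variance SE2.  The check below is a sketch of that identity; CHECK is a
* name introduced here for illustration and should equal FCSE.
*
GEN1 CHECK=SQRT(MEANPRED**2+SE2)
PRINT CHECK FCSE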
*
* The 95% Confidence Interval in this case has a critical value of 2.086.
*
* Therefore, the 95% Confidence Interval for the prediction of the actual
* value for retail sales in a year when disposable income equals $12000 is:
*
GEN1 PLOWER=YHAT-2.086*FCSE
GEN1 PUPPER=YHAT+2.086*FCSE
PRINT PLOWER PUPPER
*
* The 95% Confidence Interval for the Expected Value of retail sales when
* disposable income equals $12000 is:
*
GEN1 ELOWER=YHAT-2.086*MEANPRED
GEN1 EUPPER=YHAT+2.086*MEANPRED
PRINT ELOWER EUPPER
*
DELETE / ALL
*
* Alternatively, a shorter method is available in SHAZAM for estimating the
* confidence intervals: the FC command.
*
* The FC command is used following a regression to forecast into the future.
* The format of the FC command is:
*
*     estimation command
*     FC / options
*
* where:  estimation command = AUTO, BOX, LOGIT, OLS, POOL, PROBIT or TOBIT.
*         options            = list of desired options.
*
* The contents under the Square Root sign in Equations 12.9.3 and 12.9.4
* represent 2 types of forecast error.  Under the Square Root sign of 
* Equation 12.9.3 is the Individual Predicted Error and under the Square
* Root sign of Equation 12.9.4 is the Mean or Conditional Predicted Error.
*
* The data provided for this forecast is that the disposable income per
* household (X) is equal to $12000 for observation N+1.  The retail sales
* per household (Y) for observation N+1 is the value to be predicted, and in
* the data file TABL125B it is set to zero.
*
* The DIM command is used to dimension a vector or matrix before any data is
* defined.  This command is required since the predicted values of the
* dependent variable, Y, are saved in the vectors FC1 and FC2 and the
* forecast standard errors are saved in the vectors SE1 and SE2.
*
* The format of the DIM command is:
*
*     DIM var size var size ...
*
* where:  var  = the name of the vector or matrix
*         size = the size of the vector or matrix
*
SAMPLE 1 23
DIM FC1 23 SE1 23 FC2 23 SE2 23
READ YEAR X Y / LIST
 1     9098    5492
 2     9138    5540
 3     9094    5305
 4     9282    5507
 5     9229    5418
 6     9347    5320
 7     9525    5538
 8     9756    5692
 9    10282    5871
10    10662    6157
11    11019    6342
12    11307    5907
13    11432    6124
14    11449    6186
15    11697    6224
16    11871    6496
17    12018    6718
18    12523    6921
19    12053    6471
20    12088    6394
21    12215    6555
22    12494    6755
23    12000       0
*
* SHAZAM automatically calculates the Individual Predicted Error on the FC
* command.  The FCSE= option on the FC command saves these forecast
* standard errors in the specified vector, which is then used to calculate
* the Confidence Interval.  The predicted values (Yhat) in Equation 12.9.3
* are saved with the PREDICT= option on the FC command.  The critical values
* for the Confidence Intervals are found in the t-tables.  All the data
* required to evaluate Equation 12.9.3 are now defined.
*
* Note, the sample range must be changed from 1 23 to 1 22 before the OLS
* regression is estimated.
*
SAMPLE 1 22
OLS Y X
FC / LIST BEG=23 END=23 PREDICT=FC1 FCSE=SE1
*
* The 95% Confidence Interval for retail sales in a year when disposable
* income is X=12000 is calculated with the GENR command.  The SAMPLE 23 23
* command selects only the 23rd observation of the predicted values of
* Yhat and the forecast standard errors.
*
* Note:  If the GEN1 command is used the confidence interval will be equal
*        to zero, since the GEN1 command is equivalent to a SAMPLE 1 1
*        command followed by a GENR command that generates a variable with
*        only one observation.  It would therefore use the first elements
*        of FC1 and SE1, which are zero, instead of the 23rd.
*
SAMPLE 23 23
GENR LOWER1=FC1-2.086*SE1
GENR UPPER1=FC1+2.086*SE1
*
* The predicted Yhat (FC1), forecasted standard error (SE1), lower bound of
* the 95% Confidence Interval (LOWER1), upper bound of the 95% Confidence
* Interval (UPPER1) are printed with the SHAZAM PRINT command.
*
PRINT FC1 SE1 LOWER1 UPPER1
*
* Equation 12.9.4 can be calculated in the same way as Equation 12.9.3.  In
* this case, the Mean Predicted Error is requested with the MEANPRED option
* on the FC command.  The predicted values (Yhat) and the critical values
* for the Confidence Intervals are determined as in the previous equation.
*
* First the sample range must be reset to 1 22 since the previous sample
* range was set to 23 23.
*
SAMPLE 1 22
OLS Y X
FC / LIST MEANPRED BEG=23 END=23 PREDICT=FC2 FCSE=SE2
SAMPLE 23 23
GENR LOWER2=FC2-2.086*SE2
GENR UPPER2=FC2+2.086*SE2
*
* The predicted Yhat (FC2), forecast standard error (SE2), lower bound of
* the 95% Confidence Interval (LOWER2), upper bound of the 95% Confidence
* Interval (UPPER2) are printed with the SHAZAM PRINT command.
*
PRINT FC2 SE2 LOWER2 UPPER2
*
DELETE / ALL
*
*-----------------------------------------------------------------------------
*
STOP