*****************************************************************************
* CHAPTER 11 - STATISTICS FOR BUSINESS & ECONOMICS, 5th Edition             *
*****************************************************************************
* Example 11.1, p. 415
*
* Read this example carefully.  Be sure you understand the methodology.
*
*----------------------------------------------------------------------------
* Example 11.2, p. 415
*
* Read this example carefully.  Be sure you understand the methodology.
*
*----------------------------------------------------------------------------
* Example 11.3, p. 423
*
* The SAMPLE command specifies the sample range of the data to be read.
* The READ command inputs the data and assigns the variable names.  In this
* case the variables are Savings and Loan Profit, Y, Percent Net Revenue per
* Deposit Dollar, X1, and the Number of Offices, X2.  The LIST option on the
* READ command lists all the data read.
* 
SAMPLE 1 25
READ(SAVLOAN.DIF) / DIF LIST
*
* Figure 11.3, p. 424
* The Ordinary Least Squares, OLS, command is used to estimate profit, Y,
* as a function of the percent net revenue per deposit, X1, and the number
* of offices, X2.
*
OLS Y X1 X2
*
* The STAT command with the PCOR option prints the correlation matrix
* of the variables specified.
*
STAT Y X1 X2 / PCOR
*
* Figure 11.4, p. 424
* The OLS command is used to regress profit, Y, on revenue, X1.
*
OLS Y X1
*
* Figure 11.5, p. 425
* The next OLS command is used to regress profit, Y, on the Number
* of Offices, X2.
*
OLS Y X2
*
* Table 11.2 is replicated with the OLS command.  The output from this
* regression is suppressed by placing "?" before the command, since the same
* regression was already estimated above.  The PREDICT= option on the OLS
* command saves the predicted values of the dependent variable in a variable,
* and the RESID= option saves the residuals from the regression in a variable.
*
?OLS Y X1 X2 / PREDICT=YHAT RESID=E
*
* The sample mean of the dependent variable, Y, is saved with the MEAN=
* option on the STAT command.
*
STAT Y / MEAN=MEANY
*
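* The GENR commands generate Columns 4 and 5 of Table 11.2: the deviations of
* Y and of YHAT from the sample mean of Y.  The Sums of Squares (SSE, SST and
* SSR) listed at the end of the table are the sums of the squared values of
* E, YYBAR and YHATYBAR.  The PRINT command replicates Table 11.2, p. 427.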
*
GENR YYBAR=Y-MEANY
GENR YHATYBAR=YHAT-MEANY
PRINT Y YHAT E YYBAR YHATYBAR
*
* When the ANOVA option is specified on the OLS command, the Regression Sum
* of Squares (SSR), Error Sum of Squares (SSE) and Total Sum of Squares (SST)
* are printed twice: measured from the Mean and measured from Zero.  The SSE,
* SST and SSR from the Mean are those listed at the end of Table 11.2.
*
* The ANOVA option prints the Analysis of Variance tables and the F-statistic
* for the test that all the coefficients are zero.
*
OLS Y X1 X2 / ANOVA
*
* SHAZAM automatically stores the SSE, SSR and SST from the OLS regression
* in the temporary variables $SSE, $SSR and $SST.  As a check, SSR is also
* calculated as SST-SSE and printed next to $SSR.
*
GEN1 CSSR=$SST-$SSE
PRINT $SSR CSSR
*
* The estimate of the error variance, S2, is calculated with the GEN1
* command based on the formula in Equation 11.13.  K is the number of
* independent variables and $N is the number of observations.
*
GEN1 K=2
GEN1 S2=$SSE/($N-K-1)
PRINT S2
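*
* As a cross-check (a sketch, not part of the textbook example), SHAZAM also
* stores the estimated error variance from the most recent regression in the
* temporary variable $SIG2, so the two values printed below should agree.
*
PRINT S2 $SIG2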
*
* The Adjusted Coefficient of Determination as defined in Equation 11.14
* is calculated with the GEN1 command.
*
GEN1 R2ADJ=1-(($SSE/($N-K-1))/($SST/($N-1)))
PRINT R2ADJ
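*
* As a cross-check (a sketch, not part of the textbook example), the adjusted
* R-square from the most recent regression is also stored in the temporary
* variable $ADR2 and can be printed next to the calculated value.
*
PRINT R2ADJ $ADR2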
*
*----------------------------------------------------------------------------
* Example 11.4, p. 434
*
* The GEN1 and DISTRIB commands are used to print critical values in lieu
* of referring to a statistical table.  The GEN1 command first generates a
* constant, X, equal to the upper-tail probability of 0.005 (a 99% Confidence
* Interval leaves 0.005 in each tail) before the DISTRIB command is executed.
* The format of the DISTRIB command is:
*
*  DISTRIB vars / options
*
*  where:  vars      = list of variables
*          options   = list of desired options
*          TYPE=     - specifies the type of distribution
*          DF=       - specifies the degrees of freedom.  Used below with
*                      TYPE=T.
*          DF1=,DF2= - specifies the degrees of freedom for the numerator
*                      (DF1) and the denominator (DF2).  Must be used when
*                      TYPE=F.
*          INVERSE   - computes the inverse survival function
*
GEN1 X=0.005
DISTRIB X / TYPE=T DF=22 INVERSE
*
* SHAZAM automatically calculates the 95% and 90% Confidence Intervals with
* the CONFID command.  The 99% Confidence Interval for the coefficients is
* calculated here by saving the regression coefficients with the COEF= option
* and the coefficient standard errors with the STDERR= option on the OLS
* command.  SHAZAM output for a specific command can be suppressed by placing
* "?" before the command.
*
?OLS Y X1 X2 / COEF=COEF STDERR=S
*
* The GEN1 command is then used to calculate the lower bound, LOWER1, and the
* upper bound, UPPER1, of the 99% Confidence Interval for the coefficient on
* X1, Percent Net Revenue per Deposit Dollar.  The regression coefficients are
* saved in a vector called COEF with the COEF= option on the OLS command.  The
* coefficient for X1 is stored in Row 1, X2 in Row 2 and the constant in Row 3
* of the vector COEF.  These elements are referenced in SHAZAM as COEF:1,
* COEF:2 and COEF:3, and the standard errors in the vector S are referenced
* in the same way.  The critical value 2.819 is the t value for 22 degrees
* of freedom with 0.005 in the upper tail.
*
GEN1 LOWER1=COEF:1-2.819*S:1
GEN1 UPPER1=COEF:1+2.819*S:1
PRINT LOWER1 UPPER1
*
* The above commands are repeated to calculate the lower and upper bounds of
* the 99% Confidence Interval for the coefficient on X2, the Number of
* Offices.  In this case the values in the second row of the vectors COEF
* and S are needed, so COEF:2 and S:2 are used.
*
GEN1 LOWER2=COEF:2-2.819*S:2
GEN1 UPPER2=COEF:2+2.819*S:2
PRINT LOWER2 UPPER2
*
*-----------------------------------------------------------------------------
* Example 11.5, p. 437
*
* The Null Hypothesis is that the coefficient on Percent Net Revenue per
* Deposit Dollar, X1, is zero, that is, X1 has no effect on profit when
* controlling for the effect of the number of offices.  The alternative is
* that X1 increases profit.  The TEST command is used in SHAZAM to test this
* hypothesis immediately following the OLS command.
*
?OLS Y X1 X2
TEST X1=0
*
* The second Null Hypothesis is that the coefficient on the Number of
* Offices, X2, is zero.  The alternative is that the number of offices
* reduces profit margins.
*
TEST X2=0
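*
* For comparison, the t-ratios behind these tests can also be formed by hand.
* A minimal sketch, assuming the vectors COEF and S saved in Example 11.4 are
* still in memory; TRAT1 and TRAT2 are names chosen here for illustration.
*
GEN1 TRAT1=COEF:1/S:1
GEN1 TRAT2=COEF:2/S:2
PRINT TRAT1 TRAT2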
*
*-----------------------------------------------------------------------------
* Example 11.6, p. 438
*
* The GEN1 command is used to generate the constants for this example since
* the data used for the regression are not provided.  The confidence interval
* of interest is for the expected increase in the Effective Property Tax Rate
* when Government Revenue Share increases by 1 percentage point while the
* Number of Housing Units per Square Mile (X1) and Median per Capita Personal
* Income (X3) remain constant.
*
GEN1 B1=0.000567
GEN1 SB1=0.000139
GEN1 B2=0.0183
GEN1 SB2=0.0082
GEN1 B3=-0.000191
GEN1 SB3=0.000446
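*
* The critical value 2.120 used below can be printed with the DISTRIB command
* as in Example 11.4 instead of being read from a table.  This is a sketch
* only: the 16 degrees of freedom are an assumption inferred from the value
* 2.120 (the t value with 0.025 in the upper tail) and are not stated in
* this file.
*
GEN1 X=0.025
DISTRIB X / TYPE=T DF=16 INVERSE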
*
* The 95% Confidence Interval for the number of housing units per square mile
* is calculated with the GEN1 command:
*
GEN1 LOWER1=B1-2.120*SB1
GEN1 UPPER1=B1+2.120*SB1
PRINT LOWER1 UPPER1
* 
* The 95% Confidence Interval for the percentage of revenue represented by
* grants is:
*
GEN1 LOWER2=B2-2.120*SB2
GEN1 UPPER2=B2+2.120*SB2
PRINT LOWER2 UPPER2
*
* The 95% Confidence Interval for the median per capita personal income is:
*
GEN1 LOWER3=B3-2.120*SB3
GEN1 UPPER3=B3+2.120*SB3
PRINT LOWER3 UPPER3
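*
* A coefficient whose 95% interval excludes zero has a t-ratio larger than
* 2.120 in absolute value.  As a sketch, the t-ratios can be generated from
* the constants above (TB1, TB2 and TB3 are illustrative names):
*
GEN1 TB1=B1/SB1
GEN1 TB2=B2/SB2
GEN1 TB3=B3/SB3
PRINT TB1 TB2 TB3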
*
DELETE / ALL
*-----------------------------------------------------------------------------
* Example 11.7, p. 439
*
SAMPLE 1 90
READ(CITYDAT.DIF) / DIF
OLS HSEVAL SIZEHSE INCOM72 TAXRATE COMPER / ANOVA
*
*----------------------------------------------------------------------------
* Example 11.8, p. 444
*
* The Null Hypothesis is that the four predictor variables:
*
*    SIZEHSE = mean number of rooms in houses
*    INCOM72 = mean household income
*    TAXRATE = tax rate per thousand dollars of assessed value for houses
*    COMPER  = percentage of taxable property that is commercial property
*
* are not significant predictors of housing price.  The F-statistic for this
* test that all the slope coefficients are zero is printed in the above OLS
* output under ANALYSIS OF VARIANCE - FROM MEAN when the ANOVA option is
* specified.  Alternatively, the TEST command can be used following an OLS
* command to print the F-statistic.  In this example, more than one
* restriction on the coefficients is being tested jointly, so a TEST command
* and an END command mark the beginning and end of the list of restrictions.
* The DISTRIB command then prints the critical value at the 0.01 level.
*
TEST
TEST SIZEHSE=0
TEST INCOM72=0
TEST TAXRATE=0
TEST COMPER=0
END
GEN1 X=0.01
DISTRIB X / TYPE=F DF1=4 DF2=85 INVERSE
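*
* The same F-statistic can be reproduced from the sums of squares as
* (SSR/K)/(SSE/(N-K-1)) with K=4.  A sketch, assuming $SSR, $SSE and $N still
* hold the values from the OLS regression in Example 11.7; FALL is an
* illustrative name.
*
GEN1 FALL=($SSR/4)/($SSE/($N-4-1))
PRINT FALL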
*
*----------------------------------------------------------------------------
* Example 11.9, p. 446
*
* The SSE from the regression with all four variables is saved in the constant
* SSE4 with the GEN1 command.  Otherwise this value would be lost when mean
* market price for houses in the city (HSEVAL) is regressed on only mean
* number of rooms in houses (SIZEHSE) and mean household income (INCOM72).
*
GEN1 SSE4=$SSE
OLS HSEVAL SIZEHSE INCOM72 / ANOVA
*
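* The F-statistic in this example is calculated based on Equation 11.24
* with the GEN1 command.  R is the number of coefficients restricted to zero,
* N is the number of observations and K is the number of independent
* variables in the regression with all four variables, so the denominator
* has N-K-1=85 degrees of freedom, as in Example 11.8.
*
GEN1 R=2
GEN1 N=90
GEN1 K=4
GEN1 F=(($SSE-SSE4)/R)/(SSE4/(N-K-1))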
PRINT F 
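*
* The calculated F can be compared with a critical value from the DISTRIB
* command.  A sketch, assuming a 0.01 level of significance as in Example
* 11.8; the degrees of freedom are 2 for the restrictions tested and
* 90-4-1=85 for the regression with all four variables.
*
GEN1 X=0.01
DISTRIB X / TYPE=F DF1=2 DF2=85 INVERSE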
*
DELETE / ALL
*-----------------------------------------------------------------------------
* Example 11.10, p. 449
*
* Recall that the Savings and Loan data has 25 observations.  The SAMPLE
* command must therefore be specified again to change the sample range from
* the 90 observations of the CityDat data to the range 1 to 25.
*
SAMPLE 1 25
READ(SAVLOAN.DIF) / DIF LIST
*
* The output from the OLS regression is suppressed by placing "?" before the
* command.  The regression coefficients are saved in a vector called COEF
* with the COEF= option on the OLS command.  The coefficient for X1 is stored
* in Row 1, X2 in Row 2 and the constant in Row 3 of the vector COEF.  These
* elements are referenced in SHAZAM as COEF:1, COEF:2 and COEF:3.  The point
* predictor of profit margins at X1=4.50 and X2=9000 can now be calculated
* with the GEN1 command.
*
?OLS Y X1 X2 / COEF=COEF
PRINT COEF:3 COEF:1 COEF:2
GEN1 X1N=4.50
GEN1 X2N=9000
GEN1 YHAT=COEF:3+(COEF:1*X1N)+(COEF:2*X2N)
PRINT YHAT
*
DELETE / ALL
*----------------------------------------------------------------------------
* Example 11.11, p. 452
*
SAMPLE 1 9
READ(PRODCOST.DIF) / DIF LIST
*
* Figure 11.14 , p. 453
*
PLOT COST UNITS 
*
* Figure 11.15 - Linear Regression
*
OLS COST UNITS / ANOVA
*
* To estimate the Quadratic Model as shown in Figure 11.16, the GENR command
* must be used first to calculate the square of UNITS.
* 
GENR UNITS2=UNITS**2
OLS COST UNITS UNITS2 / ANOVA
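*
* Whether the quadratic term is needed can be checked with the TEST command
* immediately after the regression, as in Example 11.5.  This test is an
* illustrative addition, not part of the textbook example.
*
TEST UNITS2=0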
PRINT UNITS UNITS2 COST
*
DELETE / ALL
*----------------------------------------------------------------------------
* Example 11.12, p. 456
*
SAMPLE 1 24
READ(BOATPRO.DIF) / DIF LIST
*
* The GENR command is first used to divide BOATS and WORKERS by PRODUCT.
* The GENR command with the LOG(x) function then transforms these two ratios
* into Natural Logs.
*
GENR YK=BOATS/PRODUCT
GENR LK=WORKERS/PRODUCT
GENR LNY=LOG(YK)
GENR LNL=LOG(LK)
*
* To save the forecasted number of boats below, the DIM command must first
* be used to dimension a vector before the data can be defined.  The
* format of the DIM command is:
*
*    DIM var size var size ....
*
*    where:  var  = name of the vector or matrix to be dimensioned
*            size = one or two numbers separated by a space to
*                   indicate the size of the var to be dimensioned
*
DIM FBOATS 24
*
* The LOGLOG option on the OLS regression is specified since the dependent
* and independent variables are in Log form and this ensures that the
* estimated elasticities are correctly calculated.
*
* The FC command is specified after the OLS regression to forecast the number
* of boats.  The format of the command is:
*
*    estimation command
*    FC / options
*
*    where:  estimation command = AUTO, BOX, LOGIT, OLS, POOL, PROBIT, 
*                                 TOBIT, GLS, MLE or 2SLS
*            options            = list of desired options
*
OLS LNY LNL / LOGLOG ANOVA
FC / LIST PREDICT=FBOATS
PLOT BOATS FBOATS
*
DELETE / ALL
*----------------------------------------------------------------------------
* Example 11.13, p. 459
*
* In this example, the Salaries of Male and Female Financial Analysts are
* denoted Y, Years of Experience X1 and Gender X2.  X2=0 for a Female
* Employee and X2=1 for a Male Employee.
*
SAMPLE 1 12
READ(GENSAL.DIF) / DIF LIST
*
* Figure 11.20, p. 460
*
PLOT X1 X2 Y
*
* Figure 11.21, p. 460
*
OLS Y X2 X1 / ANOVA
*
DELETE / ALL
*----------------------------------------------------------------------------
* Example 11.14, p. 462
*
* The data in this example are based on the data from Example 11.13.  As
* previously defined, the Salaries of Male and Female Financial Analysts are
* denoted Y, Years of Experience X1 and Gender X2 (X2=0 if Female and X2=1
* if Male).  The variable Experience Times Gender is denoted EXPGEN.
*
READ(GENSALINCR.DIF) / DIF LIST
*
* Figure 11.22, p. 462
*
PLOT X1 X2 Y 
*
* Figure 11.23, p. 463
*
* The T-statistic is automatically printed out with the SHAZAM output in
* an OLS regression.  The TEST command immediately following an OLS command
* can also be used to calculate the T-statistic for the Experience times
* Gender variable, EXPGEN.
*
OLS Y X2 EXPGEN X1 / ANOVA
TEST EXPGEN=0
*
DELETE / ALL
*----------------------------------------------------------------------------
* Multiple Regression Analysis Application Procedure, p. 466
*
SAMPLE 1 28
READ(COTTON.DIF) / DIF LIST
GENR T=YEAR+0.25*QUARTER
*
* Figures 11.24 and 11.25, p. 467
*
STAT COTTONQ WHOPRICE IMPFAB EXPFAB T / PCOR
*
* Figure 11.27, p. 469
*
OLS COTTONQ WHOPRICE IMPFAB EXPFAB T / ANOVA 
*
* Figure 11.28, p. 469
*
* The F and t statistics for the variable EXPFAB can be calculated using the
* TEST command immediately following an OLS regression.  The PREDICT= option
* saves the predicted values of the dependent variable in the vector called
* FITS1 for later use.  The RESID= option is used to save the regression
* residuals in the vector called RESL1 for later use.  The regression's SSR
* and SSE are saved in the constants SSR1 and SSE for later use in manually
* calculating the F statistic.
*
OLS COTTONQ WHOPRICE IMPFAB EXPFAB / ANOVA LIST PREDICT=FITS1 RESID=RESL1
TEST EXPFAB=0
GEN1 SSR1=$SSR
GEN1 SSE=$SSE
*
* Figure 11.29, p. 470
*
OLS COTTONQ WHOPRICE IMPFAB / ANOVA
GEN1 N=28
GEN1 K=3
*
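* The F statistic for EXPFAB is calculated manually with the GEN1 command as
* the increase in SSR from adding EXPFAB divided by the estimated error
* variance of the regression that includes EXPFAB.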
*
GEN1 FX3=(SSR1-$SSR)/(SSE/(N-K-1))
PRINT FX3
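*
* FX3 can be compared with a critical value from the DISTRIB command.  A
* sketch, assuming a 0.05 level of significance (not stated in this file);
* the degrees of freedom are 1 for the single variable EXPFAB and 28-3-1=24
* for the regression that includes it.
*
GEN1 X=0.05
DISTRIB X / TYPE=F DF1=1 DF2=24 INVERSE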
*
* Figure 11.32, p. 472
*
PLOT RESL1 WHOPRICE
PLOT RESL1 / TIME
*
* Figure 11.33, p. 473
*
PLOT RESL1 IMPFAB
PLOT RESL1 EXPFAB
*
* Figure 11.34, p. 473
*
PLOT RESL1 COTTONQ
*
* Figure 11.35, p. 474
*
PLOT RESL1 FITS1
*
DELETE / ALL
*-----------------------------------------------------------------------------
*
STOP