*****************************************************************************
* CHAPTER 17 - STATISTICS FOR BUSINESS & ECONOMICS, 4th Ed., by Paul Newbold*
*****************************************************************************
*
* Index Numbers on page 684.
*
* The SAMPLE command sets the sample range to observations 1 to 10 for the
* data in Tables 17.4 and 17.6.
*
SAMPLE 1 10
*
* Table 17.4 stores the Prices per Bushel of Wheat, Corn and Soybeans on
* page 681.
*
READ YEAR PWHEAT PCORN PSOYBEAN PAVERAGE PINDEX / LIST
 1  1.33  1.33  2.85  1.837  100.0
 2  1.34  1.08  3.03  1.817   98.9
 3  1.76  1.57  4.37  2.567  139.7
 4  3.95  2.55  5.68  4.060  221.0
 5  4.09  3.03  6.64  4.587  249.7
 6  3.56  2.54  4.92  3.673  199.9
 7  2.73  2.15  6.81  3.897  212.1
 8  2.33  2.02  6.42  3.590  195.4
 9  2.97  2.25  6.12  3.780  205.8
10  3.78  2.52  6.28  4.193  228.3
*
* Table 17.6 stores the Production in millions of Bushels of Wheat, Corn
* and Soybeans on page 684.
*
READ YEAR WHEAT CORN SOYBEAN / LIST
 1  1352  4152  1127
 2  1618  5641  1176
 3  1545  5573  1271
 4  1705  5647  1547
 5  2122  5829  1547
 6  2142  6266  1288
 7  2026  6357  1716
 8  1799  7082  1843
 9  2134  7939  2268
10  2370  6648  1817
*
* The INDEX command computes price indexes from a set of price and
* quantity data on a number of commodities.  SHAZAM automatically calculates
* the Divisia, Paasche, Laspeyres and Fisher Price and Quantity Indexes
* when the INDEX command is specified.  The BASE= option specifies the
* observation number to be used as the base period for the index.
*
* The format of the command is:
*
*  INDEX p1 q1 p2 q2 p3 q3 ... / options
*
* Table 17.5 - Laspeyres Price Index for Wheat, Corn and Soybean on page 682
* is replicated with the INDEX command.  The LASPEYRES= option stores the
* Laspeyres Price Index in the vector specified.
*
*
INDEX PWHEAT WHEAT PCORN CORN PSOYBEAN SOYBEAN / BASE=1 LASPEYRES=PLS
GENR PINDEX=PLS*100
PRINT PINDEX
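*
* A possible cross-check, added as a sketch and not part of the textbook
* listing: with base period 1 the Laspeyres price index is
*   100 * (sum of p(t)*q(1)) / (sum of p(1)*q(1)).
* The names BASEVAL and PCHECK are hypothetical.  For year 2 the formula
* gives 100*9710.65/10532.27 = 92.2, which agrees with the 1972 entry of
* P71 in the change-of-base data further down.
*
GEN1 BASEVAL=PWHEAT:1*WHEAT:1+PCORN:1*CORN:1+PSOYBEAN:1*SOYBEAN:1
GENR PCHECK=100*(PWHEAT*WHEAT:1+PCORN*CORN:1+PSOYBEAN*SOYBEAN:1)/BASEVAL
PRINT PINDEX PCHECK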
*
* Table 17.7 - Laspeyres Quantity Index for Wheat, Corn and Soybean on page
* 684 is replicated with the INDEX command.  In this case, the quantities
* are specified before the prices.  The QLASPEYRES= option stores the
* Laspeyres Quantity Index in the vector specified.
*
*
INDEX WHEAT PWHEAT CORN PCORN SOYBEAN PSOYBEAN / BASE=1 QLASPEYRES=QLS
GENR QINDEX=QLS*100
PRINT QINDEX
*
* The price indexes for Wheat, Corn and Soybean, including the Aggregate
* Laspeyres Price Index, are recomputed with observation 6 as the base
* period.
*
INDEX PWHEAT WHEAT PCORN CORN PSOYBEAN SOYBEAN / BASE=6
*
*----------------------------------------------------------------------------
* Change in Base Period
*
* First read the data for the 1971 and 1976 based indexes.
*
SAMPLE 1 10
READ YEAR P71 P76 / LIST
1971   100.0     0.0
1972    92.2     0.0
1973   131.2     0.0
1974   212.0     0.0
1975   243.0     0.0
1976   198.5   100.0
1977     0.0    94.0
1978     0.0    86.7
1979     0.0    94.9
1980     0.0   107.0
SAMPLE 6 10
*
* Using the GENR command, copy the last 5 years of variable P76 into the
* SPLICE index.
*
GENR SPLICE=P76
SAMPLE 1 5
*
* Convert the first 5 years of P71 to the 1976 base by multiplying by the
* ratio of the two indexes in the overlap year, P76:6/P71:6.
*
GENR SPLICE=P71*P76:6/P71:6
SAMPLE 1 10
*
* Now print all 10 years of the spliced index with the PRINT command.  These
* values appear in the last column of Table 17.8 on page 685.
*
PRINT YEAR SPLICE
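*
* Worked check of the splice (a sketch based on the data read above, not
* taken from the textbook listing): the rescaling factor is
*   P76:6/P71:6 = 100.0/198.5 = 0.5038 (approximately),
* so the 1971 value becomes 100.0*0.5038 = 50.4 and the 1975 value becomes
* 243.0*0.5038 = 122.4, while 1976 to 1980 keep their P76 values.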
* 
DELETE / ALL
*
*----------------------------------------------------------------------------
* A Nonparametric Test for Randomness on page 688.
*
SAMPLE 1 16
READ DAY VOLUME / LIST
  1   98
  2   93
  3   82
  4  103
  5  113
  6  111
  7  104
  8  103
  9  114
 10  107
 11  111
 12  109
 13  109
 14  108
 15  128
 16   92
*
* The median observation of the volume data is calculated using the MEDIAN=
* option on the STAT command.  The median value is saved in a constant
* called M.
*
STAT VOLUME / MEDIAN=M
PRINT M
*
* There are 2 ways of computing the Runs Test.  In the textbook, Newbold uses
* the deviations from the median.  In SHAZAM, the Runs Test is calculated
* from the residuals around the mean, which is the more common approach.
*
* SHAZAM automatically computes the Runs Test when the OLS command is
* specified with the RSTAT option.  With no regressors other than the
* constant, the OLS residuals are the deviations from the mean.  The LIST
* option lists the residuals, which gives a visual check of how many
* residuals lie above and below the mean.
*
OLS VOLUME / RSTAT LIST
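*
* To follow the textbook's median-based version, the deviations from the
* median can be listed and their signs counted by hand.  This is a sketch;
* DEV is a hypothetical variable name.  With the data above the median is
* 107.5 (the midpoint of 107 and 108), and a hand count gives 7 runs around
* the median but 5 runs around the mean (105.3), because day 10 (107) lies
* below the median and above the mean.
*
GENR DEV=VOLUME-M
PRINT DAY VOLUME DEV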
*
* Plot 17.3 on page 689 is replicated with the PLOT command.
*
PLOT VOLUME DAY
*
DELETE / ALL
*
*----------------------------------------------------------------------------
* Example 17.1, page 691
*
* The TIME command specifies the beginning year and frequency for a time
* series.  The SAMPLE command can then be given in its alternate form, with
* dates instead of observation numbers.
*
TIME 1931 1
SAMPLE 1931.0 1960.0
READ YEAR SALES / LIST
1931  1806
1932  1644
1933  1814
1934  1770
1935  1518
1936  1103
1937  1266
1938  1473
1939  1423
1940  1767
1941  2161
1942  2336
1943  2602
1944  2518
1945  2637
1946  2177
1947  1920
1948  1910
1949  1984
1950  1787
1951  1689
1952  1866
1953  1896
1954  1684
1955  1633
1956  1657
1957  1569
1958  1390
1959  1387
1960  1289
STAT SALES / MEDIAN=M
PRINT M
*
* In this example, the Runs Test results from SHAZAM match those in the
* textbook.  The mean and median are close enough that no residual changes
* sign from plus to minus or vice versa.
*
OLS SALES / RSTAT LIST
*
DELETE / ALL
*
*----------------------------------------------------------------------------
* Components of a Time Series, page 692
*
SAMPLE 1 11
READ YEAR CREDIT / LIST
 1     133
 2     155
 3     165
 4     171
 5     194
 6     231
 7     274
 8     312
 9     313
10     333
11     343
PLOT CREDIT YEAR
*
DELETE / ALL
*
*----------------------------------------------------------------------------
* Components of a Time Series, page 693
*
SAMPLE 1 32
*
* The BYVAR option on the READ command tells SHAZAM to read in the data
* variable by variable rather than observation by observation.
*
READ Q / BYVAR LIST
0.300   0.460   0.345   0.910
0.330   0.545   0.440   1.040
0.495   0.680   0.545   1.285
0.550   0.870   0.660   1.580
0.590   0.990   0.830   1.730
0.610   1.050   0.920   2.040
0.700   1.230   1.060   2.320
0.820   1.410   1.250   2.730
*
* The GENR command with the TIME function is used to generate a time index
* so that the first observation is equal to 1 and the rest are consecutively
* numbered.
*
GENR YEAR=TIME(0)
PLOT Q YEAR 
*
DELETE / ALL
*
*-----------------------------------------------------------------------------
* Moving Averages, page 698
*
* The TIME command specifies the beginning year and frequency for a time
* series.  The SAMPLE command can then be given in its alternate form, with
* dates instead of observation numbers.
*
TIME 1931.0 1
SAMPLE 1931.0 1960.0
READ YEAR SALES 
1931  1806
1932  1644
1933  1814
1934  1770
1935  1518
1936  1103
1937  1266
1938  1473
1939  1423
1940  1767
1941  2161
1942  2336
1943  2602
1944  2518
1945  2637
1946  2177
1947  1920
1948  1910
1949  1984
1950  1787
1951  1689
1952  1866
1953  1896
1954  1684
1955  1633
1956  1657
1957  1569
1958  1390
1959  1387
1960  1289
*
* The TIME(0) function is used to create a time index so that the first
* observation is equal to 1 and the rest are consecutively numbered.
*
GENR T=TIME(0)
*
* The 5-Point Centered Moving Average for the SALES variable is calculated
* using the GENR command and LAG function.  The LAG(x,n) function lags the
* variable x by n periods.  A negative value for n leads the variable, so
* LAG(x,-1) refers to the next observation.
*
GENR SMA5=(LAG(SALES,2)+LAG(SALES)+SALES+LAG(SALES,-1)+LAG(SALES,-2))/5
PRINT T SALES SMA5
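*
* Worked check (a sketch using the data above): the first complete 5-point
* average is centered on 1933,
*   SMA5 = (1806+1644+1814+1770+1518)/5 = 1710.4.
* The values for 1931, 1932, 1959 and 1960 are incomplete because some of
* the lags or leads fall outside the sample, and those years are excluded
* from the plot below.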
*
* The SAMPLE command is used to specify the range for the PLOT command.
* In this example, to replicate Figure 17.7 on page 698 the sample range
* is from 1933 to 1958.  The years omitted are not plotted.
*
SAMPLE 1933.0 1958.0
*
* The YMIN=, YMAX=, XMIN=, and XMAX= options are specified on the PLOT
* command to specify the desired range for the X and Y axis to replicate
* Figure 17.7 on page 698.
*
PLOT SMA5 YEAR / YMIN=1100 YMAX=2700 XMIN=1933 XMAX=1958
*
DELETE / ALL
*
*-----------------------------------------------------------------------------
* Extraction of the Seasonal Component Through Moving Averages, page 699
*
SAMPLE 1 32
*
* The BYVAR option on the READ command reads the data in by variable and not
* by observation.  Therefore, SHAZAM will read the data on Row 1 of the data
* file from left to right until each observation has been read for variable
* X and then continue with Row 2 etc. until all 32 observations have been
* read.
*
READ X / BYVAR
0.300   0.460   0.345   0.910
0.330   0.545   0.440   1.040
0.495   0.680   0.545   1.285
0.550   0.870   0.660   1.580
0.590   0.990   0.830   1.730
0.610   1.050   0.920   2.040
0.700   1.230   1.060   2.320
0.820   1.410   1.250   2.730
*
* The GENR command and TIME(0) function is used to create a time index so
* that the first observation is equal to 1 and the rest are consecutively
* numbered.
*
GENR T=TIME(0)
*
* The 4-Point Moving Average for the Earnings variable, X, is calculated
* using the GENR command and LAG function.  The LAG(x,n) function lags the
* variable x by n periods.  A negative value for n leads the variable, so
* LAG(x,-1) refers to the next observation.
*
GENR FPMA=(LAG(X,2)+LAG(X,1)+X+LAG(X,-1))/4
*
* The 4-Point Centered Moving Average for the Earnings variable, X, is the
* average of two adjacent 4-point moving averages, P1 and P2.  Both are
* calculated with the GENR command and LAG function described above.  The
* centered average is built up in 3 separate GENR statements (P1, P2 and
* then XSTAR) to keep each step easy to follow.
*
GENR P1=(LAG(X,2)+LAG(X,1)+X+LAG(X,-1))/4
GENR P2=(LAG(X,1)+X+LAG(X,-1)+LAG(X,-2))/4
*
* The SAMPLE command is used to change the range of the data from 1 32 to
* 3 30 since the centered average needs 2 lagged and 2 led observations.
*
SAMPLE 3 30
GENR XSTAR=(P1+P2)/2
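*
* An equivalent one-step form, offered as a sketch (XCHK is a hypothetical
* name): averaging P1 and P2 gives the weights 1/8, 1/4, 1/4, 1/4, 1/8.
* For observation 3 both routes give
*   (0.300+2*0.460+2*0.345+2*0.910+0.330)/8 = 0.5075.
*
GENR XCHK=(LAG(X,2)+2*LAG(X,1)+2*X+2*LAG(X,-1)+LAG(X,-2))/8
PRINT XSTAR XCHK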
*
* Table 17.13 on page 700 is replicated with the PRINT command.  The SAMPLE
* command is used before each PRINT command so that only the observations
* for which each series is defined are printed.
*
SAMPLE 1 32
PRINT T X
SAMPLE 3 31
PRINT FPMA
SAMPLE 3 30
PRINT XSTAR
*
*-----------------------------------------------------------------------------
* The SAMPLE command is used to change the range to 3 30, the observations
* for which XSTAR is defined, in calculating Column 5 of Table 17.14 on
* page 702.
*
SAMPLE 3 30
GENR COL5=(X/XSTAR)*100
PRINT COL5
*
* The GENR command with the SUM and SEAS functions is used to create an
* index called CSINDEX that numbers each block of 4 consecutive quarters
* (1,1,1,1,2,2,2,2,...).  A repeating time index called TINDEX, running
* from 1 to 4 within each block, is then created with the GENR command.
*
SAMPLE 1 32
GENR CSINDEX=SUM(SEAS(4))
GENR TINDEX=TIME(0)-4*(CSINDEX-1)
PRINT CSINDEX TINDEX COL5
*
* The sample range is changed to include only observations 3 to 30 in
* calculating the median of each quarter with the STAT command.  The DO
* command creates a DO-loop that executes the 3 commands immediately
* following, with # taking the values 1 to 4.  The SKIPIF command skips all
* observations where the variable TINDEX is not equal to #, so the STAT
* command uses only the observations for quarter #.  The descriptive
* statistics of the variable COL5 are printed.  The PMEDIAN option prints
* the median, mode and quartiles for variable COL5.  The MEDIAN= option
* stores the median value in the constant M#.  The DELETE SKIP$ command then
* removes all the SKIPIF commands in effect.  The ENDO command indicates the
* end of the DO-loop.
*
SAMPLE 3 30
DO #=1,4
SKIPIF(TINDEX.NE.#)
STAT COL5 / PMEDIAN MEDIAN=M#
DELETE SKIP$
ENDO
*
* The GEN1 command is used to generate the constant for the sum of the
* median values.
*
GEN1 MEDSUM=M1+M2+M3+M4
*
* The sample range is reset to 1 32 to calculate the Seasonal Index.  The
* DO-loop is used to calculate the Seasonal Index of each quarter in
* Table 17.14 on page 702.
*
SAMPLE 1 32
DO #=1,4
GEN1 SINDEX#=M#*400/MEDSUM
PRINT SINDEX#
ENDO
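*
* Because each index is M#*400/MEDSUM, the four seasonal indexes sum to 400
* (an average of 100).  A quick check, added as a sketch with the
* hypothetical name SISUM:
*
GEN1 SISUM=SINDEX1+SINDEX2+SINDEX3+SINDEX4
PRINT SISUM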
*
* The SET NOWARNSKIP command is used to suppress the printing of the warning
* message as to which observations will be skipped.  The Adjusted Series
* values are generated with the GENR command within a DO-loop.
*
DO #=1,4
SET NOWARNSKIP
SKIPIF(TINDEX.NE.#)
GENR AS=X*(100/SINDEX#)
DELETE SKIP$
ENDO
*
* The Adjusted Series data in Table 17.14 is printed with the PRINT command.
* Notice that this command is specified after the DO-loop has ended.  If the
* PRINT command were specified within the DO-loop, the values for AS would
* be printed each time the DO-loop was executed.
*
PRINT AS
*
* The PLOT command is used to replicate Figure 17.9 on page 703.  The YMIN=
* and YMAX= options are specified so the range of the Y-axis is the same as
* the textbook.  The NOPRETTY option must be included when the YMIN= or YMAX=
* option is specified.  The WIDE option increases the size of the plot on the
* terminal screen.  If the WIDE option is omitted, the plot will be
* compressed and will not resemble Figure 17.9.
*
PLOT AS T / YMIN=0.300 YMAX=1.900 NOPRETTY WIDE
*
DELETE / ALL
*
*-----------------------------------------------------------------------------
* Simple Exponential Smoothing, page 708
* This example was done by Diana Whistler.
*
SAMPLE 1 30
READ SALES / BYVAR
1806  1644  1814  1770  1518  1103  1266  1473  1423  1767
2161  2336  2602  2518  2637  2177  1920  1910  1984  1787
1689  1866  1896  1684  1633  1657  1569  1390  1387  1289
GENR YEAR=TIME(1930)
*
* Set the smoothing constant.
*
GEN1 A=0.4
*
* Generate the smoothed time series.
*
SAMPLE 1 30
GENR S=SALES
GENR PREDICT=SALES
SAMPLE 2 30
GENR S=A*LAG(S)+(1-A)*SALES
*
* Generate 1-step ahead predictions.
*
GENR PREDICT=LAG(S)
SAMPLE 1 30
*
* Generate forecast errors.
*
GENR E=SALES-PREDICT
*
* Calculate model diagnostics.
*
GENR E2=E*E
STAT E2 / SUMS=SSE
PRINT SSE
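*
* Worked check of the first steps (a sketch using A=0.4 and the data above):
*   S(1)       = SALES(1)              = 1806
*   S(2)       = 0.4*1806 + 0.6*1644   = 1708.8
*   PREDICT(2) = S(1)                  = 1806
*   E(2)       = 1644 - 1806           = -162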
*
* Print the results (see Newbold, Table 17.16, page 710)
*
FORMAT(2F14.0,2F14.1,F14.2)
PRINT YEAR SALES S PREDICT E / FORMAT
*
DELETE / ALL
*
*-----------------------------------------------------------------------------
* Holt-Winters Exponential Smoothing Forecasting Model, page 712.
* This example was done by Diana Whistler.
*
SAMPLE 1 11
READ X / BYVAR
133  155  165  171  194  231  274  312  313  333  343
GEN1 A=0.7
GEN1 B=0.6
DIM S 16 T 11
SAMPLE 2 2
GENR S=X
GENR T=X-LAG(X)
*
* Do the recursive computations.
*
SET NODOECHO
DO #=3,11
SAMPLE # #
GENR S=A*X+(1-A)*(LAG(S)+LAG(T))
GENR T=B*(S-LAG(S))+(1-B)*LAG(T)
ENDO
*
* Print the results
*
SAMPLE 1 11
PRINT X S T
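*
* Worked check of the recursion (a sketch using A=0.7, B=0.6 and the data
* above):
*   S(2) = 155,   T(2) = 155-133 = 22
*   S(3) = 0.7*165 + 0.3*(155+22)     = 168.6
*   T(3) = 0.6*(168.6-155) + 0.4*22   = 16.96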
*
* Forecasting
*
SAMPLE 12 16
GENR OBS=TIME(0)-11
GENR S=S:11+OBS*T:11
PRINT OBS S
*
DELETE / ALL
*
*-----------------------------------------------------------------------------
* Autoregressive Models on page 725.
*
SAMPLE 1 30
READ YEAR X / LIST 
1931  1806
1932  1644
1933  1814
1934  1770
1935  1518
1936  1103
1937  1266
1938  1473
1939  1423
1940  1767
1941  2161
1942  2336
1943  2602
1944  2518
1945  2637
1946  2177
1947  1920
1948  1910
1949  1984
1950  1787
1951  1689
1952  1866
1953  1896
1954  1684
1955  1633
1956  1657
1957  1569
1958  1390
1959  1387
1960  1289
*
* The GENR command is used with the LAG function to generate the sales
* variable X lagged one time period (X1), two time periods (X2), three time
* periods (X3) and four time periods (X4).
*
GENR X1=LAG(X)
GENR X2=LAG(X,2)
GENR X3=LAG(X,3)
GENR X4=LAG(X,4)
*
* The sample range for the first-order model must be changed to 2 30 since
* the first observation is lost when the sales variable is lagged one time
* period.  The first-order model is estimated with the OLS command.
*
SAMPLE 2 30
OLS X X1
*
* The second-order model is estimated with the OLS command, and the sample
* range is changed to 3 30 because the first two observations are lost in
* lagging the sales variable.  The COEF= option is used to save the
* regression estimates in the vector called COEF.  These values will be
* used in forecasting X31.
*
SAMPLE 3 30
OLS X X1 X2 / COEF=COEF
*
* The third-order model is estimated and the sample range is changed
* accordingly as the first three observations are lost in the lagging process
* of the SALES variable.
*
SAMPLE 4 30
OLS X X1 X2 X3
*
* The fourth-order model is estimated and the sample range is changed
* accordingly as the first four observations are lost in the lagging process
* of the SALES variable.
*
SAMPLE 5 30
OLS X X1 X2 X3 X4
*
* The regression coefficients for the second-order model were saved in the
* vector COEF.  The CONSTANT is saved in COEF:3, the regression estimate for
* X lagged one time period is saved in COEF:1 and the regression estimate for
* X lagged two time periods is saved in COEF:2.  The sales figures for X29
* and X30 are stored in the vector X in X:29 and X:30.  The GEN1 command is
* used to forecast the value of X when t=31.
*
GEN1 X31=COEF:3+(COEF:1*X:30)+(COEF:2*X:29)
PRINT X31 COEF:3 COEF:1 X:30
PRINT COEF:2 X:29
*
* Similarly, X can be forecast when t=32 and t=33 using the GEN1 command.
*
GEN1 X32=COEF:3+(COEF:1*X31)+(COEF:2*X:30)
GEN1 X33=COEF:3+(COEF:1*X32)+(COEF:2*X31)
PRINT X32 X33
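*
* The same recursion extends one more step.  This is a sketch and X34 is a
* hypothetical name: each forecast is COEF:3 + COEF:1*(previous value) +
* COEF:2*(value two periods back), with earlier forecasts standing in for
* values that have not been observed.
*
GEN1 X34=COEF:3+(COEF:1*X33)+(COEF:2*X32)
PRINT X34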
*
DELETE / ALL
*
*-----------------------------------------------------------------------------
*
STOP