Weighted least squares (WLS) is implemented in SHAZAM by specifying the name of an appropriate weight variable with the WEIGHT= option on the OLS command.
The general command format is:
OLS depvar indeps / WEIGHT=weight options
where depvar is the dependent variable, indeps is a list of the explanatory variables, weight is the name of the weight variable and options is a list of desired options.
Suppose the name of the weight variable is W. SHAZAM makes the following assumption about the form of the error variance:
$$\operatorname{var}(e_t) = \frac{\sigma^2}{W_t}, \qquad t = 1, \ldots, N$$
where $N$ is the sample size and $\sigma^2$ is an unknown scalar. This assumes that the error variance is inversely proportional to the weight variable W. Many textbooks instead use an error variance assumption that is proportional to an explanatory variable in the regression equation. To accomplish this, generate the inverse of that variable and specify it as the weight variable on the WEIGHT= option.
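Why this form of the variance matters: multiplying observation $t$ of the dependent variable and of every explanatory variable (including the constant) by $\sqrt{W_t}$ gives a transformed error with $\operatorname{var}(\sqrt{W_t}\, e_t) = W_t \sigma^2 / W_t = \sigma^2$, so OLS on the transformed data yields the WLS estimates $b = (X'WX)^{-1}X'Wy$. The following is a minimal pure-Python sketch of that textbook equivalence, not SHAZAM's internal code; the helpers `solve`, `ols` and `wls` and the data values are hypothetical and made up for illustration.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(y, X):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def wls(y, X, w):
    """WLS under var(e_t) = sigma^2 / w_t: OLS after scaling row t by sqrt(w_t)."""
    s = [math.sqrt(wt) for wt in w]
    y_star = [st * yt for st, yt in zip(s, y)]
    X_star = [[st * xtj for xtj in row] for st, row in zip(s, X)]  # constant is scaled too
    return ols(y_star, X_star)


# Made-up data for illustration only (not the Greene data set).
income = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
expend = [300.0, 320.0, 365.0, 390.0, 450.0, 480.0]
X = [[1.0, x, x * x] for x in income]  # constant, INCOME, INCOME**2
w = [1.0 / x for x in income]          # analogous to GENR WT=1/INCOME
print(wls(expend, X, w))
```

Scaling the rows keeps the estimator as plain OLS on transformed data, which is why a single weight variable is all the OLS command needs.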
The purpose of weighted least squares estimation is to obtain more efficient parameter estimates than OLS when the error variance is not constant and the weights correctly reflect its form. After model estimation, some thought must be given to the computation of an R-square measure, residuals, predicted values, elasticities at the means, and so on.
This example of weighted least squares uses a data set from Greene on state expenditure on public schools. First consider the equation specification. Greene [1993, p.385] recommends using the square of the per capita income variable as an explanatory variable in the regression. The regression specification is:
$$\text{EXP}_t = \beta_0 + \beta_1 \, \text{INCOME}_t + \beta_2 \, \text{INCOME}_t^2 + e_t$$
This is an example of a polynomial regression model. Applications of the polynomial regression model are discussed in Griffiths, Hill and Judge [1993, p.337], Gujarati [1995, p.217] and Ramanathan [1995, p.260].
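Because INCOME enters both linearly and squared, the effect of income on expenditure is not a single coefficient. Differentiating the regression function (standard calculus, not a SHAZAM computation) gives a marginal effect that depends on the income level and changes sign at a turning point, which is a minimum when $\beta_2 > 0$:

$$\frac{\partial\, \mathrm{E}(\text{EXP}_t)}{\partial\, \text{INCOME}_t} = \beta_1 + 2\beta_2 \, \text{INCOME}_t, \qquad \text{INCOME}^{*} = -\frac{\beta_1}{2\beta_2}$$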
The SHAZAM command file that follows (filename: WLS.SHA) reads and transforms the data and runs the model estimation. Tests for heteroskedasticity are obtained after OLS estimation. Two alternative assumptions about the error variance are then used to choose the weight variable for the weighted least squares estimation: the first assumption is that the variance is proportional to income, and the second is that the variance is proportional to the square of income.
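Matching each assumption to the form $\operatorname{var}(e_t) = \sigma^2 / W_t$ used by SHAZAM shows which weight variable to generate; this is the reasoning behind the two GENR WT= commands in the command file:

$$\operatorname{var}(e_t) = \sigma^2 \, \text{INCOME}_t = \frac{\sigma^2}{1/\text{INCOME}_t} \quad\Rightarrow\quad W_t = \frac{1}{\text{INCOME}_t}$$
$$\operatorname{var}(e_t) = \sigma^2 \, \text{INCOME}_t^2 = \frac{\sigma^2}{1/\text{INCOME}_t^2} \quad\Rightarrow\quad W_t = \frac{1}{\text{INCOME}_t^2}$$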
SAMPLE 1 51
FORMAT (A8,F6.0,1X,F6.0)
READ (GREENE.txt) STATE EXP INCOME / SKIPLINES=1 FORMAT
* First scale the INCOME data
GENR INCOME=INCOME/1000
* Generate the square of income
GENR INC2=INCOME**2
* Exclude missing values from the analysis
SET MISSVALU=-99
SET SKIPMISS
* Model estimation as reported in Table 14.2 of Greene (1993)
OLS EXP INCOME INC2
* Tests for heteroskedasticity
DIAGNOS / HET
* Get heteroskedasticity-consistent standard errors (Greene, Table 14.3)
OLS EXP INCOME INC2 / HETCOV
*
* Weighted least squares estimation (Greene, Table 14.4, p.398)
* Assumption A : Variance is proportional to INCOME
GENR WT=1/INCOME
OLS EXP INCOME INC2 / WEIGHT=WT PREDICT=YHAT RESID=E
* Test for heteroskedasticity (approx. 1% critical value = 6.64)
 GENR E2=E*E
 ?OLS E2 YHAT
 GEN1 LM=$N*$R2
 PRINT LM
* Assumption B : Variance is proportional to INCOME**2
GENR WT=1/(INCOME**2)
OLS EXP INCOME INC2 / WEIGHT=WT PREDICT=YHAT RESID=E
 GENR E2=E*E
 ?OLS E2 YHAT
 GEN1 LM=$N*$R2
 PRINT LM
STOP
A note on reading the data: the first column of the GREENE.txt data file contains character data (the state names read into STATE). Therefore a FORMAT command must be used to load the data set into SHAZAM. The FORMAT option on the READ command instructs SHAZAM to use the FORMAT command when reading the data.
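The FORMAT statement (A8,F6.0,1X,F6.0) describes fixed columns: 8 characters of text, a 6-column number, one skipped column, and another 6-column number. A rough Python sketch of the same column layout follows, assuming every record matches that layout exactly, the numeric fields are never blank, and the first line is a header (as SKIPLINES=1 implies). The helpers `parse_record` and `read_fixed` are hypothetical, and the records are made up, not taken from GREENE.txt.

```python
def parse_record(line):
    """Split one record using the column layout (A8,F6.0,1X,F6.0)."""
    state = line[0:8].strip()    # A8   : columns 1-8, character data
    exp = float(line[8:14])      # F6.0 : columns 9-14 (float() ignores surrounding blanks)
    # 1X : column 15 is skipped
    income = float(line[15:21])  # F6.0 : columns 16-21
    return state, exp, income


def read_fixed(lines, skiplines=1):
    """Parse every non-empty record after the first `skiplines` header lines."""
    return [parse_record(line) for line in lines[skiplines:] if line.strip()]


# Made-up records built to match the column layout.
lines = [
    "header line",
    f"{'STATE_A':<8}{300:6d} {6000:6d}",
    f"{'STATE_B':<8}{-99:6d} {7500:6d}",  # -99 marks a missing EXP value
]
print(read_fixed(lines))
```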
Another feature of the data set is that the expenditure value is missing for Wisconsin and has been assigned the code -99. The following commands exclude missing values from the analysis:
SET MISSVALU=-99
SET SKIPMISS
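Conceptually, SET MISSVALU=-99 declares -99 as the missing-value code, and with SET SKIPMISS the estimation commands skip any observation in which a variable they use carries that code; this is what produces the "OBSERVATION 34 IS SKIPPED" messages in the output at the end of this example. A minimal sketch of that filtering rule, assuming an exact comparison with the code; `skip_missing` is a hypothetical helper that only imitates the SHAZAM message, not a SHAZAM function.

```python
MISSVALU = -99.0  # missing-value code, as in SET MISSVALU=-99


def skip_missing(data, missing=MISSVALU):
    """Report observations containing the missing code and return the kept ones.

    `data` maps variable names to equal-length lists of values.
    Returns the 1-based observation numbers that are kept.
    """
    names = list(data)
    n = len(data[names[0]])
    kept = []
    for t in range(n):
        bad = [name for name in names if data[name][t] == missing]
        if bad:
            print(f"OBSERVATION {t + 1} IS SKIPPED BECAUSE {bad[0]} MISSING")
        else:
            kept.append(t + 1)
    return kept


# Made-up three-observation example; EXP is missing for observation 2.
data = {"EXP": [300.0, -99.0, 410.0], "INCOME": [6.0, 7.5, 8.1]}
print(skip_missing(data))
```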
Note that the ? prefix to a command instructs SHAZAM to suppress output from the command.
After the WLS estimation it may be sensible to check whether there is still evidence of heteroskedasticity in the errors. The user is cautioned that the DIAGNOS / HET … RESID= option on the OLS / WEIGHT= …
The SHAZAM commands above propose a test statistic for heteroskedasticity after WLS estimation: the squared residuals E2 are regressed on the predicted values YHAT, and the statistic is LM = N·R² from that auxiliary regression. This test corresponds to the first statistic (E**2 ON YHAT) reported in the output from the DIAGNOS / HET command.
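The arithmetic behind GENR E2=E*E, ?OLS E2 YHAT and GEN1 LM=$N*$R2 can be sketched in a few lines of Python. This is a minimal illustration, assuming the auxiliary regression includes a constant and that E and YHAT are the series saved by the RESID= and PREDICT= options with missing observations already removed; `lm_het` is a hypothetical name and the residuals in the usage example are made up.

```python
def lm_het(e, yhat):
    """LM = N * R^2 from the auxiliary OLS regression of e_t**2 on a constant and yhat_t."""
    e2 = [et * et for et in e]
    n = len(e2)
    mean_y = sum(e2) / n
    mean_x = sum(yhat) / n
    sxx = sum((x - mean_x) ** 2 for x in yhat)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(yhat, e2))
    syy = sum((y - mean_y) ** 2 for y in e2)
    r2 = sxy * sxy / (sxx * syy)  # R^2 of a simple regression = squared correlation
    return n * r2


# Made-up residuals whose spread grows with the predicted value.
yhat = [100.0, 200.0, 300.0, 400.0, 500.0]
e = [1.0, -2.0, 3.0, -4.0, 5.0]
print(lm_het(e, yhat))
```

With one regressor plus a constant, R-square equals the squared sample correlation, which avoids a general regression routine.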
The test statistic from the first WLS estimation (Assumption A, WT=1/INCOME) is reported as:
LM 9.088942
The test statistic from the second WLS estimation (Assumption B, WT=1/(INCOME**2)) is reported as:
LM 5.068986
These results suggest that the second assumption (variance proportional to the square of income) is more successful in correcting the observed heteroskedasticity in the equation residuals: the first statistic exceeds the approximate 1% critical value of 6.64, while the second does not.
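Under the null hypothesis of homoskedasticity the LM statistic is approximately chi-square with 1 degree of freedom, whose upper-tail probability is $\operatorname{erfc}(\sqrt{x/2})$. This small standard-library sketch converts the two reported statistics into approximate p-values; `chi2_1_pvalue` is a hypothetical helper name.

```python
import math


def chi2_1_pvalue(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


for label, lm in [("Assumption A (WT=1/INCOME)", 9.088942),
                  ("Assumption B (WT=1/(INCOME**2))", 5.068986)]:
    p = chi2_1_pvalue(lm)
    verdict = "reject" if p < 0.01 else "do not reject"
    print(f"{label}: LM={lm:.4f} p={p:.4f} -> {verdict} homoskedasticity at 1%")
```

Note that the second statistic lies between the 5% critical value (about 3.84) and the 1% critical value (about 6.64), so the improvement under Assumption B depends on the significance level chosen.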
The SHAZAM output follows.
|_SAMPLE 1 51
|_FORMAT (A8,F6.0,1X,F6.0)
|_READ (GREENE.txt) STATE EXP INCOME / SKIPLINES=1 FORMAT
UNIT 88 IS NOW ASSIGNED TO: GREENE.txt
READ USES FORMAT: (A8,F6.0,1X,F6.0)
|_* First scale the INCOME data
|_GENR INCOME=INCOME/1000
|_* Generate the square of income
|_GENR INC2=INCOME**2
|_* Exclude missing values from the analysis
|_SET MISSVALU=-99
|_SET SKIPMISS
|_* Model estimation as reported in Table 14.2 of Greene (1993)
|_OLS EXP INCOME INC2
OBSERVATION 34 IS SKIPPED BECAUSE EXP MISSING
OLS ESTIMATION
     50 OBSERVATIONS     DEPENDENT VARIABLE= EXP
...NOTE..SAMPLE RANGE SET TO:    1,   51
 R-SQUARE =   0.6553     R-SQUARE ADJUSTED =   0.6407
VARIANCE OF THE ESTIMATE-SIGMA**2 =   3212.5
STANDARD ERROR OF THE ESTIMATE-SIGMA =   56.679
SUM OF SQUARED ERRORS-SSE=  0.15099E+06
MEAN OF DEPENDENT VARIABLE =   373.26
LOG OF THE LIKELIHOOD FUNCTION = -271.270
VARIABLE   ESTIMATED  STANDARD   T-RATIO          PARTIAL STANDARDIZED ELASTICITY
  NAME    COEFFICIENT   ERROR     47 DF   P-VALUE  CORR. COEFFICIENT  AT MEANS
INCOME      -183.42     82.90    -2.213    0.032  -0.307    -2.0381    -3.7389
INC2         15.870     5.191     3.057    0.004   0.407     2.8163     2.5074
CONSTANT     832.91     327.3     2.545    0.014   0.348     0.0000     2.2315
|_* Tests for heteroskedasticity
|_DIAGNOS / HET
OBSERVATION 34 IS SKIPPED BECAUSE EXP MISSING
DEPENDENT VARIABLE = EXP         50 OBSERVATIONS
REGRESSION COEFFICIENTS
  -183.420294634       15.8704226661       832.914356455
HETEROSKEDASTICITY TESTS
                                   CHI-SQUARE    D.F.   P-VALUE
                                 TEST STATISTIC
E**2 ON YHAT:                         13.401      1    0.00025
E**2 ON YHAT**2:                      14.692      1    0.00013
E**2 ON LOG(YHAT**2):                 11.408      1    0.00073
E**2 ON LAG(E**2) ARCH TEST:           0.339      1    0.56048
LOG(E**2) ON X (HARVEY) TEST:          4.825      2    0.08958
ABS(E) ON X (GLEJSER) TEST:           11.952      2    0.00254
E**2 ON X                   TEST:
        KOENKER(R2):                  15.834      2    0.00036
        B-P-G (SSR) :                 18.903      2    0.00008
...MATRIX IS NOT POSITIVE DEFINITE..FAILED IN ROW   4
E**2 ON X X**2 (WHITE)      TEST:
        KOENKER(R2):              **********      4  *********
        B-P-G (SSR) :             **********      4  *********
...MATRIX IS NOT POSITIVE DEFINITE..FAILED IN ROW   4
E**2 ON X X**2 XX (WHITE)   TEST:
        KOENKER(R2):              **********      5  *********
        B-P-G (SSR) :             **********      5  *********
|_* Get heteroskedasticity-consistent standard errors (Greene, Table 14.3)
|_OLS EXP INCOME INC2 / HETCOV
OBSERVATION 34 IS SKIPPED BECAUSE EXP MISSING
OLS ESTIMATION
     50 OBSERVATIONS     DEPENDENT VARIABLE= EXP
...NOTE..SAMPLE RANGE SET TO:    1,   51
USING HETEROSKEDASTICITY-CONSISTENT COVARIANCE MATRIX
 R-SQUARE =   0.6553     R-SQUARE ADJUSTED =   0.6407
VARIANCE OF THE ESTIMATE-SIGMA**2 =   3212.5
STANDARD ERROR OF THE ESTIMATE-SIGMA =   56.679
SUM OF SQUARED ERRORS-SSE=  0.15099E+06
MEAN OF DEPENDENT VARIABLE =   373.26
LOG OF THE LIKELIHOOD FUNCTION = -271.270
VARIABLE   ESTIMATED  STANDARD   T-RATIO          PARTIAL STANDARDIZED ELASTICITY
  NAME    COEFFICIENT   ERROR     47 DF   P-VALUE  CORR. COEFFICIENT  AT MEANS
INCOME      -183.42     124.3    -1.476    0.147  -0.210    -2.0381    -3.7389
INC2         15.870     8.300     1.912    0.062   0.269     2.8163     2.5074
CONSTANT     832.91     460.9     1.807    0.077   0.255     0.0000     2.2315
|_*
|_* Weighted least squares estimation (Greene, Table 14.4, p.398)
|_* Assumption A : Variance is proportional to INCOME
|_GENR WT=1/INCOME
|_OLS EXP INCOME INC2 / WEIGHT=WT PREDICT=YHAT RESID=E
OBSERVATION 34 IS SKIPPED BECAUSE EXP MISSING
OLS ESTIMATION
     50 OBSERVATIONS     DEPENDENT VARIABLE= EXP
...NOTE..SAMPLE RANGE SET TO:    1,   51
SUM OF LOG(SQRT(ABS(WEIGHT))) =  -0.22015
 R-SQUARE =   0.6274     R-SQUARE ADJUSTED =   0.6115
VARIANCE OF THE ESTIMATE-SIGMA**2 =   2978.8
STANDARD ERROR OF THE ESTIMATE-SIGMA =   54.578
SUM OF SQUARED ERRORS-SSE=  0.14000E+06
MEAN OF DEPENDENT VARIABLE =   364.48
LOG OF THE LIKELIHOOD FUNCTION = -269.602
VARIABLE   ESTIMATED  STANDARD   T-RATIO          PARTIAL STANDARDIZED ELASTICITY
  NAME    COEFFICIENT   ERROR     47 DF   P-VALUE  CORR. COEFFICIENT  AT MEANS
INCOME      -161.23     84.48    -1.908    0.062  -0.268    -1.8654    -3.3061
INC2         14.475     5.377     2.692    0.010   0.365     2.6312     2.2583
CONSTANT     746.36     328.2     2.274    0.028   0.315     0.0000     2.0477
|_* Test for heteroskedasticity (approx. 1% critical value = 6.64)
|_ GENR E2=E*E
..OBSERVATION 34 IS ASSIGNED MISSING CODE= -99.
|_ ?OLS E2 YHAT
OBSERVATION 34 IS SKIPPED BECAUSE YHAT MISSING
OBSERVATION 34 IS SKIPPED BECAUSE E2 MISSING
|_ GEN1 LM=$N*$R2
..NOTE..CURRENT VALUE OF $N  =   50.000
..NOTE..CURRENT VALUE OF $R2 =  0.18178
|_ PRINT LM
    LM
    9.088942
|_* Assumption B : Variance is proportional to INCOME**2
|_GENR WT=1/(INCOME**2)
|_OLS EXP INCOME INC2 / WEIGHT=WT PREDICT=YHAT RESID=E
OBSERVATION 34 IS SKIPPED BECAUSE EXP MISSING
OLS ESTIMATION
     50 OBSERVATIONS     DEPENDENT VARIABLE= EXP
...NOTE..SAMPLE RANGE SET TO:    1,   51
SUM OF LOG(SQRT(ABS(WEIGHT))) =  -0.86795
 R-SQUARE =   0.5983     R-SQUARE ADJUSTED =   0.5812
VARIANCE OF THE ESTIMATE-SIGMA**2 =   2784.3
STANDARD ERROR OF THE ESTIMATE-SIGMA =   52.766
SUM OF SQUARED ERRORS-SSE=  0.13086E+06
MEAN OF DEPENDENT VARIABLE =   356.60
LOG OF THE LIKELIHOOD FUNCTION = -268.562
VARIABLE   ESTIMATED  STANDARD   T-RATIO          PARTIAL STANDARDIZED ELASTICITY
  NAME    COEFFICIENT   ERROR     47 DF   P-VALUE  CORR. COEFFICIENT  AT MEANS
INCOME      -139.93     87.21    -1.605    0.115  -0.228    -1.6730    -2.8830
INC2         13.113     5.637     2.326    0.024   0.321     2.4256     2.0193
CONSTANT     664.58     333.6     1.992    0.052   0.279     0.0000     1.8637
|_ GENR E2=E*E
..OBSERVATION 34 IS ASSIGNED MISSING CODE= -99.
|_ ?OLS E2 YHAT
OBSERVATION 34 IS SKIPPED BECAUSE YHAT MISSING
OBSERVATION 34 IS SKIPPED BECAUSE E2 MISSING
|_ GEN1 LM=$N*$R2
..NOTE..CURRENT VALUE OF $N  =   50.000
..NOTE..CURRENT VALUE OF $R2 =  0.10138
|_ PRINT LM
    LM
    5.068986
|_STOP