Weighted Least Squares

Weighted Least Squares : Variance with pre-set form

Weighted least squares (WLS) is implemented in SHAZAM by specifying the name of an appropriate weight variable with the WEIGHT= option on the OLS command. The general command format is:

OLS depvar indeps / WEIGHT=weight options

where depvar is the dependent variable, indeps is a list of the explanatory variables, weight is the name of the weight variable and options is a list of desired options.

Suppose the name of the weight variable is W. SHAZAM makes the following assumption about the form of the error variance:

var(e_t) = σ² / W_t    for t = 1, ..., N

where N is the sample size and σ² is an unknown scalar. Note that this assumes that the error variance is inversely proportional to the weight variable W. Many textbooks instead assume that the error variance is proportional to an explanatory variable in the regression equation. To impose that assumption, generate the inverse of the variable and specify it as the weight variable on the WEIGHT= option.
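The variance assumption above can be illustrated outside SHAZAM. The following is a minimal Python sketch, not SHAZAM's own computation: it assumes a single explanatory variable, uses made-up data, and the function names wls_line and ols_transformed are hypothetical. It shows that minimizing the weighted sum of squares gives the same coefficients as ordinary least squares on data multiplied by the square root of the weight, and that a variance proportional to x is handled by the weight 1/x.

```python
import math

def wls_line(x, y, w):
    """Fit y = b0 + b1*x by minimizing sum of w_t * (y_t - b0 - b1*x_t)**2."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def ols_transformed(x, y, w):
    """Same fit via OLS without intercept on data scaled by sqrt(w_t)."""
    c = [math.sqrt(wi) for wi in w]              # transformed constant term
    xs = [ci * xi for ci, xi in zip(c, x)]       # transformed regressor
    ys = [ci * yi for ci, yi in zip(c, y)]       # transformed dependent variable
    # Normal equations for the two regressors (c, xs), solved by Cramer's rule
    a11 = sum(ci * ci for ci in c)
    a12 = sum(ci * xi for ci, xi in zip(c, xs))
    a22 = sum(xi * xi for xi in xs)
    r1 = sum(ci * yi for ci, yi in zip(c, ys))
    r2 = sum(xi * yi for xi, yi in zip(xs, ys))
    det = a11 * a22 - a12 * a12
    return (r1 * a22 - a12 * r2) / det, (a11 * r2 - a12 * r1) / det

# Made-up illustrative data (not the Greene data set)
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [3.1, 4.8, 7.4, 8.5, 11.9]
w = [1.0 / xi for xi in x]      # variance proportional to x, so weight = 1/x

print(wls_line(x, y, w))
print(ols_transformed(x, y, w))
```

The two printed coefficient pairs should agree up to rounding error, which is the sense in which the weight variable enters the estimation.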

The purpose of weighted least squares estimation is to obtain more efficient parameter estimates when the error variance is not constant. After model estimation, some thought must be given to the computation of an R-square measure, residuals, predicted values, elasticities at the means, etc. The computations used by SHAZAM can be studied.

Example

This example of weighted least squares uses a data set from Greene on state expenditure on public schools. First consider the equation specification. Greene [1993, p.385] recommends using the square of the per capita income variable as an explanatory variable in the regression. The regression specification is:

EXP_t = β₀ + β₁ INCOME_t + β₂ INCOME_t² + e_t

This is an example of a polynomial regression model. Applications of the polynomial regression model are discussed in Griffiths, Hill and Judge [1993, p.337], Gujarati [1995, p.217] and Ramanathan [1995, p.260].

The SHAZAM command file (filename: WLS.SHA) that follows constructs the data and runs the model estimation. Tests for heteroskedasticity are obtained after OLS estimation. Two alternative assumptions are then made about the weight variable to use in the weighted least squares estimation. The first assumption is that the variance is proportional to income and the second assumption is that the variance is proportional to the square of income.

SAMPLE 1 51
FORMAT (A8,F6.0,1X,F6.0)
READ (GREENE.txt) STATE EXP INCOME / SKIPLINES=1 FORMAT
* First scale the INCOME data
GENR INCOME=INCOME/1000
* Generate the square of income
GENR INC2=INCOME**2
* Exclude missing values from the analysis
SET MISSVALU=-99
SET SKIPMISS
* Model estimation as reported in Table 14.2 of Greene (1993)
OLS EXP INCOME INC2
* Tests for heteroskedasticity
DIAGNOS / HET
* Get heteroskedasticity-consistent standard errors (Greene, Table 14.3)
OLS EXP INCOME INC2 / HETCOV
*
* Weighted least squares estimation (Greene, Table 14.4, p.398)
*   Assumption A : Variance is proportional to INCOME
GENR WT=1/INCOME
OLS EXP INCOME INC2 / WEIGHT=WT PREDICT=YHAT RESID=E
*   Test for heteroskedasticity  (approx. 1% critical value = 6.64)
    GENR E2=E*E
    ?OLS E2 YHAT
    GEN1 LM=$N*$R2
    PRINT LM
*   Assumption B : Variance is proportional to INCOME**2
GENR WT=1/(INCOME**2)
OLS EXP INCOME INC2 / WEIGHT=WT PREDICT=YHAT RESID=E
    GENR E2=E*E
    ?OLS E2 YHAT
    GEN1 LM=$N*$R2
    PRINT LM
STOP

A note on reading the data : The first column of the GREENE.txt data file contains character data. Therefore a FORMAT command must be used to load the data set into SHAZAM. The FORMAT option on the READ command instructs SHAZAM to use the FORMAT command when reading the data. Another feature of the data set is that expenditure data is missing for Wisconsin and has been assigned the code -99. To ensure that the missing observations are excluded from the analysis the command SET MISSVALU=-99 is used to set the missing value code and the command SET SKIPMISS is used to instruct SHAZAM to recognize the missing values.
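For readers who want to see what the fixed-format read and the missing value handling amount to, here is a rough Python sketch. It is an illustration only: the three records are made up (they are not lines from GREENE.txt), and the helper names make_record, parse_record and usable are hypothetical. The column positions follow the FORMAT (A8,F6.0,1X,F6.0) specification, and the filter mimics the effect of SET MISSVALU=-99 with SET SKIPMISS.

```python
MISSING_CODE = -99.0   # plays the role of SET MISSVALU=-99

def make_record(state, exp, income):
    """Build one fixed-width line laid out as (A8,F6.0,1X,F6.0)."""
    return f"{state:<8}{exp:>6.0f} {income:>6.0f}"

def parse_record(line):
    """Split one fixed-width line into (state, exp, income)."""
    state = line[0:8].strip()      # A8   : columns 1-8, character data
    exp = float(line[8:14])        # F6.0 : columns 9-14
    income = float(line[15:21])    # 1X skips column 15, F6.0 : columns 16-21
    return state, exp, income

def usable(rows):
    """Drop any observation in which a numeric field holds the missing code."""
    return [r for r in rows if MISSING_CODE not in r[1:]]

# Hypothetical records; the second has a missing expenditure value
lines = [
    make_record("Alpha", 275, 6247),
    make_record("Beta", -99, 7699),
    make_record("Gamma", 821, 10851),
]
rows = [parse_record(line) for line in lines]
print(rows)
print(usable(rows))
```

The second list omits the observation that carries the missing value code, which is the behaviour the OLS output reports with its "IS SKIPPED BECAUSE ... MISSING" messages.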

Note that the ? prefix to a command instructs SHAZAM to suppress output from the command.

After the WLS estimation it may be sensible to check whether there is still evidence of heteroskedasticity in the errors. The user is cautioned that the DIAGNOS / HET command is not appropriate after weighted least squares. This command applies the parameter estimates to the original data and so the diagnostic tests use the untransformed residuals. The residuals of interest for testing are the transformed residuals, which can be saved with the RESID= option on the OLS / WEIGHT= command.

The SHAZAM commands above compute a statistic for testing for heteroskedasticity after WLS estimation: the squared residuals are regressed on the predicted values, and the statistic is N times the R-square from that regression. This corresponds to the first statistic (E**2 ON YHAT) reported in the output from the DIAGNOS / HET command. The statistic can be compared with critical values from a chi-square distribution with 1 degree of freedom, although the test should be regarded as approximate. The 1% critical value from a chi-square distribution with 1 degree of freedom is 6.635, or approximately 6.64.
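The statistic computed by the GENR E2, ?OLS and GEN1 LM commands can be sketched in Python as follows. This is an illustration under stated assumptions rather than SHAZAM's code: the function name lm_het_stat is hypothetical, the residuals and predicted values are made up, and the R-square is taken from a simple regression of the squared residuals on the predicted values with a constant.

```python
CHI2_1DF_1PCT = 6.635   # 1% critical value, chi-square with 1 degree of freedom

def lm_het_stat(resid, yhat):
    """Return N * R-square from regressing resid**2 on yhat (with a constant)."""
    n = len(resid)
    e2 = [e * e for e in resid]
    mean_e2 = sum(e2) / n
    mean_yh = sum(yhat) / n
    sxy = sum((a - mean_yh) * (b - mean_e2) for a, b in zip(yhat, e2))
    sxx = sum((a - mean_yh) ** 2 for a in yhat)
    syy = sum((b - mean_e2) ** 2 for b in e2)
    if sxx == 0 or syy == 0:
        return 0.0              # no variation, so nothing is explained
    r2 = sxy * sxy / (sxx * syy)   # R-square of a one-regressor regression
    return n * r2

# Made-up residuals whose spread grows with the predicted value
yhat = [300.0, 320.0, 350.0, 380.0, 420.0, 470.0, 520.0, 600.0]
resid = [5.0, -8.0, 12.0, -20.0, 25.0, -40.0, 55.0, -90.0]

lm = lm_het_stat(resid, yhat)
print(lm, lm > CHI2_1DF_1PCT)
```

With only eight made-up observations the chi-square comparison is purely illustrative; the point is the sequence of steps, which matches the SHAZAM commands.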

The test statistic from the first model estimation is reported as:

     LM
    9.088942

The test statistic from the second model estimation is reported as:

     LM
    5.068986

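The first statistic (9.09) exceeds the 1% critical value of 6.64, so there is still evidence of heteroskedasticity under the first assumption. The second statistic (5.07) falls below the 1% critical value, although it exceeds the 5% critical value of 3.84. These results suggest that the second assumption (variance is proportional to the square of income) is more successful in correcting the observed heteroskedasticity in the equation residuals.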
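As a check on the arithmetic, the two LM values can be reproduced from the $N and $R2 values that SHAZAM prints after each auxiliary regression (50 with 0.18178, and 50 with 0.10138). The short Python sketch below does this and compares each statistic with standard chi-square critical values for 1 degree of freedom. The function name lm_decision is hypothetical, and the small differences from the printed LM values arise because $R2 is shown to only five digits.

```python
CHI2_1DF_1PCT = 6.635   # 1% critical value, chi-square with 1 degree of freedom
CHI2_1DF_5PCT = 3.841   # 5% critical value, chi-square with 1 degree of freedom

def lm_decision(n, r2):
    """Return (LM, reject at 1%, reject at 5%) for LM = N * R-square."""
    lm = n * r2
    return lm, lm > CHI2_1DF_1PCT, lm > CHI2_1DF_5PCT

# $N and $R2 as printed in the SHAZAM output for each weighting assumption
reported = {
    "A: variance proportional to INCOME": (50, 0.18178),
    "B: variance proportional to INCOME**2": (50, 0.10138),
}

results = {name: lm_decision(n, r2) for name, (n, r2) in reported.items()}
for name, (lm, rej1, rej5) in results.items():
    print(f"{name}: LM = {lm:.3f}, reject at 1%: {rej1}, reject at 5%: {rej5}")
```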

The SHAZAM output follows.


SHAZAM output - Weighted Least Squares

|_SAMPLE 1 51
|_FORMAT (A8,F6.0,1X,F6.0)
|_READ (GREENE.txt) STATE EXP INCOME / SKIPLINES=1 FORMAT
UNIT 88 IS NOW ASSIGNED TO: GREENE.txt
 READ USES FORMAT: (A8,F6.0,1X,F6.0)
|_* First scale the INCOME data
|_GENR INCOME=INCOME/1000
|_* Generate the square of income
|_GENR INC2=INCOME**2
|_* Exclude missing values from the analysis
|_SET MISSVALU=-99
|_SET SKIPMISS

|_* Model estimation as reported in Table 14.2 of Greene (1993)
|_OLS EXP INCOME INC2
OBSERVATION     34 IS SKIPPED BECAUSE EXP      MISSING

 OLS ESTIMATION
       50 OBSERVATIONS     DEPENDENT VARIABLE= EXP
...NOTE..SAMPLE RANGE SET TO:      1,     51

 R-SQUARE =   0.6553     R-SQUARE ADJUSTED =   0.6407
VARIANCE OF THE ESTIMATE-SIGMA**2 =   3212.5
STANDARD ERROR OF THE ESTIMATE-SIGMA =   56.679
SUM OF SQUARED ERRORS-SSE=  0.15099E+06
MEAN OF DEPENDENT VARIABLE =   373.26
LOG OF THE LIKELIHOOD FUNCTION = -271.270

VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
  NAME    COEFFICIENT   ERROR      47 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
INCOME    -183.42      82.90      -2.213     0.032-0.307    -2.0381    -3.7389
INC2       15.870      5.191       3.057     0.004 0.407     2.8163     2.5074
CONSTANT   832.91      327.3       2.545     0.014 0.348     0.0000     2.2315

|_* Tests for heteroskedasticity
|_DIAGNOS / HET
OBSERVATION     34 IS SKIPPED BECAUSE EXP      MISSING

DEPENDENT VARIABLE = EXP             50 OBSERVATIONS
REGRESSION COEFFICIENTS
  -183.420294634       15.8704226661       832.914356455

HETEROSKEDASTICITY TESTS
                            CHI-SQUARE     D.F.   P-VALUE
                          TEST STATISTIC
E**2 ON YHAT:                     13.401     1    0.00025
E**2 ON YHAT**2:                  14.692     1    0.00013
E**2 ON LOG(YHAT**2):             11.408     1    0.00073
E**2 ON LAG(E**2) ARCH TEST:       0.339     1    0.56048
LOG(E**2) ON X (HARVEY) TEST:      4.825     2    0.08958
ABS(E) ON X (GLEJSER) TEST:       11.952     2    0.00254
E**2 ON X                 TEST:
          KOENKER(R2):            15.834     2    0.00036
          B-P-G (SSR) :           18.903     2    0.00008

...MATRIX IS NOT POSITIVE DEFINITE..FAILED IN ROW    4
E**2 ON X X**2    (WHITE) TEST:
          KOENKER(R2):        **********     4  *********
          B-P-G (SSR) :       **********     4  *********

...MATRIX IS NOT POSITIVE DEFINITE..FAILED IN ROW    4
E**2 ON X X**2 XX (WHITE) TEST:
          KOENKER(R2):        **********     5  *********
          B-P-G (SSR) :       **********     5  *********

|_* Get heteroskedasticity-consistent standard errors (Greene, Table 14.3)
|_OLS EXP INCOME INC2 / HETCOV
OBSERVATION     34 IS SKIPPED BECAUSE EXP      MISSING

 OLS ESTIMATION
       50 OBSERVATIONS     DEPENDENT VARIABLE= EXP
...NOTE..SAMPLE RANGE SET TO:      1,     51

USING HETEROSKEDASTICITY-CONSISTENT COVARIANCE MATRIX

 R-SQUARE =   0.6553     R-SQUARE ADJUSTED =   0.6407
VARIANCE OF THE ESTIMATE-SIGMA**2 =   3212.5
STANDARD ERROR OF THE ESTIMATE-SIGMA =   56.679
SUM OF SQUARED ERRORS-SSE=  0.15099E+06
MEAN OF DEPENDENT VARIABLE =   373.26
LOG OF THE LIKELIHOOD FUNCTION = -271.270

VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
  NAME    COEFFICIENT   ERROR      47 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
INCOME    -183.42      124.3      -1.476     0.147-0.210    -2.0381    -3.7389
INC2       15.870      8.300       1.912     0.062 0.269     2.8163     2.5074
CONSTANT   832.91      460.9       1.807     0.077 0.255     0.0000     2.2315
|_*
|_* Weighted least squares estimation (Greene, Table 14.4, p.398)
|_*   Assumption A : Variance is proportional to INCOME
|_GENR WT=1/INCOME
|_OLS EXP INCOME INC2 / WEIGHT=WT PREDICT=YHAT RESID=E
OBSERVATION     34 IS SKIPPED BECAUSE EXP      MISSING

 OLS ESTIMATION
       50 OBSERVATIONS     DEPENDENT VARIABLE= EXP
...NOTE..SAMPLE RANGE SET TO:      1,     51
SUM OF LOG(SQRT(ABS(WEIGHT)))  = -0.22015

 R-SQUARE =   0.6274     R-SQUARE ADJUSTED =   0.6115
VARIANCE OF THE ESTIMATE-SIGMA**2 =   2978.8
STANDARD ERROR OF THE ESTIMATE-SIGMA =   54.578
SUM OF SQUARED ERRORS-SSE=  0.14000E+06
MEAN OF DEPENDENT VARIABLE =   364.48
LOG OF THE LIKELIHOOD FUNCTION = -269.602

VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
  NAME    COEFFICIENT   ERROR      47 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
INCOME    -161.23      84.48      -1.908     0.062-0.268    -1.8654    -3.3061
INC2       14.475      5.377       2.692     0.010 0.365     2.6312     2.2583
CONSTANT   746.36      328.2       2.274     0.028 0.315     0.0000     2.0477

|_*   Test for heteroskedasticity  (approx. 1% critical value = 6.64)
|_    GENR E2=E*E
..OBSERVATION    34 IS ASSIGNED MISSING CODE=   -99.
|_    ?OLS E2 YHAT
OBSERVATION     34 IS SKIPPED BECAUSE YHAT     MISSING
OBSERVATION     34 IS SKIPPED BECAUSE E2       MISSING
|_    GEN1 LM=$N*$R2
..NOTE..CURRENT VALUE OF $N   =   50.000
..NOTE..CURRENT VALUE OF $R2  =  0.18178
|_    PRINT LM
    LM
   9.088942

|_*   Assumption B : Variance is proportional to INCOME**2
|_GENR WT=1/(INCOME**2)
|_OLS EXP INCOME INC2 / WEIGHT=WT PREDICT=YHAT RESID=E
OBSERVATION     34 IS SKIPPED BECAUSE EXP      MISSING

 OLS ESTIMATION
       50 OBSERVATIONS     DEPENDENT VARIABLE= EXP
...NOTE..SAMPLE RANGE SET TO:      1,     51
SUM OF LOG(SQRT(ABS(WEIGHT)))  = -0.86795

 R-SQUARE =   0.5983     R-SQUARE ADJUSTED =   0.5812
VARIANCE OF THE ESTIMATE-SIGMA**2 =   2784.3
STANDARD ERROR OF THE ESTIMATE-SIGMA =   52.766
SUM OF SQUARED ERRORS-SSE=  0.13086E+06
MEAN OF DEPENDENT VARIABLE =   356.60
LOG OF THE LIKELIHOOD FUNCTION = -268.562

VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
  NAME    COEFFICIENT   ERROR      47 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
INCOME    -139.93      87.21      -1.605     0.115-0.228    -1.6730    -2.8830
INC2       13.113      5.637       2.326     0.024 0.321     2.4256     2.0193
CONSTANT   664.58      333.6       1.992     0.052 0.279     0.0000     1.8637

|_    GENR E2=E*E
..OBSERVATION    34 IS ASSIGNED MISSING CODE=   -99.
|_    ?OLS E2 YHAT
OBSERVATION     34 IS SKIPPED BECAUSE YHAT     MISSING
OBSERVATION     34 IS SKIPPED BECAUSE E2       MISSING
|_    GEN1 LM=$N*$R2
..NOTE..CURRENT VALUE OF $N   =   50.000
..NOTE..CURRENT VALUE OF $R2  =  0.10138
|_    PRINT LM
    LM
   5.068986
|_STOP
