Comparing linear vs. log-linear models

An equation that specifies a linear relationship among the variables gives an approximate description of some economic behaviour. An alternative approach is to consider a linear relationship among log-transformed variables. This is a log-log model: the dependent variable as well as all explanatory variables are transformed to logarithms. Since the relationship among the log variables is linear, some researchers call this a log-linear model.

Different functional forms give parameter estimates that have different economic interpretations. The parameters of the linear model are interpreted as marginal effects, so the implied elasticities vary over the data points. In contrast, the parameters of the log-log model are interpreted as elasticities, so the log-log model assumes a constant elasticity over all values of the data set.

The log transformation is applicable only when all the observations in the data set are positive. Gujarati [Basic Econometrics, Third Edition, 1995, McGraw-Hill, p. 387] notes that positive values can be guaranteed by using a transformation such as log(X+k), where k is a positive scalar chosen to ensure positive values. However, users will then need to give careful thought to the interpretation of the parameter estimates.

For a given data set there may be no particular reason to assume that one functional form is better than the other. A model selection approach is to estimate the competing models by OLS and choose the model with the highest R-square.
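For example, in the textile demand equations estimated in the example below, the linear model gives a PRICE coefficient of -1.3830; this is the marginal effect of PRICE on CONSUME, and the implied price elasticity, -1.3830·(PRICE/CONSUME), takes a different value at every data point (evaluated at the sample means it is -.7846). The log-log model gives a coefficient of -.82884 on the log of PRICE, and this coefficient is itself the price elasticity, the same at every data point.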
SHAZAM computes the R-square as

R² = 1 − SSE/SST

where SSE is the sum of squared estimated residuals and SST is the sum of squared deviations of the dependent variable from its mean. An equivalent computation is the squared coefficient of correlation between the observed and predicted values of the dependent variable. (It may be useful to verify this as an exercise.)

An R-square comparison is meaningful only if the dependent variable is the same for both models, so the R-square from the linear model cannot be compared directly with the R-square from the log-log model. The R-square measure gives the proportion of variation in the dependent variable that is explained by the explanatory variables; for the log-log model this is the variation in ln(Y) that is explained by the model. For comparison purposes we would like a measure based on Y itself, the anti-log of ln(Y). For the log-log model, the way to proceed is to obtain the antilog predicted values and compute the R-square between the antilogs of the observed and predicted values. This R-square can then be compared with the R-square obtained from OLS estimation of the linear model.
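As an illustration of these calculations outside SHAZAM, the following Python/NumPy sketch fits a linear and a log-log model by least squares on simulated positive data; the data, the variable names and the fitted() and r_square() helpers are invented for the example. It checks that 1 − SSE/SST equals the squared correlation between the observed and predicted values, and computes the R-square between the antilogs of the observed and predicted values of ln(Y) so that it can be compared with the R-square of the linear model.

import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(1.0, 10.0, n)
y = np.exp(1.0 + 0.8 * np.log(x) + rng.normal(0.0, 0.1, n))   # invented positive data

def fitted(X, y):
    """Return least-squares fitted values (X includes a constant column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ b

def r_square(y, yhat):
    """R-square computed as 1 - SSE/SST."""
    sse = np.sum((y - yhat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

yhat_lin = fitted(np.column_stack([np.ones(n), x]), y)                  # linear model
lnyhat   = fitted(np.column_stack([np.ones(n), np.log(x)]), np.log(y))  # log-log model

# 1 - SSE/SST equals the squared correlation between observed and predicted values
print(r_square(y, yhat_lin), np.corrcoef(y, yhat_lin)[0, 1] ** 2)

# R-square of the log-log model in terms of ln(Y) -- not comparable with the linear model
print(r_square(np.log(y), lnyhat))

# R-square between antilogs of observed and predicted -- comparable with the linear model
print(r_square(y, np.exp(lnyhat)))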
When estimating a log-log model the following two options can be used on the OLS command:

LOGLOG   This option specifies that the dependent variable and all the explanatory variables are in logarithmic form.

RSTAT    This option requests residual summary statistics. When combined with the LOGLOG option, the R-square between the antilogs of the observed and predicted values is reported on the output.
Example

This example uses the Theil textile data set.
The SHAZAM commands (as they appear, with the |_ prefix, in the output listing below) are:
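SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* Obtain parameter estimates for the linear model
OLS CONSUME INCOME PRICE / PREDICT=YHAT1
* Use the GENR command to get logarithms of the variables
GENR LC=LOG(CONSUME)
GENR LINC=LOG(INCOME)
GENR LP=LOG(PRICE)
* Obtain parameter estimates for the log-log model
OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2
* Obtain the antilog predicted values (include a bias adjustment)
GENR YHAT2=EXP(YHAT2+$SIG2/2)
* Print results
PRINT YEAR CONSUME YHAT1 YHAT2
STOP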
Note that the PREDICT= option is used on each OLS command to save the predicted values of the dependent variable. The SHAZAM output can be inspected below.

The SHAZAM output from the linear model gives the result:

R-SQUARE = .9513

The SHAZAM output from the log-log model gives the result:

R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED = .9689

In this example the R-square for the log-log model is higher, so there is some evidence to prefer the log-log specification. Users may be interested in more formal procedures for testing between the linear and log-log model specifications; test procedures that have been proposed by various researchers are discussed in the final section below. Other functional forms can also be considered. The Box-Cox transformation creates a general functional form in which both the linear model and the log-log model are special cases. Features for estimating this model are described in the chapter on Box-Cox regression in the SHAZAM User's Reference Manual.

Computing antilog predictions

In the above example, the log-log model is estimated and the antilog predictions are computed with the following commands (repeated from the command file above):
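OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2
GENR YHAT2=EXP(YHAT2+$SIG2/2)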
When constructing the antilog predictions some consideration should be given to using an unbiased predictor. A result from statistical theory is that if a random variable Y is normally distributed with mean µ and variance σ², then the random variable Z = exp(Y) has mean exp(µ + σ²/2) (see, for example, Mood, Graybill and Boes [1974] and Ramanathan [1995, p. 271]). Therefore, it is important to include an estimate of σ²/2 in the exponent when computing the antilog predictions. In the GENR command above this is done by adding $SIG2/2, where $SIG2 is the SHAZAM temporary variable that holds the estimated residual variance (SIGMA**2) from the preceding regression.
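As a rough illustration of this result (a Python simulation, not SHAZAM code, with arbitrary parameter values), the sketch below compares the naive antilog predictor exp(µ̂) with the bias-adjusted predictor exp(µ̂ + σ̂²/2) for a lognormally distributed variable.

import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 2.0, 0.5
true_mean = np.exp(mu + sigma**2 / 2)        # E[exp(Y)] when Y ~ N(mu, sigma^2)

ly = rng.normal(mu, sigma, 100_000)          # draws of Y = ln(Z)
mu_hat, s2_hat = ly.mean(), ly.var(ddof=1)   # estimates of mu and sigma^2

naive = np.exp(mu_hat)                       # ignores the variance term
adjusted = np.exp(mu_hat + s2_hat / 2)       # includes the sigma^2/2 adjustment

print(true_mean, naive, adjusted)
# The adjusted value is close to true_mean; the naive value falls noticeably below it.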
SHAZAM output for the comparison of linear and log-log models

|_SAMPLE 1 17
|_READ (THEIL.txt) YEAR CONSUME INCOME PRICE
 UNIT 88 IS NOW ASSIGNED TO: THEIL.txt
    4 VARIABLES AND      17 OBSERVATIONS STARTING AT OBS       1
|_* Obtain parameter estimates for the linear model
|_OLS CONSUME INCOME PRICE / PREDICT=YHAT1

 OLS ESTIMATION
      17 OBSERVATIONS     DEPENDENT VARIABLE = CONSUME
 ...NOTE..SAMPLE RANGE SET TO:    1,   17

 R-SQUARE =   .9513     R-SQUARE ADJUSTED =   .9443
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  30.951
 STANDARD ERROR OF THE ESTIMATE-SIGMA =  5.5634
 SUM OF SQUARED ERRORS-SSE=  433.31
 MEAN OF DEPENDENT VARIABLE =  134.51
 LOG OF THE LIKELIHOOD FUNCTION = -51.6471

 VARIABLE   ESTIMATED  STANDARD   T-RATIO          PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR     14 DF   P-VALUE  CORR.  COEFFICIENT  AT MEANS
 INCOME      1.0617     .2667       3.981    .001    .729      .2387      .8129
 PRICE      -1.3830     .8381E-01 -16.50     .000   -.975     -.9893     -.7846
 CONSTANT    130.71     27.09       4.824    .000    .790      .0000      .9718

|_* Use the GENR command to get logarithms of the variables
|_GENR LC=LOG(CONSUME)
|_GENR LINC=LOG(INCOME)
|_GENR LP=LOG(PRICE)
|_* Obtain parameter estimates for the log-log model
|_OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2

 OLS ESTIMATION
      17 OBSERVATIONS     DEPENDENT VARIABLE = LC
 ...NOTE..SAMPLE RANGE SET TO:    1,   17

 R-SQUARE =   .9744     R-SQUARE ADJUSTED =   .9707
 VARIANCE OF THE ESTIMATE-SIGMA**2 =  .97236E-03
 STANDARD ERROR OF THE ESTIMATE-SIGMA =  .31183E-01
 SUM OF SQUARED ERRORS-SSE=  .13613E-01
 MEAN OF DEPENDENT VARIABLE =  4.8864
 LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -46.5862

 VARIABLE   ESTIMATED  STANDARD   T-RATIO          PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR     14 DF   P-VALUE  CORR.  COEFFICIENT  AT MEANS
 LINC        1.1432     .1560       7.328    .000    .891      .3216     1.1432
 LP         -.82884     .3611E-01 -22.95     .000   -.987    -1.0074     -.8288
 CONSTANT    3.1636     .7048       4.489    .001    .768      .0000     3.1636

 DURBIN-WATSON = 1.9267    VON NEUMANN RATIO = 2.0471    RHO = -.11385
 RESIDUAL SUM = .10769E-13    RESIDUAL VARIANCE = .97236E-03
 SUM OF ABSOLUTE ERRORS=   .40583
 R-SQUARE BETWEEN OBSERVED AND PREDICTED = .9744
 R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED = .9689
 RUNS TEST:   9 RUNS,   9 POS,   0 ZERO,   8 NEG  NORMAL STATISTIC = -.2366

|_* Obtain the antilog predicted values (include a bias adjustment)
|_GENR YHAT2=EXP(YHAT2+$SIG2/2)
 ..NOTE..CURRENT VALUE OF $SIG2=  .97236E-03
|_*
|_* Print results
|_PRINT YEAR CONSUME YHAT1 YHAT2
      YEAR       CONSUME      YHAT1        YHAT2
   1923.000      99.20000     93.69238     96.05522
   1924.000      99.00000     96.42346     98.37372
   1925.000      100.0000     98.57900     100.6381
   1926.000      111.6000     116.7814     115.3575
   1927.000      122.2000     122.4517     119.8714
   1928.000      117.6000     122.9100     122.1649
   1929.000      121.1000     123.0455     122.8039
   1930.000      136.0000     135.4254     134.3674
   1931.000      154.2000     149.8042     149.5499
   1932.000      153.6000     152.0574     151.7951
   1933.000      158.5000     153.9054     153.9190
   1934.000      140.6000     145.5571     140.7879
   1935.000      136.2000     145.0975     140.4307
   1936.000      168.0000     161.5844     166.7092
   1937.000      154.3000     156.8614     158.5688
   1938.000      149.0000     156.2887     157.5912
   1939.000      165.5000     156.1350     157.5576
|_STOP
Testing the Linear versus Log-log Model

Various methods for testing the linear versus the log-log model have been proposed. Some discussion is in Maddala [1992, pp. 222-3]. A test procedure is described in Griffiths, Hill and Judge [1993, pp. 345-6]. SHAZAM has the flexibility for the user to program these tests with SHAZAM commands; a sketch of one such procedure is given after the references below. Additional references that can be consulted are:

G. E. P. Box and D. R. Cox, "An Analysis of Transformations", Journal of the Royal Statistical Society, Series B, Vol. 26, 1964, pp. 211-243.

R. Davidson and J. G. MacKinnon, "Testing Linear and Log-linear Regressions against Box-Cox Alternatives", Canadian Journal of Economics, 1985, pp. 499-517.

L. G. Godfrey and M. R. Wickens, "Testing Linear and Log-linear Regressions for Functional Form", Review of Economic Studies, 1981, pp. 487-496.

J. G. MacKinnon, H. White and R. Davidson, "Tests for Model Specification in the Presence of Alternative Hypotheses: Some Further Results", Journal of Econometrics, Vol. 21, 1983, pp. 53-70.
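For illustration only, the following Python/NumPy sketch implements one commonly used procedure, the PE test of MacKinnon, White and Davidson [1983], on simulated data. The data, the variable names and the small ols() helper are invented, and the construction follows the usual textbook presentation of the test (augment each model with a term built from the rival model's fitted values and apply a t-test) rather than any SHAZAM routine.

import numpy as np

def ols(X, y):
    """OLS coefficients, fitted values and coefficient standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (len(y) - X.shape[1])
    return b, X @ b, np.sqrt(np.diag(s2 * XtX_inv))

rng = np.random.default_rng(2)
n = 60
x = rng.uniform(1.0, 10.0, n)
y = np.exp(0.5 + 0.9 * np.log(x) + rng.normal(0.0, 0.1, n))  # data generated from a log-log model

X_lin = np.column_stack([np.ones(n), x])            # linear specification
X_log = np.column_stack([np.ones(n), np.log(x)])    # log-log specification

_, yhat_lin, _ = ols(X_lin, y)
_, lnyhat_log, _ = ols(X_log, np.log(y))

# Test the linear model: add z1 = ln(yhat_lin) - lnyhat_log and test its coefficient.
z1 = np.log(yhat_lin) - lnyhat_log
b1, _, se1 = ols(np.column_stack([X_lin, z1]), y)
print("t-statistic on z1 (linear null):", b1[-1] / se1[-1])

# Test the log-log model: add z2 = yhat_lin - exp(lnyhat_log) and test its coefficient.
z2 = yhat_lin - np.exp(lnyhat_log)
b2, _, se2 = ols(np.column_stack([X_log, z2]), np.log(y))
print("t-statistic on z2 (log-log null):", b2[-1] / se2[-1])
# A large t-statistic (in absolute value) is evidence against the corresponding specification.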