Comparing linear vs. log-linear models

An equation that specifies a linear relationship among the variables gives an approximate description of some economic behaviour. An alternative approach is to consider a linear relationship among log-transformed variables. This is a log-log model: the dependent variable as well as all explanatory variables are transformed to logarithms. Since the relationship among the log variables is linear, some researchers call this a log-linear model.

Different functional forms give parameter estimates that have different economic interpretations. The parameters of the linear model have an interpretation as marginal effects; the implied elasticities will vary depending on the point of the data at which they are evaluated. In contrast, the parameters of the log-log model have an interpretation as elasticities, so the log-log model assumes a constant elasticity over all values of the data set.

The log transformation is only applicable when all the observations in the data set are positive. Gujarati [Basic Econometrics, Third Edition, 1995, McGraw-Hill, p. 387] notes that this can be guaranteed by using a transformation like log(X+k), where k is a positive scalar chosen to ensure positive values. However, users will then need to give careful thought to the interpretation of the parameter estimates.

For a given data set there may be no particular reason to assume that one functional form is better than the other. A model selection approach is to estimate competing models by OLS and choose the model with the highest R-square. SHAZAM computes the R-square as:
R² = 1 − SSE/SST

where SSE is the sum of squared estimated residuals and SST is the sum of squared deviations from the mean of the dependent variable. An equivalent computation is the squared coefficient of correlation between the observed and predicted values of the dependent variable. (It may be useful to verify this as an exercise.)

An R-square comparison is meaningful only if the dependent variable is the same for both models, so the R-square from the linear model cannot be compared directly with the R-square from the log-log model. Recall that the R-square measure gives the proportion of variation in the dependent variable that is explained by the explanatory variables. For the log-log model the R-square gives the proportion of variation in ln(Y) that is explained by the model. For comparison purposes we would like a measure that uses the antilog of ln(Y). For the log-log model, the way to proceed is to obtain the antilog predicted values and compute the R-square between the antilogs of the observed and predicted values. This R-square can then be compared with the R-square obtained from OLS estimation of the linear model. When estimating a log-log model, the RSTAT and LOGLOG options can be used on the OLS command to report this statistic.
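The equivalence suggested as an exercise above can be checked numerically. The Python sketch below (numpy, with invented data; it is an illustration, not part of the SHAZAM session) fits a regression by least squares and confirms that 1 − SSE/SST equals the squared correlation between observed and fitted values when the model contains an intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(1, 10, n)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, n)   # invented data

# OLS fit with an intercept
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta

sse = np.sum((y - yhat) ** 2)             # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)         # total sum of squares about the mean
r2 = 1.0 - sse / sst

r2_corr = np.corrcoef(y, yhat)[0, 1] ** 2  # squared correlation of observed and fitted

print(abs(r2 - r2_corr) < 1e-9)            # the two computations agree
```

Note that the equivalence relies on the regression including a constant term; without an intercept the two computations can differ.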
Example

This example uses the Theil textile data set.
The SHAZAM commands are as follows:

SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* Obtain parameter estimates for the linear model
OLS CONSUME INCOME PRICE / PREDICT=YHAT1
* Use the GENR command to get logarithms of the variables
GENR LC=LOG(CONSUME)
GENR LINC=LOG(INCOME)
GENR LP=LOG(PRICE)
* Obtain parameter estimates for the log-log model
OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2
* Obtain the antilog predicted values (include a bias adjustment)
GENR YHAT2=EXP(YHAT2+$SIG2/2)
* Print results
PRINT YEAR CONSUME YHAT1 YHAT2
STOP
Note that on the OLS command for the log-log model the RSTAT and LOGLOG options are specified. The SHAZAM output can be inspected below. The SHAZAM output from the linear model gives the result:

R-SQUARE = .9513

The SHAZAM output from the log-log model gives the result:

R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED = .9689

In this example, the R-square for the log-log model is higher, so there is some evidence to prefer the log-log specification. Users may be interested in more formal procedures for testing between the linear and log-log model specifications. Test procedures have been proposed by various researchers.

Other functional forms can be considered. The Box-Cox transformation creates a general functional form where both the linear model and the log-log model are special cases. Features for estimating this model are described in the chapter on Box-Cox regression in the SHAZAM User's Reference Manual.

Computing antilog predictions

In the above example, the log-log model is estimated and the antilog predictions are computed with the commands:

OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2
GENR YHAT2=EXP(YHAT2+$SIG2/2)
When constructing the antilog predictions some consideration should be given to using an unbiased predictor. A result from statistical theory is that if a random variable Y is normally distributed with mean µ and variance σ², then the random variable Z = exp(Y) has mean:

exp(µ + σ²/2)

(see, for example, Mood, Graybill and Boes [1974] and Ramanathan [1995, p. 271]). Therefore, it is important to include an estimate of σ²/2 in the exponent when computing the antilog predictions; this is the role of the term $SIG2/2 in the GENR command shown in the output.
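The size of this adjustment can be checked by simulation. The Python sketch below (with illustrative values of µ and σ, not the SHAZAM internals) draws normal variates and compares the naive antilog exp(µ) with the corrected predictor exp(µ + σ²/2):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 4.9, 0.5          # illustrative values for the mean and std dev
y = rng.normal(mu, sigma, 1_000_000)

z_mean = np.exp(y).mean()                 # simulated E[exp(Y)]
naive = np.exp(mu)                        # antilog of the mean: biased low
corrected = np.exp(mu + sigma**2 / 2)     # lognormal mean formula

print(naive, corrected, z_mean)
# The corrected value matches the simulated mean; the naive antilog
# understates it by the factor exp(sigma**2 / 2).
```

The larger the residual variance, the larger the understatement from the naive antilog, so the adjustment matters most for poorly fitting log models.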
SHAZAM output for the comparison of linear and log-log models

_SAMPLE 1 17
_READ (THEIL.txt) YEAR CONSUME INCOME PRICE
 UNIT 88 IS NOW ASSIGNED TO: THEIL.txt
 4 VARIABLES AND 17 OBSERVATIONS STARTING AT OBS 1
_* Obtain parameter estimates for the linear model
_OLS CONSUME INCOME PRICE / PREDICT=YHAT1

 OLS ESTIMATION
 17 OBSERVATIONS     DEPENDENT VARIABLE = CONSUME
 ...NOTE..SAMPLE RANGE SET TO: 1, 17

 R-SQUARE = .9513     R-SQUARE ADJUSTED = .9443
 VARIANCE OF THE ESTIMATE-SIGMA**2 = 30.951
 STANDARD ERROR OF THE ESTIMATE-SIGMA = 5.5634
 SUM OF SQUARED ERRORS-SSE = 433.31
 MEAN OF DEPENDENT VARIABLE = 134.51
 LOG OF THE LIKELIHOOD FUNCTION = -51.6471

 VARIABLE   ESTIMATED   STANDARD   T-RATIO          PARTIAL  STANDARDIZED  ELASTICITY
   NAME    COEFFICIENT    ERROR     14 DF  P-VALUE   CORR.   COEFFICIENT    AT MEANS
 INCOME      1.0617      .2667       3.981   .001     .729      .2387        .8129
 PRICE      -1.3830      .8381E-01 -16.50    .000    -.975     -.9893       -.7846
 CONSTANT    130.71     27.09        4.824   .000     .790      .0000        .9718

_* Use the GENR command to get logarithms of the variables
_GENR LC=LOG(CONSUME)
_GENR LINC=LOG(INCOME)
_GENR LP=LOG(PRICE)
_* Obtain parameter estimates for the log-log model
_OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2

 OLS ESTIMATION
 17 OBSERVATIONS     DEPENDENT VARIABLE = LC
 ...NOTE..SAMPLE RANGE SET TO: 1, 17

 R-SQUARE = .9744     R-SQUARE ADJUSTED = .9707
 VARIANCE OF THE ESTIMATE-SIGMA**2 = .97236E-03
 STANDARD ERROR OF THE ESTIMATE-SIGMA = .31183E-01
 SUM OF SQUARED ERRORS-SSE = .13613E-01
 MEAN OF DEPENDENT VARIABLE = 4.8864
 LOG OF THE LIKELIHOOD FUNCTION (IF DEPVAR LOG) = -46.5862

 VARIABLE   ESTIMATED   STANDARD   T-RATIO          PARTIAL  STANDARDIZED  ELASTICITY
   NAME    COEFFICIENT    ERROR     14 DF  P-VALUE   CORR.   COEFFICIENT    AT MEANS
 LINC        1.1432      .1560       7.328   .000     .891      .3216       1.1432
 LP          -.82884     .3611E-01 -22.95    .000    -.987    -1.0074       -.8288
 CONSTANT    3.1636      .7048       4.489   .001     .768      .0000       3.1636

 DURBIN-WATSON = 1.9267     VON NEUMANN RATIO = 2.0471     RHO = .11385
 RESIDUAL SUM = .10769E-13     RESIDUAL VARIANCE = .97236E-03
 SUM OF ABSOLUTE ERRORS = .40583
 R-SQUARE BETWEEN OBSERVED AND PREDICTED = .9744
 R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED = .9689
 RUNS TEST: 9 RUNS, 9 POS, 0 ZERO, 8 NEG  NORMAL STATISTIC = -.2366

_* Obtain the antilog predicted values (include a bias adjustment)
_GENR YHAT2=EXP(YHAT2+$SIG2/2)
 ..NOTE..CURRENT VALUE OF $SIG2 = .97236E-03
_*
_* Print results
_PRINT YEAR CONSUME YHAT1 YHAT2

   YEAR       CONSUME     YHAT1       YHAT2
   1923.000   99.20000    93.69238    96.05522
   1924.000   99.00000    96.42346    98.37372
   1925.000   100.0000    98.57900    100.6381
   1926.000   111.6000    116.7814    115.3575
   1927.000   122.2000    122.4517    119.8714
   1928.000   117.6000    122.9100    122.1649
   1929.000   121.1000    123.0455    122.8039
   1930.000   136.0000    135.4254    134.3674
   1931.000   154.2000    149.8042    149.5499
   1932.000   153.6000    152.0574    151.7951
   1933.000   158.5000    153.9054    153.9190
   1934.000   140.6000    145.5571    140.7879
   1935.000   136.2000    145.0975    140.4307
   1936.000   168.0000    161.5844    166.7092
   1937.000   154.3000    156.8614    158.5688
   1938.000   149.0000    156.2887    157.5912
   1939.000   165.5000    156.1350    157.5576

_STOP
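The R-square between antilogs reported in the output can also be computed directly in other environments. The Python sketch below (numpy, with invented data, not the Theil data set) fits both specifications and builds an R-square for the log-log model that is comparable with the linear model's:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 60
x = rng.uniform(2, 10, n)
# Invented data with a curved (roughly log-log) relationship
y = np.exp(0.5 + 0.5 * np.log(x) + rng.normal(0, 0.1, n))

def fitted(X, y):
    """Least-squares fitted values."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ b

# Linear model: R-square measured in terms of y
yhat_lin = fitted(np.column_stack([np.ones(n), x]), y)
r2_lin = np.corrcoef(y, yhat_lin)[0, 1] ** 2

# Log-log model: take antilogs of the predictions before comparing with y
lnyhat = fitted(np.column_stack([np.ones(n), np.log(x)]), np.log(y))
r2_antilog = np.corrcoef(y, np.exp(lnyhat))[0, 1] ** 2

print(r2_lin, r2_antilog)   # the two measures are now comparable
```

A bias adjustment such as the $SIG2/2 term in the GENR command can be added to the antilog predictions; since the squared correlation is invariant to the resulting scale factor, it does not change this particular R-square.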
Testing the Linear versus Log-log Model

Various methods for testing the linear versus the log-log model have been proposed. Some discussion is in Maddala [1992, pp. 22-23]. A test procedure is described in Griffiths, Hill and Judge [1993, pp. 34-56]. SHAZAM has the flexibility for the user to program these tests with SHAZAM commands. Additional references that can be consulted are:

G.E.P. Box and D.R. Cox, "An Analysis of Transformations", Journal of the Royal Statistical Society, Series B, Vol. 26, 1964, pp. 211-243.

R. Davidson and J.G. MacKinnon, "Testing Linear and Loglinear Regressions against Box-Cox Alternatives", Canadian Journal of Economics, 1985, pp. 499-517.

L.G. Godfrey and M.R. Wickens, "Testing Linear and Log-linear Regressions for Functional Form", Review of Economic Studies, 1981, pp. 487-496.

J.G. MacKinnon, H. White and R. Davidson, "Tests for Model Specification in the Presence of Alternative Hypotheses: Some Further Results", Journal of Econometrics, Vol. 21, 1983, pp. 53-70.
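As an illustration of how such a test can be programmed, the sketch below implements the PE test associated with MacKinnon, White and Davidson [1983] in Python (numpy, simulated data; all variable names and values are invented). The linear null is tested by augmenting the linear model with the difference between the log-log fitted values and the log of the linear fitted values, then checking the t-ratio on the added regressor:

```python
import numpy as np

def ols(X, y):
    """OLS: coefficients, fitted values, and coefficient standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return beta, X @ beta, se

# Simulated data from a log-log process (values invented for illustration)
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(2, 10, n)
y = np.exp(1.0 + 0.8 * np.log(x) + rng.normal(0, 0.03, n))

X_lin = np.column_stack([np.ones(n), x])          # linear regressors
X_log = np.column_stack([np.ones(n), np.log(x)])  # log-log regressors
_, yhat_lin, _ = ols(X_lin, y)
_, lnyhat, _ = ols(X_log, np.log(y))

# PE test of the linear null: add (fitted ln y from the log-log model
# minus the log of the linear fitted values) to the linear model;
# a significant t-ratio rejects the linear specification.
z = lnyhat - np.log(yhat_lin)
beta, _, se = ols(np.column_stack([X_lin, z]), y)
t_z = beta[-1] / se[-1]
print(f"PE test t-statistic on z: {t_z:.2f}")
```

The symmetric test of the log-log null augments the log-log equation with the difference between the linear fitted values and the antilogs of the log-log fitted values, and applies the same t-test.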
