Linear vs. log-linear models

Comparing linear vs. log-linear models

An equation that specifies a linear relationship among the variables gives an approximate description of some economic behaviour. An alternative approach is to consider a linear relationship among log-transformed variables. This is a log-log model: the dependent variable as well as all explanatory variables are transformed to logarithms. Since the relationship among the log variables is linear, some researchers call this a log-linear model.

Different functional forms give parameter estimates that have different economic interpretations. The parameters of the linear model are interpreted as marginal effects, and the implied elasticities vary from one data point to another. In contrast, the parameters of the log-log model are interpreted as elasticities, so the log-log model assumes a constant elasticity over all values in the data set.
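As a brief sketch of why the two interpretations differ, consider a single explanatory variable X. The symbols \beta_0, \beta_1, e and \eta below are notation introduced here for illustration and do not come from the SHAZAM documentation.

```latex
% Linear model: the slope is a marginal effect, the elasticity depends on (X, Y)
Y = \beta_0 + \beta_1 X + e, \qquad
\frac{\partial Y}{\partial X} = \beta_1, \qquad
\eta = \frac{\partial Y}{\partial X}\cdot\frac{X}{Y} = \beta_1 \frac{X}{Y}

% Log-log model: the slope is itself the elasticity, constant over the sample
\ln Y = \beta_0 + \beta_1 \ln X + e, \qquad
\eta = \frac{\partial \ln Y}{\partial \ln X} = \beta_1
```

In the linear case the elasticity depends on the point (X, Y) at which it is evaluated, which is why a single summary value is usually reported at the sample means, as in the ELASTICITY AT MEANS column of the SHAZAM output.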

The log transformation is only applicable when all the observations in the data set are positive. Gujarati [Basic Econometrics, Third Edition, 1995, McGraw-Hill, p.387] notes that this can be guaranteed by using a transformation like log(X+k) where k is a positive scalar chosen to ensure positive values. However, users will then need to give careful thought to the interpretation of the parameter estimates.

For a given data set there may be no particular reason to assume that one functional form is better than the other. A model selection approach is to estimate competing models by OLS and choose the model with the highest R-square. SHAZAM computes the R-square as:

R² = 1 - SSE / SST

where SSE is the sum of squared estimated residuals and SST is the sum of squared deviations from the mean of the dependent variable. An equivalent computation is to compute the squared coefficient of correlation between the observed and predicted values of the dependent variable. (It may be useful to verify this as an exercise.)
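The exercise can be sketched outside SHAZAM in a few lines of Python. This is only an illustration under stated assumptions: the data are made-up numbers, the model has a single regressor plus an intercept, and the helper names (ols_simple, r_square, corr_squared) are hypothetical.

```python
def ols_simple(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_square(y, yhat):
    """R-square computed as 1 - SSE/SST."""
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def corr_squared(u, v):
    """Squared coefficient of correlation between two series."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv * suv / (suu * svv)


# Hypothetical data, for illustration only (not the Theil textile data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

a, b = ols_simple(x, y)
yhat = [a + b * xi for xi in x]

r2_from_sums = r_square(y, yhat)
r2_from_corr = corr_squared(y, yhat)
print(r2_from_sums, r2_from_corr)
```

The two printed values should agree up to floating-point rounding. The equivalence relies on the model containing an intercept and being estimated by OLS.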

An R-square comparison is meaningful only if the dependent variable is the same for both models, so the R-square from the linear model cannot be compared directly with the R-square from the log-log model. The R-square measure gives the proportion of variation in the dependent variable that is explained by the explanatory variables. For the log-log model the R-square gives the proportion of variation in ln(Y) that is explained by the model. For comparison purposes we would like a measure that uses the antilog of ln(Y).

For the log-log model, the way to proceed is to obtain the antilog predicted values and compute the R-square between the antilog of the observed and predicted values. This R-square can then be compared with the R-square obtained from OLS estimation of the linear model.

When estimating a log-log model the following two options can be used on the OLS command.

LOGLOG This option tells SHAZAM that the dependent variable and explanatory variables have been transformed to logarithms. SHAZAM reports elasticities that are identical to the estimated coefficients.

RSTAT This option computes a number of residual statistics. When the LOGLOG option is also specified the SHAZAM output will report the R-square between the antilog of the observed and predicted values. This can be used for comparison with the R-square obtained from the linear model.

Example

This example uses the Theil textile data set. The SHAZAM commands (filename: LINLOG.SHA) below estimate a linear demand equation. Log-transformed variables are then generated and the log-log model is estimated by the method of ordinary least squares.

SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* Obtain parameter estimates for the linear model
OLS CONSUME INCOME PRICE / PREDICT=YHAT1
* Use the GENR command to get logarithms of the variables
GENR LC=LOG(CONSUME)
GENR LINC=LOG(INCOME)
GENR LP=LOG(PRICE)
* Obtain parameter estimates for the log-log model
OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2
* Obtain the antilog predicted values (include a bias adjustment)
GENR YHAT2=EXP(YHAT2+$SIG2/2)
*
* Print results
PRINT YEAR CONSUME YHAT1 YHAT2
STOP

Note that on the OLS estimation commands the PREDICT= option saves the predicted values in the variable specified. The predicted values from the linear model are saved in the variable named YHAT1. The predicted values from the log-log model are saved in the variable named YHAT2. From the log-log model estimation, predictions for CONSUME are constructed by taking antilogs. More details are given in the section "Computing antilog predictions" below.

The full SHAZAM output is listed in the section "SHAZAM output for the comparison of linear and log-log models" below. The output from the linear model gives the result:

R-SQUARE = .9513

The SHAZAM output from the log-log model gives the result:

R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED = .9689

In this example the R-square for the log-log model is higher (.9689 versus .9513), so there is some evidence to prefer the log-log specification.
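As a cross-check outside SHAZAM, the two R-square values can be approximately reproduced from the CONSUME, YHAT1 and YHAT2 columns printed in the SHAZAM output for this example. The Python sketch below assumes that the reported measure is the squared correlation between observed and predicted values, as described earlier. The function name corr_squared is hypothetical.

```python
# Observed CONSUME and the two prediction series, copied from the PRINT
# output of the SHAZAM run in this guide (years 1923 to 1939).
consume = [99.2, 99.0, 100.0, 111.6, 122.2, 117.6, 121.1, 136.0, 154.2,
           153.6, 158.5, 140.6, 136.2, 168.0, 154.3, 149.0, 165.5]
yhat1 = [93.69238, 96.42346, 98.57900, 116.7814, 122.4517, 122.9100,
         123.0455, 135.4254, 149.8042, 152.0574, 153.9054, 145.5571,
         145.0975, 161.5844, 156.8614, 156.2887, 156.1350]
yhat2 = [96.05522, 98.37372, 100.6381, 115.3575, 119.8714, 122.1649,
         122.8039, 134.3674, 149.5499, 151.7951, 153.9190, 140.7879,
         140.4307, 166.7092, 158.5688, 157.5912, 157.5576]


def corr_squared(u, v):
    """Squared coefficient of correlation between two series."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv * suv / (suu * svv)


# Linear model: observed CONSUME against its OLS predictions.
r2_linear = corr_squared(consume, yhat1)

# Log-log model: observed CONSUME against the antilog predictions.
r2_loglog = corr_squared(consume, yhat2)

print(f"linear model:  {r2_linear:.4f}")
print(f"log-log model: {r2_loglog:.4f}")
```

The printed values should come out close to the figures reported by SHAZAM. The bias adjustment applied to YHAT2 multiplies every prediction by the same constant, so it does not change the squared correlation.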

Users may be interested in more formal procedures for testing between the linear and log-log model specifications. Test procedures proposed by various researchers are listed in the section "Testing the Linear versus Log-log Model" below.

Other functional forms can be considered. The Box-Cox transformation creates a general functional form where both the linear model and log-log model are special cases. Features for estimating this model are described in the chapter on Box-Cox regression in the SHAZAM User's Reference Manual.
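For reference, the standard Box-Cox transformation of a positive variable can be written as below. The symbol \lambda and the single-regressor model are notation introduced here for illustration; the exact parameterization that SHAZAM estimates may differ, so the manual should be consulted for details.

```latex
% Box-Cox transformation of a positive variable y
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[1.5ex]
  \ln y, & \lambda = 0
\end{cases}

% Model with the same lambda applied to both sides
Y^{(\lambda)} = \beta_0 + \beta_1 X^{(\lambda)} + e
```

Setting \lambda = 1 gives a model that is linear in Y and X (the subtracted constants are absorbed into the intercept), while \lambda = 0 gives the log-log model.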


Computing antilog predictions

In the above example, the log-log model is estimated and the antilog predictions are computed with the commands:

* Obtain parameter estimates for the log-log model
OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2
* Obtain the antilog predicted values (include a bias adjustment)
GENR YHAT2=EXP(YHAT2+$SIG2/2)

When constructing the antilog predictions some consideration should be given to using an unbiased predictor. A result from statistical theory is that if a random variable Y is normally distributed with mean µ and variance σ² then a random variable Z defined as Z=exp(Y) has mean:

exp(µ + σ²/2)

(See, for example, Mood, Graybill and Boes [1974] and Ramanathan [1995, p.271].)

Therefore, it is important to include an estimate of σ²/2 in the computation of the antilog predictions. On the OLS estimation output the estimated error variance is reported on the line VARIANCE OF THE ESTIMATE-SIGMA**2. After model estimation this estimate is available in the temporary variable with the special name $SIG2. The GENR command for constructing the antilog predictions includes this in the calculation.
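The lognormal mean result can be illustrated with a small simulation. The Python sketch below is not part of SHAZAM. The values of mu and sigma are arbitrary illustrative choices, and only sig2 is taken from the SHAZAM output for the textile example.

```python
import math
import random

random.seed(12345)  # fixed seed so the run is repeatable

# Illustrative parameters (assumptions, not taken from the textile example).
mu, sigma = 0.0, 0.5
n = 200_000

# Draw Y ~ Normal(mu, sigma^2) and average Z = exp(Y).
draws = [math.exp(random.gauss(mu, sigma)) for _ in range(n)]
sample_mean = sum(draws) / n

naive = math.exp(mu)                      # antilog without adjustment
adjusted = math.exp(mu + sigma ** 2 / 2)  # antilog with the sigma^2/2 term

print(f"simulated mean of exp(Y): {sample_mean:.4f}")
print(f"exp(mu):                  {naive:.4f}")
print(f"exp(mu + sigma^2/2):      {adjusted:.4f}")

# Size of the adjustment in the textile example, where SHAZAM reports
# $SIG2 = .97236E-03 for the log-log model.
sig2 = 0.97236e-03
factor = math.exp(sig2 / 2)
print(f"adjustment factor exp($SIG2/2): {factor:.6f}")
```

The simulated mean should lie close to exp(mu + sigma^2/2) and well away from exp(mu). In the textile example the estimated error variance is small, so the adjustment scales the antilog predictions by only a very small amount.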


SHAZAM output for the comparison of linear and log-log models

|_SAMPLE 1 17
|_READ (THEIL.txt) YEAR CONSUME INCOME PRICE
UNIT 88 IS NOW ASSIGNED TO: THEIL.txt
   4 VARIABLES AND       17 OBSERVATIONS STARTING AT OBS       1

|_* Obtain parameter estimates for the linear model
|_OLS CONSUME INCOME PRICE / PREDICT=YHAT1

 OLS ESTIMATION
       17 OBSERVATIONS     DEPENDENT VARIABLE = CONSUME
...NOTE..SAMPLE RANGE SET TO:    1,   17

 R-SQUARE =    .9513     R-SQUARE ADJUSTED =    .9443
VARIANCE OF THE ESTIMATE-SIGMA**2 =   30.951
STANDARD ERROR OF THE ESTIMATE-SIGMA =   5.5634
SUM OF SQUARED ERRORS-SSE=   433.31
MEAN OF DEPENDENT VARIABLE =   134.51
LOG OF THE LIKELIHOOD FUNCTION = -51.6471

VARIABLE   ESTIMATED   STANDARD    T-RATIO            PARTIAL  STANDARDIZED  ELASTICITY
  NAME    COEFFICIENT    ERROR      14 DF    P-VALUE   CORR.   COEFFICIENT    AT MEANS
INCOME      1.0617      .2667       3.981      .001     .729      .2387         .8129
PRICE      -1.3830      .8381E-01  -16.50      .000    -.975     -.9893        -.7846
CONSTANT    130.71      27.09       4.824      .000     .790      .0000         .9718

|_* Use the GENR command to get logarithms of the variables
|_GENR LC=LOG(CONSUME)
|_GENR LINC=LOG(INCOME)
|_GENR LP=LOG(PRICE)
|_* Obtain parameter estimates for the log-log model
|_OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2

 OLS ESTIMATION
       17 OBSERVATIONS     DEPENDENT VARIABLE = LC
...NOTE..SAMPLE RANGE SET TO:    1,   17

 R-SQUARE =    .9744     R-SQUARE ADJUSTED =    .9707
VARIANCE OF THE ESTIMATE-SIGMA**2 =  .97236E-03
STANDARD ERROR OF THE ESTIMATE-SIGMA =  .31183E-01
SUM OF SQUARED ERRORS-SSE=  .13613E-01
MEAN OF DEPENDENT VARIABLE =   4.8864
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -46.5862

VARIABLE   ESTIMATED   STANDARD    T-RATIO            PARTIAL  STANDARDIZED  ELASTICITY
  NAME    COEFFICIENT    ERROR      14 DF    P-VALUE   CORR.   COEFFICIENT    AT MEANS
LINC        1.1432      .1560       7.328      .000     .891      .3216        1.1432
LP         -.82884      .3611E-01  -22.95      .000    -.987    -1.0074        -.8288
CONSTANT    3.1636      .7048       4.489      .001     .768      .0000        3.1636

DURBIN-WATSON = 1.9267    VON NEUMANN RATIO = 2.0471    RHO = -.11385
RESIDUAL SUM = .10769E-13  RESIDUAL VARIANCE = .97236E-03
SUM OF ABSOLUTE ERRORS= .40583
R-SQUARE BETWEEN OBSERVED AND PREDICTED = .9744
R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED = .9689
RUNS TEST:    9 RUNS,    9 POS,    0 ZERO,    8 NEG  NORMAL STATISTIC = -.2366

|_* Obtain the antilog predicted values (include a bias adjustment)
|_GENR YHAT2=EXP(YHAT2+$SIG2/2)
..NOTE..CURRENT VALUE OF $SIG2=  .97236E-03
|_*
|_* Print results
|_PRINT YEAR CONSUME YHAT1 YHAT2
      YEAR        CONSUME        YHAT1          YHAT2
   1923.000       99.20000       93.69238       96.05522
   1924.000       99.00000       96.42346       98.37372
   1925.000       100.0000       98.57900       100.6381
   1926.000       111.6000       116.7814       115.3575
   1927.000       122.2000       122.4517       119.8714
   1928.000       117.6000       122.9100       122.1649
   1929.000       121.1000       123.0455       122.8039
   1930.000       136.0000       135.4254       134.3674
   1931.000       154.2000       149.8042       149.5499
   1932.000       153.6000       152.0574       151.7951
   1933.000       158.5000       153.9054       153.9190
   1934.000       140.6000       145.5571       140.7879
   1935.000       136.2000       145.0975       140.4307
   1936.000       168.0000       161.5844       166.7092
   1937.000       154.3000       156.8614       158.5688
   1938.000       149.0000       156.2887       157.5912
   1939.000       165.5000       156.1350       157.5576
|_STOP


Testing the Linear versus Log-log Model

Various methods for testing the linear versus log-log model have been proposed. Some discussion is in Maddala [1992, pp.222-3]. A test procedure is described in Griffiths, Hill and Judge [1993, pp.345-6]. SHAZAM has the flexibility for the user to program these tests with SHAZAM commands. Additional references that can be consulted are:

G.E.P. Box and D.R. Cox, "An Analysis of Transformations", Journal of the Royal Statistical Society, Series B, Vol. 26, 1964, pp. 211-243.

R. Davidson and J.G. MacKinnon, "Testing Linear and Log-linear Regressions against Box-Cox Alternatives", Canadian Journal of Economics, 1985, pp. 499-517.

L.G. Godfrey and M.R. Wickens, "Testing Linear and Log-linear Regressions for Functional Form", Review of Economic Studies, 1981, pp. 487-496.

J.G. MacKinnon, H. White and R. Davidson, "Tests for Model Specification in the Presence of Alternative Hypotheses: Some Further Results", Journal of Econometrics, Vol. 21, 1983, pp. 53-70.
