Comparing linear vs. log-linear models


An equation that specifies a linear relationship among the variables gives an approximate description of some economic behaviour. An alternative approach is to consider a linear relationship among log-transformed variables. This is a log-log model: the dependent variable and all explanatory variables are transformed to logarithms. Since the relationship among the log variables is linear, some researchers call this a log-linear model.

Different functional forms give parameter estimates with different economic interpretations. The parameters of the linear model are interpreted as marginal effects; the implied elasticities vary with the point in the data at which they are evaluated. In contrast, the parameters of the log-log model are interpreted as elasticities, so the log-log model assumes a constant elasticity over all values in the data set.
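To make the contrast concrete, the sketch below evaluates the price elasticity implied by a linear demand equation at two points and sets it beside the constant elasticity of a log-log equation. The coefficients are those printed in the Theil textile example output in this guide. The function name and the evaluation points (income 100, prices 60 and 90) are illustrative assumptions, and Python is used here only as a calculator outside SHAZAM.

```python
# Illustrative sketch: elasticity in a linear versus a log-log demand equation.
# Coefficients come from the Theil textile example output in this guide;
# the evaluation points are hypothetical.

def linear_price_elasticity(const, b_income, b_price, income, price):
    """Price elasticity of a linear model: (dY/dP) * P / Y at one point."""
    predicted = const + b_income * income + b_price * price
    return b_price * price / predicted

# Linear model: CONSUME = 130.71 + 1.0617*INCOME - 1.3830*PRICE
e_low_price = linear_price_elasticity(130.71, 1.0617, -1.3830,
                                      income=100.0, price=60.0)
e_high_price = linear_price_elasticity(130.71, 1.0617, -1.3830,
                                       income=100.0, price=90.0)

# Log-log model: the price coefficient is the elasticity at every point.
loglog_price_elasticity = -0.82884

print("linear model, price = 60:", round(e_low_price, 3))
print("linear model, price = 90:", round(e_high_price, 3))
print("log-log model, any point:", loglog_price_elasticity)
```

The linear model's elasticity changes as the evaluation point moves, while the log-log value stays fixed.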

The log transformation is only applicable when all the observations in the data set are positive. Gujarati [Basic Econometrics, Third Edition, 1995, McGraw-Hill, p.387] notes that this can be guaranteed by using a transformation like log(X+k) where k is a positive scalar chosen to ensure positive values. However, users will then need to give careful thought to the interpretation of the parameter estimates.
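A minimal sketch of the log(X+k) idea follows. The helper name shifted_log, the sample data and the choice k = 1 are all hypothetical; the point is only that the shift must make every observation strictly positive before logs are taken.

```python
import math

def shifted_log(values, k=0.0):
    """Return log(x + k) for each x, refusing any case where x + k <= 0."""
    shifted = [x + k for x in values]
    if min(shifted) <= 0:
        raise ValueError("log is undefined here: choose a larger k")
    return [math.log(s) for s in shifted]

x = [0.0, 2.5, 7.0]             # hypothetical data containing a zero

try:
    shifted_log(x)              # k = 0: the plain log fails on the zero
    plain_log_ok = True
except ValueError:
    plain_log_ok = False

logs = shifted_log(x, k=1.0)    # log(X + 1) is defined for every observation
print(plain_log_ok, logs)
```

If a regressor enters as log(X+k), its coefficient b is no longer the elasticity with respect to X itself. Under the usual calculus the elasticity becomes b*X/(X+k), which is one reason the interpretation needs care.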

For a given data set there may be no particular reason to assume that one functional form is better than the other. A model selection approach is to estimate competing models by OLS and choose the model with the highest R-square. SHAZAM computes the R-square as:

        R2 = 1 - SSE / SST

where SSE is the sum of squared estimated residuals and SST is the sum of squared deviations from the mean of the dependent variable. For an OLS regression that includes a constant term, an equivalent computation is the squared coefficient of correlation between the observed and predicted values of the dependent variable. (It may be useful to verify this as an exercise.)
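One way to carry out this check outside SHAZAM is sketched below in plain Python. The data are made up and the helper names are illustrative. The fit is a one-regressor OLS with an intercept, the case in which the two computations agree.

```python
# Sketch: 1 - SSE/SST versus the squared correlation between observed
# and predicted values, for an OLS fit with an intercept. Data are made up.

def mean(v):
    return sum(v) / len(v)

def simple_ols(x, y):
    """Intercept and slope of a one-regressor OLS fit."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def r2_from_sse(y, yhat):
    ybar = mean(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, yhat))
    sst = sum((a - ybar) ** 2 for a in y)
    return 1.0 - sse / sst

def squared_correlation(y, yhat):
    ybar, hbar = mean(y), mean(yhat)
    cov = sum((a - ybar) * (b - hbar) for a, b in zip(y, yhat))
    var_y = sum((a - ybar) ** 2 for a in y)
    var_h = sum((b - hbar) ** 2 for b in yhat)
    return cov ** 2 / (var_y * var_h)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
intercept, slope = simple_ols(x, y)
yhat = [intercept + slope * a for a in x]

r2_a = r2_from_sse(y, yhat)
r2_b = squared_correlation(y, yhat)
print(r2_a, r2_b)   # the two values agree up to rounding error
```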

An R-square comparison is meaningful only if the dependent variable is the same for both models. The R-square measure gives the proportion of variation in the dependent variable that is explained by the explanatory variables. For the log-log model this is the proportion of variation in ln(Y), not in Y, so the R-square from the linear model cannot be compared directly with the R-square from the log-log model. For comparison purposes we would like a measure based on the antilog of ln(Y).

For the log-log model, the way to proceed is to obtain the antilog predicted values and compute the R-square between the antilog of the observed and predicted values. This R-square can then be compared with the R-square obtained from OLS estimation of the linear model.

When estimating a log-log model the following two options can be used on the OLS command.

LOGLOG This option tells SHAZAM that the dependent variable and explanatory variables have been transformed to logarithms. SHAZAM reports elasticities that are identical to the estimated coefficients.
RSTAT This option computes a number of residual statistics. When the LOGLOG option is also specified the SHAZAM output will report the R-square between the antilog of the observed and predicted values. This can be used for comparison with the R-square obtained from the linear model.

Example

This example uses the Theil textile data set. The SHAZAM commands (filename: LINLOG.SHA) below estimate a linear demand equation. Log-transformed variables are then generated and the log-log model is estimated by the method of ordinary least squares.

SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* Obtain parameter estimates for the linear model
OLS CONSUME INCOME PRICE / PREDICT=YHAT1
* Use the GENR command to get logarithms of the variables
GENR LC=LOG(CONSUME)
GENR LINC=LOG(INCOME)
GENR LP=LOG(PRICE)
* Obtain parameter estimates for the log-log model
OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2
* Obtain the antilog predicted values (include a bias adjustment)
GENR YHAT2=EXP(YHAT2+$SIG2/2)
*
* Print results
PRINT YEAR CONSUME YHAT1 YHAT2
STOP

Note that on the OLS estimation commands the PREDICT= option is used to save the predicted values in the variable specified. The predicted values from the linear model are saved in the variable YHAT1. The predicted values from the log-log model are saved in the variable YHAT2. For the log-log model, predictions for CONSUME are then constructed by taking antilogs. More details are given in the section "Computing antilog predictions" later in this guide.

The full SHAZAM output is listed later in this guide, in the section "SHAZAM output for the comparison of linear and log-log models". The SHAZAM output from the linear model gives the result:

  R-SQUARE =    .9513     

The SHAZAM output from the log-log model gives the result:

 R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED =  .9689

In this example, the R-square for the log-log model is higher, so there is some evidence to prefer the log-log specification.
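As a cross-check outside SHAZAM, both figures can be reproduced from the observed and predicted values that the PRINT command lists in the SHAZAM output section of this guide. The sketch below is plain Python with an illustrative helper name. It assumes that the reported statistic is the squared correlation between the observed values and the antilog predictions.

```python
# Sketch: recompute the two R-square values from the printed listing.

def squared_correlation(a, b):
    abar = sum(a) / len(a)
    bbar = sum(b) / len(b)
    cov = sum((u - abar) * (v - bbar) for u, v in zip(a, b))
    var_a = sum((u - abar) ** 2 for u in a)
    var_b = sum((v - bbar) ** 2 for v in b)
    return cov ** 2 / (var_a * var_b)

# CONSUME, YHAT1 (linear model) and YHAT2 (antilog predictions), 1923-1939
consume = [99.2, 99.0, 100.0, 111.6, 122.2, 117.6, 121.1, 136.0, 154.2,
           153.6, 158.5, 140.6, 136.2, 168.0, 154.3, 149.0, 165.5]
yhat1 = [93.69238, 96.42346, 98.57900, 116.7814, 122.4517, 122.9100,
         123.0455, 135.4254, 149.8042, 152.0574, 153.9054, 145.5571,
         145.0975, 161.5844, 156.8614, 156.2887, 156.1350]
yhat2 = [96.05522, 98.37372, 100.6381, 115.3575, 119.8714, 122.1649,
         122.8039, 134.3674, 149.5499, 151.7951, 153.9190, 140.7879,
         140.4307, 166.7092, 158.5688, 157.5912, 157.5576]

r2_linear = squared_correlation(consume, yhat1)
r2_antilog = squared_correlation(consume, yhat2)
print(round(r2_linear, 4), round(r2_antilog, 4))
```

A squared correlation is unchanged when the predictions are multiplied by a positive constant, so the bias adjustment applied to YHAT2 does not affect this comparison.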

Users may be interested in more formal procedures for testing between the linear and log-log model specifications. Test procedures proposed by various researchers are listed in the section "Testing the Linear versus Log-log Model" later in this guide.

Other functional forms can be considered. The Box-Cox transformation creates a general functional form where both the linear model and log-log model are special cases. Features for estimating this model are described in the chapter on Box-Cox regression in the SHAZAM User's Reference Manual.
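The nesting can be seen from the standard Box-Cox transform, (y**lambda - 1)/lambda, with the limit ln(y) at lambda = 0. The sketch below is plain Python with an illustrative function name; it is not SHAZAM's Box-Cox estimation routine. It evaluates the transform at the two special cases.

```python
import math

def box_cox(value, lam):
    """Box-Cox transform of a positive value; lam = 0 is defined as the log."""
    if value <= 0:
        raise ValueError("the Box-Cox transform requires a positive value")
    if lam == 0:
        return math.log(value)
    return (value ** lam - 1.0) / lam

y = 134.51                       # mean of CONSUME, used only as a sample value

linear_case = box_cox(y, 1.0)    # y - 1: a linear function of y
log_case = box_cox(y, 0.0)       # ln(y): the log-log case
near_log = box_cox(y, 1e-6)      # approaches ln(y) as lam tends to 0

print(linear_case, log_case, near_log)
```

With lambda = 1 the transformed variable is a linear function of the original, and with lambda = 0 it is the logarithm. Intermediate values of lambda give functional forms between the two.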



Computing antilog predictions

In the above example, the log-log model is estimated and the antilog predictions are computed with the commands:

* Obtain parameter estimates for the log-log model
OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2
* Obtain the antilog predicted values (include a bias adjustment)
GENR YHAT2=EXP(YHAT2+$SIG2/2)

When constructing the antilog predictions some consideration should be given to using an unbiased predictor. A result from statistical theory is that if a random variable Y is normally distributed with mean µ and variance sigma2 then a random variable Z defined as Z=exp(Y) has mean:

        exp(µ + sigma2/2)

(See, for example, Mood, Graybill and Boes [1974] and Ramanathan [1995, p.271].)

Therefore, it is important to include an estimate of sigma2/2 in the computation of the antilog predictions. On the OLS estimation output the estimated error variance is reported on the line VARIANCE OF THE ESTIMATE-SIGMA**2. After model estimation this estimate is available in the temporary variable with the special name $SIG2. The GENR command for constructing the antilog predictions includes this in the calculation.
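The size of the adjustment in this example, and the lognormal mean result itself, can be checked with the short Python sketch below. The value of sigma2 is the $SIG2 figure from the log-log estimation output. The simulation settings (mean 0, standard deviation 0.5, 200,000 draws, fixed seed) are arbitrary assumptions, chosen so that the effect is large enough to see.

```python
import math
import random

# Adjustment factor implied by the example: exp($SIG2 / 2)
sigma2 = 0.97236e-03
adjustment = math.exp(sigma2 / 2.0)

# Monte Carlo check of E[exp(Y)] = exp(mu + sigma^2 / 2) for normal Y,
# using a deliberately larger, hypothetical variance.
random.seed(12345)
mu, sd = 0.0, 0.5
draws = [math.exp(random.gauss(mu, sd)) for _ in range(200_000)]
simulated_mean = sum(draws) / len(draws)
theoretical_mean = math.exp(mu + sd ** 2 / 2.0)
naive_value = math.exp(mu)       # antilog without the adjustment

print("adjustment factor in the example:", adjustment)
print("simulated mean of exp(Y):        ", simulated_mean)
print("exp(mu + sigma^2/2):             ", theoretical_mean)
print("exp(mu), unadjusted:             ", naive_value)
```

In the textile example the estimated error variance is small, so the adjustment raises each antilog prediction by only about 0.05 percent. With a larger error variance the unadjusted antilog would understate the mean noticeably.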



SHAZAM output for the comparison of linear and log-log models


 |_SAMPLE 1 17
 |_READ (THEIL.txt) YEAR CONSUME INCOME PRICE
 
 UNIT 88 IS NOW ASSIGNED TO: THEIL.txt
    4 VARIABLES AND       17 OBSERVATIONS STARTING AT OBS       1
 
 |_* Obtain parameter estimates for the linear model
 |_OLS CONSUME INCOME PRICE / PREDICT=YHAT1
 
  OLS ESTIMATION
       17 OBSERVATIONS     DEPENDENT VARIABLE = CONSUME
 ...NOTE..SAMPLE RANGE SET TO:    1,   17
 
  R-SQUARE =    .9513     R-SQUARE ADJUSTED =    .9443
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   30.951
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   5.5634
 SUM OF SQUARED ERRORS-SSE=   433.31
 MEAN OF DEPENDENT VARIABLE =   134.51
 LOG OF THE LIKELIHOOD FUNCTION = -51.6471
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      14 DF   P-VALUE CORR. COEFFICIENT  AT MEANS 
 INCOME     1.0617      .2667       3.981      .001  .729      .2387      .8129
 PRICE     -1.3830      .8381E-01  -16.50      .000 -.975     -.9893     -.7846
 CONSTANT   130.71      27.09       4.824      .000  .790      .0000      .9718

 |_* Use the GENR command to get logarithms of the variables
 |_GENR LC=LOG(CONSUME)
 |_GENR LINC=LOG(INCOME)
 |_GENR LP=LOG(PRICE)

 |_* Obtain parameter estimates for the log-log model
 |_OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2
 
  OLS ESTIMATION
       17 OBSERVATIONS     DEPENDENT VARIABLE = LC
 ...NOTE..SAMPLE RANGE SET TO:    1,   17
 
  R-SQUARE =    .9744     R-SQUARE ADJUSTED =    .9707
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   .97236E-03
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   .31183E-01
 SUM OF SQUARED ERRORS-SSE=   .13613E-01
 MEAN OF DEPENDENT VARIABLE =   4.8864
 LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -46.5862
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      14 DF   P-VALUE CORR. COEFFICIENT  AT MEANS 
 LINC       1.1432      .1560       7.328      .000  .891      .3216     1.1432
 LP        -.82884      .3611E-01  -22.95      .000 -.987    -1.0074     -.8288
 CONSTANT   3.1636      .7048       4.489      .001  .768      .0000     3.1636
 
 DURBIN-WATSON = 1.9267    VON NEUMANN RATIO = 2.0471    RHO =  -.11385
 RESIDUAL SUM =   .10769E-13  RESIDUAL VARIANCE =   .97236E-03
 SUM OF ABSOLUTE ERRORS=   .40583
 R-SQUARE BETWEEN OBSERVED AND PREDICTED =  .9744
 R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED =  .9689
 RUNS TEST:    9 RUNS,    9 POS,    0 ZERO,    8 NEG  NORMAL STATISTIC =  -.2366

 |_* Obtain the antilog predicted values (include a bias adjustment)
 |_GENR YHAT2=EXP(YHAT2+$SIG2/2)
 ..NOTE..CURRENT VALUE OF $SIG2=   .97236E-03
 |_*
 |_* Print results
 |_PRINT YEAR CONSUME YHAT1 YHAT2
       YEAR           CONSUME        YHAT1          YHAT2
    1923.000       99.20000       93.69238       96.05522
    1924.000       99.00000       96.42346       98.37372
    1925.000       100.0000       98.57900       100.6381
    1926.000       111.6000       116.7814       115.3575
    1927.000       122.2000       122.4517       119.8714
    1928.000       117.6000       122.9100       122.1649
    1929.000       121.1000       123.0455       122.8039
    1930.000       136.0000       135.4254       134.3674
    1931.000       154.2000       149.8042       149.5499
    1932.000       153.6000       152.0574       151.7951
    1933.000       158.5000       153.9054       153.9190
    1934.000       140.6000       145.5571       140.7879
    1935.000       136.2000       145.0975       140.4307
    1936.000       168.0000       161.5844       166.7092
    1937.000       154.3000       156.8614       158.5688
    1938.000       149.0000       156.2887       157.5912
    1939.000       165.5000       156.1350       157.5576
 |_STOP


Testing the Linear versus Log-log Model

Various methods for testing the linear versus log-log model have been proposed. Some discussion is in Maddala [1992, pp.222-3]. A test procedure is described in Griffiths, Hill and Judge [1993, pp.345-6]. SHAZAM has the flexibility for the user to program these tests with SHAZAM commands. Additional references that can be consulted are:

G. E. P. Box and D. R. Cox, "An Analysis of Transformations", 
    Journal of the Royal Statistical Society, Series B, Vol. 26, 1964, 
    pp. 211-243.
    
R. Davidson and J.G. MacKinnon, "Testing Linear and Log-linear 
    Regressions against Box-Cox Alternatives", Canadian Journal of 
    Economics, 1985, pp. 499-517.

L.G. Godfrey and M.R. Wickens, "Testing Linear and Log-linear Regressions
    for Functional Form", Review of Economic Studies, 1981, pp. 487-496.

J.G. MacKinnon, H. White and R. Davidson, "Tests for Model Specification
    in the Presence of Alternative Hypotheses: Some Further Results",
    Journal of Econometrics, Vol. 21, 1983, pp. 53-70.
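
As an informal complement to the formal tests cited above, and not a reconstruction of any of them, the two log-likelihood values in the SHAZAM output can be compared once both are expressed in terms of CONSUME. The Python sketch below assumes the usual concentrated normal log-likelihood, with the Jacobian term (the sum of ln(CONSUME)) subtracted for the log-log model. This guide does not state that SHAZAM uses this formula; it is an assumption, made here because it reproduces the reported values. The inputs are the SSE, the number of observations and the mean of LC from the output listing.

```python
import math

def concentrated_llf(sse, n):
    """Normal log-likelihood evaluated at the ML variance estimate SSE/n."""
    return -0.5 * n * (math.log(2.0 * math.pi) + math.log(sse / n) + 1.0)

n = 17

# Linear model: SSE = 433.31
llf_linear = concentrated_llf(433.31, n)

# Log-log model: SSE = .13613E-01, mean of LC = 4.8864.
# Subtracting the sum of ln(CONSUME) puts the likelihood on the CONSUME scale.
sum_log_y = n * 4.8864
llf_loglog = concentrated_llf(0.13613e-01, n) - sum_log_y

print(round(llf_linear, 4), round(llf_loglog, 4))
```

The results agree with the reported values of -51.6471 and -46.5862 up to rounding of the inputs. The higher value for the log-log model points the same way as the R-square comparison, although a formal conclusion requires one of the test procedures referenced above.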
