Linear vs. log-linear models

## Comparing linear vs. log-linear models

An equation that specifies a linear relationship among the variables gives an approximate description of some economic behaviour. An alternative approach is to consider a linear relationship among log-transformed variables. This is a log-log model: the dependent variable and all explanatory variables are transformed to logarithms. Because the relationship among the log variables is linear, some researchers call this a log-linear model.

Different functional forms give parameter estimates that have different economic interpretations. The parameters of the linear model are interpreted as marginal effects, and the implied elasticities vary from one data point to another. In contrast, the parameters of the log-log model are themselves elasticities, so the log-log model assumes a constant elasticity over all values in the data set.
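The difference in interpretation can be illustrated with a small Python sketch (Python is used here only for illustration; it is not part of SHAZAM). The coefficients are copied from the SHAZAM output for the textile example on this page. The two evaluation points are hypothetical values chosen for the illustration, not observations from the data set.

```python
# Coefficients copied from the SHAZAM output for the textile example.
LINEAR = {"const": 130.71, "income": 1.0617, "price": -1.3830}
LOGLOG_PRICE_ELASTICITY = -0.82884  # coefficient on LP in the log-log model


def linear_price_elasticity(income, price):
    """Price elasticity implied by the linear model at one data point."""
    consume = (LINEAR["const"]
               + LINEAR["income"] * income
               + LINEAR["price"] * price)
    # elasticity = (dY/dX) * (X / Y), where dY/dX is the marginal effect
    return LINEAR["price"] * price / consume


# Hypothetical evaluation points (assumptions, not rows of the data set).
at_low_price = linear_price_elasticity(income=100.0, price=60.0)
at_high_price = linear_price_elasticity(income=100.0, price=90.0)

print("linear model, price = 60:", round(at_low_price, 4))
print("linear model, price = 90:", round(at_high_price, 4))
print("log-log model, any point:", LOGLOG_PRICE_ELASTICITY)
```

The linear model gives a different price elasticity at each point, while the log-log model gives the same value everywhere.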

The log transformation is only applicable when all the observations in the data set are positive. Gujarati [Basic Econometrics, Third Edition, 1995, McGraw-Hill, p.387] notes that this can be guaranteed by using a transformation like log(X+k) where k is a positive scalar chosen to ensure positive values. However, users will then need to give careful thought to the interpretation of the parameter estimates.
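As a rough Python sketch of why the interpretation changes: if the model is ln(Y) = a + b ln(X + k), the coefficient b is no longer the elasticity of Y with respect to X. The elasticity becomes b X / (X + k), which depends on X. The data values, the rule used to pick k, and the value of b below are all hypothetical.

```python
import math

# Hypothetical series containing non-positive values.
x = [-2.0, 0.0, 1.5, 4.0]

# One possible choice of k (an assumption): shift the minimum up to 1.
k = 1.0 - min(x)
log_shifted = [math.log(v + k) for v in x]


def elasticity_wrt_x(beta, x_value, shift):
    """Elasticity of Y with respect to X when ln(Y) = a + beta*ln(X + shift)."""
    return beta * x_value / (x_value + shift)


beta = 0.8  # hypothetical coefficient on ln(X + k)
for v in x:
    print(v, round(elasticity_wrt_x(beta, v, k), 4))
```

The printed elasticities differ across observations even though a single coefficient is estimated.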

For a given data set there may be no particular reason to assume that one functional form is better than the other. A model selection approach is to estimate competing models by OLS and choose the model with the highest R-square. SHAZAM computes the R-square as:

$R^2 = 1 - SSE / SST$

where SSE is the sum of squared estimated residuals and SST is the sum of squared deviations from the mean of the dependent variable. For a model estimated by OLS with an intercept, an equivalent computation is the squared coefficient of correlation between the observed and predicted values of the dependent variable. (It may be useful to verify this as an exercise.)
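One way to carry out the verification outside SHAZAM is sketched below in Python. The observed values (`CONSUME`) and the linear-model predictions (`YHAT1`) are copied from the `PRINT` output of the textile example listed later on this page. The helper function names are illustrative only.

```python
# Observed CONSUME and linear-model predictions YHAT1, copied from the
# PRINT output of the textile example on this page.
consume = [99.2, 99.0, 100.0, 111.6, 122.2, 117.6, 121.1, 136.0, 154.2,
           153.6, 158.5, 140.6, 136.2, 168.0, 154.3, 149.0, 165.5]
yhat1 = [93.69238, 96.42346, 98.57900, 116.7814, 122.4517, 122.9100,
         123.0455, 135.4254, 149.8042, 152.0574, 153.9054, 145.5571,
         145.0975, 161.5844, 156.8614, 156.2887, 156.1350]


def mean(values):
    return sum(values) / len(values)


def r2_from_sums(y, fitted):
    """R-square computed as 1 - SSE/SST."""
    ybar = mean(y)
    sse = sum((a - b) ** 2 for a, b in zip(y, fitted))
    sst = sum((a - ybar) ** 2 for a in y)
    return 1.0 - sse / sst


def r2_from_correlation(y, fitted):
    """Squared correlation between observed and predicted values."""
    ybar, fbar = mean(y), mean(fitted)
    cov = sum((a - ybar) * (b - fbar) for a, b in zip(y, fitted))
    var_y = sum((a - ybar) ** 2 for a in y)
    var_f = sum((b - fbar) ** 2 for b in fitted)
    return cov * cov / (var_y * var_f)


r2_a = r2_from_sums(consume, yhat1)
r2_b = r2_from_correlation(consume, yhat1)
print(round(r2_a, 4), round(r2_b, 4))
```

Both computations should agree with each other and with the R-square reported for the linear model, up to the rounding of the printed predictions.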

The R-square measure gives the proportion of variation in the dependent variable that is explained by the explanatory variables, so an R-square comparison is meaningful only if the dependent variable is the same for both models. The R-square from the linear model therefore cannot be compared directly with the R-square from the log-log model: for the log-log model, the R-square gives the proportion of variation in ln(Y) that is explained by the model. For comparison purposes we would like a measure based on the antilog of ln(Y), that is, on Y itself.

For the log-log model, the way to proceed is to obtain the antilog predicted values and compute the R-square between the antilog of the observed and predicted values. This R-square can then be compared with the R-square obtained from OLS estimation of the linear model.

When estimating a log-log model the following two options can be used on the `OLS` command.

`LOGLOG` This option tells SHAZAM that the dependent variable and explanatory variables have been transformed to logarithms. SHAZAM reports elasticities that are identical to the estimated coefficients.

`RSTAT` This option computes a number of residual statistics. When the `LOGLOG` option is also specified, the SHAZAM output reports the R-square between the antilog of the observed and predicted values. This can be used for comparison with the R-square obtained from the linear model.

#### Example

This example uses the Theil textile data set. The SHAZAM commands (filename: `LINLOG.SHA`) below estimate a linear demand equation. Log-transformed variables are then generated and the log-log model is estimated by the method of ordinary least squares.

```
SAMPLE 1 17
READ (THEIL.txt) YEAR CONSUME INCOME PRICE
* Obtain parameter estimates for the linear model
OLS CONSUME INCOME PRICE / PREDICT=YHAT1
* Use the GENR command to get logarithms of the variables
GENR LC=LOG(CONSUME)
GENR LINC=LOG(INCOME)
GENR LP=LOG(PRICE)
* Obtain parameter estimates for the log-log model
OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2
* Obtain the antilog predicted values (include a bias adjustment)
GENR YHAT2=EXP(YHAT2+$SIG2/2)
*
* Print results
PRINT YEAR CONSUME YHAT1 YHAT2
STOP
```

Note that on the `OLS` estimation commands the `PREDICT=` option saves the predicted values in the variable specified. The predicted values from the linear model are saved in `YHAT1`, and those from the log-log model in `YHAT2`. Because the log-log model predicts ln(`CONSUME`), predictions for `CONSUME` itself are constructed by taking antilogs. More details are given in the section on computing antilog predictions below.

The full SHAZAM output is listed in a later section. The output from the linear model gives the result:

```
R-SQUARE =    .9513
```

The SHAZAM output from the log-log model gives the result:

```
R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED =  .9689
```

In this example, the R-square for the log-log model (.9689) is higher than the R-square for the linear model (.9513), so there is some evidence to prefer the log-log specification.
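The comparison can be reproduced outside SHAZAM with a short Python sketch. It assumes that the reported statistic is the squared correlation between observed `CONSUME` and the antilog predictions. The data are copied from the `PRINT` output listed later on this page, where `YHAT2` already includes the bias adjustment; the function name is illustrative only.

```python
import math

# Observed CONSUME and antilog predictions YHAT2 (bias-adjusted), copied
# from the PRINT output of the textile example on this page.
consume = [99.2, 99.0, 100.0, 111.6, 122.2, 117.6, 121.1, 136.0, 154.2,
           153.6, 158.5, 140.6, 136.2, 168.0, 154.3, 149.0, 165.5]
yhat2 = [96.05522, 98.37372, 100.6381, 115.3575, 119.8714, 122.1649,
         122.8039, 134.3674, 149.5499, 151.7951, 153.9190, 140.7879,
         140.4307, 166.7092, 158.5688, 157.5912, 157.5576]


def squared_correlation(y, fitted):
    """Squared correlation between observed and predicted values."""
    ybar = sum(y) / len(y)
    fbar = sum(fitted) / len(fitted)
    cov = sum((a - ybar) * (b - fbar) for a, b in zip(y, fitted))
    var_y = sum((a - ybar) ** 2 for a in y)
    var_f = sum((b - fbar) ** 2 for b in fitted)
    return cov * cov / (var_y * var_f)


r2_antilog = squared_correlation(consume, yhat2)

# Undo the bias adjustment: it multiplies every prediction by the same
# constant, so the squared correlation is unchanged.
sig2 = 0.97236e-03  # SIGMA**2 from the log-log OLS output
unadjusted = [v / math.exp(sig2 / 2) for v in yhat2]
r2_unadjusted = squared_correlation(consume, unadjusted)

print(round(r2_antilog, 4), round(r2_unadjusted, 4))
```

The result should be close to the value SHAZAM reports for the antilogs. The bias adjustment changes the level of the predictions but not this R-square.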

Users may be interested in more formal procedures for testing between the linear and log-log model specifications. Test procedures have been proposed by various researchers; references are given in the section on testing the linear versus log-log model below.

Other functional forms can be considered. The Box-Cox transformation creates a general functional form where both the linear model and log-log model are special cases. Features for estimating this model are described in the chapter on Box-Cox regression in the SHAZAM User's Reference Manual.

#### Computing antilog predictions

In the above example, the log-log model is estimated and the antilog predictions are computed with the commands:

```
* Obtain parameter estimates for the log-log model
OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2
* Obtain the antilog predicted values (include a bias adjustment)
GENR YHAT2=EXP(YHAT2+$SIG2/2)
```

When constructing the antilog predictions some consideration should be given to using an unbiased predictor. A result from statistical theory is that if a random variable Y is normally distributed with mean $\mu$ and variance $\sigma^2$ then a random variable Z defined as Z=exp(Y) has mean:

$\exp(\mu + \sigma^2/2)$

(See, for example, Mood, Graybill and Boes, and Ramanathan [1995, p.271].)

Therefore, it is important to include an estimate of $\sigma^2/2$ in the computation of the antilog predictions. On the OLS estimation output the estimated error variance is reported on the line `VARIANCE OF THE ESTIMATE-SIGMA**2`. After model estimation this estimate is available in the temporary variable with the special name `$SIG2`. The `GENR` command for constructing the antilog predictions includes this in the calculation.
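The effect of the adjustment can be checked with a small simulation, sketched here in Python. The values of `mu` and `sigma`, the seed, and the sample size are arbitrary assumptions for the illustration. The last lines use the `SIGMA**2` value from the textile example output to show the size of the adjustment in that example.

```python
import math
import random

random.seed(12345)  # arbitrary seed, fixed so the run is repeatable

# Hypothetical parameters for the normal variable Y.
mu, sigma = 1.0, 0.5
n = 200_000

# Z = exp(Y) with Y ~ N(mu, sigma^2)
draws = [math.exp(random.gauss(mu, sigma)) for _ in range(n)]
sample_mean = sum(draws) / n

naive = math.exp(mu)                      # antilog without adjustment
adjusted = math.exp(mu + sigma ** 2 / 2)  # antilog with the sigma^2/2 term

print("simulated mean of Z :", round(sample_mean, 3))
print("exp(mu)             :", round(naive, 3))
print("exp(mu + sigma^2/2) :", round(adjusted, 3))

# Size of the adjustment in the textile example, using SIGMA**2 from
# the log-log OLS output on this page.
sig2 = 0.97236e-03
factor = math.exp(sig2 / 2)
print("adjustment factor   :", round(factor, 6))
```

The simulated mean should be close to the adjusted value and clearly above `exp(mu)`. In the textile example the estimated error variance is very small, so the adjustment factor is only slightly above 1.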

#### SHAZAM output for the comparison of linear and log-log models

```
|_SAMPLE 1 17
|_READ (THEIL.txt) YEAR CONSUME INCOME PRICE

UNIT 88 IS NOW ASSIGNED TO: THEIL.txt
4 VARIABLES AND       17 OBSERVATIONS STARTING AT OBS       1

|_* Obtain parameter estimates for the linear model
|_OLS CONSUME INCOME PRICE / PREDICT=YHAT1

OLS ESTIMATION
17 OBSERVATIONS     DEPENDENT VARIABLE = CONSUME
...NOTE..SAMPLE RANGE SET TO:    1,   17

R-SQUARE =    .9513     R-SQUARE ADJUSTED =    .9443
VARIANCE OF THE ESTIMATE-SIGMA**2 =   30.951
STANDARD ERROR OF THE ESTIMATE-SIGMA =   5.5634
SUM OF SQUARED ERRORS-SSE=   433.31
MEAN OF DEPENDENT VARIABLE =   134.51
LOG OF THE LIKELIHOOD FUNCTION = -51.6471

VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
NAME    COEFFICIENT   ERROR      14 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
INCOME     1.0617      .2667       3.981      .001  .729      .2387      .8129
PRICE     -1.3830      .8381E-01  -16.50      .000 -.975     -.9893     -.7846
CONSTANT   130.71      27.09       4.824      .000  .790      .0000      .9718

|_* Use the GENR command to get logarithms of the variables
|_GENR LC=LOG(CONSUME)
|_GENR LINC=LOG(INCOME)
|_GENR LP=LOG(PRICE)

|_* Obtain parameter estimates for the log-log model
|_OLS LC LINC LP / RSTAT LOGLOG PREDICT=YHAT2

OLS ESTIMATION
17 OBSERVATIONS     DEPENDENT VARIABLE = LC
...NOTE..SAMPLE RANGE SET TO:    1,   17

R-SQUARE =    .9744     R-SQUARE ADJUSTED =    .9707
VARIANCE OF THE ESTIMATE-SIGMA**2 =   .97236E-03
STANDARD ERROR OF THE ESTIMATE-SIGMA =   .31183E-01
SUM OF SQUARED ERRORS-SSE=   .13613E-01
MEAN OF DEPENDENT VARIABLE =   4.8864
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -46.5862

VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
NAME    COEFFICIENT   ERROR      14 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
LINC       1.1432      .1560       7.328      .000  .891      .3216     1.1432
LP        -.82884      .3611E-01  -22.95      .000 -.987    -1.0074     -.8288
CONSTANT   3.1636      .7048       4.489      .001  .768      .0000     3.1636

DURBIN-WATSON = 1.9267    VON NEUMANN RATIO = 2.0471    RHO =  -.11385
RESIDUAL SUM =   .10769E-13  RESIDUAL VARIANCE =   .97236E-03
SUM OF ABSOLUTE ERRORS=   .40583
R-SQUARE BETWEEN OBSERVED AND PREDICTED =  .9744
R-SQUARE BETWEEN ANTILOGS OBSERVED AND PREDICTED =  .9689
RUNS TEST:    9 RUNS,    9 POS,    0 ZERO,    8 NEG  NORMAL STATISTIC =  -.2366

|_* Obtain the antilog predicted values (include a bias adjustment)
|_GENR YHAT2=EXP(YHAT2+$SIG2/2)
..NOTE..CURRENT VALUE OF $SIG2=   .97236E-03
|_*
|_* Print results
|_PRINT YEAR CONSUME YHAT1 YHAT2
YEAR           CONSUME        YHAT1          YHAT2
1923.000       99.20000       93.69238       96.05522
1924.000       99.00000       96.42346       98.37372
1925.000       100.0000       98.57900       100.6381
1926.000       111.6000       116.7814       115.3575
1927.000       122.2000       122.4517       119.8714
1928.000       117.6000       122.9100       122.1649
1929.000       121.1000       123.0455       122.8039
1930.000       136.0000       135.4254       134.3674
1931.000       154.2000       149.8042       149.5499
1932.000       153.6000       152.0574       151.7951
1933.000       158.5000       153.9054       153.9190
1934.000       140.6000       145.5571       140.7879
1935.000       136.2000       145.0975       140.4307
1936.000       168.0000       161.5844       166.7092
1937.000       154.3000       156.8614       158.5688
1938.000       149.0000       156.2887       157.5912
1939.000       165.5000       156.1350       157.5576
|_STOP
```

#### Testing the Linear versus Log-log Model

Various methods for testing the linear versus log-log model have been proposed. Some discussion is in Maddala [1992, pp.222-3]. A test procedure is described in Griffiths, Hill and Judge [1993, pp.345-6]. SHAZAM has the flexibility for the user to program these tests with SHAZAM commands. Additional references that can be consulted are:

G.E.P. Box and D.R. Cox, "An Analysis of Transformations", Journal of the Royal Statistical Society, Series B, Vol. 26, 1964, pp. 211-243.

R. Davidson and J.G. MacKinnon, "Testing Linear and Log-linear Regressions against Box-Cox Alternatives", Canadian Journal of Economics, 1985, pp. 499-517.

L.G. Godfrey and M.R. Wickens, "Testing Linear and Log-linear Regressions for Functional Form", Review of Economic Studies, 1981, pp. 487-496.

J.G. MacKinnon, H. White and R. Davidson, "Tests for Model Specification in the Presence of Alternative Hypotheses: Some Further Results", Journal of Econometrics, Vol. 21, 1983, pp. 53-70.