SHAZAM Dummy variables in log models

## Dummy variables in models with a log-transformed dependent variable

This example is taken from Exercise 12.10, Griffiths, Hill and Judge [1993, pp. 427-429]. The data set contains weekly sales of a major brand of canned tuna by a supermarket chain in a large midwestern U.S. city. The regression equation of interest is:

$$\ln(\text{SALES}) = \beta_0 + \beta_1\,\text{PRICE1} + \beta_2\,\text{PRICE2} + \beta_3\,\text{PRICE3} + \beta_4\,\text{D1} + \beta_5\,\text{D2} + e$$

where D1 and D2 are dummy variables for two different advertising schemes. The dependent variable is in log form. What impact do the dummy variables have on weekly sales of canned tuna? The interpretation of dummy-variable coefficients when the dependent variable is log-transformed is discussed in:

Halvorsen, R. and Palmquist, R., "The Interpretation of Dummy Variables in Semilogarithmic Equations", American Economic Review, Vol. 70, 1980, pp. 474-475.

Kennedy, P., "Estimation with Correctly Interpreted Dummy Variables in Semilogarithmic Equations", American Economic Review, Vol. 71, 1981, p. 801.

The result developed in the above papers is that if b is the estimated coefficient on a dummy variable and V(b) is the estimated variance of b then:

$$g = 100\left(\exp\left(b - \frac{V(b)}{2}\right) - 1\right)$$

gives an estimate of the percentage impact of the dummy variable on the variable being explained.
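As a rough cross-check outside SHAZAM, the formula can be sketched in Python. The function name `dummy_pct_effect` is hypothetical. The inputs are the rounded D1 and D2 coefficients and standard errors from the SHAZAM output for this example, so the results should agree with SHAZAM's only up to rounding.

```python
import math

def dummy_pct_effect(b, var_b):
    """Percentage impact of a dummy variable when the dependent variable is in logs.

    Implements g = 100 * (exp(b - V(b)/2) - 1).
    """
    return 100.0 * (math.exp(b - var_b / 2.0) - 1.0)

# Rounded estimates copied from the SHAZAM output: (coefficient, standard error).
estimates = {"D1": (0.42374, 0.1052), "D2": (1.4313, 0.1562)}

# V(b) is the squared standard error.
effects = {name: dummy_pct_effect(b, se ** 2) for name, (b, se) in estimates.items()}

# The same quantity without the variance term, for comparison.
uncorrected = {name: 100.0 * (math.exp(b) - 1.0) for name, (b, _) in estimates.items()}

for name in estimates:
    print(f"{name}: g = {effects[name]:.1f}% (without variance term: {uncorrected[name]:.1f}%)")
```

Reading the coefficient directly as 100 b would suggest roughly 42% for D1 and 143% for D2, so the exponential form matters most when the coefficient is large.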

Also of interest is how to interpret the coefficients on the price variables. The price variables are in levels and the dependent variable is in log form. In this situation, $100\,\beta_1$ gives the approximate percentage change in sales of canned tuna for a 1 unit change in PRICE1 (holding all else constant).
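This reading of the price coefficient is a linear approximation that is most accurate for small price changes. A minimal Python sketch, using the rounded PRICE1 coefficient from the SHAZAM output and a hypothetical price increase of 0.10 units, compares the approximation with the exact change implied by the log model. The variable names are illustrative only.

```python
import math

b1 = -3.7463   # estimated PRICE1 coefficient (rounded, from the SHAZAM output)
delta = 0.10   # hypothetical price change, in the units of PRICE1

# Linear approximation: percentage change in sales is about 100 * b1 * delta.
approx_pct = 100.0 * b1 * delta

# Exact change implied by ln(SALES) being linear in PRICE1.
exact_pct = 100.0 * (math.exp(b1 * delta) - 1.0)

print(f"approximate: {approx_pct:.1f}%, exact: {exact_pct:.1f}%")
```

The two figures drift apart as the price change grows, so the coefficient is best read as a per-unit rate for small changes.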

The SHAZAM commands (filename: `TUNA.SHA`) below estimate the coefficients of the regression equation and compute some test statistics. The percentage impact of each advertising dummy variable on the sales of canned tuna is also computed.

```
SAMPLE 1 52
READ (TUNA.txt) SALES PRICE1 PRICE2 PRICE3 D1 D2
GENR LSALES=LOG(SALES)
* Estimation
OLS LSALES PRICE1 PRICE2 PRICE3 D1 D2 / LOGLIN COEF=BETA STDERR=SE
* Hypothesis testing
TEST
TEST D1=0
TEST D2=0
END
TEST D1=D2
* Estimate the percentage effect of dummy variable D1 on SALES
GEN1 C1=BETA:4
GEN1 SE1=SE:4
GEN1 G1= 100*(EXP(C1 - SE1*SE1/2) - 1)
* Estimate the percentage effect of dummy variable D2 on SALES
GEN1 C2=BETA:5
GEN1 SE2=SE:5
GEN1 G2=100*(EXP(C2 - SE2*SE2/2) - 1)
PRINT G1 G2
STOP
```

The `COEF=BETA` option on the `OLS` command saves the estimated coefficients in the new variable `BETA` and the `STDERR=SE` option saves the estimated standard errors of the estimated coefficients in the new variable `SE`. These results are used later to compute the percentage impacts of the advertising dummy variables on sales.

The `LOGLIN` option is specified on the `OLS` command. When this option is used the elasticities at sample means are computed assuming a semi-logarithmic model specification where the dependent variable is in log form but the explanatory variables are in levels. Suppose that b1 is the estimated coefficient on the variable PRICE1 and MP1 is the mean of PRICE1. The elasticity evaluated at the mean is:

$$b_1 \times \text{MP1}$$

The elasticities that are reported in the final column of the SHAZAM OLS estimation output must be interpreted with caution. That is, they may not be appropriate for some explanatory variables. For example, elasticities reported for dummy variables likely have no meaningful interpretation.
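The elasticity-at-means calculation can be sketched in Python as follows. The sample means are not printed in the SHAZAM output, so this sketch backs them out as elasticity divided by coefficient. That step is an assumption made for illustration, and `elasticity_at_mean` is a hypothetical helper name.

```python
def elasticity_at_mean(b, mean_x):
    """Elasticity at the mean when ln(y) is regressed on x in levels."""
    return b * mean_x

# Rounded (coefficient, reported elasticity) pairs from the SHAZAM output.
reported = {
    "PRICE1": (-3.7463, -2.9315),
    "PRICE2": (1.1495, 0.9264),
    "PRICE3": (1.2880, 1.0223),
}

# Assumption for illustration: recover each sample mean as elasticity / coefficient.
implied_means = {name: e / b for name, (b, e) in reported.items()}

for name, (b, e) in reported.items():
    mean_x = implied_means[name]
    print(f"{name}: implied mean = {mean_x:.3f}, elasticity = {elasticity_at_mean(b, mean_x):.4f}")
```

For a dummy variable the sample mean is simply the share of observations in which it equals 1, which is why the same product has no elasticity interpretation for D1 and D2.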

The SHAZAM output is listed at the end of this page. The price elasticities evaluated at the sample means (rounded to 2 decimal places) are:

| Variable | Elasticity |
|----------|-----------:|
| PRICE1   | -2.93 |
| PRICE2   | 0.93 |
| PRICE3   | 1.02 |

The positive elasticities for PRICE2 and PRICE3 give evidence that Brand 2 and Brand 3 are substitutes for Brand 1. The negative elasticity for the own price PRICE1 is as expected -- sales of Brand 1 canned tuna will drop in response to any price increase.

The estimation results show that the estimated coefficients on the dummy variables D1 and D2 are both significantly different from 0. A joint test of the hypothesis:

$$H_0: \beta_4 = \beta_5 = 0$$

gives an F-test statistic of 42.0. The 5% critical value from the F-distribution with (2,46) degrees of freedom is 3.20. This gives strong evidence to reject the null hypothesis. That is, advertising of any kind will increase sales of Brand 1 canned tuna.
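For the special case of 2 numerator degrees of freedom, the F distribution has a closed-form upper tail, $P(F > f) = (1 + 2f/d)^{-d/2}$ with $d$ denominator degrees of freedom. The critical value and p-value quoted above can therefore be checked with plain Python. The function names are hypothetical, and the shortcut does not apply to other numerator degrees of freedom.

```python
def f2_upper_tail(f, d):
    """P(F > f) for an F distribution with (2, d) degrees of freedom."""
    return (1.0 + 2.0 * f / d) ** (-d / 2.0)

def f2_critical_value(alpha, d):
    """Value c such that P(F > c) = alpha for (2, d) degrees of freedom."""
    return (d / 2.0) * (alpha ** (-2.0 / d) - 1.0)

f_stat = 42.015301   # joint test statistic from the SHAZAM output
d = 46               # denominator degrees of freedom

crit_5pct = f2_critical_value(0.05, d)
p_value = f2_upper_tail(f_stat, d)
wald = 2 * f_stat    # Wald chi-square = (number of restrictions) * F

print(f"5% critical value: {crit_5pct:.2f}")
print(f"p-value: {p_value:.2e}")
print(f"Wald chi-square: {wald:.4f}")
```

The Wald statistic computed this way matches the value SHAZAM prints alongside the F statistic, up to rounding in the last digit.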

The dummy variable D2 is 1 for both a store display and a newspaper ad, whereas the dummy variable D1 is 1 for a store display only. The supermarket executives may be interested in knowing whether the newspaper ad will increase sales more than just a store display on its own. The OLS estimation results show that the estimated coefficient on D2 is higher than the estimated coefficient on D1. So this gives some support to the hypothesis that it is advantageous to combine a newspaper ad with a store display. However, to test this we can consider a test of the hypothesis:

$$H_0: \beta_4 = \beta_5$$

The t-test statistic computed from the SHAZAM `TEST` command is -6.86. SHAZAM reports the p-value as 0.00000, which means it is less than 0.000005, so the null hypothesis is rejected at any reasonable significance level. We conclude that sales are higher in weeks when both forms of advertising are used than in weeks with a store display only.
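The arithmetic behind this test can be retraced in a few lines of Python, using rounded values from the SHAZAM output. The standard error of the difference is taken as reported, because it depends on the covariance between the two coefficient estimates and cannot be rebuilt from the individual standard errors alone. The variable names are illustrative.

```python
b_d1, b_d2 = 0.42374, 1.4313   # rounded coefficients from the SHAZAM output
se_diff = 0.14692              # reported standard error of the test value (D1 - D2)

test_value = b_d1 - b_d2       # estimate of beta4 - beta5
t_stat = test_value / se_diff
f_stat = t_stat ** 2           # with a single restriction, F equals t squared

print(f"test value = {test_value:.4f}, t = {t_stat:.2f}, F = {f_stat:.1f}")
```

The squared t statistic agrees with the F statistic that SHAZAM reports for the same restriction, up to rounding of the inputs.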

We can now ask: what is the magnitude of the increase in sales when the store has both a store display and a newspaper ad? The calculations show that weekly sales will increase by about 313%. In contrast, when only a store display is used, the weekly sales of Brand 1 canned tuna will increase by about 52%.

### SHAZAM output

```
|_SAMPLE 1 52
|_READ (TUNA.txt) SALES PRICE1 PRICE2 PRICE3 D1 D2

UNIT 88 IS NOW ASSIGNED TO: TUNA.txt
6 VARIABLES AND       52 OBSERVATIONS STARTING AT OBS       1

|_GENR LSALES=LOG(SALES)

|_* Estimation
|_OLS LSALES PRICE1 PRICE2 PRICE3 D1 D2 / LOGLIN COEF=BETA STDERR=SE

OLS ESTIMATION
52 OBSERVATIONS     DEPENDENT VARIABLE = LSALES
...NOTE..SAMPLE RANGE SET TO:    1,   52

R-SQUARE =    .8428     R-SQUARE ADJUSTED =    .8257
VARIANCE OF THE ESTIMATE-SIGMA**2 =   .11538
STANDARD ERROR OF THE ESTIMATE-SIGMA =   .33967
SUM OF SQUARED ERRORS-SSE=   5.3073
MEAN OF DEPENDENT VARIABLE =   8.4372
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -453.182

VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
NAME    COEFFICIENT   ERROR      46 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
PRICE1    -3.7463      .5765      -6.498      .000 -.692     -.4514    -2.9315
PRICE2     1.1495      .4486       2.562      .014  .353      .1584      .9264
PRICE3     1.2880      .6053       2.128      .039  .299      .1268     1.0223
D1         .42374      .1052       4.028      .000  .511      .2612      .1874
D2         1.4313      .1562       9.165      .000  .804      .6720      .2477
CONSTANT   8.9848      .6464       13.90      .000  .899      .0000     8.9848

|_* Hypothesis testing
|_TEST
|_  TEST D1=0
|_  TEST D2=0
|_END
F STATISTIC =   42.015301     WITH    2 AND   46 D.F.  P-VALUE=  .00000
WALD CHI-SQUARE STATISTIC =   84.030601     WITH    2 D.F.  P-VALUE=  .00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY =  .02380
|_TEST D1=D2
TEST VALUE =  -1.0075     STD. ERROR OF TEST VALUE   .14692
T STATISTIC =  -6.8577456     WITH   46 D.F.    P-VALUE=  .00000
F STATISTIC =   47.028674     WITH    1 AND   46 D.F.  P-VALUE=  .00000
WALD CHI-SQUARE STATISTIC =   47.028674     WITH    1 D.F.  P-VALUE=  .00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY =  .02126

|_* Estimate the percentage effect of dummy variable D1 on SALES
|_GEN1 C1=BETA:4
|_GEN1 SE1=SE:4
|_GEN1 G1= 100*(EXP(C1 - SE1*SE1/2) - 1)
|_* Estimate the percentage effect of dummy variable D2 on SALES
|_GEN1 C2=BETA:5
|_GEN1 SE2=SE:5
|_GEN1 G2=100*(EXP(C2 - SE2*SE2/2) - 1)
|_PRINT G1 G2
G1
51.92391
G2
313.3233
|_STOP
```