SHAZAM Dummy variables in log models

Dummy variables in models with a log-transformed dependent variable

This example is taken from Exercise 12.10, Griffiths, Hill and Judge [1993, pp. 427-429]. The data set contains weekly sales of a major brand of canned tuna by a supermarket chain in a large midwestern U.S. city. The regression equation of interest is:

$$\ln(\mathrm{SALES}) = \beta_0 + \beta_1\,\mathrm{PRICE1} + \beta_2\,\mathrm{PRICE2} + \beta_3\,\mathrm{PRICE3} + \beta_4\,\mathrm{D1} + \beta_5\,\mathrm{D2} + e$$

where D1 and D2 are dummy variables for two different advertising schemes and the dependent variable is in log form. What impact do the dummy variables have on weekly sales of canned tuna? The interpretation of coefficients on dummy variables when the dependent variable is log-transformed is discussed in:

Halvorsen, R. and Palmquist, R., "The Interpretation of Dummy Variables in Semilogarithmic Equations", American Economic Review, Vol. 70, 1980, pp. 474-475.
Kennedy, P., "Estimation with Correctly Interpreted Dummy Variables in Semilogarithmic Equations", American Economic Review, Vol. 71, 1981, p. 801.

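Halvorsen and Palmquist show that if β is the coefficient on a dummy variable, the percentage impact of the dummy on the unlogged dependent variable is 100[exp(β) - 1], not 100β. Kennedy shows that substituting the estimate b directly into this expression gives a biased estimator and proposes an adjustment: if b is the estimated coefficient on a dummy variable and V(b) is the estimated variance of b, then: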

$$g = 100\left[\exp\left(b - \frac{V(b)}{2}\right) - 1\right]$$

gives an estimate of the percentage impact of the dummy variable on the variable being explained.
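As a check on the formula, the Python sketch below computes g from the D1 and D2 coefficient estimates and standard errors printed in the SHAZAM output for this example, alongside the naive value 100b and the unadjusted Halvorsen-Palmquist value 100[exp(b) - 1]. The function names are hypothetical. Because the inputs are the rounded printed values, the results agree with SHAZAM's G1 and G2 only to about one or two decimal places.

```python
import math


def kennedy_percent_effect(b: float, se: float) -> float:
    """Percentage impact of a dummy variable in a semilog equation.

    Implements g = 100 * (exp(b - V(b)/2) - 1), with V(b) = se**2.
    """
    return 100.0 * (math.exp(b - se * se / 2.0) - 1.0)


def halvorsen_palmquist_percent_effect(b: float) -> float:
    """Percentage impact 100 * (exp(b) - 1), ignoring sampling variance of b."""
    return 100.0 * (math.exp(b) - 1.0)


# (coefficient, standard error) as printed in the SHAZAM output; these are
# rounded, so results differ slightly from SHAZAM's full-precision G1 and G2.
estimates = {"D1": (0.42374, 0.1052), "D2": (1.4313, 0.1562)}

for name, (b, se) in estimates.items():
    print(
        f"{name}: naive 100b = {100 * b:7.2f}%, "
        f"100(exp(b)-1) = {halvorsen_palmquist_percent_effect(b):7.2f}%, "
        f"g = {kennedy_percent_effect(b, se):7.2f}%"
    )
```

For D1 this gives g of about 51.92, matching G1. For D2 it gives about 313.34 against SHAZAM's 313.3233, the small gap coming from rounding of the printed coefficient and standard error. For a large coefficient such as the one on D2, the naive 100b (about 143%) badly understates the effect.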

Also of interest is how to interpret the coefficients on the price variables. The price variables are in levels and the dependent variable is in log form, so $100\beta_1$ gives the approximate percentage change in sales of canned tuna for a one-unit change in PRICE1, holding all else constant. The approximation is accurate only for small changes; the exact percentage change for a change ΔPRICE1 is $100[\exp(\beta_1 \Delta\mathrm{PRICE1}) - 1]$.
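The gap between the approximation and the exact semilog formula grows with the size of the price change. The Python sketch below illustrates this with the PRICE1 estimate b₁ = -3.7463 from the SHAZAM output. The price changes of 0.01, 0.10 and 1.00 are arbitrary illustrative values in whatever units PRICE1 is measured in, and the helper names are hypothetical.

```python
import math

B1 = -3.7463  # estimated coefficient on PRICE1 from the SHAZAM output


def approx_percent_change(b: float, dx: float) -> float:
    """Semilog approximation: percentage change in SALES ~= 100 * b * dx."""
    return 100.0 * b * dx


def exact_percent_change(b: float, dx: float) -> float:
    """Exact percentage change in SALES when ln(SALES) changes by b * dx."""
    return 100.0 * (math.exp(b * dx) - 1.0)


for dx in (0.01, 0.10, 1.00):
    print(
        f"change in PRICE1 = {dx:4.2f}: "
        f"approx {approx_percent_change(B1, dx):8.2f}%, "
        f"exact {exact_percent_change(B1, dx):8.2f}%"
    )
```

For a change of 0.01 the two agree closely (about -3.75% versus -3.68%). For a one-unit change the approximation implies a fall of more than 100%, which is impossible, while the exact formula gives a fall of about 97.6%.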

The SHAZAM commands (filename: TUNA.SHA) below estimate the coefficients of the regression equation and compute some test statistics. The percentage impact of each advertising dummy variable on the sales of canned tuna is also computed.

SAMPLE 1 52
* Read the data and form the log of sales
READ (TUNA.txt) SALES PRICE1 PRICE2 PRICE3 D1 D2
GENR LSALES=LOG(SALES)
* Estimation
OLS LSALES PRICE1 PRICE2 PRICE3 D1 D2 / LOGLIN COEF=BETA STDERR=SE
* Hypothesis testing
* Joint test of D1=0 and D2=0
TEST
  TEST D1=0
  TEST D2=0
END
* Test that the D1 and D2 coefficients are equal
TEST D1=D2
* Estimate the percentage effect of dummy variable D1 on SALES
GEN1 C1=BETA:4
GEN1 SE1=SE:4
GEN1 G1=100*(EXP(C1 - SE1*SE1/2) - 1)
* Estimate the percentage effect of dummy variable D2 on SALES
GEN1 C2=BETA:5
GEN1 SE2=SE:5
GEN1 G2=100*(EXP(C2 - SE2*SE2/2) - 1)
PRINT G1 G2
STOP

The COEF=BETA option on the OLS command saves the estimated coefficients in the variable BETA, and the STDERR=SE option saves their estimated standard errors in the variable SE. The elements are stored in the order the coefficients appear in the output, so BETA:4 and SE:4 refer to D1, and BETA:5 and SE:5 refer to D2. These results are used later to compute the percentage impacts of the advertising dummy variables on sales.

The LOGLIN option is specified on the OLS command. When this option is used, the elasticities at the sample means are computed assuming a semi-logarithmic model specification in which the dependent variable is in log form but the explanatory variables are in levels. Suppose that b₁ is the estimated coefficient on the variable PRICE1 and MP1 is the mean of PRICE1. The elasticity evaluated at the mean is:

$$\text{elasticity} = b_1 \times \mathrm{MP1}$$

The elasticities reported in the final column of the SHAZAM OLS output must be interpreted with caution, because the calculation is not meaningful for every explanatory variable. For example, the elasticities reported for the dummy variables D1 and D2 have no useful interpretation, since a 0/1 variable cannot change by a small percentage.
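Because the LOGLIN elasticity is simply b × mean, dividing each reported elasticity by its coefficient recovers the sample mean of that regressor. The Python sketch below does this with the rounded values printed in the SHAZAM output; the helper name is hypothetical. For a dummy variable the recovered mean is just the fraction of the 52 weeks in which the dummy equals 1, which shows why its "elasticity" carries no economic meaning.

```python
# (coefficient, elasticity at means) as printed in the SHAZAM output
reported = {
    "PRICE1": (-3.7463, -2.9315),
    "PRICE2": (1.1495, 0.9264),
    "PRICE3": (1.2880, 1.0223),
    "D1": (0.42374, 0.1874),
    "D2": (1.4313, 0.2477),
}


def loglin_elasticity(b: float, mean_x: float) -> float:
    """Elasticity at the mean for ln(y) = ... + b*x: b times the mean of x."""
    return b * mean_x


for name, (b, elasticity) in reported.items():
    implied_mean = elasticity / b  # invert elasticity = b * mean
    print(f"{name:7s} implied sample mean = {implied_mean:.4f}")
```

The implied means of the three prices are all close to 0.8, while those of D1 and D2 come out at about 0.44 and 0.17, the shares of weeks with each advertising scheme.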

The full SHAZAM output is listed at the end of this page. The price elasticities evaluated at the sample means (rounded to 2 decimal places) are:

Variable   Elasticity
PRICE1     -2.93
PRICE2      0.93
PRICE3      1.02

The positive cross-price elasticities for PRICE2 and PRICE3 suggest that Brands 2 and 3 are substitutes for Brand 1. The negative own-price elasticity for PRICE1 is as expected: a price increase reduces sales of Brand 1 canned tuna, and at the sample means a 1% increase in its price is associated with a fall in sales of about 2.9%.

The estimation results show that the estimated coefficients on the dummy variables D1 and D2 are both significantly different from zero (t-ratios of 4.028 and 9.165). A joint test of the hypothesis:

$$H_0: \beta_4 = \beta_5 = 0$$

gives an F-test statistic of 42.0. The 5% critical value from the F-distribution with (2, 46) degrees of freedom is 3.20, so the null hypothesis is strongly rejected. Since both estimated coefficients are also positive and individually significant, the evidence indicates that both advertising schemes increase sales of Brand 1 canned tuna.
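For an F distribution with 2 numerator degrees of freedom the upper-tail probability has a closed form, P(F > x) = (d₂/(d₂ + 2x))^(d₂/2), so the critical value and p-value for this joint test can be checked without tables. The Python sketch below relies on that special case, which does not hold for other numerator degrees of freedom; the function names are hypothetical. It also confirms that SHAZAM's Wald chi-square statistic equals 2 × F for two restrictions.

```python
def f2_upper_tail(x: float, d2: int) -> float:
    """P(F > x) for F with (2, d2) degrees of freedom (closed form for d1 = 2)."""
    return (d2 / (d2 + 2.0 * x)) ** (d2 / 2.0)


def f2_critical_value(alpha: float, d2: int) -> float:
    """Upper-alpha critical value of F(2, d2), by inverting f2_upper_tail."""
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)


F_STAT = 42.015301  # joint test of D1 = 0 and D2 = 0, from the SHAZAM output
DENOM_DF = 46  # 52 observations minus 6 estimated coefficients

crit = f2_critical_value(0.05, DENOM_DF)
p_value = f2_upper_tail(F_STAT, DENOM_DF)
wald = 2 * F_STAT  # Wald chi-square = (number of restrictions) * F

print(f"5% critical value of F(2, {DENOM_DF}) = {crit:.2f}")
print(f"p-value = {p_value:.2e}")
print(f"Wald chi-square = {wald:.4f}")
```

The critical value comes out at 3.20 and the p-value is many orders of magnitude below 0.000005, consistent with SHAZAM printing .00000.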

The dummy variable D2 equals 1 when there is both a store display and a newspaper ad, whereas D1 equals 1 when there is a store display only. Supermarket executives may want to know whether adding a newspaper ad to a store display raises sales more than a store display alone. The OLS results show that the estimated coefficient on D2 is larger than the estimated coefficient on D1, which suggests that combining a newspaper ad with a store display is advantageous. To test this formally, consider the hypothesis:

$$H_0: \beta_4 = \beta_5$$

The t-statistic computed by the SHAZAM TEST D1=D2 command is -6.86 with 46 degrees of freedom. SHAZAM reports the p-value as .00000, which means less than 0.000005, so the null hypothesis is rejected at any reasonable significance level. The negative sign indicates that the coefficient on D1 is smaller than the coefficient on D2: sales increase by more in weeks when both forms of advertising are used than in weeks with a store display only.
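The TEST D1=D2 statistic is the estimated difference b₄ - b₅ divided by its standard error, and with one restriction the reported F statistic is the square of t. The Python sketch below reproduces both from the printed test value and standard error. The covariance between the D1 and D2 estimates is not printed, so it is backed out from the identity Var(b₄ - b₅) = V(b₄) + V(b₅) - 2Cov(b₄, b₅); treat that value as implied by the rounded output, not as a reported result.

```python
TEST_VALUE = -1.0075  # b4 - b5 as printed by TEST D1=D2
SE_TEST = 0.14692  # standard error of the test value
SE_D1, SE_D2 = 0.1052, 0.1562  # standard errors of b4 and b5

t_stat = TEST_VALUE / SE_TEST
f_stat = t_stat ** 2  # one restriction, so F = t squared

# Var(b4 - b5) = V(b4) + V(b5) - 2 Cov(b4, b5), solved for the covariance
implied_cov = (SE_D1 ** 2 + SE_D2 ** 2 - SE_TEST ** 2) / 2.0

print(f"t = {t_stat:.4f}, F = t^2 = {f_stat:.3f}")
print(f"implied Cov(b4, b5) = {implied_cov:.5f}")
```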

We can now ask: what is the magnitude of the increase in sales when the store has both a store display and a newspaper ad? Using the formula for g, weekly sales of Brand 1 canned tuna are estimated to be about 313% higher than in weeks with no advertising. In contrast, when only a store display is used, weekly sales are estimated to be about 52% higher.
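Both percentages are measured relative to weeks with no advertising. One way to compare the two schemes directly, not part of the original example, is to apply the same adjustment to the difference b₅ - b₄, taking the test value (-1.0075) and its standard error (0.14692) from the TEST D1=D2 output and reversing the sign. The Python sketch below does this under the same semilog assumptions.

```python
import math

# From TEST D1=D2: b4 - b5 = -1.0075 with standard error 0.14692
diff = 1.0075  # b5 - b4, the log-sales gap between the two schemes
se_diff = 0.14692  # Var(b5 - b4) = Var(b4 - b5), so the same standard error

g_extra = 100.0 * (math.exp(diff - se_diff ** 2 / 2.0) - 1.0)
print(f"display + ad vs display only: about {g_extra:.1f}% higher sales")
```

This gives roughly 171%, meaning that adding a newspaper ad to a store display is estimated to raise weekly sales to about 2.7 times the level achieved with a store display alone.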


SHAZAM output

|_SAMPLE 1 52
|_READ (TUNA.txt) SALES PRICE1 PRICE2 PRICE3 D1 D2
UNIT 88 IS NOW ASSIGNED TO: TUNA.txt
 6 VARIABLES AND 52 OBSERVATIONS STARTING AT OBS 1
|_GENR LSALES=LOG(SALES)
|_* Estimation
|_OLS LSALES PRICE1 PRICE2 PRICE3 D1 D2 / LOGLIN COEF=BETA STDERR=SE
 OLS ESTIMATION
 52 OBSERVATIONS DEPENDENT VARIABLE = LSALES
...NOTE..SAMPLE RANGE SET TO: 1, 52
 R-SQUARE = .8428 R-SQUARE ADJUSTED = .8257
VARIANCE OF THE ESTIMATE-SIGMA**2 = .11538
STANDARD ERROR OF THE ESTIMATE-SIGMA = .33967
SUM OF SQUARED ERRORS-SSE= 5.3073
MEAN OF DEPENDENT VARIABLE = 8.4372
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -453.182
VARIABLE   ESTIMATED  STANDARD   T-RATIO         PARTIAL STANDARDIZED ELASTICITY
  NAME    COEFFICIENT   ERROR    46 DF   P-VALUE  CORR.  COEFFICIENT  AT MEANS
PRICE1     -3.7463     .5765     -6.498    .000   -.692     -.4514     -2.9315
PRICE2      1.1495     .4486      2.562    .014    .353      .1584       .9264
PRICE3      1.2880     .6053      2.128    .039    .299      .1268      1.0223
D1           .42374    .1052      4.028    .000    .511      .2612       .1874
D2          1.4313     .1562      9.165    .000    .804      .6720       .2477
CONSTANT    8.9848     .6464      13.90    .000    .899      .0000      8.9848
|_* Hypothesis testing
|_TEST
|_ TEST D1=0
|_ TEST D2=0
|_END
F STATISTIC = 42.015301 WITH 2 AND 46 D.F. P-VALUE= .00000
WALD CHI-SQUARE STATISTIC = 84.030601 WITH 2 D.F. P-VALUE= .00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = .02380
|_TEST D1=D2
TEST VALUE = -1.0075 STD. ERROR OF TEST VALUE .14692
T STATISTIC = -6.8577456 WITH 46 D.F. P-VALUE= .00000
F STATISTIC = 47.028674 WITH 1 AND 46 D.F. P-VALUE= .00000
WALD CHI-SQUARE STATISTIC = 47.028674 WITH 1 D.F. P-VALUE= .00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = .02126
|_* Estimate the percentage effect of dummy variable D1 on SALES
|_GEN1 C1=BETA:4
|_GEN1 SE1=SE:4
|_GEN1 G1= 100*(EXP(C1 - SE1*SE1/2) - 1)
|_* Estimate the percentage effect of dummy variable D2 on SALES
|_GEN1 C2=BETA:5
|_GEN1 SE2=SE:5
|_GEN1 G2=100*(EXP(C2 - SE2*SE2/2) - 1)
|_PRINT G1 G2
 G1
 51.92391
 G2
 313.3233
|_STOP
