Dummy variables in models with a log-transformed dependent variable
This example is taken from Exercise 12.10, Griffiths, Hill and Judge [1993, pp. 427-429]. The data set contains weekly sales of a major brand of canned tuna by a supermarket chain in a large midwestern U.S. city. The regression equation of interest is:
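$$\ln(\mathit{SALES}) = \beta_0 + \beta_1\,\mathit{PRICE1} + \beta_2\,\mathit{PRICE2} + \beta_3\,\mathit{PRICE3} + \beta_4\,\mathit{D1} + \beta_5\,\mathit{D2} + e$$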
where D1 and D2 are dummy variables for two different advertising schemes. The dependent variable is in log form. What impact do the dummy variables have on weekly sales of canned tuna? Discussion of the interpretation of the coefficients of dummy variables when the dependent variable is log-transformed is given in:
The result developed in the above papers is that if b is the estimated coefficient on a dummy variable and V(b) is the estimated variance of b then:
$$g = 100\left(\exp\left(b - \tfrac{1}{2}V(b)\right) - 1\right)$$
gives an estimate of the percentage impact of the dummy variable on the variable being explained.
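As a cross-check outside SHAZAM, the calculation 100*(EXP(b - V(b)/2) - 1) can be sketched in Python. The function names are illustrative assumptions, and the inputs are the rounded D1 and D2 coefficients and standard errors from the OLS output, so the results agree with SHAZAM's G1 and G2 only to about one decimal place. The uncorrected value 100*(exp(b) - 1) is printed for comparison only.

```python
import math

def dummy_pct_effect(b, se):
    """Percentage effect of a dummy: 100*(exp(b - V(b)/2) - 1), with V(b) = se**2."""
    return 100.0 * (math.exp(b - se * se / 2.0) - 1.0)

def naive_pct_effect(b):
    """Uncorrected version, 100*(exp(b) - 1), shown only for comparison."""
    return 100.0 * (math.exp(b) - 1.0)

# Rounded coefficients and standard errors copied from the SHAZAM OLS output
g1 = dummy_pct_effect(0.42374, 0.1052)   # D1: store display only
g2 = dummy_pct_effect(1.4313, 0.1562)    # D2: store display and newspaper ad

print(f"D1: {g1:.1f}%  (uncorrected: {naive_pct_effect(0.42374):.1f}%)")
print(f"D2: {g2:.1f}%  (uncorrected: {naive_pct_effect(1.4313):.1f}%)")
```

The variance term always lowers the estimate, so the corrected figure is slightly below the uncorrected one.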
Also of interest is: how do we interpret the coefficients on the price variables? The price variables are in levels and the dependent variable is in log form. In this situation, $100\,\beta_1$ gives the approximate percentage change in sales of canned tuna for a 1 unit change in PRICE1 (holding all else constant).
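To illustrate the size of this effect, the sketch below uses the PRICE1 coefficient reported in the OLS output (-3.7463). The price change of 0.10 units is a hypothetical value chosen only for illustration. Because 100 times the coefficient is a linear approximation, the sketch also prints the exact change implied by the log-linear model, 100*(exp(coefficient * change) - 1), which differs noticeably when the change is not small.

```python
import math

b1 = -3.7463     # estimated PRICE1 coefficient from the SHAZAM output
delta = 0.10     # hypothetical change in PRICE1 (illustration only)

approx_pct = 100.0 * b1 * delta                    # linear approximation
exact_pct = 100.0 * (math.exp(b1 * delta) - 1.0)   # exact change implied by the model

print(f"approximate change in sales: {approx_pct:.1f}%")
print(f"exact change in sales:       {exact_pct:.1f}%")
```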
The SHAZAM commands (filename:
SAMPLE 1 52
READ (TUNA.txt) SALES PRICE1 PRICE2 PRICE3 D1 D2
GENR LSALES=LOG(SALES)
* Estimation
OLS LSALES PRICE1 PRICE2 PRICE3 D1 D2 / LOGLIN COEF=BETA STDERR=SE
* Hypothesis testing
TEST
  TEST D1=0
  TEST D2=0
END
TEST D1=D2
* Estimate the percentage effect of dummy variable D1 on SALES
GEN1 C1=BETA:4
GEN1 SE1=SE:4
GEN1 G1=100*(EXP(C1 - SE1*SE1/2) - 1)
* Estimate the percentage effect of dummy variable D2 on SALES
GEN1 C2=BETA:5
GEN1 SE2=SE:5
GEN1 G2=100*(EXP(C2 - SE2*SE2/2) - 1)
PRINT G1 G2
STOP
The elasticities that are reported in the final column of the SHAZAM OLS estimation output must be interpreted with caution. That is, they may not be appropriate for some explanatory variables. For example, elasticities reported for dummy variables likely have no meaningful interpretation.
The SHAZAM output is listed at the end of this example. The price elasticities evaluated at the sample means (rounded to 2 decimal places) are:

  PRICE1   -2.93
  PRICE2    0.93
  PRICE3    1.02
The positive elasticities for PRICE2 and PRICE3 give evidence that Brand 2 and Brand 3 are substitutes for Brand 1. The negative elasticity for the own price PRICE1 is as expected -- sales of Brand 1 canned tuna will drop in response to any price increase.
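The link between the coefficients and the reported elasticities can be sketched as follows. The sketch assumes that, with the LOGLIN option, the elasticity at means is the coefficient multiplied by the sample mean of the regressor; under that assumption the sample means can be backed out of the output as elasticity divided by coefficient. The dictionary layout is illustrative, and the numbers are the rounded values from the OLS output.

```python
# (coefficient, elasticity at means) pairs copied from the SHAZAM OLS output
output = {
    "PRICE1": (-3.7463, -2.9315),
    "PRICE2": (1.1495, 0.9264),
    "PRICE3": (1.2880, 1.0223),
    "D1": (0.42374, 0.1874),
    "D2": (1.4313, 0.2477),
}

# Assumption: elasticity at means = coefficient * sample mean (log-linear model)
implied_mean = {name: elas / coef for name, (coef, elas) in output.items()}

for name, mean in implied_mean.items():
    print(f"{name}: implied sample mean = {mean:.4f}")
```

For D1 and D2 the implied mean is simply the fraction of weeks in which each advertising scheme ran, which is why the elasticity reported for a dummy variable has no useful interpretation.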
The estimation results show that the estimated coefficients on the dummy variables D1 and D2 are both significantly different from 0. A joint test of the hypothesis:
$$H_0\colon\ \beta_4 = \beta_5 = 0$$
gives an F-test statistic of 42.0. The 5% critical value from the F-distribution with (2,46) degrees of freedom is 3.20. This gives strong evidence to reject the null hypothesis. That is, advertising of any kind will increase sales of Brand 1 canned tuna.
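The critical value and p-value can be checked without statistical tables. For an F-distribution with 2 numerator and n denominator degrees of freedom, the upper-tail probability has the closed form P(F > f) = (1 + 2f/n)^(-n/2). This shortcut holds only when the numerator degrees of freedom equal 2, and the function names below are illustrative assumptions.

```python
def f2_upper_tail(f, n):
    """P(F > f) for an F-distribution with (2, n) degrees of freedom."""
    return (1.0 + 2.0 * f / n) ** (-n / 2.0)

def f2_critical_value(alpha, n):
    """Value f with P(F > f) = alpha, found by inverting the tail formula."""
    return (n / 2.0) * (alpha ** (-2.0 / n) - 1.0)

crit = f2_critical_value(0.05, 46)       # 5% critical value, (2, 46) d.f.
p_value = f2_upper_tail(42.015301, 46)   # F statistic from the SHAZAM TEST output

print(f"5% critical value: {crit:.2f}")
print(f"p-value:           {p_value:.2e}")
```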
The dummy variable D2 is 1 for both a store display and a newspaper ad, whereas the dummy variable D1 is 1 for a store display only. The supermarket executives may be interested in knowing whether the newspaper ad will increase sales more than just a store display on its own. The OLS estimation results show that the estimated coefficient on D2 is higher than the estimated coefficient on D1. So this gives some support to the hypothesis that it is advantageous to combine a newspaper ad with a store display. However, to test this we can consider a test of the hypothesis:
$$H_0\colon\ \beta_4 = \beta_5$$
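The t-test statistic computed from the SHAZAM TEST D1=D2 command is -6.86 with 46 degrees of freedom, and the reported p-value is zero to five decimal places. This gives strong evidence to reject the null hypothesis that the two advertising schemes have the same effect on sales.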
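The arithmetic behind the TEST D1=D2 output can be sketched as follows, using the rounded coefficients and the standard error of the test value copied from the SHAZAM output. The standard error of the difference depends on the covariance between the two coefficient estimates, so it is taken from the output and not recomputed here.

```python
b_d1 = 0.42374       # estimated coefficient on D1
b_d2 = 1.4313        # estimated coefficient on D2
se_diff = 0.14692    # STD. ERROR OF TEST VALUE reported by SHAZAM

test_value = b_d1 - b_d2          # estimate of beta4 - beta5
t_stat = test_value / se_diff     # t statistic with 46 d.f.
f_stat = t_stat ** 2              # for a single restriction, F = t squared

print(f"test value = {test_value:.4f}")
print(f"t = {t_stat:.3f}, F = {f_stat:.2f}")
```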
We can now ask the question: what is the magnitude of the increase in sales when the store has both a store display and a newspaper ad? The calculations show that weekly sales will increase by about 313%. In contrast, when only a store display is used, the weekly sales of Brand 1 canned tuna will increase by about 52%.
[SHAZAM Guide home]
|_SAMPLE 1 52
|_READ (TUNA.txt) SALES PRICE1 PRICE2 PRICE3 D1 D2
UNIT 88 IS NOW ASSIGNED TO: TUNA.txt
   6 VARIABLES AND   52 OBSERVATIONS STARTING AT OBS   1
|_GENR LSALES=LOG(SALES)
|_* Estimation
|_OLS LSALES PRICE1 PRICE2 PRICE3 D1 D2 / LOGLIN COEF=BETA STDERR=SE

 OLS ESTIMATION
   52 OBSERVATIONS     DEPENDENT VARIABLE = LSALES
...NOTE..SAMPLE RANGE SET TO:   1,   52

 R-SQUARE =   .8428     R-SQUARE ADJUSTED =   .8257
VARIANCE OF THE ESTIMATE-SIGMA**2 =   .11538
STANDARD ERROR OF THE ESTIMATE-SIGMA =   .33967
SUM OF SQUARED ERRORS-SSE=   5.3073
MEAN OF DEPENDENT VARIABLE =   8.4372
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -453.182

VARIABLE   ESTIMATED   STANDARD   T-RATIO            PARTIAL  STANDARDIZED  ELASTICITY
  NAME    COEFFICIENT    ERROR     46 DF    P-VALUE   CORR.   COEFFICIENT    AT MEANS
PRICE1     -3.7463       .5765     -6.498     .000    -.692     -.4514       -2.9315
PRICE2      1.1495       .4486      2.562     .014     .353      .1584         .9264
PRICE3      1.2880       .6053      2.128     .039     .299      .1268        1.0223
D1           .42374      .1052      4.028     .000     .511      .2612         .1874
D2          1.4313       .1562      9.165     .000     .804      .6720         .2477
CONSTANT    8.9848       .6464     13.90      .000     .899      .0000        8.9848
|_* Hypothesis testing
|_TEST
|_  TEST D1=0
|_  TEST D2=0
|_END
F STATISTIC =   42.015301     WITH   2 AND   46 D.F.  P-VALUE=  .00000
WALD CHI-SQUARE STATISTIC =   84.030601     WITH   2 D.F.  P-VALUE=  .00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY =  .02380
|_TEST D1=D2
TEST VALUE =  -1.0075     STD. ERROR OF TEST VALUE   .14692
T STATISTIC =  -6.8577456     WITH   46 D.F.  P-VALUE=  .00000
F STATISTIC =   47.028674     WITH   1 AND   46 D.F.  P-VALUE=  .00000
WALD CHI-SQUARE STATISTIC =   47.028674     WITH   1 D.F.  P-VALUE=  .00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY =  .02126
|_* Estimate the percentage effect of dummy variable D1 on SALES
|_GEN1 C1=BETA:4
|_GEN1 SE1=SE:4
|_GEN1 G1= 100*(EXP(C1 - SE1*SE1/2) - 1)
|_* Estimate the percentage effect of dummy variable D2 on SALES
|_GEN1 C2=BETA:5
|_GEN1 SE2=SE:5
|_GEN1 G2=100*(EXP(C2 - SE2*SE2/2) - 1)
|_PRINT G1 G2
    G1
   51.92391
    G2
   313.3233
|_STOP