Dummy variables in models with a log-transformed dependent variable

This example is taken from Exercise 12.10, Griffiths, Hill and Judge [1993, pp. 427-429]. The data set contains weekly sales of a major brand of canned tuna by a supermarket chain in a large midwestern U.S. city. The regression equation of interest is:

$$\ln(\text{SALES}) = \beta_0 + \beta_1\,\text{PRICE1} + \beta_2\,\text{PRICE2} + \beta_3\,\text{PRICE3} + \beta_4\,\text{D1} + \beta_5\,\text{D2} + e$$

where D1 and D2 are dummy variables for two different advertising schemes. The dependent variable is in log form. What impact do the dummy variables have on weekly sales of canned tuna? Discussion on the interpretation of the coefficients of dummy variables when the dependent variable is log-transformed is given in:
The result developed in the above papers is that if b is the estimated coefficient on a dummy variable and V(b) is the estimated variance of b, then:

$$g = 100\left(\exp\left(b - \tfrac{1}{2}V(b)\right) - 1\right)$$

gives an estimate of the percentage impact of the dummy variable on the variable being explained.

Also of interest is how to interpret the coefficients on the price variables. The price variables are in levels and the dependent variable is in log form. In this situation, 100(

The SHAZAM commands (filename:
SAMPLE 1 52
READ (TUNA.txt) SALES PRICE1 PRICE2 PRICE3 D1 D2
GENR LSALES=LOG(SALES)
* Estimation
OLS LSALES PRICE1 PRICE2 PRICE3 D1 D2 / LOGLIN COEF=BETA STDERR=SE
* Hypothesis testing
TEST
  TEST D1=0
  TEST D2=0
END
TEST D1=D2
* Estimate the percentage effect of dummy variable D1 on SALES
GEN1 C1=BETA:4
GEN1 SE1=SE:4
GEN1 G1= 100*(EXP(C1 - SE1*SE1/2) - 1)
* Estimate the percentage effect of dummy variable D2 on SALES
GEN1 C2=BETA:5
GEN1 SE2=SE:5
GEN1 G2=100*(EXP(C2 - SE2*SE2/2) - 1)
PRINT G1 G2
STOP

The LOGLIN option on the OLS command indicates that the dependent variable is in log form while the explanatory variables are in levels, so that the elasticities at the sample means are computed accordingly.

The elasticities that are reported in the final column of the SHAZAM OLS estimation output must be interpreted with caution, because they are not appropriate for every explanatory variable. For example, the elasticities reported for the dummy variables and the constant have no meaningful interpretation.

The SHAZAM output is listed below. The price elasticities evaluated at the sample means (rounded to 2 decimal places) are:

PRICE1   -2.93
PRICE2    0.93
PRICE3    1.02
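The link between the price coefficients and the elasticities at the sample means can be checked with a short calculation. The following is a minimal sketch in Python (not SHAZAM), assuming the standard result that when ln(y) is linear in x with slope b, the elasticity at x is b*x and a change of dx in x changes y by 100*(exp(b*dx) - 1) percent. The coefficients and elasticities are copied from the OLS output; the implied sample means and the price change of 0.10 are illustrative assumptions, not values reported in the source.

```python
import math

# Coefficients and elasticities at means copied from the SHAZAM OLS output.
coef = {"PRICE1": -3.7463, "PRICE2": 1.1495, "PRICE3": 1.2880}
elasticity_at_means = {"PRICE1": -2.9315, "PRICE2": 0.9264, "PRICE3": 1.0223}

# With ln(y) = a + b*x the elasticity at x is b*x, so the sample mean implied
# by each reported elasticity is elasticity / b. This is an inference from the
# output, not a value printed by SHAZAM.
implied_mean = {name: elasticity_at_means[name] / coef[name] for name in coef}


def percent_change(b, dx):
    """Exact percentage change in y when x changes by dx and ln(y) is linear in x."""
    return 100.0 * (math.exp(b * dx) - 1.0)


# Hypothetical own-price increase of 0.10, in whatever units PRICE1 has in the data file.
dx = 0.10
own_price_effect = percent_change(coef["PRICE1"], dx)

for name in coef:
    print(f"{name}: implied sample mean = {implied_mean[name]:.4f}")
print(f"PRICE1 up by {dx}: change in SALES = {own_price_effect:.1f}%")
```

The same `percent_change` function applies to any regressor measured in levels; it is only for dummy variables that the variance correction in the formula for g is used.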
The positive elasticities for PRICE2 and PRICE3 give evidence that Brand 2 and Brand 3 are substitutes for Brand 1. The negative elasticity for the own price PRICE1 is as expected: sales of Brand 1 canned tuna drop in response to a price increase.

The estimation results show that the estimated coefficients on the dummy variables D1 and D2 are both significantly different from 0. A joint test of the hypothesis:

H0: the coefficients on D1 and D2 are both zero

gives an F-test statistic of 42.0. The 5% critical value from the F-distribution with (2,46) degrees of freedom is 3.20. This gives strong evidence to reject the null hypothesis. That is, advertising of either kind increases sales of Brand 1 canned tuna.

The dummy variable D2 is 1 for both a store display and a newspaper ad, whereas the dummy variable D1 is 1 for a store display only. The supermarket executives may be interested in knowing whether adding the newspaper ad increases sales more than a store display on its own. The OLS estimation results show that the estimated coefficient on D2 is higher than the estimated coefficient on D1. This gives some support to the hypothesis that it is advantageous to combine a newspaper ad with a store display. However, to test this formally we can consider a test of the hypothesis:

H0: the coefficients on D1 and D2 are equal

The t-test statistic computed by the SHAZAM TEST command is -6.86 with 46 degrees of freedom, and the reported p-value is .00000, so the null hypothesis of equal coefficients is rejected.

We can now ask the question: what is the magnitude of the increase in sales when the store has both a store display and a newspaper ad? The calculations show that weekly sales will increase by about 313%. In contrast, when only a store display is used, the weekly sales of Brand 1 canned tuna will increase by about 52%.
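The percentage effects can be reproduced outside SHAZAM. The following is a minimal sketch in Python that applies the formula g = 100*(exp(b - V(b)/2) - 1) to the rounded coefficients and standard errors printed in the OLS output, and compares it with two shortcuts that are sometimes used: 100*(exp(b) - 1) without the variance correction, and 100*b. The function names are illustrative and not part of SHAZAM.

```python
import math


def percent_effect(b, se):
    """g = 100*(exp(b - V(b)/2) - 1), where V(b) is the squared standard error."""
    return 100.0 * (math.exp(b - se * se / 2.0) - 1.0)


def uncorrected_percent_effect(b):
    """100*(exp(b) - 1): the same transformation without the variance term."""
    return 100.0 * (math.exp(b) - 1.0)


# (coefficient, standard error) as printed, rounded, in the SHAZAM OLS output.
estimates = {"D1": (0.42374, 0.1052), "D2": (1.4313, 0.1562)}

results = {}
for name, (b, se) in estimates.items():
    results[name] = {
        "g": percent_effect(b, se),
        "uncorrected": uncorrected_percent_effect(b),
        "100*b": 100.0 * b,
    }
    print(name, {key: round(value, 2) for key, value in results[name].items()})
```

Because the inputs are the rounded values from the printed output, the results agree with the G1 and G2 values computed by SHAZAM only to about one decimal place. The comparison also shows why the coefficient on D2 should not be read directly as a percentage: 100*b is far below the effect implied by the exponential transformation.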
SHAZAM output
|_SAMPLE 1 52
|_READ (TUNA.txt) SALES PRICE1 PRICE2 PRICE3 D1 D2
UNIT 88 IS NOW ASSIGNED TO: TUNA.txt
6 VARIABLES AND 52 OBSERVATIONS STARTING AT OBS 1
|_GENR LSALES=LOG(SALES)
|_* Estimation
|_OLS LSALES PRICE1 PRICE2 PRICE3 D1 D2 / LOGLIN COEF=BETA STDERR=SE
OLS ESTIMATION
52 OBSERVATIONS DEPENDENT VARIABLE = LSALES
...NOTE..SAMPLE RANGE SET TO: 1, 52
R-SQUARE = .8428 R-SQUARE ADJUSTED = .8257
VARIANCE OF THE ESTIMATE-SIGMA**2 = .11538
STANDARD ERROR OF THE ESTIMATE-SIGMA = .33967
SUM OF SQUARED ERRORS-SSE= 5.3073
MEAN OF DEPENDENT VARIABLE = 8.4372
LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -453.182
VARIABLE     ESTIMATED  STANDARD   T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
  NAME     COEFFICIENT     ERROR     46 DF  P-VALUE    CORR.   COEFFICIENT    AT MEANS
PRICE1         -3.7463     .5765    -6.498     .000    -.692        -.4514     -2.9315
PRICE2          1.1495     .4486     2.562     .014     .353         .1584       .9264
PRICE3          1.2880     .6053     2.128     .039     .299         .1268      1.0223
D1              .42374     .1052     4.028     .000     .511         .2612       .1874
D2              1.4313     .1562     9.165     .000     .804         .6720       .2477
CONSTANT        8.9848     .6464     13.90     .000     .899         .0000      8.9848
|_* Hypothesis testing
|_TEST
|_ TEST D1=0
|_ TEST D2=0
|_END
F STATISTIC = 42.015301 WITH 2 AND 46 D.F. P-VALUE= .00000
WALD CHI-SQUARE STATISTIC = 84.030601 WITH 2 D.F. P-VALUE= .00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = .02380
|_TEST D1=D2
TEST VALUE = -1.0075 STD. ERROR OF TEST VALUE .14692
T STATISTIC = -6.8577456 WITH 46 D.F. P-VALUE= .00000
F STATISTIC = 47.028674 WITH 1 AND 46 D.F. P-VALUE= .00000
WALD CHI-SQUARE STATISTIC = 47.028674 WITH 1 D.F. P-VALUE= .00000
UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY = .02126
|_* Estimate the percentage effect of dummy variable D1 on SALES
|_GEN1 C1=BETA:4
|_GEN1 SE1=SE:4
|_GEN1 G1= 100*(EXP(C1 - SE1*SE1/2) - 1)
|_* Estimate the percentage effect of dummy variable D2 on SALES
|_GEN1 C2=BETA:5
|_GEN1 SE2=SE:5
|_GEN1 G2=100*(EXP(C2 - SE2*SE2/2) - 1)
|_PRINT G1 G2
G1
51.92391
G2
313.3233
|_STOP
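
The test statistics in the output above are linked by simple identities that can be checked by hand. The following is a minimal Python sketch, assuming the usual relations for linear restrictions in an OLS regression: the Wald chi-square statistic equals the number of restrictions times the F statistic, the t statistic is the test value divided by its standard error, and for a single restriction the F statistic is the square of the t statistic. All numbers are copied from the output; the variable names are illustrative.

```python
# Values copied from the SHAZAM TEST output.
joint_f = 42.015301       # F statistic for D1=0 and D2=0 (2 and 46 d.f.)
joint_wald = 84.030601    # Wald chi-square statistic (2 d.f.)
test_value = -1.0075      # coefficient on D1 minus coefficient on D2
test_se = 0.14692         # standard error of the test value
t_reported = -6.8577456   # t statistic for D1=D2 (46 d.f.)
f_reported = 47.028674    # F statistic for D1=D2 (1 and 46 d.f.)

# Coefficients as printed in the OLS output.
b_d1, b_d2 = 0.42374, 1.4313

n_restrictions = 2
wald_from_f = n_restrictions * joint_f   # Wald = q * F
diff_from_coefs = b_d1 - b_d2            # should match the reported test value
t_from_parts = test_value / test_se      # t = estimate / standard error
f_from_t = t_reported ** 2               # F = t^2 for one restriction

print(f"Wald from F: {wald_from_f:.4f} (reported {joint_wald})")
print(f"D1 - D2:     {diff_from_coefs:.4f} (reported {test_value})")
print(f"t from parts: {t_from_parts:.3f} (reported {t_reported})")
print(f"F from t:    {f_from_t:.4f} (reported {f_reported})")
```

Small discrepancies in the last digits are expected, because the printed test value and standard error are rounded.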