SHAZAM Dummy variables in log models

Dummy variables in models with a log-transformed dependent variable


This example is taken from Exercise 12.10, Griffiths, Hill and Judge [1993, pp. 427-429]. The data set contains weekly sales of a major brand of canned tuna by a supermarket chain in a large midwestern U.S. city. The regression equation of interest is:

    ln(SALES) = beta0 + beta1 PRICE1 + beta2 PRICE2 + beta3 PRICE3 + beta4 D1 + beta5 D2 + e

where D1 and D2 are dummy variables for two different advertising schemes. The dependent variable is in log form. What impact do the dummy variables have on weekly sales of canned tuna? The interpretation of the coefficients of dummy variables when the dependent variable is log-transformed is discussed in:

Halvorsen, R. and Palmquist, R., "The Interpretation of Dummy Variables in Semilogarithmic Equations", American Economic Review, Vol. 70, 1980, pp. 474-475.

Kennedy, P., "Estimation with Correctly Interpreted Dummy Variables in Semilogarithmic Equations", American Economic Review, Vol. 71, 1981, p. 801.

The result developed in the above papers is that if b is the estimated coefficient on a dummy variable and V(b) is the estimated variance of b then:

      g = 100 (exp(b - V(b)/2) - 1)

gives an estimate of the percentage impact of the dummy variable on the variable being explained.
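
As a rough arithmetic check, the formula can be evaluated directly from a coefficient and its standard error. The Python sketch below is illustrative only and is not part of the SHAZAM example: the function name kennedy_percent_effect is an assumption, and the inputs are the rounded D1 and D2 estimates printed in the SHAZAM OLS output for this example, so the results should agree with SHAZAM's values only to about one decimal place.

```python
import math

def kennedy_percent_effect(b, se):
    """Percentage effect of a dummy variable when the dependent variable is logged.

    b  : estimated coefficient on the dummy variable
    se : estimated standard error of b, so that V(b) = se**2
    """
    return 100.0 * (math.exp(b - se * se / 2.0) - 1.0)

# Rounded estimates as printed in the SHAZAM OLS output for this example.
g1 = kennedy_percent_effect(0.42374, 0.1052)   # D1
g2 = kennedy_percent_effect(1.4313, 0.1562)    # D2

print(round(g1, 1), round(g2, 1))
```

Because V(b)/2 is subtracted inside the exponent, the estimate is always slightly smaller than the uncorrected value 100 (exp(b) - 1).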

Also of interest is: how do we interpret the coefficients on the price variables? The price variables are in levels and the dependent variable is in log form. In this situation, 100(beta1) gives the approximate percentage change in sales of canned tuna per unit change in PRICE1 (holding all else constant). The approximation is accurate only for small price changes.
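
To see how far the linear reading 100(beta1) can drift from the exact change implied by the log model, the following Python sketch compares the two. It is an illustration under stated assumptions: the function names are hypothetical, the price changes of 0.10 and 0.01 are made-up values in whatever units PRICE1 is measured in, and the coefficient is the rounded PRICE1 estimate from the SHAZAM output.

```python
import math

def approx_percent_change(beta, delta):
    """Linear approximation: 100 * beta * delta."""
    return 100.0 * beta * delta

def exact_percent_change(beta, delta):
    """Exact change implied by ln(y) = ... + beta * x: 100 * (exp(beta * delta) - 1)."""
    return 100.0 * (math.exp(beta * delta) - 1.0)

beta1 = -3.7463   # rounded coefficient on PRICE1 from the SHAZAM output

# Hypothetical price changes, chosen only for illustration.
for delta in (0.10, 0.01):
    approx = approx_percent_change(beta1, delta)
    exact = exact_percent_change(beta1, delta)
    print(delta, round(approx, 2), round(exact, 2))
```

The two figures are close for the smaller change and diverge noticeably for the larger one, which is why the 100(beta1) reading is best treated as a local approximation.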

The SHAZAM commands (filename: TUNA.SHA) below estimate the coefficients of the regression equation and compute some test statistics. The percentage impact of each advertising dummy variable on the sales of canned tuna is also computed.


SAMPLE 1 52
READ (TUNA.txt) SALES PRICE1 PRICE2 PRICE3 D1 D2 
GENR LSALES=LOG(SALES)
* Estimation
OLS LSALES PRICE1 PRICE2 PRICE3 D1 D2 / LOGLIN COEF=BETA STDERR=SE
* Hypothesis testing
TEST 
  TEST D1=0
  TEST D2=0
END
TEST D1=D2
* Estimate the percentage effect of dummy variable D1 on SALES
GEN1 C1=BETA:4
GEN1 SE1=SE:4
GEN1 G1= 100*(EXP(C1 - SE1*SE1/2) - 1)
* Estimate the percentage effect of dummy variable D2 on SALES
GEN1 C2=BETA:5
GEN1 SE2=SE:5
GEN1 G2=100*(EXP(C2 - SE2*SE2/2) - 1)
PRINT G1 G2
STOP

The COEF=BETA option on the OLS command saves the estimated coefficients in the new variable BETA and the STDERR=SE option saves the estimated standard errors of the estimated coefficients in the new variable SE. These results are used later to compute the percentage impacts of the advertising dummy variables on sales.

The LOGLIN option is specified on the OLS command. When this option is used the elasticities at sample means are computed assuming a semi-logarithmic model specification where the dependent variable is in log form but the explanatory variables are in levels. Suppose that b1 is the estimated coefficient on the variable PRICE1 and MP1 is the mean of PRICE1. The elasticity evaluated at the mean is:

      b1 (MP1)
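
The calculation is a single multiplication, as the Python sketch below shows. It is not SHAZAM code: the function name is an assumption and the price series is hypothetical, invented only to show the arithmetic (the actual TUNA.txt data are not reproduced here). Only the coefficient is taken from the SHAZAM output.

```python
def semilog_elasticity_at_mean(b, x_values):
    """Elasticity at the sample mean for a model with ln(y) on the left
    and x in levels on the right: b * mean(x)."""
    mean_x = sum(x_values) / len(x_values)
    return b * mean_x

b1 = -3.7463                              # rounded PRICE1 coefficient from the SHAZAM output
hypothetical_prices = [0.75, 0.80, 0.85, 0.70]   # made-up values, for illustration only

elasticity = semilog_elasticity_at_mean(b1, hypothetical_prices)
print(round(elasticity, 2))
```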

The elasticities that are reported in the final column of the SHAZAM OLS estimation output must be interpreted with caution. That is, they may not be appropriate for some explanatory variables. For example, elasticities reported for dummy variables likely have no meaningful interpretation.

The SHAZAM output is listed at the end of this page. The price elasticities evaluated at the sample means (rounded to 2 decimal places) are:

  Variable     Elasticity
  PRICE1         -2.93
  PRICE2          0.93
  PRICE3          1.02

The positive elasticities for PRICE2 and PRICE3 give evidence that Brand 2 and Brand 3 are substitutes for Brand 1. The negative elasticity for the own price PRICE1 is as expected -- sales of Brand 1 canned tuna will drop in response to any price increase.

The estimation results show that the estimated coefficients on the dummy variables D1 and D2 are both significantly different from 0. A joint test of the hypothesis:

      H0: beta4 = beta5 = 0

gives an F-test statistic of 42.0. The 5% critical value from the F-distribution with (2,46) degrees of freedom is 3.20. This gives strong evidence to reject the null hypothesis. Since both estimated coefficients are positive, the results indicate that either form of advertising increases sales of Brand 1 canned tuna.

The dummy variable D2 is 1 for both a store display and a newspaper ad, whereas the dummy variable D1 is 1 for a store display only. The supermarket executives may be interested in knowing whether the newspaper ad will increase sales more than just a store display on its own. The OLS estimation results show that the estimated coefficient on D2 is higher than the estimated coefficient on D1. So this gives some support to the hypothesis that it is advantageous to combine a newspaper ad with a store display. However, to test this we can consider a test of the hypothesis:

      H0: beta4 = beta5

The t-test statistic computed from the SHAZAM TEST command is -6.86. SHAZAM reports the p-value as 0.00000. This actually means less than 0.000005 and so the null hypothesis is rejected at any reasonable significance level. Since the estimated difference between the D1 and D2 coefficients is negative, we conclude that sales are higher in weeks when both forms of advertising are used than in weeks with a store display only.
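
The reported t statistic is the test value (the estimated difference between the D1 and D2 coefficients) divided by its standard error, and for a single restriction the F statistic is the square of the t statistic. The Python sketch below checks this arithmetic using the rounded figures printed by the TEST D1=D2 command. It is only a consistency check under that rounding, so the results match SHAZAM's to about three significant digits; the variable names are assumptions.

```python
# Rounded values as printed by the SHAZAM TEST D1=D2 command.
test_value = -1.0075   # estimated coefficient on D1 minus estimated coefficient on D2
std_error = 0.14692    # standard error of the test value

t_stat = test_value / std_error   # t statistic with 46 degrees of freedom
f_stat = t_stat ** 2              # F statistic with (1, 46) degrees of freedom

print(round(t_stat, 2), round(f_stat, 1))
```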

We can now ask the question: what is the magnitude of the increase in sales when the store has both a store display and a newspaper ad? The calculations show that weekly sales are estimated to increase by about 313% compared with weeks with no advertising. In contrast, when only a store display is used, the weekly sales of Brand 1 canned tuna are estimated to increase by about 52%.



SHAZAM output


 |_SAMPLE 1 52
 |_READ (TUNA.txt) SALES PRICE1 PRICE2 PRICE3 D1 D2
 
 UNIT 88 IS NOW ASSIGNED TO: TUNA.txt
    6 VARIABLES AND       52 OBSERVATIONS STARTING AT OBS       1
 
 |_GENR LSALES=LOG(SALES)

 |_* Estimation
 |_OLS LSALES PRICE1 PRICE2 PRICE3 D1 D2 / LOGLIN COEF=BETA STDERR=SE
 
  OLS ESTIMATION
       52 OBSERVATIONS     DEPENDENT VARIABLE = LSALES
 ...NOTE..SAMPLE RANGE SET TO:    1,   52
 
  R-SQUARE =    .8428     R-SQUARE ADJUSTED =    .8257
 VARIANCE OF THE ESTIMATE-SIGMA**2 =   .11538
 STANDARD ERROR OF THE ESTIMATE-SIGMA =   .33967
 SUM OF SQUARED ERRORS-SSE=   5.3073
 MEAN OF DEPENDENT VARIABLE =   8.4372
 LOG OF THE LIKELIHOOD FUNCTION(IF DEPVAR LOG) = -453.182
 
 VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
   NAME    COEFFICIENT   ERROR      46 DF   P-VALUE CORR. COEFFICIENT  AT MEANS 
 PRICE1    -3.7463      .5765      -6.498      .000 -.692     -.4514    -2.9315
 PRICE2     1.1495      .4486       2.562      .014  .353      .1584      .9264
 PRICE3     1.2880      .6053       2.128      .039  .299      .1268     1.0223
 D1         .42374      .1052       4.028      .000  .511      .2612      .1874
 D2         1.4313      .1562       9.165      .000  .804      .6720      .2477
 CONSTANT   8.9848      .6464       13.90      .000  .899      .0000     8.9848

 |_* Hypothesis testing
 |_TEST
 |_  TEST D1=0
 |_  TEST D2=0
 |_END
 F STATISTIC =   42.015301     WITH    2 AND   46 D.F.  P-VALUE=  .00000
 WALD CHI-SQUARE STATISTIC =   84.030601     WITH    2 D.F.  P-VALUE=  .00000
 UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY =  .02380
 |_TEST D1=D2
 TEST VALUE =  -1.0075     STD. ERROR OF TEST VALUE   .14692
 T STATISTIC =  -6.8577456     WITH   46 D.F.    P-VALUE=  .00000
 F STATISTIC =   47.028674     WITH    1 AND   46 D.F.  P-VALUE=  .00000
 WALD CHI-SQUARE STATISTIC =   47.028674     WITH    1 D.F.  P-VALUE=  .00000
 UPPER BOUND ON P-VALUE BY CHEBYCHEV INEQUALITY =  .02126

 |_* Estimate the percentage effect of dummy variable D1 on SALES
 |_GEN1 C1=BETA:4
 |_GEN1 SE1=SE:4
 |_GEN1 G1= 100*(EXP(C1 - SE1*SE1/2) - 1)
 |_* Estimate the percentage effect of dummy variable D2 on SALES
 |_GEN1 C2=BETA:5
 |_GEN1 SE2=SE:5
 |_GEN1 G2=100*(EXP(C2 - SE2*SE2/2) - 1)
 |_PRINT G1 G2
     G1
    51.92391
     G2
    313.3233
 |_STOP
