Logit and Probit Models - Testing for Heteroskedasticity

Davidson and MacKinnon (1984) propose test statistics for heteroskedasticity in logit and probit models. The heteroskedasticity is assumed to be a function of a set of variables Z, typically chosen from the X variables included in the logit or probit model. The test statistics are based on the Lagrange multiplier (LM) principle: the estimation results from the logit or probit model are used to construct an artificial regression designed to test for heteroskedasticity, and the test statistic is the explained sum of squares from that regression. Davidson and MacKinnon used sampling experiments to compare the properties of alternative forms of the LM test statistic. They concluded (1984, p. 259) that the test statistic named LM2 "tends to be the most reliable test under the null, but not the most powerful".

The SHAZAM procedure
The general format for using the
The SHAZAM commands for applying tests of heteroskedasticity following logit estimation for the school budget voting model are given below.
The SHAZAM output is reproduced below. The first test considered that the heteroskedasticity was a function of all the explanatory variables in the logit model. The calculated test statistic was 5.72. A comparison with the chi-square distribution with 8 degrees of freedom gives a p-value of 0.68. Therefore, there is no evidence of heteroskedasticity at any of the usual significance levels.

The second test considered the possibility of a different error variance for school teachers and for individuals in occupations other than school teaching. (For an OLS regression, the Goldfeld-Quandt test is designed to test for different error variances in two groups of observations.) For this test, the calculated test statistic was 1.96. The p-value of 0.16 again suggests no evidence of heteroskedasticity.
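The two p-values quoted above can be checked by hand from the reported statistics. The following is a minimal sketch, not part of SHAZAM: `chi2_sf` is a hypothetical helper name, and it relies on the finite-sum expressions for the chi-square upper tail that hold for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square(df) > x) for integer df >= 1 and x >= 0."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series with terms
    # x^(k-1) / (1*3*...*(2k-1)) for k = 1 .. (df-1)/2
    term, total = 1.0, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    tail = math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail

# Statistics taken from the SHAZAM output for the two tests
p_all = chi2_sf(5.72363, 8)     # Z = all explanatory variables
p_school = chi2_sf(1.96123, 1)  # Z = SCHOOL only
print(round(p_all, 2), round(p_school, 2))
```

Both values are well above 0.10, which is the basis for the conclusion that neither test rejects the null hypothesis of homoskedasticity.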
TESTHET
SET NOECHO
PROC TESTHET
* Logit and Probit Models - Test for heteroskedasticity
* Reference: R. Davidson and J.G. MacKinnon, "Convenient Specification
*   Tests for Logit and Probit Models", Journal of Econometrics,
*   Vol 25, 1984, pp. 241-262.
SET NODOECHO NOOUTPUT
GEN1 TYPE_="[MODEL]"
* Check that the model type is valid
FORMAT(' ERROR: Model must be either PROBIT or LOGIT')
IF ((TYPE_.NE." LOGIT").AND.(TYPE_.NE." PROBIT")) PRINT / FORMAT
IF ((TYPE_.NE." LOGIT").AND.(TYPE_.NE." PROBIT")) STOP
* Model estimation
[MODEL] [DEPVAR] [X] / INDEX=XBETA_ PREDICT=CDF_
* Density function evaluated at the index (logistic or standard normal)
IF (TYPE_.EQ." LOGIT") GENR PDF_=EXP(-XBETA_)/((1+EXP(-XBETA_))**2)
IF (TYPE_.EQ." PROBIT") DISTRIB XBETA_ / TYPE=NORMAL PDF=PDF_
COPY [Z] Z_
MATRIX Z_=Z_
GEN1 DF_=$COLS
* Equation (26), p. 247.
GENR ONE_=1
COPY [X] ONE_ X_
DO #=1,DF_
MATRIX ZZ_=Z_(0,#)
GENR ZZ_=-XBETA_*ZZ_
MATRIX Z_(0,#)=ZZ_
ENDO
MATRIX X_ = X_ | Z_
* Equations (16) and (17), p. 245.
GENR YAUX_=[DEPVAR]*SQRT((1-CDF_)/CDF_) + ([DEPVAR]-1)*SQRT(CDF_/(1-CDF_))
MATRIX R_=(PDF_/SQRT(CDF_*(1-CDF_)))*X_
* Artificial regression - Equation (18), p. 246.
OLS YAUX_ R_ / NOCONSTANT
* LM test statistic - explained sum of squares
GEN1 LM2=$ZSSR
* p-value
DISTRIB LM2 / TYPE=CHI DF=DF_
GEN1 pvalue_=1-$CDF
* Print results
PRINT MODEL / NONAME
FORMAT(' Test statistic for heteroskedasticity LM2 ='/F15.5)
PRINT LM2 / NONAME FORMAT
FORMAT(' chi-square degrees of freedom'/5X,F5.0)
PRINT DF_ / NONAME FORMAT
FORMAT(' p-value'/5X,F10.5)
PRINT pvalue_ / NONAME FORMAT
DELETE / ALL_
SET DOECHO OUTPUT
PROCEND
SET ECHO
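The quantities built in the listing can be written out as follows. This is a reading of the procedure's GENR and MATRIX statements rather than a quotation from the paper. The symbols are introduced here for illustration: \hat F_t and \hat f_t denote the fitted probability and the density evaluated at the estimated index x_t'\hat\beta, and q is the number of Z variables. The regressor -(x_t'\hat\beta) z_t is what would arise if the index under the alternative were x_t'\beta / \exp(z_t'\gamma); that functional form is an assumption inferred from the listing.

```latex
% Dependent variable of the artificial regression (YAUX_ in the listing)
r_t = y_t \sqrt{\frac{1-\hat F_t}{\hat F_t}}
      + (y_t - 1) \sqrt{\frac{\hat F_t}{1-\hat F_t}}

% Regressors (rows of R_ in the listing): the X variables with a constant,
% augmented by the Z variables multiplied by minus the estimated index
R_t = \frac{\hat f_t}{\sqrt{\hat F_t \, (1-\hat F_t)}}
      \left[\, x_t' \;,\; -(x_t'\hat\beta)\, z_t' \,\right]

% OLS of r_t on R_t without an added intercept; the test statistic is the
% explained sum of squares, compared with a chi-square with q degrees of freedom
\mathrm{LM}_2 = r' R \,(R'R)^{-1} R' r \;\sim\; \chi^2(q)
\quad \text{under the null of homoskedasticity}
```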
|_SAMPLE 1 95
|_READ (school.txt) PUB12 PUB34 PUB5 PRIV YEARS SCHOOL &
|   LOGINC PTCON YESVM
UNIT 88 IS NOW ASSIGNED TO: school.txt
   9 VARIABLES AND 95 OBSERVATIONS STARTING AT OBS 1

|_LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON

LOGIT ANALYSIS   DEPENDENT VARIABLE =YESVM   CHOICES = 2
   95. TOTAL OBSERVATIONS
   59. OBSERVATIONS AT ONE
   36. OBSERVATIONS AT ZERO
   25 MAXIMUM ITERATIONS
CONVERGENCE TOLERANCE =0.00100

LOG OF LIKELIHOOD WITH CONSTANT TERM ONLY = -63.037
BINOMIAL ESTIMATE = 0.6211
ITERATION 0 LOG OF LIKELIHOOD FUNCTION = -63.037

ITERATION 1 ESTIMATES
  0.45375  0.92076  0.43035  -0.28835  -0.23416E-01  1.3330
  1.6059  -1.7546  -3.7958
ITERATION 1 LOG OF LIKELIHOOD FUNCTION = -54.139

ITERATION 2 ESTIMATES
  0.55298  1.0944  0.50979  -0.32984  -0.25855E-01  2.1655
  2.0427  -2.2551  -4.7103
ITERATION 2 LOG OF LIKELIHOOD FUNCTION = -53.370

ITERATION 3 ESTIMATES
  0.58166  1.1250  0.52500  -0.33987  -0.26178E-01  2.5635
  2.1706  -2.3799  -5.1361
ITERATION 3 LOG OF LIKELIHOOD FUNCTION = -53.304

ITERATION 4 ESTIMATES
  0.58362  1.1261  0.52605  -0.34139  -0.26129E-01  2.6239
  2.1869  -2.3942  -5.2003
ITERATION 4 LOG OF LIKELIHOOD FUNCTION = -53.303

ITERATION 5 ESTIMATES
  0.58364  1.1261  0.52606  -0.34142  -0.26127E-01  2.6250
  2.1872  -2.3945  -5.2014

                            ASYMPTOTIC                                WEIGHTED
VARIABLE    ESTIMATED       STANDARD       T-RATIO     ELASTICITY     AGGREGATE
  NAME      COEFFICIENT     ERROR                      AT MEANS       ELASTICITY
PUB12        0.58364        0.68778        0.84858     0.93986E-01    0.91051E-01
PUB34        1.1261         0.76820        1.4659      0.11827        0.96460E-01
PUB5         0.52606        1.2693         0.41445     0.73664E-02    0.69375E-02
PRIV        -0.34142        0.78299       -0.43605    -0.11952E-01   -0.12037E-01
YEARS       -0.26127E-01    0.26934E-01   -0.97006    -0.73996E-01   -0.68592E-01
SCHOOL       2.6250         1.4101         1.8616      0.10108        0.28999E-01
LOGINC       2.1872         0.78781        2.7763      7.2529         6.7561
PTCON       -2.3945         1.0813        -2.2145     -5.5262        -5.1745
CONSTANT    -5.2014         7.5503        -0.68890    -1.7298        -1.6137
SCALE FACTOR = 0.22197

VARIABLE    MARGINAL        ----- PROBABILITIES FOR A TYPICAL CASE -----
  NAME      EFFECT          CASE        X=0        X=1        MARGINAL
                            VALUES                            EFFECT
PUB12        0.12955        0.0000      0.44231    0.58706     0.14476
PUB34        0.24996        0.0000      0.44231    0.70978     0.26747
PUB5         0.11677        0.0000      0.44231    0.57304     0.13073
PRIV        -0.75785E-01    0.0000      0.44231    0.36049    -0.81814E-01
YEARS       -0.57995E-02    8.5158
SCHOOL       0.58267        0.0000      0.44231    0.91631     0.47400
LOGINC       0.48548        9.9711
PTCON       -0.53150        6.9395

LOG-LIKELIHOOD FUNCTION = -53.303
LOG-LIKELIHOOD(0) = -63.037
LIKELIHOOD RATIO TEST = 19.4681 WITH 8 D.F.   P-VALUE= 0.01255

ESTRELLA R-SQUARE                   0.19956
MADDALA R-SQUARE                    0.18529
CRAGG-UHLER R-SQUARE                0.25218
MCFADDEN R-SQUARE                   0.15442
   ADJUSTED FOR DEGREES OF FREEDOM  0.75759E-01
   APPROXIMATELY F-DISTRIBUTED      0.20544 WITH 8 AND 9 D.F.
CHOW R-SQUARE                       0.17197

         PREDICTION SUCCESS TABLE
                     ACTUAL
                   0        1
             0    18.       7.
PREDICTED    1    18.      52.

NUMBER OF RIGHT PREDICTIONS = 70.0
PERCENTAGE OF RIGHT PREDICTIONS = 0.73684
NAIVE MODEL PERCENTAGE OF RIGHT PREDICTIONS = 0.62105

EXPECTED OBSERVATIONS AT 0 = 36.0   OBSERVED = 36.0
EXPECTED OBSERVATIONS AT 1 = 59.0   OBSERVED = 59.0
SUM OF SQUARED "RESIDUALS" = 18.513
WEIGHTED SUM OF SQUARED "RESIDUALS" = 86.839

         HENSHER-JOHNSON PREDICTION SUCCESS TABLE

                                                 OBSERVED    OBSERVED
                            PREDICTED CHOICE     COUNT       SHARE
ACTUAL                      0          1
  0                         17.591     18.409    36.000      0.379
  1                         18.409     40.591    59.000      0.621

PREDICTED COUNT             36.000     59.000    95.000      1.000
PREDICTED SHARE              0.379      0.621     1.000
PROP. SUCCESSFUL             0.489      0.688     0.612
SUCCESS INDEX                0.110      0.067     0.083
PROPORTIONAL ERROR           0.000      0.000
NORMALIZED SUCCESS INDEX                          0.177

|_* Test for heteroskedasticity
|_FILE PROC TESTHET
UNIT 82 IS NOW ASSIGNED TO: TESTHET
|_MODEL: LOGIT
|_* Dependent variable
|_DEPVAR: YESVM
|_* List of explanatory variables (a constant term is assumed)
|_X: PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON
|_* List of variables in the error variance equation
|_* Include all the explanatory variables in the model.
|_Z: PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON
|_* Get the LM test statistic for heteroskedasticity
|_EXEC TESTHET
_PROC TESTHET
_ SET NODOECHO NOOUTPUT
 LOGIT
 Test statistic for heteroskedasticity LM2 =
        5.72363
 chi-square degrees of freedom
        8.
 p-value
        0.67816
_ PROCEND
|_* Now assume a different form for the heteroskedasticity.
|_* Test that the error variance is a function of the SCHOOL variable.
|_Z: SCHOOL
|_EXEC TESTHET
_PROC TESTHET
_ SET NODOECHO NOOUTPUT
 LOGIT
 Test statistic for heteroskedasticity LM2 =
        1.96123
 chi-square degrees of freedom
        1.
 p-value
        0.16138
_ PROCEND
|_STOP
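For readers without SHAZAM, the steps carried out by TESTHET for the logit case can be sketched in plain Python. This is an illustrative sketch only, not the SHAZAM implementation. The function names (`solve`, `fit_logit`, `lm2_logit`) are hypothetical, the small data set is made up and is not the school budget voting data, and the logit model is fitted by a fixed number of Newton-Raphson steps, which is an assumption about what is adequate for well-behaved data.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for c in reversed(range(n)):
        s = sum(M[c][k] * x[k] for k in range(c + 1, n))
        x[c] = (M[c][n] - s) / M[c][c]
    return x

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def fit_logit(X, y, iterations=25):
    """Newton-Raphson ML estimates for a logit model; rows of X include the constant."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        F = [logistic(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum(row[j] * (yi - Fi) for row, yi, Fi in zip(X, y, F))
                for j in range(k)]
        hess = [[sum(row[i] * row[j] * Fi * (1 - Fi) for row, Fi in zip(X, F))
                 for j in range(k)] for i in range(k)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def lm2_logit(X, y, Z):
    """Return (LM2, degrees of freedom) from the artificial regression."""
    beta = fit_logit(X, y)
    r, R = [], []
    for xrow, yi, zrow in zip(X, y, Z):
        index = sum(b * x for b, x in zip(beta, xrow))
        F = logistic(index)
        f = F * (1.0 - F)                      # logistic density at the index
        w = f / math.sqrt(F * (1.0 - F))
        r.append(yi * math.sqrt((1 - F) / F) + (yi - 1) * math.sqrt(F / (1 - F)))
        # X columns plus Z columns multiplied by minus the index, as in the listing
        R.append([w * x for x in xrow] + [w * (-index * z) for z in zrow])
    m = len(R[0])
    RtR = [[sum(row[i] * row[j] for row in R) for j in range(m)] for i in range(m)]
    Rtr = [sum(row[j] * ri for row, ri in zip(R, r)) for j in range(m)]
    coef = solve(RtR, Rtr)
    # Explained sum of squares of the no-intercept regression of r on R
    ess = sum(c * v for c, v in zip(coef, Rtr))
    return ess, len(Z[0])

# Made-up illustrative data: one regressor plus a constant, with Z equal to that regressor
x = [-2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
X = [[1.0, v] for v in x]
Z = [[v] for v in x]
lm2, df = lm2_logit(X, y, Z)
print(lm2, df)
```

The statistic would then be compared with a chi-square distribution with `df` degrees of freedom. Because the statistic is an explained sum of squares, rescaling a Z variable leaves it unchanged, which gives a simple sanity check on any implementation.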