Logit Model - Predicting Probabilities

It may be interesting to tabulate response probabilities for various levels of the explanatory variables.

Example 1

For the school budget voting model, consider predicting the probability of a yes vote for a school teacher and for a non-school teacher, both with 1 or 2 children in public school, $21,000 income, 8 years of residency and property taxes of $1,000. The SHAZAM commands, echoed with the |_ prefix in the Example 1 output, use the FC command to compute the predicted probabilities.

In those commands, the COEF=BETA option on the LOGIT command saves the estimated coefficients in BETA. The COEF=BETA and MODEL=LOGIT options on the FC command request logit predictions based on those coefficients. The PREDICT=P option saves the predicted probability in the variable P.

The SHAZAM output is listed under "Predicting Probabilities - Example 1 - SHAZAM output". The results show that the probability of a yes vote is 95% for a school teacher compared to 60% for a non-school teacher, given otherwise identical characteristics: 1 or 2 children in public school, $21,000 income, 8 years of residency and property taxes of $1,000.

Example 2

For the school budget voting model, how does the probability of a yes vote vary for individuals with income at the lower quartile, the mean and the upper quartile, with "typical" characteristics on all other variables? The calculations are implemented in the SHAZAM commands echoed in the Example 2 output.
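The income variable is included in the model in log-transformed form. The predictions use the lower quartile, mean and upper quartile of the logarithm of income. Antilogs are used to express the income values in levels. Note that, for the mean, this gives the geometric mean of income. The SHAZAM output is listed under "Predicting Probabilities - Example 2 - SHAZAM output". The results are summarized in the table below.

Income level        Income     Probability of a yes vote
Lower quartile      $17,501    0.338
Mean (geometric)    $21,398    0.442
Upper quartile      $27,447    0.578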
Note: The above probabilities are for a voter who is not a school teacher, has no children in public or private school, has 8.5 years of residency and pays property taxes of $1,032 (the geometric mean of the tax variable).
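The antilog step can be illustrated with a short Python sketch. Python is used here only as a calculator and is not part of the SHAZAM run. The list `incomes` is made-up illustration data, not the school.txt data. The two log means (9.971069 for LOGINC and 6.939496 for PTCON) are the values printed by SHAZAM in the Example 2 output.

```python
import math

# Hypothetical incomes, for illustration only (NOT the school.txt data).
incomes = [10000.0, 20000.0, 40000.0]

# Taking the antilog of the mean of the logs gives the geometric mean.
mean_of_logs = sum(math.log(x) for x in incomes) / len(incomes)
geometric_mean = math.exp(mean_of_logs)
arithmetic_mean = sum(incomes) / len(incomes)

# Means of the log-transformed variables as printed by SHAZAM (PRINT XVAL).
mean_loginc = 9.971069
mean_ptcon = 6.939496

# Antilogs express the "typical" income and property tax in dollars.
income_level = math.exp(mean_loginc)
tax_level = math.exp(mean_ptcon)

print(f"geometric mean of incomes : {geometric_mean:.2f}")
print(f"arithmetic mean of incomes: {arithmetic_mean:.2f}")
print(f"income at mean of LOGINC  : {income_level:.0f}")
print(f"tax at mean of PTCON      : {tax_level:.0f}")
```

The geometric mean is never larger than the arithmetic mean, so the "mean" income reported for a log-transformed variable will generally sit below the ordinary average income.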
[SHAZAM Guide home]
Predicting Probabilities - Example 1 - SHAZAM output

|_SAMPLE 1 95
|_READ (school.txt) PUB12 PUB34 PUB5 PRIV YEARS SCHOOL &
|      LOGINC PTCON YESVM
UNIT 88 IS NOW ASSIGNED TO: school.txt
   9 VARIABLES AND   95 OBSERVATIONS STARTING AT OBS   1

|_LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA

LOGIT ANALYSIS   DEPENDENT VARIABLE =YESVM   CHOICES = 2
   95. TOTAL OBSERVATIONS
   59. OBSERVATIONS AT ONE
   36. OBSERVATIONS AT ZERO
   25 MAXIMUM ITERATIONS
CONVERGENCE TOLERANCE =0.00100

LOG OF LIKELIHOOD WITH CONSTANT TERM ONLY = -63.037
BINOMIAL ESTIMATE = 0.6211
ITERATION 0 LOG OF LIKELIHOOD FUNCTION = -63.037

ITERATION 1 ESTIMATES
  0.45375      0.92076      0.43035     -0.28835     -0.23416E-01   1.3330
  1.6059      -1.7546      -3.7958
ITERATION 1 LOG OF LIKELIHOOD FUNCTION = -54.139

ITERATION 2 ESTIMATES
  0.55298      1.0944       0.50979     -0.32984     -0.25855E-01   2.1655
  2.0427      -2.2551      -4.7103
ITERATION 2 LOG OF LIKELIHOOD FUNCTION = -53.370

ITERATION 3 ESTIMATES
  0.58166      1.1250       0.52500     -0.33987     -0.26178E-01   2.5635
  2.1706      -2.3799      -5.1361
ITERATION 3 LOG OF LIKELIHOOD FUNCTION = -53.304

ITERATION 4 ESTIMATES
  0.58362      1.1261       0.52605     -0.34139     -0.26129E-01   2.6239
  2.1869      -2.3942      -5.2003
ITERATION 4 LOG OF LIKELIHOOD FUNCTION = -53.303

ITERATION 5 ESTIMATES
  0.58364      1.1261       0.52606     -0.34142     -0.26127E-01   2.6250
  2.1872      -2.3945      -5.2014

                           ASYMPTOTIC                                 WEIGHTED
VARIABLE    ESTIMATED      STANDARD       T-RATIO      ELASTICITY     AGGREGATE
  NAME     COEFFICIENT       ERROR                      AT MEANS      ELASTICITY
PUB12       0.58364        0.68778        0.84858      0.93986E-01    0.91051E-01
PUB34       1.1261         0.76820        1.4659       0.11827        0.96460E-01
PUB5        0.52606        1.2693         0.41445      0.73664E-02    0.69375E-02
PRIV       -0.34142        0.78299       -0.43605     -0.11952E-01   -0.12037E-01
YEARS      -0.26127E-01    0.26934E-01   -0.97006     -0.73996E-01   -0.68592E-01
SCHOOL      2.6250         1.4101         1.8616       0.10108        0.28999E-01
LOGINC      2.1872         0.78781        2.7763       7.2529         6.7561
PTCON      -2.3945         1.0813        -2.2145      -5.5262        -5.1745
CONSTANT   -5.2014         7.5503        -0.68890     -1.7298        -1.6137
SCALE FACTOR = 0.22197

VARIABLE    MARGINAL       ----- PROBABILITIES FOR A TYPICAL CASE -----
  NAME       EFFECT         CASE        X=0         X=1        MARGINAL
                           VALUES                               EFFECT
PUB12       0.12955        0.0000     0.44231     0.58706      0.14476
PUB34       0.24996        0.0000     0.44231     0.70978      0.26747
PUB5        0.11677        0.0000     0.44231     0.57304      0.13073
PRIV       -0.75785E-01    0.0000     0.44231     0.36049     -0.81814E-01
YEARS      -0.57995E-02    8.5158
SCHOOL      0.58267        0.0000     0.44231     0.91631      0.47400
LOGINC      0.48548        9.9711
PTCON      -0.53150        6.9395

LOG-LIKELIHOOD FUNCTION = -53.303
LOG-LIKELIHOOD(0) = -63.037
LIKELIHOOD RATIO TEST = 19.4681 WITH 8 D.F.   P-VALUE= 0.01255

ESTRELLA R-SQUARE      0.19956
MADDALA R-SQUARE       0.18529
CRAGG-UHLER R-SQUARE   0.25218
MCFADDEN R-SQUARE      0.15442
   ADJUSTED FOR DEGREES OF FREEDOM   0.75759E-01
   APPROXIMATELY F-DISTRIBUTED       0.20544 WITH 8 AND 9 D.F.
CHOW R-SQUARE          0.17197

        PREDICTION SUCCESS TABLE
                     ACTUAL
                   0        1
             0    18.       7.
PREDICTED    1    18.      52.

NUMBER OF RIGHT PREDICTIONS = 70.0
PERCENTAGE OF RIGHT PREDICTIONS = 0.73684
NAIVE MODEL PERCENTAGE OF RIGHT PREDICTIONS = 0.62105

EXPECTED OBSERVATIONS AT 0 = 36.0   OBSERVED = 36.0
EXPECTED OBSERVATIONS AT 1 = 59.0   OBSERVED = 59.0

SUM OF SQUARED "RESIDUALS" = 18.513
WEIGHTED SUM OF SQUARED "RESIDUALS" = 86.839

HENSHER-JOHNSON PREDICTION SUCCESS TABLE
                                                 OBSERVED    OBSERVED
                           PREDICTED CHOICE        COUNT       SHARE
ACTUAL                        0          1
   0                       17.591     18.409      36.000       0.379
   1                       18.409     40.591      59.000       0.621
PREDICTED COUNT            36.000     59.000      95.000       1.000
PREDICTED SHARE             0.379      0.621       1.000
PROP. SUCCESSFUL            0.489      0.688       0.612
SUCCESS INDEX               0.110      0.067       0.083
PROPORTIONAL ERROR          0.000      0.000
NORMALIZED SUCCESS INDEX    0.177

|_* Set the characteristics for an individual for all
|_* explanatory variables in the logit regression.
|_SAMPLE 1 1
|_GENR YESVM=0
|_GENR PUB12=1
|_GENR PUB34=0
|_GENR PUB5=0
|_GENR PRIV=0
|_GENR YEARS=8
|_GENR LOGINC=LOG(21000)
|_GENR PTCON=LOG(1000)
|_* NOT a school teacher.
|_GENR SCHOOL=0
|_* Predict the probability of voting yes.
|_FC YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA &
|      MODEL=LOGIT PREDICT=P

DEPENDENT VARIABLE = YESVM   1 OBSERVATIONS
REGRESSION COEFFICIENTS
  0.583639078557       1.12611043844        0.526057826339
 -0.341421819258      -0.261274995333E-01   2.62502265170
  2.18718615952       -2.39447685851       -5.20142641173

MEAN ERROR = -0.59874
SUM-SQUARED ERRORS = 0.35849
MEAN SQUARE ERROR = 0.35849
MEAN ABSOLUTE ERROR= 0.59874
ROOT MEAN SQUARE ERROR = 0.59874
MEAN SQUARED PERCENTAGE ERROR= 0.0000
THEIL INEQUALITY COEFFICIENT U = 0.000
  DECOMPOSITION
    PROPORTION DUE TO BIAS = 1.0000
    PROPORTION DUE TO VARIANCE = 0.0000
    PROPORTION DUE TO COVARIANCE = 0.0000
  DECOMPOSITION
    PROPORTION DUE TO BIAS = 1.0000
    PROPORTION DUE TO REGRESSION = 0.0000
    PROPORTION DUE TO DISTURBANCE = 0.0000

|_* Print the probability
|_PRINT P
    P
  0.5987397

|_* Now predict the probability of voting yes for an individual that
|_* is a school teacher, but all other characteristics the same.
|_GENR SCHOOL=1
|_FC YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA &
|      MODEL=LOGIT PREDICT=P

DEPENDENT VARIABLE = YESVM   1 OBSERVATIONS
REGRESSION COEFFICIENTS
  0.583639078557       1.12611043844        0.526057826339
 -0.341421819258      -0.261274995333E-01   2.62502265170
  2.18718615952       -2.39447685851       -5.20142641173

MEAN ERROR = -0.95370
SUM-SQUARED ERRORS = 0.90955
MEAN SQUARE ERROR = 0.90955
MEAN ABSOLUTE ERROR= 0.95370
ROOT MEAN SQUARE ERROR = 0.95370
MEAN SQUARED PERCENTAGE ERROR= 0.0000
THEIL INEQUALITY COEFFICIENT U = 0.000
  DECOMPOSITION
    PROPORTION DUE TO BIAS = 1.0000
    PROPORTION DUE TO VARIANCE = 0.0000
    PROPORTION DUE TO COVARIANCE = 0.0000
  DECOMPOSITION
    PROPORTION DUE TO BIAS = 1.0000
    PROPORTION DUE TO REGRESSION = 0.0000
    PROPORTION DUE TO DISTURBANCE = 0.0000

|_PRINT P
    P
  0.9537014
|_STOP
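The two probabilities printed by SHAZAM in Example 1 can be cross-checked by hand. The following Python sketch is an independent check, not part of the SHAZAM run. It assumes the standard binary logit form P = 1 / (1 + exp(-x'b)), with the coefficient values copied from the REGRESSION COEFFICIENTS listing and the CONSTANT as the ninth coefficient. The helper name `yes_probability` is hypothetical.

```python
import math

# Coefficients copied from the REGRESSION COEFFICIENTS listing of the FC command.
beta = {
    "PUB12": 0.583639078557,
    "PUB34": 1.12611043844,
    "PUB5": 0.526057826339,
    "PRIV": -0.341421819258,
    "YEARS": -0.0261274995333,
    "SCHOOL": 2.62502265170,
    "LOGINC": 2.18718615952,
    "PTCON": -2.39447685851,
}
constant = -5.20142641173


def yes_probability(x):
    """Logit probability of a yes vote (hypothetical helper, assumes the
    standard logistic form 1 / (1 + exp(-z)) with z = constant + x'b)."""
    z = constant + sum(beta[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-z))


# Characteristics set by the GENR commands in Example 1.
voter = {
    "PUB12": 1.0, "PUB34": 0.0, "PUB5": 0.0, "PRIV": 0.0,
    "YEARS": 8.0, "SCHOOL": 0.0,
    "LOGINC": math.log(21000.0), "PTCON": math.log(1000.0),
}

p_non_teacher = yes_probability(voter)
p_teacher = yes_probability({**voter, "SCHOOL": 1.0})

print(f"non-teacher: {p_non_teacher:.4f}")
print(f"teacher    : {p_teacher:.4f}")
```

Under these assumptions the values should agree with the 0.5987 and 0.9537 that SHAZAM prints for P.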
Predicting Probabilities - Example 2 - SHAZAM output

|_SAMPLE 1 95
|_READ (school.txt) PUB12 PUB34 PUB5 PRIV YEARS SCHOOL &
|      LOGINC PTCON YESVM
UNIT 88 IS NOW ASSIGNED TO: school.txt
   9 VARIABLES AND   95 OBSERVATIONS STARTING AT OBS   1

|_LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA

LOGIT ANALYSIS   DEPENDENT VARIABLE =YESVM   CHOICES = 2
   95. TOTAL OBSERVATIONS
   59. OBSERVATIONS AT ONE
   36. OBSERVATIONS AT ZERO
   25 MAXIMUM ITERATIONS
CONVERGENCE TOLERANCE =0.00100

LOG OF LIKELIHOOD WITH CONSTANT TERM ONLY = -63.037
BINOMIAL ESTIMATE = 0.6211
ITERATION 0 LOG OF LIKELIHOOD FUNCTION = -63.037

ITERATION 1 ESTIMATES
  0.45375      0.92076      0.43035     -0.28835     -0.23416E-01   1.3330
  1.6059      -1.7546      -3.7958
ITERATION 1 LOG OF LIKELIHOOD FUNCTION = -54.139

ITERATION 2 ESTIMATES
  0.55298      1.0944       0.50979     -0.32984     -0.25855E-01   2.1655
  2.0427      -2.2551      -4.7103
ITERATION 2 LOG OF LIKELIHOOD FUNCTION = -53.370

ITERATION 3 ESTIMATES
  0.58166      1.1250       0.52500     -0.33987     -0.26178E-01   2.5635
  2.1706      -2.3799      -5.1361
ITERATION 3 LOG OF LIKELIHOOD FUNCTION = -53.304

ITERATION 4 ESTIMATES
  0.58362      1.1261       0.52605     -0.34139     -0.26129E-01   2.6239
  2.1869      -2.3942      -5.2003
ITERATION 4 LOG OF LIKELIHOOD FUNCTION = -53.303

ITERATION 5 ESTIMATES
  0.58364      1.1261       0.52606     -0.34142     -0.26127E-01   2.6250
  2.1872      -2.3945      -5.2014

                           ASYMPTOTIC                                 WEIGHTED
VARIABLE    ESTIMATED      STANDARD       T-RATIO      ELASTICITY     AGGREGATE
  NAME     COEFFICIENT       ERROR                      AT MEANS      ELASTICITY
PUB12       0.58364        0.68778        0.84858      0.93986E-01    0.91051E-01
PUB34       1.1261         0.76820        1.4659       0.11827        0.96460E-01
PUB5        0.52606        1.2693         0.41445      0.73664E-02    0.69375E-02
PRIV       -0.34142        0.78299       -0.43605     -0.11952E-01   -0.12037E-01
YEARS      -0.26127E-01    0.26934E-01   -0.97006     -0.73996E-01   -0.68592E-01
SCHOOL      2.6250         1.4101         1.8616       0.10108        0.28999E-01
LOGINC      2.1872         0.78781        2.7763       7.2529         6.7561
PTCON      -2.3945         1.0813        -2.2145      -5.5262        -5.1745
CONSTANT   -5.2014         7.5503        -0.68890     -1.7298        -1.6137
SCALE FACTOR = 0.22197

VARIABLE    MARGINAL       ----- PROBABILITIES FOR A TYPICAL CASE -----
  NAME       EFFECT         CASE        X=0         X=1        MARGINAL
                           VALUES                               EFFECT
PUB12       0.12955        0.0000     0.44231     0.58706      0.14476
PUB34       0.24996        0.0000     0.44231     0.70978      0.26747
PUB5        0.11677        0.0000     0.44231     0.57304      0.13073
PRIV       -0.75785E-01    0.0000     0.44231     0.36049     -0.81814E-01
YEARS      -0.57995E-02    8.5158
SCHOOL      0.58267        0.0000     0.44231     0.91631      0.47400
LOGINC      0.48548        9.9711
PTCON      -0.53150        6.9395

LOG-LIKELIHOOD FUNCTION = -53.303
LOG-LIKELIHOOD(0) = -63.037
LIKELIHOOD RATIO TEST = 19.4681 WITH 8 D.F.   P-VALUE= 0.01255

ESTRELLA R-SQUARE      0.19956
MADDALA R-SQUARE       0.18529
CRAGG-UHLER R-SQUARE   0.25218
MCFADDEN R-SQUARE      0.15442
   ADJUSTED FOR DEGREES OF FREEDOM   0.75759E-01
   APPROXIMATELY F-DISTRIBUTED       0.20544 WITH 8 AND 9 D.F.
CHOW R-SQUARE          0.17197

        PREDICTION SUCCESS TABLE
                     ACTUAL
                   0        1
             0    18.       7.
PREDICTED    1    18.      52.

NUMBER OF RIGHT PREDICTIONS = 70.0
PERCENTAGE OF RIGHT PREDICTIONS = 0.73684
NAIVE MODEL PERCENTAGE OF RIGHT PREDICTIONS = 0.62105

EXPECTED OBSERVATIONS AT 0 = 36.0   OBSERVED = 36.0
EXPECTED OBSERVATIONS AT 1 = 59.0   OBSERVED = 59.0

SUM OF SQUARED "RESIDUALS" = 18.513
WEIGHTED SUM OF SQUARED "RESIDUALS" = 86.839

HENSHER-JOHNSON PREDICTION SUCCESS TABLE
                                                 OBSERVED    OBSERVED
                           PREDICTED CHOICE        COUNT       SHARE
ACTUAL                        0          1
   0                       17.591     18.409      36.000       0.379
   1                       18.409     40.591      59.000       0.621
PREDICTED COUNT            36.000     59.000      95.000       1.000
PREDICTED SHARE             0.379      0.621       1.000
PROP. SUCCESSFUL            0.489      0.688       0.612
SUCCESS INDEX               0.110      0.067       0.083
PROPORTIONAL ERROR          0.000      0.000
NORMALIZED SUCCESS INDEX    0.177

|_* Prediction exercise
|_GEN1 K=$K
..NOTE..CURRENT VALUE OF $K = 9.0000
|_GENR ONE=1
|_* Save the modes of the variables in XVAL.
|_STAT PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON ONE / MODES=XVAL
NAME        N     MEAN          ST. DEV     VARIANCE      MINIMUM    MAXIMUM
PUB12      95    0.48421        0.50240     0.25241       0.0000     1.0000
PUB34      95    0.31579        0.46730     0.21837       0.0000     1.0000
PUB5       95    0.42105E-01    0.20189     0.40761E-01   0.0000     1.0000
PRIV       95    0.10526        0.30852     0.95185E-01   0.0000     1.0000
YEARS      95    8.5158         9.5158      90.550        1.0000     49.000
SCHOOL     95    0.11579        0.32167     0.10347       0.0000     1.0000
LOGINC     95    9.9711         0.41175     0.16954       8.2940     10.820
PTCON      95    6.9395         0.31692     0.10044       5.9915     7.4955
ONE        95    1.0000         0.0000      0.0000        1.0000     1.0000

|_* Get the means of the "continuous variables".
|_STAT YEARS LOGINC PTCON / MEAN=MU PMEDIAN
NAME        N     MEAN          ST. DEV     VARIANCE      MINIMUM    MAXIMUM
YEARS      95    8.5158         9.5158      90.550        1.0000     49.000
LOGINC     95    9.9711         0.41175     0.16954       8.2940     10.820
PTCON      95    6.9395         0.31692     0.10044       5.9915     7.4955

VARIABLE = YEARS
MEDIAN = 5.0000   LOWER 25%= 3.0000   UPPER 25%= 10.000
INTERQUARTILE RANGE= 7.000
MODE = 3.0000 WITH 23 OBSERVATIONS

VARIABLE = LOGINC
MEDIAN = 10.021   LOWER 25%= 9.7700   UPPER 25%= 10.222
INTERQUARTILE RANGE= 0.4520
MODE = 10.021 WITH 31 OBSERVATIONS

VARIABLE = PTCON
MEDIAN = 7.0475   LOWER 25%= 6.7452   UPPER 25%= 7.0475
INTERQUARTILE RANGE= 0.3023
MODE = 7.0475 WITH 46 OBSERVATIONS

|_GEN1 XVAL:5=MU:1
|_GEN1 XVAL:7=MU:2
|_GEN1 XVAL:8=MU:3
|_PRINT XVAL
    XVAL
   0.000000   0.000000   0.000000   0.000000   8.515789
   0.000000   9.971069   6.939496   1.000000

|_SET NODOECHO
|_* Predict the probability of a yes vote for three income levels:
|_* the lower quartile, the mean and the upper quartile -
|_* with the dummy variables at their mode values and the
|_* other variables at their mean values.
|_* (The quartiles are listed with the PMEDIAN option on the
|_* STAT command above).
|_SAMPLE 1 3
|_DO #=1,K
|_  GENR XVAL#=XVAL:#
|_ENDO
****** EXECUTION BEGINNING FOR DO LOOP # = 1
****** EXECUTION FINISHED FOR DO LOOP #= 9
|_* Income is XVAL7 -- set the three values.
|_GEN1 XVAL7:1=9.77
|_GEN1 XVAL7:2=MU:2
|_GEN1 XVAL7:3=10.22
|_* The FC command is used for prediction.
|_FC YESVM XVAL1-XVAL8 / COEF=BETA MODEL=LOGIT PREDICT=PHAT
REQUIRED MEMORY IS PAR= 9 CURRENT PAR= 3000

DEPENDENT VARIABLE = YESVM   3 OBSERVATIONS
REGRESSION COEFFICIENTS
  0.583639078557       1.12611043844        0.526057826339
 -0.341421819258      -0.261274995333E-01   2.62502265170
  2.18718615952       -2.39447685851       -5.20142641173

MEAN ERROR = -0.11933
SUM-SQUARED ERRORS = 0.96724
MEAN SQUARE ERROR = 0.32241
MEAN ABSOLUTE ERROR= 0.56057
ROOT MEAN SQUARE ERROR = 0.56781
MEAN SQUARED PERCENTAGE ERROR= 1460.2
THEIL INEQUALITY COEFFICIENT U = 0.727
  DECOMPOSITION
    PROPORTION DUE TO BIAS = 0.44165E-01
    PROPORTION DUE TO VARIANCE = 0.43245
    PROPORTION DUE TO COVARIANCE = 0.52338
  DECOMPOSITION
    PROPORTION DUE TO BIAS = 0.44165E-01
    PROPORTION DUE TO REGRESSION = 0.73713
    PROPORTION DUE TO DISTURBANCE = 0.21870

|_* Print the probabilites
|_GENR INC=EXP(XVAL7)/1000
|_PRINT INC PHAT
      INC          PHAT
   17.50077     0.3381440
   21.39836     0.4423082
   27.44667     0.5775339
|_STOP
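The INC and PHAT columns printed in Example 2 can be checked the same way. The Python sketch below is an independent check, not part of the SHAZAM run. It assumes the standard logit form P = 1 / (1 + exp(-x'b)) and uses the rounded "typical case" values from the PRINT XVAL listing, so agreement is expected only to about four decimal places. The names `typical`, `LOGINC_INDEX` and `results` are hypothetical.

```python
import math

# Slope coefficients in the order PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON,
# copied from the REGRESSION COEFFICIENTS listing; the ninth value is the constant.
beta = [0.583639078557, 1.12611043844, 0.526057826339, -0.341421819258,
        -0.0261274995333, 2.62502265170, 2.18718615952, -2.39447685851]
constant = -5.20142641173

# "Typical" voter from PRINT XVAL: dummies at their modes (0),
# YEARS, LOGINC and PTCON at their means (rounded as printed).
typical = [0.0, 0.0, 0.0, 0.0, 8.515789, 0.0, 9.971069, 6.939496]
LOGINC_INDEX = 6

# Lower quartile, mean and upper quartile of log income, as set by GEN1 XVAL7.
log_income_levels = [9.77, 9.971069, 10.22]

results = []
for log_inc in log_income_levels:
    x = list(typical)
    x[LOGINC_INDEX] = log_inc
    z = constant + sum(b * v for b, v in zip(beta, x))
    phat = 1.0 / (1.0 + math.exp(-z))
    inc = math.exp(log_inc) / 1000.0  # income in thousands of dollars
    results.append((inc, phat))

for inc, phat in results:
    print(f"{inc:9.3f}  {phat:.4f}")
```

Only the LOGINC entry changes between the three cases, so the rise in the predicted probability from about 0.34 to about 0.58 is attributable to income alone.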