Logit Model - Predicting Probabilities

It can be informative to tabulate response probabilities for various levels of the explanatory variables.

Example 1

For the school budget voting model, consider predicting the probability of a yes vote for a school teacher and for a non-school teacher, both with 1 or 2 children in public school, $21,000 income, 8 years of residency and property taxes of $1,000. The SHAZAM commands (echoed after the |_ prompt in the Example 1 output listing) use the COEF= option on the LOGIT command to save the estimated coefficients in the variable BETA. With the sample set to a single observation, GENR commands assign the individual's characteristics; income and property taxes enter the model in logarithms, so they are set with LOG(21000) and LOG(1000). The FC command with the MODEL=LOGIT option then computes the predicted probability, and the PREDICT= option saves it in the variable P.

The SHAZAM output is listed below. The results show that the probability of a yes vote for a school teacher is 95% compared to 60% for a non-school teacher, given otherwise identical characteristics of 1 or 2 children in public school, $21,000 income, 8 years of residency and property taxes of $1,000.

Example 2

For the school budget voting model, how does the probability of a yes vote vary for individuals with income at the lower quartile, the mean and the upper quartile, with "typical" characteristics on all other variables? The calculations are implemented in the SHAZAM commands echoed in the Example 2 output listing.
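The income variable is included in the model in log-transformed form. The predictions use the lower quartile, mean and upper quartile of the logarithm of income. Antilogs are used to express the income values in levels. Note that, for the mean, this gives the geometric mean. The SHAZAM output is listed in the Example 2 output section. The results, taken from the INC and PHAT columns printed at the end of that output, are summarized in the table below.

Income level         Income     Probability of a yes vote
Lower quartile       $17,501    0.3381
Mean (geometric)     $21,398    0.4423
Upper quartile       $27,447    0.5775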
Note: The above probabilities are for a voter who is not a school teacher, has no children in public or private school, has 8.5 years of residency and pays property taxes of $1,032 (the geometric mean of the tax variable).
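As a cross-check outside SHAZAM, the three probabilities can be recomputed by hand. The Python sketch below assumes the standard logit form P = 1 / (1 + exp(-x'b)). The coefficients are copied from the REGRESSION COEFFICIENTS block printed by the FC command, and the "typical" values come from the PRINT XVAL output. The names BETA, logit_probability and income_scenarios are illustrative only and are not part of SHAZAM.

```python
import math

# Coefficients copied from the FC output, in the order
# PUB12, PUB34, PUB5, PRIV, YEARS, SCHOOL, LOGINC, PTCON, CONSTANT.
BETA = [0.583639078557, 1.12611043844, 0.526057826339, -0.341421819258,
        -0.0261274995333, 2.62502265170, 2.18718615952, -2.39447685851,
        -5.20142641173]

# "Typical" voter from PRINT XVAL: dummies at their modes (0), YEARS and
# PTCON at their means, and a trailing 1.0 for the constant term.
TYPICAL = [0.0, 0.0, 0.0, 0.0, 8.515789, 0.0, 9.971069, 6.939496, 1.0]
LOGINC_POSITION = 6  # zero-based position of LOGINC in the vectors above


def logit_probability(x, beta=BETA):
    """Logistic function evaluated at the linear index x'beta."""
    index = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-index))


def income_scenarios():
    """Return (label, income in dollars, probability) for the three cases."""
    rows = []
    for label, log_income in [("lower quartile", 9.77),
                              ("mean", 9.971069),
                              ("upper quartile", 10.22)]:
        x = list(TYPICAL)
        x[LOGINC_POSITION] = log_income
        # The antilog converts log income back to levels; for the mean of
        # the logs this is the geometric mean of income.
        rows.append((label, math.exp(log_income), logit_probability(x)))
    return rows


for label, income, prob in income_scenarios():
    print(f"{label:15s} income ${income:,.0f}  P(yes) = {prob:.4f}")
```

The printed probabilities should agree with the PHAT values in the SHAZAM listing up to rounding of the inputs, since the quartiles are entered here to the same precision used in the GEN1 commands.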
[SHAZAM Guide home]
Predicting Probabilities - Example 1 - SHAZAM output

|_SAMPLE 1 95
|_READ (school.txt) PUB12 PUB34 PUB5 PRIV YEARS SCHOOL &
| LOGINC PTCON YESVM
UNIT 88 IS NOW ASSIGNED TO: school.txt
9 VARIABLES AND 95 OBSERVATIONS STARTING AT OBS 1
|_LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA
LOGIT ANALYSIS DEPENDENT VARIABLE =YESVM CHOICES = 2
95. TOTAL OBSERVATIONS
59. OBSERVATIONS AT ONE
36. OBSERVATIONS AT ZERO
25 MAXIMUM ITERATIONS
CONVERGENCE TOLERANCE =0.00100
LOG OF LIKELIHOOD WITH CONSTANT TERM ONLY = -63.037
BINOMIAL ESTIMATE = 0.6211
ITERATION 0 LOG OF LIKELIHOOD FUNCTION = -63.037
ITERATION 1 ESTIMATES
0.45375 0.92076 0.43035 -0.28835 -0.23416E-01 1.3330
1.6059 -1.7546 -3.7958
ITERATION 1 LOG OF LIKELIHOOD FUNCTION = -54.139
ITERATION 2 ESTIMATES
0.55298 1.0944 0.50979 -0.32984 -0.25855E-01 2.1655
2.0427 -2.2551 -4.7103
ITERATION 2 LOG OF LIKELIHOOD FUNCTION = -53.370
ITERATION 3 ESTIMATES
0.58166 1.1250 0.52500 -0.33987 -0.26178E-01 2.5635
2.1706 -2.3799 -5.1361
ITERATION 3 LOG OF LIKELIHOOD FUNCTION = -53.304
ITERATION 4 ESTIMATES
0.58362 1.1261 0.52605 -0.34139 -0.26129E-01 2.6239
2.1869 -2.3942 -5.2003
ITERATION 4 LOG OF LIKELIHOOD FUNCTION = -53.303
ITERATION 5 ESTIMATES
0.58364 1.1261 0.52606 -0.34142 -0.26127E-01 2.6250
2.1872 -2.3945 -5.2014
                         ASYMPTOTIC                              WEIGHTED
VARIABLE   ESTIMATED     STANDARD      T-RATIO     ELASTICITY    AGGREGATE
NAME       COEFFICIENT   ERROR                     AT MEANS      ELASTICITY
PUB12      0.58364       0.68778       0.84858     0.93986E-01   0.91051E-01
PUB34      1.1261        0.76820       1.4659      0.11827       0.96460E-01
PUB5       0.52606       1.2693        0.41445     0.73664E-02   0.69375E-02
PRIV      -0.34142       0.78299      -0.43605    -0.11952E-01  -0.12037E-01
YEARS     -0.26127E-01   0.26934E-01  -0.97006    -0.73996E-01  -0.68592E-01
SCHOOL     2.6250        1.4101        1.8616      0.10108       0.28999E-01
LOGINC     2.1872        0.78781       2.7763      7.2529        6.7561
PTCON     -2.3945        1.0813       -2.2145     -5.5262       -5.1745
CONSTANT  -5.2014        7.5503       -0.68890    -1.7298       -1.6137
SCALE FACTOR = 0.22197
VARIABLE   MARGINAL       ----- PROBABILITIES FOR A TYPICAL CASE -----
NAME       EFFECT         CASE       X=0        X=1        MARGINAL
                          VALUES                           EFFECT
PUB12      0.12955        0.0000     0.44231    0.58706    0.14476
PUB34      0.24996        0.0000     0.44231    0.70978    0.26747
PUB5       0.11677        0.0000     0.44231    0.57304    0.13073
PRIV      -0.75785E-01    0.0000     0.44231    0.36049   -0.81814E-01
YEARS     -0.57995E-02    8.5158
SCHOOL     0.58267        0.0000     0.44231    0.91631    0.47400
LOGINC     0.48548        9.9711
PTCON     -0.53150        6.9395
LOG-LIKELIHOOD FUNCTION = -53.303
LOG-LIKELIHOOD(0) = -63.037
LIKELIHOOD RATIO TEST = 19.4681 WITH 8 D.F. P-VALUE= 0.01255
ESTRELLA R-SQUARE 0.19956
MADDALA R-SQUARE 0.18529
CRAGG-UHLER R-SQUARE 0.25218
MCFADDEN R-SQUARE 0.15442
ADJUSTED FOR DEGREES OF FREEDOM 0.75759E-01
APPROXIMATELY F-DISTRIBUTED 0.20544 WITH 8 AND 9 D.F.
CHOW R-SQUARE 0.17197
PREDICTION SUCCESS TABLE
                   ACTUAL
                  0       1
            0    18.      7.
PREDICTED   1    18.     52.
NUMBER OF RIGHT PREDICTIONS = 70.0
PERCENTAGE OF RIGHT PREDICTIONS = 0.73684
NAIVE MODEL PERCENTAGE OF RIGHT PREDICTIONS = 0.62105
EXPECTED OBSERVATIONS AT 0 = 36.0 OBSERVED = 36.0
EXPECTED OBSERVATIONS AT 1 = 59.0 OBSERVED = 59.0
SUM OF SQUARED "RESIDUALS" = 18.513
WEIGHTED SUM OF SQUARED "RESIDUALS" = 86.839
HENSHER-JOHNSON PREDICTION SUCCESS TABLE
                                             OBSERVED   OBSERVED
                       PREDICTED CHOICE       COUNT      SHARE
ACTUAL                    0          1
0                       17.591     18.409     36.000     0.379
1                       18.409     40.591     59.000     0.621
PREDICTED COUNT         36.000     59.000     95.000     1.000
PREDICTED SHARE          0.379      0.621      1.000
PROP. SUCCESSFUL         0.489      0.688      0.612
SUCCESS INDEX            0.110      0.067      0.083
PROPORTIONAL ERROR       0.000      0.000
NORMALIZED SUCCESS INDEX                       0.177
|_* Set the characteristics for an individual for all
|_* explanatory variables in the logit regression.
|_SAMPLE 1 1
|_GENR YESVM=0
|_GENR PUB12=1
|_GENR PUB34=0
|_GENR PUB5=0
|_GENR PRIV=0
|_GENR YEARS=8
|_GENR LOGINC=LOG(21000)
|_GENR PTCON=LOG(1000)
|_* NOT a school teacher.
|_GENR SCHOOL=0
|_* Predict the probability of voting yes.
|_FC YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA &
| MODEL=LOGIT PREDICT=P
DEPENDENT VARIABLE = YESVM 1 OBSERVATIONS
REGRESSION COEFFICIENTS
0.583639078557 1.12611043844 0.526057826339 -0.341421819258
-0.261274995333E-01 2.62502265170 2.18718615952 -2.39447685851
-5.20142641173
MEAN ERROR = -0.59874
SUM-SQUARED ERRORS = 0.35849
MEAN SQUARE ERROR = 0.35849
MEAN ABSOLUTE ERROR= 0.59874
ROOT MEAN SQUARE ERROR = 0.59874
MEAN SQUARED PERCENTAGE ERROR= 0.0000
THEIL INEQUALITY COEFFICIENT U = 0.000
DECOMPOSITION
PROPORTION DUE TO BIAS = 1.0000
PROPORTION DUE TO VARIANCE = 0.0000
PROPORTION DUE TO COVARIANCE = 0.0000
DECOMPOSITION
PROPORTION DUE TO BIAS = 1.0000
PROPORTION DUE TO REGRESSION = 0.0000
PROPORTION DUE TO DISTURBANCE = 0.0000
|_* Print the probability
|_PRINT P
P
0.5987397
|_* Now predict the probability of voting yes for an individual that
|_* is a school teacher, but all other characteristics the same.
|_GENR SCHOOL=1
|_FC YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA &
| MODEL=LOGIT PREDICT=P
DEPENDENT VARIABLE = YESVM 1 OBSERVATIONS
REGRESSION COEFFICIENTS
0.583639078557 1.12611043844 0.526057826339 -0.341421819258
-0.261274995333E-01 2.62502265170 2.18718615952 -2.39447685851
-5.20142641173
MEAN ERROR = -0.95370
SUM-SQUARED ERRORS = 0.90955
MEAN SQUARE ERROR = 0.90955
MEAN ABSOLUTE ERROR= 0.95370
ROOT MEAN SQUARE ERROR = 0.95370
MEAN SQUARED PERCENTAGE ERROR= 0.0000
THEIL INEQUALITY COEFFICIENT U = 0.000
DECOMPOSITION
PROPORTION DUE TO BIAS = 1.0000
PROPORTION DUE TO VARIANCE = 0.0000
PROPORTION DUE TO COVARIANCE = 0.0000
DECOMPOSITION
PROPORTION DUE TO BIAS = 1.0000
PROPORTION DUE TO REGRESSION = 0.0000
PROPORTION DUE TO DISTURBANCE = 0.0000
|_PRINT P
P
0.9537014
|_STOP
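
The two probabilities printed in the listing above can be checked by hand. The Python sketch below is a verification aid, not part of the SHAZAM run. It assumes the standard logit form P = 1 / (1 + exp(-x'b)) and uses the coefficient values printed by the FC command. The names COEF and yes_probability are illustrative.

```python
import math

# Estimated coefficients as printed in the REGRESSION COEFFICIENTS block.
COEF = {
    "PUB12": 0.583639078557, "PUB34": 1.12611043844, "PUB5": 0.526057826339,
    "PRIV": -0.341421819258, "YEARS": -0.0261274995333,
    "SCHOOL": 2.62502265170, "LOGINC": 2.18718615952,
    "PTCON": -2.39447685851, "CONSTANT": -5.20142641173,
}


def yes_probability(voter):
    """Predicted probability of a yes vote for one set of characteristics."""
    index = COEF["CONSTANT"] + sum(COEF[name] * value
                                   for name, value in voter.items())
    return 1.0 / (1.0 + math.exp(-index))


# Same characteristics as the GENR commands: 1 or 2 children in public
# school, 8 years of residency, $21,000 income and $1,000 property taxes
# (both entered in natural logs).
voter = {"PUB12": 1, "PUB34": 0, "PUB5": 0, "PRIV": 0, "YEARS": 8,
         "SCHOOL": 0, "LOGINC": math.log(21000), "PTCON": math.log(1000)}

p_non_teacher = yes_probability(voter)
p_teacher = yes_probability({**voter, "SCHOOL": 1})

print(f"non-teacher: {p_non_teacher:.4f}")
print(f"teacher:     {p_teacher:.4f}")
```

Only the SCHOOL dummy changes between the two calls, so the gap between the two probabilities isolates the effect of being a school teacher for this particular voter profile.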
Predicting Probabilities - Example 2 - SHAZAM output

|_SAMPLE 1 95
|_READ (school.txt) PUB12 PUB34 PUB5 PRIV YEARS SCHOOL &
| LOGINC PTCON YESVM
UNIT 88 IS NOW ASSIGNED TO: school.txt
9 VARIABLES AND 95 OBSERVATIONS STARTING AT OBS 1
|_LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA
LOGIT ANALYSIS DEPENDENT VARIABLE =YESVM CHOICES = 2
95. TOTAL OBSERVATIONS
59. OBSERVATIONS AT ONE
36. OBSERVATIONS AT ZERO
25 MAXIMUM ITERATIONS
CONVERGENCE TOLERANCE =0.00100
LOG OF LIKELIHOOD WITH CONSTANT TERM ONLY = -63.037
BINOMIAL ESTIMATE = 0.6211
ITERATION 0 LOG OF LIKELIHOOD FUNCTION = -63.037
ITERATION 1 ESTIMATES
0.45375 0.92076 0.43035 -0.28835 -0.23416E-01 1.3330
1.6059 -1.7546 -3.7958
ITERATION 1 LOG OF LIKELIHOOD FUNCTION = -54.139
ITERATION 2 ESTIMATES
0.55298 1.0944 0.50979 -0.32984 -0.25855E-01 2.1655
2.0427 -2.2551 -4.7103
ITERATION 2 LOG OF LIKELIHOOD FUNCTION = -53.370
ITERATION 3 ESTIMATES
0.58166 1.1250 0.52500 -0.33987 -0.26178E-01 2.5635
2.1706 -2.3799 -5.1361
ITERATION 3 LOG OF LIKELIHOOD FUNCTION = -53.304
ITERATION 4 ESTIMATES
0.58362 1.1261 0.52605 -0.34139 -0.26129E-01 2.6239
2.1869 -2.3942 -5.2003
ITERATION 4 LOG OF LIKELIHOOD FUNCTION = -53.303
ITERATION 5 ESTIMATES
0.58364 1.1261 0.52606 -0.34142 -0.26127E-01 2.6250
2.1872 -2.3945 -5.2014
                         ASYMPTOTIC                              WEIGHTED
VARIABLE   ESTIMATED     STANDARD      T-RATIO     ELASTICITY    AGGREGATE
NAME       COEFFICIENT   ERROR                     AT MEANS      ELASTICITY
PUB12      0.58364       0.68778       0.84858     0.93986E-01   0.91051E-01
PUB34      1.1261        0.76820       1.4659      0.11827       0.96460E-01
PUB5       0.52606       1.2693        0.41445     0.73664E-02   0.69375E-02
PRIV      -0.34142       0.78299      -0.43605    -0.11952E-01  -0.12037E-01
YEARS     -0.26127E-01   0.26934E-01  -0.97006    -0.73996E-01  -0.68592E-01
SCHOOL     2.6250        1.4101        1.8616      0.10108       0.28999E-01
LOGINC     2.1872        0.78781       2.7763      7.2529        6.7561
PTCON     -2.3945        1.0813       -2.2145     -5.5262       -5.1745
CONSTANT  -5.2014        7.5503       -0.68890    -1.7298       -1.6137
SCALE FACTOR = 0.22197
VARIABLE   MARGINAL       ----- PROBABILITIES FOR A TYPICAL CASE -----
NAME       EFFECT         CASE       X=0        X=1        MARGINAL
                          VALUES                           EFFECT
PUB12      0.12955        0.0000     0.44231    0.58706    0.14476
PUB34      0.24996        0.0000     0.44231    0.70978    0.26747
PUB5       0.11677        0.0000     0.44231    0.57304    0.13073
PRIV      -0.75785E-01    0.0000     0.44231    0.36049   -0.81814E-01
YEARS     -0.57995E-02    8.5158
SCHOOL     0.58267        0.0000     0.44231    0.91631    0.47400
LOGINC     0.48548        9.9711
PTCON     -0.53150        6.9395
LOG-LIKELIHOOD FUNCTION = -53.303
LOG-LIKELIHOOD(0) = -63.037
LIKELIHOOD RATIO TEST = 19.4681 WITH 8 D.F. P-VALUE= 0.01255
ESTRELLA R-SQUARE 0.19956
MADDALA R-SQUARE 0.18529
CRAGG-UHLER R-SQUARE 0.25218
MCFADDEN R-SQUARE 0.15442
ADJUSTED FOR DEGREES OF FREEDOM 0.75759E-01
APPROXIMATELY F-DISTRIBUTED 0.20544 WITH 8 AND 9 D.F.
CHOW R-SQUARE 0.17197
PREDICTION SUCCESS TABLE
                   ACTUAL
                  0       1
            0    18.      7.
PREDICTED   1    18.     52.
NUMBER OF RIGHT PREDICTIONS = 70.0
PERCENTAGE OF RIGHT PREDICTIONS = 0.73684
NAIVE MODEL PERCENTAGE OF RIGHT PREDICTIONS = 0.62105
EXPECTED OBSERVATIONS AT 0 = 36.0 OBSERVED = 36.0
EXPECTED OBSERVATIONS AT 1 = 59.0 OBSERVED = 59.0
SUM OF SQUARED "RESIDUALS" = 18.513
WEIGHTED SUM OF SQUARED "RESIDUALS" = 86.839
HENSHER-JOHNSON PREDICTION SUCCESS TABLE
                                             OBSERVED   OBSERVED
                       PREDICTED CHOICE       COUNT      SHARE
ACTUAL                    0          1
0                       17.591     18.409     36.000     0.379
1                       18.409     40.591     59.000     0.621
PREDICTED COUNT         36.000     59.000     95.000     1.000
PREDICTED SHARE          0.379      0.621      1.000
PROP. SUCCESSFUL         0.489      0.688      0.612
SUCCESS INDEX            0.110      0.067      0.083
PROPORTIONAL ERROR       0.000      0.000
NORMALIZED SUCCESS INDEX                       0.177
|_* Prediction exercise
|_GEN1 K=$K
..NOTE..CURRENT VALUE OF $K = 9.0000
|_GENR ONE=1
|_* Save the modes of the variables in XVAL.
|_STAT PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON ONE / MODES=XVAL
NAME       N    MEAN          ST. DEV       VARIANCE      MINIMUM    MAXIMUM
PUB12      95   0.48421       0.50240       0.25241       0.0000     1.0000
PUB34      95   0.31579       0.46730       0.21837       0.0000     1.0000
PUB5       95   0.42105E-01   0.20189       0.40761E-01   0.0000     1.0000
PRIV       95   0.10526       0.30852       0.95185E-01   0.0000     1.0000
YEARS      95   8.5158        9.5158        90.550        1.0000     49.000
SCHOOL     95   0.11579       0.32167       0.10347       0.0000     1.0000
LOGINC     95   9.9711        0.41175       0.16954       8.2940     10.820
PTCON      95   6.9395        0.31692       0.10044       5.9915     7.4955
ONE        95   1.0000        0.0000        0.0000        1.0000     1.0000
|_* Get the means of the "continuous variables".
|_STAT YEARS LOGINC PTCON / MEAN=MU PMEDIAN
NAME       N    MEAN          ST. DEV       VARIANCE      MINIMUM    MAXIMUM
YEARS      95   8.5158        9.5158        90.550        1.0000     49.000
LOGINC     95   9.9711        0.41175       0.16954       8.2940     10.820
PTCON      95   6.9395        0.31692       0.10044       5.9915     7.4955
VARIABLE = YEARS
MEDIAN = 5.0000
LOWER 25%= 3.0000 UPPER 25%= 10.000 INTERQUARTILE RANGE= 7.000
MODE = 3.0000 WITH 23 OBSERVATIONS
VARIABLE = LOGINC
MEDIAN = 10.021
LOWER 25%= 9.7700 UPPER 25%= 10.222 INTERQUARTILE RANGE= 0.4520
MODE = 10.021 WITH 31 OBSERVATIONS
VARIABLE = PTCON
MEDIAN = 7.0475
LOWER 25%= 6.7452 UPPER 25%= 7.0475 INTERQUARTILE RANGE= 0.3023
MODE = 7.0475 WITH 46 OBSERVATIONS
|_GEN1 XVAL:5=MU:1
|_GEN1 XVAL:7=MU:2
|_GEN1 XVAL:8=MU:3
|_PRINT XVAL
XVAL
0.000000 0.000000 0.000000 0.000000 8.515789
0.000000 9.971069 6.939496 1.000000
|_SET NODOECHO
|_* Predict the probability of a yes vote for three income levels:
|_* the lower quartile, the mean and the upper quartile -
|_* with the dummy variables at their mode values and the
|_* other variables at their mean values.
|_* (The quartiles are listed with the PMEDIAN option on the
|_* STAT command above).
|_SAMPLE 1 3
|_DO #=1,K
|_ GENR XVAL#=XVAL:#
|_ENDO
****** EXECUTION BEGINNING FOR DO LOOP # = 1
****** EXECUTION FINISHED FOR DO LOOP #= 9
|_* Income is XVAL7 -- set the three values.
|_GEN1 XVAL7:1=9.77
|_GEN1 XVAL7:2=MU:2
|_GEN1 XVAL7:3=10.22
|_* The FC command is used for prediction.
|_FC YESVM XVAL1-XVAL8 / COEF=BETA MODEL=LOGIT PREDICT=PHAT
REQUIRED MEMORY IS PAR= 9 CURRENT PAR= 3000
DEPENDENT VARIABLE = YESVM 3 OBSERVATIONS
REGRESSION COEFFICIENTS
0.583639078557 1.12611043844 0.526057826339 -0.341421819258
-0.261274995333E-01 2.62502265170 2.18718615952 -2.39447685851
-5.20142641173
MEAN ERROR = -0.11933
SUM-SQUARED ERRORS = 0.96724
MEAN SQUARE ERROR = 0.32241
MEAN ABSOLUTE ERROR= 0.56057
ROOT MEAN SQUARE ERROR = 0.56781
MEAN SQUARED PERCENTAGE ERROR= 1460.2
THEIL INEQUALITY COEFFICIENT U = 0.727
DECOMPOSITION
PROPORTION DUE TO BIAS = 0.44165E-01
PROPORTION DUE TO VARIANCE = 0.43245
PROPORTION DUE TO COVARIANCE = 0.52338
DECOMPOSITION
PROPORTION DUE TO BIAS = 0.44165E-01
PROPORTION DUE TO REGRESSION = 0.73713
PROPORTION DUE TO DISTURBANCE = 0.21870
|_* Print the probabilities
|_GENR INC=EXP(XVAL7)/1000
|_PRINT INC PHAT
INC PHAT
17.50077 0.3381440
21.39836 0.4423082
27.44667 0.5775339
|_STOP