SHAZAM Logit Prediction

### Logit Model - Predicting Probabilities

It may be interesting to tabulate response probabilities for various levels of the explanatory variables.

#### Example 1

For the school budget voting model, consider predicting the probability of a yes vote for a school teacher and a non-school teacher, both with 1 or 2 children in public school, \$21,000 income, 8 years of residency and property taxes of \$1,000.

The SHAZAM commands below use the `FC` command to predict probabilities.

```
SAMPLE 1 95
READ (school.txt) PUB12 PUB34 PUB5 PRIV YEARS SCHOOL &
     LOGINC PTCON YESVM
LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA
* Set the characteristics for an individual for all
* explanatory variables in the logit regression.
SAMPLE 1 1
GENR YESVM=0
GENR PUB12=1
GENR PUB34=0
GENR PUB5=0
GENR PRIV=0
GENR YEARS=8
GENR LOGINC=LOG(21000)
GENR PTCON=LOG(1000)
* NOT a school teacher.
GENR SCHOOL=0
* Predict the probability of voting yes.
FC YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA &
   MODEL=LOGIT PREDICT=P
* Print the probability
PRINT P
* Now predict the probability of voting yes for an individual that
* is a school teacher, but all other characteristics the same.
GENR SCHOOL=1
FC YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA &
   MODEL=LOGIT PREDICT=P
PRINT P
STOP
```

In the above commands, the `COEF=` option on the `LOGIT` command saves the estimated coefficients in the variable `BETA`. Following the logit estimation, the `SAMPLE 1 1` command specifies that one individual is to be analyzed in the subsequent commands. A set of `GENR` commands then defines the individual characteristics for every explanatory variable in the logit estimation. Note that the original first observation in the voting data set is now replaced by this "new" individual. Therefore, no further analysis that relies on the original data set can be implemented.

The `FC` command obtains the predicted probability of voting yes. Following this command, more `GENR` commands can be used to define the characteristics of another individual. An additional `FC` command will then calculate the probability of a yes vote for the new individual.

The SHAZAM output can be viewed.

The results show that the probability of a yes vote for a school teacher is 95% compared to 60% for a non-school teacher, given other identical characteristics of 1 or 2 children in public school, \$21,000 income, 8 years of residency and property taxes of \$1,000.
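As a cross-check on these two probabilities, the calculation can be reproduced outside SHAZAM. The Python sketch below is not part of the original guide. It assumes that the `MODEL=LOGIT` prediction is the standard logistic function of the linear index, P = 1/(1 + exp(-x'b)), with the constant stored as the last coefficient. The coefficient values are copied from the `FC` output listing, and the names `logit_probability` and `voter` are illustrative only.

```python
import math

# Coefficients copied from the FC output listing, in the order
# PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON CONSTANT.
BETA = [0.583639078557, 1.12611043844, 0.526057826339, -0.341421819258,
        -0.261274995333e-01, 2.62502265170, 2.18718615952, -2.39447685851,
        -5.20142641173]


def logit_probability(x, beta=BETA):
    """Return 1 / (1 + exp(-x'b)); a constant of 1 is appended to x.

    Assumption: this is the formula behind the MODEL=LOGIT prediction.
    """
    index = sum(b * v for b, v in zip(beta, list(x) + [1.0]))
    return 1.0 / (1.0 + math.exp(-index))


def voter(school):
    """Characteristics set by the GENR commands in Example 1."""
    # PUB12=1, PUB34=PUB5=PRIV=0, YEARS=8, LOGINC=log(21000), PTCON=log(1000)
    return [1, 0, 0, 0, 8, school, math.log(21000), math.log(1000)]


p_non_teacher = logit_probability(voter(school=0))
p_teacher = logit_probability(voter(school=1))
print(round(p_non_teacher, 4), round(p_teacher, 4))
```

Under the stated assumption, the two printed values should agree with the values of `P` printed in the SHAZAM output (0.5987397 and 0.9537014) to about four decimal places.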

#### Example 2

For the school budget voting model, how does the probability of a yes vote vary for individuals with income at the lower quartile, the mean and the upper quartile, with "typical" characteristics on all other variables? The calculations are implemented in the SHAZAM commands below.

```
SAMPLE 1 95
READ (school.txt) PUB12 PUB34 PUB5 PRIV YEARS SCHOOL &
     LOGINC PTCON YESVM
LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA
* Prediction exercise
GEN1 K=$K
GENR ONE=1
* Save the modes of the variables in XVAL.
STAT PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON ONE / MODES=XVAL
* Get the means of the "continuous variables".
STAT YEARS LOGINC PTCON / MEAN=MU PMEDIAN
GEN1 XVAL:5=MU:1
GEN1 XVAL:7=MU:2
GEN1 XVAL:8=MU:3
PRINT XVAL
SET NODOECHO
* Predict the probability of a yes vote for three income levels:
* the lower quartile, the mean and the upper quartile -
* with the dummy variables at their mode values and the
* other variables at their mean values.
* (The quartiles are listed with the PMEDIAN option on the
* STAT command above).
SAMPLE 1 3
DO #=1,K
 GENR XVAL#=XVAL:#
ENDO
* Income is XVAL7 -- set the three values.
GEN1 XVAL7:1=9.77
GEN1 XVAL7:2=MU:2
GEN1 XVAL7:3=10.22
* The FC command is used for prediction.
FC YESVM XVAL1-XVAL8 / COEF=BETA MODEL=LOGIT PREDICT=PHAT
* Print the probabilities
GENR INC=EXP(XVAL7)/1000
PRINT INC PHAT
STOP
```

The income variable is included in the model in log-transformed form. The predictions use the lower quartile, mean and upper quartile of the logarithm of income. Antilogs are used to express the income values in levels. Note that, for the mean, this gives the geometric mean.
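The antilog step can be illustrated with a short Python sketch, which is not part of the original guide. The three log-income values are the ones used in the SHAZAM commands (the mean is taken from the printed `XVAL` vector, so it is rounded to six decimals). The `sample` list is hypothetical data, not taken from `school.txt`, and is included only to show that the antilog of a mean of logs is the geometric mean.

```python
import math
import statistics

# Log-income values used in Example 2.
log_income = {
    "lower quartile": 9.77,
    "mean of logs": 9.971069,
    "upper quartile": 10.22,
}

# Antilogs express the values in income levels (dollars).
income_levels = {name: math.exp(value) for name, value in log_income.items()}
for name, level in income_levels.items():
    print(f"{name}: {level:,.0f}")

# Hypothetical incomes (NOT from school.txt), used only to show the identity
# exp(mean(log(x))) == geometric mean of x.
sample = [12000.0, 18000.0, 27000.0, 40000.0]
mean_of_logs = sum(math.log(v) for v in sample) / len(sample)
geo = math.exp(mean_of_logs)
print(geo, statistics.geometric_mean(sample), statistics.mean(sample))
```

For any sample of positive values that are not all equal, the geometric mean is smaller than the arithmetic mean, so the income level reported for the "mean" case is somewhat below the average income in levels.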

The SHAZAM output can be viewed. The results are summarized in the table below.

| | Income (1973 US\$) | Probability of a yes vote |
|---|---|---|
| lower quartile | 17,501 | 0.34 |
| geometric mean | 21,398 | 0.44 |
| upper quartile | 27,447 | 0.58 |

Note: The above probabilities are for a voter that is not a school teacher, with no children in public or private school, with 8.5 years residency and property taxes of \$1,032 (the geometric mean of the tax variable).
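The tabulated probabilities can be checked with the following Python sketch, which is not part of the original guide. It assumes that the `MODEL=LOGIT` prediction is the logistic function of the linear index with the constant as the last coefficient. The coefficients and the "typical" values for `YEARS` and `PTCON` are copied from the SHAZAM output listing (as printed, so rounded), and the function name `yes_probability` is illustrative only.

```python
import math

# Coefficients copied from the FC output listing, in the order
# PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON CONSTANT.
BETA = [0.583639078557, 1.12611043844, 0.526057826339, -0.341421819258,
        -0.261274995333e-01, 2.62502265170, 2.18718615952, -2.39447685851,
        -5.20142641173]

# "Typical" voter from PRINT XVAL: dummy variables at their mode (0),
# YEARS and PTCON at their sample means.
YEARS_MEAN = 8.515789
PTCON_MEAN = 6.939496


def yes_probability(loginc):
    """Logistic probability for the typical voter at a given log-income.

    Assumption: P = 1 / (1 + exp(-x'b)) is the formula used by FC.
    """
    x = [0, 0, 0, 0, YEARS_MEAN, 0, loginc, PTCON_MEAN, 1.0]
    index = sum(b * v for b, v in zip(BETA, x))
    return 1.0 / (1.0 + math.exp(-index))


# Lower quartile, mean and upper quartile of log-income.
rows = [(math.exp(v) / 1000, yes_probability(v)) for v in (9.77, 9.971069, 10.22)]
for inc, p in rows:
    print(f"{inc:8.3f} {p:.4f}")

# Property tax level implied by the mean of PTCON (a geometric mean).
tax_level = math.exp(PTCON_MEAN)
```

Under the stated assumption, the printed income values (in thousands of dollars) and probabilities should match the `INC` and `PHAT` columns of the SHAZAM output to about four decimal places, and `tax_level` should be close to the \$1,032 quoted in the note.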


#### Predicting Probabilities - Example 1 - SHAZAM output

```
|_SAMPLE 1 95
|_READ (school.txt) PUB12 PUB34 PUB5 PRIV YEARS SCHOOL &
|  LOGINC PTCON YESVM
UNIT 88 IS NOW ASSIGNED TO: school.txt
9 VARIABLES AND       95 OBSERVATIONS STARTING AT OBS       1

|_LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA

LOGIT ANALYSIS     DEPENDENT VARIABLE =YESVM    CHOICES =  2
95. TOTAL OBSERVATIONS
59. OBSERVATIONS AT ONE
36. OBSERVATIONS AT ZERO
25 MAXIMUM ITERATIONS
CONVERGENCE TOLERANCE =0.00100

LOG OF LIKELIHOOD WITH CONSTANT TERM ONLY =    -63.037
BINOMIAL  ESTIMATE = 0.6211
ITERATION  0      LOG OF LIKELIHOOD FUNCTION =   -63.037

ITERATION  1 ESTIMATES
0.45375     0.92076     0.43035    -0.28835    -0.23416E-01  1.3330
1.6059     -1.7546     -3.7958
ITERATION  1      LOG OF LIKELIHOOD FUNCTION =   -54.139

ITERATION  2 ESTIMATES
0.55298      1.0944     0.50979    -0.32984    -0.25855E-01  2.1655
2.0427     -2.2551     -4.7103
ITERATION  2      LOG OF LIKELIHOOD FUNCTION =   -53.370

ITERATION  3 ESTIMATES
0.58166      1.1250     0.52500    -0.33987    -0.26178E-01  2.5635
2.1706     -2.3799     -5.1361
ITERATION  3      LOG OF LIKELIHOOD FUNCTION =   -53.304

ITERATION  4 ESTIMATES
0.58362      1.1261     0.52605    -0.34139    -0.26129E-01  2.6239
2.1869     -2.3942     -5.2003
ITERATION  4      LOG OF LIKELIHOOD FUNCTION =   -53.303

ITERATION  5 ESTIMATES
0.58364      1.1261     0.52606    -0.34142    -0.26127E-01  2.6250
2.1872     -2.3945     -5.2014

ASYMPTOTIC                         WEIGHTED
VARIABLE    ESTIMATED      STANDARD     T-RATIO    ELASTICITY      AGGREGATE
NAME     COEFFICIENT       ERROR                  AT MEANS      ELASTICITY
PUB12         0.58364      0.68778      0.84858      0.93986E-01  0.91051E-01
PUB34          1.1261      0.76820       1.4659      0.11827      0.96460E-01
PUB5          0.52606       1.2693      0.41445      0.73664E-02  0.69375E-02
PRIV         -0.34142      0.78299     -0.43605     -0.11952E-01 -0.12037E-01
YEARS        -0.26127E-01  0.26934E-01 -0.97006     -0.73996E-01 -0.68592E-01
SCHOOL         2.6250       1.4101       1.8616      0.10108      0.28999E-01
LOGINC         2.1872      0.78781       2.7763       7.2529       6.7561
PTCON         -2.3945       1.0813      -2.2145      -5.5262      -5.1745
CONSTANT      -5.2014       7.5503     -0.68890      -1.7298      -1.6137

SCALE FACTOR =   0.22197

VARIABLE      MARGINAL      ----- PROBABILITIES FOR A TYPICAL CASE -----
NAME         EFFECT        CASE         X=0          X=1        MARGINAL
VALUES                                 EFFECT
PUB12         0.12955       0.0000      0.44231      0.58706      0.14476
PUB34         0.24996       0.0000      0.44231      0.70978      0.26747
PUB5          0.11677       0.0000      0.44231      0.57304      0.13073
PRIV         -0.75785E-01   0.0000      0.44231      0.36049     -0.81814E-01
YEARS        -0.57995E-02   8.5158
SCHOOL        0.58267       0.0000      0.44231      0.91631      0.47400
PTCON        -0.53150       6.9395

LOG-LIKELIHOOD FUNCTION =  -53.303
LOG-LIKELIHOOD(0)  =   -63.037
LIKELIHOOD RATIO TEST  =    19.4681    WITH     8  D.F.   P-VALUE= 0.01255

ESTRELLA R-SQUARE           0.19956
CRAGG-UHLER R-SQUARE        0.25218
ADJUSTED FOR DEGREES OF FREEDOM        0.75759E-01
APPROXIMATELY F-DISTRIBUTED    0.20544      WITH        8  AND     9  D.F.
CHOW R-SQUARE               0.17197

PREDICTION SUCCESS TABLE
ACTUAL
0             1
0     18.            7.
PREDICTED 1     18.           52.

NUMBER OF RIGHT PREDICTIONS =        70.0
PERCENTAGE OF RIGHT PREDICTIONS =    0.73684
NAIVE MODEL PERCENTAGE OF RIGHT PREDICTIONS =    0.62105

EXPECTED OBSERVATIONS AT 0  =         36.0   OBSERVED =     36.0
EXPECTED OBSERVATIONS AT 1  =         59.0   OBSERVED =     59.0
SUM OF SQUARED "RESIDUALS" =           18.513
WEIGHTED SUM OF SQUARED "RESIDUALS" =     86.839

HENSHER-JOHNSON PREDICTION SUCCESS TABLE
OBSERVED    OBSERVED
PREDICTED  CHOICE        COUNT       SHARE
ACTUAL           0          1
0           17.591     18.409     36.000      0.379
1           18.409     40.591     59.000      0.621

PREDICTED COUNT        36.000     59.000     95.000      1.000
PREDICTED SHARE         0.379      0.621      1.000
PROP. SUCCESSFUL        0.489      0.688      0.612
SUCCESS INDEX           0.110      0.067      0.083
PROPORTIONAL ERROR      0.000      0.000
NORMALIZED SUCCESS INDEX                      0.177

|_* Set the characteristics for an individual for all
|_* explanatory variables in the logit regression.
|_SAMPLE 1 1
|_GENR YESVM=0
|_GENR PUB12=1
|_GENR PUB34=0
|_GENR PUB5=0
|_GENR PRIV=0
|_GENR YEARS=8
|_GENR LOGINC=LOG(21000)
|_GENR PTCON=LOG(1000)
|_*  NOT a school teacher.
|_GENR SCHOOL=0

|_* Predict the probability of voting yes.
|_FC YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA &
|    MODEL=LOGIT PREDICT=P

DEPENDENT VARIABLE = YESVM            1 OBSERVATIONS
REGRESSION COEFFICIENTS
0.583639078557       1.12611043844      0.526057826339     -0.341421819258
-0.261274995333E-01   2.62502265170       2.18718615952      -2.39447685851
-5.20142641173
MEAN ERROR = -0.59874
SUM-SQUARED ERRORS =  0.35849
MEAN SQUARE ERROR =  0.35849
MEAN ABSOLUTE ERROR=  0.59874
ROOT MEAN SQUARE ERROR =  0.59874
MEAN SQUARED PERCENTAGE ERROR=   0.0000
THEIL INEQUALITY COEFFICIENT U = 0.000
DECOMPOSITION
PROPORTION DUE TO BIAS =   1.0000
PROPORTION DUE TO VARIANCE =   0.0000
PROPORTION DUE TO COVARIANCE =   0.0000
DECOMPOSITION
PROPORTION DUE TO BIAS =   1.0000
PROPORTION DUE TO REGRESSION =   0.0000
PROPORTION DUE TO DISTURBANCE =   0.0000
|_* Print the probability
|_PRINT P
P
0.5987397

|_* Now predict the probability of voting yes for an individual that
|_* is a school teacher, but all other characteristics the same.
|_GENR SCHOOL=1
|_FC YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA &
|    MODEL=LOGIT PREDICT=P

DEPENDENT VARIABLE = YESVM            1 OBSERVATIONS
REGRESSION COEFFICIENTS
0.583639078557       1.12611043844      0.526057826339     -0.341421819258
-0.261274995333E-01   2.62502265170       2.18718615952      -2.39447685851
-5.20142641173
MEAN ERROR = -0.95370
SUM-SQUARED ERRORS =  0.90955
MEAN SQUARE ERROR =  0.90955
MEAN ABSOLUTE ERROR=  0.95370
ROOT MEAN SQUARE ERROR =  0.95370
MEAN SQUARED PERCENTAGE ERROR=   0.0000
THEIL INEQUALITY COEFFICIENT U = 0.000
DECOMPOSITION
PROPORTION DUE TO BIAS =   1.0000
PROPORTION DUE TO VARIANCE =   0.0000
PROPORTION DUE TO COVARIANCE =   0.0000
DECOMPOSITION
PROPORTION DUE TO BIAS =   1.0000
PROPORTION DUE TO REGRESSION =   0.0000
PROPORTION DUE TO DISTURBANCE =   0.0000
|_PRINT P
P
0.9537014
|_STOP
```

#### Predicting Probabilities - Example 2 - SHAZAM output

```
|_SAMPLE 1 95
|_READ (school.txt) PUB12 PUB34 PUB5 PRIV YEARS SCHOOL &
|  LOGINC PTCON YESVM
UNIT 88 IS NOW ASSIGNED TO: school.txt
9 VARIABLES AND       95 OBSERVATIONS STARTING AT OBS       1

|_LOGIT YESVM PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON / COEF=BETA

LOGIT ANALYSIS     DEPENDENT VARIABLE =YESVM    CHOICES =  2
95. TOTAL OBSERVATIONS
59. OBSERVATIONS AT ONE
36. OBSERVATIONS AT ZERO
25 MAXIMUM ITERATIONS
CONVERGENCE TOLERANCE =0.00100

LOG OF LIKELIHOOD WITH CONSTANT TERM ONLY =    -63.037
BINOMIAL  ESTIMATE = 0.6211
ITERATION  0      LOG OF LIKELIHOOD FUNCTION =   -63.037

ITERATION  1 ESTIMATES
0.45375     0.92076     0.43035    -0.28835    -0.23416E-01  1.3330
1.6059     -1.7546     -3.7958
ITERATION  1      LOG OF LIKELIHOOD FUNCTION =   -54.139

ITERATION  2 ESTIMATES
0.55298      1.0944     0.50979    -0.32984    -0.25855E-01  2.1655
2.0427     -2.2551     -4.7103
ITERATION  2      LOG OF LIKELIHOOD FUNCTION =   -53.370

ITERATION  3 ESTIMATES
0.58166      1.1250     0.52500    -0.33987    -0.26178E-01  2.5635
2.1706     -2.3799     -5.1361
ITERATION  3      LOG OF LIKELIHOOD FUNCTION =   -53.304

ITERATION  4 ESTIMATES
0.58362      1.1261     0.52605    -0.34139    -0.26129E-01  2.6239
2.1869     -2.3942     -5.2003
ITERATION  4      LOG OF LIKELIHOOD FUNCTION =   -53.303

ITERATION  5 ESTIMATES
0.58364      1.1261     0.52606    -0.34142    -0.26127E-01  2.6250
2.1872     -2.3945     -5.2014

ASYMPTOTIC                         WEIGHTED
VARIABLE    ESTIMATED      STANDARD     T-RATIO    ELASTICITY      AGGREGATE
NAME     COEFFICIENT       ERROR                  AT MEANS      ELASTICITY
PUB12         0.58364      0.68778      0.84858      0.93986E-01  0.91051E-01
PUB34          1.1261      0.76820       1.4659      0.11827      0.96460E-01
PUB5          0.52606       1.2693      0.41445      0.73664E-02  0.69375E-02
PRIV         -0.34142      0.78299     -0.43605     -0.11952E-01 -0.12037E-01
YEARS        -0.26127E-01  0.26934E-01 -0.97006     -0.73996E-01 -0.68592E-01
SCHOOL         2.6250       1.4101       1.8616      0.10108      0.28999E-01
LOGINC         2.1872      0.78781       2.7763       7.2529       6.7561
PTCON         -2.3945       1.0813      -2.2145      -5.5262      -5.1745
CONSTANT      -5.2014       7.5503     -0.68890      -1.7298      -1.6137

SCALE FACTOR =   0.22197

VARIABLE      MARGINAL      ----- PROBABILITIES FOR A TYPICAL CASE -----
NAME         EFFECT        CASE         X=0          X=1        MARGINAL
VALUES                                 EFFECT
PUB12         0.12955       0.0000      0.44231      0.58706      0.14476
PUB34         0.24996       0.0000      0.44231      0.70978      0.26747
PUB5          0.11677       0.0000      0.44231      0.57304      0.13073
PRIV         -0.75785E-01   0.0000      0.44231      0.36049     -0.81814E-01
YEARS        -0.57995E-02   8.5158
SCHOOL        0.58267       0.0000      0.44231      0.91631      0.47400
PTCON        -0.53150       6.9395

LOG-LIKELIHOOD FUNCTION =  -53.303
LOG-LIKELIHOOD(0)  =   -63.037
LIKELIHOOD RATIO TEST  =    19.4681    WITH     8  D.F.   P-VALUE= 0.01255

ESTRELLA R-SQUARE           0.19956
CRAGG-UHLER R-SQUARE        0.25218
ADJUSTED FOR DEGREES OF FREEDOM        0.75759E-01
APPROXIMATELY F-DISTRIBUTED    0.20544      WITH        8  AND     9  D.F.
CHOW R-SQUARE               0.17197

PREDICTION SUCCESS TABLE
ACTUAL
0             1
0     18.            7.
PREDICTED 1     18.           52.

NUMBER OF RIGHT PREDICTIONS =        70.0
PERCENTAGE OF RIGHT PREDICTIONS =    0.73684
NAIVE MODEL PERCENTAGE OF RIGHT PREDICTIONS =    0.62105

EXPECTED OBSERVATIONS AT 0  =         36.0   OBSERVED =     36.0
EXPECTED OBSERVATIONS AT 1  =         59.0   OBSERVED =     59.0
SUM OF SQUARED "RESIDUALS" =           18.513
WEIGHTED SUM OF SQUARED "RESIDUALS" =     86.839

HENSHER-JOHNSON PREDICTION SUCCESS TABLE
OBSERVED    OBSERVED
PREDICTED  CHOICE        COUNT       SHARE
ACTUAL           0          1
0           17.591     18.409     36.000      0.379
1           18.409     40.591     59.000      0.621

PREDICTED COUNT        36.000     59.000     95.000      1.000
PREDICTED SHARE         0.379      0.621      1.000
PROP. SUCCESSFUL        0.489      0.688      0.612
SUCCESS INDEX           0.110      0.067      0.083
PROPORTIONAL ERROR      0.000      0.000
NORMALIZED SUCCESS INDEX                      0.177

|_* Prediction exercise
|_GEN1 K=$K
..NOTE..CURRENT VALUE OF $K   =   9.0000
|_GENR ONE=1
|_*   Save the modes of the variables in XVAL.
|_STAT PUB12 PUB34 PUB5 PRIV YEARS SCHOOL LOGINC PTCON ONE / MODES=XVAL
NAME        N    MEAN        ST. DEV      VARIANCE     MINIMUM      MAXIMUM
PUB12        95  0.48421     0.50240     0.25241       0.0000       1.0000
PUB34        95  0.31579     0.46730     0.21837       0.0000       1.0000
PUB5         95  0.42105E-01 0.20189     0.40761E-01   0.0000       1.0000
PRIV         95  0.10526     0.30852     0.95185E-01   0.0000       1.0000
YEARS        95   8.5158      9.5158      90.550       1.0000       49.000
SCHOOL       95  0.11579     0.32167     0.10347       0.0000       1.0000
LOGINC       95   9.9711     0.41175     0.16954       8.2940       10.820
PTCON        95   6.9395     0.31692     0.10044       5.9915       7.4955
ONE          95   1.0000      0.0000      0.0000       1.0000       1.0000
|_*   Get the means of the "continuous variables".
|_STAT YEARS LOGINC PTCON / MEAN=MU PMEDIAN
NAME        N    MEAN        ST. DEV      VARIANCE     MINIMUM      MAXIMUM
YEARS        95   8.5158      9.5158      90.550       1.0000       49.000
LOGINC       95   9.9711     0.41175     0.16954       8.2940       10.820
PTCON        95   6.9395     0.31692     0.10044       5.9915       7.4955

VARIABLE = YEARS
MEDIAN =    5.0000
LOWER 25%=   3.0000     UPPER 25%=   10.000     INTERQUARTILE RANGE=  7.000
MODE =    3.0000     WITH       23 OBSERVATIONS

VARIABLE = LOGINC
MEDIAN =    10.021
LOWER 25%=   9.7700     UPPER 25%=   10.222     INTERQUARTILE RANGE= 0.4520
MODE =    10.021     WITH       31 OBSERVATIONS

VARIABLE = PTCON
MEDIAN =    7.0475
LOWER 25%=   6.7452     UPPER 25%=   7.0475     INTERQUARTILE RANGE= 0.3023
MODE =    7.0475     WITH       46 OBSERVATIONS
|_GEN1 XVAL:5=MU:1
|_GEN1 XVAL:7=MU:2
|_GEN1 XVAL:8=MU:3
|_PRINT XVAL
XVAL
0.000000       0.000000       0.000000       0.000000       8.515789
0.000000       9.971069       6.939496       1.000000
|_SET NODOECHO
|_*   Predict the probability of a yes vote for three income levels:
|_*   the lower quartile, the mean and the upper quartile -
|_*   with the dummy variables at their mode values and the
|_*   other variables at their mean values.
|_*   (The quartiles are listed with the PMEDIAN option on the
|_*   STAT command above).
|_SAMPLE 1 3
|_DO #=1,K
|_ GENR XVAL#=XVAL:#
|_ENDO
****** EXECUTION BEGINNING FOR DO LOOP  # =       1
****** EXECUTION FINISHED FOR DO LOOP  #=       9
|_*   Income is XVAL7 -- set the three values.
|_GEN1 XVAL7:1=9.77
|_GEN1 XVAL7:2=MU:2
|_GEN1 XVAL7:3=10.22
|_*   The FC command is used for prediction.

|_FC YESVM XVAL1-XVAL8 / COEF=BETA MODEL=LOGIT PREDICT=PHAT

REQUIRED MEMORY IS PAR=       9 CURRENT PAR=    3000
DEPENDENT VARIABLE = YESVM            3 OBSERVATIONS
REGRESSION COEFFICIENTS
0.583639078557       1.12611043844      0.526057826339     -0.341421819258
-0.261274995333E-01   2.62502265170       2.18718615952      -2.39447685851
-5.20142641173
MEAN ERROR = -0.11933
SUM-SQUARED ERRORS =  0.96724
MEAN SQUARE ERROR =  0.32241
MEAN ABSOLUTE ERROR=  0.56057
ROOT MEAN SQUARE ERROR =  0.56781
MEAN SQUARED PERCENTAGE ERROR=   1460.2
THEIL INEQUALITY COEFFICIENT U = 0.727
DECOMPOSITION
PROPORTION DUE TO BIAS =  0.44165E-01
PROPORTION DUE TO VARIANCE =  0.43245
PROPORTION DUE TO COVARIANCE =  0.52338
DECOMPOSITION
PROPORTION DUE TO BIAS =  0.44165E-01
PROPORTION DUE TO REGRESSION =  0.73713
PROPORTION DUE TO DISTURBANCE =  0.21870
|_*   Print the probabilities
|_GENR INC=EXP(XVAL7)/1000
|_PRINT INC PHAT
INC            PHAT
17.50077      0.3381440
21.39836      0.4423082
27.44667      0.5775339
|_STOP
```
