Prediction

Following an estimation command, the
Some useful options with the
More options and features of the

Example

This example analyzes voting patterns in the state of Florida for the presidential election held on November 7, 2000. The 2000 presidential race emerged as a close contest between Al Gore and George W. Bush. On election day, the results revealed that the Florida outcome would determine the next President of the United States. However, the Florida election results showed a difference of only a few hundred votes between Gore and Bush. A final decision was delayed until various recounts and counts of absentee ballots could be completed.

An additional controversy was that the "butterfly" ballot design in the county of Palm Beach may have confused voters. There was speculation that Palm Beach voters that intended to vote for Gore may have mistakenly given their vote to Buchanan.

Adams and Fastnow present a statistical analysis for detecting the possibility of voting irregularities in Palm Beach. The reference is:

Greg D. Adams and Chris Fastnow, "A Note on the Voting Irregularities in Palm Beach, Florida" (downloaded from the internet).

A data set contains the Florida county-level returns for the 2000 presidential election. It is proposed that an estimate of the number of votes for Buchanan in Palm Beach county can be predicted from a linear regression equation that relates Buchanan's votes to Bush's votes in the other Florida counties. Adams and Fastnow give the following reasoning.

"There are theoretical reasons to think that the number of Buchanan's votes should correlate with Bush's. First, for any candidate, a large county with many people will generally provide the candidate more votes than a county with fewer people, all else being equal. Second, holding size of the county constant, a more conservative county should favor both Buchanan and Bush in a proportionate way. It thus seemed reasonable to us to expect a systematic relationship between the two candidates' votes."

The SHAZAM commands below estimate the relationship between votes
for Buchanan and Bush in the counties of Florida excluding Palm Beach.
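
The command listing did not survive on this page, but it can be recovered from the command lines that SHAZAM echoes (with the |_ prefix) in the output shown later. The sketch below is that reconstruction, not an independent program. It assumes that the data file PRES2000.txt has one header line and that Palm Beach is observation 67, as the SAMPLE and FC ranges in the output imply.

```shazam
SAMPLE 1 67
READ (PRES2000.txt) GORE BUSH BUCHANAN NADER OTHER / SKIPLINE=1
DIM YHAT 67 SE 67
* Estimate the relationship between votes for Buchanan and Bush
* in the counties of Florida excluding Palm Beach.
SAMPLE 1 66
OLS BUCHANAN BUSH
* Predict the number of votes for Buchanan in Palm Beach.
FC / LIST BEG=67 END=67 PREDICT=YHAT FCSE=SE
* Calculate a 99% prediction interval
* Obtain the critical value.
GEN1 DF=$N-$K
SAMPLE 1 1
GEN1 ALPHA=0.01/2
DISTRIB ALPHA / TYPE=T DF=DF INVERSE CRITICAL=TC
SAMPLE 67 67
GENR YUP=YHAT+TC*SE
GENR YLOW=YHAT-TC*SE
* Print the prediction interval
PRINT YLOW YUP
* Scatterplot
SAMPLE 1 67
GRAPH BUCHANAN BUSH / NOKEY
STOP
```

The DIM command reserves space for the predictions and their standard errors, so that the FC command can store results for observation 67 even though the regression is estimated on observations 1 to 66.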
The SHAZAM output can be viewed. The results show that, assuming that Palm Beach voting patterns
are similar to the other Florida counties, the predicted number of
votes for Buchanan in Palm Beach county is 601, with a 99% prediction interval of about 289 to 914.
In Palm Beach county, the actual number of votes for Buchanan of 3,407 lies far above the upper limit of this interval.
A scatterplot that shows the outlier Buchanan result in Palm Beach county is displayed below.
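
The 99% prediction interval printed in the SHAZAM output can be checked by hand from three numbers in that output: the predicted value (601.33), the forecast standard error (117.711) and the t critical value with N - K = 66 - 2 = 64 degrees of freedom (2.6553). The notation below is introduced here for illustration, and the expression for the forecast standard error is the standard textbook formula for simple regression, stated as an assumption about what the FCSE= option reports.

```latex
\[
\hat{y}_0 \;\pm\; t_{\alpha/2,\,N-K}\,\mathrm{se}(f),
\qquad
\mathrm{se}(f) = \hat{\sigma}\sqrt{1 + \frac{1}{N}
  + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{N}(x_i-\bar{x})^2}}
\]
\[
601.33 \;\pm\; 2.6553 \times 117.711
\;=\; 601.33 \pm 312.56
\;\approx\; [\,288.8,\; 913.9\,]
\]
```

The small difference from the printed limits (288.7689 and 913.8837) comes from using rounded inputs. The forecast standard error exceeds the regression standard error (113.49) because it also reflects the sampling uncertainty in the estimated coefficients.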
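Model Critique

The scatterplot shown above highlights the variation in population size for the 67 Florida counties. Summary statistics for the total number of votes by county are given below (values as reported in the SHAZAM output, to five significant digits).

  Number of counties     67
  Mean                   88,912
  Median                 35,149
  Lower quartile         8,021
  Upper quartile         103,110
  Minimum                2,410
  Maximum                625,360
  Palm Beach county      432,286
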
The SHAZAM commands for calculating the summary statistics are available.

A few counties with relatively large population size (including Palm Beach county) are pulling up the mean to a value that exceeds the median. It may be reasonable to expect that large counties will have higher variability in the Buchanan vote than counties in the lower quartile with fewer than 8,000 total votes. For the simple linear regression model estimated above, this will be revealed in heteroskedastic errors. In the presence of heteroskedasticity, the confidence intervals calculated from the least squares estimation results will be incorrect. Therefore, tests for heteroskedasticity should be inspected.

An alternative modelling approach is to use log-transformed data. The log transformation rescales the data and therefore may correct for heteroskedasticity that is observed in the linear model. In particular, the observations in the upper quartile are compressed so that the difference with the other observations is less extreme. The results for tests for heteroskedasticity and prediction with log-transformed variables are available.

Concluding Remarks

Adams and Fastnow tried a number of other model variations and concluded:

"If one holds to the statistical assumptions of most of these models, and if Buchanan's unusual performance can be attributed to voters who intended to vote for Gore (an assumption that some have contested), then it can be claimed with a fairly high degree of statistical confidence that the mistakes cost Gore a significant share of votes."

Note: This example is provided for teaching purposes only to illustrate econometric methodology that can be implemented with the SHAZAM software. The example is not intended to make any political comment.

SHAZAM output

|_SAMPLE 1 67
|_READ (PRES2000.txt) GORE BUSH BUCHANAN NADER OTHER / SKIPLINE=1
UNIT 88 IS NOW ASSIGNED TO: PRES2000.txt
   5 VARIABLES AND       67 OBSERVATIONS STARTING AT OBS       1
|_DIM YHAT 67 SE 67
|_* Estimate the relationship between votes for Buchanan and Bush
|_* in the counties of Florida excluding Palm Beach.
|_SAMPLE 1 66
|_OLS BUCHANAN BUSH

 OLS ESTIMATION
       66 OBSERVATIONS     DEPENDENT VARIABLE= BUCHANAN
...NOTE..SAMPLE RANGE SET TO:    1,   66

 R-SQUARE =   0.7511     R-SQUARE ADJUSTED =   0.7472
VARIANCE OF THE ESTIMATE-SIGMA**2 =   12880.
STANDARD ERROR OF THE ESTIMATE-SIGMA =   113.49
SUM OF SQUARED ERRORS-SSE=  0.82430E+06
MEAN OF DEPENDENT VARIABLE =   213.00
LOG OF THE LIKELIHOOD FUNCTION = -404.927

VARIABLE   ESTIMATED   STANDARD    T-RATIO           PARTIAL  STANDARDIZED  ELASTICITY
  NAME    COEFFICIENT    ERROR      64 DF   P-VALUE   CORR.   COEFFICIENT    AT MEANS
BUSH      0.34962E-02  0.2516E-03   13.90    0.000    0.867     0.8666        0.6857
CONSTANT   66.940       17.48       3.829    0.000    0.432     0.0000        0.3143

|_* Predict the number of votes for Buchanan in Palm Beach.
|_FC / LIST BEG=67 END=67 PREDICT=YHAT FCSE=SE
DEPENDENT VARIABLE = BUCHANAN        1 OBSERVATIONS
REGRESSION COEFFICIENTS
  0.349623785167E-02   66.9403199359

   OBS.    OBSERVED    PREDICTED   CALCULATED   STD. ERROR
   NO.      VALUE        VALUE      RESIDUAL
    67     3407.0       601.33       2805.7       117.711     I  *

SUM OF ABSOLUTE ERRORS=   2805.7
R-SQUARE BETWEEN OBSERVED AND PREDICTED = 0.0000
MEAN ERROR =   2805.7
SUM-SQUARED ERRORS =  0.78718E+07
MEAN SQUARE ERROR =  0.78718E+07
MEAN ABSOLUTE ERROR=   2805.7
ROOT MEAN SQUARE ERROR =   2805.7
MEAN SQUARED PERCENTAGE ERROR=   6781.6
THEIL INEQUALITY COEFFICIENT U =   0.000
  DECOMPOSITION
    PROPORTION DUE TO BIAS =   1.0000
    PROPORTION DUE TO VARIANCE =  0.0000
    PROPORTION DUE TO COVARIANCE =  0.0000
  DECOMPOSITION
    PROPORTION DUE TO BIAS =   1.0000
    PROPORTION DUE TO REGRESSION =  0.0000
    PROPORTION DUE TO DISTURBANCE =  0.0000

|_* Calculate a 99% prediction interval
|_* Obtain the critical value.
|_GEN1 DF=$N-$K
..NOTE..CURRENT VALUE OF $N =   66.000
..NOTE..CURRENT VALUE OF $K =   2.0000
|_SAMPLE 1 1
|_GEN1 ALPHA=0.01/2
|_DISTRIB ALPHA / TYPE=T DF=DF INVERSE CRITICAL=TC
T DISTRIBUTION DF=   64.000
VARIANCE=   1.0323     H=   1.0000

             PROBABILITY   CRITICAL VALUE     PDF
 ALPHA
 ROW     1   0.50000E-02      2.6553       0.13308E-01
|_SAMPLE 67 67
|_GENR YUP=YHAT+TC*SE
|_GENR YLOW=YHAT-TC*SE
|_* Print the prediction interval
|_PRINT YLOW YUP
      YLOW           YUP
   288.7689       913.8837
|_* Scatterplot
|_SAMPLE 1 67
|_GRAPH BUCHANAN BUSH / NOKEY
      67 OBSERVATIONS
SHAZAM WILL NOW MAKE A PLOT FOR YOU
|_STOP

SHAZAM commands

The SHAZAM commands below calculate summary statistics for the total number of votes in Florida by county.
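
The command listing is missing at this point on the page. The sketch below reconstructs it from the command lines echoed (with the |_ prefix) in the output that follows. It assumes the same PRES2000.txt data file with one header line, and that Palm Beach is observation 67.

```shazam
SAMPLE 1 67
READ (PRES2000.txt) GORE BUSH BUCHANAN NADER OTHER / SKIPLINE=1
* Calculate the total number of votes recorded in each county
GENR TOTAL=GORE+BUSH+BUCHANAN+NADER+OTHER
* Summary statistics
SAMPLE 1 67
STAT TOTAL / PMEDIAN
* Print the total number of votes for Palm Beach county
SAMPLE 67 67
PRINT TOTAL
STOP
```

The PMEDIAN option on the STAT command is what produces the median, quartiles and interquartile range in addition to the default mean, standard deviation, minimum and maximum.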
The SHAZAM output follows.
|_SAMPLE 1 67
|_READ (PRES2000.txt) GORE BUSH BUCHANAN NADER OTHER / SKIPLINE=1
UNIT 88 IS NOW ASSIGNED TO: PRES2000.txt
   5 VARIABLES AND       67 OBSERVATIONS STARTING AT OBS       1
|_* Calculate the total number of votes recorded in each county
|_GENR TOTAL=GORE+BUSH+BUCHANAN+NADER+OTHER
|_* Summary statistics
|_SAMPLE 1 67
|_STAT TOTAL / PMEDIAN
NAME        N     MEAN       ST. DEV       VARIANCE      MINIMUM     MAXIMUM
TOTAL      67    88912.    0.13180E+06   0.17370E+11     2410.0    0.62536E+06

VARIABLE = TOTAL
MEDIAN =   35149.    LOWER 25%=   8021.0    UPPER 25%=  0.10311E+06
INTERQUARTILE RANGE=  0.9509E+05
MODE NOT APPLICABLE
|_* Print the total number of votes for Palm Beach county
|_SAMPLE 67 67
|_PRINT TOTAL
     TOTAL
   432286.0
|_STOP