Prediction

Following an estimation command, the FC command will generate predictions and forecast standard errors. For prediction with the linear regression model, the general command format is:

OLS depvar indeps / options
FC / options

Some useful options with the FC command are:
LIST          Prints the predictions, the forecast standard errors and forecast diagnostics.
BEG=, END=    Specify the start and end observation numbers for the predictions.
FCSE=         Saves the forecast standard errors in the variable specified. The variable must be defined before the model estimation.
PREDICT=      Saves the predictions in the variable specified. The variable must be defined before the model estimation.

More options and features of the FC command are described in the SHAZAM User's Reference Manual.

Example

This example analyzes voting patterns in the state of Florida for the presidential election held on November 7, 2000. The 2000 presidential race emerged as a close contest between Al Gore and George W. Bush. On election day, it became clear that the Florida outcome would determine the next President of the United States. However, the Florida election results showed a difference of only a few hundred votes between Gore and Bush. A final decision was delayed until various recounts and counts of absentee ballots could be completed. A further controversy was that the "butterfly" ballot design used in Palm Beach county may have confused voters. There was speculation that Palm Beach voters who intended to vote for Gore may have mistakenly given their vote to Buchanan.

Adams and Fastnow present a statistical analysis for detecting the possibility of voting irregularities in Palm Beach. The reference is:

Greg D. Adams and Chris Fastnow, "A Note on the Voting Irregularities in Palm Beach, Florida" (downloaded from the internet).

The data set contains the Florida county-level returns for the 2000 presidential election. The proposal is that the number of votes for Buchanan in Palm Beach county can be predicted from a linear regression equation that relates Buchanan's votes to Bush's votes in the other Florida counties. Adams and Fastnow give the following reasoning.

"There are theoretical reasons to think that the number of Buchanan's votes should correlate with Bush's. First, for any candidate, a large county with many people will generally provide the candidate more votes than a county with fewer people, all else being equal. Second, holding size of the county constant, a more conservative county should favor both Buchanan and Bush in a proportionate way. It thus seemed reasonable to us to expect a systematic relationship between the two candidates' votes."

The SHAZAM commands below estimate the relationship between votes for Buchanan and Bush in the counties of Florida excluding Palm Beach. The FC command is then used to predict the number of votes for Buchanan in Palm Beach county. From the results, a 99% prediction interval is calculated.

SAMPLE 1 67
READ (PRES2000.txt) GORE BUSH BUCHANAN NADER OTHER / SKIPLINE=1
DIM YHAT 67 SE 67
* Estimate the relationship between votes for Buchanan and Bush
* in the counties of Florida excluding Palm Beach.
SAMPLE 1 66
OLS BUCHANAN BUSH
* Predict the number of votes for Buchanan in Palm Beach.
FC / LIST BEG=67 END=67 PREDICT=YHAT FCSE=SE 

* Calculate a 99% prediction interval 
* Obtain the critical value.
GEN1 DF=$N-$K
SAMPLE 1 1
GEN1 ALPHA=0.01/2
DISTRIB ALPHA / TYPE=T DF=DF INVERSE CRITICAL=TC
SAMPLE 67 67
GENR YUP=YHAT+TC*SE
GENR YLOW=YHAT-TC*SE
* Print the prediction interval
PRINT YLOW YUP

* Scatterplot
SAMPLE 1 67 
GRAPH BUCHANAN BUSH / NOKEY
STOP
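
The interval computed by the commands above follows the usual textbook form for predicting a single new observation. As a sketch in standard notation (the symbols x_0, \bar{x} and se(f) are introduced here for illustration, and the closed form for the forecast standard error is the textbook expression for simple regression, which the FCSE= values are assumed to match):

```latex
\hat{y}_0 = \hat{\beta}_1 + \hat{\beta}_2 x_0, \qquad
\mathrm{se}(f) = \hat{\sigma}\sqrt{1 + \frac{1}{N} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2}}, \qquad
\hat{y}_0 \pm t_{\alpha/2,\,N-K}\,\mathrm{se}(f)
```

In the commands above, x_0 is the Bush vote in observation 67 (Palm Beach), N = 66 and K = 2 are the values of $N and $K after the OLS command, \alpha = 0.01, and \hat{y}_0, se(f) and the critical value correspond to YHAT, SE and TC.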

The SHAZAM output is listed in the "SHAZAM output" section below.

The results show that, assuming that Palm Beach voting patterns are similar to those in the other Florida counties, the predicted number of votes for Buchanan in Palm Beach county is 601. The 99% prediction interval is:

      [289 , 914]

In Palm Beach county, the actual number of votes for Buchanan of 3407 exceeded the upper limit of the prediction interval by more than 2400 votes.
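
The arithmetic behind these figures can be checked by hand. The following Python sketch simply repeats the GENR calculations using numbers copied from the SHAZAM output (the predicted value, forecast standard error, critical value and observed value); the variable names are chosen here for illustration only.

```python
# Values copied from the SHAZAM output for observation 67 (Palm Beach).
y_hat = 601.33     # PREDICTED VALUE in the FC listing
se_f = 117.711     # STD. ERROR of the forecast in the FC listing
t_crit = 2.6553    # CRITICAL VALUE from DISTRIB (t, 64 DF, tail probability 0.005)
actual = 3407.0    # OBSERVED VALUE in the FC listing

# 99% prediction interval: prediction plus or minus t times the forecast s.e.
half_width = t_crit * se_f
y_low = y_hat - half_width
y_up = y_hat + half_width

# How far the actual Buchanan vote lies from the prediction and from the interval.
forecast_error = actual - y_hat
excess_over_upper = actual - y_up

print(f"99% prediction interval: [{y_low:.1f}, {y_up:.1f}]")
print(f"Forecast error: {forecast_error:.1f}")
print(f"Excess over upper limit: {excess_over_upper:.1f}")
```

Because the inputs are rounded as printed in the output, the limits may differ from the SHAZAM values in the later decimal places.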

A scatterplot that shows the outlier Buchanan result in Palm Beach county is displayed below.

[Figure: scatterplot of BUCHANAN against BUSH for the 67 Florida counties]

Model Critique

The scatterplot shown above highlights the variation in population size for the 67 Florida counties. Summary statistics for the total number of votes by county are given below.

Mean                           88,912
First Quartile                  8,021
Median                         35,149
Third Quartile                103,110
Maximum (Miami-Dade County)   625,362
Palm Beach County             432,286

The SHAZAM commands for calculating the summary statistics are listed in the "SHAZAM commands" section below.

A few counties with relatively large populations (including Palm Beach county) pull the mean up to a value well above the median. It may be reasonable to expect that large counties will have higher variability in the Buchanan vote than counties in the lower quartile, which recorded about 8,000 total votes or fewer. In the simple linear regression model estimated above, this would appear as heteroskedastic errors. In the presence of heteroskedasticity, the usual least squares standard errors are unreliable, so interval estimates calculated from the least squares estimation results, including the prediction interval above, will be incorrect.

Therefore, tests for heteroskedasticity should be inspected.
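
The text does not name a particular test. As one common choice, the sketch below computes the Koenker (studentized Breusch-Pagan) statistic: regress the squared least squares residuals on the regressor and compare N times the R-square with a chi-square critical value. This is an illustration in Python, not the SHAZAM procedure, and the data are synthetic values made up for the demonstration, not the Florida returns; all function and variable names are hypothetical.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(x, y):
    """R-square of the simple regression of y on x (with intercept)."""
    intercept, slope = ols_fit(x, y)
    mean_y = sum(y) / len(y)
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def koenker_bp_statistic(x, y):
    """N * R-square from regressing squared OLS residuals on x."""
    intercept, slope = ols_fit(x, y)
    squared_resid = [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, squared_resid)


# Synthetic illustration only (assumed data, not the election returns).
x = [float(i) for i in range(1, 41)]
signs = [1.0 if i % 2 == 0 else -1.0 for i in range(1, 41)]
# Disturbances of constant size versus disturbances that grow with x.
y_constant_spread = [1.0 + 2.0 * xi + 5.0 * s for xi, s in zip(x, signs)]
y_growing_spread = [1.0 + 2.0 * xi + xi * s for xi, s in zip(x, signs)]

CHI2_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 degree of freedom

stat_constant = koenker_bp_statistic(x, y_constant_spread)
stat_growing = koenker_bp_statistic(x, y_growing_spread)
print(f"Constant spread: statistic = {stat_constant:.3f}")
print(f"Growing spread:  statistic = {stat_growing:.3f}")
print(f"Reject homoskedasticity if statistic > {CHI2_1DF_5PCT}")
```

With one regressor the statistic is compared with a chi-square distribution with 1 degree of freedom; a large value suggests that the error variance changes with the regressor.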

An alternative modelling approach is to use log-transformed data. The log transformation rescales the data and therefore may correct for heteroskedasticity that is observed in the linear model. In particular, the observations in the upper quartile are compressed so that the difference with the other observations is less extreme.
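
The compression effect can be seen with the total-vote figures from the summary table above. The Python sketch below is only an illustration of the rescaling; the log model itself would transform BUCHANAN and BUSH rather than the county totals, and the dictionary and variable names are chosen here for the example.

```python
import math

# Total votes by county, copied from the summary statistics table.
total_votes = {
    "First Quartile": 8021,
    "Median": 35149,
    "Third Quartile": 103110,
    "Palm Beach County": 432286,
    "Maximum (Miami-Dade County)": 625362,
}

# Natural logarithm of each figure.
log_votes = {name: math.log(value) for name, value in total_votes.items()}

# Compare the largest county with the median county on each scale.
raw_ratio = total_votes["Maximum (Miami-Dade County)"] / total_votes["Median"]
log_ratio = log_votes["Maximum (Miami-Dade County)"] / log_votes["Median"]

for name, value in total_votes.items():
    print(f"{name:30s} {value:>9,d}   log = {log_votes[name]:.3f}")
print(f"Maximum / median on the raw scale: {raw_ratio:.1f}")
print(f"Maximum / median on the log scale: {log_ratio:.2f}")
```

On the raw scale the largest county is many times the size of the median county, while on the log scale the two values are much closer together, which is why the large counties dominate a regression in logs far less.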

The results for tests for heteroskedasticity and prediction with log-transformed variables are available.

Concluding Remarks

Adams and Fastnow tried a number of other model variations and concluded:

"If one holds to the statistical assumptions of most of these models, and if Buchanan's unusual performance can be attributed to voters who intended to vote for Gore (an assumption that some have contested), then it can be claimed with a fairly high degree of statistical confidence that the mistakes cost Gore a significant share of votes."

Note: This example is provided for teaching purposes only to illustrate econometric methodology that can be implemented with the SHAZAM software. The example is not intended to make any political comment.


SHAZAM output


|_SAMPLE 1 67
|_READ (PRES2000.txt) GORE BUSH BUCHANAN NADER OTHER / SKIPLINE=1
UNIT 88 IS NOW ASSIGNED TO: PRES2000.txt
   5 VARIABLES AND       67 OBSERVATIONS STARTING AT OBS       1

|_DIM YHAT 67 SE 67

|_* Estimate the relationship between votes for Buchanan and Bush
|_* in the counties of Florida excluding Palm Beach.
|_SAMPLE 1 66
|_OLS BUCHANAN BUSH

 OLS ESTIMATION
       66 OBSERVATIONS     DEPENDENT VARIABLE= BUCHANAN
...NOTE..SAMPLE RANGE SET TO:      1,     66

 R-SQUARE =   0.7511     R-SQUARE ADJUSTED =   0.7472
VARIANCE OF THE ESTIMATE-SIGMA**2 =   12880.
STANDARD ERROR OF THE ESTIMATE-SIGMA =   113.49
SUM OF SQUARED ERRORS-SSE=  0.82430E+06
MEAN OF DEPENDENT VARIABLE =   213.00
LOG OF THE LIKELIHOOD FUNCTION = -404.927

VARIABLE   ESTIMATED  STANDARD   T-RATIO        PARTIAL STANDARDIZED ELASTICITY
  NAME    COEFFICIENT   ERROR      64 DF   P-VALUE CORR. COEFFICIENT  AT MEANS
BUSH      0.34962E-02 0.2516E-03   13.90     0.000 0.867     0.8666     0.6857
CONSTANT   66.940      17.48       3.829     0.000 0.432     0.0000     0.3143

|_* Predict the number of votes for Buchanan in Palm Beach.
|_FC / LIST BEG=67 END=67 PREDICT=YHAT FCSE=SE

DEPENDENT VARIABLE = BUCHANAN         1 OBSERVATIONS
REGRESSION COEFFICIENTS
  0.349623785167E-02   66.9403199359
    OBS.   OBSERVED     PREDICTED   CALCULATED  STD. ERROR
     NO.    VALUE        VALUE       RESIDUAL
     67   3407.0       601.33       2805.7      117.711               I    *

SUM OF ABSOLUTE ERRORS=   2805.7
R-SQUARE BETWEEN OBSERVED AND PREDICTED = 0.0000
MEAN ERROR =   2805.7
SUM-SQUARED ERRORS =  0.78718E+07
MEAN SQUARE ERROR =  0.78718E+07
MEAN ABSOLUTE ERROR=   2805.7
ROOT MEAN SQUARE ERROR =   2805.7
MEAN SQUARED PERCENTAGE ERROR=   6781.6
THEIL INEQUALITY COEFFICIENT U = 0.000
  DECOMPOSITION
     PROPORTION DUE TO BIAS =   1.0000
     PROPORTION DUE TO VARIANCE =   0.0000
     PROPORTION DUE TO COVARIANCE =   0.0000
  DECOMPOSITION
     PROPORTION DUE TO BIAS =   1.0000
     PROPORTION DUE TO REGRESSION =   0.0000
     PROPORTION DUE TO DISTURBANCE =   0.0000

|_* Calculate a 99% prediction interval
|_* Obtain the critical value.
|_GEN1 DF=$N-$K
..NOTE..CURRENT VALUE OF $N   =   66.000
..NOTE..CURRENT VALUE OF $K   =   2.0000
|_SAMPLE 1 1
|_GEN1 ALPHA=0.01/2
|_DISTRIB ALPHA / TYPE=T DF=DF INVERSE CRITICAL=TC
T DISTRIBUTION DF=   64.000
VARIANCE=   1.0323       H=   1.0000

              PROBABILITY CRITICAL VALUE   PDF
  ALPHA
 ROW     1    0.50000E-02  2.6553     0.13308E-01

|_SAMPLE 67 67
|_GENR YUP=YHAT+TC*SE
|_GENR YLOW=YHAT-TC*SE
|_* Print the prediction interval
|_PRINT YLOW YUP
      YLOW           YUP
   288.7689       913.8837

|_* Scatterplot
|_SAMPLE 1 67
|_GRAPH BUCHANAN BUSH / NOKEY

       67 OBSERVATIONS
 SHAZAM WILL NOW MAKE A PLOT FOR YOU
|_STOP


SHAZAM commands

The SHAZAM commands below calculate summary statistics for the total number of votes in Florida by county.

SAMPLE 1 67
READ (PRES2000.txt) GORE BUSH BUCHANAN NADER OTHER / SKIPLINE=1
* Calculate the total number of votes recorded in each county
GENR TOTAL=GORE+BUSH+BUCHANAN+NADER+OTHER
* Summary statistics
SAMPLE 1 67
STAT TOTAL / PMEDIAN
* Print the total number of votes for Palm Beach county
SAMPLE 67 67
PRINT TOTAL
STOP

The SHAZAM output follows.


|_SAMPLE 1 67
|_READ (PRES2000.txt) GORE BUSH BUCHANAN NADER OTHER / SKIPLINE=1
UNIT 88 IS NOW ASSIGNED TO: PRES2000.txt
   5 VARIABLES AND       67 OBSERVATIONS STARTING AT OBS       1

|_* Calculate the total number of votes recorded in each county
|_GENR TOTAL=GORE+BUSH+BUCHANAN+NADER+OTHER
|_* Summary statistics
|_SAMPLE 1 67
|_STAT TOTAL / PMEDIAN
NAME        N    MEAN        ST. DEV      VARIANCE     MINIMUM      MAXIMUM
TOTAL        67   88912.     0.13180E+06 0.17370E+11   2410.0      0.62536E+06

 VARIABLE = TOTAL
MEDIAN =    35149.
LOWER 25%=   8021.0     UPPER 25%=  0.10311E+06 INTERQUARTILE RANGE= 0.9509E+05
MODE NOT APPLICABLE

|_* Print the total number of votes for Palm Beach county
|_SAMPLE 67 67
|_PRINT TOTAL
    TOTAL
   432286.0
|_STOP
