* Structural Breaks in the US Gasoline Market
*
* Keywords:
* regression, ols, log, gasoline, market, us, chow, test, structural break
*
* Description:
* We illustrate how to conduct a Chow structural break test on a log-log
* OLS model of U.S. per capita gasoline consumption.
*
* Author(s):
* Noel Roy
* Skif Pankov
*
* Source:
* William H. Greene, Econometric Analysis - 7th Edition
* Pearson International Edition, Chapter 6, Example 6.9 (page 212)
*
* Setting the first time period to be equal to year 1953 with periodicity of
* one year
TIME 1953.0 1
SAMPLE 1953.0 2004.0
* Reading the datafile and naming the variables, specifying to ignore the
* first line of the file
read (TableF2-2.shd) year gasexp pop gasp income pnc puc ppt pd pn ps / skiplines = 1
* Generating logs of variables
genr lngpop = log(gasexp/pop/gasp)
genr lnincome = log(income)
genr lnpg = log(gasp)
genr lnpnc = log(pnc)
genr lnpuc = log(puc)
* Replicating figure 6.5
genr g = gasexp / gasp
graph gasp g / nokey
* Calculating time trend
genr t = year - 1952
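* A quick check, not part of Greene's example: because the sample starts in
* 1953, the observation number of a given year is year - 1952. The names
* obs1973 and obs1974 are illustrative only. They evaluate to 21 and 22, the
* last observation before the break and the first observation after it.
gen1 obs1973 = 1973 - 1952
gen1 obs1974 = 1974 - 1952
print obs1973 obs1974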
* Running an OLS regression of lngpop on lnpg, lnincome, lnpnc, lnpuc and t,
* specifying that it's a log-log model
ols lngpop lnpg lnincome lnpnc lnpuc t / loglog
* Testing for a structural break in the model after the OPEC price shock in 1973
* with a Chow test, using the diagnos command (observation 21 is the year 1973)
*
diagnos / chowone = 21
* Calculating the Chow test statistic the "long way". The estimated
* coefficients and their covariance matrices are saved with the coef =
* and cov = options for later use in the Wald statistic
*
ols lngpop lnpg lnincome lnpnc lnpuc t
gen1 sse=$sse
gen1 k=$k
sample 1 21
ols lngpop lnpg lnincome lnpnc lnpuc t / coef = theta1 cov = v1
gen1 sse1=$sse
gen1 n1=$n
sample 22 52
ols lngpop lnpg lnincome lnpnc lnpuc t / coef = theta2 cov = v2
gen1 sse2=$sse
gen1 n2=$n
* Computing and outputting the test statistic.
gen1 df1 = k
gen1 df2 = n1+n2-2*k
gen1 f = ((sse-sse1-sse2)/df1)/((sse1+sse2)/df2)
print f df1 df2
* Computing the probability density function (pdf) and the cumulative
* distribution function (cdf) at f, specifying an F-distribution with df1 and
* df2 degrees of freedom; 1 - cdf is the p-value of the test
distrib f / type = f df1 = df1 df2 = df2
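* A minimal sketch, assuming that the cdf = option of distrib stores the
* cumulative probability in the named variable. The names cdfchow and pchow
* are illustrative. The p-value of the Chow test is the upper-tail area,
* 1 - cdf, and the null of no structural break is rejected when it is small.
?distrib f / type = f df1 = df1 df2 = df2 cdf = cdfchow
gen1 pchow = 1 - cdfchow
print pchow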
* Testing whether the observations for 1974, 1975, 1980, and 1981
* are consistent with the unrestricted estimate
* Defining the sample as 1953-1973, 1976-1979, 1982-2004
* (observations 1-21, 24-27 and 30-52), so that the four years
* 1974, 1975, 1980, and 1981 are excluded
sample 1 21 24 27 30 52
* Running an OLS regression for the selected sample
?ols lngpop lnpg lnincome lnpnc lnpuc t
* Computing the test statistic (6-15).
gen1 df1 = 4
gen1 df2 = $n-k
gen1 fstat = ((sse-$sse)/df1)/($sse/df2)
print fstat df1 df2
distrib fstat / type=f df1=df1 df2=df2
* An alternative method of calculating this statistic uses the full sample
* with a dummy variable for each year in which a break is suspected.
* These can be created using the dum function or, alternatively, with the
* if command
* Restoring the full sample
sample 1 52
* Defining the dummy variables
genr y1974 = 0
genr y1975 = 0
genr y1980 = 0
genr y1981 = 0
* The if command sets a variable at a certain value if a logical condition
* is satisfied.
if (year .eq. 1974) y1974 = 1
if (year .eq. 1975) y1975 = 1
if (year .eq. 1980) y1980 = 1
if (year .eq. 1981) y1981 = 1
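* A hedged alternative to the if commands above: a logical expression in
* genr evaluates to 1 where the condition holds and to 0 elsewhere, so each
* dummy can be built in one line. The dum function returns 1 for a positive
* argument and 0 otherwise, which gives a post-1973 indicator directly.
* The names d1974, d1975, d1980, d1981 and post73 are illustrative and are
* not used in the regressions that follow.
genr d1974 = (year .eq. 1974)
genr d1975 = (year .eq. 1975)
genr d1980 = (year .eq. 1980)
genr d1981 = (year .eq. 1981)
genr post73 = dum(year - 1973)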
* Estimating the equation with dummy variables included
?ols lngpop lnpg lnincome lnpnc lnpuc t y1974 y1975 y1980 y1981
* Computing the test statistic
gen1 df1 = 4
gen1 df2 = $n-$k
gen1 f = ((sse-$sse)/df1)/(($sse)/df2)
print f df1 df2
distrib f / type=f df1=df1 df2=df2
* Estimating the pooled model with different constant terms con1 and
* con2. The constants can be generated using the if command
genr con1 = 1
genr con2 = 1
if (year .le. 1973) con2 = 0
if (year .gt. 1973) con1 = 0
* Running an OLS regression, specifying not to use a constant (since
* it is included implicitly via con1 and con2 variables)
ols lngpop con1 con2 lnpg lnincome lnpnc lnpuc t / noconstant
* Computing the test statistic
gen1 df1 = k-1
gen1 df2 = n1+n2-2*k
gen1 f = (($sse-sse1-sse2)/df1)/((sse1+sse2)/df2)
print f df1 df2
distrib f / type=f df1=df1 df2=df2
* Suppose that in the restricted model the constant and the coefficients
* on lnincome and lnpg may differ between the two periods, while the
* coefficients on lnpnc, lnpuc and t are held equal.
genr ly1 = con1*lnincome
genr ly2 = con2*lnincome
genr lpg1 = con1*lnpg
genr lpg2 = con2*lnpg
?ols lngpop con1 con2 ly1 ly2 lpg1 lpg2 lnpnc lnpuc t / noconstant
* Computing the test statistic.
gen1 df1 = k-3
gen1 df2 = n1+n2-2*k
gen1 f = (($sse-sse1-sse2)/df1)/((sse1+sse2)/df2)
print f df1 df2
distrib f / type=f df1=df1 df2=df2
* Testing the null hypothesis that the difference between the parameters
* of the two time periods is zero by computing the Wald statistic through
* matrix manipulation capabilities
matrix w = (theta1-theta2)'inv(v1+v2)*(theta1-theta2)
* Outputting w
print w
* Under the null hypothesis the Wald statistic is asymptotically chi-square
* with k degrees of freedom, so we can use the distrib command to calculate
* its p-value
distrib w / type=chi df=k
stop