Forecasting Notes
Dummy Variable: a dummy variable, in general, is a categorical variable that takes the value one if a characteristic applies and zero if it does not. It is a way of introducing “states of nature” that are important in understanding the movement of the dependent variable but are not quantifiable as continuous variables.
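Here is a minimal sketch in Python. The helper name `dummy` and the sample labels are hypothetical and only for illustration. A dummy is simply a 0/1 indicator computed for each observation from a qualitative label:

```python
def dummy(labels, category):
    """Return a 0/1 indicator: 1 where the label equals `category`, else 0."""
    return [1 if label == category else 0 for label in labels]


# Hypothetical observations, each labeled with a qualitative "state of nature".
seasons = ["winter", "summer", "summer", "winter"]
summer = dummy(seasons, "summer")
print(summer)  # [0, 1, 1, 0]
```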
Seasonal Effects:
use of dummy variables (within a simple linear trend case)
1. Suppose you want to estimate an equation for the sales of grass seed over some period, where you believe the basic relationship is
    Sales = f(time)
and you have quarterly data, as in this table:
Period  Quarter  Sales
1       1996-Q1  100
2       1996-Q2  120
3       1996-Q3  110
4       1996-Q4  105
5       1997-Q1  104
6       1997-Q2  124
7       1997-Q3  114
8       1997-Q4  109
2. Here is a graph of the 8 observations:
[Figure: quarterly grass seed sales, periods 1 to 8]
3. Without seasonal dummies, a simple linear trend regression estimates a straight line:
    Sales = 106.8 + 0.881(t)        R square = .07
4. By including dummies, however, you can pick up the regular ups and downs that appear to repeat on a seasonal basis.
5. Since there are four seasons to introduce, you will need more than one dummy. A single dummy variable handles only a two-part breakdown: it takes the value one if the category applies and zero if it does not. For example, to break the year into two parts, you could code summer as dummy = 1 and winter as dummy = 0.
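a. A subtle feature is that three dummies (not four) are correct for a four-part breakdown. Thus, dummy 1 can represent spring, dummy 2 summer, and dummy 3 fall, and the left-out category is winter. In other words, if the observation is for winter, all three dummies have the value zero and the impact of the winter season is captured in the constant term. A fourth (winter) dummy would be redundant: in every observation the four dummies would sum to one, exactly duplicating the constant term, so the coefficients could not be separately estimated (the so-called dummy variable trap).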
6. So the grass seed data would be set up like this:
                         (Spring)  (Summer)  (Fall)
Period  Quarter  Sales   D1        D2        D3
1       1996-Q1  100     0         0         0
2       1996-Q2  120     1         0         0
3       1996-Q3  110     0         1         0
4       1996-Q4  105     0         0         1
5       1997-Q1  104     0         0         0
6       1997-Q2  124     1         0         0
7       1997-Q3  114     0         1         0
8       1997-Q4  109     0         0         1
etc.
The observations would be entered like this:
    for time period 1:  S=100  T=1  D1=0  D2=0  D3=0
    for time period 2:  S=120  T=2  D1=1  D2=0  D3=0
    etc.
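The setup in step 6 can be generated mechanically from the quarter labels. Below is a minimal Python sketch that assumes the notes' coding: Q1 = winter (left out), Q2 = spring (D1), Q3 = summer (D2), Q4 = fall (D3). The names `DATA`, `SEASON_OF_QUARTER`, `INCLUDED`, and `design_rows` are hypothetical. The last loop illustrates why point 5a uses only three dummies. Adding a winter dummy would make the four dummies sum to one in every row, which duplicates the constant term.

```python
# Quarterly grass seed data from the notes: (quarter label, sales).
DATA = [
    ("1996-Q1", 100), ("1996-Q2", 120), ("1996-Q3", 110), ("1996-Q4", 105),
    ("1997-Q1", 104), ("1997-Q2", 124), ("1997-Q3", 114), ("1997-Q4", 109),
]

# Assumed coding: Q1 = winter (omitted), Q2 = spring, Q3 = summer, Q4 = fall.
SEASON_OF_QUARTER = {1: "winter", 2: "spring", 3: "summer", 4: "fall"}
INCLUDED = ("spring", "summer", "fall")  # D1, D2, D3; winter is left out


def design_rows(data):
    """Return one (S, T, D1, D2, D3) tuple per observation, with T counting from 1."""
    rows = []
    for t, (label, sales) in enumerate(data, start=1):
        season = SEASON_OF_QUARTER[int(label[-1])]  # "1996-Q3" -> 3 -> "summer"
        dummies = tuple(1 if season == s else 0 for s in INCLUDED)
        rows.append((sales, t) + dummies)
    return rows


rows = design_rows(DATA)
print(rows[0])  # (100, 1, 0, 0, 0)
print(rows[1])  # (120, 2, 1, 0, 0)

# Dummy variable trap: with a winter dummy added, the four dummies always sum
# to 1, i.e. they reproduce the constant term's column of ones exactly.
for label, _ in DATA:
    season = SEASON_OF_QUARTER[int(label[-1])]
    all_four = [1 if season == s else 0 for s in ("winter",) + INCLUDED]
    assert sum(all_four) == 1
```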
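The resulting estimated equation would be
    Sales = 99 + 1.0(t) + 19(D1) + 8(D2) + 2(D3)        R square = 1.0
Each dummy coefficient is that season's shift relative to winter (the omitted category), holding the trend fixed: spring sales run 19 above winter, summer 8 above, and fall 2 above.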
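Both the simple trend line in step 3 and the dummy equation above are ordinary least-squares fits. They can be reproduced with a short pure-Python sketch. This sketch assumes OLS via the normal equations (X'X)b = X'y, solved by Gauss-Jordan elimination. The helper names `solve` and `ols` are hypothetical, and in practice a statistics package would normally do this work.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:  # eliminate this column from every other row
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


sales = [100, 120, 110, 105, 104, 124, 114, 109]
t = list(range(1, 9))
d1 = [0, 1, 0, 0] * 2  # spring (Q2)
d2 = [0, 0, 1, 0] * 2  # summer (Q3)
d3 = [0, 0, 0, 1] * 2  # fall (Q4); winter (Q1) is the omitted category

# Columns: constant, t                      -> [intercept, slope]
trend_only = ols([[1, ti] for ti in t], sales)
# Columns: constant, t, D1, D2, D3          -> [const, t, D1, D2, D3]
with_dummies = ols([[1, ti, a, b, c] for ti, a, b, c in zip(t, d1, d2, d3)], sales)
print([round(c, 3) for c in trend_only])    # [106.786, 0.881]
print([round(c, 3) for c in with_dummies])  # [99.0, 1.0, 19.0, 8.0, 2.0]
```

The normal equations are fine for a tiny, well-conditioned example like this one. For larger or nearly collinear data, a QR- or SVD-based least-squares routine is numerically safer.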
Note the impressive improvement in R square: it would appear that much more accurate forecasts would be possible with this improved model (of course, the data above were hand-picked so that a very precise underlying relationship was present).
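To make the comparison concrete, here is a hedged Python sketch. It recomputes R square for both equations and then extrapolates each one to the four quarters of 1998 (t = 9 to 12). The helper names `fit_trend`, `seasonal`, and `r_squared` are hypothetical. The trend line is refit with the closed-form least-squares formulas, while the seasonal equation is taken exactly as printed in the notes.

```python
sales = [100, 120, 110, 105, 104, 124, 114, 109]


def fit_trend(y):
    """Closed-form least-squares line y = a + b*t for t = 1..n."""
    n = len(y)
    ts = range(1, n + 1)
    t_bar, y_bar = sum(ts) / n, sum(y) / n
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(ts, y))
    sxx = sum((ti - t_bar) ** 2 for ti in ts)
    b = sxy / sxx
    return y_bar - b * t_bar, b


def seasonal(ti, q):
    """The notes' equation: Sales = 99 + 1.0 t + 19 D1 + 8 D2 + 2 D3 (q = quarter 1..4)."""
    d1, d2, d3 = (q == 2), (q == 3), (q == 4)  # spring, summer, fall; winter omitted
    return 99 + 1.0 * ti + 19 * d1 + 8 * d2 + 2 * d3


def r_squared(actual, fitted):
    """R^2 = 1 - SSE/SST."""
    mean = sum(actual) / len(actual)
    sse = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1 - sse / sst


a, b = fit_trend(sales)
trend_fitted = [a + b * ti for ti in range(1, 9)]
seasonal_fitted = [seasonal(ti, (ti - 1) % 4 + 1) for ti in range(1, 9)]
print(round(r_squared(sales, trend_fitted), 2))     # 0.07
print(round(r_squared(sales, seasonal_fitted), 2))  # 1.0

# Extrapolate both equations to 1998-Q1..Q4 (t = 9..12).
for ti in range(9, 13):
    q = (ti - 1) % 4 + 1
    print(f"t={ti} Q{q}: trend {a + b * ti:.1f}, seasonal {seasonal(ti, q):.1f}")
```

Under these assumptions, the seasonal equation gives 108, 128, 118 and 113 for 1998-Q1 to Q4. The trend line gives a nearly flat 114.7 to 117.4, so it misses the seasonal swing entirely.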