Figure 9. The autocorrelations in hourly temperature at Barrow,
1 Jan 1985 – 31 Dec 2015
5 An ARCH/ARMAX Model of Hourly Temperature
The model employed in this paper is an Autoregressive Conditional
Heteroskedasticity/Autoregressive–Moving-Average with Exogenous Inputs
model of temperature (henceforth, an ARCH/ARMAX model of temperature).
The ARCH terms are employed to model the conditional heteroskedasticity,
an important consideration in the convergence process. The
Autoregressive–Moving-Average (ARMA) component models the
autocorrelations in temperature depicted in Figure 9. In this section,
the role of the exogenous inputs is discussed.
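For orientation, a generic ARCH/ARMAX specification can be written as follows. This is a textbook-style sketch rather than the exact specification estimated in this paper: the lag orders \(p\), \(q\), and \(m\) and the symbols \(y_{t}\), \(x_{t}\), \(\theta\), \(u_{t}\), \(\rho_{i}\), \(\psi_{j}\), \(\varepsilon_{t}\), \(z_{t}\), \(h_{t}\), \(\omega\), and \(\lambda_{l}\) are notation assumed here for illustration only.

\[
\begin{aligned}
y_{t} &= x_{t}^{\prime}\theta + u_{t}, \\
u_{t} &= \sum_{i=1}^{p}\rho_{i}u_{t-i} + \varepsilon_{t} + \sum_{j=1}^{q}\psi_{j}\varepsilon_{t-j}, \\
\varepsilon_{t} &= \sqrt{h_{t}}\,z_{t}, \quad z_{t} \sim \text{i.i.d.}(0, 1), \\
h_{t} &= \omega + \sum_{l=1}^{m}\lambda_{l}\varepsilon_{t-l}^{2}.
\end{aligned}
\]

In this notation, \(x_{t}\) collects the exogenous inputs, the ARMA terms in \(u_{t}\) absorb the autocorrelation in the dependent variable, and the ARCH terms allow the conditional variance \(h_{t}\) to depend on past squared shocks.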
Following from Forbes and St. Cyr (2017, 2019) and Forbes and Zampelli
(2019, 2020), the modeling
approach employed in this paper accepts the proposition that “All
models are wrong; some models are useful” (Box et al., 2005, p. 440).
They are all “wrong” because they represent a simplification of
reality; they can be useful if important features of that reality are
captured. A possibly related proposition that may be relevant during
these times of sharp differences in opinions is “that all modeling
results can easily be dismissed out of hand as being wrong, even if they
are useful.” In the case of this research, it may be asserted that the
results are “wrong” because the model is adversely affected by
“specification errors,” “multicollinearity,” “autocorrelation,”
“heteroskedasticity,” “overfitting,” and “unit-root issues.”
Other readers may conclude that the model is “wrong” because it somehow
“forces” the estimated relationship between CO2 concentrations and
temperature to be positive because both are rising over time (note: the
correlation between temperature and CO2 equals -0.1495). Still others
will argue that the results are “biased” because the model’s
dependent variable is the natural logarithm of temperature.
Following from Forbes and Zampelli (2020, p. 13), this paper accepts the
proposition that the “…vulnerability of a model to be deemed as
wrong even though all models are “wrong” represents a challenge to the
recognition of insights provided by models that are useful.”
Fortunately, this challenge can be addressed by assessing a model’s
predictive accuracy. Common sense informs us that a model that yields
accurate predictions is useful if the evaluation interval is
sufficiently long. Based on this perspective, the approach in this paper
proceeds by estimating the model using 228,085 observations and
performing an out-of-sample analysis with 13,175 observations.
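The estimation/evaluation split described above can be sketched as follows. This is a minimal illustration, not the paper's procedure: the data are synthetic, ordinary least squares stands in for the ARCH/ARMAX estimator, and the helper names `chronological_split` and `rmse` are hypothetical. Only the two sample sizes are taken from the text.

```python
import numpy as np


def chronological_split(X, y, n_train):
    """Split time-ordered data without shuffling.

    The first n_train rows are used for estimation and the remaining
    rows are held back for out-of-sample evaluation.
    """
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])


def rmse(actual, predicted):
    """Root-mean-squared error between two equally long arrays."""
    diff = np.asarray(actual) - np.asarray(predicted)
    return float(np.sqrt(np.mean(diff ** 2)))


# Sample sizes reported in the text.
n_train, n_test = 228_085, 13_175
n = n_train + n_test

# Synthetic stand-in data (assumption: NOT the Barrow observations).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([5.6, 0.01]) + rng.normal(scale=0.02, size=n)

(train_X, train_y), (test_X, test_y) = chronological_split(X, y, n_train)

# OLS is used here only as a simple stand-in estimator.
beta, *_ = np.linalg.lstsq(train_X, train_y, rcond=None)

# Predictive accuracy is judged only on observations not used in estimation.
oos_rmse = rmse(test_y, test_X @ beta)
print(len(train_y), len(test_y), round(oos_rmse, 4))
```

Keeping the split chronological, rather than random, matters for hourly data: a random split would place evaluation hours between estimation hours and make the out-of-sample test easier than a genuine forecast.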
In the model, the association between CO2 concentrations and temperature
is presumed to be conditional on the level of downward total solar
irradiance measured at the Earth’s surface, which is
the primary driver of the weather and climate system. The other drivers
of the surface energy balance, such as upward and downward longwave
irradiance, are not included as explanatory variables in the model
because they are hypothesized to be affected by CO2 concentrations. Upward short-wave irradiance is not hypothesized to be
directly affected by CO2 concentrations. Its inclusion
as an explanatory variable is open to question, given that it is largely
driven by downward solar irradiance and temperature. The inclusion of
this variable would significantly reduce the sample size, given that
ESRL only commenced reporting this variable in 1993.
In the model, CO2 concentrations are lagged one hour to
avoid the issue of possible two-way causality between temperature and
CO2 concentrations. The model also includes binary
variables representing the solar zenith angle, the hour-of-the-day,
day-of-the-year, and year. These variables are included as proxies for
the drivers of the diurnal variation in temperature, the seasonal
variation in temperature, and the possible non-anthropogenic drivers
of temperature unrelated to total downward solar irradiance. In terms of
functional form, linearity is not presumed. Instead, the data are
permitted to speak for themselves on this important issue.
The initial version of the model is given by:
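\[
\begin{aligned}
\ln\text{Temp}_{t} = {} & \alpha_{0} + \alpha_{1}\text{ZeroSolar}_{t} + \alpha_{2}\text{Solar}_{t} + \alpha_{3}\left(\text{CO2}_{t-1} \times \text{ZeroSolar}_{t}\right) \\
& + \alpha_{4}\left(\text{CO2}_{t-1} \times \text{PosSolar}_{t}\right) + \alpha_{5}\left(\text{Solar}_{t} \times \text{CO2}_{t-1}\right) \\
& + \sum_{h=1}^{9}\beta_{h}\text{Angle}_{h} + \sum_{i=2}^{24}\phi_{i}\text{HourofDay}_{i} + \sum_{j=2}^{365}\gamma_{j}\text{DOY}_{j} + \sum_{k=1985}^{2014}\delta_{k}\text{Year}_{k}
\end{aligned}
\qquad (1)
\]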
Where
\(\ln\text{Temp}_{t}\) is the natural logarithm of temperature measured
in Kelvin in hour t.
\(\text{ZeroSolar}_{t}\) is a binary variable. The variable is
assigned a value of one if the downward total solar irradiance level at
Barrow in period t equals zero. Its value equals zero otherwise.
\(\text{Solar}_{t}\) equals the downward total solar irradiance level
at Barrow in period t.
\(\text{CO2}_{t-1}\) is the atmospheric level of CO2 concentrations at
Barrow in hour t-1.
\(\text{PosSolar}_{t}\) is a binary variable that equals one if the
level of downward total solar irradiance at Barrow in period t is
positive. Its value equals zero otherwise.
\(\text{Angle}_{h}\) is a vector of nine variables representing the
solar zenith angle.
\(\text{HourofDay}_{i}\) is a series of 23 variables
representing the hour of the day.
\(\text{DOY}_{j}\) is a series of 364 binary variables
representing the day of the year.
\(\text{Year}_{k}\) is a series of 30 binary variables representing the
year.
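The solar and CO2 regressors defined above can be assembled as in the following sketch. It rests on stated assumptions: the function name `solar_co2_terms` and the example numbers are hypothetical, and the binary variable for positive solar irradiance is assumed to enter through its interaction with lagged CO2, which is an interpretation of the variable definitions rather than something the text spells out. The zenith-angle, hour, day-of-year, and year variables are omitted for brevity.

```python
import numpy as np


def solar_co2_terms(solar, co2):
    """Build the intercept, solar, and lagged-CO2 interaction columns.

    solar, co2: hourly series of equal length, ordered in time.
    The first hour is dropped because it has no one-hour CO2 lag.
    """
    solar = np.asarray(solar, dtype=float)
    co2 = np.asarray(co2, dtype=float)

    solar_t = solar[1:]   # irradiance in hour t
    co2_lag = co2[:-1]    # CO2 concentration in hour t-1

    zero_solar = (solar_t == 0).astype(float)  # 1 when irradiance is zero
    pos_solar = (solar_t > 0).astype(float)    # 1 when irradiance is positive

    return np.column_stack([
        np.ones_like(solar_t),   # intercept
        zero_solar,              # ZeroSolar_t
        solar_t,                 # Solar_t
        co2_lag * zero_solar,    # CO2_{t-1} x ZeroSolar_t
        co2_lag * pos_solar,     # CO2_{t-1} x PosSolar_t (assumed form)
        solar_t * co2_lag,       # Solar_t x CO2_{t-1}
    ])


# Hypothetical four-hour example (not Barrow data).
X = solar_co2_terms(solar=[0.0, 0.0, 120.5, 300.0],
                    co2=[400.0, 401.0, 402.0, 403.0])
print(X.shape)
```

Splitting the CO2 effect by the two binary variables lets the estimated association differ between hours with and without sunlight, while the product of irradiance and lagged CO2 lets it vary with the irradiance level.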
Please note that \(\alpha_{1}\), \(\alpha_{2}\), \(\alpha_{3}\), etc.
are the coefficients corresponding to this
linear version of the model. From (1), the total number of coefficients
to be estimated equals 432. Some may strongly suspect that this number
of explanatory variables indicates that the model is “overfitted.” If
this claim is true, the model would be unlikely to yield accurate
out-of-sample predictions even if the within-sample explanatory power is
very high (Brooks, 2019, p. 271). The “rule of thumb” by Trout (2006)
that overfitting is avoided when there are at least ten observations per
estimated coefficient does not support this suspicion, given
that the structural model presented in this paper entails over 500
observations per estimated coefficient. Moreover, as will be seen, the
model does not suffer from the consequences of overfitting in terms of
out-of-sample predictive accuracy.
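The coefficient count and the observations-per-coefficient ratio quoted above can be checked with a few lines of arithmetic. The group sizes come from equation (1) and its variable definitions, taking the alpha group to run from the intercept through the fifth coefficient; the variable names are chosen here for illustration.

```python
# Number of coefficients in each group of equation (1).
n_alpha = 6    # intercept plus five solar/CO2 coefficients
n_angle = 9    # solar zenith angle variables
n_hour = 23    # hour-of-day variables (i = 2..24)
n_doy = 364    # day-of-year variables (j = 2..365)
n_year = 30    # year variables (k = 1985..2014)

n_coef = n_alpha + n_angle + n_hour + n_doy + n_year

# Estimation sample size reported in the text.
n_obs = 228_085
obs_per_coef = n_obs / n_coef

print(n_coef, round(obs_per_coef, 1))  # prints: 432 528.0
```

The ratio of roughly 528 observations per coefficient is far above the ten-per-coefficient rule of thumb cited in the text.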