Figure 9. The autocorrelations in hourly temperature at Barrow, 1 Jan 1985 – 31 Dec 2015
5 An ARCH/ARMAX Model of Hourly Temperature
The model employed in this paper is an Autoregressive Conditional Heteroskedasticity/Autoregressive–Moving-Average with Exogenous Inputs model of temperature (henceforth, an ARCH/ARMAX model of temperature). The ARCH terms model the conditional heteroskedasticity, an important consideration in the convergence process. The Autoregressive–Moving-Average (ARMA) component models the autocorrelations in temperature depicted in Figure 9. This section discusses the role of the exogenous inputs.
Following from Forbes and St. Cyr (2017, 2019) and Forbes and Zampelli (2019, 2020), the modeling approach employed in this paper accepts the proposition that “All models are wrong; some models are useful” (Box et al., 2005, p. 440). They are all “wrong” because they represent a simplification of reality; they can be useful if important features of that reality are captured. A possibly related proposition that may be relevant during these times of sharp differences in opinion is “that all modeling results can easily be dismissed out of hand as being wrong, even if they are useful.” In the case of this research, it may be asserted that the results are “wrong” because the model is adversely affected by “specification errors,” “multicollinearity,” “autocorrelation,” “heteroskedasticity,” “overfitting,” and “unit-root issues.” Other readers may conclude that the model is “wrong” because it somehow “forces” the estimated relationship between CO2 concentrations and temperature to be positive because both are rising over time (note: the correlation between temperature and CO2 equals -0.1495). Still others will argue that the results are “biased” because the model’s dependent variable is the natural logarithm of temperature.
Following from Forbes and Zampelli (2020, p. 13), this paper accepts the proposition that the “…vulnerability of a model to be deemed as wrong even though all models are “wrong” represents a challenge to the recognition of insights provided by models that are useful.” Fortunately, this challenge can be addressed by assessing a model’s predictive accuracy. Common sense informs us that a model that yields accurate predictions is useful if the evaluation interval is sufficiently long. Based on this perspective, the approach in this paper proceeds by estimating the model using 228,085 observations and performing an out-of-sample analysis with 13,175 observations.
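The estimation and evaluation samples described above can be illustrated with a minimal sketch. It assumes that the out-of-sample observations follow the estimation observations in time, which the text does not state explicitly; the function name `chronological_split` and the placeholder hourly index are hypothetical and stand in for the actual data.

```python
def chronological_split(records, n_estimation):
    """Split time-ordered records into an estimation block and a later holdout block."""
    if not 0 < n_estimation < len(records):
        raise ValueError("n_estimation must leave at least one record on each side")
    return records[:n_estimation], records[n_estimation:]


# Sample sizes reported in the text.
N_ESTIMATION = 228_085
N_OUT_OF_SAMPLE = 13_175

# Placeholder hourly index (assumption): integers stand in for the real observations.
hours = list(range(N_ESTIMATION + N_OUT_OF_SAMPLE))

estimation, holdout = chronological_split(hours, N_ESTIMATION)

# Share of all observations reserved for the out-of-sample evaluation.
holdout_share = len(holdout) / len(hours)
print(len(estimation), len(holdout), round(holdout_share, 4))
```

Splitting by position rather than at random keeps every evaluation observation later than every estimation observation, so the evaluation resembles a genuine forecast.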
In the model, the association between CO2 concentrations and temperature is presumed to be conditional on the level of downward total solar irradiance measured at the Earth’s surface, downward total solar irradiance being the primary driver of the weather and climate system. The other drivers of the surface energy balance, such as upward and downward longwave irradiance, are not included as explanatory variables because they are hypothesized to be affected by CO2 concentrations. Upward shortwave irradiance is not hypothesized to be directly affected by CO2 concentrations, but its inclusion as an explanatory variable is open to question, given that it is largely driven by downward solar irradiance and temperature. Its inclusion would also significantly reduce the sample size, given that ESRL only commenced reporting this variable in 1993.
In the model, CO2 concentrations are lagged one hour to avoid the issue of possible two-way causality between temperature and CO2 concentrations. The model also includes binary variables representing the solar zenith angle, the hour of the day, the day of the year, and the year. These variables are included as proxies for the drivers of the diurnal variation in temperature, the seasonal variation in temperature, and the possible non-anthropogenic drivers of temperature unrelated to total downward solar irradiance. In terms of functional form, linearity is not presumed. Instead, the data are permitted to speak for themselves on this important issue.
The initial version of the model is given by:
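\[
\begin{aligned}
\ln\text{Temp}_{t} ={}& \alpha_{0} + \alpha_{1}\text{ZeroSolar}_{t} + \alpha_{2}\text{Solar}_{t} + \alpha_{3}\left(\text{CO2}_{t-1} \times \text{ZeroSolar}_{t}\right) \\
& + \alpha_{4}\left(\text{CO2}_{t-1} \times \text{PosSolar}_{t}\right) + \alpha_{5}\left(\text{Solar}_{t} \times \text{CO2}_{t-1}\right) + \sum_{h=1}^{9}\beta_{h}\text{Angle}_{h} \\
& + \sum_{i=2}^{24}\phi_{i}\text{HourofDay}_{i} + \sum_{j=2}^{365}\gamma_{j}\text{DOY}_{j} + \sum_{k=1985}^{2014}\delta_{k}\text{Year}_{k}
\end{aligned}
\tag{1}
\]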
where
\(\ln\text{Temp}_{t}\) is the natural logarithm of temperature, measured in Kelvin, in hour \(t\).
\(\text{ZeroSolar}_{t}\) is a binary variable that equals one if the downward total solar irradiance level at Barrow in period \(t\) equals zero. Its value equals zero otherwise.
\(\text{Solar}_{t}\) equals the downward total solar irradiance level at Barrow in period \(t\).
\(\text{CO2}_{t-1}\) is the atmospheric CO2 concentration at Barrow in hour \(t-1\).
\(\text{PosSolar}_{t}\) is a binary variable that equals one if the level of downward total solar irradiance at Barrow in period \(t\) is positive. Its value equals zero otherwise.
\(\text{Angle}_{h}\) is a series of nine binary variables representing the solar zenith angle.
\(\text{HourofDay}_{i}\) is a series of 23 binary variables representing the hour of the day.
\(\text{DOY}_{j}\) is a series of 364 binary variables representing the day of the year.
\(\text{Year}_{k}\) is a series of 30 binary variables representing the year.
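As an illustration of how the solar and CO2 regressors defined above might be assembled, the following minimal sketch builds the dependent variable and the terms attached to α1 through α5 for each hour. It assumes that the α4 interaction uses PosSolar, as the variable list suggests. The function names, dictionary keys, and the three hours of input values are hypothetical and are not taken from the Barrow data. The first hour is dropped because it has no lagged CO2 value.

```python
import math


def alpha_regressors(solar_t, co2_lag):
    """Return the regressors attached to alpha_1..alpha_5 for a single hour."""
    zero_solar = 1.0 if solar_t == 0 else 0.0  # ZeroSolar_t
    pos_solar = 1.0 if solar_t > 0 else 0.0    # PosSolar_t
    return {
        "ZeroSolar": zero_solar,
        "Solar": solar_t,
        "CO2lag_x_ZeroSolar": co2_lag * zero_solar,
        "CO2lag_x_PosSolar": co2_lag * pos_solar,  # assumed form of the alpha_4 term
        "Solar_x_CO2lag": solar_t * co2_lag,
    }


def build_rows(temps_kelvin, solar, co2):
    """Pair each hour t >= 1 with CO2 from hour t-1 and the log of temperature in Kelvin."""
    rows = []
    for t in range(1, len(temps_kelvin)):
        row = alpha_regressors(solar[t], co2[t - 1])
        row["lnTemp"] = math.log(temps_kelvin[t])
        rows.append(row)
    return rows


# Made-up values for three consecutive hours (illustration only).
temps_kelvin = [250.0, 251.0, 255.0]
solar = [0.0, 0.0, 120.0]
co2 = [400.0, 401.0, 402.0]

rows = build_rows(temps_kelvin, solar, co2)
for row in rows:
    print(row)
```

The binary variables for solar zenith angle, hour of the day, day of the year, and year would be appended to each row in the same way.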
Note that α1, α2, α3, and so on are the coefficients corresponding to this linear version of the model. From (1), the total number of coefficients to be estimated equals 432. Some may strongly suspect that this number of explanatory variables indicates that the model is “overfitted.” If this claim were true, the model would be unlikely to yield accurate out-of-sample predictions even if the within-sample explanatory power were very high (Brooks, 2019, p. 271). The “rule of thumb” by Trout (2006), that overfitting is avoided when there are at least ten observations per estimated coefficient, does not support this suspicion, given that the structural model presented in this paper entails over 500 observations per estimated coefficient. Moreover, as will be seen, the model does not suffer from the consequences of overfitting in terms of out-of-sample predictive accuracy.
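The coefficient count and the observations-per-coefficient ratio can be checked with a short calculation. The sketch below tallies the terms in (1) using the group sizes given in the variable definitions and the estimation sample size reported earlier in this section. The dictionary labels are only descriptive.

```python
# Number of estimated coefficients per group of terms in equation (1).
terms = {
    "alpha_0 through alpha_5": 6,
    "Angle (solar zenith angle)": 9,
    "HourofDay": 23,
    "DOY": 364,
    "Year": 30,
}

n_coefficients = sum(terms.values())

# Estimation sample size reported in the text.
n_observations = 228_085

obs_per_coefficient = n_observations / n_coefficients

# Threshold from the rule of thumb cited in the text (ten observations per coefficient).
RULE_OF_THUMB_MINIMUM = 10

print(n_coefficients)
print(round(obs_per_coefficient, 1))
print(obs_per_coefficient >= RULE_OF_THUMB_MINIMUM)
```

The tally reproduces the 432 coefficients stated in the text and a ratio of roughly 528 observations per coefficient, consistent with the “over 500” figure.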