Autoregressive time-series models
A seasonal autoregressive integrated moving average (SARIMA)
time-series model was fitted to the time series of IgG seroprevalence (all
age groups) for Gorakhpur district for 2013–2022. These data were
selected because of their geographic proximity to the Gorakhpur Station,
at which the climate data were recorded, and because the largest
proportion of samples came from this division.
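
For reference, the general multiplicative SARIMA(p, d, q)(P, D, Q)_s model can be written in backshift notation as shown below. This is the standard textbook form only: the fitted orders are not stated in this section, and the seasonal period s = 12 is an assumption based on the monthly sampling interval.

```latex
% y_t: monthly IgG seroprevalence; B: backshift operator (B y_t = y_{t-1});
% \varepsilon_t: white-noise error; c: constant; s: seasonal period (assumed 12)
\phi(B)\,\Phi(B^{s})\,(1-B)^{d}\,(1-B^{s})^{D}\,y_t
  = c + \theta(B)\,\Theta(B^{s})\,\varepsilon_t

% Non-seasonal and seasonal AR and MA polynomials
\phi(B)     = 1 - \phi_1 B - \dots - \phi_p B^{p}, \qquad
\theta(B)   = 1 + \theta_1 B + \dots + \theta_q B^{q}
\Phi(B^{s}) = 1 - \Phi_1 B^{s} - \dots - \Phi_P B^{Ps}, \qquad
\Theta(B^{s}) = 1 + \Theta_1 B^{s} + \dots + \Theta_Q B^{Qs}
```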
Stationarity of the time series of monthly IgG seroprevalence for
Gorakhpur division was assessed visually using autocorrelation function
(ACF) plots and statistically using the Ljung-Box test for
independence (null hypothesis: no autocorrelation up to a given number
of lags), the augmented Dickey-Fuller (ADF) test for a unit root
(null hypothesis: a unit root is present), and the
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test for level and trend
stationarity (null hypothesis: the time series is stationary). Following
assessment of stationarity, ACF and partial autocorrelation function
(PACF) plots were used to guide the manual selection of autoregressive
time-series models for IgG seroprevalence, alongside automated model
selection using the ‘auto.arima’ function in the ‘forecast’ package
(Hyndman & Khandakar, 2008). Fit was assessed visually using plots of
residuals and statistically by minimising Akaike’s information
criterion (AIC).
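
The quantities behind these diagnostics can be illustrated with a minimal Python sketch using only the standard library. It is not the analysis code used in this study, which relied on the ‘forecast’ package. The function names (`sample_acf`, `ljung_box_q`, `aic`) and the synthetic series `seasonal` are hypothetical and introduced purely for illustration.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    c0 = sum(d * d for d in centred)  # lag-0 sum of squares
    return [
        sum(centred[t] * centred[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def ljung_box_q(series, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(series)
    acf = sample_acf(series, max_lag)
    return n * (n + 2) * sum(
        acf[k] ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def aic(log_likelihood, n_params):
    """Akaike's information criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical monthly series with a pure 12-month cycle (10 years of data).
seasonal = [math.sin(2 * math.pi * t / 12) for t in range(120)]

acf = sample_acf(seasonal, 24)   # strong positive spike expected at lag 12
q = ljung_box_q(seasonal, 12)    # compared against a chi-squared distribution
```

For a raw series, Q is compared with a chi-squared distribution with `max_lag` degrees of freedom. A large Q, as for this strongly seasonal example, leads to rejection of the null hypothesis of independence.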
Cross-correlation functions were then used to assess whether a
statistical relationship existed between the time series of IgG
seroprevalence and the monthly climate variables (total rainfall, mean
relative humidity and mean minimum temperature). Where a correlation was
detected, lagged climate variables were included in the model, and fit
was assessed visually using plots of residuals and statistically by
minimising AIC.
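
The idea of screening lags by cross-correlation and then building a lagged regressor can be sketched as follows, again in standard-library Python and on synthetic data. The series `rainfall` and `seroprev`, the two-month delay and the function `cross_correlation` are all hypothetical. The lag convention used here (the climate series leading the response by k months) is a choice made for this sketch and may differ from that of other software.

```python
import math


def cross_correlation(x, y, max_lag):
    """Sample correlation between x[t] and y[t + k] for k = 0..max_lag,
    i.e. x leading y by k time steps."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    return {
        k: sum(dx[t] * dy[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    }


# Hypothetical data: the response repeats the climate signal two months later.
rainfall = [
    math.sin(2 * math.pi * t / 12) + 0.3 * math.cos(2 * math.pi * t / 5)
    for t in range(120)
]
seroprev = [0.0, 0.0] + rainfall[:-2]

ccf = cross_correlation(rainfall, seroprev, 6)
best_lag = max(ccf, key=lambda k: abs(ccf[k]))  # lag with the strongest correlation

# Align the lagged climate series with the response so that it can be
# supplied to a model as an external regressor.
lagged_rain = rainfall[: len(rainfall) - best_lag]
response = seroprev[best_lag:]
```

In practice, each candidate lag identified in this way would be added to the time-series model as a covariate, and the resulting fits compared by AIC as described above.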