This notebook is based on Chapter 10 of Cryer and Chan, “Time Series Analysis: With Applications in R”.
1 Monthly Carbon Dioxide Levels at Alert, NWT, Canada
1.1 Model specification
The figure below displays the monthly carbon dioxide levels at Alert, NWT, Canada. The data is available in the TSA package in R. The data is plotted from January 1994 to December 2004. The points are colored according to the month of the year.
Seasonality is more noticable in the second plot, where we colored and labeled the points according to the month of the year.
Code
library(TSA)data(co2)plot(co2,ylab='CO2', main='Monthly Carbon Dioxide Levels at Alert, NWT, Canada')
Code
# The points are colored according to the month of the year# Define colors for each month (12 distinct colors)month_colors <-c("red", "orange", "yellow", "green", "cyan", "blue", "purple", "magenta", "brown", "gray", "black", "pink")plot(window(co2,start=c(2000,1)),ylab='CO2', main ='Monthly Carbon Dioxide Levels at Alert, NWT, Canada')Month=c('J','F','M','A','M','J','J','A','S','O','N','D')points(window(co2,start=c(2000,1)),pch=Month, col = month_colors[cycle(co2)])
Code
acf(as.vector(co2),lag.max=36, main ='Sample ACF of Yt',xaxp=c(0,36,18))
The sample autocorrelation shows clear signs of seasonality.
We see a noticeable linear trend in the \(Y_t\) plot, indicating that the data is not stationary. We difference once, \(\nabla Y_t\), and plot the differenced series as well as the corresponding sample ACF.
Code
plot(diff(co2),ylab='First Difference of CO2',xlab='Time',main =expression(nabla * Y[t]))
Code
acf(as.vector(diff(co2)),lag.max=36,ci.type='ma',xaxp=c(0,36,18),main =expression("Sample ACF of "* nabla * Y[t]))
The general upward trend disappears after the first difference. The ACF of the first difference shows a clear seasonal pattern, with peaks at lags 12, 24, and 36. Perhaps seasonal differencing will bring us to a series that may be modeled parsimoniously. We difference it again at lag 12.
Now we are plotting \(\nabla_{12}\nabla Y_t\) and the corresponding ACF.
Code
plot(diff(diff(co2),lag=12), xlab='Time',ylab='First and Seasonal Difference of CO2', main =expression( nabla[12] * nabla * Y[t]))
Code
acf(as.vector(diff(diff(co2),lag=12)),lag.max=36,ci.type='ma',xaxp=c(0,36,18), main =expression("Sample ACF of "* nabla[12] * nabla * Y[t]))
Code
pacf(as.vector(diff(diff(co2),lag=12)),lag.max=36,xaxp=c(0,36,18), main =expression("Sample PACF of "* nabla[12] * nabla * Y[t]))
We see significant spikes at lags 1, 12 and 13. We can model this parsimoniously as \(MA(1)\times MA(1)_{12}\). So the original series follows ARIMA\((0,1,1)\times (0,1,1)_{12}\), i.e.
The coefficients are highly significant. We check the residuals of the model. Other than some strange behavior in the middle of the series, this plot does not suggest any major irregularities with the model
Code
plot(window(rstandard(m1.co2),start=c(1995,2)),ylab='Standardized Residuals',type='o', main =expression("Residuals from the ARIMA"~"("*0*","*1*","*1*")"~"\u00d7"~"("*0*","*1*","*1*")"[12] ~"Model"))abline(h=0)
To diagnose further, we plot the sample ACF of the residuals. The only ``statistically significant’’ correlation is at lag 22, and this correlation has a value of only −0.17, a very small correlation. Furthermore, we can’t think of any reasonable interpretation for dependence at lag 22. Finally, we should not be surprised that one autocorrelation out of the 36 displayed is statistically significant. This could easily happen by chance alone. Except for marginal significance at lag 22, the model seems to have captured the essence of the dependence in the series. The Ljing-Box test for the residuals confirms that the residuals are uncorrelated.
Code
acf(as.vector(window(rstandard(m1.co2),start=c(1995,2))),lag.max=36, main ='Sample ACF of Residuals',xaxp=c(0,36,18))
The AIC increased and the new estimated term \(\theta_2\) is not significant, while the other estimated terms have not changed much.
The ARIMA\((0,1,1)\times(0,1,1)_{12}\) model was popularized in the first edition of the seminal book of Box and Jenkins (1976) when it was found to characterize the logarithms of a monthly airline passenger time series. This model has come to be known as the airline model.
1.3 Forecasting
The first plot shows the forecasts and 95% forecast limits for a lead time of two years for the ARIMA\((0,1,1)\times (0,1,1)_{12}\) model that we fit. The last two years of observed data are also shown. The forecasts mimic the stochastic periodicity in the data quite well, and the forecast limits give a good feeling for the precision.
The second plot displays the last year of observed data and forecasts out four years. At this lead time, it is easy to see that the forecast limits are getting wider, as there is more uncertainty in the forecasts.
Code
plot(m1.co2,n1=c(2003,1),n.ahead=24,xlab='Year',type='o',ylab='CO2 Levels', main ='Forecasts from the ARIMA(0,1,1)×(0,1,1)12 Model, Lead Time=2 Years')
Code
plot(m1.co2,n1=c(2004,1),n.ahead=48,xlab='Year',type='b', ylab='CO2 Levels', main ='Forecasts from the ARIMA(0,1,1)×(0,1,1)12 Model, Lead Time=4 Years')
1.4 Summary
Multiplicative seasonal ARIMA models provide an efficient way to model time series whose seasonal tendencies are not as regular as we would have with a deterministic seasonal trend model which we covered earlier. Fortunately, these models are simply special ARIMA models so that no new theory is needed to investigate their properties. We illustrated the special nature of these models with a thorough modeling of an actual time series of CO2 data.