Regression Methods in Time Series Analysis

Author

Dr Sergey Kushnarev

1 Data Examples

  1. Random Walk
  2. US Population: Quadratic Trend
  3. Seasonal Means
  4. Residual Analysis
  5. Assessing Normality
  6. The Sample Autocorrelation Function
  7. Quantile-Quantile Plot of Los Angeles Annual Rainfall Series

2 Regression Methods in Time Series Analysis

Based on Chapter 3, Cryer and Chan

Time series are often decomposed into components: deterministic and stochastic. The decomposition is additive: \[ Y_t = \mu_t + X_t \] where \(Y_t\) is the observed value at time \(t\), \(\mu_t\) is the deterministic component, and \(X_t\) is the stochastic component.

Code
library(TSA)

2.2 Residual Analysis

After estimating the trend by \(\hat\mu_t\), we can predict the unobserved values of the stochastic component \(X_t\) by \(\hat{X}_t = Y_t - \hat{\mu}_t\).

2.2.1 Seasonal model without the intercept term

Code
plot(y=rstudent(model2),
     x=as.vector(time(tempdub)),
     xlab='Time',
     ylab='Standardized Residuals',
     type='o',
     main = 'Residuals from Seasonal Means Model w/o Intercept')

Code
plot(y=rstudent(model3),
     x=as.vector(time(tempdub)),
     xlab='Time',
     ylab='Standardized Residuals',
     type='l',
     main = 'Residuals from Seasonal Means Model')

points(y=rstudent(model3),
       x=as.vector(time(tempdub)),
       pch=as.vector(season(tempdub)),
       col = 1:4)

2.2.2 Residuals versus Fitted Values from Seasonal Means Model

Code
plot(y=rstudent(model3),
     x=as.vector(fitted(model3)),
     xlab='Fitted Trend Values',
     ylab='Standardized Residuals',type='n',
     main='Residuals vs Fitted Values from Seasonal Means Model')
     points(y=rstudent(model3),
     x=as.vector(fitted(model3)),
     pch=as.vector(season(tempdub)), 
     col = 1:4)

Code
plot(y=rstudent(model4),
     x=as.vector(fitted(model4)),
     xlab='Fitted Trend Values',
     ylab='Standardized Residuals',type='n',
     main='Residuals vs Fitted Values from Cosine Trends')
     points(y=rstudent(model4),
     x=as.vector(fitted(model4)),
     pch=as.vector(season(tempdub)), 
     col = 1:4)

2.2.3 Assessing normality

Histogram is a very rough tool to assess normality

Code
hist(rstudent(model3),
     xlab='Standardized Residuals',
     main='Histogram of Residuals from Seasonal Means Model')

QQ-plot is a much better tool:

Code
# Plotting the QQ-plot
qqnorm(rstudent(model3),
       main='QQ-plot of Residuals from Seasonal Means Model')
qqline(rstudent(model3),col='red',lty=2)

QQ-plot is an excellent visual diagnostic. We can use a Shapiro-Wilk test to assess normality:

Code
# Shapiro-Wilk test for normality. H0: data is normally distributed
shapiro.test(rstudent(model3))

    Shapiro-Wilk normality test

data:  rstudent(model3)
W = 0.9929, p-value = 0.6954

Test for independence: runs test

Code
# Runs test for independence. H0: data is independent
runs(rstudent(model3))
$pvalue
[1] 0.216

$observed.runs
[1] 65

$expected.runs
[1] 72.875

$n1
[1] 69

$n2
[1] 75

$k
[1] 0

2.3 The Sample Autocorrelation Function

Another tool for assessing dependency is a sample ACF.

Sample autocorrelation function, for \(k=1, 2, 3,\ldots\):

\[ r_k=\dfrac{\sum_{t=k+1}^n(Y_t-\bar{Y})(Y_{t-k}-\bar{Y})}{\sum_{t=1}^n(Y_t-\bar{Y})^2} \]

2.3.1 Plot of Sample ACF of residuals for seasonal means model versus \(k\) (correlogram)

Code
acf(rstudent(model3),
    main='ACF of Residuals from Seasonal Means Model')

2.3.2 Sample ACF for residuals of a linear fit to a random walk

Code
plot(y=rstudent(model1),
     x=as.vector(time(rw)),
     ylab='Standardized Residuals',
     xlab='Time',
     type='o',
     main='Residuals from Straight Line Fit to Random Walk')

Residuals versus Fitted Values from Straight Line Fit: larger values of residuals correspond to larger fitted values.

Code
plot(y=rstudent(model1),
     x=fitted(model1),
     ylab='Standardized Residuals',
     xlab='Fitted Trend Line Values',
     type='p')

Sample Autocorrelation of Residuals from the Straight Line fit to the random walk

Code
acf(rstudent(model1),
    main='ACF of Residuals from Straight Line Fit to Random Walk')

What if we try completely different approach to the random walk data. Let’s difference the data instead and plot the sample ACF of the differenced data?

\[ \nabla Y_t = Y_t-Y_{t-1} \]

Code
acf(diff(rw),
    main='ACF of Differenced Random Walk')

2.3.3 Quantile-Quantile Plot of Los Angeles Annual Rainfall Series

Turns out it’s an iid noise:

Code
data("larain")
acf(larain,
    main='ACF of Los Angeles Annual Rainfall Series')

Is it normal? Let’s check the QQ-plot:

Code
qqnorm(larain,
       main='QQ-plot of Los Angeles Annual Rainfall Series') 
qqline(larain)

Q: is it left-skewed or right-skewed?

Code
# Shapiro Wilk test
shapiro.test(larain)

    Shapiro-Wilk normality test

data:  larain
W = 0.94617, p-value = 0.0001614

If we log-transform the data, we can see that it becomes more symmetric:

Code
qqnorm(log(larain),
       main='QQ-plot of log-transformed Los Angeles Annual Rainfall Series') 
qqline(log(larain))

Code
# Shapiro Wilk test
shapiro.test(log(larain))

    Shapiro-Wilk normality test

data:  log(larain)
W = 0.98742, p-value = 0.3643