25 Spring 439/639 TSA: Lecture 12

Author

Dr Sergey Kushnarev

1 Partial Autocorrelation Function (PACF)

1.1 Definition 1

Last time, we introduced one definition of the partial autocorrelation function (partial ACF, PACF) \(\phi_{kk}\).

Definition 1: \[ \phi_{kk} = \operatorname{corr} \left( Y_t,\ Y_{t-k} \ \middle\vert \ Y_{t-1},\, Y_{t-2},\, \dots,\, Y_{t-k+1} \right). \] From this definition, it’s easy to show that: For AR(\(1\)) process, \(Y_t = \phi Y_{t-1}+ e_t\), \(\phi_{11} = \rho_1\), and \(\phi_{kk}=0\) for any lag \(k\ge 2\).

1.2 Definition 2

Definition 2: \[ \phi_{kk} = \mathrm{corr}(\operatorname{Res}_t, \operatorname{Res}_{t-k}), \] where \(\operatorname{Res}_t\) = residual from (linear) regressing \(Y_t\) on \(Y_{t-1}, \dots, Y_{t-k+1}\), and \(\operatorname{Res}_{t-k}\) = residual from (linear) regressing \(Y_{t-k}\) on \(Y_{t-1}, \dots, Y_{t-k+1}\). So \(\operatorname{Res}_t\) and \(\operatorname{Res}_{t-k}\) are the unexplained variation in \(Y_t\), \(Y_{t-k}\) after “partialling out” the effect of \(Y_{t-1}, \dots, Y_{t-k+1}\) (i.e., “controlling for” \(Y_{t-1}, \dots, Y_{t-k+1}\)).

Let’s look at the AR(\(1\)) example \(Y_t = \phi Y_{t-1}+ e_t\) again. To find \(\phi_{22}\), we need to regress \(Y_t\) on \(Y_{t-1}\), and regress \(Y_{t-2}\) on \(Y_{t-1}\).

So we need to find \(a, b\) by minimizing \[ \min_{a} \mathbb{E}\left( Y_t - a Y_{t-1} \right)^2,\quad \min_{b} \mathbb{E}\left( Y_{t-2} - b Y_{t-1} \right)^2, \] then \(\operatorname{Res}_t = Y_t - \widehat{Y}_t\) and \(\operatorname{Res}_{t-k}= Y_{t-2} -\widehat{Y}_{t-2}\), where \(\widehat{Y}_t = a Y_{t-1}\), \(\widehat{Y}_{t-2} = b Y_{t-1}\).

To solve \(\min_{a} \mathbb{E}\left( Y_t - a Y_{t-1} \right)^2\), we take the derivative of the objective function: \[ \frac{\partial}{\partial a} \, \mathbb{E}\left[ \left( Y_t - a Y_{t-1} \right)^2 \right] = -2 \mathbb{E}\left[ \left( Y_t - a Y_{t-1} \right) Y_{t-1} \right] \] Set the derivative equal to \(0\) gives \[ \mathbb{E}\left[ \left( Y_t - a Y_{t-1} \right) Y_{t-1} \right] = 0 \implies \gamma_1 - a \gamma_0 = 0 \implies a = \frac{\gamma_1}{\gamma_0} = \rho_1 = \phi. \] To solve \(\min_{b} \mathbb{E}\left( Y_{t-2} - b Y_{t-1} \right)^2\), we take the derivative of the objective function: \[ \frac{\partial}{\partial b} \mathbb{E}\left( Y_{t-2} - b Y_{t-1} \right)^2 = -2 \mathbb{E}\left[ \left( Y_{t-2} - b Y_{t-1} \right) Y_{t-1} \right] \] Set the derivative equal to \(0\) gives \[ \mathbb{E}\left[ \left( Y_{t-2} - b Y_{t-1} \right) Y_{t-1} \right] = 0 \implies \gamma_1 - b \gamma_0 = 0 \implies b = \frac{\gamma_1}{\gamma_0} = \rho_1 = \phi. \] Then we get \[ \phi_{22} = \mathrm{corr}(\operatorname{Res}_t, \operatorname{Res}_{t-2}) = \mathrm{corr}(Y_t - \phi Y_{t-1}, Y_{t-2} -\phi Y_{t-1}) = \mathrm{corr}(e_t, Y_{t-2} -\phi Y_{t-1}) =0. \] In general, for AR(\(p\)), the PACF at lags \(1\) through \(p\) can be nonzero, and PACF at lags \(k\ge p+1\) are all zero. And we also have the following results.

	MA(\(q\))	AR(\(p\))	ARMA(\(p,q\))
ACF	cuts off after \(q\)	exponential decay	exponential decay
PACF	exponential decay	cuts off after \(p\)	exponential decay

1.3 Definition 3

We also have an alternative, “computational” definition of \(\phi_{kk}\). The idea is to fit an AR(\(k\)) model to the stationary time series \((Y_t)\) of our interest, i.e, fit the linear regression to \((Y_t)\): \[ Y_t = \phi_{k1} Y_{t-1} + \phi_{k2} Y_{t-2} + \cdots + \phi_{kk} Y_{t-k} + \epsilon, \] or in other words, find \(\phi_{k1},\phi_{k2},\dots,\phi_{kk}\) to minimize \[ \min_{\phi_{k1},\phi_{k2},\dots,\phi_{kk}} \mathbb{E}\left( Y_t - \phi_{k1} Y_{t-1} - \phi_{k2} Y_{t-2} - \cdots - \phi_{kk} Y_{t-k} \right)^2 . \]

Definition 3 (We claim without proof that:) For stationary \((Y_t)\), its PACF at lag \(k\), i.e. \(\phi_{kk}\), is same as the fitted value of \(\phi_{kk}\) in the \((\phi_{k1},\phi_{k2},\dots,\phi_{kk})\) from the regression (namely, AR(\(k\)) fitting) above.

In other words, \(\phi_{kk}\) is the last \(\phi_{kj}\) term in the AR(\(k\)) approximation to \((Y_t)\).

From this definition, we can show the following fact: the solution \((\phi_{k1},\phi_{k2},\dots,\phi_{kk})\) to the regression above must satisfy \[ \Gamma_k \ \vec\phi_k = \vec\gamma_k, \] \[ \text{where}\quad \Gamma_k = \begin{bmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{k-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{k-2} \\ \vdots & \vdots & & \vdots \\ \gamma_{k-1} & \gamma_{k-2} & \cdots & \gamma_0 \end{bmatrix},\quad \vec\phi_k = \begin{bmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{bmatrix},\quad \vec\gamma_k = \begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_k \end{bmatrix}. \] Note: the derivation for \(\Gamma_k \ \vec\phi_k = \vec\gamma_k\) is very similar to the AR(\(1\)) example after Definition 2.

So now we know the PACF \(\phi_{kk}\) is the last entry of \(\Gamma_k^{-1}\ \vec\gamma_k\). Inverting a \(k\times k\) matrix \(\Gamma_k\) can be computationally expensive. Instead, we can use Durbin–Levinson Recursion (we will get to it soon) to directly calculate the entries of \(\Gamma_k^{-1}\ \vec\gamma_k\) without computing matrix inverse.

From this Definition 3 and the fact above, we can also intuitively see the PACF behavior for AR(\(p\)) that we mentioned earlier. Suppose \((Y_n)\) is an AR(\(p\)), \(Y_t = \phi_{1} Y_{t-1} + \phi_{2} Y_{t-2} + \cdots + \phi_{p} Y_{t-p} + e_t\) where \(\phi_1,...,\phi_p\) are the true parameters of the AR(\(p\)). To get PACF \(\phi_{kk}\), we need to fit an AR(\(k\))-like linear regression for \(Y_t\). If \(k\ge p\), then intuitively, \[ \begin{split} \text{fitting a regression model}\quad & Y_t = \phi_{k1} Y_{t-1} + \phi_{k2} Y_{t-2} + \cdots + \phi_{kk} Y_{t-k} + \epsilon, \\ \text{for an }\operatorname{AR}(p)\quad & Y_t = \phi_{1} Y_{t-1} + \phi_{2} Y_{t-2} + \cdots + \phi_{p} Y_{t-p} + e_t, \end{split} \] should give us the fitted value \((\phi_{k1},\phi_{k2},\dots,\phi_{kk}) = (\phi_1,...,\phi_p,0,...,0)\). So we have the following results for AR(\(p\)).

If \(k=p\), then \(\phi_{pp} = \phi_{p}\).
If \(k\ge p+1\), i.e. \(k>p\), then \(\phi_{kk} = 0\).

Remark: For AR(\(p\)) and \(k\ge p\), the statement \((\phi_{k1},\phi_{k2},\dots,\phi_{kk}) = (\phi_1,...,\phi_p,0,...,0)\) can be verified by showing that it is indeed a solution to \(\Gamma_k \ \vec\phi_k = \vec\gamma_k\). The derivation reduces to YW equations for AR(\(p\)).

2 Durbin–Levinson Recursion

As we mentioned earlier, although the PACF \(\phi_{kk}\) can be characterized as the last entry of \(\Gamma_k^{-1}\ \vec\gamma_k\), it can be expensive to compute \(\Gamma_k^{-1}\). We can use Durbin–Levinson Recursion to directly calculate the entries of \(\Gamma_k^{-1}\ \vec\gamma_k\) without computing matrix inverse. (Remark: the basic idea is to utilize the special structure of the matrix \(\Gamma_k\) and the vector \(\vec\gamma_k\) to compute \(\Gamma_k^{-1}\ \vec\gamma_k\).)

Durbin–Levinson Recursion (DLR): Define \(\phi_{00} = 1\). For \(l \geq 0\), define \[ \begin{cases} \phi_{l+1,\ l+1} &=\ \frac{ \gamma_{l+1} - \sum_{j=1}^{l} \phi_{l,\, j}\, \gamma_{l+1-j} } { \gamma_0 - \sum_{j=1}^{l} \phi_{l,\, j}\, \gamma_j } = \frac{ \rho_{l+1} - \sum_{j=1}^{l} \phi_{l,\, j}\, \rho_{l+1-j} } { 1 - \sum_{j=1}^{l} \phi_{l,\, j}\, \rho_j }\\ \phi_{l+1,\, j} &=\ \phi_{l,\, j} - \phi_{l+1,\, l+1}\, \phi_{l,\, l+1-j} \qquad \text{for } 1 \le j \le l \end{cases} \] Note: For the first formula, we can use it either in terms of \(\gamma_k\) or \(\rho_k\).

DLR is useful because we do not need to compute the inverse of a certain (large) matrix.

Example: Find PACF for an AR(\(1\)) via DLR. Consider the AR(\(1\)) process \(Y_t = c Y_{t-1} + \epsilon_t\) with \(|c| < 1\), so we know \(\rho_k = c^k\). To do DLR, we first let \(l=1\) to compute \(\phi_{11}\): \[ \phi_{11} = \frac{\rho_1 - 0}{1 - 0} = \rho_1 = c. \] Then let \(l=1\) in DLR to compute \(\phi_{22},\phi_{21}\): \[ \phi_{22} = \frac{\rho_2 - \phi_{11} \rho_1}{1 - \phi_{11} \rho_1} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} = \frac{c^2 - c^2}{1 - c^2} = 0, \] \[ \phi_{21} = \phi_{11} - \phi_{22} \phi_{11} = \rho_1 - 0 \cdot \rho_1 = \rho_1 = c. \] Letting \(l=1\) in DLR gives \(\phi_{33},\phi_{32},\phi_{31}\). For example, \[ \phi_{33} = \frac{\rho_3 - \left( \phi_{21} \rho_2 + \phi_{22} \rho_1 \right)} {1 - \left( \phi_{21} \rho_1 + \phi_{22} \rho_2 \right)} = \frac{c^3 - \left(c \cdot c^2 + 0 \cdot c \right)}{1 - \left( c \cdot c + 0 \cdot c^2 \right)} = 0. \] Exercise: (1) Find \(\phi_{44}\). (2) Use induction to prove that \(\phi_{kk}=0\) for all \(k\ge 2\), and \(\phi_{k1}=c\) for all \(k\ge 1\).

3 Sample PACF

Method 1

(Similar to the idea of sample ACF) We can get sample PACF \(\widehat{\phi}_{kk}\) by using the sample ACF \(\widehat{\rho}_k = r_k\) in place of the theoretical ACF \(\rho_k\) in the DLR formula.

Method 2

Alternatively, we can also modify the equation \(\Gamma_k \ \vec\phi_k = \vec\gamma_k\) into an empirical version \(\widehat{R}_k \ \widehat{\vec\phi}_k = \widehat{\vec\rho}_k\) to get the solution \(\widehat{\vec\phi}_k = \widehat{R}_k^{-1}\ \widehat{\vec\rho}_k\), and take its last entry to get \(\widehat{\phi}_{kk}\). To be specific, recall that the theoretical PACF satisfies \[ \Gamma_k \ \vec\phi_k = \vec\gamma_k,\quad \text{where}\quad \Gamma_k = \begin{bmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{k-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{k-2} \\ \vdots & \vdots & & \vdots \\ \gamma_{k-1} & \gamma_{k-2} & \cdots & \gamma_0 \end{bmatrix},\quad \vec\phi_k = \begin{bmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{bmatrix},\quad \vec\gamma_k = \begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_k \end{bmatrix}. \] Dividing both sides by \(\gamma_0\) transforms the ACVF into ACF: \[ \begin{bmatrix} \rho_0 & \rho_1 & \cdots & \rho_{k-1} \\ \rho_1 & \rho_0 & \cdots & \rho_{k-2} \\ \vdots & \vdots & & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & \rho_0 \end{bmatrix} \begin{bmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{bmatrix} = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{bmatrix}. \] Then we use the sample ACF \(\widehat{\rho}_k = r_k\) in place of the theoretical ACF \(\rho_k\) to get the following equation. Solving this equation gives the sample PACF. \[ \widehat{R}_k \ \widehat{\vec\phi}_k = \widehat{\vec\rho}_k,\quad \text{where}\quad \widehat{R}_k = \begin{bmatrix} r_0 & r_1 & \cdots & r_{k-1} \\ r_1 & r_0 & \cdots & r_{k-2} \\ \vdots & \vdots & & \vdots \\ r_{k-1} & r_{k-2} & \cdots & r_0 \end{bmatrix},\quad \widehat{\vec\phi}_k = \begin{bmatrix} \widehat{\phi}_{k1} \\ \widehat{\phi}_{k2} \\ \vdots \\ \widehat{\phi}_{kk} \end{bmatrix},\quad \widehat{\vec\rho}_k = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_k \end{bmatrix}. \] Remark: We just mentioned that \(\widehat{\vec\phi}_k = \widehat{R}_k^{-1}\ \widehat{\vec\rho}_k\). The matrix \(\widehat{R}_k\) defined above is indeed invertible, which is guaranteed by the particular way (and details) we used to construct the sample ACF \(r_k\) in lecture 9. In fact, we remarked in lecture 9 that our construction of \(r_k\) makes the “sample ACF matrix” invertible, and this matrix is nothing but the \(\widehat{R}_k\) we have just seen.

The theoretical PACF \(\phi_{kk}\) of a given time series model is a non-random number (but it can be unknown). The sample PACF \(\widehat{\phi}_{kk}\) is a random variable determined by the observed data (and the observations are random). So the sample PACF \(\widehat{\phi}_{kk}\) follows a certain sampling distribution that depends on the underlying time series model.

For an AR(\(p\)) process, the sampling distribution of \(\widehat{\phi}_{kk}\) is \[ \widehat{\phi}_{kk} \sim \mathcal{N}\left(0, \frac{1}{n}\right), \quad \text{for all }k \geq p + 1. \] So if we want to test whether a time series is an AR(\(p\)), we can use the following rejection region \[ \left| \widehat{\phi}_{kk} \right| > \frac{2}{\sqrt{n}} \] for sample PACFs with lag \(k \geq p + 1\).