Time Series Analysis

Exploratory and Spectral Analysis

\(\gamma(i,j):= \text{Cov} (X_i,X_j),\ \ \ \rho(i,j):= \text{Corr} (X_i,X_j) \).
Stationarity: \( (X_i) _{i\in \mathbb{Z} } \) is (weakly) stationary if \( \begin{cases} \mu_i=\mu \\ \gamma(i+k,i)=\gamma(k,0)=: \gamma_k \end{cases} \).
Stationary ⇒ \(Var(X_i)=\gamma_0,\ \ \rho_k = \gamma_k/\gamma_0 \).
Cycles: sinusoidal mean model \( \begin{cases} \mu_i = \mu _{i+kf} \\ \beta \sin(2\pi t + \phi) = \beta_1\cos(2\pi t) + \beta_2 \sin(2\pi t),\ \ \beta_1=\beta\sin\phi,\ \beta_2=\beta\cos\phi \end{cases} \) (the model becomes linear in \(\beta_1,\beta_2\)).
Trends: difference series \( \begin{cases} \nabla x_i:= x_i - x _{i-1} \\ \nabla^d x_i := \nabla ^{d-1} x_i - \nabla ^{d-1} x _{i-1} \\ \nabla_s x_i := x_i - x _{i-s} \\ \nabla^d = (1-B)^d \end{cases} \), where \(B\) is the backshift operator, \(Bx_i = x _{i-1}\).
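
These operators map directly onto R's diff(); a minimal sketch, using the built-in AirPassengers series purely for illustration:

```r
x   <- log(AirPassengers)          # built-in monthly series, for illustration
dx  <- diff(x)                     # first difference    (1 - B) x
d2x <- diff(x, differences = 2)    # d-th difference     (1 - B)^d x, here d = 2
dsx <- diff(x, lag = 12)           # seasonal difference (1 - B^12) x
```
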
Discrete Fourier transform, DFT \( a_j:= \frac{1}{\sqrt T} \sum\limits_{k=1}^{ T} x_k e ^{-ik\omega_j},\ \ \ \omega_j = 2\pi j/T,\ \ j\in F_T \).
Periodogram of vector \( \boldsymbol{x} \in \mathbb{C} ^T,\ \ I(\omega_j):= |a_j|^2 \).
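
A quick sketch of the DFT and periodogram via base R's fft, which is unnormalized (hence the \(1/\sqrt T\) factor); its phase convention differs from the definition above, but the moduli, and hence the periodogram, agree:

```r
x  <- as.numeric(log(AirPassengers))
Tn <- length(x)
a  <- fft(x) / sqrt(Tn)               # DFT coefficients a_j (fft is unnormalized)
pg <- Mod(a)^2                        # periodogram I(omega_j) = |a_j|^2
omega <- 2 * pi * (0:(Tn - 1)) / Tn   # Fourier frequencies
plot(omega[2:(Tn %/% 2)], pg[2:(Tn %/% 2)], type = "h",
     xlab = "omega_j", ylab = "I(omega_j)")
```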

Box-Jenkins Models

Linear process \(X_i = \mu + \sum\limits_{j=-\infty}^{ \infty} c_j W _{i-j} \); causal if \(c_j =0,\ \forall j\lt 0 \).
Moving average \(\text{MA} (q)\ \begin{cases} X_i = \sum\limits_{j=0}^{ q} \theta_j W _{i-j} ,\ \ \theta_0=1\\ \gamma_i = \sigma^2 \sum\limits_{k=0}^{ q-i} \theta_k \theta _{k+i} ,\ \ 0\leq i\leq q\ \ (\gamma_i = 0 \text{ for } i\gt q) \\ \theta (z): = 1 + \sum\limits_{j=1}^{ q} \theta_j z^j \\ \ ⇒\ X_i = \theta(B)W_i\end{cases} \).

Autoregressive \( \text{AR} (p) \ \ \begin{cases} X_i = W_i + \sum\limits_{j=1}^{ p} \phi_j X _{i-j} \\ \phi(z):= 1- \sum\limits_{j=1}^{ p} \phi_j z^j \\ \ ⇒\ X_i = \phi(B) ^{-1} W_i \end{cases} \).

\(\text{ARMA(p,q)} \ \begin{cases} X_i = \sum\limits_{j=1}^{ p} \phi_j X _{i-j} + \sum\limits_{j=0}^{ q} \theta_j W _{i-j} \\ \phi(B)X_i = \theta(B)W_i \\ \text{causal linear} \begin{cases} X_i = \sum\limits_{j=0}^{ \infty} \xi_j W _{i-j} \\ W_i = \sum\limits_{j=0}^{ \infty} \beta_j X _{i-j} \end{cases} \\ \ ⇒\ \sum\limits_{j=0}^{ \infty} \xi_j z^j = \frac{\theta(z)}{\phi(z)} ,\ \ \sum\limits_{j=0}^{ \infty} \beta_j z^j = \frac{\phi(z)}{\theta(z)} \end{cases} \).
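
These expansions are available in R: ARMAtoMA expands \(\theta(z)/\phi(z)\) into the \(\xi_j\) weights, and ARMAacf gives the implied autocorrelations. A sketch with arbitrary illustrative coefficients:

```r
phi   <- c(0.5, -0.25)   # phi_1, phi_2 (illustrative values)
theta <- 0.4             # theta_1
xi  <- ARMAtoMA(ar = phi, ma = theta, lag.max = 10)  # xi_1..xi_10 (xi_0 = 1)
rho <- ARMAacf(ar = phi, ma = theta, lag.max = 10)   # theoretical ACF
set.seed(1)
x <- arima.sim(model = list(ar = phi, ma = theta), n = 500)  # a sample path
```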

\(\text{ARIMA} (p,d,q) \ ⇔ \ (\nabla^d X_i) \sim \text{ARMA} (p,q) \).

\( \text{SARMA} (P,Q)_s\ \begin{cases} X_i = \sum\limits_{j=1}^{ P} \varphi_j X _{i-js} + \sum\limits_{j=0}^{ Q} \vartheta_jW _{i-js} \\ \varphi(B^s)X_i = \vartheta(B^s)W_i \end{cases} \).

\(\text{SARIMA} (p,d,q)(P,D,Q)_s \ ⇔ \ \varphi(B^s)\phi(B) \nabla _{s}^{D} \nabla^d X_i = \vartheta(B^s)\theta(B)W_i \).
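
In R's arima the seasonal polynomials enter through the seasonal argument; a sketch using the classic "airline" model \(\text{SARIMA}(0,1,1)(0,1,1)_{12}\) on a built-in series:

```r
fit <- arima(log(AirPassengers),
             order    = c(0, 1, 1),                             # (p, d, q)
             seasonal = list(order = c(0, 1, 1), period = 12))  # (P, D, Q)_s
fit
```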

Parameter estimation

Mean and trend: set \(\tilde x_i = x_i - \bar x \).
Be aware that in R, arima omits the intercept whenever \(d>0\); for an automatic intercept, difference by hand and fit arima(diff(x), order=c(p,0,q)), or supply the trend through xreg.
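
A sketch of the two equivalent routes, on a simulated drifting AR(1) (all values illustrative):

```r
set.seed(1)
x <- ts(10 + 0.3 * (1:200) + arima.sim(list(ar = 0.5), n = 200))
# (a) difference by hand: the reported "intercept" estimates the drift
fit_a <- arima(diff(x), order = c(1, 0, 0))
# (b) keep d = 1 and pass the trend as an external regressor instead
fit_b <- arima(x, order = c(1, 1, 0), xreg = 1:200)
```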

Yule-Walker equations for \( \text{AR} (p)\ \begin{cases} \boldsymbol{\gamma} = \Gamma \boldsymbol{\phi} \\ \gamma_0 = \sigma^2 + \boldsymbol{\gamma}^T \boldsymbol{\phi} \end{cases}\), where \(\begin{cases} \boldsymbol{\gamma} = (\gamma_1,\dots,\gamma_p)^T \\ [\Gamma] _{ij} = \gamma _{|i-j|} \\ \boldsymbol{\phi} = (\phi_1,\dots,\phi_p)^T\end{cases} \).
Yule-Walker estimates: plug the sample autocovariances \( \hat\gamma_k \) into these equations and solve for \( \boldsymbol{\hat\phi} \) and \( \hat\sigma^2 \).

For YW estimates \( \hat\phi ^{(n)} :\ \ \sqrt n (\hat\phi ^{(n)} -\phi) \overset{n\to\infty}{\longrightarrow} N(0,\sigma^2 \Gamma ^{-1} )\).
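
ar() with the Yule-Walker method implements exactly this recipe, and also reports the asymptotic covariance from the CLT above; a sketch on simulated data:

```r
set.seed(1)
x <- arima.sim(list(ar = c(0.6, -0.3)), n = 400)
fit <- ar(x, order.max = 2, aic = FALSE, method = "yule-walker")
fit$ar            # Yule-Walker estimates of (phi_1, phi_2)
fit$var.pred      # innovation variance estimate sigma^2
fit$asy.var.coef  # estimate of sigma^2 Gamma^{-1} / n, as in the CLT above
```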

Conditional least squares estimator \( (\boldsymbol{\hat\phi},\hat\mu):= \arg\min \limits_{(\boldsymbol{\phi} ,\mu)}^{} \underbrace{S_c (x_1,\dots,x_n; \boldsymbol{\phi},\mu)}_{:= \sum\limits_{i=p+1}^{ n} \big( x_i -\mu - \sum\limits_{j=1}^{ p} \phi_j (x _{i-j} -\mu) \big)^2 } \).
For large \(n \), the YW, CSS, and ML estimates are asymptotically equivalent.

ML estimates \( \boldsymbol{\hat\beta} ^{(n)} = (\boldsymbol{\hat\phi} ^{(n)} , \boldsymbol{\hat\theta} ^{(n)} ) \text{ for ARMA} (p,q)\ \begin{cases} \sqrt n (\boldsymbol{\hat\beta} ^{(n)} - \boldsymbol{\beta} ) \overset{n\to \infty}{\longrightarrow} N(0,V( \boldsymbol{\beta} )) \\ \text{where } \begin{cases} V( \boldsymbol{\beta} ) =\sigma^2 \begin{bmatrix} \mathbb{E} \boldsymbol{UU} ^T & \mathbb{E} \boldsymbol{UV} ^T \\ \mathbb{E} \boldsymbol{VU} ^T & \mathbb{E} \boldsymbol{VV} ^T \end{bmatrix}^{-1} \\ \boldsymbol{U} = (U_p,\dots,U_1) :\ \phi(B)U_i = W_i \\\boldsymbol{V} = (V_q,\dots,V_1) :\ \theta(B)V_i = W_i\end{cases} \\ \text{Hessian } \frac{1}{n} V(\boldsymbol{\beta} )\approx \bigg( \bigg[ - \frac{\partial^2}{\partial\beta_i\partial\beta_j} l( \boldsymbol{\hat\beta} ) \bigg] _{i,j} \bigg) ^{-1} \end{cases} \).
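
arima() exposes the CSS and ML estimators via its method argument, and its var.coef is the inverted Hessian from the last line above; a sketch on simulated data:

```r
set.seed(1)
x <- arima.sim(list(ar = 0.7, ma = 0.4), n = 300)
fit_css <- arima(x, order = c(1, 0, 1), method = "CSS")  # conditional least squares
fit_ml  <- arima(x, order = c(1, 0, 1), method = "ML")   # full maximum likelihood
coef(fit_css); coef(fit_ml)      # nearly identical for large n
sqrt(diag(fit_ml$var.coef))      # SEs from the inverted Hessian ~ V(beta)/n
```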

Model Selection Tools

Non-stationarity: remove a trend by differencing, or handle it with an external regressor (xreg).

For any stationary linear process \(X_i = \mu + \sum\limits_{j=-\infty}^{ \infty} c_j W _{i-j} \), the sample autocorrelations satisfy (Bartlett's formula):
\((\hat\rho_1,\dots,\hat\rho_h) \sim N \bigg( (\rho_1,\dots,\rho_h),\ W_h/n \bigg) \\ \ \ \ [W_h] _{ij} = \sum\limits_{k=1}^{ \infty} (\rho _{k+i} +\rho _{k-i} -2 \rho_i\rho_k) (\rho _{k+j} +\rho _{k-j} -2 \rho_j\rho_k) \).

For \(\text{MA} (q) \), the sample autocorrelations at lags \(i\gt q\): \(\sqrt n(\hat\rho_i - \rho_i) \to N \bigg(0, 1+2 \sum\limits_{j=1}^{ q} \rho_j^2\bigg) \).
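
acf(..., ci.type = "ma") uses exactly this large-lag variance for its confidence bounds; a sketch:

```r
set.seed(1)
x <- arima.sim(list(ma = c(0.8, 0.4)), n = 400)  # an MA(2) sample
acf(x)                   # default bounds assume i.i.d. noise
acf(x, ci.type = "ma")   # bounds grow with lag via 1 + 2*sum(rho_j^2)
```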

Partial autocorrelation function (PACF) \( (\alpha_k) _{k\gt0}\ \begin{cases} \alpha_0 = \rho_0 =1 \\ \alpha_1 = \rho_1 \\ \alpha_k = \text{Corr}(X _{k+1} - \hat X _{k+1 | 2:k} ,\ X_1 - \hat X _{1|2:k} ) \\ \ \ \hat X _{\cdot|2:k} \text{ the best linear estimator of } X \text{ given } X_2, \dots, X_k \\ \text{AR} (p),\ k\gt p \ ⇒\ \begin{cases} \alpha_k =0 \\ \sqrt n \hat\alpha_k ^{(n)} \overset{n\to\infty}{\longrightarrow} N(0,1)\end{cases} \\ \text{MA} (1) \ ⇒\ \alpha_k = - \frac{(-\theta_1)^k (1-\theta_1^2)}{1-\theta_1^{2(k+1)}} \end{cases} \).
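
The cutoff patterns are easy to verify empirically; a sketch with a simulated AR(2):

```r
set.seed(1)
x <- arima.sim(list(ar = c(0.6, -0.3)), n = 400)  # an AR(2) sample
acf(x)    # tails off (exponentially decaying / sinusoidal)
pacf(x)   # cuts off after lag 2; later spikes are approx N(0, 1/n)
```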

Information criteria \(\begin{cases} \text{AIC} = -2\ln l _{ML} +2k \\ \text{AICc} = -2\ln l _{ML} + \frac{2kn}{n-k-1} \\ \text{BIC} = -2\ln l _{ML} +k \ln n \\ \ \ \ k=p+q+2,\ n=\# \text{obs} \end{cases} \).
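
AIC() and BIC() apply directly to an arima fit; AICc has no built-in, but it follows from AIC as \(\text{AICc} = \text{AIC} + \frac{2k(k+1)}{n-k-1}\). A sketch on the small built-in lh series:

```r
fit <- arima(lh, order = c(1, 0, 1))
n <- length(lh)
k <- length(coef(fit)) + 1    # p + q + 2: AR, MA, intercept, and variance
aic  <- AIC(fit)
bic  <- BIC(fit)              # equivalently AIC(fit, k = log(n))
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)
```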

Model diagnostics: residual tests \(\begin{cases} \text{Residuals} & e_i = x_i - \mathbb{E} [X_i | x_1,\dots, x _{i-1} ]\\ \text{Box-Pierce test} & Q= n \sum\limits_{j=1}^{ K} r_j^2,\ \ \ p+q\lt K \ll n \\ \text{Ljung-Box test} & \tilde Q = n \sum\limits_{j=1}^{ K} \frac{n+2}{n-j} \ r_j^2\sim \chi^2_{ K-p-q} \end{cases} \).
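
Both portmanteau tests are available through Box.test, with fitdf subtracting the \(p+q\) fitted parameters; a sketch:

```r
fit <- arima(lh, order = c(1, 0, 1))
res <- residuals(fit)
Box.test(res, lag = 10, type = "Ljung-Box",  fitdf = 2)  # fitdf = p + q
Box.test(res, lag = 10, type = "Box-Pierce", fitdf = 2)
tsdiag(fit)  # standardized residuals, residual ACF, Ljung-Box p-values
```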

Spectrum of a stationary process

Spectral density \(\begin{cases} f(\lambda) = \frac{1}{2\pi} \sum\limits_{k=-\infty}^{ \infty} \gamma_k e ^{-ik\lambda} ,\ \ \ \lambda\in (-\pi,\pi] \\ \text{properties } \begin{cases} f(\lambda)=f(-\lambda) \\ f(\lambda)\geq 0,\ \forall\lambda\in \mathbb{R} \\ \gamma_k = \displaystyle\int_{ -\pi } ^\pi e ^{ik\lambda} f(\lambda)d\lambda = \displaystyle\int_{ -\pi } ^\pi \cos(k\lambda) f(\lambda)d\lambda \\ *\ f(\lambda + 2\pi m ) = f(\lambda) \text{ (complex exponential)} \end{cases} \\ \tilde I_T (\lambda) = \frac{1}{T} \bigg| \sum\limits_{k=1}^{ T} X_k e ^{-ik\lambda} \bigg| ^2,\ \ \lambda\in (-\pi,\pi] \\ \text{ARMA} (p,q),\ \phi(B)X_t= \theta(B)W_t \ ⇒\ f_X(\lambda) = \frac{\sigma^2}{2\pi} \frac{|\theta(e ^{-i\lambda})|^2}{|\phi(e ^{-i\lambda})|^2} \\ (Y_t) \text{ zero-mean, } X_t = \sum\limits_{j=-\infty}^{ \infty} \psi_j Y_{t-j} \ ⇒\ f_X(\lambda) = \bigg| \sum\limits_{j=-\infty}^{ \infty} \psi_j e ^{-ij\lambda} \bigg|^2 f_Y(\lambda) \end{cases} \).
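
The ARMA line can be coded directly; a minimal sketch with \(\sigma^2 = 1\) assumed (note that R's spectrum() reports frequency in cycles per unit time, so comparing against it requires a change of variables):

```r
# Theoretical ARMA spectral density f(lambda) on (0, pi]
arma_spec <- function(lambda, phi = numeric(0), theta = numeric(0), sigma2 = 1) {
  th <- sapply(lambda, function(l) 1 + sum(theta * exp(-1i * l)^seq_along(theta)))
  ph <- sapply(lambda, function(l) 1 - sum(phi   * exp(-1i * l)^seq_along(phi)))
  sigma2 / (2 * pi) * Mod(th)^2 / Mod(ph)^2
}
lam <- seq(0.01, pi, length.out = 200)
plot(lam, arma_spec(lam, phi = 0.7), type = "l",
     xlab = "lambda", ylab = "f(lambda)")   # AR(1) spectrum: peak at lambda = 0
```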

State-space models

General time series models handle \( \begin{cases} \text{heteroscedasticity: variance changes over time} \\ \text{irregularly sampled data} \\ \text{missing data} \end{cases} \).
Linear Gaussian State-Space \(\begin{cases} \begin{cases} X_k = F_k X _{k-1} + W_k & \text{system equation} \\ Y_k = H_k X_k + U_k & \text{observation equation} \end{cases} \\ \text{where } \begin{cases} W_k \overset{iid}{\sim} N(0,Q_k) & Q_k \in \mathbb{R} ^{d\times d} \\ U_k \overset{iid}{\sim} N(0,R_k) & R_k \in \mathbb{R} ^{o\times o} \\ F_k \in \mathbb{R} ^{d\times d} ,\ H_k \in \mathbb{R} ^{o\times d} \end{cases} \\ \text{properties } \begin{cases} (X_k ) _{k\geq 1} \text{ is a Markov chain (the latent state)} \\ Y_k | X_k \sim N(H_k X_k, R_k),\ \text{conditionally independent given } X_k \end{cases} \end{cases} \).
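
The Kalman filter computes \(X_k \mid Y_{1:k}\) recursively for this model. Below is a minimal hand-rolled sketch for the scalar time-invariant case (all parameter values illustrative; packages such as dlm or KFAS cover the general case):

```r
# Scalar Kalman filter for X_k = Fk X_{k-1} + W_k,  Y_k = Hk X_k + U_k
kalman_filter <- function(y, Fk, Hk, Qk, Rk, m0 = 0, P0 = 10) {
  n <- length(y)
  m <- numeric(n); P <- numeric(n)
  mf <- m0; Pf <- P0                    # filtered mean / variance
  for (k in 1:n) {
    mp <- Fk * mf                       # predict: E[X_k | y_{1:k-1}]
    Pp <- Fk^2 * Pf + Qk                #          Var[X_k | y_{1:k-1}]
    K  <- Pp * Hk / (Hk^2 * Pp + Rk)    # Kalman gain
    mf <- mp + K * (y[k] - Hk * mp)     # update:  E[X_k | y_{1:k}]
    Pf <- (1 - K * Hk) * Pp             #          Var[X_k | y_{1:k}]
    m[k] <- mf; P[k] <- Pf
  }
  list(mean = m, var = P)
}
set.seed(1)
x <- cumsum(rnorm(100, sd = 0.5))       # latent random walk (local level)
y <- x + rnorm(100)                     # noisy observations
kf <- kalman_filter(y, Fk = 1, Hk = 1, Qk = 0.25, Rk = 1)
```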

A Brief R Reference for Time Series

Autocorrelation function: acf(x); the plot is useful.
Decomposition: stl(x, s.window='periodic'); the plot is useful.
Spectrum estimation: spectrum(x, taper=0, detrend=FALSE); with or without tapering.
Simulation: arima.sim(...)
ARIMA fit: m3 <- arima(x, order=c(0,1,1), xreg=1:140). Note xreg, which supplies the intercept/trend when d > 0.

Model selection tools (the unit-root tests are from library tseries):
Non-stationarity: adf.test and pp.test.
Autocorrelation: acf(x); acf(x, ci.type="ma").
Information criteria: AIC(fit) for AIC, AIC(fit, k=log(n)) for BIC; AICc must be computed by hand from the formula above.
Ljung-Box statistics for residual tests: tsdiag(fit); this is also useful for general model checking.

Example with prediction
h <- 48; n <- length(b); t <- time(b)   # b: a monthly series
f1 <- arima(b, xreg=t, order=c(1,0,0), seasonal=list(order=c(1,0,0), period=12))
p1 <- predict(f1, n.ahead=h, newxreg=t[n]+(1:h)/12)
m1 <- p1$pred; s1 <- p1$se
ts.plot(b, m1, m1+1.96*s1, m1-1.96*s1, col=c(1,2,2,2), lty=c(1,1,2,2))

Useful texts with R applications: https://www.otexts.org/fpp/8

1. use acf to check stationarity; if the series looks non-stationary (the ACF decays very slowly), take differences and check again.
2. use acf and pacf (or tsdisplay for both) to guess a suitable model, using the criteria below:

The data may follow an ARIMA(p,d,0) model if the ACF and PACF plots of the differenced data show the following patterns:
the ACF is exponentially decaying or sinusoidal;
there is a significant spike at lag p in the PACF, but none beyond lag p.

The data may follow an ARIMA(0,d,q) model if the ACF and PACF plots of the differenced data show the following patterns:
the PACF is exponentially decaying or sinusoidal;
there is a significant spike at lag q in the ACF, but none beyond lag q.

3. compare candidate models with information criteria, usually AICc, as in the sketch below.
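
Putting the three steps together (a sketch; tsdisplay mentioned in step 2 comes from the forecast package, the rest is base R):

```r
x  <- log(AirPassengers)
dx <- diff(diff(x, lag = 12))   # step 1: difference until the ACF dies out quickly
acf(dx); pacf(dx)               # step 2: read off candidate orders
f1 <- arima(x, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12))
f2 <- arima(x, order = c(1, 1, 0), seasonal = list(order = c(1, 1, 0), period = 12))
AIC(f1); AIC(f2)                # step 3: compare information criteria
```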

The following picture is from R. Hyndman, G. Athanasopoulos, Forecasting: Principles and Practice.
![ACF and PACF patterns (Hyndman & Athanasopoulos, Figure 8.10)](https://www.otexts.org/sites/default/files/resize/fpp/images/Figure-8-10-570×752.png)
