The ARMA Process - Stationary, Causal, and Invertible

Last edited: 2024-10-30 19:25:16


The ARMA process, short for autoregressive moving average, is one of the most basic time series models, but it is an important building block in forecasting. Let us look at its definition, some of its properties such as stationarity, causality, and invertibility, and how to choose the order of your ARMA model.

ARMA Process Definition

The ARMA($p,q$) process $X_t$ has the following definition:

$$X_t - \sum_{i=1}^p \phi_i X_{t-i} = Z_t + \sum_{i=1}^q \theta_i Z_{t-i},$$

where $Z_t \sim \text{WN}(0,\sigma^2)$ (white noise with mean 0 and variance $\sigma^2$) and the polynomials $1 - \sum_{i=1}^p \phi_i z^i$ and $1 + \sum_{i=1}^q \theta_i z^i$ have no common zeros. You might also define the ARMA model in the following short form:

$$\phi(B) X_t = \theta(B) Z_t,$$

where $B$ is the lag operator defined as $B X_t = X_{t-1}$. The two polynomials are defined as:

$$\phi(z) = 1 - \sum_{i=1}^p \phi_i z^i$$

and

$$\theta(z) = 1 + \sum_{i=1}^q \theta_i z^i.$$
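
To make the definition concrete, here is a minimal sketch that simulates an ARMA($p,q$) path directly from the defining recursion. The coefficient values, sample size, and burn-in length are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_arma(phi, theta, n, sigma=1.0, burn_in=200):
    """Simulate X_t - sum_i phi_i X_{t-i} = Z_t + sum_i theta_i Z_{t-i} with Gaussian white noise."""
    p, q = len(phi), len(theta)
    total = n + burn_in
    z = rng.normal(0.0, sigma, size=total)  # white noise Z_t
    x = np.zeros(total)
    for t in range(total):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[i] * z[t - 1 - i] for i in range(q) if t - 1 - i >= 0)
        x[t] = ar + z[t] + ma
    return x[burn_in:]  # drop the burn-in to reduce start-up effects

# ARMA(1, 1) with phi_1 = 0.5 and theta_1 = 0.4 (arbitrary example values)
x = simulate_arma(phi=[0.5], theta=[0.4], n=500)
```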

When is an ARMA Process Stationary?

An ARMA process $X_t$ has a unique (weakly) stationary solution if and only if $\phi(z) \neq 0$ for all $z \in \mathbb{C}$ with $|z| = 1$.
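
For example, the random walk $X_t = X_{t-1} + Z_t$ has $\phi(z) = 1 - z$, which vanishes at $z = 1$, so it has no stationary solution.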

When is an ARMA Process Causal?

An ARMA process $X_t$ is causal if there exists a real-valued sequence $(\varepsilon_i, i \in \mathbb{N}_0)$ such that for all $t \in \mathbb{Z}$:

$$\sum_{i=0}^\infty |\varepsilon_i| < \infty$$

and

$$X_t = \sum_{i=0}^\infty \varepsilon_i Z_{t-i}.$$

In other words, $X_t$ can be described as an MA($\infty$) process. Equivalently, an ARMA process $X_t$ is causal if and only if $\phi(z) \neq 0$ for all $z \in \mathbb{C}$ with $|z| \leq 1$.
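
As a concrete example, the AR(1) process $X_t - \phi_1 X_{t-1} = Z_t$ has $\phi(z) = 1 - \phi_1 z$, whose only zero is $z = 1/\phi_1$. The process is therefore causal if and only if $|\phi_1| < 1$, in which case $X_t = \sum_{i=0}^\infty \phi_1^i Z_{t-i}$.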

When is an ARMA Process Invertible?

An ARMA process $X_t$ is invertible if there exists a real-valued sequence $(\pi_i, i \in \mathbb{N}_0)$ such that for all $t \in \mathbb{Z}$:

$$\sum_{i=0}^\infty |\pi_i| < \infty$$

and

$$Z_t = \sum_{i=0}^\infty \pi_i X_{t-i}.$$

In other words, $X_t$ can be described as an AR($\infty$) process. Equivalently, an ARMA process $X_t$ is invertible if and only if $\theta(z) \neq 0$ for all $z \in \mathbb{C}$ with $|z| \leq 1$.
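
Both conditions can be checked numerically by locating the zeros of $\phi(z)$ and $\theta(z)$. Below is a minimal sketch using NumPy; the ARMA(2,1) coefficients are arbitrary placeholders.

```python
import numpy as np

# ARMA(2, 1): X_t - 0.5 X_{t-1} - 0.3 X_{t-2} = Z_t + 0.4 Z_{t-1} (arbitrary example values)
phi = [0.5, 0.3]   # phi_1, phi_2
theta = [0.4]      # theta_1

# phi(z) = 1 - 0.5 z - 0.3 z^2 and theta(z) = 1 + 0.4 z.
# np.roots expects coefficients ordered from the highest power down to the constant term.
ar_roots = np.roots([-c for c in reversed(phi)] + [1.0])
ma_roots = np.roots([c for c in reversed(theta)] + [1.0])

# Causal (resp. invertible) iff every zero of phi (resp. theta) lies strictly outside the unit circle.
print("causal:", bool(np.all(np.abs(ar_roots) > 1.0)))
print("invertible:", bool(np.all(np.abs(ma_roots) > 1.0)))
```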

Choosing The Order of The ARMA Process

It’s always possible to fit an ARMA($p,q$) model with excessively large $p$ and $q$ values, but this isn’t advantageous for forecasting. While this approach often yields a small estimated white noise variance, the mean squared error of forecasts is also affected by errors in parameter estimation. To address this, we introduce a "penalty factor" to discourage the selection of overly complex models.

AICC Criterion

We start by introducing the AICC criterion, where AIC stands for Akaike’s Information Criterion and the final C denotes bias-corrected. The AICC estimates the Kullback-Leibler divergence, which measures how different one probability distribution is from another, comparing the estimated distribution to the actual data distribution. It assumes that $Z_t \sim \text{IID}\ \mathcal{N}(0, \sigma^2)$, but it shows robustness to moderate deviations from normality, such as when $Z_t$ follows a $t$-distribution.

The AICC criterion says that you have to choose $p$, $q$, $\phi_p$, and $\theta_q$ to minimize the following:

$$-2 \ln L\big(\phi_p, \theta_q, S(\phi_p,\theta_q)/n\big) + \frac{2(p+q+1)n}{n-p-q-2},$$

where $\phi_p = (\phi_1, \ldots, \phi_p)$ and $\theta_q = (\theta_1, \ldots, \theta_q)$. To apply this criterion in practice, we fit a wide range of models with varying orders $(p, q)$ to the data and select the one that minimizes the negative log-likelihood, adjusted by the penalty factor $\frac{2(p+q+1)n}{n-p-q-2}$.
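
As a rough sketch of what this looks like in practice, the grid search below fits ARMA($p,q$) models with statsmodels and computes the AICC from the maximized log-likelihood using the formula above. The order bounds are arbitrary, and the constant term that statsmodels includes by default is not counted in the penalty, so treat this as an illustration rather than a reference implementation.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def select_order_by_aicc(x, max_p=5, max_q=5):
    """Fit ARMA(p, q) models for 0 <= p <= max_p, 0 <= q <= max_q and return the AICC-minimizing order."""
    n = len(x)
    best_order, best_aicc = None, np.inf
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            try:
                res = ARIMA(x, order=(p, 0, q)).fit()
            except Exception:
                continue  # skip orders where estimation fails to converge
            # AICC = -2 ln L + 2(p + q + 1) n / (n - p - q - 2)
            aicc = -2.0 * res.llf + 2.0 * (p + q + 1) * n / (n - p - q - 2)
            if aicc < best_aicc:
                best_order, best_aicc = (p, q), aicc
    return best_order, best_aicc

# Example usage on a simulated or observed series x:
# order, aicc = select_order_by_aicc(x)
```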

BIC Criterion

One issue with the AICC criterion is that the estimators for p and q aren’t consistent; they don’t converge almost surely to the true values. In contrast, consistent estimators can be derived using the Bayesian Information Criterion (BIC), which also penalizes the selection of large p and q values, helping to prevent overfitting.

The BIC criterion says that you have to choose $p$ and $q$ to minimize the following:

$$(n-p-q) \ln\frac{n\hat{\sigma}^2}{n-p-q} + n\left(1+\ln\sqrt{2\pi}\right) + (p+q) \ln\frac{\sum_{t=1}^n X_t^2 - n\hat{\sigma}^2}{p+q},$$

where $\hat{\sigma}^2$ is the maximum likelihood estimate of the white noise variance. A downside of the BIC is that it is not asymptotically efficient: while selecting the model that minimizes the AICC is asymptotically efficient for causal and invertible ARMA processes, this isn’t the case for the BIC. In this context, efficiency means that minimizing the AICC will, in the long run, lead to a model with the lowest one-step-ahead prediction errors.
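
For reference, here is a minimal sketch of the BIC formula above, assuming the observations are stored in a NumPy array `x` and that $\hat{\sigma}^2$ has already been obtained from the fitted model (the formula requires $p+q \geq 1$):

```python
import numpy as np

def bic(x, sigma2_hat, p, q):
    """BIC of an ARMA(p, q) fit, following the formula above (requires p + q >= 1)."""
    n = len(x)
    return ((n - p - q) * np.log(n * sigma2_hat / (n - p - q))
            + n * (1 + np.log(np.sqrt(2 * np.pi)))
            + (p + q) * np.log((np.sum(x**2) - n * sigma2_hat) / (p + q)))
```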

When using constrained maximum likelihood estimation—where certain coefficients are assumed to be zero during the estimation process—the term $p+q+1$ is replaced by $m$, which represents the number of non-zero coefficients, in both the AICC and BIC.

Was the post useful? Feel free to donate!
