Last edited: 2024-10-30 19:25:16
The ARMA process, short for autoregressive moving average, is one of the most basic time series models, but it is an important building block in forecasting. Let us look at its definition and some of its properties, such as stationarity, causality, and invertibility, as well as how to choose the order of your ARMA model.
The ARMA$(p, q)$ process has the following definition:

$$X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q},$$

where $\{Z_t\} \sim \mathrm{WN}(0, \sigma^2)$ (white noise with mean 0 and variance $\sigma^2$) and $\phi(\cdot)$ and $\theta(\cdot)$ have no common zeros. You might also define the ARMA model in the following short form:

$$\phi(B) X_t = \theta(B) Z_t,$$

where $B$ is the lag (backshift) operator defined as $B^j X_t = X_{t-j}$. The two functions are defined as:

$$\phi(z) = 1 - \phi_1 z - \dots - \phi_p z^p$$

and

$$\theta(z) = 1 + \theta_1 z + \dots + \theta_q z^q.$$
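To make the definition concrete, here is a minimal sketch of simulating such a process with statsmodels; the ARMA(1, 1) coefficients $\phi_1 = 0.6$ and $\theta_1 = 0.4$ are just illustrative choices, not values from the text above:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# Illustrative ARMA(1, 1) with phi_1 = 0.6 and theta_1 = 0.4.
# ArmaProcess expects the polynomial coefficients of phi(z) and theta(z),
# including the zero-lag term, so the AR part is [1, -phi_1].
ar_poly = np.array([1.0, -0.6])  # phi(z) = 1 - 0.6 z
ma_poly = np.array([1.0, 0.4])   # theta(z) = 1 + 0.4 z

arma = ArmaProcess(ar_poly, ma_poly)
x = arma.generate_sample(nsample=500)  # one simulated path of length 500
print(x[:5])
```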
An ARMA process has a unique (weakly) stationary solution if and only if $\phi(z) \neq 0$ for all $z \in \mathbb{C}$ with $|z| = 1$.
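For example, an AR(1) process has $\phi(z) = 1 - \phi_1 z$, whose only zero is $z = 1/\phi_1$, so a unique stationary solution exists whenever $|\phi_1| \neq 1$.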
An ARMA process is causal if there exists a real-valued sequence $\{\psi_j\}$ such that for all $t$:

$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$$

and

$$\sum_{j=0}^{\infty} |\psi_j| < \infty.$$

In other words, $X_t$ can be described as an MA($\infty$) process. Equivalently, an ARMA process is causal if and only if $\phi(z) \neq 0$ for all $z \in \mathbb{C}$ when $|z| \leq 1$.
An ARMA process is invertible if there exists a real-valued sequence $\{\pi_j\}$ such that for all $t$:

$$Z_t = \sum_{j=0}^{\infty} \pi_j X_{t-j}$$

and

$$\sum_{j=0}^{\infty} |\pi_j| < \infty.$$

In other words, $Z_t$ can be described as an AR($\infty$) process. Equivalently, an ARMA process is invertible if and only if $\theta(z) \neq 0$ for all $z \in \mathbb{C}$ when $|z| \leq 1$.
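Both conditions, as well as the stationarity condition above, depend only on where the zeros of $\phi(z)$ and $\theta(z)$ lie, so they are easy to check numerically. Below is a minimal sketch using numpy; the ARMA(2, 1) coefficients are made-up values for illustration:

```python
import numpy as np

# Illustrative ARMA(2, 1) coefficients: phi = (0.5, 0.3), theta = (0.4,),
# i.e. phi(z) = 1 - 0.5 z - 0.3 z^2 and theta(z) = 1 + 0.4 z.
phi = [0.5, 0.3]
theta = [0.4]

# numpy.roots expects coefficients ordered from the highest power down to the constant.
phi_roots = np.roots([-c for c in phi[::-1]] + [1.0])
theta_roots = np.roots(theta[::-1] + [1.0])

causal = np.all(np.abs(phi_roots) > 1.0)        # phi(z) != 0 for |z| <= 1
invertible = np.all(np.abs(theta_roots) > 1.0)  # theta(z) != 0 for |z| <= 1
print(f"causal: {causal}, invertible: {invertible}")
```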
It’s always possible to fit an ARMA model with excessively large $p$ and $q$ values, but this isn’t advantageous for forecasting. While this approach often yields a small estimated white noise variance, the mean squared error of forecasts is also affected by errors in parameter estimation. To address this, we introduce a "penalty factor" to discourage the selection of overly complex models.
We start by introducing the AICC criterion, where AIC stands for Akaike’s Information Criterion and the last C denotes bias-corrected. The AICC estimates the Kullback-Leibler divergence, which measures how different one probability distribution is from another, comparing the estimated distribution to the actual data distribution. It assumes that $Z_t \sim \mathrm{N}(0, \sigma^2)$, but it shows robustness to moderate deviations from normality, such as when $Z_t$ follows a $t$-distribution.
The AICC criterion says that you have to choose $p$, $q$, $\boldsymbol{\phi}_p$, and $\boldsymbol{\theta}_q$ to minimize the following:

$$\mathrm{AICC} = -2 \ln L\!\left(\boldsymbol{\phi}_p, \boldsymbol{\theta}_q, \frac{S(\boldsymbol{\phi}_p, \boldsymbol{\theta}_q)}{n}\right) + \frac{2(p + q + 1)n}{n - p - q - 2},$$

where $\boldsymbol{\phi}_p = (\phi_1, \dots, \phi_p)$ and $\boldsymbol{\theta}_q = (\theta_1, \dots, \theta_q)$, $L$ is the Gaussian likelihood, and $S$ is the corresponding sum of squares. To apply this criterion in practice, we fit a wide range of models with varying orders $(p, q)$ to the data and select the one that minimizes the negative log-likelihood, adjusted by the penalty factor $2(p + q + 1)n / (n - p - q - 2)$.
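In practice this grid search is a short loop. The sketch below uses statsmodels' ARIMA class with $d = 0$ and simulated white noise as a stand-in for your data; the fitted results expose the bias-corrected criterion as `aicc`, and the search range of 0 to 2 for both orders is an arbitrary choice:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
x = rng.standard_normal(300)  # stand-in data; replace with your own series

best = None
for p in range(3):      # candidate AR orders 0, 1, 2
    for q in range(3):  # candidate MA orders 0, 1, 2
        res = ARIMA(x, order=(p, 0, q)).fit()
        if best is None or res.aicc < best[0]:
            best = (res.aicc, p, q)

print(f"chosen order: p={best[1]}, q={best[2]}, AICC={best[0]:.2f}")
```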
One issue with the AICC criterion is that the estimators for $p$ and $q$ aren’t consistent; they don’t converge almost surely to the true values. In contrast, consistent estimators can be derived using the Bayesian Information Criterion (BIC), which also penalizes the selection of large $p$ and $q$ values, helping to prevent overfitting.
The BIC criterion says that you have to choose $p$ and $q$ to minimize the following:

$$\mathrm{BIC} = (n - p - q) \ln\frac{n \hat{\sigma}^2}{n - p - q} + n\left(1 + \ln\sqrt{2\pi}\right) + (p + q) \ln\!\left[\frac{\sum_{t=1}^{n} X_t^2 - n \hat{\sigma}^2}{p + q}\right],$$

where $\hat{\sigma}^2$ is the maximum likelihood estimate of the white noise variance. A downside of the BIC concerns efficiency. While selecting the model that minimizes the AICC is asymptotically efficient for causal and invertible ARMA processes, this isn’t the case for the BIC. In this context, efficiency means that minimizing the AICC will, in the long run, lead to a model with the lowest one-step-ahead prediction errors.
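If you prefer the BIC in the grid search sketched above, the fitted statsmodels results also expose a `bic` attribute, so `res.bic` can take the place of `res.aicc` in the comparison. Note that libraries typically compute the BIC in its standard $-2 \ln L + k \ln n$ form rather than the expression above, but it serves the same purpose of penalizing large $p$ and $q$.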
When using constrained maximum likelihood estimation, where certain coefficients are assumed to be zero during the estimation process, the term $p + q$ is replaced by the number of non-zero coefficients in both the AICC and BIC.
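As a rough sketch of how such a constrained fit might look with statsmodels, `fit_constrained` pins chosen coefficients during estimation; the subset ARMA(2, 1) below and the choice to zero out the first AR coefficient are purely hypothetical:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
x = rng.standard_normal(300)  # stand-in data; replace with your own series

# Hypothetical subset ARMA(2, 1): estimate with the first AR coefficient fixed at zero.
model = ARIMA(x, order=(2, 0, 1))
res = model.fit_constrained({"ar.L1": 0.0})

print(res.params)  # "ar.L1" stays at zero; only the remaining parameters are estimated
# How fixed coefficients enter the parameter count in res.aicc / res.bic may differ
# from the adjustment described above, so it is worth checking before comparing models.
print(res.aicc, res.bic)
```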
Was the post useful? Feel free to donate!