Autoregressive Moving Average models

The generalised form of an autoregressive (AR) model of order $p$, following Box and Jenkins, is:

$$x_{t}=c+\phi_{1} x_{t-1}+\phi_{2} x_{t-2}+\cdots+\phi_{p} x_{t-p}+\epsilon_{t} \quad (1)$$

where $c$ is a constant, $\phi_{1}, \ldots, \phi_{p}$ are the parameters (AR coefficients) of the model, $x_{t-1}, \ldots, x_{t-p}$ are the time-lagged values of the series $x_{t}$, and $\epsilon_{t}$ is the error term at time $t$ with mean zero and constant variance $\sigma^{2}_{\epsilon}$. The notation $p$ in $AR(p)$ indicates the order of the autoregressive process.
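To make the recursion concrete, here is a minimal sketch of simulating equation $(1)$ with NumPy; the coefficient values, series length, and burn-in period are illustrative assumptions, not taken from the text.

```python
# Minimal simulation of the AR(p) recursion in equation (1).
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar(phi, c=0.0, n=500, sigma=1.0, burn=100):
    """Draw from x_t = c + phi_1 x_{t-1} + ... + phi_p x_{t-p} + eps_t."""
    p = len(phi)
    x = np.zeros(n + burn)
    eps = rng.normal(0.0, sigma, size=n + burn)
    for t in range(p, n + burn):
        # x[t-p:t][::-1] is (x_{t-1}, ..., x_{t-p}), matching (phi_1, ..., phi_p)
        x[t] = c + np.dot(phi, x[t - p:t][::-1]) + eps[t]
    return x[burn:]  # drop the burn-in so the start-up transient fades

x = simulate_ar(phi=[0.6, -0.2])  # an AR(2) with phi_1 = 0.6, phi_2 = -0.2
```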

A Moving Average ($MA$) model regresses the series against its past error terms; it is a type of stochastic process. Its generalised form of order $q$ is:

$$x_{t}=c+\epsilon_{t}+\theta_{1} \epsilon_{t-1}+\theta_{2} \epsilon_{t-2}+\ldots+\theta_{q} \epsilon_{t-q} \quad (2)$$

where $\theta_{1}, \ldots, \theta_{q}$ are the parameters (MA coefficients) of the model, and $\epsilon_{t-1}, \ldots, \epsilon_{t-q}$ are the time-lagged values of the error. As with the AR model, the order $q$ is the highest power in the corresponding lag polynomial.
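Equation $(2)$ amounts to filtering white noise with the kernel $(1, \theta_{1}, \ldots, \theta_{q})$, so it can be sketched in a few lines of NumPy; the coefficient values are illustrative assumptions.

```python
# Minimal simulation of the MA(q) model in equation (2) via convolution.
import numpy as np

rng = np.random.default_rng(0)

def simulate_ma(theta, c=0.0, n=500, sigma=1.0):
    """Draw from x_t = c + eps_t + theta_1 eps_{t-1} + ... + theta_q eps_{t-q}."""
    q = len(theta)
    eps = rng.normal(0.0, sigma, size=n + q)  # extra draws to seed the lags
    # Convolving with (1, theta_1, ..., theta_q) applies the MA filter.
    return c + np.convolve(eps, np.r_[1.0, theta], mode="valid")

x = simulate_ma(theta=[0.5, 0.3])  # an MA(2) with theta_1 = 0.5, theta_2 = 0.3
```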

Let the **lag operator** $L$ act on $x_{t}$ (and likewise on $\epsilon_{t}$) as $$L^{k} x_{t}=x_{t-k}, \quad \forall k \in \mathbb{Z}$$

The **differencing operator** $\nabla$ is defined as:

$$ \begin{aligned} \nabla x_{t} &= x_{t}-x_{t-1}\\
&=(1-L) x_{t} \end{aligned} $$
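In code, $\nabla$ is just a first difference; a quick NumPy check of the identity above (the sample values are arbitrary):

```python
# The differencing operator (1 - L) applied to a concrete series.
import numpy as np

x = np.array([2.0, 5.0, 4.0, 7.0])
dx = np.diff(x)                      # x_t - x_{t-1}: [3., -1., 3.]
assert np.allclose(dx, x[1:] - x[:-1])
```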

With the lag-operator notation, expressions $(1)$ and $(2)$ become:

$$ \begin{array}{c} x_{t}=c+(\phi_{1} L+\phi_{2} L^{2}+\ldots+\phi_{p} L^{p}) x_{t}+\epsilon_{t} \quad (3)\\[0.2cm] x_{t}=c+(1+\theta_{1} L+\theta_{2} L^{2}+\ldots+\theta_{q} L^{q}) \epsilon_{t} \quad (4) \end{array} $$

Setting the $AR$ polynomial of $L$ of order $p$ as:

$$\Phi_{p}(L)=1-\phi_{1} L-\phi_{2} L^{2}-\ldots-\phi_{p} L^{p} \quad (5)$$

and the $MA$ polynomial of $L$ of order $q$ as:

$$\Theta_{q}(L)=1+\theta_{1} L+\theta_{2} L^{2}+\ldots+\theta_{q} L^{q} \quad (6)$$

then, from expressions $(5)$ and $(6)$, expressions $(3)$ and $(4)$ become:

$$\begin{array}{c} \Phi_{p}(L) x_{t}=c+\epsilon_{t}\\[0.2cm] x_{t}=c+\Theta_{q}(L) \epsilon_{t} \end{array}$$

When these two models are combined, they produce an $ARMA$ model of order $(p, q)$, written as:

$$x_{t}=c+\phi_{1} x_{t-1}+\phi_{2} x_{t-2}+\ldots+\phi_{p} x_{t-p}+\epsilon_{t}+\theta_{1} \epsilon_{t-1}+\theta_{2} \epsilon_{t-2}+\ldots+\theta_{q} \epsilon_{t-q} \quad (7)$$

With the lag-operator and polynomial notations, expression $(7)$ becomes:

$$\Phi_{p}(L) x_{t}=c+\Theta_{q}(L) \epsilon_{t}$$

For example, a zero-mean $ARMA(1,1)$ process is:

$$x_{t}=\phi_{1} x_{t-1}+\theta_{1} \epsilon_{t-1}+\epsilon_{t}$$
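As a sketch, this process can be generated with statsmodels' `ArmaProcess`; note that its `ar`/`ma` arguments are the lag-polynomial coefficients of $(5)$ and $(6)$, so AR coefficients enter with a minus sign. The numeric values of $\phi_{1}$ and $\theta_{1}$ are assumptions.

```python
# Generating an ARMA(1,1) sample path with statsmodels.
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

phi1, theta1 = 0.6, 0.4
process = ArmaProcess(ar=np.array([1.0, -phi1]),   # Phi(L) = 1 - phi1 L
                      ma=np.array([1.0, theta1]))  # Theta(L) = 1 + theta1 L
print(process.isstationary, process.isinvertible)  # True True for these values
x = process.generate_sample(nsample=500)
```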


Two special cases follow:

  • $p = 0 \Rightarrow$ the model reduces to $MA(q)$
  • $q = 0 \Rightarrow$ the model reduces to $AR(p)$

Any causal, invertible linear process has:

  • an MA($\infty$) representation (from causality)
  • an AR($\infty$) representation (from invertibility)
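To illustrate the MA($\infty$) representation, statsmodels' `arma2ma` expands $\Theta_{q}(L) / \Phi_{p}(L)$ into the $\psi$-weights of a causal ARMA; the coefficient values below are assumptions chosen for illustration.

```python
# psi-weights of the MA(infinity) representation of an ARMA(1,1).
import numpy as np
from statsmodels.tsa.arima_process import arma2ma

# Phi(L) = 1 - 0.6 L, Theta(L) = 1 + 0.4 L
psi = arma2ma(ar=np.array([1.0, -0.6]), ma=np.array([1.0, 0.4]), lags=10)
# Closed form for ARMA(1,1): psi_0 = 1, psi_j = (phi1 + theta1) * phi1**(j-1)
print(psi[:4])  # [1.0, 1.0, 0.6, 0.36]
```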

Real data cannot be modelled exactly with a finite number of parameters, so we choose $p$ and $q$ to give a simple but accurate model.
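The text does not prescribe a selection rule; one common approach is to compare candidate orders by an information criterion such as AIC. Below is a minimal sketch with statsmodels' `ARIMA` (setting $d = 0$ gives a plain ARMA fit); the grid bounds and the placeholder series are assumptions.

```python
# Choosing (p, q) by AIC over a small grid of candidate orders.
import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
x = rng.normal(size=300)  # stand-in for a real, zero-mean series

best_order, best_aic = None, np.inf
for p, q in itertools.product(range(3), range(3)):
    aic = ARIMA(x, order=(p, 0, q)).fit().aic
    if aic < best_aic:
        best_order, best_aic = (p, q), aic
print("chosen (p, q):", best_order)
```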

Parameter Estimation

Two simplifying assumptions are made:

  1. The model order ($p$ and $q$) is known
  2. The data has zero mean

Assume that $\{X_{t}\}$ is Gaussian, that is, $\Phi_{p}(L) X_{t}=\Theta_{q}(L) \epsilon_{t}$, where $\epsilon_{t}$ is i.i.d. Gaussian.

Choose $\phi_{i}, \theta_{j}$ to maximize the likelihood

$$L\left(\phi, \theta, \sigma^{2}\right)=f_{\phi, \theta, \sigma^{2}}\left(X_{1}, \ldots, X_{n}\right)$$

where $f_{\phi, \theta, \sigma^{2}}$ is the joint (Gaussian) density for the given ARMA model.

Maximum likelihood estimation

Suppose that $X_{1}, X_{2}, \ldots, X_{n}$ is drawn from a zero-mean Gaussian ARMA$(p, q)$ process. The likelihood of the parameters $\phi \in \mathbb{R}^{p}$, $\theta \in \mathbb{R}^{q}$, $\sigma_{w}^{2} \in \mathbb{R}_{+}$ is defined as the density of

$$X=\left(X_{1}, X_{2}, \ldots, X_{n}\right)^{\prime}$$

under the Gaussian model with those parameters:

$$L\left(\phi, \theta, \sigma_{w}^{2}\right)=\frac{1}{(2 \pi)^{n / 2}\left|\Gamma_{n}\right|^{1 / 2}} \exp \left(-\frac{1}{2} X^{\prime} \Gamma_{n}^{-1} X\right)$$

where $\vert A \vert$ denotes the determinant of a matrix $A$,

and $\Gamma_{n}$ is the variance/covariance matrix of $X$ with the given parameter values.

The maximum likelihood estimator (MLE) of $\phi, \theta, \sigma_{w}^{2}$ maximizes this quantity.
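As a numerical check of this density, the sketch below evaluates it for a zero-mean Gaussian AR(1), whose covariance matrix has the known closed form $\Gamma_{n}[i, j]=\sigma_{w}^{2} \phi^{|i-j|} /\left(1-\phi^{2}\right)$; all numeric values are assumptions.

```python
# Evaluating the exact Gaussian likelihood formula for an AR(1) path.
import numpy as np
from scipy.stats import multivariate_normal

phi, sigma2, n = 0.6, 1.0, 200
rng = np.random.default_rng(2)

idx = np.arange(n)
Gamma = sigma2 / (1 - phi**2) * phi ** np.abs(idx[:, None] - idx[None, :])
X = rng.multivariate_normal(np.zeros(n), Gamma)  # one sample path

# log L = -(n/2) log(2 pi) - (1/2) log|Gamma_n| - (1/2) X' Gamma_n^{-1} X
sign, logdet = np.linalg.slogdet(Gamma)
loglik = -0.5 * (n * np.log(2 * np.pi) + logdet + X @ np.linalg.solve(Gamma, X))

# Cross-check against scipy's multivariate normal log-density (zero mean).
assert np.isclose(loglik, multivariate_normal(cov=Gamma).logpdf(X))
```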

The exact Gaussian log-likelihood (allowing a general mean $\mu$) then satisfies:

$$ -2 \ell\left(\mu, \phi, \theta, \sigma^{2}\right)=n \log 2 \pi+\log \left|\Gamma_{n}\right|+(\boldsymbol{X}-\mu)^{\prime} \Gamma_{n}^{-1}(\boldsymbol{X}-\mu) $$
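Maximizing $\ell$ is equivalent to minimizing the right-hand side above. A minimal sketch for an AR(1), with $\sigma^{2}$ held fixed for brevity and the data-generating values chosen as assumptions:

```python
# Maximum likelihood for the AR(1) parameter phi via the exact -2 log-likelihood.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
n, true_phi = 400, 0.7
x = np.zeros(n)
for t in range(1, n):  # simulate a zero-mean Gaussian AR(1) path
    x[t] = true_phi * x[t - 1] + rng.normal()

def neg2_loglik(phi, sigma2=1.0):
    """n log 2pi + log|Gamma_n| + x' Gamma_n^{-1} x, as in the formula above."""
    idx = np.arange(n)
    Gamma = sigma2 / (1 - phi**2) * phi ** np.abs(idx[:, None] - idx[None, :])
    _, logdet = np.linalg.slogdet(Gamma)
    return n * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(Gamma, x)

res = minimize_scalar(neg2_loglik, bounds=(-0.99, 0.99), method="bounded")
print("MLE of phi:", res.x)  # should land near the true value 0.7
```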