Time Series Analysis

Time Series Analysis - ARIMA models - Basic Definitions and Theorems about ARIMA models


V.I.1.a Basic Definitions and Theorems about ARIMA models

First we define some important concepts. A stochastic process (i.e. a probabilistic process) is defined by a T-dimensional distribution function

$$F(x_1, x_2, \ldots, x_T) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_T \le x_T) \qquad \text{(V.I.1-1)}$$


Before analyzing the structure of a time series model, one must make sure that the time series is stationary with respect to both the mean and the variance. For now we assume statistical stationarity of all time series (this restriction will be relaxed later on).

Statistical stationarity of a time series implies that the marginal probability distribution is time-independent, which means that:


the expected values and variances are constant

$$E(X_t) = \mu, \qquad V(X_t) = \sigma^2, \qquad t = 1, 2, \ldots, T \qquad \text{(V.I.1-2)}$$

where T is the number of observations in the time series;


the autocovariances (and autocorrelations) must be constant over time for a given time-lag

$$\mathrm{Cov}(X_t, X_{t-k}) = \gamma_k \quad \text{for all } t \qquad \text{(V.I.1-3)}$$

where k is an integer time-lag;


the variable has a joint normal distribution f(X_1, X_2, ..., X_T) with a marginal normal distribution in each dimension

$$X_t \sim N(\mu, \sigma^2), \qquad t = 1, 2, \ldots, T \qquad \text{(V.I.1-4)}$$

If only this last condition is not met, the process is called weakly stationary.

Now it is possible to define white noise as a (statistically stationary) stochastic process defined by a distribution function (V.I.1-1), where all X_t are independent variables (with zero covariances), with a joint normal distribution f(X_1, X_2, ..., X_T), and with

$$E(X_t) = 0, \qquad V(X_t) = \sigma^2 \qquad \text{(V.I.1-5)}$$

It follows from this definition that for any white noise process the joint probability density function can be written as the product of the marginal densities

$$f(X_1, X_2, \ldots, X_T) = \prod_{t=1}^{T} f(X_t) = (2\pi\sigma^2)^{-T/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{t=1}^{T} X_t^2\right) \qquad \text{(V.I.1-6)}$$


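Define the autocovariance as

$$\gamma_k = \mathrm{Cov}(X_t, X_{t-k}) = E\left[(X_t - \mu)(X_{t-k} - \mu)\right] \qquad \text{(V.I.1-7)}$$

or, equivalently,

$$\gamma_k = E(X_t X_{t-k}) - \mu^2, \qquad \gamma_0 = \sigma^2 \qquad \text{(V.I.1-8)}$$

whereas the autocorrelation is defined as

$$\rho_k = \frac{\gamma_k}{\gamma_0} \qquad \text{(V.I.1-9)}$$

In practice, however, only the sample observations are at our disposal. Therefore we use the sample autocorrelations

$$\hat{\rho}_k = \frac{\hat{\gamma}_k}{\hat{\gamma}_0} = \frac{\frac{1}{T}\sum_{t=k+1}^{T}(X_t - \bar{X})(X_{t-k} - \bar{X})}{\frac{1}{T}\sum_{t=1}^{T}(X_t - \bar{X})^2} \qquad \text{(V.I.1-10)}$$

for any integer k.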
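The sample autocorrelation can be computed directly from the observations. The following is a minimal Python sketch, assuming the usual estimator in which the sample autocovariance at lag k is divided by T (not T - k) and the series is centred on its sample mean. The function names are illustrative and not part of the original text.

```python
def sample_autocovariance(x, k):
    """Sample autocovariance at lag k, using divisor T (assumed convention)."""
    T = len(x)
    k = abs(k)  # the estimate is symmetric in the lag
    if k >= T:
        raise ValueError("lag must be smaller than the series length")
    mean = sum(x) / T
    return sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, T)) / T


def sample_autocorrelation(x, k):
    """Sample autocorrelation: autocovariance at lag k over the lag-0 value."""
    return sample_autocovariance(x, k) / sample_autocovariance(x, 0)


# Small illustrative series (made-up numbers).
series = [2.0, 4.0, 6.0, 8.0, 10.0]
acf = [sample_autocorrelation(series, k) for k in range(4)]
```

For this short trending series the estimates start at 1 for lag 0 and then decline, turning negative at the higher lags.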

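Note that the autocovariance matrix and the autocorrelation matrix associated with a stationary stochastic process

$$\Gamma_T = \begin{pmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{T-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{T-1} & \gamma_{T-2} & \cdots & \gamma_0 \end{pmatrix} \qquad \text{(V.I.1-11)}$$

$$P_T = \frac{1}{\gamma_0}\Gamma_T = \begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{T-1} \\ \rho_1 & 1 & \cdots & \rho_{T-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{T-1} & \rho_{T-2} & \cdots & 1 \end{pmatrix} \qquad \text{(V.I.1-12)}$$

are always positive definite. This is easily shown, since a linear combination of the stochastic variable

$$L_t = \sum_{i=1}^{T} l_i X_{t-i+1} \qquad \text{(V.I.1-13)}$$

has a variance of

$$V(L_t) = \sum_{i=1}^{T}\sum_{j=1}^{T} l_i l_j \gamma_{|i-j|} \qquad \text{(V.I.1-14)}$$

which is always positive (provided the weights l_i are not all zero and the linear combination is not degenerate).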
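The positive definiteness claim can be checked numerically. The sketch below builds the autocorrelation matrix from a list of autocorrelations, evaluates the quadratic form that gives the variance of a linear combination (in units of the process variance), and tests positive definiteness by attempting a Cholesky factorisation. The helper names and the example values are assumptions made for illustration only.

```python
def autocorrelation_matrix(rho):
    """Symmetric Toeplitz matrix with entry (i, j) = rho[|i - j|]; rho[0] is 1."""
    n = len(rho)
    return [[rho[abs(i - j)] for j in range(n)] for i in range(n)]


def quadratic_form(matrix, weights):
    """Sum over i, j of l_i * l_j * matrix[i][j]."""
    n = len(weights)
    return sum(weights[i] * weights[j] * matrix[i][j]
               for i in range(n) for j in range(n))


def is_positive_definite(matrix, tol=1e-12):
    """Cholesky succeeds exactly when a symmetric matrix is positive definite."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][m] * lower[j][m] for m in range(j))
            if i == j:
                if s <= tol:  # non-positive pivot: not positive definite
                    return False
                lower[i][i] = s ** 0.5
            else:
                lower[i][j] = s / lower[j][j]
    return True


valid = autocorrelation_matrix([1.0, 0.5, 0.25])   # autocorrelations 0.5**k
invalid = autocorrelation_matrix([1.0, 0.9, 0.1])  # not a feasible sequence
variance = quadratic_form(valid, [1.0, -1.0, 1.0])
```

The second sequence fails the check: a lag-1 autocorrelation of 0.9 is incompatible with a lag-2 autocorrelation as small as 0.1.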

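This implies, for instance for T = 3, that the principal minors of the autocorrelation matrix are positive

$$\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{vmatrix} > 0, \qquad \begin{vmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_1 \\ \rho_2 & \rho_1 & 1 \end{vmatrix} > 0 \qquad \text{(V.I.1-15)}$$

so that

$$-1 < \rho_1 < 1, \qquad -1 < \rho_2 < 1, \qquad -1 < \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2} < 1 \qquad \text{(V.I.1-16)}$$
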
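Bartlett proved that the variance of the sample autocorrelation of a stationary normal stochastic process can be approximated by

$$V(\hat{\rho}_k) \simeq \frac{1}{T}\sum_{i=-\infty}^{+\infty}\left(\rho_i^2 + \rho_{i+k}\rho_{i-k} - 4\rho_k\rho_i\rho_{i-k} + 2\rho_i^2\rho_k^2\right) \qquad \text{(V.I.1-17)}$$

This expression can be shown to reduce to

$$V(\hat{\rho}_k) \simeq \frac{1}{T}\left[\frac{(1+\phi^2)(1-\phi^{2k})}{1-\phi^2} - 2k\phi^{2k}\right] \qquad \text{(V.I.1-18)}$$

if the autocorrelation coefficients decrease exponentially like

$$\rho_k = \phi^{|k|}, \qquad -1 < \phi < 1 \qquad \text{(V.I.1-19)}$$

If, on the other hand, the autocorrelations for i > q (a natural number) are equal to zero, expression (V.I.1-17) can be shown to reduce, for lags k > q, to

$$V(\hat{\rho}_k) \simeq \frac{1}{T}\left(1 + 2\sum_{i=1}^{q}\rho_i^2\right), \qquad k > q \qquad \text{(V.I.1-20)}$$

which is the so-called large-lag variance. Now it is possible to vary q from 1 to any desired number of autocorrelations, replace the theoretical autocorrelations by their sample estimates, and compute the square root of (V.I.1-20) to find the standard deviation of the sample autocorrelation.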
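The procedure just described can be sketched in a few lines of Python. The sketch assumes the standard large-lag form, in which the variance equals (1 + 2 times the sum of the first q squared autocorrelations) divided by T. The function name and the numbers are illustrative assumptions.

```python
import math


def large_lag_standard_error(r, q, T):
    """Large-lag standard error of a sample autocorrelation at a lag beyond q.

    r[k] is the sample autocorrelation at lag k (r[0] == 1), q is the lag after
    which the autocorrelations are assumed to be zero, and T is the number of
    observations.
    """
    variance = (1.0 + 2.0 * sum(r[i] ** 2 for i in range(1, q + 1))) / T
    return math.sqrt(variance)


# Illustrative (made-up) sample autocorrelations for a series of T = 100 points.
r = [1.0, 0.5, 0.2]
errors = [large_lag_standard_error(r, q, 100) for q in range(3)]
```

With q = 0 the result is simply 1 / sqrt(T), the approximation commonly used for a single autocorrelation coefficient. The standard error widens as more nonzero autocorrelations are allowed for.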

Note that the standard deviation of a single autocorrelation coefficient is very often approximated by

$$S(\hat{\rho}_k) \simeq \frac{1}{\sqrt{T}} \qquad \text{(V.I.1-21)}$$

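The covariances between autocorrelation coefficients have also been deduced by Bartlett; for large lags they are approximately

$$\mathrm{Cov}(\hat{\rho}_k, \hat{\rho}_{k+s}) \simeq \frac{1}{T}\sum_{i=-\infty}^{+\infty}\rho_i\rho_{i+s} \qquad \text{(V.I.1-22)}$$

which is a good indicator of dependencies between autocorrelations. Keep in mind, therefore, that inter-correlated autocorrelations can seriously distort the picture of the autocorrelation function (ACF, i.e. the autocorrelations as a function of the time-lag).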

It is, however, possible to remove the intervening correlations between X_t and X_{t-k} by defining a partial autocorrelation function (PACF).

The partial autocorrelation coefficient at lag k is defined as the last coefficient, φ_kk, of a partial autoregression equation of order k

$$X_t = \phi_{k1}X_{t-1} + \phi_{k2}X_{t-2} + \cdots + \phi_{kk}X_{t-k} + a_t \qquad \text{(V.I.1-23)}$$

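There is a direct relationship between the PACF and the ACF, since (V.I.1-23) can be rewritten (after multiplying both sides by X_{t-j}, with X_t measured as a deviation from its mean) as

$$X_{t-j}X_t = \phi_{k1}X_{t-j}X_{t-1} + \phi_{k2}X_{t-j}X_{t-2} + \cdots + \phi_{kk}X_{t-j}X_{t-k} + X_{t-j}a_t \qquad \text{(V.I.1-24)}$$

or (on taking expectations and dividing by the variance)

$$\rho_j = \phi_{k1}\rho_{j-1} + \phi_{k2}\rho_{j-2} + \cdots + \phi_{kk}\rho_{j-k}, \qquad j = 1, 2, \ldots, k \qquad \text{(V.I.1-25)}$$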

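Sometimes (V.I.1-25) is written in matrix form, as the Yule-Walker relations

$$\begin{pmatrix} 1 & \rho_1 & \cdots & \rho_{k-1} \\ \rho_1 & 1 & \cdots & \rho_{k-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k-1} & \rho_{k-2} & \cdots & 1 \end{pmatrix} \begin{pmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{pmatrix} = \begin{pmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_k \end{pmatrix} \qquad \text{(V.I.1-26)}$$

or simply

$$P_k \phi_k = \rho_k \qquad \text{(V.I.1-27)}$$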

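Solving (V.I.1-27) according to Cramer's Rule yields

$$\phi_{kk} = \frac{|P_k^*|}{|P_k|} \qquad \text{(V.I.1-28)}$$

where P_k^* denotes the autocorrelation matrix of order k with its last column replaced by the vector of the autocorrelations ρ_1, ..., ρ_k. Note that the determinant in the numerator therefore contains the same elements as the determinant in the denominator, except for the last column.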
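As a worked illustration of Cramer's Rule, consider the first two lags. This is a sketch assuming the Yule-Walker system has its usual form, in which the coefficient matrix is the autocorrelation matrix of order k and the right-hand side is the vector of the first k autocorrelations. The symbols φ_kk and ρ_k follow the surrounding text.

```latex
\phi_{11} = \rho_1, \qquad
\phi_{22} = \frac{\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & \rho_2 \end{vmatrix}}
                 {\begin{vmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{vmatrix}}
          = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}
```

For example, with ρ_1 = 0.4 and ρ_2 = 0.5 this gives φ_22 = 0.34 / 0.84, which is approximately 0.405.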

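A practical numerical estimation algorithm for the PACF is given by Durbin

$$\hat{\phi}_{k+1,k+1} = \frac{\hat{\rho}_{k+1} - \sum_{j=1}^{k}\hat{\phi}_{kj}\hat{\rho}_{k+1-j}}{1 - \sum_{j=1}^{k}\hat{\phi}_{kj}\hat{\rho}_j} \qquad \text{(V.I.1-29)}$$

$$\hat{\phi}_{k+1,j} = \hat{\phi}_{kj} - \hat{\phi}_{k+1,k+1}\hat{\phi}_{k,k+1-j}, \qquad j = 1, 2, \ldots, k \qquad \text{(V.I.1-30)}$$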
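The recursion can be sketched in Python as follows. This assumes the standard Durbin-Levinson form, in which the first partial autocorrelation equals the first autocorrelation and each higher order is obtained from the coefficients of the previous order. `pacf_durbin` is an illustrative name, and either theoretical or sample autocorrelations can be passed in.

```python
def pacf_durbin(rho, max_lag):
    """Partial autocorrelations phi_11 ... phi_KK from autocorrelations rho.

    rho[k] is the autocorrelation at lag k, with rho[0] == 1.
    """
    if max_lag >= len(rho):
        raise ValueError("rho must contain lags 0..max_lag")
    pacf = []
    prev = []  # coefficients phi_{k-1,1} ... phi_{k-1,k-1} of the previous order
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            current = [phi_kk]
        else:
            numerator = rho[k] - sum(prev[j - 1] * rho[k - j] for j in range(1, k))
            denominator = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, k))
            phi_kk = numerator / denominator
            # Update the lower-order coefficients, then append the new last one.
            current = [prev[j - 1] - phi_kk * prev[k - j - 1] for j in range(1, k)]
            current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# Autocorrelations that decay like 0.5**k: only the first partial
# autocorrelation should be nonzero.
decaying = pacf_durbin([0.5 ** k for k in range(4)], 3)
two_lags = pacf_durbin([1.0, 0.4, 0.5], 2)
```

The recursion avoids computing determinants, so its cost grows with the square of the maximum lag rather than with a full matrix solve at every lag.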




The standard error of a partial autocorrelation coefficient for k > p (where p is the order of the autoregressive data generating process; see later) is approximately given by

$$S(\hat{\phi}_{kk}) \simeq \frac{1}{\sqrt{T}} \qquad \text{(V.I.1-31)}$$

Finally, we define the following polynomial lag-processes


where B is the backshift operator (i.e. B^i Y_t = Y_{t-i}) and where


These polynomial expressions are used to define linear filters. By definition a linear filter


generates a stochastic process


where a_t is a white noise variable.


for which the following is obvious


We call eq. (V.I.1-36) the random-walk model: a model that describes time series that fluctuate around X_0 in both the short and the long run (since a_t is white noise), in the sense that the expected value of X_t remains X_0 while its variance grows with t.
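A random walk is easy to simulate. The following is a minimal sketch, assuming the usual form in which each observation equals the previous one plus a white noise shock, so that X_t equals X_0 plus the sum of all shocks up to time t. The starting level, the seed and the function name are arbitrary illustrative choices.

```python
import random


def random_walk(x0, shocks):
    """Return [X_1, ..., X_T] with X_t = X_{t-1} + a_t, starting from X_0 = x0."""
    path, level = [], x0
    for a in shocks:
        level += a
        path.append(level)
    return path


rng = random.Random(0)  # fixed seed so the illustration is reproducible
shocks = [rng.gauss(0.0, 1.0) for _ in range(200)]  # normal white noise
path = random_walk(10.0, shocks)
```

Because every shock is carried forward permanently, the last value of `path` equals the starting level plus the sum of all shocks.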

It is interesting to note that a random-walk is normally distributed. This can be proved by using the definition of white noise and computing the moment generating function of the random-walk



from which we deduce



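A deterministic trend is generated by a random-walk model with an added constant

$$X_t = X_{t-1} + c + a_t \qquad \text{(V.I.1-41)}$$

The trend can be illustrated by re-expressing (V.I.1-41) as

$$X_t = X_0 + ct + \sum_{i=1}^{t} a_i \qquad \text{(V.I.1-42)}$$

where ct is a linear deterministic trend (as a function of time).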
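The equivalence of the recursive and the trend form can be verified numerically. This sketch assumes the model X_t = X_{t-1} + c + a_t and its re-expression X_t = X_0 + c t + (sum of the shocks up to t). Both function names and the numbers are illustrative.

```python
def random_walk_with_drift(x0, c, shocks):
    """Recursive form: X_t = X_{t-1} + c + a_t."""
    path, level = [], x0
    for a in shocks:
        level = level + c + a
        path.append(level)
    return path


def trend_plus_shocks(x0, c, shocks):
    """Re-expressed form: X_t = X_0 + c*t + a_1 + ... + a_t."""
    path, cumulative = [], 0.0
    for t, a in enumerate(shocks, start=1):
        cumulative += a
        path.append(x0 + c * t + cumulative)
    return path


shocks = [0.5, -1.0, 0.25]
recursive = random_walk_with_drift(2.0, 0.5, shocks)
closed_form = trend_plus_shocks(2.0, 0.5, shocks)
```

Both lists contain the same values, which shows that the added constant accumulates into a straight line with slope c.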

The linear filter (V.I.1-35) is normally distributed with


due to the additivity property of eq. (I.III-33), (I.III-34), and (I.III-35) applied to a_t.

Now the autocorrelation of a linear filter can be quite easily computed as






Now it is quite evident that, if the linear filter (V.I.1-35) generates the variable X_t, then X_t is a stationary stochastic process ((V.I.1-1) - (V.I.1-3)) defined by a normal distribution (V.I.1-4) (and therefore strongly stationary), with an autocovariance function (V.I.1-45) that depends only on the time-lag k.
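For a filter with finitely many weights, the dependence on the lag alone can be made concrete. The sketch below assumes the standard result that the autocovariance at lag k equals the white noise variance times the sum of products of weights that are k positions apart. The weight list `psi` (with `psi[0] == 1`) and the function names are illustrative assumptions.

```python
def filter_autocovariance(psi, sigma2, k):
    """Autocovariance at lag k of X_t = sum_i psi[i] * a_{t-i}.

    psi is a finite list of filter weights and sigma2 is the variance of the
    white noise a_t. The result does not depend on t, only on the lag k.
    """
    k = abs(k)
    if k >= len(psi):
        return 0.0  # no overlapping shocks beyond the filter length
    return sigma2 * sum(psi[i] * psi[i + k] for i in range(len(psi) - k))


def filter_autocorrelation(psi, k):
    """Autocorrelation at lag k; the white noise variance cancels out."""
    return filter_autocovariance(psi, 1.0, k) / filter_autocovariance(psi, 1.0, 0)


psi = [1.0, 0.5]  # X_t = a_t + 0.5 * a_{t-1}
gamma = [filter_autocovariance(psi, 2.0, k) for k in range(3)]
rho_1 = filter_autocorrelation(psi, 1)
```

For an infinite filter the same sums apply, provided the squared weights are summable so that the variance is finite.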

The set of equations resulting from a linear filter (V.I.1-35) with ACF (V.I.1-44) are sometimes called stochastic difference equations. These stochastic difference equations can be used in practice to forecast (economic) time series. The forecasting function is given by


On using (V.I.1-35), the density of the forecasting function (V.I.1-47) is




is known, and therefore equal to a constant term. Therefore it is obvious that



The concepts defined and described above are all time-related. This implies, for instance, that autocorrelations are defined as a function of the time-lag. Historically, this time-domain viewpoint was preceded by the frequency-domain viewpoint, in which it is assumed that time series consist of sine and cosine waves at different frequencies.

In practice each viewpoint has its advantages and disadvantages. The two should be seen as complementary.


for the Fourier series model


In (V.I.1-53) we define


The least squares estimates of the parameters in (V.I.1-52) are computed by


In the case of a time series with an even number of observations T = 2q, the same definitions are applicable except for


It can furthermore be shown that



such that





It is also possible to show that














which state the orthogonality properties of sinusoids. Note that (V.I.1-67) is a special case of (V.I.1-64) and (V.I.1-68) is a special case of (V.I.1-66). Eq. (V.I.1-66) is particularly interesting for our discussion with regard to (V.I.1-60) and (V.I.1-53), since it states that sinusoids at different frequencies are orthogonal, and in that sense independent of each other.
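The orthogonality properties are easy to confirm numerically. The sketch below assumes sinusoids evaluated at t = 1, ..., T for the harmonic frequencies i/T, and computes the three sums of products (cosine with cosine, sine with sine, cosine with sine). The function name and the choice T = 11 are illustrative.

```python
import math


def sinusoid_inner_products(T, i, j):
    """Sums over t = 1..T of cos*cos, sin*sin and cos*sin at harmonics i and j."""
    wi = 2.0 * math.pi * i / T
    wj = 2.0 * math.pi * j / T
    cc = sum(math.cos(wi * t) * math.cos(wj * t) for t in range(1, T + 1))
    ss = sum(math.sin(wi * t) * math.sin(wj * t) for t in range(1, T + 1))
    cs = sum(math.cos(wi * t) * math.sin(wj * t) for t in range(1, T + 1))
    return cc, ss, cs


different = sinusoid_inner_products(11, 2, 3)  # two different harmonics
same = sinusoid_inner_products(11, 2, 2)       # the same harmonic twice
```

For different harmonics all three sums vanish up to rounding error. For the same harmonic the cosine and sine sums of squares both equal T/2, while the mixed sum is still zero.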

If (V.I.1-52) is redefined as


then I(f) is called the sample spectrum.
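A sample spectrum can be computed as follows. This is a sketch under stated assumptions: the Fourier coefficients at frequency f are taken as 2/T times the sums of X_t cos(2πft) and X_t sin(2πft) over t = 1, ..., T, and the sample spectrum as T/2 times the sum of their squares, for frequencies between 0 and 1/2. These are the conventional definitions and are assumed here, since the formulas themselves are not reproduced in the text. The function names are illustrative.

```python
import math


def fourier_coefficients(x, f):
    """Cosine and sine coefficients (a, b) of the series x at frequency f."""
    T = len(x)
    a = 2.0 / T * sum(x[t - 1] * math.cos(2.0 * math.pi * f * t)
                      for t in range(1, T + 1))
    b = 2.0 / T * sum(x[t - 1] * math.sin(2.0 * math.pi * f * t)
                      for t in range(1, T + 1))
    return a, b


def sample_spectrum(x, f):
    """Sample spectrum I(f) = (T / 2) * (a**2 + b**2) (assumed convention)."""
    a, b = fourier_coefficients(x, f)
    return len(x) / 2.0 * (a * a + b * b)


# A pure cosine wave with amplitude 3 that completes 2 cycles in T = 12 points.
T = 12
series = [3.0 * math.cos(2.0 * math.pi * 2 * t / T) for t in range(1, T + 1)]
a, b = fourier_coefficients(series, 2 / T)
peak = sample_spectrum(series, 2 / T)
elsewhere = sample_spectrum(series, 3 / T)
```

The cosine coefficient recovers the amplitude of the wave, and the sample spectrum is concentrated at the frequency of the wave while vanishing at the other harmonic.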

The sample spectrum is in fact a Fourier cosine transformation of the estimated autocovariance function. Denote the estimate of the covariance (V.I.1-7) by the sample covariance (i.e. the numerator of (V.I.1-10)), the imaginary unit by i, and the frequency by f; then


On using (V.I.1-55) and (V.I.1-70) it follows that


which can be substituted into (V.I.1-70) yielding


Now from (V.I.1-10) it follows


and if (t - t') is substituted by k then (V.I.1-72) becomes


which proves the link between the sample spectrum and the estimated autocovariance function.
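The link can also be checked numerically. The sketch below compares the sample spectrum computed from the Fourier coefficients with the cosine transformation of the sample autocovariances, assuming the conventional forms I(f) = (T/2)(a² + b²) and I(f) = 2[c_0 + 2 Σ c_k cos(2πfk)] with k running from 1 to T - 1 and c_k computed with divisor T. The comparison is made at the harmonic frequencies i/T, where the mean of the series drops out. Names and data are illustrative.

```python
import math


def sample_spectrum(x, f):
    """I(f) = (T/2) * (a**2 + b**2), with a and b the Fourier coefficients at f."""
    T = len(x)
    a = 2.0 / T * sum(x[t - 1] * math.cos(2.0 * math.pi * f * t)
                      for t in range(1, T + 1))
    b = 2.0 / T * sum(x[t - 1] * math.sin(2.0 * math.pi * f * t)
                      for t in range(1, T + 1))
    return T / 2.0 * (a * a + b * b)


def sample_autocovariance(x, k):
    """Sample autocovariance at lag k with divisor T."""
    T = len(x)
    mean = sum(x) / T
    return sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, T)) / T


def spectrum_from_autocovariances(x, f):
    """I(f) = 2 * [c_0 + 2 * sum_{k=1}^{T-1} c_k * cos(2*pi*f*k)]."""
    T = len(x)
    total = sample_autocovariance(x, 0)
    total += 2.0 * sum(sample_autocovariance(x, k) * math.cos(2.0 * math.pi * f * k)
                       for k in range(1, T))
    return 2.0 * total


x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 2.0]  # made-up observations
pairs = [(sample_spectrum(x, i / len(x)), spectrum_from_autocovariances(x, i / len(x)))
         for i in (1, 2, 3)]
```

Each pair in `pairs` holds the same value twice, up to rounding error.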

On taking expectations of the spectrum we obtain


for which it can be shown that


On combining (V.I.1-75) and (V.I.1-76) and on defining the power spectrum as p(f) we find


It is quite obvious that


so that the power spectrum converges if the autocovariances decrease sufficiently quickly. The power spectrum is a Fourier cosine transformation of the (population) autocovariance function. This implies that for any theoretical autocovariance function (cf. the following sections) a corresponding theoretical power spectrum can be formulated.

Of course the power spectrum can be reformulated in terms of autocorrelations instead of autocovariances


which is the so-called spectral density function.



it follows that


and, since g(f) > 0, the properties of g(f) are quite similar to those of a frequency distribution function.
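These two properties can be illustrated numerically. The sketch assumes the conventional normalisation g(f) = 2[1 + 2 Σ ρ_k cos(2πfk)] on the interval from 0 to 1/2, truncates the sum at a finite lag, and uses exponentially decaying autocorrelations as an example. The function names, the decay rate 0.5 and the grid size are illustrative assumptions.

```python
import math


def spectral_density(rho, f):
    """g(f) = 2 * [1 + 2 * sum_{k>=1} rho[k] * cos(2*pi*f*k)], truncated at len(rho) - 1."""
    return 2.0 * (1.0 + 2.0 * sum(rho[k] * math.cos(2.0 * math.pi * f * k)
                                  for k in range(1, len(rho))))


def integrate_trapezoid(func, lower, upper, steps):
    """Composite trapezoidal rule on an evenly spaced grid."""
    h = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    total += sum(func(lower + j * h) for j in range(1, steps))
    return total * h


phi = 0.5
rho = [phi ** k for k in range(40)]  # autocorrelations decaying like 0.5**k
area = integrate_trapezoid(lambda f: spectral_density(rho, f), 0.0, 0.5, 200)
lowest = min(spectral_density(rho, j / 400) for j in range(201))
```

The area under g(f) between 0 and 1/2 equals one and the function stays positive on the whole interval, which is what makes it behave like a distribution over frequencies.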

Since it can be shown that the sample spectrum fluctuates wildly around the theoretical power spectrum, a modified (i.e. smoothed) estimate of the power spectrum is suggested as



© 2000-2022 All rights reserved. All Photographs (jpg files) are the property of Corel Corporation, Microsoft and their licensors. We acquired a non-transferable license to use these pictures in this website.
The free use of the scientific content in this website is granted for non commercial use only. In any case, the source (url) should always be clearly displayed. Under no circumstances are you allowed to reproduce, copy or redistribute the design, layout, or any content of this website (for commercial use) including any materials contained herein without the express written permission.

Information provided on this web site is provided "AS IS" without warranty of any kind, either express or implied, including, without limitation, warranties of merchantability, fitness for a particular purpose, and noninfringement. We use reasonable efforts to include accurate and timely information and periodically updates the information without notice. However, we make no warranties or representations as to the accuracy or completeness of such information, and it assumes no liability or responsibility for errors or omissions in the content of this web site. Your use of this web site is AT YOUR OWN RISK. Under no circumstances and under no legal theory shall we be liable to you or any other person for any direct, indirect, special, incidental, exemplary, or consequential damages arising from your access to, or use of, this web site.

Contributions and Scientific Research: Prof. Dr. E. Borghers, Prof. Dr. P. Wessa
Please, cite this website when used in publications: Xycoon (or Authors), Statistics - Econometrics - Forecasting (Title), Office for Research Development and Education (Publisher), http://www.xycoon.com/ (URL), (access or printout date).
