Advanced DSP

Linear Estimation

Lecture 7
Conducted by: Udayan Kanade

The basis-orthogonalizing coefficients found by the Levinson-Durbin algorithm give the best autoregressive predictor for a signal – best in the sense that the error or innovations signal (the signal that has to be fed into this predictor to regenerate the original) has minimum energy. This prediction can be used for signal compression: predicting the signal and sending only the error across is a very common compression methodology.
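As a rough sketch of this idea (hypothetical function names, and assuming numpy arrays: x is the signal and w holds predictor coefficients already obtained, e.g. from Levinson-Durbin), the encoder sends only the prediction error and the decoder reruns the same predictor on its own reconstruction:

import numpy as np

def encode(x, w):
    # prediction error e[n] = x[n] - sum_i w[i] * x[n-1-i]
    e = np.zeros(len(x))
    for n in range(len(x)):
        pred = sum(w[i] * x[n - 1 - i] for i in range(len(w)) if n - 1 - i >= 0)
        e[n] = x[n] - pred
    return e

def decode(e, w):
    # rebuild x[n] = e[n] + sum_i w[i] * x[n-1-i], using already-decoded samples
    x = np.zeros(len(e))
    for n in range(len(e)):
        pred = sum(w[i] * x[n - 1 - i] for i in range(len(w)) if n - 1 - i >= 0)
        x[n] = e[n] + pred
    return x

decode(encode(x, w), w) reproduces x exactly; the gain is that the error signal has much less energy than the original, and so needs fewer bits.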

The best linear predictor of X from Y, X≈bY, has b=E[XY]/E[Y²] – the correlation divided by the second moment – called the correlation coefficient. Estimating this one coefficient needs far fewer samples than estimating the conditional expectation of X for every value of Y. There are also obvious algorithmic advantages, both while designing and while running the linear predictor. Often (when the orthogonality of the error is actually due to independence) linear prediction is the best one can do.
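As a minimal sketch (my own names, assuming x and y are numpy arrays of paired observations of X and Y), the coefficient is estimated directly from sample averages:

import numpy as np

def best_linear_coefficient(x, y):
    # b = E[XY] / E[Y^2], with expectations replaced by sample means
    return np.mean(x * y) / np.mean(y * y)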

If we let Y=1, the random variable which always outputs 1, we get b=E[X] – which, as we know, is the best constant estimate of X.

To get the best affine estimate from Y, i.e. an estimate of the sort X≈bY+c, we realize that we want an estimate of the form bY + c·1, which is X estimated from two vectors. Using successive orthogonalization, we first project X onto 1, giving E[X], with a remaining error X−E[X]. We orthogonalize Y with respect to 1, giving Y−E[Y]. We project X−E[X] onto this, giving an estimate q(Y−E[Y]), where q=E[(X−E[X])(Y−E[Y])]/E[(Y−E[Y])²] – the covariance divided by the variance. The whole estimate is q(Y−E[Y])+E[X]; matching this against bY+c gives b=q and c=E[X]−bE[Y].
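The same recipe as code – a sketch with hypothetical names, again assuming numpy arrays x and y of paired samples:

import numpy as np

def best_affine_coefficients(x, y):
    mx, my = x.mean(), y.mean()                                  # projections onto 1
    q = np.mean((x - mx) * (y - my)) / np.mean((y - my) ** 2)    # covariance / variance
    b = q
    c = mx - b * my
    return b, c                                                  # best affine estimate of X is b*Y + c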

To avoid carrying this odd-looking math throughout, we will assume that we always orthogonalize with respect to 1 to begin with, giving zero-mean random variables. For these, the best linear estimate is also the best affine estimate. Henceforth, whenever we say random variable, we mean a zero-mean one.

Suppose we have two independent zero-mean random variables. Since there is no way to predict one from the other, the best prediction is the mean – zero. This can be thought of as linear prediction with a zero coefficient, which means the correlation must be zero; indeed, independence gives E[XY]=E[X]E[Y]=0. Thus, independent random variables are orthogonal!

How do we linear-predict a random variable X from a bunch of other random variables Y1, Y2, ..., Yk? This is a least squares modeling problem. The inverse of the autocorrelation matrix (entries E[YiYj]) multiplied by the crosscorrelation vector (entries E[YiX]) – the pseudoinversion formula – gives the required predictor. If we decided to use successive orthogonalization instead, we would orthogonalize each Yi with respect to the previous Y's, giving a new orthogonal error Qi expressed in terms of Y1, Y2, ..., Yi. We can then project X onto Qi (which we can do, since we know Qi in terms of the Y's and we know the covariances between the Y's and X), and continue in the same fashion.
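A sketch of the pseudoinversion formula (hypothetical names; Y is an n-by-k numpy array whose columns hold samples of Y1, ..., Yk, and x holds the corresponding samples of X):

import numpy as np

def best_linear_predictor(Y, x):
    n = Y.shape[0]
    R = Y.T @ Y / n                 # autocorrelation matrix, entries ~ E[Yi Yj]
    p = Y.T @ x / n                 # crosscorrelation vector, entries ~ E[Yi X]
    return np.linalg.solve(R, p)    # predictor coefficients b, so that X ~ Y @ b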

When can we use Levinson-Durbin? When the correlation E[YiYj] depends only on the distance i−j and not on the actual positions i and j. An infinite row of Y's like this is called a wide-sense stationary random process. If we find (using Levinson-Durbin) the (possibly infinite) basis-orthogonalizing coefficients of, say, Y0 from all previous Y's, the same coefficients apply for the prediction of any Yi. This gives a filter which can be used to orthogonalize the stationary process. The orthogonal process “behind” the stationary process is called the innovations process. We can apply a filter (the inverse of the above) to the innovations to get back the stationary process.
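A compact sketch of the Levinson-Durbin recursion itself (my own variable names), taking a numpy array r with the autocorrelations r[0..p] and returning the prediction-error (whitening) filter together with the energy of the innovations:

import numpy as np

def levinson_durbin(r, p):
    a = np.zeros(p + 1)
    a[0] = 1.0                       # error filter: e[n] = sum_i a[i] * y[n-i]
    err = r[0]                       # innovations energy so far
    for m in range(1, p + 1):
        # reflection coefficient: correlation of the current error with y[n-m]
        k = -(r[m] + np.dot(a[1:m], r[m-1:0:-1])) / err
        a[1:m] = a[1:m] + k * a[m-1:0:-1]
        a[m] = k
        err *= 1.0 - k * k
    return a, err                    # -a[1:] are the predictor coefficients

Filtering the process with a yields the innovations; filtering the innovations with the inverse (all-pole) filter yields back the process.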



Links:
Last year's lecture: Linear Prediction
Last year's lecture: Random Processes
Last year's lecture: the Innovations Process


Relations:

Multivariable linear estimation is an extension of linear estimation, using the algebra of random variables. It is an application of the least squares methodology. A variant of the successive orthogonalization method can be used to predict a random variable from an array of others. The Levinson-Durbin algorithm can be used if the process being predicted from is stationary. Orthogonalization of a process is the most basic form of the Wiener filter.