EM for Non-Time-Varying Channels

For a non-time-varying channel, using (8) in (15), (13) becomes:

 \begin{displaymath}Q({\bf A}\vert{\bf B}) \doteq
E[-\vert{\bf r}-{\bf Ah}\vert^2 \; \vert \; {\bf r},{\bf B}]
~ = ~ - \int \vert{\bf r}-{\bf Ah}\vert^2 f({\bf p}\vert{\bf r},{\bf B}) \, d{\bf p} \; \; ,
\end{displaymath} (16)

with

 \begin{displaymath}f({\bf p}\vert{\bf r},{\bf B}) =
K_{17} \; e^{- \frac {\vert{\bf r}-{\bf Bh}\vert^2} {\sigma^2} } f({\bf p}) \; \; ,
\end{displaymath} (17)

where $K_{17}$ can be determined if needed by setting the integral of $f({\bf p}\vert{\bf r},{\bf B})$ to 1. To evaluate (16) we only need the mean and covariance of the channel coefficients ${\bf h}$ given ${\bf r}$ and ${\bf B}$, using the distribution of (17). We will derive these for specific cases of $f({\bf p})$ subsequently; for now, we define them symbolically as ${\bf g}$ and ${\bf G}$:
  
 \begin{displaymath}{\bf g} ~ \stackrel {\triangle} {=} ~ E[{\bf h}\vert{\bf r},{\bf B}]
~ = ~ {\bf M} {\bf d} + {\bf\bar{M}} {\bf\bar{p}}
\end{displaymath} (18)

 \begin{displaymath}{\bf G} ~ \stackrel {\triangle} {=} ~ E[({\bf h} - {\bf g}) ({\bf h} - {\bf g})^{*T} \vert{\bf r},{\bf B}]
~ = ~ {\bf M} {\bf D} {\bf M}^{*T} \; ,
\end{displaymath} (19)

where ${\bf d}$ and ${\bf D}$ are the mean and covariance of the underlying channel coefficients ${\bf p}$ given ${\bf r}$ and ${\bf B}$:
  
 \begin{displaymath}{\bf d} ~ \stackrel {\triangle} {=} ~ E[{\bf p}\vert{\bf r},{\bf B}]
\end{displaymath} (20)

 \begin{displaymath}{\bf D} ~ \stackrel {\triangle} {=} ~ E[({\bf p} - {\bf d}) ({\bf p} - {\bf d})^{*T} \vert{\bf r},{\bf B}] \; \; .
\end{displaymath} (21)
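As a concrete illustration, here is a minimal numerical sketch of (18) and (19), assuming ${\bf d}$, ${\bf D}$, ${\bf M}$, ${\bf\bar{M}}$ and ${\bf\bar{p}}$ are available as numpy arrays of compatible dimensions (the function and variable names are illustrative only):

\begin{verbatim}
import numpy as np

def mean_cov_h(d, D, M, Mbar, pbar):
    # (18): conditional mean of h given r and B:  g = M d + Mbar pbar
    g = M @ d + Mbar @ pbar
    # (19): conditional covariance of h given r and B:  G = M D M^{*T}
    G = M @ D @ M.conj().T
    return g, G
\end{verbatim}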

The noise variance $\sigma^2$ is needed in general to compute ${\bf g}$ and ${\bf G}$. If $\sigma^2$ is not known, it can be estimated from the variances of the transmitted and received data. The norm-squared term in (16), for each candidate $\bf A$, can be represented as a path through a trellis, whose states at a given time are the states of the ISI channel. A path represents a summation of incremental state transition costs. The minimum cost path can be determined by the Viterbi algorithm. Rewriting (16) in terms of the incremental costs:

 \begin{displaymath}Q({\bf A}\vert{\bf B}) \doteq - \sum_{k=1}^N V_k \; \; ,
\end{displaymath} (22)

where $V_k$ is:

 \begin{displaymath}V_k = E[\vert r_k - {\bf a}_k^T {\bf h}\vert^2 \ \vert{\bf r},{\bf B}] \; .
\end{displaymath} (23)

The expected value is taken over ${\bf p}$ using (17) for $f({\bf p}\vert{\bf r},{\bf B})$. Expanding the norm squared, and taking the expected value, yields:

 \begin{displaymath}V_k = \vert r_k - {\bf a}_k^T {\bf g}\vert^2 + {\bf a}_k^{T}
{\bf G}{\bf a}_k^* \; .
\end{displaymath} (24)
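For reference, the intermediate step uses the second-moment identity $E[{\bf h} {\bf h}^{*T} \vert{\bf r},{\bf B}] = {\bf G} + {\bf g} {\bf g}^{*T}$:

 \begin{displaymath}E[\vert r_k - {\bf a}_k^T {\bf h}\vert^2 \; \vert \; {\bf r},{\bf B}]
= \vert r_k\vert^2 - 2 \, {\rm Re}\{ r_k^* \, {\bf a}_k^T {\bf g} \}
+ {\bf a}_k^T ({\bf G} + {\bf g} {\bf g}^{*T}) {\bf a}_k^*
= \vert r_k - {\bf a}_k^T {\bf g}\vert^2 + {\bf a}_k^T {\bf G} {\bf a}_k^* \; .
\end{displaymath}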

Equation (24) represents the general form of the Viterbi algorithm incremental cost function for non-time-varying random ISI channels. To use (24) in an EM algorithm, ${\bf g}$ and ${\bf G}$ from (18) and (19) are computed in the E-step; these computations depend on the specific form of the a priori underlying channel distribution $f({\bf p})$. The following subsections examine the specific cases of deterministic, uniform, and Gaussian $f({\bf p})$.
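As an illustrative sketch of the M-step, the following implements a Viterbi detector over the ISI trellis with incremental cost (24), assuming a BPSK symbol alphabet for concreteness; the name viterbi_isi, the alphabet choice, and the zero-cost initialization of all states are assumptions, not details taken from the paper:

\begin{verbatim}
import numpy as np
from itertools import product

def viterbi_isi(r, g, G, L):
    # Minimum-cost path through the BPSK ISI trellis using
    # the incremental cost (24): V_k = |r_k - a_k^T g|^2 + a_k^T G a_k^*
    symbols = np.array([-1.0, 1.0])
    states = list(product(range(2), repeat=L - 1))   # indices of the last L-1 symbols
    index = {s: i for i, s in enumerate(states)}
    cost = np.zeros(len(states))                     # all initial states allowed
    paths = [[] for _ in states]
    for r_k in r:
        new_cost = np.full(len(states), np.inf)
        new_paths = [None] * len(states)
        for si, s in enumerate(states):
            for ai in range(2):
                a_k = symbols[[ai, *s]]              # [a_k, a_{k-1}, ..., a_{k-L+1}]
                V_k = (abs(r_k - a_k @ g) ** 2
                       + np.real(a_k @ G @ a_k.conj()))   # (24); real for Hermitian G
                ni = index[(ai, *s)[:L - 1]]         # shift the new symbol into the state
                if cost[si] + V_k < new_cost[ni]:
                    new_cost[ni] = cost[si] + V_k
                    new_paths[ni] = paths[si] + [symbols[ai]]
        cost, paths = new_cost, new_paths
    return np.array(paths[int(np.argmin(cost))])     # minimum-cost sequence estimate of A
\end{verbatim}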



DETERMINISTIC CHANNEL COEFFICIENTS

The case of deterministic channel coefficients is examined first because it is the simplest case, and because, if there are no constraints, it reduces to previously published algorithms [1,12,2] which do not use any known a priori statistics for the channel. Unlike the other cases considered subsequently, for the deterministic channel we do not know the a priori mean of the coefficients, and as shown below, the E-step of the EM algorithm cannot use $E[{\bf h}\vert{\bf r,B}]$. If ${\bf h}$ is not random, then let $f({\bf p}) = \delta({\bf p}-{\bf d})$, where ${\bf d}$ is an unknown constant, and ${\bf h} = {\bf M} {\bf d} + {\bf\bar{M}} {\bf\bar{p}} = {\bf g}$ from (3), where ${\bf g}$ is the deterministic but unknown value of ${\bf h}$. In this case, $E[{\bf h}] = {\bf g}$, and using (17) to estimate ${\bf d}$ does not provide anything useful, since it reduces to ${\bf d}$, which is unknown. So to estimate ${\bf g}$ in this case, we must use an ad hoc method. Note that given ${\bf r}$ and ${\bf B}$, ${\bf p}$ satisfies (7):

 \begin{displaymath}{\bf r} = {\bf B} {\bf M} {\bf p} +
{\bf B} {\bf\bar{M}} {\bf\bar{p}} + {\bf n} \; .
\end{displaymath} (25)

Taking the expected value of the right-hand-side of (25) with respect to ${\bf p}$ and ${\bf n}$ yields:

 \begin{displaymath}{\bf r} = {\bf B} {\bf M} {\bf d} + {\bf B} {\bf\bar{M}}
{\bf\bar{p}} \; \; ,
\end{displaymath} (26)

which can be solved for a linear least-square-error estimate of ${\bf d}$ using the pseudo-inverse of $\bf B M$:

 \begin{displaymath}{\bf d} = ({\bf B M})^+ ({\bf r} - {\bf B} {\bf\bar{M}} {\bf\bar{p}})
\; \; .
\end{displaymath} (27)
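A direct transcription of (27), assuming numpy arrays as before (np.linalg.pinv is the Moore-Penrose pseudo-inverse; the function name is illustrative):

\begin{verbatim}
import numpy as np

def estimate_d(r, B, M, Mbar, pbar):
    # (27): linear least-square-error estimate of the underlying coefficients
    return np.linalg.pinv(B @ M) @ (r - B @ Mbar @ pbar)
\end{verbatim}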

In this case, ${\bf G} = 0$, and the Viterbi algorithm incremental cost function from (24) becomes:

 \begin{displaymath}V_k
= \vert r_k - {\bf a}_k^T {\bf g}\vert^2 \; \; .
\end{displaymath} (28)

For the E-step, (27) and (18) are used to estimate ${\bf g}$. For the M-step, the Viterbi algorithm is used with incremental cost function (28) to estimate $\bf A$. For EM initialization, we could select ${\bf B}$ randomly, or use all 1's, etc. But to reduce the number of iterations required, and to help the algorithm converge to a proper solution, it is better to initialize ${\bf B}$ with some reasonable estimate of $\bf A$. This can be done, for example, by initially using a pure delay channel, i.e. using ${\bf g} = [0 ~ \ldots ~ 0 ~ 1 ~ 0 ~ \ldots ~ 0]^T$ in (28), then using the Viterbi algorithm to estimate $\bf A$, then setting ${\bf B}$ to this estimate of $\bf A$. Note that if there were no constraints, or if the constraints were unknown or simply not taken into account, then ${\bf M} = {\bf I}$, ${\bf\bar{M}} = {\bf0}$, and ${\bf g} = {\bf B}^+ {\bf r}$; this estimate of ${\bf g}$ would then be a maximum-likelihood estimate of ${\bf h}$ given ${\bf r}$ and ${\bf B}$, since it maximizes $f({\bf r}\vert{\bf h},{\bf B})$ (i.e. minimizes $\vert{\bf r} - {\bf B h}\vert^2$). The EM algorithm in this case would be equivalent to previously published algorithms for optimal joint sequence and channel estimation [1,12,2].
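Putting these pieces together, here is a minimal sketch of the deterministic-channel EM iteration described above, reusing viterbi_isi and estimate_d from the earlier sketches; the helper symbol_matrix (which stacks the vectors ${\bf a}_k^T$ into a convolution matrix, assuming zero symbols before time 1), the delay position used for initialization, and the fixed iteration count are all illustrative assumptions:

\begin{verbatim}
def symbol_matrix(a, L):
    # Rows are a_k^T = [a_k, a_{k-1}, ..., a_{k-L+1}], with zeros before time 1 (assumed).
    a_pad = np.concatenate([np.zeros(L - 1), a])
    return np.array([a_pad[k:k + L][::-1] for k in range(len(a))])

def em_deterministic(r, M, Mbar, pbar, L, n_iter=10):
    g = np.zeros(L)
    g[0] = 1.0                                     # initialize with a pure delay channel
    for _ in range(n_iter):
        a = viterbi_isi(r, g, np.zeros((L, L)), L) # M-step: Viterbi with (28), i.e. (24) with G = 0
        B = symbol_matrix(a, L)                    # convolution matrix of the sequence estimate
        d = estimate_d(r, B, M, Mbar, pbar)        # E-step: (27)
        g = M @ d + Mbar @ pbar                    # (18)
    return a, g
\end{verbatim}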



UNIFORM UNDERLYING CHANNEL COEFFICIENTS

Using $f({\bf p}) = 1$ in (17) and ${\bf d}$ from (27), and assuming that ${\bf B M}$ has full column rank so that $({\bf B M})^+ =
({\bf M}^{*T} {\bf B}^{*T} {\bf B} {\bf M})^{-1} {\bf M}^{*T} {\bf B}^{*T}$, dropping constants we obtain:

 \begin{displaymath}f({\bf p}\vert{\bf r},{\bf B}) \doteq
e^{- \frac {({\bf p}-{\bf d})^{*T} {\bf M}^{*T} {\bf B}^{*T} {\bf B}
{\bf M} ({\bf p} -{\bf d})}{\sigma^2}} \; .
\end{displaymath} (29)

This shows that, given ${\bf r}$ and ${\bf B}$, ${\bf p}$ has a joint Gaussian distribution with mean ${\bf d}$ as in (27), and covariance:

 \begin{displaymath}{\bf D} =
\left[
\frac { {\bf M}^{*T} {\bf B}^{*T} {\bf B} {\bf M} }
{{\sigma}^2}
\right] ^ {-1} \; \; .
\end{displaymath} (30)

Therefore, for the case of uniformly distributed underlying channel coefficients, the EM algorithm is almost the same as for deterministic channel coefficients, except that the Viterbi algorithm incremental cost function from (24), which includes ${\bf G}$, is used instead of (28), and ${\bf G}$ is given by (30) and (19). For the E-step, (27) and (30) are used in (18) and (19) to estimate ${\bf g}$ and ${\bf G}$. For the M-step, the Viterbi algorithm is used with incremental cost function (24) to estimate $\bf A$.
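A minimal sketch of the uniform-case E-step, combining (27), (30), (18) and (19); sigma2 denotes $\sigma^2$ and the names are illustrative:

\begin{verbatim}
import numpy as np

def estep_uniform(r, B, M, Mbar, pbar, sigma2):
    BM = B @ M
    D = np.linalg.inv(BM.conj().T @ BM / sigma2)    # (30)
    d = np.linalg.pinv(BM) @ (r - B @ Mbar @ pbar)  # (27)
    g = M @ d + Mbar @ pbar                         # (18)
    G = M @ D @ M.conj().T                          # (19)
    return g, G
\end{verbatim}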

GAUSSIAN UNDERLYING CHANNEL COEFFICIENTS

Let $f({\bf p })$ be a joint Gaussian distribution with mean $\bf c$ and covariance $\bf C$:

 \begin{displaymath}f({\bf p}) \doteq
e^{-({\bf p}-{\bf c})^{*T} {\bf C}^{-1} ({\bf p}-{\bf c})} \; \; ,
\end{displaymath} (31)

where $\bf c$ and $\bf C$ are known. Then (17) reduces to:

 \begin{displaymath}f({\bf p}\vert{\bf r},{\bf B}) \doteq
e^{- ({\bf p}-{\bf d})^{*T} {\bf D}^{-1} ({\bf p}-{\bf d})} \; ,
\end{displaymath} (32)

with:
  
 \begin{displaymath}{\bf D} = \left[ \frac {{\bf M}^{*T} {\bf B}^{*T} {\bf B} {\bf M}}{{\sigma}^2}
+ {\bf C}^{-1} \right] ^{-1}
\end{displaymath} (33)

 \begin{displaymath}{\bf d} = {\bf D} \left( \frac { {\bf M}^{*T} {\bf B}^{*T} ({\bf r} - {\bf B}
{\bf\bar{M}} {\bf\bar{p}} ) }{{\sigma}^2 } + {\bf C}^{-1} {\bf c} \right) .
\end{displaymath} (34)

So in this case, the distribution of ${\bf p}$ given ${\bf r}$ and ${\bf B}$ is also Gaussian. Note that the formulas above for the mean and covariance of ${\bf p}$ given ${\bf r}$ and ${\bf B}$ do not use $({\bf B M})^+$ as in the previous deterministic and uniform cases. The Gaussian distribution of ${\bf p}$ is intermediate between the known deterministic channel case and uniform unknown ${\bf p}$. In the limit as $\bf C$ goes to 0, ${\bf D}$ approaches $\bf C$ and ${\bf d}$ approaches the known $\bf c$, which corresponds to the Gaussian distribution approaching $\delta({\bf p}-{\bf c})$. This is similar to the deterministic channel ${\bf p}$ case except that here the mean is known as opposed to being estimated using (27). In the limit as $\bf C$ goes to infinity, ${\bf D}$ approaches $\left[ \frac {{\bf M}^{*T} {\bf B}^{*T} {\bf B} {\bf M}}{{\sigma}^2} \right] ^{-1}$ and ${\bf d}$ approaches $({\bf B M})^+ ({\bf r} - {\bf B} {\bf\bar{M}} {\bf\bar{p}})$, which corresponds to the Gaussian distribution approaching a uniform distribution. Also, if $N \gg L$ or the noise variance is small, the term involving ${\bf B}$ in (33) can dominate the ${\bf C}^{-1}$ term, so that ${\bf D}$ approaches 0 and ${\bf d}$ approaches $({\bf B M})^+ ({\bf r} - {\bf B} {\bf\bar{M}} {\bf\bar{p}})$; in this regime the Gaussian EM algorithm becomes equivalent to the algorithm for deterministic channel coefficients. For the E-step, (33) and (34) are used in (18) and (19) to estimate ${\bf G}$ and ${\bf g}$. For the M-step, the Viterbi algorithm is used with incremental cost function (24) to estimate $\bf A$.
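A corresponding sketch of the Gaussian-case E-step using (33) and (34), where c and C are the known prior mean and covariance and sigma2 denotes $\sigma^2$ (names are illustrative):

\begin{verbatim}
import numpy as np

def estep_gaussian(r, B, M, Mbar, pbar, sigma2, c, C):
    BM = B @ M
    Cinv = np.linalg.inv(C)
    D = np.linalg.inv(BM.conj().T @ BM / sigma2 + Cinv)                 # (33)
    d = D @ (BM.conj().T @ (r - B @ Mbar @ pbar) / sigma2 + Cinv @ c)   # (34)
    g = M @ d + Mbar @ pbar                                             # (18)
    G = M @ D @ M.conj().T                                              # (19)
    return g, G
\end{verbatim}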
Rick Perry
2000-03-16