
   
EM and ML Sequence Estimation

To minimize BER, we must find $\bf A$ to maximize $f({\bf A}\vert{\bf r})$. Using Bayes' rule:

 \begin{displaymath}f({\bf A}\vert{\bf r}) =
\frac {f({\bf r}\vert{\bf A}) f({\bf A })} {f({\bf r})} =
K_{9} \, f({\bf r}\vert{\bf A}) f({\bf A })
\doteq f({\bf r}\vert{\bf A}) f({\bf A }) \; ,
\end{displaymath} (9)

where $K_{9}$ is a constant accounting for the normalizing effect of $f({\bf r})$. MAP estimators minimize BER by maximizing $f({\bf r}\vert{\bf A})f({\bf A })$. If $f({\bf A })$ is unknown or assumed to be uniform, then the ML estimator, which maximizes the likelihood function $f({\bf r}\vert{\bf A})$, can be used instead. All of the EM algorithms derived in the subsequent sections produce iterative solutions to the ML problem of maximizing $f({\bf r}\vert{\bf A})$. If $f({\bf A })$ is known and not uniform, it could be incorporated into the following EM algorithms as an additive term in the cost function, thus producing a MAP estimator. The dependence of $f({\bf r}\vert{\bf A})$ on the underlying random channel coefficients is:

 \begin{displaymath}f({\bf r}\vert{\bf A}) =
E[f({\bf r}\vert{\bf P},{\bf A})] =
\int_{\bf P} f({\bf r}\vert{\bf P},{\bf A})
f({\bf P }) \, d{\bf P } ,
\end{displaymath} (10)

where $f({\bf r}\vert{\bf P},{\bf A})$ is given by (6) for time-varying channels. Evaluating this multidimensional integral and then maximizing the result with respect to $\bf A$ appears to be intractable in general. Yet this is exactly the problem which EM algorithms can solve iteratively. The only apparent case where direct evaluation and maximization of $f({\bf r}\vert{\bf A})$ from (10) is tractable is when ${\bf P}$ is non-time-varying and deterministic. Then $f({\bf P}) = \delta({\bf p}-{\bf d})$, where ${\bf d}$ is an unknown constant, ${\bf h}_k = {\bf M} {\bf d} + {\bf\bar{M}} {\bf\bar{p}} = {\bf g}$ from (3), and (10) reduces to (8) with ${\bf h} = {\bf g}$, where ${\bf g}$ is the deterministic but unknown value of ${\bf h}$, as discussed further in Section 4. This provides a theoretical basis for joint $({\bf h},{\bf A})$ estimation algorithms such as [1,12,2,4]. However, for arbitrary $f({\bf P})$ we must work with (10) directly, or use an EM algorithm, in order to produce optimal estimates of $\bf A$.
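As a quick check of this special case, substituting $f({\bf P}) = \delta({\bf p}-{\bf d})$ into (10) collapses the integral (this is only a restatement of the reduction described above):

 \begin{displaymath}f({\bf r}\vert{\bf A}) =
\int_{\bf P} f({\bf r}\vert{\bf P},{\bf A}) \, \delta({\bf p}-{\bf d}) \, d{\bf P } =
f({\bf r}\vert{\bf P}={\bf d},{\bf A}) \; ,
\end{displaymath}

i.e. (8) evaluated at the fixed but unknown channel ${\bf h} = {\bf g} = {\bf M} {\bf d} + {\bf\bar{M}} {\bf\bar{p}}$.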

To maximize (10) using EM, define the auxiliary function [5]:

 
 \begin{displaymath}Q({\bf A}\vert{\bf B}) =
E [ \log (f({\bf r},{\bf P}\vert{\bf A})) \ \vert \ {\bf r},{\bf B} ] =
\int_{\bf P} \log ( f({\bf r},{\bf P}\vert{\bf A}) )
f({\bf P }\vert{\bf r},{\bf B}) \, d {\bf P} ,
\end{displaymath} (11)

where we desire the ML estimate of $\bf A$, and ${\bf B}$ represents the current estimate of $\bf A$. In the terminology of EM, ${\bf r}$ is the observed data, ${\bf P}$ is the hidden data, and $({\bf r},{\bf P})$ is the complete data.
The general EM algorithm for this problem is:
1. Initialize ${\bf B}$ to an estimate of $\bf A$.
2. E-step (Expectation): construct $Q({\bf A}\vert{\bf B})$.
3. M-step (Maximization): find $\bf A$ to maximize $Q({\bf A}\vert{\bf B})$.
4. Set ${\bf B}={\bf A}$ and repeat steps 2 and 3 until converged (see the sketch below).
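As a minimal sketch of this loop (in Python, with hypothetical e_step and m_step callables standing in for the distribution-specific formulas derived in later sections):

\begin{verbatim}
import numpy as np

def em_sequence_estimate(r, B_init, e_step, m_step, max_iter=50):
    """Generic EM loop for ML sequence estimation (hypothetical interface).

    r       : observed (received) data
    B_init  : initial estimate of the transmitted sequence A
    e_step  : (r, B) -> statistics defining Q(A | B), e.g. the conditional
              mean and covariance of P given r and B
    m_step  : (r, stats) -> sequence A maximizing Q(A | B), e.g. found by
              running the Viterbi algorithm with the statistics from e_step
    """
    B = np.asarray(B_init)
    for _ in range(max_iter):
        stats = e_step(r, B)        # E-step: construct Q(A | B)
        A = m_step(r, stats)        # M-step: maximize Q(A | B) over A
        if np.array_equal(A, B):    # discrete sequence: converged when unchanged
            break
        B = A                       # set B = A and repeat
    return B
\end{verbatim}

Because $\bf A$ is a discrete symbol sequence, a simple convergence test is that the estimate stops changing from one iteration to the next.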

In general, to construct $Q({\bf A}\vert{\bf B})$, start with $f({\bf r},{\bf P}\vert{\bf A})$:

 \begin{displaymath}f({\bf r},{\bf P}\vert{\bf A}) =
f({\bf r}\vert{\bf P},{\bf A}) f({\bf P}\vert{\bf A}) .
\end{displaymath} (12)

We can drop the $f({\bf P}\vert{\bf A})$ term, since ${\bf P}$ and $\bf A$ are independent; thus $f({\bf P}\vert{\bf A}) = f({\bf P})$ is not a function of $\bf A$ and will not affect the maximization in the M-step. $Q({\bf A}\vert{\bf B})$ from (11) may then be written as:
 
 \begin{displaymath}Q({\bf A}\vert{\bf B}) \doteq
E[ \log ( f({\bf r}\vert{\bf P},{\bf A}) ) \ \vert \ {\bf r},{\bf B}] =
\int_{\bf P} \log ( f({\bf r}\vert{\bf P},{\bf A}) )
f({\bf P }\vert{\bf r},{\bf B}) \, d {\bf P} .
\end{displaymath} (13)
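To preview why only low-order conditional moments of ${\bf P}$ will be needed, consider an illustrative assumption (not the general model of this paper): suppose the noise-free received signal is linear in ${\bf P}$ for a fixed data sequence, say ${\bf G}({\bf A}){\bf P}$, with additive white Gaussian noise of variance $\sigma^2$. Then, dropping terms independent of $\bf A$,

 \begin{displaymath}Q({\bf A}\vert{\bf B}) \doteq
- \frac{1}{2\sigma^2} E\left[ \Vert {\bf r} - {\bf G}({\bf A}){\bf P} \Vert^2 \ \vert \ {\bf r},{\bf B} \right] =
- \frac{1}{2\sigma^2} \left( \Vert {\bf r} - {\bf G}({\bf A}){\bf m}_{\bf B} \Vert^2
+ {\rm tr}[ {\bf G}({\bf A}) {\bf C}_{\bf B} {\bf G}({\bf A})^T ] \right) ,
\end{displaymath}

where ${\bf m}_{\bf B}$ and ${\bf C}_{\bf B}$ denote the mean and covariance of ${\bf P}$ under $f({\bf P}\vert{\bf r},{\bf B})$; under such an assumption the E-step needs only those two quantities.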

To evaluate $Q({\bf A}\vert{\bf B})$, we need $f({\bf P }\vert{\bf r},{\bf B})$, which can be written in terms of the a priori underlying channel coefficient density function $f({\bf P})$ as:

 \begin{displaymath}f({\bf P}\vert{\bf r},{\bf B}) =
\frac {f({\bf r}\vert{\bf P},{\bf B}) f({\bf P}\vert{\bf B})}
{f({\bf r}\vert{\bf B})} \; \; ,
\end{displaymath} (14)

where the denominator term $f({\bf r}\vert{\bf B})$ can be treated as a constant: it simply normalizes the distribution, is not a function of ${\bf P}$ or $\bf A$, and so will not affect the maximization step. Also, $f({\bf P}\vert{\bf B}) = f({\bf P })$, since the underlying channel coefficients and the transmitted data are independent. Note, however, that given ${\bf r}$ and ${\bf B}$, $f({\bf P }\vert{\bf r},{\bf B}) \not= f({\bf P })$. Therefore, for $f({\bf P }\vert{\bf r},{\bf B})$ in (13) we may use:

 \begin{displaymath}f({\bf P}\vert{\bf r},{\bf B}) \doteq
f({\bf r}\vert{\bf P},{\bf B}) f({\bf P}) .
\end{displaymath} (15)

In the specific cases considered subsequently, we will show that the E-step reduces to computing sufficient statistics of $Q({\bf A}\vert{\bf B})$, namely the mean and covariance of ${\bf P}$ conditioned on ${\bf r}$ and ${\bf B}$. This enables the Viterbi algorithm [13] to be used for the M-step. The Viterbi algorithm is a well-known, efficient method for ML sequence estimation over known channels; it can be used here, in the unknown-channel case, because of the channel estimates provided by the E-step. The E-step and M-step formulas will be derived in detail in subsequent sections for various specific underlying channel coefficient distributions.
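As a minimal illustration of such an E-step, under the same hypothetical linear-Gaussian assumptions as above with a Gaussian prior on ${\bf P}$ (the actual formulas for the cases treated in this paper are derived in the following sections):

\begin{verbatim}
import numpy as np

def e_step_linear_gaussian(r, G_B, m_p, C_p, sigma2):
    """Conditional mean and covariance of P given r and the current sequence B.

    Assumes r = G(B) P + n with P ~ N(m_p, C_p) and n ~ N(0, sigma2*I),
    all real-valued; G_B is the (hypothetical) matrix mapping P to the
    noise-free received samples when the transmitted sequence is B.
    """
    C_p_inv = np.linalg.inv(C_p)
    C_B = np.linalg.inv(C_p_inv + (G_B.T @ G_B) / sigma2)   # conditional covariance
    m_B = C_B @ (C_p_inv @ m_p + (G_B.T @ r) / sigma2)      # conditional mean
    return m_B, C_B
\end{verbatim}

With these conditional statistics, the M-step can evaluate branch metrics for candidate symbols and search over sequences with the Viterbi algorithm.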