EM and ML Sequence Estimation

To minimize BER, we must find ${\bf A}$ to maximize $f({\bf A}\vert{\bf r})$. Using Bayes rule:

 
\begin{eqnarray*}
f({\bf A}\vert{\bf r}) & = & \frac {f({\bf r}\vert{\bf A}) f({\bf A})}
{f({\bf r})} \\
 & = & K_4 \, f({\bf r}\vert{\bf A}) f({\bf A }) \doteq f({\bf r}\vert{\bf A}) f({\bf A }) ,
\end{eqnarray*} (4)

where $K_4$ is a constant which accounts for the effect of $f({\bf r})$ in normalizing the distribution.

MAP estimators minimize BER by maximizing $f({\bf r}\vert{\bf A})f({\bf A })$. If $f({\bf A })$ is unknown or assumed to be uniform, then the ML estimator which maximizes the likelihood function $f({\bf r}\vert{\bf A})$ can be used instead. All of the EM algorithms derived in the subsequent sections produce iterative solutions to the ML problem of maximizing $f({\bf r}\vert{\bf A})$. But if $f({\bf A })$ is known and not uniform, then this could be incorporated into the following EM algorithms as an additive term in the cost function, thus producing a MAP estimator.
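
Written in the log domain, the two estimators differ only by this additive prior term:

\begin{displaymath}
\hat{{\bf A}}_{ML} = \arg\max_{\bf A} \, \log f({\bf r}\vert{\bf A}) ,
\qquad
\hat{{\bf A}}_{MAP} = \arg\max_{\bf A} \, \left[ \log f({\bf r}\vert{\bf A}) + \log f({\bf A}) \right] .
\end{displaymath}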

The dependence of $f({\bf r}\vert{\bf A})$ on the random channel coefficients is:

\begin{displaymath}f({\bf r}\vert{\bf A}) =
E[f({\bf r}\vert{\bf h},{\bf A})] =
\int_{\bf h} f({\bf r}\vert{\bf h},{\bf A})
f({\bf h }) \, d{\bf h } ,
\end{displaymath} (5)

where $f({\bf r}\vert{\bf h},{\bf A})$ is given by (3). Evaluating this multidimensional integral and then maximizing the result with respect to ${\bf A}$ seems to be an intractable problem in general. Yet this is exactly the problem which EM algorithms can solve iteratively.

The only apparent case where direct evaluation and maximization of $f({\bf r}\vert{\bf A})$ from (5) seems tractable is if ${\bf h}$ is deterministic. Then $f({\bf h}) = \delta({\bf h}-{\bf g})$ and (5) reduces to (3) with ${\bf h} = {\bf g}$, where ${\bf g}$ is the deterministic but unknown value of ${\bf h}$, as discussed further in Section 4.1; the reduction is written out explicitly below. This provides a theoretical basis for joint $({\bf h},{\bf A})$ estimation algorithms such as [3]. However, for arbitrary $f({\bf h})$ we must work with (5) directly, or use an EM algorithm, in order to produce optimal estimates of ${\bf A}$.
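
For the deterministic case, the sifting property of the delta function makes the reduction from (5) explicit:

\begin{displaymath}
f({\bf r}\vert{\bf A}) =
\int_{\bf h} f({\bf r}\vert{\bf h},{\bf A}) \, \delta({\bf h}-{\bf g}) \, d{\bf h} =
f({\bf r}\vert{\bf g},{\bf A}) ,
\end{displaymath}

which is (3) evaluated at ${\bf h} = {\bf g}$.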

To maximize (5) using EM, define the auxiliary function [4]:

 
\begin{eqnarray*}
Q({\bf A}\vert{\bf B}) & = & E [ \log (f({\bf r},{\bf h}\vert{\bf A})) \ \vert\ {\bf r},{\bf B} ] \\
 & = & \int_{\bf h} \log ( f({\bf r},{\bf h}\vert{\bf A}) ) \, f({\bf h }\vert{\bf r},{\bf B}) \, d {\bf h} ,
\end{eqnarray*} (6)

where we desire the ML estimate of ${\bf A}$, and ${\bf B}$ represents the current estimate of ${\bf A}$. In the terminology of EM, ${\bf r}$ is the observed data, ${\bf h}$ is the hidden data, and $({\bf r},{\bf h})$ is the complete data.

The general EM algorithm for this problem is (a code sketch of the iteration follows the list):

1. Initialize ${\bf B}$ to an estimate of ${\bf A}$.
2. E-step (Expectation): construct $Q({\bf A}\vert{\bf B})$.
3. M-step (Maximization): find ${\bf A}$ to maximize $Q({\bf A}\vert{\bf B})$.
4. Set ${\bf B}={\bf A}$ and repeat steps 2 and 3 until convergence.
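
As a rough sketch only, the iteration can be organized as below; e_step, m_step, and the convergence test are placeholders for the channel-specific computations derived in later sections, not part of the algorithm statement above.

def em_sequence_estimate(r, initial_estimate, e_step, m_step, max_iters=50):
    """Generic EM iteration for sequence estimation (illustrative sketch).
    e_step(r, B) returns the statistics defining Q(A | B); m_step(r, stats)
    returns the sequence A maximizing Q(A | B), e.g. via a Viterbi search."""
    B = initial_estimate              # step 1: initialize B to an estimate of A
    for _ in range(max_iters):
        stats = e_step(r, B)          # step 2: E-step, construct Q(A | B)
        A = m_step(r, stats)          # step 3: M-step, maximize Q(A | B) over A
        if A == B:                    # step 4: stop once the sequence estimate
            break                     #         no longer changes
        B = A
    return B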

In general, to construct $Q({\bf A}\vert{\bf B})$, start with $f({\bf r},{\bf h}\vert{\bf A})$:

 \begin{displaymath}f({\bf r},{\bf h}\vert{\bf A}) =
f({\bf r}\vert{\bf h},{\bf A}) f({\bf h}\vert{\bf A}) .
\end{displaymath} (7)

We can drop the $f({\bf h}\vert{\bf A})$ term from this, since ${\bf h}$ and ${\bf A}$ are independent, so $f({\bf h}\vert{\bf A})$ is not a function of ${\bf A}$ and will not affect the maximization in the M-step. $Q({\bf A}\vert{\bf B})$ from (6) may then be written as:
 
\begin{eqnarray*}
Q({\bf A}\vert{\bf B}) & \doteq & E[ \log ( f({\bf r}\vert{\bf h},{\bf A}) ) \ \vert\ {\bf r},{\bf B} ] \\
 & = & \int_{\bf h} \log ( f({\bf r}\vert{\bf h},{\bf A}) ) \, f({\bf h }\vert{\bf r},{\bf B}) \, d {\bf h} .
\end{eqnarray*} (8)

To evaluate $Q({\bf A}\vert{\bf B})$, we need $f({\bf h }\vert{\bf r},{\bf B})$, which can be written in terms of the a priori channel coefficient density function $f({\bf h})$ as:

 \begin{displaymath}f({\bf h}\vert{\bf r},{\bf B}) =
\frac {f({\bf r}\vert{\bf h},{\bf B}) f({\bf h}\vert{\bf B})}
{f({\bf r}\vert{\bf B})} ,
\end{displaymath} (9)

where the denominator term $f({\bf r}\vert{\bf B})$ can be treated as a constant, since it simply serves to normalize the distribution and is not a function of ${\bf h}$ or ${\bf A}$, so it will not affect the maximization step. Also, $f({\bf h}\vert{\bf B}) = f({\bf h })$ since the channel coefficients and transmitted data are independent. But note that given ${\bf r}$ and ${\bf B}$, $f({\bf h }\vert{\bf r},{\bf B}) \not= f({\bf h })$.

Therefore, for $f({\bf h }\vert{\bf r},{\bf B})$ in (8) we may use:

 \begin{displaymath}f({\bf h}\vert{\bf r},{\bf B}) \doteq
f({\bf r}\vert{\bf h},{\bf B}) f({\bf h}) .
\end{displaymath} (10)
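
Substituting (10) into (8), the function to be maximized in the M-step is, up to factors which do not depend on ${\bf A}$:

\begin{displaymath}
Q({\bf A}\vert{\bf B}) \doteq
\int_{\bf h} \log ( f({\bf r}\vert{\bf h},{\bf A}) ) \,
f({\bf r}\vert{\bf h},{\bf B}) \, f({\bf h}) \, d{\bf h} .
\end{displaymath}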

In the specific cases considered subsequently, we will show that the E-step reduces to computing sufficient statistics of $Q({\bf A}\vert{\bf B})$, which are the mean and covariance of ${\bf h}$ conditioned on ${\bf r}$ and ${\bf B}$. This enables the Viterbi algorithm [1] to be used for the M-step. The Viterbi algorithm is a well-known efficient method for ML sequence estimation over known channels. It can be used here in the unknown channel case because of the estimates provided by the E-step. The E-step and M-step formulas will be derived in detail in subsequent sections for various specific channel coefficient distributions.
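
As an illustration only (the exact formulas are channel-specific and are derived in the following sections), here is a minimal sketch of such an E-step under assumed conditions: a linear model ${\bf r} = {\bf S}\,{\bf h} + {\bf n}$ in which ${\bf S}$ is a signal matrix built from the current sequence estimate ${\bf B}$, white Gaussian noise of variance sigma2, and a Gaussian prior ${\bf h} \sim N({\bf m}_0,{\bf C}_0)$; the names S, m0, C0, and sigma2 are placeholders, not quantities defined in this paper.

import numpy as np

def e_step_gaussian(r, S, m0, C0, sigma2):
    """Conditional mean and covariance of h given r and the current sequence
    estimate B, under the assumed linear-Gaussian model
    r = S h + n,  h ~ N(m0, C0),  n ~ N(0, sigma2 I),
    where S is built from B (standard Gaussian conditioning formulas)."""
    C0_inv = np.linalg.inv(C0)
    # posterior covariance: (C0^-1 + S^H S / sigma2)^-1
    C = np.linalg.inv(C0_inv + (S.conj().T @ S) / sigma2)
    # posterior mean: C (C0^-1 m0 + S^H r / sigma2)
    m = C @ (C0_inv @ m0 + (S.conj().T @ r) / sigma2)
    return m, C

The M-step would then use these statistics in the branch metrics of a Viterbi search over candidate sequences; that step depends on the modulation and trellis structure and is not sketched here.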

