Note: this is an old internal report which is being superseded by various
journal and conference publications in progress.
______________________________________________________________________________

EM ISI with time-varying channel coefficients            R. Perry, 28 May 1999
______________________________________________________________________________

0. Introduction
---------------
Following the report "EM ISI Examples", EM algorithms are derived for the
case of time-varying channel coefficients.  First the most general case is
considered.  Then three specific cases of channel coefficient distributions
are considered: independent, independent Gaussian, and first-order Gauss-Markov.

1. General EM Setup
-------------------
The received data at time k is:

(1)  r(k) = A(k)'H(k) + n(k),  k=1..N

A(k) is a column vector containing transmitted data a(k-i+1), i=1..L,
and a(t) == 0 for t <= 0.

H(k) is a column vector containing the channel impulse response coefficients
(which are unknown) at time k.

n(k) is the white Gaussian noise at time k with variance s.

Letting H represent {H(1)..H(N)} and A represent {A(1)..A(N)}, we have:

(2)  f(r|H,A) = (1/((s 2pi)^(N/2))) product k=1..N
                {
		  exp(-(r(k)-A(k)'H(k))^2/(2s))
                }

For the EM algorithm, define Q(A|B):

(3)  Q(A|B) = E[ -log f(r,H|A) | r,B]

where we desire the ML estimate of A, B is the current estimate of A,
r is the observed data, H is the hidden data, and (r,H) is the complete
data. 

f(r,H|A) can be written as:   f(r,H|A) = f(r|H,A) f(H|A)

We can drop the f(H|A) term from this since H and A are independent so f(H|A)
is not a function of A and will not affect the minimization in the M-step.
Using f(r|H,A) from (2), and dropping constants which will not affect the
minimization:

(4)  Q(A|B) = E[ sum k=1..N { (r(k)-A(k)'H(k))^2 } | r,B]

            = integral over H
              {
		  dH f(H|r,B) sum k=1..N { (r(k)-A(k)'H(k))^2 }
              }

Given an apriori density function f(H) for the channel coefficients,
f(H|r,B) can be written as:

(5)  f(H|r,B) = f(r|H,B) f(H|B) / f(r|B)

where the denominator term f(r|B) can be ignored since it simply serves to
normalize the distribution, and is not a function of A, so will not affect
the minimization step.  Also, f(H|B) = f(H) since H and B are assumed to be
independent.  Using (2) for f(r|H,B), and combining constants into a
normalizing factor K, (5) becomes:

(6)  f(H|r,B) = K f(H) product k=1..N
                {
		  exp(-(r(k)-B(k)'H(k))^2/(2s))
                }

and K can be determined by setting the integral of f(H|r,B) to 1.

2. Independent Channel Coefficients
-----------------------------------
If the elements of H are independent over time, then f(H) can be written as:

(7)  f(H) = product k=1..N { f(H(k)) }

and f(H|r,B) becomes:

(8)  f(H|r,B) = K product k=1..N { f(H(k)|r,B) }

with:

(9)  f(H(k)|r,B) = f(H(k)) exp(-(r(k)-B(k)'H(k))^2/(2s))

Using (8) and (9) in (4) we obtain:

(10)  Q(A|B) = E[ sum k=1..N { (r(k)-A(k)'H(k))^2 } | r,B]

	     = sum k=1..N { E[ (r(k)-A(k)'H(k))^2 | r,B] }

             = sum k=1..N
	       {
		 integral over H(k)
                 {
		  dH(k) f(H(k)) exp(-(r(k)-B(k)'H(k))^2/(2s)) (r(k)-A(k)'H(k))^2
                 }
	       }

Define g(k) and G(K):

(11)  g(k) == E[H(k)|r,B]

(12)  G(k) == E[(H(k)-g(k))(H(k)-g(k))'|r,B]

Following the derivation of (11) in "EM ISI Examples", the integral in (10)
may be written as:

(13)  V(k) = (r(k) - A(k)'g(k))^2 + A(k)'G(k)A(k)

and represents the Viterbi incremental cost function at time k.

3. Independent Gaussian Channel Coefficients
--------------------------------------------
If f(H(k)) is Gaussian with mean d and covariance R, then similar to
the derivation of (22)-(24) in "EM ISI Examples", we obtain:

(14)  f(H(k)|r,B) = K4 exp(-(H(k)-g(k))'inv(G(k))(H(k)-g(k))/2)

where K4 is a normalizing constant and:

(15)  G(k) = inv(B(k)B(k)'/s+inv(R))

(16)  g(k) = G(k) (B(k)r(k)/s+inv(R)d)

So in this case, the distribution of H(k) given r and B is also Gaussian,
and the EM algorithm is:

  1.  Initialize B to some estimate of A by initially estimating
      g(k) and G(k), k=1..N, then using the Viterbi algorithm to estimate A,
      then setting B=A.

  2.  E-step: use (15) and (16) to estimate G(k) and g(k), k=1..N

  3.  M-step: use the Viterbi algorithm with incremental cost function (13)
      to estimate A.

  4.  Set B=A and repeat steps 2 and 3 until converged.

Some initial BER simulation results, using 1000 trials:

                           EM algorithm
   N  SNR      deterministic h      Gaussian h        known channel
   6   6       0.1785               0.13383           0.12967
  10   6       0.1879               0.1234            0.1203

4. Gauss-Markov Channel Coefficients
------------------------------------
Let `a' represent the first-order Markov factor such that a = exp(-wT)
where T is the sampling period and w/pi is the Doppler spread.  Let the
time-varying channel coefficients follow a Gauss-Markov distribution
such that, at time k:

(17)  H(k) = a H(k-1) + v(k)

where v(k) is white and Gaussian with mean 0 and autocorrelation I.

In this case, f(H(k)|H(k-1)) is Gaussian with mean = a H(k-1) and
autocorrelation I.  And f(H(k)) is Gaussian with mean 0 and autocorrelation
I/(1-a^2).  Also, f(H(k-1)|H(k)) is Gaussian with mean = a H(k) and
autocorrelation I.

Q(A|B) from (4) can be written in terms of f(H(k)|r,B):

(18)  Q(A|B) = sum k=1..N { E[ (r(k)-A(k)'H(k))^2 | r,B] }

             = sum k=1..N
	       {
		 integral over H(k)
                 {
		  dH(k) f(H(k)|r,B) (r(k)-A(k)'H(k))^2
                 }
	       }

and the definitions of g(k), G(k), and V(k) from (11-13) still apply.

f(H) can be written as:

(19)  f(H) = f(H(N)|H(N-1)) f(H(N-1)|H(N-2)) ... f(H(2)|H(1)) f(H(1))

so f(H|r,B) from (6) is:

(20)  f(H|r,B) = K f(H(N)|H(N-1)) f(H(N-1)|H(N-2)) ... f(H(2)|H(1)) f(H(1))

	         * product k=1..N
                   {
		     exp(-(r(k)-B(k)'H(k))^2/(2s))
                   }

To determine f(H(k)|r,b), and thereby g(k) and G(k), we must integrate
(20) over H(i), i=1..N, i!=k.  The derivation of this integration result
is shown in Appendix A.  The result is that f(H(k)|r,B) is Gaussian,
with g(k) and G(k) determined by the following algorithm:

(21)  Initialize:

        for k=N..2

          G(k) = inv(I + B(k)B(k)'/s + a^2(I - G(k+1))

          q(k) = r(k)B(k)/s + a G(k+1) q(k+1)

(22)  Iterate:

        for k=1..N

          G(k) = inv((1-a^2)I + B(k)B(k)'/s + a^2 (2I - G(k-1) - G(k+1)))

          q(k) = r(k)B(k)/s + a (G(k-1)q(k-1) + G(k+1)q(k+1))

          g(k) = G(k)q(k)

          Use G(k) and g(k) in the Viterbi incremental cost function for time k.

          Update for next iteration:

          G(k) = inv(I + B(k)B(k)'/s + a^2 (I - G(k-1)))

          q(k) = r(k)B(k)/s + a G(k-1) q(k-1)

with  G(0) == G(N+1) == I  and  q(0) == q(N+1) == 0.

It is apparent from the above algorithm that f(H(k)|r,B) at any time k
depends on all of the received data and estimated symbols from time 1 to N.

Initial simulation results indicate that the Gauss-Markov EM algorithm
performs almost as well as the known channel.

Appendix A
----------
To derive the formulas for g(k) and G(k) in the case of the Gauss-Markov
channel coefficient distribution, first note that (19) can be written
in terms of f(H(k)) for any value of k.  (19) shows this for k=1.
For k=2, we can rewrite f(H(2)|H(1)) as f(H(1)|H(2))f(H(2))/f(H(1))
to obtain:

(A1)  f(H) = f(H(N)|H(N-1)) ... f(H(3)|H(2)) f(H(2)) f(H(1)|H(2))

For k=3, rewrite f(H(3)|H(2)) as f(H(2)|H(3))f(H(3))/f(H(2)) to obtain:

(A2)  f(H) = f(H(N)|H(N-1)) ... f(H(4)|H(3)) f(H(3)) f(H(2)|H(3)) f(H(1)|H(2))

In general:

(A3)  f(H) = f(H(N)|H(N-1)) ... f(H(k+1)|H(k)) f(H(k))

	       f(H(k-1)|H(k)) f(H(k-2)|H(k-1)) ... f(H(1)|H(2))

To evaluate f(H(k)|r,B), we can use the above form of f(H) in (20) and integrate
over H(i), i=1..N, i!=k.

Consider two of these integrations, say the integrals over H1 and H2.  By
examining the form of this result, we will be able to produce the general
result.  Showing just the terms involving H1:

(A4)  I1 = integral over H1
           {
             dH1 K1 exp( - (H1 - a H2)'(H1 - a H2)/2 - (r1-B1'H1)^2/(2s))
           }

where K1 is a constant.  Writing the exponential part as exp(-E1/2) we have:

(A5)  E1 = (H1 - a H2)'(H1 - a H2) + (r1-B1'H1)^2/s

         = H1'H1 - 2 a H1'H2 + a^2 H2'H2 + r1^2/s - 2 r1 H1'B1/s + H1'B1B1'H1/s

         = (H1-g1)'inv(G1)(H1-g1) - g1'inv(G1)g1 + a^2 H2'H2

where we have dropped the constant r1^2/s term, and G1 and g1 will be
defined below.  Since the last two terms in E1 are not functions of H1,
they can be factored out of the integral.  And the integral over H1
involving the first term from E1 produces a constant which can be ignored.
Thus, the result is:

(A6)  I1 = exp( -(-g1'inv(G1)g1 + a^2 H2'H2) / 2)

with:

(A7)  G1 = inv(I + B1B1'/s)

      q1 = r1 B1/s

      g1 = G1 (a H2 + q1)

Expanding the first term in I1:

(A8)  g1'inv(G1)g1 = a^2 H2' G1 H2 + 2 a H2' G1 q1

where we have dropped the constant q1' G1 q1 term.  So we now have:

(A9)  I1 = exp( -(-a^2 H2' G1 H2 - 2 a H2' G1 q1 + a^2 H2'H2) / 2)

This result shows how the integral over H1 produces terms involving H2 which
must be included when performing the integral over H2.  Showing just the
terms involving H2 for this next integral:

(A10)  I2 = integral over H2
            {
              dH2 K2 exp(
		          - (H2 - a H3)'(H2 - a H3)/2 - (r2-B2'H2)^2/(2s)

		          - (-a^2 H2' G1 H2 - 2 a H2' G1 q1 + a^2 H2'H2) / 2)
		        )
            }

where K2 is a constant.  Writing the exponential part as exp(-E2/2) we have:

(A11)  E2 = (H2 - a H3)'(H2 - a H3) + (r2-B2'H2)^2/s

	      - a^2 H2' G1 H2 - 2 a H2' G1 q1 + a^2 H2'H2

	  = (H2-g2)'inv(G2)(H2-g2) - g2'inv(G2)g2 + a^2 H3'H3

where we have dropped the constant r2^2/s term, and G2 and g2 will be
defined below.  Since the last two terms in E2 are not functions of H2,
they can be factored out of the integral.  And the integral over H2
involving the first term from E2 produces a constant which can be ignored.
Thus, the result is:

(A12)  I2 = exp( -(-g2'inv(G2)g2 + a^2 H3'H3) / 2)

with:

(A13)  G2 = inv(I + B2B2'/s + a^2(I - G1))

       q2 = r2 B2/s + a G1 q1

       g2 = G2 (a H3 + q2)

From the forms of (A6) and (A12) we can easily deduce the general form of the
result of integrating over H(i).

(A13) indicates the general form of the equations for G(i) and q(i).  These
equations have more terms than (A7) because in deriving (A7) there were no
previous integration results to incorporate.  However, if we define G0 = I
and q0 = 0, then (A7) may also be written in the same form as (A13).

The preceding examples used (A3) to integrate (20) over H1 and H2, and show
the recursive forms of the update equations which appear at the end of (22).
If we start at the "other end" of (A3), and integrate over H(N) first, we
obtain similar recursive equations which appear in (21).

Finally, to produce f(H(k)|r,B) after integrating over H(i), i=1..N, i!=k,
we have:

(A14)  f(H(k)|r,B) = Kk exp(-Ek/2)

(A15)  Ek = (1-a^2)H(k)'H(k) + (r(k) - B(k)'H(k))^2/s

	      + a^2 H(k)'H(k) - a^2 H(k)'G(k+1)H(k) - 2 a H(k)'G(k+1)q(k+1)

	      + a^2 H(k)'H(k) - a^2 H(k)'G(k-1)H(k) - 2 a H(k)'G(k-1)q(k-1)

where the second and third lines of (A15) show the cumulative contributions
of the integrals over H(i) for i>k and i<k.

Combining terms and ignoring constants (A15) can be written as:

(A16)  Ek = (H(k)-g(k))'G(k)(H(k)-g(k))

indicating that f(H(k)|r,B) has a Gaussian distribution with:

(A17)  G(k) = inv((1-a^2)I + B(k)B(k)'/s + a^2 (2I - G(k-1) - G(k+1)))

       q(k) = r(k)B(k)/s + a (G(k-1)q(k-1) + G(k+1)q(k+1))

       g(k) = G(k)q(k)

The results of (A17) are the equations that appear at the beginning of (22).