Time-Recursive EM Examples

R. Perry, 30 June 1998

______________________________________________________________________________

0. Introduction
---------------

Time-recursive implementations are presented for the scaled binary sequence
examples from:

   [1] Sequence Estimation EM Examples, R. Perry, 25 June 1998

Section 1 considers a simple scaled binary sequence in noise.  In section 2
the binary sequence is constrained in such a way that it becomes convenient
to use a time-recursive Viterbi algorithm for the M-step of the EM
algorithm.

______________________________________________________________________________

1. Scaled Binary Sequence
-------------------------

Let the observed data Y be a sequence of N 0's and 1's scaled by a fixed but
unknown non-zero parameter c and corrupted by additive Gaussian noise Z:

   Y = c B + Z

where B represents a sequence of deterministic binary 0 and 1 values.

The block EM algorithm for this problem derived in [1] is:

   E-step: a = avg(Y')

           where Y' is the subset of Y values associated with the current
           estimates of B which are 1's.

   M-step: for each b,y in B,Y set b = 1 if

              (a > 0 and y > a/2) or (a < 0 and y < a/2)

           otherwise set b = 0.

And the implementation in M from [1] is:

   b = ones( 1, N);                 /* initial estimate */
   for( i = 0; i < n; ++i) {        /* n iterations of EM */
     /* E-step */
     j = find( b == 1);
     c = sum(y(j)) / cols(j);
     /* M-step */
     b = (c > 0) ? (y > c/2) : (y < c/2);
   }

where y is the row-vector of observed data, b is the row-vector of estimates
of B, and c represents the estimate of c called "a" above.
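As a usage sketch, test data could be generated and the block EM run on it
as follows.  Here urand() and gauss() are assumed uniform and unit-variance
Gaussian generators (they are not part of [1]), cs holds the true scale so
that c remains the estimate, sigma is arbitrary, and 3 EM iterations are
used as in the simulations later in this section:

   /* sketch: generate Y = c B + Z and run the block EM above;
      urand() and gauss() are assumed generator functions */
   N = 10000; cs = 2; p1 = 0.4; sigma = 0.5; n = 3;
   B = (urand( 1, N) < p1);           /* true bits, P(1) = p1 */
   y = cs * B + sigma * gauss( 1, N); /* observed data */

   b = ones( 1, N);                   /* block EM, as above */
   for( i = 0; i < n; ++i) {
     j = find( b == 1);
     c = sum(y(j)) / cols(j);
     b = (c > 0) ? (y > c/2) : (y < c/2);
   }
   ber = (N - sum( b == B)) / N;      /* bit error rate vs. truth */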
In general, an optimal time-recursive EM algorithm should be derived by
formulating the problem in terms of maximizing a probability function like
P(b(t)|b(t-1)), where the estimate from the previous time instant is used in
computing the current estimate.  However, such an approach seems to produce
the same EM algorithm as shown above for the problem under consideration,
since c and B are deterministic.  Therefore, we will take the approach of
simply starting with the block EM implementation shown above and rewriting
it to be time-recursive.

A time-recursive implementation should have two main features: not all of
the past samples and estimates are stored, and, given input data samples one
at a time, estimates are produced one at a time.  Due to the loss of
information caused by not storing all past samples, the time-recursive
implementations derived here will only be approximate EM algorithms.

Let's start by considering an implementation which doesn't save any of the
past samples or estimates, but does save the previously computed value of
the scale factor c.  Let n1 represent the number of 1's in B used to
estimate c, with n1 = 0 and c = 0 at the start of the algorithm.  Let d
represent an updated estimate of c to be used during the iterations of the
E-step and M-step.  After the iterations have converged, c = d can be
performed to update the value of c.

Recall that c represents the average of n1 values of Y corresponding to B
values which are estimated as being 1's.  Given a new input data sample y,
if the corresponding value of b is estimated as 1, then d can be computed
as:

   (1.1)   d = (n1 c + y) / (n1 + 1)

             = c + (y - c) / (n1 + 1)

In this case, d represents the average of (n1 + 1) values of Y.  Using
(1.1), a time-recursive implementation of the EM algorithm is:

   bK = 1;                          /* initial estimate */
   for( i = 0; i < n; ++i) {        /* n iterations of EM */
     /* E-step */
     d = c;
     if( bK == 1)
       d += (yK - c) / (n1 + 1);
     /* M-step */
     bK = (d > 0) ? (yK > d/2) : (yK < d/2);
   }
   if( bK == 1) { ++n1; c = d; }

where yK is the current (K'th) input data value, and bK is the associated
estimated 0 or 1 bit value.  Note that if bK turns out to be a 1 at the end,
then n1 is incremented and c is updated with the value of d.

This implementation may not perform very well, for two reasons.

First, the update of c whenever bK is 1 at the end introduces an error
because the past values of b are not being updated based on the changed
value of c.  Since we are not saving the past values of b, we can't go back
and change them.  c is supposed to be computed using the values of y
corresponding to b values which are estimated as being 1's.  As c changes,
if we could go back and look at previous values of y, some of the previous
values of b might change based on the threshold comparison of y with c/2.
Changes in previous values of b should be reflected in an updated estimate
of c, but we cannot do that here.

Second, the update formula for d from (1.1), while mathematically correct,
causes new values of yK to have a decreasing effect on the estimates as n1
increases, since (yK - c) is divided by (n1 + 1).  c will tend to converge
to an estimate based on past values of y and b, which tend to be in error as
noted in the previous paragraph.  This problem could be alleviated by
resetting n1 to some smaller value when it gets too large.

To alleviate the first problem, consider a buffered or pipelined version of
the EM implementation from above using a sliding window of size K.  Let
y_buf and b_buf be arrays of size K.  Given the current input data value yK,
we place it in y_buf(K); y_buf(1:K-1) contains the previous K-1 input
values.  Then we perform a block EM algorithm using block size K.  At the
end, we commit to an estimate only for b_buf(1), and shift the buffers left
by one in preparation for the next input value:

   y_buf(K) = yK;
   b_buf(K) = 1;
   for( i = 0; i < n; ++i) {        /* n iterations of EM */
     /* E-step */
     j = find( b_buf == 1);
     jn1 = cols(j);
     d = c;
     if( jn1 > 0) {
       s = sum( y_buf(j));
       d += (s - jn1 * c) / (n1 + jn1);
     }
     /* M-step */
     b_buf = (d > 0) ? (y_buf > d/2) : (y_buf < d/2);
   }
   b1 = b_buf(1);
   y1 = y_buf(1);
   if( K > 1) {
     b_buf = [b_buf(2:K), 0];       /* shift */
     y_buf = [y_buf(2:K), 0];
   }
   if( b1 == 1) { ++n1; c += (y1 - c) / n1; }

Note that in the E-step we are estimating d using c (which is an average of
n1 values of y) and s (the sum of jn1 values of y in y_buf corresponding to
values of b in b_buf which are 1's).  A derivation of that update formula
is:

   (1.2)   d = (n1 c + s) / (n1 + jn1)

             = c + (s - jn1 c) / (n1 + jn1)

At the end, we update c based on the estimate from b_buf(1) which is shifted
out of the buffer.  c continues to be an estimate based on n1 past values of
y which are not saved, while d is an estimate that incorporates c as well as
the K buffered values of y.

If K = 1, this implementation reduces to the one previously described which
does not use buffering.  For yK = y(t), the result b1 is an estimate of
b(t-K+1); thus, the outputs are delayed from the inputs by K-1 due to the
pipelining.
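To make the pipelining explicit, here is a minimal sketch of an outer loop
driving the buffered update.  The function em_update() is hypothetical; it
stands for the buffered E-step/M-step/shift code above, consuming one sample
and returning the committed bit b1:

   /* sketch: stream samples one at a time through the buffered
      EM; em_update() is a hypothetical wrapper around the code
      above, consuming y(t) and returning the committed bit b1 */
   n1 = 0; c = 0;                   /* initialization, as above */
   y_buf = 0 * ones( 1, K);         /* empty buffers */
   b_buf = 0 * ones( 1, K);
   for( t = 1; t <= N; ++t) {
     b1 = em_update( y(t));
     if( t >= K)                    /* first K-1 outputs are warm-up */
       b_est(t - K + 1) = b1;       /* b1 estimates b(t-K+1) */
   }

At the end of the data, the K-1 estimates still held in b_buf would be
flushed to complete the output sequence.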
A limited number of simulations were run to compare the bit-error-rates of
the block and time-recursive EM solutions with one using a known value of
the scale factor c.  Using N = 10000 data values with P(1) = 0.4 and c = 2,
10 trials were performed for each SNR value shown in the table below.  For
the EM solutions, 3 iterations of the E-step and M-step were performed.  For
the known channel case, c was computed using the block E-step formula on the
original generated data.

                              SNR (dB)

   Algorithm       3         6         9        12
   ---------    -------   -------   -------   -------
   known        0.07413   0.01998   0.00162   0
   block        0.07614   0.02035   0.00158   0
   K=100        0.07358   0.02006   0.00157   0
   K=50         0.07358   0.02003   0.00160   0
   K=1          0.07397   0.02030   0.00178   0.00011

                  BER vs. Algorithm and SNR

The block algorithm corresponds to using K = N in the time-recursive
implementation.  I would expect BER to increase as K decreases, but the
results above do not indicate any dramatic change in BER across the
algorithms.  More trials for each case would have to be performed to obtain
more accurate estimates of BER.

______________________________________________________________________________

2. Constrained Binary Sequence
------------------------------

To make this problem more interesting, and complicated enough that it
becomes convenient to use the Viterbi algorithm for the M-step, section 2 of
[1] introduced the constraint that the process which generates B never
produces two 1's in a row.  For this constrained EM algorithm, the M-step
uses the Viterbi algorithm to estimate B.

The block implementation in M from [1] is the same as shown at the beginning
of section 1 above, except for the M-step:

   /* M-step */
   b = vit(y,2) - 1;                /* map 1,2 from viterbi to 0,1 */

A time-recursive implementation will be the same as shown at the end of
section 1 above, except for the M-step:

   /* M-step */
   b_buf = vit(y_buf,2,istate) - 1;

where istate is an additional optional argument to the Viterbi routine used
to specify the initial state.  istate < 1 is used to indicate that the
initial state is unknown; initially, istate = 0.  At the end, when b1 is
shifted out from b_buf, istate is updated in preparation for the next
iteration:

   istate = b1 + 1;                 /* initial viterbi state for next call */

In this implementation, vit() uses states numbered 1 and 2, corresponding to
bit values 0 and 1.  Thus, we subtract 1 from the state vector returned by
vit() to obtain the bit values, and add 1 to b1 to set istate.
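The vit() routine itself is defined in [1] and is not reproduced here.  As a
rough sketch of what a two-state Viterbi computes in this setting (the
variable names are made up and the istate handling is omitted), each state
keeps one survivor path and an accumulated squared-error cost, and the
forbidden 1 -> 1 transition is simply never extended:

   /* illustrative two-state Viterbi over y_buf (not the actual
      vit() from [1]); branch costs are squared errors against
      bit values 0 and d, and 1 -> 1 transitions are forbidden */
   cost0 = y_buf(1) * y_buf(1);             path0 = [0];
   cost1 = (y_buf(1) - d) * (y_buf(1) - d); path1 = [1];
   for( t = 2; t <= K; ++t) {
     e0 = y_buf(t) * y_buf(t);              /* branch cost, bit 0 */
     e1 = (y_buf(t) - d) * (y_buf(t) - d);  /* branch cost, bit 1 */
     /* bit 0 can follow either state; keep the cheaper survivor */
     new0 = ((cost0 < cost1) ? cost0 : cost1) + e0;
     tmp  = [ ((cost0 < cost1) ? path0 : path1), 0 ];
     /* bit 1 can only follow bit 0 (no two 1's in a row) */
     new1 = cost0 + e1;
     path1 = [ path0, 1 ];
     path0 = tmp;
     cost0 = new0; cost1 = new1;
   }
   b_buf = (cost0 < cost1) ? path0 : path1; /* best survivor */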
The use of the Viterbi algorithm here is time-recursive in the sense that
the initial state (istate) is updated based on the result (b1) of the
previous iteration.  But it is not completely recursive, since each time
vit() is called it starts from scratch with the buffer of y values y_buf.

It would be possible to implement a completely recursive Viterbi routine by
passing just the current yK value to vit() and having vit() shift its
internal trellis and path index arrays by one each time a new data sample is
available (a sketch of this idea appears at the end of this note).  This
would be more efficient, since the Viterbi algorithm would only have to be
performed on the last branch section of the trellis for each new data
sample.  However, in this example, the estimate of c changes at the end of
each EM iteration if b1 = 1, and c is used to compute the Viterbi path
costs.  Thus, if the Viterbi algorithm reused its trellis, simply shifted by
one from the previous iteration, it would be using incorrect cost estimates.

A limited number of simulations were run to compare the bit-error-rates of
the block and time-recursive EM solutions with one using a known value of
the scale factor c.  Using N = 1000 data values with P(1) = 0.8 and c = 2,
10 trials were performed for each SNR value shown in the table below.  For
the EM solutions, 3 iterations of the E-step and M-step were performed.  For
the known channel case, c was computed using the block E-step formula on the
original generated data.

                             SNR (dB)

   Algorithm       3        4        5        6
   ---------    ------   ------   ------   ------
   known        0.0403   0.0265   0.0160   0.0118
   block        0.0442   0.0288   0.0174   0.0121
   K=5          0.0437   0.0285   0.0168   0.0119
   K=4          0.0435   0.0287   0.0167   0.0118
   K=3          0.0439   0.0285   0.0166   0.0118
   K=2          0.0463   0.0312   0.0168   0.0119
   K=1          0.0585   0.0389   0.0248   0.0188

                 BER vs. Algorithm and SNR

The block algorithm corresponds to using K = N in the time-recursive
implementation.  The results indicate that the time-recursive algorithm
performs well even for small values of the buffer size K (i.e. the Viterbi
trellis size).
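Finally, for reference, a sketch of the completely recursive Viterbi update
mentioned in section 2.  The variables cost0, cost1, path0, and path1
(illustrative names, as in the earlier sketch) would persist inside vit()
between calls, and one call processes one new sample yK.  As noted above,
this does not fit the EM iteration here, because a change in c leaves the
retained costs stale:

   /* sketch: one call per new sample yK; costs and survivor
      paths persist across calls, so only the newest branch
      section of the trellis is processed */
   e0 = yK * yK;                    /* branch cost, bit 0 */
   e1 = (yK - d) * (yK - d);        /* branch cost, bit 1 */
   new0 = ((cost0 < cost1) ? cost0 : cost1) + e0;
   tmp  = [ ((cost0 < cost1) ? path0 : path1), 0 ];
   new1 = cost0 + e1;               /* 1 -> 1 still forbidden */
   path1 = [ path0, 1 ];
   path0 = tmp;
   cost0 = new0; cost1 = new1;
   /* commit the oldest bit of the best survivor and shift;
      a real routine would also renormalize the growing costs */
   b1 = (cost0 < cost1) ? path0(1) : path1(1);
   path0 = path0(2:cols(path0));
   path1 = path1(2:cols(path1));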