Time-Recursive EM Examples

R. Perry, 30 June 1998

______________________________________________________________________________

0. Introduction
---------------

Time-recursive implementations are presented for the scaled binary sequence
examples from:

   [1] Sequence Estimation EM Examples, R. Perry, 25 June 1998

Section 1 considers a simple scaled binary sequence in noise.  In section 2
the binary sequence is constrained in such a way that it becomes convenient
to use a time-recursive Viterbi algorithm for the M-step of the EM
algorithm.

______________________________________________________________________________

1. Scaled Binary Sequence
-------------------------

Let the observed data Y be a sequence of N 0's and 1's scaled by a fixed but
unknown non-zero parameter c and corrupted by additive Gaussian noise Z:

   Y = c B + Z

where B represents a sequence of deterministic binary 0 and 1 values.

The block EM algorithm for this problem derived in [1] is:

   E-step: a = avg(Y')

           where Y' is the subset of Y values associated with the current
           estimates of B which are 1's.

   M-step: for each b,y in B,Y set b = 1 if

              (a > 0 and y > a/2) or (a < 0 and y < a/2)

           otherwise set b = 0.

And the implementation in M from [1] is:

   b = ones( 1, N);                 /* initial estimate */
   for( i = 0; i < n; ++i) {        /* n iterations of EM */
     /* E-step */
     j = find( b == 1);
     c = sum(y(j)) / cols(j);
     /* M-step */
     b = (c > 0) ? (y > c/2) : (y < c/2);
   }

where y is the row-vector of observed data, b is the row-vector of estimates
of B, and c represents the estimate of c called "a" above.
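As a usage sketch, test data could be generated and the block EM run on it
as follows.  Here urand() and gauss() are assumed uniform and unit-variance
Gaussian generators (they are not part of [1]), cs holds the true scale so
that c remains the estimate, sigma is arbitrary, and 3 EM iterations are
used as in the simulations later in this section:

   /* sketch: generate Y = c B + Z and run the block EM above;
      urand() and gauss() are assumed generator functions */
   N = 10000; cs = 2; p1 = 0.4; sigma = 0.5; n = 3;
   B = (urand( 1, N) < p1);           /* true bits, P(1) = p1 */
   y = cs * B + sigma * gauss( 1, N); /* observed data */

   b = ones( 1, N);                   /* block EM, as above */
   for( i = 0; i < n; ++i) {
     j = find( b == 1);
     c = sum(y(j)) / cols(j);
     b = (c > 0) ? (y > c/2) : (y < c/2);
   }
   ber = (N - sum( b == B)) / N;      /* bit error rate vs. truth */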
In general, an optimal time-recursive EM algorithm should be derived by
formulating the problem in terms of maximizing a probability function like
P(b(t)|b(t-1)), where the estimate from the previous time instant is used in
computing the current estimate.  However, such an approach seems to produce
the same EM algorithm as shown above for the problem under consideration,
since c and B are deterministic.  Therefore, we will take the approach of
simply starting with the block EM implementation shown above and rewriting
it to be time-recursive.

A time-recursive implementation should have two main features: not all of
the past samples and estimates are stored, and, given input data samples one
at a time, estimates are produced one at a time.  Due to the loss of
information caused by not storing all past samples, the time-recursive
implementations derived here will only be approximate EM algorithms.

Let's start by considering an implementation which doesn't save any of the
past samples or estimates, but does save the previously computed value of
the scale factor c.  Let n1 represent the number of 1's in B used to
estimate c, with n1 = 0 and c = 0 at the start of the algorithm.  Let d
represent an updated estimate of c to be used during the iterations of the
E-step and M-step.  After the iterations have converged, c = d can be
performed to update the value of c.

Recall that c represents the average of n1 values of Y corresponding to B
values which are estimated as being 1's.  Given a new input data sample y,
if the corresponding value of b is estimated as 1, then d can be computed
as:

   (1.1)   d = (n1 c + y) / (n1 + 1)

             = c + (y - c) / (n1 + 1)

In this case, d represents the average of (n1 + 1) values of Y.  Using
(1.1), a time-recursive implementation of the EM algorithm is:

   bK = 1;                          /* initial estimate */
   for( i = 0; i < n; ++i) {        /* n iterations of EM */
     /* E-step */
     d = c;
     if( bK == 1)
       d += (yK - c) / (n1 + 1);
     /* M-step */
     bK = (d > 0) ? (yK > d/2) : (yK < d/2);
   }
   if( bK == 1) { ++n1; c = d; }

where yK is the current (K'th) input data value, and bK is the associated
estimated 0 or 1 bit value.  Note that if bK turns out to be a 1 at the end,
then n1 is incremented and c is updated with the value of d.

This implementation may not perform very well, for two reasons.

First, the update of c whenever bK is 1 at the end introduces an error
because the past values of b are not being updated based on the changed
value of c.  Since we are not saving the past values of b, we can't go back
and change them.  c is supposed to be computed using the values of y
corresponding to b values which are estimated as being 1's.  As c changes,
if we could go back and look at previous values of y, some of the previous
values of b might change based on the threshold comparison of y with c/2.
Changes in previous values of b should be reflected in an updated estimate
of c, but we cannot do that here.

Second, the update formula for d from (1.1), while mathematically correct,
causes new values of yK to have a decreasing effect on the estimates as n1
increases, since (yK - c) is divided by (n1 + 1).  c will tend to converge
to an estimate based on past values of y and b, which tend to be in error as
noted in the previous paragraph.  This problem could be alleviated by
resetting n1 to some smaller value when it gets too large.

To alleviate the first problem, consider a buffered or pipelined version of
the EM implementation from above using a sliding window of size K.  Let
y_buf and b_buf be arrays of size K.  Given the current input data value yK,
we place it in y_buf(K); y_buf(1:K-1) contains the previous K-1 input
values.  Then we perform a block EM algorithm using block size K.  At the
end, we commit to an estimate only for b_buf(1), and shift the buffers left
by one in preparation for the next input value:

   y_buf(K) = yK;
   b_buf(K) = 1;
   for( i = 0; i < n; ++i) {        /* n iterations of EM */
     /* E-step */
     j = find( b_buf == 1);
     jn1 = cols(j);
     d = c;
     if( jn1 > 0) {
       s = sum( y_buf(j));
       d += (s - jn1 * c) / (n1 + jn1);
     }
     /* M-step */
     b_buf = (d > 0) ? (y_buf > d/2) : (y_buf < d/2);
   }
   b1 = b_buf(1);
   y1 = y_buf(1);
   if( K > 1) {
     b_buf = [b_buf(2:K), 0];       /* shift */
     y_buf = [y_buf(2:K), 0];
   }
   if( b1 == 1) { ++n1; c += (y1 - c) / n1; }

Note that in the E-step we are estimating d using c (which is an average of
n1 values of y) and s (the sum of jn1 values of y in y_buf corresponding to
values of b in b_buf which are 1's).  A derivation of that update formula
is:

   (1.2)   d = (n1 c + s) / (n1 + jn1)

             = c + (s - jn1 c) / (n1 + jn1)

At the end, we update c based on the estimate from b_buf(1) which is shifted
out of the buffer.  c continues to be an estimate based on n1 past values of
y which are not saved, while d is an estimate that incorporates c as well as
the K buffered values of y.

If K = 1, this implementation reduces to the one previously described which
does not use buffering.  For yK = y(t), the result b1 is an estimate of
b(t-K+1); thus, the outputs are delayed from the inputs by K-1 due to the
pipelining.
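To make the pipelining explicit, here is a minimal sketch of an outer loop
driving the buffered update.  The function em_update() is hypothetical; it
stands for the buffered E-step/M-step/shift code above, consuming one sample
and returning the committed bit b1:

   /* sketch: stream samples one at a time through the buffered
      EM; em_update() is a hypothetical wrapper around the code
      above, consuming y(t) and returning the committed bit b1 */
   n1 = 0; c = 0;                   /* initialization, as above */
   y_buf = 0 * ones( 1, K);         /* empty buffers */
   b_buf = 0 * ones( 1, K);
   for( t = 1; t <= N; ++t) {
     b1 = em_update( y(t));
     if( t >= K)                    /* first K-1 outputs are warm-up */
       b_est(t - K + 1) = b1;       /* b1 estimates b(t-K+1) */
   }

At the end of the data, the K-1 estimates still held in b_buf would be
flushed to complete the output sequence.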
A limited number of simulations were run to compare the bit-error-rates of
the block and time-recursive EM solutions with one using a known value of
the scale factor c.  Using N = 10000 data values with P(1) = 0.4 and c = 2,
10 trials were performed for each SNR value shown in the table below.  For
the EM solutions, 3 iterations of the E-step and M-step were performed.  For
the known channel case, c was computed using the block E-step formula on the
original generated data.

                              SNR (dB)

   Algorithm       3         6         9        12
   ---------    -------   -------   -------   -------
   known        0.07413   0.01998   0.00162   0
   block        0.07614   0.02035   0.00158   0
   K=100        0.07358   0.02006   0.00157   0
   K=50         0.07358   0.02003   0.00160   0
   K=1          0.07397   0.02030   0.00178   0.00011

                  BER vs. Algorithm and SNR

The block algorithm corresponds to using K = N in the time-recursive
implementation.  I would expect BER to increase as K decreases, but the
results above do not indicate any dramatic change in BER across the
algorithms.  More trials for each case would have to be performed to obtain
more accurate estimates of BER.

______________________________________________________________________________

2. Constrained Binary Sequence
------------------------------

To make this problem more interesting, and complicated enough that it
becomes convenient to use the Viterbi algorithm for the M-step, section 2 of
[1] introduced the constraint that the process which generates B never
produces two 1's in a row.  For this constrained EM algorithm, the M-step
uses the Viterbi algorithm to estimate B.

The block implementation in M from [1] is the same as shown at the beginning
of section 1 above, except for the M-step:

   /* M-step */
   b = vit(y,2) - 1;                /* map 1,2 from viterbi to 0,1 */

A time-recursive implementation will be the same as shown at the end of
section 1 above, except for the M-step:

   /* M-step */
   b_buf = vit(y_buf,2,istate) - 1;

where istate is an additional optional argument to the Viterbi routine used
to specify the initial state.  istate < 1 is used to indicate that the
initial state is unknown; initially, istate = 0.  At the end, when b1 is
shifted out from b_buf, istate is updated in preparation for the next
iteration:

   istate = b1 + 1;                 /* initial viterbi state for next call */

In this implementation, vit() uses states numbered 1 and 2, corresponding to
bit values 0 and 1.  Thus, we subtract 1 from the state vector returned by
vit() to obtain the bit values, and add 1 to b1 to set istate.
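The vit() routine itself is defined in [1] and is not reproduced here.  As a
rough sketch of what a two-state Viterbi computes in this setting (the
variable names are made up and the istate handling is omitted), each state
keeps one survivor path and an accumulated squared-error cost, and the
forbidden 1 -> 1 transition is simply never extended:

   /* illustrative two-state Viterbi over y_buf (not the actual
      vit() from [1]); branch costs are squared errors against
      bit values 0 and d, and 1 -> 1 transitions are forbidden */
   cost0 = y_buf(1) * y_buf(1);             path0 = [0];
   cost1 = (y_buf(1) - d) * (y_buf(1) - d); path1 = [1];
   for( t = 2; t <= K; ++t) {
     e0 = y_buf(t) * y_buf(t);              /* branch cost, bit 0 */
     e1 = (y_buf(t) - d) * (y_buf(t) - d);  /* branch cost, bit 1 */
     /* bit 0 can follow either state; keep the cheaper survivor */
     new0 = ((cost0 < cost1) ? cost0 : cost1) + e0;
     tmp  = [ ((cost0 < cost1) ? path0 : path1), 0 ];
     /* bit 1 can only follow bit 0 (no two 1's in a row) */
     new1 = cost0 + e1;
     path1 = [ path0, 1 ];
     path0 = tmp;
     cost0 = new0; cost1 = new1;
   }
   b_buf = (cost0 < cost1) ? path0 : path1; /* best survivor */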
The use of the Viterbi algorithm here is time-recursive in the sense that
the initial state (istate) is updated based on the result (b1) of the
previous iteration.  But it is not completely recursive, since each time
vit() is called it starts from scratch with the buffer of y values y_buf.

It would be possible to implement a completely recursive Viterbi routine by
passing just the current yK value to vit() and having vit() shift its
internal trellis and path index arrays by one each time a new data sample is
available (a sketch of this idea appears at the end of this note).  This
would be more efficient, since the Viterbi algorithm would only have to be
performed on the last branch section of the trellis for each new data
sample.  However, in this example, the estimate of c changes at the end of
each EM iteration if b1 = 1, and c is used to compute the Viterbi path
costs.  Thus, if the Viterbi algorithm reused its trellis, simply shifted by
one from the previous iteration, it would be using incorrect cost estimates.

A limited number of simulations were run to compare the bit-error-rates of
the block and time-recursive EM solutions with one using a known value of
the scale factor c.  Using N = 1000 data values with P(1) = 0.8 and c = 2,
10 trials were performed for each SNR value shown in the table below.  For
the EM solutions, 3 iterations of the E-step and M-step were performed.  For
the known channel case, c was computed using the block E-step formula on the
original generated data.

                             SNR (dB)

   Algorithm       3        4        5        6
   ---------    ------   ------   ------   ------
   known        0.0403   0.0265   0.0160   0.0118
   block        0.0442   0.0288   0.0174   0.0121
   K=5          0.0437   0.0285   0.0168   0.0119
   K=4          0.0435   0.0287   0.0167   0.0118
   K=3          0.0439   0.0285   0.0166   0.0118
   K=2          0.0463   0.0312   0.0168   0.0119
   K=1          0.0585   0.0389   0.0248   0.0188

                 BER vs. Algorithm and SNR

The block algorithm corresponds to using K = N in the time-recursive
implementation.  The results indicate that the time-recursive algorithm
performs well even for small values of the buffer size K (i.e. the Viterbi
trellis size).
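Finally, for reference, a sketch of the completely recursive Viterbi update
mentioned in section 2.  The variables cost0, cost1, path0, and path1
(illustrative names, as in the earlier sketch) would persist inside vit()
between calls, and one call processes one new sample yK.  As noted above,
this does not fit the EM iteration here, because a change in c leaves the
retained costs stale:

   /* sketch: one call per new sample yK; costs and survivor
      paths persist across calls, so only the newest branch
      section of the trellis is processed */
   e0 = yK * yK;                    /* branch cost, bit 0 */
   e1 = (yK - d) * (yK - d);        /* branch cost, bit 1 */
   new0 = ((cost0 < cost1) ? cost0 : cost1) + e0;
   tmp  = [ ((cost0 < cost1) ? path0 : path1), 0 ];
   new1 = cost0 + e1;               /* 1 -> 1 still forbidden */
   path1 = [ path0, 1 ];
   path0 = tmp;
   cost0 = new0; cost1 = new1;
   /* commit the oldest bit of the best survivor and shift;
      a real routine would also renormalize the growing costs */
   b1 = (cost0 < cost1) ? path0(1) : path1(1);
   path0 = path0(2:cols(path0));
   path1 = path1(2:cols(path1));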