Dual Binomial EM Examples

R. Perry, 25 June 1998

______________________________________________________________________________
0. Introduction
---------------

A simple binomial example is presented to illustrate the EM algorithm.  EM,
standing for Expectation-Maximization, had been known to statisticians for a
long time, but was made better known to others by this paper:

    A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum Likelihood from
    Incomplete Data via the EM Algorithm", J. Roy. Stat. Soc., Ser. B,
    vol. 39, no. 1, 1977, pp. 1-38.

Since then, the EM algorithm has found wide applicability in various areas.
We are particularly interested in its application to communication systems
and radar tracking.

In section 1 we describe the basic problem for the example under
consideration, and derive the obvious direct maximum-likelihood solution.
Section 2 provides an informal derivation of an EM algorithm for the
problem.  In section 3 a formal derivation is given in terms of the Q()
function from Dempster et al.  Section 4 presents numerical results.  In
section 5 we consider a related problem: a Bernoulli process with additive
noise.  We derive the EM algorithm and present numerical results.

______________________________________________________________________________
1. Binomial ML
--------------

Let the observed data Y be a sequence of N 0's and 1's.  For example, with
N=4:

    Y = 1, 1, 1, 0

Assume that each value of Y is a Bernoulli random variable with probability
p.  We would like to obtain a maximum-likelihood estimate of p.  In this
case, the relevant statistic from the data is the count of the number of
1's.  Let n1 represent that count and let n0 represent the count of the
number of 0's, where n1 + n0 = N.  Then the probability mass function
g(n1;N,p) is binomial:

    (1.1)  g(n1;N,p) = c p^n1 (1-p)^n0    where c = N!/(n1! n0!)

The ML estimate of p which maximizes g() can easily be obtained by
differentiating log(g()) with respect to p, setting that derivative equal to
0, then solving for p:

    log(g()) = log(c) + n1 log(p) + n0 log(1-p)

    d(log(g()))/dp = 0 = n1/p - n0/(1-p)

Solve for p:

    (1.2)  p = n1/(n1+n0)

______________________________________________________________________________
2. EM approach
--------------

Although this problem is most easily solved as shown above, directly
obtaining the ML estimate of p by differentiating log(g()), we will apply
the EM algorithm to illustrate its use.

Assume that the observed data Y and the associated counts n1 and n0
represent incomplete data produced by the following process: each value of
y was generated by randomly selecting a data generator from a set of two
such generators, g1 and g2.  The probability of selecting g1 is s; the
probability of selecting g2 is (1-s).  The data points generated from g1 and
g2 are Bernoulli random variables with associated probabilities q1 and q2.
If we know s, q1, and q2, then we can derive p by averaging over the
generator selection:

    (2.1)  p = P(y=1) = s q1 + (1-s) q2

Let s be fixed at some value in the interval (0.0,1.0), i.e. 0 < s < 1.  (If
s were exactly 0 or 1, then one of the generators g1 or g2 would never be
selected and the problem would degenerate to a single binomial
distribution.)  Let q1 and q2 be unknown.  If we can use EM to produce ML
estimates of q1 and q2, then we can use (2.1) to determine p.
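As an illustration, the two-generator process just described is easy to
simulate.  A minimal Python sketch might look as follows (the function name
and parameter values are arbitrary); the observed fraction of 1's should
come out close to the value predicted by (2.1):

    import random

    def generate_y(N, s, q1, q2, seed=0):
        """Draw N samples from the two-generator process described above."""
        rng = random.Random(seed)
        y = []
        for _ in range(N):
            q = q1 if rng.random() < s else q2  # select g1 with probability s
            y.append(1 if rng.random() < q else 0)
        return y

    # Empirical check of Eq. (2.1): the fraction of 1's should be close to
    # s q1 + (1-s) q2 = 0.4*0.3 + 0.6*0.8 = 0.60 for these arbitrary values
    s, q1, q2 = 0.4, 0.3, 0.8
    y = generate_y(100000, s, q1, q2)
    print(sum(y) / len(y))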
Let N represent the total number of data samples, and let n1 and n0
represent the numbers of 1's and 0's, as in the previous section.  Let N1
represent the total number of data samples which are produced using g1, and
let n11 and n01 represent the numbers of 1's and 0's in these samples.  Let
N2 represent the total number of data samples which are produced using g2,
and let n12 and n02 represent the numbers of 1's and 0's in these samples.
Using this notation, g1() and g2() are binomial distributions:

    (2.2)  g1(n11;N1,q1) = c1 q1^n11 (1-q1)^n01
           g2(n12;N2,q2) = c2 q2^n12 (1-q2)^n02

           where c1 = N1!/(n11! n01!) and c2 = N2!/(n12! n02!)

N, N1, N2, n1, n0, n11, n12, n01, n02 are related as follows:

    (2.3)  N  = N1 + N2 = n1 + n0
           N1 = n11 + n01
           N2 = n12 + n02
           n1 = n11 + n12
           n0 = n01 + n02

Note that n1 and n0 represent the observed or incomplete data.  n11, n12,
n01, and n02 represent the complete data.  q1 and q2 represent the
parameters for which we desire maximum-likelihood estimates.

Given estimates of q1 and q2, we must estimate the expected values of the
complete data associated with the binomial densities g1() and g2().  To do
so, we estimate n11, n12, n01, and n02 as follows:

    E-step:  n11 = s q1 n1 / (s q1 + (1-s) q2)
             n12 = (1-s) q2 n1 / (s q1 + (1-s) q2)
             n01 = s (1-q1) n0 / (s (1-q1) + (1-s) (1-q2))
             n02 = (1-s) (1-q2) n0 / (s (1-q1) + (1-s) (1-q2))

This follows because the probability of selecting g1 and obtaining a 1 is
s q1, and the probability of selecting g2 and obtaining a 1 is (1-s) q2.
Considering just the generated 1's, these two events represent a reduced
sample space, so the probabilities are normalized by dividing by their sum.
n11 as shown in the E-step represents E[n11|n1,n0,q1,q2].  A similar
argument applies for each of the E-step equations.

There are two degenerate cases to consider.  In the case of q1=q2=0, set
n11=n12=0 rather than using the first two equations of the E-step.  In the
case of q1=q2=1, set n01=n02=0 rather than using the last two equations of
the E-step.

Given estimates of n11, n12, n01, and n02, the ML estimates for q1 and q2
are:

    M-step:  q1 = n11/(n11+n01)
             q2 = n12/(n12+n02)

This follows from the formulation in (1.1) and (1.2), since the numbers of
1's produced by g1 and g2 are binomial random variables.  Note that in this
M-step, the computations of the ML estimates of q1 and q2 are decoupled.

To apply EM, initialize q1 and q2 to any values in the range (0.0,1.0) and
apply the E-step then the M-step repeatedly until convergence is obtained.

If q1=0 at the beginning of the E-step, n11 will be 0 and the M-step will
produce q1=0 again.  If q1=1 at the beginning of the E-step, n01 will be 0
and the M-step will produce q1=1 again.  Similar observations may be made
about q2=0 and q2=1.  This means that 0 and 1 are fixed points of the
iterations.  If only one of q1 and q2 is "stuck" at a fixed point, the
remaining one will still converge to a correct ML solution.  If both q1 and
q2 are "stuck", the algorithm will have converged to a possibly wrong
solution.  I believe that, given non-fixed-point initial values of q1 and
q2, the algorithm will only converge to one of the fixed points if that is
indeed the correct ML solution.  However, one should certainly avoid using
these fixed points as initial estimates of q1 or q2.

Note that although n1 and n0 are integers, n11, n12, n01, and n02 do not
have to be integers, and the E-step generally produces non-integral values
for them during the EM iterations.  Dempster et al. make a similar note
about x1 and x2 in their Example 1.  This is ok because, for example, n11
here really is an expected value of n11, not an actual value of n11, and
expected values (being averages) do not have to lie in the space of possible
actual values.
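The E-step and M-step above translate directly into code.  A minimal Python
sketch might look as follows (the function names are arbitrary); it uses the
same s, n1, n0, and starting values as the second numerical example of
section 4, so the printed p values can be compared against the transcripts
there:

    def e_step(s, q1, q2, n1, n0):
        """Expected complete-data counts given the current q1, q2."""
        d1 = s * q1 + (1 - s) * q2              # P(a generated value is a 1)
        d0 = s * (1 - q1) + (1 - s) * (1 - q2)  # P(a generated value is a 0)
        if d1 > 0:
            n11 = s * q1 * n1 / d1
            n12 = (1 - s) * q2 * n1 / d1
        else:                                   # degenerate case q1 = q2 = 0
            n11 = n12 = 0.0
        if d0 > 0:
            n01 = s * (1 - q1) * n0 / d0
            n02 = (1 - s) * (1 - q2) * n0 / d0
        else:                                   # degenerate case q1 = q2 = 1
            n01 = n02 = 0.0
        return n11, n12, n01, n02

    def m_step(n11, n12, n01, n02):
        """ML estimates of q1, q2 from the expected complete-data counts."""
        return n11 / (n11 + n01), n12 / (n12 + n02)

    def dual_binomial_em(s, n1, n0, q1=0.1, q2=0.2, iters=10):
        for _ in range(iters):
            n11, n12, n01, n02 = e_step(s, q1, q2, n1, n0)
            q1, q2 = m_step(n11, n12, n01, n02)
            print("p =", s * q1 + (1 - s) * q2)   # Eq. (2.1)
        return q1, q2

    # Same data as the section 4 transcripts: s = 0.4, n1 = 9, n0 = 8
    dual_binomial_em(0.4, 9, 8)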
______________________________________________________________________________
3. EM theory
------------

In Dempster et al., equation (2.17) defines a function Q():

    (3.1)  Q(phi'|phi) = E[ log f(X|phi') | Y,phi ]

where phi and phi' are instances of the parameter vector for which we desire
an ML estimate, X is the complete data, and Y is the observed data.  The
E-step says to compute Q(), and the M-step says to find a parameter vector
phi' which maximizes Q().  Their paper proves that iterative application of
the E-step and M-step converges to an ML estimate of the parameter vector
under certain generally applicable conditions.  The fact that there may be
degenerate cases in which it does not converge to the correct solution was
discussed at the end of the previous section.

To prove that the algorithm presented in the previous section is an EM
algorithm, we must first produce the appropriate Q() function for the
example under consideration.  In our case, phi' is (q1,q2), X is
(n11,n12,n01,n02), and Y is (n1,n0).  For phi we will use (r1,r2),
representing the current estimates of (q1,q2):

    (3.2)  Q(q1,q2|r1,r2) = E[ log(g1(n11;N1,q1) g2(n12;N2,q2)) | n1,n0,r1,r2 ]

Recall the forms of g1() and g2() from Eq.(2.2):

    g1(n11;N1,q1) = c1 q1^n11 (1-q1)^n01
    g2(n12;N2,q2) = c2 q2^n12 (1-q2)^n02

Evaluating the log of the product of g1() with g2() yields:

    (3.3)  Q(q1,q2|r1,r2) = E[ log(c1) + n11 log(q1) + n01 log(1-q1)
                             + log(c2) + n12 log(q2) + n02 log(1-q2)
                             | n1,n0,r1,r2 ]

We can drop the log(c1) and log(c2) terms, since only terms involving q1 and
q2 have any effect in the M-step.  Continuing, and evaluating the expected
value, yields:

    (3.4)  Q(q1,q2|r1,r2) = E[n11|n1,n0,r1,r2] log(q1)
                          + E[n01|n1,n0,r1,r2] log(1-q1)
                          + E[n12|n1,n0,r1,r2] log(q2)
                          + E[n02|n1,n0,r1,r2] log(1-q2)

Formulas for the expected values of n11, n12, n01, and n02 needed here have
already been derived in the E-step of the previous section.  There, we used
(q1,q2) to represent the current estimates; the same estimates are being
used in (3.4), but we are calling them (r1,r2) because this equation is also
a function of the estimates to be evaluated in the next step, which are
called (q1,q2) here.  Applying the formulas from the E-step of the previous
section:

    (3.5)  Q(q1,q2|r1,r2) = n11 log(q1) + n01 log(1-q1)
                          + n12 log(q2) + n02 log(1-q2)

Note that going from (3.4) to (3.5) represents a computation, not a
substitution of notation.  For example, if we were to fully write out the
computation expressed by the n11 log(q1) term in (3.5), it would be:

    ((s r1 n1)/(s r1 + (1-s) r2)) log(q1)

This is why Dempster's E-step says to "compute" Q().  The general form of
Q() is fixed for a specific EM algorithm application, e.g. (3.4) here.  For
each iteration of the E-step, an instantiation of Q() must be computed,
given the current estimates.  For our example, this simply involves
computing the coefficients of the log() terms.  In other problems, computing
Q() may be more difficult.
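To make the idea of computing Q() concrete, a small Python sketch might
build the instantiated Q() of (3.5) as shown below (the helper name make_Q
is arbitrary); the M-step, derived next, then maximizes the returned
function over (q1,q2):

    import math

    def make_Q(s, r1, r2, n1, n0):
        """Build the instantiated Q(q1,q2|r1,r2) of Eq. (3.5): the E-step
        computes the coefficients of the log() terms from (r1, r2)."""
        d1 = s * r1 + (1 - s) * r2
        d0 = s * (1 - r1) + (1 - s) * (1 - r2)
        n11 = s * r1 * n1 / d1
        n12 = (1 - s) * r2 * n1 / d1
        n01 = s * (1 - r1) * n0 / d0
        n02 = (1 - s) * (1 - r2) * n0 / d0
        def Q(q1, q2):
            return (n11 * math.log(q1) + n01 * math.log(1 - q1)
                    + n12 * math.log(q2) + n02 * math.log(1 - q2))
        return Q

    # One E-step instantiation, using s = 0.4, n1 = 9, n0 = 8 and current
    # estimates (r1, r2) = (0.1, 0.2); Q can then be evaluated at any (q1, q2)
    Q = make_Q(0.4, 0.1, 0.2, 9, 8)
    print(Q(0.3, 0.5), Q(0.4, 0.6))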
Continuing, the M-step must produce values for q1 and q2 which maximize Q()
from (3.5).  In this case, we can simply set the derivatives of Q() with
respect to q1 and q2 to zero and solve:

    (3.6)  d(Q(q1,q2|r1,r2))/dq1 = 0 = n11/q1 - n01/(1-q1)
             ==>>  q1 = n11/(n11+n01)

           d(Q(q1,q2|r1,r2))/dq2 = 0 = n12/q2 - n02/(1-q2)
             ==>>  q2 = n12/(n12+n02)

The resulting expressions for the optimal values of q1 and q2 are exactly
the same as were specified in the M-step of the previous section.  Since g1
and g2 are independent, the computations of the ML estimates of q1 and q2
are decoupled.

We should be able to write g() in terms of g1() and g2(), corresponding to
Dempster's Eq.(1.1), where the density function of the observed data is
written as an integral over the density function of the complete data.
Given that n1 = n11 + n12 from Eq.(2.3), the probability mass function of
g() is the convolution of g1() with g2().  But recall that generator g1() is
selected with probability s for each trial, so the total number of 1's and
0's generated by g1() is a random variable, N1.  By conditioning on N1, we
can explicitly derive the result:

    P(n11+n12=n1|N1) = sum k=0 to n1 { P(n11=k|N1) P(n12=n1-k|N1) }

                     = sum k=0 to n1 { g1(k;N1,q1) g2(n1-k;N-N1,q2) }

    g(n1;N,p) = E[ P(n11+n12=n1|N1) ]

              = sum j=0 to N { P(n11+n12=n1|N1=j) P(N1=j) }

where P(N1=j) is binomial in N trials with probability of success s.  We do
not need to evaluate or implement the above equations, so the fact that it
appears very messy to do so is of no consequence.  But note that if q1 = q2,
then g() is binomial with the same probability parameter p = q1 = q2 as g1()
and g2().  The implication of this is that the EM algorithm converges in one
step if q1 = q2, since the ML estimates of q1 and q2 then correspond exactly
to the ML estimate of p and in fact use the same formula for their
computation (i.e. the M-step and Eq.(1.2)).  This is verified by a numerical
example in the following section.
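Although we do not need to implement these equations, they are easy to check
numerically for small N.  A short Python sketch along the following lines
(function names arbitrary) confirms that g() reduces to a plain binomial pmf
when q1 = q2:

    from math import comb

    def binom_pmf(k, n, q):
        """Binomial pmf; zero outside 0 <= k <= n."""
        if k < 0 or k > n:
            return 0.0
        return comb(n, k) * q**k * (1 - q)**(n - k)

    def g_mixture(n1, N, s, q1, q2):
        """g(n1;N,p) evaluated by conditioning on N1, as in the sums above."""
        total = 0.0
        for j in range(N + 1):                  # j plays the role of N1
            conv = sum(binom_pmf(k, j, q1) * binom_pmf(n1 - k, N - j, q2)
                       for k in range(n1 + 1))
            total += conv * binom_pmf(j, N, s)  # P(N1 = j), binomial in s
        return total

    # With q1 = q2 = 0.5, g() should match a plain binomial with p = 0.5
    N, s, q = 17, 0.4, 0.5
    for n1 in range(N + 1):
        assert abs(g_mixture(n1, N, s, q, q) - binom_pmf(n1, N, q)) < 1e-12
    print("g() matches the binomial pmf when q1 = q2")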
______________________________________________________________________________
4. Numerical Examples
---------------------

In the following examples, iter(i) is a function which performs i iterations
of the EM algorithm for the dual binomial problem under consideration.  When
invoked with no arguments it performs just one iteration.  For each
iteration it prints the p value derived from Eq.(2.1) using the estimated
values of q1 and q2.  After the last iteration it prints q1, q2, n11, n12,
n01, and n02.  p_ml() is a function which computes the ML estimate of p
directly from Eq.(1.2).  p from the EM algorithm should converge to this
value.

The first example demonstrates that, starting with q1 = q2, convergence is
obtained after just one iteration:

    s = 0.4
    q1 = 0.5
    q2 = 0.5
    n1 = 9
    n0 = 8

    m:71> iter
    p = 0.5294118
    q1 = 0.5294118
    q2 = 0.5294118
    n11 = 3.6
    n12 = 5.4
    n01 = 3.2
    n02 = 4.8

    m:72> p_ml
    ans = 0.5294118

    m:73> p-p_ml
    ans = 0

    m:74>

The next example, using the same s, n1, and n0 values as above, shows
iterative convergence starting with q1 != q2:

    m:66> q1=0.1; q2=0.2

    m:67> iter
    p = 0.5162193
    q1 = 0.3962264
    q2 = 0.5962145
    n11 = 2.25
    n12 = 6.75
    n01 = 3.428571
    n02 = 4.571429

    m:68> iter
    p = 0.5289048
    q1 = 0.4089445
    q2 = 0.6088784
    n11 = 2.763196
    n12 = 6.236804
    n01 = 3.993701
    n02 = 4.006299

    m:69> iter(8)
    p = 0.5293922
    p = 0.529411
    p = 0.5294117
    p = 0.5294118
    p = 0.5294118
    p = 0.5294118
    p = 0.5294118
    p = 0.5294118
    q1 = 0.4094561
    q2 = 0.6093822
    n11 = 2.784302
    n12 = 6.215698
    n01 = 4.015698
    n02 = 3.984302

    m:70> p-p_ml
    ans = -2.442491e-15

    m:71>

The following example, using the same parameters as the previous example
except for different starting values of q1 and q2, shows that the final
estimates of q1, q2, and the complete data (n11, n12, n01, n02) are not
unique, though the ML estimate of p is unique:

    m:66> q1=0.2; q2=0.1

    m:67> iter
    p = 0.5139612
    q1 = 0.6333879
    q2 = 0.4343434
    n11 = 5.142857
    n12 = 3.857143
    n01 = 2.976744
    n02 = 5.023256

    m:68> iter(8)
    p = 0.5288267
    p = 0.5293897
    p = 0.5294109
    p = 0.5294117
    p = 0.5294118
    p = 0.5294118
    p = 0.5294118
    p = 0.5294118
    q1 = 0.6482015
    q2 = 0.4502186
    n11 = 4.40777
    n12 = 4.59223
    n01 = 2.39223
    n02 = 5.60777

    m:69> p-p_ml
    ans = -6.405987e-14

    m:70>

______________________________________________________________________________
5. Binomial with Noise
----------------------

Let the observed data be a sequence of N 0's and 1's corrupted by additive
Gaussian noise.  For example, with N=4:

    Y = 1.1, 0.9, 0.4, -0.1

In this case we would like to simultaneously determine optimal estimates of
the underlying 0 or 1 value for each element of Y, as well as the ML
estimate of p.  Each value of Y is:

    y = x + z

where x is a Bernoulli random variable with probability p, z is zero-mean
Gaussian noise with variance v, and x and z are independent.  For this EM
formulation, Y is the observed data, X is the complete data, and p is the
parameter for which we desire a maximum-likelihood estimate.

Let G(z;m,v) represent the probability density function for a Gaussian
random variable z with mean m and variance v:

    (5.1)  G(z;m,v) = exp( -(z-m)^2 / (2v) ) / sqrt(2 pi v)

The Bernoulli probability description may be written as a list of possible
values, as a probability mass function, or as a probability density function
using delta functions:

    (5.2)  P(x=1) = p,  P(x=0) = 1-p

           P(x;p) = p^x (1-p)^(1-x)    for x = 0, 1

           b(x;p) = p delta(x-1) + (1-p) delta(x)

The density function for y is easily obtained from the convolution of b(x;p)
with G(z;0,v), producing:

    (5.3)  f(y;p,v) = p G(y;1,v) + (1-p) G(y;0,v)
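A minimal Python sketch of this observation model and the mixture density
(5.3) might look as follows (function names and parameter values are
arbitrary):

    import math
    import random

    def G(z, m, v):
        """Gaussian density, Eq. (5.1)."""
        return math.exp(-(z - m)**2 / (2*v)) / math.sqrt(2*math.pi*v)

    def f(y, p, v):
        """Density of y = x + z, Eq. (5.3): a two-component mixture."""
        return p * G(y, 1, v) + (1 - p) * G(y, 0, v)

    def simulate(N, p, v, seed=0):
        """Generate N samples y = x + z, x ~ Bernoulli(p), z ~ N(0, v)."""
        rng = random.Random(seed)
        x = [1 if rng.random() < p else 0 for _ in range(N)]
        y = [xi + rng.gauss(0, math.sqrt(v)) for xi in x]
        return x, y

    x, y = simulate(50, 0.4, 0.1)
    print(y[:4])              # noisy observations
    print(f(y[0], 0.4, 0.1))  # density of the first observation for p = 0.4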
Given an estimate of p, we must produce estimates of X.  To do so, we could
try to use maximum-likelihood estimates by comparing f(y;p,v|x=0) with
f(y;p,v|x=1) and choosing the larger one.  However, f(y;p,v|x) = G(y;x,v),
which is not a function of p, so this method would not be appropriate to use
in an EM algorithm which is supposed to be finding an ML estimate of p.
Note that checking G(y;1,v) > G(y;0,v) reduces to checking y > 0.5 and does
yield ML estimates of X.  But ML estimates of X do not help us in finding an
ML estimate of p.

So consider examining P(x=0|y) and P(x=1|y) and choosing whichever one is
larger, where:

    (5.4)  P(x=0|y) = P(y|x=0) P(x=0) / P(y)

           P(x=1|y) = P(y|x=1) P(x=1) / P(y)

Substituting f(y;p,v) for P(y) and G(y;x,v) for P(y|x) we obtain:

    (5.5)  P(x=0|y) = (1-p) G(y;0,v) / f(y;p,v)

           P(x=1|y) = p G(y;1,v) / f(y;p,v)

In the E-step of the EM algorithm, we could evaluate the probabilities from
(5.5) and choose the value of x corresponding to the larger probability.
However, since the actual values of the probabilities in (5.5) do not
matter, only their relative sizes, we can simplify this step as follows.  If
P(x=1|y) > P(x=0|y) we will choose x = 1; otherwise we will choose x = 0.
P(x=1|y) > P(x=0|y) can be reduced:

    (5.6)  P(x=1|y) > P(x=0|y)

           p G(y;1,v) > (1-p) G(y;0,v)

           p exp( -(y-1)^2 / (2v) ) > (1-p) exp( -y^2 / (2v) )

           log(p) - (y-1)^2/(2v) > log(1-p) - y^2/(2v)

Rearrange terms:

    (5.7)  y > 0.5 + v (log(1-p) - log(p))

Equation (5.7) expresses the decision step for choosing x = 0 or x = 1 from
(5.5) as a threshold decision on y.  If y is greater than the threshold,
then the estimate for x is 1; otherwise it is 0.

We will assume that v (the variance of the noise) is fixed and known.  If v
were not known, it could be included as a parameter to be determined by the
EM algorithm and then estimated in the E-step.  Since
var(Y) = var(X) + var(Z), we have

    var(Y) = p (1-p) + v

which can be solved for

    v = var(Y) - p (1-p)

where var(Y) is the variance of the observed data Y.  This estimate of v
could be problematic in practice, since it may come out negative depending
on the data values and the current estimate of p.  Also, no investigation
has been done here to prove that such an estimate of v would be optimal in
any sense.

Given estimates for X, we could set n1 as the count of the number of 1's in
X, and then use the M-step of the EM algorithm to produce a maximum
likelihood estimate of p using Eq.(1.2).  However, the algorithm just
described would not really be an EM algorithm, because the E-step is not
using expected values.  This flaw was revealed by numerical simulations in
which p converged to either 0 or 1 instead of the correct ML value in cases
of low SNR.

Before specifying the correct EM algorithm for this problem, we will first
show that the estimates from Eq.(5.7) are equivalent to those from a
thresholded expected-value approach.  To estimate X using expected values,
given p:

    (5.8)  E[x|y] = (0) P(x=0|y) + (1) P(x=1|y) = P(x=1|y)

So, using P(x=1|y) from Eq.(5.5):

    (5.9)  E[x|y] = p G(y;1,v) / f(y;p,v)

To generate n1, the count of the number of 1's in X needed for the M-step,
suppose we compare E[x|y] with 0.5 for each x, counting it as a 1 if E[x|y]
exceeds 0.5 and as a 0 otherwise.  Then we have:

    (5.10)  E[x|y] > 0.5

            p G(y;1,v) / f(y;p,v) > 0.5

            p G(y;1,v) > 0.5 f(y;p,v)

            p G(y;1,v) > 0.5 [ p G(y;1,v) + (1-p) G(y;0,v) ]

            p G(y;1,v) > (1-p) G(y;0,v)

At this point, note that the result is the same as the second step in (5.6),
so this formulation leads to the same threshold equation as (5.7).  In this
case, we have not estimated the X values as being 0 or 1.  We have just
estimated their expected values, so for each x we have E[x|y] as a number in
the range 0.0 to 1.0.  However, to produce n1 as an integer count, we made
the assumption that if E[x|y] > 0.5 then x is 1, otherwise x is 0.

To summarize the results so far, we know that determination of 0 and 1
values for X using either P(x=1|y) > P(x=0|y) or E[x|y] > 0.5 leads to the
same threshold decision on y, Eq.(5.7).
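As a quick numerical sanity check, a small Python snippet (with arbitrary
parameter values) can verify on a grid of y values that the posterior
comparison of (5.6), the E[x|y] > 0.5 rule of (5.10), and the threshold test
of (5.7) all make the same decision:

    import math

    def G(z, m, v):
        return math.exp(-(z - m)**2 / (2*v)) / math.sqrt(2*math.pi*v)

    def decide_posterior(y, p, v):
        """x = 1 if P(x=1|y) > P(x=0|y), as in (5.6)."""
        return 1 if p * G(y, 1, v) > (1 - p) * G(y, 0, v) else 0

    def decide_expectation(y, p, v):
        """x = 1 if E[x|y] > 0.5, as in (5.10)."""
        fy = p * G(y, 1, v) + (1 - p) * G(y, 0, v)
        return 1 if p * G(y, 1, v) / fy > 0.5 else 0

    def decide_threshold(y, p, v):
        """x = 1 if y exceeds the threshold of Eq. (5.7)."""
        return 1 if y > 0.5 + v * (math.log(1 - p) - math.log(p)) else 0

    p, v = 0.3, 0.5
    for i in range(-200, 300):
        y = i / 100.0
        d = decide_posterior(y, p, v)
        assert d == decide_expectation(y, p, v) == decide_threshold(y, p, v)
    print("all three decision rules agree")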
To continue, and produce a correct EM algorithm, we must assume that n1 does
not have to be an integer.  If X contains all 0 and 1 values, n1 can be
written as sum(X).  If we use the non-integral E[x|y] values as estimates of
X, then we can estimate n1 as sum(E[X|Y]).  Thus, when we write n1, we
really mean the expected value of n1.  So the EM algorithm is:

    E-step:  E[x|y] = p G(y;1,v) / f(y;p,v)

             n1 = sum(E[X|Y])

    M-step:  p = n1/N

To prove that this is an EM algorithm, consider the Q() function from (3.1):

    Q(phi'|phi) = E[ log f(X|phi') | Y,phi ]

In this example, phi' is p, and for phi we will use r, representing the
current estimate of p.  For f(X|p) we substitute P(X;p), using (5.2) for
P(xi;p), i = 1..N:

    (5.11)  Q(p|r) = E[ log P(X;p) | Y,r ]

    (5.12)  P(X;p) = P(x1;p) P(x2;p) ... P(xN;p)

                   = p^x1 (1-p)^(1-x1) p^x2 (1-p)^(1-x2) ... p^xN (1-p)^(1-xN)

                   = p^S (1-p)^(N-S)    where S = x1 + x2 + ... + xN

Evaluating the log of P(X;p) yields:

    (5.13)  Q(p|r) = E[ S log(p) + (N-S) log(1-p) | Y,r ]

Evaluating the expected value yields:

    (5.14)  Q(p|r) = E[S|Y,r] log(p) + (N - E[S|Y,r]) log(1-p)

            with  E[S|Y,r] = E[x1|Y,r] + E[x2|Y,r] + ... + E[xN|Y,r]

The formula for E[S|Y,r] needed here has already been derived in the E-step
above.  There, we used p to represent the current parameter estimate; the
same estimate is being used in (5.14), but we are calling it r because this
equation is also a function of the estimate to be evaluated in the next
step, which is called p here.  Rewriting the formula from the E-step above
in terms of the notation being used here, we have:

    (5.15)  E[xi|Y,r] = r G(yi;1,v) / f(yi;r,v)    i = 1..N

            n1 = E[S|Y,r] = sum i=1 to N { E[xi|Y,r] }

Substituting the result of the computations from (5.15) into (5.14), and
letting n0 = N - n1, we have:

    (5.16)  Q(p|r) = n1 log(p) + n0 log(1-p)

The M-step must produce the value of p which maximizes Q() in (5.16).  Set
the derivative of Q() with respect to p to 0 and solve for p:

    (5.17)  d(Q(p|r))/dp = 0 = n1/p - n0/(1-p)

              ==>>  p = n1/(n1+n0)

which is exactly the same formula as the M-step previously specified, and
confirms that the algorithm derived in this section is an EM algorithm.

Following are some numerical examples using N = 50 and p = 0.4, with 0.5 as
the initial estimate of p.  To check the values of p produced by the EM
algorithm, the function p_ml() computes the ML estimate of p directly by
numerically maximizing f(Y;p,v) with respect to p, where

    f(Y;p,v) = f(y1;p,v) f(y2;p,v) ... f(yN;p,v)

(The implementation of p_ml() actually minimizes -log(f(Y;p,v)) by first
evaluating it on a grid of 1001 points of p from 0.0 to 1.0 to get an
initial estimate of the minimum; then an iterative interval-halving method
is used to produce an estimate of the minimum with a relative error
tolerance of 1E-12.)

In this example, v = 0.1 (SNR = 3.8 dB):

    p = 0.5

    m:271> iter(5)
    p = 0.3717019
    p = 0.3527167
    p = 0.3497434
    p = 0.349271
    p = 0.3491957

    m:272> pml = p_ml()

    m:273> pml
    pml = 0.3491815

    m:274> p-pml
    ans = 1.426443e-05

    m:275> sum((ex>0.5)!=x_save)
    ans = 3

    m:276> n1
    n1 = 17.45979

    m:277> sum(x_save)
    ans = 17

After 5 iterations, the algorithm reached a value of p very close to the ML
estimate.  After convergence, we can estimate X as 0 or 1 values by
comparing ex (representing E[x|y]) with 0.5, which shows in this case that 3
of the estimated X values differ from the original generated data (x_save).
Note that the final estimate of n1 is not an integral value, but ends up
close to the actual number of 1's in the original generated data.
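For concreteness, a self-contained Python sketch of this E-step and M-step,
together with a simplified grid-only stand-in for p_ml() (no
interval-halving refinement), might look as follows; since the data are
regenerated here with an arbitrary seed, the numbers will not match the
transcripts exactly:

    import math
    import random

    def G(z, m, v):
        return math.exp(-(z - m)**2 / (2*v)) / math.sqrt(2*math.pi*v)

    def em_noisy(y, v, p=0.5, iters=5):
        """EM iterations: E-step computes E[x|y], M-step sets p = n1/N."""
        N = len(y)
        for _ in range(iters):
            ex = [p * G(yi, 1, v) / (p * G(yi, 1, v) + (1 - p) * G(yi, 0, v))
                  for yi in y]                  # E[x_i | y_i] at the current p
            n1 = sum(ex)                        # expected count of 1's
            p = n1 / N                          # M-step
            print("p =", p)
        return p, ex

    def p_ml_grid(y, v, points=1001):
        """Direct ML estimate of p: maximize log f(Y;p,v) over a grid of p."""
        def loglik(p):
            return sum(math.log(p * G(yi, 1, v) + (1 - p) * G(yi, 0, v))
                       for yi in y)
        grid = [i / (points - 1) for i in range(1, points - 1)]  # skip p = 0, 1
        return max(grid, key=loglik)

    # Generate data as in the examples: N = 50, p = 0.4, v = 0.1
    rng = random.Random(1)
    x_save = [1 if rng.random() < 0.4 else 0 for _ in range(50)]
    y = [xi + rng.gauss(0, math.sqrt(0.1)) for xi in x_save]
    p_em, ex = em_noisy(y, 0.1)
    print(p_em, p_ml_grid(y, 0.1), sum(x_save))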
Following is an example with low SNR, using v = 0.5 (SNR = -3.2 dB):

    p = 0.5

    m:271> iter(5)
    p = 0.4272446
    p = 0.3843555
    p = 0.3591917
    p = 0.3443184
    p = 0.3354554

    m:272> iter(10)
    p = 0.3301409
    p = 0.3269405
    p = 0.3250079
    p = 0.3238389
    p = 0.323131
    p = 0.322702
    p = 0.322442
    p = 0.3222843
    p = 0.3221887
    p = 0.3221307

    m:273> pml=p_ml()

    m:274> pml
    pml = 0.3220413

    m:275> p-pml
    ans = 8.942352e-05

    m:276> sum((ex>0.5)!=x_save)
    ans = 10

    m:277> n1
    n1 = 16.10653

    m:278> sum(x_save)
    ans = 17

This example converged slowly, but still reached an accurate ML estimate of
p.  However, the final estimates of X are not very good, since 10 out of 50
of them are wrong.  Nevertheless, n1 is fairly close to the number of 1's in
the original generated data.