These are bad/wrong sections from an earlier version of NOTES/normalization
which I'm just keeping here for reference, lest someone make the same
mistakes again.

---

The following normalization attempt is wrong.  Normalization is only needed
if P(K) is not uniform, and simply involves multiplying by P(K).

---

3. Attempt to Normalize the Probabilities
-----------------------------------------

Let's attempt to normalize the probabilities in this example.  For each
value of K, we define the normalization value as the sum of the
probabilities of all possible combinations and permutations of the
measurement associations, including missed target events.  This includes
measurement association combinations, permutations, and missed target
events which are not actually considered as possibilities in the Viterbi
initialization step (for the reasons mentioned above: the required Kalman
filter initializations and the elimination of redundant paths), but which
are nevertheless actually possible.

The result, from track/normalization.m, is:

    K     cost        P
    -     ----        -
    0    0.00000    1.000000
    1    1.15268    0.315789
    2    2.10006    0.122449
    3    2.78166    0.061935

This indicates that K=0 is the best choice.  In fact, the normalized
probability for K=0 will always turn out to be 1, regardless of the a
priori probabilities of missed target events and false alarms, so K=0 will
always be the best choice: under the assumption that K=0 (no targets
present), there is only one probability involved (the Poisson probability
of all measurements being false alarms), and after normalization that
becomes 1.  This is a circular argument: if we assume K=0, then the
probability that K=0 comes out to be 1.  Perhaps we also need a priori
probabilities for K?

---

The following EM approach is wrong since it does not identify what the
complete data is.  A correct approach would try to estimate the complete
data, such as the number of missed target events vs. time, not just
Pmiss.

---
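As an aside on the table in Section 3 above: the P column appears to be
exp(-cost), i.e. the cost appears to be a negative log probability.  That
relation is inferred from the numbers, not stated anywhere in these notes.
A quick sketch under that assumption reproduces the P column and makes
explicit why K=0, with cost 0, always normalizes to probability 1:

```python
import math

# Costs from track/normalization.m.  P = exp(-cost) is an assumption
# inferred from the table (cost read as a negative log probability).
costs = [0.00000, 1.15268, 2.10006, 2.78166]

for K, cost in enumerate(costs):
    # Reproduces the P column: 1.000000, 0.315789, 0.122449, 0.061935
    print("K=%d  P=%.6f" % (K, math.exp(-cost)))
```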
6. EM Approach
--------------

Let's perform some initial investigation of a possible EM approach, using a
pragmatic implementation rather than deriving all of the necessary
prerequisite equations, such as Dempster's Q() function.  If this turns out
well, then we should go back and perform a more rigorous derivation.

If K, Pmiss, and false_alarm_rate are all unknown, I think that deriving a
correct solution will be impossible: various correct and incorrect
estimates of K, Pmiss, and false_alarm_rate will probably lead to the same
global maximum of P(K|MM).  Of course, as long as the estimate of K comes
out right, the estimated values of Pmiss and false_alarm_rate may not be
very important.  Still, Pmiss and false_alarm_rate are used in the Viterbi
cost calculations, so they are important.

In any case, to simplify this initial investigation, let's assume that
false_alarm_rate is known, and only Pmiss and K are unknown.  We'll start
with some initial estimate of Pmiss, then iteratively compute ML estimates
of K and expected values of Pmiss.

Given an ML estimate of K, it would seem that the best way to estimate
Pmiss would be to use the results of the Viterbi algorithm for that value
of K, since that gives us an explicit list of target tracks, including
missed target events.  However, for this simplified initial investigation,
let's just try to estimate K and Pmiss using the ML estimate of K, with no
Viterbi algorithm.

Given a value for Pmiss, the ML solution for K was already derived above.
To perform the EM algorithm, all we now need is a way to compute the
expected value of Pmiss given K.
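Before deriving that expected value, here is a sketch of the overall
alternation.  ml_estimate_K is passed in as a callable standing in for the
ML solution for K derived above (not reproduced here, so it is a
hypothetical placeholder); the Pmiss update is the expected-value formula
derived just below:

```python
def em_estimate(MM, false_alarm_rate, Pmiss0, ml_estimate_K, max_iter=20):
    """Alternate an ML estimate of K with an expected value of Pmiss.

    ml_estimate_K is a callable standing in for the ML solution for K
    derived earlier in these notes (not reproduced here); the Pmiss
    update is E[#missed]/K = (K + false_alarm_rate - mean(MM)) / K.
    """
    Pmiss = Pmiss0
    K_prev = None
    for _ in range(max_iter):
        K = ml_estimate_K(MM, Pmiss, false_alarm_rate)
        if K == K_prev:          # same K as last iteration: converged
            break
        K_prev = K
        if K > 0:                # Pmiss is not needed when K == 0
            Pmiss = (K + false_alarm_rate - sum(MM) / len(MM)) / K
    return K, Pmiss
```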
This can be done as follows:

    E[MM(i)] = K + E[#false detections] - E[#missed target events]

So for each i=1..N:

    estimated #missed target events = K + false_alarm_rate - MM(i)

and:

    E[#missed target events] = K + false_alarm_rate - sum(MM)/N

Given the Binomial distribution of missed target events:

    E[#missed target events] = K * Pmiss

So we can estimate Pmiss as:

    Pmiss = (K + false_alarm_rate - sum(MM)/N) / K

This applies for K>0.  For K=0, Pmiss is not needed, since there are no
targets.

Checking a numerical example, using 2 targets with N=21, Pmiss=0.2, and
false_alarm_rate=2 to generate the data, we have:

    MM = [4 3 2 2 5 9 5 4 4 0 3 3 5 2 2 5 2 3 3 2 2]

Using an initial estimate of Pmiss=0.6, the ML estimate of K is 3.  Using
K=3, E[Pmiss]=0.55556.  Using that expected value of Pmiss and redoing the
ML estimate of K, we again get K=3, so the EM algorithm has converged, but
the solution does not agree with the generated data.

Viterbi results, just pasted in here but not analyzed:

    octave:132> [combos(:,path_index(:,1))' n1' n2' n_false_detect']
    ans =

        1    2    4    1    1    2
        1    2   10    1    1    1
        1   10   10    0    1    1
        1    2   10    1    1    0
        1    2   10    1    1    3
        1    2    4    1    1    7
        1    2   10    1    1    3
        1    2   10    1    1    2
        1    2   10    1    1    2
       10   10   10    0    0    0
        2   10   10    0    1    2
        1    2   10    1    1    1
        1   10   10    1    0    4
        1   10   10    1    0    1
        1   10   10    1    0    1
        1   10   10    1    1    3
        1   10   10    1    0    1
        1   10   10    1    0    2
        1   10   10    1    0    2
       10   10   10    0    1    1
        1   10   10    1    1    0

    octave:133> L
    L = 4
    octave:134> w_min
    w_min =

       188.53   188.84   189.00   189.11

    octave:135> Pmiss
    Pmiss = 0.60000
    octave:136>

Starting over, with an initial estimate of Pmiss=0.4, the ML estimate of K
is 2.  Using K=2, E[Pmiss]=0.33333.  Using that expected value of Pmiss and
redoing the ML estimate of K, we again get K=2, so the EM algorithm has
converged to the correct value of K in this case.

Clearly, this EM algorithm converges quickly but is sensitive to the
initial estimate of Pmiss.
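The Pmiss update and the two runs above can be checked directly; everything
here comes from the notes' own example (MM, false_alarm_rate=2, and the
estimator Pmiss = (K + false_alarm_rate - sum(MM)/N) / K):

```python
# Measurement counts generated with 2 targets, N=21, Pmiss=0.2,
# false_alarm_rate=2 (the example above).
MM = [4, 3, 2, 2, 5, 9, 5, 4, 4, 0, 3, 3, 5, 2, 2, 5, 2, 3, 3, 2, 2]
false_alarm_rate = 2

def estimate_pmiss(K):
    # Pmiss = (K + false_alarm_rate - sum(MM)/N) / K, valid for K > 0
    return (K + false_alarm_rate - sum(MM) / len(MM)) / K

print(estimate_pmiss(3))  # ~0.55556: the run that converged to K=3
print(estimate_pmiss(2))  # ~0.33333: the run that converged to K=2
```

Note how strongly the estimate depends on K, which is one way to see the
sensitivity to the initial Pmiss noted above.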