0. Introduction
---------------

Here we consider using Viterbi cost results vs. K to determine an optimal
estimate of the number of tracks. First, the initial cost assignments are
examined for an example with just one measurement set in sections 1 and 2.
Then section 3 presents the formulation of the general tracking problem.
Sections 4 and 5 consider estimating K based only on the numbers of
measurements vs. time, without using the X,Y measurement values, and compare
this with the Viterbi probability estimates. Section 6 derives an optimal
estimate of K using all of the available data, and section 7 presents some
numerical results.

1. Initial Viterbi Cost Assignments
-----------------------------------

Consider the initial cost assignments, using one set of measurements with
3 X,Y values, i.e. N=1, MM(1)=3.

Let mi=1,2,3 represent the measurement index used for track #i, with mi=4
indicating a missed target event. Let n_false represent the number of false
alarms, Pmiss the apriori probability of a missed detection, and
p_poisson(n,rate) the Poisson probability of n events given a rate (the
apriori rate of false detections).

If K=0:

  P = p_poisson( 3, false_alarm_rate)

If K=1:

  m1  n_false  P
  --  -------  -
  1   2        (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  2   2        (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  3   2        (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  4   3        Pmiss * p_poisson( 3, false_alarm_rate)

We can see in this case that the associations of measurement #1, 2, or 3 with
track #1 all have the same probability. The missed target event has a
different probability. These probabilities do not sum to 1.

Furthermore, the missed target event is not allowed in the current simulation
initialization step, since that would mean that no Kalman filters would be
initialized. For missed target events, the Kalman filter prediction from the
previous stage is used, but in this initialization step there is no previous
stage.
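The K=0 and K=1 entries above can be cross-checked numerically with a minimal
Python sketch (Python is used here only for illustration; the p_poisson helper
mirrors the definition in the text and is not part of the track/ scripts):

```python
from math import exp, factorial

def p_poisson(n, rate):
    # Poisson probability of n events given the apriori false-alarm rate
    return exp(-rate) * rate**n / factorial(n)

Pmiss = 0.2
false_alarm_rate = 2

# K=0: all 3 measurements are false alarms
P_K0 = p_poisson(3, false_alarm_rate)

# K=1, m1 in {1,2,3}: one detected target plus 2 false alarms
P_K1_detect = (1 - Pmiss) * p_poisson(2, false_alarm_rate)

# K=1, m1=4: missed target event, all 3 measurements are false alarms
P_K1_miss = Pmiss * p_poisson(3, false_alarm_rate)
```

With Pmiss=0.2 and false_alarm_rate=2 these evaluate to the probabilities
listed in the numerical example of section 2.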
So we are really only considering the first three lines from the above table.

If K=2:

  m1  m2  n_false  P
  --  --  -------  -
  1   2   1        (1-Pmiss)**2 * p_poisson( 1, false_alarm_rate)
  1   3   1        (1-Pmiss)**2 * p_poisson( 1, false_alarm_rate)
  2   3   1        (1-Pmiss)**2 * p_poisson( 1, false_alarm_rate)
  1   4   2        Pmiss * (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  2   4   2        Pmiss * (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  3   4   2        Pmiss * (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  4   4   3        Pmiss**2 * p_poisson( 3, false_alarm_rate)

In general, for K>1, all possible permutations of the association of
measurement indices with tracks are considered. But, in the initialization
step we are examining here, only the first permutation is allowed by the
Viterbi cost function. This prevents the algorithm from producing redundant
paths which simply have all measurements interchanged between the detected
tracks. For later stages of the algorithm, all permutations are considered,
so, for example, besides the m1=1, m2=2 entry, there would also be an entry
for m1=2, m2=1. Also, as noted above for K=1, missed target events are not
allowed in the initialization step, so the lines in the table above having a
4 for m1 or m2 are not used.

If K=3:

  m1  m2  m3  n_false  P
  --  --  --  -------  -
  1   2   3   0        (1-Pmiss)**3 * p_poisson( 0, false_alarm_rate)

Here, I have omitted the redundant path associations and lines representing
missed target events which would not be used in the initialization step.
Thus, with 3 measurements and K=3, only one possibility is allowed. With 3
initial measurements, K>3 is not allowed, so we can stop here with the
theoretical analysis of this example.

2. Numerical Example of Initial Viterbi Cost Assignments
--------------------------------------------------------

Let's now look at it numerically.
Using Pmiss=0.2, false_alarm_rate=2, and showing costs as -log(P), after the
Viterbi algorithm initialization using a single set of 3 measurements (the
actual measurement values do not matter, since no Kalman filter predictions
are used in the initialization step):

  K  cost    P(Viterbi)
  -  ----    ----------
  0  1.7123  0.180450
  1  1.5300  0.216536
  2  1.7531  0.173236
  3  2.6694  0.069294

A question arises: since the various probabilities for different values of K
were not normalized to sum to 1, can we really directly compare these costs
and say that K=1 is the best choice because it has the lowest cost? The
following section answers this question and shows that the Viterbi
probabilities vs. K can be directly compared if P(K) has a uniform
distribution.

3. General Tracking Problem Formulation
---------------------------------------

The Bayesian probability function we want to maximize for the general
tracking problem is:

  P(tau|Z) = P(Z|tau) * P(tau) / P(Z)

where tau represents a choice of K and a set of measurement-target
associations or tracks, including possible missed target events and false
detections. Z=(Z1,Z2,...,ZN) represents all of the measurement data, with Zi
being the MM(i) measurements at time i=1..N.

P(Z|tau) is computed from Kalman filter innovations for measurements
corresponding to targets, or from the uniform distribution of false
detections. P(tau) is the apriori probability of a set of tracks, and can be
computed if the probability of missed detections and the false alarm rate are
known. P(Z) acts basically as a normalization factor, since it is the sum of
all possible numerator terms for P(tau|Z), and serves to make the possible
values of P(tau|Z) sum to 1. Since P(Z) is independent of tau, it can be
ignored. The normalization question from the previous section concerns a
factor which might be needed for different values of K.
Since any instance of the K-path Viterbi algorithm presumes K given, P(tau|Z)
can be written as:

  P(tau|Z) = P(Z|tau',K) * P(tau'|K) * P(K) / P(Z)

where tau=(tau',K) partitions tau into a track association part, tau', and
the number of tracks part, K. P(Z|tau',K)*P(tau'|K) is the probability
produced and maximized by the Viterbi algorithm for each K. So clearly, if
P(K) has a uniform distribution, no normalization of Viterbi probabilities is
needed, and values of P(Z|tau',K)*P(tau'|K) for different values of K can be
directly compared. If P(K) does not have a uniform distribution, then the
P(Z|tau',K)*P(tau'|K) values would have to be normalized by multiplying by
P(K).

Note that for the Viterbi initialization step, using the first set of MM(1)
measurements, the P(Z1|tau',K) factor is treated as 1, and only P(tau'|K) is
used for the initial cost. The various track initiation probabilities are
computed based only on the number of measurements, as shown in section 1
above. However, the measurement values Z1 are used to initialize the Kalman
filters.

4. Bayesian Estimate of K for N=1
---------------------------------

To directly compute an optimal estimate of K given one set of MM
measurements, write P(K|MM) using Bayes rule:

  P(K|MM) = P(MM|K) * P(K) / P(MM)

where P(MM) is defined by the rule of total probability:

  P(MM) = sum K=0..K_max { P(MM|K) * P(K) }

We are assuming that K_max is the maximum possible value of K simply for
notational convenience here; K_max could be infinite, so there is no loss of
generality in this notation. P(MM) acts basically as a normalization factor,
since it is the sum of all possible numerator terms for P(K|MM), and serves
to make the possible values of P(K|MM) sum to 1.

We need values for P(K), K=0..K_max, the apriori probabilities for the
occurrence of K targets.
If P(K) has a uniform distribution, then P(K)/P(MM) is a constant independent
of K; in this case, finding the maximum of P(K|MM) is the same as finding the
maximum of P(MM|K). Note that the probabilities used to initialize the costs
in the Viterbi algorithm come from P(tau'|K) and are not the same as P(MM|K),
which we will discuss below.

Continuing, P(MM|K) is given by:

  P(MM|K) = sum j=max(0,MM-K):MM { p_binomial( K, K+j-MM, Pmiss)
                                   * p_poisson( j, false_alarm_rate) }

where j is the number of false alarms, K+j-MM is the number of missed target
events, and:

  p_binomial(K,m,p) = C(K,m) * p**m * (1-p)**(K-m)

where C(K,m) is the number of combinations of K things taken m at a time,
and for the degenerate case of K=0, p_binomial(0,0,p) == 1.

Checking this numerically, using Pmiss=0.2, false_alarm_rate=2, MM=3, and
K_max=3 as in the previous examples, and assuming a uniform distribution for
P(K) (using track/ML.m):

  K  P(K|MM)  P(MM|K)  P(K)  P(Viterbi)
  -  -------  -------  ----  ----------
  0  0.20032  0.18045  0.25  0.180450
  1  0.28045  0.25263  0.25  0.216536
  2  0.29647  0.26706  0.25  0.173236
  3  0.22276  0.20066  0.25  0.069294

The last column just relists the probabilities which would be used in the
Viterbi cost initialization from section 2 above for comparison. Note that
both P(K|MM) and P(MM|K) have a maximum for K=2, and K=2 would be the optimal
Bayesian estimate of K for this example.

For K=0, both P(MM|K) and the Viterbi probability are equal to
p_poisson( MM, false_alarm_rate).

For K=1:

  P(Viterbi) = (1-Pmiss) * p_poisson( 2, false_alarm_rate)

  P(MM|K) = (1-Pmiss) * p_poisson( 2, false_alarm_rate)
            + Pmiss * p_poisson( 3, false_alarm_rate)

That is, P(MM|K) is the total probability that MM=3 given K=1, considering
both possible cases of 0 or 1 missed target event. But P(Viterbi) is the
probability of a specific combination of measurement events, namely 1 target
event and 2 false alarms. The Viterbi initialization step does not allow
missed target events, for reasons discussed above.
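The P(MM|K) formula and the table above can be reproduced with a short Python
sketch (the function names are illustrative, not the actual track/ML.m code):

```python
from math import comb, exp, factorial

def p_poisson(n, rate):
    # Poisson probability of n events given a rate
    return exp(-rate) * rate**n / factorial(n)

def p_binomial(K, m, p):
    # probability of m missed detections out of K targets;
    # for the degenerate case, p_binomial(0, 0, p) == 1
    return comb(K, m) * p**m * (1 - p)**(K - m)

def p_mm_given_k(MM, K, Pmiss, rate):
    # sum over j = number of false alarms; K+j-MM = number of misses
    return sum(p_binomial(K, K + j - MM, Pmiss) * p_poisson(j, rate)
               for j in range(max(0, MM - K), MM + 1))

Pmiss, rate, MM, K_max = 0.2, 2, 3, 3
likelihood = [p_mm_given_k(MM, K, Pmiss, rate) for K in range(K_max + 1)]

# with uniform P(K), the posterior P(K|MM) is just the normalized likelihood
posterior = [p / sum(likelihood) for p in likelihood]
```

The `likelihood` list matches the P(MM|K) column and `posterior` matches the
P(K|MM) column, with the maximum of both at K=2.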
However, in later steps of the Viterbi algorithm, where missed target events
are allowed, P(Viterbi) will still not be the same as P(MM|K), because each
combination of measurement events is considered separately. That is, the
Viterbi algorithm would consider two separate probabilities:

  P1 = (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  P0 = Pmiss * p_poisson( 3, false_alarm_rate)

and each of these would be associated with separate state transitions.

In the previous tables showing numerical Viterbi initialization
probabilities, only a single largest probability was shown, since that is
what would be chosen as the best solution to the Viterbi trellis. However, at
the end of the Viterbi algorithm, we actually have a set of costs,
representing the different choices for associating measurements with tracks.
For this example, with K=1 and only 1 set of 3 measurements, the final
Viterbi cost array, in terms of probabilities, is:

  m1  P(Viterbi)
  --  ----------
  1   0.21654
  2   0.21654
  3   0.21654
  4   0.00000

where m1=i means measurement i is chosen for track #1. Since the first three
probabilities are equal, this indicates that any of the 3 measurements are
equally likely to be associated with the single track. m1=4, representing the
missed target event, was not allowed, thus has a probability of 0.

For K=2:

  P(Viterbi) = (1-Pmiss)**2 * p_poisson( 1, false_alarm_rate)

  P(MM|K) = (1-Pmiss)**2 * p_poisson( 1, false_alarm_rate)
            + 2 * Pmiss * (1-Pmiss) * p_poisson( 2, false_alarm_rate)
            + Pmiss**2 * p_poisson( 3, false_alarm_rate)

As before, the Viterbi algorithm does not allow missed target events during
initialization, so the only probability used corresponds to 2 detected
targets and 1 false alarm. And P(MM|K) is the total probability, including
all possible combinations of missed target events.
The complete set of probabilities produced by the Viterbi algorithm for this
case, for the different possible associations of measurements with tracks,
is:

  m1  m2  P(Viterbi)
  --  --  ----------
  1   2   0.17323
  1   3   0.17323
  2   3   0.17323
  1   4   0.00000
  2   4   0.00000
  3   4   0.00000
  4   4   0.00000

So the first three possibilities, representing association of different
combinations of 2 out of the 3 measurements with the 2 tracks, are equally
likely. And the missed target event possibilities have 0 probability, since
they are not allowed in the Viterbi initialization step.

During the Viterbi initialization, only one permutation of the measurement
associations with tracks is allowed. But in general, during later stages of
the Viterbi algorithm, all possible permutations are considered, thus
additional probabilities would be considered. For example, besides the m1=1,
m2=2 case in the first line of the table above, the case of m1=2, m2=1 would
also be considered.

For K=3:

  P(Viterbi) = (1-Pmiss)**3 * p_poisson( 0, false_alarm_rate)

  P(MM|K) = (1-Pmiss)**3 * p_poisson( 0, false_alarm_rate)
            + 3 * Pmiss * (1-Pmiss)**2 * p_poisson( 1, false_alarm_rate)
            + 3 * Pmiss**2 * (1-Pmiss) * p_poisson( 2, false_alarm_rate)
            + Pmiss**3 * p_poisson( 3, false_alarm_rate)

The complete set of probabilities produced by the Viterbi algorithm for this
case, for the different possible associations of measurements with tracks,
is:

  m1  m2  m3  P(Viterbi)
  --  --  --  ----------
  1   2   3   0.06929
  1   2   4   0.00000
  1   3   4   0.00000
  2   3   4   0.00000
  1   4   4   0.00000
  2   4   4   0.00000
  3   4   4   0.00000
  4   4   4   0.00000

It is clear that P(Viterbi) differs from P(MM|K) for N=1, and it is not clear
which represents the best solution. Certainly, with the Viterbi algorithm, we
are not only estimating the number of tracks, but also the association of
measurements with tracks, which is the overall problem we are trying to
solve.
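Because the initialization step keeps only one permutation per measurement
subset and disallows misses, the nonzero entries in the K=2 table above are
just the combinations of 2 of the 3 measurements. A small Python sketch of
this enumeration (variable names are illustrative):

```python
from itertools import combinations
from math import exp, factorial

def p_poisson(n, rate):
    return exp(-rate) * rate**n / factorial(n)

Pmiss, rate, MM, K = 0.2, 2, 3, 2

# Initialization: each allowed association is a combination of K of the
# MM measurements assigned to the K tracks; the rest are false alarms.
table = {}
for assoc in combinations(range(1, MM + 1), K):
    n_false = MM - K
    table[assoc] = (1 - Pmiss)**K * p_poisson(n_false, rate)
```

This yields three equal probabilities of about 0.17323, one for each
combination, matching the first three lines of the K=2 table.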
But in the degenerate case of N=1, the various possible measurement-to-track
associations all have the same probability for a given value of K, so
P(Viterbi) doesn't really help to decide on the best measurement-track
associations.

5. Bayesian Estimate of K for N>1
---------------------------------

The analysis in the previous section for N=1 is of some interest because
neither the direct Bayesian estimate of K nor the Viterbi algorithm uses the
measurement X,Y values; they only use MM, the number of measurements. We can
extend the Bayesian estimate of K to the case of N>1, again using only the
number of measurements at each time and not using the measurement X,Y values.
For N>1, the Viterbi algorithm will use the measurement X,Y values, and
should therefore give a better solution. Nevertheless, a separate easily
computed estimate of K may be of some use in cases where the number of tracks
may be changing with time, so it seems worth investigating.

Consider having N measurement sets, so MM is a vector and MM(i) is the number
of measurements at time i=1..N. P(K|MM) is given by Bayes rule:

  P(K|MM) = P(MM(1),MM(2),...,MM(N)|K) * P(K) / P(MM)

If the numbers of measurements at each time are independent, we have:

  P(K|MM) = P(MM(1)|K) * P(MM(2)|K) * ... * P(MM(N)|K) * P(K) / P(MM)

where P(MM(i)|K) can be computed using the formula from the previous section.
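The product formula above can be sketched in a few lines of Python. The
measurement-count vector MM used here is hypothetical (the actual Kx9 data is
not reproduced), chosen to resemble 2 tracks with Pmiss=0.1 and
false_alarm_rate=1:

```python
from math import comb, exp, factorial

def p_poisson(n, rate):
    return exp(-rate) * rate**n / factorial(n)

def p_binomial(K, m, p):
    return comb(K, m) * p**m * (1 - p)**(K - m)

def p_mm_given_k(m, K, Pmiss, rate):
    # P(MM(i)|K): sum over j false alarms, K+j-m missed detections
    return sum(p_binomial(K, K + j - m, Pmiss) * p_poisson(j, rate)
               for j in range(max(0, m - K), m + 1))

def posterior_k(MM, K_max, Pmiss, rate):
    # product over times of P(MM(i)|K), normalized assuming uniform P(K)
    like = []
    for K in range(K_max + 1):
        p = 1.0
        for m in MM:
            p *= p_mm_given_k(m, K, Pmiss, rate)
        like.append(p)
    total = sum(like)
    return [p / total for p in like]

# hypothetical per-time measurement counts for a 2-track scenario
MM = [3, 2, 3, 2, 3]
post = posterior_k(MM, 4, 0.1, 1)
```

Even with this short hypothetical MM vector, the posterior peaks at K=2,
illustrating how the counts alone carry information about K.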
Checking this for the numerical example from NOTES/Kx9-4.txt, with N=21,
K_max=4, Pmiss=0.1, and false_alarm_rate=1:

  K  P(K|MM)   P(MM|K)   P(K)  P(Viterbi)
  -  -------   -------   ----  ----------
  0  2.01e-13  3.99e-25  .2    2.37e-91
  1  5.11e-05  1.02e-16  .2    1.29e-40
  2  1.00e+00  1.99e-12  .2    2.15e-18
  3  1.59e-04  3.17e-16  .2    7.18e-38
  4  6.72e-14  1.33e-25  .2    7.42e-56

Note that for K=0:

  P(MM|K=0) = product i=1:N { p_poisson( MM(i), false_alarm_rate) }

  P(Viterbi) = P(MM|K=0) * product i=2:N { (1/16)**MM(i) }

since the Viterbi probability includes P(Z|tau',K), which is given by a
uniform distribution of false detections over the X,Y 0..4,0..4 observation
region. For K!=2, P(K|MM) is very small, and for K=2, P(K|MM) is almost 1 and
is shown rounded to 1.00 above. The maxima of P(K|MM) and P(Viterbi) both
occur at K=2, which is correct since 2 tracks were used to generate the
measurements for that simulation.

Checking this for the numerical example from NOTES/Kx9-5.txt, with N=21,
Pmiss=0.2, and false_alarm_rate=2:

  K  P(K|MM)   P(MM|K)   P(K)  P(Viterbi)
  -  -------   -------   ----  ----------
  0  1.51e-05  6.37e-20  .2    1.34e-100
  1  6.57e-02  2.78e-16  .2    1.96e-71
  2  9.20e-01  3.89e-15  .2    1.86e-59
  3  1.42e-02  6.01e-17  .2    1.05e-70
  4  4.37e-07  1.85e-21  .2    8.11e-94

Again the maxima of P(K|MM) and P(Viterbi) both occur at K=2, which matches
the 2 tracks used to generate the simulation data.

The most interesting aspect of these results is that, given only the numbers
of measurements at each time, and not using the actual measurement X,Y
values, a Bayesian estimate of K can be computed, and correctly matches the
value used to generate the data. This is possible because of the knowledge of
the apriori probabilities of missed target events and false detections.
This seems to indicate that, given apriori probabilities for missed target
events and false detections, a Bayesian estimate of the number of tracks
could be used in conjunction with the Viterbi algorithm to perform the data
association and produce the estimated X,Y values for the tracks. A further
consideration is that if the probabilities of missed target events or false
detections are unknown, an EM algorithm may be devised to estimate these in
conjunction with producing a maximum likelihood estimate of the tracks.

6. Maximizing P(K|Z)
--------------------

The Bayesian probability which we want to maximize for the general tracking
problem was discussed in section 3 and can be written as:

  P(tau|Z) = P(tau',K|Z)

where tau=(tau',K) partitions tau into a track association part, tau', and
the number of tracks part, K. To determine the number of tracks, we need to
maximize:

  P(K|Z) = sum over tau' { P(tau',K|Z) }

This should be "better" than looking at P(K|MM), which can be computed from
just the numbers of measurements, MM, and does not use the measurement X,Y
values. P(K|Z) should be less sensitive than P(K|MM) to the values of the
apriori probabilities of missed target events and false alarms, so P(K|Z)
should be a more robust indicator of K than P(K|MM) when the values of the
apriori probabilities are uncertain or not precisely known.

In the Viterbi algorithm, keeping just the L best sets of tracks (tau'), we
cannot perform the sum over all possible tau' to compute P(K|Z).
Nevertheless, we can perform the sum over the L best sets of tracks that we
do have, and that may be a useful approximation to P(K|Z). But at the end of
the Viterbi algorithm, we actually have L sets of tracks for each ending
state, i.e. L*M sets of tracks where M is the number of states. If the best L
sets of tracks do not all end in the same state, we can actually extract more
than the best L sets of tracks.
In any case, we have L*M costs at the end, so we will use all of these in
order to provide a better estimate for the sum over tau' than just using the
final L best. Furthermore, we will consider extracting the best k-path
results for k=1..K from a single run of the Viterbi algorithm for a given
value of K, thus enabling us to generate estimates of P(k|Z) for k<=K without
rerunning the algorithm from scratch for the lower values of k. This is
possible by simply keeping track of the contribution of the individual tracks
to the K-track total cost function. So as currently implemented, P(k|Z) for
k=1..K is computed from a single run of the K-path Viterbi algorithm, with
all L*M final costs being used in the computation for k=K, but just the
L-best K-path results used to extract costs for k<K.

7. Numerical Results
--------------------

First consider a simple example with two tracks, (1,1)->(1.1,1.1) and
(3,3)->(3.1,3.1), with (0,4) and (4,0) being either false alarms or a third
track. L=6 will be used for this example, which produces the complete set of
all possible tracks and associated costs at the end of the Viterbi algorithm.
So in this example, the estimate of P(K|Z) does implement the complete sum
over all possible tau' and is not an approximation.

Using Pmiss=0.2, false_alarm_rate=2:

  K  P(K|Z)  P(K|MM)
  -  ------  -------
  0  0.0002  0.157
  1  0.0439  0.307
  2  0.8855  0.343
  3  0.0703  0.194

P(K|Z) seems to have a more clearly visible maximum at K=2 compared to
P(K|MM).

Using Pmiss=0.4, false_alarm_rate=4:

  K  P(K|Z)  P(K|MM)
  -  ------  -------
  0  0.0093  0.4228
  1  0.2387  0.3055
  2  0.7111  0.1814
  3  0.0410  0.0903

In this case, P(K|Z) still has a clearly visible maximum at K=2, whereas
P(K|MM) has a maximum at K=0, so P(K|Z) is less sensitive to the values of
the apriori probabilities than P(K|MM).

Now consider a more realistic problem, using x99.m to generate N=21
measurements of two parabolic tracks with Pmiss=0.2 and false_alarm_rate=2.
P(K|Z) vs.
K and L:

  K  L=1         L=4         L=8         L=16
  -  ---         ---         ---         ----
  0  4.9733e-51  4.5822e-53  2.8603e-53  1.6323e-53
  1  5.7788e-18  2.0052e-19  2.061e-19   1.7232e-19
  2  1           1           0.99944     0.99958
  3  1.2451e-06  1.2039e-06  0.00055676  0.00041684
  4  8.952e-49   6.5129e-24  2.7783e-24  9.8716e-24

     6.4         30          61.4        128.86      CPU minutes

The last row shows the Matlab run-time in minutes on dft.ee for computing
each column of the table. Octave run-times would be about twice as long.

These results clearly indicate that K=2 is the best Bayesian estimate of K,
and K=2 does correspond to the generated data for this example. The results
seem to indicate that very large values of L are not needed in order to
estimate P(K|Z). Of course, larger values of L may be appropriate and
necessary in order to determine various alternative (and significantly
different) measurement-track associations, and L>q is necessary to determine
the ultimate q best tracks. But for simply estimating the number of tracks,
it seems that a large value of L is not necessary.

The run-time and number of computations are basically proportional to L,
since for each next Viterbi state the L best of each previous state must be
examined. Since dft.ee has a 168 MHz CPU, the new Suns with 333 MHz CPUs
should run these simulations about twice as fast. Nevertheless, these
run-times are getting excessive for large problems due to Matlab and Octave
slowness in executing non-vectorized loops over scalar data. Although
vectorization was used wherever possible, the Viterbi code should be
reexamined for places to increase vectorization and eliminate loops to speed
it up.

The P(K|Z) computations for x99.m as shown above were implemented in the
track/PK.m script, which runs the Viterbi algorithm separately for each value
of K. The newer version of those computations, implemented in the track/PKK.m
script, works after running the Viterbi algorithm just once for some value of
K, then produces estimates of P(k|Z) for k=0..K as described at the end of
the previous section.
For the x99.m example above, after running the Viterbi algorithm with K=4,
PKK.m produces:

  K  PKZ         PKZ1
  -  ---         ----
  0  97.47       96.806
  1  26.52       26.295
  2  3.3227e-12  4.0129e-12
  3  68.469      67.829
  4  28.884      29.204

where -log(P) costs are shown, PKZ represents the estimates of P(K|Z)
obtained by summing over a subset of tau', and PKZ1 represents estimates
using just the single lowest cost paths for each K (which is similar to, but
not exactly the same as, using L=1). These results look good, and are
consistent with the previously shown computations for this example. Results
for PKZ vs. PKZ1 can be compared to see if PKZ produces better estimates than
PKZ1, which is supposed to happen at least in certain cases, according to the
Buckley Marginalization Theorem.
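For reference, -log(P) costs such as the PKZ column above can be converted
back into normalized probabilities P(K|Z); a minimal Python sketch, using the
PKZ values from the table:

```python
from math import exp

# PKZ costs (-log P) from the table above, for K=0..4
pkz_cost = [97.47, 26.52, 3.3227e-12, 68.469, 28.884]

# subtract the minimum cost before exponentiating, for numerical
# stability, then normalize so the probabilities sum to 1
cmin = min(pkz_cost)
weights = [exp(-(c - cmin)) for c in pkz_cost]
p_k_given_z = [w / sum(weights) for w in weights]
```

As expected from the costs, essentially all of the probability mass falls on
K=2.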