0. Introduction
---------------

Here we consider using Viterbi cost results vs. K to determine an optimal
estimate of the number of tracks. First, the initial cost assignments are
examined for an example with just one measurement set in sections 1 and 2.
Then section 3 presents the formulation of the general tracking problem.
Sections 4 and 5 consider estimating K based only on the numbers of
measurements vs. time, without using the X,Y measurement values, and compare
this with the Viterbi probability estimates. Section 6 derives an optimal
estimate of K using all of the available data, and section 7 presents some
numerical results.

1. Initial Viterbi Cost Assignments
-----------------------------------

Consider the initial cost assignments, using one set of measurements with
3 X,Y values, i.e. N=1, MM(1)=3.

Let mi=1,2,3 represent the measurement index used for track #i, with mi=4
indicating a missed target event. Let n_false represent the number of false
alarms, Pmiss the apriori probability of a missed detection, and
p_poisson(n,rate) the Poisson probability of n events given a rate (the
apriori rate of false detections).

If K=0:

  P = p_poisson( 3, false_alarm_rate)

If K=1:

  m1  n_false  P
  --  -------  -
  1   2        (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  2   2        (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  3   2        (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  4   3        Pmiss * p_poisson( 3, false_alarm_rate)

We can see in this case that the associations of measurement #1, 2, or 3 with
track #1 all have the same probability. The missed target event has a
different probability. These probabilities do not sum to 1.

Furthermore, the missed target event is not allowed in the current simulation
initialization step, since that would mean that no Kalman filters would be
initialized. For missed target events, the Kalman filter prediction from the
previous stage is used, but in this initialization step there is no previous
stage.
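The K=0 and K=1 entries above can be cross-checked numerically with a minimal
Python sketch (Python is used here only for illustration; the p_poisson helper
mirrors the definition in the text and is not part of the track/ scripts):

```python
from math import exp, factorial

def p_poisson(n, rate):
    # Poisson probability of n events given the apriori false-alarm rate
    return exp(-rate) * rate**n / factorial(n)

Pmiss = 0.2
false_alarm_rate = 2

# K=0: all 3 measurements are false alarms
P_K0 = p_poisson(3, false_alarm_rate)

# K=1, m1 in {1,2,3}: one detected target plus 2 false alarms
P_K1_detect = (1 - Pmiss) * p_poisson(2, false_alarm_rate)

# K=1, m1=4: missed target event, all 3 measurements are false alarms
P_K1_miss = Pmiss * p_poisson(3, false_alarm_rate)
```

With Pmiss=0.2 and false_alarm_rate=2 these evaluate to the probabilities
listed in the numerical example of section 2.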
So we are really only considering the first three lines from the above table.

If K=2:

  m1  m2  n_false  P
  --  --  -------  -
  1   2   1        (1-Pmiss)**2 * p_poisson( 1, false_alarm_rate)
  1   3   1        (1-Pmiss)**2 * p_poisson( 1, false_alarm_rate)
  2   3   1        (1-Pmiss)**2 * p_poisson( 1, false_alarm_rate)
  1   4   2        Pmiss * (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  2   4   2        Pmiss * (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  3   4   2        Pmiss * (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  4   4   3        Pmiss**2 * p_poisson( 3, false_alarm_rate)

In general, for K>1, all possible permutations of the association of
measurement indices with tracks are considered. But, in the initialization
step we are examining here, only the first permutation is allowed by the
Viterbi cost function. This prevents the algorithm from producing redundant
paths which simply have all measurements interchanged between the detected
tracks. For later stages of the algorithm, all permutations are considered,
so, for example, besides the m1=1, m2=2 entry, there would also be an entry
for m1=2, m2=1. Also, as noted above for K=1, missed target events are not
allowed in the initialization step, so the lines in the table above having a
4 for m1 or m2 are not used.

If K=3:

  m1  m2  m3  n_false  P
  --  --  --  -------  -
  1   2   3   0        (1-Pmiss)**3 * p_poisson( 0, false_alarm_rate)

Here, I have omitted the redundant path associations and lines representing
missed target events which would not be used in the initialization step.
Thus, with 3 measurements and K=3, only one possibility is allowed. With 3
initial measurements, K>3 is not allowed, so we can stop here with the
theoretical analysis of this example.

2. Numerical Example of Initial Viterbi Cost Assignments
--------------------------------------------------------

Let's now look at it numerically.
Using Pmiss=0.2, false_alarm_rate=2, and showing costs as -log(P), after the
Viterbi algorithm initialization using a single set of 3 measurements (the
actual measurement values do not matter, since no Kalman filter predictions
are used in the initialization step):

  K  cost    P(Viterbi)
  -  ----    ----------
  0  1.7123  0.180450
  1  1.5300  0.216536
  2  1.7531  0.173236
  3  2.6694  0.069294

A question arises: since the various probabilities for different values of K
were not normalized to sum to 1, can we really directly compare these costs
and say that K=1 is the best choice because it has the lowest cost? The
following section answers this question and shows that the Viterbi
probabilities vs. K can be directly compared if P(K) has a uniform
distribution.

3. General Tracking Problem Formulation
---------------------------------------

The Bayesian probability function we want to maximize for the general
tracking problem is:

  P(tau|Z) = P(Z|tau) * P(tau) / P(Z)

where tau represents a choice of K and a set of measurement-target
associations or tracks, including possible missed target events and false
detections. Z=(Z1,Z2,...,ZN) represents all of the measurement data, with Zi
being the MM(i) measurements at time i=1..N.

P(Z|tau) is computed from Kalman filter innovations for measurements
corresponding to targets, or from the uniform distribution of false
detections. P(tau) is the apriori probability of a set of tracks, and can be
computed if the probability of missed detections and the false alarm rate are
known. P(Z) acts basically as a normalization factor, since it is the sum of
all possible numerator terms for P(tau|Z), and serves to make the possible
values of P(tau|Z) sum to 1. Since P(Z) is independent of tau, it can be
ignored. The normalization question from the previous section concerns a
factor which might be needed for different values of K.
Since any instance of the K-path Viterbi algorithm presumes K given, P(tau|Z)
can be written as:

  P(tau|Z) = P(Z|tau',K) * P(tau'|K) * P(K) / P(Z)

where tau=(tau',K) partitions tau into a track association part, tau', and
the number of tracks part, K. P(Z|tau',K)*P(tau'|K) is the probability
produced and maximized by the Viterbi algorithm for each K. So clearly, if
P(K) has a uniform distribution, no normalization of Viterbi probabilities is
needed, and values of P(Z|tau',K)*P(tau'|K) for different values of K can be
directly compared. If P(K) does not have a uniform distribution, then the
P(Z|tau',K)*P(tau'|K) values would have to be normalized by multiplying by
P(K).

Note that for the Viterbi initialization step, using the first set of MM(1)
measurements, the P(Z1|tau',K) factor is treated as 1, and only P(tau'|K) is
used for the initial cost. The various track initiation probabilities are
computed based only on the number of measurements, as shown in section 1
above. However, the measurement values Z1 are used to initialize the Kalman
filters.

4. Bayesian Estimate of K for N=1
---------------------------------

To directly compute an optimal estimate of K given one set of MM
measurements, write P(K|MM) using Bayes rule:

  P(K|MM) = P(MM|K) * P(K) / P(MM)

where P(MM) is defined by the rule of total probability:

  P(MM) = sum K=0..K_max { P(MM|K) * P(K) }

We are assuming that K_max is the maximum possible value of K simply for
notational convenience here; K_max could be infinite, so there is no loss of
generality in this notation. P(MM) acts basically as a normalization factor,
since it is the sum of all possible numerator terms for P(K|MM), and serves
to make the possible values of P(K|MM) sum to 1.

We need values for P(K), K=0..K_max, the apriori probabilities for the
occurrence of K targets.
If P(K) has a uniform distribution, then P(K)/P(MM) is a constant independent
of K; in this case, finding the maximum of P(K|MM) is the same as finding the
maximum of P(MM|K). Note that the probabilities used to initialize the costs
in the Viterbi algorithm come from P(tau'|K) and are not the same as P(MM|K),
which we will discuss below.

Continuing, P(MM|K) is given by:

  P(MM|K) = sum j=max(0,MM-K):MM { p_binomial( K, K+j-MM, Pmiss)
                                   * p_poisson( j, false_alarm_rate) }

where j is the number of false alarms, K+j-MM is the number of missed target
events, and:

  p_binomial(K,m,p) = C(K,m) * p**m * (1-p)**(K-m)

where C(K,m) is the number of combinations of K things taken m at a time,
and for the degenerate case of K=0, p_binomial(0,0,p) == 1.

Checking this numerically, using Pmiss=0.2, false_alarm_rate=2, MM=3, and
K_max=3 as in the previous examples, and assuming a uniform distribution for
P(K) (using track/ML.m):

  K  P(K|MM)  P(MM|K)  P(K)  P(Viterbi)
  -  -------  -------  ----  ----------
  0  0.20032  0.18045  0.25  0.180450
  1  0.28045  0.25263  0.25  0.216536
  2  0.29647  0.26706  0.25  0.173236
  3  0.22276  0.20066  0.25  0.069294

The last column just relists the probabilities which would be used in the
Viterbi cost initialization from section 2 above for comparison. Note that
both P(K|MM) and P(MM|K) have a maximum for K=2, and K=2 would be the optimal
Bayesian estimate of K for this example.

For K=0, both P(MM|K) and the Viterbi probability are equal to
p_poisson( MM, false_alarm_rate).

For K=1:

  P(Viterbi) = (1-Pmiss) * p_poisson( 2, false_alarm_rate)

  P(MM|K) = (1-Pmiss) * p_poisson( 2, false_alarm_rate)
            + Pmiss * p_poisson( 3, false_alarm_rate)

That is, P(MM|K) is the total probability that MM=3 given K=1, considering
both possible cases of 0 or 1 missed target event. But P(Viterbi) is the
probability of a specific combination of measurement events, namely 1 target
event and 2 false alarms. The Viterbi initialization step does not allow
missed target events, for reasons discussed above.
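The P(MM|K) formula and the table above can be reproduced with a short Python
sketch (the function names are illustrative, not the actual track/ML.m code):

```python
from math import comb, exp, factorial

def p_poisson(n, rate):
    # Poisson probability of n events given a rate
    return exp(-rate) * rate**n / factorial(n)

def p_binomial(K, m, p):
    # probability of m missed detections out of K targets;
    # for the degenerate case, p_binomial(0, 0, p) == 1
    return comb(K, m) * p**m * (1 - p)**(K - m)

def p_mm_given_k(MM, K, Pmiss, rate):
    # sum over j = number of false alarms; K+j-MM = number of misses
    return sum(p_binomial(K, K + j - MM, Pmiss) * p_poisson(j, rate)
               for j in range(max(0, MM - K), MM + 1))

Pmiss, rate, MM, K_max = 0.2, 2, 3, 3
likelihood = [p_mm_given_k(MM, K, Pmiss, rate) for K in range(K_max + 1)]

# with uniform P(K), the posterior P(K|MM) is just the normalized likelihood
posterior = [p / sum(likelihood) for p in likelihood]
```

The `likelihood` list matches the P(MM|K) column and `posterior` matches the
P(K|MM) column, with the maximum of both at K=2.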
However, in later steps of the Viterbi algorithm, where missed target events
are allowed, P(Viterbi) will still not be the same as P(MM|K), because each
combination of measurement events is considered separately. That is, the
Viterbi algorithm would consider two separate probabilities:

  P1 = (1-Pmiss) * p_poisson( 2, false_alarm_rate)
  P0 = Pmiss * p_poisson( 3, false_alarm_rate)

and each of these would be associated with separate state transitions.

In the previous tables showing numerical Viterbi initialization
probabilities, only a single largest probability was shown, since that is
what would be chosen as the best solution to the Viterbi trellis. However, at
the end of the Viterbi algorithm, we actually have a set of costs,
representing the different choices for associating measurements with tracks.
For this example, with K=1 and only 1 set of 3 measurements, the final
Viterbi cost array, in terms of probabilities, is:

  m1  P(Viterbi)
  --  ----------
  1   0.21654
  2   0.21654
  3   0.21654
  4   0.00000

where m1=i means measurement i is chosen for track #1. Since the first three
probabilities are equal, this indicates that any of the 3 measurements are
equally likely to be associated with the single track. m1=4, representing the
missed target event, was not allowed, thus has a probability of 0.

For K=2:

  P(Viterbi) = (1-Pmiss)**2 * p_poisson( 1, false_alarm_rate)

  P(MM|K) = (1-Pmiss)**2 * p_poisson( 1, false_alarm_rate)
            + 2 * Pmiss * (1-Pmiss) * p_poisson( 2, false_alarm_rate)
            + Pmiss**2 * p_poisson( 3, false_alarm_rate)

As before, the Viterbi algorithm does not allow missed target events during
initialization, so the only probability used corresponds to 2 detected
targets and 1 false alarm. And P(MM|K) is the total probability, including
all possible combinations of missed target events.
The complete set of probabilities produced by the Viterbi algorithm for this
case, for the different possible associations of measurements with tracks,
is:

  m1  m2  P(Viterbi)
  --  --  ----------
  1   2   0.17323
  1   3   0.17323
  2   3   0.17323
  1   4   0.00000
  2   4   0.00000
  3   4   0.00000
  4   4   0.00000

So the first three possibilities, representing association of different
combinations of 2 out of the 3 measurements with the 2 tracks, are equally
likely. And the missed target event possibilities have 0 probability, since
they are not allowed in the Viterbi initialization step.

During the Viterbi initialization, only one permutation of the measurement
associations with tracks is allowed. But in general, during later stages of
the Viterbi algorithm, all possible permutations are considered, thus
additional probabilities would be considered. For example, besides the m1=1,
m2=2 case in the first line of the table above, the case of m1=2, m2=1 would
also be considered.

For K=3:

  P(Viterbi) = (1-Pmiss)**3 * p_poisson( 0, false_alarm_rate)

  P(MM|K) = (1-Pmiss)**3 * p_poisson( 0, false_alarm_rate)
            + 3 * Pmiss * (1-Pmiss)**2 * p_poisson( 1, false_alarm_rate)
            + 3 * Pmiss**2 * (1-Pmiss) * p_poisson( 2, false_alarm_rate)
            + Pmiss**3 * p_poisson( 3, false_alarm_rate)

The complete set of probabilities produced by the Viterbi algorithm for this
case, for the different possible associations of measurements with tracks,
is:

  m1  m2  m3  P(Viterbi)
  --  --  --  ----------
  1   2   3   0.06929
  1   2   4   0.00000
  1   3   4   0.00000
  2   3   4   0.00000
  1   4   4   0.00000
  2   4   4   0.00000
  3   4   4   0.00000
  4   4   4   0.00000

It is clear that P(Viterbi) differs from P(MM|K) for N=1, and it is not clear
which represents the best solution. Certainly, with the Viterbi algorithm, we
are not only estimating the number of tracks, but also the association of
measurements with tracks, which is the overall problem we are trying to
solve.
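Because the initialization step keeps only one permutation per measurement
subset and disallows misses, the nonzero entries in the K=2 table above are
just the combinations of 2 of the 3 measurements. A small Python sketch of
this enumeration (variable names are illustrative):

```python
from itertools import combinations
from math import exp, factorial

def p_poisson(n, rate):
    return exp(-rate) * rate**n / factorial(n)

Pmiss, rate, MM, K = 0.2, 2, 3, 2

# Initialization: each allowed association is a combination of K of the
# MM measurements assigned to the K tracks; the rest are false alarms.
table = {}
for assoc in combinations(range(1, MM + 1), K):
    n_false = MM - K
    table[assoc] = (1 - Pmiss)**K * p_poisson(n_false, rate)
```

This yields three equal probabilities of about 0.17323, one for each
combination, matching the first three lines of the K=2 table.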
But in the degenerate case of N=1, the various possible measurement-to-track
associations all have the same probability for a given value of K, so
P(Viterbi) doesn't really help to decide on the best measurement-track
associations.

5. Bayesian Estimate of K for N>1
---------------------------------

The analysis in the previous section for N=1 is of some interest because
neither the direct Bayesian estimate of K nor the Viterbi algorithm uses the
measurement X,Y values; they only use MM, the number of measurements. We can
extend the Bayesian estimate of K to the case of N>1, again using only the
number of measurements at each time and not using the measurement X,Y values.
For N>1, the Viterbi algorithm will use the measurement X,Y values, and
should therefore give a better solution. Nevertheless, a separate easily
computed estimate of K may be of some use in cases where the number of tracks
may be changing with time, so it seems worth investigating.

Consider having N measurement sets, so MM is a vector and MM(i) is the number
of measurements at time i=1..N. P(K|MM) is given by Bayes rule:

  P(K|MM) = P(MM(1),MM(2),...,MM(N)|K) * P(K) / P(MM)

If the numbers of measurements at each time are independent, we have:

  P(K|MM) = P(MM(1)|K) * P(MM(2)|K) * ... * P(MM(N)|K) * P(K) / P(MM)

where P(MM(i)|K) can be computed using the formula from the previous section.
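The product formula above can be sketched in a few lines of Python. The
measurement-count vector MM used here is hypothetical (the actual Kx9 data is
not reproduced), chosen to resemble 2 tracks with Pmiss=0.1 and
false_alarm_rate=1:

```python
from math import comb, exp, factorial

def p_poisson(n, rate):
    return exp(-rate) * rate**n / factorial(n)

def p_binomial(K, m, p):
    return comb(K, m) * p**m * (1 - p)**(K - m)

def p_mm_given_k(m, K, Pmiss, rate):
    # P(MM(i)|K): sum over j false alarms, K+j-m missed detections
    return sum(p_binomial(K, K + j - m, Pmiss) * p_poisson(j, rate)
               for j in range(max(0, m - K), m + 1))

def posterior_k(MM, K_max, Pmiss, rate):
    # product over times of P(MM(i)|K), normalized assuming uniform P(K)
    like = []
    for K in range(K_max + 1):
        p = 1.0
        for m in MM:
            p *= p_mm_given_k(m, K, Pmiss, rate)
        like.append(p)
    total = sum(like)
    return [p / total for p in like]

# hypothetical per-time measurement counts for a 2-track scenario
MM = [3, 2, 3, 2, 3]
post = posterior_k(MM, 4, 0.1, 1)
```

Even with this short hypothetical MM vector, the posterior peaks at K=2,
illustrating how the counts alone carry information about K.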
Checking this for the numerical example from NOTES/Kx9-4.txt, with N=21,
K_max=4, Pmiss=0.1, and false_alarm_rate=1:

  K  P(K|MM)   P(MM|K)   P(K)  P(Viterbi)
  -  -------   -------   ----  ----------
  0  2.01e-13  3.99e-25  .2    2.37e-91
  1  5.11e-05  1.02e-16  .2    1.29e-40
  2  1.00e+00  1.99e-12  .2    2.15e-18
  3  1.59e-04  3.17e-16  .2    7.18e-38
  4  6.72e-14  1.33e-25  .2    7.42e-56

Note that for K=0:

  P(MM|K=0) = product i=1:N { p_poisson( MM(i), false_alarm_rate) }

  P(Viterbi) = P(MM|K=0) * product i=2:N { (1/16)**MM(i) }

since the Viterbi probability includes P(Z|tau',K), which is given by a
uniform distribution of false detections over the X,Y 0..4,0..4 observation
region. For K!=2, P(K|MM) is very small, and for K=2, P(K|MM) is almost 1 and
is shown rounded to 1.00 above. The maxima of P(K|MM) and P(Viterbi) both
occur at K=2, which is correct since 2 tracks were used to generate the
measurements for that simulation.

Checking this for the numerical example from NOTES/Kx9-5.txt, with N=21,
Pmiss=0.2, and false_alarm_rate=2:

  K  P(K|MM)   P(MM|K)   P(K)  P(Viterbi)
  -  -------   -------   ----  ----------
  0  1.51e-05  6.37e-20  .2    1.34e-100
  1  6.57e-02  2.78e-16  .2    1.96e-71
  2  9.20e-01  3.89e-15  .2    1.86e-59
  3  1.42e-02  6.01e-17  .2    1.05e-70
  4  4.37e-07  1.85e-21  .2    8.11e-94

Again the maxima of P(K|MM) and P(Viterbi) both occur at K=2, which matches
the 2 tracks used to generate the simulation data.

The most interesting aspect of these results is that, given only the numbers
of measurements at each time, and not using the actual measurement X,Y
values, a Bayesian estimate of K can be computed, and correctly matches the
value used to generate the data. This is possible because of the knowledge of
the apriori probabilities of missed target events and false detections.
This seems to indicate that, given apriori probabilities for missed target
events and false detections, a Bayesian estimate of the number of tracks
could be used in conjunction with the Viterbi algorithm to perform the data
association and produce the estimated X,Y values for the tracks. A further
consideration is that if the probabilities of missed target events or false
detections are unknown, an EM algorithm may be devised to estimate these in
conjunction with producing a maximum likelihood estimate of the tracks.

6. Maximizing P(K|Z)
--------------------

The Bayesian probability which we want to maximize for the general tracking
problem was discussed in section 3 and can be written as:

  P(tau|Z) = P(tau',K|Z)

where tau=(tau',K) partitions tau into a track association part, tau', and
the number of tracks part, K. To determine the number of tracks, we need to
maximize:

  P(K|Z) = sum over tau' { P(tau',K|Z) }

This should be "better" than looking at P(K|MM), which can be computed from
just the numbers of measurements, MM, and does not use the measurement X,Y
values. P(K|Z) should be less sensitive than P(K|MM) to the values of the
apriori probabilities of missed target events and false alarms, so P(K|Z)
should be a more robust indicator of K than P(K|MM) when the values of the
apriori probabilities are uncertain or not precisely known.

In the Viterbi algorithm, keeping just the L best sets of tracks (tau'), we
cannot perform the sum over all possible tau' to compute P(K|Z).
Nevertheless, we can perform the sum over the L best sets of tracks that we
do have, and that may be a useful approximation to P(K|Z). But at the end of
the Viterbi algorithm, we actually have L sets of tracks for each ending
state, i.e. L*M sets of tracks where M is the number of states. If the best L
sets of tracks do not all end in the same state, we can actually extract more
than the best L sets of tracks.
In any case, we have L*M costs at the end, so we will use all of these in
order to provide a better estimate for the sum over tau' than just using the
final L best. Furthermore, we will consider extracting the best k-path
results for k=1..K from a single run of the Viterbi algorithm for a given
value of K, thus enabling us to generate estimates of P(k|Z) for k<=K without
rerunning the algorithm from scratch for the lower values of k. This is
possible by simply keeping track of the contribution of the individual tracks
to the K-track total cost function. So as currently implemented, P(k|Z) for
k=1..K is computed from a single run of the K-path Viterbi algorithm, with
all L*M final costs being used in the computation for k=K, but just the
L-best K-path results used to extract costs for k<K.

7. Numerical Results
--------------------

First consider a simple example with two tracks, (1,1)->(1.1,1.1) and
(3,3)->(3.1,3.1), with (0,4) and (4,0) being either false alarms or a third
track. L=6 will be used for this example, which produces the complete set of
all possible tracks and associated costs at the end of the Viterbi algorithm.
So in this example, the estimate of P(K|Z) does implement the complete sum
over all possible tau' and is not an approximation.

Using Pmiss=0.2, false_alarm_rate=2:

  K  P(K|Z)  P(K|MM)
  -  ------  -------
  0  0.0002  0.157
  1  0.0439  0.307
  2  0.8855  0.343
  3  0.0703  0.194

P(K|Z) seems to have a more clearly visible maximum at K=2 compared to
P(K|MM).

Using Pmiss=0.4, false_alarm_rate=4:

  K  P(K|Z)  P(K|MM)
  -  ------  -------
  0  0.0093  0.4228
  1  0.2387  0.3055
  2  0.7111  0.1814
  3  0.0410  0.0903

In this case, P(K|Z) still has a clearly visible maximum at K=2, whereas
P(K|MM) has a maximum at K=0, so P(K|Z) is less sensitive to the values of
the apriori probabilities than P(K|MM).

Now consider a more realistic problem, using x99.m to generate N=21
measurements of two parabolic tracks with Pmiss=0.2 and false_alarm_rate=2.
P(K|Z) vs.
K and L:

  K  L=1         L=4         L=8         L=16
  -  ---         ---         ---         ----
  0  4.9733e-51  4.5822e-53  2.8603e-53  1.6323e-53
  1  5.7788e-18  2.0052e-19  2.061e-19   1.7232e-19
  2  1           1           0.99944     0.99958
  3  1.2451e-06  1.2039e-06  0.00055676  0.00041684
  4  8.952e-49   6.5129e-24  2.7783e-24  9.8716e-24

     6.4         30          61.4        128.86      CPU minutes

The last row shows the Matlab run-time in minutes on dft.ee for computing
each column of the table. Octave run-times would be about twice as long.

These results clearly indicate that K=2 is the best Bayesian estimate of K,
and K=2 does correspond to the generated data for this example. The results
seem to indicate that very large values of L are not needed in order to
estimate P(K|Z). Of course, larger values of L may be appropriate and
necessary in order to determine various alternative (and significantly
different) measurement-track associations, and L>q is necessary to determine
the ultimate q best tracks. But for simply estimating the number of tracks,
it seems that a large value of L is not necessary.

The run-time and number of computations are basically proportional to L,
since for each next Viterbi state the L best of each previous state must be
examined. Since dft.ee has a 168 MHz CPU, the new Suns with 333 MHz CPUs
should run these simulations about twice as fast. Nevertheless, these
run-times are getting excessive for large problems due to Matlab and Octave
slowness in executing non-vectorized loops over scalar data. Although
vectorization was used wherever possible, the Viterbi code should be
reexamined for places to increase vectorization and eliminate loops to speed
it up.

The P(K|Z) computations for x99.m as shown above were implemented in the
track/PK.m script, which runs the Viterbi algorithm separately for each value
of K. The newer version of those computations, implemented in the track/PKK.m
script, works after running the Viterbi algorithm just once for some value of
K, then produces estimates of P(k|Z) for k=0..K as described at the end of
the previous section.
For the x99.m example above, after running the Viterbi algorithm with K=4,
PKK.m produces:

  K  PKZ         PKZ1
  -  ---         ----
  0  97.47       96.806
  1  26.52       26.295
  2  3.3227e-12  4.0129e-12
  3  68.469      67.829
  4  28.884      29.204

where -log(P) costs are shown, PKZ represents the estimates of P(K|Z)
obtained by summing over a subset of tau', and PKZ1 represents estimates
using just the single lowest cost paths for each K (which is similar to, but
not exactly the same as, using L=1). These results look good, and are
consistent with the previously shown computations for this example. Results
for PKZ vs. PKZ1 can be compared to see if PKZ produces better estimates than
PKZ1, which is supposed to happen at least in certain cases, according to the
Buckley Marginalization Theorem.
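For reference, -log(P) costs such as the PKZ column above can be converted
back into normalized probabilities P(K|Z); a minimal Python sketch, using the
PKZ values from the table:

```python
from math import exp

# PKZ costs (-log P) from the table above, for K=0..4
pkz_cost = [97.47, 26.52, 3.3227e-12, 68.469, 28.884]

# subtract the minimum cost before exponentiating, for numerical
# stability, then normalize so the probabilities sum to 1
cmin = min(pkz_cost)
weights = [exp(-(c - cmin)) for c in pkz_cost]
p_k_given_z = [w / sum(weights) for w in weights]
```

As expected from the costs, essentially all of the probability mass falls on
K=2.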