Classical classifier combination techniques: Voting approaches, Borda Counts and Behavior-Knowledge space

Mahmoud Albardan
Towards Data Science
Nov 1, 2020

In this article, we review classical classifier combination techniques that are commonly used as benchmarks when evaluating newly proposed methods.

In the rest of this article, we denote by ôᵏ(x) the decision of the kᵗʰ classifier on a test sample x, and by K the number of classifiers in the panel whose decisions are to be combined. The total number of class labels is m.

Voting approaches

An obvious way to reconcile a pool of decisions is to resort to a voting system. Several variants of voting schemes can be employed for classifier fusion, and some of them can integrate contextual information such as individual classifier accuracies. The voting principle rests on the intuition that the larger the number of classifiers (experts) in the panel, the more likely the final decision is to be correct.

Popular voting schemes are the following:

  • Unanimity voting is a system in which the combined decision is the label cᵢ if all individual classifier decisions are cᵢ; otherwise the combined classifier rejects the input x.
  • Modified unanimity voting is a system in which the combined decision is the label cᵢ if every individual classifier decision is either cᵢ or the reject decision cᵣₑ; otherwise the combined classifier rejects the input x. In other words, no classifier may predict a class label different from cᵢ.
  • Majority voting is a system in which the combined decision is the label cᵢ predicted by the largest number of classifiers (a code sketch of these three rules follows the list).
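
Here is a minimal Python sketch of the three voting rules, assuming hard (label-only) classifier outputs; the REJECT sentinel and the function names are illustrative choices, not part of the original formulation:

```python
from collections import Counter

REJECT = "REJECT"  # stands in for the reject decision c_rej

def unanimity_vote(decisions):
    """Return the common label if all classifiers agree, else reject."""
    return decisions[0] if len(set(decisions)) == 1 else REJECT

def modified_unanimity_vote(decisions):
    """Return c_i if every decision is either c_i or a reject."""
    labels = set(decisions) - {REJECT}
    return labels.pop() if len(labels) == 1 else REJECT

def majority_vote(decisions):
    """Return the label predicted by the most classifiers (ties broken arbitrarily)."""
    return Counter(decisions).most_common(1)[0][0]

print(unanimity_vote(["c1", "c2", "c1"]))             # REJECT
print(modified_unanimity_vote(["c1", REJECT, "c1"]))  # c1
print(majority_vote(["c1", "c2", "c1"]))              # c1
```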

Borda and wBorda counts

Borda counts is a rank-based combination scheme in which each classifier ranks the classes (candidates) according to their chances of being the correct (true) class. Each rank is associated with a score, starting from m-1 for the first rank down to 0 for the last rank, where m is the total number of classes. The scores received by each class are then summed, and the class label achieving the highest accumulated score is the panel's final decision. There may be ties, i.e. several class labels maximizing the accumulated score, in which case an additional algorithmic step is necessary to resolve the tie. A baseline strategy consists in randomly picking one of the labels with maximal accumulated score. The Borda method was developed by the French mathematician and political scientist Jean-Charles de Borda in the quest for a genuinely democratic electoral system. It is still used today to elect deputies in the Republic of Nauru and the Republic of Slovenia.

  • Example 1: Suppose there are four candidate class labels Ω = {c₁, c₂, c₃, c₄} and three classifiers with decision functions ô¹, ô², and ô³. Borda counts works as follows. Each classifier ôᵏ ranks the labels (Table 1):
+-------------+-------------------+
| Classifier  | Label ranking     |
+-------------+-------------------+
| ô¹          | c₂ - c₃ - c₁ - c₄ |
| ô²          | c₃ - c₁ - c₂ - c₄ |
| ô³          | c₃ - c₂ - c₁ - c₄ |
+-------------+-------------------+
Table 1: Ranks of the four candidates (labels)

and then the accumulated score of each label is calculated (Table 2):

+-----------+---------------+
| Candidate | Score         |
+-----------+---------------+
| c₁        | 1 + 2 + 1 = 4 |
| c₂        | 3 + 1 + 2 = 6 |
| c₃        | 2 + 3 + 3 = 8 |
| c₄        | 0 + 0 + 0 = 0 |
+-----------+---------------+
Table 2: Candidates and their accumulated points
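
As a quick check, here is a minimal Python sketch of Borda counts that reproduces the scores of Table 2 (the variable names are illustrative):

```python
from collections import defaultdict

# Rankings from Table 1, best label first.
rankings = [
    ["c2", "c3", "c1", "c4"],  # ranking produced by classifier ô¹
    ["c3", "c1", "c2", "c4"],  # ranking produced by classifier ô²
    ["c3", "c2", "c1", "c4"],  # ranking produced by classifier ô³
]

m = len(rankings[0])            # number of class labels
scores = defaultdict(int)
for ranking in rankings:
    for position, label in enumerate(ranking):  # position 0 = first rank
        scores[label] += m - 1 - position       # first rank scores m - 1

print(dict(scores))                 # {'c2': 6, 'c3': 8, 'c1': 4, 'c4': 0}
print(max(scores, key=scores.get))  # 'c3' (a tie would need a tie-break step)
```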

Borda counts is considered one of the simplest non-linear combination algorithms. In some cases (mostly when the number of classifiers is large), a conflict can be observed between the opinion of the majority and the decision deduced by Borda counts. This issue was raised by Parker [0]. In Borda counts, the difference in score between any two consecutive candidates ranked by the same classifier is one, so all candidates are assumed to be equally spaced and uniformly distributed along the same axis. This assumption, however, is not always justified.

For instance, a probabilistic classifier may assign very high probabilities to two labels and a low probability to the third one. In a three-class classification problem with posterior probabilities of 49%, 48% and 3%, a unit score difference between consecutive candidates clearly does not reflect the reality of the situation or the classifier's level of confidence.

Parker advocates that it might be useful to weight the ranks to obtain a less biased reflection of the classifiers' confidence. Fishburn [1] argues that contextual knowledge derived from confusion matrices can be used to estimate those weights. Assigning such weights to candidate labels is a generalization of the uniform Borda counts, usually referred to as wBorda.

  • Example 2: Considering m class labels, under the uniform scheme an item of rank r is assigned a score equal to (m - r), whereas in wBorda it can be assigned a score equal to (m - r)·wᵖ, where p = r - 1 and w is the weight. The difference between Borda and wBorda is illustrated in the figure below.
[Figure: Borda vs. wBorda score as a function of rank. Image by Author.]
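
A minimal sketch of the wBorda scoring rule, assuming the (m - r)·wᵖ form above; the weight w = 0.5 is an illustrative choice (in practice it could be estimated from contextual knowledge such as confusion matrices):

```python
def wborda_score(rank, m, w=0.5):
    """Score of an item of 1-based rank `rank` among m class labels."""
    return (m - rank) * w ** (rank - 1)

m = 4
for rank in range(1, m + 1):
    print(rank, m - rank, wborda_score(rank, m))
# rank  Borda  wBorda
#   1     3     3.0
#   2     2     1.0
#   3     1     0.25
#   4     0     0.0
```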

Behavior-knowledge space

Huang and Suen [2, 3] proposed the Behavior-Knowledge Space (BKS) method, which has the advantage of not relying on prerequisite hypotheses such as statistical independence between classifier outputs.

By definition, a BKS is a K-dimensional space where the kᵗʰ dimension is associated with the kᵗʰ classifier. Along each dimension k, m decisions can be taken, corresponding to all possible labels in Ω. For each test sample x, we obtain a vector of decisions (ô¹(x), …, ôᴷ(x)), which corresponds to a point in the BKS.

For each point out of the mᴷ possible ones, we keep track of the number of training samples mapped to this point as well as their class labels.

The BKS is built at training time and has mᴷ nodes to be learned. The method therefore does not scale well in either the number of classifiers in the panel or the number of classes: the table grows exponentially, and some configurations may never be visited during training. To classify a test sample, the outputs (ô¹(x), …, ôᴷ(x)) are obtained and mapped to the corresponding BKS node, and the predicted label is the one with the highest number of occurrences in that node.
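
A minimal sketch of BKS, assuming hard classifier outputs and classifiers exposed as plain callables (the class and method names are illustrative):

```python
from collections import Counter, defaultdict

class BKS:
    def __init__(self, classifiers):
        self.classifiers = classifiers      # K trained classifiers
        self.table = defaultdict(Counter)   # decision tuple -> label counts

    def fit(self, samples, labels):
        # Map every training sample to its BKS node and count its true label.
        for x, y in zip(samples, labels):
            node = tuple(clf(x) for clf in self.classifiers)
            self.table[node][y] += 1

    def predict(self, x):
        # The predicted label is the most frequent one in the sample's node.
        node = tuple(clf(x) for clf in self.classifiers)
        counts = self.table.get(node)
        if not counts:
            return None  # node never visited at training time, e.g. reject
        return counts.most_common(1)[0][0]
```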

[Figure: illustration of a Behavior-Knowledge Space. Image by Author.]

Conclusion

In this article, we reviewed three classical classifier combination methods: voting systems, Borda counts and BKS. These techniques have shown good performance in many applications.

This article is a continuation of my first story, entitled “Insights on classifier combination”. It is extracted from my PhD manuscript, which you can find at this link.

References

[0] Parker, J. R. (2001). Rank and response combination from confusion matrix data. Information Fusion, 2(2), 113–120.

[1] Fishburn, P. (1999). Preference structures and their numerical representations. Theoretical Computer Science, 217(2), 359–383.

[2] Huang, Y. S., & Suen, C. Y. (1993). The behavior-knowledge space method for combination of multiple classifiers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 347–352.

[3] Huang, Y. S., & Suen, C. Y. (1995). A method of combining multiple experts for the recognition of unconstrained handwritten numerals. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(1), 90–94.
