
A survey of commonly used ensemble-based classification techniques

Published online by Cambridge University Press: 03 May 2013

Anna Jurek, Yaxin Bi, Shengli Wu and Chris Nugent

Affiliation:
School of Computing and Mathematics, University of Ulster, Jordanstown, Shore Road, Newtownabbey, Co. Antrim, BT37 0QB, UK; e-mails: jurek-a@email.ulster.ac.uk, y.bi@ulster.ac.uk, s.wu1@ulster.ac.uk, cd.nugent@ulster.ac.uk

Abstract

The combination of multiple classifiers, commonly referred to as a classifier ensemble, has previously demonstrated the ability to improve classification accuracy in many application domains. As a result, this area has attracted a significant amount of research in recent years. The aim of this paper is therefore to provide a state-of-the-art review of the most well-known ensemble techniques, with the main focus on bagging, boosting and stacking, and to trace the recent attempts that have been made to improve their performance. Within this paper, we present and compare an updated view of the different modifications of these techniques that specifically aim to address some of their drawbacks, namely the low-diversity problem in bagging and the over-fitting problem in boosting. In addition, we provide a review of different ensemble selection methods based on both static and dynamic approaches. We also present some new directions that have been adopted in the area of classifier ensembles, drawn from a range of recently published studies. Finally, in order to provide deeper insight into the ensembles themselves, we review a range of existing theoretical studies.
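For orientation, the short sketch below (ours, not drawn from the paper) illustrates the three ensemble schemes the survey focuses on, using scikit-learn's standard estimators. The choice of base learners, meta-learner, dataset and parameters is purely an illustrative assumption.

```python
# Minimal sketch of the three ensemble schemes surveyed in the paper,
# built from scikit-learn's off-the-shelf implementations.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class problem; stands in for any tabular dataset.
X, y = make_classification(n_samples=500, random_state=0)

ensembles = {
    # Bagging: each tree is trained on a bootstrap resample of the
    # training set; predictions are combined by majority vote.
    "bagging": BaggingClassifier(DecisionTreeClassifier(),
                                 n_estimators=50, random_state=0),
    # Boosting (AdaBoost): weak learners are fitted sequentially,
    # with misclassified instances reweighted at each round.
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Stacking: a meta-learner is trained on the base classifiers'
    # predictions to learn how best to combine them.
    "stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier()),
                    ("nb", GaussianNB())],
        final_estimator=LogisticRegression(),
    ),
}

for name, model in ensembles.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```

Each scheme tends to pay off in a different regime: bagging stabilises high-variance learners such as unpruned trees, boosting reduces bias by combining weak learners, and stacking lets a meta-learner weight heterogeneous base classifiers.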

Type: Articles

Copyright: © Cambridge University Press 2013
