
A survey of commonly used ensemble-based classification techniques

Published online by Cambridge University Press: 03 May 2013

Anna Jurek, Yaxin Bi, Shengli Wu and Chris Nugent

Affiliation:
School of Computing and Mathematics, University of Ulster, Jordanstown, Shore Road, Newtownabbey, Co. Antrim, BT37 0QB, UK; e-mails: jurek-a@email.ulster.ac.uk, y.bi@ulster.ac.uk, s.wu1@ulster.ac.uk, cd.nugent@ulster.ac.uk

Abstract

The combination of multiple classifiers, commonly referred to as a classifier ensemble, has previously demonstrated the ability to improve classification accuracy in many application domains. As a result, this area has attracted a significant amount of research in recent years. The aim of this paper is therefore to provide a state-of-the-art review of the most well-known ensemble techniques, with the main focus on bagging, boosting and stacking, and to trace the recent attempts that have been made to improve their performance. Within this paper, we present and compare an updated view of the different modifications of these techniques that specifically aim to address some of their drawbacks, namely the low-diversity problem in bagging and the over-fitting problem in boosting. In addition, we provide a review of different ensemble selection methods based on both static and dynamic approaches. We also present some new directions that have been adopted in the area of classifier ensembles, drawn from a range of recently published studies. Finally, in order to provide deeper insight into the ensembles themselves, we review a range of existing theoretical studies.
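For orientation, the short sketch below (ours, not drawn from the paper) illustrates the three ensemble schemes the survey focuses on, using scikit-learn's standard estimators. The choice of base learners, meta-learner, dataset and parameters is purely an illustrative assumption.

```python
# Minimal sketch of the three ensemble schemes surveyed in the paper,
# built from scikit-learn's off-the-shelf implementations.
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class problem; stands in for any tabular dataset.
X, y = make_classification(n_samples=500, random_state=0)

ensembles = {
    # Bagging: each tree is trained on a bootstrap resample of the
    # training set; predictions are combined by majority vote.
    "bagging": BaggingClassifier(DecisionTreeClassifier(),
                                 n_estimators=50, random_state=0),
    # Boosting (AdaBoost): weak learners are fitted sequentially,
    # with misclassified instances reweighted at each round.
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Stacking: a meta-learner is trained on the base classifiers'
    # predictions to learn how best to combine them.
    "stacking": StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier()),
                    ("nb", GaussianNB())],
        final_estimator=LogisticRegression(),
    ),
}

for name, model in ensembles.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```

Each scheme tends to pay off in a different regime: bagging stabilises high-variance learners such as unpruned trees, boosting reduces bias by combining weak learners, and stacking lets a meta-learner weight heterogeneous base classifiers.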

Type: Articles

Copyright: © Cambridge University Press 2013
