25 June 2015
Ensemble techniques in machine learning
This talk can be considered as a brief introduction to collective (ensemble) learning methods, which is one of the most prominent fields of modern machine learning. Basically, an ensemble classifier (also known as a committee) is a collective (ensemble) of rather rough classifiers whose decisions are aggregated into the overall decision using the simple majority voting technique. Nevertheless, as it follows from the famous results by R.Schapire, Y.Freund, L.Breiman, V.Mazurov et al., such a simple aggregation rule has a wonderful power in increasing of performance of resulting ensembles. Generally speaking, for any given accuracy bound, to produce a committee decision rule having misclassification probability less than this bound, it is sufficient to provide a class of weak classifiers, such that, for any finite training sample, there is a classifier making it’s decision better then random guessing. This shining result accompanying by theory of V. Vapnik and A. Chervonenkis provides a possibility to construct decision rules and forecasting technique of very high performance. There are several methods to produce such an ensembles, among them are boosting by R. Schapire and Y. Freund, bagging by L.Breiman, and committee method by V. Mazurov. In the presentation, the famous Adaboost algorithm and Mazurov’s committee method closely related to it are the main objects of interest. The subjects under consideration are non-formal description of the algorithms, illustrative examples, curse of overfitting during the leaning by finite training sample, and the ability of boosting algorithms to overcome it automatically by maximizing of learning margin.
Approximability to the Multiple Salesmen Problem
Multiple Salesmen Problem appears to be a natural generalization of the well-known Traveling Salesman combinatorial optimization problem (TSP) and is closely related to multiple statements of Vehicle Routing Problem (VRP). All these problems are strongly NP-hard, i.e. they can not be solved efficiently unless P = NP . Also, all of them are hardly approximable in general case. But, for more specific settings of TSP and VRP, there are known efficient approximation algorithms with theoretically proven performance guarantees and even polynomial time approximation schemes (PTAS). A combinatorial minimization problem has PTAS if for any instance of the problem and for any given ε > 0 the optimum can be approximated within a factor 1 + ε in polynomial time of the instance length (but depending on 1/ε).
It is shown that
(i) Multiple Salesmen Problem is strongly NP-hard and hardly approximable in general case;
(ii) Metric setting of the problem can be approximated within an attainable factor 2;
(iii) Euclidean setting of the problem has PTAS for any fixed dimension.
International Institute for Applied Systems Analysis (IIASA)
Schlossplatz 1, A-2361 Laxenburg, Austria
Phone: (+43 2236) 807 0 Fax:(+43 2236) 71 313