Improved Algorithm for AdaBoost with SVM Base Classifiers

IMPROVED ALGORITHM FOR ADABOOST WITH SVM BASE CLASSIFIERS

Xiaodan WANG, Chongming WU, Chunying ZHENG, Wei WANG
Department of Computer Engineering, Air Force Engineering University
afeu_wg@yahoo.com.cn

Abstract

The relation between the performance of AdaBoost and the performance of its base classifiers was analyzed, and the approach of improving the classification performance of AdaBoostSVM was studied. There is an inconsistency between the accuracy and the diversity of the base classifiers, and this inconsistency affects the generalization performance of the algorithm. A new variable σ-AdaBoostSVM was proposed, which adjusts the kernel function parameter of each base classifier according to the distribution of the training samples. The proposed algorithm improves the classification performance by keeping a balance between the accuracy and the diversity of the base classifiers. Experimental results indicate the effectiveness of the proposed algorithm.

Keywords: Support Vector Machine; AdaBoost.

1. INTRODUCTION

Boosting is a machine-learning method that works in Valiant's PAC (probably approximately correct) learning model [1]. A "weak" learning algorithm that performs just slightly better than random guessing in the PAC model can be "boosted" into an arbitrarily accurate "strong" learning algorithm. Schapire [2] came up with the first provable polynomial-time boosting algorithm. Freund [3] developed a much more efficient boosting algorithm which, although optimal in a certain sense, nevertheless suffered from certain practical drawbacks. The AdaBoost algorithm, introduced by Freund and Schapire [4], solved many of the practical difficulties of the earlier boosting algorithms and can easily be applied to practical problems. AdaBoost [4] creates a collection of weak learners by maintaining a set of weights over the training samples and adjusting these weights adaptively after each weak learning cycle: the weights of the samples misclassified by the current weak learner are increased, while the weights of the correctly classified samples are decreased. The success of AdaBoost can be explained as enlarging the margin [5], which could enhance AdaBoost's generalization capability.

Support vector machines [6] were developed from the theory of structural risk minimization. By using a kernel trick to map the training samples from the input space to a high-dimensional feature space, an SVM finds an optimal separating hyperplane in the feature space and uses a regularization parameter, C, to balance model complexity against training error.

What is the generalization performance of using an SVM as the base learner of AdaBoost? Does such an AdaBoost have advantages over existing ones? And, compared with using a single SVM, what is the benefit of using this AdaBoost, which is a combination of multiple SVMs? These have been attractive research issues in recent years [7][8][9][10].

After analyzing the relation between the performance of AdaBoost and the performance of its base classifiers, the approach of improving the classification performance of AdaBoost with SVM base classifiers was studied in this paper. A new variable σ-AdaBoostSVM was proposed, which adjusts the kernel function parameter of each base classifier according to the distribution of the training samples. The proposed algorithm improves the classification performance by keeping a balance between the accuracy and the diversity of the base classifiers. Experimental results on benchmark datasets indicate the effectiveness of the proposed algorithm.

2. ADABOOST

Given a set of training samples {(x_1, y_1), ..., (x_n, y_n)}, each training sample x_i belongs to some domain or instance space X, and each class label y_i belongs to some label set Y; here X ⊆ R^n and Y = {-1, +1}. AdaBoost calls a given weak or base learning algorithm repeatedly in a series of rounds t = 1, ..., T. One of the main ideas of the algorithm is to maintain a distribution, or set of weights, over the training set. The weight of this distribution on training example x_i at round t is denoted w_t(i), i.e. w_t(i) is the weight of sample x_i at iteration round t. Initially, all weights are set equally, but on each round the weights of incorrectly classified examples are increased, so that the base learner is forced to focus on the hard examples in the training set.

The base learner's job is to find a base classifier C_t; h_t is the decision function of base classifier C_t, and h_t(x_i) gives the class label (+1 or -1) of the training sample x_i. In the simplest case the range of each h_t is binary, i.e. restricted to {-1, +1}; the base learner's job then is to minimize the error

    ε_t = Pr_{i∼w_t}[h_t(x_i) ≠ y_i] = Σ_{i: h_t(x_i) ≠ y_i} w_t(i).   (1)

Notice that the error is measured with respect to the distribution w_t on which the base learner was trained. In practice, the base learner may be an algorithm that can use the weights w_t on the training examples. Alternatively, when this is not possible, a subset of the training examples can be sampled according to w_t, and these (unweighted) resampled examples can be used to train the base learner. Both resampling and reweighting can be used to train AdaBoost.

Once the base classifier h_t has been obtained, AdaBoost chooses a parameter α_t that intuitively measures the importance assigned to h_t. For binary h_t, as in the original description of AdaBoost given by Freund and Schapire [4], one typically sets

    α_t = (1/2) ln((1 - ε_t) / ε_t).   (2)

Note that α_t > 0 if ε_t < 1/2, and that α_t gets larger as ε_t gets smaller. The distribution w_t is then updated using the rule shown in the algorithm below. The effect of this rule is to increase the weight of examples misclassified by h_t and to decrease the weight of correctly classified examples; the weight therefore tends to concentrate on "hard" examples. The final or combined classifier H is a weighted majority vote of the T base classifiers, where α_t is the weight assigned to h_t.

The algorithm for AdaBoost is given below:

1. Input: a set of training samples with labels D = {(x_1, y_1), ..., (x_n, y_n)}, x_i ∈ X, y_i ∈ Y = {-1, +1}; the base learner algorithm; the number of cycles T.
2. Initialize the weights of the samples: w_1(i) = 1/n, for all i = 1, ..., n.
3. Do for t = 1, ..., T:
   (1) Train the base classifier C_t on the weighted training sample set. Alternatively, a subset of the training examples can be sampled according to w_t, and these resampled examples are used to train the base learner C_t. The decision function of C_t is h_t.
   (2) Calculate the training error of C_t: ε_t = Σ_{i: y_i ≠ h_t(x_i)} w_t(i).
   (3) Set the weight of base classifier C_t: α_t = (1/2) ln((1 - ε_t) / ε_t).
   (4) Update the training samples' weights:
       w_{t+1}(i) = w_t(i) exp{-α_t y_i h_t(x_i)} / Z_t,
       i.e. w_{t+1}(i) = (w_t(i)/Z_t) e^{-α_t} if y_i = h_t(x_i) and (w_t(i)/Z_t) e^{α_t} if y_i ≠ h_t(x_i),
       where Z_t is a normalization factor chosen so that Σ_{i=1}^{n} w_{t+1}(i) = 1.
4. Output the final classifier: H(x) = sign[Σ_{t=1}^{T} α_t h_t(x)].

The most basic theoretical property of AdaBoost concerns its ability to reduce the training error, i.e. the fraction of mistakes on the training set. Write the error ε_t of h_t as ε_t = 1/2 - γ_t, so that γ_t measures how much better than random (which has an error rate of 1/2) the classifications of h_t are. Freund and Schapire [4] prove that the training error of the final hypothesis H is at most

    Π_{t=1}^{T} [2 √(ε_t (1 - ε_t))] = Π_{t=1}^{T} √(1 - 4γ_t²) ≤ exp(-2 Σ_{t=1}^{T} γ_t²).   (3)

Thus, if each base hypothesis is slightly better than random, so that γ_t ≥ γ for some γ > 0, then the training error drops exponentially fast.

Previous boosting algorithms required that such a lower bound γ be known a priori before boosting begins; in practice, knowledge of such a bound is very difficult to obtain. AdaBoost, on the other hand, is adaptive in that it adapts to the error rates of the individual base hypotheses. This is the basis of its name: "Ada" is short for "adaptive".
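The listing above translates almost line-for-line into code. The following sketch is not from the paper; it is a minimal NumPy implementation of the reweighting variant, assuming a hypothetical base-learner factory make_base_learner() that returns an object with fit(X, y, sample_weight) and predict(X) methods, NumPy arrays X, y, and labels in {-1, +1}.

```python
import numpy as np

def adaboost(X, y, make_base_learner, T=10):
    """Binary AdaBoost (reweighting variant) with labels y in {-1, +1}.

    make_base_learner() must return an object exposing
    fit(X, y, sample_weight) and predict(X) -- an assumed interface,
    not something defined in the paper.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                      # step 2: uniform initial weights
    learners, alphas = [], []

    for t in range(T):                           # step 3
        clf = make_base_learner()
        clf.fit(X, y, sample_weight=w)           # (1) train C_t on the weighted set
        h = clf.predict(X)
        eps = np.sum(w[h != y])                  # (2) weighted training error, eq. (1)
        if eps >= 0.5:                           # practical guard (not in the listing):
            break                                # stop if no better than random
        eps = max(eps, 1e-12)                    # avoid division by zero
        alpha = 0.5 * np.log((1.0 - eps) / eps)  # (3) classifier weight, eq. (2)
        w = w * np.exp(-alpha * y * h)           # (4) reweight the samples
        w /= w.sum()                             #     normalize by Z_t
        learners.append(clf)
        alphas.append(alpha)

    def H(X_new):                                # step 4: sign of the weighted vote
        votes = sum(a * c.predict(X_new) for a, c in zip(alphas, learners))
        return np.sign(votes)

    return H
```

The early stop when ε_t ≥ 1/2 mirrors the assumption in the training-error analysis above that each base classifier is at least slightly better than random.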
3. AN IMPROVED ALGORITHM FOR ADABOOST WITH SVM BASE CLASSIFIERS

Diversity is known to be an important factor affecting the generalization performance of ensemble methods [11][12]; diversity means that the errors made by different base classifiers are uncorrelated. If each base classifier is moderately accurate and the base classifiers disagree with each other, the uncorrelated errors of these base classifiers will be removed by the voting process, so that good ensemble results are achieved [13]. This also applies to AdaBoostRBFSVM.

Studies using an SVM as the base learner of AdaBoost have been reported [7][8][9][10]. These studies showed the good generalization performance of AdaBoost.

For AdaBoost, it is known that there exists a dilemma between the base learner's accuracy and diversity [14]: the more accurate two base learners become, the less they can disagree with each other. How, then, should SVM base learners be selected for AdaBoost? Should one select accurate but not diverse base learners, or diverse but not very accurate ones? If an accuracy-diversity balance can be kept among the different base classifiers, a superior AdaBoost result will be obtained, but there is no generally effective way to achieve this. We therefore analyze the case of using RBFSVM as the base classifier of AdaBoost.

The problem of model selection is very important for SVM, since the classification performance of an SVM is affected by its parameters. For RBFSVM, these are the Gaussian width σ and the regularization parameter C, and varying either of them changes the classification performance. However, as reported in [7], although RBFSVM cannot learn well when a very low value of C is used, its performance largely depends on the σ value once a roughly suitable C is given.

How should the σ value be set for the base learners when RBFSVM is used as the base learner of AdaBoost? Problems are encountered when a single, fixed σ is applied to all RBFSVM base learners. In detail [10], an over-large σ often results in too weak an RBFSVM: its classification accuracy is often less than 50% and cannot meet the requirement on a base learner in AdaBoost. On the other hand, a smaller σ often makes the RBFSVM stronger, and boosting such learners may become inefficient because their errors are highly correlated. Furthermore, a too small σ can even make the RBFSVM overfit the training samples, so that it also cannot be used as a base learner. Hence, finding a suitable σ for AdaBoost with SVM base learners becomes a problem [10].

In order to avoid the problem resulting from using a single, fixed σ for all RBFSVM base classifiers, and to obtain good classification performance, it is necessary to find a suitable σ for each RBFSVM base classifier. Because an SVM can achieve comparably good classification performance if a roughly suitable C is given and the variance of the training samples is used as the Gaussian width σ of the RBFSVM, in this paper we use the variance of the training samples of each base classifier as the Gaussian width σ of its RBFSVM. This generates a set of moderately accurate RBFSVM classifiers for AdaBoost, and an improved algorithm is obtained; we call it the variable σ-AdaBoost.

In the proposed AdaBoostSVM, the obtained SVM base learners are mostly moderately accurate, which gives a better chance of obtaining uncorrelated base learners. By adjusting the σ value according to the variance of the training samples, a set of SVM base learners with different learning abilities is obtained. The proposed variable σ-AdaBoostRBFSVM is expected to achieve higher generalization performance than AdaBoostSVM, which uses a single, fixed σ for all RBFSVM base classifiers. In the proposed algorithm, without loss of generality, the re-sampling technique is used.
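For concreteness, an RBFSVM base learner of the kind discussed above can be built as follows. This is a sketch, not the paper's implementation (the paper uses the Steve Gunn SVM Toolbox): it assumes the common Gaussian-kernel parameterization k(x, x') = exp(-||x - x'||² / (2σ²)) and expresses that width through scikit-learn's SVC, where it corresponds to gamma = 1/(2σ²); C = 1000 mirrors the value used in the experiments reported later.

```python
import numpy as np
from sklearn.svm import SVC

def make_rbf_svm(sigma, C=1000.0):
    """RBF-kernel SVM with Gaussian width sigma.

    Assumes the parameterization k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)),
    i.e. scikit-learn's gamma = 1 / (2 sigma^2). The exact kernel form is an
    assumption; the paper does not write it out explicitly.
    """
    gamma = 1.0 / (2.0 * sigma ** 2)
    return SVC(kernel="rbf", C=C, gamma=gamma)
```

With this helper, "applying a single, fixed σ to all base learners" amounts to calling make_rbf_svm with the same sigma in every boosting round, while the variable σ algorithm below recomputes sigma from each resampled training set.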
The algorithm for variable σ-AdaBoostRBFSVM:

1. Input: a set of training samples with labels D = {(x_1, y_1), ..., (x_n, y_n)}, x_i ∈ X, y_i ∈ Y = {-1, +1}; the base learner; the number of cycles T.
2. Initialize the weights of the samples: w_1(i) = 1/n, for all i = 1, ..., n.
3. Do for t = 1, ..., T:
   (1) Sample a subset of the training examples according to w_t; these resampled examples constitute the new training data set d_t, which will be used to train the base classifier C_t. The decision function of C_t is h_t.
   (2) Calculate the Gaussian width σ from d_t: σ = sqrt(mean(var(d_t))), i.e. the square root of the mean of the per-attribute variances of d_t.
   (3) Using d_t as the training sample set and σ as the Gaussian width, train the base classifier C_t, an RBFSVM with Gaussian width σ; h_t is the decision function of C_t.
   (4) Calculate the training error of C_t: ε_t = Σ_{i: y_i ≠ h_t(x_i)} w_t(i).
   (5) Set the weight of base classifier C_t: α_t = (1/2) ln((1 - ε_t) / ε_t).
   (6) Update the training samples' weights: w_{t+1}(i) = w_t(i) exp{-α_t y_i h_t(x_i)} / Z_t, where Z_t is a normalization factor such that Σ_{i=1}^{n} w_{t+1}(i) = 1.
4. Output the final classifier: H(x) = sign[Σ_{t=1}^{T} α_t h_t(x)].
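Putting these steps together, a minimal sketch of the variable σ-AdaBoostRBFSVM loop might look as follows. It assumes the make_rbf_svm() helper from the previous sketch, NumPy arrays X, y with labels in {-1, +1}, and exposes the per-round subset fraction as a parameter because the experiments below use between roughly 1/2 and 1/10 of the training samples per base classifier.

```python
import numpy as np

def variable_sigma_adaboost_rbfsvm(X, y, T=10, C=1000.0, frac=0.5, rng=None):
    """Sketch of variable sigma-AdaBoostRBFSVM (re-sampling variant).

    Each round resamples a subset d_t according to w_t, sets the Gaussian
    width sigma = sqrt(mean(var(d_t))) from that subset, trains an RBFSVM
    with that width (via the assumed make_rbf_svm() helper), and then
    reweights the samples exactly as in standard AdaBoost.
    """
    rng = rng or np.random.default_rng(0)
    n = len(y)
    w = np.full(n, 1.0 / n)                              # step 2: uniform weights
    m = max(2, int(frac * n))                            # size of d_t
    learners, alphas = [], []

    for t in range(T):                                   # step 3
        idx = rng.choice(n, size=m, replace=True, p=w)   # (1) resample d_t according to w_t
        Xt, yt = X[idx], y[idx]
        if len(np.unique(yt)) < 2:                       # degenerate draw; redraw next round
            continue
        sigma = np.sqrt(np.mean(np.var(Xt, axis=0)))     # (2) sigma = sqrt(mean(var(d_t)))
        clf = make_rbf_svm(sigma, C=C)                   # (3) RBFSVM with Gaussian width sigma
        clf.fit(Xt, yt)
        h = clf.predict(X)
        eps = np.sum(w[h != y])                          # (4) weighted training error
        if eps <= 0.0 or eps >= 0.5:                     # practical guard, not in the listing
            continue
        alpha = 0.5 * np.log((1.0 - eps) / eps)          # (5) classifier weight
        w = w * np.exp(-alpha * y * h)                   # (6) reweight and normalize by Z_t
        w /= w.sum()
        learners.append(clf)
        alphas.append(alpha)

    def H(X_new):                                        # step 4: final weighted vote
        votes = sum(a * c.predict(X_new) for a, c in zip(alphas, learners))
        return np.sign(votes)

    return H
```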
4. EXPERIMENTS AND RESULTS

To evaluate the performance of the variable σ-AdaBoostRBFSVM, and to make an experimental comparison between AdaBoostSVM (which uses a single, fixed σ for all RBFSVM base classifiers) and our improved algorithm, experiments were conducted on the Westontoynonliner data set and the Wine data set [8], and the results of the classification experiments are given below. The SVM implementation we used is from the Steve Gunn SVM Toolbox.

The Westontoynonliner data set consists of 1000 samples of 2 classes, each sample having 52 attributes. The Wine data set consists of 178 samples of 3 classes, each sample having 13 attributes; class 1 is used as the positive class and the other two classes form the negative class in the classification experiments.

The SVMs for the variable σ-AdaBoostRBFSVM, AdaBoostSVM and the single SVM were trained with the same parameter when comparing the performance of the algorithms, C = 1000. For the single SVM and AdaBoostSVM, the Gaussian width σ of the RBFSVM was set to 12. Let T be the number of base classifiers; T = 10 in the experiments.

For the Westontoynonliner data set, the training and testing samples were chosen randomly from the given dataset; 50, 150, 200, 300 and 500 are the numbers of training samples used in the experiments, and 128 is the number of testing samples. For the variable σ-AdaBoostRBFSVM and AdaBoostSVM, 1/2 to 1/10 of the training samples were used to train the base classifiers, and the average correct classification rates over 3 randomly chosen testing data sets were calculated. Fig. 1 gives the performance comparison for the Westontoynonliner data set; the X axis indicates the number of training samples, and the Y axis gives the correct classification rate. In Fig. 1, Ada-SVM stands for AdaBoostSVM, and improved Ada-SVM stands for the variable σ-AdaBoostRBFSVM.

Fig. 1. Performance comparison for the Westontoynonliner data set.

For the Wine data set, the training and testing samples were also chosen randomly from the given dataset; 50, 80, 100, 130 and 150 are the numbers of training samples used in the experiments, and 79 is the number of testing samples. For the single SVM and AdaBoostSVM, the Gaussian width σ of the RBFSVM was set to 2, 6 and 12, and the average correct classification rates over randomly chosen testing data sets were calculated. For the variable σ-AdaBoostRBFSVM and AdaBoostSVM, 1/2 to 1/8 of the training samples were used to train the base classifiers, and the average correct classification rates over 3 randomly chosen testing data sets were calculated. Fig. 2 gives the performance comparison for the Wine data set; the X axis indicates the number of training samples, and the Y axis gives the correct classification rate. In Fig. 2, Ada-SVM stands for AdaBoostSVM, and improved Ada-SVM stands for the variable σ-AdaBoostRBFSVM.

Fig. 2. Performance comparison for the Wine data set.

From Fig. 1 and Fig. 2 we can see that AdaBoostSVM and a single SVM have almost the same classification performance, but our improved AdaBoostRBFSVM improves the average correct classification rate noticeably. For the Wine data set, the distribution of the training samples is unbalanced, because there are 59 training samples in the positive class and 119 training samples in the negative class. From Fig. 2 we can see that the variable σ-AdaBoostRBFSVM is more efficient for unbalanced data sets.

Compared with using a single SVM, the benefit of the improved AdaBoostRBFSVM is its advantage in model selection; compared with AdaBoost using a single, fixed σ for all RBFSVM base classifiers, it has better generalization performance.
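The comparison protocol described above (average correct classification rate over 3 randomly chosen test sets, for several training-set sizes) could be reproduced with a small harness along the following lines. This is an assumption-laden sketch: the paper only states that training and testing samples were chosen randomly, so the exact splitting and averaging procedure here is illustrative, and the data are assumed to be already loaded as NumPy arrays X, y with labels in {-1, +1}.

```python
import numpy as np

def average_correct_rate(build_classifier, X, y, n_train, n_splits=3, seed=0):
    """Average correct classification rate over random train/test splits.

    build_classifier(X_train, y_train) must return a decision function H
    mapping an array of samples to predicted labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(n_splits):
        perm = rng.permutation(len(y))
        train, test = perm[:n_train], perm[n_train:]
        H = build_classifier(X[train], y[train])
        rates.append(float(np.mean(H(X[test]) == y[test])))
    return float(np.mean(rates))

# Hypothetical usage: evaluate the improved variant for one training-set size.
# rate_improved = average_correct_rate(
#     lambda Xtr, ytr: variable_sigma_adaboost_rbfsvm(Xtr, ytr, T=10),
#     X, y, n_train=100)
```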
5. CONCLUSIONS

AdaBoost is a general method for improving the accuracy of any given learning algorithm. After analyzing the relation between the performance of AdaBoost and the performance of its base classifiers, the approach of improving the classification performance of AdaBoostSVM was studied in this paper. There is an inconsistency between the accuracy and the diversity of the base classifiers, and this inconsistency affects the generalization performance of the algorithm; how to deal with the dilemma between base classifier accuracy and diversity is therefore very important for improving the performance of AdaBoost. A new variable σ-AdaBoostSVM was proposed, which adjusts the kernel function parameter of each base classifier according to the distribution of the training samples. The proposed algorithm improves the classification performance by keeping a balance between the accuracy and the diversity of the base classifiers. Experimental results on the benchmark datasets indicate the effectiveness of the proposed algorithm; they also indicate that the proposed algorithm is more efficient for unbalanced data sets.

Acknowledgements

This work is supported by the Natural Science Basic Research Plan in Shaanxi Province of China under Grant 2004F36, and partially supported by NSFC under Grant 50505051.

References

[1] L. G. Valiant, "A theory of the learnable", Communications of the ACM, vol. 27, no. 11, pp. 1134-1142, November 1984.
[2] R. E. Schapire, "The strength of weak learnability", Machine Learning, vol. 5, no. 2, pp. 197-227, 1990.
[3] Y. Freund, "Boosting a weak learning algorithm by majority", Information and Computation, vol. 121, no. 2, pp. 256-285, 1995.
[4] Y. Freund, R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, vol. 55, no. 1, pp. 119-139, August 1997.
[5] R. E. Schapire, Y. Singer, P. Bartlett, and W. Lee, "Boosting the margin: A new explanation for the effectiveness of voting methods", The Annals of Statistics, vol. 26, no. 5, pp. 1651-1686, 1998.
[6] V. Vapnik, Statistical Learning Theory, John Wiley and Sons, New York, 1998.
[7] G. Valentini, T. G. Dietterich, "Bias-variance analysis of support vector machines for the development of SVM-based ensemble methods", Journal of Machine Learning Research, vol. 5, pp. 725-775, 2004.
[8] D. Pavlov, J. Mao, "Scaling-up support vector machines using boosting algorithm", in Proceedings of ICPR 2000.
[9] H.-C. Kim, S. Pang, H.-M. Je, D. Kim, and S. Y. Bang, "Constructing support vector machine ensemble", Pattern Recognition, vol. 36, no. 12, pp. 2757-2767, Dec 2003.
[10] X. Li, L. Wang, E. Sung, "A study of AdaBoost with SVM based weak learners", in Proceedings of IJCNN 2005.
[11] P. Melville, R. J. Mooney, "Creating diversity in ensembles using artificial data", Information Fusion, vol. 6, no. 1, pp. 99-111, Mar 2005.
[12] L. I. Kuncheva, C. J. Whitaker, "Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy", Machine Learning, vol. 51, no. 2, pp. 181-207, May 2003.
[13] H. W. Shin and S. Y. Sohn, "Selected tree classifier combination based on both accuracy and error diversity", Pattern Recognition, vol. 38, pp. 191-197, 2005.
[14] T. G. Dietterich, "An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization", Machine Learning, vol. 40, no. 2, pp. 139-157, Aug 2000.
