A Study of Generalization in Genetic Programming (thesis summary and full text)


MINISTRY OF NATIONAL DEFENSE
MILITARY TECHNICAL ACADEMY

NGUYEN THI HIEN

A STUDY OF GENERALIZATION IN GENETIC PROGRAMMING

Specialization: Applied Mathematics and Computer Science
Code: 60 46 01 10

SUMMARY OF THE THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN MATHEMATICS

Hanoi - 2014

THE THESIS IS SUBMITTED TO THE MILITARY TECHNICAL ACADEMY - MINISTRY OF NATIONAL DEFENSE

Academic supervisors:
1. Assoc. Prof. Dr. Nguyen Xuan Hoai
2. Prof. Dr. R.I. (Bob) McKay

Reviewer 1: Prof. Dr. Vu Duc Thi
Reviewer 2: Assoc. Prof. Dr. Nguyen Duc Nghia
Reviewer 3: Dr. Nguyen Van Vinh

The thesis was evaluated by the examination board of the Academy, established by Decision No. 1767/QĐ-HV of 1st July 2014 of the Rector of the Military Technical Academy, meeting at the Military Technical Academy on [day] [month] [year].

This thesis can be found at:
- Library of the Military Technical Academy
- National Library of Vietnam

INTRODUCTION

The genetic programming (GP) paradigm, first proposed by Koza (1992), can be viewed as the discovery of computer programs that produce desired outputs for particular inputs. Despite advances in GP research, when GP is applied to learning tasks the issue of generalization has not received the attention it deserves. This thesis focuses on the generalization aspect of GP and proposes mechanisms to improve GP generalization ability.

1. Motivations

GP has been proposed as a machine learning (ML) method. The main goal when using GP, or any other ML technique, is not only to create a program that exactly covers the training examples, but also to obtain good generalization ability. Although the issue of generalization in GP has recently received more attention, the use of more traditional ML techniques in GP has been rather limited. It is hoped that adapting ML techniques to GP will help to improve GP generalization ability.

2. Research Perspectives

The majority of theoretical work in GP has been derived from experimentation. The approach taken in this thesis is likewise based on carefully designed experiments and the analysis of experimental results.

3. Contributions of the Thesis

This thesis makes the following contributions:

1) An early stopping approach for GP based on an estimate of the generalization error, together with some new early stopping criteria for GP. The GP training process is stopped early at the point that promises to yield the solution with the smallest generalization error.

2) A progressive sampling framework for GP, which divides the GP learning process into several layers; the training set size starts small and grows at each layer. The termination of training on each layer is based on certain early stopping criteria.

4. Organization of the Thesis

Chapter 1 first introduces the basic components of GP as well as some benchmark problem domains. Two important research issues and a metaphor of GP search are then discussed. Next, the major issues of GP are discussed, especially when GP is considered as an ML system. The chapter first overviews the approaches proposed in the literature to improve the generalization ability of GP, and then discusses solution complexity (code bloat) in the particular context of GP and its relation to GP learning performance. Chapter 2 provides background on a number of concepts and techniques subsequently used in the thesis for improving GP learning performance: progressive sampling, layered learning, and early stopping in ML.
Chapter 3 presents one of the main contributions of the thesis: it proposes criteria used to determine when to stop the training of GP, with the aim of avoiding over-fitting. Chapter 4 develops a learning framework for GP based on layered learning and progressive sampling. Section 4.4 concludes the thesis, summarizing the main results and proposing directions for future work extending the research in this thesis.

Chapter 1
BACKGROUNDS

This chapter describes the representation and the specific algorithm components used in the canonical version of GP. The chapter ends with a comprehensive overview of the literature on approaches used to improve GP generalization ability.

1.1. Genetic Programming

The basic algorithm is as follows:

1) Initialise a population of solutions.
2) Assign a fitness to each population member.
3) While the stopping criterion is not met:
4) Produce new individuals using the genetic operators and the existing population.
5) Place the new individuals into the population.
6) Assign a fitness to each population member, and test for the stopping criterion.
7) Return the best fitness found.

1.1.1. Representation, Initialization and Operators in GP

1.1.2. Representation

GP programs are expressed as expression trees. The variables and constants in the program are called terminals in GP; they are the leaves of the tree. The arithmetic operations are internal nodes (called functions in the GP literature). The sets of allowed functions and terminals together form the primitive symbol set of a GP system.

1.1.3. Initializing the Population

The ramped half-and-half method is the one most commonly used for tree initialisation. It was introduced by Koza and probabilistically selects between two recursive tree-generating methods: Grow and Full.

1.1.4. Fitness and Selection

Fitness is the measure used by GP to indicate how well a program has learned to predict the output(s) from the input(s). The error fitness function and the squared-error fitness function are commonly used in GP. The most commonly employed method for selecting individuals in GP is tournament selection.

1.1.5. Recombination

• Choose two individuals as parents according to the reproduction selection strategy.
• Select a random subtree in each parent. Subtrees consisting of a single terminal are selected with lower probability than other subtrees.
• Swap the selected subtrees between the two parents. The resulting individuals are the children.

1.1.6. Mutation

An individual is selected for mutation using fitness-proportional selection. A single function or terminal is selected at random, from among the set of functions and terminals making up the original individual, as the mutation point. The mutation point, along with the subtree stemming from it, is then removed from the tree and replaced with a new, randomly generated subtree.

1.1.7. Reproduction

The reproduction operator copies an individual from one population into the next.

1.1.8. Stopping Criterion

A maximum number of generations usually defines the stopping criterion in GP. However, when an ideal individual is achieved, this can also stop the GP evolutionary process. In this thesis, some other criteria are proposed, such as a measure of generalization loss, a lack of fitness improvement, or a run of generations with over-fitting.

1.1.9. Some Variants of GP

The GP community has proposed numerous different approaches to program evolution: linear GP, graph-based GP, and grammar-based GP.
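To make the loop and the tree-based operators above concrete, the following minimal sketch evolves expression trees for a one-variable regression task. It is an illustration only, not the implementation used in the thesis: the primitive set, the absolute-error fitness, the uniform choice of crossover and mutation points (Koza's operators bias against terminals), the size cap, the toy target function, and all parameter values are assumptions made for this example.

    import operator
    import random

    # Assumed primitive set: three binary functions, one variable, a few constants.
    FUNCTIONS = [operator.add, operator.sub, operator.mul]
    TERMINALS = ['x', -0.5, 0.25, 0.75]

    def grow(depth):
        # Grow method: may place a terminal before reaching the depth limit.
        if depth == 0 or random.random() < 0.3:
            return random.choice(TERMINALS)
        return (random.choice(FUNCTIONS), grow(depth - 1), grow(depth - 1))

    def full(depth):
        # Full method: every branch extends to the depth limit.
        if depth == 0:
            return random.choice(TERMINALS)
        return (random.choice(FUNCTIONS), full(depth - 1), full(depth - 1))

    def ramped_half_and_half(pop_size, max_depth):
        # Alternate Grow and Full over a range of depths.
        return [(grow if i % 2 else full)(random.randint(2, max_depth))
                for i in range(pop_size)]

    def evaluate(tree, x):
        if isinstance(tree, tuple):
            func, left, right = tree
            return func(evaluate(left, x), evaluate(right, x))
        return x if tree == 'x' else tree

    def fitness(tree, cases):
        # Error fitness: sum of absolute errors over the cases (lower is better).
        return sum(abs(evaluate(tree, x) - y) for x, y in cases)

    def nodes(tree, path=()):
        # All (subtree, path) pairs; paths are child indices used to rebuild trees.
        result = [(tree, path)]
        if isinstance(tree, tuple):
            result += nodes(tree[1], path + (1,)) + nodes(tree[2], path + (2,))
        return result

    def subtree_replace(tree, path, new):
        if not path:
            return new
        children = list(tree)
        children[path[0]] = subtree_replace(children[path[0]], path[1:], new)
        return tuple(children)

    def crossover(a, b):
        # Subtree crossover: swap randomly chosen subtrees of the two parents.
        sa, pa = random.choice(nodes(a))
        sb, pb = random.choice(nodes(b))
        return subtree_replace(a, pa, sb), subtree_replace(b, pb, sa)

    def mutate(tree):
        # Subtree mutation: replace a random subtree with a freshly grown one.
        _, path = random.choice(nodes(tree))
        return subtree_replace(tree, path, grow(3))

    def tournament(pop, cases, k=3):
        return min(random.sample(pop, k), key=lambda t: fitness(t, cases))

    # Assumed training cases for the toy target f(x) = x^2 + x.
    cases = [(x / 10.0, (x / 10.0) ** 2 + x / 10.0) for x in range(-10, 11)]
    population = ramped_half_and_half(pop_size=50, max_depth=4)
    for generation in range(30):
        offspring = []
        while len(offspring) < len(population):
            p1, p2 = tournament(population, cases), tournament(population, cases)
            c1, c2 = crossover(p1, p2)
            if random.random() < 0.1:
                c1 = mutate(c1)
            # Crude size cap so the sketch's trees stay small (limits bloat).
            offspring += [c1 if len(nodes(c1)) <= 60 else p1,
                          c2 if len(nodes(c2)) <= 60 else p2]
        population = offspring[:len(population)]
    best = min(population, key=lambda t: fitness(t, cases))
    print('best training error:', round(fitness(best, cases), 4))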
1.2. An Example Problem Domain

This thesis uses the ten target functions in Table 1.1 with white (Gaussian) noise of standard deviation 0.01; that is, each target function is F(x) = f(x) + ε. Four real-world data sets from the UCI ML repository and StatLib are also used, as shown in Table 1.2.

Table 1.1: Test functions
1                  F1(x)  = x^4 + x^3 + x^2 + x
2                  F2(x)  = cos(3x)
3                  F3(x)  = sqrt(x)
4                  F4(x)  = x1*x2 + sin((x1 - 1)(x2 - 1))
5                  F5(x)  = x1^4 - x1^3 + x2^2/2 - x2
Friedman1          F6(x)  = 10*sin(π*x1*x2) + 20*(x3 - 0.5)^2 + 10*x4 + 5*x5
Friedman2          F7(x)  = sqrt(x1^2 + (x2*x3 - 1/(x2*x4))^2)
Gabor              F8(x)  = (π/2) * exp(-2*(x1^2 + x2^2)) * cos(2π*(x1 + x2))
Multi              F9(x)  = 0.79 + 1.27*x1*x2 + 1.56*x1*x4 + 3.42*x2*x5 + 2.06*x3*x4*x5
3-D Mexican Hat    F10(x) = sin(sqrt(x1^2 + x2^2)) / sqrt(x1^2 + x2^2)

Table 1.2: The real-world data sets
Data set                          Features   Size   Source
Concrete Compressive Strength     9          1030   UCI
Pollen                            5          3848   StatLib
Chscase.census6                   7          400    StatLib
No2                               8          500    StatLib

1.3. GP and Machine Learning Issues

This section explores some open questions in GP research from an ML perspective, which motivate the research in this thesis.

1.3.1. Search Space

GP is a search technique that explores the space of computer programs; in particular, the space changes with the size of the programs. In GP implementations, the maximum depth of a tree is not the only parameter that limits the search space; an alternative is the maximum number of nodes of an individual, or both limits (depth and size) together.

1.3.2. Bloat

In the course of evolution, the average size of the individuals in the GP population often increases greatly. Typically the increase in program size is not accompanied by any corresponding increase in fitness. The origin of this phenomenon, known as bloat, has been a subject of research for over a decade.

1.3.3. Generalization and Complexity

Most research on improving the generalization ability of GP has concentrated on avoiding over-fitting of the training data. One of the main causes of over-fitting has been identified as the "complexity" of the hypothesis generated by the learning algorithm. When GP trees grow large to fit, or to specialize on, "difficult" learning cases, they lose the ability to generalize further; this is entirely consistent with the principle of Occam's razor, which states that simple solutions should be preferred. The relationship between generalization and individual complexity in GP has often been studied in the context of code bloat, the extraordinary growth in the complexity of solutions without any increase in their fitness.

1.4. Generalization in GP

1.4.1. Overview of Studies on GP Generalization Capability

Common to most of the research are the problems of obtaining generalization in GP and the attempts to overcome these problems. The approaches can be categorized as follows:

• Using separate training and test sets in order to promote generalization in supervised learning problems (see the sketch after this list).
• Changing the training instances from one generation to another, evaluating performance on subsets of the training instances, or combining GP with ensemble learning techniques.
• Changing the formal implementation of GP: its representation, genetic operators, and selection strategy.
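As a concrete illustration of the first approach above, and of the noisy targets described in Section 1.2, the sketch below samples benchmark function F1 with Gaussian noise of standard deviation 0.01 and holds out part of the data as an unseen test set. The input range, sample size, and test fraction are assumptions for this example, not the settings used in the thesis.

    import random

    def f1(x):
        # Target function F1 from Table 1.1.
        return x**4 + x**3 + x**2 + x

    def make_dataset(target, n, noise_sd=0.01, lo=-1.0, hi=1.0, seed=0):
        # White Gaussian noise: F(x) = f(x) + eps, with eps ~ N(0, noise_sd^2).
        rng = random.Random(seed)
        xs = [rng.uniform(lo, hi) for _ in range(n)]
        return [(x, target(x) + rng.gauss(0.0, noise_sd)) for x in xs]

    def train_test_split(data, test_fraction=0.3, seed=1):
        # Hold out part of the data so generalization is measured on unseen cases.
        rng = random.Random(seed)
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * (1.0 - test_fraction))
        return shuffled[:cut], shuffled[cut:]

    data = make_dataset(f1, n=200)                  # assumed sample size
    train_set, test_set = train_test_split(data)
    print(len(train_set), 'training cases,', len(test_set), 'test cases')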
1.4.2. Problem of Improving Training Efficiency

To derive and evaluate a model that uses GP as the core learning component, we seek to answer the following questions:

1) Performance: How sensitive is the model to changes in the learning problem or in the initial conditions?
2) Complexity (the effective size of the solution): the principle of Occam's razor states that, among solutions that perform equally well, the simpler one is to be preferred.
3) Training time: How long will the learning take, i.e., how fast or slow is the training phase?

Chapter 2
EARLY STOPPING, PROGRESSIVE SAMPLING, AND LAYERED LEARNING

Early stopping, progressive sampling, and layered learning are techniques for enhancing learning performance in ML. They have been used in combination with a number of ML techniques.

2.1. Over-training, Early Stopping and Regularization

The over-training problem is common in the field of ML. It is related to the learning process of learning machines, and has been an open topic of discussion, motivating the proposal of several techniques such as regularization and early stopping.

2.1.1. Over-training

When the number of training samples is infinitely large and they are unbiased, the ML system parameters converge to one of the [...]

2.1.2. Early Stopping

[...] During the training phase of a learning machine, the generalization error may decrease in an early period, reach a minimum, and then increase as training goes on, while the training error decreases monotonically. It is therefore considered better to stop training at an adequate time; the class of techniques based on this observation is referred to as early stopping.

2.1.3. Regularization Learning

Regularization [...]

2.2. Progressive Sampling

[...] of training techniques in traditional machine learning for dealing with large training data sets. As Provost et al. (1999) describe PS, a learning algorithm is trained and retrained with increasingly larger random samples until the accuracy (generalization) of the learnt model stops improving. For a training data set of size N, the task of PS is to determine a sampling schedule S = {n0, n1, n2, ...} [...] were formally and experimentally shown to be efficient for learning algorithms of polynomial time complexity.
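A minimal sketch of the progressive sampling idea described above follows. The geometric schedule, its parameters, and the simple "stops improving" test are assumptions chosen for illustration (Provost et al. analyse geometric schedules, but the summary does not fix these details here), and the stand-in learner is a placeholder for a GP run.

    import random

    def geometric_schedule(n0, factor, N):
        # Sampling schedule S = {n0, n1, n2, ...}: sizes grow geometrically up to N.
        sizes, n = [], n0
        while n < N:
            sizes.append(n)
            n = int(n * factor)
        sizes.append(N)
        return sizes

    def progressive_sampling(train_and_score, data, n0=100, factor=2.0, tol=1e-3):
        # Train and retrain on increasingly larger random samples until the
        # accuracy of the learnt model stops improving by more than tol.
        best_score, best_model = float('-inf'), None
        for n in geometric_schedule(n0, factor, len(data)):
            sample = random.sample(data, n)
            model, score = train_and_score(sample)
            if score <= best_score + tol:
                break                      # accuracy has stopped improving
            best_score, best_model = score, model
        return best_model, best_score

    # Stand-in learner: "accuracy" simply grows with sample size (replace with GP).
    data = [(x, 2 * x) for x in range(10000)]
    def train_and_score(sample):
        return None, 1.0 - 1.0 / len(sample)

    model, score = progressive_sampling(train_and_score, data)
    print('final score:', round(score, 4))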
2.3. Layered Learning

The layered learning (LL) paradigm is described formally by Stone and Veloso (2000). Intended as a means of dealing with problems for which finding a direct mapping from inputs to outputs is intractable with existing learning algorithms, the essence of the approach is to decompose a problem into subtasks, each of which is then associated with a layer in the problem-solving process. LL is an ML paradigm defined as a set of principles for the construction of a hierarchical, learned solution to a complex task.

Chapter 3
AN EMPIRICAL STUDY OF EARLY STOPPING FOR GP

In this chapter we empirically investigate several early stopping criteria for the GP learning process. The first group of [...]

[...] generations (usually at the beginning of the run) where the best individual fitness on the training set has a better validation fitness value than the training one; tbtp stands for "training at best valid point", i.e. the training fitness of the individual whose validation fitness equals btp; TrainingFit(g) is a function that returns the best training fitness in the population at generation g; Val[...]

[...] We propose a variant of the second stopping criterion, called the true adaptive stopping criterion, which works as in Algorithm 2 (where d is the number of generations over which the over-fitting value has been increasing, fg is the generation that started the chain of non-decreasing over-fitting values, lg is the last generation of that chain, random is a random value in [0, 1], and g is the current generation) [...]

Chapter 4

[...] The learning/evolutionary process is divided into l layers. Learning starts as in standard GP, but only a subset of the training examples is used. The next layer commences when the stopping criteria of the current layer are met, using the final population of the previous layer as its starting population; however, the training sample is incremented with new samples drawn under the same distribution - usually [...]

[...] This chapter studies in detail the modified GPLL system first proposed in [2], itself an extension of the original GPLL of [3], with ideas from the progressive sampling described in Chapter 2. This modified GPLL has shown, at one extreme of tuning, very large improvements in search time with a small cost in generalization capability; and at the other, small improvements in generalization while retaining a [...]

CONCLUSIONS

[...] on the approaches and methods that are currently employed in evolutionary research. A particular emphasis is given to one of the sub-areas of evolutionary learning research: genetic programming (GP). 2) Generalization performance, as one of the performance measures of a learner, was shown to be of crucial importance in improving the practices and approaches of evolutionary learning, and in particular of GP [...]

LIST OF PUBLICATIONS

[...] and Xuan Hoai Nguyen, Learning in Stages: A Layered Learning Approach for Genetic Programming, in Proceedings of the 9th IEEE-RIVF International Conference on Computing and Communication Technologies (RIVF 2012), Vietnam, 12-16 July 2012, ACM.
3. Nguyen Thi Hien, Xuan Hoai Nguyen and Bob McKay, A Study on Genetic Programming with Layered Learning and Incremental Sampling, in Proceedings of the World Congress [...]
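To close, the sketch below shows one way the ideas surveyed in this summary fit together: a layered GP run in which each layer trains on a larger sample of the data (progressive sampling), the final population of one layer seeds the next, and each layer is terminated early when validation error stops improving. The patience-based stopping test is only one simple stand-in for the criteria studied in Chapter 3, and evolve_one_generation and best_validation_error are placeholders for a real GP system; none of this is the thesis's actual algorithm.

    import random

    def layered_gp(init_population, evolve_one_generation, best_validation_error,
                   train_data, val_data, schedule, patience=5, max_gens=50):
        # Layered learning with incremental sampling and per-layer early stopping.
        # schedule: one training-sample size per layer (e.g. a geometric schedule).
        population = init_population()
        for layer, n in enumerate(schedule):
            sample = random.sample(train_data, min(n, len(train_data)))
            best_err, stale = float('inf'), 0
            for generation in range(max_gens):
                population = evolve_one_generation(population, sample)
                err = best_validation_error(population, val_data)
                if err < best_err:
                    best_err, stale = err, 0
                else:
                    stale += 1
                if stale >= patience:      # early stopping for this layer
                    break
            print(f'layer {layer}: sample size {n}, best validation error {best_err:.4f}')
            # The final population of this layer becomes the next layer's start.
        return population

    # Stand-in components so the sketch runs; replace them with a real GP system.
    train_data = [(x / 100.0, x / 100.0) for x in range(1000)]
    val_data = train_data[:100]
    population = layered_gp(
        init_population=lambda: [0.0] * 20,
        evolve_one_generation=lambda pop, sample: [p + random.gauss(0, 0.05) for p in pop],
        best_validation_error=lambda pop, val: min(abs(p - 1.0) for p in pop),
        train_data=train_data, val_data=val_data, schedule=[125, 250, 500, 1000])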
