A STUDY OF GENERALIZATION IN GENETIC PROGRAMMING

MINISTRY OF EDUCATION AND TRAINING
MINISTRY OF NATIONAL DEFENSE
MILITARY TECHNICAL ACADEMY

NGUYEN THI HIEN

A STUDY OF GENERALIZATION IN GENETIC PROGRAMMING

Specialized in: Applied Mathematics and Computer Science
Code: 60 46 01 10

THE THESIS IS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN MATHEMATICS

ACADEMIC SUPERVISORS:
1. Assoc/Prof. Dr. Nguyen Xuan Hoai
2. Prof. Dr. R.I. (Bob) McKay

Hanoi - 2014

Abstract

Genetic Programming (GP) is a meta-heuristic technique that simulates biological evolution. GP differs from other Evolutionary Algorithms (EAs) in that it generates solutions to problems in the form of computer programs. The quality criterion for each individual is usually called fitness in EAs, and it determines which individuals are selected. GP induces a population of computer programs that improve their fitness automatically as they experience the data on which they are trained. Accordingly, GP can be seen as a machine learning method. Machine learning is a process that trains a system over some training data in order to capture the succinct causal relationships among the data. The key parts of this process are the "learning domain", the "training set", the "learning system" and the "testing" (validating) of the learnt results. The quality of a learnt solution is its ability to predict the outputs from future unseen data (often simulated on a test set), which is often referred to as "generalization". The goal of this thesis is to investigate the aspects that affect the generalization ability of GP, as well as techniques from the more traditional machine learning literature that help to improve the generalization and learning efficiency of GP.
In particular, a number of early stopping criteria were proposed and tested. Early stopping with the proposed criteria helped to avoid over-fitting in GP, producing much simpler solutions in much shorter training time. Over-fitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. A model which has been over-fit will generally have poor predictive performance, as it can exaggerate minor fluctuations in the data. The thesis also proposes a framework to combine progressive sampling, a traditional machine learning technique, with early stopping for GP. The experimental results showed that progressive sampling in combination with early stopping helps to further improve the learning efficiency of GP (i.e. to avoid over-fitting and produce simpler solutions in much shorter training time).

Acknowledgements

The first person I would like to thank is my supervisor, for directly guiding me throughout my PhD. His enthusiasm has been the power source that motivated me in this work, and his guidance has inspired much of the research in this thesis.

I also wish to thank my co-supervisor. Although he was far away, we regularly had online research chats. Working with him, I have learnt how to do research systematically.

In particular, the leaders of the Department of Software Engineering and the Faculty of Information Technology, Military Technical Academy, have frequently supported my research with facilities and materials for running the experiments. Dr. Nguyen Quang Uy has discussed a number of issues related to this research with me. I would like to thank him for his thorough comments and suggestions on my research.
I would also like to acknowledge the support of my family, especially my parents, Nguyen Van Hoan and Nguyen Thi Quy, who have worked hard and always believed strongly in their children, and my husband, Nguyen Hai Trieu, for sharing the happiness and difficulties of life with me. To my beloved daughter Nguyen Thi Hai Phuong, who was born before this thesis was completed, I would like to express my thanks for being such a good girl and always cheering me up.

Contents

Abstract
List of Figures
List of Tables
Abbreviations
0.1 Motivations
0.2 Research Perspectives
0.3 Contributions of the Thesis
0.4 Organization of the Thesis
1 Backgrounds
  1.1 Genetic Programming
    1.1.1 Representation, Initialization and Operators in GP
    1.1.2 Some Variants of GP
  1.2 An Example of Problem Domain
  1.3 Machine Learning
  1.4 GP and Machine Learning Issues
    1.4.1 Search Space
    1.4.2 Bloat
    1.4.3 Generalization and Complexity
  1.5 Generalization in GP
    1.5.1 Overview of Studies on GP Generalization Ability
    1.5.2 Methods for Improving GP Generalization Ability
    1.5.3 Problem of Improving Training Efficiency
  1.6 Summary
2 Early Stopping, Progressive Sampling, and Layered Learning
  2.1 Over-training, Early Stopping and Regularization
    2.1.1 Over-training
    2.1.2 Early Stopping
    2.1.3 Regularization Learning
  2.2 Progressive Sampling
    2.2.1 Types of Progressive Sampling Schedule
    2.2.2 Detection of Convergence
  2.3 Layered Learning
  2.4 Conclusion
3 An Empirical Study of Early Stopping for GP
  3.1 Method
    3.1.1 Proposed Classes of Stopping Criteria
  3.2 Experimental Settings
  3.3 Results and Discussions
    3.3.1 Comparison with GP
    3.3.2 Comparison with Tarpeian
  3.4 Conclusion
4 A Progressive Sampling Framework for GP - An Empirical Approach
  4.1 The Proposed Method
    4.1.1 The Sampling Schedule
    4.1.2 The Initial Sample Set Size
    4.1.3 The Number of Learning Layers
    4.1.4 The Stopping Criterion for Each Layer
  4.2 Experimental Settings
    4.2.1 Data Preparation
    4.2.2 GP Systems Configurations
  4.3 Results and Discussions
    4.3.1 Learning Effectiveness Comparison
    4.3.2 Learning Efficiency
    4.3.3 Solution Complexity
    4.3.4 Distribution of Best Solutions by Layers
    4.3.5 Synergies between System Components
  4.4 Conclusion
5 Conclusions and Future Work
  5.1 Contributions
  5.2 Future Directions
Bibliography

List of Figures

1.1 GP's main loop [57]
1.2 GP expression tree representing max(x*x, x+3y) [57]
1.3 Creation of a full tree having maximum depth 2 (and therefore a total of seven nodes) using the Full initialisation method (t = time) [57]
1.4 Creation of a five node tree using the Grow initialisation method with a maximum depth of 2 (t = time). A terminal is chosen at t = 2, causing the left branch of the root to be terminated at that point even though the maximum depth has not been reached [57]
1.5 Example of subtree crossover
1.6 Example of subtree mutation [57]
2.1 Idealized training and validation error curves. Vertical axis: errors; horizontal axis: time [88]
2.2 A real validation error curve. Vertical: validation set error; horizontal: time (in training epochs) [88]
2.3 Learning curves and progressive sample
2.4 Regions of the Efficiency of Arithmetic and Geometric Schedule Sampling [89]
3.1 Distribution of stop time (last generation of the run) of the OF criterion and GPM
...
4.11 Average Size of Solutions Found by all Systems

Abbreviations

Abbreviation   Meaning
ARMA           Autoregressive Moving-Average
EA             Evolutionary Algorithm
DSS            Dynamic Subset Selection
GA             Genetic Algorithms
GGGP           Grammar Guided Genetic Programming
GP             Genetic Programming
GPLL           Layered Learning Genetic Programming
ILP            Inductive Logic Programming
KDD            Knowledge Discovery and Data Mining
LL             Layered...

... discrimination to obtain parallel classification programs for signals and images. The programs in PADO are represented as graphs, although their semantics and execution strategy are very different from those of PDGP.

1.1.2.3 Grammar Based GP

In Grammar Based GP, each program is a derivation tree generated by a grammar; the grammar, often context-free, defines a language...

... criterion. Generalization is an important issue in ML. The generalization ability of a learning machine depends on a balance between the information in the training examples and the complexity of the models. Bad generalization occurs if:
• Over-fitting of data: the information does not match the complexity, i.e. the training data set is too small, or the complexity of the models...
... efficient as well. In certain applications, the efficiency of the learning or inference algorithm, namely its space and time complexity, may be as important as its predictive accuracy. Therefore, there are several possible criteria for evaluating the performance of a learning algorithm: predictive accuracy of the classifier, speed of the learner, speed of the classifier, and space requirements. Therein, predictive accuracy...

... under a special root node that acts as the glue. The traditional languages used in AI research (e.g., Lisp and Prolog), a number of modern programming languages (e.g., Ruby and Python), and the languages associated with scientific programming tools (e.g., MATLAB and Mathematica) have been widely used to implement GP systems. They provide automatic garbage collection and dynamic...

... evolutionary and natural computation, GP plays a special role. In Koza's seminal book, the ambition of GP was to evolve a population of programs that can learn automatically from the training data. Since its birth, GP has developed quickly, with various fruitful applications. Some of the developments concern novel mechanisms in the GP evolutionary process; others were on finding potential...

... population often increases greatly, and at some point it may even start growing at a rapid pace. Typically the increase in program size is not accompanied by any corresponding increase in fitness. The origin of this phenomenon, which is known as bloat [5, 63], has been a subject of research for over a decade. In fact, the increase in size of the individuals with each generation is perfectly reasonable...
... the training set size, starting from being small, gradually gets bigger at each layer. The termination of training on each layer will be based on certain early stopping criteria.

0.4 Organization of the Thesis

The organization of the rest of the thesis is as follows: Chapter 1 first introduces the basic components of GP (the algorithm, representation, and operators) as well as some benchmark...

... or a model depending on the type of problem. GP is a simple and powerful technique which has been applied to a wide range of problems in combinatorial optimization, automatic programming and model induction. The direct encoding of solutions as variable-length computer programs allows GP to provide solutions that can be evaluated and also examined to understand their internal workings. In the field of evolutionary...

... selecting, mating, and offspring producing. The offspring replace existing individuals in the same population. A maximum number of generations usually defines the stopping criterion in GP. However, when it is possible to achieve an ideal individual (with ideal fitness), this can also stop the GP evolutionary process. In this thesis, some other criteria are proposed, such as a measure of generalization loss, a lack of ...
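The idea of stopping a run on a measure of generalization loss can be illustrated with a small sketch. The thesis's own definition is not visible in this excerpt, so the code below assumes one common form from the early stopping literature: GL(t) = 100 * (E_va(t) / E_opt(t) - 1), where E_va(t) is the validation error of the best individual at generation t and E_opt(t) is the lowest validation error seen up to generation t. The function names, the 5 percent threshold and the sample error values are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def generalization_loss(val_errors: Sequence[float]) -> float:
    """Generalization loss (in percent) at the latest generation.

    Assumed definition: GL(t) = 100 * (E_va(t) / E_opt(t) - 1), where E_opt(t)
    is the minimum validation error observed up to and including generation t.
    """
    current = val_errors[-1]
    best = min(val_errors)
    if best == 0:
        # Avoid division by zero: no loss if we are still at a perfect score.
        return 0.0 if current == 0 else float("inf")
    return 100.0 * (current / best - 1.0)


def stop_generation(val_errors: Sequence[float],
                    threshold: float = 5.0) -> Optional[int]:
    """Index of the first generation whose GL exceeds `threshold`, else None.

    `threshold` is a hypothetical setting, not a value taken from the thesis.
    """
    for t in range(1, len(val_errors) + 1):
        if generalization_loss(val_errors[:t]) > threshold:
            return t - 1
    return None


if __name__ == "__main__":
    # Hypothetical validation errors, one per generation.
    errors = [1.0, 0.8, 0.5, 0.51, 0.6]
    print("stop at generation index:", stop_generation(errors))
```

In a GP run this check would sit alongside the usual limits (maximum number of generations, ideal fitness reached), and the individual with the lowest validation error seen so far would be returned when the run is halted.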
