Research and apply evolutionary computation techniques on automatic text summarization



VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

DO THUY DUONG

RESEARCH AND APPLY EVOLUTIONARY COMPUTATION TECHNIQUES ON AUTOMATIC TEXT SUMMARIZATION

Field: Information technology
Major: Software Engineering
Code: 60480103

MASTER THESIS IN INFORMATION TECHNOLOGY

SUPERVISOR: Assoc. Prof. Nguyen Xuan Hoai

HANOI - 2015

Declaration of authorship

I, Do Thuy Duong, declare that this thesis, ‘Research and apply evolutionary computation techniques on automatic text summarization’, and the work presented in it are my own. I confirm that:

  • This work was done wholly or mainly while in candidature for a research degree at this University;
  • Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated;
  • Where I have consulted the published work of others, this is always clearly attributed;
  • I have acknowledged all main sources of help;
  • Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Signed: ……………………………………………………………………………………
Date: ……………………………………………………………………………………

Acknowledgements

I am heartily thankful to my supervisor, Prof. Nguyen Xuan Hoai, whose encouragement, guidance and support from the initial to the final level have enabled me to develop an understanding of the topic.

I would like to show my gratitude to the teachers in the University of Engineering and Technology, Vietnam National University, Hanoi, for helping me to gain a large body of knowledge during my two years of studying.

Lastly, I offer my regards and blessings to my friends and my family, who have always encouraged me so that I could finish this challenging research.

Contents

Declaration of authorship
Acknowledgements
Contents
List of figures
List of tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Research Objectives
  1.3 Thesis overview
Chapter 2 Background knowledge
  2.1 Automatic text summarization
    2.1.1 Definition
    2.1.2 Types of text summarization
    2.1.3 Methodologies for automatic text summarization
  2.2 Evolutionary computation
  2.3 Differential evolution (DE)
  2.4 Conclusion
Chapter 3 Automatic text summarization using differential evolution algorithm
  3.1 Automatic text summarization using differential evolution (DE)
    3.1.1 Document collection representation
    3.1.2 Objective/Fitness function
    3.1.3 Main steps of differential evolution
    3.1.4 Experiment, result and discussion
  3.2 Improvement
    3.2.1 Method
    3.2.2 Experiment, result and discussion
  3.3 Conclusion
Chapter 4 Conclusion and future work
  4.1 Contributions
  4.2 Future work
Reference

List of figures

Figure 2.1 A typical summarization system
Figure 2.2 A summarizer highlights all sentences included in an extractive summary
Figure 2.3 An example of the abstract summary
Figure 2.4 Multi-document summarization
Figure 2.5 The general scheme of an Evolutionary Algorithm in pseudo-code
Figure 2.6 General scheme of evolutionary algorithms
Figure 2.7 Correlation between number of generations and best fitness in population
Figure 2.8 Steps of differential evolution algorithm
Figure 2.9 Steps to get the next X1 (generation 1)
Figure 3.1 Illustration of mutation operation
Figure 3.2 Illustration of crossover operation
Figure 3.3 Changes in summary length in [DE] method on DUC2004
Figure 3.4 Changes in summary length in [DE] method on DUC2007
Figure 3.5 Summary length in [MultiDE] method on DUC2004
Figure 3.6 Summary length in [MultiDE] method on DUC2007
Figure 3.7 Comparison between F-values of [DE] and [MultiDE] on DUC2004
Figure 3.8 Comparison between F-values of [DE] and [MultiDE] on DUC2007

List of tables

Table 2.1 The basic evolutionary computation linking natural evolution to problem solving
Table 2.2 Fitness of six individuals at generation
Table 2.3 Creation of mutant vector V1
Table 2.4 Creation of trial vector Z1
Table 2.5 Values of X1 in generation
Table 3.1 Description of the datasets used in the experiment
Table 3.2 Parameter settings of the first experiment
Table 3.3 Summary lengths of some document collections in DUC2004 using [DE] method
Table 3.4 Summary lengths of some document collections in DUC2007 using [DE] method
Table 3.5 F-Values of three evaluation measures of method [DE] on DUC2004 and DUC2007
Table 3.6 Parameter settings of the second experiment
Table 3.7 Summary lengths of some document collections in DUC2004 using [MultiDE] method
Table 3.8 Summary lengths of some document collections in DUC2007 using [MultiDE] method
Table 3.9 F-Values of three evaluation measures of method [MultiDE] on DUC2004 and DUC2007

Chapter 1 Introduction

Automatic text summarization means detecting the important content of one or more documents and presenting it in condensed form. This is a very challenging problem that relates to many scientific areas, such as artificial intelligence, statistics and linguistics. Much research has been conducted worldwide since 1950 and has produced systems such as SUMMARIST, SweSUM, MEAD and SUMMON. However, this research area is still challenging and attracts more and more attention.

In this thesis, we study some evolutionary computation techniques, then apply the differential evolution algorithm to a practical problem: automatic text summarization, in particular multi-document summarization. We also attempt to deal with the constraint on summary length, which has not been handled effectively in these stochastic population-based methods.

1.1 Motivation

Evolutionary computation techniques use different algorithms to evolve a population of individuals over a certain number of generations. Operations such as mutation, crossover and selection are applied to the population to produce new offspring, which then compete with each other and with the previous generation to survive, based on some evaluation function. The process ends when a stopping criterion is reached, and the best individual found is taken as the best solution to the real-world problem.

Evolutionary algorithms have been applied to solve numerous problems in various fields, one of which is automatic text summarization. However, we have found that, unlike sentence ranking methods, these algorithms handle the summary length poorly. Therefore, this research attempts to improve this aspect of these algorithms.

1.2 Research Objectives

This thesis aims to study evolutionary computation techniques, especially the differential evolution algorithm, and its application to the problem of automatic text summarization. We identify the limitations of other researchers' ways of handling the summary length in this algorithm, then propose a new method that manages the length constraint to satisfy users' demands while still keeping the quality of the summary.

1.3 Thesis overview

The rest of this thesis is organized as follows. In Chapter 2, we review the background knowledge of text summarization and its classification, and introduce the main principles of evolutionary computation. In particular, the differential evolution algorithm is discussed. Chapter 3 explains in detail how this algorithm is applied to automatic text summarization, in our case on multi-document collections. An experiment is then performed to test the original differential evolution algorithm. We then improve on the result of that experiment by dealing with the summary length, so that the document collection is compressed quickly and effectively. Chapter 4 recapitulates the thesis, presents our contributions and states some future research directions in this field.
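
The evolutionary loop described under Motivation (a population evolved by mutation, crossover and selection until a stopping criterion is reached) can be illustrated with a minimal differential evolution sketch. This is not the thesis's summarization method: the function name differential_evolution, the toy sphere fitness, the parameter values (pop_size, f, cr, generations) and the common DE/rand/1/bin variant are all assumptions chosen for illustration, and the fitness here is minimised over real-valued vectors rather than over sentence selections.

```python
import random


def differential_evolution(fitness, dim, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=100, seed=0):
    """Minimise `fitness` with a basic DE/rand/1/bin loop (illustrative sketch)."""
    rng = random.Random(seed)
    low, high = bounds
    # Initial population: random real-valued vectors inside the bounds.
    population = [[rng.uniform(low, high) for _ in range(dim)]
                  for _ in range(pop_size)]
    scores = [fitness(ind) for ind in population]

    for _ in range(generations):  # stopping criterion: fixed generation budget
        next_pop, next_scores = [], []
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            mutant = [population[a][k] + f * (population[b][k] - population[c][k])
                      for k in range(dim)]
            # Crossover: mix mutant and parent; j_rand forces one mutant gene.
            j_rand = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < cr or k == j_rand)
                     else population[i][k] for k in range(dim)]
            # Clamp the trial vector back into the search bounds.
            trial = [min(max(x, low), high) for x in trial]
            # Selection: the trial survives only if it is no worse than its parent.
            trial_score = fitness(trial)
            if trial_score <= scores[i]:
                next_pop.append(trial)
                next_scores.append(trial_score)
            else:
                next_pop.append(population[i])
                next_scores.append(scores[i])
        population, scores = next_pop, next_scores

    best = min(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]


def sphere(x):
    """Toy fitness (assumed for the example): sum of squares, minimum at 0."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, score = differential_evolution(sphere, dim=3, bounds=(-5.0, 5.0))
    print(best, score)
```

To use such a loop for summarization, the fitness function and the encoding of an individual would have to be replaced by the document representation and objective function that the thesis defines in Chapter 3, which are not reproduced here.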

Date posted: 16/11/2016, 22:16
