BUILDING A BOOK RECOGNITION PROGRAM ON ANDROID SMARTPHONE


VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

Entry for the "Student Scientific Research" Award, 2012

Project title: BUILDING A BOOK RECOGNITION PROGRAM ON ANDROID SMARTPHONE

Students: Hoàng Thanh Tùng (class K53CA-KHMT), Nguyễn Hữu Cường (class K53CA-KHMT), Đỗ Tất Thắng (class K53CA-KHMT)
Faculty: Information Technology
Supervisor: Dr. Nguyễn Phương Thái

HÀ NỘI, 2012

Contents

Abstract
Chapter 1: Introduction
  1. Objective
  2. Related works
Chapter 2: Approaches to image retrieval
  1. Image meta search
  2. Content based image retrieval
  3. Approach for our system
Chapter 3: Image retrieval with OpenCV
  1. Overview of OpenCV library
  2. Interest point in Image Retrieval
  3. Speeded Up Robust Feature
    3.1. SURF's properties
  4. Image search in our system
Chapter 4: Building the system
  1. Overview of system architecture
  2. Handling client's requests
    2.1. Search for book information
    2.2. Search for related book
    2.3. Search for nearby bookshop
    2.4. Rate a book
Chapter 5: Experimental result
Chapter 6: Future works
Chapter 7: Conclusion
References
Links to software and sites

Abstract

Information is an essential human need. Searching for information with search engines such as Google, Bing, and Yahoo is familiar to almost everyone. However, searching with text can be tedious and does not always return the correct information. Recently, Google has introduced a new image search engine that can find images similar to an image uploaded by the user. We have developed a system that allows people to use images of book covers as queries for information about the books. Our system provides an easier and more interesting way of searching. The system was built for users of mobile devices equipped with a camera. It applies modern Content Based Image Retrieval techniques to provide a fast and reliable search engine. Experiments on our database show that the system has high accuracy and is robust to many kinds of graphical deformation.

Chapter 1: Introduction

1. Objective

The rapid development of the Internet has led to an exponential growth in the amount of available data. Automatically searching and retrieving data from large databases is currently one of the most important research fields. Image retrieval (IR) is the problem of finding and retrieving images from a digital image database. Traditional methods use the metadata associated with images, such as captions and keywords, to classify images and perform the retrieval task. Because the metadata are usually created manually, these methods cannot be applied to large databases. Content based image retrieval (CBIR) is a different approach to the IR problem. In CBIR, images are classified and retrieved based on their actual content, such as lines, colors, shapes, textures, and any other information that can be derived from the images themselves. CBIR can therefore provide better classification and more reliable search results. A CBIR system also eliminates the need for manual annotation of images. Using CBIR on desktop computers has become familiar to many people, while only a few CBIR programs exist for mobile devices. We aim to build a simple but helpful CBIR system for portable device users. Our program, however, is not a trivial CBIR system that only returns the closest matches of the input image.
At the time of writing, a user of the system can take an image of a book cover and send it to the server to receive information about the book, such as its title, author, publisher, price, and reviews. Users can also search for related books and nearby bookshops and give their opinions about books. The experimental results show that the accuracy of the system is high for clear, large input images. The results are still acceptable when the image is noisy or small, or when only part of the cover is captured.

2. Related works

Google Goggles allows users to search for information about scenes, books, and any other objects they see simply by taking and uploading photos of those scenes or objects. The program is accurate for high quality input images, but its accuracy decreases dramatically when the image is noisy, taken from a different viewpoint, or taken in poor lighting conditions. Goggles is also much slower than the traditional search engine provided by Google (it could take 10 seconds to complete a search on a phone with a 3G connection).

Chapter 2: Approaches to image retrieval

As discussed in chapter 1, there are two main approaches to the IR problem. The traditional approach uses metadata to perform the search, while CBIR uses information extracted from the image itself. In this chapter we look more closely at each approach to see its advantages and disadvantages.

1. Image meta search

In a meta search system, the metadata are usually in text form and are indexed and stored in a database. The data are external to the images and are added to them to make meta search possible. Image search in these systems is performed in the same way as in other text search engines. The input to the system is a description of the input image (the description may be created by the user or derived from the context of the image). The search engine compares the description with the metadata of the images in the database to find the closest matches and returns the results in descending order of relevancy.

One advantage of meta search systems is that the powerful existing text search engines can be reused to perform image retrieval. Because indexing and searching a text database is much faster than doing so in a multimedia database, this approach has better time performance than the CBIR approach. The most common search engines today, including Google, Bing, and Yahoo, use this approach to provide image search.

A big disadvantage of the approach is that the metadata are external to the image and may not precisely describe its actual content. Poor metadata will produce a large number of irrelevant images in the search results. Although many methods for creating metadata automatically have been proposed (for example, LDA for image retrieval; see [2], [3], [4]), the results achieved have not satisfied users with high expectations. The quality of the search results also relies largely on the quality of the descriptions of the input, which are often created by users. Users may not always describe their images well, and the accuracy of the system decreases accordingly. Furthermore, requiring users to describe the images makes searching more complex and less interesting. Thus, a more accurate and friendlier search engine is desirable.

2. Content based image retrieval

The CBIR approach makes use of modern Computer Vision (CV) techniques to solve the image retrieval problem.
Unlike meta search systems, CBIR systems do not store metadata about the images but rather information derived from the images themselves, including color, intensity, shape, texture, lines, interest points, and much other useful information. Different CBIR systems select different features to store and use different algorithms for classifying and searching images. When users want to search for images, they just need to provide an image; the system automatically detects the relevant features from it and compares that information with the information stored for the database images to find the best matches. The results are therefore graphically related to the input. This helps CBIR systems eliminate a large number of garbage results that are normally produced by meta search systems. CBIR systems also allow people to draw an approximation of their image and use that as the input to the search engine. This removes the limitation of traditional IR systems, where images can only be described by words.

CBIR systems, however, cannot completely replace the older meta search systems. Current algorithms for extracting visual features from images and searching a database of those features are still very expensive in both time and space. As a result, CBIR is not efficient for huge databases or for systems with a large number of queries per time interval. Besides, searching for visually related images does not always give good results. When users want to find different images related to some event or person, CBIR is not suitable, because graphically similar images may not actually relate to that event or person.

3. Approach for our system

As discussed above, each approach has its own advantages and drawbacks. We have selected CBIR as the method for developing our system, for a number of reasons. Firstly, our primary goal is to create an image search program for mobile users, so we need an interactive way of searching and sharing information. Searching with text is very common and somewhat boring. With our program, people can take images with their smartphones or digital cameras and use those images to search for the information they need. Secondly, we want to create a system that can give users information about things they cannot identify or describe. This situation occurs when people travel to unfamiliar places and see things they have never encountered before. Our system can provide reliable information by searching for similar images and returning the information associated with those images to the user. Finally, while there are many meta search programs, there are only a few CBIR programs for mobile devices, so developing a CBIR program for those devices is promising. Android is currently the most popular operating system for mobile devices such as smartphones and tablets, so we have chosen Android as the platform for the client program.

Chapter 3: Image retrieval with OpenCV

1. Overview of OpenCV library

OpenCV [9] is an open source library for real time computer vision, originally developed by Intel and currently supported by Willow Garage. OpenCV offers many advanced functions for computer vision and image processing. It is released under the BSD license and is available for the Linux, Windows, Mac OS, and Android platforms. The library was originally written in C, but C#, Java, Python, and Ruby wrappers are now available. According to Willow Garage, OpenCV has over 500 functions with more than 2,500 optimized algorithms.
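As a small, self-contained illustration of what working with the library looks like (and of the kind of content based comparison discussed in chapter 2), the sketch below compares two images by their color histograms using OpenCV. This is only an illustrative example and not the method our system uses; the file names and bin counts are placeholder values, and the code follows the OpenCV 2.x C++ API.

```cpp
// Illustrative sketch: compare two images by hue/saturation histograms.
// Not the method used in this report (which relies on SURF); file names
// and bin counts are placeholders chosen for the example.
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <iostream>

static cv::MatND hueSatHistogram(const cv::Mat& bgr) {
    cv::Mat hsv;
    cv::cvtColor(bgr, hsv, CV_BGR2HSV);               // work in HSV color space
    int histSize[] = {30, 32};                         // hue and saturation bins
    float hRange[] = {0, 180}, sRange[] = {0, 256};
    const float* ranges[] = {hRange, sRange};
    int channels[] = {0, 1};
    cv::MatND hist;
    cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 2, histSize, ranges);
    cv::normalize(hist, hist, 0, 1, cv::NORM_MINMAX);  // make histograms comparable
    return hist;
}

int main() {
    cv::Mat a = cv::imread("cover_query.jpg");         // placeholder paths
    cv::Mat b = cv::imread("cover_db.jpg");
    if (a.empty() || b.empty()) return 1;
    double similarity = cv::compareHist(hueSatHistogram(a), hueSatHistogram(b),
                                        CV_COMP_CORREL);  // 1.0 = identical histograms
    std::cout << "histogram similarity: " << similarity << std::endl;
    return 0;
}
```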
OpenCV's functions can be categorized as follows:

- General image processing functions
- Image pyramids
- Geometric descriptors
- Camera calibration, stereo, 3D
- Fitting
- Tracking
- Machine learning: detection and recognition
- Transforms
- Segmentation
- Utilities and data structures
- Features

Figure 1: Overview of OpenCV's functions

Because of this rich collection of functions, OpenCV is used by more than 40,000 people for both academic and commercial purposes. We have used OpenCV for detecting and describing interest points in images and for matching those sets of interest points. A set of keypoints carries information about the image, and we can expect that two similar images will have two similar sets of keypoints. Therefore, by comparing the two sets, we can measure the difference between the two images. While OpenCV can detect many types of interest point, we have selected Speeded Up Robust Features (SURF). The main properties of SURF are given in the next part of this chapter.

2. Interest point in Image Retrieval

According to Herbert Bay et al. [1], the process of finding similar images in a database consists of three steps. First, we need to detect interest points at distinctive locations in the image; these points could be corners, blobs, or T-junctions. The property we value most in an interest point detector is its repeatability: a good detector should reliably find the same physical interest points under different viewing conditions. The next step is to describe the neighborhood of each detected interest point with a feature vector. The two most important properties of this feature vector are distinctiveness and robustness. Distinctiveness means that the feature vectors of two different images are different. Robustness means that the feature vector computed from a noisy, transformed version of an image should not differ too much from the vector of the original image. The last step is matching the descriptor vectors of different images. We measure the dissimilarity between two vectors by the distance between them (for example, the Mahalanobis or Euclidean distance). Since the distance between the vectors only partly reflects the distance between the images, we need some other mechanism to refine the results and then rank the images in the database. Due to the curse of dimensionality, matching high dimensional vectors is still a time consuming task, and various techniques have been developed for it. OpenCV provides an approximate but fast algorithm for this problem called Best Bin First [5, 7], which we have used in our program.

3. Speeded Up Robust Feature

Because we focus on using SURF for image retrieval, not on studying how to detect and describe it, we do not give details about its mathematical foundation or other specialized knowledge. For a complete description of SURF, please consult Herbert Bay et al. [1].

3.1. SURF's properties

SURF was proposed by Bay et al. in 2008 and has since been used in a wide range of CV applications. The performance of SURF is comparable to that of state of the art detectors and descriptors, while SURF is much faster. SURF builds on the best detectors and descriptors to date (a Hessian matrix based detector and a distribution based descriptor) and simplifies them to achieve high speed while keeping performance essentially unchanged. As stated by the authors, SURF has a high repeatability score, is distinctive, and is robust to image deformations.
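To make the detect-and-describe steps concrete, the following sketch extracts SURF keypoints and descriptors with the OpenCV 2.x C++ API. The image path and the Hessian threshold of 400 are placeholder values, and from OpenCV 2.4 onward SURF lives in the nonfree module, so the exact header may differ between versions.

```cpp
// Sketch of SURF keypoint detection and description with OpenCV 2.x.
// The Hessian threshold (400) and the image path are placeholders.
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>   // SURF (nonfree module, OpenCV >= 2.4)
#include <iostream>

int main() {
    cv::Mat img = cv::imread("book_cover.jpg", CV_LOAD_IMAGE_GRAYSCALE);
    if (img.empty()) return 1;

    // Step 1: detect interest points (blob-like structures found by the
    // Fast-Hessian detector).
    cv::SurfFeatureDetector detector(400);            // Hessian threshold
    std::vector<cv::KeyPoint> keypoints;
    detector.detect(img, keypoints);

    // Step 2: describe the neighborhood of each keypoint with a
    // 64-dimensional SURF descriptor (one row per keypoint).
    cv::SurfDescriptorExtractor extractor;
    cv::Mat descriptors;
    extractor.compute(img, keypoints, descriptors);

    std::cout << keypoints.size() << " keypoints, descriptor matrix "
              << descriptors.rows << " x " << descriptors.cols << std::endl;
    return 0;
}
```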
The following figures, taken from Herbert Bay's paper [1], show the performance of SURF on some typical benchmark databases.

Figure 2: Repeatability score for image rotation of up to 180 degrees. Fast-Hessian is the more accurate detector and is the one used for the SURF detector in OpenCV.

[...]

4. Image search in our system

[...] in the second image. The result of matching the keypoints and descriptors of two images is stored in a vector of the DMatch class. Each instance of DMatch contains the indices of the two matched keypoints and the distance between them; the smaller the distance, the more similar the two keypoints are. Figure 5 also shows that the number of matched pairs can be used to measure the similarity between images. The distance and the [...]
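As a sketch of how this matching step can be turned into a single similarity score, the fragment below matches two SURF descriptor sets and counts the matches whose distance falls under a threshold. It uses OpenCV's FLANN-based matcher for approximate nearest-neighbour search, in the spirit of the Best Bin First strategy mentioned above; the 0.25 threshold is a placeholder, since the report does not state the exact refinement rule it uses.

```cpp
// Sketch: match two SURF descriptor sets and score image similarity by the
// number of "good" matches. The 0.25 distance threshold is a placeholder.
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <vector>

// descQuery and descDb are descriptor matrices produced as in the previous sketch.
int countGoodMatches(const cv::Mat& descQuery, const cv::Mat& descDb) {
    if (descQuery.empty() || descDb.empty()) return 0;

    cv::FlannBasedMatcher matcher;                  // approximate nearest-neighbour search
    std::vector<cv::DMatch> matches;
    matcher.match(descQuery, descDb, matches);      // one best match per query descriptor

    int good = 0;
    for (size_t i = 0; i < matches.size(); ++i) {
        // DMatch stores the indices of the matched keypoints and their distance;
        // smaller distance means more similar descriptors.
        if (matches[i].distance < 0.25f)
            ++good;
    }
    return good;                                    // higher count = more similar images
}
```

The image in the database with the highest count of good matches can then be returned as the best match for the query.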
Chapter 4: Building the system

1. Overview of system architecture

[...] the client program is written in Java on the Android platform. We use the free MySQL database management system for managing our database. The main components of the system are shown in figure 7.

Figure 7: System architecture (Client Program, Socket interface, Controller, active and inactive worker threads, Search engine, DBMS)

A typical scenario in [...] different requests to the system, including: search for book information, search for related book, search for nearby bookshop, and rate a book. Details about the requests are given in the next part of this chapter.

2. Handling client's requests

As mentioned in the previous part, the server has to handle the following types of request.

2.1. Search for book information

The user uploads an image of the book cover to search for information about the book. The server receives the request and passes it to an idle thread. The thread extracts the image and passes it to the Search engine, which searches for the best matches of that image. The thread then looks up the information corresponding to those best matches and returns it to the user.

Figure 8: Realization of use case "Search for book information"
Figure 9: Some images of book covers used in our test
Figure 10: Result of searching for the book "Artificial Intelligence: A Modern Approach" in our database

We can notice that 4 of the 5 books in the result are the book we want to find.

2.2. Search for related book

Having received information about a book, the user may want to find related books in the database. The server searches for books that have the same author or tags and returns them to the user. Books are manually tagged; the tags of a book represent its main content.

Figure 11: Realization of use case "Search for related book"

2.3. Search for nearby bookshop

The server has to give the user information about bookshops that sell the book the user wants to buy. If there is no shop selling the book, the server returns a default list of shops.

Figure 12: Realization of use case "Search for nearby shop"
Figure 13: Result of finding bookshops

The user can either view the position of the shop on Google Maps or call the shop.

2.4. Rate a book

Users can give their opinion about a book by rating it. The server updates the rating of the book right after receiving a rate request.

Figure 14: Interface for rating a book

Chapter 5: Experimental result

We have run a number of tests to assess the functionality and performance of our system. The hardware configuration of our system is as follows: the server has an E2180 [...] and noisy images of rotated, folded, and resized book covers in various lighting conditions and backgrounds. The complete test data can be downloaded from the project home page [10].

Chapter 6: Future works

When the database is large, comparing the input image to every image in the database is impossible. We have proposed a method for quickly finding the best matches of the input image. Because images in the database are characterized by [...]

Chapter 7: Conclusion

[...] software engineering principles. The system has achieved positive results on a sample database. A method for extending the system was also proposed.

References

[1] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-Up Robust Features (SURF).
[2] Xiaogang Wang and Eric Grimson. Spatial Latent Dirichlet Allocation.
[3] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation.
[4] Eva Horster, Rainer Lienhart, and Malcolm Slaney. Image Retrieval on Large-Scale Image Databases.
[5] David Marshall. Nearest Neighbour Searching in High Dimensional Metric Space.
[6] David Lowe. Scale Invariant Feature Transform.
[7] Haifeng Liu, M. Deng, and Chuangbai Xiao (College of Computer Science and Technology, Beijing University of Technology, Beijing, China). An improved best bin first algorithm for fast image [...]
