INTRODUCTION TO IMAGE PROCESSING AND COMPUTER VISION


Knowledge Discovery and Data Mining

Contents

Preface
Overview
References
Chapter 1  Image Presentation
  1.1 Visual Perception
  1.2 Color Representation
  1.3 Image Capture, Representation and Storage
Chapter 2  Statistical Operations
  2.1 Gray-level Transformation
  2.2 Histogram Equalization
  2.3 Multi-image Operations
Chapter 3  Spatial Operations and Transformations
  3.1 Spatially Dependent Transformation
  3.2 Templates and Convolutions
  3.3 Other Window Operations
  3.4 Two-dimensional Geometric Transformations
Chapter 4  Segmentation and Edge Detection
  4.1 Region Operations
  4.2 Basic Edge Detection
  4.3 Second-order Detection
  4.4 Pyramid Edge Detection
  4.5 Crack Edge Relaxation
  4.6 Edge Following
Chapter 5  Morphological and Other Area Operations
  5.1 Morphology Defined
  5.2 Basic Morphological Operations
  5.3 Opening and Closing Operators
Chapter 6  Finding Basic Shapes
  6.1 Combining Edges
  6.2 Hough Transform
  6.3 Bresenham's Algorithms
  6.4 Using Interest Points
  6.5 Problems
  6.6 Exercises
Chapter 7  Reasoning, Facts and Inferences
  7.1 Introduction
  7.2 Facts and Rules
  7.3 Strategic Learning
  7.4 Networks and Spatial Descriptors
  7.5 Rule Orders
  7.6 Exercises
Chapter 8  Object Recognition
  8.1 Introduction
  8.2 System Component
  8.3 Complexity of Object Recognition
  8.4 Object Representation
  8.5 Feature Detection
  8.6 Recognition Strategy
  8.7 Verification
  8.8 Exercises
Chapter 9  The Frequency Domain
  9.1 Introduction
  9.2 Discrete Fourier Transform
  9.3 Fast Fourier Transform
  9.4 Filtering in the Frequency Domain
  9.5 Discrete Cosine Transform
Chapter 10  Image Compression
  10.1 Introduction to Image Compression
  10.2 Run Length Encoding
  10.3 Huffman Coding
  10.4 Modified Huffman Coding
  10.5 Modified READ
  10.6 LZW
  10.7 Arithmetic Coding
  10.8 JPEG
  10.9 Other State-of-the-art Image Compression Methods
  10.10 Exercises

Preface

The field of Image Processing and Computer Vision has been growing at a fast
pace. The growth in this field has been both in breadth and depth of concepts and techniques. Computer Vision techniques are being applied in areas ranging from medical imaging to remote sensing, industrial inspection to document processing, and nanotechnology to multimedia databases.

This course aims at providing the fundamental techniques of Image Processing and Computer Vision. The text is intended to provide the details to allow vision algorithms to be used in practical applications. As in most developing fields, not all aspects of Image Processing and Computer Vision are useful to the designers of a vision system for a specific application. A designer needs to know the basic concepts and techniques to be successful in designing or evaluating a vision system for a particular application.

The text is intended to be used in an introductory course in Image Processing and Computer Vision at the undergraduate or early graduate level, and should be suitable for students or anyone who uses computer imaging with no prior knowledge of computer graphics or signal processing. Readers should, however, have a working knowledge of mathematics, statistical methods, computer programming and elementary data structures.

The selected books used to design this course are the following: Chapter 1 is based on material from [2] and [5]; Chapters 2, 3 and 4 on [1], [2], [5] and [6]; Chapter 5 on [3]; Chapter 6 on [1] and [2]; Chapter 7 on [1]; Chapter 8 on [4]; and Chapters 9 and 10 on [2] and [6].

Overview

Chapter 1  Image Presentation

This chapter considers how the image is held and manipulated inside the memory of a computer. Memory models are important because the speed and quality of image-processing software depend on the right use of memory. Most image transformations can be made less difficult to perform if the original mapping is carefully chosen.

Chapter 2  Statistical Operations

Statistical techniques deal with low-level image processing operations. The techniques
(algorithms) in this chapter are independent of the position of the pixels. The levels of processing applied to an image in a typical processing sequence are low first, then medium, then high. Low-level processing is concerned with work at the binary image level, typically creating a second "better" image from the first by changing the representation of the image, removing unwanted data and enhancing wanted data. Medium-level processing is about the identification of significant shapes, regions or points from the binary images. Little or no prior knowledge is built into this process, so while the work may not be wholly at the binary level, the algorithms are still not usually application specific. High-level processing interfaces the image to some knowledge base. This associates shapes discovered during previous levels of processing with known shapes of real objects. The results from the algorithms at this level are passed on to non-image procedures, which make decisions about actions following from the analysis of the image.

Chapter 3  Spatial Operations and Transformations

This chapter combines other techniques and operations on single images that deal with pixels and their neighbors (spatial operations). The techniques include spatial filters (normally removing noise by reference to the neighboring pixel values), weighted averaging of pixel areas (convolutions), and comparing areas on an image with known pixel area shapes so as to find shapes in images (correlation). There are also discussions on edge detection and on detection of "interest points". The operations discussed are as follows:
- Spatially dependent transformations
- Templates and convolution
- Other window operations
- Two-dimensional geometric transformations

Chapter 4  Segmentation and Edge Detection

Segmentation is concerned with splitting an image up into segments (also called regions or areas) that each hold some property distinct from their neighbors. This is an essential part of scene analysis, in answering questions like
where and how large is the object, where is the background, how many objects are there, and how many surfaces are there. Segmentation is a basic requirement for the identification and classification of objects in a scene.

Segmentation can be approached from two points of view: by identifying the edges (or lines) that run through an image, or by identifying regions (or areas) within an image. Region operations can be seen as the dual of edge operations, in that the completion of an edge is equivalent to breaking one region into two. Ideally, edge and region operations should give the same segmentation result; however, in practice the two rarely correspond. Some typical operations are:
- Region operations
- Basic edge detection
- Second-order edge detection
- Pyramid edge detection
- Crack edge detection
- Edge following

Chapter 5  Morphological and Other Area Operations

Morphology is the science of form and structure. In computer vision it is about regions or shapes: how they can be changed and counted, and how their areas can be evaluated. The operations used are as follows:
- Basic morphological operations
- Opening and closing operations
- Area operations

Chapter 6  Finding Basic Shapes

Previous chapters dealt with purely statistical and spatial operations. This chapter is mainly concerned with looking at the whole image and processing the image with the information generated by the algorithms in the previous chapters. It deals with methods for finding basic two-dimensional shapes or elements of shapes by putting edges detected in earlier processing together to form lines that are likely to represent real edges. The main topics discussed are as follows:
- Combining edges
- Hough transforms
- Bresenham's algorithms
- Using interest points
- Labeling lines and regions

Chapter 7  Reasoning, Facts and Inferences

This chapter begins to move beyond the standard "image processing" approach to computer vision, to make statements about the geometry of objects and to allocate labels to them.
This is enhanced by making reasoned statements, by codifying facts, and by making judgements based on past experience. This chapter introduces some concepts in logical reasoning that relate specifically to computer vision. It looks more specifically at the "training" aspects of reasoning systems that use computer vision. Reasoning is the highest level of computer vision processing. The main topics are as follows:
- Facts and rules
- Strategic learning
- Networks and spatial descriptors
- Rule orders

Chapter 8  Object Recognition

An object recognition system finds objects in the real world from an image of the world, using object models which are known a priori. This chapter discusses the different steps in object recognition and introduces some techniques that have been used for object recognition in many applications. The architecture and main components of object recognition are presented, and their role in object recognition systems of varying complexity is discussed. The chapter covers the following topics:
- System component
- Complexity of object recognition
- Object representation
- Feature detection
- Recognition strategy
- Verification

Chapter 9  The Frequency Domain

Most signal processing is done in a mathematical space known as the frequency domain. In order to represent data in the frequency domain, some transforms are necessary. The signal frequency of an image refers to the rate at which the pixel intensities change. The high frequencies are concentrated around the axes dividing the image into quadrants. High frequencies are noted by concentrations of large amplitude swings in the small checkerboard pattern. The corners have lower frequencies. Low spatial frequencies are noted by large areas of nearly constant values. The chapter covers the following topics:
- The Hartley transform
- The Fourier transform
- Optical transformations
- Power and autocorrelation functions
- Interpretation of the power function
- Application of frequency domain processing
Chapter 10  Image Compression

Compression of images is concerned with storing them in a form that does not take up as much space as the original. Compression systems need to provide the following benefits: fast operation (both compression and unpacking), significant reduction in required memory, no significant loss of quality in the image, and an output format suitable for transfer or storage. Each of these depends on the user and the application. The topics discussed are as follows:
- Introduction to image compression
- Run Length Encoding
- Huffman Coding
- Modified Huffman Coding
- Modified READ
- Arithmetic Coding
- LZW
- JPEG
- Other state-of-the-art image compression methods: Fractal and Wavelet compression

References

[1] Low, A. Introductory Computer Vision and Image Processing. McGraw-Hill, 1991, 244p. ISBN 0-07-707403-3.
[2] Crane, R. A Simplified Approach to Image Processing: Classical and Modern Techniques in C. Prentice Hall, 1997. ISBN 0-13-226616-1.
[3] Parker, J.R. Algorithms for Image Processing and Computer Vision. Wiley Computer Publishing, 1997. ISBN 0-471-14056-2.
[4] Jain, R., Kasturi, R., Schunck, B.G. Machine Vision. McGraw-Hill, 1995, 549p. ISBN 0-07-032018-7.
[5] Klette, R., Zamperoni, P. Handbook of Image Processing Operators. John Wiley & Sons, 1996, 397p. ISBN 471 95642.
[6] Russ, J.C. The Image Processing Handbook. CRC Press, 1995. ISBN 0-8493-2516-1.

IMAGE PRESENTATION

1.1 Visual Perception

When processing images for a human observer, it is important to consider how images are converted into information by the viewer. Understanding visual perception helps during algorithm development.

Image data represents physical quantities such as chromaticity and luminance. Chromaticity is the color quality of light defined by its wavelength. Luminance is the amount of light. To the viewer, these physical quantities may be perceived by such attributes as color and brightness. How we perceive color image information is classified into three
perceptual variables: hue, saturation and lightness. When we use the word color, typically we are referring to hue. Hue distinguishes among colors such as green and yellow. Hues are the color sensations reported by an observer exposed to various wavelengths. It has been shown that the predominant sensation of wavelengths between 430 and 480 nanometers is blue. Green characterizes a broad range of wavelengths from 500 to 550 nanometers. Yellow covers the range from 570 to 600 nanometers, and wavelengths over 610 nanometers are categorized as red. Black, gray, and white may be considered colors but not hues.

Saturation is the degree to which a color is undiluted with white light. Saturation decreases as the amount of a neutral color added to a pure hue increases. Saturation is often thought of as how pure a color is. Unsaturated colors appear washed-out or faded; saturated colors are bold and vibrant. Red is highly saturated; pink is unsaturated. A pure color is 100 percent saturated and contains no white light. A mixture of white light and a pure color has a saturation between 0 and 100 percent.

Lightness is the perceived intensity of a reflecting object. It refers to the gamut of colors from white through gray to black, a range often referred to as gray level. A similar term, brightness, refers to the perceived intensity of a self-luminous object such as a CRT. The relationship between brightness, a perceived quantity, and luminous intensity, a measurable quantity, is approximately logarithmic.

Contrast is the range from the darkest regions of the image to the lightest regions. The mathematical representation is

    Contrast = (Imax - Imin) / (Imax + Imin)

where Imax and Imin are the maximum and minimum intensities of a region or image. High-contrast images have large regions of dark and light. Images with good contrast have a good representation of all luminance intensities. As the contrast of an image increases, the viewer perceives an increase in detail. This is purely a perception, as the amount of
information in the image does not increase. Our perception is sensitive to luminance contrast rather than absolute luminance intensities.

1.2 Color Representation

A color model (or color space) is a way of representing colors and their relationship to each other. Different image processing systems use different color models for different reasons. The color picture publishing industry uses the CMY color model. Color CRT monitors and most computer graphics systems use the RGB color model. Systems that must manipulate hue, saturation, and intensity separately use the HSI color model.

Human perception of color is a function of the response of three types of cones. Because of that, color systems are based on three numbers. These numbers are called tristimulus values. In this course, we will explore the RGB, CMY, HSI, and YCbCr color models.

There are numerous color spaces based on the tristimulus values. The YIQ color space is used in broadcast television. The XYZ space does not correspond to physical primaries but is used as a color standard. It is fairly easy to convert from XYZ to other color spaces with a simple matrix multiplication. Other color models include Lab, YUV, and UVW.

All color space discussions will assume that all colors are normalized (values lie between 0 and 1.0). This is easily accomplished by dividing the color by its maximum value. For example, an 8-bit color is normalized by dividing by 255.

RGB

The RGB color space consists of the three additive primaries: red, green, and blue. Spectral components of these colors combine additively to produce a resultant color. The RGB model is represented by a 3-dimensional cube with red, green and blue at the corners on each axis (Figure 1.1). Black is at the origin. White is at the opposite end of the cube. The gray scale follows the line from black to white. In a 24-bit color graphics system with 8 bits per color channel, red is (255,0,0). On the color cube, it is (1,0,0). The labeled corners of the cube are Blue=(0,0,1), Magenta=(1,0,1), Black=(0,0,0), Red=(1,0,0), Cyan=(0,1,1),
White=(1,1,1), Green=(0,1,0), Yellow=(1,1,0).

Figure 1.1 RGB color cube

The RGB model simplifies the design of computer graphics systems but is not ideal for all applications. The red, green, and blue color components are highly correlated. This makes it difficult to execute some image processing algorithms. Many processing techniques, such as histogram equalization, work on the intensity component of an image only. These processes are more easily implemented using the HSI color model.

Many times it becomes necessary to convert an RGB image into a gray scale image, perhaps for hardcopy on a black and white printer. To convert an image from RGB color to gray scale, use the following equation:

    gray = 0.299 R + 0.587 G + 0.114 B

10.6 LZW

(fragment of the worked string-table example)

    BA    257    AB
    AB    258    BAA
    A     259    ABA
    AA    260    AA

This algorithm compresses repetitive sequences of data well. Since the codewords are 12 bits, any single encoded character will expand the data size rather than reduce it. This is always seen in the early stages of compressing a data set with LZW. In this example, 72 bits are represented with 72 bits of data (a compression ratio of 1). After a reasonable string table is built, compression improves dramatically.

During compression, what happens when we have used all 4096 locations in our string table?
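As an aside, before turning to the table-overflow question, the core encode loop just described can be sketched in Python. This is a minimal illustration with a fixed 12-bit table, not GIF's variable-width variant; the function name and byte-oriented interface are our own choices.

```python
def lzw_encode(data: bytes, max_table_size: int = 4096) -> list[int]:
    """Basic LZW encode loop: 256 single-byte roots, 12-bit table of 4096."""
    table = {bytes([i]): i for i in range(256)}   # entries 0-255: single bytes
    next_code = 256                               # first free string-table slot
    w = b""                                       # longest matched string so far
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                                # keep extending the match
        else:
            codes.append(table[w])                # emit code for longest match
            if next_code < max_table_size:        # stop adding once table fills
                table[wc] = next_code
                next_code += 1
            w = bytes([byte])
    if w:
        codes.append(table[w])                    # flush the final match
    return codes
```

Note that a single pass suffices and no frequency statistics are needed; the decoder rebuilds the same table from the codes alone.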
There are several options. The first would be to simply forget about adding any more entries and use the table as is. Another would be to clear entries 256-4095 and start building the tree again. Some clever schemes clear those entries and rebuild a string table from the last N input characters; N could be something like 1024. The UNIX compress utility constantly monitors the compression ratio, and when it dips below a set threshold, it resets the string table.

One advantage of LZW over Huffman coding is that it can compress the input stream in one single pass. It requires no prior information about the input data stream; the string table is built on the fly during compression and decompression. Another advantage is its simplicity, allowing fast execution.

As mentioned earlier, the GIF image file format uses a variant of LZW. It achieves better compression than the technique just explained because it uses variable length codewords. Since the table is initialized to the first 256 single characters, only one more bit is needed to create new string table indices. Codewords are nine bits wide until entry number 511 is created in the string table. At this point, the length of the codewords increases to ten bits. The length can increase up to 12 bits. As you can imagine, this increases compression but adds complexity to GIF encoders and decoders. GIF also has two specially defined characters. A clear code is used to reinitialize the string table to the first 256 single characters and the codeword length to nine bits. An end-of-information code is appended to the end of the data stream; this signals the end of the image.

10.7 Arithmetic Coding

Arithmetic coding is unlike all the other methods discussed in that it takes in the complete data stream and outputs one specific codeword. This codeword is a floating point number between 0 and 1. The bigger the input data set, the more digits in the number output. This unique number is encoded such that when decoded, it will output the exact input data stream.
Arithmetic coding, like Huffman, is a two-pass algorithm. The first pass computes the characters' frequencies and generates a probability table. The second pass does the actual compression.

The probability table assigns a range between 0 and 1 to each input character. The size of each range is directly proportional to a character's frequency. The order of assigning these ranges is not as important as the fact that it must be used by both the encoder and decoder. The range consists of a low value and a high value. These parameters are very important to the encode/decode process. The more frequently occurring characters are assigned wider ranges in the interval, requiring fewer bits to represent them. The less likely characters are assigned narrower ranges, requiring more bits.

With arithmetic coding, you start out with the range 0.0 to 1.0 (Figure 10.9). The first character input will constrain the output number with its corresponding range. The range of the next character input will further constrain the output number. The more input characters there are, the more precise the output number will be.

Figure 10.9 Assignment of ranges between 0 and 1

Suppose we are working with an image that is composed of only red, green, and blue pixels. After computing the frequency of these pixels, we have a probability table that looks like

    Pixel   Probability   Assigned Range
    Red     0.2           [0.0, 0.2)
    Green   0.6           [0.2, 0.8)
    Blue    0.2           [0.8, 1.0)

The algorithm to encode is very simple:

    LOW = 0.0
    HIGH = 1.0
    WHILE not end of input stream
        get next CHARACTER
        RANGE = HIGH - LOW
        HIGH = LOW + RANGE * high range of CHARACTER
        LOW = LOW + RANGE * low range of CHARACTER
    END WHILE
    output LOW

Figure 10.10 shows how the range for our output is reduced as we process two possible input streams.

Figure 10.10 Reduced output range: (a) Green-Green-Red; (b) Green-Blue-Green

Let's encode the string ARITHMETIC. Our frequency analysis will
produce the following probability table.

    Symbol   Probability   Range
    A        0.100000      0.000000 - 0.100000
    C        0.100000      0.100000 - 0.200000
    E        0.100000      0.200000 - 0.300000
    H        0.100000      0.300000 - 0.400000
    I        0.200000      0.400000 - 0.600000
    M        0.100000      0.600000 - 0.700000
    R        0.100000      0.700000 - 0.800000
    T        0.200000      0.800000 - 1.000000

Before we start, LOW is 0 and HIGH is 1. Our first input is A.

    RANGE = 1 - 0 = 1
    HIGH will be (0 + 1) x 0.1 = 0.1
    LOW will be (0 + 1) x 0.0 = 0.0

These three calculations will be repeated until the input stream is exhausted. As we process each character in the string, RANGE, LOW, and HIGH will look like

    A   range = 1.000000000   low = 0.0000000000   high = 0.1000000000
    R   range = 0.100000000   low = 0.0700000000   high = 0.0800000000
    I   range = 0.010000000   low = 0.0740000000   high = 0.0760000000
    T   range = 0.002000000   low = 0.0756000000   high = 0.0760000000
    H   range = 0.000400000   low = 0.0757200000   high = 0.0757600000
    M   range = 0.000040000   low = 0.0757440000   high = 0.0757480000
    E   range = 0.000004000   low = 0.0757448000   high = 0.0757452000
    T   range = 0.000000400   low = 0.0757451200   high = 0.0757452000
    I   range = 0.000000080   low = 0.0757451520   high = 0.0757451680
    C   range = 0.000000016   low = 0.0757451536   high = 0.0757451552

Our output is then 0.0757451536.

The decoding algorithm is just the reverse process:

    get NUMBER
    DO
        find CHARACTER that has HIGH > NUMBER and LOW <= NUMBER
        output CHARACTER
        RANGE = HIGH of CHARACTER - LOW of CHARACTER
        NUMBER = (NUMBER - LOW of CHARACTER) / RANGE
    UNTIL end of decoded stream
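The encode loop and its reverse can be written out in Python. This is a sketch, not the book's code: exact fractions stand in for floating point so the worked ARITHMETIC example round-trips without rounding error, and the decoder is simply told the message length (a real coder would transmit the length or an end-of-stream symbol). The names are our own.

```python
from fractions import Fraction

# Probability table for "ARITHMETIC" (symbol -> [low, high)), as above.
TABLE = {
    'A': (Fraction(0, 10), Fraction(1, 10)),
    'C': (Fraction(1, 10), Fraction(2, 10)),
    'E': (Fraction(2, 10), Fraction(3, 10)),
    'H': (Fraction(3, 10), Fraction(4, 10)),
    'I': (Fraction(4, 10), Fraction(6, 10)),
    'M': (Fraction(6, 10), Fraction(7, 10)),
    'R': (Fraction(7, 10), Fraction(8, 10)),
    'T': (Fraction(8, 10), Fraction(1, 1)),
}

def ac_encode(message, table):
    """Narrow [LOW, HIGH) by each character's assigned range; output LOW."""
    low, high = Fraction(0), Fraction(1)
    for ch in message:
        rng = high - low
        low, high = low + rng * table[ch][0], low + rng * table[ch][1]
    return low

def ac_decode(number, table, length):
    """Reverse process: find the range holding NUMBER, emit, then rescale."""
    out = []
    for _ in range(length):
        for ch, (lo, hi) in table.items():
            if lo <= number < hi:
                out.append(ch)
                number = (number - lo) / (hi - lo)   # undo this character's range
                break
    return "".join(out)

code = ac_encode("ARITHMETIC", TABLE)   # equals 0.0757451536 exactly, as traced
```

Decoding `code` with `ac_decode(code, TABLE, 10)` recovers the string ARITHMETIC, mirroring the hand trace above step for step.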

Posted: 02/04/2014, 00:33
