Regular Expressions Cookbook, 2nd Edition (docx document)

Document information

SECOND EDITION
Regular Expressions Cookbook
Jan Goyvaerts and Steven Levithan
Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo

Regular Expressions Cookbook, Second Edition
by Jan Goyvaerts and Steven Levithan

Copyright © 2012 Jan Goyvaerts, Steven Levithan. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Andy Oram
Production Editor: Holly Bauer
Copyeditor: Genevieve d’Entremont
Proofreader: BIM Publishing Services
Indexer: BIM Publishing Services
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Rebecca Demarest

August 2012: Second Edition.

Revision History for the Second Edition:
2012-08-10 First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449319434 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Regular Expressions Cookbook, the image of a musk shrew, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
ISBN: 978-1-449-31943-4 [LSI] 1344629030

Table of Contents

Preface

1. Introduction to Regular Expressions
  Regular Expressions Defined
  Search and Replace with Regular Expressions
  Tools for Working with Regular Expressions

2. Basic Regular Expression Skills
  2.1 Match Literal Text
  2.2 Match Nonprintable Characters
  2.3 Match One of Many Characters
  2.4 Match Any Character
  2.5 Match Something at the Start and/or the End of a Line
  2.6 Match Whole Words
  2.7 Unicode Code Points, Categories, Blocks, and Scripts
  2.8 Match One of Several Alternatives
  2.9 Group and Capture Parts of the Match
  2.10 Match Previously Matched Text Again
  2.11 Capture and Name Parts of the Match
  2.12 Repeat Part of the Regex a Certain Number of Times
  2.13 Choose Minimal or Maximal Repetition
  2.14 Eliminate Needless Backtracking
  2.15 Prevent Runaway Repetition
  2.16 Test for a Match Without Adding It to the Overall Match
  2.17 Match One of Two Alternatives Based on a Condition
  2.18 Add Comments to a Regular Expression
  2.19 Insert Literal Text into the Replacement Text
  2.20 Insert the Regex Match into the Replacement Text
  2.21 Insert Part of the Regex Match into the Replacement Text
  2.22 Insert Match Context into the Replacement Text

3. Programming with Regular Expressions
  Programming Languages and Regex Flavors
  3.1 Literal Regular Expressions in Source Code
  3.2 Import the Regular Expression Library
  3.3 Create Regular Expression Objects
  3.4 Set Regular Expression Options
  3.5 Test If a Match Can Be Found Within a Subject String
  3.6 Test Whether a Regex Matches the Subject String Entirely
  3.7 Retrieve the Matched Text
  3.8 Determine the Position and Length of the Match
  3.9 Retrieve Part of the Matched Text
  3.10 Retrieve a List of All Matches
  3.11 Iterate over All Matches
  3.12 Validate Matches in Procedural Code
  3.13 Find a Match Within Another Match
  3.14 Replace All Matches
  3.15 Replace Matches Reusing Parts of the Match
  3.16 Replace Matches with Replacements Generated in Code
  3.17 Replace All Matches Within the Matches of Another Regex
  3.18 Replace All Matches Between the Matches of Another Regex
  3.19 Split a String
  3.20 Split a String, Keeping the Regex Matches
  3.21 Search Line by Line
  3.22 Construct a Parser

4. Validation and Formatting
  4.1 Validate Email Addresses
  4.2 Validate and Format North American Phone Numbers
  4.3 Validate International Phone Numbers
  4.4 Validate Traditional Date Formats
  4.5 Validate Traditional Date Formats, Excluding Invalid Dates
  4.6 Validate Traditional Time Formats
  4.7 Validate ISO 8601 Dates and Times
  4.8 Limit Input to Alphanumeric Characters
  4.9 Limit the Length of Text
  4.10 Limit the Number of Lines in Text
  4.11 Validate Affirmative Responses
  4.12 Validate Social Security Numbers
  4.13 Validate ISBNs
  4.14 Validate ZIP Codes
  4.15 Validate Canadian Postal Codes
  4.16 Validate U.K. Postcodes
  4.17 Find Addresses with Post Office Boxes
  4.18 Reformat Names From “FirstName LastName” to “LastName, FirstName”
  4.19 Validate Password Complexity
  4.20 Validate Credit Card Numbers
  4.21 European VAT Numbers

5. Words, Lines, and Special Characters
  5.1 Find a Specific Word
  5.2 Find Any of Multiple Words
  5.3 Find Similar Words
  5.4 Find All Except a Specific Word
  5.5 Find Any Word Not Followed by a Specific Word
  5.6 Find Any Word Not Preceded by a Specific Word
  5.7 Find Words Near Each Other
  5.8 Find Repeated Words
  5.9 Remove Duplicate Lines
  5.10 Match Complete Lines That Contain a Word
  5.11 Match Complete Lines That Do Not Contain a Word
  5.12 Trim Leading and Trailing Whitespace
  5.13 Replace Repeated Whitespace with a Single Space
  5.14 Escape Regular Expression Metacharacters

6. Numbers
  6.1 Integer Numbers
  6.2 Hexadecimal Numbers
  6.3 Binary Numbers
  6.4 Octal Numbers
  6.5 Decimal Numbers
  6.6 Strip Leading Zeros
  6.7 Numbers Within a Certain Range
  6.8 Hexadecimal Numbers Within a Certain Range
  6.9 Integer Numbers with Separators
  6.10 Floating-Point Numbers
  6.11 Numbers with Thousand Separators
  6.12 Add Thousand Separators to Numbers
  6.13 Roman Numerals

7. Source Code and Log Files
  7.1 Keywords
  7.2 Identifiers
  7.3 Numeric Constants
  7.4 Operators
  7.5 Single-Line Comments
  7.6 Multiline Comments
  7.7 All Comments
  7.8 Strings
  7.9 Strings with Escapes
  7.10 Regex Literals
  7.11 Here Documents
  7.12 Common Log Format
  7.13 Combined Log Format
  7.14 Broken Links Reported in Web Logs

8. URLs, Paths, and Internet Addresses
  8.1 Validating URLs
  8.2 Finding URLs Within Full Text
  8.3 Finding Quoted URLs in Full Text
  8.4 Finding URLs with Parentheses in Full Text
  8.5 Turn URLs into Links
  8.6 Validating URNs
  8.7 Validating Generic URLs
  8.8 Extracting the Scheme from a URL
  8.9 Extracting the User from a URL
  8.10 Extracting the Host from a URL
  8.11 Extracting the Port from a URL
  8.12 Extracting the Path from a URL
  8.13 Extracting the Query from a URL
  8.14 Extracting the Fragment from a URL
  8.15 Validating Domain Names
  8.16 Matching IPv4 Addresses
  8.17 Matching IPv6 Addresses
  8.18 Validate Windows Paths
  8.19 Split Windows Paths into Their Parts
  8.20 Extract the Drive Letter from a Windows Path
  8.21 Extract the Server and Share from a UNC Path
  8.22 Extract the Folder from a Windows Path
  8.23 Extract the Filename from a Windows Path
  8.24 Extract the File Extension from a Windows Path
  8.25 Strip Invalid Characters from Filenames

9. Markup and Data Formats
  Processing Markup and Data Formats with Regular Expressions
  9.1 Find XML-Style Tags
  9.2 Replace <b> Tags with <strong>
  9.3 Remove All XML-Style Tags Except <em> and <strong>
  9.4 Match XML Names
  9.5 Convert Plain Text to HTML by Adding <p> and <br> Tags
  9.6 Decode XML Entities
  9.7 Find a Specific Attribute in XML-Style Tags
  9.8 Add a cellspacing Attribute to <table> Tags That Do Not Already Include It
  9.9 Remove XML-Style Comments
  9.10 Find Words Within XML-Style Comments
  9.11 Change the Delimiter Used in CSV Files
  9.12 Extract CSV Fields from a Specific Column
  9.13 Match INI Section Headers
  9.14 Match INI Section Blocks
  9.15 Match INI Name-Value Pairs

Index

...that needs to search through or manipulate text. Regular expressions are an excellent tool for the job. Regular Expressions Cookbook teaches you everything you need to know about regular expressions. You don’t need any prior experience whatsoever, because we explain even the most basic aspects of regular expressions. If you do have experience with regular expressions, you’ll find a wealth of detail that...

...replacement syntax is basically the same. Ruby 1.9 only adds support for named backreferences in the replacement text. Named capture is a new feature in Ruby 1.9 regular expressions.

Tools for Working with Regular Expressions

Unless you have been programming with regular expressions for some time, we recommend that you first experiment with regular expressions...
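The Ruby 1.9 note earlier in this excerpt mentions named capture and named backreferences in replacement text. Here is a minimal sketch of the same idea, written in Python rather than Ruby. Python's `re` module writes a named group as `(?P<name>...)` and refers to it in replacement text as `\g<name>`. The `DATE` pattern and the `us_style` helper are hypothetical examples and do not come from the book.

```python
import re

# Hypothetical example pattern: an ISO-style date with three named groups.
# Python spells a named group (?P<name>...); other flavors use other syntax.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def us_style(text: str) -> str:
    """Rewrite every YYYY-MM-DD date in text as MM/DD/YYYY."""
    # In Python replacement text, \g<name> inserts what the named group captured.
    return DATE.sub(r"\g<month>/\g<day>/\g<year>", text)


match = DATE.search("First release: 2012-08-10")
print(match.group("year"))                    # prints 2012
print(us_style("First release: 2012-08-10"))  # prints First release: 08/10/2012
```

Using names instead of numbers means the replacement string stays correct even if more groups are later added in front of the existing ones.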
...give you a basis for using regular expressions; each of the subsequent chapters presents a variety of regular expressions while investigating one area of text processing in depth.

Chapter 1, Introduction to Regular Expressions, explains the role of regular expressions and introduces a number of tools that will make it easier to learn, create, and debug them.

Chapter 2, Basic Regular Expression Skills, covers each element and feature of regular expressions, along with important guidelines for effective use. It forms a complete tutorial to regular expressions.

Chapter 3, Programming with Regular Expressions, specifies coding techniques and includes code listings for using regular expressions in each of the programming languages covered by this book.

Chapter...

...manipulating or extracting text on a computer, a firm grasp of regular expressions will save you plenty of overtime.

Many Flavors of Regular Expressions

All right, the title of the previous section was a lie. We didn’t define what regular expressions are. We can’t. There is no official standard that defines exactly which text patterns are regular expressions and which aren’t. As you can imagine, every designer...

...breaks,” where other regex flavors use ‹(?s)›.

Search and Replace with Regular Expressions

Search-and-replace is a common job for regular expressions. A search-and-replace function takes a subject string, a regular expression, and a replacement string as input. The output is the subject string with all matches of the regular expression replaced with the replacement...
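The search-and-replace description above (subject string, regular expression, and replacement string in, modified subject out) maps directly onto Python's `re.sub`. The sketch below uses Python for illustration, and the `search_and_replace` wrapper is a hypothetical name added only to make the three inputs explicit.

```python
import re


def search_and_replace(subject: str, regex: str, replacement: str) -> str:
    """Return subject with every match of regex replaced by replacement."""
    # re.sub replaces all non-overlapping matches unless a count is given.
    return re.sub(regex, replacement, subject)


# Literal replacement: collapse runs of spaces and tabs into one space.
print(search_and_replace("one  two\t\tthree", r"[ \t]+", " "))  # prints one two three

# Replacement text can reuse part of the match: \1 is the first capturing group.
print(search_and_replace("cat bat", r"(\w)at", r"\1ot"))  # prints cot bot
```

If the regex finds no match, the subject comes back unchanged. That lets the function run safely on any input.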
...popular regular expression flavors very valuable.

We organized the whole book as a cookbook, so you can jump right to the topics you want to read up on. If you read the book cover to cover, you’ll become a world-class chef of regular expressions.

This book teaches you everything you need to know about regular expressions and then some, regardless of whether you are a programmer. If you want to use regular expressions...

...ability to search or filter through their data using a regular expression. Regular expressions are everywhere.

Many books have been published to ride the wave of regular expression adoption. Most do a good job of explaining the regular expression syntax along with some examples and a reference. But there aren’t any books that present solutions based on regular expressions to a wide range of real-world practical...

...The sample regexes in this chapter and Chapter 2 are plain regular expressions that don’t contain the extra escaping that a programming language (even a Unix shell) requires. You can type these regular expressions directly into an application’s search box. Chapter 3 explains how to mix regular expressions into your source code. Quoting a literal regular expression as a string makes it even harder to read,...

...implementing regular expressions. It has the unique ability to emulate all the regular expression flavors discussed in this book, and even convert among the different flavors. RegexBuddy was designed and developed by Jan Goyvaerts, one of this book’s authors. Designing and developing RegexBuddy made Jan an expert on regular expressions, and using RegexBuddy helped get coauthor Steven hooked on regular expressions.
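The remark that quoting a literal regular expression as a string adds extra escaping is easiest to see in code. The sketch below is in Python, chosen for illustration. The book covers several languages, and the sample patterns here are assumptions made for this example. It shows the same regex written as an ordinary string literal and as a raw string. It also shows how a missing escape can silently change the regex.

```python
import re

# One regex, \d+\.\d+ (digits, a literal dot, digits), quoted two ways.
escaped = "\\d+\\.\\d+"  # ordinary literal: each regex backslash is doubled
raw = r"\d+\.\d+"        # raw literal: backslashes reach the regex unchanged
assert escaped == raw    # both produce exactly the same pattern text

print(re.findall(raw, "Pi is about 3.14, e is about 2.72"))  # prints ['3.14', '2.72']

# Forgetting to escape can silently change the regex: in an ordinary literal,
# "\b" is a backspace character, not the regex word boundary \b.
print(re.search("\bcat\b", "a cat sat"))            # prints None
print(re.search(r"\bcat\b", "a cat sat").group())   # prints cat
```

Raw strings remove most of the doubled backslashes, so Python code commonly writes regex patterns as `r"..."`.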

Posted: 16/02/2014, 13:20

Contents

  • Table of Contents

  • Preface

    • Caught in the Snarls of Different Versions

    • Intended Audience

    • Technology Covered

    • Organization of This Book

    • Conventions Used in This Book

    • Using Code Examples

    • Safari® Books Online

    • How to Contact Us

    • Acknowledgments

  • Chapter 1. Introduction to Regular Expressions

    • Regular Expressions Defined

      • Many Flavors of Regular Expressions

      • Regex Flavors Covered by This Book

    • Search and Replace with Regular Expressions

      • Many Flavors of Replacement Text

    • Tools for Working with Regular Expressions

      • RegexBuddy

      • RegexPal

      • RegexMagic

      • More Online Regex Testers

        • RegexPlanet

        • regex.larsolavtorvik.com

        • Nregex

        • Rubular
