Lexical and syntax analysis


Chapter 4
Lexical and Syntax Analysis
ISBN 0-321-33025-0
Copyright © 2006 Addison-Wesley. All rights reserved.

Chapter 4 Topics
• Introduction
• Lexical Analysis
• The Parsing Problem
• Recursive-Descent Parsing
• Bottom-Up Parsing

Introduction
• Language implementation systems must analyze source code, regardless of the specific implementation approach
• Nearly all syntax analysis is based on a formal description of the syntax of the source language (BNF)

Using BNF to Describe Syntax
• Provides a clear and concise syntax description
• The parser can be based directly on the BNF
• Parsers based on BNF are easy to maintain

Syntax Analysis
• The syntax analysis portion of a language processor nearly always consists of two parts:
  – A low-level part called a lexical analyzer (mathematically, a finite automaton based on a regular grammar)
  – A high-level part called a syntax analyzer, or parser (mathematically, a push-down automaton based on a context-free grammar, or BNF)

Reasons to Separate Lexical and Syntax Analysis
• Simplicity - less complex approaches can be used for lexical analysis; separating them simplifies the parser
• Efficiency - separation allows optimization of the lexical analyzer
• Portability - parts of the lexical analyzer may not be portable, but the parser always is portable

Lexical Analysis
• A lexical analyzer is a pattern matcher for character strings
• A lexical analyzer is a "front-end" for the parser
• Identifies substrings of the source program that belong together - lexemes
  – Lexemes match a character pattern, which is associated with a lexical category called a token
  – sum is a lexeme; its token may be IDENT

Example
sum = oldsum - value / 100;

Token          Lexeme
IDENT          sum
ASSIGN_OP      =
IDENT          oldsum
SUBTRACT_OP    -
IDENT          value
DIVISION_OP    /
INT_LIT        100
SEMICOLON      ;

Lexical Analysis (cont.)
• The lexical analyzer is usually a function that is called by the parser when it needs the next token
• The lexical analysis process also:
  – Includes skipping comments, tabs, newlines, and blanks
  – Inserts lexemes for user-defined names (strings, identifiers, numbers) into the symbol table
  – Saves source locations (file, line, column) for error messages
  – Detects and reports syntactic errors in tokens, such as ill-formed floating-point literals, to the user

Pragmas
• Provide directives or hints to the compiler
• Directives:
  – Turn various kinds of run-time checks on or off
  – Turn certain code improvements on or off (performance vs. compilation speed)
  – Turn performance profiling on or off
• Hints:
  – Variable x is very heavily used (to keep it in a register)
  – Subroutine S is not recursive (its storage may be statically allocated)
  – 32 bits of precision (instead of 64) suffice for floating-point variable x
• Lexical analysis is (often) responsible for dealing with pragmas

Lexical Analysis (cont.)
• Three main approaches to building a scanner:
  1. Write a formal description of the tokens and use a software tool that constructs lexical analyzers given such a description
  2. Design a state diagram that describes the token patterns and write a program that implements the diagram
  3. Design a state diagram that describes the token patterns and hand-construct a table-driven implementation of the state diagram

The "longest possible token" rule
• The scanner returns to the parser only when the next character cannot be used to continue the current token
  – The next character will generally need to be saved for the next token
• In some cases, you may need to peek at more than one character of look-ahead in order to know whether to proceed
  – In Pascal, when you have a 3 and you see a '.'
    • do you proceed (in hopes of getting 3.14)? or
    • do you stop (in fear of getting 3..5)?

The rule …
• In messier cases, you may not be able to get by with any fixed amount of look-ahead. In Fortran, for example, we have
    DO 5 I = 1,25    → loop
    DO 5 I = 1.25    → assignment
• Here, we need to remember we were in a potentially final state, and save enough information that we can back up to it, if we get stuck later

State Diagram Design
• Suppose we need a lexical analyzer that only recognizes program names, reserved words, and integer literals
• A naïve state diagram would have a transition from every state on every character in the source language - such a diagram would be very large!

State Diagram Design (cont.)
• In many cases, transitions can be combined to simplify the state diagram
  – When recognizing an identifier, all uppercase and lowercase letters are equivalent - use a character class
  – When recognizing an integer literal, all digits are equivalent - use a digit class
  – Reserved words and identifiers can be recognized together (rather than having a part of the diagram for each reserved word)

State Diagram Design (cont.)
• Convenient utility subprograms:
  – getChar - gets the next character of input, puts it in global variable nextChar, determines its class and puts the class in global variable charClass
  – addChar - puts the character from nextChar into the place the lexeme (global variable) is being accumulated
  – lookup - determines whether the string in lexeme is a reserved word (returns a code)

State Diagram
[Figure: state diagram, not reproduced in this extract]

Lexical Analysis - Implementation
int lex() {
  getChar();
  switch (charClass) {
    case LETTER:
      addChar();
      getChar();
      while (charClass == LETTER || charClass == DIGIT) {
        addChar();
        getChar();
      }
      return lookup(lexeme);
      break;
    …

Lexical Analysis - Implementation
    case DIGIT:
      addChar();
      getChar();
      while (charClass == DIGIT) {
        addChar();
        getChar();
      }
      return INT_LIT;
  } /* End of switch */
} /* End of function lex() */

A part of a Pascal scanner
• We read the characters one at a time with look-ahead
• If it is one of the one-character tokens { ( ) [ ] < > , ; = + - } we announce that token
• If it is a '.', we look at the next character
  – If that is a dot, we announce '..'
  – Otherwise, we announce '.' and reuse the look-ahead

[...]

... for a real number
    if the character after the . is not a digit we return an integer and reuse the . and the look-ahead
etc.

The Parsing Problem
• Goals of the parser, given an input program:
  – Find all syntax errors; for each, produce an appropriate diagnostic message, and recover quickly
  – Produce the parse tree, or at least a trace of...

[...]

... T → F
0E1+6T9    $    Reduce by E → E + T
0E1        $    Accept

Summary
• Syntax analysis is a common part of language implementation
• A lexical analyzer is a pattern matcher that isolates small-scale parts of a program
  – Detects syntax errors
  – Produces a parse tree
• A recursive-descent parser is an LL parser
• Parsing problem for bottom-up...

[...]

... terminating *)
    otherwise we return a left parenthesis and reuse the look-ahead
if it is one of the one-character tokens ([ ] , ; = + - etc.)
    we return that token
if it is a . we look at the next character
    if that is a . we return ..
    otherwise we return . and reuse the look-ahead
if it is a < we look at the next character
    if that is a = we return <=
...

Date posted: 26/01/2015, 10:09
