CS 164 Programming Language and Compilers Spring 2014 Section 2: Finite Automata and JLex GSI: Peter Xi (2/5) and Wontae Choi (2/6) • Announcement – WA2 and PA3 are out next Monday (2/10). – WA1 due next Monday (2/10) at 10:30am. – PA2 due next Tuesday (2/11) at 11:59pm. • Outline 1. Understanding finite automata. 2. Introduction to JLex. 3. WA1 and PA2 QA. 1 Finite Automata Finite automata1 can be used to implement regular languages. Finite automaton is defined as tuple (Σ, S, n, F, δ) where Σ is the alphabet, S is the set of state, n ∈ S is the start state, F ⊆ S is the set of accepting states, and δ : S × S is the transition relation. Theorem 1. Theorem: A language is regular if and only if it is accepted by a finite automaton. 1. Miscellaneous questions: • How does NFA differ from a DFA? • When does NFA and DFA accept its input? • Which is more powerful? • Why would you choose one over the other? 2. What language is accepted by the following DFA? Answer: Binary strings of even length. 3. What language is accepted by the following NFA? Answer:Binary strings ending in 01 or 010 1 Automata is plural. The singular form is automaton 1 4. Construct DFA and NFA for binary strings with different first and last bits. Answer: 0,1 0 1 1 0 1 1 0 0 0 1 1 0 0,1 1 NFA 0 DFA 5. Convert regular expression a(bb|cc)∗ to a NFA using Thompson construction. 6. Convert the NFA from Exercise 5 to a DFA using subset construction. Answer: a 0 3 E E 1 b b 5 7 E E 2 E 4 c 9 c 6 8 E 10 E E 'E' is an epsilon Thompson construction {5} b {1,2,7,9,10} b b c a {0} b {1,2,3,4} b b {6} {1,2,8,9,10} Subset construction 7. (Challenge) : (a) Construct NFA for decimal numbers divisible by 3. (Leading zeroes are okay.) 2 (b) Construct NFA for binary numbers divisible by 5. (Leading zeros are okay.) (c) Write a regular expression for the language in each of Exercise (a) and (b). Answer: Very complex. See http://www.andrew.cmu.edu/user/ko/pdfs/lecture-5. pdf for one algorithm to convert NFAs to regular expressions. 2 Introduction to JLex JLex is an automatic tool that will spit out Java code for a scanner (lexer) for you, given a bunch of regular expressions. JLex take regular expressions, convert them interally to a DFA, and then emit code for that DFA. 1. JLex syntax for regular expressions: a "a" \a . a* a+ a? a|b (a) ^a a$ [ab] [a-z] [^a] a{m,n} {xx} literal a. an "a", even if a is an operator. an \a, even if a is an operator. all character not equal to \n (new line). 0 or more a’s. 1 or more a’s. 0 or 1 a’s. an a or b. an a. an a at the beginning of a line. an a at the end of a line. a or b. all characters between a and z. all characters not equal to a. m through n occurrences of a. the translation of xx from the definitions section. 3 2. JLex input format: [headers for lexer, such as <class def> and <import>] %% [JLex directives] %% [lexer rules] 3. JLex example: toy.lex class Toy{ public static void main(String argv[]) throws java.io.IOException{ Yylex yy = new Yylex(System.in); yy.yylex(); } } class Yytoken{ /* dummy */ } %% %{ private int count = 0; private void out(String message){ System.out.println(message); count++; } %} DIGIT ID [0-9] [a-z][a-z0-9]* %state COMMENT %% <YYINITIAL> <YYINITIAL> <YYINITIAL> <YYINITIAL> <YYINITIAL> <YYINITIAL> <YYINITIAL> exit "/*" {DIGIT}+ {DIGIT}+"."{DIGIT}* {ID} "+"|"-"|"*"|"/" [ \t\n]+ { { { { { { { return new Yytoken(); } yybegin(COMMENT); } out("int: " + yytext()); } out("float: " + yytext()); } out("identifier: " + yytext()); } out("operator: " + yytext()); } /* eat up whitespace */ } <YYINITIAL> . { out("unrecognized character: " + yytext()); } <COMMENT> [^*]* <COMMENT> "*"+[^*/]* <COMMENT> "*"+"/" { /* skip */ } { /* skip */ } { yybegin(YYINITIAL); } This lexer works! Give it a try. 4 % jlex -o toy.java toy.lex % javac toy.java % java Toy Some notes: • The funny COMMENT section defines a start condition–that is, a special context for defining a separate set of rules. Here, we switch into the COMMENT context when consuming comment text. • Be careful! Your lexer will need to handle nested comments. • Skim the Jlex manual for more detail on all of these features. 4. Questions: (a) How do we handle ambiguity? (b) How do we handle errors? Further Readings • Finite automata – Dragon Book (Compilers: Principles, Techniques, and Tools), Ch. 3.6-3.7 – Regular Expression Matching Can Be Simple and Fast (http://swtch.com/~rsc/regexp/ regexp1.html) • Lexical analyzer generators – Jlex manual (http://www.cs.princeton.edu/~appel/modern/java/JLex/) – The original Lex23 and Flex45 online manual (http://dinosaur.compilertools.net/) – RE2C : a very fast lexical analyzer generator. (http://re2c.org) 2 Eric Schmidt (Google) developed Lex is a Java version of Lex. 4 Vern Paxon (UC Berkeley Professor, Computer Security) developed Flex 5 Jflex is a java version of Flex. 3 Jlex 5
© Copyright 2024 ExpyDoc