PDF slides - Faculty of Computer Science

CSCI 3136
Principles of Programming Languages
Syntactic Analysis and Context-Free Grammars - 2
Summer 2013
Faculty of Computer Science
Dalhousie University
1 / 13
Language defined by a CFG
L(G ) = {w |S ⇒∗ w }
If G is a grammar, the language of the grammar, denoted as L(G ),
is the set of terminal strings that have derivations from the start
symbol.
• Languages defined by CFGs are context-free languages
• Example: Our ‘Jim-ate-cheese’ grammar generates the
following language:
L(G ) =
{“Jim ate cheese”, “Jim ate Jim”, “cheese ate cheese”,
“cheese ate Jim”, “big Jim ate cheese”,
“big Jim ate Jim”, “big cheese ate cheese”, · · · }
2 / 13
CF Languages: Example (1)
G = ({S}, {0, 1}, {S → 0S1|}, S)
• Is in L(G)?
• Is 01 in L(G)?
• Is 0011 in L(G)?
• Is 0n 1n in L(G)?
What language is defined by the following CFG?
S →
S → 0S1
3 / 13
CF Languages: Example (2)
What language is defined by the following CFG?
S →
S → 0S0
S → 1S1
Neither of these two languages is regular.
There are more context-free languages than regular ones.
4 / 13
Classes of Chomsky Hierarchy
5 / 13
Regular grammar (1)
A right regular grammar (or right linear grammar) is a formal
grammar (N, Σ, P, S) such that all the production rules in P are of
one of the following forms:
1. B → a, where B ∈ N and a ∈ Σ
2. B → aC , where B, C ∈ N and a ∈ Σ
3. B → , where B ∈ N and denotes the empty string
In a left regular grammar (or left linear grammar), all rules obey
the forms:
1. B → a, where B ∈ N and a ∈ Σ
2. B → Ca, where B, C ∈ N and a ∈ Σ
3. B → , where B ∈ N and denotes the empty string
A (context-free) grammar is regular if it is a left or right regular
grammar.
Regular grammars are too weak to express programming languages.
6 / 13
Regular grammar (2)
Is the following grammar (with start non-terminal S) regular or
context-free?
S → 0S
S → 1A
A→
A → 2A
If it is a regular grammar then what is the equivalent regular
expression? Otherwise, write down the context-free language
defined by the above grammar.
7 / 13
Regular grammar (3)
What language is defined by the following grammar (with start
non-terminal S)?
S → 0A
A → S1
S →
Is the language defined by the above grammar regular or
context-free? Prove your answer.
8 / 13
Parse Trees
Every derivation can be represented by a parse tree:
———————–
S →PV P
P→N
P →AP
A → big|green
N → cheese|Jim
V → ate
————————
• S ⇒ P V P ⇒ N V P ⇒ N V N ⇒ Jim V N
⇒ Jim ate N ⇒ Jim ate cheese
S
P
V
P
N
ate
N
cheese
Jim
• The root is S
• The children of each node are the symbols (terminals and
non-terminals) it is replaced with.
• The internal nodes are non-terminals. The leaves–called the
yield of the parse-tree–are terminals.
9 / 13
Parse Tree (another example)
———————–
S →PV P
P→N
P →AP
A → big|green
N → cheese|Jim
V → ate
————————
• S ⇒ P V P ⇒ A P V P ⇒ big P V P ⇒
big N V P ⇒ big Jim V P ⇒ big Jim ate P
⇒ big Jim ate A P ⇒ big Jim ate green P
⇒ big Jim ate green N
⇒ big Jim ate green cheese
S
P
V
P
A
P
big
N
ate
Jim
10 / 13
A
P
green
N
cheese
Ambiguity: Example
A CFG for
arithmetic
expressions
———————–
E →n
E →i
E →E +E
E →E −E
E →E ∗E
E → E /E
E → (E )
————————
2+3∗4
E
E
* E
E
E
E +
E + E
2
4
E * E
2
3
4
11 / 13
3
Violates precedence rules.
Ambiguity
• There may be more than one parse tree for the same sentence
generated by a grammar G. If this is the case, we call G
ambiguous.
• Problems of ambiguous grammar: one sentence has different
interpretations.
12 / 13
Some facts about multiple representations
• There are infinitely many context-free grammars generating a
given context-free language. (How?)
• There may be more than one parse tree for the same sentence
generated by a grammar.
• For programming languages, we require that CFG is
unambiguous.
13 / 13