Chapter
12: Context-Free Grammars
Topics
Context-free grammars
How do you use a CFG to generate a sentence?
Parse trees
Lukasiewicz (Polish) notation
Evaluating a formula written in prefix notation
Ambiguity
Context-free
grammars
Context-free grammars (CFG’s):
-
play a role like that of RE’s: generate the sentences* of a language
-
but CFG's are more powerful than RE’s
-
CFG’s can generate non-regular languages (e.g. anbn)
-
are the standard device for defining computer languages
*standard terminology sentence will be used to stand for what the
book calls "word in the language"
Example CFG:
| S ® aSb |
read: "S goes to aSb" or
"S becomes aSb" or "S produces aSb" |
| S ® l |
l is the empty string, as usual |
Terminology:
-
this notation is called Backus-Naur Form (BNF)
-
example shows two productions (or re-writing rules)
-
each production is separated by a ® into:
-
a left-hand side (LHS) and
-
a right-hand side (RHS)
-
the lower case letters (a, b) are called terminal symbols
-
the upper-case letters (S) are called non-terminal symbols
-
each LHS is a single non-terminal symbol
-
each RHS is a string of terminals and/or non-terminals, or l
-
S is called the sentence symbol, and must appear at least once as
a LHS
-
terminal symbols (lower-case letters) are the alphabet for the generated
language
If there are multiple productions with the same LHS, we usually group them
using the alternation sign "|", e.g.:
S ® aSb | l
How
do you use a CFG to generate a sentence?
| Start with the sentence symbol |
S |
|
| Add a Þ (the
derives symbol) |
S Þ |
|
| Replace a non-terminal by a RHS whose LHS is
that non-terminal, and add it: |
S Þ aSb |
we used production 1 to replace S by aSb |
| Repeat as often as you can: |
S Þ aSb Þ
aaSbb |
production 1 |
| |
S Þ aSb Þ
aaSbb Þ aabb |
production 2 |
| When there are no non-terminals to be replaced,
you have a sentence in the language generated by the grammar |
We can, of course, continue to replace S by aSb for as many steps as
we please. The language generated?
anbn (!!)
More terminology:
S Þ aSb Þ
aaSbb Þ aabb is called a derivation
of the string aabb
we say "S derives aSb derives aaSbb derives aabb"
S, aSb, aaSbb, aabb are called sentential forms (book: "working
string"), because they can all be derived from S in 0 or more steps
Parse
trees
This is an alternative scheme for showing the derivation of a sentence,
resembling "diagramming (English) sentences"
Derivation: S Þ aSb Þ
aaSbb Þ aabb
Parse tree:
It’s not too hard to see how the tree represents a derivation. Notice
that the sentence is just the terminal symbols at the leaves of the tree,
read from left to right (ignore l ).
Lukasiewicz
(Polish) notation
Our book shows us a grammar for producing parse trees of the following
form:
 |
This is sometimes referred to as an abstract syntax tree, and
represents the algebraic expression
(3 + 1)*5
Notice:
-
the expression needs the parentheses
-
3 + 1*5 is not the same!
-
the tree, by its structure,
-
indicates that the 3 + 1 must be done before
its result (4) can be combined with the 5 to get 20
|
The abstract syntax tree becomes a device for writing arithmetic expressions
without parentheses, by traversing the tree, as follows:
Trace the tree, top to bottom, left to right, recording symbols as you
go (but not re-recording any you have already seen on the way down). This
is called a pre-order traversal. For our tree:
*+315
This style of representation of an expression is called (Polish) prefix
notation.
Standard (infix) notation: 3*(1 + 5)
Prefix notation: *+315 (parenthesis-free)
Notice, for prefix notation,
-
the operands 3,1,5 are in the same order
-
but there are no parens
Evaluating
a formula written in prefix notation
| 1. locate the rightmost operator and the two
operands to its right |
+**2-*32421 |
| 2. replace that substring by its value
*32 = 3*2 = 6
|
+**2-6421 |
| repeat 1 |
+**2-6421 |
| repeat 2 |
+**2221 |
| repeat 1 and 2 |
+*421 |
| repeat 1 and 2 |
+81 |
| repeat 1 and 2 |
9 |
|
you are done when all you have left is a single result
|
|
if you get something like: 43+9, step 1 cannot be performed, and
the expression was not valid
|
Now:
-
create an expression by tracing the tree as usual
-
but write down the operators on the way up (the second time you see them)
-
you are performing a post-order traversal …
-
generating polish postfix or reverse polish notation.
Ambiguity
Consider:
S ® aSb
S ® aaSbb
S ® l
and the string: aabb
Two parse trees:
and:
When any sentence has more than one parse tree, we say the grammar is
ambiguous. Grammars for computer languages are constructed
so as to be unambiguous.