Chapter 13: Grammatical Format

Topics
Regular grammars
λ-production removal
Unit production removal
Chomsky normal form

Regular grammars

Think about the following grammar and transition graph:

S → aaM | bS
M → aF | bS
F → bb | bF | λ

corresponding to the grammar derivation:
S ⇒ bS ⇒ baaM ⇒ baaaF ⇒ baaabb

of the string baaabb
is the TG sequence of state-transitions:
S, b-S, aa-M, a-F, bb-final

accepting the string baaabb

Notice:

the TG transition sequence "mimics"
the derivation sequence of productions, and vice-versa
illustrating the "equivalence" of the CFG and the TG
with respect to the language they define
and since any TG is equivalent to a FA
we see that the grammar defines a regular language

Note: for our CFG, every production has one of two forms:
Nonterminal → (string of terminals)(Nonterminal)
Nonterminal → (string of terminals)
(where "string of terminals" can be empty)

Because such a CFG generates a regular language, it is called a regular grammar.

With a little study, you can see from the illustration:
(1) given a regular grammar, how to build a TG from it
(2) given a TG, how to build a regular grammar for it
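
Direction (1) can be sketched in Python. This is only an illustration under assumptions: each production is written as a triple (LHS, string of terminals, next nonterminal or None), every nonterminal becomes a TG state, and FINAL, grammar_to_tg, and accepts are hypothetical names that do not come from the notes.

```python
FINAL = "final"  # hypothetical name for the single accepting state

# The example grammar: S → aaM | bS, M → aF | bS, F → bb | bF | λ
GRAMMAR = [
    ("S", "aa", "M"), ("S", "b", "S"),
    ("M", "a", "F"), ("M", "b", "S"),
    ("F", "bb", None), ("F", "b", "F"), ("F", "", None),
]


def grammar_to_tg(productions):
    """Each production becomes one TG edge (state, label, target).

    A production with no trailing nonterminal leads to FINAL.
    """
    return [(lhs, terminals, nxt if nxt is not None else FINAL)
            for lhs, terminals, nxt in productions]


def accepts(edges, start, word):
    """Search the TG; an edge consumes its whole label ("" consumes nothing)."""
    seen = set()
    stack = [(start, 0)]
    while stack:
        state, pos = stack.pop()
        if (state, pos) in seen:
            continue  # already explored this configuration
        seen.add((state, pos))
        if state == FINAL and pos == len(word):
            return True
        for src, label, dst in edges:
            if src == state and word.startswith(label, pos):
                stack.append((dst, pos + len(label)))
    return False
```

With these definitions, accepts(grammar_to_tg(GRAMMAR), "S", "baaabb") follows the same path as the derivation shown earlier: S, b-S, aa-M, a-F, bb-final.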

We now have five equivalent representations for regular languages:

regular expressions

finite automata

non-deterministic FA

transition graphs

regular grammars

λ-production removal

in the context of computer languages ...
λ-productions (productions of the form A → λ)
are useful for writing (and reading) grammars
but they are a nuisance when it comes to mechanizing syntax analysis:
they can cause sentential forms to expand and contract during the course of a derivation
grammars are easier to mechanize if sentential forms never decrease in size
we present an algorithm for eliminating λ-productions

But note:

if the language contains the empty string,
we clearly cannot remove all λ-productions
but we can remove all except the one production: S → λ

Notation: if A ⇒ … ⇒ x (A derives x in 0 or more steps), we write A ⇒* x
Terminology: if A ⇒* λ (A derives the null string), we call A a nullable nonterminal

Algorithm for removing λ-productions

We will use the following grammar as an example:

S → BCD
B → C | λ
C → aD | bD
D → BE
E → λ

Part 1: find all nullable nonterminals

step:	for our grammar:
1. Let NN = {}	NN = {}
2. If X → λ is a production, add X to NN	NN = {B, E}
3. If Y → A₁A₂ … Aₙ is a production, and A₁, A₂, …, Aₙ are all in NN, then add Y to NN	NN = {B, E, D}
4. Repeat 3 for all productions until nothing new can be added to NN	NN = {B, E, D}
so we have B ⇒* λ, E ⇒* λ, D ⇒* λ
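
Part 1 can be sketched in Python as a fixed-point computation. The representation is an assumption made for illustration: a grammar is a dict from nonterminal to a list of right-hand-side strings, every symbol is one character, and the empty string "" stands for λ. The name nullable_nonterminals is hypothetical.

```python
# The example grammar; "" stands for λ (an encoding chosen for this sketch).
GRAMMAR = {
    "S": ["BCD"],
    "B": ["C", ""],
    "C": ["aD", "bD"],
    "D": ["BE"],
    "E": [""],
}


def nullable_nonterminals(grammar):
    """Return the set NN of nonterminals that derive λ."""
    # Steps 1 and 2: start with every X that has a production X → λ.
    nn = {x for x, rhss in grammar.items() if "" in rhss}
    # Steps 3 and 4: repeat until nothing new can be added.
    changed = True
    while changed:
        changed = False
        for y, rhss in grammar.items():
            if y in nn:
                continue
            # Y is nullable if some RHS consists only of nullable symbols.
            if any(rhs and all(sym in nn for sym in rhs) for rhs in rhss):
                nn.add(y)
                changed = True
    return nn
```

For the example grammar this yields the set containing B, E, and D, matching the table above.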

Part 2: eliminate λ-productions from the grammar

1. Remove all λ-productions
2. Examine each remaining production, focusing on the nullable nonterminals on the RHS.

pick a subset of those nullable nonterminals
create a new production from the old one by omitting those nonterminals
but never add a λ-production
do this for all subsets

Here it is for our example:

production	nullable nonterminals	omit subset	add production
S → BCD	B, D	B	S → CD
		D	S → BC
		B, D	S → C
B → C	none
C → aD	D	D	C → a
C → bD	D	D	C → b
D → BE	B, E	B	D → E
		E	D → B
note: we don’t omit based on the subset B, E because that would introduce D → λ

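Note that for a given production:

if there are n nullable nonterminals on the RHS,
but not all symbols on RHS are nullable:

there will be up to 2ⁿ − 1 new productions added (one per nonempty subset; fewer if two subsets yield the same RHS)

if all symbols on RHS are nullable:

there will be up to 2ⁿ − 2 new productions (omitting every symbol would give a λ-production)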
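
Part 2 can be sketched in Python as follows. The assumptions are the same as before and are made only for illustration: single-character symbols, "" for λ, and the set of nullable nonterminals already computed in Part 1. The name remove_lambda_productions is hypothetical.

```python
from itertools import combinations

# The example grammar; "" stands for λ (an encoding chosen for this sketch).
GRAMMAR = {
    "S": ["BCD"],
    "B": ["C", ""],
    "C": ["aD", "bD"],
    "D": ["BE"],
    "E": [""],
}


def remove_lambda_productions(grammar, nullable):
    """Drop λ-productions and add the variants that omit nullable symbols."""
    new_grammar = {}
    for lhs, rhss in grammar.items():
        result = []
        for rhs in rhss:
            if rhs == "":
                continue  # step 1: remove the λ-production itself
            # Positions (not symbols), so repeated nonterminals are handled.
            positions = [i for i, sym in enumerate(rhs) if sym in nullable]
            # Step 2: try every subset of nullable positions (k = 0 keeps rhs).
            for k in range(len(positions) + 1):
                for omit in combinations(positions, k):
                    candidate = "".join(
                        s for i, s in enumerate(rhs) if i not in omit)
                    # Never add a λ-production; skip duplicates.
                    if candidate and candidate not in result:
                        result.append(candidate)
        new_grammar[lhs] = result
    return new_grammar
```

Calling remove_lambda_productions(GRAMMAR, {"B", "D", "E"}) leaves E with no productions at all, which is why any production that still mentions E can no longer be used.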

New grammar:

S → BCD
S → CD
S → BC
S → C
B → C
C → aD | bD
C → a
C → b
D → BE
D → B
D → E

Note:

D → BE, D → E are useless productions:

each RHS contains a nonterminal (E) that no longer appears anywhere as a LHS

not harmful, but you might as well eliminate them
so our final grammar is:

S → BCD | CD | BC | C
B → C
C → aD | bD | a | b
D → B
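
The cleanup step can be sketched in Python too. Note that this handles only the case described in these notes, a nonterminal left with no productions; it is not a full useless-symbol elimination. The name drop_dead_productions and the dict-of-strings encoding are assumptions for illustration.

```python
# Grammar after λ-production removal; E has no productions left.
GRAMMAR = {
    "S": ["BCD", "CD", "BC", "C"],
    "B": ["C"],
    "C": ["aD", "a", "bD", "b"],
    "D": ["BE", "E", "B"],
    "E": [],
}


def drop_dead_productions(grammar):
    """Remove productions whose RHS mentions a nonterminal with no productions."""
    grammar = {x: list(rhss) for x, rhss in grammar.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        # Nonterminals with an empty production list can never be rewritten.
        for x in [x for x, rhss in grammar.items() if not rhss]:
            del grammar[x]
        for x, rhss in grammar.items():
            # Lowercase symbols are terminals; others must still have rules.
            kept = [r for r in rhss
                    if all(s.islower() or s in grammar for s in r)]
            if kept != rhss:
                grammar[x] = kept
                changed = True  # x itself may now be empty; check again
    return grammar
```

Applied to GRAMMAR, this removes D → BE and D → E and leaves the final grammar listed above.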

Original grammar	Revised grammar
S → BCD	S → BCD | CD | BC | C
B → C | λ	B → C
C → aD | bD	C → aD | bD | a | b
D → BE	D → B
E → λ	

Assertion: the revised grammar will generate the same language as the original. Why?

Consider the original production S → BCD

if we remove all λ-productions, derivations such as
S ⇒ BCD ⇒ BCBE ⇒ BCE ⇒ BC
made by applying D → BE, B → λ, and then E → λ
(thus relying on the nullability of D)
will not be possible
but we can recapture that effect by adding the production S → BC directly
but we also need to add S → CD, and S → C
to recapture all variations on the effects of nullability

Unit production removal

In some contexts it is helpful to eliminate unit productions, those of the form

Nonterminal → Nonterminal

Let’s illustrate the method on the following grammar:

S → A | bb
A → B | b
B → S | a

Step 1: for each nonterminal X, find all other nonterminals Y such that X ⇒* Y in one or more steps, using only unit productions:

S ⇒* A

S ⇒* B

A ⇒* B

A ⇒* S

B ⇒* S

B ⇒* A

Step 2: if X ⇒* Y, and Y → s₁ | s₂ | … | sₙ are the non-unit productions for Y, add X → s₁ | s₂ | … | sₙ to the productions.

derivation	add production
S ⇒* A	S → b
S ⇒* B	S → a
A ⇒* B	A → a
A ⇒* S	A → bb
B ⇒* S	B → bb
B ⇒* A	B → b

Step 3: remove the unit productions

Our final grammar:

S → bb | a | b
A → b | a | bb
B → a | bb | b
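
The three steps can be sketched in Python. As an assumption for illustration, a grammar is a dict from nonterminal to a list of right-hand-side strings, and a unit production is any right-hand side that is exactly the name of a nonterminal. The name remove_unit_productions is hypothetical.

```python
# The example grammar: S → A | bb, A → B | b, B → S | a
GRAMMAR = {
    "S": ["A", "bb"],
    "A": ["B", "b"],
    "B": ["S", "a"],
}


def remove_unit_productions(grammar):
    """Return an equivalent grammar with no Nonterminal → Nonterminal rules."""
    nonterminals = set(grammar)
    # Step 1: reach[x] = nonterminals derivable from x by unit productions.
    reach = {x: {r for r in grammar[x] if r in nonterminals} for x in grammar}
    changed = True
    while changed:
        changed = False
        for x in grammar:
            for y in list(reach[x]):
                extra = reach[y] - reach[x]
                if extra:
                    reach[x] |= extra
                    changed = True
    result = {}
    for x in grammar:
        # Step 3: keep only the non-unit productions of x ...
        rhss = [r for r in grammar[x] if r not in nonterminals]
        # Step 2: ... and add the non-unit productions of every reachable y.
        for y in sorted(reach[x] - {x}):
            for r in grammar[y]:
                if r not in nonterminals and r not in rhss:
                    rhss.append(r)
        result[x] = rhss
    return result
```

For GRAMMAR, each of S, A, and B ends up with exactly the alternatives a, b, and bb, in agreement with the final grammar above (the order of alternatives may differ).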

Chomsky normal form

Any CFG has an equivalent* CFG in Chomsky normal form, in which every production is one of two types:

(1)	A → BC	B, C nonterminals
(2)	B → x	x terminal

* with one exception: if λ is in the language of the original CFG, it will disappear from the language of the "equivalent" CFG

Step 1: eliminate λ-productions and unit productions
Step 2: illustrated here for the production:

A → AaBb

(part A) replace this production by:
A → AXBY, X → a, Y → b (X, Y new nonterminals)
(part B) replace A → AXBY by:
A → AP, P → XQ, Q → BY (P, Q new nonterminals)
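
Step 2 can be sketched in Python for a single production. The assumptions are for illustration only: symbols are single characters, lowercase letters are terminals, the right-hand side has at least two symbols (Step 1 is already done), and the caller supplies an iterator of unused nonterminal names. The name to_cnf is hypothetical.

```python
def to_cnf(lhs, rhs, fresh_names):
    """Split one production lhs → rhs (len(rhs) >= 2) into CNF productions.

    fresh_names is an iterator of nonterminal names not used elsewhere
    (supplied by the caller; an assumption of this sketch).
    Returns a list of (lhs, rhs) pairs.
    """
    productions = []
    name_for = {}  # terminal -> the new nonterminal that produces it
    symbols = []
    # Part A: replace each terminal by a new nonterminal.
    for sym in rhs:
        if sym.islower():
            if sym not in name_for:
                name_for[sym] = next(fresh_names)
                productions.append((name_for[sym], sym))
            symbols.append(name_for[sym])
        else:
            symbols.append(sym)
    # Part B: break a long RHS into a chain of two-symbol productions.
    head = lhs
    while len(symbols) > 2:
        new = next(fresh_names)
        productions.append((head, symbols[0] + new))
        head, symbols = new, symbols[1:]
    productions.append((head, "".join(symbols)))
    return productions
```

For example, to_cnf("A", "AaBb", iter("XYPQ")) produces X → a, Y → b, A → AP, P → XQ, Q → BY, the same productions as in parts A and B above.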
