Chapter 13: Grammatical Format

Topics
Regular grammars
l-production removal
Unit production removal
Chomsky normal form


Regular grammars

Think about the following grammar and transition graph:
S ® aaM | bS 
M ® aF | bS
F ® bb | bF | l
corresponding to the grammar derivation:

S Þ bS Þ baaM Þ baaaF Þ baaabb

of the string baaabb

is the TG sequence of state-transitions:

S, b-S, aa-M, a-F, bbb-final

accepting the string baaabb

Notice:

Note: for our CFG, every production is one of the two forms: Nonterminal ® (string of terminals)(Nonterminal)
Nonterminal ® (string of terminals)
(where "string of terminals" can be empty)

Because such a CGF generates a regular language, it is called a regular grammar.

With a little study, you can see from the illustration:
(1) given a regular grammar, how to build a TG from it
(2) given a TG, how to build a regular grammar for it

We now have five equivalent representations for regular languages:
 

regular expressions finite automata non-deterministic FA transition diagrams regular grammars


l-production removal But note: Notation: if A ÞÞ x (A derives x in 0 or more steps), we write A Þ* x
Terminology: if A Þ* l, (A derives the null string), we call A a nullable nonterminal

Algorithm for removing l-productions

We will use the following grammar as an example:

S à BCD
B à C | l
C à aD | bD
D à BE
E à l
Part 1: find all nullable nonterminals
step: for our grammar:
1. Let NN = {} NN = {}
2. if X à l is a production, add X to NN NN = {B, E}
3. If Y à A1A2 … An is a production, and A1,A2, … An are all in NN, 
then add Y to NN
NN = {B, E, D}
4. Repeat 3 for all productions until nothing new can be added to NN NN = {B, E, D}
 so we have B Þ* l,  E Þ* l,  D Þ* l

Part 2: eliminate l -productions from the grammar

1. Remove all l -productions
2. Examine each remaining production, focusing on the nullable nonterminals on the RHS.

Here it is for our example:
production nullable
nonterminals
omit
subset
add
production
S à BCD B, D B S à CD
    D S à BC
    B, D S à C
B à C none    
C à aD D D C à a
C à bD D D C à b
D à BE B, E B D à E
    E D à B
note: we don’t omit based on the subset B, E because that would introduce D àl

Note that for a given production:

New grammar:
S à BCD
S à CD 
S à BC 
S à
B à
C à aD | bD
C à a
C à b
D à BE
D à B
D à E

Note:

Original grammar Revised grammar
S à BCD
B à C | l
C à aD | bD
D à BE
E à l
S à BCD | CD | BC | C
B à
C à aD | bD | a | b
D à B

Assertion: the revised grammar will generate the same language as the original. Why?

Consider the original production S à BCD


Unit production removal

In some contexts it is helpful to eliminate unit productions, those of the form

Nonterminal à Nonterminal

Let’s illustrate the method on the following grammar:

S à A | bb
A à B | b
B à S | a
Step 1: for each nonterminal X, find all other nonterminals Y such that X Þ* Y in one or more steps:
S Þ* A  S Þ* B A Þ* B  A Þ* S B Þ* S B Þ* A

Step 2: if X Þ * Y, and Y à s1 | s2 | … | sn are the non-unit productions for Y, add X à s1 | s2 | … | sn to the productions.
 
S Þ* A S Þ* B A Þ* B A Þ* S B Þ* S B Þ* A
add S à b add S à a add A à a add A à bb B à bb add B à b

Step 3: remove the unit productions

Our final grammar:

S à bb | a | b
A à b | a | bb
B à a | bb | b

Chomsky normal form

Any CFG has an equivalent* CFG (Chomsky normal form) for which all productions are one of two types:
 

(1)  A à BC B, C nonterminals
(2) B à x terminal
 
* with one exception: if l is in the language of the original CFG, it will disappear from the language of the "equivalent" CFG

Step 1: eliminate l-productions and unit productions
Step 2: illustrated here for the production:

A à AaBb

(part A) replace this production by:
A à AXBY, X à a, Y à b (X, Y new nonterminals)

(part B) replace A à AXBY by:
A à AP, P à XQ, Q à BY (P, Q new nonterminals)

Hosted by www.Geocities.ws

1