Chapter 24: The Chomsky Hierarchy

Topics
Phrase-structure grammars
Type 0 languages
Type 0 languages = recursively enumerable languages
Context-sensitive grammars and languages
A language that is recursive, but not context-sensitive
The Chomsky hierarchy of grammars/languages


Phrase-structure grammars

We have described two types of grammars:

A phrase structure grammar is defined by:
  1. A finite alphabet of letters called terminal symbols
  2. A finite set of symbols called non-terminal symbols
  3. A special non-terminal S called the sentence symbol
  4. A finite set of productions of the form:
lhs-string ® rhs-string



Type 0 languages

We have already introduced the r.e. languages.

Q: Is there a PSG version of the r.e. languages?
A: Yes!

The Type 0 languages are exactly the (general) phrase structure grammars.

Example:

prod 1,2:       S ® aSBA | abA
3:                 AB ® BA
4:                 bB ® bb
5:                 bA ® ba
6:                 aA ® aa
Notes:
Type 0 languages = recursively enumerable languages

Part 1: Any type 0 language L with grammar G can be recognized by a Turing machine.

The idea: Here's a sketch of our TM:
  1. the tape will be initialized (by the initialization part of the program) with the input string (delimited by $-signs), followed by a string called the current sentential form, initialized to S
  2. the TM will have a subroutine called rewrite (corresponding to the pop state for the CFG-PDA) which will do the following:
    1. (a) look for the LHS of a production in the sentential form, e.g. the AB in AB ® BA
      (b) replace the LHS by the RHS (non-deterministically)
  3. compare the sentential form with the input; if the same, accept, otherwise, continue.
This non-deterministic TM can be converted to a (standard) TM (see Ch. 22)

Part 2: Any TM has a corresponding Type 0 grammar

Proof: See pp. 576-585 (!!!)


Context-sensitive grammars and languages

Between the context-free languages and the r.e. languages sits an important class of languages, called the context sensitive languages.

A PSG whose productions have the property that

length of the LHS £ length of the RHS

is called a context-sensitive grammar (CSG).

The language generated by a CSG is called a context-sensitive language (CSL).

Notes:

linear-bounded automata (LBA)

A language that is recursive, but not context-sensitive

This uses the same "diagonal" argument used to construct L1, a language that is r.e. but not recursive.

  1. Enumerate all CSG's: CSG1, CSG2, …
  2. Enumerate all input strings x1, x2, …
L3 = { xi | xi is not in the language generated by CSGi }
  1. L3 is recursive. Left to the reader.
  2. L3 is not context-free. See the corresponding theorem for L1.

The Chomsky hierarchy of grammars/languages
Language hierarchy
PSG
type
name of 
grammar/
language
production description:
lhs ® rhs
(lhs always has at least one non-terminal)
accepting automaton

 

example
** see note
below
3 regular lhs: one non-
terminal
rhs: wX
w = string term.
X = one non-term
finite automaton a*b*
none deterministic
context-free or
LR(k)
  deterministic
pushdown 
automaton
computer
languages
2 context-free lhs: one non-
terminal
rhs: string of terminals and
non-terms
non-determ.
pushdown
automaton
anbn
1 context-sensitive |lhs| £ |rhs| linear bounded
automaton
anbncn
none recursive   non-looping TM L3
0 recursively
enumerable
no restriction except above (*) TM L2
none not r.e.   none possible L1
* lhs always has at least one non-terminal
** each language is not in the previous class of languages
 
 
 
 
Hosted by www.Geocities.ws

1