Chapter 24: The Chomsky
Hierarchy
Topics
Phrase-structure grammars
Type 0 languages
Type 0 languages = recursively
enumerable languages
Context-sensitive grammars and
languages
A language that is recursive,
but not context-sensitive
The Chomsky hierarchy of grammars/languages
Phrase-structure
grammars
We have described two types of grammars:
-
regular grammars - generate the regular languages
-
context-free grammars - generate the context-free languages
-
these are specific examples of a general type of grammar
-
known as the phrase structure grammars (PSG's)
-
all are described using productions
-
the "phrases" are the RHS and LHS of productions, just as in natural language:
-
<noun phrase> ® <article> <noun>
-
<noun phrase> ® <preposition> <noun>
-
etc.
A phrase structure grammar is defined by:
-
A finite alphabet of letters called terminal symbols
-
A finite set of symbols called non-terminal symbols
-
A special non-terminal S called the sentence symbol
-
A finite set of productions of the form:
lhs-string ® rhs-string
-
the lhs-string (lefthand-side-string) and rhs-string (righthand-side-string)
are strings of terminal and non-terminal symbols
-
the lhs-string must have at least one non-terminal
-
the rhs-string may be empty
Type
0 languages
We have already introduced the r.e. languages.
Q: Is there a PSG version of the r.e. languages?
A: Yes!
The Type 0 languages are exactly the (general) phrase structure
grammars.
Example:
prod 1,2: S ®
aSBA | abA
3:
AB ® BA
4:
bB ® bb
5:
bA ® ba
6:
aA ® aa
Notes:
-
3 applications of production 1 will create the sentential form: aaaSBABABA
(3 a's and 3 SB's)
-
applying production 2 we get: aaaabABABABA
-
notice: #a's = #A's = #b's and B's =4
-
here's where things get interesting:
-
production 3 lets you change the order of items in the sentential form!!
-
quite unlike context-free grammars
-
one application of prod 3 gets us: aaaabBAABABA
-
another: aaaabBABAABA
-
another: aaaabBBAAABA
-
repeated applications will get us: aaaabBBBAAAA
-
finally, using prod 4,5,6 we get aaaabbbbaaaa = a4b4a4
-
in fact, this grammar generates exactly anbnan
-
a non-context-free language!
Type
0 languages = recursively enumerable languages
Part 1: Any type 0 language L with grammar G can be recognized
by a Turing machine.
-
proof is by construction
-
resembles the construction method used to create a PDA that recognizes
the language generated by a CFG
-
we show how to construct a non-deterministic TM that will recognize
strings from L
The idea:
-
create derivations starting with S, and
-
check at each step to see if the required sentence has been derived
Here's a sketch of our TM:
-
the tape will be initialized (by the initialization part of the program)
with the input string (delimited by $-signs), followed by a string called
the current sentential form, initialized to S
-
the TM will have a subroutine called rewrite (corresponding to the
pop
state for the CFG-PDA) which will do the following:
(a) look for the LHS of a production in the sentential form, e.g. the
AB in AB ® BA
(b) replace the LHS by the RHS (non-deterministically)
-
compare the sentential form with the input; if the same, accept, otherwise,
continue.
This non-deterministic TM can be converted to a (standard) TM (see
Ch. 22)
Part 2: Any TM has a corresponding Type 0 grammar
Proof: See pp. 576-585 (!!!)
Context-sensitive
grammars and languages
Between the context-free languages and the r.e. languages sits an important
class of languages, called the context sensitive languages.
A PSG whose productions have the property that
length of the LHS £ length of
the RHS
is called a context-sensitive grammar (CSG).
The language generated by a CSG is called a context-sensitive language
(CSL).
Notes:
-
context-sensitive refers to the fact that some CSL productions,
e.g. BAC ® BXYC
-
impose the condition that A can be replaced by XY only if it is in the
context B..C
-
the rewrite rules consider context
-
like natural languages, and
-
unlike computer languages
-
the CSL's have their own machines, which can be thought of as TM's with
a bounded-length tape, called
linear-bounded automata (LBA)
-
there are recursive languages not recognized by LBA
-
there are CSL's not recognized by PDA
-
the CFL's are proper subset of the CSL's
-
the CSL's are a proper subset of the recursive languages
A
language that is recursive, but not context-sensitive
This uses the same "diagonal" argument used to construct L1,
a language that is r.e. but not recursive.
-
Enumerate all CSG's: CSG1, CSG2, …
-
Enumerate all input strings x1, x2, …
L3 = { xi | xi is not in the language
generated by CSGi }
-
L3 is recursive. Left to the reader.
-
L3 is not context-free. See the corresponding theorem for L1.
The
Chomsky hierarchy of grammars/languages
-
the final cast of language classes:
-
some generated by PSG's (Chomsky hierarchy)
-
these are types 0,1,2,3 seen below
-
others are not described by grammar type
-
each class is a proper subset of the next
Language hierarchy
PSG
type |
name of
grammar/
language |
production description:
lhs ® rhs
(lhs always has at least one non-terminal) |
accepting automaton
|
example
** see note
below |
| 3 |
regular |
lhs: one non-
terminal |
rhs: wX
w = string term.
X = one non-term |
finite automaton |
a*b* |
| none |
deterministic
context-free or
LR(k) |
|
deterministic
pushdown
automaton |
computer
languages |
| 2 |
context-free |
lhs: one non-
terminal |
rhs: string of terminals and
non-terms |
non-determ.
pushdown
automaton |
anbn |
| 1 |
context-sensitive |
|lhs| £
|rhs| |
linear bounded
automaton |
anbncn |
| none |
recursive |
|
non-looping TM |
L3 |
| 0 |
recursively
enumerable |
no restriction except above (*) |
TM |
L2 |
| none |
not r.e. |
|
none possible |
L1 |
* lhs always has at least one non-terminal
** each language is not in the previous class of languages