Topics
Are all languages regular?
anbn
is not a regular language
The pumping lemma
Using the pumping lemma
to prove that anbn
is not regular
An interesting and useful way
to classify strings
A surprising construction:
the CCFA of a language L
The CCFA of a language L accepts exactly L
How does the CCFA for L relate to other FA that accept
L?
The Myhill-Nerode theorem
Using Myhill-Nerode to show
that a language is not regular
That would be nice, because then we could stop right here. But they’re not.
For example, the language { anbn, n = 1, 2, …} is not regular.
Following is a more formal proof of our assertion:
Suppose we have a NFA with 20 states that can recognize a21b21. Since there are only 20 states, one will have to be revisited while reading in the a21 part. Suppose a state is revisited at the 10th and 15th transitions. The sequence of states will look like:

So, some languages are non-regular. The question arises, given a language, how can one tell whether it is regular or not? Of course, any finite language is regular: if L = {s1, s2, … sn} then s1 + s2 + … + sn is a RE whose language is L. But what about infinite languages? We introduce two theorems that are useful in showing whether specific languages are regular or not:
(1) The pumping lemma
(2) The Myhill-Nerode theorem
Suppose L is regular and infinite. Then there is some string xyz in L (y ¹ l ) such that xynz is also in L, for all n = 1, 2, … .
Suppose the FA for L has m states. Choose any string in L of
length > m. While recognizing this string, some state must be revisited;
let’s say state k is visited on transitions i and j.
We divide up our input string into three pieces: x, y, and
z,
where the x-part finishes by visiting state k for the first
time, the y-part causes us to visit state k for the second
time, and the z-part is the rest:
| input string |
|
|
|
||
| character positions | 1, 2, … | i | i+1, … | j | j+1, … , m |
| states
visited |
doesn’t matter | k |
|
k | doesn’t matter |
y, the string of (j-i) characters recognized starting just after the first state k and ending with state k, is the sought string; if xyz is recognized, we can insert, in place of y, y2. The first y will take us from state-k to state-k, and so will the second, then we go on to recognize the z and accept.
If we can do it with y2, we can likewise do it with y3, y4, etc.
This theorem is mainly useful for showing that languages are not regular.
The proof is by contradiction. That is, we will assume that L = anbn is regular, and arrive at a contradiction, thus showing that our assumption is wrong.
If L is regular, then by the pumping lemma, there is some string xyz in L such that all xynz are also in L.
Case 1: y = aa … a.
By the pumping lemma, we can "pump" up y to a high enough power k such
that the number of a’s in xykz exceeds the number of b’s, creating
a string that is not in L. Contradiction.
Case 2: y = aa … abb … b.
By the pumping lemma, x(aa … abb … b)(aa … abb … b)z should also be
in L. Contradiction.
Case 3: y = bb … b. See Case 1.
Pumping Lemma 1 states that “there exists a string w = xyz …”. Sometimes that’s not good enough to prove that a language is not regular.
Pumping Lemma 2 states: If a language L is accepted by a finite automaton with N states, then every string w in L of length > N can be written as xyz, where y ¹l, length(xy) £ N, and all strings xynz are also in L.
Consider the language L = {aab, aba, bab}
Now think of the set of arbitrary strings of a’s and b’s (including l ). One such is aa.
Think of "completing" aa by appending strings of a’s and b’s (or l ), and observing whether the completed string is in L or not. We will use a completion code "L" for yes and ~ for no. We can tabulate a completion code for each completer of each string, called the string’s completion behavior.
Completion behavior of "aa":
| string |
|
|||
| l | a | b |
|
|
| aa | ~ | ~ | L |
|
Adding the completion behaviors of ba and ab:
| string |
|
|||
| l | a | b |
|
|
| aa | ~ | ~ | L |
|
| ba | ~ | ~ | L |
|
| ab | ~ | L | ~ |
|
Notice that the completion behaviors of aa and ba are the same. We say
that aa and ba are in the same completion class. On the other hand,
ab will be in some other completion class. Following is a complete table
of completion behaviors for all possible strings of a’s and b’s. The ~’s
have been omitted to avoid clutter.
|
|
|||||||||||||||
| l | a | b | a
a |
ab | ba | bb | a
a a |
aba | a
ab |
abb | bbb | bba | bab | ba
a |
|
| l | L | L | L | ||||||||||||
| a | L | L | |||||||||||||
| b | L | ||||||||||||||
| aa | L | ||||||||||||||
| ab | L | ||||||||||||||
| ba | L | ||||||||||||||
| bb | |||||||||||||||
| aaa | |||||||||||||||
| aba | L | ||||||||||||||
| aab | L | ||||||||||||||
| abb | |||||||||||||||
| bbb | |||||||||||||||
| bba | |||||||||||||||
| bab | L | ||||||||||||||
| baa | |||||||||||||||
Note: we have omitted strings and completers of length 3 or more; they of course are never in L, and belong to the same class as bb.
Based on this table, we have seven completion classes:
| {l } | {bb, aaa, abb, bbb, bba, baa, all strings of length ³ 4} | {ab} | {aa, ba} | {b} | {a} | {aba, aab, bab} |
Notice that every conceivable string is in exactly one class; we say that this classification partitions all strings into (disjoint) equivalence classes.
For this language, we have a finite number of classes. For some languages the number of classes will be infinite; thus, we would not be able to tabulate them, but we might be able to describe them using more abstract descriptions.
We build a finite automaton, using these completion classes as states, as follows:
The start state is the class containing l :

|
Construct: the a-transition from this state: by determining what class "la" is in, namely, {a} the b-transition: by determining what class "lb" is in, namely, {b} |
![]() |
| Transitions from the a-state:
a-trans = the class that aa is in = {aa, ba}
|
![]() |
| Transitions from the b-state:
a-trans = the class that ba is in = {aa, ba}
|
![]() |
Now think about a-transitions from the aa,ba-state = the class that
aaa is in; or is it the class that baa is in?
That will always be the same class, because aa and ba have the same
completion behavior! so:
Transitions from the aa,ba-state:
a-trans = {bb, … }
b-trans = {aba, aab, bab} notice that this is just L. We make it our final state.
Completing the remaining transitions according to the above, our
final machine is:

Notice that this machine accepts exactly L = {aba, aab, bab}.
If there are finitely many completion classes (therefore finitely many states), we call this machine the Completion Classes FA (CCFA) of L.
Question: is the CCFA of a language L a FA that accepts L? The answer is yes. Read on.
Let M be the CCFA of L.
Terminology: Given a FA M and a string x, we say that x puts M in state S if x as input to M will cause M to stop in state S.
(1) Suppose x is in L. Then x will put M into the state containing x (by construction). This is a final state, by construction. So if x is in L, M accepts x.
(2) Suppose x takes M into a final state S. S has at least one string y of L (by definition). Since y is in L, the completion code for y and completer l is L. But the completion behavior for all strings in S is the same, so xl ’s code is also L, so x is in L. So if M accepts x, then x is in L.
Combining (1) and (2), x is in L Û M accepts x, and the assertion is proved.
Here’s another FA that accepts the language L = {aab, aba, bab } introduced earlier:

{l } = S1
{bb, aaa, abb, bbb, bba, baa, all strings of length ³
4} = S10
{ab} = S5
{aa, ba} = S3 ÈS8
{b} = S7
{a} = S2
{aba, aab, bab} = S4 È
S6 È S9
A language L is regular Û it has a finite number of completion classes.
Proof:
Ü if it has a finite # of completion
classes, its CCFA is a recognizer for it, as just shown.
Þ if the language is regular, it has
a finite number of states, defining finitely many Si’s (see
above) which can be allocated to at most a finite number of completion
classes.
Let’s do anbn again:
Consider the following completion classes:
| for a: | completer b has an L, but completers b2, b3, … are all rejected |
| for aa: | completer bb has an L, but completers b, b3, … are all rejected – different completion behavior than a’s |
| for aaa: | completer bbb has an L, but completers b, b2, b4, … are all rejected – different completion behavior than a’s or aa’s |
| for a to any power: | similarly |
So a, a2, a3, … are all in different completion classes Þ infinitely many completion classes Þ anbn non-regular.