Topics
Self-embedding grammars
Path length and sentence length for CNF grammars
Occurrences of live productions in paths
Self-embedding revisited
Pumping revisited
The uvwxy theorem (Pumping Lemma for
CFG)
anbncn is not a
CFL
Here’s a CNF (Chomsky normal form) grammar:
and the derivation tree for bac:S à BC | a
B à BS | b
C à c

For the purposes of discussion in this section, we will ignore the leaf nodes of our derivation trees.
Under this assumption, notice that the derivation tree for a CNF CFG is a binary tree.

You can see that the depth of the tree is related to the size of the string that can be derived. A tree of depth 2 (shown above) can produce a sentence of length at most 22 = 4. In general, a derivation tree of depth p can produce a sentence of at most 2p characters.
Note that not every derivation tree is a complete tree, for which every path from the root to a leaf is the depth of the tree. For the tree shown just above, every path is of length 2. Referring back to the derivation tree shown at the beginning, we observe paths (not including the leaves) of length 2 and 1.
Let’s suppose that our CNG CFG has p productions of the form A à BC, called live productions (as opposed to the other kind, of the form A à a, called dead productions).
Let’s also suppose that we can derive a string of length greater than
2p. Can all the paths be of
length <= p? No! we have already shown that a derivation
tree of depth p can produce a sentence of at most 2p characters!
So there is at least one path of length > p. How many live productions were used in the course of following that path? Looking at our example, and the highlighted path of length 3:

we see that 3 live productions were used (same as path length).
So for that path of length > p, more than p live productions were used.
If more than p live productions were used in a path (remember, there are only p of them), then some live production A à BC was repeated, and self-embedding of A has occurred! So we have something like that tree shown above, where S à BC was repeated, resulting in the self-embedding of S.
If the production A à BC is repeated along a path, then we have
Notice also (by substitution of A Þ * vAx) we have:
Here is what we have found:
If a grammar G has p live productions and can generate a string z of length > 2p, then z = uvwxy, where v and x are not both empty, and the strings uvnwxny (n = 2, 3, …) are also generated by G.
The pumping lemma is useful in proving that certain languages are not context free.
Let’s assume that anbncn is generated by a CNF CFG, and that the number of live productions is p. We can certainly generate a string of length > 2p. By the pumping lemma, it can be written in the form uvwxy, where uv2wx2y is also in the language, and v and x are not both empty.
Proof part 1: If v is non-empty, it cannot be a mixture of letters, e.g. aabb, because then uv2wx2y = u(aabb)2wx2y = uaabbaabbwx2y and would hence not be in the language. Similarly for x.
Proof part 2: v and x cannot be non-mixed letter strings, because, e.g., if v = aaaa, and x = bbbb, then uv2wx2y would have 4 more a’s and b’s, but no more c’s, and would hence not be in the language.