CSCI 301 L32 Notes

Lecture 32 - Notes

Goals

Announcements

Classifying Languages

At this point, we have encountered two major classes of languages: regular and context-free. We’ve stated, but not proved, that all regular languages are context-free. We’ve seen a few examples of languages that are context-free but not regular, such as \(a^nb^n\).

But how, in general, can we prove that a language is not in a given class? If we can find a DFA it’s regular, but how can we prove that no DFA can possibly exist? The answer to this question is called the Pumping Lemma.

The Pumping Lemma for Regular Languages

The intuition is simply that, for a DFA to exist for an infinite language, there has to be some kind of repetitive structure that allows the DFA to recognize arbitrarily long strings using only a finite number of states.

Consider the following NFA: what can we say for sure about a string accepted by this machine if the string’s length is greater than 2?

Notice that the only way for an accepted string to be longer than 2 characters is to spend some time in State 1, repeating \(a\)’s. In other words, any string longer than 2 characters must have the form \((\epsilon \cup b)a^*(\epsilon\cup a)\). Considering the three parts of this regular expression, the key observation is that the middle part must be something with a star, and such a middle part must always exist for any FA that accepts an infinite language.

With only finitely many states, there must be some string length beyond which any accepted string must take one or more trips through a path that begins and ends at the same state.

The pumping lemma formalizes this as follows:

Lemma (The Pumping Lemma for Regular Languages): Let \(A\) be a regular language. Then there exists an integer \(p \ge 1\), called the pumping length, such that the following holds: Every string \(s\) in \(A\) with \(|s| \ge p\) can be written as \(s = xyz\), where

1. \(y \ne \epsilon\) (that is, \(|y| \ge 1\)),
2. \(|xy| \le p\), and
3. \(xy^iz \in A\) for all \(i \ge 0\).

Here’s a picture that summarizes the pumping lemma:

[Figure: pumping]

Proof: Suppose \(A\) is regular; then there exists a DFA \(M = (Q, \Sigma, \delta, q, F)\) that accepts it. Let \(p\) be the number of states in \(Q\). Let \(s = s_1 s_2 s_3 \ldots s_n\) be a string in \(A\) with \(n \ge p\), and let \(r_1, r_2, r_3, \ldots, r_{n+1}\) be the states visited while processing \(s\), where \(r_1 = q\) and the machine switches to state \(r_{i+1}\) after processing character \(s_i\).

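Since the number of states in \(M\) is \(p\) and the number of states visited while processing \(s\) is \(n+1 \ge p+1\), the states \(r_i\) cannot all be distinct; in other words, some state is visited twice. In fact, a state must be repeated by the time state \(r_{p+1}\) has been visited, because at this point \(|Q|+1\) states have been visited. Let \(j\) and \(\ell\) be the indices in the state transition list of a repeated state (i.e., \(r_j = r_\ell\)), with \(j < \ell\). Observe that \(1 \le j < \ell \le p+1\). Now we can define:

\(x = s_1 \ldots s_{j-1}\), \(y = s_j \ldots s_{\ell-1}\), \(z = s_\ell \ldots s_n\)

This satisfies the properties given in the pumping lemma:

1. \(y \ne \epsilon\), because \(j < \ell\) gives \(|y| = \ell - j \ge 1\).
2. \(|xy| \le p\), because \(|xy| = \ell - 1\) and \(\ell \le p+1\).
3. \(xy^iz \in A\) for all \(i \ge 0\): reading \(x\) takes \(M\) from \(r_1\) to \(r_j\), each copy of \(y\) takes \(M\) from \(r_j\) back to \(r_\ell = r_j\), and reading \(z\) takes \(M\) from \(r_\ell\) to \(r_{n+1}\), which is an accepting state because \(s \in A\).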
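The construction in the proof can be sketched in Python: run the DFA, remember when each state was first visited, and cut the string at the first repeat. This is a minimal illustration, not part of the lecture. The function names and the example DFA (even number of \(a\)'s) are assumptions made up for this sketch, and the code counts "symbols read so far" from 0 rather than using the 1-based indices \(j\) and \(\ell\) of the proof.

```python
def pump_split(delta, start, s):
    """Split s into (x, y, z) at the first repeated state in the DFA's run.

    delta maps (state, symbol) -> state. Returns None if no state repeats,
    which can only happen when len(s) is less than the number of states.
    """
    first_seen = {start: 0}  # state -> number of symbols read at first visit
    state = start
    for k, symbol in enumerate(s, start=1):
        state = delta[(state, symbol)]
        if state in first_seen:
            j = first_seen[state]          # same state after j and after k symbols
            return s[:j], s[j:k], s[k:]    # x, y (the loop), z
        first_seen[state] = k
    return None


def accepts(delta, start, accepting, s):
    """Run the DFA on s and report whether it ends in an accepting state."""
    state = start
    for symbol in s:
        state = delta[(state, symbol)]
    return state in accepting


# Hypothetical DFA (not from the notes): even number of a's over {a, b}.
delta = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}
x, y, z = pump_split(delta, "even", "abab")
pumped = [x + y * i + z for i in range(4)]  # x z, x y z, x y y z, x y y y z
```

Because the cut happens at the first repeated state, \(|xy|\) never exceeds the number of states, and every string in `pumped` follows the same path through the machine apart from extra trips around the loop.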

Proving Languages are Not Regular

The pumping lemma states a property that every regular language has. To prove that a language is not regular, we use proof by contradiction: suppose the language is regular, so the lemma applies to it, then exhibit a string for which the lemma’s conclusion fails.

Example: Prove that \(A = \{0^n 1^n : n \ge 0\}\) is not regular.

Proof: We will prove this by contradiction. Assume that \(A\) is regular, and let \(p \ge 1\) be the pumping length given by the pumping lemma. Consider the string \(s = 0^p 1^p\). The conditions of the pumping lemma are satisfied: \(s \in A\) and \(|s| = 2p \ge p\). Thus by the pumping lemma, \(s\) can be written as \(s = xyz\) where \(y \ne \epsilon\), \(|xy| \le p\), and \(xy^iz \in A\) for any \(i \ge 0\).

Because \(s = 0^p 1^p\) and \(|xy| \le p\), notice that \(xy\) must be made up of all zeros, and \(y\) is not empty so it has at least one zero. If we let \(|y| = k \ge 1\), then \(xz = 0^{p-k}1^p\), which has fewer zeros than ones and so is not in \(A\). However, taking \(i = 0\) in the pumping lemma gives \(xy^0z = xz \in A\), which is a contradiction, so \(A\) cannot be regular.
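For small values of \(p\), the argument can be sanity-checked by brute force: try every split \(s = xyz\) allowed by the lemma and confirm that pumping breaks each one. The sketch below is illustrative only and its function names are made up here. It tests a finite range of \(i\), so finding no surviving split is conclusive, while a surviving split only shows that this particular string and range did not produce a contradiction.

```python
def in_A(w):
    """Membership test for A = {0^n 1^n : n >= 0}."""
    n = len(w) // 2
    return w == "0" * n + "1" * n


def find_surviving_split(s, p, in_lang, max_i=3):
    """Search all splits s = xyz with |y| >= 1 and |xy| <= p.

    Returns the first (x, y, z) for which x y^i z stays in the language
    for every i in 0..max_i, or None if pumping breaks every split.
    """
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):  # y = s[x_len:xy_len] is non-empty
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if all(in_lang(x + y * i + z) for i in range(max_i + 1)):
                return (x, y, z)
    return None


# For each small p, no split of 0^p 1^p survives pumping.
results = {p: find_surviving_split("0" * p + "1" * p, p, in_A) for p in range(1, 8)}
```

Every entry of `results` is `None`, matching the proof: any allowed \(y\) consists only of zeros, so already \(i = 0\) produces a string outside \(A\).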

Strategy Note

In general, the “trick” to pumping lemma proofs is to pick the right string \(s\) given the pumping length \(p\). In the above proof, we picked \(0^p1^p\) because the condition \(|xy| \le p\) then forces \(xy\) to be all zeros, and from there substituting \(y^i\) for \(y\) can only add or remove zeros.

(End of what we covered in L32)

The Pumping Lemma for Context-Free Languages

There is an analogous lemma that can be used to show that a language is not context-free. The intuition is similar, except instead of “looping” journeys through a state machine, we observe that subtrees of the parse tree must be repeated. We won’t cover this in much detail, but here’s the lemma:

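Lemma (The Pumping Lemma for Context-Free Languages): Let \(L\) be a context-free language. Then there exists an integer \(p \ge 1\), called the pumping length, such that every string \(s\) in \(L\) with \(|s| \ge p\) can be written as \(s = uvxyz\), where

1. \(|vy| \ge 1\) (that is, \(v\) and \(y\) are not both empty),
2. \(|vxy| \le p\), and
3. \(uv^ixy^iz \in L\) for all \(i \ge 0\).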

This can be used to prove, for example, that \(A = \{a^nb^nc^n : n \ge 0\}\) is not context-free.

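Proof Sketch: Assume \(A\) is context-free with pumping length \(p\), and consider the string \(s = a^p b^p c^p\). The argument looks quite similar to the proof above that \(0^n1^n\) is not regular, except that the lemma does not tell us where \(vxy\) sits in \(s\), since \(u\) and/or \(z\) could be empty. Recall that the lemma guarantees \(|vy| \ge 1\) and \(|vxy| \le p\). We consider three cases for the location of \(vxy\):

1. \(vxy\) contains no \(c\). Then \(uxz\) (the case \(i = 0\)) still has \(p\) \(c\)’s but fewer than \(2p\) \(a\)’s and \(b\)’s combined, so \(uxz \notin A\).
2. \(vxy\) contains no \(a\). Symmetrically, \(uxz\) still has \(p\) \(a\)’s but fewer than \(2p\) \(b\)’s and \(c\)’s combined, so \(uxz \notin A\).
3. \(vxy\) contains at least one \(a\) and at least one \(c\). Then it also contains all \(p\) \(b\)’s, so \(|vxy| \ge p+2\), contradicting \(|vxy| \le p\).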

Thus there’s no way to break \(s\) into \(uvxyz\) that satisfies the CF pumping lemma criteria, so the language cannot be context-free.
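As with the regular case, the claim can be checked by brute force for small \(p\). The sketch below enumerates every decomposition \(s = uvxyz\) with \(|vy| \ge 1\) and \(|vxy| \le p\) and tests a few values of \(i\). It is an illustrative sanity check under assumed function names, not a proof, and it only covers the finitely many values of \(p\) and \(i\) that it tries.

```python
def in_abc(w):
    """Membership test for {a^n b^n c^n : n >= 0}."""
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n


def find_surviving_split_cf(s, p, in_lang, max_i=2):
    """Search all splits s = uvxyz with |vy| >= 1 and |vxy| <= p.

    Returns the first (u, v, x, y, z) for which u v^i x y^i z stays in the
    language for every i in 0..max_i, or None if pumping breaks every split.
    """
    n = len(s)
    for start in range(n):                                   # where vxy begins
        for end in range(start + 1, min(start + p, n) + 1):  # where vxy ends
            for v_end in range(start, end + 1):
                for y_start in range(v_end, end + 1):
                    v, x, y = s[start:v_end], s[v_end:y_start], s[y_start:end]
                    if not v and not y:
                        continue  # the lemma requires |vy| >= 1
                    u, z = s[:start], s[end:]
                    if all(in_lang(u + v * i + x + y * i + z)
                           for i in range(max_i + 1)):
                        return (u, v, x, y, z)
    return None


# For each small p, no split of a^p b^p c^p survives pumping.
cf_results = {p: find_surviving_split_cf("a" * p + "b" * p + "c" * p, p, in_abc)
              for p in range(1, 5)}
```

Every entry of `cf_results` is `None`. By contrast, running the same search on a string from a context-free language such as \(a^nb^n\) does find a split that survives the tested values of \(i\), with \(v\) pumping \(a\)'s and \(y\) pumping \(b\)'s in step.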