Today we’re going to touch on the equivalence of Pushdown Automata and Context-Free languages. In particular, it can be demonstrated that:
Theorem: \(A\) is a context-free language if and only if there exists a (nondeterministic) Pushdown Automaton \(M\) such that \(L(M) = A\).
Recall that our definition of a context-free language is based on context-free grammars. So to prove this in full, we would need to show that:
1. given any context-free grammar \(G\), we can construct a PDA \(M\) such that \(L(M) = L(G)\), and
2. given any PDA \(M\), we can construct a context-free grammar \(G\) such that \(L(G) = L(M)\).
Today we’re going to address #1 but not #2; to make the proof of #1 a little easier on ourselves, we will take a detour and learn how to convert grammars to a standard representation called Chomsky Normal Form.
First, let’s set up the proof we’d like to complete:
Let \(G = (V, \Sigma, R, \$)\) be a context-free grammar with start variable \(\$\). To show that there exists a PDA \(M\) that accepts \(L(G)\), we will construct such a machine.
Idea: The general idea here is that the stack alphabet will be the set of nonterminals. The stack will start with only the start symbol \(\$\), and we will have a transition function rule for each grammar rule. In each computation step, a nonterminal is popped from the stack and a substitution rule is applied.
The problem here is that we don’t want to put terminals on the stack, so a rule like \(A \rightarrow aBc\) presents a problem. We will fix this by converting our grammar into a form that has no such productions, without changing its language.
As we saw when modifying grammars to remove left recursion and common prefixes, it’s often possible to rewrite grammar rules in specific ways that don’t change the language of the grammar. One important such conversion is to Chomsky Normal Form.
Definition: A grammar in Chomsky Normal Form has only rules of the following forms:
1. \(A \rightarrow BC\), where \(A, B, C\) are variables and neither \(B\) nor \(C\) is the start variable,
2. \(A \rightarrow a\), where \(a\) is a terminal,
3. \(S \rightarrow \epsilon\), where \(S\) is the start variable.
In other words, a nonterminal can only be replaced by exactly two other nonterminals or a single terminal, and the only allowed \(\epsilon\)-rule is from the start symbol.
The steps to complete this conversion are tedious, but possible:
1. Create a new start variable that does not appear on the right-hand side of any rule.
2. Eliminate all \(\epsilon\)-rules (keeping one for the new start variable if the language contains \(\epsilon\)).
3. Eliminate all unit rules of the form \(A \rightarrow B\).
4. Split rules whose right-hand sides have more than two symbols.
5. Replace terminals that appear alongside other symbols with new variables.
Let \(G\) be a grammar with start symbol \(S\) and the following rules:
\(S \rightarrow ASA \mid aB\)
\(A \rightarrow B \mid S\)
\(B \rightarrow b \mid \epsilon\)
Let’s go through the steps above:
Create a new start variable \(S_0\), so that the start variable never appears on the right-hand side of any rule:
\(S_0 \rightarrow S\)
\(S \rightarrow ASA \mid aB\)
\(A \rightarrow B \mid S\)
\(B \rightarrow b \mid \epsilon\)
Eliminate all \(\epsilon\)-rules. We will do this by substituting \(\epsilon\) into the right-hand side of any rule that contains a nullable variable. Starting with the rule \(B \rightarrow \epsilon\), we add \(\epsilon\) as a possible substitution for \(B\) anywhere else it appears in a rule:
\(S_0 \rightarrow S\)
\(S \rightarrow ASA \mid aB \mid a\)
\(A \rightarrow B \mid S \mid \epsilon\)
\(B \rightarrow b\)
Next, we’ll eliminate the \(A \rightarrow \epsilon\) rule we just created in the same manner:
\(S_0 \rightarrow S\)
\(S \rightarrow ASA \mid SA \mid AS \mid aB \mid a\)
\(A \rightarrow B \mid S\)
\(B \rightarrow b\)
Notice that we skipped the case where both \(A\)’s in \(S \rightarrow ASA\) are replaced with \(\epsilon\), because it would have created the rule \(S \rightarrow S\), which is not useful.
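If you want to see this step done mechanically, here is a minimal Python sketch (mine, not from the notes or textbook) of \(\epsilon\)-rule elimination. It assumes a grammar is stored as a dict mapping each variable name to a set of right-hand sides, each a tuple of symbols, with the empty tuple standing for \(\epsilon\); the names `eliminate_epsilon` and `drop_nullable` are just illustrative. Unlike the one-rule-at-a-time substitution above, it first computes every nullable variable and then rewrites all rules in one pass, but on this example it reproduces the grammar we just derived by hand.

```python
def eliminate_epsilon(rules, start):
    """Remove A -> epsilon rules; `rules` maps each variable to a set of
    right-hand sides, each a tuple of symbols, with () meaning epsilon."""
    # 1. Find every nullable variable (one that can derive epsilon).
    nullable, changed = set(), True
    while changed:
        changed = False
        for var, rhss in rules.items():
            if var not in nullable and any(all(s in nullable for s in rhs) for rhs in rhss):
                nullable.add(var)
                changed = True
    # 2. For each rule, add every variant obtained by dropping nullable symbols.
    new_rules = {var: set() for var in rules}
    for var, rhss in rules.items():
        for rhs in rhss:
            for variant in drop_nullable(rhs, nullable):
                if variant and variant != (var,):   # skip epsilon and useless A -> A
                    new_rules[var].add(variant)
    if start in nullable:
        new_rules[start].add(())                    # only the start variable keeps epsilon
    return new_rules

def drop_nullable(rhs, nullable):
    """Yield every way of keeping or dropping each nullable symbol in rhs."""
    if not rhs:
        yield ()
        return
    head, tail = rhs[0], rhs[1:]
    for rest in drop_nullable(tail, nullable):
        yield (head,) + rest
        if head in nullable:
            yield rest
```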
Now we’ll eliminate unit rules, starting with \(A \rightarrow B\). We’ll do this by substituting what \(B\) can derive (in this case, simply \(b\)) for the right-hand side of the \(A \rightarrow B\) rule:
\(S_0 \rightarrow S\)
\(S \rightarrow ASA \mid SA \mid AS \mid aB \mid a\)
\(A \rightarrow S \mid b\)
\(B \rightarrow b\)
Next we’ll eliminate \(A \rightarrow S\):
\(S_0 \rightarrow S\)
\(S \rightarrow ASA \mid SA \mid AS \mid aB \mid a\)
\(A \rightarrow ASA \mid SA \mid AS \mid aB \mid a \mid b\)
\(B \rightarrow b\)
And finally eliminate \(S_0 \rightarrow S\):
\(S_0 \rightarrow ASA \mid SA \mid AS \mid aB \mid a\)
\(S \rightarrow ASA \mid SA \mid AS \mid aB \mid a\)
\(A \rightarrow ASA \mid SA \mid AS \mid aB \mid a \mid b\)
\(B \rightarrow b\)
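Unit-rule elimination can be sketched the same way, under the same grammar representation as before (again, an illustrative helper rather than an official algorithm). Instead of substituting one unit rule at a time as we did above, it collects, for each variable, everything reachable through chains of unit rules and copies over their non-unit rules; on this example the result is the same grammar we just arrived at.

```python
def eliminate_unit_rules(rules):
    """Replace every unit rule A -> B by copies of the non-unit rules of B
    (and of anything reachable from B through further unit rules)."""
    variables = set(rules)
    new_rules = {var: set() for var in rules}
    for var in rules:
        # Collect var and everything reachable from it via chains of unit rules.
        reachable, frontier = {var}, [var]
        while frontier:
            current = frontier.pop()
            for rhs in rules[current]:
                if len(rhs) == 1 and rhs[0] in variables and rhs[0] not in reachable:
                    reachable.add(rhs[0])
                    frontier.append(rhs[0])
        # Copy every non-unit rule of every reachable variable to var.
        for other in reachable:
            for rhs in rules[other]:
                if not (len(rhs) == 1 and rhs[0] in variables):
                    new_rules[var].add(rhs)
    return new_rules
```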
Now we need to eliminate all the rules with more than two symbols on the right-hand side. We do this by introducing a new variable, which we’ll call \(M\), as follows:
\(S_0 \rightarrow AM \mid SA \mid AS \mid aB \mid a\)
\(S \rightarrow AM \mid SA \mid AS \mid aB \mid a\)
\(A \rightarrow AM \mid SA \mid AS \mid aB \mid a \mid b\)
\(B \rightarrow b\)
\(M \rightarrow SA\)
Finally, eliminate rules whose right-hand side has two symbols that are not both variables. We can do this by introducing another new variable, \(N\):
\(S_0 \rightarrow AM \mid SA \mid AS \mid NB \mid a\)
\(S \rightarrow AM \mid SA \mid AS \mid NB \mid a\)
\(A \rightarrow AM \mid SA \mid AS \mid NB \mid a \mid b\)
\(B \rightarrow b\)
\(M \rightarrow SA\)
\(N \rightarrow a\)
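The last two steps, splitting long right-hand sides and replacing terminals that sit alongside other symbols (the jobs done by \(M\) and \(N\) above), can also be sketched together. The fresh-name scheme below is a simplification I’m assuming just for illustration; up to the names chosen for the helper variables, running it on the grammar from the previous step produces the grammar above.

```python
def split_long_and_mixed_rules(rules):
    """Split right-hand sides longer than two symbols, and replace terminals
    that appear next to other symbols with fresh variables."""
    new_rules = {var: set() for var in rules}
    fresh_names = iter("MNOPQRTUVWXYZ")   # assumed not to clash with existing variables
    memo = {}                             # right-hand side -> helper variable created for it

    def helper_for(rhs):
        """Return a variable whose only rule is `rhs`, creating it if needed."""
        if rhs not in memo:
            memo[rhs] = next(fresh_names)
            new_rules[memo[rhs]] = {rhs}
        return memo[rhs]

    for var, rhss in rules.items():
        for rhs in rhss:
            if len(rhs) >= 2:
                # Replace terminals, e.g. aB becomes NB with N -> a.
                rhs = tuple(s if s in rules else helper_for((s,)) for s in rhs)
                # Split A -> B1 B2 ... Bk (k > 2) into two-symbol rules,
                # e.g. ASA becomes AM with M -> SA.
                while len(rhs) > 2:
                    rhs = rhs[:-2] + (helper_for(rhs[-2:]),)
            new_rules[var].add(rhs)
    return new_rules
```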
…and we have finally arrived at a version of the grammar that is in CNF!
Do Exercises Part A
Now that we know any CFG can be rewritten in CNF, we have the tools needed to prove one direction of the CFG/PDA equivalence; we will show that given a CFG, one can construct a PDA that accepts its language.
Let’s look at a derivation of the string \(bba\) using the above CNF grammar: \[ \begin{align*} S_0 &\Rightarrow AS \\ &\Rightarrow bS \\ &\Rightarrow bAS \\ &\Rightarrow bbS \\ &\Rightarrow bba \\ \end{align*} \] A couple non-coincidental things to notice about this derivation:
1. It is a leftmost derivation: at every step, the leftmost variable is the one replaced, and it is replaced either by a single terminal or by two variables.
2. Every sentential form consists of a prefix of \(bba\) followed by a string of variables; once a terminal is produced it never changes, and the trailing string of variables behaves exactly like a stack whose top is the leftmost variable.
If you convince yourself that these things are both true for any CNF grammar derivation, regardless of the string, then we have the pieces to build a straightforward PDA that can perform (only) the derivations for a given CNF grammar.
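To see both observations in action, here is a tiny Python sketch (purely illustrative) that replays the derivation of \(bba\) above, keeping the already-matched terminals in a string and the remaining variables on a list used as a stack; the rule applied at each step is hard-coded from the derivation.

```python
# Replay the leftmost derivation of "bba" shown above, treating the string of
# variables in each sentential form as a stack (leftmost variable on top).
steps = [("S0", "AS"), ("A", "b"), ("S", "AS"), ("A", "b"), ("S", "a")]

matched = ""          # terminals produced so far (a prefix of "bba")
stack = ["S0"]        # variables still to be expanded, top at index 0

print(matched, stack)
for var, rhs in steps:
    assert stack[0] == var        # only the leftmost variable is ever rewritten
    stack.pop(0)
    if rhs.islower():             # A -> b : emit the terminal
        matched += rhs
    else:                         # A -> BC : push the two variables, B on top
        stack[:0] = list(rhs)
    print(matched, stack)
```

Each printed line is exactly one sentential form of the derivation, read as the matched prefix followed by the stack from top to bottom.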
Before we get started, I’m going to use a shorthand from the book to represent rules in \(\delta\), the PDA’s transition function. Recall that \(\delta\) takes as input the current state, the tape symbol under the head, and the symbol on top of the stack, and produces (possibly several choices of) a next state, a tape-head movement (\(R\) to move right, \(N\) to stay put), and a string to push onto the stack in place of the popped symbol.
We can compactly write a rule as follows: \[ qaS \rightarrow rRAS \]
to indicate that when the machine is in state \(q\), sees tape symbol \(a\), and pops stack symbol \(S\), it will then move to state \(r\), move the tape head to the right \((R)\), and push \(AS\) onto the stack.
One more notational convenience: it will be useful below to write a rule that applies regardless of the current tape symbol. If the alphabet is \(\{a, b\}\), then we will write the following: \[ q*S \rightarrow qRS \] to mean the following two rules: \[ \begin{align*} qaS \rightarrow qRS \\ qbS \rightarrow qRS \end{align*} \] That is, given the state \(q\) and stack symbol \(S\), the output is the same for all input tape symbols the machine might see.
Here’s the plan:
Let \(Q = \{q\}\); we will have only one state, which is the start state
Let \(\Sigma_{pda} = \Sigma_{g}\), that is, the tape alphabet is the set of terminals in the grammar.
Let \(\Gamma = V\), that is, the stack alphabet is the set of nonterminals in the grammar.
We will define the transition function in terms of the grammar rules as follows:
For each grammar rule of the form \(A \rightarrow BC\), add transition rules of the form:
\(q*A \rightarrow qNBC\)
For each grammar rule of the form \(A \rightarrow b\), add a transition rule:
\(qbA \rightarrow qR\epsilon\) (note that this rule applies only when the tape symbol matches the terminal \(b\); this is where input symbols actually get consumed)
Recall that in CNF only the start variable \(S\) can have an \(\epsilon\)-rule; if it does, add the following rule, where \(\square\) is the blank tape symbol: \(q\square S \rightarrow qN\epsilon\)
(the written notes and textbook Section 3.7 have further details)
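Finally, to tie the plan together, here is a Python sketch (my own, not the textbook’s construction verbatim) that applies the two kinds of transition rules directly to the CNF grammar from the example and searches the PDA’s configurations \((\text{head position}, \text{stack})\). I’m assuming acceptance means the whole input has been read and the stack is empty, and the \(S_0 \rightarrow \epsilon\) case is omitted since this grammar has no such rule.

```python
from collections import deque

# The CNF grammar derived in the example above: each variable maps to a set
# of right-hand sides, written as tuples of symbols.
CNF_RULES = {
    "S0": {("A", "M"), ("S", "A"), ("A", "S"), ("N", "B"), ("a",)},
    "S":  {("A", "M"), ("S", "A"), ("A", "S"), ("N", "B"), ("a",)},
    "A":  {("A", "M"), ("S", "A"), ("A", "S"), ("N", "B"), ("a",), ("b",)},
    "B":  {("b",)},
    "M":  {("S", "A")},
    "N":  {("a",)},
}

def pda_accepts(w, rules, start="S0"):
    """Simulate the one-state PDA built from a CNF grammar.

    A configuration is (i, stack): i is the tape-head position and stack is a
    tuple whose top is index 0.  Accept if some branch of the nondeterministic
    computation reads all of w and empties the stack.  (The S0 -> epsilon
    rule, i.e. accepting the empty string, is not handled here.)
    """
    start_config = (0, (start,))
    seen, work = {start_config}, deque([start_config])
    while work:
        i, stack = work.popleft()
        if not stack:
            if i == len(w):
                return True            # whole input read, stack empty: accept
            continue                   # stack emptied too early: dead branch
        top, rest = stack[0], stack[1:]
        for rhs in rules.get(top, ()):
            if len(rhs) == 2:
                # q*A -> qNBC : pop A, push BC, leave the head where it is.
                new = (i, rhs + rest)
                # Prune: with no nullable variables, every stack symbol still
                # has to produce at least one unread input symbol.
                if len(new[1]) > len(w) - i:
                    continue
            else:
                # qbA -> qR(epsilon) : only if the tape symbol matches, pop A
                # and move the head right.
                if i >= len(w) or w[i] != rhs[0]:
                    continue
                new = (i + 1, rest)
            if new not in seen:
                seen.add(new)
                work.append(new)
    return False

print(pda_accepts("bba", CNF_RULES))   # True: we derived bba above
print(pda_accepts("bbb", CNF_RULES))   # False: every string in L(G) contains an a
```

The pruning step is what keeps the nondeterministic search finite: in CNF with no nullable variables, every symbol left on the stack must still account for at least one unread input symbol.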