CSCI 301 L33 Notes

Lecture 33 - Notes

Goals

Announcements

(Nondeterministic) PDAs Accept Context-Free Languages

Today we’re going to touch on the equivalence of pushdown automata and context-free grammars: the two formalisms describe exactly the same class of languages. In particular, it can be shown that:

Theorem: \(A\) is a context-free language if and only if there exists a (nondeterministic) Pushdown Automaton \(M\) such that \(L(M) = A\).

Recall that our definition of a context-free language is based on context-free grammars. So to prove this in full, we would need to show that:

  1. For any context-free grammar, we can construct a PDA accepting its language
  2. For any PDA, we can construct a context-free grammar that generates its language

Today we’re going to address #1 but not #2; to make the proof of #1 a little easier on ourselves, we will take a detour and learn how to convert grammars to a standard representation called Chomsky Normal Form.

Setup

First, let’s set up the proof we’d like to complete:

Let \(G = (V, \Sigma, R, \$)\) be a context-free grammar with start variable \(\$\). To show that there exists a PDA \(M\) that accepts \(L(G)\), we will construct such a machine.

Idea: The stack alphabet will be the set of nonterminals. The stack will start with only the start symbol \(\$\), and the transition function will have one transition for each grammar rule. In each computation step, a nonterminal is popped from the stack and one of its substitution rules is applied.

The problem is that we don’t want to put terminals on the stack, so a rule like \(A \rightarrow aBc\) gets in the way. We will fix this by converting the grammar into an equivalent form that has no such productions, without changing its language.

Chomsky Normal Form

As we saw when modifying grammars to remove left recursion and common prefixes, it’s often possible to rewrite grammar rules in specific ways that don’t change the language of the grammar. One important such conversion is to Chomsky Normal Form.

Definition: A grammar in Chomsky Normal Form has only rules of the following form:

  1. \(A \rightarrow BC\)
  2. \(A \rightarrow a\)
  3. \(S \rightarrow \epsilon\)

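In other words, a nonterminal can only be replaced by exactly two nonterminals or by a single terminal, and the only allowed \(\epsilon\)-rule is from the start symbol \(S\). The usual definition also requires that \(B\) and \(C\) are not the start symbol, which is why the first conversion step removes the start symbol from all right-hand sides.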
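
The definition can be checked mechanically. The following is a minimal sketch, not part of the lecture: the dictionary representation of a grammar and the function name `is_cnf` are assumptions made for illustration, and every nonterminal is assumed to appear as a key. It also enforces the common textbook requirement that the start symbol never appears on a right-hand side.

```python
from typing import Dict, List, Tuple

# Assumed representation: each nonterminal maps to a list of right-hand sides.
# A right-hand side is a tuple of symbols; the empty tuple stands for epsilon.
Grammar = Dict[str, List[Tuple[str, ...]]]


def is_cnf(rules: Grammar, start: str) -> bool:
    """Return True if every rule has one of the three allowed shapes."""
    nonterminals = set(rules)
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 0:
                # Shape 3: only the start symbol may produce epsilon.
                if head != start:
                    return False
            elif len(body) == 1:
                # Shape 2: a single terminal, so unit rules A -> B are rejected.
                if body[0] in nonterminals:
                    return False
            elif len(body) == 2:
                # Shape 1: exactly two nonterminals, neither of them the start symbol.
                if any(sym not in nonterminals or sym == start for sym in body):
                    return False
            else:
                # Three or more symbols on the right-hand side.
                return False
    return True


good = {"S": [("A", "B"), ()], "A": [("a",)], "B": [("b",)]}
bad = {"S": [("a", "B", "c")], "B": [("b",)]}  # the problematic A -> aBc shape
print(is_cnf(good, "S"), is_cnf(bad, "S"))
```

The second grammar fails because its only rule for `S` has three symbols and mixes terminals with a nonterminal.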

The steps to complete this conversion are tedious, but possible:

  1. Eliminate the start variable from the right-hand side of all rules (add a new start variable \(S_0\) with the rule \(S_0 \rightarrow S\))
  2. Eliminate all \(\epsilon\)-rules (\(A \rightarrow \epsilon\), where \(A\) is not the start variable)
  3. Eliminate unit-rules (\(A \rightarrow B\))
  4. Eliminate rules with more than two symbols on the right-hand side (\(A \rightarrow BCD\))
  5. Eliminate rules of the form \(A \rightarrow u_1 u_2\) where \(u_1\) and \(u_2\) are not both nonterminals

Example

Let \(G\) be a grammar with start symbol \(S\) and the following rules:

\(S \rightarrow ASA \mid aB\)

\(A \rightarrow B \mid S\)

\(B \rightarrow b \mid \epsilon\)

(see whiteboard for the conversion)
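
The whiteboard work is not reproduced in these notes. What follows is one possible reconstruction that applies the five steps in order and arrives at the grammar given next; the intermediate grammars are a sketch and may differ in order or detail from what was written in class.

  1. New start variable: \(S_0 \rightarrow S\); \(S \rightarrow ASA \mid aB\); \(A \rightarrow B \mid S\); \(B \rightarrow b \mid \epsilon\)
  2. Remove \(B \rightarrow \epsilon\): \(S \rightarrow ASA \mid aB \mid a\); \(A \rightarrow B \mid S \mid \epsilon\); \(B \rightarrow b\). Then remove \(A \rightarrow \epsilon\): \(S \rightarrow ASA \mid SA \mid AS \mid S \mid aB \mid a\); \(A \rightarrow B \mid S\); \(B \rightarrow b\)
  3. Remove unit-rules \(S \rightarrow S\), \(S_0 \rightarrow S\), \(A \rightarrow B\), \(A \rightarrow S\): \(S_0 \rightarrow ASA \mid SA \mid AS \mid aB \mid a\); \(S \rightarrow ASA \mid SA \mid AS \mid aB \mid a\); \(A \rightarrow ASA \mid SA \mid AS \mid aB \mid a \mid b\); \(B \rightarrow b\)
  4. Shorten \(ASA\): add \(M \rightarrow SA\) and replace every \(ASA\) with \(AM\)
  5. Replace the terminal in \(aB\): add \(N \rightarrow a\) and replace every \(aB\) with \(NB\)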

Equivalent grammar in Chomsky Normal Form:

\(S_0 \rightarrow AM \mid AS \mid SA \mid NB \mid a\)

\(S \rightarrow AM \mid AS \mid SA \mid NB \mid a\)

\(A \rightarrow AM \mid AS \mid SA \mid NB \mid a \mid b\)

\(B \rightarrow b\)

\(M \rightarrow SA\)

\(N \rightarrow a\)
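
With the grammar in this form, the idea from the Setup section can be tried out directly. The following is a minimal sketch, assuming a single-state PDA whose stack holds nonterminals: a rule \(A \rightarrow BC\) pops \(A\) and pushes \(C\) then \(B\) without reading input, and a rule \(A \rightarrow a\) pops \(A\) while reading \(a\). The names `CNF_RULES` and `pda_accepts` are hypothetical, and nondeterminism is simulated by searching over all configurations; this is not necessarily the exact construction from the whiteboard or ToC 3.7.

```python
from typing import Dict, List, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]

# The Chomsky Normal Form grammar from the example above.
CNF_RULES: Grammar = {
    "S0": [("A", "M"), ("A", "S"), ("S", "A"), ("N", "B"), ("a",)],
    "S": [("A", "M"), ("A", "S"), ("S", "A"), ("N", "B"), ("a",)],
    "A": [("A", "M"), ("A", "S"), ("S", "A"), ("N", "B"), ("a",), ("b",)],
    "B": [("b",)],
    "M": [("S", "A")],
    "N": [("a",)],
}


def pda_accepts(rules: Grammar, start: str, word: str) -> bool:
    """Explore every nondeterministic run of the single-state PDA."""
    if word == "":
        # Only the rule start -> epsilon can accept the empty string.
        return () in rules.get(start, [])
    # A configuration is (input position, stack); the stack top is the last item.
    todo = [(0, (start,))]
    seen = set(todo)
    while todo:
        pos, stack = todo.pop()
        if not stack:
            if pos == len(word):
                return True  # input consumed and stack empty: accept
            continue
        top, rest = stack[-1], stack[:-1]
        for body in rules.get(top, []):
            if len(body) == 2:
                # Rule A -> BC: pop A, push C then B, read no input.
                nxt = (pos, rest + (body[1], body[0]))
            elif len(body) == 1 and pos < len(word) and body[0] == word[pos]:
                # Rule A -> a: pop A and read the matching input symbol.
                nxt = (pos + 1, rest)
            else:
                continue
            # Each stacked nonterminal must still produce at least one terminal,
            # so a stack taller than the remaining input can never succeed.
            if len(nxt[1]) <= len(word) - nxt[0] and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return False


for w in ["a", "ab", "ba", "b", "bb"]:
    print(w, pda_accepts(CNF_RULES, "S0", w))
```

The pruning step relies on the grammar being in Chomsky Normal Form: since no nonterminal other than the start symbol can vanish, the stack height never needs to exceed the number of unread input symbols, which keeps the search finite.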

(out of time for typed notes! see the whiteboard notes and ToC 3.7)