Our textbook does not cover parsing. I’ve found the following resources helpful:
The key issue in parsing is that we can’t take a “global” view of the input string: we have to process it piece by piece, and in the case of LL(1) parsing, we’re allowed only 1 symbol of lookahead. In the following examples, think about how much of the input string you need to see in order to choose the correct production.
Do Exercises Part A
Consider:
Parse
Seems easy enough! We can look at one character and correctly decide which production to apply.
Consider:
Parse
You need to see the whole input to choose the correct production.
This problem arises because of left recursion. In simple cases such as this, it can be eliminated by converting the grammar to an equivalent one that has no left recursion.
Consider the grammar
We can write equivalent grammar rules to generate the same strings by
introducing a new variable
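For immediate left recursion, the rewrite follows a fixed recipe: the rules A → A α1 | ... | A αm | β1 | ... | βn, where no βj begins with A, become A → β1 A' | ... | βn A' and A' → α1 A' | ... | αm A' | ε. The Python sketch below applies that recipe mechanically. It works under assumptions of its own: the grammar is a dict mapping each variable to a list of right-hand sides, each right-hand side is a list of symbols with [] standing for ε, the new variable is named by appending a prime, and the function name is hypothetical. It does not handle indirect left recursion.

```python
Grammar = dict[str, list[list[str]]]  # assumed representation; [] means epsilon


def eliminate_left_recursion(grammar: Grammar, var: str) -> Grammar:
    """Remove immediate left recursion from the rules for `var`.

    A -> A a1 | ... | A am | b1 | ... | bn   becomes
    A  -> b1 A' | ... | bn A'
    A' -> a1 A' | ... | am A' | epsilon
    """
    # Split rules into "A alpha" (left-recursive) and "beta" (everything else).
    recursive = [rhs[1:] for rhs in grammar[var] if rhs and rhs[0] == var]
    others = [rhs for rhs in grammar[var] if not rhs or rhs[0] != var]
    if not recursive:
        return dict(grammar)  # nothing to eliminate

    new_var = var + "'"  # hypothetical naming scheme
    assert new_var not in grammar, f"{new_var} is already a variable"

    result = dict(grammar)
    result[var] = [beta + [new_var] for beta in others]
    result[new_var] = [alpha + [new_var] for alpha in recursive] + [[]]
    return result


# Hypothetical example: E -> E + n | n
g = {"E": [["E", "+", "n"], ["n"]]}
print(eliminate_left_recursion(g, "E"))
# {'E': [['n', "E'"]], "E'": [['+', 'n', "E'"], []]}
```

For E → E + n | n this yields E → n E' and E' → + n E' | ε, which generates the same strings but lets a parser consume an n before it recurses.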
Do Exercises Part B
If two rules for the same variable share a common prefix, one symbol of lookahead is not enough to decide which rule to apply. For example, consider the following rule representing a conditional statement in some plausible programming language:
or, with more abstract names:
the prefix
We can eliminate common prefixes by introducing new variables; this is called left factoring.
If we think in terms of the conditional statement example above, the
new variable
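Left factoring can also be done mechanically. The Python sketch below performs one round of it for a single variable: it groups that variable's rules by their first symbol, pulls the longest common prefix out of each group of two or more rules, and moves the differing suffixes into a fresh variable. The dict-of-rule-lists representation ([] for ε), the naming scheme S_1, S_2, ... for new variables, the function names, and the example conditional rule are all hypothetical. If a new variable's rules still share a prefix, apply the function again to that variable.

```python
Grammar = dict[str, list[list[str]]]  # assumed representation; [] means epsilon


def common_prefix(rules: list[list[str]]) -> list[str]:
    """Longest sequence of symbols that every rule in `rules` starts with."""
    prefix: list[str] = []
    for column in zip(*rules):  # stops at the end of the shortest rule
        if all(symbol == column[0] for symbol in column):
            prefix.append(column[0])
        else:
            break
    return prefix


def left_factor(grammar: Grammar, var: str) -> Grammar:
    """One round of left factoring for the rules of `var`."""
    groups: dict[str, list[list[str]]] = {}
    epsilon_rules: list[list[str]] = []
    for rhs in grammar[var]:
        if rhs:
            groups.setdefault(rhs[0], []).append(rhs)
        else:
            epsilon_rules.append(rhs)  # an epsilon rule has no prefix to share

    result = dict(grammar)
    new_rules: list[list[str]] = []
    count = 0
    for rules in groups.values():
        if len(rules) == 1:
            new_rules.append(rules[0])  # no other rule shares its first symbol
            continue
        prefix = common_prefix(rules)
        count += 1
        new_var = f"{var}_{count}"  # hypothetical naming scheme; assumed unused
        new_rules.append(prefix + [new_var])
        result[new_var] = [rhs[len(prefix):] for rhs in rules]
    result[var] = new_rules + epsilon_rules
    return result


# Hypothetical conditional: S -> if E then S | if E then S else S | other
g = {"S": [["if", "E", "then", "S"],
           ["if", "E", "then", "S", "else", "S"],
           ["other"]]}
print(left_factor(g, "S"))
# {'S': [['if', 'E', 'then', 'S', 'S_1'], ['other']], 'S_1': [[], ['else', 'S']]}
```

The result is S → if E then S S_1 | other and S_1 → ε | else S: the parser can commit to the shared prefix and postpone the choice until after that prefix has been read.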
Do Exercises Part C
It’s worth noting that left recursion and common prefixes can
hide in grammars by way of indirection. An example of
this can be seen in the following grammar:
Similarly,
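Indirect left recursion can be found mechanically: from each variable, follow the variables that can appear first in its rules, and check whether the search leads back to the starting variable. The Python sketch below does this with a depth-first search. It assumes a dict-of-rule-lists grammar and, to stay short, that no variable derives ε (a variable that derives ε would let the symbol after it appear first as well); the function name and the example grammar are hypothetical.

```python
Grammar = dict[str, list[list[str]]]  # assumed representation


def left_recursive_vars(grammar: Grammar) -> set[str]:
    """Variables A with a derivation A =>+ A ... (direct or indirect).

    Simplifying assumption: no variable derives epsilon, so only the first
    symbol of each rule can begin a leftmost derivation.
    """
    # Edge A -> B when some rule for A starts with the variable B.
    starts = {a: {rhs[0] for rhs in rules if rhs and rhs[0] in grammar}
              for a, rules in grammar.items()}
    result: set[str] = set()
    for a in grammar:
        # Depth-first search from a's leftmost variables, looking for a again.
        seen: set[str] = set()
        stack = list(starts[a])
        while stack:
            b = stack.pop()
            if b == a:
                result.add(a)
                break
            if b not in seen:
                seen.add(b)
                stack.extend(starts[b])
    return result


# Hypothetical example: S -> A a | b,  A -> S c | d   (S => A a => S c a)
g = {"S": [["A", "a"], ["b"]], "A": [["S", "c"], ["d"]]}
print(sorted(left_recursive_vars(g)))  # ['A', 'S']
```

Neither rule in the example starts with its own variable, yet both variables are left-recursive through each other.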
I’d like to be able to tell you that if we can left factor a grammar and eliminate all left recursion, then we can write an LL(1) parser for it, but unfortunately that’s not true. These are necessary, but not sufficient, steps for LL(1) parsing to work. The simplest way to tell whether a grammar can be LL(1) parsed is to go from the factored grammar all the way to the parse table and check whether any entry is ambiguous (that is, holds more than one production). We’ll see how to do this next time; this will build up to Lab 8, where you’ll implement an LL(1) parser for a grammar of the Racket language.
You’re used to seeing arithmetic expressions written in
infix notation, such as 1 + 2 * (4 - 3). Racket uses prefix notation,
such as (+ 1 (* 2 (- 4 3))); prefix notation is sometimes also
known as Polish notation. We are going to write an LL(1) parser for
Reverse Polish notation (RPN), otherwise known as postfix notation.
One advantage of RPN is that it removes the need for operator
precedence and parentheses. For example, the infix expression
1 + 2 * (4 - 3) is written in RPN as
1 2 4 3 - * +
This would be evaluated in the following way, where the first line in each pair uses parentheses to highlight the operation to be computed and the second line shows the expression with the result of that operation substituted.
1 2 (4 3 -) * +
1 2 1 * +
1 (2 1 *) +
1 2 +
(1 2 +)
3
The nifty thing here is that we can unambiguously express an order of operations without needing parentheses or operator precedence.
Notice that a simple evaluation algorithm works here: read the input from left to right; if you see a number, push it onto a stack; if you see an operator, pop the top two operands off the stack, apply the operator to them (the first value popped is the right operand, which matters for - and /), then push the result back onto the stack. When the input is used up, the stack should contain exactly one number, which is the result.
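A minimal Python sketch of that stack algorithm follows. The operator set (+, -, *, /), the use of `Fraction` so that / gives exact results, and the function name `eval_rpn` are assumptions for illustration. The sketch is strict: it rejects input that runs out of operands or that leaves anything other than exactly one value on the stack.

```python
import operator
from fractions import Fraction

# Assumed operator set for this sketch.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def eval_rpn(text: str) -> Fraction:
    """Evaluate a space-separated RPN string using a stack."""
    stack: list[Fraction] = []
    for token in text.split():
        if token in OPS:
            if len(stack) < 2:
                raise ValueError(f"not enough operands for {token!r}")
            right = stack.pop()  # top of the stack is the right operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(Fraction(int(token)))  # non-numbers raise ValueError
    if len(stack) != 1:
        raise ValueError(f"expected one result, stack is {stack}")
    return stack[0]


print(eval_rpn("1 2 4 3 - * +"))  # 3
```

With this strict version, an input such as 1 2 3 + raises an error, because two numbers are left on the stack at the end.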
Do Exercises Part D
Here’s a grammar that describes RPN strings:
This grammar basically describes a space-separated list of integers
and operators. Notice that it doesn’t require the numbers and operators
to balance; this grammar allows expressions like 1 2 3 +, which
perhaps could evaluate to 6, but also expressions like 6 7 8 / or + 1.
The interpretation of such expressions isn’t our concern for now: as
long as the string is a list of numbers and operators, we’ll parse it.
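The grammar itself is not reproduced here, so the following Python sketch assumes one plausible character-level version of it, spelled out in the docstring. The class name `RPNParser` and its methods are hypothetical. Each method corresponds to one variable, and every choice between productions is made by peeking at a single character, which is what makes this (assumed) grammar LL(1). Negative numbers are not supported, since - is read as an operator.

```python
DIGITS = set("0123456789")  # sets, so that "" (end of input) is never a member
OPS = set("+-*/")


class RPNParser:
    """Recursive-descent LL(1) parser for an assumed grammar:

        Expr   -> Item Tail
        Tail   -> ' ' Item Tail | epsilon
        Item   -> Num | Op
        Num    -> Digit Digits
        Digits -> Digit Digits | epsilon
        Op     -> '+' | '-' | '*' | '/'
    """

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        """The single lookahead character, or '' at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def advance(self) -> str:
        ch = self.peek()
        self.pos += 1
        return ch

    def parse(self) -> list[str]:
        items = self.expr()
        if self.peek() != "":
            raise SyntaxError(f"unexpected {self.peek()!r} at position {self.pos}")
        return items

    def expr(self) -> list[str]:
        return [self.item()] + self.tail()

    def tail(self) -> list[str]:
        if self.peek() == " ":  # lookahead ' ' selects Tail -> ' ' Item Tail
            self.advance()
            return [self.item()] + self.tail()
        return []  # any other lookahead selects Tail -> epsilon

    def item(self) -> str:
        ch = self.peek()
        if ch in DIGITS:  # a digit selects Item -> Num
            return self.num()
        if ch in OPS:  # an operator character selects Item -> Op
            return self.advance()
        raise SyntaxError(f"expected number or operator at {self.pos}, got {ch!r}")

    def num(self) -> str:
        digits = self.advance()
        while self.peek() in DIGITS:  # Digits -> Digit Digits while a digit follows
            digits += self.advance()
        return digits


print(RPNParser("1 2 4 3 - * +").parse())  # ['1', '2', '4', '3', '-', '*', '+']
```

As in the grammar, 6 7 8 / and + 1 parse without complaint, while input with a doubled space or a stray character raises `SyntaxError`. `Tail` is written recursively to mirror the grammar, so very long inputs could hit Python's recursion limit; a loop would avoid that.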