CSCE 425/825 - Compiler Construction

Scanners

Exercises

  1. Implement and test an ANTLR-based scanner that does one of the following:
    • formats text in 80-column blocks, while removing multiple, leading, or trailing blanks;
    • removes all tags from an HTML document; or
    • introduces plausible spelling errors. 
  2. Discuss the difference between DFA-based lexical analysis and the approach taken by ANTLR.
  3. Solve Exercise 3.6.2 (and of course 3.3.5) in the Dragon book.

Reading

Slides and Lecture

Scanning Slides

Notes

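To clarify yesterday's class discussion about precedence and order of operations: from highest to lowest precedence, the operators are '*' (closure), ' ' (concatenation, written as juxtaposition), and '|' (alternation), i.e. '*' > ' ' > '|'. Note that '*' and '+' have the same precedence.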

So, for example, if we were to write the un-parenthesized example from class:

letter letter | digit *
with parentheses to indicate the default order of operations, we would get:
((letter letter) | (digit *))
which would recognize strings such as:
xy, aa, 1, 12345
but not strings like:
xyz, a, a1, abc123
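
The default grouping above can be checked with a small sketch. It uses Python's re module, whose operators follow the same precedence ('*' binds tighter than concatenation, which binds tighter than '|'). The character classes standing in for letter and digit, and the helper name accepts, are assumptions made for illustration and are not part of the class example.

```python
import re

# Assumption: 'letter' and 'digit' are modeled as these ASCII classes.
LETTER = "[A-Za-z]"
DIGIT = "[0-9]"

# Un-parenthesized form: letter letter | digit *
default_grouping = re.compile(f"{LETTER}{LETTER}|{DIGIT}*")
# Fully parenthesized form: ((letter letter) | (digit *))
explicit_grouping = re.compile(f"(({LETTER}{LETTER})|({DIGIT}*))")

def accepts(pattern, text):
    """Return True if the whole string is matched by the pattern."""
    return pattern.fullmatch(text) is not None

for s in ["xy", "aa", "1", "12345", "xyz", "a", "a1", "abc123"]:
    # Both patterns should agree on every string.
    print(s, accepts(default_grouping, s), accepts(explicit_grouping, s))
```

Because digit * also matches zero digits, this expression accepts the empty string as well, which is one more sign that it is not the intended identifier pattern.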

If we want the intended meaning of an identifier, a letter followed by zero or more letters or digits, then we must use explicit parentheses so that the '|' is applied first, as follows:

letter (letter | digit) *
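
As a rough check of the parenthesized form, the following sketch again uses Python's re module. The ASCII character classes for letter and digit and the function name is_identifier are assumptions made for illustration.

```python
import re

# Assumption: 'letter' and 'digit' are modeled as these ASCII classes.
LETTER = "[A-Za-z]"
DIGIT = "[0-9]"

# letter (letter | digit) *
identifier = re.compile(f"{LETTER}({LETTER}|{DIGIT})*")

def is_identifier(text):
    """Return True if the whole string is a letter followed by letters or digits."""
    return identifier.fullmatch(text) is not None

for s in ["xy", "aa", "xyz", "a", "a1", "abc123", "1", "12345"]:
    print(s, is_identifier(s))
```

With the explicit parentheses, strings such as xyz, a, a1, and abc123 are accepted, while 1 and 12345 are rejected because they do not start with a letter.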