Due: before class on Monday 22 September

Looking back

  1. Consider the following token specifications, in order of priority:

    IF: if
    ID: [a-z0-9]+

    The scanner is processing the input string if9. According to the principles of scanning described in the notes, what token(s) will be produced?

    1. An IF token with the value if, followed by an error.
    2. An IF token with the value if, followed by an ID token with the value 9.
    3. A single ID token with the value if9.
    4. A single IF token with the value if9.
  2. Suppose you have an ANTLR-based interpreter for the calculator language and want to add a new logical AND operator (&&) to the language. Which of the following actions are necessary to fully implement this feature, from token definition to execution? (Select all that apply.)

    1. Modify tokenSpec.txt to define the regex for the new operator token.
    2. Modify Interpreter.java to handle the new grammar rule and perform the logical AND operation.
    3. Modify the pom.xml file to include a new dependency for logical operations.
    4. Modify ParseRules.g4 to add a new production rule for a logical expr.
    5. Modify Tokenizer.java so that it can handle the character &.
  3. Suppose a top-down parser uses the following grammar for a simple graphics language:

    prog -> cmd prog
         -> ε
    
    cmd -> DRAW shape STYLE ID
    shape -> CIRCLE
          -> RECTANGLE

    The top-down parser has successfully matched a DRAW token and now needs to expand the shape nonterminal. It uses look-ahead and sees that the next token in the input stream is RECTANGLE. What is the parser’s correct action?

    1. It should report a syntax error because RECTANGLE is ambiguous.
    2. It should shift the RECTANGLE token onto the parse stack.
    3. It must look ahead an additional token (to STYLE) to make a decision.
    4. It should predict and expand the rule shape -> RECTANGLE.
  4. Why are comments and whitespace handled by the scanner instead of the parser?

    1. Because the grammar rules for comments and whitespace are too complex for a CFG to handle.
    2. To make the parsing stage simpler and more efficient by removing tokens that don’t affect the program’s structure.
    3. Because the parser can’t access the original text spelling, only the token types.
    4. To allow an IDE to easily apply syntax highlighting to comments.
  5. Consider a simple language for controlling a robot on a 2D plane. Below is an informal description of the language and a short example program.

    Language Description: A program consists of one or more commands, each ending with a semicolon. The language is case-sensitive. There are two kinds of commands:

    • A move command, which takes a single positive integer (the distance to move forward).
    • A turn command, which takes a direction keyword (left or right) followed by a positive integer (the number of degrees to turn).

    Whitespace (spaces, tabs, newlines) and comments (from a # to the end of the line) should be ignored by the scanner.

    Example Program:

    # Draw two sides of a square
    move 100;
    turn right 90;
    move 100;

    Your Task: Based on the description and example, write a scanner spec (list of token types and regexes) and parser spec (context-free grammar) for this language.
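
Optional: one way to sanity-check a scanner spec before submitting is to run it through a small harness. The sketch below is not part of the course code. It assumes a spec written as an ordered list of (token type, regex) pairs, and it assumes the common convention that the longest match wins, with the earlier rule winning ties; check that convention against the notes. The names TOKEN_SPEC, SKIP, and tokenize are hypothetical, and the example spec is for a toy arithmetic language rather than the robot language.

```python
import re

# Hypothetical spec for a toy arithmetic language (NOT the robot language).
# Order gives priority: an earlier entry wins when two matches have equal length.
TOKEN_SPEC = [
    ("NUM", r"[0-9]+"),
    ("PLUS", r"\+"),
    ("WS", r"[ \t\n]+"),
]

# Token types that are matched but never handed to the parser.
SKIP = {"WS"}


def tokenize(text, token_spec=TOKEN_SPEC, skip=SKIP):
    """Return a list of (token type, lexeme) pairs for text."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in token_spec:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier (higher-priority) rule on ties
            # and also rejects empty matches.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no token matches at position {pos}: {text[pos]!r}")
        if best_name not in skip:
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


if __name__ == "__main__":
    print(tokenize("12 + 3"))
```

To try your own spec, replace TOKEN_SPEC and SKIP with your token types and regexes and call tokenize on the example program. Python's regex syntax may differ in small ways from the notation used in the notes, so treat the output as a sanity check rather than a grade.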