Due: before class on Monday 20 October

Looking back

There are plenty of review problems today to get you ready for the midterm next week.

  1. Which of the following statements about language fundamentals are true? (Select all that apply)

    1. The rules for what code looks like are its semantics, while the rules for what it means are its syntax.
    2. An expression is a piece of code that evaluates to a single value.
    3. A statement is a piece of code that tells the program to perform an action but does not itself evaluate to a value.
    4. A variable declaration, such as String creature = "axolotl";, is an example of an expression.
  2. The following LLVM IR code is invalid. Why?

    define i32 @calculate() {
      %result = add i32 10, 5
      %result = mul i32 %result, 3
      ret i32 %result
    }
    1. The mul instruction is not a valid LLVM command.
    2. It violates the Single Static Assignment (SSA) rule by reassigning a value to %result.
    3. It does not use the Three-Address Code (3AC) format.
    4. The function is missing a target triple declaration.
  3. A situation where a language syntax says a piece of code is valid but does not define exactly what should happen when it runs is called what?

    1. A compilation failure
    2. A syntax error
    3. Underspecified Behavior (UB)
    4. A runtime exception
  4. Which of the following descriptions of compilers vs interpreters are correct? Select all that apply.

    1. A compiler reads source code and executes it directly, while an interpreter translates it into another language like machine code.
    2. An interpreter will translate code into an abstract syntax tree (AST), whereas a compiler instead uses a parse tree.
    3. Languages which have dynamic typing such as Scheme tend to be compiled rather than interpreted
    4. Low-level systems languages such as Rust tend to be interpreted rather than compiled.
    5. An interpreter typically includes more stages of optimizations than a compiler.
  5. The typical structure of an LLVM instruction, such as %dest = command %src1, %src2, is an example of which format?

    1. Single Static Assignment (SSA)
    2. Abstract Syntax Tree (AST)
    3. Three-Address Code (3AC)
    4. Object-Oriented Programming (OOP)
  6. What are the primary advantages of writing complex logic in a C helper function and calling it from a main program written in LLVM IR? (Select all that apply)

    1. It allows you to bypass the need to write complex, low-level LLVM IR for things like loops and conditionals by hand.
    2. The final executable will always run faster than one generated purely from handwritten LLVM IR.
    3. It simplifies the process by breaking down a hard problem into a more manageable one.
    4. It is the only way to link against standard libraries like stdio.h.
  7. What is the most common and reasonable way to determine the correct LLVM IR for a specific C language feature?

    1. Read the official LLVM language reference manual from start to finish.
    2. Write a small C program using the feature and compile it with clang -S -emit-llvm to inspect the output.
    3. Use the lli interpreter to experiment with different commands until one works correctly.
    4. Modify an existing .ll file and see if it still compiles.
  8. When scanning for string literals, why is the regular expression "[^"]*" a better choice than ".*"?

    1. The . character does not match whitespace characters.
    2. The [^"]* pattern correctly handles escaped quotes (\") inside a string.
    3. The .* pattern is “greedy” and can incorrectly match everything between the first and last quote on a line, such as in "one" + "two".
    4. The * quantifier is more efficient when used with a character class.
  9. A scanner encounters the character sequence if_stmt. Given that if is a keyword and identifiers are defined as sequences of letters and underscores, how will this sequence be tokenized?

    1. As KEYWORD:if and IDENTIFIER:_stmt, because keywords have higher precedence.
    2. As a single token, IDENTIFIER:if_stmt, because the scanner prefers the longest possible match.
    3. It will cause a syntax error, as a keyword cannot be part of an identifier.
    4. As two separate tokens, because the parser needs to identify the if keyword first to understand the program’s structure.
  10. Which of the following are characteristics of a declarative programming paradigm, such as the one favored in Scheme? (Select all that apply)

    1. The code explicitly describes the control flow and step-by-step state changes.
    2. The code focuses on defining the logic of what result should be computed.
    3. The programmer primarily uses mutable state and iterative loops to achieve results.
    4. Computation is treated as the evaluation of mathematical functions and expressions.
  11. The Lisp family of languages, including Scheme, is known for its ability to represent its own code as a fundamental data structure (e.g., a list). What is the term for this property, and which feature is used to treat code as data by preventing its evaluation?

    1. Dynamic Typing; define
    2. Homoiconicity; quote
    3. Referential Transparency; cons
    4. First-Class Functions; lambda
  12. Why are comments and whitespace typically handled by the scanner and discarded before the parsing phase begins?

    1. Context-free grammars are incapable of describing the syntax for comments or whitespace.
    2. To simplify the grammar for the parser, which would otherwise need to account for these elements in many different rules.
    3. Because scanners use regular expressions, which are inherently more powerful than the grammars used by parsers.
    4. Discarding whitespace is a necessary first step for all types of code optimization techniques.
  13. Which statement accurately contrasts the core operations of top-down and bottom-up parsers?

    1. A top-down parser’s key decision is to shift or reduce; a bottom-up parser’s is to predict a grammar rule.
    2. A top-down parser starts from the grammar’s root and predicts which production to use to expand the tree downwards. A bottom-up parser starts from the tokens and reduces sequences to build the tree upwards.
    3. Top-down parsers build the tree from leaves to root, while bottom-up parsers build from root to leaves.
    4. Top-down parsers are only suitable for functional languages, while bottom-up parsers are used for imperative ones.
  14. A scanner spec produces a single, combined DFA to help tokenize the input. Select all true statements about this final, combined scanner DFA.

    1. Each accepting state is labeled with the highest-priority token it recognizes.
    2. Transitions on whitespace characters often lead to a special state that discards the input rather than producing a token.
    3. To be a valid DFA, every state must have a transition for every possible character, often including transitions to a non-accepting “trap state” for invalid sequences.
    4. The total number of states in the combined DFA is always the sum of the states from the individual DFAs.
  15. How would a Scheme interpreter process the expression ((lambda (x) (+ x 5)) 10)?

    1. It multiplies lambda by x and 5, then applies this result to 10.
    2. It identifies (lambda (x) (+ x 5)) as an expression that evaluates to a procedure, evaluates 10 to itself, and then applies the procedure to the argument 10.
    3. It reports an error because a lambda expression cannot be the first element in a function call.
    4. It treats the entire expression as a literal list of data due to the nested parentheses.
  16. Which principles are central to the functional programming style? (Select all that apply)

    1. Treating functions as “first-class citizens” that can be passed as arguments or returned as results.
    2. Preferring immutability, where operations create new data structures instead of modifying existing ones.
    3. Relying on iterative loops like for and while as the primary means of repetition.
    4. Focusing on direct, sequential modification of program state to achieve results.
  17. What is the fundamental relationship between a context-free grammar and a valid parse tree?

    1. The leaves of the tree are nonterminals, while the internal nodes are tokens from the input stream.
    2. The root of the tree must be the grammar’s designated start symbol.
    3. Redundant or unnecessary tokens are removed from the parse tree for simplicity and readability.
    4. The left-to-right order of the leaf nodes (tokens) in the tree does not necessarily correspond to their order in the original input.

Looking ahead

  1. Go through the first chapter in the tour of rust. It should only take a few minutes.

    To check your understanding, consider the following function:

    fn foo(a: [i32; 3]) -> bool {
        let x = a[0] < a[1];
        let y = a[1] < a[2];
        x and y
    }
    1. There is a syntax error in the function definition. What is the error? (Hint: you can copy-paste this into the rust playground) and try it!)

    2. What does this function do?