Due: before class on Tuesday 09 December

Part 1: Type Systems & Polymorphism (Unit 5)

  1. In modern programming language terminology, which of the following scenarios best demonstrates “Strong Typing”?

    1. A language where you must explicitly write int x = 10; rather than just x = 10;.
    2. A language that throws a runtime error if you try to multiply a string by a boolean, rather than implicitly converting the boolean to 0 or 1.
    3. A language where the type of a variable is determined at compile-time and cannot change.
    4. A language that allows you to cast a pointer to a specific memory address into an integer and perform arithmetic on it.
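For reference, a short Python sketch of how a strongly typed language treats an operation on incompatible types (Python refuses the implicit coercion and requires an explicit conversion):

```python
# Python is strongly typed: mixing unrelated types raises a TypeError
# instead of silently coercing one operand.
try:
    result = "2" + 3          # no implicit str -> int (or int -> str) conversion
except TypeError as e:
    print("TypeError:", e)

# Explicit conversion is required to express intent.
print(int("2") + 3)
```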
  2. Consider the implementation of Dynamic Typing using a “tagged union” approach (as seen in Python or Scheme). Why does this implementation inherently introduce runtime overhead compared to static typing?

    1. The interpreter must perform a “tag check” before every operation (like addition) to ensure the types are compatible.
    2. Tagged unions require all variables to be stored on the heap, which causes frequent garbage collection pauses.
    3. The language must generate multiple versions of every function to handle different types.
    4. It prevents the use of recursion, forcing the programmer to write slower iterative loops.
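The tagged-union model this question describes can be sketched in plain Python (the names here are illustrative, not any real interpreter's API): every runtime value carries a tag next to its payload, and the interpreter checks the tags before each operation.

```python
# Illustrative sketch of a tagged-union value representation.
def tagged(tag, payload):
    return {"tag": tag, "payload": payload}

def interp_add(a, b):
    # The "tag check" performed before every operation -- this check,
    # repeated at run time on every add, is the overhead in question.
    if a["tag"] != "int" or b["tag"] != "int":
        raise TypeError("add expects two ints")
    return tagged("int", a["payload"] + b["payload"])

x = tagged("int", 2)
y = tagged("int", 3)
print(interp_add(x, y)["payload"])
```

A statically typed compiler would have proven both operands are ints once, at compile time, and emitted a bare machine add with no check at all.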
  3. Why is Ad-hoc Polymorphism (Function Overloading) generally impossible to implement in a dynamically typed language like Python?

    1. Dynamic languages strictly forbid defining two functions with the same name in the same scope.
    2. The interpreter cannot determine which version of the function to call because the arguments do not have declared types to match against a signature.
    3. Dynamic languages use vtables for all function calls, which only support single inheritance.
    4. It would require “Type Erasure,” which is only available in compiled languages like Java.
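A quick Python experiment shows what actually happens when two functions share a name in one scope: the second definition simply rebinds the name, so there is never a set of overloads to dispatch among.

```python
def area(r):            # intended for circles
    return 3.14159 * r * r

def area(w, h):         # rebinds the name: the one-argument version is gone
    return w * h

print(area(3, 4))
# area(2) would now raise a TypeError: the circle version no longer exists.
```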
  4. In Java, when we write a method call on an object, such as midn.march(), what happens at compile time and at run time? (Select all that apply)

    1. The compiler selects the correct version of the function based only on the declared type (static dispatch)
    2. The compiler ensures the declared type has a method with that name (static typing)
    3. The JVM runtime looks up which version of the function to call based on the actual type of the object using a vtable (dynamic dispatch)
    4. The JVM runtime will throw an error if the actual object type doesn’t have any method with that name (duck typing)
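For contrast with Java's compile-time name check, here is the duck-typing behavior option 4 alludes to, sketched in Python: the method name is looked up on the actual object only at call time, so a missing method fails only then.

```python
class Midshipman:
    def march(self):
        return "marching"

class Civilian:
    pass

def drill(person):
    # No declared type: the name `march` is looked up on the actual
    # object only when this line executes (duck typing).
    return person.march()

print(drill(Midshipman()))
try:
    drill(Civilian())           # fails only at run time
except AttributeError as e:
    print("AttributeError:", e)
```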
  5. Which of the following statements accurately describes the trade-offs between C++ Templates and Java Generics?

    1. C++ Templates use “Type Erasure,” resulting in smaller executable files but slower runtime performance due to casting.
    2. Java Generics use “Monomorphization,” creating specialized copies of the class for every data type used.
    3. C++ Templates allow for better runtime performance because the compiler generates specialized machine code for each type (e.g., vector<int> vs vector<float>).
    4. Java Generics allow you to instantiate new objects of type T (e.g., new T()) because the type information is preserved at runtime.

Part 2: LLVM Optimizations and Backend (Unit 6)

  1. In the context of the LLVM optimization pipeline, what is the primary role of the Middle-End?

    1. To parse the source code and generate the Abstract Syntax Tree (AST).
    2. To map virtual registers to physical hardware registers (e.g., x86 or ARM).
    3. To analyze and transform the Intermediate Representation (IR) to be more efficient without changing its behavior.
    4. To verify type safety and syntax correctness before lowering to IR.
  2. You are optimizing a function to remove stack memory usage (alloca/load/store). If a variable is assigned different values in two different branches of an if/else statement, how must you handle the merge point in SSA form?

    1. You must assign the variable to a memory address in the merge block to reset its state.
    2. You must duplicate the code of the merge block into both the if and else branches to avoid merging.
    3. You can simply define the variable twice in the merge block, once for each path.
    4. You must insert a Phi node (\(\phi\)) at the start of the merge block to select the correct value based on which predecessor block was executed.
    5. You insert copy commands in each predecessor block, immediately before the branch instruction.
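One way to picture a phi node, modeled here in plain Python rather than LLVM IR: at the merge block, the value chosen depends on which predecessor block just executed.

```python
# A phi node modeled as a (value, predecessor-label) list plus the label
# of the block that actually ran.
def phi(incoming, came_from):
    # Select the value associated with the predecessor that was executed.
    return dict((label, value) for value, label in incoming)[came_from]

def f(cond):
    if cond:
        pred = "then"      # the %then block defines 10
    else:
        pred = "else"      # the %else block defines 20
    # merge block: %x = phi i32 [10, %then], [20, %else]
    x = phi([(10, "then"), (20, "else")], pred)
    return x

print(f(True), f(False))
```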
  3. Which of the following are strict rules regarding the placement and structure of Phi nodes in LLVM IR? (Select all that apply)

    1. Phi nodes must be the very first instructions in a Basic Block.
    2. A Phi node must list a value/label pair for every successor block in the control flow graph.
    3. A Phi node must list a value/label pair for every predecessor block in the control flow graph.
    4. Phi nodes can only be used in loops, not in standard if/else conditional merges.
  4. When performing the mem2reg optimization (promoting memory to registers), what distinguishes an “Easy Case” from a “Hard Case”?

    1. Easy Case: The variable is an integer; Hard Case: The variable is a pointer or struct.
    2. Easy Case: The load has only one corresponding store (e.g., within the same block); Hard Case: The store could come from multiple predecessor blocks.
    3. Easy Case: The code contains no loops; Hard Case: The code contains nested loops.
    4. Easy Case: The variable is defined in the preheader; Hard Case: The variable is defined in the body.
  5. Loop Invariant Code Motion (LICM) allows the compiler to move an instruction from the loop body to the preheader. Which of the following instructions would be illegal to move?

    1. %x = add i32 %a, 5 (where %a is defined before the loop starts).
    2. %x = mul i32 10, 20 (constant arithmetic).
    3. %x = call i32 @random() (a function that returns a random number).
    4. %x = getelementptr ... (address calculation using only invariant operands).
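The legality concern behind this question can be seen in a small Python sketch: hoisting a pure, loop-invariant computation to the "preheader" preserves behavior, while hoisting a side-effecting call (like the random-number call above) would not, since each iteration is supposed to observe a fresh result.

```python
def before(a, xs):
    out = []
    for x in xs:
        t = a + 5          # invariant: same operands, same result, every iteration
        out.append(x * t)
    return out

def after(a, xs):
    t = a + 5              # hoisted out of the loop -- legal: pure and invariant
    out = []
    for x in xs:
        out.append(x * t)
    return out

print(before(1, [1, 2, 3]) == after(1, [1, 2, 3]))
# By contrast, hoisting something like random.random() would compute it
# once instead of once per iteration, changing the program's behavior.
```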
  6. You are writing a pass to unroll a loop that iterates from 0 to N. Since N is unknown at compile-time, you cannot guarantee it is divisible by your unrolling factor. What structural component handles the “leftover” iterations?

    1. The Epilogue loop.
    2. The Preheader block.
    3. The Vtable.
    4. The Interference Graph.
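A sketch of unrolling by a factor of 4 with an epilogue loop, written in Python, assuming the trip count is unknown until run time:

```python
def unrolled_sum(xs):
    n = len(xs)            # N is only known at run time
    total = 0
    i = 0
    # Main unrolled loop: factor 4, so roughly one quarter of the
    # counter/compare/branch overhead of the original loop.
    while i + 4 <= n:
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    # Epilogue loop: handles the 0-3 leftover iterations when n is not
    # divisible by the unrolling factor.
    while i < n:
        total += xs[i]
        i += 1
    return total

print(unrolled_sum(list(range(10))))
```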
  7. Which of the following best summarizes the primary goal of Loop Unrolling?

    1. To reduce the total number of instructions in the binary executable.
    2. To reduce the “administrative overhead” (incrementing counters, checking conditions, branching) relative to the actual work done in the loop body.
    3. To ensure the loop executes exactly the same number of times regardless of input.
    4. To eliminate all phi nodes from the loop header.
  8. In Register Allocation, we build an Interference Graph. What does an edge between two nodes (variables) signify in this graph?

    1. The two variables are used in the same mathematical operation.
    2. The two variables have overlapping Live Ranges (they are both “alive” at the same time).
    3. The two variables are assigned the same value.
    4. One variable is a pointer to the other.
  9. During the graph coloring phase of register allocation, if the graph cannot be colored with \(K\) registers, the compiler must “Spill” a variable. What does “spilling” entail?

    1. Moving the variable to a different physical register (e.g., from rax to rbx).
    2. Deleting the variable entirely and re-calculating it later.
    3. Moving the variable from a register into stack memory, inserting load and store instructions.
    4. Expanding the number of physical registers available on the CPU.
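The last two questions fit together, and the connection can be sketched with a toy allocator (a simplified greedy coloring, not LLVM's actual algorithm): variables are nodes, an edge joins two variables whose live ranges overlap, and a variable that cannot receive one of the K colors is spilled.

```python
def color(graph, k):
    """Greedy K-coloring of an interference graph.
    graph: var -> set of interfering vars. Returns (assignment, spilled)."""
    assignment, spilled = {}, []
    for node in graph:
        # Colors (registers) already taken by neighbors, i.e. by variables
        # that are live at the same time as this one.
        taken = {assignment[n] for n in graph[node] if n in assignment}
        free = [c for c in range(k) if c not in taken]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)   # no register left: spill to stack memory
    return assignment, spilled

# a, b, c are all live simultaneously (pairwise edges); d overlaps only a.
g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
regs, spills = color(g, 2)         # only K = 2 registers available
print(regs, spills)
```

With two registers, the three mutually interfering variables cannot all be colored, so one of them is spilled; spilling in a real allocator then inserts the corresponding load/store instructions around its uses.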

Part 3: Optimization Challenge

  1. Given the following LLVM IR code, run through all the optimization steps we have learned about:

    • constant propagation
    • memory elimination
    • loop-invariant code motion
    • loop unrolling
    • phi elimination
    • register allocation

    The final result should be “pseudo-LLVM” IR: code with exactly the same structure, but with no phi instructions, where registers may be re-assigned and copy commands inserted where necessary.

    Show all of your work!

    define i32 @hw61(i32 %reg0, i32 %reg1) {
    A:
      %reg3 = alloca i32
      %reg4 = alloca i32
      %reg5 = alloca i32
      %reg6 = alloca i32
      store i32 %reg0, ptr %reg3
      store i32 %reg1, ptr %reg4
      %reg7 = load i32, ptr %reg3
      %reg8 = load i32, ptr %reg4
      %reg9 = add i32 %reg7, %reg8
      store i32 %reg9, ptr %reg5
      store i32 1, ptr %reg6
      br label %B
    
    B:
      %reg11 = load i32, ptr %reg6
      %reg12 = icmp sle i32 %reg11, 3
      br i1 %reg12, label %C, label %E
    
    C:
      %reg14 = load i32, ptr %reg3
      %reg15 = add i32 %reg14, 10
      %reg16 = load i32, ptr %reg5
      %reg17 = mul i32 %reg16, %reg15
      store i32 %reg17, ptr %reg5
      br label %D
    
    D:
      %reg19 = load i32, ptr %reg6
      %reg20 = add i32 %reg19, 1
      store i32 %reg20, ptr %reg6
      br label %B
    
    E:
      %reg22 = load i32, ptr %reg5
      %reg23 = load i32, ptr %reg4
      %reg24 = sub i32 %reg22, %reg23
      ret i32 %reg24
    }