Summary

What is a “Type”?

You have all been writing code with types for years now, and in this semester we have also been writing compilers and interpreters that deal with types and type checking. But what does that actually mean?

At the hardware level, you know that all data in the computer is just bits and bytes. The concept of types was introduced very early in the history of programming languages, to help the programmer keep track of the semantic meaning of these bits and bytes.

At its core, a type is a label that defines a set of values (e.g., int represents {…, -1, 0, 1, …}) and some operations allowed on those values (e.g., +, -, *).

A language’s type system is the set of rules that assigns types to variables and expressions. Its primary goal is to identify type errors – trying to perform an operation on a value it doesn’t support (e.g., "hello" / 5).

The central question is: When and how are these rules checked?

Static vs. Dynamic Typing: The Great Divide

This is the most fundamental distinction in type systems. It’s all about when type checking occurs.

Static Typing

Type checking is performed on the AST directly, prior to execution. This most commonly happens in compiled language, where we can perform type checks at compile-time, before the code is ever run, but it’s important to say that interpreted languages can also technically perform static type checking.

Consider this Java code for example:

// Variable 'x' is statically typed as an integer.
int x = 10;

// This is a COMPILE-TIME TYPE ERROR.
// Notice it's not a syntax error; the assignment could have
// been valid if x were declared with a different type.
x = "hello";

Dynamic Typing

Type checking is performed at runtime, when the code is actually executed. Expression nodes in the AST return a generic Value that can be any type.

Here is an example in Python:

# 'x' is just a name. At first, it points to an integer.
x = 10

# This is perfectly fine.
# 'x' now points to a string. The type is on the value.
x = "hello"

# This is a RUNTIME TYPE ERROR.
# The program runs, but will crash on this line
# when the interpreter tries to divide a string.
print(x / 2)

Pros and cons

Advantages of static typing:

Advantages of dynamic typing:

Cons:

Strong vs Weak typing

You may read things about a language being “strongly” or “strictly” typed, vs being “weakly” typed. This is a classical distinction in programming languages, but can also be kind of confusing because of some language features language features (such as function overloading and dynamic dispatch) that did not exist when these terms were defined.

In modern terminology, strong or strict typing typically means two things:

So the C++ language, even though it has static type declarations, would be considered weakly typed because the language performs all kinds of automatic type promotion (e.g. from int to double, or from almost anything to bool), and because the programmer can cast any pointer to any other type and mess with the underlying byte representation. Javascript is also weakly typed because of the many automatic conversions performed.

On the other hand, Java and Python are both considered strongly typed, because they don’t perform automatic conversion and don’t allow access to the underlying byte representation of a value.

Note that this disctinction is orthogonal to static vs dynamic typing, or to compiled vs interpreted langauges. In fact, you can have all four combinations:

Explicit vs implicit/inferred types

The most easily-recognizable aspect of a type system is whether types need to be explicitly declared in the code.

For example, in Java you would write something like

int x = 15;

whereas in Scheme, you just say

(define x 15)

So clearly Java requires explicit type declaration, whereas Scheme does not. This makes sense, since Scheme is a dynamically-typed language where the same variable name can be assigned to a value of any type.

But some statically-typed languages also allow this. For example, in Rust, you can say

let mut x = 15

and the rust compiler infers that the type of x should be i32. But this is not the same as dynamic typing! In the same Rust program, if we later tried to do something like

x = 3.5

it would be a compile-time type error, since 3.5 has a different type that is incompatible with the i32 type that the compiler inferred for variable x.

And in modern Python with type annotations, we can explicitly declare types even though the language is dynamically-typed! So the point is, although traditionally explicit declared types are a component of statically-typed languages, this is not necessarily the case.

Implementing dynamic typing

How does an interpreter like Python’s know x is an int one moment and a str the next? It stores the value as a pair of some kind of tag for the type itself, and then the actual data with that type.

Conceptual C implementation:

// Make global constants for all possible types
const int TYPE_STRING = 1;
const int TYPE_BOOL = 2;

// Declare a union type for the data.
// This is like a struct, except it can only hold
// one of the named fields.
typedef union {
    char* as_string;
    bool as_bool;
} ValueData;

// Now we have a struct to hold the tag and the data
typedef struct {
    int tag;
    ValueData data;
} Value;

// To create a Value object, we have to set the tag
// and the data appropriately
Value bool_to_value(bool b) {
    Value result;
    result.tag = TYPE_BOOL;
    result.data.as_bool = b;
    return result;
}

Value string_to_value(char* s) { /* similar idea */ }

// To convert from a generic Value to a specific
// type, we have to check the type tag first.
char* value_to_string(Value val) {
    if (val.tag != TYPE_STRING) {
        fprintf(stderr, "ERROR: expected string, got type %d\n", val.tag);
        abort();
    }
    else {
        return val.data.as_string;
    }
}

bool value_to_bool(Value val) { /* similar idea */ }

So if this underlying implementation were used for, say, a Python interpreter, then executing a program like

x = "hello"
y = False
print(x + y)

would entail a few things:

  1. Bind the name "x" to a tagged Value with TYPE_STRING
  2. Bind the name "y" to a tagged Value with TYPE_BOOL
  3. Lookup the value for name "x", and check the tag.
  4. The tag is TYPE_STRING, so we will do string concatenation with the right-hand side of the + operator.
  5. Look up the value for "y" and check the tag.
  6. The tag TYPE_BOOL is not compatible with string concatenation, so it raises a TypeError.

Java implementation

In your starter code for Lab 4.2, we gave you the following definition of a Value class:

public interface Value {
    default String str() {
        return Errors.error(String.format("Value type error: Expected string, got %s", toString()));
    }

    default boolean bool() {
        return Errors.error(String.format("Value type error: Expected boolean, got %s", toString()));
    }

    record Str(String value) implements Value {
        @Override
        public String str() { return value; }

        @Override
        public String toString() { return value; }
    }

    record Bool(Boolean value) implements Value {
        @Override
        public boolean bool() { return value; }

        @Override
        public String toString() {
            return value ? "True" : "False";
        }
    }
}

How it works is:

This looks different from the C implementation above, but it’s actually exactly the same idea, just written differently in Java:

Important takeaway: Any way you do it, this checking is the source of the runtime overhead in dynamically typed languages. By contrast, a statically-typed language would already know the types at run-time because they were checked already, eliminating many of these steps at run-time and making the code run faster.