Summary
- The concepts of type, type system, and type errors
- Static vs dynamic typing
- Strong vs weak typing
- Implementing dynamic typing with tagged pairs
What is a “Type”?
You have all been writing code with types for years now, and in this semester we have also been writing compilers and interpreters that deal with types and type checking. But what does that actually mean?
At the hardware level, you know that all data in the computer is just bits and bytes. The concept of types was introduced very early in the history of programming languages, to help the programmer keep track of the semantic meaning of these bits and bytes.
At its core, a type is a label that defines a set of values
(e.g., int represents {…, -1, 0, 1, …}) and some
operations allowed on those values (e.g., +, -, *).
A language’s type system is the set of rules that assigns types to variables and expressions. Its primary goal is to identify type errors – trying to perform an operation on a value it doesn’t support (e.g., "hello" / 5).
The central question is: When and how are these rules checked?
Static vs. Dynamic Typing: The Great Divide
This is the most fundamental distinction in type systems. It’s all about when type checking occurs.
Static Typing
Type checking is performed on the AST directly, prior to execution. This most commonly happens in compiled language, where we can perform type checks at compile-time, before the code is ever run, but it’s important to say that interpreted languages can also technically perform static type checking.
- Each expression node in the AST is annotated with the type that that expression returns.
- Variable names are bound to types, and these types cannot change.
- Function signatures (parameter types and return type) are also determined prior to execution, and checked against actual argument types.
- Later, at runtime, there is no type checking, since it already happened.
- Examples: Java, C, C++, Rust, Go, Haskell, Swift.
Consider this Java code for example:
// Variable 'x' is statically typed as an integer.
int x = 10;
// This is a COMPILE-TIME TYPE ERROR.
// Notice it's not a syntax error; the assignment could have
// been valid if x were declared with a different type.
x = "hello";
Dynamic Typing
Type checking is performed at runtime, when the code is actually executed. Expression nodes in the AST return a generic Value that can be any type.
- The interpreter checks types as the code is executing.
- Variables are just labels for values; the value has a type, but the variable can hold any type.
- Examples: Scheme, Python, JavaScript, Ruby, PHP, Lisp.
Here is an example in Python:
# 'x' is just a name. At first, it points to an integer.
x = 10
# This is perfectly fine.
# 'x' now points to a string. The type is on the value.
x = "hello"
# This is a RUNTIME TYPE ERROR.
# The program runs, but will crash on this line
# when the interpreter tries to divide a string.
print(x / 2)
Pros and cons
Advantages of static typing:
- Early Error Detection: Catches bugs before you even run the code. This is invaluable for large, complex systems.
- Performance: The compiler knows the type of every variable, so it can generate highly optimized machine code (e.g., it knows
a + bis an integer addition, not a string concatenation). - Tooling & Readability: Excellent IDE support (autocompletion, refactoring) and the code often serves as its own documentation.
Advantages of dynamic typing:
Cons:
- Flexibility and Speed of Development: Great for scripting and rapid prototyping. Less code to write.
- Expressiveness: Allows for powerful patterns more naturally.
- Simplicity: Code is less verbose and looks simpler. The programmer never has to deal writing complex nested type annotations
Strong vs Weak typing
You may read things about a language being “strongly” or “strictly” typed, vs being “weakly” typed. This is a classical distinction in programming languages, but can also be kind of confusing because of some language features language features (such as function overloading and dynamic dispatch) that did not exist when these terms were defined.
In modern terminology, strong or strict typing typically means two things:
- The compiler/interpreter does not implicitly convert values from one type to another.
- The programmer cannot access the underlying representation of a type in terms of bytes.
So the C++ language, even though it has static type declarations, would be
considered weakly typed because the language performs all kinds of
automatic type promotion (e.g. from int to double, or from almost
anything to bool), and because the programmer can cast any pointer to
any other type and mess with the underlying byte representation.
Javascript is also weakly typed because of the many automatic
conversions performed.
On the other hand, Java and Python are both considered strongly typed, because they don’t perform automatic conversion and don’t allow access to the underlying byte representation of a value.
Note that this disctinction is orthogonal to static vs dynamic typing, or to compiled vs interpreted langauges. In fact, you can have all four combinations:
- Static & Strong: Java, Rust, Haskell
- Static & Weak: C, C++
- Dynamic & Strong: Python, Ruby
- Dynamic & Weak: JavaScript, PHP
Explicit vs implicit/inferred types
The most easily-recognizable aspect of a type system is whether types need to be explicitly declared in the code.
For example, in Java you would write something like
int x = 15;
whereas in Scheme, you just say
(define x 15)
So clearly Java requires explicit type declaration, whereas Scheme does not. This makes sense, since Scheme is a dynamically-typed language where the same variable name can be assigned to a value of any type.
But some statically-typed languages also allow this. For example, in Rust, you can say
let mut x = 15
and the rust compiler infers that the type of x should be i32. But
this is not the same as dynamic typing! In the same Rust program, if we
later tried to do something like
x = 3.5
it would be a compile-time type error, since 3.5 has a different
type that is incompatible with the i32 type that the compiler inferred
for variable x.
And in modern Python with type annotations, we can explicitly declare types even though the language is dynamically-typed! So the point is, although traditionally explicit declared types are a component of statically-typed languages, this is not necessarily the case.
Implementing dynamic typing
How does an interpreter like Python’s know x is an int one moment and a str the next?
It stores the value as a pair of some kind of tag for the type
itself, and then the actual data with that type.
Conceptual C implementation:
// Make global constants for all possible types
const int TYPE_STRING = 1;
const int TYPE_BOOL = 2;
// Declare a union type for the data.
// This is like a struct, except it can only hold
// one of the named fields.
typedef union {
char* as_string;
bool as_bool;
} ValueData;
// Now we have a struct to hold the tag and the data
typedef struct {
int tag;
ValueData data;
} Value;
// To create a Value object, we have to set the tag
// and the data appropriately
Value bool_to_value(bool b) {
Value result;
result.tag = TYPE_BOOL;
result.data.as_bool = b;
return result;
}
Value string_to_value(char* s) { /* similar idea */ }
// To convert from a generic Value to a specific
// type, we have to check the type tag first.
char* value_to_string(Value val) {
if (val.tag != TYPE_STRING) {
fprintf(stderr, "ERROR: expected string, got type %d\n", val.tag);
abort();
}
else {
return val.data.as_string;
}
}
bool value_to_bool(Value val) { /* similar idea */ }
So if this underlying implementation were used for, say, a Python interpreter, then executing a program like
x = "hello"
y = False
print(x + y)
would entail a few things:
- Bind the name
"x"to a tagged Value withTYPE_STRING - Bind the name
"y"to a tagged Value withTYPE_BOOL - Lookup the value for name
"x", and check the tag. - The tag is
TYPE_STRING, so we will do string concatenation with the right-hand side of the+operator. - Look up the value for
"y"and check the tag. - The tag
TYPE_BOOLis not compatible with string concatenation, so it raises aTypeError.
Java implementation
In your starter code for Lab 4.2,
we gave you the following definition of a Value class:
public interface Value {
default String str() {
return Errors.error(String.format("Value type error: Expected string, got %s", toString()));
}
default boolean bool() {
return Errors.error(String.format("Value type error: Expected boolean, got %s", toString()));
}
record Str(String value) implements Value {
@Override
public String str() { return value; }
@Override
public String toString() { return value; }
}
record Bool(Boolean value) implements Value {
@Override
public boolean bool() { return value; }
@Override
public String toString() {
return value ? "True" : "False";
}
}
}
How it works is:
Valueis an interface which will have sub-classes for the different types of values in a running program.- The conversion methods from generic
Valueto specific, actual types, havedefaultmethods for them which just throw an error. - Each actual type (
Value.StrandValue.Boolhere) are implemented as sub-classes ofValue, which override one of the conversion methods for that specific type.
This looks different from the C implementation above, but it’s actually exactly the same idea, just written differently in Java:
- A
Valuemust be eitherValue.StrorValue.Bool, as those are the only two implementations of this interface. - The type tag is implicitly stored as the type of the
Valueobject, eitherValue.StrorValue.Bool. - The equivalent to calling the C function
bool_to_value(b)would be to call the constructorValue.Bool(b) - The equivalent to calling the C function
value_to_str(val)would be to call the methodval.str().
Important takeaway: Any way you do it, this checking is the source of the runtime overhead in dynamically typed languages. By contrast, a statically-typed language would already know the types at run-time because they were checked already, eliminating many of these steps at run-time and making the code run faster.