This course will offer an introduction to programming language semantics through implementation and formal models. It will emphasize functional programming and cover the key features of modern functional languages (including type systems and memory management), along with some ideas from object-oriented and other programming styles.
Instructor: Greg Cooper (firstname.lastname@example.org)
Greg received a Ph.D. in computer science from Brown University in 2008, advised by Shriram Krishnamurthi. In real life he works as a software engineer at Google.
This post contains the problem statements for the first homework, which is due on Monday, January 19, 2015 at 11:59pm. Please attempt as much as you can before next Thursday’s lecture.
In this assignment, you’ll build parsing utilities that you’ll be able to reuse throughout the course and develop an evaluator for a simple language. The first step is to install Haskell and set up your development environment if you haven’t done so already. See the main course page for relevant links about doing so.
To simplify parsing, we’ll use S-expressions as the surface syntax for our language. For our purposes, S-expressions have three variants:
- numbers (integers)
- identifiers (which may contain letters, digits, and symbols, but cannot start with a digit or a single dash followed by a digit)
- lists of S-expressions
Examples (in concrete syntax):
- Numbers: 123, -75
- Identifiers: +, -, *, let, a-multi-word-id
- Lists: (* 2 (+ -3 4)), (if (> 1 2) -5 10)
Note how they differ from Haskell lists:
- Syntactically: the separator can be any whitespace, and parentheses mark the start and end of a list. (Actually, our implementation will allow other types of braces, since that’s often useful.)
- Structurally: they contain a mixture of numbers, identifiers, and lists (which can be nested).
We’ll decompose the problem of building S-expressions from strings into two steps: tokenizing and parsing. The first step turns a string into a list of tokens. The second turns the list of tokens into an S-expression (a tree), or returns an error if the input does not represent a valid S-expression. We’ll also support comments (any characters starting with a semicolon and going until the end of a line) by stripping them out, along with all whitespace, in the tokenizing step.
The next step is to try to convert a list of tokens into an S-expression. We’ve defined a type, SExp, and given you a start on parseSExp.
- A number token becomes a number S-expression.
- An identifier token becomes an identifier S-expression.
- An open brace begins a list S-expression. A matching close brace terminates the list. In between there may be an arbitrary number of S-expressions of any variant.
Not every list of tokens corresponds to a valid S-expression. We have defined a type called Resultto distinguish success from failure without having to abort evaluation via error. This lets you test your code’s behavior on invalid inputs much more easily. If the list parses, your code should return
Ok (sexp, remaining tokens)
Otherwise, it should return
Interp.hs defines an abstract syntax tree for a little language with numbers, booleans, a few built-in binary operations, and a conditional form. The final parsing step is to transform an S-expression into an Expr tree representing an expression in this language. (The comments above the Expr data definition show what the concrete S-expression syntax for each variant should look like.)
Now that we have a way of building abstract syntax trees, we can turn our attention to semantics and implement our first evaluator.
For if expressions: if the first sub-expression (the condition) evaluates to true, then the result is determined by evaluating the second sub-expression (the consequent). If the first thing evaluates to false, the result is what you get from evaluating the third sub-expression (the alternative).
λ> text <- readFile “<path to file>”
In this homework, we’ll build an interpreter for a language with functions, then add mutable state to it. We’ll support a fairly rich surface syntax and use desugaring to translate down to a small core abstract syntax, with an initial environment that defines a collection of built-in constants and primitive operations.
Problem 1. Download SExp.hs, Result.hs, and Expr.hs. (The file Token.hs is the same as in Homework 1.) Implement parseExpr, desugar, and checkIds in Expr.hs. You can probably reuse some parts of your parseExpr from Homework 1.
An example expression using boxes:
(with* ([b (box 0)]
[inc! (fun (n) (set-box! b (+ n (unbox b))))]
[_ (inc! 5)]
[_ (inc! 3)])
(should evaluate to 8)
Handin mechanics. Submit your modified Expr.hs, InterpFun.hs, and InterpStore.hs in the dropbox for Homework 2. Don’t forget to test your code.
This will give you practice using continuations (represented as data structures) to implement non-local control features. Start by downloading new versions of Token.hs, Result.hs, SExp.hs, and Expr.hs, along with a new file, InterpCont.hs.
The language grammar is identical to that of Homework 2, and we’ve provided implementations of parseExpr, desugar, and checkIds. So that we’ll have a richer language to play with (and in preparation for studying type-checking), we’ll add a few new types of data: strings, lists, and pairs. We’ve already added the new variants of Val for you:
StringV String — a string constant
EmptyV — an empty list
ConsV Val Val — a non-empty list (first arg can be of any type, second must be a list)
PairV Val Val — a pair: both args may be of any type
Complete the definition of the Cont datatype and the implementations of the functions apply, callK, and interp; apply should work for either a user-defined function (FunV) or a built-in (PrimV). (We’ll refer to such values as applicable later.) It should raise an error when its first argument is any other type of value.
Complete the implementations of the following data constructors, recognizers, and selectors:
empty? (defined as emptyP from Haskell) recognizes the empty list (which is written as empty in our language). If given an EmptyV, it should return true; otherwise false.
cons constructs a non-empty linked list (a ConsV), much like Haskell’s (:) operator does. It takes two arguments: the first may be any value, but the second must be a list (either a ConsV or an EmptyV), or else it should raise an error (by calling handleError with a StringV describing the error).
cons? (defined as consP from Haskell) recognizes non-empty lists. If given a ConsV, it should return true; otherwise false (even if the argument is not a list).
first selects the first element of a non-empty list (a ConsV). If given anything else, it should raise an error.
rest selects the rest of a non-empty list (a ConsV). If given anything else, it should raise an error.
pair constructs a pair. It takes two arguments, each of which may be of any type of value (and the two may be of different types).
fst (defined as pairFst to avoid conflicting with the Prelude) selects the first element of a pair.
snd (defined as pairSnd to avoid conflicting with the Prelude) selects the second element of a pair.
Implement the following non-local control operators:
a. raise raises an error by calling handleError with the current continuation and its argument.
call-with-handler should apply thunk to an arbitrary value (e.g., the empty list). If no errors occur, then the result of the whole expression is the value returned by thunk. However, if an error is raised, handler should be called with the error in the call-with-handler expression’s continuation. For example:
(fun (err) (pair “caught” err))))
should evaluate to: (“caught”, “bad problem!”).
; thunk (just returns a function)
; handler (not called)
; we’ve left the context of the call-with-handler, so no handler here …
should evaluate to an error.
Error handlers may be nested, in which case the handler in the inner context takes precedence. A handler may decide to re-raise an error (or raise a new error), in which case it would propagate to the next outer handler. If no handler is present, the whole evaluation results in an error.
Hint: you’ll want to add a new variant to the Cont datatype, and you’ll want to revise the definition of handleError.
c. call-with-context takes two arguments. The first (call it ctx) may be any value, and the second must be an applicable value (function or primitive operator, call it thunk); call-with-context augments the global context with ctx for the evaluation of thunk (again, applied to an arbitrary value). This context may be read by calling get-context (described below).
The result of the whole expression is the value returned by thunk. When control escapes from the call to call-with-context, whether by ordinary return, raising of an error, or a continuation jump,ctx should “pop” out of the global context.
Hint: you’ll again want to add a new variant of the Cont datatype.
d. get-context takes one argument (which it ignores), reads the global context, and returns a list containing all the values that have been “pushed” by call-with-context, starting with the innermost. For example:
(get-context 12345) => empty
(pair (call-with-context “context” (fun (_) “hi”))
; we’ve left the call-with-context, so the context is empty again …
=> (“hi”, empty)
e. call/cc (“call with current continuation”) captures the current continuation, wraps it as a value (you can use PrimV for this) and applies its argument (which must be applicable) to that continuation. As discussed in lecture, you can think of call/cc as a more general, desugared version of let/cc. Calling the continuation should “return” the argument to the point in the control flow where the continuation was captured.
(let/cc k e) <=> (call/cc (fun (k) e))
See the slides/lectures from weeks 5 and 6 for interesting examples of how continuations can be used.
For example, here’s a slightly modified version of the Fibonacci example (end of lecture 5 slides) translated into the syntax our parser understands:
This should evaluate to:
Ok (6765, <primitive: continuation>)
When you’re finished, hand in your modified version of InterpCont.hs.
In this homework, we’ll implement a type-checker for a language with functions and explicit universal quantification over types (aka, parametric polymorphism, generic types). The syntax is similar to that of the languages in the previous two assignments, except that we’ve added type annotations on function arguments and a notation for abstracting over types and instantiating parametric types. We’ve provided datatype definitions and parsers for this language. You should start by downloading Result.hs, Token.hs, SExp.hs, Expr.hs, and TypeCheck.hs.
The new concrete syntax is as follows:
<t> ::= num | bool | string — basic types
| (<t> , <t>) — pairs
| (list <t>) — lists
| (<t> -> <t>) — functions
| (forall a . <t>) — universally quantified types
— Expressions (surface syntax).
<e> ::= … — previous expression types (note below how function syntax has changed)
| (fun ([x : <t>] …) <e>) — types of function arguments must be declared
| (forall (a …) <t>) — expression with universal quantification over type variables a …
| (<e> < <t> … >) — instantiation of a polymorphic expression with specific types <t> …
(fun ([x : num]) (+ x 3))
(with* ([id (forall (a) (fun ([x : a]) x))])
((pair <num bool>) ((id <num>) 3) ((id <bool>) true)))
Problem 1. In Expr.hs, complete the implementation of the function `alphaEquiv`, which determines whether two types (possibly involving universal quantification) are equivalent up to alpha-renaming of bound type variables. The third argument, `typeVarMap`, is a stack of correspondences of type variables in the two types. (So, when comparing two “forall” types, you should push a pair with their respective type variables onto `typeVarMap` for the recursive comparison of the bodies.)
— Free type variables are compared for simple equality.
alphaEquiv (VarT “c”) (VarT “c”)  ==> True
alphaEquiv (VarT “d”) (VarT “e”)  ==> False
alphaEquiv (ForAllT “a” (ArrowT (VarT “a”) (VarT “c”))
(ForAllT “b” (ArrowT (VarT “b”) (VarT “c”)))  ==> True
(ArrowT (VarT “b”) (PairT (VarT “a”) (VarT “b”))))))
(ForAllT “b” (ForAllT “c” (ArrowT (VarT “b”)
(ArrowT (VarT “c”) (PairT (VarT “b”) (VarT “c”))))))  ==> True
alphaEquiv (ForAllT “a” (ForAllT “b” (ArrowT (VarT “a”)
(ArrowT (VarT “b”) (PairT (VarT “a”) (VarT “b”))))))
(ForAllT “b” (ForAllT “c” (ArrowT (VarT “c”)
(ArrowT (VarT “b”) (PairT (VarT “b”) (VarT “c”))))))  ==> False
(ArrowT (VarT “b”) (PairT (VarT “a”) (VarT “b”))))))
(ForAllT “b” (ForAllT “a” (ArrowT (VarT “a”)
(ArrowT (VarT “b”) (PairT (VarT “a”) (VarT “b”))))))  ==> False
— Alpha equivalence is not unification.
alphaEquiv (PairT (VarT “a”) NumT) (PairT BoolT (VarT “b”))  ==> False
— A tricky example that you’ll want to make sure your code handles correctly.
alphaEquiv (ForAllT “a” (ForAllT “b” (PairT (VarT “a”) (VarT “b”))))
(ForAllT “a” (ForAllT “a” (PairT (VarT “a”) (VarT “a”))))  ==> False
Problem 2. In TypeCheck.hs, complete the implementation of the function `freeTypeVars`, which returns all of the type variables that occur free in a given type. (Its second argument, `bound`, is the set of bound type variables, which grows when we enter the body of a “forall” type expression.)
freeTypeVars (ArrowT (VarT “a”) BoolT)  ==> [“a”] freeTypeVars (ForAllT “a” (PairT (VarT “b”) (VarT “a”)))  ==> [“b”] freeTypeVars (ForAllT “a” (PairT NumT (VarT “a”)))  ==> 
It’s fine if your implementation returns a list with duplicate elements (when a variable occurs more than once).
Problem 3. In TypeCheck.hs, complete the implementation of `alphaRename`, which renames a given type variable within a type expression. The function should rename all occurrences of the variable, including binding occurrences.
alphaRename “a” “b” (ListT (VarT “a”)) ==> ListT (VarT “b”)
alphaRename “a” “b” (ForAllT “a” (PairT (VarT “a”) BoolT)) ==> ForAllT “b” (PairT (VarT “b”) BoolT)
Problem 4. In TypeCheck.hs, complete the implementation of `subst`, which subtitutes a type for a type variable in another type and returns the result. The substitution must be capture-avoiding, as discussed in lectures 2 and 6. You’ll want to use the implementations of `freeTypeVars` and `alphaRename`. You’ll also need the provided functions `allTypeVars` (which returns all type variables in an expression) and `genFreshVar` (which generates a name that’s distinct from everything in a list).
subst “a” NumT (ArrowT (VarT “a”) (VarT “a”)) ==> ArrowT NumT NumT
subst “b” (PairT NumT BoolT) (ListT (VarT “b”)) ==> ListT (PairT NumT BoolT)
subst “a” NumT (PairT (VarT “a”) (ForAllT “a” (ArrowT (VarT “a”) (VarT “a”)))) ==>
PairT NumT (ForAllT “a” (ArrowT (VarT “a”) (VarT “a”)))
subst “a” (PairT BoolT (VarT “b”)) (ForAllT “b” (ArrowT (VarT “a”) (VarT “b”))) ==>
ForAllT “c” (ArrowT (PairT BoolT (VarT “b”)) (VarT “c”))
In the last two examples, note how (1) we don’t susbstitute inside a ForAllT whose variable shadows the variable we’re substituting and (2) we alpha-rename (b to c, in this case) to avoid capture of free type variables in the substituted type (the b in (bool, b) in the last example).
Problem 5. In TypeCheck.hs, complete the implementation of `checkType`, which checks the type of an expression in a given context. The context (i.e., “gamma” from type inference rules) is a pair containing (1) a collection of bound type variables and (2) bindings of program variables to types. When checking a type annotation in the source, you’ll want to make use of another helper we’ve defined called `checkClosed`, which checks that a type expression contains no free type variables (in a given context). We’ve also provided an `initialEnv` with primitives from previous assignments, including polymorphic bindings for list and pair operations and a fixed-point combinator.
A few more examples:
1 :: num
(+ 2 4) :: num
(fun ([x : num]) (= x 5)) :: (num -> bool)
— Self-application of identity function:
(with* ([id (forall (a) (fun ([y : a]) y))] [selfapp (fun ([x : (forall a.(a -> a))])
(selfapp id)) :: (forall b.(b -> b))
— Addition and multiplication of Church numerals.
(with* ([two (forall (a)
(((mult <num>) ((add <num>) (two <num>) (two <num>))
Problem 6. In Expr.hs, complete the implementation of `erase`, which erases type information and finishes desugaring a DExpr to a CExpr (e.g., rewriting WithD expressions to a combination of AppC & FunC). (The resulting expression could be evaluated with the interpreter you wrote in homework 3.)
When you’re done, hand in your modified copies of Expr.hs and TypeCheck.hs.
The course will use the Haskell programming language, which is freely available for recent versions of Windows, Mac OS, and Linux. The instructor uses haskell-mode for Emacs, but many other options are available for non-Emacs users.
haskell.org – the language’s homepage (docs, extensive wiki)
Learn You a Haskell – a free, popular online guide to Haskell
Real World Haskell – a free online book about Haskell focusing on real applications
The Haskell wikibook – very complete (and of course it’s free and online)
Approximate Schedule of Lecture Topics
- Course intro. Programming in Haskell. Syntax; parsing; evaluation.
- Variables; substitution; scope and environments. Functions. Evaluation strategies. (slides)
- Mutation. Monads. (slides)
- Untyped lambda calculus. Church encodings. Y-combinator. (slides)
- Continuations. (slides)
- Continuations, part 2; formal semantics (big-step). (slides)
- Small-step operational semantics. Simply-typed lambda calculus. (slides)
- Explicit parametric polymorphism (System F). Unification, Hindley-Milner type inference. Type soundness. (slides)
- Curry-Howard Correspondence. Recursive types. Subtyping. (slides)
- Something fun. (slides)