Crafting Interpreters
Crafting Interpreters is a book by Robert Nystrom about programming language design and how to build your own full feature scripting language.
Why Learn About Programming Language Design?
- You may need to write a DSL to fit your specific needs.
- Great test of programming skill
- It’s empowering to learn how programming languages work under the hood and build your own programming language
Interpreting a Programing Language at a high level
Starting from the source code - a raw collection of text characters - and ending in machine code that can be run by a computer:
- Scanning (also lexing or lexical analysis). Transforms a collection of characters into tokens that are significant to the programming language.
- Parsing. Parses the tokens into a tree structure called Abstract Syntax Tree (AST) in an attempt to have the source code fit into the grammar defined by the programming language. If the parser fails to have the source code fit the grammar it reports one or several syntax errors.
- Static Analysis. During this phase we do variable resolution (binding). For each identifier we try to find where it is defined and whether it is available in the current scope. If the language is statically typed we do type checking and may throw type errors. Having obtained more information about the semantics of the code (what it means) we store it back in the AST, in a symbol table or an intermediate representation.
- Optimizations. When an interpreter or compiler understands the semantics of the code it is free to optimize it by changing the code for a more efficient alternative that performs the same task (has the same semantics). A common and simple example is constant folding, where a calculation is folded into a single constant value.
- Code generation. The next step of the process is to generate code that a machien can run. This can be bytecode that can run on a virtual machine or machine code that runs of a specific chip.
- Runtime. The last step is to run the actual code. In order to provide a good developer experience there will be a number of services that will be running alongside the program that is executed (e.g. garbage collection). These services are normally referred to as “the language runtime”.
# Starting with a raw collection of characters
const x = (y * 2)/ 10;
# 1. Scanning breaks these into tokens
[const] [x] [=] [(] [y] [*] [2] [)] [/] [10] [;]
# 2. Parsing transforms the tokens into an Abstract syntax tree
# according to the grammar of the programming language.
[x]
|
[/]
/ \
[*] [10]
/ \
[y] [2]
Parts of a compiler
One can think of a compiler as a pipeline with different parts:
- front-end: specific to the source language the code is written in
- back-end: tied to the final architecture where the program will run
- mid-end: an interface between the two above that works on an intermediate representation. This allows you to support multiple source languages and target platforms.
Compilers vs Interpreters
- A compiler transforms a source language into another form, normally at a lower level. A compiler doesn’t execute code, it only transforms it. (e.g. The TypeScript compiler)
- An interpreter consumes source code and executes it immediately. (e.g. Ignition, the V8 JavaScript interpreter)
The Lox language spec (ish)
But I’ve looked at a lot of code written in prototypal languages—including some of my own devising. Do you know what people generally do with all of the power and flexibility of prototypes? … They use them to reinvent classes.
I don’t know why that is, but people naturally seem to prefer a class-based (Classic? Classy?) style. Prototypes are simpler in the language, but they seem to accomplish that only by pushing the complexity onto the user. So, for Lox, we’ll save our users the trouble and bake classes right in.
- Robert Nystrom about prototypical inheritance
Tree Walk Interpreter
See https://github.com/Vintharas/tslox for an implementation in TypeScript.
Scanner
The Scanner or lexer turns a the raw source code comprised of a series of characters and turns them into tokens that are meaningful in the context of the programming language’s grammar.
Interesting Resources
- Crafting interpreters on Github
- A TypeScript implementation of Lox
- Lox implementations in other languages than Java or C
- The Curry-Howard isomorphism
- The Dragon Book
- Structure and Interpretation of Computer Programs
- Yacc - Yet another compiler-compiler.
- List of languages that compile to JavaScript
- Thoughts
- TypeScript as an open source example of a compiler that can be used as inspiration.
- The next 700 programming languages
Written by Jaime González García , dad, husband, software engineer, ux designer, amateur pixel artist, tinkerer and master of the arcane arts. You can also find him on Twitter jabbering about random stuff.