Writing An Interpreter In Rust

18 Apr, 2021

Overview

I built an interpreter for the Monkey programming language following Thorsten Ball’s book on Writing An Interpreter In Go. This was my first project after reading The Rust Programming Language, and it took roughly 30 hours.

Throughout the project, my confidence and skill using Rust improved significantly as I gained experience using many of the essential features of Rust:

Functions, enums, and structs.
Option and Result types.
Error handling best practices using the ? operator.
Standard library traits such as Iterator.
References, ownership, and the borrow checker.
Smart pointers such as Box, Rc and RefCell.

The book also provides source code in Go, which allowed me to focus my energy on translating the high level ideas into Rust. I found the book to be very approachable and engaging. I would recommend it to anyone wanting an introduction to interpreters or to anyone who wants an accessible Rust/Go project.

Demo

The interpreter has been compiled to WebAssembly and you can try it out below! Rust has excellent support for WebAssembly, the commit to compile to WebAssembly only took around 15 lines of code.

Input (editable):

Output:

Rust Highlights

Enums

Enums were particularly useful for defining Tokens, Expressions, and Objects. The Expression enum represents a node in the Abstract Syntax Tree. During evaluation, rather than chaining if/else statements or using some sort of lookup table, we can match on the enum to ensure sure we handle all possible enum variants compile time.

#[derive(Debug, PartialEq, Eq, Clone)]
pub enum Expr {
    Identifier(String),
    Integer(i32),
    Str(String),
    Boolean(bool),
    Array(Vec<Expr>),
    Hash(Vec<(Expr, Expr)>),
    Index(Box<Expr>, Box<Expr>),
    Prefix(Token, Box<Expr>),
    Infix(Box<Expr>, Token, Box<Expr>),
    ...
}

fn eval_expression(expression: &Expr, env: MutEnv) -> Result<Object> {
    match expression {
        Expr::Integer(i) => Ok(Object::Integer(*i)),
        Expr::Boolean(b) => Ok(Object::Boolean(*b)),
        Expr::Str(s) => Ok(Object::Str(s.clone())),
        Expr::Array(items) => Ok(Object::Array(eval_expressions(items, env)?)),
        Expr::Hash(items) => Ok(eval_hash_map(items, env)?),
        Expr::Index(left, index) => eval_index_expression(left, index, env),
        Expr::Prefix(operator, expression) => {
            eval_prefix_expression(operator, eval_expression(expression, env)?)
        }
        Expr::Infix(left, operator, right) => {
            let left = eval_expression(left, Rc::clone(&env))?;
            let right = eval_expression(right, env)?;
            eval_infix_expression(operator, left, right)
        } 
        ...
   }
}

Smart Pointers and Interior Mutability

Rust reference rules dictate that:

At any given time, you can have either one mutable reference or any number of immutable references.

This proves to be a challenge when creating function scopes. Each function needs to be able to access the environment in which they are defined. This can be accomplished by creating a new environment that holds a reference to the outer environment. However, since every environment must be mutable in exactly one scope, this reference violates Rust’s reference rules.

We can get around this restriction by using Rc and RefCell to allow multiple references with interior mutability. Our Environment is a recursive type and heap allocated smart pointers let us avoid dealing with lifetimes. This greatly simplifies the code and is an acceptable trade off for this project.

pub type MutEnv = Rc<RefCell<Environment>>;

#[derive(Debug, PartialEq, Eq, Clone)]
pub struct Environment {
    store: HashMap<String, Object>,
    outer: Option<MutEnv>,
}

impl Environment {
   pub fn get(&self, ident: &str) -> Option<Object> {
       match self.store.get(ident) {
           Some(object) => Some(object.clone()),
           None => self
               .outer
               .as_ref()
               .and_then(|outer| outer.borrow().get(ident)),
       }
   }
}

Functions environments now behave as expected.

let a = 1;
let a_fn = fn(){ a };
a      // output: 1
a_fn() // output: 1

// calling no_change does not affect the outer environment
let no_change = fn(){ let a = 2; a };
no_change() // output: 2
a           // output: 1
a_fn()      // output: 1

// f_fn holds a reference to the outer environment
// so changing a in the outer environment will change the output of a_fn()
let a = 3;
a      // output: 3
a_fn() // output: 3 as f_fn holds a reference to the outer environment

This behavior is typical of many programming languages. Here is an example in Python illustrating the same idea.

a = 1
def a_fn():
    return a
print(a)      # 1
print(a_fn()) # 1

def no_change():
    a = 2
    return a
print(no_change()) # 2
print(a)           # 1
print(a_fn())      # 1

a = 3
print(a)      # 3
print(a_fn()) # 3

#Rust #Interpreter #Monkey