Revisiting "Let's Build a Compiler"

eli.thegreenplace.net

270 points by cui a day ago


ernst_klim - a day ago

> Rather than getting stuck in front-end minutiae, the tutorial goes straight to generating working assembly code, from very early on.

I think this is important, and for a more sophisticated compiler design I find Ghuloum's approach very appealing [1], i.e. build a very simple subset of the language from top to bottom and then grow the meat gradually.

A really great book following this approach that I discovered recently is [2]. Although I find neither C nor x86 the best target for your first compiler, it's still a very good book for writing one.

[1] http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf

[2] https://norasandler.com/2024/08/20/The-Book-Is-Here.html

statictype - a day ago

This article sums it up perfectly. I was interested in building a compiler long before going to college and this was the most accessible body of work.

Building a recursive descent parser from scratch was an eye-opener for 17yo me: a seemingly very complex problem that I had no idea how to approach could be made simple by breaking it down into the right primitives.
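
For anyone who hasn't seen the trick, those primitives are literally one small function per grammar rule. A rough Python sketch (the toy grammar and the evaluate-while-parsing style are illustrative assumptions, in the Crenshaw spirit rather than anything from the article):

    # A minimal recursive-descent calculator: one function per grammar rule.
    #   expr   -> term (('+' | '-') term)*
    #   term   -> factor (('*' | '/') factor)*
    #   factor -> NUMBER | '(' expr ')'
    # It evaluates while it parses, with no AST or IR in between.

    def parse(src):
        for ch in "()+-*/":
            src = src.replace(ch, f" {ch} ")
        tokens = src.split()
        pos = 0

        def peek():
            return tokens[pos] if pos < len(tokens) else None

        def eat(expected=None):
            nonlocal pos
            tok = tokens[pos]
            if expected is not None and tok != expected:
                raise SyntaxError(f"expected {expected!r}, got {tok!r}")
            pos += 1
            return tok

        def factor():
            if peek() == '(':
                eat('(')
                value = expr()
                eat(')')
                return value
            return int(eat())

        def term():
            value = factor()
            while peek() in ('*', '/'):
                op = eat()
                rhs = factor()
                value = value * rhs if op == '*' else value // rhs
            return value

        def expr():
            value = term()
            while peek() in ('+', '-'):
                op = eat()
                rhs = term()
                value = value + rhs if op == '+' else value - rhs
            return value

        result = expr()
        if peek() is not None:
            raise SyntaxError(f"unexpected trailing token {peek()!r}")
        return result

    print(parse("2 + 3 * (4 - 1)"))   # prints 11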

pansa2 - a day ago

> Jack Crenshaw's tutorial takes the syntax-directed translation approach, where code is emitted while parsing, without having to divide the compiler into explicit phases with IRs.

Is "syntax-directed translation" just another term for a single-pass compiler, e.g. as used by Lua (albeit to generate bytecode instead of assembly / machine code)? Or is it something more specific?

> in the latter parts of the tutorial it starts showing its limitations. Especially once we get to types [...] it's easy to generate working code; it's just not easy to generate optimal code

So, using a single-pass compiler for a statically-typed language makes it difficult to apply type-based optimizations. (Of course, Lua sidesteps this problem because the language is dynamically typed.)

Are there any other downsides? Does single-pass compilation also restrict the level of type checking that can be performed?

userbinator - 15 hours ago

Crenshaw's tutorial doesn't quite reach the point where it would pay off, but precedence climbing is a rather straightforward refactoring of recursive descent that becomes extremely useful once there are many precedence levels.

Naturally, the author has also written an article about it, albeit without the explanation of the refactoring that links it to RD:

https://eli.thegreenplace.net/2012/08/02/parsing-expressions...
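
Roughly, the refactoring collapses the tower of one-function-per-precedence-level into a single loop driven by a precedence table. A Python sketch of the idea (the operator table, the right-associative `^`, and the AST-tuple output are my own choices, not taken from the linked article):

    # Precedence climbing: one function handles all binary-operator levels,
    # guided by a table, instead of one recursive-descent function per level.

    PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2, '^': 3}
    RIGHT_ASSOC = {'^'}

    def parse_expr(tokens, pos=0, min_prec=1):
        lhs, pos = parse_atom(tokens, pos)
        while (pos < len(tokens) and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_prec):
            op = tokens[pos]
            prec = PRECEDENCE[op]
            next_min = prec if op in RIGHT_ASSOC else prec + 1
            rhs, pos = parse_expr(tokens, pos + 1, next_min)
            lhs = (op, lhs, rhs)          # build an AST node
        return lhs, pos

    def parse_atom(tokens, pos):
        if tokens[pos] == '(':
            node, pos = parse_expr(tokens, pos + 1)
            assert tokens[pos] == ')', "missing closing paren"
            return node, pos + 1
        return int(tokens[pos]), pos + 1

    tree, _ = parse_expr("1 + 2 * 3 ^ 4 ^ 5".split())
    print(tree)   # ('+', 1, ('*', 2, ('^', 3, ('^', 4, 5))))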

dang - a day ago

Revisiting "Let's Build a Compiler" threads:

Let's Build a Compiler (1988) - https://news.ycombinator.com/item?id=38773049 - Dec 2023 (15 comments)

Let's Build a Compiler (1988) - https://news.ycombinator.com/item?id=36054416 - May 2023 (19 comments)

Let’s Build a Compiler (1995) - https://news.ycombinator.com/item?id=22346532 - Feb 2020 (41 comments)

Let's Build a Compiler - https://news.ycombinator.com/item?id=20444474 - July 2019 (47 comments)

Let's Build a Compiler (1995) - https://news.ycombinator.com/item?id=19890918 - May 2019 (18 comments)

Let’s Build a Compiler (1995) - https://news.ycombinator.com/item?id=6641117 - Oct 2013 (56 comments)

Let’s Build a Compiler (1995) - https://news.ycombinator.com/item?id=1727004 - Sept 2010 (17 comments)

Let’s Build a Compiler (1995) - https://news.ycombinator.com/item?id=232024 - June 2008 (5 comments and already complaining about reposts)

Let's build a compiler (dated, but very good) - https://news.ycombinator.com/item?id=63004 - Oct 2007 (2 comments)

It seems there aren't any (interesting) others? I expected more.

But there is this bonus:

An Interview with Jack Crenshaw, Author of the “Let’s Build a Compiler” - https://news.ycombinator.com/item?id=9502977 - May 2015 (0 comments, but good article!)

muth02446 - a day ago

Shameless plug: http://cwerg.org

Pros:

* uses Python and recursive descent parsing
* separates the front end and back end via an IR
* generates ELF binaries (either x86 or ARM)
* meant for real-world use

Cons:

* more complex
* not written in a tutorial style

pjmlp - a day ago

This brings back memories; I got to learn from Jack Crenshaw's tutorial via the comp.compilers newsgroup on USENET.

Which, by the way, is still active: https://compilers.iecc.com/index.phtml

hashtag-til - a day ago

For a modern compiler and a more direct approach, I recommend https://www.cs.cornell.edu/~asampson/blog/llvm.html

bollu - a day ago

Following similar reasoning, I wound up writing a tiny optimizing-compiler tutorial that only explains how to write the middle end and back end of a compiler: https://github.com/bollu/tiny-optimising-compiler

HumblyTossed - a day ago

I printed this out (so I could have it with me everywhere) and read it when I was younger. It was so cool to see it come together so quickly. Some of these works (this one, Beej's guides, etc) are some of the best CS documentation we have and don't get nearly the credit they deserve.

shoo - a day ago

> Rather than getting stuck in front-end minutiae, the tutorial goes straight to generating working assembly code, from very early on

Good summary.

I had no background in compilers or related theory, but I read Jack Crenshaw's Let's Build a Compiler tutorials some time ago. My main takeaway from reading half a dozen or so of them was that building a simple compiler for a toy language was a small project well within my grasp and ability, not a huge undertaking that required mastery of esoteric prerequisites or a large amount of planning.

I got a lot of enjoyment messing about with toy compiler projects related to Brainfuck.

Why Brainfuck? It's a beautiful little toy language. Brainfuck has 8 instructions, each instruction is 1 character, so parsing reduces to getting a char and switching on it. I guess it depends on what you want to explore. If you want to focus on writing recursive descent parsers, not the best choice!

One initial project could be to compile (transpile) from Brainfuck source to C source. You can do this as a source-to-source compiler without any internal representation by transforming each Brainfuck operation into a corresponding C statement. Brainfuck is specified in terms of a single fixed-length array of bytes, a pointer (an index into that array) that can be moved around, and basic manipulations of the byte it is pointing at. So on the C side you need two variables: one for the array and one for the index that serves as the pointer.
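
The whole thing fits in a page. A sketch in Python (the 30,000-cell tape and the `tape`/`p` names are just conventional choices, not part of any spec):

    # Sketch of a Brainfuck -> C source-to-source compiler: each Brainfuck
    # operation maps directly to one C statement, with no IR in between.
    import sys

    BF_TO_C = {
        '>': 'p++;',
        '<': 'p--;',
        '+': 'tape[p]++;',
        '-': 'tape[p]--;',
        '.': 'putchar(tape[p]);',
        ',': 'tape[p] = getchar();',
        '[': 'while (tape[p]) {',
        ']': '}',
    }

    def bf_to_c(source):
        lines = ['#include <stdio.h>',
                 'unsigned char tape[30000];',   # common tape-size convention
                 'int p = 0;',
                 'int main(void) {']
        lines += ['    ' + BF_TO_C[ch] for ch in source if ch in BF_TO_C]
        lines += ['    return 0;', '}']
        return '\n'.join(lines)

    if __name__ == '__main__':
        print(bf_to_c(sys.stdin.read()))

Feed it a .bf file on stdin and hand the output to any C compiler.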

A second project could be compiling from Brainfuck to assembly language, skipping C. You'd need to read a few tutorials/reference docs about your chosen assembly language and learn how to run the assembler to turn tiny assembly programs into native executables. You could explore what assembly output you get when you compile small Brainfuck programs to C and then compile those C programs to assembly. You could write a direct source-to-source compiler without an internal representation, where each Brainfuck operation maps to a snippet of assembly instructions. Once you've got this working, you can compile a Brainfuck program into an assembly program, then use the usual toolchain to assemble that into a native executable and run it.
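
The mapping table looks much the same as in the C version, just with assembly snippets as values; the only wrinkle is that '[' and ']' need generated label pairs. A sketch of that part (x86-64 GAS syntax with %rbx as the tape pointer; register and label choices are mine, and I/O plus the prologue/epilogue that sets up the tape are omitted):

    # Each Brainfuck op becomes a snippet of x86-64 assembly. A stack of
    # generated labels pairs up '[' and ']'.
    SIMPLE_OPS = {
        '>': '    inc %rbx',
        '<': '    dec %rbx',
        '+': '    incb (%rbx)',
        '-': '    decb (%rbx)',
    }

    def compile_body(source):
        out, open_loops, counter = [], [], 0
        for ch in source:
            if ch in SIMPLE_OPS:
                out.append(SIMPLE_OPS[ch])
            elif ch == '[':
                counter += 1
                open_loops.append(counter)
                out += [f'loop_{counter}:',
                        '    cmpb $0, (%rbx)',
                        f'    je   end_{counter}']
            elif ch == ']':
                n = open_loops.pop()
                out += [f'    jmp  loop_{n}', f'end_{n}:']
        return '\n'.join(out)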

There are also lots of projects in another direction, treating Brainfuck as the target language. Imagine that your job is to write Brainfuck programs for a CPU that natively executes Brainfuck. Try writing a few tiny Brainfuck programs by hand and savour how trying to do almost anything involves solving horrible little puzzles. Maybe it'd be much easier to do your job if you, the Brainfuck programmer, didn't have to manually track which index of the array is used to store what. You could invent a higher-level language supporting concepts like local variables, where you could add two local variables together and store the result in a third! Maybe you could allow the programmer to define and call their own functions! Maybe you could support `if` blocks and comparisons! You could have a compiler that manages the book-keeping of memory allocation and maps complex high-level abstractions such as integer addition onto native Brainfuck concepts of adding one to things and moving left or right. Projects in this direction let you explore more stuff about parsers (the input syntax for your higher-level language is richer), internal representations, scopes and so on.
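
To make that concrete: suppose the compiler has assigned local `a` to cell 0 and local `b` to cell 1 (both the names and the cell allocation are made up for the example). Emitting code for `b += a` then reduces to draining one cell into the other:

    # Emit Brainfuck for "dst += src" (destructive: src ends up zero), given
    # which tape cell the compiler has assigned to each variable. Assumes the
    # pointer starts at cell 0 and returns there afterwards.
    def emit_add(cells, dst, src):
        def walk(frm, to):                  # pointer movement between two cells
            d = to - frm
            return ('>' if d > 0 else '<') * abs(d)

        s, d = cells[src], cells[dst]
        # while src != 0: src -= 1; dst += 1   (the classic [->+<] pattern)
        return (walk(0, s) + '[-' + walk(s, d) + '+' + walk(d, s) + ']'
                + walk(s, 0))

    print(emit_add({'a': 0, 'b': 1}, dst='b', src='a'))   # prints [->+<]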

anthk - a day ago

https://t3x.org has several examples of creating a C compiler, a Lisp, and more.

The T3X language itself is very Pascal-like and fun to use (and portable: DOS/Win/Unix/CP/M...).

Also, as an intro, Zenlisp gives you a mini CS-101 a la SICP or CACS, but simpler and explained in a much more approachable way.

azhenley - a day ago

I loved that tutorial! It got me started down this path.

anthk - a day ago

The easiest example: an interpreter for the Subleq VM. One instruction. Literally three or four lines, plus three more (if any) for I/O.

https://github.com/howerj/subleq/
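
For a sense of just how small: the whole execution loop is essentially the sketch below (word size, memory bounds, and the "-1 address means I/O" convention are simplified assumptions; the linked repo may differ in details):

    import sys

    # Minimal Subleq interpreter: the single instruction is "subtract and
    # branch if the result is <= 0". Each instruction occupies three cells.
    def run(mem):
        pc = 0
        while 0 <= pc < len(mem):
            a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
            if a == -1:                               # -1 as source: read a char
                mem[b] = ord(sys.stdin.read(1) or '\0')
                pc += 3
            elif b == -1:                             # -1 as destination: write one
                sys.stdout.write(chr(mem[a] & 0xff))
                pc += 3
            else:
                mem[b] -= mem[a]
                pc = c if mem[b] <= 0 else pc + 3

    # Tiny demo: print "Hi", then halt by jumping to a negative address.
    run([9, -1, 3, 10, -1, 6, 0, 0, -1, ord('H'), ord('i')])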

As a goodie, you can run eForth on top, which almost writes itself: compiler, interpreter, editor, IDE and a Sokoban, all in a simple VM.

Let's scale up: Mu808/n808, with interpreters in C and AWK and a compiler in Python.

https://codeberg.org/luxferre/n808

The page gives you the exact assembly algorithm; what you see is what you get. Now, for something more real, I'd suggest the lvltl (VTL-02) interpreter written in C: a "bigger" language that runs not just under a VM, but on small machines and simulators such as the 6502-based KIM-1 and Apple 1. With that "not enough to be called Basic" language, a Sokoban could be written with a bit of patience.