Nanopass Framework: Clean Compiler Creation Language

127 points by NordStreamYacht 5 days ago

I built a nanopass-like compiler using Clojure (plus a pattern-matching library[1] that I created) to compile a subset of Python all the way down to bytecode for Untether AI's inference accelerator, a RISC ISA extended with an at-memory compute array. The Python subset allowed full control of the array via type inference instead of intrinsic ops.

I learned how to build a compiler using the nanopass approach and I still think it was a good way to learn, but I would not build another real compiler that way.

Building a lot of passes was great for initial development of the compilation pipeline, but was terrible for maintenance and relatively limited for creating optimizations. Bugfixes in an early pass would frequently require changes across several subsequent ones, especially if any change in representation were required.

About 8 months in, I adopted a Sea of Nodes[2] approach for the optimizer, type inference, and scheduling but kept my nanopass ingestion, allocation and codegen passes. At the 12 month mark, the v1 compiler was functional but had limitations.

For the next leg of the project I decided to switch fully to Sea of Nodes. The final compiler was much better: orders of magnitude faster, more flexible, much easier to debug and test, with more features, and about the same amount of code.

Sea of Nodes takes the core idea of isolating passes to its logical conclusion by narrowing the scope of each change down to a single transformation on a single graph node. Operations are worklist based and can happen in any order, eliminating the pass ordering problem. The IR is a graph that follows simple and consistent semantics from parsing through all phases down to code generation. Going from 20+ IR dialects/sub-dialects to a single graph representation was a huge plus.
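
To make the worklist idea concrete, here is a toy sketch in Python. It is not taken from the compiler described above or from the Simple repository; the `Node` class, the three rewrite rules, and the `build` helper are all made up for illustration, and a real Sea of Nodes IR also carries control and memory edges that are omitted here. It only shows each rewrite touching a single node, with the users of a rewritten node pushed back onto the worklist, so the starting order does not change the result.

```python
from collections import deque

class Node:
    """One graph node; `users` are the nodes that consume this one."""
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, list(inputs), value
        self.users = set()
        for i in self.inputs:
            i.users.add(self)

def peephole(n):
    """Return a replacement for n, or None if no local rewrite applies."""
    if n.op in ("add", "mul"):
        a, b = n.inputs
        if a.op == "const" and b.op == "const":      # constant folding
            v = a.value + b.value if n.op == "add" else a.value * b.value
            return Node("const", value=v)
        if n.op == "add" and b.op == "const" and b.value == 0:
            return a                                 # x + 0 -> x (right operand only)
        if n.op == "mul" and b.op == "const" and b.value == 1:
            return a                                 # x * 1 -> x (right operand only)
    return None

def replace(old, new):
    """Point every user of `old` at `new`; return the users that changed."""
    changed = list(old.users)
    for u in changed:
        u.inputs = [new if i is old else i for i in u.inputs]
        new.users.add(u)
    old.users.clear()
    for i in old.inputs:
        i.users.discard(old)
    return changed

def optimize(root, nodes):
    """Run peepholes until the worklist is empty; return the (possibly new) root."""
    work = deque(nodes)
    while work:
        n = work.popleft()
        new = peephole(n)
        if new is None:
            continue
        changed = replace(n, new)
        if n is root:
            root = new
        work.extend(changed)      # only the affected users are revisited
    return root

def build():
    """(x * 1) + (2 + -2), which should simplify to x."""
    x = Node("param")
    one, two, neg2 = Node("const", value=1), Node("const", value=2), Node("const", value=-2)
    mul = Node("mul", [x, one])
    inner = Node("add", [two, neg2])
    root = Node("add", [mul, inner])
    return root, [x, one, two, neg2, mul, inner, root], x

root, nodes, x = build()
assert optimize(root, nodes) is x
root, nodes, x = build()
assert optimize(root, nodes[::-1]) is x   # reversed start order, same result
```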

[1] https://github.com/pangloss/pattern [2] https://github.com/SeaOfNodes/Simple

verdagon - 15 hours ago

I'm often skeptical of the desire to create a lot of passes. In the early Vale compiler, and in the Mojo compiler, we were paying a lot of interest on tech debt because features were put in the wrong pass. We often incurred more complexity trying to make a concept work across passes than we would have had in fewer, larger passes. I imagine this also has analogies to microservices in some way. Maybe other compiler people can weigh in here on the correct number/kind of passes.

munificent - 13 hours ago

I work on Dart. I don't work directly on the compilers much, but I've gathered from talking to my teammates who do that, yeah, generally fewer passes is better.
From a maintenance perspective, it's really appealing to have a lot of small clearly defined passes so that you can have good separation of concerns. But over time, they often end up needing to interact in complex ways anyway.
For example, you might think you can do lexical identifier resolution and type checking in separate passes. But then the language gets extension methods and now it's possible for a bare identifier inside a class declaration to refer to an extension method that can only be resolved once you have some static type information available.
Or maybe you want to do error reporting separately from resolution and type checking. But in practice, a lot of errors come from resolution failure, so the resolver has to do almost all of the work to report that error, stuff that resulting data somewhere the next pass can get to it, and then the error reporter looks for that and reports it.
Instead of nice clean separate passes, you sort of end up with one big tangled pass anyway, but architected poorly.
Also, the performance cost of multiple passes is not small. This is especially true if each pass is actually converting to a new representation.
- spankalee - 11 hours ago
  
  Hey - I used to be on your team long ago :)
  > For example, you might think you can do lexical identifier resolution and type checking in separate passes. But then the language gets extension methods and now it's possible for a bare identifier inside a class declaration to refer to an extension method that can only be resolved once you have some static type information available.
  Some of this is language design though. If you make it a requirement that scope analysis can be done in isolation (so that it's parallelizable), then you design things like imports and classes so that you never have identifiers depend on types or other libraries.
  I've been working on a language lately and thinking about passes - mostly where they run and what previous state they depend on - has been a big help in designing the language so the compiler can be parallel-, cache-, and incremental-friendly.
  - munificent - 6 hours ago
    
    > I used to be on your team long ago :)
    I miss getting to talk to you about music!
    > Some of this is language design though.
    Totally, but it's no fun to be stuck between users wanting some eminently useful feature and not being able to ship it because of your compiler architecture.
eholk - 10 hours ago

I used the Nanopass framework for my language when I was in grad school and loved it. I forget the exact count but I think we had 40 or 50 passes. Generally each pass either did some analysis that would be consumed by a later pass or it rewrote some higher level concept or feature in terms of more primitive operations.
Nanopass had a DSL for describing what forms you delete from each intermediate language and which forms you add. Each pass was generally pretty small then, usually just replacing one form with some set of other forms.
Because each pass was really small it was pretty easy to reorder them. Sometimes we figured out that if we moved one optimization sooner, then later passes worked better. Or we realized some analysis was useful somewhere else so we rearranged the passes again. The pass ordering made it really clear which analysis results are valid at each point in the compilation process.
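
The "list what you delete and what you add" idea can be sketched in a few lines of Python. This is not the Nanopass API (the real framework has its own DSL); `extend`, `check`, `remove_let`, and the tuple encoding of expressions are all hypothetical names chosen for this example. Each intermediate language is just the set of form names it allows, a pass rewrites one form, and a checker confirms the output really belongs to the next language.

```python
def extend(base, add=(), remove=()):
    """Derive a new intermediate language from `base` by listing only the differences."""
    missing = set(remove) - base
    if missing:
        raise ValueError(f"cannot remove forms not in base: {missing}")
    return (base - set(remove)) | set(add)

# Hypothetical languages: L1 is L0 without `let`.
L0 = frozenset({"const", "var", "lambda", "app", "let"})
L1 = extend(L0, remove={"let"})

def children(e):
    """Sub-expressions of a node; strings and ints are leaves (names, literals)."""
    return [c for c in e[1:] if isinstance(c, tuple)]

def forms(e):
    """Yield the form name of every node in the expression."""
    yield e[0]
    for c in children(e):
        yield from forms(c)

def check(e, lang):
    """Raise if `e` uses a form that `lang` does not allow; otherwise return `e`."""
    bad = set(forms(e)) - lang
    if bad:
        raise ValueError(f"forms not allowed here: {bad}")
    return e

def remove_let(e):
    """Pass L0 -> L1: rewrite (let x rhs body) as (app (lambda x body) rhs)."""
    if not isinstance(e, tuple):
        return e                          # a name or a literal
    e = (e[0],) + tuple(remove_let(c) for c in e[1:])
    if e[0] == "let":
        _, name, rhs, body = e
        return ("app", ("lambda", name, body), rhs)
    return e

prog = ("let", "x", ("const", 1), ("let", "y", ("var", "x"), ("var", "y")))
lowered = check(remove_let(prog), L1)     # passes: no `let` left
```

Because each pass declares which language it consumes and produces, reordering passes comes down to checking that the languages still line up.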
pfdietz - 14 hours ago

Yes, and a similar question is the organization of the thing being acted on by the passes. If I understand correctly, this is in Scheme and the things being acted on are trees with pointers. A performance-optimized compiler, on the other hand, will probably use some sort of array-based implementation of trees.
There's also a question of data about the trees (like, a flow graph) being recomputed for each nanopass. Also expensive.
- soegaard - 13 hours ago
  
  Nanopass uses structures internally to represent the programs.
  The Nanopass dsl just gives the user a nicer syntax to specify the transformations.
  - pfdietz - 11 hours ago
    
    So, a conventional linked representation of a tree (but not a tree of cons cells).
    
    - soegaard - 8 hours ago
    
    Yes.
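
For contrast with the linked representation confirmed above, this is roughly what an array-based tree looks like. It is a minimal Python sketch with invented names (`FlatTree`, `evaluate`), not code from any compiler mentioned in this thread. Nodes are integer indices into parallel arrays, and because children are always appended before their parents, an analysis can be a single forward sweep with no pointer chasing.

```python
class FlatTree:
    """Expression tree stored in parallel arrays; nodes are integer indices."""
    def __init__(self):
        self.kind = []     # "num", "add", or "mul"
        self.left = []     # index of the left child, or -1
        self.right = []    # index of the right child, or -1
        self.value = []    # literal payload for "num", else 0

    def add(self, kind, left=-1, right=-1, value=0):
        """Append a node and return its index."""
        self.kind.append(kind)
        self.left.append(left)
        self.right.append(right)
        self.value.append(value)
        return len(self.kind) - 1

def evaluate(t):
    """Children always precede parents, so one forward sweep evaluates every node."""
    out = [0] * len(t.kind)
    for i, k in enumerate(t.kind):
        if k == "num":
            out[i] = t.value[i]
        elif k == "add":
            out[i] = out[t.left[i]] + out[t.right[i]]
        elif k == "mul":
            out[i] = out[t.left[i]] * out[t.right[i]]
    return out

# (2 + 3) * 4
t = FlatTree()
two, three = t.add("num", value=2), t.add("num", value=3)
total = t.add("add", two, three)
four = t.add("num", value=4)
root = t.add("mul", total, four)
assert evaluate(t)[root] == 20
```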
armchairhacker - 12 hours ago

I wrote a small nanopass-style compiler (with many small passes) and had the same experience: lots of redundancy, and hard to debug because it wasn't clear how the passes interacted.
Admittedly, this framework has extensive metaprogramming (it's a Racket DSL), so it probably has much less redundancy, but is probably even harder to debug.
To an extent, it's impossible to implement nanopass in a way that's neither redundant nor confusing, without a projectional editor. Because either each pass's AST is fully redefined, which adds redundancy, or most passes' ASTs are defined in many places, which adds confusion. But my wild speculation is that one day projectional editors will gain popularity, and someone will make a compiler where one can observe and debug individual passes.
onlyrealcuzzo - 15 hours ago

Do you have an article on lessons learned?
I'm creating a language/compiler now, and I'm quite certain that I did not have enough passes initially, but I hope I'm at a good spot now - but time will tell.
- soegaard - 13 hours ago
  
  Skim
  https://andykeep.com/pubs/dissertation.pdf
  Also see this text:
  https://www.cs.umd.edu/class/fall2025/cmsc430/Notes.html
jnpnj - 14 hours ago

I wonder if there's some implicit wisdom that layering/modularizing incurs some communication cost that can cancel all the benefits.
- jasonjmcghee - 14 hours ago
  
  This is a question folks are asking about in terms of organization building too.
  Bottlenecks are changing and it's pretty interesting.
LegNeato - 13 hours ago

Why do passes anymore when we have invented egraphs?
- armchairhacker - 11 hours ago
  
  Egraphs are for optimization passes, which operate on the same IR, and (without egraphs) may undo and/or prevent each other and are repeated for more optimization.
  Nanopass is compiler passes, which each have their own IR, and run once in a fixed sequence.
  Egraphs also require the IR to be defined a specific way, which prevents some optimizations. My understanding of https://github.com/bytecodealliance/rfcs/blob/main/accepted/... is that cranelift’s egraph optimizations are pure expression reordering/duplication/deduplication and rewrite rules.
  - LegNeato - 3 hours ago
    
    There is no reason why compiler passes cannot be in an egraph, they are more general than optimizations. When you think about it, traditional compiler concerns like instruction selection are sort of an optimization problem if you squint.
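
For readers who have not met e-graphs, the core data structure discussed in this sub-thread fits in a short Python sketch. This is a deliberately naive version written for this thread: the `EGraph` class and its methods are made-up names, the rebuild is a brute-force loop, and there is no pattern matching or extraction, so it should not be read as how Cranelift or any e-graph library does it. What it does show is that merging two equal expressions keeps both forms, and anything built on top of them becomes equal automatically.

```python
class EGraph:
    """Toy e-graph: a union-find over e-classes plus a table of canonical e-nodes."""
    def __init__(self):
        self.parent = []     # union-find parent pointer per e-class id
        self.table = {}      # canonical e-node (op, *child_classes) -> e-class id

    def find(self, a):
        """Return the representative e-class id for `a`."""
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]   # path halving
            a = self.parent[a]
        return a

    def canon(self, node):
        """Rewrite an e-node so its children are representative ids."""
        op, *args = node
        return (op, *(self.find(a) for a in args))

    def add(self, op, *args):
        """Insert an e-node (children are e-class ids) and return its e-class."""
        node = self.canon((op, *args))
        if node not in self.table:
            self.parent.append(len(self.parent))
            self.table[node] = len(self.parent) - 1
        return self.find(self.table[node])

    def merge(self, a, b):
        """Declare e-classes a and b equal; both sets of e-nodes are kept."""
        a, b = self.find(a), self.find(b)
        if a != b:
            self.parent[b] = a
            self.rebuild()
        return self.find(a)

    def rebuild(self):
        """Restore congruence: e-nodes that became identical must share a class."""
        changed = True
        while changed:
            changed = False
            new_table = {}
            for node, cls in self.table.items():
                node, cls = self.canon(node), self.find(cls)
                other = self.find(new_table.setdefault(node, cls))
                if other != cls:
                    self.parent[cls] = other
                    changed = True
            self.table = new_table

g = EGraph()
a, one, two = g.add("a"), g.add("1"), g.add("2")
mul = g.add("*", a, two)          # a * 2
shl = g.add("<<", a, one)         # a << 1
f_mul, f_shl = g.add("f", mul), g.add("f", shl)
g.merge(mul, shl)                 # record a * 2 == a << 1 without deleting either
assert g.find(f_mul) == g.find(f_shl)   # f(a * 2) == f(a << 1) follows by congruence
```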
- skybrian - 12 hours ago
  
  It's the first I've heard of them. Looks like the research goes back to 1980, but good libraries seem fairly new?
  https://blog.sigplan.org/2021/04/06/equality-saturation-with...
- mncharity - 11 hours ago
  
  Recent related discussion/blog[1].
  [1] The acyclic e-graph: Cranelift's mid-end optimizer https://news.ycombinator.com/item?id=47717192

Panzerschrek - an hour ago

As a compiler developer I see no reason to use a lot of passes. My language's compiler is designed so that it does its frontend work (except tokenization and syntax analysis) in one big pass, which does everything necessary. Splitting this work into multiple passes is impossible due to language specifics - interleaving many compilation stages is necessary for many language features to function properly.

Also I doubt that a common framework can be used by many languages at all. Usually a mature language compiler is self-hosted, which makes it nearly impossible to incorporate a third-party library/framework written in some other language.

s20n - 14 hours ago

I agree with the notion that having multiple passes makes compilers easier to understand and maintain but finding the right number of passes is the real challenge here.

The optimal number of passes/IRs depends heavily on what language is being compiled. Some languages naturally warrant this kind of an architecture that would involve a lot of passes.

Compiling Scheme for instance would naturally entail several passes. It could look something like the following:

Lexer -> Parser -> Macro Expander -> Alpha Renaming -> Core AST (Lowering) -> CPS Transform -> Beta / Eta Reduction -> Closure Conversion -> Codegen
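
As a rough illustration of how such a pipeline hangs together, here is a Python sketch with only two of the stages above, alpha renaming and eta reduction, over a tiny lambda calculus. The tuple encoding, the function names, and the `trace` hook are assumptions made for this example; a real Scheme compiler would do far more in each stage.

```python
import itertools

# Expressions: ("var", name) | ("lambda", name, body) | ("app", fn, arg)

def alpha_rename(e, env=None, fresh=None):
    """Give every lambda-bound variable a unique name; free variables are untouched."""
    env = {} if env is None else env
    fresh = itertools.count() if fresh is None else fresh
    if e[0] == "var":
        return ("var", env.get(e[1], e[1]))
    if e[0] == "lambda":
        new = f"{e[1]}_{next(fresh)}"
        return ("lambda", new, alpha_rename(e[2], {**env, e[1]: new}, fresh))
    return ("app", alpha_rename(e[1], env, fresh), alpha_rename(e[2], env, fresh))

def free_vars(e):
    """Set of variable names that occur free in e."""
    if e[0] == "var":
        return {e[1]}
    if e[0] == "lambda":
        return free_vars(e[2]) - {e[1]}
    return free_vars(e[1]) | free_vars(e[2])

def eta_reduce(e):
    """Rewrite (lambda x (app f x)) to f when x is not free in f."""
    if e[0] == "var":
        return e
    if e[0] == "app":
        return ("app", eta_reduce(e[1]), eta_reduce(e[2]))
    body = eta_reduce(e[2])
    if body[0] == "app" and body[2] == ("var", e[1]) and e[1] not in free_vars(body[1]):
        return body[1]
    return ("lambda", e[1], body)

# The pipeline is plain data, so reordering or inserting a pass is a one-line change.
PIPELINE = [("alpha-rename", alpha_rename), ("eta-reduce", eta_reduce)]

def compile_expr(e, pipeline=PIPELINE, trace=None):
    """Run each pass in order; if `trace` is a list, record every intermediate result."""
    for name, run in pipeline:
        e = run(e)
        if trace is not None:
            trace.append((name, e))
    return e

steps = []
result = compile_expr(("lambda", "x", ("app", ("var", "f"), ("var", "x"))), trace=steps)
assert result == ("var", "f")
```

The `trace` list is one cheap way to see what each pass did, which speaks to the debugging concerns raised elsewhere in the thread.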

rezaprima - 5 hours ago

The Getting Started button only links to the owner's GitHub. I found that this link [1] gives better info.

[1] https://docs.racket-lang.org/nanopass/index.html

Mathnerd314 - 16 hours ago

website is not up to date, https://www.youtube.com/watch?v=lqVN1fGNpZw is not on there

vmsp - 14 hours ago

Wouldn't this kind of architecture yield a slower compiler, regardless of output quality? Conceptually, implementing the fewest passes possible, with each doing as much work as possible, would make more sense to me.

bjoli - 11 hours ago

Optimization level 2 in Chez Scheme does about 100 KLOC/s on my pretty modest machine, while also producing code that is pretty darn fast.
marssaxman - 12 hours ago

There is nothing stopping you from building an old-fashioned single-pass compiler, if compile time is your only concern. The code it generates just wouldn't be very good.
- norir - 10 hours ago
  
  This highly depends on the language and your skill as a compiler writer. You can write a single pass assembler that generates great code but you have to of course write the low level code yourself (including manual register assignment). To do decent automatic register assignment, I agree you need at least two passes, but not 10 or more.
- bsder - an hour ago
  
  > There is nothing stopping you from building an old-fashioned single-pass compiler, if compile time is your only concern. The code it generates just wouldn't be very good.
  Define "very good".
  The very simplest optimizations get you almost all the benefit. Proebsting's Law applies: Compiler Advances Double Computing Power Every 18 Years

presz - 13 hours ago

We really need to get more designers interested in Scheme, because that logo is awful

ape4 - 12 hours ago

Can it only make compilers for Lisp-like languages?
