Nerd: A language for LLMs, not humans

nerd-lang.org

48 points by gnanagurusrgs 3 hours ago


kenferry - 3 hours ago

Seems like engagement bait or a thought exercise more than a realistic project.

> "But I need to debug!"

> Do you debug JVM bytecode? V8's internals? No. You debug at your abstraction layer. If that layer is natural language, debugging becomes: "Hey Claude, the login is failing for users with + in their email."

Folks can get away without reading assembly only when the compiler is reliable. English -> code compilation by LLMs is not reliable. It will become more reliable, but (a) it isn't now, so I guess this is a project to "provoke thought"; (b) you're going to need several nines of reliability, which I would bet against in any sane timeframe; and (c) English isn't well specified enough to have a "correct" compilation, so it's unclear whether "several nines of reliability" is even theoretically possible.

knlb - 3 hours ago

> Do you debug JVM bytecode? V8's internals? No. You debug at your abstraction layer

In the fullness of time, you end up having to. Or at least I have. Which is why I always dislike additional layers and transforms at this point.

(e.g. when I think about React Native on Android, I hear "now I'll have to be excellent at React/JavaScript and Android/Java/Kotlin and C++ to be able to debug the bridge", not "I can get away with just JavaScript".)

synalx - 3 hours ago

One major disadvantage here is the lack of training data on a "new" language, even if it's more efficient. At least in the short term, this means needing to teach the LLM your language in the context window.

I've spent a good bit of time exploring this space in the context of web frameworks and templating languages. One technique that's been highly effective is starting with a _very_ minimal language with only the most basic concepts. Describe that to the LLM, ask it to solve a small scale problem (which the language is likely not yet capable of doing), and see what kinds of APIs or syntax it hallucinates. Then add that to your language, and repeat. Obviously there's room for adjustment along the way, but we've found this process is able to cut many many lines from the system prompts that are otherwise needed to explain new syntax styles to the LLM.
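For illustration, a rough sketch of that loop in Python. The ask_llm helper here is hypothetical (a stand-in for whatever completion API you're using), and the step where the model's inventions get folded back into the spec is a manual judgment call in practice, not something you'd automate:

    # Grow a tiny language spec by watching what the model invents beyond it.
    def ask_llm(prompt: str) -> str:
        # Hypothetical: call your chat-completion API of choice here.
        raise NotImplementedError

    spec = "Language: integers, strings, `let` bindings, `print`. Nothing else."
    tasks = [
        "sum the numbers in a list",
        "read a file and count its lines",
        "define a record with two fields and print one of them",
    ]

    for task in tasks:
        prompt = f"Here is the entire language spec:\n{spec}\n\nWrite a program to {task}."
        attempt = ask_llm(prompt)
        print(attempt)  # inspect the hallucinated syntax/APIs, then manually
                        # fold the useful inventions back into `spec` and repeat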

gnanagurusrgs - 3 hours ago

Creator here. This started as a dumb question while using Claude Code: "Why is Claude writing TypeScript I'm supposed to read?"

40% of code is now machine-written. That number's only going up. So I spent some weekends asking: what would an intermediate language look like if we stopped pretending humans are the authors?

NERD is the experiment.

Bootstrap compiler works, compiles to native via LLVM. It's rough, probably wrong in interesting ways, but it runs. Could be a terrible idea. Could be onto something. Either way, it was a fun rabbit hole.

Contributors welcome if this seems interesting to you - early stage, lots to figure out: https://github.com/Nerd-Lang/nerd-lang-core

Happy to chat about design decisions or argue about whether this makes any sense at all.

dlenski - 3 hours ago

This is a 21st-century equivalent of leaving short words ("of", "the", "in") out of telegrams because telegraph operators charged by the word. That caused plenty of problems in comprehension… this is probably much worse because it's being applied to extremely complex and highly structured messages.

It seems like a short-sighted solution to a problem that is either transient or negligible in the long run. "Make code nearly unreadable to deal with inefficient tokenization and/or a weird cost model for LLMs."

I strongly question the idea that code can be effectively audited by humans if it can't be read by humans.

perons - 3 hours ago

That looks to me like Forth with extra steps and less clarity? Not sure why I'd choose it over something with the same semantic advantages ("terse English", but in a programming language), when it's just aggressively worse for a human operator to debug.

al_borland - 3 hours ago

> Do you debug JVM bytecode? V8's internals? No. You debug at your abstraction layer. If that layer is natural language, debugging becomes: "Hey Claude, the login is failing for users with + in their email."

I've run into countless situations where this simply doesn't work. I once had a simple off-by-one error and the AI could not fix it. I tried explaining the end result of what I was seeing, as this example implies, with no luck. I then found out why it was happening myself and explained the exact problem and where it was, and the AI still couldn't fix it. It kept sloshing back and forth between various solutions, compounding complexity that didn't help the issue. I ended up manually fixing the problem in the code.

The AI needs to be nearly flawless before this is viable. I feel like we are still a long way away from that.

DSMan195276 - 3 hours ago

> Do you debug JVM bytecode? V8's internals? No.

I can't speak for the author, but I do often do this. IMO it's a misleading comparison though: you rarely have to debug those things because the compiler rarely emits code that is incorrect relative to the code you provided. It's not so simple with an LLM.

wilsonnb3 - 3 hours ago

> Do you debug JVM bytecode? V8's internals? No. You debug at your abstraction layer. If that layer is natural language, debugging becomes: "Hey Claude, the login is failing for users with + in their email."

I debug at my abstraction layer because I can trust that my compiler actually works. LLMs are fundamentally different, which is why they need to produce human-readable code.

ksec - an hour ago

Edited for Formatting.

My take on the timeline (roughly; I think some of these fall in between, but it's probably best not to be too picky about it):

1950s: Machine code

1960s: Assembly

1970s: C

1980s: C++

1990s: Java

2000s: Perl / PHP / Python / Ruby

2010s: JavaScript / frameworks

2020s: AI writes, humans review

The idea is quite clear once this is written out: we move to a higher level of abstraction every ten years. In essence we are heading in the low-code / no-code direction.

The idea of languages for AI-assisted programming isn't new. I've heard at least a few people say maybe this will help Ruby (or Nim), or whichever programming language most closely resembles English.

And considering we now read code more than we ever write it, since with LLMs we are mostly reviewing, I wonder whether this will also change our preferences for how code is written.

I think we are in a whole different era now, and a lot of old assumptions we have about PLs may need a rethink. Could procedural programming and Pascal make a comeback, or could we see a resurgence of Smalltalk-style OO programming?

xlbuttplug2 - 3 hours ago

I have the exact opposite prediction. LLMs may end up writing most code, but humans will still want to review what's being written. We should instead be making life easier for humans with more verbose syntax since it's all the same to an LLM. Information dense code is fun to write but not so much to read.

nrhrjrjrjtntbt - 3 hours ago

Feels like a dead-end optimisation, a la the bitter lesson.

No LLM has seen anywhere near as much of this language as it has Python, and context is increasingly going to be mostly wordy, not codey (e.g. docs, specs, etc.).

Animats - 3 hours ago

The question is whether this language can be well understood by LLMs. Lack of declarations seems a mistake. The LLM will have to infer types, which is not something LLMs are good at. Most of the documentation is missing, so there's no way to tell how this handles data structures.

A programming language for LLMs isn't a bad idea, but this doesn't look like a good one.

lovidico - an hour ago

How can a language be both human-unfriendly and sanely auditable? The kinds of issues that require human intervention in LLM output are overwhelmingly ones where the LLM depends on the human to detect things it cannot. It seems to break the loop if the human can't understand the output well.

zaptheimpaler - an hour ago

I get the idea, but this language seems to be terrible for humans while not having a lot of benefits for LLMs beyond keeping keywords in single tokens. And I bet that one or two layers into an LLM, a keyword being two tokens doesn't really matter anymore.

tom_ - 2 hours ago

The space-separated function call examples could do with an example of arity two or more, I think. How do you do something like pow(x+y, 1/z)? I guess it must be "math pow x plus y 1 over z"? But then, the sqrt example has a level of nesting removed, and perhaps that's supposed to be the general case, so actually it would have to be "math pow a b" and you'd need to set up a and b accordingly. I'm possibly just old-fashioned and out of touch.

munchler - 3 hours ago

By this logic, shouldn’t you be prompting an LLM to design the language and write the compiler itself?

killingtime74 - 2 hours ago

Why not write in LLVM IR then? Or JVM/CLR bytecode? It makes no sense for the code to be unreadable and still need to be compiled.

agnishom - 2 hours ago

> LLMs tokenize English words efficiently. Symbols like {, }, === fragment into multiple tokens. Words like "plus", "minus", "if" are single tokens.

The insight seems flawed. I think LLMs are just as capable of understanding these symbols as tokens as they are at understanding English words. I am not convinced this is a better idea than writing code with a ton of comments.
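For what it's worth, this is easy to check empirically. A minimal sketch, assuming OpenAI's tiktoken library and its cl100k_base encoding (the actual fragmentation depends on whichever tokenizer your model uses):

    import tiktoken

    # Print how many tokens each snippet occupies under one common encoding.
    enc = tiktoken.get_encoding("cl100k_base")
    for s in ["{", "}", "===", "plus", "minus", "if", "const x = {};"]:
        tokens = enc.encode(s)
        print(f"{s!r}: {len(tokens)} token(s) -> {tokens}")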

dgreensp - 3 hours ago

A curly brace is multiple tokens? Even in models trained to read and write code? Even if true, I’m not sure how much that matters, but if it does, it can be fixed.

Imagine saying existing human languages like English are “inefficient” for LLMs so we need to invent a new language. The whole thing LLMs are good at is producing output that resembles their training data, right?

CGamesPlay - 3 hours ago

If you're going to set TypeScript as the bar, why not a bidirectional transpile-to-NERD layer? That way you get to see how the LLM handles your experiment, don't have to write a whole new language, and can integrate with an existing ecosystem for free.

azhenley - 3 hours ago

Aren't there many programming languages not built for humans? They're built for compilers.

ekinertac - 3 hours ago

The real question isn't "should AI write readable code" but "where in the stack does human comprehension become necessary?" We already have layers where machine-optimized formats dominate (bytecode, machine code, optimized IR). The source layer stays readable because it's the interface where human judgment enters.

Maybe AI should write code that's more readable than what humans write: more consistent naming, clearer structure, better comments, precisely because humans only "skim". Optimize for skimmability and debuggability, not keystroke efficiency.

joegibbs - 3 hours ago

Would it make more sense to instead train a model that tokenises language syntax differently, so that whitespace isn't counted, each keyword is a single token, and so on?
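For what it's worth, something weaker than training from scratch is already possible: most tokenizer libraries let you register new whole-string tokens, though the model then needs its embeddings resized and further training before those tokens mean anything. A minimal sketch, assuming the Hugging Face transformers library and the GPT-2 tokenizer purely as an example:

    from transformers import AutoTokenizer

    # Register language keywords/operators as single, indivisible tokens.
    tok = AutoTokenizer.from_pretrained("gpt2")
    tok.add_tokens(["function", "return", "const", "=>", "==="])

    # A model using this tokenizer would need `model.resize_token_embeddings(len(tok))`
    # plus additional training before the new tokens carry any meaning.
    print(tok.tokenize("const f = (x) => x === 1"))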

mehmetkose - 3 hours ago

> So why make AI write in a format optimized for human readers who aren't reading?

Well, you'll read it when you need to, sooner or later. But I like the idea anyway.

unsaved159 - 2 hours ago

Ironically, the getting-started guide (quite long) is apparently still meant to be executed by a human. I'd expect an LLM-first approach, something like: "Paste this prompt into Cursor, press Enter, everything will be installed, and you'll see Hello World on your screen".

measurablefunc - 3 hours ago

I can't tell if this is parody or not. It seems like it's parody.

thealistra - 2 hours ago

For the bootstrap C lexer and parser, was hand-rolling really necessary? Lex and yacc exist for a reason.

ForHackernews - 3 hours ago

Assembly already exists.

kace91 - 3 hours ago

> NERD is what source code becomes when humans stop pretending they need to write it.

It is so annoying to realise mid-read that a piece of text was written by an LLM.

It's the same feeling as bothering to answer a call only to hear a spam recording.

dented42 - 3 hours ago

I can't be alone in this: it seems like a supremely terrible idea. I wholeheartedly reject the idea that any sizeable portion of one's codebase should, as a design choice, specifically /not/ be human-interpretable.

There’s a chance this is a joke, but even if it is I don’t wanna give the AI tech bros more terrible ideas, they have enough. ;)

diath - 2 hours ago

The entire point of LLM-assisted development is to audit the code generated by the AI and then instruct it to improve it or fix its shortcomings - kind of like being a senior dev doing a code review on a colleague's merge request. In fact, as developers we usually read code more than we write it, which is also why you should prefer simple, verbose code over clever code in large codebases. This seems instead aimed at pure vibe-coded slop.

> Do you debug JVM bytecode? V8's internals?

People do debug assembly generated by compilers, to look for miscompilations and missed optimization opportunities, and to compare different approaches.

globalnode - 3 hours ago

I was going to try to resist posting for as long as possible in 2026 (self-dare), and here I am on day 1 -- this is a pretty bad idea. Are you going to trust the LLM to write software you depend on with your life? Your money? Your whatever? Worst idea of 2026 so far. Where's the accountability when things go wrong?

porcoda - 2 hours ago

Seems the TL;DR is “squished out most of the structural and semantic features of languages to reduce tokens, and trivial computations still work”. Beyond that nothing much to see here.

croes - 2 hours ago

> You debug at your abstraction layer. If that layer is natural language, debugging becomes: "Hey Claude, the login is failing for users with + in their email."

That sounds like step 2 before step 1. First you get complaints that login doesn't work, then you find out it's the + sign while you are debugging.

johnnyfived - 2 hours ago

Oh boy, a competitor to the well-renowned TOON format? I'm surprised this stuff is even entertained here, but the HN crowd is probably behind on some of the AI memes.

tayo42 - 3 hours ago

> 67% fewer tokens than TypeScript. Same functionality.

Doesn't TypeScript have types? The example seems not to have any.

dragonwriter - 3 hours ago

Uh, I think every form of machine code (physical or virtual, with some pedagogic exceptions) beats Nerd as an earlier "programming language not built for humans".

itsthecourier - 3 hours ago

Tokens are tokens; shorter or longer, they are tokens.

In that sense I don't see how this is more succinct than Python.

It is more succinct than TypeScript and C#, of course, but we need to compete with the laconic languages.

In that sense you will end up with the CISC vs. RISC dilemma from the CPU wars: you will find that the way to compress even more is to add new tokens for repetitive tasks, like sha256 becoming a single token. I feel that's the way to compress even further.

4b11b4 - 2 hours ago

no

behnamoh - 3 hours ago

That's not true; the first not-for-humans language is Pel: https://arxiv.org/abs/2505.13453