Fabrice Bellard: Biography (2009) [pdf]
ipaidia.gr
360 points by lioeters 3 months ago
Publishing ffmpeg and QEMU in a five year span that also included winning IOCCC (twice!) is absolutely bonkers.
He’s one of the GOATs, but this article is written by someone who has no idea about software engineering and is full of exaggerations as a result. For example:
> Many times there are certain chunks which will occur many times in the code of a program. Instead of taking the time to translate them all separately, QEMU stores the chunks and their native translation, next time simply executing the native translation instead of doing translation a second time. Thus, Bellard invented the first processor emulator that could achieve near native performance in certain instances.
JIT is about as old as Fabrice, or even older depending on what you consider a modern JIT.
The actual innovation in QEMU was that the architecture-dependent part was much smaller than a full JIT compiler, because it used the C compiler to build small blocks and parsed ELF relocations to be able to move them into the translated code.
This technique has since been dropped by QEMU, but something similar is now used by the Python JIT. These days QEMU uses Tiny Code Generator, originally forked out of TCC though by now the source is probably unrecognizable except in the function names.
Moreover, Transmeta did this for their actual processor back in the day. Transmeta's version even did it in multiple passes, fusing more and more instructions as they were executed more often, so the system got faster the more it was used, up to a certain point of course.
This doesn't make Fabrice a lesser man, but truth is truth.
Yeah, afaik same-architecture dynamic binary translation dates back to at least 1998 (VMware).
If you leave out the JIT part, binary translation dates back to at least 1966 (Honeywell).
Still one of the GOATs, agree.
Claims of ‘firsts’ undermine the authority of this document, though not the achievements of the subject.
For instance Marco Ternelli’s dynamic binary translator ZM/HT dates back to 1993, when it was published by Ergon Development. It translates Z80 to 68000 machine code on the fly and was a successful commercial product. I’d be interested to hear of earlier JIT binary-to-binary implementations, especially others which coped with self-modifying code, without which ZM/HT wouldn’t have been very useful.
Self-unpacking executables are at least a decade older, and Fabrice quite likely had Microsoft’s 1985 EXEPACK, written by Reuben Borman, on his computer when he came up with LZEXE. That was bundled with MASM and Microsoft C 3.0, their first in-house version. Both were preceded by Realia’s Spacemaker product, which Wikipedia says was written by Robert B. K. Dewar in 1982.
Thanks for the reference to https://en.wikipedia.org/wiki/Honeywell_200. Apparently its claim to fame was that it could run IBM 1401 programs faster than a 1401, for less money.
> Compatibility with the IBM/1400 Series has, of course, been a key factor in the success of the Series 200. The principal software components in Honeywell's "Liberator" approach are the Easytran translators, which convert Autocoder source programs written for the IBM machines into Easycoder source programs which can be assembled and run on Series 200/2000 systems, usually with little or no need for manual alterations. The Easytran routines have effectively overcome the minor differences between the instruction sets and assembly languages of the two systems in literally hundreds of installations.
from https://bitsavers.org/pdf/honeywell/datapro/70C-480-01_7404_...
https://cdnibm1401.azureedge.net/1401-Competition.html
It appears that Honeywell Liberator was a program to convert 1401 assembly to Easycoder, the Honeywell 200 assembly format.
Umm one of the authors, Andy Gocke, is the lead for the .NET runtime... https://github.com/agocke
(reposting from the MicroQuickJS thread if only because it seems more relevant here)
Always interesting when people as talented as Bellard manage to (apparently) never write a "full-on" GUI-fronted application, or more specifically, a program that sits between a user with constantly shifting goals and workflows and a "core" that can get the job done.
I would not want to dismiss or diminish by any amount the incredible work he has done. It's just interesting to me that the problems he appears to pick generally take the form of "user sets up the parameters, the program runs to completion".
> when people as talented as Bellard manage to (apparently) never write a "full-on" GUI-fronted application
The "full-on GUI-fronted application" is two different problems.
PROBLEM_A = create a minimal interface (arguments to application) and focus on making robust logic that is fit for use and purpose.
PROBLEM_B = make users who resist/object to a minimal interface happy by satisfying an unbounded set of requirements involving a changing stack of tools and OS dependencies.
The latter effort can expand to consume the time and energy of entire teams of people.
Actually, this is missing my point quite a bit. The difference is not the minimal/non-minimal interface.
One can easily imagine (and I think they even exist) GUI front ends for ffmpeg that let a user set up a conversion "more easily" than they might find it using the command line. Bellard has chosen not to do this (lots of entirely fine reasons), but even if you use such a GUI front end the use of ffmpeg still consists of "set the parameters and let the program run". At some point after clicking "Run" (or whatever the button says), then just like after press "Return", the ffmpeg process will have completed its work, and that particular user interaction is over.
By contrast, a video and/or audio editor is really an entirely different beast, in which the user is continually adjusting any and all parameters and states of the project, expecting undo/redo histories, and so on and so forth. There is essentially no "completion state" for the application to reach.
I'm just curious that Bellard seems never to have tackled this kind of application (as is absolutely his right to do, or not do). I'm curious because it creates an entirely different class of programming problems from the "set-and-run" type of application (though they also obviously overlap in many important areas).
> a video and/or audio editor is really an entirely different beast, in which the user is continually adjusting any and all parameters and states of the project, expecting undo/redo histories, and so on and so forth.
If you accept that there is some similarity to game development or a real piloting system for an aircraft, these complex adjustments would be split among components to be developed and tested separately and then integrated.
Could you just call these “interactive programs”?
Sure, it's just a bit of an "old" term that I wasn't sure the young'uns on HN would understand :)
Isn’t a JavaScript engine interactive ?
No regular user interacts directly with a JavaScript engine, not in the sense that they interact with a text editor, a video editor, an audio editor, a CAD application, a medical imaging application etc. etc. etc.
Apparently GUI frontend is not a subject or problem that interests him. He lives and thinks close to the metal, at a lower layer of abstraction. He writes software for himself and for others in that ecological milieu: people who take his codebase as an embedded library or command-line tool, or wrap it with an abstraction and user interface for their particular purpose, as browsers did with FFmpeg.
He has his favorite niche intellectual and technical subjects, where all his big and small projects are explorations of that space from various angles. It's a lesser concern whether the result has business value, or wider public appeal. He's more of a researcher and scientist.
> Apparently GUI frontend is not a subject or problem that interests him. He lives and thinks close to the metal, at a lower layer of abstraction.
It's not that cut and dried. The application I work on has some notable chunks of assembly code, lots of tricky realtime lock-free code involving threads, atomics, RCU and more ... and ... a GUI that lets the user continuously interact with it.
Oh, and we use ffmpeg for video decoding/encoding :)
Bellard wrote an emacs-type text editor, with full html rendering support, Unicode, X11 GUI, ... in the early 2000s!
This biography includes more information than I've seen elsewhere about the legendary programmer, who's been discussed time and again on this forum.
He has done a few things since, notably 5G base stations running on PC hardware, and some LLM work.
And he wrote a proprietary ASN.1 compiler and stack.
It’s far from being impossible, the main thing you need is free time and obsession (and money for your free time btw).
C or asm are not obscure languages or anything; they are brutal languages where you have to trace the runtime behavior from A to Z and manage memory yourself.
In 1990, it was absolutely normal to code in C. Yes you had to decode images yourself, yes you had to decode audio, yes you had to raytrace, etc.
“Wait, you had to calculate all of these by hand?”
“Yes, my friend, everybody had to do that in my time; what else could we do?
So we took the books, and did it one by one.
This was the norm; it has just become a sort of archeology.”
Every year, thousands of 19-year-olds complete these tasks in low-level schools like Epita/42 or in demoscene contests. They aren't geniuses; they are just students who were forced to read the manual and understand how the computer actually works.
Free time won’t guarantee you success, but free time + obsession will (like Terry Davis).
Really, this is not alien tech.
Before FFmpeg, people still had to encode videos. Before emulators, someone had to create the state machines, etc. It would be insane to ignore all these people.
Most of the difficult problems have shifted somewhere else from low-level.
How to simulate millions of pharmaceutical molecules in a short amount of time?
How to simulate the world in GTA VI ?
Saving 2 bytes of memory by writing asm (that… won’t be portable) is not what is going to save you. The problems are now elsewhere.
The problem now is not about “wow you read ancient manuals and mixed sand with water and got a solid foundational brick” but it is about “ok, using these bricks, how to build a skyscraper that is 1km tall”.
No doubt these modern programmers are as good as the archeologists who like to explore handcrafted code.
This doesn't explain why so few people of Fabrice's generation have reached his level. Think about violin playing. Many players can become professionals if they have the obsession, but 99% of them won't reach the Heifetz/Hadelich/Ehnes level no matter how hard they try. Talent matters. Programming is not much different from performing art.
I think this is well covered by his first line:
> the main thing you need is free time and obsession (and money for your free time btw).
Free time (and money for your free time) is a privilege not everyone may have had. Also, access to computers which, don't forget, has only become ubiquitous this century, and sadly not always in the form that might encourage experimentation. Without getting too much into the Nature-Nurture debate, talent and obsession sadly won't go anywhere without the proper environment to cultivate it. You don't become Bellard/Knuth/Dijkstra with just a bunch of rocks[1] and a whole host of other concerns on top.
That doesn't cover OP's point, some people's brains just work differently and they can achieve something in 1000x less time than others. You can have all the time in the world and you'll never reach their level. That's essentially what talent is.
I have been thinking about what talent means in programming, and a case from the past comes to mind. The task was to parse a text file format. One programmer used ~1000 lines of code (LOC) with complex logic. The other used <200 LOC with a straightforward solution that ran several times faster and would probably be more extensible and easier to maintain. This is a small task. The difference gets amplified enormously in complex projects like the ones Fabrice is famous for. The first programmer in my story may be able to write a JavaScript runtime if he has time + obsession, but it will take him much longer and the quality will be much lower in comparison to quickjs or mqjs.
Victor Taelin posted an intuition, "HVM is missing a fundamental building block", after ten years of thinking about it:
https://x.com/VictorTaelin/status/2003839852006232478?s=20
I won't pretend to know the answer, I am not even sure I understand the question :|
> It’s far from being impossible, the main thing you need is free time and obsession (and money for your free time btw).
I'm aware :(
(I maintain one, written by my Swedish friends, who were also obsessed.)
Without being glib, I honestly wonder if Fabrice Bellard has started using any LLM coding tools. If he could be even more productive, that would be scary!
I doubt he is ideologically opposed to them, given his work on LLM compression [1]
He codes mostly in C, which I'm sure is mostly "memorized". i.e. if you have been programming in C for a few decades, you almost certainly have a deep bench of your own code that you routinely go back to / copy and modify
In most cases, I don't see an LLM helping there. It could be "out of distribution", similar to what Karpathy said about writing his end-to-end pedagogical LLM chatbot
---
Now that I think of it, Bellard would probably train his own LLM on his own code! The rest of the world's code might not help that much :-)
He has all the knowledge to do that ... I could see that becoming a paid closed-source project, like some of his other ones [2]
[1] e.g. https://bellard.org/ts_zip/
What I wonder is: are current LLMs even good for the type of work he does: novel, low-level, extremely performant
As a professional C programmer, the answer seems to be no; they are not good enough.
They are absolutely good at reviewing C code. To catch stupid bugs and such. Great for pair programming type use.
I'm writing C for microcontrollers and ChatGPT is very good at it. I don't let it write any code (because that's the fun part, why would I), but I discuss with it a lot, asking questions and asking it to review my code, and it does a good job. I also love to use it to explain assembly.
It's also the best way to use llms in my opinion, for idea generation and snippets, and then do the thing "manually". Much better mastery of the code, no endless loop of "this creates that bug, fix it", and it comes up with plenty of feedback and gotchas when used this way.
This is how I used LLMs to learn and at the same time build an application using Tkinter.
This is a funny one because on the one hand the answer is obviously no, it's very fiddly stuff that requires a lot of umming and ahhing, but then weirdly they can be absurdly good in these kinds of highly technical domains precisely because they are often simple enough to pose to the LLM that any help it can give is actually applicable immediately whereas in a comparatively boring/trivial enterprise application there is a vast amount of external context to grapple with.
If Fabrice explained what he wanted, I expect the LLM would respond in kind.
If Fabrice explained what he wanted the LLM would say it's not possible.
When the coding assistant LLMs load for a while it's because they are sending Fabrice an email and he corrects it and replies synchronously.
From my experience, it's just good enough to give you an overview of a codebase you don't know and enough implementation suggestions to work from there.
I doubt it, although LLMs seem to do well on low-level (ASM level instructions).
I think it's the opposite: llms ask Fabrice Bellard instead
Congrats, the Chuck Norris meme has finally made its way onto HN.
They're trained on his code for sure. Every time I ask about ffmpeg internals, I know it's Fabrice's training data.
He has in fact written one: https://bellard.org/ts_server/
Yeah I've seen that, but it looks like it's inference-side only?
Maybe that is a hint that he does use off-the-shelf models as a coding aid?
There may be no need to train your own, on your own code, but it's fun to think about
> Without being glib, I honestly wonder if Fabrice Bellard has started using any LLM coding tools
I doubt it. I follow him and look at the code he writes and it's well thought out and organized. It's the exact opposite of AI slop I see everywhere.
> He codes mostly in C, which I'm sure is mostly "memorized". i.e. if you have been programming in C for a few decades,
C I think he memorized a long time ago. It's more like he keeps the whole structure and setup of the program (the context) in his head and is able to "see it" all and operate on it. He is so good that people are insinuating he is actually "multiple people" or he uses an LLM and so on. I imagine he is quite amused reading those comments.
Still, humans can only type so quickly. It's not hard to imagine how even a flawless coder could benefit from an LLM.
> humans can only type so quickly
Real programming is 0.1% typing. Typing speed is not a limiting factor for any serious development.
You're conflating typing with programming. Typing is in fact the limiting factor to serious development.
typing would not make top-100 list of “limiting factors” for serious development.
Most coding is better done with agents than with your hands. Coding is the main financial impediment to development. Yes, actually articulating what you want is the hard problem. Yes, there are technical problems that demand real analytical insight and real motivation. But refusing to use agents because you think you can type faster is mistaking typing for your actual skill: reasoning and interpretation.
It is for AI users who can't type code.
I am a heavy AI user and have been typing code for 3 decades :)
Ok, if you have such insight into development, why not leverage agents to type for you? What sort of problems have you faced that you are able to code against faster than you can articulate to an agent?
I have of course found some problems like this myself. But it's such a tiny portion of coding I really question why you can't leverage LLMs to make yourself more productive