Fabrice Bellard: Biography (2009) [pdf]
ipaidia.gr
360 points by lioeters 3 months ago
Publishing ffmpeg and QEMU in a five year span that also included winning IOCCC (twice!) is absolutely bonkers.
He’s one of the GOATs, but this article is written by someone who has no idea about software engineering and is full of exaggerations as a result. For example:
> Many times there are certain chunks which will occur many times in the code of a program. Instead of taking the time to translate them all separately, QEMU stores the chunks and their native translation, next time simply executing the native translation instead of doing translation a second time. Thus, Bellard invented the first processor emulator that could achieve near native performance in certain instances.
JIT is about as old as Fabrice, or even older depending on what you consider a modern JIT.
The actual innovation in QEMU was that the architecture-dependent part was much smaller than a full JIT compiler, because it used the C compiler to build small blocks and parsed ELF relocations to be able to move them into the translated code.
This technique has since been dropped by QEMU, but something similar is now used by the Python JIT. These days QEMU uses Tiny Code Generator, originally forked out of TCC though by now the source is probably unrecognizable except in the function names.
Moreover, Transmeta did this for their actual processor back in the day. Transmeta's version even did it in multiple passes, fusing more and more instructions as they were executed more often, so the system got faster the more it was used, up to a certain point of course.
This doesn't make Fabrice a lesser man, but truth is truth.
Yeah, afaik same-architecture dynamic binary translation dates back to at least 1998 (VMware).
If you leave out the JIT part, binary translation dates back to at least 1966 (Honeywell).
Still one of the GOATs, agree.
Claims of ‘firsts’ undermine the authority of this document, though not the achievements of the subject.
For instance Marco Ternelli’s dynamic binary translator ZM/HT dates back to 1993, when it was published by Ergon Development. It translates Z80 to 68000 machine code on the fly and was a successful commercial product. I’d be interested to hear of earlier JIT binary-to-binary implementations, especially others which coped with self-modifying code, without which ZM/HT wouldn’t have been very useful.
Self-unpacking executables are at least a decade older, and Fabrice quite likely had Microsoft’s 1985 EXEPACK, written by Reuben Borman, on his computer when he came up with LZEXE. That was bundled with MASM and Microsoft C 3.0, their first in-house version. Both were preceded by Realia’s Spacemaker product, which Wikipedia says was written by Robert B. K. Dewar in 1982.
Thanks for the reference to https://en.wikipedia.org/wiki/Honeywell_200. Apparently its claim to fame was that it could run IBM 1401 programs faster than a 1401, for less money.
> Compatibility with the IBM/1400 Series has, of course, been a key factor in the success of the Series 200. The principal software components in Honeywell's "Liberator" approach are the Easytran translators, which convert Autocoder source programs written for the IBM machines into Easycoder source programs which can be assembled and run on Series 200/2000 systems, usually with little or no need for manual alterations. The Easytran routines have effectively overcome the minor differences between the instruction sets and assembly languages of the two systems in literally hundreds of installations.
from https://bitsavers.org/pdf/honeywell/datapro/70C-480-01_7404_...
https://cdnibm1401.azureedge.net/1401-Competition.html
It appears that Honeywell Liberator was a program to convert 1401 assembly to Easycoder, the Honeywell 200 assembly format.
Umm one of the authors, Andy Gocke, is the lead for the .NET runtime... https://github.com/agocke
(reposting from the MicroQuickJS thread if only because it seems more relevant here)
Always interesting when people as talented as Bellard manage to (apparently) never write a "full-on" GUI-fronted application, or more specifically, a program that sits between a user with constantly shifting goals and workflows and a "core" that can get the job done.
I would not want to dismiss or diminish by any amount the incredible work he has done. It's just interesting to me that the problems he appears to pick generally take the form of "user sets up the parameters, the program runs to completion".
> when people as talented as Bellard manage to (apparently) never write a "full-on" GUI-fronted application
The "full-on GUI-fronted application" is two different problems.
PROBLEM_A = create a minimal interface (arguments to application) and focus on making robust logic that is fit for use and purpose.
PROBLEM_B = make users who resist/object to a minimal interface happy by satisfying an unbounded set of requirements involving a changing stack of tools and OS dependencies.
The latter effort can expand to consume the time and energy of entire teams of people.
Actually, this is missing my point quite a bit. The difference is not the minimal/non-minimal interface.
One can easily imagine (and I think they even exist) GUI front ends for ffmpeg that let a user set up a conversion "more easily" than they might find it using the command line. Bellard has chosen not to do this (lots of entirely fine reasons), but even if you use such a GUI front end the use of ffmpeg still consists of "set the parameters and let the program run". At some point after clicking "Run" (or whatever the button says), then just like after press "Return", the ffmpeg process will have completed its work, and that particular user interaction is over.
By contrast, a video and/or audio editor is really an entirely different beast, in which the user is continually adjusting any and all parameters and states of the project, expecting undo/redo histories, and so on and so forth. There is essentially no "completion state" for the application to reach.
I'm just curious that Bellard seems never to have tackled this kind of application (as is absolutely his right to do, or not do). I'm curious because it creates an entirely different class of programming problems from the "set-and-run" type of application (though they also obviously overlap in many important areas).
> a video and/or audio editor is really an entirely different beast, in which the user is continually adjusting any and all parameters and states of the project, expecting undo/redo histories, and so on and so forth.
If you accept that there is some similarity to game development or a real piloting system for an aircraft, these complex adjustments would be split among components to be developed and tested separately and then integrated.
Could you just call these “interactive programs”?
Sure, it's just a bit of an "old" term that I wasn't sure the young'uns on HN would understand :)
Isn’t a JavaScript engine interactive ?
No regular user interacts directly with a JavaScript engine, not in the sense that they interact with a text editor, a video editor, an audio editor, a CAD application, a medical imaging application etc. etc. etc.
Apparently GUI frontend is not a subject or problem that interests him. He lives and thinks close to the metal, at a lower layer of abstraction. He writes software for himself and for others in that ecological milieu: people who take his codebase as an embedded library or command-line tool, or wrap it with an abstraction and user interface for their particular purpose, as browsers did with FFmpeg.
He has his favorite niche intellectual and technical subjects, where all his big and small projects are explorations of that space from various angles. It's a lesser concern whether the result has business value, or wider public appeal. He's more of a researcher and scientist.
> Apparently GUI frontend is not a subject or problem that interests him. He lives and thinks close to the metal, at a lower layer of abstraction.
It's not that cut and dried. The application I work on has some notable chunks of assembly code, lots of tricky realtime lock-free code involving threads, atomics, RCU and more ... and ... a GUI that lets the user continuously interact with it.
Oh, and we use ffmpeg for video decoding/encoding :)
Bellard wrote an emacs-type text editor, with full html rendering support, Unicode, X11 GUI, ... in the early 2000s!
This biography includes more information than I've seen elsewhere about the legendary programmer, who's been discussed time and again on this forum.
He has done a few things since, notably 5G base stations running on PC hardware, and some LLM work.
And he wrote a proprietary ASN.1 compiler and stack.
It’s far from being impossible, the main thing you need is free time and obsession (and money for your free time btw).
C or asm are not obscure languages or anything; they are brutal languages where you have to trace the runtime behavior from A to Z and manage memory yourself.
In 1990, it was absolutely normal to code in C. Yes you had to decode images yourself, yes you had to decode audio, yes you had to raytrace, etc.
“Wait, you had to calculate all of these by hand?”
“Yes, my friend, everybody had to do that in my time; what else could we do?
So we took the books, and did it one by one.
This was the norm; it has just become a sort of archeology.”
Every year, thousands of 19-year-olds complete these tasks in low-level schools like Epita/42 or in demoscene contests. They aren't geniuses; they are just students who were forced to read the manual and understand how the computer actually works.
Free time won’t guarantee you success, but free time + obsession will (like Terry Davis).
Really, this is not alien tech.
Before FFmpeg, people still had to encode videos. Before emulators, someone had to create the state machines, etc. It would be insane to ignore all these people.
Most of the difficult problems have shifted somewhere else from low-level.
How to simulate millions of pharmaceutical molecules in a short amount of time?
How to simulate the world in GTA VI ?
Saving 2 bytes of memory by writing asm (that… won’t be portable) is not what is going to save you. The problems are now elsewhere.
The problem now is not about “wow you read ancient manuals and mixed sand with water and got a solid foundational brick” but it is about “ok, using these bricks, how to build a skyscraper that is 1km tall”.
No doubt these modern programmers are as good as the archeologists who like to explore handcrafted code.
This doesn't explain why so few people of Fabrice's generation have reached his level. Think about violin playing. Many players can become professionals if they have the obsession, but 99% of them won't reach the Heifetz/Hadelich/Ehnes level no matter how hard they try. Talent matters. Programming is not much different from performing art.
I think this is well covered by his first line:
> the main thing you need is free time and obsession (and money for your free time btw).
Free time (and money for your free time) is a privilege not everyone may have had. Also, access to computers which, don't forget, has only become ubiquitous this century, and sadly not always in the form that might encourage experimentation. Without getting too much into the Nature-Nurture debate, talent and obsession sadly won't go anywhere without the proper environment to cultivate it. You don't become Bellard/Knuth/Dijkstra with just a bunch of rocks[1] and a whole host of other concerns on top.
That doesn't cover OP's point, some people's brains just work differently and they can achieve something in 1000x less time than others. You can have all the time in the world and you'll never reach their level. That's essentially what talent is.
I have been thinking about what talent means in programming, and a case from the past comes to mind. The task was to parse a text file format. One programmer used ~1000 lines of code (LOC) with complex logic. The other used <200 LOC with a straightforward solution that ran several times faster and would probably be more extensible and easier to maintain. This is a small task. The difference gets amplified enormously in complex projects like the ones Fabrice is famous for. The first programmer in my story may be able to write a JavaScript runtime if he has time + obsession, but it will take him much longer and the quality will be much lower in comparison to quickjs or mqjs.
Victor Taelin posted an intuition, "HVM is missing a fundamental building block", after ten years of thinking about it:
https://x.com/VictorTaelin/status/2003839852006232478?s=20
I won't pretend to know the answer, I am not even sure I understand the question :|
> It’s far from being impossible, the main thing you need is free time and obsession (and money for your free time btw).
I'm aware :(
(I maintain one, written by my Swedish friends, who were also obsessed.)
Without being glib, I honestly wonder if Fabrice Bellard has started using any LLM coding tools. If he could be even more productive, that would be scary!
I doubt he is ideologically opposed to them, given his work on LLM compression [1]
He codes mostly in C, which I'm sure is mostly "memorized". i.e. if you have been programming in C for a few decades, you almost certainly have a deep bench of your own code that you routinely go back to / copy and modify
In most cases, I don't see an LLM helping there. It could be "out of distribution", similar to what Karpathy said about writing his end-to-end pedagogical LLM chatbot
---
Now that I think of it, Bellard would probably train his own LLM on his own code! The rest of the world's code might not help that much :-)
He has all the knowledge to do that ... I could see that becoming a paid closed-source project, like some of his other ones [2]
[1] e.g. https://bellard.org/ts_zip/
What I wonder is: are current LLMs even good for the type of work he does: novel, low-level, extremely performant
As a professional C programmer, the answer seems to be no; they are not good enough.
They are absolutely good at reviewing C code. To catch stupid bugs and such. Great for pair programming type use.
I'm writing C for microcontrollers and ChatGPT is very good at it. I don't let it write any code (because that's the fun part, why would I), but I discuss with it a lot, asking questions and asking it to review my code, and it does a good job. I also love to use it to explain assembly.
It's also the best way to use llms in my opinion, for idea generation and snippets, and then do the thing "manually". Much better mastery of the code, no endless loop of "this creates that bug, fix it", and it comes up with plenty of feedback and gotchas when used this way.
This is how I used LLMs to learn and at the same time build an application using Tkinter.
This is a funny one because on the one hand the answer is obviously no, it's very fiddly stuff that requires a lot of umming and ahhing, but then weirdly they can be absurdly good in these kinds of highly technical domains precisely because they are often simple enough to pose to the LLM that any help it can give is actually applicable immediately whereas in a comparatively boring/trivial enterprise application there is a vast amount of external context to grapple with.
If Fabrice explained what he wanted, I expect the LLM would respond in kind.
If Fabrice explained what he wanted the LLM would say it's not possible.
When the coding assistant LLMs load for a while it's because they are sending Fabrice an email and he corrects it and replies synchronously.
From my experience, it's just good enough to give you an overview of a codebase you don't know and enough implementation suggestions to work from there.
I doubt it, although LLMs seem to do well on low-level (ASM level instructions).
I think it's the opposite: llms ask Fabrice Bellard instead
Congrats, the Chuck Norris meme has finally made its way onto HN.
They're trained on his code for sure. Every time I ask about ffmpeg internals, I know it's Fabrice's training data.
He has in fact written one: https://bellard.org/ts_server/
Yeah I've seen that, but it looks like it's inference-side only?
Maybe that is a hint that he does use off-the-shelf models as a coding aid?
There may be no need to train your own, on your own code, but it's fun to think about
> Without being glib, I honestly wonder if Fabrice Bellard has started using any LLM coding tools
I doubt it. I follow him and look at the code he writes and it's well thought out and organized. It's the exact opposite of AI slop I see everywhere.
> He codes mostly in C, which I'm sure is mostly "memorized". i.e. if you have been programming in C for a few decades,
C I think he memorized a long time ago. It's more like he keeps the whole structure and setup of the program (the context) in his head and is able to "see it" all and operate on it. He is so good that people are insinuating he is actually "multiple people" or he uses an LLM and so on. I imagine he is quite amused reading those comments.
Still, humans can only type so quickly. It's not hard to imagine how even a flawless coder could benefit from an LLM.
> humans can only type so quickly
Real programming is 0.1% typing. Typing speed is not a limiting factor for any serious development.
You're conflating typing with programming. Typing is in fact the limiting factor to serious development.
typing would not make top-100 list of “limiting factors” for serious development.
Most coding is better done with agents than with your hands. Coding is the main financial impediment to development. Yes, actually articulating what you want is the hard problem. Yes, there are technical problems that demand real analytical insight and real motivation. But refusing to use agents because you think you can type faster is mistaking typing for your actual skill: reasoning and interpretation.
It is for AI users who can't type code.
I am a heavy AI user and have been typing code for 3 decades :)
Ok, if you have such insight into development, why not leverage agents to type for you? What sort of problems have you faced that you are able to code against faster than you can articulate to an agent?
I have of course found some problems like this myself. But it's such a tiny portion of coding I really question why you can't leverage LLMs to make yourself more productive