Show HN: I built a zero-browser, pure-JS typesetting engine for bit-perfect PDFs

github.com

66 points by cosmiciron 20 hours ago


Hi HN, I'm a film director by trade, and I prefer writing my stories in plain text rather than using clunky screenplay software. Standard markup like Fountain doesn't work for me because I write in mixed languages, so I use Markdown with a custom syntax I invented to resemble standard screenplay structures.

This workflow is great until I need to actually generate an industry-standard screenplay PDF. I got tired of manually copying and pasting my text back into the clunky software just to export it, so I decided to write a script to automate the process. That's when I hit a wall.

I tried using React-pdf and other high-level libraries, but they failed me on two fronts: true multilingual text shaping, and complex contextual pagination. Specifically, screenplay format strictly requires automatically injecting (MORE) at the bottom of a page and (CONT'D) at the top of the next page when a character's dialogue is split across a page break.

You can't really do that elegantly when the layout engine is a black box. So, I bypassed them and built my own typesetting engine from scratch.
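To make the (MORE)/(CONT'D) rule concrete, here is a minimal sketch of it in TypeScript. It is not VMPrint's actual code: `DialogueBlock` and `paginateDialogue` are hypothetical names, every page is assumed to start empty, and the dialogue is assumed to be already wrapped into lines.

```typescript
// Hypothetical types and names for illustration; not VMPrint's actual API.
interface DialogueBlock {
  character: string; // character cue, e.g. "ALICE"
  lines: string[];   // dialogue lines, already wrapped to the column width
}

/**
 * Split one dialogue block across pages that hold `linesPerPage` lines each.
 * A page that cannot hold the rest of the dialogue ends with "(MORE)", and
 * the following page repeats the cue with " (CONT'D)" appended.
 */
function paginateDialogue(block: DialogueBlock, linesPerPage: number): string[][] {
  if (linesPerPage < 3) {
    // A split page needs room for the cue, one dialogue line, and (MORE).
    throw new RangeError("linesPerPage must be at least 3");
  }
  const pages: string[][] = [];
  let rest = [...block.lines];
  let first = true;
  while (true) {
    const cue = first ? block.character : `${block.character} (CONT'D)`;
    const room = linesPerPage - 1; // lines left on the page after the cue
    if (rest.length <= room) {
      pages.push([cue, ...rest]);
      return pages;
    }
    // The remainder does not fit: keep the last line free for the marker.
    const take = room - 1;
    pages.push([cue, ...rest.slice(0, take), "(MORE)"]);
    rest = rest.slice(take);
    first = false;
  }
}
```

The point of the sketch is that the markers consume layout space themselves, so the break position depends on them. That is hard to express when pagination happens inside a library you cannot hook into.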

VMPrint is a deterministic, zero-browser layout VM written in pure TypeScript. It abandons the DOM entirely. It loads OpenType fonts, runs grapheme-accurate text segmentation (Intl.Segmenter), calculates interval-arithmetic spatial boundaries for text wrapping, and outputs a flat array of absolute coordinates.
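As a rough illustration of two of those steps, the sketch below segments text into graphemes with the standard `Intl.Segmenter` API and subtracts obstacle intervals (say, a drop cap) from a line's horizontal span to find where text may be placed. The names `graphemes`, `Interval`, and `freeSpans` are assumptions made up for this example, not VMPrint's API, and the real engine may compute its boundaries differently.

```typescript
// Hypothetical helper names for illustration; not VMPrint's actual API.

/** Split text into user-perceived characters (grapheme clusters). */
function graphemes(text: string): string[] {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

/** A horizontal extent on one line, in layout units. */
type Interval = { start: number; end: number };

/**
 * Subtract obstacle intervals from a line's horizontal span and return
 * the free gaps, ordered left to right.
 */
function freeSpans(line: Interval, obstacles: Interval[]): Interval[] {
  const sorted = [...obstacles].sort((a, b) => a.start - b.start);
  const free: Interval[] = [];
  let cursor = line.start;
  for (const o of sorted) {
    if (o.end <= cursor) continue;  // obstacle lies entirely behind the cursor
    if (o.start >= line.end) break; // obstacle lies entirely past the line
    if (o.start > cursor) free.push({ start: cursor, end: o.start });
    cursor = Math.max(cursor, o.end);
  }
  if (cursor < line.end) free.push({ start: cursor, end: line.end });
  return free;
}
```

For example, `freeSpans({ start: 0, end: 100 }, [{ start: 0, end: 30 }])` leaves a single gap from 30 to 100, which is the width available to text on a line that sits beside a 30-unit drop cap.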

Some stats:

Zero dependencies on Node.js APIs or the DOM (runs in Cloudflare Workers, Lambda, browser).

88 KiB core packed.

Performance: On a Snapdragon Elite ARM chip, the engine's "God Fixture" (8 pages of mixed CJK, Arabic RTL, drop caps, and multi-page spanning tables) completes layout and rendering in ~28ms.

The repo also includes draft2final, the CLI tool I built to convert Markdown into publication-grade PDFs (including the screenplay flavor) using this engine.

This is my first open-source launch. The manuscript is still waiting, but the engine shipped instead. I’d love to hear your thoughts, answer any questions about the math or the architecture, and see if anyone else finds this useful!

---

A note on AI usage: To be fully transparent about how this was built, I engineered the core concept (an all-flat, morphable box-based system inspired by game engines, applied to page layouts), the interval-arithmetic math, the grapheme segmentation, and the layout logic entirely by hand. I did use AI as a coding assistant at the functional level, but the overall software architecture, component structures, and APIs were meticulously designed by me.

For a little background: I’ve been a professional systems engineer since 1992. I’ve worked as a senior system architect for several Fortune 500 companies and currently serve as Chief Scientist at a major telecom infrastructure provider. I also created one of the world's first real-time video encoding technologies for low-power mobile phones (in the pre-smartphone era). I'm no stranger to deep tech, and a deterministic layout VM is exactly the kind of strict, math-heavy system that simply cannot be effectively constructed with a few lines of AI prompts.

TimTheTinker - 2 hours ago

> If you generate PDFs with headless browsers or HTML-to-PDF tools, you've accepted a compromise: heavy dependencies, memory leaks, and "approximate" layout that shifts across environments

Absolutely not true with Prince[0]. It's an HTML/CSS-based typesetter built by the creator of CSS (Håkon Wium Lie [1]) that is lightweight, cross-platform, requires no dependencies, has no memory leaks, is 100% consistent in its output, is fully compliant with the relevant standards, and has a lot of really great print-oriented features (like using CSS to control things like page headers/footers, numbering, etc.). Prince has been used to typeset a lot of different print output types, from posters to books to scientific papers. It's even a viable alternative to LaTeX. I've used it in the past, and can attest that it is outstanding.

[0] https://www.princexml.com/

[1] https://en.wikipedia.org/wiki/H%C3%A5kon_Wium_Lie

LastTrain - 2 hours ago

So this is what it has come to? AI bots writing code and fake origin stories of said code, AI bots commenting on it, and other bots responding? This is front page content now? HN: please require all AI generated content to be flagged as such. Ban offenders. This just blows.

raphlinus - 4 hours ago

Unfortunately, your complex script shaping for Arabic and Devanagari is wrong. The Arabic is missing the joining (all forms are isolated), and the Devanagari doesn't have the vowels combining (so you see those dotted circles).

To fix this you'll need HarfBuzz or something similar. Taking a quick look at the code, it seems like you're just doing a glyph at a time through the cmap. That, uh, won't do.
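A toy sketch of why a per-code-point lookup is not enough for Arabic: the form of each letter depends on its neighbours. Everything here is a simplifying assumption for illustration. The table covers only two letters, `FORMS` and `shapeArabic` are made-up names, and the output uses Unicode's legacy presentation-form code points just so the result is visible as a string. A real shaper such as HarfBuzz applies the font's own OpenType substitution rules and also handles ligatures, marks, and much more.

```typescript
// Toy joining data for two letters only; names are hypothetical.
// "dual"  = joins to both the preceding and the following letter.
// "right" = joins only to the preceding letter (in logical order).
interface LetterForms {
  joining: "dual" | "right";
  isolated: string;
  final: string;
  initial?: string;
  medial?: string;
}

const FORMS: Record<string, LetterForms> = {
  "\u0627": { joining: "right", isolated: "\uFE8D", final: "\uFE8E" }, // ALEF
  "\u0628": {
    joining: "dual",
    isolated: "\uFE8F",
    final: "\uFE90",
    initial: "\uFE91",
    medial: "\uFE92",
  }, // BEH
};

/** Pick a contextual form for each letter; unknown characters pass through. */
function shapeArabic(text: string): string {
  const chars = Array.from(text);
  return chars
    .map((ch, i) => {
      const cur = FORMS[ch];
      if (!cur) return ch; // not in the toy table: leave unchanged
      const prev = i > 0 ? FORMS[chars[i - 1]] : undefined;
      const next = i + 1 < chars.length ? FORMS[chars[i + 1]] : undefined;
      // The previous letter connects forward only if it is dual-joining.
      const joinsPrev = prev !== undefined && prev.joining === "dual";
      // This letter connects forward only if it is dual-joining itself.
      const joinsNext = cur.joining === "dual" && next !== undefined;
      if (joinsPrev && joinsNext) return cur.medial ?? cur.final;
      if (joinsPrev) return cur.final;
      if (joinsNext) return cur.initial ?? cur.isolated;
      return cur.isolated;
    })
    .join("");
}
```

Mapping every letter straight through the cmap is equivalent to always taking the `isolated` branch, which matches the symptom described above.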

flexagoon - 4 hours ago

Looks interesting, but the "Why Not Just Use" section in the readme is definitely missing Typst. Would be interesting to know how they compare, since Typst is the obvious choice for typesetting nowadays, rather than LaTeX.

raphman - an hour ago

Hi cosmiciron, wow, few humans find time to be a film director and a chief scientist and work on open-source projects.

What about these strangely written strange sentences in the README? What does that mean?

> In the 1980s and 90s, serious software thought seriously about pages.

Or this?:

> Desktop publishing software understood widows, orphans, and the subtle difference between a line break and a paragraph break.

As the difference between a line break and a paragraph break is really subtle, could you elaborate a little bit?

speajus - 4 hours ago

Oh man -- I just wrote one of these browserless Markdown-to-PDF tools a few days ago: [https://github.com/speajus/markdown-to-pdf.git](https://speajus.github.io/markdown-to-pdf). Thanks for publishing. I didn't need anything this exacting. Anyway, nice work; excited to look deeper.

luaybs - 4 hours ago

Every single screenshot of Arabic in the README is malformed: the letters are squished together and not connected.

irrationalfab - 2 hours ago

Interesting! Generating PDFs with properly paginated content is still a pain point in 2026.

Do you have a comprehensive integration test suite that can validate the robustness of your implementation?

samlinnfer - 2 hours ago

>ai description

>ai code

>ai comments

almaight - an hour ago

I've been working with PDFs lately, but I'm using PostScript.

codegladiator - 4 hours ago

The Devanagari in the screenshot is wrongly rendered.

Also, can you share the names of some films you have been part of as a film director?

LastTrain - 3 hours ago

Define "I"

koterpillar - 4 hours ago

Are Unicode combining characters (dotted circles) visible on the screenshot by design?

nodoodles - 2 hours ago

Curious but off-topic: are others also immediately suspicious of the content and quality because the readme is so obviously AI-written? How do you distinguish genuinely useful contributions in the sea of slop?

sriram_malhar - 2 hours ago

Love it, love it! Thanks for sharing.
