Tectonic: A modernized, complete, self-contained TeX/LaTeX engine
tectonic-typesetting.github.io
58 points by maxloh 4 days ago
Background of the project:
Tectonic is forked from the XeTeX TeX engine. However, the build process for XeTeX (and all other mainstream TeX engines) is extremely baroque:
1. The original Knuth sources are written in a language called WEB (which is basically used by nobody else)
2. Those sources are then patched with a series of “change files” due to restrictions on distributing modified sources
3. These patched files are then converted to the Pascal language using some custom processing scripts
4. That Pascal code is then converted to C code using additional custom tools
5. Those C files are then compiled against a set of extension files and libraries written natively in C and C++
One of the big motivators for the launch of the Tectonic project was to break out of this ridiculously unwelcoming development process. Tectonic is based on the C/C++ files that emerge from the XeTeX build process, and is gradually translating that code to modern Rust.
While the core TeX architecture and XeTeX are largely stable, they do, however, evolve with time. This repository recreates the final XeTeX C/C++ files from their source, so that modifications can be ported into the main Tectonic codebase.
https://github.com/tectonic-typesetting/tectonic-staging/blo...
> However, the build process for XeTeX (and all other mainstream TeX engines) is extremely baroque
(Disclaimer: I'm on the TeX Live team)
Yes, the build process is rather unusual internally, but this is pretty well insulated from the user. The "standard" build commands used by lots of other open source projects
git clone […] && ./configure && make
should mostly just work, and give you a functioning TeX engine at the end. The full steps are listed at [0] if you're interested in more details though. And TeX Live is regularly built on essentially every platform imaginable [1], so it's pretty unlikely that you'll ever need to work with the low-level build system.
And only some of the engines require this complex build process; LuaTeX is the recommended engine these days [2], and it's written entirely in C (with a few C++ libraries), and it uses a standard autotools build process [3].
> One of the big motivators for the launch of the Tectonic project was to break out of this ridiculously unwelcoming development process.
TeX development is pretty welcoming in my personal experience: the first time that I built TL (~5 years ago), I was able to follow the official instructions without any problems, and I got a working TeX engine by the end of it. And it was only 2 or 3 years after that that I became an "official" member of the TL team, so I like to think that the development process is fairly welcoming.
But if you have any suggestions on what we can do better, please let me know, either by replying to this comment or to the email address linked in my profile.
> Tectonic is based on the C/C++ files that emerge from the XeTeX build process
I haven't looked at the Tectonic source, but the problem is that the C code generated by Web2C is fairly unreadable, so it's not really very usable as source code.
> While the core TeX architecture and XeTeX are largely stable, they do, however, evolve with time.
XeTeX is essentially frozen for the time being [4], unless someone steps up to maintain it. We are cautiously optimistic regarding LLMs though: they seem to be fairly decent at writing WEB code, and since even the TL development team only has 2 developers who are proficient in WEB, we'll take any help that we can get. (We don't have any non-trivial AI-written code yet, but it's definitely something that we're looking into)
[0]: https://tug.org/texlive/build.html
[1]: https://ftp.math.utah.edu/pub/texlive-utah/
[2]: https://www.latex-project.org/news/2024/11/01/issue40-of-lat...
So what does this fork do? No more WEB? No more Pascal? Does it produce exact results as XeTeX?
Co-maintainer here.
Tectonic is a cool project, but hasn't seen any significant changes in a few years---and likely won't anytime soon. It seems we maintainers don't have the time and motivation to put serious work into Tectonic.
I haven't looked at the code in years (and thus may be wrong), but here's a quick overview:
Tectonic's code consists of thin bindings to harfbuzz/graphite/etc and a vendored XeTeX (in C, with some tweaks to make the build easier), driven by Rust that tries to keep the TeX environment predictable and sane. A few components have been fully ported to Rust (bibtex, spx2html), but the project is very unfinished.
I've looked into the dark corners of TeX when I worked on Tectonic, and it is not pretty. TeX relies on a stack of evil hacks and esoteric behavior that is very hard to replicate, and very difficult to expose in an ergonomic way. This is true of the core system, and of many packages on CTAN.
A quick example: code highlighting does not work in Tectonic. The canonical solution is https://ctan.org/pkg/minted, which spawns a Python process to style your code. Reproducibility is one of Tectonic's selling points, so we cannot replicate this behavior.
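For readers who have not used minted, a minimal sketch of the kind of document in question follows. It assumes a stock TeX Live install with the minted package and Pygments available; the file name `example.tex` is hypothetical, and whether `-shell-escape` is needed depends on the minted version.

```latex
% Sketch only: highlighting is delegated to an external Python tool (Pygments).
% With older minted versions this typically needs unrestricted shell escape, e.g.
%   lualatex -shell-escape example.tex
% Newer minted versions use a separate helper executable instead.
\documentclass{article}
\usepackage{minted}
\begin{document}
\begin{minted}{python}
def greet(name):
    return f"Hello, {name}!"
\end{minted}
\end{document}
```

The external process is the part that an engine focused on reproducible builds cannot simply run.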
With https://typst.app/ as good as it is, there's little motivation to modernize TeX---especially considering the effort required. Typst _is_ modern TeX, and I'd rather spend my time there.
While Typst appears to be popular, I think that TeXmacs, https://www.texmacs.org/, which is a program independent from both TeX and Emacs, is the kind of program that we need for writing: a fully WYSIWYG, fully structured document preparation system, in which you edit the structure of your document in a WYSIWYG way. When editing the structure on-screen, the user has no need to be aware that they are doing so, as it looks like they are editing a text document; at the same time, the TeXmacs editor will guide the user toward keeping a structured document.
IIRC TeXmacs supports only a quite limited subset of what LaTeX and TeX can do. Just like LyX, it can create new documents but will often fail to open ones that were created outside of it.
I think it is so. As far as I know, there are no converters that can do that. A search with an LLM made me find https://arxiv.org/pdf/2605.16562, a paper describing the ArXiv conversion tool from LaTeX to HTML; here is a sentence from the abstract:
"corpus-scale conversion work aimed at 90% error-free HTML (currently 75%)"
although there may be issues that I do not understand or did not see (I looked at the paper very quickly) that make it more difficult for the authors than for the simplest possible translation.
I’ve been using Typst lately and it has been great. I’ve made an exam template for my university and made an export feature so that I could generate the exam in the JSON format that our online exam system (WISEflow) expects, with support for multiple choice and essay style questions.
It is so snappy, with great error messages. I encourage people to try it out. The Typst tutorial is very approachable.
I should note, it's still not on par feature-wise with the TeX ecosystem, but it is getting there with incredible speed. As for UX - it beats anything TeX-based ten times over.
If I were to write a new TeX system, I would build on the attempts from the early 2000s that tried to use Java and modularize the system, namely NTS [0] and ExTeX [1].
[0]: https://github.com/jamespfennell/new-typesetting-system
[1]: https://github.com/tex-other/extex
I have been using TeX/LaTeX for ages, and today the same issues hinder the user experience as they did decades ago - cryptic error messages, complex pipeline, lack of the proper Unicode symbols support out of the box, and so on.
Nowadays, with Typst existing, it's vital for the TeX ecosystem to solve these issues, since none of them are present in Typst. Projects like Tectonic would solve this for TeX, but they lack enough hands and (maybe) financial support.
Otherwise, using TeX only makes sense nowadays only if 1) you already have some templates 2) some features are still missing in Typst 3) you are just forced to use TeX/LaTeX for whatever reason.
(Disclaimer: I'm on the TeX Live team)
> cryptic error messages
These have somewhat improved recently, but I agree that they're still not great.
> complex pipeline
You can typically just run "latexmk --lualatex <filename.tex>" and your document should compile in a single step.
> lack of the proper Unicode symbols support out of the box
UTF-8 has been the default input encoding since 2018 [0], so character input should mostly just work. Using complex scripts (Arabic, Devanagari, etc.) requires XeLaTeX or LuaLaTeX, but LuaLaTeX is recommended for most documents anyways [1].
Now, you still won't be able to typeset arbitrary characters without any additional setup, but this is because there is no single font that contains all characters, and since mismatched fallback fonts usually look bad, the (La)TeX developers do not want this to be the default. But
\usepackage{fontspec}
\setmainfont{Some Font with your Characters}
should be all that you need in most cases.
> Projects like Tectonic would solve this for TeX
All of these have already been fixed in TeX, except for the error messages, which would be impossible for Tectonic to fix.
(Background: the TeX engines give excellent error messages, and LaTeX gives good error messages for "expected" errors, but unexpected errors usually give a TeX engine error message unrelated to your LaTeX input, since LaTeX is internally implemented on top of TeX engine macros. So much like C++ template errors, it isn't really possible to fix this.)
> Otherwise, using TeX only makes sense nowadays only if
4) If you actually like TeX for some reason like I do :)
[0]: https://www.latex-project.org/news/latex2e-news/ltnews28.pdf...
[1]: https://www.latex-project.org/news/2024/11/01/issue40-of-lat...
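To make the fontspec lines above concrete, here is a minimal complete document. It is a sketch: the font name "TeX Gyre Pagella" is only an assumption (any installed OpenType font that covers your characters and is visible to the engine would do), and it has to be compiled with LuaLaTeX or XeLaTeX, since fontspec does not support pdfLaTeX.

```latex
% Compile with: lualatex example.tex   (file name is hypothetical)
\documentclass{article}
\usepackage{fontspec}
% Assumption: this font is installed and contains the characters used below.
\setmainfont{TeX Gyre Pagella}
\begin{document}
% UTF-8 input is typed directly; no inputenc or fontenc setup is needed here.
Grüße aus Köln, naïve café, Žluťoučký kůň.
\end{document}
```

Characters missing from the chosen font will still not be typeset, which is the trade-off described above.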
I recently had a document break because I used umlauts together with the subfigs package. Apparently both use " characters internally and clash badly. This is not a particularly exotic use case.
Yeah, unfortunately lots of the third-party LaTeX packages are fairly poorly written. Which also applies to most other programming languages, but LaTeX is somewhat unique here since (1) approximately nobody makes money off of TeX, so even "important" packages are often volunteer-maintained, and (2) LaTeX2e has been around since the early 90s, so some of the popular packages have been unmaintained for over 20 years.
The core/official LaTeX code is really quite stable, but it's also very limited, so it's pretty reasonable to conflate it with the LaTeX ecosystem as a whole. But yes, the LaTeX Team is definitely aware of the problems caused by Babel shorthands (which is what " is in German), and they're trying to figure out some way to fix it without breaking other documents.
Using LaTeX makes sense because that's what all journals and conferences expect.
I'm writing two books, both in LaTeX.
I really don't get what the problem is.
Using LaTeX is mostly fine, except for the endless compile times, useless error messages, lack of unicode, etc. like the GP said.
I'm maintaining an internally used LaTeX document class and the development experience is even worse. TeX has no concept of such avant-garde ideas like lists, dictionaries, or namespaces. Things break all the time, and sometimes only when you load three specific packages in a specific order because they all patch each other's routines. I still haven't completely grokked the idea of fragile commands and expanding macros. Characters can change meaning depending on context, even the `comment` character (%) or the `escape` character (\) (and I believe even the curly braces), for example when used inside `\path{}` or `\url{}` [1]. It makes a difference whether you comment out line endings or not. The LaTeX3 syntax looks like a bad joke. I mean, look at it:
\ExplSyntaxOn
\tl_set:Nn \l_tmpa_tl {A}
\group_begin:
\tl_set:Nn \l_tmpa_tl {B}
\par value~inside~group:~\tl_use:N \l_tmpa_tl
\group_end:
\par value~outside~group:~\tl_use:N \l_tmpa_tl
\tl_set:Nn \l_tmpb_tl {A}
\group_begin:
\tl_gset:Nn \l_tmpb_tl {B}
\par value~inside~group:~\tl_use:N \l_tmpb_tl
\group_end:
\par value~outside~group:~\tl_use:N \l_tmpb_tl
\ExplSyntaxOff
????
Let's just let it retire and focus our efforts on Typst and pushing publishers to accept Typst.
[1] Just look at all these poor souls trying to achieve something as exotic as putting a URL with a percent sign inside a footnote: https://tex.stackexchange.com/questions/12230/getting-percen...
> except for the endless compile times, useless error messages, lack of unicode, etc.
Some of these have been fixed; see my sibling comment [0] for more details.
> TeX has no concept of such avant-garde ideas like lists, dictionaries, or namespaces. […]. The LaTeX3 syntax looks like a bad joke.
But that is in fact the entire purpose of LaTeX3. I agree that the syntax looks intimidating, but it's actually quite nice once you learn it, and it's written that way to provide namespacing in TeX. Similarly, LaTeX3 defines lists, dictionaries, and most other conventional datastructures.
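As a rough illustration of the namespacing and data structures mentioned here, a small sketch follows. The module prefix `demo` and the command `\DemoShow` are made up for the example, and it assumes a reasonably recent LaTeX kernel (one where expl3 and `\NewDocumentCommand` are available without loading extra packages).

```latex
\documentclass{article}
\ExplSyntaxOn
% Names follow the pattern \<scope>_<module>_<description>_<type>,
% so the "demo" prefix acts as a namespace.
\seq_new:N  \l_demo_items_seq   % a list (sequence)
\prop_new:N \l_demo_info_prop   % a dictionary (property list)
\seq_set_from_clist:Nn \l_demo_items_seq { alpha, beta, gamma }
\prop_put:Nnn \l_demo_info_prop { engine } { LuaTeX }
% Internal function in the same namespace.
\cs_new_protected:Npn \demo_show:
  {
    % Join the list items with a comma and a space.
    \seq_use:Nn \l_demo_items_seq { ,~ } \par
    % Look up a value by key.
    \prop_item:Nn \l_demo_info_prop { engine }
  }
% Document-level command that wraps the internal function.
\NewDocumentCommand \DemoShow { } { \demo_show: }
\ExplSyntaxOff
\begin{document}
\DemoShow
\end{document}
```

The colons and underscores that look odd at first are what carry the module name and argument types.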
> Things break all the time, and sometimes only when you load three specific packages in a specific order because they all patch each other's routines.
Hmm, well it depends. The LaTeX kernel and the TeX engines are more stable than nearly all other software, but the third-party packages do indeed break occasionally. But you see similar dynamics play out in most other ecosystems: JavaScript the language is incredibly stable and has excellent backwards compatibility, but if you use 50+ third-party packages, then things do indeed break occasionally.
> Characters can change meaning depending on context
Much like operator overloading in other languages, catcode changes in TeX can indeed be misused and are sometimes confusing, but they're also a pretty useful solution to problems that would otherwise be tricky to solve.
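A small sketch of what a catcode change looks like in practice, assuming a plain article document: inside the group the percent sign is made an ordinary character, and the change is undone when the group ends.

```latex
\documentclass{article}
\begin{document}
% Normally, everything after a percent sign is a comment.
\begingroup
  % Give the percent sign category code 12 ("other") inside this group.
  % There is deliberately no trailing comment on the next line: once the
  % assignment has run, a percent sign there would be typeset, not ignored.
  \catcode`\%=12
  Discount: 50% off.
\endgroup
% The group has ended, so this line is a comment again.
Back to normal.
\end{document}
```

This is also why tools cannot tell what a character means without following every assignment that precedes it.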
All this isn't to say that TeX doesn't have issues (I criticize LaTeX myself fairly frequently), but most of these are due to the fact that LaTeX is 40-year-old software built on a 50-year-old engine, and has remained backwards-compatible with documents throughout that entire time. And (La)TeX is slowly modernizing, so I'm fairly hopeful that things will continue to improve.
Thanks for your insight, much appreciated!
However, regarding this:
> Much like operator overloading in other languages, catcode changes in TeX can indeed be misused and are sometimes confusing, but they're also a pretty useful solution to problems that would otherwise be tricky to solve.
I'm sorry, but I've never seen overloading of such fundamental characters like the comment character or escape character anywhere. Or at least if you use these characters inside a string, it's pretty clear that the string context is special. In LaTeX I have no way of knowing which catcodes a macro has modified without essentially parsing the entire thing, which breaks syntax highlighters and language servers (something that increases quality of life in other languages substantially), because the compile times are prohibitive. The decision to let users redefine %, \ and literally every character seems like a really, really bad idea to me.
Other languages and syntaxes seem to do just fine, so I'm not sure what you mean by tricky to solve.
> most of these are due to the fact that LaTeX is 40-year-old software built on a 50-year-old engine, and has remained backwards-compatible with documents throughout that entire time
I realize that, and I appreciate what LaTeX (and by extension TeX) has done. It's a giant in sciences and the software world, of absolutely critical importance, but still. We learned a lot of lessons about writing software in the last 50 years, and Typst is applying these from the ground up. Unfortunately I don't have a lot of confidence that LaTeX can be modernized.
> I'm sorry, but I've never seen overloading of such fundamental characters like the comment character or escape character anywhere.
I believe that Racket [0], Mathematica [1], Raku [2], and Rust [3] let you assign arbitrary meanings to most symbols, but these are indeed much more restricted than TeX is (and for good reason). But the issue is really more that TeX barely supports lexical scoping, and that it lets you change the global catcodes at any point in the document, since changing the meanings of characters before any code runs or only in the middle of a scope is pretty useful.
> Or at least if you use these characters inside a string, it's pretty clear that the string context is special.
Verbatim is essentially equivalent to strings in other languages, and it mostly works pretty well, aside from the huge problem that it's impossible to nest it or pass it as an argument to most macros.
> In LaTeX I have no way of knowing which catcodes a macro has modified without essentially parsing the entire thing
Agreed, this is pretty annoying, but the only consolation is that most documents don't change their catcodes very often (since it's usually a pretty terrible idea).
> Other languages and syntaxes seem to do just fine, so I'm not sure what you mean by tricky to solve.
Texinfo [4] is what I was mainly thinking of, since that is able to completely redefine TeX's syntax without needing to manually implement parsing itself (which would be the best strategy today, but was less feasible back when computers were much slower). Similarly, active characters are pretty useful (this is how ~ is defined to insert a non-breaking space, and is also useful for faux-Markdown [5]).
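For anyone unfamiliar with active characters, here is a minimal sketch in the same spirit as the faux-Markdown link: within a group, `*` is turned into an active character and given a macro definition. The choice of `*` and of `\textbullet` is only for illustration.

```latex
\documentclass{article}
\begin{document}
\begingroup
  % Category code 13 ("active") makes the character behave like a macro name.
  \catcode`\*=\active
  % Define what the active character expands to (local to this group).
  \def*{\textbullet}
  * first item\par
  * second item\par
\endgroup
% Outside the group, the asterisk is an ordinary character again.
A literal * works again here.
\end{document}
```

The tilde works on the same principle: it is an active character whose definition inserts a non-breaking space.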
> We learned a lot of lessons about writing software in the last 50 years, and Typst is applying these from the ground up.
Yup, I first learned about Typst on the first day that it was released, and even then I thought that it had a good chance of succeeding, since it solved the problems that most users had (bad error messages and slow compile times), whereas the other TeX competitors focused on things like better typesetting quality, better extensibility, or easier programmability, which most users don't care about at all.
I would be personally a little disappointed if Typst replaced LaTeX, but until that happens, I definitely hope that it continues to do well.
> Unfortunately I don't have a lot of confidence that LaTeX can be modernized.
There are lots of other non-LaTeX TeX formats that are quite modern (ConTeXt [6] is my personal favourite, but OpTeX [7] is nice too), and even LaTeX itself has improved quite a bit over the last 5 years or so [8]. But yeah, its modernization process is still much slower than nearly any other piece of software, so I am also worried that this may end up being too little too late.
[0]: https://docs.racket-lang.org/guide/language-get-info.html
[1]: https://reference.wolfram.com/language/Notation/tutorial/Not...
[2]: https://docs.raku.org/language/slangs
[3]: https://doc.rust-lang.org/book/ch20-05-macros.html#function-...
[4]: https://en.wikipedia.org/wiki/Texinfo#Texinfo_source_file
[5]: https://tex.stackexchange.com/a/236457/270600
[6]: https://www.ctan.org/pkg/context
[7]: https://petr.olsak.net/optex/
[8]: https://www.latex-project.org/news/latex2e-news/ltnews.pdf#s...
In my project https://mdview.io I support LaTeX, but sometimes people share weird syntax that looks like LaTeX but does not render correctly. I haven't found a good way to fix it; I will probably use a hybrid approach like the one I adopted for broken Mermaid diagrams (LLM + heuristics).
Maybe a little off topic, but Kudos to the people who chose the name!
It reminds me of that very embarrassing dance we used to do around 2008 :)
Just wanted to say big thanks to the maintainers, I've been using tectonic the last couple years as my only LaTeX distribution, works everywhere (including macOS), it's available in conda-forge, so I can just have it as a dependency in my projects. Everything "just works", that's the best way to describe it.
Thank you for fully fixing LaTeX for me.
A tool like this is sorely needed for LaTeX, and Tectonic is especially intuitive to embed into other applications, but the divergence of XeTeX from pdflatex makes it incompatible with most Overleaf projects. This is just an unfortunate ecosystem gripe, but for most workflows I end up having to reach for latexmk instead for this reason.
Huh? Overleaf supports XeTeX out of the box. The LaTeX project, incidentally, has been moving towards requiring LuaTeX as the engine (also supported in Overleaf).
The pdftex engine is pretty much a dead end these days and I would only recommend its use for compiling legacy documents.
That said, the biggest problem is nothing to do with the source code of TeX. The change file mechanism is pretty straightforward and there have been tools for decades to allow application of more than one change file against the source, although with the standardization on web2c in the build process as well as better cross-platform C compilation in 2026 vs 1982, there isn’t the proliferation of platform-dependent change files that there was in the 80s when people were compiling on Pascal compilers that supported different subsets of the language.
But as I was saying before I got into that digression, the source language isn’t the issue with TeX so much as the basic architecture which is highly coupled to the limitations of computers in the late 70s/early 80s when even 7-bit ASCII couldn’t be assumed to be consistent between systems¹. As much as I enjoy writing TeX macros and can do wonderful things with them that most people would consider dark magic, it’s a cursed way to do programming and has no parallel in any other programming paradigm.
⸻
1. The SAIL platform at Stanford where Knuth did the initial work, for example, had ↑ in the code space ASCII designates as ^, and IBM mainframes all used EBCDIC which has the complication of having | and ¦ as two separate characters both of which were typically mapped to | in EBCDIC to ASCII conversions with the reverse conversions arbitrarily choosing one of the two characters so that there was no guarantee that you’d get the expected character in your text file conversion² or your ASCII terminal controller.
2. Which is yet another reason why non-Unix operating systems would have distinct text and binary modes for opening files.
> so much as the basic architecture which is highly coupled to the limitations of computers in the late 70s/early 80s when even 7-bit ASCII couldn’t be assumed to be consistent between systems¹. As much as I enjoy writing TeX macros and can do wonderful things with them that most people would consider dark magic
Well LuaTeX solves both of these problems, but I'm pretty sure that you're already aware of that :)
> it’s a cursed way to do programming and has no parallel in any other programming paradigm
It's fairly similar to C preprocessor macros, and writing Mathematica code occasionally reminds me of TeX, but these aren't exactly the most flattering comparisons.
Thanks for the additional information, but with the "ecosystem gripe" I meant that most Overleaf projects are inadvertently designed for pdflatex simply because it's the default. No matter how much better other compilers are, pdflatex is the de facto standard in certain circles, even if few within these circles are aware of this (e.g. university laboratories), so I've had to begrudgingly switch back to latexmk for most projects to accommodate this.