Roman Letters
romanletters.org
91 points by diodorus 3 days ago
A few things about AI-led projects like this come to my mind — first, it’s cool to see all this pulled together. I’m sure the design will read “Claude 2026” soon, but that’s fine - it’s clean and generally has reasonable UX.
There are some real rough spots - for instance, the Latin texts are generated via OCR from scanned documents directly; they’re not from some other scholarly corpus that’s been checked. I only looked at a few, but they all have significant transcription difficulties. Sources are linked, and those sources seem to be archive.org scans. Of course, getting a fluid-sounding translation out of a somewhat shitty transcription is something AI will do for you happily, but it’s harder to get it to tell you where it’s gone off the rails.
That’s not the thing that comes to mind, though. What comes to mind is that projects like this are super useful scaffolding, and I hope it’s built as such. Transcription will get better. Actually I’m pretty sure it could be better now, given the output quality. Translations of better transcriptions will be better. Plus we will likely have higher quality translation tech available.
So, I’d like to see a project like this lean into the iterative side of this kind of scholarship/hobby/historical work and make versioning and logging of updates part of the interface. Starting in the late 1990s, many academic projects did this with large corpora of documents (I’m familiar at least with the Yale Jonathan Edwards project) and used crowdsourced support. There’s no reason not to include facilities that interleave the AI and interested Latin/Roman scholars here.
In my mind with that done, this could turn into a genuinely useful tool. Which would be cool!
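A minimal sketch of what "versioning and logging of updates" could look like as a data structure. This is not how romanletters.org stores anything; `Revision`, `LetterRecord`, and the `source` labels are all hypothetical names for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One immutable version of a transcription or translation."""
    text: str
    source: str  # hypothetical labels, e.g. "ocr", "vision-model", "human"
    note: str
    created_at: datetime


@dataclass
class LetterRecord:
    """Append-only revision log for a single letter."""
    letter_id: str
    revisions: list[Revision] = field(default_factory=list)

    def add_revision(self, text: str, source: str, note: str = "") -> Revision:
        # Old revisions are never overwritten, so every change stays auditable.
        rev = Revision(text, source, note, datetime.now(timezone.utc))
        self.revisions.append(rev)
        return rev

    @property
    def current(self) -> Revision | None:
        # The latest revision is what the interface would display by default.
        return self.revisions[-1] if self.revisions else None

    def history(self) -> list[tuple[int, str, str]]:
        # (version number, source, note) for a changelog view.
        return [(i + 1, r.source, r.note) for i, r in enumerate(self.revisions)]
```

Because the log is append-only, a human correction and a later AI re-transcription can sit side by side, and the interface can show who changed what and why.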
I haven't checked any texts from the 500s, but I did some work with texts from the 1700s. Most of them had terrible transcriptions on archive.org, made using old tesseract versions. You could probably improve a lot with newer tesseract versions. I went for the nuclear option and just passed the image of each page (along with some context on how the previous page ended) to Qwen2.5vl:32b and got near-perfect transcriptions. As you can tell from the old model, that was months ago, and vision models have only gotten better since.
Of course, in some cases vision models are a liability for OCR, because the errors they do make come out as plausible-sounding substitutions instead of alphabet soup. But if you only use the transcription as input for an LLM, that doesn't matter. It only becomes a question of how much compute you are willing to throw at it.
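A rough sketch of the page-by-page loop described above, with the tail of each page carried into the next prompt. The model call is left as an injected function, since the exact client and prompt used are not given; `Transcriber`, `build_prompt`, `transcribe_pages`, and the prompt wording are assumptions.

```python
from typing import Callable, Iterable, List

# Hypothetical signature: (page_image_bytes, prompt) -> transcribed text.
# In practice this would wrap a call to a vision model such as Qwen2.5vl.
Transcriber = Callable[[bytes, str], str]


def build_prompt(previous_tail: str) -> str:
    """Build the instruction, adding the end of the previous page if any."""
    prompt = "Transcribe this page exactly as printed. Do not modernize spelling."
    if previous_tail:
        prompt += "\nThe previous page ended with: " + repr(previous_tail)
    return prompt


def transcribe_pages(
    pages: Iterable[bytes],
    transcribe: Transcriber,
    tail_chars: int = 200,
) -> List[str]:
    """Transcribe pages in order, passing each page's tail to the next call."""
    results: List[str] = []
    previous_tail = ""
    for image in pages:
        text = transcribe(image, build_prompt(previous_tail))
        results.append(text)
        # Keep only the last few characters as context for the next page.
        previous_tail = text[-tail_chars:]
    return results
```

Passing the model call in as a function keeps the loop testable without a GPU, and makes it easy to swap in a newer model later.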
Yes, exactly. What could be durable is not the specific transcription as of today - until it’s perfect or at least ‘good enough’ - but the web site, comments, and process that can be run and turn into improved results - that part seems likely to be valuable to me.
What a cool project, I like this one where Pliny the Younger complains about a no-show at his dinner party:
This appears to be written to this guy: https://en.wikipedia.org/wiki/Gaius_Septicius_Clarus
Had to look up "sow's matrices."
> A "sow's matrix" (or vulva in Latin) is a dish from ancient Rome consisting of the uterus of a sow (a female pig), often specifically from one that has never farrowed or that was slaughtered shortly after farrowing. It was considered a delicacy among the wealthy elite and was a common dish served at lavish Roman banquets and dinner parties, often used as a sign of luxury, wealth, and status.
The website looks like any LLM ("AI") generated one, usually made with Claude, judging by the design that model frequently uses.
And it is (300,755++ lines from Claude): https://github.com/CraigVG/roman-letters-network
I am sorry, but I just cannot consider it serious or accountable, since I cannot trust its data.
If all the information there is valid and verified, every single letter and every word of the authors after the LLM's processing, then the "AI" concern may be set aside.
Yet I don't believe so, knowing how endlessly every subjective word may shift with context. And a limited LLM is being used for that?
There's a `?scholarly=true` GET parameter mentioned in `:/CLAUDE.md`, but a quick check didn't show any change in behavior.
Regardless, the idea and overall intention that highlights the impact and importance of history, and presents connections between infinitely unique and miraculous people around the infinite world... where every single word carries a life moment... is ineffably magnificent...
Thank you, Craig Vander Galien, for the idea and the love of history!
---
> Modern English translations were produced using Claude (Anthropic), working from either the Latin/Greek original or an existing 19th-century English version. Translation work was guided by two internal documents: a translation guide covering late antique epistolary conventions, rhetorical register, and how to handle common formulaic phrases; and a modern voice guide specifying tone, vocabulary level, and how to avoid archaism while remaining faithful to the original.
>
> AI-generated translations are clearly marked in the interface. They are provided for accessibility and research convenience, not as authoritative scholarly translations. The original Latin or Greek is preserved alongside every translation, and 19th-century English versions are shown where available. Corrections from domain experts are welcome.
>
> Source: https://romanletters.org/about/
The design is good. It is unoriginal, but not every project needs to use an original design.
serious_angel is not contending with you that the design is bad, or that it is bad because it is unoriginal. In fact, they are not even specifically calling out the design.
They noticed the design, recognized it as the output of an LLM, then discovered that an LLM was involved in much of the creation of the project. This is an academic project. Whatever the pedigree of the researcher, this implies to the grandparent that the final result of the work may be amateurish or, worse, to an extent generated. They're therefore concerned that this calls into question the legitimacy of the research outcomes (e.g. completeness, contents of letters, classification, maybe even hallucinations in the thesis proper).
Preemptive arguments:
1. "The author's a researcher, not a programmer; therefore it's fine to use an LLM. It is preposterous to ask each researcher to learn web development to publish their research." You are right, but given the number of vibe-coded websites we see, all with the same default (Astro?) style, the grandparent still has every right to associate that style with untrustworthy crap. I'm not saying that this academic website is necessarily crap. However, I think it's useful for the grandparent to share their sentiment, because the researcher might not know.
2. "A lot of pages have links to sources; you could verify the legitimacy yourself." Perhaps, but doubting the veracity of research is a bad first impression, isn't it?
It's a bit sad, because the website is non-trivial and would have taken quite a bit of effort without an LLM. But it is difficult to separate webdev enablement from the rest of the LLM baggage.
A pity the Latin text isn't made available as well.
But it is. There is a link to the original text at the bottom of the translation.