An entire Herculaneum scroll has been read for the first time
scrollprize.org
1555 points by verditelabs a day ago
Preprint: https://scrollprize.org/pdf/main.pdf
https://github.com/ScrollPrize/villa
I am on the Vesuvius Challenge team that did the segmentation, unwrapping, and ink detection, so feel free to ask any questions.

How awesome do you feel right now? This is HUUUGE! To think that a scroll was unreadable for so, so long, until we invented machines that let us read it slice by slice. It's such an unfathomable achievement: we made machines that let us read 2000+ year old fragile scrolls without ever opening them, and you helped do just that. Hats off!

In March I went to Beam Line 18 at the European Synchrotron Radiation Facility. I had to swap out the scrolls on the xray pedestal. These were scrolls that King Ferdinand presented as a diplomatic gift to Napoleon and Josephine; France still has 2 of the 6 they were given intact. I had to handle both of them. I have never felt more stressed in my life; I had never before handled such a priceless artifact and probably never will again. I feel the opposite of that feeling and am immensely proud of everything that the core challenge team has accomplished.

I am floored at these achievements. Such amazing work. If I may ask, when you started thinking about achieving this, what were the first attempts and ideas on how to go about it? What were some of the obstacles that had to be overcome to achieve this?

The process of trying to read the scrolls has been going on for about 275 years now. Doing it nondestructively via CT scanning and virtual unrolling and reading has been in the works for 25 years or so, so it's a lot of building on previous work. Virtual unrolling and reading are not terribly hard to do manually; they are just not feasible on a large scale. It would take years and years of human time spent tediously clicking on papyrus and labelling ink in renders, so a large amount of automation is required. A lot of difficulty has come from the first step: xraying the scrolls. It's hard, expensive, and difficult to get right.
The efforts since this all began with CT scanning 25 years ago have been kneecapped by the data simply not being good enough. We xray on what is, AFAIK, literally the most powerful xray beamline in the world, and we would still like it to be more powerful and faster. Not to mention the massive amounts of data: for PHerc. Paris 3, our largest scroll, the raw reconstructed data is 260 terabytes. That's a lot of data to deal with.

There's lots of great work that pioneered this (I wish the website did a better job of showing that), e.g., Dr. Brent Seales and his decades of work: https://www.science.org/doi/10.1126/sciadv.1601247

Brent is an advisor on the Vesuvius Challenge. He's listed on our website as such, but the work we are doing, specifically that which falls under the Vesuvius Challenge, is separate from him (apart from his being an advisor), from EduceLab at U of K, and from U of K as a whole. The purpose of the scrollprize website is not to showcase the 25 years of research leading up to the Vesuvius Challenge; it's to showcase what the Vesuvius Challenge is doing. Granted, none of the core team are web developers, so updates to the website are best effort.

Ah, cool, thanks for the clarification. Some of the comments here read like nothing like this has ever been done before...

This is one of the most fascinating comments I’ve ever read. Thank you so much! I was wondering, how does this all get funded?

There's a sponsors and partners list on their webpage: https://scrollprize.org/#sponsors

> We xray on what is AFAIK literally the most powerful xray beamline in the world and we would still like for it to be more powerful and faster.

What makes power relevant here? Obviously medical applications aren't particularly powerful, are quick, and are very useful. Is it harder to penetrate the material than the human body? Is the increased power due to increased resolution, i.e., increased pixels/cm^2 rather than increased watts/pixel?
The latter would seem to risk damaging the artifact?

We scan the full scrolls at 2.4 microns and scan portions of them at up to 0.5 microns. This is 1000x to 4000x higher resolution than your standard medical CT scanner, so it requires a lot more power to get readings at such high resolution. There are other properties that make large synchrotrons more amenable to our task, but I am not an xray technician, so I am not qualified to speak to most of them. Damage to the artifacts is less than you might expect. I think that the radiation is particularly dangerous to living tissue and fiber. The scrolls are, for the most part, inert, pure-carbon charcoal bricks and not particularly vulnerable to high-power xrays.

Where can we read about the xray setup? E.g., the type of sensor, if/how the target and/or beam is scanned, any fancy gratings, etc., and what kind of CT algorithms are used.

Parent comment says "Beam Line 18 at the European Synchrotron Radiation Facility", so https://www.esrf.fr/home/UsersAndScience/Experiments/BM18.ht...

Just wonderful
Wonderful that all of this amazing technology exists
Wonderful that we used it to read these ancient scrolls
Thank you

Do you know what kinds of features the model is picking up on to distinguish ink from papyrus? And did you have any labeled data (images where a human expert has identified ink, or perhaps a scan of a burnt scroll with known content) to help train it? Certainly my Mark 1 eyeballs would not obviously perform better than random guessing at this task, although my eyeballs are, if nothing else, nerfed by only being able to see a 2D slice of the data.

Yes. Most of the ink we have come across is carbon based. This leaves a certain texture on the scrolls that is recoverable and viewable with fairly basic physically based rendering, though how much ink is recoverable varies greatly from one character to the next. I don't have links handy, but we just published updates to our data viewer page on our website.
PHerc. Paris 4, I believe, has the best overlay of ink. A lot of labeled data is available on our FTP server, which has public access.

When you say "physically based rendering", do you mean that one could build a PBR model based on the (unrolled?) xray data, render that model, and be able to see the ink? Edit: I found this: https://scrollprize.org/data_browser#/samples/PHercParis4/se... The JSON seems to suggest that I'm mostly looking at ink detection output, but I could easily be using the tool wrong. But I also found this awesome explanation: https://scrollprize.org/data_fragments I guess a bunch of the training was done using fragments of scrolls where ground truth data is available from IR photography. Also... that xray resolution is absolutely amazing!

Some images on that page, specifically the "alpha composite" and "combined alpha" images, are a pretty simple PBR rendering (if it's even that complex; it's just a composite rendering over a 3D array to a 2D image) with no ML-based ink detection in the input.

I assume that's because the writer, sometimes shortly after re-inking the writing instrument, was putting down a 10x thicker layer...

Outstanding work! I've participated in the challenge, but didn't get far. One of the questions I had at the time was: if I'm going to use ML to detect ink, could it invent hallucinated letters, or even parts of text, and how do you prevent that?

Yes, it's quite possible for ML to hallucinate ink, though on a much more local scale, like predicting a slightly longer stroke, filling in more of a character than is actually in the data, etc. Perhaps enough to change the reading of a character or show ink where there isn't any. It is difficult for ink detection to hallucinate grammatical and idiomatic Greek and Latin.

What is the input to the ML algorithm? Does it know the surrounding context, so that it has a chance to deduce "if this stroke is slightly longer then the end result will be idiomatic Greek and Latin"?
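The "alpha composite" images are described above only as a simple composite rendering of a 3D array down to a 2D image, with no ML-based ink detection involved. Below is a minimal sketch of one common way to do that, front-to-back alpha compositing, assuming the segment has already been flattened into a stack of layers indexed as `volume[z][y][x]` with intensities in [0, 1]. The opacity ramp (`threshold`, `opacity_scale`) and the function names are hypothetical; the team's actual renderer and its parameters are not described in this thread.

```python
from typing import List

# volume[z][y][x]: a stack of flattened layers, z = 0 nearest the viewer
Volume = List[List[List[float]]]
Image = List[List[float]]


def opacity(value: float, threshold: float, opacity_scale: float) -> float:
    """Hypothetical transfer function: fully transparent at or below
    `threshold`, then opacity rises linearly with intensity, capped at 1."""
    if value <= threshold:
        return 0.0
    return min(1.0, (value - threshold) * opacity_scale)


def composite_layers(volume: Volume, threshold: float = 0.5,
                     opacity_scale: float = 2.0,
                     stop_at: float = 0.99) -> Image:
    """Front-to-back alpha compositing of a layer stack into one 2D image."""
    height, width = len(volume[0]), len(volume[0][0])
    image = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            color, alpha = 0.0, 0.0
            for layer in volume:
                v = layer[y][x]
                a = opacity(v, threshold, opacity_scale)
                # "Over" operator: this layer only shows through whatever
                # transparency the layers in front of it have left.
                color += (1.0 - alpha) * a * v
                alpha += (1.0 - alpha) * a
                if alpha >= stop_at:  # early ray termination
                    break
            image[y][x] = color
    return image


if __name__ == "__main__":
    # Toy 3-layer, 1x2 stack: pixel (0, 0) has one bright voxel in layer 1;
    # pixel (0, 1) stays below the threshold in every layer.
    vol = [
        [[0.2, 0.1]],
        [[0.9, 0.3]],
        [[0.4, 0.2]],
    ]
    print(composite_layers(vol))
```

In the toy stack, pixel (0, 0) comes out at roughly 0.72 and pixel (0, 1) at 0. Accumulating front to back allows early termination once a pixel is nearly opaque; a plain average or a maximum-intensity projection over the layers would be simpler alternatives, and which works best on real scroll data is an empirical question.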
The input is 3D chunks of reconstructed CT data from our scans. I can't remember the specifics, but maybe enough voxels for 0.5 mm^3 at a time or so? They're all available for free from https://registry.opendata.aws/vesuvius-challenge-herculaneum... . Our trained models are all available at https://huggingface.co/scrollprize

Not all machine learning is generative AI.

True, but like regular document scanning software, there can be errors in detection.

Just as with redacted documents (consistently blocked terms) or bad OCR jobs (wrong or missing characters), even if only a certain percentage comes out unmangled, it is more readable than having no data at all. A stable base corpus and some dynamic programming will allow you to clean up the remainder[0].

The problem is when you can't tell which bits are unmangled. OCR systems will happily give you plausible but wrong readings, and even some scanners/copiers will change things: https://dkriesel.com/en/blog/2013/0802_xerox-workcentres_are...

Yeah. There was a weird Xerox printer bug, caused by the JBIG2 image format, that swapped digits (turning 6s into 8s) on scanned documents [1].
[1] https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...

I am researching for a talk on the philosophy of code, the similarities of engineering and art, and why we enjoy reading old code. This amazing work you folks have done may be an interesting tangent. The biggest question I have for you is why you imagine we are so interested in reading these old scrolls. Surely some of it is to see whether or not, technically, we can. Surely some of it is to get a glimpse into the human expression inscribed on them. Are we looking to learn anything, or just to connect with our ancestors? I'd like to hear your take on it, both for why you think it's important and, if you know, why your colleagues feel similarly.
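The suggestion above that "a stable base corpus and some dynamic programming" can clean up a partly mangled reading is commonly realized with the Levenshtein edit-distance recurrence. This is a hypothetical sketch, not something the Vesuvius Challenge team is said to use: `edit_distance` and `snap_to_corpus` are illustrative names, the corpus is a toy list of transliterated Greek words, and the `max_distance` cutoff leaves a token unmatched rather than forcing a correction.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))  # cost of turning "" into each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # cost of turning a[:i] into ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def snap_to_corpus(token: str, corpus: Iterable[str],
                   max_distance: int = 1) -> Optional[str]:
    """Closest corpus word within `max_distance` edits, else None.
    Ties go to the word that appears first in `corpus`."""
    best: Optional[str] = None
    best_d = max_distance + 1
    for word in corpus:
        d = edit_distance(token, word)
        if d < best_d:
            best, best_d = word, d
    return best


if __name__ == "__main__":
    corpus = ["philosophia", "hedone", "ataraxia"]
    print(snap_to_corpus("atarax?a", corpus))  # ataraxia (one substitution)
    print(snap_to_corpus("xyz", corpus))       # None (nothing within one edit)
```

This is exactly where the "plausible but wrong" risk raised above comes in: with a larger `max_distance`, a damaged word can snap to a real but incorrect one, so a careful pipeline would presumably flag each substitution for a human reader rather than apply it silently.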
I wrote this as an answer to a different question, but I think it applies to what you're asking as well:

> Though I have an interest in Old Norse and I spend a lot of time reading Scandinavian runestones.

> 90% of them are grave markers for a dead father, mother, brother, sister, cousin, etc. If I've learned anything from that, it's that people across time and space all lead lives as real and complex as anyone else's. Their joys were as high as mine have been and their sorrows as low as mine have been. A Vsauce video I watched a long time ago described that realization as "chronosonder". I think trying to understand those who came before us, and why they made the decisions they did given the circumstances they were in, can help better inform the things we choose to do given our own circumstances.

Otherwise, I think that a lot of things are worth doing just to see if it's possible. I like to lift weights, and I'm training to lift the Dinnie Stones one day: a pair of stones that weigh a combined ~730 pounds. The physical and mental benefits of exercise and training are well documented and great, but at the end of the day I just _really_ wanna pick up 2 stones. There's nothing more to it than that, and that's ok with me. One of the things we said a lot in 2023 was "We just wanna read the scrolls". That slogan has unfortunately fallen a bit by the wayside as the goal and path got longer and the initial hype started to fade, but I think it perfectly encapsulates why: The scrolls are there. They can be read. Why not read them?

1. Why is that a realization? Are there really people who say "Scandinavians are just mechanical" or "9th century people were made out of wood"? Why would their lives be assumed not to be "real"; what even is that mindset?
2. "Real and complex lives" doesn't mean "just the same as ours", mind you.

> 1 Are there really people ... Why would their lives be assumed not to be "real", what even is that mindset?

Yes, there are a very great many!
The philosopher David Gray says that most modern thinking sees our way of life, liberalism, and "progress" as meaning growth and change. It implies that this is inevitable, a kind of always-changing improvement: change that has occurred is for the good, and it's impossible to go back. I like the ${current_year} meme where someone says "it's 2026, things have changed, sweety". The joke is funny because that's what people actually say, and they say it every year without noticing that they say it every year. So the modern way of life has many people who view people in the past as not real, as figuratively made of wood, as primitive, as not having led complex lives. David Gray concludes by saying that liberalism therefore needs to be constantly fought for, that you cannot rest on your laurels and think that humanity is naturally and inexorably progressing. These scrolls, and history as a whole, challenge a fundamental psychological investment in modern liberalism. To think of the world as always improving and evolving for the better directly opposes a kind of empathy about how people 2500 years ago are the same human beings as we are. The scrolls should humble us. Given this:

> 2. "Real and complex lives" doesn't mean "just the same as ours", mind you.

They are more like ours than we like to imagine. We prefer to think of ourselves as improved.

What are the wildest, most exciting but plausible things that might be discovered in these documents?

I am not a papyrologist or a classicist; rather, I'm a computer scientist, so my expertise is unfortunately not in _what_ the scrolls say, but in how we get there. That being said, I think and hope that there will be a trove of things that have no known provenance at all, completely lost works that elude the public memory.

Well, what were your first thoughts when you decoded the script, besides the obvious Eureka, after making some sense of the texts?
Other members who were on the team before me had already proved it out before I came along, so I knew it was possible. The cool thing for me, though, was specifically doing some physically based rendering techniques. How well these work varies greatly, but on a few segments in one scroll they work extremely well. I whipped up some simple code to composite layers, did up a render, and, without any ML at all, was looking at multiple rows of text that no one had read for 2000 years. That was neat.

Your response reminds me of Nigel Richards :) https://en.wikipedia.org/wiki/Nigel_Richards

Congratulations, and thank you! Aristotle's second book of Poetics, of course.

We already know that a blind Italian monk burnt it to ashes. At least, that's what Eco wrote, and he was a learned scholar.

But that was a copy.

Well, the other existing copy (or original) was destroyed with the library of Alexandria.

Here's a list. The scrolls are from a library that burned in 79 AD.

Woah, there was a lost Homer epic comedy about a bumbling fool named Margites?

There's also the Telegony. Odysseus has a son through Circe who winds up killing him and marrying Penelope. Odysseus's son through Penelope, Telemachus, marries Circe. There's some wild stuff that doesn't survive.

Looking through these, it’s crazy to find out that The Iliad is only 1 of like 5 original texts on the Trojan War. We’re reading book 2 of a 5-book series.

It was an oral epic passed through generations for quite a while before anything was written down, so there isn't necessarily much of an "original".

Probably a lot more texts of Epicurean philosophy and not a whole lot else, unfortunately, according to my papyrologist friend.

That's what was thought, but maybe not: only one of the three so far looks Epicurean, which is not what was expected. Maybe it's a fluke, but historians are buzzing a bit about whether it might be broader than expected.

Why would Epicurean philosophy be unfortunate?
I was under the impression that there was almost nothing left of that school of thought, and that its writings had been destroyed. What would you like to have instead?

The unfortunate part is the lack of anything else therein, not that it's Epicurean philosophy.

The Jewish Talmud uses Epicurus's name as a term meaning "heretic".

The Epicureans were particularly hostile to the Jews and Christians, because Epicureans deny Providence or the active intervention of the divine in human affairs. See Horace, Sermones 1.5.

It's more like the Christians and the Jews were particularly hostile to the Epicureans and Stoics, because those mocked the claims about the existence of an all-powerful God that requires prayers. The Epicureans and Stoics did not care much about Christians and Jews, but after the Christians obtained power in the Roman Empire they made great efforts to persecute and discredit the Epicureans and the Stoics, as the most dangerous kinds of non-believers. (Unlike the rational Epicureans and Stoics, the traditional polytheists could be converted to Christianity much more easily, by inventing a set of Christian saints to whom the former polytheists could redirect the prayers and the holidays to which they were habituated.) The Christian propaganda has created a false image of the Epicureans, which has persisted until today. The Epicureans were not atheists, but they had a very different conception of what gods are. They thought that in nature there are a lot of entities that have god-like power, i.e., humans are too small and weak to influence them in any way, but human life is strongly dependent on the actions of those entities, so they can rightly be considered gods. Examples of such entities are the Sun, the Moon, storms, volcanoes, etc.
Unlike in the traditional Greek and Roman religions, where it was believed that for each such natural phenomenon there exists some sentient god who can be convinced by prayers and sacrifices to change events to a more favorable outcome, the Epicureans believed that the gods, even supposing that they were sentient, care no more about humans than humans care about ants, so there is absolutely no point in praying to them or bringing sacrifices to them. Therefore humans should conduct their lives according to ethical principles, without worrying about what gods may think of their actions. Many modern humans would probably agree with the Epicurean philosophy, which was completely different from what the Christian propaganda claimed, e.g. that Epicureans were some kind of sinners addicted to pleasures.

> completely different from what the Christian propaganda claimed, e.g. that Epicureans were some kind of sinners addicted to pleasures.

Interestingly, in Jewish literature (the Talmud, further refined by Maimonides), Epicurus refers to a certain kind of non-believer, not to a sinner for pleasure. See here, for example: https://www.sefaria.org/Mishneh_Torah%2C_Repentance.3.8?lang... I always wondered about that because I guess I fell for the "Christian propaganda", as you call it.
verditelabs - a day ago
Izmaki - 21 hours ago
verditelabs - 20 hours ago
_boffin_ - 14 hours ago
verditelabs - 14 hours ago
Upvoter33 - 21 minutes ago
verditelabs - 14 minutes ago
Upvoter33 - 4 minutes ago
tokioyoyo - 13 hours ago
rjtavares - 9 hours ago
mmooss - an hour ago
verditelabs - an hour ago
skew-aberration - 12 hours ago
SideburnsOfDoom - 7 hours ago
anentropic - 9 hours ago
amluto - 21 hours ago
verditelabs - 21 hours ago
amluto - 19 hours ago
verditelabs - 18 hours ago
londons_explore - 19 hours ago
Dzugaru - a day ago
verditelabs - a day ago
im3w1l - a day ago
verditelabs - a day ago
cwnyth - a day ago
mc32 - a day ago
dleeftink - a day ago
mkl - 21 hours ago
selcuka - 17 hours ago
Jeaye - 16 hours ago
verditelabs - 15 hours ago
card_zero - 15 hours ago
thinkingemote - 12 hours ago
adriand - a day ago
verditelabs - a day ago
arikrahman - a day ago
verditelabs - 21 hours ago
readthenotes1 - a day ago
GeoAtreides - a day ago
wolfi1 - 15 hours ago
pestatije - 11 hours ago
wolfi1 - 7 hours ago
colechristensen - a day ago
kouru225 - a day ago
sapphicsnail - a day ago
kouru225 - 21 hours ago
colechristensen - 19 hours ago
suddenlybananas - a day ago
Matticus_Rex - a day ago
cwmoore - a day ago
cwnyth - a day ago
ogogmad - a day ago
Telemakhos - a day ago
adrian_b - 21 hours ago
FergusArgyll - 20 hours ago