Show HN: Scanned 1927-1945 Daily USFS Work Diary

forestrydiary.com

96 points by dogline 13 hours ago


My great-grandfather Reuben P. Box was a US Forest Ranger in Northern California, and I've got his daily work diary from 1927-1945, through the Depression, WWII, the Civilian Conservation Corps, and lots of forest fires. I've scanned the entire thing, had Claude help with transcription, indexing, and website building, and put the whole thing here:

https://forestrydiary.com/

This is one of those projects I've sat on for years, but it finally came together with Claude and Mistral helping with the handwriting recognition, and even helping me write a custom scanning app that auto-scanned each page and put it into a database as I assembled everything.

As far as I know, this is the only US Forestry Diary that has been fully scanned in and published. I understand that there are other diaries in some collections, but none have been scanned in. I hope this helps somebody. Please let me know if it does.

This is the sort of project Claude and AI can help with: a personal project that would otherwise sit on the shelf forever is now a reasonable project to publish in my spare time. I'm not trying to make money on this, just to improve our knowledge of history a little bit.

dogline - 12 hours ago

Also, just to clarify, I personally scanned all 7,488 pages (Fujitsu ScanSnap iX500). With Claude's help, I found some undocumented SANE features to auto-crop and fix the scans, then had a Python script on Linux auto-scan them and put them into a Postgres database as I went. Other scripts added transcriptions and summaries and auto-indexed everything.
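A minimal sketch of that scan-and-store loop, assuming the stock SANE `scanimage` CLI in batch mode and substituting Python's built-in `sqlite3` for Postgres so it runs anywhere. The table layout, the `volume` label, and the function names are hypothetical, and the ScanSnap auto-crop options mentioned above are left out because the post doesn't name them.

```python
import sqlite3
import subprocess
from pathlib import Path
from typing import Iterable, List, Optional

# Hypothetical schema; the real project used Postgres, sqlite3 stands in here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    volume        TEXT    NOT NULL,
    page_no       INTEGER NOT NULL,
    image         BLOB    NOT NULL,
    transcription TEXT,              -- filled in later by the OCR step
    PRIMARY KEY (volume, page_no)
)
"""

def build_scan_command(out_dir: Path, device: Optional[str] = None) -> List[str]:
    """Build a scanimage batch command that writes one PNG per fed page."""
    cmd = ["scanimage", "--format=png", f"--batch={out_dir / 'page%d.png'}"]
    if device is not None:
        cmd += ["-d", device]  # SANE device name, as listed by `scanimage -L`
    return cmd

def scan_batch(out_dir: Path, device: Optional[str] = None) -> List[Path]:
    """Scan until the feeder is empty and return page images in page order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Exit status is not checked: scanimage may report an error once the feeder runs out.
    subprocess.run(build_scan_command(out_dir, device), check=False)
    # Sort numerically on the number after "page" so page10 follows page9.
    return sorted(out_dir.glob("page*.png"), key=lambda p: int(p.stem[4:]))

def ingest(conn: sqlite3.Connection, volume: str, images: Iterable[bytes],
           start_page: int = 1) -> int:
    """Store page images; re-running a batch skips pages already stored."""
    conn.execute(SCHEMA)
    inserted = 0
    with conn:  # commit all pages of the batch together
        for offset, data in enumerate(images):
            cur = conn.execute(
                "INSERT OR IGNORE INTO pages (volume, page_no, image) VALUES (?, ?, ?)",
                (volume, start_page + offset, data),
            )
            inserted += cur.rowcount  # 0 when the page was already present
    return inserted

if __name__ == "__main__":
    conn = sqlite3.connect("diary.db")
    pages = scan_batch(Path("scans/1927"))
    print(ingest(conn, "1927", (p.read_bytes() for p in pages)), "pages stored")
```

Keying on (volume, page number) with `INSERT OR IGNORE` means an interrupted batch can simply be re-run without creating duplicate pages.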

"mistral-ocr-latest" did a really good job of handwriting transcription, considering how tight and small some of the handwriting is. Then it was back to Claude API calls to summarize each month and collect people and places from all of the entries.
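The per-month summarizing and people/places indexing can be sketched as grouping dated transcriptions into year-month buckets, sending each bucket to a model, and merging the returned names into one index. `summarize` and `extract_names` are hypothetical stand-ins for the actual Claude API calls, whose prompts aren't described, so only the plumbing is shown.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Iterable, List, Set, Tuple

Entry = Tuple[date, str]  # (entry date, OCR transcription text)

def group_by_month(entries: Iterable[Entry]) -> Dict[str, List[Entry]]:
    """Bucket entries under 'YYYY-MM' keys, keeping each month in date order."""
    months: Dict[str, List[Entry]] = defaultdict(list)
    for day, text in entries:
        months[f"{day.year:04d}-{day.month:02d}"].append((day, text))
    return {k: sorted(v) for k, v in sorted(months.items())}

def build_index(
    months: Dict[str, List[Entry]],
    summarize: Callable[[str], str],            # hypothetical LLM call: text -> summary
    extract_names: Callable[[str], List[str]],  # hypothetical LLM call: text -> names
) -> Tuple[Dict[str, str], Dict[str, Set[str]]]:
    """Return per-month summaries and a name -> {months mentioned} index."""
    summaries: Dict[str, str] = {}
    index: Dict[str, Set[str]] = defaultdict(set)
    for month, items in months.items():
        # One prompt per month, each entry prefixed with its ISO date.
        text = "\n\n".join(f"{d.isoformat()}: {t}" for d, t in items)
        summaries[month] = summarize(text)
        for name in extract_names(text):
            index[name.strip()].add(month)
    return summaries, dict(index)
```

Keeping the model calls behind plain callables makes it easy to swap providers or re-run only the indexing step.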

Claude then created static HTML pages from what started as a Flask app. It's published on DreamHost.
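Turning a database-backed app into static pages amounts to rendering every page once and writing the results to disk. A dependency-free sketch, assuming one page per month and hypothetical `months` data (month key to a list of entry texts); a real Flask app would reuse its own templates rather than these inline strings.

```python
import html
from pathlib import Path
from typing import Dict, List

PAGE = """<!doctype html>
<html><head><meta charset="utf-8"><title>{title}</title></head>
<body><h1>{title}</h1>
{body}
</body></html>
"""

def render_month(month: str, entries: List[str]) -> str:
    """Render one month's entries as HTML-escaped paragraphs."""
    body = "\n".join(f"<p>{html.escape(e)}</p>" for e in entries)
    return PAGE.format(title=html.escape(month), body=body)

def build_site(months: Dict[str, List[str]], out_dir: Path) -> List[Path]:
    """Write one HTML file per month plus an index page linking them."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for month, entries in sorted(months.items()):
        path = out_dir / f"{month}.html"
        path.write_text(render_month(month, entries), encoding="utf-8")
        written.append(path)
    links = "\n".join(
        f'<p><a href="{p.name}">{html.escape(p.stem)}</a></p>' for p in written
    )
    index = out_dir / "index.html"
    index.write_text(PAGE.format(title="Diary", body=links), encoding="utf-8")
    written.append(index)
    return written
```

The output directory can then be uploaded to any plain web host as-is, with no application server.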

jlpk - 11 hours ago

Nice work! For others in the U.S. who have journals but don't feel up to all the scanning and transcription work: I volunteer with the American Diary Project (https://americandiaryproject.com/), based in Cleveland, Ohio. You can donate journals to be archived and shared. It was only established in the past few years, and all scanning/transcription is done by volunteers, but we are currently evaluating more automated pipelines like OP's. So great to see it in practice!

ricksunny - 10 hours ago

Imagine how many unanticipated historical perspectives might be uncovered if everyone uploaded the paraphernalia of long-deceased ancestors like this. Indexed and searched as one hyper-amalgamated, crowdsourced knowledge graph, it could show who was where doing what in, say, the 1920s, 1930s, or 1940s, in a way that mainstream history might fail to capture.
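At its simplest, such a graph could start as (person, place, date) records pulled from each indexed diary and queried by place and decade. A toy sketch; the `Sighting` record and `who_was_there` query are hypothetical, and a real crowdsourced version would also need name disambiguation and links back to each source page.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List

@dataclass(frozen=True)
class Sighting:
    person: str
    place: str
    when: date
    source: str  # e.g. a diary page URL, so every claim stays traceable

def who_was_there(records: Iterable[Sighting], place: str, decade: int) -> List[str]:
    """People recorded at `place` during the decade starting at `decade` (e.g. 1930)."""
    names = {
        r.person
        for r in records
        if r.place == place and decade <= r.when.year < decade + 10
    }
    return sorted(names)
```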

anonymous908213 - 7 hours ago

Very cool. Some feedback:

- I think it would be a very large improvement if the actual diary pages/transcriptions were more accessible. I found the LLM summaries completely uncompelling, and did not particularly appreciate having to scroll through 5+ pages of LLM summary to get to the part where I could actually read the diary entries for a given month.

- The dates of the diary entries for many months are broken. For example, in the final month, all of the entries are labelled 1945-03-19. From a cursory examination, I believe the dating broke on 24 July 1941 and stayed broken for every month from there to the end.

- The page for Nov 1941 seems entirely broken. For some reason, the page date labels use a different format that includes the month name rather than a number, the pages are out of order, and all manner of other months are mixed in. The first pages are "November 1941", "April 1941", "October 2 1941", "October 3 1941", "November 4 1941", "November 12 1941", "November 7 1941" ... and so on. The LLM summary notes an "Event", a construction project that took place from 1931 to 1934, despite this being the entry for Nov 1941.
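Problems like these can often be caught automatically before publishing. A minimal sketch, assuming each page carries a raw date label in one of the formats quoted above (ISO `1945-03-19`, `October 2 1941`, or month-only `November 1941`) and is assigned to a known year and month: it flags unparseable labels, dates outside the page's month, dates repeated on many pages, and backwards jumps. The function names and the `max_repeats=2` threshold are illustrative choices, not anything the site uses.

```python
from collections import Counter
from datetime import date, datetime
from typing import List, Optional, Tuple

FORMATS = ("%Y-%m-%d", "%B %d %Y")  # ISO and "October 2 1941" (English month names)

def parse_label(label: str) -> Optional[date]:
    """Parse a full date label; month-only or garbled labels return None."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(label.strip(), fmt).date()
        except ValueError:
            continue
    return None

def check_month(year: int, month: int, labels: List[str],
                max_repeats: int = 2) -> List[Tuple[int, str, str]]:
    """Return (page position, label, problem) for suspicious labels in one month."""
    problems: List[Tuple[int, str, str]] = []
    parsed = [parse_label(label) for label in labels]
    counts = Counter(d for d in parsed if d is not None)
    previous: Optional[date] = None
    for i, (label, d) in enumerate(zip(labels, parsed)):
        if d is None:
            problems.append((i, label, "unparseable or missing day"))
            continue
        if (d.year, d.month) != (year, month):
            problems.append((i, label, "outside page's month"))
        elif counts[d] > max_repeats:
            problems.append((i, label, "date repeated on many pages"))
        if previous is not None and d < previous:
            problems.append((i, label, "out of order"))
        previous = d
    return problems
```

Run against the Nov 1941 labels listed above, this flags the two month-only labels, both October dates, and "November 7 1941" as out of order.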

canada_dry - 8 hours ago

Nicely done.

It inspires me to tackle a project I've been holding off on for many years: OCR my grandmother/great-grandmother's cookbook. It's about 100 pages of collected and annotated recipes from the 1930s to the 1980s.

OCR and AI have become sufficiently capable (as you've demonstrated) to properly scan, index, and classify the recipes into something I can share with relatives online or as an ebook.

reaperducer - 12 hours ago

Fun fact: "Government mule" isn't just an expression, it's a real thing. And the U.S. government, including the Forest Service, still employs teams of mules to carry things to places that can't be reached any other way.

toomuchtodo - 12 hours ago

Well done! Have you uploaded these scans to the Internet Archive? If not, please consider doing so.

https://help.archive.org/help/uploading-a-basic-guide/

https://help.archive.org/help/managing-and-editing-your-item...

Trail Crew Stories and Mountain Gazette might also be interested in this.

https://www.trailcrewstories.com/

https://mountaingazette.com/
