Show HN: OSS AI agent that indexes and searches the Epstein files

epstein.trynia.ai

204 points by jellyotsiro a day ago


Hi HN,

I built an open-source AI agent that has already indexed and can search the entire Epstein files, roughly 100M words of publicly released documents.

The goal was simple: make a large, messy corpus of PDFs and text files immediately searchable in a precise way, without relying on keyword search or bloated prompts.

What it does:

- The full dataset is already indexed
- You can ask natural language questions
- Answers are grounded and include direct references to source documents
- Supports both exact text lookup and semantic search
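To make the last bullet concrete, here is a minimal sketch of the difference between exact lookup and similarity-based search. It is not the project's implementation: the three documents are made up, and bag-of-words cosine similarity is used as a simple stand-in for real embedding-based semantic search.

```python
import math
import re
from collections import Counter

# Hypothetical miniature corpus, invented for illustration only.
DOCS = {
    "doc-001": "Flight log listing passengers and dates of travel.",
    "doc-002": "Deposition transcript discussing travel to the island.",
    "doc-003": "Email about scheduling a meeting in New York.",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def exact_search(phrase):
    """Return ids of documents containing the phrase verbatim (case-insensitive)."""
    needle = phrase.lower()
    return [doc_id for doc_id, text in DOCS.items() if needle in text.lower()]

def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_search(query, top_k=2):
    """Rank documents by word overlap with the query (stand-in for embeddings)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id)
              for doc_id, text in DOCS.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k] if score > 0]

print(exact_search("flight log"))
print(similarity_search("travel dates"))
```

Exact lookup only matches the literal phrase, while the similarity search also ranks documents that merely share vocabulary with the query. A real system would presumably swap the word counts for learned embeddings.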

Discussion around these files is often fragmented. This makes it possible to explore the primary sources directly and verify claims without manually digging through thousands of pages.

Happy to answer questions or go into technical details.

Code: https://github.com/nozomio-labs/nia-epstein-ai

axegon_ - a day ago

As many others have pointed out, the released files are nearly nothing compared to the full dataset. Personally, I've been fiddling a lot with OSINT and analytics over the publicly available Reddit data (a considerable amount of my spare time over the last year), and the one thing I can say is that LLMs are under-performing (huge understatement): they are borderline useless compared to traditional ML techniques. But as far as LLMs go, the best performers are the open source uncensored models (the most uncensored and unhinged), while the worst performers are the proprietary and paid models, especially over the last 2-3 months: they have been nerfed into oblivion, to the extent that a simple prompt like "who is eligible to vote in US presidential elections" is considered a controversial question. So in the unlikely event that the full files are released, I personally would look at traditional NLP techniques long before investing any time in LLMs.

andy_ppp - a day ago

I keep thinking that the lack of children’s faces in the blacked out rectangles make the files much less shocking. I wonder if AI could put back fake images to make clearer to people how sick all this is.

wartywhoa23 - a day ago

The question is not how to analyze that, it's how to prosecute those who are above the law.

Imustaskforhelp - a day ago

Please create a way to share conversations. I think that can be really relevant here

I am not a huge fan of AI but I allow this use case. This is really good in my opinion

Along with the ability to share convos, I hope you can also make those convos archivable on web.archive.org (the Wayback Machine).

So I am thinking that instead of having some random UUID, the URL could look something like https://duckduckgo.com/?q=hello+test (the query parameter for "hello test").

Maybe it's just me, but the archive can show all the links it has archived for a particular domain, so if many people ask queries and archive them, you almost get a database of good queries and answers. Archive features are severely underrated in many cases.
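A rough sketch of the query-parameter idea, using only Python's standard library. The base URL below is a hypothetical placeholder, not an endpoint the project is known to expose.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, used only to illustrate the URL shape.
BASE_URL = "https://example.org/chat"

def share_url(question):
    """Encode the question into a stable, archivable URL instead of a random UUID."""
    return f"{BASE_URL}?{urlencode({'q': question})}"

def question_from_url(url):
    """Recover the original question from a shared URL."""
    return parse_qs(urlsplit(url).query)["q"][0]

print(share_url("hello test"))
```

Because the same question always maps to the same URL, repeated archiving of popular questions would collect under predictable addresses.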

Good luck for your project!

iowemoretohim - a day ago

Those are going to be some spicy hallucinations.

yuppiepuppie - a day ago

When first reading OSS, I thought this was going to be an Office of Strategic Services AI [0] agent :)

[0] https://en.wikipedia.org/wiki/Office_of_Strategic_Services

draworak - 11 hours ago

Does anyone know how NIA is able to build up an index and feed it into an LLM? And can this be done locally also?

wutsthat4 - a day ago

And what did you learn?

nubg - a day ago

Does this work with vector embeddings?

onionisafruit - 21 hours ago

> I'm experiencing technical difficulties accessing the archive at the moment. The search tools are returning internal server errors.

looks like it’s getting hugged

darepublic - 21 hours ago

This is just feeding the files into a rag db I assume? I hope? And then you can use any decent model in front of it

kevin_thibedeau - 21 hours ago

It would be nice to have a way to query the exposed redactions to audit which of them were in violation of the Act.

gregw2 - a day ago

Feedback: This agent didn't really work well when I tried it with a specific non-famous, but definitely publicly known individual with known connections to Epstein. I'd rather not post a specific name here. I found more documents with keyword searches. I guess it did get me to the conclusion that there wasn't much out there, but it didn't even mention stuff that showed up in name keyword searches.

To replicate though, you might look at the list of individuals mentioned in the brief email from Epstein to Bannon a couple weeks before Epstein died, containing ~30 names, and see how your engine works with each one. See how a keyword search does on Library of Congress vs your agent.

sschueller - a day ago

Is it able to handle a much larger dataset? Only a tiny fraction of the data has been released from what it looks like.

nathan_compton - 21 hours ago

Why the heck does this start with some sort of video bullshit?

thecopy - a day ago

Reminder that only 1-2% of the files have been released.

dfxm12 - a day ago

> can search the entire Epstein files

It's worth noting that only about 1% of the files have been released, according to the DOJ.

Of the released files, many have redactions.

tehjoker - a day ago

This is a good idea. One thing I never understand about these kinds of projects though: why are the standard questions provided to the user as prompts never cached?
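For what it's worth, caching the suggested prompts could be as simple as the sketch below. It assumes a hypothetical `answer_question` function standing in for the expensive agent call (it is not the project's API) and keeps the cache in memory.

```python
# Counts how often the expensive path runs, for demonstration only.
stats = {"agent_calls": 0}

def answer_question(question):
    """Hypothetical stand-in for the slow, costly agent call."""
    stats["agent_calls"] += 1
    return f"(answer to: {question})"

_cache = {}

def normalize(question):
    """Collapse case and whitespace so trivially different phrasings share a key."""
    return " ".join(question.lower().split())

def cached_answer(question):
    """Return a stored answer when available, otherwise compute and store it."""
    key = normalize(question)
    if key not in _cache:
        _cache[key] = answer_question(question)
    return _cache[key]

first = cached_answer("Who is mentioned in the flight logs?")
second = cached_answer("  who is mentioned in the  flight logs? ")
print(stats["agent_calls"])
```

One possible reason this is often skipped is that agent answers are not deterministic, so a cached answer freezes one particular run. For a fixed set of starter questions that trade-off seems acceptable.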

ck2 - a day ago

Not sure if this is possible but it should be known there is a COMPLETE INDEX to the original Epstein Files

(not including the new millions upon millions of documents and photos)

https://storage.courtlistener.com/recap/gov.uscourts.nysd.47...

from a 2017 FOIA they had to provide it

https://www.bloomberg.com/news/newsletters/2025-08-08/here-s...

Might be possible for machine-learning to determine what is missing?

(which is basically 99% missing, as we already know less than 1% was released)

- a day ago
[deleted]
- a day ago
[deleted]
sjreese - 16 hours ago

Very good; HOW TO DETECT & STOP STATE-PROTECTED CRIMINAL ENTERPRISES

WHAT WORKED IN THE EPSTEIN CASE: Proven Tactics

1. COURAGEOUS LOCAL LAW ENFORCEMENT
Chief Michael Reiter & Detective Joseph Recarey

What they did:

- Refused political pressure ("I told him those suggestions were improper and could constitute a crime")
- Documented everything - Built case with 50+ consistent victim statements
- Escalated when blocked - Went to FBI when State Attorney compromised
- Personally supported victims - Wrote letters on police letterhead

Lesson: One honest cop with integrity can make a difference, even against billionaires

2. INVESTIGATIVE JOURNALISM
Julie K. Brown - Miami Herald's "Perversion of Justice" (2018)

What she did:

- Interviewed 60+ women who were victims
- Obtained sealed court documents through legal channels
- Connected patterns across jurisdictions
- Published despite risk - Exposed the 2008 plea deal cover-up

Direct Result:

- Judge ruled prosecutors violated victims' rights (Feb 2019)
- Acosta resigned (July 2019)
- Epstein re-arrested (July 6, 2019)
- 2019 federal indictment

Lesson: Persistent investigative journalism with victim testimony can reopen cases

3. PRO BONO VICTIMS' RIGHTS ATTORNEYS
Brad Edwards & Paul Cassell

What they did:

- Pro bono representation starting 2008
- Used Crime Victims' Rights Act (18 U.S.C. § 3771) - sued federal government
- Won - Judge ruled 2008 plea deal violated victims' rights
- Exposed systemic failures through legal discovery

Lesson: Civil litigation can succeed where criminal prosecution fails

4. VICTIMS SPEAKING OUT (Despite Intimidation)
Virginia Giuffre, Courtney Wild, & 100+ Others

What they did:

- Broke silence publicly (2011 - Giuffre to Mail on Sunday)
- Provided consistent testimony (50+ women with same story)
- Persisted despite mockery (early accusers ridiculed)
- United for compensation (100+ filed claims by 2020)

Result:

- Courtney Wild Crime Victims' Rights Reform Act (2019)
- Epstein Victims Compensation Fund - $50 million paid out

Lesson: Mass victim testimony is powerful evidence

5. FOIA REQUESTS & DOCUMENT TRANSPARENCY

What worked:

- 2015: Judge unsealed details in underage sex lawsuit
- July 2, 2024: Grand jury docs from 2006 unsealed
- FOIA mechanisms forced document releases

Lesson: Public records requests can expose cover-ups

6. CONGRESSIONAL OVERSIGHT
July-August 2025 Actions

What they did:

- House Resolution 119-581 - Rep. Thomas Massie forced DOJ file release
- Subpoenas to former AGs - House Oversight demanded accountability
- Public hearings - August 25, 2025 subpoena to Acosta

Lesson: Congressional pressure can force reluctant agencies to act

PRACTICAL ACTIONS ANYONE CAN TAKE

DETECTION PHASE

1. Follow the Money

- Tax haven connections (Virgin Islands, Switzerland, Bermuda)
- Unusually high wire transfers ($1.9 billion in Epstein's case)
- Shell companies with vague descriptions ("DNA database & data mining")
- No clear income source for lavish lifestyle
- Offshore legal structures (Appleby, etc.)

2. Watch for Protection Patterns

- Charges downgraded mysteriously (federal → state misdemeanor)
- "Unusual" prosecutorial decisions (Chief Reiter's words)
- Grand jury recommendations ignored
- Plea deals sealed from victims
- Work release for serious crimes
- Short sentences despite evidence

3. Identify Systematic Patterns

- Multiple victims with same story (Reiter: "50-something 'shes' and one 'he'")
- Victim intimidation (private investigators, surveillance)
- Attempts to discredit victims ("lifestyle" arguments)
- Evidence suppression

ACTION PHASE

A. If You're a Victim or Witness:

1. Document Everything

- Keep contemporaneous notes
- Save all communications
- Photograph/video evidence safely
- Secure cloud backups (multiple locations)

2. Report Through Multiple Channels

- Local police (get case numbers)
- FBI (if interstate/international)
- State AG office
- Congressional representatives
- IRS whistleblower program (financial crimes)

3. Find Pro Bono Legal Help

- Victims' rights attorneys
- Civil rights organizations
- Law school clinics
- National Crime Victim Law Institute

4. Safety First

- Secure housing if threatened
- Protective orders
- Alert police to threats
- Document intimidation attempts

B. If You're a Journalist/Researcher:

1. Use FOIA Aggressively

- Federal agencies: FOIA requests (5 U.S.C. § 552)
- State/local: Public records laws
- Court documents: Motions to unseal
- OGIS mediation if agencies delay (average 138 delay cases/year)

2. Interview Pattern

- Multiple independent sources
- Corroborating victims
- Former employees/insiders
- Document experts

3. Build Coalitions

- Partner with victims' rights groups
- Coordinate with other journalists
- Academic researchers
- Forensic accountants

C. If You're Law Enforcement:

1. Follow Chief Reiter's Example

- Refuse political pressure
- Document interference attempts
- Escalate to federal authorities if local blocked
- Support victims personally
- Build thorough cases (multiple witnesses)

2. Protect Investigation

- Secure evidence chain
- Multiple backup copies
- Avoid single points of failure
- Document surveillance of investigators

D. If You're a Concerned Citizen:

1. Support Transparency

- Contact representatives - demand investigations
- Submit FOIA requests - public has right to records
- Support investigative journalism - subscribe, donate
- Attend public meetings - ask questions

2. Amplify Victims' Voices

- Share credible reporting (not conspiracy theories)
- Support compensation funds
- Contact representatives about victims' rights
- Vote for accountability

3. Financial Pressure

Report suspicious activity to:

- IRS Whistleblower Office (if tax fraud)
- FinCEN (financial crimes)
- State banking regulators

JPMorgan paid $105M after USVI AG sued - banks CAN be held accountable

LEGAL TOOLS THAT WORK

1. Crime Victims' Rights Act (18 U.S.C. § 3771)

- Right to notification
- Right to be heard
- Right to restitution
- Can sue federal government for violations

2. RICO (18 U.S.C. § 1962)

- Sue criminal enterprises
- Triple damages
- Attorney fees covered

3. State Victims' Rights Laws

- 30+ states have constitutional protections
- Some allow appeals/interventions

4. Civil Lawsuits

- Even if criminal case fails
- Lower burden of proof
- Discovery process exposes evidence

WARNING SIGNS OF STATE PROTECTION

Check if investigation shows these red flags:

- No IRS audits despite obvious tax fraud
- Federal prosecutors give sweetheart deals
- Intelligence agency connections mentioned
- Political figures intervene in investigation
- Evidence "disappears" or is suppressed
- Victims not notified of proceedings
- Work release for serious crimes
- Sealed plea agreements
- Co-conspirators immunized (like Epstein's deal)
- Investigators surveilled/threatened

WHAT ULTIMATELY BROKE THE EPSTEIN CASE

The combination of:

- Honest local cops (Reiter/Recarey) who built the evidence
- Pro bono lawyers (Edwards/Cassell) who sued for 11 years
- Investigative journalist (Julie K. Brown) who exposed it
- Courageous victims (Giuffre, Wild, 100+ others) who spoke out
- Court unsealing documents (2015, 2024)
- Congressional pressure (2019, 2025)

No single actor could do it alone. It required a coalition.

KEY LESSONS

What Doesn't Work:

- Trusting institutions to self-police
- Going through "proper channels" alone
- Waiting for DOJ/FBI to act
- Staying silent out of fear

What Does Work:

- Multiple channels simultaneously (police + FBI + press + civil suits)
- Documentation (Reiter: "This was 50 'shes' and one 'he'")
- Persistence (Edwards/Cassell: 11 years pro bono)
- Public pressure (Miami Herald broke it open)
- Coalition building (victims + lawyers + press + Congress)
- Using existing laws creatively (Crime Victims' Rights Act)

RESOURCES

Report Criminal Activity:

- FBI: tips.fbi.gov
- IRS Whistleblower: irs.gov/compliance/whistleblower-office
- DOJ: justice.gov/actioncenter

Legal Help:

- National Crime Victim Law Institute: law.lclark.edu/centers/ncvli
- Crime Victims' Rights Clinic: Your local law school

Media:

- Investigative Reporters & Editors: ire.org
- ProPublica tips: propublica.org/tips

FOIA Help:

- OGIS (FOIA Ombudsman): archives.gov/ogis
- MuckRock: muckrock.com

The Epstein case proves that even state-protected criminal enterprises CAN be exposed - but it requires courage, persistence, coalition-building, and using every legal tool available.

inquirerGeneral - a day ago

[dead]

huflungdung - a day ago

[dead]

DanielScharf - a day ago

Super Cool!

p0w3n3d - a day ago

[flagged]

slfreference - a day ago

All these attempts look like an emulation of "Pen (software) is mightier than Sword", or of the idea that if only more people believed in the cause, we would be close to a resolution.

Remember folks, soft power is nothing in front of hard power.