Reverse engineering a $1B Legal AI tool exposed 100k+ confidential files
alexschapiro.com
228 points by bearsyankees 2 hours ago
This is the collision between two cultures that were never meant to share the same data: "move fast and duct-tape APIs together" startup engineering, and "if this leaks we ruin people's lives" legal/medical confidentiality.
What's wild is that nothing here is exotic: subdomain enumeration, unauthenticated API, over-privileged token, minified JS leaking internals. This is a 2010-level bug pattern wrapped in 2025 AI hype. The only truly "AI" part is that centralizing all documents for model training drastically raises the blast radius when you screw up.
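None of those steps is sophisticated. As a rough illustration (the domain, wordlist, and function names here are hypothetical, not from the article), the first step of that chain, subdomain enumeration, is little more than a dictionary walk against DNS:

```python
import socket

def enumerate_subdomains(domain, wordlist, resolve=None):
    """Return candidate subdomains from `wordlist` that resolve to an address.

    `resolve` defaults to a plain DNS lookup, but is injectable so the
    sketch can be exercised without network access.
    """
    if resolve is None:
        def resolve(host):
            try:
                socket.gethostbyname(host)
                return True
            except socket.gaierror:
                return False
    found = []
    for word in wordlist:
        host = f"{word}.{domain}"
        if resolve(host):
            found.append(host)
    return found
```

Real tools add wildcard detection, certificate-transparency lookups, and rate limiting, but the core idea is exactly this simple, which is the point: nothing about the attack required AI-era tooling.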
The economic incentive is obvious: if your pitch deck is "we'll ingest everything your firm has ever touched and make it searchable/AI-ready", you win deals by saying yes to data access and integrations, not by saying no. Least privilege, token scoping, and proper isolation are friction in the sales process, so they get bolted on later, if at all.
The scary bit is that lawyers are being sold "AI assistant" but what they're actually buying is "unvetted third party root access to your institutional memory". At that point, the interesting question isn't whether there are more bugs like this, it's how many of these systems would survive a serious red-team exercise by anyone more motivated than a curious blogger.
It's a little hilarious.
First, as an organization, do all this cybersecurity theatre, and then create an MCP/LLM wormhole that bypasses it all.
All because non-technical folks wave their hands about AI without understanding the most fundamental reality: LLM software is so different from all the software before it that it becomes an unavoidable black hole.
I'm also a little pleased I used two space analogies, something I can't expect LLMs to do because they have to go large with their language or go home.
While true this comment seems AI written. I did a fair bit of exploration around AI responses to HN threads and this fits the pattern.
That comment didn't read like AI generated content to me. It made useful points and explained them well. I would not expect even the best of the current batch of LLMs to produce an argument that coherent.
This sentence in particular seems outside of what an LLM that was fed the linked article might produce:
> What's wild is that nothing here is exotic: subdomain enumeration, unauthenticated API, over-privileged token, minified JS leaking internals.
The user's comment history does read like generic LLM output. Look at the first lines of different comments:
> Interesting point about Cranelift! I've been following its development for a while, and it seems like there's always something new popping up.
> Interesting point about the color analysis! It kinda reminds me of how album art used to be such a significant part of music culture.
> Interesting point about the ESP32 and music playback! I've been tinkering with similar projects, and it’s wild how much potential these little devices have.
> We used to own tools that made us productive. Now we rent tools that make someone else profitable. Subscriptions are not about recurring value but recurring billing
> Meshtastic is interesting because it's basically "LoRa-first networking" instead of "internet with some radios attached." Most consumer radios are still stuck in the mental model of walkie-talkies, while Meshtastic treats RF as an IP-like transport layer you can script, automate, and extend. That flips the stack:
> This is the collision between two cultures that were never meant to share the same data: "move fast and duct-tape APIs together" startup engineering, and "if this leaks we ruin people's lives" legal/medical confidentiality.
The repeated prefixes ("Interesting point about ...!") and the classic it's-this-not-that LLM pattern are definitely triggering my LLM suspicions.
I suspect most of these cases aren't bots, they're users who put their thoughts, possibly in another language, into an LLM and ask it to form the comment for them. They like the text they see so they copy and paste it into HN.
It's probably a list of bullet points or disjointed sentences fed to the LLM to clean up. Might be a non-English speaker using it to become fluent. I won't criticize it, but it's clearly LLM generated content.
“This comment is AI” is the new “First Post” from /. days. Please stop unless you have evidence or a good explanation.
That was literally the same thought that crossed my mind. I agree wholeheartedly, accusing everything and everyone of being AI is getting old fast. Part of me is happy that the skepticism takes hold quickly, but I don't think it's necessary for everyone to demonstrate that they are a good skeptic.
(and I suspect that plenty of people will remain credulous anyway, AI slop is going to be rough to deal with for the foreseeable future).
Yeah, you have a point... the comment - and their other comments, on average - seem to fit quite a specific pattern. It's hard to really draw a line between policing style and actually recognising AI-written content, though.
What makes you think that? It would need some prompt engineering if so, since ChatGPT won't write like that (bad capitalization, lazy quoting) unless you ask it to.
“Chat, write me a blog article that seems like a lazy human who failed English wrote it”?
What’s worse being accused of an AI post or being defended because your post is so bad that AI wouldn’t have written it?
Ya ur right, it's either LLM generated, LLM enhanced, or the author has been reading so much LLM output that its writing style has rubbed off.
We finally have a blog that no one (yet) has accused of being ai generated, so obviously we just have to start accusing comments of being ai. Can't read for more than 2 seconds on this site without someone yelling "ai!".
For what it's worth, even if the parent comment was directly submitted by chatgpt themselves, your comment brought significantly less value to the conversation.
It's the natural response. AI fans are routinely injecting themselves into every conversation here to somehow talk about AI ("I bet an AI tool would have found the issue faster") and AI is forcing itself onto every product. Comments dissing anything that sounds even remotely like AI is the logical response of someone who is fed up.
Every other headline and conversation having ai is super annoying.
But also, it's super annoying to sift through people saying "the word critical was used, this is obviously ai!". not to mention it really fucking sucks when you're the person who wrote something and people start chanting "ai slop! ai slop!". like, how am i going to prove it's not AI?
I can't wait until ai gets good enough that no one can tell the difference (or ai completely busts and disappears, although that's unlikely), and we can go back to just commenting about whether something was interesting or educational or whatever instead of analyzing how many em-dashes someone used pre-2020 and extrapolating whether their latest post has one more em-dash than their average post so that we can get our pitchforks out and chase them away.
Cultural acceptance of conversation with AI should've come from actual AI that is indistinguishable from humans; being forced to swallow recognizable, if not blatant, LLM slop and turn a blind eye feels unfair.
What? It doesn't read that way to me. It reads like any other comment from the past ~15 years.
The point you raised is a distraction, and it doesn't engage with the points the comment actually made.
This might be off topic, since we're on the topic of an AI tool and on Hacker News.
I've been pondering for a long time how one builds a startup in a domain they're not familiar with, but I just have this urge to carve out a piece of the pie in this space. For the longest time, I had this dream of starting or building an "AI Legal Tech Company". The big issue is, I don't work in the legal space at all. I did some cold outreach on law-firm-related forums, which didn't get any traction.
I later searched around and came across the term "case management software". From what I know, this is fundamentally what Clio is, and it makes millions if not billions.
This was close to 1.5 or two years ago, and since then I stopped thinking about it because of this understanding or belief I have: "how can I do a startup in legal when I don't work in this domain?" But when I look around, I have seen people who start companies in totally unrelated industries, from dental tech companies to, if I'm not mistaken, the founder of Hugging Face, who doesn't seem to have a PhD in AI/ML and yet founded Hugging Face anyway.
Given all that, how does one start a company in an unrelated domain? Say I want to start another case management system or attempt to clone Filevine: do I first read up on what case management software is, or do I cold-reach potential law firms who would partner up to build a SaaS from scratch? Another school of thought goes, "find customers before you have a product, to validate what you want to build"; how does this realistically work?
Apologies for the scattered thoughts...
I'm always a bit surprised how long it can take to triage and fix these pretty glaring security vulnerabilities. October 27, 2025 disclosure and November 4, 2025 email confirmation seems like a long time to have their entire client file system exposed. Sure the actual bug ended up being (what I imagine to be) a <1hr fix plus the time for QA testing to make sure it didn't break anything.
Is the issue that people aren't checking their security@ email addresses? People are on holiday? These emails get so much spam it's really hard to separate the noise from the legit signal? I'm genuinely curious.
In my experience, it comes down to project management and organizational structure problems.
Companies hire a "security team" and put them behind the security@ email, then decide they'll figure out how to handle issues later.
When an issue comes in, the security team tries to forward the security issue to the team that owns the project so it can be fixed. This is where complicated org charts and difficult incentive structures can get in the way.
Determining which team actually owns the code containing the bug can be very hard, depending on the company. Many security team people I've worked with were smart, but not software developers by trade. So they start trying to navigate the org chart to figure out who can even fix the issue. This can take weeks of dead-ends and "I'm busy until Tuesday next week at 3:30PM, let's schedule a meeting then" delays.
Even when you find the right team, it can be difficult to get them to schedule the fix. In companies where roadmaps are planned 3 quarters in advance, everyone is focused on their KPIs and other acronyms, and bonuses are paid out according to your ticket velocity and on-time delivery stats (despite PMs telling you they're not), getting a team to pick up the bug and work on it is hard. Again, it can become a wall of "Our next 3 sprints are already full with urgent work from VP so-and-so, but we'll see if we can fit it in after that."
Then legal wants to be involved, too. So before you even respond to reports you have to flag the corporate counsel, who is already busy and doesn't want to hear it right now.
So half or more of the job of the security team becomes navigating corporate bureaucracy and slicing through all of the incentive structures to inject this urgent priority somewhere.
Smart companies recognize this problem and will empower security teams to prioritize urgent things. This can cause another problem where less-than-great security teams start wielding their power to force everyone to work on not-urgent issues that get spammed to the security@ email all day long demanding bug bounties, which burns everyone out. Good security teams will use good judgment, though.
security@ emails do get a lot of spam. It doesn't get talked about very much unless you're monitoring one yourself, but there's a fairly constant stream of people begging for bug bounty money for things like the Secure flag not being set on a cookie.
That said, in my experience this spam is still a few emails a day at the most, I don't think there's any excuse for not immediately patching something like that. I guess maybe someone's on holiday like you said.
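For context on how trivial the Secure-flag class of report is, here is what the fix usually amounts to (a hypothetical sketch using Python's standard library; the cookie name and value are made up):

```python
from http.cookies import SimpleCookie

# Build a session cookie with the hardening attributes that low-effort
# bug-bounty reports love to flag when missing.
cookie = SimpleCookie()
cookie["session"] = "opaque-token"
cookie["session"]["secure"] = True     # only sent over HTTPS
cookie["session"]["httponly"] = True   # not readable from JavaScript
cookie["session"]["samesite"] = "Strict"

# The value to emit in a Set-Cookie header.
header = cookie["session"].OutputString()
```

A one-line change like this is worlds apart from an unauthenticated API exposing a client file system, which is why triaging the two with the same inbox and the same urgency burns teams out.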
This.
There is so much spam from random people about meaningless issues in our docs. AI has made the problem worse. Separating the meaningful from the meaningless is a full-time job.
A lot of the time it’s less “nobody checked the security inbox” and more “the one person who understands that part of the system is juggling twelve other fires.” Security fixes are often a one-hour patch wrapped in two weeks of internal routing, approvals, and “who even owns this code?” archaeology. Holiday schedules and spam filters don’t help, but organizational entropy is usually the real culprit.
> A lot of the time it’s less “nobody checked the security inbox” and more “the one person who understands that part of the system is juggling twelve other fires.”
At my past employers it was "The VP of such-and-such said we need to ship this feature as our top priority, no exceptions"
Not every organization prioritizes being able to ship a code change at the drop of a hat. That often requires organizational dedication to heavy automated testing and CI, which small companies often aren't set up to do.
I can't believe that any company takes a month to ship something. Even if they don't have CI, surely they'd prefer to break the app (maybe even completely) rather than risk all their legal documents being exfiltrated.
> I can't believe that any company takes a month to ship something.
Outside of startups and big tech, it's not uncommon to have release cycles that are months long. Especially common if there is any legal or regulatory involvement.
Well, we have 600 people in the global response center I work at, and the priority issue count is currently 26,000. That means it's serious enough that it's been assigned to someone. There are tens of thousands of unassigned issues because the triage teams are swamped. People don't realize that as systems get more complex, issues increase. They never decrease. And the chimp troupe's response has always been a story: we can handle it.
> October 27, 2025 disclosure and November 4, 2025 email confirmation seems like a long time to have their entire client file system exposed
I have unfortunately seen way worse. If it will take more than an hour and the wrong people are in charge of the money, you can go a pretty long time with glaring vulnerabilities.
I call that one of the worrisome outcomes of "Marketing Driven Development", where the business people don't let you do technical-debt "stories" because you REALLY need to do work that justifies their existence in the project.
> ... after looking through minified code, which SUCKS to do ...
AI tends to be good at un-minifying code.
I work for a finance firm and everyone is wondering why we can store reams of client data with SaaS Company X, but not upload a trust document or tax return to AI SaaS Company Y.
My argument is we're in the Wild West with AI and this stuff is being built so fast with so many evolving tools that corners are being cut even when they don't realize it.
This article demonstrates that, but it does raise the question of why we trust one and not the other when they both promise the same safeguards.
The question is what reason did you have to trust SaaS Company X in the first place?
SaaS is now a "solved problem"; almost all vendors will try to get SOC 2 compliance (and more for sensitive workloads). Although... it's hard to see how these certifications would have prevented something like this 🫠.
Because it's the Cloud and we're told the cloud is better and more secure.
In truth the company forced our hand by pricing us out of the on-premise solution and will do that again with the other on-premise we use, which is set to sunset in five years or so.
> My argument is we're in the Wild West with AI and this stuff is being built so fast with so many evolving tools that corners are being cut even when they don't realize it.
The funny thing is that this exploit (from the OP) has nothing to do with AI and could be <insert any SaaS company> that integrates into another service.
And nobody seems to pay attention to the fact that modern copiers cache copies on a local disk and if the machines are leased and swapped out the next party that takes possession has access to those copies if nobody bothered to address it.
The first thing that comes to my mind is SOC 2, HIPAA, and the whole security theater.
I am one of the engineers who had to suffer through countless screenshots and forms to get these certifications, because they show that you are compliant and safe, while the real impactful things are ignored.
Given the absurd number of startups I see lately that have the words "healthcare" and "AI" in them, I'm actually incredibly concerned that in just a couple of months we're going to have multiple enormous HIPAA-data disasters.
Just search "healthcare" in https://news.ycombinator.com/item?id=46108941
If they have a billion dollar valuation, this fairly basic (and irresponsible) vulnerability could have cost them a billion dollars. If someone with malice had been in your shoes, in that industry, this probably wouldn't have been recoverable. Imagine a firm's entire client communications and discovery posted online.
They should have given you some money.
Exactly.
They could have sold this to a ransomware group or affiliate for 5-6 figures, and then the ransomware group could have exfiltrated the data and attempted to extort the company for millions.
Then if they didn't pay and the ransomware group leaked the info to the public, they'd likely have to spend millions on lawsuits and fines anyway.
They should have paid this dude 5-6 figures for this find. It's scenarios like this that lead people to sell these vulns on the gray/black market instead of traditional bug bounty whitehat routes.
I've worked in several "agentic" roles this year alone (I'm very poachable lol)
and otherwise well structured engineering orgs have lost their goddamn minds with move fast and break things
because they're worried that OpenAI/Google/Meta/Amazon/Anthropic will release the tool they're working on tomorrow
literally all of them are like this
Of course there will be no accountability or punishment.
That doesn't surprise me one bit. Just think about all the confidential information that people paste into their ChatGPT and Claude sessions. You could probably keep the legal system busy for the next century on a couple of days of that.
"Hey uh, ChatGPT, just hypothetically, uh, if you needed to remove uh cow's blood from your apartment's carpet, uh"
Who is Margolis, and are they happy that OP publicly announced accessing all their confidential files?
Clever work by OP. Surely there's an automated probing tool out there that has already hacked this product?
This guy didn't even get paid for this? We need a law that establishes mandatory payments for cybersecurity bounty hunters.
Legal attacks engineering: font license fees imposed on Japanese consumers. Engineering attacks legal: the AI info dump in the post above.
What does the above sound like, and what kind of professional writes like that?
Thank you bearsyankees for keeping us informed.
I think this class of problems can be protected against.
It's become clear that the first and most important and most valuable agent, or team of agents, to build is the one that responsibly and diligently lays out the opsec framework for whatever other system you're trying to automate.
A meta-security AI framework, cursor for opsec, would be the best, most valuable general purpose AI tool any company could build, imo. Everything from journalism to law to coding would immediately benefit, and it'd provide invaluable data for post training, reducing the overall problematic behaviors in the underlying models.
Move fast and break things is a lot more valuable if you have a red team mechanism that scales with the product. Who knows how many facepalm level failures like this are out there?
> I think this class of problems can be protected against.
Of course, it’s called proper software development
The techniques for non-disclosure of confidential materials processed by multi-tenant services are obvious, well-known, and practiced by very few.
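One of those well-known techniques is making tenant scoping non-optional at the data-access layer, so no API handler can fetch a document without stating which tenant is asking. A minimal sketch of the idea, with all class and field names hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    doc_id: str
    tenant_id: str
    contents: str

class DocumentStore:
    """Toy multi-tenant store: every read must name the requesting tenant."""

    def __init__(self):
        self._docs = {}

    def put(self, doc):
        self._docs[doc.doc_id] = doc

    def get(self, doc_id, requesting_tenant):
        doc = self._docs.get(doc_id)
        # Raise the same error whether the document is missing or belongs
        # to another tenant, so probing can't even confirm a document exists.
        if doc is None or doc.tenant_id != requesting_tenant:
            raise PermissionError("not found")
        return doc
```

The key property is that there is no tenant-free `get`: an over-privileged integration token can't widen the query, because the scoping lives below the API layer rather than being left to each endpoint to remember.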