We saved $500k per year by rolling our own "S3"

engineering.nanit.com

272 points by mpweiher 17 hours ago


jrochkind1 - an hour ago

What a great and helpful write-up, love when people share things like this so I can learn.

It's less about whether I would have a use case for this exact thing (or whether or not it was appropriate for this use case; I dunno, I probably don't have enough context to know).

More just seeing what is possible, how they thought about it and analyzed it, what they found unexpected and how, etc. I learned a lot!

Havoc - 13 hours ago

Tbh I feel this is one of those cases that would be significantly cleaner without serverless in the first place.

Sticking something with a 2-second lifespan on disk to shoehorn it into the AWS serverless paradigm created problems and cost out of thin air here.

Good call moving at least partially to an in-memory solution, though.

ixtli - 11 hours ago

They didn't actually do what the headline claims. They made a memory cache which sits in front of S3 for the happy path. Cool, but that's not nearly the same as rolling your own S3.

varenc - 10 hours ago

In HN style, I'm going to diverge from the content and rant about the company:

Nanit needs this storage because they run cloud-based baby cameras. Every Nanit user is uploading video and audio of their home/baby live to Nanit without any E2EE. It's a hot mic sending anything you say near it to the cloud.

Their hardware essentially requires a subscription to use, even though it costs $200/camera. You must spend an additional $200 on a Nanit floor stand if you want sleep tracking. This is purely a software limitation since there are plenty of other ways to get an overhead camera mount. (I'm curious how they even detect if you're using the stand since it's just a USB-C cable. Maybe etags?)

Of course Nanit is a popular and successful product that many parents swear by. It just pains me to see cloud-based in-home audio/video storage being so normalized. Self-hosted video isn't that hard, but no one makes a baby-monitor-centric solution. I'm sure the cloud-based video storage model will continue to be popular because it's easy, but also because it helps justify a recurring subscription.

edit: just noticed an irony in my comment. I'm ranting about Nanit locking users into their 3rd party cloud video storage, and the article is about Nanit's engineering team moving off a 3rd party (S3) and self-hosting their own storage. Props to them for getting off S3.

JCM9 - 2 hours ago

The article strikes me as a self-congratulatory solution to a problem that they could have avoided entirely by instead selling hardware with local video storage. Lots of options for doing that efficiently and inexpensively in 2025. Hosting everything in the cloud like this is a 2015-era solution.

ruperthair - 4 hours ago

This may be an obvious point, but I didn't see it mentioned in the (otherwise excellent) article: I would have been interested in the cost saving of just implementing 'delete on read' on S3 itself, which is what they ended up building into the home-made in-memory cache. I can't see this on the S3 billing page, but if the usage is billed per-second, as with some other AWS services, then the savings may be significant.
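For reference, 'delete on read' against S3 itself is only a couple of calls; a rough Python sketch (the bucket and key names are made up, this is just the access pattern):

    import boto3

    s3 = boto3.client("s3")

    def consume_clip(bucket: str, key: str) -> bytes:
        """Fetch a clip exactly once, then delete it so it never sits in storage."""
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        s3.delete_object(Bucket=bucket, Key=key)
        return body

    clip = consume_clip("example-clips-bucket", "cameras/123/segment-0001.mp4")

Note this only shrinks the storage portion of the bill; the per-request (PUT/GET/DELETE) charges stay, and for objects that only live a couple of seconds those may well be the bigger line item.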

The solution they document also matches the S3 'reduced redundancy' storage option, so I hope they had this enabled from day one.

freak42 - 8 hours ago

They saved $500k on what total sum? $500'001 or $55'000'000? Without this info the post is moot.

swiftcoder - 7 hours ago

This feels like they were using the wrong architecture from the start, and are now papering over that problem with additional layers of cache.

The only practical reason to put a video in S3 for an average of 2 seconds is to provide additional redundancy, and replacing that with a cache removes most of the redundancy.

Feels like if you uploaded these to an actual server, the server could process them on upload, and you could eliminate S3, the queue in SQS, and the lambdas all in one fell swoop...
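Roughly this shape, in Python (everything here is hypothetical; process_clip stands in for whatever the Lambda does today):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    def process_clip(data: bytes) -> None:
        # Placeholder for the real video processing step.
        print(f"processed {len(data)} bytes")

    class UploadHandler(BaseHTTPRequestHandler):
        def do_PUT(self):
            # Read the clip straight into memory and process it on the spot:
            # no S3 object, no SQS message, no Lambda invocation.
            length = int(self.headers.get("Content-Length", 0))
            process_clip(self.rfile.read(length))
            self.send_response(200)
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), UploadHandler).serve_forever()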

elmigranto - an hour ago

Classic case of "focus on building your app, not infrastructure". Here's another multi-million dollar idea: put this cache directly inside your own video processing server and upload there.

anarsdk - 9 hours ago

Sounds like the title should have been

> We used S3 even though it wasn’t the right service

dmje - 8 hours ago

I’m sufficiently old / sensible (you decide) to think that uploading video of your baby (to anywhere) is fucking weird and fucking spooky and not needed anyway. This is a solution that doesn’t have a problem. Worse: it preys on parental / young parental fears. There’s nothing here - this is not a product that’s needed. You don’t need to “track” your baby, ffs. You don’t need to watch it while it sleeps. You don’t need “every breath, seen”. People have been having babies for fucking centuries without entering them into this hyper weird surveillance state at birth.

What an appalling screwed up world we seem to have manufactured for ourselves.

gethly - 5 hours ago

I made my own S3 as well. I used two S3-compatible services before, but there was always some issue (the first one failed to upload a certain file no matter what, and support was unhelpful; the second one did not migrate file metadata properly, so I knew this would be an ongoing problem). In the end, it is just a dumb file store, nothing else. All you need to do is write a basic HTTPS API layer and some logic to handle a database for the file metadata and possibly location. That is about it. Takes a few days with testing.
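Just to make "a basic HTTPS API layer plus a metadata database" concrete, a toy Python sketch of the shape (paths, schema and port are made up, and a real one also needs auth, TLS and streaming):

    import os, sqlite3, uuid
    from http.server import BaseHTTPRequestHandler, HTTPServer

    DATA_DIR = "/var/lib/filestore"                      # hypothetical layout
    os.makedirs(DATA_DIR, exist_ok=True)
    db = sqlite3.connect(os.path.join(DATA_DIR, "meta.db"), check_same_thread=False)
    db.execute("CREATE TABLE IF NOT EXISTS files (id TEXT PRIMARY KEY, size INTEGER, path TEXT)")

    class FileStore(BaseHTTPRequestHandler):
        def do_PUT(self):
            # Write the blob to disk, record its metadata, hand back an id.
            file_id = uuid.uuid4().hex
            path = os.path.join(DATA_DIR, file_id)
            size = int(self.headers.get("Content-Length", 0))
            with open(path, "wb") as f:
                f.write(self.rfile.read(size))
            db.execute("INSERT INTO files VALUES (?, ?, ?)", (file_id, size, path))
            db.commit()
            self.send_response(201)
            self.send_header("Location", f"/files/{file_id}")
            self.end_headers()

        def do_GET(self):
            # Look the file up in the metadata DB, then serve it from disk.
            file_id = self.path.rsplit("/", 1)[-1]
            row = db.execute("SELECT path, size FROM files WHERE id = ?", (file_id,)).fetchone()
            if row is None:
                self.send_response(404)
                self.end_headers()
                return
            self.send_response(200)
            self.send_header("Content-Length", str(row[1]))
            self.end_headers()
            with open(row[0], "rb") as f:
                self.wfile.write(f.read())

    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8080), FileStore).serve_forever()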

But then you also have to think about file uploads and file downloads. You cannot have a single server fulfilling all the roles, otherwise you have a bottleneck.

So this file storage became a private backend service that end-users never access directly. I added upload services whose sole purpose is to let users upload files and only then push them to this central file store, essentially creating a distributed file upload queue (there is also a bit more logic regarding file id creation and validation).

Secondly, my own CDN was needed for downloads, but only because I use custom access handling and could not use any of the commercial services (though they do support access via tokens, it just was not working for me). This was tricky because I wanted the nodes to distribute files between themselves and not always fetch them from the origin, to avoid network costs on the origin server. So they had to find each other, talk to each other, and know who has which file.
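For the "who has which file" part, one common trick (not necessarily what I did) is rendezvous hashing, where every node can compute a file's owner on its own; a tiny Python sketch with made-up node names:

    import hashlib

    NODES = ["cdn-a.example.net", "cdn-b.example.net", "cdn-c.example.net"]

    def owner(file_id: str, nodes=NODES) -> str:
        # Highest-random-weight (rendezvous) hashing: score every node against
        # the file id and pick the winner. No central index needed just to
        # locate a file, and every node computes the same answer.
        return max(nodes, key=lambda n: hashlib.sha256(f"{n}:{file_id}".encode()).hexdigest())

    # A node that doesn't have the file fetches it from owner(file_id)
    # instead of going back to the origin.
    print(owner("segment-0001"))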

In short, rolling your own is not as hard as it might seem and should be preferable. Maybe use the cloud at the beginning to save time, but once you are up and running and your business idea is validated by having customers, move to your own infra immediately to avoid the astronomical costs of cloud services.

Btw, I also do video processing like that mentioned in the blog post :)

none2585 - 13 hours ago

I'm curious how many engineers per year this costs to maintain

ghm2180 - 2 hours ago

Actually, why (just) RAM? Why not have append-only storage on the local disk? WALs are quite fast.
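Something like this, say (Python sketch, path made up); the price relative to RAM is one fsync per append:

    import os, struct, zlib

    class Wal:
        """Append-only log: length-prefixed, CRC-checked records, fsync'd so an
        acknowledged write survives a process crash."""
        def __init__(self, path: str):
            self.f = open(path, "ab")

        def append(self, payload: bytes) -> None:
            header = struct.pack(">II", len(payload), zlib.crc32(payload))
            self.f.write(header + payload)
            self.f.flush()
            os.fsync(self.f.fileno())

    wal = Wal("/tmp/clips.wal")
    wal.append(b"fake 2-second video segment")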

dxxvi - 11 hours ago

So, you want a place to store many files in a short period of time and when there's a new file, somebody must be notified?

Have you ever thought of using a PostgreSQL db (also on AWS) to store those files and using CDC to publish messages about those files to a Kafka topic? With your original approach, you need 3 AWS services: S3, Lambda and SQS. With this approach, you need 2: PostgreSQL and Kafka. I'm not sure how well this method works though :-)
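A lighter-weight cousin of the CDC idea, just to show the shape (Python, all names made up; real CDC would go through logical decoding or something like Debezium): a trigger turns every insert into a NOTIFY, and the worker LISTENs.

    import select
    import psycopg2

    conn = psycopg2.connect("dbname=clips user=clips")   # hypothetical DSN
    conn.autocommit = True
    cur = conn.cursor()

    # Every insert into files fires a NOTIFY carrying the new row's id.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS files (id TEXT PRIMARY KEY, payload BYTEA);
        CREATE OR REPLACE FUNCTION notify_new_file() RETURNS trigger AS $$
        BEGIN
            PERFORM pg_notify('new_file', NEW.id);
            RETURN NEW;
        END $$ LANGUAGE plpgsql;
        DROP TRIGGER IF EXISTS files_notify ON files;
        CREATE TRIGGER files_notify AFTER INSERT ON files
            FOR EACH ROW EXECUTE FUNCTION notify_new_file();
    """)

    # Worker side: block until something arrives, then process by id.
    cur.execute("LISTEN new_file;")
    while True:
        if select.select([conn], [], [], 5.0)[0]:
            conn.poll()
            while conn.notifies:
                note = conn.notifies.pop(0)
                print("process file", note.payload)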

elchananHaas - 13 hours ago

Video processing is one of those things that need caution when done serverless. This solution makes sense, especially because S3's durability guarantees aren't needed.

gnarlouse - 9 hours ago

because "How we stopped putting your kids in S3 buckets"

just sounded less attractive

Huxley1 - 11 hours ago

S3 certainly saves a lot of hassle, but in certain use cases, it really is prohibitively expensive. Has anyone tried self-hosted alternatives like MinIO or SeaweedFS? Or taken even more radical approaches? How do you balance between stability, maintenance overhead, and cost savings?

anshumankmr - 8 hours ago

Some stuff like this also exists: https://www.dell.com/en-in/shop/storage-servers-and-networki...

We could just use something like that

Or there is that other object storage solution from Cloudflare, R2.

hk1337 - 5 hours ago

I have always understood S3 is just HDFS with some extra features? So, if you were going to roll your own S3, then you’d stand up an HDFS cluster.

danjc - 7 hours ago

If it's processed in 2 seconds, why not just process it immediately in memory?

OrangeDelonge - 9 hours ago

Couldn’t they have used S3 Express One Zone?

VladVladikoff - 11 hours ago

I’m mostly just impressed that some janky baby monitor has racked up server fees on this scale. Amazing example of absolutely horrible engineering.

Also, just take an old phone from your drawer full of old phones, slap some free camera app on it, zip tie a car phone mount to the crib, and boom you have a free baby monitor.

lpa22 - 10 hours ago

If anyone here uses the Nanit app in the background of their phones, it absolutely destroys battery life.

I got a new phone because I thought my battery was cooked, but turns out it was just the app.

0xbadcafebee - 7 hours ago

Their architecture is internet-bandwidth-heavy and storage-heavy; these are some of the most expensive things in AWS. You probably want to use a different provider for those things.

> It turns out that when AWS says an instance can do “Up to 12.5 Gbps”, that’s burstable networking backed by credits; when you’re below the baseline, you accrue credits and can burst for short periods.

Yes, AWS has a burst rating and a sustained/baseline rating for both EBS types as well as instance types. Use https://instances.vantage.sh/ (and make sure you choose specific columns) to compare specific criteria and then export as a CSV to find the lowest price that matches your performance/feature/platform criteria. Design to the baseline if you need guaranteed performance. Do sustained performance testing.
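For the sustained testing part, a crude Python pair like this works if you don't want extra tooling (iperf3 does it better; host/port are placeholders). Run the sink on the instance under test and the source from a peer, and watch the per-second figure drop once burst credits run out:

    import socket, sys, time

    CHUNK = b"\0" * (1 << 20)   # 1 MiB

    def sink(port=5001):
        # Run on the instance under test: accept one connection, discard data.
        srv = socket.create_server(("0.0.0.0", port))
        conn, _ = srv.accept()
        while conn.recv(1 << 20):
            pass

    def source(host, port=5001, minutes=30):
        # Run from a second instance: push data and print throughput once a
        # second so the burst-vs-baseline behaviour is visible over time.
        s = socket.create_connection((host, port))
        sent, t0 = 0, time.time()
        deadline = time.time() + minutes * 60
        while time.time() < deadline:
            s.sendall(CHUNK)
            sent += len(CHUNK)
            now = time.time()
            if now - t0 >= 1.0:
                print(f"{sent * 8 / (now - t0) / 1e9:.2f} Gbit/s")
                sent, t0 = 0, now

    if __name__ == "__main__":
        sink() if sys.argv[1] == "sink" else source(sys.argv[2])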

> When we terminated connections idle >10 minutes, memory dropped by ~1 GB immediately, confirming the leak was from dangling sockets. Fix: make sockets short-lived and enforce time limits.

We used to do that with Apache 20 years ago. A config option forces a forked child process to exit after N requests to avoid the inevitable memory leaks. AKA the Windows 95 garbage collector (a reboot a day keeps the slowness at bay).
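A minimal sketch of that "enforce time limits" fix at the raw-socket level (Python, port made up; an HTTP server gets the same effect from its idle/read timeouts):

    import socket, threading

    IDLE_LIMIT = 600  # seconds; the write-up used a 10-minute cutoff

    def handle(conn: socket.socket) -> None:
        # A recv that sits idle past IDLE_LIMIT raises socket.timeout, and the
        # finally block guarantees the fd and its buffers are released instead
        # of dangling forever.
        conn.settimeout(IDLE_LIMIT)
        try:
            while True:
                data = conn.recv(65536)
                if not data:
                    break
                # ... handle the data ...
        except socket.timeout:
            pass
        finally:
            conn.close()

    srv = socket.create_server(("0.0.0.0", 9000))
    while True:
        c, _ = srv.accept()
        threading.Thread(target=handle, args=(c,), daemon=True).start()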

FWIW, if the business feasibility of your architecture depends on custom stuff, performance enhancements, etc., you will find that you eventually have harder and harder problems to solve to keep your business functioning. It's more reliable to waste money on a solution that is brainless than to invest human ingenuity/technical mastery in a solution that is frugal.

miroljub - 5 hours ago

Ah, these modern babies. They can't even sleep without being spied on and recorded 24/7.

I'm glad both my kids and I grew up in different times.

Today's kids will never in their lives know what freedom is, and we are guilty of making such a dystopian society a reality.

fergie - 2 hours ago

I mean fair enough, but I feel like S3 is one of the few AWS products that is actually pretty cheap for what you get.

bryanrasmussen - 9 hours ago

They don't seem to have factored in the cost of doing this, so I'm not sure what their actual saving was, although it was probably substantial.

Brian_K_White - 6 hours ago

unnecessary cloud subscription service abjures unnecessary cloud subscription service

NetOpWibby - 9 hours ago

Cloudflare R2 solves this

ch2026 - 13 hours ago

Who is “The South Korean Government”?

another_twist - 10 hours ago

I mean the "S3" could be replaced with "object storage". I guess that's the technical term anyway. Having said that, it just goes to show how cheap S3 is if, after all of this, the savings are just $500k. Definitely money saved, but not a lot.
