Running out of disk space in production

alt-romes.github.io

184 points by romes 5 days ago


flanfly - 19 hours ago

A neat trick I was told is to always have ballast files on your systems. Just a few GiB of zeros that you can delete in cases like this. This won't fix the problem, but will buy you time and free space for stuff like lock files so you can get a working system.
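
The trick above can be sketched in a couple of commands (path and size here are illustrative; in production you'd put a few GiB on the filesystem that tends to fill up):

```shell
# Create a ballast file whose only job is to be deleted in an emergency.
# fallocate reserves the blocks instantly without writing data.
fallocate -l 16M /tmp/ballast
stat -c '%s' /tmp/ballast   # confirm the space is actually claimed

# When the disk fills up, reclaiming the space is instant:
# rm /tmp/ballast
```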

dabinat - 42 minutes ago

It can be difficult to reason about seemingly innocuous things at scale. I have definitely fallen into the trap of increasing file size from 8 KB to 10 KB and having it cause massive problems when multiplied across all customers at once.

dirkt - 16 hours ago

If you run nginx anyway, why not serve static files from nginx? No need for temporary files, no extra disk space.

The authorization can probably be done somehow in nginx as well.
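
One common pattern for this (sketch only; paths and endpoint names are hypothetical) is X-Accel-Redirect, nginx's analog of X-Sendfile: the app handles authorization, then hands the actual file transfer off to nginx, which streams straight from disk with no temporary files:

```nginx
location /protected-files/ {
    internal;              # not reachable directly by clients
    alias /srv/files/;     # nginx streams from disk
}
# After checking auth, the app responds with a header such as:
#   X-Accel-Redirect: /protected-files/release.tar.gz
```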

entropie - 18 hours ago

> I rushed to run du -sh on everything I could, as that’s as good as I could manage.

I recently came across gdu [1] and have installed/used it on every machine since then.


[1]: https://github.com/dundee/gdu
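
When gdu isn't installed, the classic du incantation gets you most of the way. The `-x` keeps it on one filesystem so `/proc` and network mounts don't confuse the numbers:

```shell
# Largest top-level directories last, one filesystem only.
du -xh --max-depth=1 / 2>/dev/null | sort -h | tail -n 15
```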

gmuslera - 16 hours ago

Putting limits on folders where information may be added (with partitions or project quotas) is a proactive way to avoid that something misbehaves and fills the whole disk. Filling that partition or quota may still cause some problems, depending on the applications writing there, but the impact may be lower and easier to fix than running out of space for everything.

bdcravens - 17 hours ago

I appreciate the last line

> Note: this was written fully by me, human.

ilaksh - 14 hours ago

I'm not sure that his problems are really over if a LOT of people were downloading a 2GB file. It would depend on the plan. Especially if his server is in the US.

But maybe the European Hetzner servers still have generous traffic limits even on the small plans.

But still, if people keep downloading, that could add up.

SoftTalker - 14 hours ago

I've run into that "process still has deleted files open" situation a few times. df shows disk full, but du can't account for all of it, that's your clue to run lsof and look for "deleted" files that are open.

Even more confusing can be cases where a file is opened, deleted or renamed without being closed, and then a different file is created under the original path. To quote the man page, "lsof reports only the path by which the file was opened, not its possibly different final path."
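
A small Linux demo of the first situation, showing how `/proc` reveals the unlinked-but-open file:

```shell
# An unlinked file still holds its blocks while a descriptor is open.
f=$(mktemp)
exec 3>"$f"                       # open fd 3 on the file
dd if=/dev/zero of="$f" bs=1M count=8 status=none
rm "$f"                           # du no longer counts it, df still does
readlink "/proc/$$/fd/3"          # path now suffixed with "(deleted)"
exec 3>&-                         # closing the fd finally frees the space
```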

brunoborges - 17 hours ago

I remember a story of an Oracle Database customer who had production broken for days until an Oracle support escalation led to identifying the problem as mere "No disk space left".

grugdev42 - 15 hours ago

You missed out point five.

5. Implement infrastructure monitoring.

Assuming you're on something like Ubuntu, the monit program is brilliant.

It's open source and self hosted, configured using plain text files, and can run scripts when thresholds are met.

I personally have it configured to hit a Slack webhook for a monitoring channel. Instant notifications for free!
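
A minimal sketch of the monit side (the script path is hypothetical; mine just posts to the Slack webhook):

```
check filesystem rootfs with path /
    if space usage > 85% then alert
    if space usage > 95% then exec "/usr/local/bin/notify-slack.sh"
```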

huijzer - 17 hours ago

> Plausible Analytics, with a 8.5GB (clickhouse) database

And this is why I tried Plausible once and never went back.

To get basic but effective analytics, use GoAccess and point it at the Caddy or Nginx logs. It’s written in C and thus barely uses any memory. With a few hundred visits per day, the logs are currently about 10 MB per day. Caddy will automatically rotate the logs once they go above 100 MB.

renatovico - 15 hours ago

Why not implement X-Sendfile?

nottorp - 15 hours ago

Didn't root use to have some reserved space (and a bunch of inodes) on file systems, just for occasions like this?
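
It still does: ext2/3/4 reserve about 5% of blocks for root by default. A quick demo on a throwaway image file (no root needed), using tune2fs to inspect it:

```shell
# ext4 reserves ~5% of blocks for root by default.
truncate -s 64M /tmp/demo.img
mkfs.ext4 -q -F /tmp/demo.img
tune2fs -l /tmp/demo.img | grep -i 'reserved block count'
# Adjust the percentage (0-50) with: tune2fs -m 5 /tmp/demo.img
```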

jollymonATX - 15 hours ago

Never partition to 100%. It's really a simple solution, and it should be standard practice for every sysadmin. I've never worked with one who needed to be told this...

RALaBarge - 16 hours ago

Wait until you run out of inodes!
