Fast-Servers

geocar.sdf1.org

96 points by tosh 9 hours ago


luizfelberti - 8 hours ago

A bit dated in the sense that for Linux you'd probably use io_uring nowadays, but otherwise it's a timeless design

Still, I'm conflicted on whether separating stages per thread (accept on one thread and the client loop in another) is a good idea. It sounds like the gains would be minimal or non-existent even in ideal circumstances, and on some workloads where there's not a lot of clients or connection churn it would waste an entire core for handling a low-volume event.

I'm open to contrarian opinions on this though, maybe I'm not seeing something...

lmz - 8 hours ago

Seems similar to the SEDA architecture https://en.wikipedia.org/wiki/Staged_event-driven_architectu...

kogus - 8 hours ago

Slightly tangential, but why is the first diagram duplicated at .1 opacity?

ratrocket - 8 hours ago

discussed in 2016: https://news.ycombinator.com/item?id=10872209 (53 comments)

bee_rider - 8 hours ago

> One thread per core, pinned (affinity) to separate CPUs, each with their own epoll/kqueue fd

> Each major state transition (accept, reader) is handled by a separate thread, and transitioning one client from one state to another involves passing the file descriptor to the epoll/kqueue fd of the other thread.

So this seems like a little pipeline that all of the requests go through, right? For somebody who doesn’t do server stuff, is there a general idea of how many stages a typical server might be able to implement? And does it create a load-balancing problem? I’d expect some stages to be quite cheap…

password4321 - 6 hours ago

Always interesting to review the latest techempower web framework benchmarks, though it's been a year:

https://www.techempower.com/benchmarks/#section=data-r23&tes...

rot13maxi - 7 hours ago

i haven't seen an sdf1.org url in a looooong time. lovely to see it's still around

fao_ - 7 hours ago

This is more or less what Erlang does, and part of why Erlang is so easy to scale.

epicprogrammer - 7 hours ago

It’s an interesting throwback to SEDA, but physically passing file descriptors between cores as a connection changes state is usually a performance killer on modern hardware. While it sounds elegant on a whiteboard to have a dedicated 'accept' core and a 'read' core, you end up trading a slightly simpler state machine for massive L1/L2 cache thrashing. Every time you hand off that connection, you invalidate the buffers and TCP state you just built up. There’s a reason the industry largely settled on shared-nothing architectures like NGINX's: having a single pinned thread handle the entire lifecycle of a request keeps all that data strictly local to the CPU cache. When you're trying to scale, respecting data locality almost always beats pipeline cleanliness.