Why Strong Consistency?

brooker.co.za

65 points by SchwKatze a day ago


Kinrany - 3 hours ago

I continue to be surprised that in these discussions correctness is treated as some optional, highest-possible level of quality rather than the only reasonable state.

Suppose we're talking about multiplayer game networking, where the central store receives torrents of UDP packets and it is assumed that like half of them will never arrive. It doesn't make sense to view this as "we don't care about the player's actual position". We do. The system just has tolerances for how often the updates must be communicated successfully. Lost packets do not make the system incorrect.

kukkeliskuu - 4 hours ago

I think we should stop calling these systems eventually consistent. They are actually never consistent. If the system is complex enough and there are always incoming changes, there is never a point in time at which these "eventually consistent systems" are in a consistent state. The problem of inconsistency is pushed onto the users of the data.

rakoo - 2 hours ago

I don't understand this article, and it's like the author doesn't really know what they're talking about. They don't want eventual consistency; they want read-your-writes, a consistency level that's stronger than EC yet still not strong.

https://jepsen.io/consistency/models/read-your-writes

Read-your-writes is indeed useful because it makes code easier to write: every process can behave as if it were the only one in the world, and devs can write synchronous code. That's great! But you don't need strong consistency.

I hope developers learn a little bit more about the domain before going to strong consistency.

Tractor8626 - an hour ago

> read-modify-write is the canonical transactional workload. That applies to explicit transactions (anything that does an UPDATE or SELECT followed by a write in a transaction), but also things that do implicit transactions (like the example above)

Your "implicit transaction" would not be consistent even if there was no replication involved at all. Explicit db transactions exist for a reason - use them.

mrkeen - 2 hours ago

It's wishful thinking. It's like choosing Newtonian physics over relativity because it's simpler or the equations are neater.

If you have strong consistency, then you have at best availability xor partition tolerance.

"Eventual" consistency is the best tradeoff we have for an AP system.

Computation happens at a time and a place. Your frontend is not the same computer as your backend service, or your database, or your cloud providers, or your partners.

So you can insist on full ACID on your DB (which it probably isn't running, btw; search "READ COMMITTED"), but your DB will only be consistent with itself.

We always talk about multiple bank accounts in these consistency modelling exercises. Do yourself a favour and start thinking about multiple banks.

nullorempty - 3 hours ago

I keep wondering how the recent 15h outage has affected these eventually consistent systems.

I really hope to see a paper on the effects of it.

jeffbee - an hour ago

The argument seems to rely on the point that the replicas are only valuable if you can send reads to them, which I don't think is true. Eventually-consistent replicated databases are valuable on their own terms even if you can only send traffic to the leader.

generalzod - 3 hours ago

In the read-after-write scenario, why not use something like consistency tokens, and redirect to the primary if the secondary detects it has not caught up?
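
A rough sketch of what such a token could look like, with the caveat that the header name and the replica's applied_lsn() accessor are assumptions, not anything from the comment: the client carries the LSN of its last write, and a replica that hasn't replayed that far hands the read to the primary.

    def handle_read(request, replica, primary):
        # Hypothetical header carrying the LSN the client last wrote.
        token = request.headers.get("X-Min-LSN")
        if token is not None and replica.applied_lsn() < int(token):
            # The replica knows it hasn't caught up: fall back to the primary.
            return primary.handle_read(request)
        return replica.handle_read(request)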

jiggawatts - an hour ago

Blogs like this make me go on the same rant for the n-th time:

Consistency for distributed systems is impossible without APIs returning cookies containing vector clocks.

The idea is simple: every database has a logical sequence number (LSN), which the replicas try to catch up to -- but may be a little bit behind. Every time an API talks to a set of databases (or their replicas) to produce a JSON response (or whatever), it ought to return the LSNs of each database that produced the query in a cookie. Something like "db1:591284;db2:10697438".

Client software must then union this with its existing cookie, and send the result along with the next API call.

That way if they've just inserted some value into db1 and the read-after-write query ends up going to a read replica that's slightly behind the write master (LSN 591280 instead of 591284) then the replica can either wait until it sees LSN >= 591284, or it can proxy the query back to the write master. A simple "expected latency of waiting vs proxying" heuristic can be used for this decision.

That's (almost entirely) all you need for read-after-write transactional consistency at every layer, even through Redis caches and stateless API layers!
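
A sketch of the scheme as described, with the caveat that the replica-side helpers (applied_lsn, wait_for_lsn, proxy_to_primary, and the latency estimates) are assumed names, not real APIs; only the cookie format is taken from the comment above.

    def parse_cookie(cookie):
        # "db1:591284;db2:10697438" -> {"db1": 591284, "db2": 10697438}
        return {db: int(lsn) for db, lsn in
                (part.split(":") for part in cookie.split(";") if part)}

    def merge_cookies(old, new):
        # Per-database max: the client must never "forget" an LSN it has already seen.
        return {db: max(old.get(db, 0), new.get(db, 0)) for db in old.keys() | new.keys()}

    def serve_read(replica, db_name, required, query):
        needed = required.get(db_name, 0)
        if replica.applied_lsn() >= needed:
            return replica.run(query)
        # The "expected latency of waiting vs proxying" heuristic from above.
        if replica.estimated_catchup_time(needed) < replica.proxy_latency():
            replica.wait_for_lsn(needed)
            return replica.run(query)
        return replica.proxy_to_primary(query)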

sgarland - 4 hours ago

For the love of all that’s holy, please stop doing read-after-write. In nearly all cases, it isn’t needed. The only cases I can think of are if you need a DB-generated value (so, DATETIME or UUIDv1) from MySQL, or you did a multi-row INSERT in a concurrent environment.

For MySQL, you can get the first auto-incrementing integer created from your INSERT from the cursor. If you only inserted one row, congratulations, there’s your PK. If you inserted multiple rows, you could also get the number of rows inserted and add that to get the range, but there’s no guarantee that it wasn’t interleaved with other statements. Anything else you wrote, you should already have, because you wrote it.

For MariaDB, SQLite, and Postgres, you can just use the RETURNING clause and get back the entire row with your INSERT, or specific columns.
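
Two small examples of those patterns, sketched here with Python's sqlite3 for brevity: cursor.lastrowid is the DB-API field that MySQL drivers populate with the generated key, and RETURNING needs an SQLite build of 3.35 or newer (it works the same way in MariaDB and Postgres).

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    # MySQL-style: take the generated key from the cursor instead of re-reading the row.
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
    new_id = cur.lastrowid

    # RETURNING-style: the INSERT itself hands back the columns you asked for.
    row = conn.execute(
        "INSERT INTO users (name) VALUES (?) RETURNING id, name", ("bob",)
    ).fetchone()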

Animats - 4 hours ago

So why isn't the section that needs consistency enclosed in a transaction, with all operations between BEGIN TRANSACTION and COMMIT TRANSACTION? That's the standard way to get strong consistency in SQL. It's fully supported in MySQL, at least for InnoDB. You have to talk to the master, not a read slave, when updating, but that's normal.
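
For what it's worth, a minimal sketch of that pattern, assuming a PyMySQL-style connection to the primary and the usual hypothetical accounts table: everything between the explicit transaction start and COMMIT is applied atomically, and a failure rolls the whole thing back.

    def transfer(conn, src, dst, amount):
        cur = conn.cursor()
        try:
            cur.execute("START TRANSACTION")
            cur.execute("UPDATE accounts SET balance = balance - %s WHERE id = %s",
                        (amount, src))
            cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                        (amount, dst))
            cur.execute("COMMIT")
        except Exception:
            cur.execute("ROLLBACK")
            raise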