Wrong ways to use the databases, when the pendulum swung too far

luu.io

87 points by luuio 3 days ago


freetime2 - 7 hours ago

Anecdotally, the worst codebase I ever worked on made heavy use of stored procedures. Over the years people couldn’t be bothered or were afraid to update them - which really was the root of the problem. This led to all kinds of crazy patterns in the application code, like calling things in a loop where a bulk operation was needed, or stringing together several stored procedure calls to get the desired outcome, when really a refactor was needed to combine or decompose them as needs evolved.
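To make the round-trip contrast concrete, here is a minimal sketch in Python, using sqlite3 as a stand-in (sqlite has no stored procedures, and the table is made up - the shape of the anti-pattern is the point):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

rows = [(i, i * 1.5) for i in range(1000)]

# Anti-pattern: one call (in a real system, one network round trip) per
# row -- the shape you get when a single-row procedure is invoked in a loop.
for row in rows:
    conn.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", row)

conn.execute("DELETE FROM orders")

# Bulk alternative: hand the whole batch to the database in one operation.
conn.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 1000
```

Both versions end in the same state; the difference is that the first pays per-row overhead a thousand times over.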

If you’re hiring devs for their experience with C# (which the app was written in) then you probably want to have the bulk of your business logic written in C#. Even the DBAs at my current company tend to advocate for using stored procedures only as a last resort for this reason.

That team eventually decided they needed to rewrite the whole app using a NoSQL database. I didn’t stick around long enough to see how that decision turned out.

Quarrelsome - 10 hours ago

Religion and engineering do not make good bedfellows. I got into a pointless argument with someone on LinkedIn who was trashing on ORMs, strawmanning them by stating they would all pull the entire data into memory to just perform a count (some ORMs are much more sophisticated than this). It was some sort of weird religious argument because the chap thought every single piece of data processing should be written in a stored procedure (much like your first example).

However the flip side of the coin in your case is somehow even worse, because people failed to identify the good in the RDBMS and only identified it all with evil, despite then storing their blobs in an RDBMS.

To me, the use of distributed transactions is the smell. If your solution to the problem of a multi-database model is a sticking plaster that can generate even worse problems (e.g. the deadlocks) then it just shows a poor approach to problem solving or to allocating enough resources to development.

tanelpoder - 4 hours ago

The reluctance to use stored procedures where they’d be valuable is also a skill + will issue. I do get the non-database-developer view: if 98% of your app is gonna be written in some other language anyway, why complicate your app layers by adding a bit of PL/SQL code + testing infra here and there.

But for processing (and validating) various data loads and incoming data streams, by looking up matching values in existing reference tables, stored procedures can increase performance/efficiency and reduce complexity (of having to maintain connection pools and caching layers just for validating incoming record batches).
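As a sketch of what “set-oriented validation” means (Python with sqlite3 standing in for the database; table names are hypothetical): instead of looking up each incoming record against the reference table one call at a time, stage the batch and flag the invalid rows with a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (code TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO countries VALUES (?)", [("US",), ("DE",), ("JP",)])

# Incoming batch to validate (hypothetical shape: record id, country code).
batch = [("a1", "US"), ("a2", "XX"), ("a3", "DE")]
conn.execute("CREATE TEMP TABLE staging (id TEXT, country TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)", batch)

# One set-based query finds every row whose country code has no match in
# the reference table -- no per-row lookups, no app-side caching layer.
bad = conn.execute("""
    SELECT s.id FROM staging s
    LEFT JOIN countries c ON c.code = s.country
    WHERE c.code IS NULL
""").fetchall()
print(bad)  # [('a2',)]
```

In a real stored procedure the staging, join, and error handling would all live inside the database, next to the data.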

As far as the unwillingness to even evaluate stored procedures goes, I sometimes joke that maybe the DB vendors should start calling stored procedures “database lambdas” instead…

bazoom42 - 5 hours ago

Stored procedures are actually great when used for bulk set-oriented data operations. Much simpler and more efficient than passing large datasets back and forth between servers.

The problems arise when processing data row-by-row using cursors or a similar procedural approach, or with too much business logic in procedural form.

I guess the moral of the story is: use the right tool for the job instead of deciding on the “one true pattern”.

branko_d - 7 hours ago

I never quite understood these kinds of arguments, even though they seem to be repeated a lot:

> 1. Many critical APIs call would spend most of its CPU time inside the finite database servers. New APIs, or updating existing APIs meant you were hogging the DB/CPU time from every one else.

It's not that easy for a stored procedure to spend CPU on anything other than data processing. If you are going to do that data processing, then it's going to cost the same amount of CPU regardless of whether it is initiated internally (through a stored procedure) or externally (by the client submitting queries). Furthermore, a frequently-used stored procedure's plan is likely already cached, so it doesn't have to be re-parsed and re-planned on each call. On the other hand, when the client submits the text of the query, that text has to at least be hashed, so it can be compared to the hashes in the query plan cache (even if the plan is already in the cache). So if anything, client-submitted queries will consume at least as much CPU as stored procedures, if not more.
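The plan-cache point can be illustrated with a toy sketch (Python; this is a simplified model of what a database engine does internally, not any real engine's implementation):

```python
import hashlib

plan_cache = {}  # hash of query text -> compiled plan

def compile_plan(sql: str) -> str:
    # Stand-in for the expensive parse + optimize step.
    return f"plan for: {sql}"

def get_plan(sql: str) -> str:
    # Even on a cache hit, the submitted text must be hashed before the
    # lookup -- the small but unavoidable cost mentioned above.
    key = hashlib.sha256(sql.encode()).hexdigest()
    if key not in plan_cache:
        plan_cache[key] = compile_plan(sql)  # parse/plan only on a miss
    return plan_cache[key]

get_plan("SELECT * FROM orders WHERE id = ?")
get_plan("SELECT * FROM orders WHERE id = ?")  # second call reuses the plan
print(len(plan_cache))  # 1
```

A stored procedure call skips even the hashing step, since the procedure is looked up by name and its plan is already attached.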

> 2. API latency was at the mercy of the Query Planner. Any slight change in data statistics or business logic would cause the a different Query Plan to be picked, which drastically changed the execution time, and usually caused timeouts. A hash join suddenly became a full table scan across billions of rows without any warning.

This is also true for client-submitted queries. Basically the same techniques are useful for both stored procedures and client-submitted queries: query hints and plan guides, and above all, sound indexing. That being said, this is a big problem in practice and, I think, a fundamental problem in the design of SQL as a "declarative" language where you are not supposed to concern yourself with the physical access path, even though it can make a big difference in performance. Fortunately SQL Server has mitigations for this problem (hints and plan guides), but not all DBMSes do (PostgreSQL still doesn't support hints natively, if I'm not mistaken).

t43562 - 4 hours ago

So far in my career, such as it is, I have been on lots of rewrite projects and not one of them was a good idea.

There were one or two outright failures (new code abandoned) but the more subtle ones were just efforts that took so long to deliver value that the whole market opportunity was lost.

In every single case it was possible to take the existing system and gradually morph it towards something better - but the desire for revolution prevented evolution and forced everyone to live with shit in the hope of a future that didn't arrive.

marc_abonce - 10 hours ago

> The company is still standing and seems to be doing well financially, so I guess things turned out well enough, or maybe some of the technical decisions started trending more reasonable.

Perhaps I've been lucky or I haven't been observant enough, but I've never seen a company suffer financially because of inefficient code. Don't get me wrong, I still value good code for its own sake, but in my experience there is no correlation between that and profits.

jrochkind1 - 11 hours ago

> The checkpoint system worked like this: every time we needed to perform a write, we would generate a UUID, store this UUID into a “checkpoint” for the current flow....

>Logically, that was… fine. In practice, writes into the same database which previously required 5 IO round trips, now required almost double the number of trips for the extra checkpointing operations...

So this sounds like trying to invent a form of optimistic locking, but it's definitely possible to do optimistic locking where no extra reads or writes are necessary unless there is a conflict. You have to write that value (whether timestamp or UUID) on every write, and then you do every write with a conditional "do this write as long as the lock value has not been changed."

But I guess the store they were using didn't allow even a conditional write like that? Although I'm not sure how they managed to make this "checkpoint" system work without that feature either... this seems a bit confusing, it seems like a more typical optimistic locking system should have been possible using whatever primitives the checkpointing system used? Maybe not?
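The conditional-write scheme described above can be sketched like this (Python with sqlite3; the table and the retry policy are made up for illustration):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL, version TEXT)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0, ?)", (str(uuid.uuid4()),))

def update_balance(conn, account_id, new_balance):
    # Read the current lock value along with the data.
    (version,) = conn.execute(
        "SELECT version FROM accounts WHERE id = ?", (account_id,)).fetchone()
    # Conditional write: succeeds only if no one changed the row meanwhile.
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = ? WHERE id = ? AND version = ?",
        (new_balance, str(uuid.uuid4()), account_id, version))
    return cur.rowcount == 1  # False means a concurrent write won; caller retries

ok = update_balance(conn, 1, 150.0)
print(ok)  # True
```

No checkpoint record, no extra round trips on the happy path: the version column rides along with the row, and a conflict just makes the UPDATE match zero rows.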

cl0ckt0wer - 11 hours ago

Does the author really call joins "joints"? I thought it was maybe a spelling thing but then they make a Snoop joke...

kragen - 11 hours ago

Not Dan Luu, is it? The site seems to leave that intentionally ambiguous.

zupa-hu - 4 hours ago

Is it just me? No conclusion, no takeaway, nothing.

gxs - 11 hours ago

This was a good read, especially for me, considering I’m always having to remind people that SQL has been around since the 80s and has been optimized over a very long time for a number of tasks - tasks that should live in the database rather than in code.

Very nice to see the other side of this

TZubiri - 7 hours ago

"They wanted everything to be written into simple primitive Key-Value stores for their new design."

I feel this is a very political phenomenon that is very poignant in democracies: both Argentina and the US have elected executives who promised to take a chainsaw to most of the government bureaucracy for cost and streamlining reasons.

There's an episode of South Park where the kids buy a roller coaster park and run it their own way with their own rules, and slowly they start to face the challenges of running the whole operation: they start by hiring a security guard, then the security guard needs food so they hire a fast food chain, then come the bathrooms, and so on. By the end of the episode the park is fully operational and as good as it ever was.

I used to be very libertarian and revolutionary too when I was younger, then I tried to revolutionize a codebase, ended up breaking stuff, and when correcting I ended up with the traditional architecture.

I don't doubt that driving a plow over the traditions of the previous generation is necessary, but in my experience there is always more wisdom in what we throw away than in what we innovate, so I guess I'm a conservative.