Wrong ways to use the databases, when the pendulum swung too far
luu.io | 87 points by luuio 3 days ago
Anecdotally, the worst codebase I ever worked on made heavy use of stored procedures. Over the years people couldn’t be bothered or were afraid to update them - which really was the root of the problem. This led to all kinds of crazy patterns in the application code, like calling things in a loop where a bulk operation was needed, or stringing together several stored procedure calls to get the desired outcome when really a refactor was needed to combine or decompose them as needs evolved.
If you’re hiring devs for their experience with C# (which the app was written in) then you probably want to have the bulk of your business logic written in C#. Even the DBAs at my current company tend to advocate for using stored procedures only as a last resort for this reason.
That team eventually decided they needed to rewrite the whole app using a NoSQL database. I didn’t stick around long enough to see how that decision turned out.
I am not going to dismiss your experience here. Stored Procedures can turn into wild monsters. Pair it with Triggers and you are in for chasing a noodle.
But it's also a reality that relational databases often become the main integration point in companies. In those environments it’s hard (next to impossible) and dangerous to use something like an ORM.
Often enough I don't "own the tables" and I don't "own the columns" for lack of a better word. The DBA only gives me SELECT and EXECUTE permissions on the database. How am I going to work here without Stored Procedures?
And while this sounds weird, these limited permissions are there to protect me from accidentally writing to the wrong columns - columns that could impact systems I didn't even know about. Is it possible to write to the same columns with a Stored Procedure? Of course! But it's not as dangerous as giving an application blanket UPDATE permissions.
By using SQL Views I can build a more consumable data model on top of these tables for reading. And you mentioned C#: I can use EF Core to query these Views with LINQ. Stored Procedures are used to write data to multiple tables in a "legacy database" within a transaction.
This could also be done with EF Core, but I need to explicitly log all changes to the data. How should this be done without a Stored Procedure or Triggers? CDC doesn't help a lot here.
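Roughly the shape I mean, with all table, column and schema names made up:

    -- consumable read model over tables I don't own; EF Core queries this with LINQ
    CREATE VIEW dbo.CustomerOrders AS
        SELECT c.Id AS CustomerId, c.Name, o.Id AS OrderId, o.Total
        FROM legacy.Customers AS c
        JOIN legacy.Orders AS o ON o.CustomerId = c.Id;
    GO

    -- writes go through a procedure that touches the legacy tables
    -- and the explicit change log in one transaction
    CREATE PROCEDURE dbo.CreateOrder
        @CustomerId INT,
        @Total      DECIMAL(18, 2)
    AS
    BEGIN
        SET XACT_ABORT ON;
        BEGIN TRANSACTION;
            INSERT INTO legacy.Orders (CustomerId, Total)
            VALUES (@CustomerId, @Total);

            INSERT INTO audit.ChangeLog (TableName, KeyValue, ChangedAt)
            VALUES ('legacy.Orders', SCOPE_IDENTITY(), SYSUTCDATETIME());  -- assumes Orders.Id is an identity column
        COMMIT;
    END;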
How many DBAs have you dealt with that only give you SELECT and EXECUTE? That seems somewhat crazy. It doesn't cut down the surface area for bugs at all (the same amount of SQL is still required and it'd be the same code as the app was going to execute anyway as a normal query). What scenarios are they worried about here?
Amending data obviously. Building a dashboard using payroll data and having the power to change it are two wildly different things, and no sane large corporation would allow a dashboard's user account to change that data.
You're coming at this from the idea that one user has full access to everything, when that is likely the biggest security lapse you'll find in modern apps that use databases: a web app that has full DDL and even DML access when it shouldn't.
Yeah, that was a very confused question from me. I thought for a moment EXECUTE permissions were a roundabout way of granting restricted write access. Let me try again: how is "How am I going to work here without Stored Procedures?" supposed to be interpreted in that comment? The application could just run the queries without using stored procedures. Stored procedures don't change the permission requirements; they're a moderate performance optimisation. They're almost always optional.
bytefish's comment seems to be linking stored procedures and writing to columns, which doesn't make sense because the scenario doesn't have any write permissions.
I have seen many actually.
Imagine having a huge number of applications all using the same database, which is, like I said, often enough the main integration point in a company. The applications you are maintaining are written in, let’s say, Java, C#, Python and C++. They are too important, too large, and a rewrite is way too risky.
How would you start to get this under your control? By moving common logic into a shared C++ DLL, probably? Rewriting all those applications at once, in a heroic effort? I for one take the pragmatic approach and extract the common business logic shared between all of them into a Stored Procedure.
The Stored Procedures are invoked by all applications, so instead of maintaining 10 implementations, I maintain 1. Suddenly you have a common data access path! That 1 implementation will be drowned in a shitload of integration tests, because this is what really matters.
Of course, a Stored Procedure doesn’t prevent you from shooting yourself in the foot! But in my experience, it’s way harder to shoot yourself in the foot with a Stored Procedure whose query plan you can inspect than it is to spend weeks hunting down monster queries generated by a LINQ provider.
As for INSERT, UPDATE and DELETE: with SPs you have fine-grained control over which columns are updated, instead of exposing all properties in your application code and praying the ORM doesn’t accidentally update “columns I don’t own” because of some convention it applies.
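For example (hypothetical names), the application only gets EXECUTE on something like this, so the one column my team owns is the only thing that can ever change:

    CREATE PROCEDURE dbo.UpdateCustomerEmail
        @CustomerId INT,
        @Email      NVARCHAR(255)
    AS
        UPDATE dbo.Customers
        SET Email = @Email              -- the only column this application may touch
        WHERE Id = @CustomerId;
    GO
    GRANT EXECUTE ON dbo.UpdateCustomerEmail TO app_user;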
If you are in the lucky position of owning the database and don’t have to maintain legacy applications, I can understand the resistance to Stored Procedures. Then go ahead and build an API all applications are going to use.
But if you don’t own the database and need to maintain dozens of mission-critical applications, Stored Procedures are a tool worth knowing.
I'm convinced that the solution to stored procedures that people are afraid to update is automated tests. Write unit tests for them just like you would any other complex logic - the tests can evolve over time into a detailed spec for how they should work, and give you the ability to refactor or improve them with confidence later on.
For some reason stored procedures and automated testing seem to often not go together, maybe because they're often accompanied by an environment where DBAs and software developers are separate teams with separate sets of skills?
I recently took a couple of 100-line stored procedures that I wrote years ago and replaced them with equivalent go code and a few inline SQL calls. We had extensive tests on those stored procedures too - or rather, tests on the go methods that called them. The tests all ran in a few seconds. The go code feels so much nicer to deal with, and I thought about why:
- Stored procedures are deployed separately from the code that calls them: one is a db migration, one is a k8s container deploy. And that binding isn’t strongly-typed, so there’s more thinking and double checking involved during deployment (and, heaven forbid, rollback).
- The SQL procedures duplicated some business logic and constants that are defined in some of our go core libraries. It felt good to remove that duplication.
- The procedures also had a few specific return codes. We needed to define some constants (like postgres itself does) that the go code could detect and translate into go errors. That’s more complexity.
- Maybe there’s a good SQL IDE out there, but I don’t have it, so I was writing these things in vscode with syntax highlighting. Plenty of errors that would get an immediate underline in go instead required 20 seconds to compile and run the test and interpret the error message.
- This lack of tooling, and the fact that I’m not great at SQL (I’m fairly good with queries, but not with stuff like variables and IF EXISTS that you only see in stored procedures), made it hard to use all the complexity-cutting techniques I’m used to applying in go and other languages. So the 100 line stored procedure stayed 100 lines, and everyone hated working on it.
Stored procedures are basically a cross-service API call, with all the complexities that entails. I won’t reach for them again unless there’s a technical problem that can’t be solved any other way. We don’t have a dedicated DBA so Conway’s Law doesn’t factor in.
I think a lot of the problems come from the fact that testing stored procedures ventures into e2e testing land. You have to stand up infra in order to test them. There's not really been a simple way to unit test stored procedures as part of your application code's testing framework.
(Aside: I think this is something PGlite helps with if you're in Postgres land)
My team has found a lot of success using testcontainers for testing Go and Java applications integrating with Postgres. They feel more unit-testy than e2e-testy.
Admittedly I’m only talking about selects, inserts, updates, views, etc. not stored procedures. But having worked in codebases with far too many stored procedures in the past, I think there might be a marginal improvement.
For what it’s worth, I fully agree that the main problem with using stored procedures is testability and debuggability.
PostgreSQL server is a single process that starts in under 100ms on a developer's laptop.
In the company I work for we use real PostgreSQL in unit tests — it's cheap to start one at the beginning of a suite, load the schema and go, and then shut it down and discard its file store.
I keep thinking of moving that file store to tmpfs when run on Linux, but it's nowhere near the top of the performance improvements for the test suite.
So: no more mocks or substitute databases with their tiny inconsistencies.
Any good resources on testing stored procedures?
There are unit-testing frameworks for the purpose. For SQL Server there are several [0]; I've worked briefly with one called tSQLt [1] and was quite impressed. Frameworks also exist for other RDBMSs. [0]
[0] https://en.wikipedia.org/wiki/List_of_unit_testing_framework...
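From memory, a tSQLt test looks roughly like this - the procedure under test and all names are hypothetical, so check the docs for the exact syntax:

    EXEC tSQLt.NewTestClass 'DiscountTests';
    GO
    CREATE PROCEDURE DiscountTests.[test discount only applies to pending orders]
    AS
    BEGIN
        -- swap the real table for an empty, constraint-free copy
        EXEC tSQLt.FakeTable 'dbo.Orders';
        INSERT INTO dbo.Orders (Id, Status, Total)
        VALUES (1, 'pending', 100), (2, 'shipped', 100);

        EXEC dbo.ApplyDiscount @Percent = 10;   -- hypothetical procedure under test

        DECLARE @Actual DECIMAL(18, 2) = (SELECT Total FROM dbo.Orders WHERE Id = 1);
        EXEC tSQLt.AssertEquals @Expected = 90, @Actual = @Actual;
    END;
    GO
    EXEC tSQLt.Run 'DiscountTests';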
> you probably want to have the bulk of your business logic written in C#
Perhaps, if the C# applications are the only ones accessing the database. But suppose multiple applications written in different languages (and for different purposes) need access to the database? You can do this via stored procs or (better in my experience) by adding an intermediate server process which the applications use, via a publicly documented API, and which then talks to the database - which may provide stored proc access to the intermediate server.
Equally anecdotally:
An example is a server that a couple of colleagues and I worked on at a big investment bank. The underlying databases were mostly proprietary from suppliers or written by other teams in the bank. And versioned (badly, eek!). We wrote the SQL queries/procs to access the data in ways the VB (now C#, I guess) code would not have been capable of without an enormous waste of effort, and then provided COM interfaces to our server that ran our queries. Mercifully, we didn't have to provide updates, but the same idea would have applied. This gives you more or less complete isolation between app and database.
Anyway, SPs can certainly fit into such an architecture, but might not be first (or second) choice.
> Perhaps, if the C# applications are the only ones accessing the database. But suppose multiple applications written in different languages (and for different purposes) need the database access?
Without wishing this to sound like a personal attack, YUCK
A database that's accessed by multiple applications, regardless of the number of languages, is a really bad smell.
If I have a user db, and hundreds of other applications want to make use of the data in it, they come through MY application. I own the db, I decide what's in it, how the data is shaped, and when it changes.
Multiple applications accessing a single database is asking for a bureaucratic nightmare - "App A wants to change the first name field to UTF-8" - EVERY other user of that database needs to be informed (best of luck figuring out what the other apps are, and who owns them). If you are tempted to say "Those other apps should support UTF-8 as well" then that's the sound of someone who's never had legacy code in the environment.
> A database that's accessed by multiple applications, regardless of the number of languages, is a really bad smell.
Except that really was the original model back in the 90's. All "good" databases had an extensive roles and permissions system, and a point of pride was the number of ODBC connectors for different languages and environments.
You were supposed to have The Database, looked after and controlled by the hallowed DBAs, who had their own hardware budget, their own organization, and who controlled access to The Database, giving trusted applications access to it, but only after they had vetted the schema and queries that those dirty developers wanted to run against it. Trusted users could get SELECT access, but only to the tables they needed to run their custom reports and queries.
It was a whole-ass thing that's fallen completely by the wayside as database software went from underpinning Larry's latest mega-yacht to free as in beer, and we learned how to clone and shard the data instead.
Precisely why I said the applications should access the data via a server process - they never know the underlying database schema.
Oh man. I've seen an org where the stored procedures were written by a different team, and friction was high.
Then one java dev found a glaring sql injection hole. The whole app quickly reorganized around using it as an API for everything. Management was pleased speed went up and bugs went down, DBAs were pleased for less work, and Java devs were pleased they could understand what was going on in the app. Everybody happy I guess?
Religion and engineering do not make good bedfellows. I got into a pointless argument with someone on LinkedIn who was trashing ORMs, strawmanning them by claiming they would all pull the entire dataset into memory just to perform a count (some ORMs are much more sophisticated than this). It was some sort of weird religious argument because the chap thought every single piece of data processing should be written in a stored procedure (much like your first example).
However the flip side of the coin in your case is somehow even worse, because people failed to identify the good in the RDBMS and only identified it all with evil, despite then storing their blobs in an RDBMS.
To me, the use of distributed transactions is the smell. If your solution to the problem of a multi-database model is a sticking plaster that can generate even worse problems (e.g. the deadlocks) then it just shows a poor approach to problem solving or allocating enough resource to development.
I work for a company that's still obsessed with micro services. "Religious" is a pretty good way to describe it. Drives me absolutely nuts, especially when no one seems to understand what micro services are. I've seen at least half a dozen teams that have just created a distributed monolith and called it a micro service because they're using Docker and Kubernetes.
ORMs are basically the enemy of rdbms, by encouraging antipatterns and undermining language features they have easily done more harm than good. of course reasonable people can be trusted to use them reasonably but if you're arguing about it on the internet..
Yeah, I think the guy I was arguing with was mostly a DBA so I get his perspective, but it's just religious to be wholly on one or the other side of the argument.
Haha. I’ve actually had an ORM attempt to load the whole database just to do a count. That was a fun bug to dig into (nobody noticed until we had a big enough customer) and an easy fix. Literally, all I had to do was mark some fields as lazy loaded.
I think the problem is that it's not very obvious to developers that one or the other thing is going wrong until - as you say - you get a big enough customer.
Yeah, I've had almost the same experience with people and ORMs. Often they'll make a good point about some weakness of some specific ORM, but it is almost always some use case that represents 1% (or less) of their application. Worse, there's nothing stopping them from just using sql for those couple of times that they need to bypass the ORM. After all, there's no reason to be religious about it and only use one tool.
I'm not hugely experienced, and SQL has always been enough for me, perhaps due to my simple requirements. But so far I hold the opinion that an ORM is always a mistake.
I find a lot of programmers don't know how to write an array of floats to disk, if you forced me to choose between an ORM or No DB at all, I would choose no DB all day.
An ORM feels like an abstraction on top of an abstraction. I don't trust that those who chose ORMs have studied and exhausted the possibilities of the two underlying layers; I feel that more often than not they resort to higher layers of technology without understanding the lower levels.
But I may be wrong, maybe those who use ORMs know all about File systems and Posix and SQL itself.
It also reminds me of the people who ask ChatGPT to write a prompt for them. It's as if you had a dishwasher and you were too lazy to load the dishes into the dishwasher, so you buy a product that loads the dishes into the dishwasher for you.
I know both, and ORMs are fine. They save a lot of time at the cost of some inefficiency. However, some of the newer ones, like Entity Framework Core for dotnet, are significantly smarter than the older ones.
> ORM feels like an abstraction on top of an abstraction. I don't trust that those who chose ORMs have studied and exhausted the possibilities of the two underlying layers, I feel more often than not they recourse to higher layers of technology without understanding the lower level.
I agree with that. However I feel that teams that choose not to use an ORM end up having one somehow reimplemented by the seniors, and just used as you describe by the juniors.
I'd rather have the seniors master an existing ORM and spend their time elsewhere.
> I find a lot of programmers don't know how to write an array of floats to disk
what does that have to do with it?
I think his point is that ORMs (and maybe DBs in general) are used for data persistence by folks who just don’t know any alternative.
The reluctance to use stored procedures where they’d be valuable is also a skill + will issue. I do get the non-database-developer view: if 98% of your app is going to be written in some other language anyway, why complicate your app layers by adding a bit of PL/SQL code + testing infra here and there.
But for processing (and validating) various data loads and incoming data streams, by looking up matching values in existing reference tables, stored procedures can increase performance/efficiency and reduce complexity (of having to maintain connection pools and caching layers just for validating incoming record batches).
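To make that concrete, here's a sketch of the pattern (names made up): the incoming batch arrives as a table-valued parameter and is validated in one set-based pass against the reference table, instead of one lookup round trip per record.

    CREATE TYPE dbo.IncomingPayment AS TABLE (
        RecordId    INT,
        CountryCode CHAR(2),
        Amount      DECIMAL(18, 2)
    );
    GO
    CREATE PROCEDURE dbo.ValidatePaymentBatch
        @Batch dbo.IncomingPayment READONLY
    AS
    BEGIN
        -- every record whose country code has no match in the reference table comes back as an error row
        SELECT b.RecordId, 'unknown country code' AS Error
        FROM @Batch AS b
        LEFT JOIN dbo.Countries AS c ON c.Code = b.CountryCode
        WHERE c.Code IS NULL;
    END;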
As far as the unwillingness to even evaluate stored procedures goes, I sometimes joke that maybe the DB vendors should start calling stored procedures “database lambdas” instead…
> But for processing (and validating) various data loads and incoming data streams, by looking up matching values in existing reference tables, stored procedures can increase performance/efficiency and reduce complexity
Performing operations on data directly in the SQL provider is the peak of human enlightenment. It takes a lot of leadership or wisdom to push a modern team away from using crap like EF to process everything 100x slower than it needs to be.
In exactly 0% of cases of cycling records through code will you see a higher performance result than executing a single T-SQL or PL/SQL script against the provider.
The procedural SQL languages are Turing complete. SQL itself has been since the introduction of recursive common table expressions. There's not any reasonable argument for not trying this if all the information is already in the SQL store.
Moving information is way more expensive than processing information that is already in place (cache). Your SQL server process can iterate on items in L1 millions of times before a packet makes it across the data center one time.
Operational concerns trump raw performance most of the time. Stored procedures live in a different CI/CD environment, with a different testing framework (if there’s even one), on a different deployment lifecycle, using a different language than my main code. It is also essentially an un-pinnable dependency. Too much pain for the gain.
Now, give me ephemeral, per-connection procedures (call them unstored procedures for fun) that I can write in the language I want but that run on provider side, sure I’ll happily use them.
> Stored procedures live in a different CI/CD environment
They don't have to. The procedural SQL commands can be source controlled along with the rest of the codebase. Transmitting the actual command text to the SQL server that does all the work is not the inefficient part.
Stored procedures are actually great when used for bulk set-oriented data operations. Much simpler and more efficient than passing large datasets back and forth between servers.
The problems arise when processing data row-by-row using cursors or a similar procedural approach, or with too much business logic in procedural form.
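Roughly the contrast, with made-up tables - the cursor version does one unit of work per row, the set-based version does the same job in one statement:

    -- row-by-row: easy to write, painful at scale
    DECLARE @Id INT;
    DECLARE order_cursor CURSOR FOR
        SELECT Id FROM dbo.Orders WHERE Status = 'pending';
    OPEN order_cursor;
    FETCH NEXT FROM order_cursor INTO @Id;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC dbo.PriceOneOrder @OrderId = @Id;   -- hypothetical per-row procedure call
        FETCH NEXT FROM order_cursor INTO @Id;
    END;
    CLOSE order_cursor;
    DEALLOCATE order_cursor;

    -- set-based: one statement, one plan, one pass over the data
    UPDATE o
    SET o.Total  = o.Quantity * p.UnitPrice,
        o.Status = 'priced'
    FROM dbo.Orders AS o
    JOIN dbo.Prices AS p ON p.ProductId = o.ProductId
    WHERE o.Status = 'pending';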
I guess the moral of the story is: use the right tool for the job instead of deciding on the “one true pattern”.
The problem I've seen: very few people from Java / C# / other languages actually know PL/SQL or ever bothered to learn it. It's not that hard, but I guess it's not a cool enough tech to have on a resume? When it could massively help with the speed of certain use cases, in ways no external code could ever dream of achieving.
I never quite understood these kinds of arguments, even though they seem to be repeated a lot:
> 1. Many critical APIs call would spend most of its CPU time inside the finite database servers. New APIs, or updating existing APIs meant you were hogging the DB/CPU time from every one else.
It's not that easy for a stored procedure to spend CPU on anything other than data processing. If you are going to do that data processing, then it's going to cost this amount of CPU regardless of whether it is initiate internally (through a stored procedure) or externally (by the client submitting queries). Furthermore, a frequently-used stored procedure's plan is likely already cached, so it doesn't have to be re-parsed and re-planned on each call. On the other hand, when the client submits the text of the query, that text has to at least be hashed, so it can be compared to the hashes in the query plan cache (even if the plan is already in the cache). So if anything, client-submitted queries will consume at least as much CPU as stored procedures, if not more.
> 2. API latency was at the mercy of the Query Planner. Any slight change in data statistics or business logic would cause the a different Query Plan to be picked, which drastically changed the execution time, and usually caused timeouts. A hash join suddenly became a full table scan across billions of rows without any warning.
This is also true for client-submitted queries. Basically the same techniques are useful for both stored procedures and client-submitted queries: query hints and plan guides, and above all, sound indexing. That being said, this is a big problem in practice and, I think, a fundamental problem in the design of SQL as a "declarative" language where you are not supposed to concern yourself with the physical access path, even though it can make a big difference in performance. Fortunately SQL Server has mitigations for this problem (hints and plan guides), but not all DBMSes do (PostgreSQL still doesn't support hints natively, if I'm not mistaken).
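For what it's worth, the SQL Server-style mitigation looks roughly like this (table names are illustrative): the hint pins the physical join strategy and forces a fresh compile, so a shift in statistics can't silently swap in a bad cached plan.

    SELECT c.Region, SUM(o.Total) AS Revenue
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c ON c.Id = o.CustomerId
    GROUP BY c.Region
    OPTION (HASH JOIN, RECOMPILE);   -- force hash joins and recompile per execution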
1. Beyond just querying, the stored proc spent a lot of time processing data. As in, looping through cursors, making business logic decisions, calculating things, etc.
2. Having the business logic (not just the data loading) inside the stored procs meant that a change in business logic that would normally only require updating application code now invalidated the stored procedure's cached execution plan.
If you’re going to process a lot of data, doing that in the database is usually faster, as you avoid moving all the data into a separate server process and then moving it back again. For many queries the round trip to the database server from the application server takes longer than the query itself.
> looping through cursors, making business logic decisions, calculating things, etc.
Interesting. Can you share more details about the "non-data" processing that was done? Were they doing heavy mathematical calculations and such?
> change in business logic that would normally only need to update application code, now invalidates the stored procedure's cached execution plan
As for plan cache invalidation - the most extreme case I saw was on the order of 5-10s. Basically, it depends on the size of that one stored procedure - not on all other stored procedures that may call it or be called by it. What was the actual time that they got?
So far in my career, such as it is, I have been on lots of rewrite projects and not one of them was a good idea.
There were one or two outright failures (new code abandoned) but the more subtle ones were just efforts that took so long to deliver value that the whole market opportunity was lost.
In every single case it was possible to take the existing system and gradually morph it towards something better - but the desire for revolution prevented evolution and forced everyone to live with shit in the hope of a future that didn't arrive.
I've got the opposite experience. Most rewrites I was on had substantial uplift in the product value.
The trick is to engage with the customer early, directly and often. If the customer isn't willing to pay more money for something, you should probably not do it.
If the effort is too large then you might take years to get to the end - always trying to match the old product (which evolves) and eventually realising that your new architecture isn't perfect either.
I think it can be extremely good to rewrite small things - you always know how to do it better after you've done it once.
It really depends on the origin of the system to be replaced. If it was never designed to be replaceable you are usually in for a bad time. Similarly if the drive to replace it comes from new folk that simply don't understand the original system you are probably completely doomed.
I often write code with the full intention of replacing it, generally by writing it in such a way that its replacement is forced. That latter bit is pretty important because systems generate inertia; you need to have some hard forcing function baked in to prevent them ossifying.
E.g. my current project is to replace a pretty awful pile of Bash I wrote last year with a much more sophisticated and automated system in Golang. The new system is already in production and the old tools are being phased out.
Writing the original in Bash was how I basically ensured it would get rewritten and not continually upgraded/morphed into something grotesque. There was a ton of pushback, lots of people saying "why not Go! why not Python! This is an awful decision, 3k LOC of Bash is an unmaintainable nightmare!". Literally all of those people missed the point.
Building simple systems that you 100% will replace (not intend to - that really isn't the same thing) is a really good pattern for finding a real understanding of the solution space before building something gold-plated/spaceship-like.
It generally means that once you implement the follow-up it very rarely becomes "legacy" software without horrible mismanagement, i.e. bringing in new management that wants to rewrite things for the sake of rewriting them, usually in some FoTM nonsense.
Rewriting something you wrote yourself makes sense. i.e. who knows better than you what the bash script does and why? You did it with bash - probably quite quickly but now you understand what's needed you can do it better the second time in a language which might be a bit less flexible but is much faster.
As for writing code to be replaced - well I can understand doing it but generally I've been in companies where you never get a chance to go back to anything that doesn't seem to be critical to the next requirement from the business - and they always have more than you can do.
The big rewrites are almost a response to this - no improvement can be justified unless it enables so many new features that the business decides to sign off. But because it's such a big change it is also much much more risky.
> The company is still standing and seems to be doing well financially, so I guess things turned out well enough, or maybe some of the technical decisions started trending more reasonable.
Perhaps I've been lucky or I haven't been observant enough, but I've never seen a company suffer financially because of inefficient code. Don't get me wrong, I still value good code for its own sake, but in my experience there is no correlation between that and profits.
I've seen customers be driven away by poorly performing interfaces. I've seen downtime caused by exponentially growing queries. I've seen poorly written queries return such large datasets that they cause servers to run out of memory processing the request.
Unless you're doing stock trades or Black Friday sales, though, it can be pretty hard to pin down a specific inefficiency to a concrete measure of lost income. Instead, people move on from products for general "we don't like it" vibes.
The most concrete measure, as someone else pointed out, is heavily inflated PaaS spend caused by poorly written code that requires beefier-than-necessary servers. In theory, moving faster means you're spending less money on developer salaries (the Ruby/Rails mantra of old), but there's a distinct tipping point where you have to pony up and invest in performance improvements.
My previous job designed their data lake and operations on it with horrific incompetence, and their solution was just to use AWS Lambdas scaling into the thousands and tens of thousands to do stuff over it.
They made so much money but would then squander it on hopelessly inefficient designs that required an AWS spend that basically prevented them from ever financially scaling.
IME, the suffering from badly performing code is mostly secondary. It increases compute costs, mostly because it requires beefier VMs than strictly necessary - which is still benign, and possibly more cost-efficient than spending more engineering effort. Sometimes the lack of performance means more scaling and orchestration is required, which brings additional complexity and therefore more compute and staffing cost. This rarely gets noticed and fixed, due to organisational momentum.
The worst is when the performance is so bad it starts to prevent onboarding new features or customers.
The real cost is an opportunity cost. It doesn't show up in the financials. Your ability to react quickly to new business opportunities is hurt. Most CEOs and boards don't notice it, until it's too late.
I bet Atlassian could make even more money with Jira if it wasn't this slow. They are not struggling as it is, but it's bad enough that it is costing them customers
But generally I would agree
They are an example where someone decides "we will use jira" but they aren't necessarily the ones using it every day so the **ness doesn't matter to them.
Do you have any source for "they are losing customers"? I always thought that they consciously decided to have that shit of an interface because no one relevant to them (i.e. purchasing departments) cares?
Only if the bad code affects customer experience significantly. That only happens if you really let things grow out of control. At some point you'll get bugs that take forever to solve and angry customers as a result.
As a general rule it is hard to measure lost opportunity costs. That doesn't mean they don't exist or shouldn't be considered. I mean.. why do humans even acknowledge efficiency at all.. let alone almost always as a virtue?
> The checkpoint system worked like this: every time we needed to perform a write, we would generate a UUID, store this UUID into a “checkpoint” for the current flow....
>Logically, that was… fine. In practice, writes into the same database which previously required 5 IO round trips, now required almost double the number of trips for the extra checkpointing operations...
So this sounds like trying to invent a form of optimistic locking, but it's definitely possible to do optimistic locking where no extra reads or writes are necessary unless there is a conflict. You have to write that value (whether timestamp or UUID) on every write, and then you do every write with a conditional "do this write as long as the lock value has not been changed."
But I guess the store they were using didn't allow even a conditional write like that? Although I'm not sure how they managed to make this "checkpoint" system work without that feature either... this seems a bit confusing, it seems like a more typical optimistic locking system should have been possible using whatever primitives the checkpointing system used? Maybe not?
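In SQL terms, the conditional write described above looks roughly like this (hypothetical schema); on a KV store it would be a compare-and-swap on the etag/version value:

    -- @ExpectedVersion was read earlier in the same flow
    UPDATE dbo.Orders
    SET Status  = @NewStatus,
        Version = Version + 1
    WHERE Id = @OrderId
      AND Version = @ExpectedVersion;

    IF @@ROWCOUNT = 0
        THROW 50001, 'Row changed since it was read - re-read and retry.', 1;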
The KV store had etag support for conditional writes. Etags are only useful to make sure the data didn't change underneath between your read and your write.
Storing the checkpoints along with the mutation was for idempotency. If the checkpoint was in the document, that meant the mutation had succeeded and a retry should be a no-op.
Does the author really call joins "joints"? I thought it was maybe a spelling thing but then they make a Snoop joke...
The author was implying that in between writing the business logic, they were smoking a lot of weed.
Yeah, I wondered what was going on there. Maybe just a joke or error, but definitely makes one wonder how much experience the author has with sql to compare/contrast the approaches to a more standard approach.
Not Dan Luu, is it? The site seems to leave that intentionally ambiguous.
Seems to be this[1] guy, which you can find out by skulking around an older snapshot of his website from the Wayback Machine, where he displayed his real name within the footer on every page, and went by a slightly different username up to seemingly as late as April this year. Maybe a rather poor attempt at dissociating his real identity from the website somehow? (on second thought, he still links to the webpage from his GitHub profile, so that wouldn't make much sense, I think. Hmm.)
Not Dan Luu. And to the other comment, not intentionally hiding my identity. Just that for most of my writing, who I am is irrelevant.
Is it just me? No conclusion, no takeaway, nothing.
This was a good read, especially for me considering I’m always having to remind people that SQL has been around since the 80s and has been optimized for a number of tasks over a very long time, tasks that should live in the database vs in code
Very nice to see the other side of this
"They wanted everything to be written into simple primitive Key-Value stores for their new design."
I feel this is a very political phenomenon that is very poignant in democracies: both Argentina and the US have elected executives that promise to take a chainsaw to most of the government bureaucracy for cost and streamlining reasons.
There's an episode of South Park where the kids buy a roller coaster park and run it their own way with their own rules, and slowly they start to face the challenges of running the whole operation. They start by hiring a security guard, and then the security guard needs food so they hire some fast food chain, and then the bathrooms, and so on; by the end of the episode the park is fully operational and as good as always.
I used to be very libertarian and revolutionary too when I was younger, then I tried to revolutionize a codebase, ended up breaking stuff, and when correcting I ended up with the traditional architecture.
I don't doubt that driving a plow over the traditions of the previous generation is necessary, but in my experience there is always more wisdom in what we throw away than what we innovate, so I guess I'm a conservative
> there is always more wisdom in what we throw away than what we innovate
i have found this to be true throughout my career. not that things cannot improve, they can always improve. but assuming the original work wasn't grossly incompetent, anything "new" tends to make the same [undocumented] mistakes
An interesting mathematical caveat is that regardless of whether the majority of the work thrown out is correct, throwing the baby out with the bathwater may still be optimal, as it might be the only way to effect change. It's not like decreasing the rate of change would decrease the number of false positives discarded.
On the other hand... if you look at software as a thing that is mostly about people's heads rather than code.
I mean that in a way I think it's more important to have a team that understands the problem domain fully than to have the code itself. Obviously the business might not think so but as a software process, the costly thing to do IMO is get a group of people to a point of common understanding where they can work productively.
When your developers didn't write the code they're unlikely to understand all the reasons why it is the way it is. How do you get them trained up to that level again? One way is a rewrite. The end result may be no better or even worse but now you have people who know the system, the business and they can look after it efficiently....until they all leave and you start again.
This is my devil's advocate answer to my own feeling that one should never rewrite.
Note that Argentina definitely needed some change in course given their situation.
I haven't checked their situation much, but their existing approach was clearly not sustainable given the repeated defaults and the generally bad situation of the population.