The Architecture of Open Source Applications (Volume 1) Berkeley DB
aosabook.org
72 points by grep_it 6 days ago
Many years ago I was obsessed with Berkeley DB and its performance.
But when I discovered Tokyo Cabinet and Tokyo Tyrant I almost literally fell in love. We used it for things that would have been impossible without it at the time.
Still worth checking it out: https://github.com/hthetiot/Tokyo-Cabinet
I had a similar experience: I used Berkeley DB until I found SQLite ;) Of course it isn't directly a key/value store, but its small size, simplicity, and I/O performance were amazing for me.
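For anyone wondering how SQLite covers the key/value case despite not being "directly key/value": one common approach is a single two-column table. The sketch below is an illustration, not something from the comment. It uses Python's standard `sqlite3` module, and the table name `kv` and the helpers `open_kv`, `kv_put`, `kv_get` and `kv_delete` are hypothetical names.

```python
import sqlite3


def open_kv(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite database holding one key/value table."""
    conn = sqlite3.connect(path)
    # WITHOUT ROWID stores rows clustered by the primary key, so the
    # table is essentially a B-tree of key -> value.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kv ("
        " key BLOB PRIMARY KEY,"
        " value BLOB NOT NULL"
        ") WITHOUT ROWID"
    )
    return conn


def kv_put(conn: sqlite3.Connection, key: bytes, value: bytes) -> None:
    # "INSERT OR REPLACE" gives overwrite-on-put semantics.
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))


def kv_get(conn: sqlite3.Connection, key: bytes):
    """Return the stored bytes, or None if the key is absent."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row is not None else None


def kv_delete(conn: sqlite3.Connection, key: bytes) -> None:
    with conn:
        conn.execute("DELETE FROM kv WHERE key = ?", (key,))
```

Passing a file path instead of `":memory:"` persists the data. Each `kv_put` runs in its own transaction, so a bulk load would go faster if batched inside a single `with conn:` block.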
Berkeley DB is one of those things everyone respected, for some reason, but that didn't actually work if you threw a bit of data at it. And not just for us. I remember talking to companies that paid them lots of money to work on reliability, and it never got better.
But I do remember reading much of the source (trying to figure out why it didn't work) and thinking "this is pretty nice code".
Well, it worked for Amazon: Berkeley DB was used extensively there as the main database, right from the beginning. I remember talking to an ex-Amazon engineer in 2006 who said BDB was still the main database used for inventory, and complained that everything was a mess, with different teams using different tech for everything. Around that time Amazon made DynamoDB to solve some of that mess, and it sat on top of BDB.
An old thread about this: https://news.ycombinator.com/item?id=29290095.
Can verify. When I started in the catalog department in '97, "the catalog" was essentially a giant Berkeley DB keyed on ISBN/ASIN that was built/updated and pushed out (via a mountain of Perl tools) to every web server in the fleet on a regular cadence. There were a bunch of other DBs too, like for indexes, product reviews, and other site features. Once the files landed, the deploy tooling would "flip the symlinks" to make them live.
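The "build the file, push it out, then flip the symlinks" deploy described above can be sketched roughly as follows. This is an assumption-laden illustration, not Amazon's tooling (which the comment says was Perl). Python's `dbm.dumb` stands in for Berkeley DB, since Python 3's standard library no longer ships BDB bindings. The directory layout (`catalog-<version>/` plus a `current` symlink) and the function names are hypothetical. The idea it shows is that each snapshot is written in full first, and then a single `rename(2)` swaps the live symlink.

```python
import dbm.dumb
import os
import tempfile


def build_snapshot(root: str, version: str, records: dict) -> str:
    """Write a complete database for `version` into its own directory.

    The live snapshot is never modified, so readers keep using it until the flip.
    """
    snap_dir = os.path.join(root, f"catalog-{version}")
    os.makedirs(snap_dir, exist_ok=True)
    # Flag "n" always starts from an empty database.
    with dbm.dumb.open(os.path.join(snap_dir, "catalog"), "n") as db:
        for key, value in records.items():
            db[key] = value
    return snap_dir


def flip_symlink(root: str, snap_dir: str, link_name: str = "current") -> None:
    """Point root/link_name at snap_dir in one atomic step (POSIX rename semantics)."""
    live = os.path.join(root, link_name)
    tmp = os.path.join(root, f".{link_name}.tmp")
    if os.path.lexists(tmp):
        os.remove(tmp)  # leftover link from an interrupted flip
    # Create the new link under a temporary name, with a relative target.
    os.symlink(os.path.basename(snap_dir), tmp)
    # rename(2) replaces the old link atomically: anyone resolving the link
    # sees either the old snapshot or the new one, never a missing path.
    os.replace(tmp, live)


def lookup(root: str, key: str, link_name: str = "current"):
    """Read one key from whichever snapshot is live right now."""
    # Resolve the link once so this read is pinned to a single snapshot,
    # even if a flip happens while the database is open.
    snap_dir = os.path.realpath(os.path.join(root, link_name))
    with dbm.dumb.open(os.path.join(snap_dir, "catalog"), "r") as db:
        return db.get(key)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    flip_symlink(root, build_snapshot(root, "v1", {"0000000001": "old title"}))
    flip_symlink(root, build_snapshot(root, "v2", {"0000000001": "new title"}))
    print(lookup(root, "0000000001"))  # b'new title'
```

Old snapshot directories are left in place here, which also makes rollback a matter of flipping the link back. Cleaning them up is out of scope for the sketch.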
Berkeley DBs were the go-to online databases at Amazon for a long time, at least until I left at the turn of the century. We had Oracle databases too, but they weren't used in production; they were just another source of truth for the BDBs.
It worked well for Amazon because they kept it within a tight operating envelope. They used it to persist bytes on disk in multiple, smaller BDBs per node. This kept it out of trouble. They also sidestepped the concurrency and locking problems by taking care of that in the layers above. It was used more like SSTables in BigTable.
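A rough sketch of the "several smaller stores per node, with concurrency handled in the layer above" pattern follows. The class name `ShardedStore`, the shard count, and CRC32 routing are all hypothetical choices made for illustration. `dbm.dumb` again stands in for BDB. The per-shard `threading.Lock` plays the role of "the layers above" only within one process. Coordinating several processes would need something else.

```python
import dbm.dumb
import os
import threading
import zlib


class ShardedStore:
    """Hypothetical sketch: many small key/value files, locking done here.

    Each shard file is only touched while holding that shard's lock, so the
    underlying storage engine never sees concurrent access.
    """

    def __init__(self, root: str, num_shards: int = 8):
        os.makedirs(root, exist_ok=True)
        self._locks = [threading.Lock() for _ in range(num_shards)]
        self._dbs = [
            dbm.dumb.open(os.path.join(root, f"shard-{i:03d}"), "c")
            for i in range(num_shards)
        ]

    def _shard(self, key: bytes) -> int:
        # crc32 is stable across processes, unlike the randomized built-in hash().
        return zlib.crc32(key) % len(self._dbs)

    def put(self, key: bytes, value: bytes) -> None:
        i = self._shard(key)
        with self._locks[i]:
            self._dbs[i][key] = value

    def get(self, key: bytes):
        """Return the stored bytes, or None if the key is absent."""
        i = self._shard(key)
        with self._locks[i]:
            return self._dbs[i].get(key)

    def close(self) -> None:
        for lock, db in zip(self._locks, self._dbs):
            with lock:
                db.close()
```

Each shard stays small and is accessed by one writer at a time, so the engine never has to handle contention itself. That is roughly the "tight operating envelope" the comment describes.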
They phased out BDB sometime between 2007 and 2010, before DynamoDB existed. By the time DynamoDB launched as a product in 2012(?), BDB was gone.
Yeah, scars still visible here from a 2000 project using Berkeley DB. Writing adapters to ordinary desktop software was unbelievably complex.
I wanted to love Berkeley DB: it was available everywhere, seemed simple, and was fast when tested. In practice, though, it never worked well, with fairly frequent corruption under load and license confusion from Oracle. It has a lot of features you're never going to use, and if you try, you'll be disappointed.
There's no shortage of embeddable key-value stores with C bindings like leveldb, rocksdb, or even gdbm, and all of them have worked better for me.
I love the AOSA book. I learned a lot about systems design from it. Ironically, I usually fail the systems design interviews at fancy companies because they only ask about LBs, sharding, obscure data structures like CRDTs, and whatnot.
Loved this chapter, great design, well written.