The Architecture of Open Source Applications (Volume 1) Berkeley DB

72 points by grep_it 6 days ago

Many years ago I was obsessed with Berkeley DB and its performance.

But when I discovered Tokyo Cabinet and Tokyo Tyrant I almost literally fell in love. We used it for things that would have been impossible without it at the time.

Still worth checking it out: https://github.com/hthetiot/Tokyo-Cabinet

fix4fun - 6 hours ago

I had a similar experience: I used Berkeley DB until I found SQLite ;) Of course it is not directly key/value, but its small size, simplicity and IO performance were amazing for me.

bborud - 7 hours ago

Berkeley DB is one of those things everyone respected, for some reason, but that didn't actually work if you threw a bit of data at it. And not just for us. I remember talking to companies that paid them lots of money to work on reliability, and it never got better.

But I do remember reading much of the source (trying to figure out why it didn't work) and thinking "this is pretty nice code".

atombender - 6 hours ago

Well, it worked for Amazon: Berkeley DB was used extensively there as the main database, right from the beginning. I remember talking to an ex-Amazon engineer in 2006 who said BDB was still the main database used for inventory, and complained that everything was a mess, with different teams using different tech for everything. Around that time Amazon made DynamoDB to solve some of that mess, and it sat on top of BDB.
An old thread about this: https://news.ycombinator.com/item?id=29290095.
- spudlyo - an hour ago
  
  Can verify. When I started in the catalog department in '97, "the catalog" was essentially a giant Berkeley DB keyed on ISBN/ASIN that was built/updated and pushed out (via a mountain of Perl tools) to every web server in the fleet on a regular cadence. There were a bunch of other DBs too, like for indexes, product reviews, and other site features. Once the files landed, the deploy tooling would "flip the symlinks" to make them live.
  Berkeley DBs were the go-to online databases for a long time at Amazon, at least until I left at the turn of the century. We had Oracle databases too, but they weren't used in production, they were just another source of truth for the BDBs.
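
For anyone curious what "flip the symlinks" can look like, here is a minimal sketch, not the actual Amazon tooling (which was Perl and which I'm not reproducing). It assumes a POSIX filesystem; the function name `flip_symlink` and the file names are hypothetical. The idea is to build a new symlink under a temporary name and rename it over the live one, so readers see either the old file or the new file and never a missing link.

```python
import os
import tempfile


def flip_symlink(target: str, link_path: str) -> None:
    """Point link_path at target by renaming a fresh symlink over it.

    Hypothetical helper for illustration. On POSIX, rename() replaces
    the destination atomically, so the link never disappears.
    """
    link_dir = os.path.dirname(os.path.abspath(link_path))
    tmp_link = os.path.join(
        link_dir, f".{os.path.basename(link_path)}.tmp.{os.getpid()}"
    )
    # Clean up a stale temporary link left by an earlier failed run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    # os.replace renames the symlink itself; it does not follow it.
    os.replace(tmp_link, link_path)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    old_db = os.path.join(workdir, "catalog-v1.db")  # assumed file names
    new_db = os.path.join(workdir, "catalog-v2.db")
    for path, content in ((old_db, "v1"), (new_db, "v2")):
        with open(path, "w") as f:
            f.write(content)

    live = os.path.join(workdir, "catalog.db")
    flip_symlink(old_db, live)   # initial deploy
    flip_symlink(new_db, live)   # new build landed, make it live
    with open(live) as f:
        print(f.read())
```

Processes that already have the old file open keep reading it until they reopen the path, so how quickly servers pick up the new data depends on how they open the file.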
- bborud - 5 hours ago
  
  It worked well for Amazon because they kept it within a tight operating envelope. They used it to persist bytes on disk in multiple, smaller BDBs per node. This kept it out of trouble. They also sidestepped the concurrency and locking problems by taking care of that in the layers above. It was used more like SSTables in BigTable.
  They phased out BDB before DynamoDB was launched. Some time between 2007 and 2010. By the time DynamoDB launched as a product in 2012(?), BDB was gone.
mistrial9 - 8 minutes ago

yeah - scars still visible here from a year 2000 project using BerkeleyDB. Unbelievable complexity to write adapters to ordinary desktop software.

procaryote - 8 hours ago

I wanted to love Berkeley DB; it was available everywhere, seemed simple, and was fast when tested. In practice it never worked well though, with pretty frequent corruption under load, and license confusion from Oracle. It has a lot of features you're never going to use, and if you try, you'll be disappointed.

There's no shortage of embeddable key-value stores with C bindings, like LevelDB, RocksDB, or even gdbm, and all of them have worked better for me.

tkiolp4 - 4 hours ago

I love the AOSA book. I learned a lot about systems design from it. Ironically, I usually fail the Systems Design interviews at fancy companies because they only ask about LBs, sharding, obscure data structures like CRDTs, and whatnot.

tebeka - 5 hours ago

Loved this chapter, great design, well written.