RRB-Trees: Efficient Immutable Vectors (2012) [pdf]

infoscience.epfl.ch

44 points by azhenley 3 days ago

I used this data structure in my immutable database (see profile) but eventually switched to a B-tree because I believe RRB trees are inherently flawed. If you do a large number of slices and concats, it is possible for the tree to contain so many gaps it that the tree grows so deep it can't be modified anymore. At first I thought it was a bug in my own implementation but I eventually found the same bug in rrb-vector, the clojure implementation (see CRRBV-14). In fact, the maintainer of that library reached the same conclusion I did and switched to B-trees: https://github.com/jafingerhut/core.btree-vector

Still, I have huge respect for Phil Bagwell and I make heavy use of his hash array mapped trie. But this issue with RRB trees makes it impossible for me to use them, especially for an on-disk data structure whose long lifespan makes it way more likely that the problem will eventually happen.

bjoli - 3 hours ago

Hmmm. This comment made me think. If you would do a radix tree that degrades to a b+tree, you would get a couple of things. The b+tree concat and subsequently insertions and removal, is faster than the rrb tree one, but yields more relaxed nodes.
If you have a relaxed root, you know the size of every subtree. If the the subtree is dense you can easily calculate the shift, meaning you get dense radix tree subnodes with fast lookup.
This gets you the benefits of a relaxed b tree, but no issues with accidentally having too deep shifts since they are relative to where the dense subtree starts.
This could be done to a rrb tree as well, which would mitigate the issues you are seeing, but would still lead to an arbitrarily deep tree if the concatenation isn't done properly.
The only tradeoff would be slower indexing on relaxed trees because more nodes would end up unbalanced, but at the same time concatenations, insertions and removals would probably be faster.
I will try to implement this.
Edit: no, seriously. This is a splendid idea. The b tree concat is faster than the rrb tree "full" merge. And simpler. And storing the tree subtree height in a byte somewhere means we can go from relaxed to dense and then do a proper radix tree search. (I have one of those rrb tree implementations that probably fails, btw. It has a proper concat, but the improper one is so much faster. I know. I am ashamed.)
- xeubie - an hour ago
  
  Yeah you should try it. I place a very big value on simplicity and code size but the combo you describe might have benefits.
HexDecOctBin - 16 hours ago

Did you use some resource on how to layer immutability on top of B-trees? This is finicky enough that I don't feel comfortable YOLOing.
- bjoli - 13 hours ago
  
  This is a persistent ordered map on b+trees.
  https://rikspucko.koketteriet.se/bjoli/PersistentMap/src/bra...
bjoli - 14 hours ago

that is because many implementations bypass strict rebalancing to squeeze out extra concat performance. And because the strict rebalancing algorithm is awful to implement.

ashton314 - 21 hours ago

This is Rhombus’ native data type! Such a nice data structure.

azhenley - 21 hours ago

There was some interesting discussion which data structure to use for Rhombus: https://github.com/racket/rhombus/discussions/221

erichocean - 2 days ago

If you like this kind of thing, Bifurcan [0] is a Java library with implementations of RBB-trees and related (fast) immutable data structures.

[0] https://github.com/lacuna/bifurcan

wasting_time - 3 days ago

A refreshing break from Molt News. Now I want to check how vectors are implemented in my favorite languages.

inhumantsar - 2 days ago

the `im` rust crate provides immutable data structures, one of them being an RRB-based Vec. don't remember what the stdlib Vec uses.
- oniony - 2 days ago
  
  I believe Vec is a straight array underneath, which is reallocated at a larger size when full. And Vector in the `im` crate you mentioned looks very interesting indeed.