The Burrows-Wheeler Transform

sandbox.bio

156 points by g0xA52A2A 3 days ago


mfworks - 3 days ago

The most magical part of this transform is the search! First learned about this in a bioalgorithms course, and the really cool property is that for a string length l, you can search for the string in O(l) time. It has the same search time complexity of a suffix tree with O(n) space complexity (with a very low constant multiple). To this day it may be the coolest algorithm I've encountered.

pixelpoet - 3 days ago

I remember randomly reading about this in the Hugi demoscene diskmag as a young teen, and it completely blew my mind (notably the reverse transform, apparently not covered in OP's article): https://hugi.scene.org/online/coding/hugi%2013%20-%20cobwt.h...

The author later wrote many now-famous articles, including A Trip Through the Graphics Pipeline.

hinkley - 3 days ago

Noteworthy that the paper that was published describing this technique came at a time when IP lawyers had begun to destroy the world wrt to small business innovation. And that they released it unencumbered is a huge debt of gratitude that we haven’t really honored.

Imnimo - 3 days ago

An interesting bit of trivia about the Burrows-Wheeler transform is that it was rejected when submitted to a conference for publication, and so citations just point to a DEC technical report, rather than a journal or conference article.

moribvndvs - 3 days ago

For some reason I was really getting lost on step 2, “sort” it. They mean lexicographically or “sort it like each rotation is a word in the dictionary, where $ has the lowest character value”. Now that I see it, I’m a little embarrassed I got stuck on that step, but hopefully it helps someone else.

lancefisher - 3 days ago

This is a good article and a fun algorithm. Google made a great series of compression videos a while ago - albeit sometimes silly. For the Burrows-Wheeler one they interviewed Mike Burrows for some of the history. https://youtu.be/4WRANhDiSHM

username223 - 3 days ago

Does anyone else remember when Dr. Dobb's Journal wrote about stuff like this? https://web.archive.org/web/20040217202950/http://www.ddj.co...

foobarian - 3 days ago

The unintuitive part of BWT to me was always the reverse operation, which was the only thing the post didn't give intuition for :-)

raboukhalil - 3 days ago

Author here, nice to see the article posted here! I'm currently looking for ideas for other interactive articles, so let me know if think of other interesting/weird algorithms that are hard to wrap your head around.

kvark - 3 days ago

I've spent years playing with BWT and suffix sorting at school (some work can be found be names of archon and dark). It's a beautiful domain to learn!

Now I'm wondering: in the era of software 2.0, everything is figured out by AI. What are the chances AI would discover this algorithm at all?

kragen - 2 days ago

This page looks very nice indeed!

This doesn't follow the original paper very closely, although I think the device of putting an end-of-string marker into the data instead of including the start index as a separate datum is an improvement over the original algorithm. My own "explorable explanation" of the algorithm, at http://canonical.org/~kragen/sw/bwt, is visually uglier, and is very similar to this page in many ways, but it may still be of some interest: it follows the original paper more closely, which is likely to be useful if you're trying to understand the paper. Also, I used an inverse-transform algorithm that was efficient enough that you could imagine actually using in practice, while still being easy to explain (from the original paper)—but, like the page linked, I didn't use an efficient suffix-array-construction algorithm to do the forward transform. I did link to the three that have been discovered.

I forget when I did this but apparently sometime around 02010: https://web.archive.org/web/20100616075622/http://canonical....

I think my JS code in that page is a lot more readable than the minified app.js served up with this page.

zahlman - 3 days ago

BWT is how I first got to interact with Tim Peters on the Internet: https://stackoverflow.com/questions/21297887

jansan - 3 days ago

For an article describing a compression algorithm this was very digestible and entertaining.

fair_enough - 3 days ago

Thanks, man! This helps a lot, because as you say the algorithm is not intuitive. The description of the type of data it is very good for is the best value-adding part of this writeup. Greatly appreciated!

DmitryOlshansky - 2 days ago

And I’ve just implemented BWT and Inverse BWT in D, earlier today! https://github.com/DmitryOlshansky/compd/blob/master/source/...

feanaro - 3 days ago

This is essentially a group theoretical result about permutation groups. Would be nice to see a treatment from this angle.

loloquwowndueo - 3 days ago

I was once mentioning this transform to someone and they were like “what? The Bruce Willis transform?” - I guess my pronunciation sucked that much.

embit - 3 days ago

This is great. The best article I have read on BWT and experimented was in Dr Dobbs Journal around 1996-97.