Apache Arrow is 10 years old

arrow.apache.org

163 points by tosh 9 hours ago


data_ders - 7 hours ago

If I could go back to 2015, when I had just found the feather library and was using it to power my unhinged topic-modeling-for-PowerPoint-slides work, and explain what feather would become (Arrow) and the impact it would have on the data ecosystem, I would have looked at 2026 me like he was a crazy person.

Yet today I feel it was 2016 dataders who is the crazy one lol

aynyc - 4 hours ago

What's the difference between feather and parquet in terms of usage? I get the design philosophy, but how would you use them differently?

HoldOnAMinute - an hour ago

I read that entire page and I could not tell you what Apache Arrow is, or what it does.

pm90 - 5 hours ago

It's nice to see useful, impactful interchange formats getting the attention and resources they need, and ecosystems converging around them. Optimizing serialization/deserialization might seem like a "trivial" task at first, but when you're moving petabytes of data it quickly becomes the bottleneck. With common interchange formats, the benefits of these optimizations are shared across stacks. Love to see it.

aerzen - 4 hours ago

I like Arrow for its type system. It's efficient, complete, and does not have "infinite precision decimals". Compared with Postgres's decimal encoding, using i256 as the backing type is a much saner approach.
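
To make the fixed-width point concrete, here is a minimal sketch of storing decimals as scaled integers in 32-byte (256-bit) slots. It uses only the Python standard library and is not Arrow's actual implementation; `encode_decimal`, `decode_decimal`, and `WIDTH_BYTES` are hypothetical names, and the little-endian two's-complement layout is an assumption made for illustration.

```python
from decimal import Decimal, localcontext

WIDTH_BYTES = 32  # 256 bits per value (illustrative constant)


def encode_decimal(value: Decimal, scale: int) -> bytes:
    """Store value as an integer scaled by 10**scale in a fixed 32-byte slot."""
    with localcontext() as ctx:
        ctx.prec = 76  # enough digits for a 256-bit scaled integer
        # Digits beyond `scale` are truncated by int() in this sketch.
        unscaled = int(value.scaleb(scale))
    return unscaled.to_bytes(WIDTH_BYTES, "little", signed=True)


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Inverse of encode_decimal: read the integer and reapply the scale."""
    unscaled = int.from_bytes(raw, "little", signed=True)
    with localcontext() as ctx:
        ctx.prec = 76
        return Decimal(unscaled).scaleb(-scale)


# Every value has the same width, so a column is just the slots back to back
# and the i-th value lives at byte offset i * WIDTH_BYTES.
values = [Decimal("12.34"), Decimal("-0.05"), Decimal("1000000.00")]
column = b"".join(encode_decimal(v, scale=2) for v in values)
second = decode_decimal(column[WIDTH_BYTES:2 * WIDTH_BYTES], scale=2)
```

Because each slot has a known size, reading the n-th value is plain offset arithmetic, which a variable-length decimal encoding cannot offer without an extra index.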

mempko - 5 hours ago

We use Apache Arrow at my company and it's fantastic. The performance is so good. We have terabytes of time-series financial data and use arrow to store it and process it.

actionfromafar - 7 hours ago

I had to look up what Arrow actually does, and I might have to run some performance comparisons vs sqlite.

It's very neat for some types of data to have columns contiguous in memory.
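
A rough illustration of what "columns contiguous in memory" means, using only the standard-library `array` module rather than Arrow itself. The `ts` and `price` column names and the sample values are made up for the sketch.

```python
from array import array

# Row-oriented: each record is its own object, so the values of one
# column are scattered across many separate dicts.
rows = [
    {"ts": 1, "price": 10.0},
    {"ts": 2, "price": 10.5},
    {"ts": 3, "price": 11.0},
]

# Column-oriented: one contiguous, typed buffer per column.
columns = {
    "ts": array("q", (r["ts"] for r in rows)),        # signed 64-bit ints
    "price": array("d", (r["price"] for r in rows)),  # 64-bit floats
}


def mean_price_rows(rows):
    """Walks every record and pulls one field out of each."""
    return sum(r["price"] for r in rows) / len(rows)


def mean_price_columns(columns):
    """Scans a single contiguous buffer of doubles."""
    prices = columns["price"]
    return sum(prices) / len(prices)


# The whole column is one block of bytes: 3 values * 8 bytes each.
price_nbytes = memoryview(columns["price"]).nbytes
```

Both functions return the same answer; the difference is that the columnar version touches only the bytes of the column it needs, which is the property that makes scans and aggregations over a single column cheap.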
