Apache DataFusion

datafusion.apache.org

13 points by thebuilderjr 2 days ago


krapht - 2 days ago

I feel like I'm not the target audience for this. When I have large data, I write SQL queries directly and run them against the database. It's impossible to improve performance when you have to go out to the DB anyway; you might as well have it run the query too. Certainly the server ops and DB admins have far more money to spend on making the DB fast than anyone spends on my antivirus-laden corporate laptop.

When I have small data that fits on my laptop, Pandas is good enough.

Maybe 10% of the time I have something that's annoyingly slow to run with Pandas; then I might choose a different library, but I rarely need to. Even then, of that 10%, you can solve 9% by dropping down to numpy and picking a better algorithm...
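A minimal sketch of what "dropping down to numpy and picking a better algorithm" can look like. The task (for each value, count how many values lie within a fixed radius) and the function names are hypothetical, chosen only for illustration. The naive version compares every pair, which is O(n^2); sorting once and binary-searching the window bounds with `np.searchsorted` brings it down to O(n log n).

```python
import numpy as np

def count_within_naive(values, radius):
    # O(n^2): compare each value against every value (including itself)
    return np.array([np.sum(np.abs(values - v) <= radius) for v in values])

def count_within_sorted(values, radius):
    # O(n log n): sort once, then binary-search both ends of each window
    s = np.sort(values)
    lo = np.searchsorted(s, values - radius, side="left")   # first index with s >= v - r
    hi = np.searchsorted(s, values + radius, side="right")  # first index with s > v + r
    return hi - lo  # number of elements in [v - r, v + r]

# Integer data keeps the boundary comparisons exact in both versions
rng = np.random.default_rng(0)
data = rng.integers(0, 1_000, size=2_000)
assert np.array_equal(count_within_naive(data, 5), count_within_sorted(data, 5))
```

The gain here comes from the algorithm (sort plus binary search), not from switching libraries; the same idea would help in any language.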

bionhoward - 2 days ago

How does this compare with Polars? It seems pretty similar; has anybody tried both?