Spiral

spiraldb.com

257 points by jorangreef 2 days ago


mellosouls - 2 days ago

This is a pretty website but doesn't give us anything to actually look at; it's just blurb.

For anybody confused, the "Vortex" stuff is the underlying data format used but isn't the database/whatever this website (by the creators of Vortex) is pushing.

pauldix - 2 days ago

I've been following this team's work for a while and what they're doing is super interesting. The file format they created and put into the LF, Vortex, is a very welcome innovation in the space: https://github.com/vortex-data/vortex

I'm excited to start doing some experimentation with Vortex to see how it can improve our products.

Great stuff, congrats to Will and team!

spankalee - 2 days ago

I'm curious... I'm not a database or AI engineer. The last time I did GPU work was over a decade ago. What is the point of the "saturate an H100" metric?

I would think that a GPU isn't just sitting there waiting on a process that's in turn waiting for one query to finish to start the next query, but that a bunch of parallel queries and scans would be running, fed from many DB and object store servers, keeping the GPUs as utilized as possible. Given how expensive GPUs are, it would seem like a good trade to buy more servers to keep them fed, even if you do want to make the servers and DB/object store reads faster.
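
A back-of-the-envelope version of that trade-off, as a minimal sketch. The function name and every throughput figure below are made-up placeholders, not measurements of an H100 or of any storage system.

```python
import math


def readers_needed(gpu_ingest_gb_per_s: float, per_reader_gb_per_s: float) -> int:
    """Smallest number of parallel readers whose combined throughput
    meets or exceeds what the GPU can consume."""
    if gpu_ingest_gb_per_s <= 0 or per_reader_gb_per_s <= 0:
        raise ValueError("throughputs must be positive")
    return math.ceil(gpu_ingest_gb_per_s / per_reader_gb_per_s)


# Hypothetical numbers, purely for illustration.
slow_readers = readers_needed(gpu_ingest_gb_per_s=40.0, per_reader_gb_per_s=2.5)
fast_readers = readers_needed(gpu_ingest_gb_per_s=40.0, per_reader_gb_per_s=12.0)
print(slow_readers, fast_readers)
```

Under this simple model, faster per-reader decoding only shrinks the number of servers needed to feed one GPU; the GPU can be kept busy either way if you are willing to pay for enough readers.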

paxys - 2 days ago

Wasn't "3.0" supposed to be crypto? Is it AI now? It's hard to keep track.

vouwfietsman - 2 days ago

Although I welcome a parquet successor, I am not particularly interested in a more complicated format. Random access time improvements are nice, but really what I would like is just to store multiple tables in a single parquet file.

When I read "possible extension through embedded wasm encoders" I can already imagine the c++ linker hell required to get this thing included in my project.

I also don't think a lot of people need "ai scale".

cryptonector - 2 days ago

I can't tell what this is about.

donperignon - 2 days ago

“We work in person at our offices in London and New York. Face to face is better: if uncertain, the answer is “yes, get on the plane”. On Wednesdays, we wear pink.”

No comments.

djfobbz - 2 days ago

So this Vortex engine is a combination of OLTP and OLAP on steroids?

dwroberts - a day ago

> Remember that uncanny valley between 1KB and 25MB? The problem isn't the sizes—it's that Second Age systems force you to choose between two bad options: inline the data (killing performance) or store pointers (breaking governance). Spiral eliminates this false choice. We store 10KB embeddings directly in Vortex for microsecond access, intelligently batch 10MB blocks of images for optimal S3 throughput, and externalize 4GB videos without copying a single byte. One system, no compromises.

No compromises, but isn’t ‘externalising’ a large video the equivalent of storing a pointer in the first example? I can’t really see any other way to understand what that means (it goes to an external system and you store where it is).
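
To make the question concrete, here is a minimal sketch of a size-based placement policy like the one the quote describes. The thresholds, the `Placement` type, and the `s3://` URIs are all assumptions for illustration; nothing here is taken from Spiral's or Vortex's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

INLINE_MAX = 64 * 1024         # assumed cutoff for storing bytes in the row
BATCH_MAX = 25 * 1024 * 1024   # assumed cutoff, mirroring the "25MB" in the quote


@dataclass(frozen=True)
class Placement:
    strategy: str            # "inline", "batched", or "external"
    copies_bytes: bool       # does the table hold the bytes itself?
    pointer: Optional[str]   # set only when the bytes live elsewhere


def place(size_bytes: int, source_uri: str) -> Placement:
    """Pick a storage strategy from the object size alone."""
    if size_bytes <= INLINE_MAX:
        return Placement("inline", True, None)
    if size_bytes <= BATCH_MAX:
        # A real system would pack many such objects into one large block.
        return Placement("batched", True, None)
    # Nothing is copied: the table records only where the object lives.
    return Placement("external", False, source_uri)


embedding = place(10 * 1024, "s3://bucket/embedding.bin")
image = place(100 * 1024, "s3://bucket/image.jpg")
video = place(4 * 1024**3, "s3://bucket/video.mp4")
print(embedding.strategy, image.strategy, video.strategy)
```

In this sketch the "external" branch is exactly a stored pointer, which is the point of the question: whatever governance story applies to it has to come from somewhere other than the placement rule itself.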

reactordev - 2 days ago

Anyone that can improve upon the parquet hell that is my life is gladly welcomed...

rubenvanwyk - 2 days ago

Interesting that Joran from Tigerbeetle posted this? So must be legit.

derekhecksher - 2 days ago

The AnyBlox paper from the folks at TUM, and linked to in the post, is a bit more interesting, imo, since it looks to solve the data systems x storage format problem in composable data architectures - https://gienieczko.com/anyblox-paper

all2 - 2 days ago

Spelling error "sttill"

> P.S. If you're sttill managing data in spreadsheets, this post isn't for you. Yet.

---

Since I discovered the ECS pattern, I've been curious about backing it with a database. One of the big issues seems to be IO on the database side. I wonder if Spiral might solve this issue.

4ndrewl - 2 days ago

The three eras of database systems start with client-server Postgres, but miss the daddy of the generation before that: xBase (i.e. dBase, FoxPro, etc.).

redwood - 2 days ago

So it's for low-change-rate data that needs to be bulk processed during ML model training. Cool. But that's hardly the same thing as what you need for powering live AI applications... which is what I assumed this was upon reading the intro and the mention of Postgres.

Postgres (and MongoDB) are the king and prince of data due to their transactional capabilities.

rubenvanwyk - 2 days ago

How does Vortex compare to Lance? I imagine Lance is already a good solution for AI on CPUs.

dwb - 2 days ago

If you don't clearly detail what your new tech product or system is bad at, as well as what it's good at, I'm not interested. So much of engineering is about navigating the inevitable tradeoffs. Marketing should have no place in engineering.

sys13 - 2 days ago

I wonder how much we need this vs. implementing it as part of Delta Lake or Iceberg.

mlhpdx - 2 days ago

I stopped reading at “new era”. At this point in time with the deluge of content, start with a problem and solution in a concise statement if you want my attention. I’m not reading your opinion piece.

zzzeek - 2 days ago

This links to a super long-winded blog post that sounds more like a toast at a wedding, so I went to the main page to try to see what their product is, and you just get a blitz of fancy animations of table diagrams and things and lots of very cheap-sounding slogans pushed out like "Works with any data! Fully XYZ 2.0 compliant! Ties your shoes!"

Basically I'm not sure where the product is hiding under all of this bluster, but this doesn't feel very "hacker"-y.

holoduke - 2 days ago

So basically this is a file system that runs on your gpu?

whalesalad - 2 days ago

The hot new aesthetic these days is either "receipt printer" or "liquid glass". I dig it, tbh.

bflesch - 2 days ago

Big ick from my side. Manifesto-style marketing blog post talking about revolutionary things, but it seems their main metric is in the image above the post: "hey, we've raised $22M in funding".

Landing pages of both Spiral and Vortex are GPU-hugging animations and devoid of any technical information. Empty nothing-statements like "machine scale". They claim 100x improvements but don't link any metrics.

Maybe this is a "don't hate the player, hate the game" situation but somehow the collective of likeminded AI engineers decided to upvote this post to #1 on HN.

skywhopper - 2 days ago

Man, they are really proud of that initial seed round funding, aren’t they? Forgive me, but $22 million does not sound like enough to truly revolutionize data processing technology.

The gist seems to be that they can overcome network latency issues when dealing with huge numbers of smallish objects in S3-like storage systems that need to be fed into GPUs? Yeah, those formats and systems were not designed to feed that type of processor. You’re doing it wrong if this is your problem.

After a lot of nonsense, it sounds like they just reformat the data into something more efficient instead. But they forget about the network latency and blame CPUs for slowing things down? And what was that sidetrack about S3 permissions?

I wouldn’t jump right onto this… well, it’s not clear what this even is exactly. But you can probably wait it out.

SomeHacker44 - 2 days ago

"100KiB images"... This is odd. Most of my images are 2.5-4 MB. My raw images are 3-10x larger.

raziel2p - 2 days ago

> Vortex is designed to support decoding data directly from S3 to GPU, skipping the CPU bottleneck entirely.

How is this significant? Surely either the network or the GPU computation is the bottleneck here?
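
One way to frame the question, as a minimal sketch: in a streaming pipeline the end-to-end rate is capped by the slowest stage, so skipping the CPU only matters if CPU decode is that stage. The stage names and GB/s figures are invented for illustration and say nothing about real hardware.

```python
from typing import Dict, Tuple


def bottleneck(stages: Dict[str, float]) -> Tuple[str, float]:
    """Return (name, throughput) of the slowest stage in a serial pipeline."""
    name = min(stages, key=lambda stage: stages[stage])
    return name, stages[name]


# Hypothetical throughputs in GB/s.
with_cpu = {"network": 12.0, "cpu_decode": 5.0, "gpu_compute": 20.0}
without_cpu = {"network": 12.0, "gpu_compute": 20.0}

print(bottleneck(with_cpu))
print(bottleneck(without_cpu))
```

With these made-up numbers, removing the CPU stage moves the bottleneck to the network; if CPU decode had been faster than the network to begin with, removing it would change nothing.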