Hierarchical Modeling (H-Nets)

cartesia.ai

93 points by lukebechtel 10 months ago


lukebechtel - 10 months ago

> H-Net demonstrates three important results on language modeling:

> 1. H-Nets scale better with data than state-of-the-art Transformers with BPE tokenization, while learning directly from raw bytes. This improved scaling is even more pronounced on domains without natural tokenization boundaries, like Chinese, code, and DNA.

> 2. H-Nets can be stacked together to learn from deeper hierarchies, which further improves performance.

> 3. H-Nets are significantly more robust to small perturbations in input data like casing, showing an avenue for creating models that are more robust and aligned with human reasoning.
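
To make the casing point concrete, here is a minimal sketch of what a casing perturbation looks like at the raw-byte level. It illustrates the input representation only, not the H-Net model; the helper name `byte_diff` is hypothetical. In UTF-8, changing the case of one ASCII letter changes exactly one byte, and that byte differs by a single bit (0x20). A BPE vocabulary, by contrast, can map the two spellings to unrelated token IDs.

```python
def byte_diff(a: str, b: str) -> list[tuple[int, int, int]]:
    """Return (position, byte_a, byte_b) for each differing UTF-8 byte.

    Hypothetical helper for illustration; only handles equal-length inputs.
    """
    ba, bb = a.encode("utf-8"), b.encode("utf-8")
    if len(ba) != len(bb):
        raise ValueError("sketch only compares equal-length byte strings")
    return [(i, x, y) for i, (x, y) in enumerate(zip(ba, bb)) if x != y]


# Changing the case of the first letter touches one byte out of nineteen.
diffs = byte_diff("Hierarchical models", "hierarchical models")
print(diffs)                     # [(0, 72, 104)]
print(diffs[0][1] ^ diffs[0][2]) # 32, i.e. the single bit 0x20
```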

modeless - 10 months ago

I don't know if this is the one but something like this is clearly the future IMO. We need more levels of hierarchy to efficiently generalize to longer sequences with high level structure. Back when Byte Latent Transformers came out I thought extending the idea to more levels of hierarchy was the way to go, and this seems to be basically that?

Another article about H-Nets: https://main-horse.github.io/posts/hnet-inf/

cs702 - 10 months ago

I've only skimmed the paper, but it looks interesting and credible, so I've added it to my reading list.

Thank you for sharing on HN!

---

EDIT: The hierarchical composition and routing aspects of this work vaguely remind me of https://github.com/glassroom/heinsen_routing/ but it has been a while since I played with that. UPDATE: After spending a bit more time on the OP, it's different, but the ideas are related, like routing based on similarity.
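
For anyone wondering what "routing based on similarity" can look like in practice, below is a toy sketch, not the paper's implementation. It assumes a boundary score of `0.5 * (1 - cosine)` between adjacent vectors, so a new chunk starts wherever neighbouring representations point in dissimilar directions. The function names, the fixed threshold, and the use of raw vectors instead of learned projections are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def boundary_probs(vectors):
    """Score each position as a chunk start.

    Position 0 always starts a chunk (score 1.0). Every later position
    gets 0.5 * (1 - cos(current, previous)), which lies in [0, 1].
    """
    probs = [1.0]
    for prev, cur in zip(vectors, vectors[1:]):
        probs.append(0.5 * (1.0 - cosine(cur, prev)))
    return probs


def chunk(vectors, threshold=0.5):
    """Group position indices into chunks, splitting where score >= threshold."""
    chunks = []
    for i, p in enumerate(boundary_probs(vectors)):
        if p >= threshold:
            chunks.append([i])    # dissimilar to predecessor: open a new chunk
        else:
            chunks[-1].append(i)  # similar to predecessor: extend current chunk
    return chunks


# Two similar vectors, then an abrupt change of direction, then two similar ones.
vectors = [(1.0, 0.0), (0.9, 0.1), (-1.0, 0.2), (-0.9, 0.1)]
print(chunk(vectors))  # [[0, 1], [2, 3]]
```

In a real model the vectors would come from an encoder and the split decision would be learned end to end; the hard threshold here only keeps the example short.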

blurbleblurble - 10 months ago

Hand-wavy idea: I wonder if we could take this to another level and have some kind of general graph representation along with hierarchical reductions of it.

I sort of disagree with the assertion that "language is fundamentally hierarchical" in that it supposes there is a single abstraction hierarchy that's universally preferable or correct. That's just not true. It doesn't hurt anybody and it's definitely simpler to choose just one useful one (a hierarchy) but why learn only one? Why not learn multiple and also learn how to modulate between them?

notreallymetho - 10 months ago

I haven’t read it fully yet, but it reminds me of some work I’ve done: https://github.com/jamestexas/papers/blob/main/bread/paper.m...

astrange - 10 months ago

> 3. H-Nets are significantly more robust to small perturbations in input data like casing, showing an avenue for creating models that are more robust and aligned with human reasoning.

If it forms a hierarchy (a tree), it seems like it wouldn't be robust to rearranging the information in a prompt.

E.g., if your request has a long list or a table of data, all the different permutations of it will create different trees even though they represent the same thing.

vannevar - 10 months ago

> The best AI architectures in use today treat all inputs equally.

Doesn't this architecture also treat all inputs equally? It seems like an encoder that preprocesses the input by inferring hierarchy. But don't all models essentially do that while training?

aeon_ai - 10 months ago

Seems likely to be relevant for memory formation/consolidation/management.

Big, if so.

gdiamos - 10 months ago

How does it handle images?

cubefox - 10 months ago

As Mamba didn't make it, will H-Nets replace Transformers?
