Einsum for Tensor Manipulation

swe-to-mle.pages.dev

80 points by peluche_ 14 days ago


taliesinb - 14 days ago

While I applaud the OP's exposition, imagine if, instead of axis names living as single-letter variables inside an einsum string, our arrays themselves had these names attached to their axes?

It's almost like when we moved from writing machine code to structured programming: instead of being content to document that certain registers correspond to certain semantically meaningful quantities at particular times during program execution, we manipulated only named variables and left compilers to figure out how to allocate them to registers. We're at that point now with array programming.

https://nlp.seas.harvard.edu/NamedTensor

https://math.tali.link/rainbow-array-algebra/

https://arxiv.org/abs/2102.13196
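A minimal sketch of what this looks like today, using PyTorch's prototype named-tensor API (experimental, with only partial operator coverage; the axis names here are illustrative):

```python
import torch

# Axis names live on the tensor itself rather than in an einsum string.
x = torch.randn(2, 3, names=('batch', 'feature'))

# Reductions and reorderings refer to names, not positions, so the intent
# survives refactors that shuffle dimension order.
total = x.sum('feature')                 # shape: (batch=2,)
moved = x.align_to('feature', 'batch')   # transpose by name
```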

evnc - 14 days ago

When I see an embedded DSL passed around as strings like this, I can't help but think "this could be its own programming language".

Then it could have syntax highlighting, autocomplete, and so on. The type system for such a language could even verify shapes at compile time.

What would a tensor-manipulation language made of .ein source files, compiling down to the same low-level ops, look like?
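Purely as speculation, a declaration in such a hypothetical .ein file might pair a shape signature with the contraction, so a compiler could reject mismatched axes (syntax invented here for illustration only):

```
-- hypothetical .ein syntax, invented for illustration
matmul : [i, k], [k, j] -> [i, j]
matmul(A, B)[i, j] = sum(k) A[i, k] * B[k, j]
```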

barfbagginus - 14 days ago

If you'd like a nice, logically complete diagrams language for Einstein summation, consider this:

scholar.google.com/scholar?hl=en&as_sdt=7%2C39&q=trace+"string+diagrams"+"Einstein+summation"&btnG=#d=gs_qabs&t=1714249856116&u=%23p%3Dbw0D4te4FGEJ

This article would have been better if the author had generated these kinds of diagrams, to show us which tensors the code was summing, and how. They could easily have used GPT to write the diagrammatic TikZ code, like they did for the fantasy-style introduction.

But the diagrams would have illuminated the math, like real magic, while the fantasy writing did no real magic: it only obfuscated the math. GPT does this when you don't give it the context to really explain things.
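For instance, a minimal TikZ sketch (illustrative only) of the string diagram for $C_{ij} = \sum_k A_{ik} B_{kj}$: boxes are tensors, wires are indices, and the shared wire $k$ is the contracted one:

```latex
\documentclass[tikz]{standalone}
\begin{document}
\begin{tikzpicture}
  \node[draw] (A) at (0,0) {$A$};
  \node[draw] (B) at (2,0) {$B$};
  \draw (A) -- node[above] {$k$} (B);          % shared wire: summed index
  \draw (A) -- node[above] {$i$} ++(-1.5,0);   % open wire: free index i
  \draw (B) -- node[above] {$j$} ++(1.5,0);    % open wire: free index j
\end{tikzpicture}
\end{document}
```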

ziofill - 14 days ago

If you, like me, need to use einsum programmatically, you'll find the lesser-known stringless interface far less cumbersome. For example, matrix multiplication of A and B: `np.einsum(A, [0,1], B, [1,2])`.
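Spelled out, a small self-contained example of that interface (this is standard NumPy API; the shapes are arbitrary):

```python
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(3, 4)

# Integer labels replace the letters of the string form; the shared
# label 1 is contracted, so this is matrix multiplication.
C = np.einsum(A, [0, 1], B, [1, 2])

# An optional final list spells out the output axes explicitly.
C2 = np.einsum(A, [0, 1], B, [1, 2], [0, 2])

assert np.allclose(C, A @ B) and np.allclose(C2, A @ B)
```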

namibj - 14 days ago

I like what Facebook did with Tensor Comprehensions: basically generalizing from pure multiply-add to other operations and letting you specify what to reduce, as needed for e.g. expressing convolutional kernels. Sadly it's abandoned; I still love the conciseness.
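From memory (so details may be slightly off), a 2-D convolution in Tensor Comprehensions' notation looked roughly like this; `+=!` initializes the accumulator and sums over every index that appears only on the right-hand side:

```
def conv(float(N, C, H, W) I, float(M, C, KH, KW) K) -> (O) {
    O(n, m, h, w) +=! I(n, c, h + kh, w + kw) * K(m, c, kh, kw)
}
```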

metalrain - 14 days ago

I like how this explains the details.

One can easily dismiss this kind of stringly typed construct, but once you see how the abstraction enables optimizations in the implementation, it's kind of a win-win.
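One concrete instance of that win: because the whole contraction is declared up front, the library can pick the cheapest pairwise evaluation order. A minimal NumPy sketch (shapes chosen so the order matters):

```python
import numpy as np

A = np.random.rand(10, 1000)
B = np.random.rand(1000, 1000)
C = np.random.rand(1000, 10)

# optimize=True lets einsum contract (A @ B) @ C or A @ (B @ C),
# whichever is cheaper, instead of evaluating a naive triple loop.
out = np.einsum('ij,jk,kl->il', A, B, C, optimize=True)

# einsum_path reports the chosen order and the estimated speedup.
path, info = np.einsum_path('ij,jk,kl->il', A, B, C, optimize='optimal')
print(info)
```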

keithalewis - 14 days ago

If someone told me they wrote a C library that took a string of C code that could be compiled and linked back into the program, I would suggest there might be a better solution. Einsum does not seem to respect the Zen of Python.

andy99 - 14 days ago

See also https://github.com/arogozhnikov/einops

Having used these (mostly translating code that used them), I see the power and benefit. I also find it takes a lot of mental energy to get my head around them, and it makes the code harder to read.
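For reference, a small example of the einops API from the library linked above (shapes illustrative), showing why it's powerful even if dense to read:

```python
import numpy as np
from einops import rearrange, reduce

x = np.random.rand(8, 3, 32, 32)   # batch, channel, height, width

# Every axis is named in the pattern, so the reshape intent is explicit.
y = rearrange(x, 'b c h w -> b h w c')       # channels-last layout

# Global average pooling over the spatial axes.
pooled = reduce(x, 'b c h w -> b c', 'mean')
```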