Cohere Launches Embed 4

cohere.com

96 points by rekovacs 4 days ago


simonw - 4 days ago

I have huge respect for Cohere and this embedding model looks like it could be best-in-class, but I find it hard to commit to a proprietary embedding model that's only available via an API when there are such good open weight models available.

I really like the approach Nomic take: their most recent models are available via their API or as open weights for non-commercial use only (unless you buy a license). They later relicense their older models under Apache 2.0 licenses.

This gives me confidence that I can continue to use my calculated vectors in the future even if Nomic's model is no longer available because I can run the local one instead.

Nomic Embed Vision 1.5 for example started out as CC-BY-NC-4.0 but was later relicensed to Apache 2.0: https://www.nomic.ai/blog/posts/nomic-embed-vision

xfalcox - 4 days ago

No downloadable open weights?

Looks like I'll stay on [bge-m3](https://huggingface.co/BAAI/bge-m3)

lukebuehler - 4 days ago

I just started to look into multi-modal embedding models recently, and I was surprised how few options there are.

For example, Google's model only supports 30 text tokens [1]!!

This is definitely a welcome addition.

Any pointers to similarly powerful embedding models? I'm looking specifically for text and images. I wish there were also one that could do audio and video, but I don't think that exists.

[1] https://cloud.google.com/vertex-ai/generative-ai/docs/embedd...

neom - 4 days ago

Curious, for those in the industry: is there room for Cohere? Apparently they are doing very well in the enterprise, but recently I found myself wondering what their long-term value prop is.

podgietaru - 4 days ago

I built a little RSS Reader / Aggregator that uses Cohere in order to do some arbitrary classification into different topics. I found it incredibly cheap to work with, and pretty good overall at classifying even with very limited inputs.

I also built this into a version of an open source read-it-later app.

You can check it out here: https://github.com/aws-samples/rss-aggregator-using-cohere-e...
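
For readers wondering how topic classification with embeddings can work on very limited inputs, here is a minimal sketch. It is an assumption about the general technique, not the code from the linked repo: each topic label and each feed item is embedded once, and the item is assigned to the topic whose vector has the highest cosine similarity. The toy three-dimensional vectors and the names `TOPIC_VECS`, `cosine`, and `classify` are hypothetical stand-ins for real embeddings returned by an embedding API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(item_vec, topic_vecs):
    """Return the topic whose vector is most similar to item_vec."""
    return max(topic_vecs, key=lambda topic: cosine(item_vec, topic_vecs[topic]))

# Hypothetical toy vectors; in practice these would come from an embedding model.
TOPIC_VECS = {
    "tech": [0.9, 0.1, 0.0],
    "sports": [0.1, 0.9, 0.1],
    "finance": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of a short headline.
headline_vec = [0.8, 0.2, 0.1]

print(classify(headline_vec, TOPIC_VECS))  # prints "tech"
```

Because the topic vectors are computed once and reused, adding a topic only means embedding one more label, which is part of why this kind of setup can stay cheap.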

moojacob - 4 days ago

Seems to under-perform voyage-3-large on the same benchmark. At the same time, I'm unsure how useful benchmarks are for embeddings.

pencildiver - 4 days ago

I'm a huge fan of Cohere. We were highlighted in the launch post and use their V3 text embeddings in production: https://www.searchagora.com/

We're switching to V4 to store unified embeddings of our products. From the early tests we ran, this should help with edge-case relevancy (i.e. when a product's image and text mismatch, thus creating a greater need for multi-modal embeddings) and improve our search speed by ~100ms.

cahaya - 4 days ago

Wondering how this compares to the Gemini (preview) embeddings, as they seem to perform significantly better than OpenAI embeddings 3 large. I don't see any MTEB scores, so it's hard to compare.

BrandiATMuhkuh - 4 days ago

This is really great. I'll use it asap. I'm working with enterprise clients in the AEC space. Having a model that actually understands documents with messy data (drawings, floor plans, books, norms, ...) will be great.

The current approach of chunking and transforming is such a mess.

tiffanyh - 4 days ago

Can someone help me understand what Cohere does?

Do they just host open source models - so you can get them up and going faster?

If so, what’s their moat?

What prevents AWS from doing the same thing?

moralestapia - 4 days ago

A bit expensive but the benchmarks look quite good!

distantsounds - 4 days ago

so which stolen properties were used to train this model?