Visualizing GPT-OSS-20B embeddings

melonmars.github.io

77 points by melonmars 4 days ago


esafak - 8 hours ago

Without a way to tune it, this visualization is as much about the dimensionality reduction algorithm used as about the embeddings themselves, because trade-offs are unavoidable when you go from a very high-dimensional space to a 2D one. I would not read too much into it.
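To make the trade-off concrete, here is a minimal sketch that projects synthetic 256-dimensional points to 2D with PCA and reports how much variance survives. The data is random Gaussian noise standing in for embeddings, and `pca_2d` is a hypothetical helper; nothing here is taken from the linked page, whose reduction method is not stated.

```python
import numpy as np

def pca_2d(x):
    """Project the rows of x onto their top two principal components."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, sorted by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Share of total variance captured by the first two components.
    retained = float((s[:2] ** 2).sum() / (s ** 2).sum())
    return centered @ vt[:2].T, retained

rng = np.random.default_rng(0)
points = rng.normal(size=(500, 256))  # synthetic stand-in, not real embeddings
projected, retained = pca_2d(points)
print(f"variance retained in 2D: {retained:.1%}")
```

Real embeddings are less isotropic than this noise, so they would likely keep more variance, but the general point stands: two axes can only show a small slice of the structure.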

numpad0 - 9 hours ago

Is this handling Unicode correctly? It seems like even a lot of Latin-alphabet characters are getting mangled.

voodooEntity - 7 hours ago

@Author, I would recommend giving

https://github.com/vasturiano/3d-force-graph

a try. For the text labels you can use

https://github.com/vasturiano/three-spritetext

It's based on Three.js and creates great GPU-rendered (WebGL) 3D graph visualisations. This could make it a lot more interesting to watch because it could display actual depth (your GPU is going to run hot, but I guess it's worth it).

Just a suggestion.

_def - 10 hours ago

I have a suspicion that this is how GPT-OSS-20B would generate a visualization of its embeddings. Happy to learn otherwise.

graphviz - 10 hours ago

What do people learn from visualizations like this?

What is the most important problem anyone has solved this way?

Speaking as somewhat of a co-defendant.

ashvardanian - 9 hours ago

Any good comparisons of traditional embedding models against embeddings derived from autoregressive language models?

eddywebs - 9 hours ago

Cool! Would it be possible to generate visualizations of any given open-weight model out there?

lawlessone - 8 hours ago

What does it mean that some embeddings are close to others in this space?

That they're related or connected, or is it arbitrary?

Why does it look like a fried egg?

Edit: they must be related in some way, as one of the "droplets" in the bottom-left quadrant seems to consist of various versions of the word "parameter".
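One common way to check whether closeness means relatedness is to look up nearest neighbours by cosine similarity in the original high-dimensional space rather than in the 2D plot. A rough sketch, using made-up 3-D vectors and a hypothetical `nearest` helper (these are not values from GPT-OSS-20B):

```python
import numpy as np

# Made-up toy vectors for illustration only; real token embeddings
# have far more dimensions.
tokens = ["parameter", "parameters", "Parameter", "egg"]
vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.9, 0.0, 0.1],
    [0.0, 0.1, 0.9],
])

def nearest(query, k=2):
    """Return the k tokens most similar to query by cosine similarity."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[tokens.index(query)]
    order = np.argsort(-sims)  # most similar first
    return [(tokens[i], float(sims[i])) for i in order if tokens[i] != query][:k]

neighbours = nearest("parameter")
print(neighbours)
```

In this toy table the spelling variants of "parameter" rank as each other's neighbours while "egg" does not, which is the kind of grouping the "droplet" would reflect if the projection preserved local neighbourhoods.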

kingstnap - 10 hours ago

It's an interesting looking plot I suppose.

My guess is it's the two largest principal components of the embedding.

But none of the points are labelled? There isn't a writeup on the page or anything?