The Missing Nvidia GPU Glossary

modal.com

230 points by birdculture 3 months ago


jms55 - 3 months ago

The weird part of the programming model is that threadblocks don't map 1:1 to warps or SMs. A single threadblock executes on a single SM, but each SM has multiple warps, and the threadblock could be the size of a single warp, or larger than the combined thread count of all warps in the SM.

So, how large do you make your threadblocks to get optimal SM/warp scheduling? Well, it "depends" on resource usage, divergence, etc. Basically: run it, profile, switch the threadblock size, profile again, and repeat. Then repeat on every GPU/platform (if you're programming for multiple GPU platforms and not just CUDA, like games do). It's a huge pain, and very sensitive to code changes.

People new to GPU programming ask me "how big do I make the threadblock size?" and I tell them to go with 64 or 128 to start, then profile and adjust as needed.
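(For CUDA specifically, the runtime can also suggest a starting point: `cudaOccupancyMaxPotentialBlockSize` picks a block size that maximizes *theoretical* occupancy for a given kernel's register and shared-memory footprint. A minimal sketch, with a toy saxpy kernel standing in for a real workload — the suggested size is only a first guess, you still profile and adjust as above:)

```cuda
#include <cstdio>

// Toy kernel; stands in for whatever your real workload is.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    int minGridSize = 0, blockSize = 0;
    // Ask the runtime for a block size that maximizes theoretical
    // occupancy given this kernel's resource usage. This is a
    // starting point, not the final answer.
    cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize,
                                       saxpy,
                                       /*dynamicSMemSize=*/0,
                                       /*blockSizeLimit=*/0);
    printf("suggested block size: %d (min grid size: %d)\n",
           blockSize, minGridSize);

    int n = 1 << 20;
    int gridSize = (n + blockSize - 1) / blockSize;  // cover all n elements
    // saxpy<<<gridSize, blockSize>>>(n, 2.0f, x, y);  // launch as usual
    return 0;
}
```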

Two articles on the AMD side of things:

https://gpuopen.com/learn/occupancy-explained

https://gpuopen.com/learn/optimizing-gpu-occupancy-resource-...

saagarjha - 3 months ago

It would be nice if this also included terms that are often used by Nvidia that apparently come from computer architecture (?) but are basically foreign to software engineers, like “scoreboard” or “math pipe”.

EarlKing - 3 months ago

FINALLY. Nvidia's always been pretty craptacular when it comes to their documentation. It's really hard to read unless you already know their internal names for, well, just about everything.

charles_irl - 3 months ago

Oh hey, I wrote this!

Thanks for sharing it.

krackers - 3 months ago

There's a wonderful correspondence between GPU and more conventional SIMD vector terms in the P&H comp arch book. Slide 13 of https://cse.buffalo.edu/~rsridhar/cse490-590/lec/Chapter04.p...

joshdavham - 3 months ago

Incredible work, thank you so much! This will hopefully break down more barriers to entry for newcomers wanting to work with GPUs!

3abiton - 3 months ago

Really great work. Suggestion for a next post: estimating the VRAM required to run models locally. With different architectures and different quants it always gets confusing, and even online calculators give different answers. I've never found a really good deep dive on this.

K0IN - 3 months ago

Is there a plain text / markdown / html version?

einpoklum - 3 months ago

This has been submitted, like, five times already over the past 5 weeks:

https://news.ycombinator.com/from?site=modal.com

JeremyMorgan - 3 months ago

This is incredible. I'm gonna spend some time here.

And I love the design/UI.

germanjoey - 3 months ago

This is really incredible, thank you!

richwater - 3 months ago

content is cool; usability and design of the website is awful (although charming)

weltensturm - 3 months ago

that's pretty

pythops - 3 months ago

Awesome <3
