Matrix Core Programming on AMD CDNA Architecture

rocm.blogs.amd.com

60 points by salykova 8 days ago


phkahler - 3 days ago

So from CDNA3 to 4 they doubled fp16 and fp8 performance but cut fp32 and fp64 by half?

I wonder why they regressed on non-AI workloads.

saagarjha - 3 days ago

If AMD were serious, they would show a fully worked-out GEMM, not just "here is our theoretical performance, this is the instruction to use".