Why is Japan still investing in custom floating point accelerators?

nextplatform.com

239 points by rbanffy 8 days ago


monocasa - 5 days ago

Adding to what everyone else has said, Japan is known to be a threshold nuclear state (from a weapons perspective). They deliberately stay just weeks away from being able to perform a nuclear weapons test, and they are commonly referred to as being "a screwdriver's turn" away from having a nuclear weapon.

They have massive government investment not only in maintaining that status, but in doing so on as completely domestic a supply chain as possible.

Therefore they have the same need for supercomputers that the US national labs do (perhaps more so, since they're even more reliant on simulation), and heavily prefer locally sourced pieces of that critical infrastructure.

I wouldn't be surprised if an incredibly large part of the local push for Rapidus is to pull them off of TSMC and away from that supply chain risk to their nuclear program, in case the whole China/Taiwan thing comes to a head.

Aissen - 5 days ago

Because the LLM craze has rendered the latest-gen tensor accelerators from NVIDIA (& others) useless for all those FP64 HPC workloads. From the article:

> The Hopper H200 is 47.9 gigaflops per watt at FP64 (33.5 teraflops divided by 700 watts), and the Blackwell B200 is rated at 33.3 gigaflops per watt (40 teraflops divided by 1,200 watts). The Blackwell B300 has FP64 severely deprecated at 1.25 teraflops and burns 1,400 watts, which is 0.89 gigaflops per watt. (The B300 is really aimed at low precision AI inference.)
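
For what it's worth, those efficiency figures check out. A quick sanity check in Python, using only the numbers quoted above:

    # FP64 efficiency = TFLOPS * 1000 / watts, i.e. gigaflops per watt
    specs = {
        "Hopper H200":    (33.5, 700),    # FP64 teraflops, watts
        "Blackwell B200": (40.0, 1200),
        "Blackwell B300": (1.25, 1400),
    }
    for name, (tflops, watts) in specs.items():
        print(f"{name}: {tflops * 1000 / watts:.2f} GF/W at FP64")
    # -> 47.86, 33.33 and 0.89 GF/W, matching the article's figures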

pclmulqdq - 5 days ago

Pezy and the other Japanese native chips are first and foremost about HPC. The world may have picked up AI in the last 2 years, but the Japanese chipmakers are still thinking primarily about HPC, with AI as just one HPC workload.

These Pezy chips are also made for large clusters. There is a whole system design around the chips that wasn't presented here. The Pezy-SC2, for instance, was built around liquid immersion cooling. I am not sure you could ever buy an air-cooled version.

numpad0 - 6 days ago

It's unfortunate that they don't sell them on the open market. There are a few of these accelerators that could threaten NVIDIA's monopoly if the prices (and manufacturing costs!) were right.

kragen - 6 days ago

Fascinating. https://en.m.wikipedia.org/wiki/Single_program,_multiple_dat... explains the relation to SIMT.

thiago_fm - 6 days ago

Great article documenting PEZY. It's incredible how close they are to NVIDIA despite being a very small team.

To me, this looks like a win.

Governments are there to finance projects like this, which let the country build skill sets that otherwise wouldn't exist because other countries have better solutions on the global market.

ghaff - 5 days ago

It may also be worth noting that Japan has a pretty long history of marching to their own drummer in computing. They either created their own architectures or adopted others after pretty much everyone else had moved on.

johnklos - 5 days ago

When you're building your own CPUs, why be beholden to US companies for GPUs? This makes perfect sense.

GPUs are great if your workload can use them, but not so great for more general tasks. These are better suited to more traditional supercomputing tasks, in that they're not optimized for the lower-precision AI stuff the way NVIDIA GPUs are.

markstock - 5 days ago

Something doesn't add up here. The listed peak FP64 performance assumes one FP64 operation per clock per thread, yet there's very little description of how each PE performs 8 flops per cycle; the only hint is "threads are paired up such that one can take over processing when another one stalls...", which is classic latency hiding. So the performance figures must assume that each PE has either an 8-wide SIMD unit (16-wide for FP32), eight separately schedulable execution units, or four FMA execution units, none of which seems likely given the supposed simplicity of the core. Am I missing something?
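
To spell out the arithmetic I'm doing (with illustrative placeholder numbers, not the article's exact figures):

    # Peak FP64 = PEs * flops-per-PE-per-cycle * clock, so a quoted peak
    # pins down how many flops each PE must retire per cycle.
    pes       = 2048      # hypothetical PE count
    clock_hz  = 1.5e9     # hypothetical clock
    peak_fp64 = 24.6e12   # hypothetical quoted peak, flops/s

    flops_per_pe_per_cycle = peak_fp64 / (pes * clock_hz)
    print(f"{flops_per_pe_per_cycle:.1f} flops/PE/cycle")   # ~8.0

    # Sustaining 8 flops/cycle needs either four FMA pipes (2 flops each)
    # or an 8-wide FP64 SIMD unit -- hence my question.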

andrepd - 5 days ago

I wonder how much progress (if any) is being made on floating point formats other than IEEE floats, and on serious hardware adoption in particular. Stuff like posits [1], for instance, looks very promising.

[1] https://posithub.org/docs/posit_standard-2.pdf
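
For anyone curious what the tapered-precision idea looks like concretely, here's a toy 8-bit posit decoder based on my reading of the 2022 standard (es = 2 for all widths); illustrative only, not a reference implementation:

    def decode_posit8(p, n=8, es=2):
        """Decode an n-bit standard posit (es exponent bits) to a float."""
        mask = (1 << n) - 1
        if p == 0:
            return 0.0
        if p == 1 << (n - 1):
            return float("nan")            # NaR (not-a-real)
        sign = 1.0
        if p >> (n - 1):                   # negative values are decoded from
            sign, p = -1.0, (-p) & mask    # the two's complement bit pattern

        rest = [(p >> i) & 1 for i in range(n - 2, -1, -1)]  # bits after sign

        # Regime: run of identical bits, terminated by the opposite bit.
        r0, m = rest[0], 1
        while m < len(rest) and rest[m] == r0:
            m += 1
        k = m - 1 if r0 else -m
        i = m + 1                          # skip the run and its terminator

        e = 0                              # exponent: up to `es` bits,
        for _ in range(es):                # missing low bits are zero
            e <<= 1
            if i < len(rest):
                e |= rest[i]
                i += 1

        frac_bits = rest[i:]               # whatever is left is fraction
        f = sum(b << (len(frac_bits) - 1 - j) for j, b in enumerate(frac_bits))
        frac = f / (1 << len(frac_bits)) if frac_bits else 0.0

        return sign * (1.0 + frac) * 2.0 ** (k * (1 << es) + e)

    # 0b01000000 -> 1.0, 0b01000001 -> 1.125 (3 fraction bits near 1.0),
    # 0b01111111 -> 2**24 (maxpos), 0b11000000 -> -1.0
    for p in (0b01000000, 0b01000001, 0b01111111, 0b11000000):
        print(f"{p:08b} -> {decode_posit8(p)}")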

nxobject - 5 days ago

Interesting that they’re investing in standard _AI_ toolchains, rather than standard HPC toolchains, even though I imagine Japanese supercomputing has more demand for the latter.

sylware - 5 days ago

Last time I heard about these, it was for "supercomputers": nearly as fast as, or even faster than, the alternatives, with a massive energy consumption advantage.

retube - 5 days ago

What is an "accelerator" in this context?

Avlin67 - 5 days ago

You can get 8 TFLOPS of FP64 on a Xeon 6980P, which is 6K€ now.
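
Back of the envelope for where that figure comes from, assuming 128 cores, two 512-bit FMA pipes per core, and a ~2 GHz all-core AVX-512 clock (my assumptions, not measured numbers):

    cores         = 128     # assumed core count
    simd_lanes    = 8       # FP64 lanes per 512-bit vector
    fma_pipes     = 2       # assumed 512-bit FMA units per core
    flops_per_fma = 2       # multiply + add
    clock_hz      = 2.0e9   # assumed all-core AVX-512 clock

    peak = cores * simd_lanes * fma_pipes * flops_per_fma * clock_hz
    print(f"{peak / 1e12:.1f} TFLOPS FP64")   # ~8.2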
