How I solved PyTorch's cross-platform nightmare

svana.name

73 points by msvana 5 days ago


lynndotpy - 2 days ago

> Setting up a Python project that relies on PyTorch, so that it works across different accelerators and operating systems, is a nightmare.

I would like to add some anecdata to this.

When I was a PhD student, I already had 12 years of using and administrating Linuxes as my personal OS, and I'd already had my share of package manager and dependency woes.

But managing Python, PyTorch, and CUDA dependencies was relatively new to me. Sometimes I'd lose an evening here or there to something silly. But I had one week especially dominated by these woes, to the point where I'd have dreams about package management problems at the terminal.

They were mundane dreams but I'd chalk them up as nightmares. The worst was having the pleasant dream where those problems went away forever, only to wake up to realize that was not the case.

di - 2 days ago

Note that https://peps.python.org/pep-0440/#direct-references says:

> Public index servers SHOULD NOT allow the use of direct references in uploaded distributions. Direct references are intended as a tool for software integrators rather than publishers.

This means that PyPI will not accept your project metadata as you currently have it configured. See https://github.com/pypi/warehouse/issues/7136 for more details.

mdaniel - 2 days ago

> Cross-Platform

  cpu = [
  "torch @ https://download.pytorch.org/whl/cpu/torch-2.7.1%2Bcpu-cp312-cp312-manylinux_2_28_x86_64.whl ; python_version == '3.12'",
  "torch @ https://download.pytorch.org/whl/cpu/torch-2.7.1%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl ; python_version == '3.13'",
  ]
:-/ It reminds me of Microsoft calling their thing "cross platform" because it works on several copies of Windows

In all seriousness, I get the impression that pytorch is such a monster PITA to manage because it cares so much about the target hardware. It'd be like a blog post saying "I solved the assembly language nightmare"

kwon-young - 2 days ago

In my opinion, anything that touches compiled packages like PyTorch should be packaged with conda/mamba on conda-forge. I've found it is the only package manager for Python that will reliably detect my hardware and install the correct version of every dependency.
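For reference, a minimal environment.yml along these lines might look like the sketch below (the environment name and version pins are illustrative; on conda-forge the solver selects the CPU or CUDA build variant of pytorch based on the machine):

```yaml
# environment.yml -- minimal sketch; create with `mamba env create -f environment.yml`
name: ml
channels:
  - conda-forge
dependencies:
  - python=3.12
  - pytorch
```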

cmdr2 - 2 days ago

https://pypi.org/p/torchruntime might help here; it's designed precisely for this purpose.

`pip install torchruntime`

`torchruntime install torch`

It figures out the correct torch to install on the user's PC, factoring in the OS (Win, Linux, Mac), the GPU vendor (NVIDIA, AMD, Intel) and the GPU model (especially for ROCm, whose configuration varies per generation and ROCm version).

And it tries to support quite a number of older GPUs as well, which are pinned to older versions of torch.

It's used by a few cross-platform torch-based consumer apps, running on quite a number of consumer installations.

arun-mani-j - 2 days ago

This is so nice, I wish more packages followed something like this. I'm on an AMD integrated GPU (it doesn't even support ROCm). Whenever I install a Python package that depends on PyTorch, it automatically installs some GBs of CUDA-related packages.

This ends up wasting space and slowing down installation :(

Speaking of PyTorch and CUDA, I wish the Vulkan backend would become stable, but that seems like a far-off dream...

https://docs.pytorch.org/executorch/stable/backends-vulkan.h...

zbowling - 2 days ago

Check out Pixi! Pixi is an alternative to the common conda and pypi frontends and has a better system for hardware feature detection, getting the best version of Torch for your hardware that is compatible across your packages (except for AMD at the moment). It can pull in the conda-forge or PyPI builds of PyTorch and help you manage things automagically across platforms. https://pixi.sh/latest/python/pytorch/
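As a rough sketch of what that looks like, a pixi.toml can declare the CUDA requirement so the solver picks a matching build (field names follow the pixi docs; the project name and pins here are made up, and exact keys may vary between pixi versions):

```toml
# pixi.toml -- minimal sketch
[project]
name = "myapp"
channels = ["conda-forge"]
platforms = ["linux-64", "win-64", "osx-arm64"]

[dependencies]
python = "3.12.*"
pytorch = "*"

# Tells the solver this machine provides a CUDA driver,
# so it can select a CUDA build variant of pytorch.
[system-requirements]
cuda = "12.0"
```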

It doesn't solve how you package your wheels specifically; that problem is still pushed onto your downstream users because of boneheaded packaging decisions by PyTorch itself. But as a consumer, Pixi softens the blow. The conda-forge builds of PyTorch are also a bit more sane.

ashvardanian - 2 days ago

Related, but wasn’t broadly discussed on HN: https://astral.sh/blog/wheel-variants

Simulacra - 2 days ago

Good writeup. PyTorch has generally been very good to me, as long as I can mitigate its occasional resource hogging. Production can be a little wonky, but for everything else it works.

tuna74 - 2 days ago

Is there a problem using distro packages for Pytorch? What are the downsides of using the official Fedora Pytorch for example?

antimora - 2 days ago

Check out https://github.com/tracel-ai/burn project! It makes deploying models across different platforms easy. It uses Rust instead of Python.

userabchn - 2 days ago

I maintain a package that provides some PyTorch operators that are written in C/C++/CUDA. I have tried various approaches over the years (including the ones endorsed by PyTorch), but the only solution I have found that seems to work flawlessly for everyone who uses it is to have no Python or PyTorch dependence in the compiled code, and to load the compiled libraries using ctypes. I use an old version of nvcc to compile the CUDA, use manylinux2014 for the Linux builds, and ask users to install PyTorch themselves before installing my package.