100 CPU threads & 240GB RAM to make @risc_v #AI @amd #ROCm and #t2linux https://www.twitch.tv/videos/2421181919
ffs, why does their Docker image only support Navi 31 and not Navi 32?
https://hub.docker.com/r/rocm/pytorch
I just wish both #Nvidia and #AMD would stop with that whole licensing bullshit around #CUDA and #ROCm and just include that damn stuff in the default driver.
I just want to run #Codestral on my local machine so I can use it with non-public code. It will be troublesome enough to cram it into 16GB of VRAM (rough numbers below).
#computer #Linux #AI
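Back-of-the-envelope only, nothing measured: assuming Codestral's ~22B parameters and typical bytes-per-weight, the weights alone come out roughly like this:

```python
# Rough VRAM estimate for a ~22B-parameter model (assumed sizes, not measured).
params = 22e9  # Codestral 22B

for name, bytes_per_param in [("FP16", 2.0), ("Q8", 1.0), ("Q4", 0.5)]:
    weights_gb = params * bytes_per_param / 1024**3
    print(f"{name}: ~{weights_gb:.0f} GB for weights alone")

# FP16: ~41 GB, Q8: ~20 GB, Q4: ~10 GB -- so only a ~4-bit quant
# (plus KV cache and runtime overhead) has a real chance of fitting in 16 GB.
```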
we did it! #amd #ROCm #HPE compute stack and #AI acceleration are now (mostly) available in https://t2linux.com for #riscv https://www.twitch.tv/videos/2416606393 #t2sde #t2linux
Last night I was up until 2AM trying to get #truenas #amd drivers installed inside of a #docker #container so that #ollama would actually use the #gpu. I was so close. It sees the GPU, it sees it has 16GB of VRAM, and then it uses the #cpu anyway.
TrueNAS locks down the file system at the root level, so if you want to do much of anything, you have to do it inside a container. So I made a container for the #rocm drivers, which, by the way, comes to something like 40GB in size.
It's detecting the GPU, but I don't know if the ollama container is missing some commands it may need, e.g. rocminfo or rocm-smi (quick check sketched below).
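For anyone stuck at the same point, this is roughly what I'd check from inside the container (a sketch, not a verified fix; it assumes the ROCm tools would be on PATH and /dev/kfd plus /dev/dri are passed through):

```python
# Minimal container-side GPU visibility check (sketch; assumes ROCm userspace
# tools are installed in the container and the GPU devices are passed through).
import shutil
import subprocess

for tool in ("rocminfo", "rocm-smi"):
    path = shutil.which(tool)
    if path is None:
        print(f"{tool}: not found in this container")
        continue
    # Print the first lines of output so agent names and VRAM show up.
    out = subprocess.run([tool], capture_output=True, text=True).stdout
    print(f"--- {tool} ({path}) ---")
    print("\n".join(out.splitlines()[:15]))
```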
Another alternative is one I don't really want: installing either #debian or windows as a VM. Windows, because I previously tested the application that runs locally in Windows on this machine and it was super fast. It isn't ideal in terms of RAM usage, but I may be able to run the models more easily with the #windows drivers than the #linux ones.
But anyway, last night was too much of #onemoreturn for a weeknight.
The B-17 Bomber was amazing and helped win WWII. I flew on one in 2002 as a tourist - I have family members who were ball turret gunners - a bad place to be.
This video was shot on Hi-8, and thankfully I digitized it (at 720x480) back in the day. Now I've upscaled it with local AI (to 1408x954) and the improvement is astounding.
Sadly, this actual B17 crashed in 2019: https://en.wikipedia.org/wiki/2019_Boeing_B-17_Flying_Fortress_crash
Aiter: AI Tensor Engine for ROCm
https://rocm.blogs.amd.com/software-tools-optimization/aiter:-ai-tensor-engine-for-rocm™/README.html
Even now, Thrust as a dependency is one of the main reasons why we have a #CUDA backend, a #HIP / #ROCm backend and a pure #CPU backend in #GPUSPH, but not a #SYCL or #OneAPI backend (which would allow us to extend hardware support to #Intel GPUs). <https://doi.org/10.1002/cpe.8313>
This is also one of the reasons why we implemented our own #BLAS routines when we introduced the semi-implicit integrator. A side effect of this choice is that it allowed us to develop the improved #BiCGSTAB that I've had the opportunity to mention before <https://doi.org/10.1016/j.jcp.2022.111413>. Sometimes I do wonder if it would be appropriate to “excorporate” it into its own library for general use, since it's something that would benefit others. OTOH, this one was developed specifically for GPUSPH and it's tightly integrated with the rest of it (including its support for multi-GPU), and refactoring to turn it into a library like cuBLAS is
a. too much effort
b. probably not worth it.
Again, following @eniko's original thread, it's really not that hard to roll your own (see the sketch below), and probably less time-consuming than trying to wrangle your way through an API that may or may not fit your needs.
6/
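(For the curious: here's a minimal, textbook unpreconditioned BiCGSTAB in Python/NumPy, just to show how little code "rolling your own" takes. This is the plain algorithm from the literature, not GPUSPH's improved variant.)

```python
import numpy as np

def bicgstab(A, b, x0=None, tol=1e-8, maxiter=1000):
    """Textbook unpreconditioned BiCGSTAB for solving A @ x = b."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    r_hat = r.copy()                      # shadow residual, fixed throughout
    rho_old = alpha = omega = 1.0
    v = p = np.zeros_like(b)
    for _ in range(maxiter):
        rho = r_hat @ r
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho / (r_hat @ v)
        s = r - alpha * v                 # intermediate residual
        t = A @ s
        omega = (t @ s) / (t @ t)         # stabilization step
        x = x + alpha * p + omega * s
        r = s - omega * t
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        rho_old = rho
    return x
```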
CRIUgpu: Transparent Checkpointing of GPU-Accelerated Workloads
AI rabbit hole ... I've been playing with Ollama and some Stable Diffusion tools on my MacBook Pro M2 Max and my Linux desktop ... the desktop is way faster and only has an RX 6800 in it, so of course I'm now thinking about an RX 7900 XTX ... (I don't do Nvidia cards) ...
Anyone have experience with this upgrade? Is going from 16GB of VRAM to 24GB going to make a massive difference?
Using radeontop I can see it's using all 16GB at some points, but not consistently ... and I'm not sure if that's an issue or a feature. I believe #rocm still has some issues.
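One way to answer that for a specific workload, instead of eyeballing radeontop: watch peak VRAM from the process itself. A small sketch, assuming a ROCm build of PyTorch (where the torch.cuda API maps to HIP):

```python
# Sketch: report free/total VRAM and PyTorch's peak allocation.
# Assumes a ROCm build of PyTorch; torch.cuda is the HIP device there.
import torch

assert torch.cuda.is_available(), "no ROCm/HIP device visible"
free, total = torch.cuda.mem_get_info(0)
print(f"free: {free / 1024**3:.1f} GB / total: {total / 1024**3:.1f} GB")

# ... run your model/workload here ...

print(f"peak allocated: {torch.cuda.max_memory_allocated(0) / 1024**3:.1f} GB")
```

If the peak sits right at 16GB, the workload is likely spilling or being throttled, and 24GB would help; if it peaks well below, probably not.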
Generating images with #ComfyUI on a Ryzen 7 #8845HS w/ Radeon #780M ( #Linux )
https://blog.pastwind.org/2025/02/ryzen-7-8845hs-w-radeon-780mcomfyuilinux.html
It took a lot of trial and error to find the recipe that works... not least because every #ROCm install means downloading more than 30GB of files!!!
tl;dr, straight to the conclusions:
OS: Ubuntu 22.04 (because ROCm 6.1 only supports this version and older)
ROCm: <= 6.1.2; 6.2 and 6.3 both fail to run properly
PyTorch: <= 2.4.1; 2.5.1 prints an unsupported-hardware warning ("UserWarning: Attempting to use hipBLASLt on an unsupported architecture! Overriding blas backend to hipblas") and images sometimes fail to generate correctly. 2.6 and above don't run at all. Use the build from the official PyTorch site rather than the one provided by AMD.
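A quick way to confirm which combination you actually ended up with (a small sketch, assuming a ROCm build of PyTorch is installed):

```python
# Print the PyTorch/ROCm combo in use (torch.version.hip is None on CUDA builds).
import torch

print("torch:", torch.__version__)   # e.g. something like 2.4.1+rocm6.1
print("hip:", torch.version.hip)
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```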
#sydbox 3.32.0 is released! We now officially support #GPU access for #ROCm and #nVIDIA! See the release mail here: https://is.gd/kN1rUt and here is a profile auto-generated by #pandora for #hashcat accessing an #nVIDIA #GPU using #cuda libraries: https://dpaste.com/6DQ97T2DM #exherbo #linux #security
I also uploaded the slides for my talk in the #hpc devroom at #fosdem about the #programming models in #ROCm.
The video has been reviewed too and is waiting to be released to the public. You can get the slides and the video (once released) at https://fosdem.org/2025/schedule/event/fosdem-2025-5143-programming-models-with-the-rocm-compiler/
#HPC devroom starting early with great content and good participation!
I’ll be talking about #ROCm at 11:35 — come and say hello or follow the livestream at https://fosdem.org/2025/schedule/event/fosdem-2025-5143-programming-models-with-the-rocm-compiler/