TL;DR

A Linux kernel module now allows consumer Thunderbolt/USB4 ports to emulate InfiniBand devices, enabling fast RDMA communication for AI workloads on mini PCs. This breakthrough could democratize high-performance AI training and inference at home.

A Linux kernel module has been developed that enables ordinary USB4 and Thunderbolt ports on AMD mini PCs to emulate InfiniBand devices, achieving high-speed RDMA communication suitable for AI workloads at home. This breakthrough could significantly lower the barrier to high-performance AI computation outside data centers.

The project implements experimental RDMA-over-USB4 through a custom Linux kernel module. In tests on two 128GB Strix Halo mini PCs, it reached approximately 95 Gb/s of bidirectional throughput with a one-way latency of about 7 microseconds. According to the developer, that is fast enough for two consumer mini PCs to split large AI workloads, such as tensor-parallel inference and Fully Sharded Data Parallel (FSDP) training, which typically rely on enterprise-grade networking hardware. The developer cautions that this is experimental research code, most of it AI-generated, that it loads experimental kernel modules not supported for production use, and that it likely contains false assumptions and sharp edges.
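To get a feel for what the reported figures mean in practice, here is a minimal sketch of the common "latency plus size over bandwidth" estimate for a single one-way transfer. It is not taken from the project. The function name `transfer_time_s` is hypothetical, and the defaults simply plug in the article's numbers. Note that 95 Gb/s is reported as a bidirectional figure, so the usable rate in one direction may be lower; pass a different `bandwidth_bps` to explore that.

```python
def transfer_time_s(message_bytes: int,
                    latency_s: float = 7e-6,
                    bandwidth_bps: float = 95e9) -> float:
    """Estimate one-way transfer time as latency + size / bandwidth.

    Defaults use the figures reported in the article (about 7 us latency,
    about 95 Gb/s). Treating 95 Gb/s as the rate available in a single
    direction is an assumption made for illustration only.
    """
    return latency_s + (message_bytes * 8) / bandwidth_bps


if __name__ == "__main__":
    # Small messages are dominated by the latency term,
    # large messages by the bandwidth term.
    for size in (4 * 1024, 1024 ** 2, 1024 ** 3):
        t = transfer_time_s(size)
        print(f"{size:>12} bytes: {t * 1e6:12.1f} us")
```

Under this simple model, a message of a few kilobytes costs little more than the 7 microsecond latency itself, while a gigabyte-scale transfer is governed almost entirely by the link rate.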

Why It Matters

This development matters because it could democratize access to high-performance AI hardware by enabling consumer-grade hardware to communicate at speeds traditionally reserved for enterprise data centers. If further refined and stabilized, this approach could reduce costs and complexity for AI researchers and hobbyists aiming to run large models locally or at home, bypassing the need for expensive networking gear.

Background

InfiniBand is a high-speed networking technology used in data centers for AI and HPC workloads, offering low latency and high bandwidth. Traditionally, achieving similar performance at home has required expensive, specialized hardware. Recent efforts have focused on Soft-RoCE and other software-based solutions, which offer limited performance. This project represents a significant step toward making high-speed RDMA accessible on consumer hardware, leveraging the ubiquity of Thunderbolt and USB4 ports on modern mini PCs. The developer has spent weeks building and testing kernel modules to emulate InfiniBand devices, with initial results showing promising performance metrics.

“This is experimental research code, most of it AI-generated, and it loads experimental kernel modules on machines I was willing to crash repeatedly.”

— the developer behind the project

“We built experimental RDMA-over-USB4 for 128GB Strix Halo mini PCs, enabling two consumer boxes to talk fast enough for tensor-parallel inference and FSDP workloads.”

— the developer again

What Remains Unclear

It is not yet clear how stable and scalable this solution will be for broader use. The project remains experimental, with potential issues around kernel stability, compatibility, and reproducibility. The developer emphasizes that no warranty or support is offered, and real-world deployment remains uncertain.

What’s Next

Further testing and development are needed to stabilize the kernel modules and optimize performance. Future steps may include refining the code for better stability, expanding compatibility to other hardware, and exploring integration with existing AI frameworks. Community feedback and collaboration could accelerate progress toward practical use.

Key Questions

Can this setup be used in production now?

No, this is experimental research code with no support or warranty. It is not suitable for production environments yet.

What hardware is required?

The current tests use two AMD Strix Halo mini PCs with 128GB of memory, connected through their USB4/Thunderbolt ports.
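On Linux, devices registered with the kernel's RDMA subsystem normally appear under `/sys/class/infiniband`. Assuming the experimental module registers its emulated device there (the article does not confirm this), a quick check could look like the sketch below. The function name `list_rdma_devices` is hypothetical, and no device name is assumed.

```python
from pathlib import Path


def list_rdma_devices(sysfs_root: str = "/sys/class/infiniband") -> list[str]:
    """Return the names of RDMA devices visible in sysfs, sorted.

    Returns an empty list when the directory is missing, for example when
    no RDMA driver is loaded. Whether the experimental module shows up
    here is an assumption, not something the article states.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    devices = list_rdma_devices()
    if devices:
        print("RDMA devices found:", ", ".join(devices))
    else:
        print("No RDMA devices visible under /sys/class/infiniband")
```

The check only reads sysfs, so it is safe to run on any machine, including one without the module loaded.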

How does this performance compare to traditional enterprise networks?

The experimental setup achieves approximately 95 Gb/s of bidirectional RDMA throughput with latency of around 7 microseconds. The developer's figures are presented as comparable to enterprise InfiniBand and well ahead of typical Ethernet or Soft-RoCE setups, although no side-by-side benchmark is given.
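For tensor-parallel inference, the relevant cost is usually a collective operation such as all-reduce rather than a single transfer. The sketch below applies the textbook ring all-reduce cost model to the reported figures. It is an illustration only: the article does not say which collective algorithm the project uses, the function name `ring_allreduce_time_s` is hypothetical, and treating 95 Gb/s as the per-direction rate is an assumption.

```python
def ring_allreduce_time_s(tensor_bytes: int,
                          num_nodes: int = 2,
                          latency_s: float = 7e-6,
                          bandwidth_bps: float = 95e9) -> float:
    """Estimate ring all-reduce time with the standard cost model.

    A ring all-reduce over N nodes takes 2 * (N - 1) steps (reduce-scatter
    followed by all-gather); each step moves tensor_bytes / N per node.
    Defaults use the article's figures and are assumptions for illustration.
    """
    if num_nodes < 2:
        return 0.0  # a single node has nothing to exchange
    steps = 2 * (num_nodes - 1)
    chunk_bits = tensor_bytes * 8 / num_nodes
    return steps * (latency_s + chunk_bits / bandwidth_bps)


if __name__ == "__main__":
    # Compare message sizes on the two-node setup described in the article.
    for size in (64 * 1024, 16 * 1024 ** 2):
        t = ring_allreduce_time_s(size)
        print(f"all-reduce of {size} bytes: {t * 1e6:.1f} us")
```

To compare against another link, substitute that link's measured latency and bandwidth for the defaults.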

Is this approach applicable to other hardware or only specific mini PCs?

While currently tested on specific AMD mini PCs, the underlying concept could potentially be adapted to other hardware with compatible USB4/Thunderbolt ports, but further development is needed.

Source: Hacker News
