TL;DR
Scientists have developed a scalable probabilistic computer with one million p-bits by connecting multiple FPGAs. This system performs Gibbs sampling at over a trillion flips per second, marking a significant advance in hardware-based sampling and optimization.
Researchers have built a programmable probabilistic computer with one million p-bits by networking multiple FPGAs into a single system, surpassing the capacity limits of single-chip designs. The machine performs Gibbs sampling at more than a trillion p-bit flips per second, a rate aimed at hardware-accelerated sampling and optimization. During operation the FPGAs exchange only 1-bit boundary states. This makes the approach scalable, and it applies to problems such as spin glasses, Max-Cut, and Boolean satisfiability.
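To make "Gibbs sampling with p-bits" concrete, here is a minimal software sketch in Python. It assumes the update rule widely used in the p-bit literature, m_i = sgn(tanh(β·I_i) − r), with r drawn uniformly from (−1, 1) and local field I_i = h_i + Σ_j J_ij·m_j. The article does not say which rule or arithmetic precision the FPGA hardware uses. The names `pbit_update`, `gibbs_sweep`, `energy`, and `ring` are illustrative only.

```python
import math
import random


def pbit_update(i, m, neighbors, h, beta, rng):
    """Resample p-bit i in place from its Gibbs conditional.

    Assumed rule: m_i = sgn(tanh(beta * I_i) - r), r ~ U(-1, 1), with
    I_i = h_i + sum_j J_ij * m_j.  This gives
    P(m_i = +1) = (1 + tanh(beta * I_i)) / 2, the Ising Gibbs conditional.
    """
    field = h[i] + sum(j_ij * m[j] for j, j_ij in neighbors[i])
    r = rng.uniform(-1.0, 1.0)
    m[i] = 1 if math.tanh(beta * field) > r else -1


def gibbs_sweep(m, neighbors, h, beta, rng):
    """One sweep: update every p-bit once, in index order."""
    for i in range(len(m)):
        pbit_update(i, m, neighbors, h, beta, rng)


def energy(m, neighbors, h):
    """Ising energy E = -sum_{i<j} J_ij m_i m_j - sum_i h_i m_i."""
    pair = sum(j_ij * m[i] * m[j]
               for i in range(len(m))
               for j, j_ij in neighbors[i] if j > i)
    return -pair - sum(hi * mi for hi, mi in zip(h, m))


def ring(n, coupling=1.0):
    """Hypothetical toy problem: n p-bits on a ring with uniform coupling."""
    return [[((i - 1) % n, coupling), ((i + 1) % n, coupling)] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 8
    nbrs, h = ring(n), [0.0] * n
    m = [rng.choice((-1, 1)) for _ in range(n)]  # random start
    for _ in range(500):
        gibbs_sweep(m, nbrs, h, beta=3.0, rng=rng)
    print(m, energy(m, nbrs, h))  # the ferromagnetic ring should settle near E = -8
```

If each call to `pbit_update` counts as one "flip" (an assumption about how the rate is quoted), then 10^12 flips per second spread over 10^6 p-bits works out to roughly 10^6 updates per p-bit per second on average.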
The architecture networks FPGAs into a large, distributed probabilistic computer that holds one million p-bits. Earlier systems were confined to a single chip. In this design, the coupling weights stay in local on-chip memory and never cross the network, which is what makes the system scale. During operation, devices exchange only boundary states. This raises a central question: how often must the boundary data be refreshed for the distributed system to behave like a monolithic one?
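The refresh question can be explored in software. The following is a minimal Python emulation under stated assumptions. Each p-bit is assigned to a hypothetical device (`owner`). A p-bit reads neighbours on its own device live, and reads neighbours on other devices from a boundary snapshot that is refreshed only every `exchange_every` sweeps. In this toy, 1 / `exchange_every` plays the role of η (boundary exchanges per local sweep). That mapping, the update rule, and all names are assumptions, not the article's hardware protocol.

```python
import math
import random


def partitioned_sweeps(m, neighbors, h, beta, owner, n_sweeps, exchange_every, rng):
    """Emulate p-bits split across devices with periodic 1-bit boundary exchange.

    owner[i] is the (hypothetical) device holding p-bit i and its weights.
    Same-device neighbours are read live; other-device neighbours are read
    from `boundary`, refreshed only every `exchange_every` sweeps.  Because
    foreign reads never see live values, the order in which devices are
    visited does not change the dynamics (they could run in parallel).
    """
    if exchange_every < 1:
        raise ValueError("exchange_every must be >= 1")
    boundary = list(m)
    for sweep in range(n_sweeps):
        if sweep % exchange_every == 0:
            # Boundary exchange.  Copying every bit is equivalent to sending only
            # boundary bits, since only foreign neighbours are read from it.
            boundary = list(m)
        for i in range(len(m)):
            field = h[i]
            for j, j_ij in neighbors[i]:
                field += j_ij * (m[j] if owner[j] == owner[i] else boundary[j])
            r = rng.uniform(-1.0, 1.0)
            m[i] = 1 if math.tanh(beta * field) > r else -1  # assumed p-bit rule
    return m


if __name__ == "__main__":
    # Two ferromagnetically coupled p-bits that start out disagreeing.
    nbrs, h = [[(1, 1.0)], [(0, 1.0)]], [0.0, 0.0]
    rng = random.Random(0)
    mono = partitioned_sweeps([1, -1], nbrs, h, 50.0, owner=[0, 0],
                              n_sweeps=1, exchange_every=1, rng=rng)
    split = partitioned_sweeps([1, -1], nbrs, h, 50.0, owner=[0, 1],
                               n_sweeps=1, exchange_every=1, rng=rng)
    print(mono, split)  # [-1, -1] [-1, 1]
```

The two-bit example shows a failure mode that stale boundaries can cause. A monolithic sweep aligns the coupled pair. The split pair instead flips in opposite directions and keeps oscillating at low temperature. Even `exchange_every=1` is not monolithic in this toy, because foreign neighbours are stale within a sweep.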
Experiments on three-dimensional Edwards-Anderson spin glasses show that the system’s behavior is governed by a single timing ratio, η (eta). This ratio compares the boundary-exchange frequency with the local p-bit update frequency. When η exceeds a topology-dependent threshold, the distributed machine matches the performance of a GPU-based reference system. Below the threshold, residual energy (the gap between the energy reached and the ground-state energy) falls more slowly, which exposes a tradeoff between throughput and accuracy. A theoretical model reproduces these findings and suggests that the tradeoff is a universal property of partitioned stochastic dynamics.
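Because the threshold is described as topology-dependent, it helps to see how much of a lattice ends up on a device boundary. The sketch below is a minimal Python illustration. It builds a 3D Edwards-Anderson instance, assuming ±J couplings and periodic boundaries, since the article does not give instance details. It then cuts the lattice into slabs along one axis, a hypothetical partitioning scheme, and counts the p-bits whose states must be exchanged. It illustrates geometry only and does not reproduce the article's threshold. `ea_3d`, `slab_owner`, and `boundary_bits` are illustrative names.

```python
import random


def ea_3d(L, rng):
    """Hypothetical +/-J Edwards-Anderson instance on a periodic L x L x L lattice.

    Returns neighbors[i] = list of (j, J_ij); every site has 6 neighbours.
    Couplings are +1 or -1 with equal probability (an assumed distribution).
    """
    if L < 3:
        raise ValueError("L >= 3 avoids duplicate bonds under periodic wrap-around")

    def idx(x, y, z):
        return ((x % L) * L + (y % L)) * L + (z % L)

    neighbors = [[] for _ in range(L ** 3)]
    for x in range(L):
        for y in range(L):
            for z in range(L):
                i = idx(x, y, z)
                # Each bond is created once (positive direction) and stored at both ends.
                for j in (idx(x + 1, y, z), idx(x, y + 1, z), idx(x, y, z + 1)):
                    j_ij = rng.choice((-1, 1))
                    neighbors[i].append((j, j_ij))
                    neighbors[j].append((i, j_ij))
    return neighbors


def slab_owner(L, parts):
    """Assign site (x, y, z) to device x // (L // parts): slabs cut along x."""
    if L % parts:
        raise ValueError("L must be divisible by parts")
    width = L // parts
    return [x // width for x in range(L) for _ in range(L * L)]


def boundary_bits(neighbors, owner):
    """Count p-bits with at least one neighbour on another device,
    i.e. the bits whose 1-bit states must be sent at each boundary exchange."""
    return sum(any(owner[j] != owner[i] for j, _ in nbrs)
               for i, nbrs in enumerate(neighbors))


if __name__ == "__main__":
    rng = random.Random(42)
    L = 12
    nbrs = ea_3d(L, rng)
    for parts in (1, 2, 3, 4, 6):
        b = boundary_bits(nbrs, slab_owner(L, parts))
        print(f"{parts} devices: {b} of {L ** 3} p-bits on a boundary ({b / L ** 3:.0%})")
```

Thinner slabs put a larger fraction of p-bits on a boundary, so more of each device's updates depend on stale data between exchanges. This is one plausible reason a refresh threshold would vary with how the problem is partitioned.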
Potential Impact of Large-Scale Probabilistic Hardware
This development represents a step toward scalable, hardware-based probabilistic computing, enabling complex sampling and optimization tasks at high speeds. The ability to network FPGAs into a single, programmable system with one million p-bits could facilitate research in fields like statistical physics, combinatorial optimization, and machine learning. It also provides a framework for scaling such systems beyond single-chip limitations, which could influence future hardware architectures for probabilistic algorithms.

Advances in Probabilistic Computing and FPGA Networks
Prior to this work, probabilistic computers built from p-bits were limited to single-chip configurations, constraining their capacity and performance. Recent research has explored using p-bits for hardware acceleration of sampling and optimization, but scalability remained a challenge. Networking multiple FPGAs into a cohesive system offers a potential pathway to overcome these limitations, with earlier studies indicating the importance of boundary state exchange timing. This new effort builds on these insights, demonstrating a practical implementation with one million p-bits and providing a framework for future large-scale probabilistic hardware.
“This is the first demonstration of a programmable probabilistic computer with such scale, opening new avenues for hardware-accelerated sampling.”
— an anonymous researcher

Remaining Questions on System Scalability and Performance
It is not yet clear how the system performs with larger numbers of p-bits or under different problem types. The optimal boundary exchange frequency for various topologies and problem complexities remains to be fully characterized. Additionally, the long-term stability and energy consumption of such large-scale networks are still under investigation, as are potential hardware implementation challenges for real-world deployment.

Next Steps for Scaling and Applying Probabilistic Hardware
Future research will likely focus on increasing the number of p-bits beyond one million, optimizing boundary exchange protocols, and testing the system on more complex problems. Development of dedicated hardware implementations and integration with existing computing architectures are also anticipated. The researchers aim to refine the theoretical models and explore practical applications in optimization, machine learning, and physical simulations, moving toward real-world deployment of large-scale probabilistic computers.

Key Questions
How does networking FPGAs enable scaling beyond single-chip limits?
Each FPGA holds a subset of the p-bits and keeps the coupling weights in on-chip memory. Capacity therefore grows with the number of boards instead of being capped by a single chip. Devices exchange only 1-bit boundary states, which keeps inter-FPGA traffic small and lets the distributed machine scale.
What is the significance of the boundary exchange timing ratio, eta?
η (eta) is the ratio of how often boundary states are exchanged between FPGAs to how often p-bits are updated locally. Above a topology-dependent threshold, the distributed system matches the GPU-based reference. Below it, residual energy falls more slowly, which reflects a tradeoff between throughput and accuracy.
What types of problems can this large-scale probabilistic computer solve?
The reported experiments used three-dimensional Edwards-Anderson spin glasses. Max-Cut and Boolean satisfiability are named as other sampling and optimization problems the approach could accelerate.
Are there any limitations or challenges remaining?
Yes, questions remain about performance at larger scales, hardware stability, energy efficiency, and how to best optimize boundary exchange protocols for different problem types and topologies.
When will this technology be available for practical use?
This research is currently in the experimental stage. Further development, testing, and hardware refinement are needed before practical deployment, which could take several years.
Source: Hacker News