FPL 2009, Springer

A highly scalable Restricted Boltzmann Machine FPGA implementation

Restricted Boltzmann Machines (RBMs) — the building block for newly popular Deep Belief Networks (DBNs) — are a promising new tool for machine learning practitioners. However, future research in applications of DBNs is hampered by the considerable computation that training requires. In this paper, we describe a novel architecture and FPGA implementation that accelerates the training of general RBMs in a scalable manner, with the goal of producing a system that machine learning researchers can use to investigate ever-larger networks. Our design uses a highly efficient, fully-pipelined architecture based on 16-bit arithmetic for performing RBM training on an FPGA. We show that only 16-bit arithmetic precision is necessary, and we consequently use embedded hardware multiply-and-add (MADD) units. We present performance results to show that a speedup of 25-30X can be achieved over an optimized software implementation on a high-end CPU.
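For context, the core computation being accelerated is the RBM training update. Below is a minimal NumPy sketch of a standard CD-1 (one-step contrastive divergence) update for a binary RBM. This is an illustrative software baseline under common textbook assumptions, not the paper's 16-bit FPGA pipeline; all function and variable names here are assumptions.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(W, b_vis, b_hid, v0, lr=0.1, rng=None):
        """One CD-1 update for a binary RBM (illustrative sketch, not the paper's design).

        W     : (n_visible, n_hidden) weight matrix
        b_vis : (n_visible,) visible biases
        b_hid : (n_hidden,) hidden biases
        v0    : (batch, n_visible) batch of binary training vectors
        """
        rng = np.random.default_rng() if rng is None else rng

        # Positive phase: hidden activations driven by the data.
        p_h0 = sigmoid(v0 @ W + b_hid)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

        # Negative phase: one Gibbs step down to the visible layer and back up.
        p_v1 = sigmoid(h0 @ W.T + b_vis)
        v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
        p_h1 = sigmoid(v1 @ W + b_hid)

        # Parameter updates: data correlations minus model correlations.
        batch = v0.shape[0]
        W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / batch
        b_vis += lr * (v0 - v1).mean(axis=0)
        b_hid += lr * (p_h0 - p_h1).mean(axis=0)
        return W, b_vis, b_hid

The dense matrix products in the positive and negative phases dominate RBM training cost; they are the kind of multiply-accumulate workload the paper maps onto embedded 16-bit multiply-and-add (MADD) units in its fully-pipelined design.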
Type Conference
Year 2009
Where FPL
Authors Sang Kyun Kim, Lawrence C. McAfee, Peter L. McMahon, Kunle Olukotun