A Deep Learning Accelerator Based on a Streaming Architecture for Binary Neural Networks

Quang Hieu Vo, Ngoc Linh Le, Faaiz Asim, Lok Won Kim, Choong Seon Hong

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


Deep neural networks (DNNs) play an increasingly important role in areas such as computer vision and voice recognition. While training and validation have gradually become feasible on high-end general-purpose processors such as graphics processing units (GPUs), high-throughput inference on embedded hardware platforms with limited hardware resources and tight power budgets remains challenging. Binarized neural networks (BNNs) are emerging as a promising way to overcome these challenges by reducing the bit widths of DNN data representations, and many optimized solutions have already been proposed. However, accuracy degradation relative to the same architecture at full precision is a considerable problem for BNNs, and binary neural networks still contain significant redundancy that can be optimized. In this paper, to address these limitations, we implement a streaming accelerator architecture with three optimization techniques: pipelining-unrolling for streaming each layer, weight reuse for parallel computation, and MAC (multiplication-accumulation) compression. Our method first constructs the streaming architecture with the pipelining-unrolling method to maximize throughput. Next, the weight reuse method with K-means clustering is applied to reduce the complexity of the popcount operation. Finally, MAC compression reduces the hardware resources used for the remaining computation in MAC operations. The implemented hardware accelerator, integrated into a state-of-the-art field-programmable gate array (FPGA), achieves a maximum classification performance of 1531K frames per second with 98.4% accuracy on the MNIST dataset and 205K frames per second with 80.2% accuracy on the CIFAR-10 dataset. In addition, the proposed design's FPS/LUTs ratio is approximately 57 (MNIST) and 0.707 (CIFAR-10), which is much lower than that of the state-of-the-art design with comparable throughput and inference accuracy.
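The abstract refers to popcount and MAC operations without spelling out the arithmetic. As general background on BNNs, and not a description of this paper's hardware, the sketch below shows the standard identity that a dot product of two +1/-1 vectors equals twice the number of matching bit positions minus the vector length. The function names `pack_bits` and `binary_dot` and the encoding (+1 as bit 1, -1 as bit 0) are assumptions chosen for illustration.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int.

    Assumed encoding for this sketch: +1 -> bit 1, -1 -> bit 0,
    with element i stored at bit position i.
    """
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, w_word, n):
    """Dot product of two length-n +1/-1 vectors given as packed words.

    XNOR marks the positions where activation and weight agree; each
    agreement contributes +1 and each disagreement -1, so the sum is
    2 * popcount(xnor) - n.
    """
    mask = (1 << n) - 1
    xnor = ~(a_word ^ w_word) & mask   # 1 where the bits match
    matches = bin(xnor).count("1")     # popcount
    return 2 * matches - n


activations = [1, -1, 1, 1]
weights = [1, 1, -1, 1]
result = binary_dot(pack_bits(activations), pack_bits(weights), len(weights))
reference = sum(a * w for a, w in zip(activations, weights))
print(result, reference)
```

Both printed values should agree, since the XNOR-popcount form is just a rewriting of the multiply-accumulate over +1/-1 values. This is why reducing popcount complexity, as the abstract describes, directly reduces the cost of the binary MAC.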

Original language: English
Pages (from-to): 21141-21159
Number of pages: 19
Journal: IEEE Access
Publication status: Published - 2022


  • Binary neural networks
  • FPGAs
  • deep learning accelerators

