FPGA-Based Accelerator for Quantized CNNs: High-Throughput Edge Deployment with Optimized Resource Utilization

ORIGINAL ARTICLE

Zeyad Emad Abdel-Mawjoud¹, Ahmed Sayed Abd-Rabou Mohammed¹

¹ Nanotechnology and Nanoelectronics Engineering Program, Zewail City of Science and Technology, Egypt

Submission date: 2025-06-04

Final revision date: 2025-07-30

Acceptance date: 2025-08-20

Publication date: 2025-08-31

Journal of Undergraduate Research International 2025;1(1):45-53

DOI: https://doi.org/10.64589/juri/209734

KEYWORDS

Deep Learning Inference

GenAI Acceleration

Resource-Constrained Deployment

Hardware–Software Co-Design

ABSTRACT

This paper presents MaxNet, a high-throughput FPGA-based accelerator for quantized convolutional neural networks (CNNs), designed to meet the demand for efficient edge AI deployment on low-cost hardware. Unlike prior work focused on high-end platforms, MaxNet targets the low-cost Intel MAX 10 FPGA (10M08DAF484C8GES), offering a tailored solution for resource-constrained applications such as IoT and embedded vision. The optimized two-layer CNN uses 8-bit fixed-point quantization (Q0.8 inputs and activations, Q1.7 weights) and achieves 77% accuracy on CIFAR-10, a throughput of 8,065 frames per second (0.124 ms per image), and a power consumption of 1.2 W, while using 861 LUTs (11%) and 9 M9K blocks (2.9%) as synthesized in Quartus Prime Standard Edition 22.1. Power measurements derived from the Quartus Power Analyzer, psutil, and nvidia-smi show that MaxNet delivers higher throughput at lower power than the CPU (Intel Core i5-10400, 578 fps, 15 W) and GPU (NVIDIA GTX 1650, 730 fps, 50 W) baselines, with roughly 14× and 11× their respective frame rates. Ablation studies validate the two-layer design and the quantization choices, and a sensitivity analysis guides the choice of clock frequency and numerical formats. Standardized tables and detailed figures report resource utilization and per-class accuracy, reinforcing MaxNet’s suitability for low-power, high-performance edge AI on cost-effective FPGAs.
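To make the numeric formats named in the abstract concrete, here is a minimal sketch of 8-bit fixed-point quantization and a multiply-accumulate. It assumes Q0.8 means an unsigned 8-bit value with 8 fractional bits (codes 0..255, step 1/256) and Q1.7 means a signed two's-complement 8-bit value with a sign bit and 7 fractional bits (codes -128..127, step 1/128). The function names `to_q08`, `to_q17`, and `fixed_dot`, the round-to-nearest policy, and saturation on overflow are illustrative assumptions, not the paper's documented implementation.

```python
import numpy as np

# Assumed formats (8 bits each):
#   Q0.8: unsigned, 0 integer bits, 8 fractional bits -> codes 0..255, step 1/256
#   Q1.7: signed two's complement, sign bit + 7 fractional bits -> codes -128..127, step 1/128
Q08_FRAC_BITS = 8
Q17_FRAC_BITS = 7


def to_q08(x):
    """Quantize values in [0, 1) to Q0.8 codes; rounds to nearest (ties to even) and saturates."""
    codes = np.round(np.asarray(x, dtype=np.float64) * (1 << Q08_FRAC_BITS))
    return np.clip(codes, 0, 255).astype(np.uint8)


def to_q17(w):
    """Quantize values in [-1, 1) to Q1.7 codes; rounds to nearest (ties to even) and saturates."""
    codes = np.round(np.asarray(w, dtype=np.float64) * (1 << Q17_FRAC_BITS))
    return np.clip(codes, -128, 127).astype(np.int8)


def fixed_dot(act_codes, weight_codes):
    """Integer multiply-accumulate of Q0.8 activations with Q1.7 weights.

    Each 8x8-bit product has 8 + 7 = 15 fractional bits, so the real value of
    the accumulator is acc / 2**15. An int32 accumulator leaves ample headroom.
    """
    products = act_codes.astype(np.int32) * weight_codes.astype(np.int32)
    acc = int(np.sum(products, dtype=np.int32))
    return acc, acc / (1 << (Q08_FRAC_BITS + Q17_FRAC_BITS))


if __name__ == "__main__":
    x = np.array([0.0, 0.5, 0.999, 0.25])   # e.g. normalized pixels or activations
    w = np.array([0.75, -0.5, 1.2, -1.0])   # 1.2 is out of range and saturates to 127/128
    xq, wq = to_q08(x), to_q17(w)
    acc, approx = fixed_dot(xq, wq)
    print("Q0.8 codes:", xq, "Q1.7 codes:", wq)
    print("accumulator:", acc, "~", approx, "vs float dot:", float(np.dot(x, w)))
```

In this example the activations map to codes 0, 128, 255, 64 and the weights to 96, -64, 127, -128; most of the gap between the fixed-point and floating-point dot products comes from saturating the out-of-range weight 1.2, which is why the choice of weight format matters for accuracy.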
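The throughput and power figures quoted in the abstract can be compared on a per-frame basis. The sketch below only divides the reported numbers: time per image is 1/throughput, and energy per frame is power/throughput. It assumes the quoted power is the average draw during inference, which the abstract does not state explicitly; the `Platform` class is a hypothetical helper for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    fps: float      # reported throughput, frames per second
    power_w: float  # reported power, watts (assumed average draw during inference)

    @property
    def ms_per_image(self) -> float:
        """Time per image implied by throughput (ms); equals latency only without pipelining."""
        return 1000.0 / self.fps

    @property
    def energy_mj_per_frame(self) -> float:
        """Energy per frame in millijoules: power divided by throughput."""
        return 1000.0 * self.power_w / self.fps

    @property
    def fps_per_watt(self) -> float:
        """Throughput per watt, a common edge-efficiency metric."""
        return self.fps / self.power_w


# Figures exactly as reported in the abstract.
PLATFORMS = [
    Platform("MAX 10 FPGA (MaxNet)", 8065, 1.2),
    Platform("Intel Core i5-10400", 578, 15),
    Platform("NVIDIA GTX 1650", 730, 50),
]

if __name__ == "__main__":
    fpga = PLATFORMS[0]
    for p in PLATFORMS:
        print(f"{p.name:22s} {p.ms_per_image:7.3f} ms/img "
              f"{p.energy_mj_per_frame:8.3f} mJ/frame "
              f"{p.fps_per_watt:8.1f} fps/W "
              f"FPGA speedup x{fpga.fps / p.fps:5.2f}")
```

Under these assumptions the FPGA uses about 0.15 mJ per frame, compared with roughly 26 mJ for the CPU and 68 mJ for the GPU, and its implied time per image (about 0.124 ms) matches the figure in the abstract.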

ACKNOWLEDGEMENTS

The authors gratefully acknowledge the support of Zewail City of Science and Technology for providing the research facilities and resources required for this study. We would also like to thank Maria Mansour, Teaching Assistant, Nanotechnology and Nanoelectronics Engineering Program, Zewail City of Science and Technology, Egypt, for her valuable assistance and guidance during the research.

REFERENCES (18)

1. Y. Hou, W. Liu, J. Wang, and B. C. Zhang, “LeNet-5 improvement based on FPGA acceleration,” J. Eng., vol. 2020, no. 13, pp. 526–528, 2020. [Online]. Available: https://doi.org/10.1049/joe.20....

2. M. Cho and Y. Kim, “Implementation of data-optimized FPGA-based accelerator for convolutional neural network,” in Proc. Int. Conf. Electron. Inf. Commun., 2020, pp. 1–4. [Online]. Available: https://doi.org/10.1109/ICEIC49074.2020.9050993.

3. C. Wang, D. Li, L. Zhang, and J. Han, “LUTNet: Rethinking inference in FPGA-based neural network accelerators,” arXiv preprint arXiv:1811.12345, 2019. [Online]. Available: https://arxiv.org/abs/1811.12345.

4. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 4510–4520. [Online]. Available: https://doi.org/10.1109/CVPR.2....

5. F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size,” arXiv preprint arXiv:1602.07360, 2016. [Online]. Available: https://arxiv.org/abs/1602.073....

6. Y. Umuroglu et al., “FINN: A framework for fast, scalable binarized neural network inference,” in Proc. ACM/SIGDA FPGA, 2017, pp. 65–74. [Online]. Available: https://doi.org/10.1145/302007.... 3021744.

7. A. N. Mazumder et al., “TinyM2Net-V2: A compact low-power software–hardware architecture,” ACM Trans. Embedded Comput. Syst., vol. 21, no. 1, pp. 1–23, 2022. [Online]. Available: https://doi.org/10.1145/3470139.

8. H. Sharma et al., “DNNWeaver: From high-level deep network models to FPGA acceleration,” in Proc. ICCAD, 2016, pp. 1–8. [Online]. Available: https://doi.org/10.1145/296698....

9. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998. [Online]. Available: https://doi.org/10.1109/5.726791.

10. R. T. Syed, M. Andjelkovic, M. Ulbricht, and M. Krstic, “Towards reconfigurable CNN accelerator for FPGA implementation,” IEEE Trans. Circuits Syst. II Express Briefs, vol. 70, no. 3, pp. 1082–1086, 2023. [Online]. Available: https://doi.org/10.1109/TCSII..... 3220716.

11. K. Khalil, A. Kumar, and M. Bayoumi, “Low-power convolutional neural network accelerator on FPGA,” in Proc. IEEE Int. Conf. Artif. Intell. Circuits Syst. (AICAS), 2023, pp. 1–5. [Online]. Available: https://doi.org/10.1109/AICAS5....

12. M. Wang, X. Wu, J. Lin, and Z. Wang, “An FPGA-based accelerator enabling efficient support for CNNs with arbitrary kernel sizes,” arXiv preprint arXiv:2402.14307, 2024. [Online]. Available: https://arxiv.org/abs/2402.14307.

13. J. He, M. Zhang, J. Xu, L. Yu, and W. Li, “Optimizing CNN hardware acceleration with configurable vector units and feature layout strategies,” Electronics, vol. 13, no. 6, p. 1050, 2024. [Online]. Available: https://doi.org/10.3390/electr....

14. C.-C. Chung, Y.-P. Liang, and H.-J. Jiang, “CNN hardware accelerator for real-time bearing fault diagnosis,” Sensors, vol. 23, no. 13, p. 5897, 2023. [Online]. Available: https://doi.org/10.3390/s23135897.

15. Z. Wang, H. Li, X. Yue, and L. Meng, “Brief analysis about CNN accelerator based on FPGA,” Procedia Comput. Sci., vol. 202, pp. 272–277, 2022. [Online]. Available: https://doi.org/10.1016/j.proc.... 04.036.

16. A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” Tech. Rep., Univ. of Toronto, 2009.

17. F. Chollet, “Keras,” 2015. [Online]. Available: https://keras.io.

18. M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2016. [Online]. Available: https://www.tensorflow.org.