ORIGINAL ARTICLE
FPGA-Based Accelerator for Quantized CNNs: High-Throughput Edge Deployment with Optimized Resource Utilization
Nanotechnology and Nanoelectronics Engineering Program, Zewail City of Science and Technology, Egypt
Submission date: 2025-06-04
Final revision date: 2025-07-30
Acceptance date: 2025-08-20
Publication date: 2025-08-31
Journal of Undergraduate Research International 2025;1(1):45-53
ABSTRACT
This paper presents MaxNet, a high-throughput FPGA-based accelerator for quantized convolutional neural networks (CNNs), designed to meet the demand for efficient edge AI deployment on low-cost hardware. By targeting the Intel MAX 10 FPGA (10M08DAF484C8GES), MaxNet stands out from prior work focused on high-end platforms, offering a tailored solution for resource-constrained applications like IoT and embedded vision. The optimized two-layer CNN with 8-bit quantization (Q0.8 inputs/activations, Q1.7 weights) achieves 77% accuracy on CIFAR-10, with a throughput of 8,065 frames per second (0.124 ms/image) and power consumption of 1.2 W, using 861 LUTs (11%) and 9 M9K blocks (2.9%), as synthesized in Quartus Prime Standard Edition 22.1. Comprehensive power measurements, derived from Quartus Power Analyzer, psutil, and nvidia-smi, demonstrate superior efficiency compared to CPU (Intel Core i5-10400, 578 fps, 15 W) and GPU (NVIDIA GTX 1650, 730 fps, 50 W) baselines. Ablation studies validate the two-layer design and quantization choices, while sensitivity analysis optimizes clock frequency and numerical formats. Standardized tables and detailed figures clarify resource utilization and per-class accuracy, reinforcing MaxNet’s suitability for low-power, high-performance edge AI on cost-effective FPGAs.
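The numerical formats and headline figures in the abstract can be illustrated with a short sketch. It is not the authors' implementation: it assumes Q0.8 means an unsigned code with 8 fractional bits, Q1.7 means a signed two's-complement code with 7 fractional bits, and that quantization rounds to nearest and saturates. All function names are hypothetical. The throughput and power values are the ones reported in the abstract.

```python
# Illustrative sketch only; not the authors' implementation.
# Assumed conventions (not confirmed by the source):
#   Q0.8 = unsigned, 8 fractional bits  -> real value ~= code / 256
#   Q1.7 = signed,   7 fractional bits  -> real value ~= code / 128

def quantize_q0_8(x: float) -> int:
    """Round to the nearest Q0.8 code and saturate to [0, 255]."""
    return max(0, min(255, round(x * 256)))

def quantize_q1_7(w: float) -> int:
    """Round to the nearest Q1.7 code and saturate to [-128, 127]."""
    return max(-128, min(127, round(w * 128)))

def dot_fixed(acts, weights):
    """Integer multiply-accumulate of Q0.8 activations and Q1.7 weights.

    Each product carries 8 + 7 = 15 fractional bits, so the accumulator
    is converted back to a real value by dividing by 2**15.
    """
    acc = sum(quantize_q0_8(a) * quantize_q1_7(w) for a, w in zip(acts, weights))
    return acc, acc / (1 << 15)

def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds for a given throughput."""
    return 1000.0 / fps

def frames_per_joule(fps: float, watts: float) -> float:
    """Energy efficiency: frames per second divided by watts."""
    return fps / watts

# (throughput in fps, power in W) as reported in the abstract.
REPORTED = {
    "MaxNet (MAX 10)": (8065, 1.2),
    "CPU (Core i5-10400)": (578, 15.0),
    "GPU (GTX 1650)": (730, 50.0),
}

if __name__ == "__main__":
    print(dot_fixed([0.5], [0.25]))
    for name, (fps, watts) in REPORTED.items():
        print(f"{name}: {latency_ms(fps):.3f} ms/image, "
              f"{frames_per_joule(fps, watts):.1f} frames/J")
```

Under these assumptions, the reported figures correspond to roughly 6,721 frames per joule for MaxNet, against about 38.5 for the CPU baseline and 14.6 for the GPU baseline.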
ACKNOWLEDGEMENTS
The authors gratefully acknowledge the support of Zewail City of Science and Technology for providing the research facilities and resources required for this study. We would also like to thank Maria Mansour, Teaching Assistant, Nanotechnology and Nanoelectronics Engineering Program, Zewail City of Science and Technology, Egypt, for her valuable assistance and guidance during the research.