Abstract
Recently, AI and deep neural networks (DNNs) have found extensive applications in mobile devices, drones, carts, and more. To meet the demands of processing large-scale data, DNN inference services must be provided with minimal latency. However, IoT devices, with their limited computing capabilities, are not well suited for AI inference. Moreover, given the diverse requirements of different services, inference services must accommodate these variations. To address these challenges, many previous studies have explored collaborative approaches between edge servers and cloud servers by partitioning DNN models. However, these methods struggle to find optimal partitioning points for splitting DNN models and are heavily influenced by network bandwidth, since intermediate computation results must be transmitted to other devices. In this paper, we propose an adaptive block-based DNN inference framework. A large DNN model is decomposed into block-level networks, each trained with knowledge distillation so that inference can be performed through each block network on its own. Block-level inference computations are then dynamically offloaded according to the computing capabilities of the edge cluster, which returns the inference results. Even when multiple devices are used, our method is not affected by network bandwidth, since only input images need to be transmitted. Experimental results demonstrate that our approach consistently reduces inference latency as the number of devices increases. Furthermore, by controlling the trade-off between latency and accuracy, we can provide inference services tailored to various latency requirements.
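The abstract's latency/accuracy trade-off can be illustrated with a minimal scheduling sketch, not the paper's actual implementation: given independently usable block networks (each with its own cost and accuracy after distillation) and an edge device's compute capability, pick the most accurate block that fits a latency budget. All class names, FLOP counts, and accuracy figures below are hypothetical assumptions for illustration.

```python
# Hypothetical sketch of block-network selection under a latency budget.
# Numbers and names are illustrative assumptions, not measurements.
from dataclasses import dataclass

@dataclass
class BlockNet:
    name: str
    flops: float      # compute cost of the block network (GFLOPs)
    accuracy: float   # standalone accuracy after knowledge distillation

@dataclass
class Device:
    name: str
    gflops_per_s: float  # effective compute capability of an edge node

def pick_block(blocks, device, latency_budget_s):
    """Choose the most accurate block network whose estimated
    latency on `device` stays within the latency budget."""
    feasible = [b for b in blocks
                if b.flops / device.gflops_per_s <= latency_budget_s]
    return max(feasible, key=lambda b: b.accuracy) if feasible else None

blocks = [
    BlockNet("block-1", flops=0.5, accuracy=0.71),
    BlockNet("block-2", flops=1.5, accuracy=0.78),
    BlockNet("block-3", flops=4.0, accuracy=0.83),
]
edge = Device("edge-node-0", gflops_per_s=20.0)

print(pick_block(blocks, edge, latency_budget_s=0.10).name)  # prints block-2
print(pick_block(blocks, edge, latency_budget_s=0.25).name)  # prints block-3
```

A tighter budget forces a smaller, less accurate block; a looser one admits the full-accuracy block, mirroring the tunable trade-off described above. Because each block runs independently, only the input image would need to reach the chosen device.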
Original language | English |
---|---|
Title of host publication | ICIIT 2024 - Proceedings of the 2024 9th International Conference on Intelligent Information Technology |
Publisher | Association for Computing Machinery |
Pages | 505-511 |
Number of pages | 7 |
ISBN (Electronic) | 9798400716713 |
DOIs | |
Publication status | Published - 23 Feb 2024 |
Event | 2024 9th International Conference on Intelligent Information Technology, ICIIT 2024 - Ho Chi Minh, Viet Nam Duration: 23 Feb 2024 → 25 Feb 2024 |
Publication series
Name | ACM International Conference Proceeding Series |
---|
Conference
Conference | 2024 9th International Conference on Intelligent Information Technology, ICIIT 2024 |
---|---|
Country/Territory | Viet Nam |
City | Ho Chi Minh |
Period | 23/02/24 → 25/02/24 |
Bibliographical note
Publisher Copyright: © 2024 Copyright held by the owner/author(s).
Keywords
- Block-based DNN network
- DNN inference
- Trade-off between latency and accuracy