Efficient Distributed Parallel Inference Strategies via Block-based DNN Structure in Edge-to-IoT Continuum

Inhun Choi, Sharmen Akhter, Hong Ju Jeong, Eui Nam Huh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recently, AI and deep neural networks (DNNs) have found extensive applications in mobile devices, drones, carts, and more, creating a need to process large-scale data and provide DNN inference services with minimal latency. However, IoT devices, with their limited computing capabilities, are not well suited for AI inference. Moreover, given the diverse requirements of different services, inference services must be provided that address these variations. To tackle these challenges, many previous studies have explored collaborative approaches between edge servers and cloud servers that partition DNN models. However, these methods struggle to find optimal partitioning points for splitting DNN models and are heavily influenced by network bandwidth, since intermediate computation results must be transmitted to other devices. In this paper, we propose an adaptive block-based DNN inference framework. It decomposes a large DNN model into block-level networks and trains them with knowledge distillation so that inference can be performed through each block network independently. Block-level inference computations are then dynamically offloaded according to the computing capabilities of the edge cluster, and their results are combined to produce the final inference output. Even when multiple devices are used, our method is unaffected by network bandwidth, since only input images need to be transmitted. Experimental results demonstrate that our approach consistently reduces inference latency as the number of devices increases. Additionally, by controlling the trade-off between latency and accuracy, we can provide inference services tailored to various latency requirements.
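The core idea in the abstract, independently trained block networks that each receive only the raw input image and whose outputs are fused, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the toy block functions, the thread-pool "devices", and the logit-averaging fusion are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for distilled block networks: each block maps the
# input image directly to class logits on its own, so only the input image
# (not intermediate features) needs to be sent to each device.
def make_block(bias):
    def block(image):
        # Toy "inference": logits derived from the input plus a per-block bias.
        return [sum(image) * w + bias for w in (0.1, 0.2, 0.3)]
    return block

blocks = [make_block(b) for b in (0.0, 0.5, 1.0, 1.5)]

def distributed_inference(image, num_blocks):
    """Offload the first `num_blocks` block networks in parallel (the pool
    simulates edge devices) and fuse their logits by averaging. Using fewer
    blocks lowers latency at some cost in accuracy, mirroring the
    latency/accuracy trade-off described in the abstract."""
    active = blocks[:num_blocks]
    with ThreadPoolExecutor(max_workers=len(active)) as pool:
        outputs = list(pool.map(lambda blk: blk(image), active))
    # Average logits element-wise across blocks, then take the argmax class.
    fused = [sum(col) / len(outputs) for col in zip(*outputs)]
    return fused.index(max(fused))
```

Because every block consumes the same input image, adding devices parallelizes the block computations without shipping intermediate activations over the network, which is why the scheme is insensitive to bandwidth.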

Original language: English
Title of host publication: ICIIT 2024 - Proceedings of the 2024 9th International Conference on Intelligent Information Technology
Publisher: Association for Computing Machinery
Pages: 505-511
Number of pages: 7
ISBN (Electronic): 9798400716713
DOIs
Publication status: Published - 23 Feb 2024
Event: 2024 9th International Conference on Intelligent Information Technology, ICIIT 2024 - Ho Chi Minh, Viet Nam
Duration: 23 Feb 2024 - 25 Feb 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2024 9th International Conference on Intelligent Information Technology, ICIIT 2024
Country/Territory: Viet Nam
City: Ho Chi Minh
Period: 23/02/24 - 25/02/24

Bibliographical note

Publisher Copyright:
© 2024 Copyright held by the owner/author(s).

Keywords

  • Block-based DNN network
  • DNN inference
  • Trade-off between latency and accuracy
