Enhancing Cloud Service Failure Prediction Via Temporal Relationship Distillation

Sharmen Akhter, Md Imtiaz Hossain, Md Delowar Hossain, Nosin Ibna Mahbub, Eui Nam Huh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Deep learning-based task failure prediction methods have achieved significant performance gains in ensuring high service reliability and availability in the core cloud by predicting task failure probability. With the increasing complexity of deep-learning-based task failure prediction models, the growing number of service requests, and the rise in software and hardware failures, the ability to quickly detect and predict failures becomes essential for minimizing downtime and optimizing resource allocation. However, fast and lightweight deep learning-based task failure prediction models perform poorly at predicting failure occurrence probability compared to highly parameterized deep learning models, and failed tasks demand large amounts of resources, time, and cost to recover. Predicting task failure probabilities quickly and accurately before failures occur can therefore increase reliability and reduce waste. Motivated by these issues, we propose a knowledge distillation-based learning approach that enables lightweight architectures to achieve improved performance at inference time. The proposed approach transfers the representational capacity of a highly parameterized, best-performing deep learning-based task failure prediction model, treated as a teacher, to a lightweight student. To capture temporal patterns, we adopt a Bidirectional Long Short-Term Memory (Bi-LSTM) based distillation approach in which the large, heavy teacher architecture transfers its predictive distribution, used as a regularization term, to a tiny Bi-LSTM student network by analyzing past system message logs and identifying the relationship between the data and the failures. The proposed method achieves a state-of-the-art complexity-vs-accuracy trade-off for task failure prediction in the core cloud. Extensive experiments and an ablation study demonstrate the superiority of our technique on a large-scale standard benchmark dataset.
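The distillation objective described in the abstract, where the teacher's predictive distribution serves as a regularization term for the lightweight student, can be sketched as below. This is a minimal illustration of the standard soft-label knowledge distillation loss; the function names, the blending weight `alpha`, and the temperature `T` are assumptions for illustration, and the paper pairs such a loss with Bi-LSTM teacher/student networks trained on system-log sequences.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with a KL term that pulls the
    student's softened predictive distribution toward the teacher's."""
    # Hard-label cross-entropy on the student's unsoftened predictions.
    p_student = softmax(student_logits)
    ce = -np.mean(np.log(p_student[np.arange(len(labels)), labels] + 1e-12))

    # KL divergence between temperature-softened teacher and student outputs.
    soft_t = softmax(teacher_logits, T)
    soft_s = softmax(student_logits, T)
    kl = np.mean(np.sum(
        soft_t * (np.log(soft_t + 1e-12) - np.log(soft_s + 1e-12)), axis=-1))

    # T^2 rescales the soft term so its gradient magnitude matches the
    # cross-entropy term (as in Hinton et al.'s distillation formulation).
    return alpha * ce + (1.0 - alpha) * (T ** 2) * kl
```

In a full pipeline, `student_logits` and `teacher_logits` would come from the tiny and the heavy Bi-LSTM networks, respectively, evaluated on the same window of log features; when the teacher and student agree exactly, the KL term vanishes and only the hard-label loss remains.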

Original language: English
Title of host publication: Proceedings - 2023 International Conference on Computational Science and Computational Intelligence, CSCI 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1061-1067
Number of pages: 7
ISBN (Electronic): 9798350361513
DOIs
Publication status: Published - 2023
Event: 2023 International Conference on Computational Science and Computational Intelligence, CSCI 2023 - Las Vegas, United States
Duration: 13 Dec 2023 - 15 Dec 2023

Publication series

Name: Proceedings - 2023 International Conference on Computational Science and Computational Intelligence, CSCI 2023

Conference

Conference: 2023 International Conference on Computational Science and Computational Intelligence, CSCI 2023
Country/Territory: United States
City: Las Vegas
Period: 13/12/23 - 15/12/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

Keywords

  • Cloud computing
  • Deep learning
  • Knowledge distillation
  • Task failure prediction
