Abstract
Deep learning-based task failure prediction methods have achieved significant performance gains in ensuring high service reliability and availability at the core cloud by predicting task failure probability. With the increasing complexity of deep-learning-based task failure prediction models, the growing number of service requests, and more frequent software and hardware failures, the ability to quickly detect and predict failures becomes essential for minimizing downtime and optimizing resource allocation. However, fast and lightweight deep learning-based task failure prediction models perform poorly at predicting failure occurrence probability compared to highly parameterized deep learning models. Failed tasks demand large amounts of resources, time, and cost to recover. Predicting task failure probabilities quickly and accurately before failures occur can therefore increase reliability and reduce wastage. Motivated by this issue, we propose a knowledge distillation-based learning approach that enables lightweight architectures to achieve improved performance at inference time. The proposed approach transfers the representational capacity of a highly parameterized, best-performing deep learning-based task failure prediction model, used as the teacher, to a lightweight student. To capture temporal patterns, we adopt a Bidirectional Long Short-Term Memory (Bi-LSTM) based distillation approach in which the large, heavy teacher architecture transfers its predictive distribution as a regularization term to a tiny Bi-LSTM student network, analyzing past system message logs and identifying the relationship between the data and the failures. The proposed method achieves a state-of-the-art complexity-vs-accuracy trade-off in task failure prediction at the core cloud. Extensive experiments and an ablation study demonstrate the superiority of our technique on a large-scale standard benchmark dataset.
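The abstract describes distilling a large Bi-LSTM teacher into a tiny Bi-LSTM student by using the teacher's predictive distribution as a regularization term. The sketch below illustrates that general idea in PyTorch; it is not the paper's implementation, and all dimensions, the temperature, the mixing weight `alpha`, the two-class (success/failure) setup, and the synthetic log-feature tensors are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMPredictor(nn.Module):
    """Bi-LSTM classifier over sequences of log-derived task features."""
    def __init__(self, input_dim, hidden_dim, num_classes=2):
        super().__init__()
        self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):
        out, _ = self.encoder(x)          # (batch, seq_len, 2 * hidden_dim)
        return self.head(out[:, -1, :])   # logits from the final timestep

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Hard-label cross-entropy plus a KL term that matches the teacher's
    softened predictive distribution (the regularization term)."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Hypothetical sizes: a heavy teacher and a tiny student.
teacher = BiLSTMPredictor(input_dim=32, hidden_dim=256)
student = BiLSTMPredictor(input_dim=32, hidden_dim=16)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(64, 20, 32)               # batch of 20-step log feature windows
y = torch.randint(0, 2, (64,))             # 0 = task success, 1 = task failure

teacher.eval()
with torch.no_grad():
    t_logits = teacher(x)                  # teacher predictions stay fixed
s_logits = student(x)
loss = distillation_loss(s_logits, t_logits, y)
loss.backward()
optimizer.step()
```

In this sketch only the student's parameters are updated; the teacher provides soft targets that regularize the student toward the teacher's predictive distribution while the cross-entropy term fits the ground-truth failure labels.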
Original language | English |
---|---|
Title of host publication | Proceedings - 2023 International Conference on Computational Science and Computational Intelligence, CSCI 2023 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 1061-1067 |
Number of pages | 7 |
ISBN (Electronic) | 9798350361513 |
DOIs | |
Publication status | Published - 2023 |
Event | 2023 International Conference on Computational Science and Computational Intelligence, CSCI 2023 - Las Vegas, United States. Duration: 13 Dec 2023 → 15 Dec 2023 |
Publication series
Name | Proceedings - 2023 International Conference on Computational Science and Computational Intelligence, CSCI 2023 |
---|---|
Conference
Conference | 2023 International Conference on Computational Science and Computational Intelligence, CSCI 2023 |
---|---|
Country/Territory | United States |
City | Las Vegas |
Period | 13/12/23 → 15/12/23 |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- Cloud computing
- Deep learning
- Knowledge distillation
- Task failure prediction