TY - JOUR
T1 - A Semisupervised Approach for Industrial Anomaly Detection via Self-Adaptive Clustering
AU - Ma, Xiaoxue
AU - Keung, Jacky
AU - He, Pinjia
AU - Xiao, Yan
AU - Yu, Xiao
AU - Li, Yishu
N1 - Publisher Copyright: © 2005-2012 IEEE.
PY - 2024/2/1
Y1 - 2024/2/1
N2 - With the rapid development of the Industrial Internet of Things, log-based anomaly detection has become vital for smart industrial construction and has attracted contributions from many researchers. For detecting anomalies in log data, semisupervised approaches stand out from supervised and unsupervised ones because they require only a portion of the data to be labeled and are relatively stable. However, state-of-the-art semisupervised approaches still suffer from two main problems: manual parameter setting and unsatisfactory performance with many false positives. We propose AdaLog, an integrated semisupervised approach based on self-adaptive clustering, for industrial anomaly detection. In particular, the clustering step performs automatic label probability estimation by distinguishing 12 situations, so that the label probability of each unlabeled sample can be carefully calculated, leading to high accuracy. In addition, AdaLog employs a pretrained model to learn contextual information comprehensively and a transformer-based model to detect anomalies efficiently. To alleviate class imbalance, an undersampling method is incorporated. Results on three popular datasets demonstrate that AdaLog significantly outperforms three state-of-the-art semisupervised approaches by 17.8%-2489.8% on average in terms of F1-score, and is even superior to two supervised approaches in most cases, with average improvements of 10.9%-23.8%.
KW - Clustering
KW - deep learning
KW - intelligent anomaly detection
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85161022184&partnerID=8YFLogxK
U2 - 10.1109/TII.2023.3280246
DO - 10.1109/TII.2023.3280246
M3 - Article
AN - SCOPUS:85161022184
SN - 1551-3203
VL - 20
SP - 1687
EP - 1697
JO - IEEE Transactions on Industrial Informatics
JF - IEEE Transactions on Industrial Informatics
IS - 2
ER -