TY - JOUR
T1 - TerGEC
T2 - A graph enhanced contrastive approach for program termination analysis
AU - Liu, Shuo
AU - Keung, Jacky Wai
AU - Yang, Zhen
AU - Liao, Yihan
AU - Li, Yishu
N1 - Publisher Copyright: © 2024 Elsevier B.V.
PY - 2024/10
Y1 - 2024/10
AB - Context: Programs with non-terminating behavior induce various bugs, such as denial-of-service vulnerabilities and memory exhaustion. Hence, the ability to detect non-terminating programs before software deployment is crucial. Existing detection methods are either execution-based or deep learning-based. Despite considerable advances, both have evident limitations: the former requires complex sandbox environments for execution, while the latter lacks fine-grained analysis. Objective: To overcome these limitations, this paper proposes a graph-enhanced contrastive approach, namely TerGEC, which combines inter-class and intra-class semantics to carry out a more fine-grained analysis without executing programs during detection. Methods: TerGEC analyzes program behaviors from Abstract Syntax Trees (ASTs), thereby capturing intra-class semantics both syntactically and lexically. It also incorporates contrastive learning to learn the discrepancy between terminating and non-terminating program behaviors, thereby acquiring inter-class semantics. In addition, graph augmentation is designed to improve robustness, and a weighted contrastive loss and a focal loss are employed to alleviate the class-imbalance problem in non-termination detection. Consequently, the detection process is more fine-grained and, being based on deep learning, requires no program execution. Results: We evaluate TerGEC on five datasets covering both Python and C. Extensive experiments demonstrate that TerGEC achieves the best overall performance: across all evaluated datasets, it outperforms state-of-the-art baselines by 8.20% in mAP and by 17.07% in AUC on average. Conclusion: TerGEC detects non-terminating programs with high precision, showing that the combination of inter-class and intra-class learning, together with the proposed class-imbalance solutions, is highly effective in practice.
KW - Code representation learning
KW - Contrastive learning
KW - Graph neural networks
KW - Termination analysis
UR - http://www.scopus.com/inward/record.url?scp=85193830731&partnerID=8YFLogxK
U2 - 10.1016/j.scico.2024.103141
DO - 10.1016/j.scico.2024.103141
M3 - Article
AN - SCOPUS:85193830731
SN - 0167-6423
VL - 237
JO - Science of Computer Programming
JF - Science of Computer Programming
M1 - 103141
ER -