CASMS: Combining clustering with attention semantic model for identifying security bug reports

Xiaoxue Ma, Jacky Keung, Zhen Yang, Xiao Yu, Yishu Li, Hao Zhang

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

Context: Inappropriate public disclosure of security bug reports (SBRs) is likely to attract malicious attackers to invade software systems; hence, being able to detect SBRs has become increasingly important for software maintenance. Existing techniques for identifying SBRs remain less than desirable because of three problems: class imbalance (non-security bug reports, NSBRs, far outnumber SBRs), insufficient training information, and weak performance robustness.

Objective: This prompted us to address these shortcomings of state-of-the-art SBR detection methods.

Method: In this work, we propose the CASMS approach to efficiently alleviate the imbalance problem and predict whether a bug report is security-related. CASMS first converts bug reports into weighted word embeddings based on the TF-IDF and word2vec techniques. Unlike previous studies, which select the NSBRs that are most dissimilar to SBRs, CASMS then automatically finds a certain number of diverse NSBRs via the Elbow method and the k-means clustering algorithm. Finally, the selected NSBRs and all SBRs are used to train an Attention CNN–BLSTM model that extracts contextual and sequential information.

Results: The experimental results show that CASMS outperforms the three baselines (i.e., FARSEC, SMOTUNED, and LTRWES) in overall performance (g-measure) and in correctly identifying SBRs (recall), with improvements of 4.09%–24.26% and 10.33%–36.24%, respectively. The best results are easily obtained within a limited range of ratios for the two-class training set (1:1 to 3:1), with around 20 experiments for each project. Evaluated for robustness via the standard deviation indicator, CASMS is more stable than LTRWES.

Conclusion: Overall, CASMS can alleviate the data imbalance problem and extract more semantic information to improve performance and robustness. Therefore, CASMS is recommended as a practical approach for identifying SBRs.
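The abstract reports results in terms of recall and g-measure without defining them. As a minimal sketch, assuming the definition commonly used in SBR prediction studies (g-measure as the harmonic mean of recall and one minus the false alarm rate), the two metrics could be computed from a confusion matrix as follows. The function names are illustrative and do not come from the paper.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of actual SBRs that the classifier flagged as SBRs."""
    total = tp + fn
    return tp / total if total else 0.0


def false_alarm_rate(fp: int, tn: int) -> float:
    """Fraction of actual NSBRs that were wrongly flagged as SBRs."""
    total = fp + tn
    return fp / total if total else 0.0


def g_measure(tp: int, fn: int, fp: int, tn: int) -> float:
    """Harmonic mean of recall and (1 - false alarm rate).

    Assumed definition; the abstract itself does not spell it out.
    """
    pd = recall(tp, fn)
    specificity = 1.0 - false_alarm_rate(fp, tn)
    denom = pd + specificity
    return 2.0 * pd * specificity / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical confusion matrix: 8 SBRs found, 2 missed,
    # 10 NSBRs falsely flagged, 90 NSBRs correctly passed.
    print(round(recall(8, 2), 3))
    print(round(g_measure(8, 2, 10, 90), 3))
```

Under this definition a classifier only scores well when it finds most SBRs and also keeps false alarms low, which is why it is a natural summary metric for heavily imbalanced data.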

Original language: English
Article number: 106906
Journal: Information and Software Technology
Volume: 147
Publication status: Published - Jul 2022
Externally published: Yes

Keywords

  • Clustering
  • Hybrid neural networks
  • Security bug report
