Enhancing Deep Learning Vulnerability Detection through Imbalance Loss Functions: An Empirical Study

  • Yanzhong He
  • Guancheng Lin
  • Xiaoxue Ma
  • Jacky Wai Keung
  • Cheng Tan
  • Wenhua Hu
  • Fuyang Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Software Vulnerability Detection (VD) is crucial in software engineering, and Deep Learning (DL) has proven effective in this domain. However, the class imbalance issue, where non-vulnerable code snippets vastly outnumber vulnerable ones, hinders the performance of DL-based Vulnerability Detection (DLVD) models. Recent studies have explored data resampling methods to address this, but these methods often alter the data distribution, resulting in information loss, model overfitting, and reduced interpretability. Imbalance loss functions have thus emerged as viable alternatives. To comprehensively evaluate the effectiveness of imbalance loss functions in DLVD, we investigate six imbalance loss functions and Cross-Entropy Loss (the default for the LineVul and ReVeal models) on two DLVD models across three public VD datasets, using three evaluation metrics and the Scott-Knott Effect Size Difference test. Our findings provide valuable insights into selecting loss functions and data resampling methods in DLVD. First, the DLVD model LineVul outperforms ReVeal across all datasets. Second, Label Distribution-Aware Margin loss and Random Under-Sampling generally yield the best Precision and Recall, respectively. Third, to avoid information loss and maintain interpretability, we recommend Logit Adjustment Loss (LALoss) due to its high Recall and superior F1 performance. Based on these findings, we suggest employing LineVul with LALoss for VD, as it detects more vulnerable code snippets (higher Recall) while providing comprehensive performance (higher F1).
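
The abstract does not state the exact formulation of LALoss used in the study. As a rough illustration only, the sketch below follows the commonly described logit-adjusted cross-entropy, in which each class logit is shifted by tau times the log of that class's prior before the usual softmax cross-entropy is applied. The function names, the 95%/5% class priors, and the tau value are assumptions chosen for illustration, not values taken from the paper.

```python
import math


def cross_entropy(logits, label):
    """Plain softmax cross-entropy for one sample."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def logit_adjusted_cross_entropy(logits, label, class_priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(prior) per class.

    Illustrative sketch; not the paper's implementation.
    """
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, class_priors)]
    return cross_entropy(adjusted, label)


# Hypothetical priors: class 0 = non-vulnerable, class 1 = vulnerable.
priors = [0.95, 0.05]
logits = [0.0, 0.0]  # a model that is undecided between the two classes

ce_vuln = cross_entropy(logits, 1)
la_vuln = logit_adjusted_cross_entropy(logits, 1, priors)
la_safe = logit_adjusted_cross_entropy(logits, 0, priors)
```

With undecided logits, the plain cross-entropy is the same for either label, whereas the adjusted loss is larger for the rare vulnerable class than for the common non-vulnerable class. Under these assumptions the model is pushed to separate the minority class by a wider margin, which is consistent with the higher Recall the abstract reports for LALoss.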

Original language: English
Title of host publication: 15th Asia-Pacific Symposium on Internetware, Internetware 2024 - Proceedings
Pages: 85-94
Number of pages: 10
ISBN (Electronic): 9798400707056
Publication status: Published - 24 Jul 2024
Externally published: Yes
Event: 15th Asia-Pacific Symposium on Internetware, Internetware 2024 - Macao, China
Duration: 24 Jul 2024 to 26 Jul 2024

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 15th Asia-Pacific Symposium on Internetware, Internetware 2024
Country/Territory: China
City: Macao
Period: 24/07/24 to 26/07/24

Keywords

  • Vulnerability detection
  • data resampling
  • deep learning
  • imbalance loss functions
