TY - JOUR
T1 - Three-stage data generation algorithm for multiclass network intrusion detection with highly imbalanced dataset
AU - Chui, Kwok Tai
AU - Gupta, Brij B.
AU - Chaurasia, Priyanka
AU - Arya, Varsha
AU - Almomani, Ammar
AU - Alhalabi, Wadee
N1 - Publisher Copyright: © 2023 The Authors
PY - 2023/1
Y1 - 2023/1
N2 - The Internet plays a crucial role in our daily routines. Ensuring cybersecurity for Internet users provides a safe online environment. Automatic network intrusion detection (NID) using machine learning algorithms has recently received increased attention. NID models are prone to bias towards the classes with more training samples because datasets are highly imbalanced across different types of attacks. A key challenge in generating additional training data for minority classes is that too little data is generated. This study addresses the challenge by extending data generation ability through a three-stage data generation algorithm that uses the synthetic minority over-sampling technique, a generative adversarial network (GAN), and a variational autoencoder. A convolutional neural network is employed to extract representative features from the data, which are then fed into a support vector machine with a customised kernel function. An ablation study evaluated the effectiveness of the three-stage data generation, feature extraction, and customised kernel. This was followed by a performance comparison between our study and existing studies. The findings revealed that the proposed NID model achieved an accuracy of 91.9%–96.2% on the four benchmark datasets. In addition, it outperformed existing methods such as GAN-based deep neural networks, conditional Wasserstein GAN-based stacked autoencoder, synthetic minority over-sampling technique-based random forest, and variational autoencoder-based deep neural network, by 1.51%–28.4%.
AB - The Internet plays a crucial role in our daily routines. Ensuring cybersecurity for Internet users provides a safe online environment. Automatic network intrusion detection (NID) using machine learning algorithms has recently received increased attention. NID models are prone to bias towards the classes with more training samples because datasets are highly imbalanced across different types of attacks. A key challenge in generating additional training data for minority classes is that too little data is generated. This study addresses the challenge by extending data generation ability through a three-stage data generation algorithm that uses the synthetic minority over-sampling technique, a generative adversarial network (GAN), and a variational autoencoder. A convolutional neural network is employed to extract representative features from the data, which are then fed into a support vector machine with a customised kernel function. An ablation study evaluated the effectiveness of the three-stage data generation, feature extraction, and customised kernel. This was followed by a performance comparison between our study and existing studies. The findings revealed that the proposed NID model achieved an accuracy of 91.9%–96.2% on the four benchmark datasets. In addition, it outperformed existing methods such as GAN-based deep neural networks, conditional Wasserstein GAN-based stacked autoencoder, synthetic minority over-sampling technique-based random forest, and variational autoencoder-based deep neural network, by 1.51%–28.4%.
KW - Convolutional neural network
KW - Data generation
KW - Generative adversarial network
KW - Kernel function
KW - Multiclass classification
KW - Network intrusion detection
KW - Support vector machine
KW - Synthetic minority over-sampling technique
UR - http://www.scopus.com/inward/record.url?scp=85167967772&partnerID=8YFLogxK
U2 - 10.1016/j.ijin.2023.08.001
DO - 10.1016/j.ijin.2023.08.001
M3 - Article
AN - SCOPUS:85167967772
VL - 4
SP - 202
EP - 210
JO - International Journal of Intelligent Networks
JF - International Journal of Intelligent Networks
ER -