In cybersecurity research, malicious traffic detection is attracting increasing attention because of its ability to detect attacks. However, the data processing capacity of almost all deep learning-based intrusion detection methods degrades as the data length increases. Most intrusion detection methods handle only the header part of the traffic and omit valuable information in the payload. As a result, they cannot detect malicious traffic when an attacker hides the attack behavior in the payload.
Researchers from the Institute of Acoustics of the Chinese Academy of Sciences (IACAS) proposed an attention model that can process network traffic flows of adjustable length to detect payload-based attacks. Furthermore, the researchers designed a Flow-WGAN (Wasserstein generative adversarial network) model that generates new network traffic data from the original data sets, both to augment the network packet data and to protect users' privacy.
The study was published online in the academic journal IEEE Access in June 2019.
In network traffic, different bytes have different meanings and are related to each other. However, one-hot encoding (OHE) does not reflect the associations between them. The researchers therefore converted the bytes into byte vectors by means of a word-embedding method. In this study, they treated the network flow as natural language and employed the word2vec algorithm to embed bytes into byte vectors, so that the distance between any two vectors could represent part of the semantic relationship between the two associated bytes.
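The limitation of one-hot encoding can be illustrated with a minimal sketch. The helper names (`packet_to_tokens`, `one_hot`, `euclidean`) are hypothetical and are not taken from the study. The sketch shows only that every pair of distinct one-hot byte vectors is equally far apart, and how a payload could be turned into a "sentence" of byte tokens for a word-embedding algorithm; the embedding training itself is not shown.

```python
import math

def packet_to_tokens(payload):
    """Turn raw bytes into a 'sentence' of two-digit hex tokens (hypothetical helper)."""
    return [format(b, "02x") for b in payload]

def one_hot(byte_value, size=256):
    """One-hot vector for a single byte value in the range 0..255."""
    vec = [0.0] * size
    vec[byte_value] = 1.0
    return vec

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Any two distinct one-hot byte vectors are sqrt(2) apart, so the encoding
# cannot express that some bytes are more closely related than others.
d_close = euclidean(one_hot(0x47), one_hot(0x45))  # 'G' vs 'E'
d_far = euclidean(one_hot(0x47), one_hot(0xFF))    # 'G' vs 0xff

# Token sequence that a word-embedding algorithm could be trained on.
tokens = packet_to_tokens(b"GET /")
```

With learned dense embeddings, by contrast, the distances between byte vectors are free to differ, which is what allows them to carry a partial semantic relationship.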
The researchers proposed a hierarchical attention model that could learn information from two levels of the network flow structure. The model first built representations of bytes using a bidirectional GRU (gated recurrent unit) and assigned different weights to different bytes through the attention mechanism. As a consequence, critical bytes were assigned more weight than bytes that were irrelevant to classifying the flow.
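The byte-level weighting can be sketched as attention pooling over a sequence of hidden states. This is a simplified illustration, not the published architecture: the hidden states are hard-coded stand-ins for bidirectional GRU outputs, and the scoring function (a dot product between a context vector and the tanh of each state) is an assumption chosen for brevity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, context):
    """Weight each hidden state by its relevance and return (pooled, weights).

    hidden_states: one vector per byte (stand-ins for GRU outputs).
    context: a query vector; in a real model it would be learned (assumption).
    """
    scores = [sum(c * math.tanh(h) for c, h in zip(context, state))
              for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * state[k] for w, state in zip(weights, hidden_states))
              for k in range(dim)]
    return pooled, weights

# Three byte positions with 2-dimensional hidden states (made-up values).
states = [[0.1, 0.0], [2.0, 1.5], [-1.0, 0.2]]
packet_vector, byte_weights = attention_pool(states, context=[1.0, 1.0])
```

In this toy input the second byte aligns best with the context vector, so it receives the largest weight and dominates the pooled packet vector.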
The researchers then processed the packet vectors in the same way as the bytes. They used a bidirectional GRU to aggregate the packet representations into a flow representation, since it made full use of the context information from both directions. Each packet was assigned a weight so that the model could pay different amounts of attention to different packets when building the representation of the flow. The researchers found that the hierarchical embedding attention network often resulted in better performance.
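The two-level structure can be sketched as the same pooling step applied twice: once over the bytes of each packet, then over the resulting packet vectors. The sketch below is an assumption-laden simplification: the GRU encoders are omitted, the function names (`attend`, `encode_flow`) and context vectors are hypothetical, and the input values are made up. It is meant only to show that packets of different lengths still produce a fixed-size flow vector.

```python
import math

def attend(vectors, context):
    """Softmax-weighted sum of equal-sized vectors; returns (pooled, weights)."""
    scores = [sum(c * v for c, v in zip(context, vec)) for vec in vectors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * vec[k] for w, vec in zip(weights, vectors))
              for k in range(len(context))]
    return pooled, weights

def encode_flow(flow, byte_context, packet_context):
    """Level 1: bytes -> packet vectors. Level 2: packet vectors -> flow vector."""
    packet_vectors = [attend(packet, byte_context)[0] for packet in flow]
    return attend(packet_vectors, packet_context)

# A flow of three packets holding 3, 1, and 2 byte vectors respectively.
flow = [
    [[0.2, 0.1], [0.9, 0.4], [0.0, 0.3]],
    [[0.5, 0.5]],
    [[0.1, 0.8], [0.7, 0.2]],
]
flow_vector, packet_weights = encode_flow(flow, [1.0, 0.5], [0.3, 1.0])
```

Because each level reduces a variable-length sequence to one vector, the classifier that consumes `flow_vector` does not need to know how many bytes or packets the flow contained.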
A lack of data is a frequent problem in intrusion detection studies. For deep learning methods in particular, limited training data severely restricts how well a model can be trained. Furthermore, inspecting real user network traffic directly might cause a user data breach.
Under such circumstances, the researchers proposed Flow-WGAN to generate new data from the original data set. Although this method could not create new information, Flow-WGAN could learn different features from the same original training set than the classifier did, because the two models had different structures and extracted information in different ways. Thus, the generated network flow packets were new data for the classifier model.
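For readers unfamiliar with Wasserstein GANs, the training objectives of the standard WGAN formulation can be sketched as follows. This is the generic WGAN with weight clipping, not the Flow-WGAN architecture itself, whose details are not given here; the function names and the clipping bound of 0.01 are illustrative assumptions.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)

def critic_loss(real_scores, fake_scores):
    """The critic is trained to score real flows higher than generated ones."""
    return mean(fake_scores) - mean(real_scores)

def generator_loss(fake_scores):
    """The generator is trained to raise the critic's score on generated flows."""
    return -mean(fake_scores)

def clip_weights(weights, bound=0.01):
    """Weight clipping keeps the critic approximately Lipschitz-bounded."""
    return [max(-bound, min(bound, w)) for w in weights]

# Made-up critic outputs for two real and two generated flows.
c_loss = critic_loss(real_scores=[1.0, 3.0], fake_scores=[0.0, 1.0])
g_loss = generator_loss(fake_scores=[0.0, 1.0])
clipped = clip_weights([0.5, -0.2, 0.005])
```

Minimizing the critic loss widens the score gap between real and generated samples, while minimizing the generator loss narrows it; the generated flows become useful as training data once the gap is small.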
To evaluate or improve the classifiers, researchers could use the generated network flow packets to simulate a new type of Internet application. The experiment showed that all classifiers had a higher FAR (false acceptance rate) when processing the generated network flow packets, and that the proposed model had a lower FAR than the HAST-IDS (hierarchical spatial-temporal features-based intrusion detection system) model.
Figure 1. The structure of the proposed hierarchical attention model (Image by IACAS)
Figure 2. Flow-WGAN generating a new type of flow (Image by IACAS)
Experiments on the ISCX-2012 and ISCX-2017 datasets showed that the proposed model achieved higher accuracy and true positive rate (TPR) than four state-of-the-art deep learning methods. The proposed model also outperformed the existing HAST-IDS in detecting the generated packets. In addition, its training time was 30% less than that of HAST-IDS. This indicated that the proposed model could find the critical parts with the attention mechanism and converge faster.
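The reported metrics can be computed from a binary confusion matrix, as in the sketch below. The counts are invented for illustration and are not results from the study. The FAR formula used here, FP / (FP + TN), is an assumption based on common practice in intrusion detection evaluation; the exact definition used in the paper is not stated in this article.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return accuracy, TPR, and FAR from confusion-matrix counts.

    tp: malicious flows flagged as malicious
    fn: malicious flows missed
    fp: benign flows flagged as malicious
    tn: benign flows passed as benign
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / (tp + fn),   # share of attacks that were detected
        "far": fp / (fp + tn),   # assumed definition, see the note above
    }

# Hypothetical counts for 100 malicious and 100 benign flows.
metrics = detection_metrics(tp=90, fn=10, fp=5, tn=95)
```

A better classifier raises accuracy and TPR while lowering FAR, which is the direction of improvement reported for the proposed model.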
Reference:
HAN Luchao, SHENG Yiqiang, ZENG Xuewen. A Packet-Length-Adjustable Attention Model Based on Bytes Embedding Using Flow-WGAN for Smart Cybersecurity. IEEE Access, 2019, 7: 82913 - 82926. DOI: 10.1109/ACCESS.2019.2924492.
Contact:
ZHOU Wenjia
Institute of Acoustics, Chinese Academy of Sciences, 100190 Beijing, China
E-mail: media@mail.ioa.ac.cn