Abstract
Nowadays, cyber-attacks have become a common and persistent issue affecting various human activities in modern societies. Due to the continuously evolving landscape of cyber-attacks and the growing concerns around " black box " models, there has been a strong demand for novel explainable and interpretable intrusion detection systems with online learning abilities. In this paper, a novel soft prototype-based autonomous fuzzy inference system (SPAFIS) is proposed for network intrusion detection. SPAFIS learns from network traffic data streams online on a chunk-by-chunk basis and autonomously identifies a set of meaningful, human-interpretable soft prototypes to build an IF-THEN fuzzy rule base for classification. Thanks to the utilization of soft prototypes, SPAFIS can precisely capture the underlying data structure and local patterns, and perform internal reasoning and decision-making in a human-interpretable manner based on the ensemble properties and mutual distances of data. To maintain a healthy and compact knowledge base, a pruning scheme is further introduced to SPAFIS, allowing itself to periodically examine the learned solution and remove redundant soft prototypes from its knowledge base. Numerical examples on public network intrusion detection datasets demonstrated the efficacy of the proposed SPAFIS in both offline and online application scenarios, outperforming the state-of-the-art alternatives. Thanks to the rapid development in electronic manufacturing and information technology, the Internet has become an essential part of everyday life for billions of individuals in modern societies. The Internet has greatly transformed the way people communicate, network and access information. However, the ongoing digitalization in the world has also led to a significant rise in cyber-attacks. According to the Cyber Security Breaches Survey published by the UK government in April 2023 [1], 59% of medium businesses, 69% of large businesses and 56% of high-income charities have encountered cybersecurity breaches and/or cyber-attacks in the last 12 months. Nowadays, the escalating cyber-attacks have posed a major and persistent threat to individuals, businesses and organizations on the Internet. The need for effective techniques to protect information security is highly pronounced. Intrusion detection systems (IDSs) are one of the most effective security techniques to prevent cyber-attacks [2]. The function of an IDS is to monitor the network and identify malicious activities. Traditional IDSs are primarily based on signatures. Such IDSs utilize pattern matching methods to compare current activities against signatures of previous intrusions stored in the database [3]. Signature-based IDSs are highly effective in detecting known attacks, but they are unable to detect novel attacks because of the lack of matching signature in the database. As the technological evolution of cybercrime has made cyber-attacks more sophisticated and difficult to detect, traditional signature-based IDSs have become insufficient in real-world scenarios [4]. Machine learning techniques are capable of learning normal and malicious patterns from empirically observed network activities to constructing accurate predictive models with less human involvement [4]. Conventional machine learning methods, such as decision tree (DT) [5], random forest (RF) [6], support vector machine (SVM) [7], k-nearest neighbour (KNN) [8], etc., have been extensively used for identifying cyber-attacks. IDSs based on conventional machine learning have achieved many successes, but they generally struggle with large-scale, complex intrusion detection problems [9]. Due to the evolving landscape of cyber-attacks, characterized by the increasing sophistication and complexity, there has been a rapidly growing demand for IDSs that leverage more advanced machine learning techniques.