Abstract
The escalation of cyber-attacks in recent years has created a significant demand for advanced intrusion detection systems. However, the dynamic characteristics of network data streams, the scarcity of labelled data, imbalanced class distributions and concerns about the explainability of predictive models have greatly restricted the applicability of conventional machine learning and cutting-edge deep learning techniques to real-world intrusion detection. In this paper, a novel soft prototype-based fuzzy ensemble intrusion detection system is proposed that autonomously performs semi-supervised learning from network data streams using pseudo-labelling. To reduce pseudo-labelling errors, the proposed ensemble system involves all base models in joint pseudo-labelling but uses only the pseudo-labelled data with the highest consensus for self-improvement. To foster generalisation, the proposed ensemble system leverages sampling techniques to address class imbalance within the labelled and pseudo-labelled data, and the quality of each base model is continuously monitored so that weaker base models are efficiently replaced. Additionally, instead of discarding challenging-to-classify samples during online semi-supervised learning, the proposed ensemble system summarises them into a smaller number of unlabelled soft prototypes. Human experts can then contribute to the learning at any point by manually labelling these soft prototypes, further augmenting the learned knowledge base. Numerical examples on public network intrusion detection datasets demonstrate the superior performance of the proposed ensemble system.
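The consensus-based pseudo-labelling and the summarisation of hard samples described above can be illustrated with a small sketch. This is an assumption-laden example, not the authors' method. Unanimous agreement among base models stands in for "highest consensus" and can be relaxed through the hypothetical `min_agreement` parameter. Base models are represented only by their predicted class labels. The hypothetical `summarise_hard_samples` uses a simple greedy distance-threshold clustering (with an assumed `radius`) in place of the paper's soft-prototype construction.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def consensus_pseudo_label(
    votes: Sequence[Sequence[int]], min_agreement: float = 1.0
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split unlabelled samples into pseudo-labelled ones and hard ones.

    votes[m][i] is the class predicted by base model m for sample i.
    A sample is pseudo-labelled with its majority class only if the fraction
    of base models voting for that class is at least min_agreement
    (1.0 means unanimous; this threshold is an illustrative assumption).
    """
    n_models = len(votes)
    n_samples = len(votes[0])
    accepted: List[Tuple[int, int]] = []  # (sample index, pseudo-label)
    hard: List[int] = []  # indices of samples without sufficient consensus
    for i in range(n_samples):
        counts = Counter(votes[m][i] for m in range(n_models))
        label, count = counts.most_common(1)[0]
        if count / n_models >= min_agreement:
            accepted.append((i, label))
        else:
            hard.append(i)
    return accepted, hard


def summarise_hard_samples(
    samples: Sequence[Sequence[float]], radius: float
) -> List[Tuple[List[float], int]]:
    """Greedily group hard samples into unlabelled prototypes.

    Each prototype is (centre, support). A sample joins its nearest prototype
    if it lies within `radius` (Euclidean distance); otherwise it starts a new
    one. This is a hypothetical stand-in for soft-prototype construction.
    """
    prototypes: List[Tuple[List[float], int]] = []
    for x in samples:
        best, best_dist = -1, math.inf
        for j, (centre, _) in enumerate(prototypes):
            d = math.dist(centre, x)
            if d < best_dist:
                best, best_dist = j, d
        if best >= 0 and best_dist <= radius:
            centre, n = prototypes[best]
            # Incremental mean update of the prototype centre.
            new_centre = [c + (xi - c) / (n + 1) for c, xi in zip(centre, x)]
            prototypes[best] = (new_centre, n + 1)
        else:
            prototypes.append((list(x), 1))
    return prototypes


if __name__ == "__main__":
    # Three base models voting on four unlabelled samples.
    votes = [[0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 1, 1]]
    accepted, hard = consensus_pseudo_label(votes)
    print(accepted, hard)
    # Hard samples would be summarised for later expert labelling.
    print(summarise_hard_samples([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]], radius=1.0))
```

With unanimous agreement, samples 0 and 1 receive pseudo-labels 0 and 1, while samples 2 and 3 are treated as hard. Relaxing `min_agreement` trades more pseudo-labelled data for a higher risk of labelling errors, which is the tension the consensus rule is meant to manage.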