Abstract
Advanced technologies have accelerated the collection and storage of unlabeled, high-dimensional data, such as visual data. Manually annotating these large datasets is time-consuming and offers only a temporary solution, since annotations apply to a particular dataset. Machine learning algorithms are capable of managing and analysing this large volume of data. However, because of the high dimensionality, traditional clustering techniques are inadequate.
In this thesis, we aim to overcome the limitations of traditional clustering methods by proposing a deep framework trained in two phases. The first phase consists of a generative adversarial network (GAN) used as a feature extraction pipeline. To improve the GAN's capacity, we introduce pre-defined kernel filters that encourage the identification of visual edges. The second phase deploys an auxiliary classifier that clusters the extracted features. Although the proposed framework demonstrates promising results, the two-phase training increases the computational burden and the number of hyper-parameters.
Therefore, we develop an innovative deep clustering framework that can be optimised in a single training phase. The GAN module is replaced with a grouping-based self-supervised learning (SSL) strategy, and by leveraging its learning process, clustering is achieved in real time by incorporating mutual information as an objective function. Despite the high accuracy that this framework achieves, its SSL strategy relies on transformation schemes that apply exclusively to visual data.
To address this constraint, our final work develops a generic SSL method that does not require an explicit transformation scheme to be defined for particular datasets. To accomplish this, we introduce an internal transformation mechanism whose formulation is not limited to specific data types and can be widely applied to visual, audio, text, or mass spectrometry data.
All our proposed frameworks are evaluated with detailed ablation studies and compared with the latest state-of-the-art methods. They demonstrate high clustering precision and effective representation learning across a broad range of datasets.