Abstract
Head and neck squamous cell carcinoma (HNSCC) requires timely, accurate detection, yet expert
PET/CT interpretation is challenged by low-contrast lesions, heterogeneous presentation, and
severe class imbalance. This thesis develops methods that jointly enhance detection performance,
quantify predictive uncertainty, and deliver validated, slice-level localisation suitable for triage.
First, we investigate Capsule Networks (CapsNets), systematically varying Primary Capsule
dimensionality to study its effect on efficiency and accuracy. The results show that capsule
configuration is a critical, tunable hyperparameter rather than a fixed specification.
Second, we evaluate CT-based classification. For slice-level tumour detection, convolutional and
transformer baselines outperform CapsNets (CNN: AUROC 0.90, accuracy 0.832, sensitivity 0.961;
ViT: AUROC 0.89, accuracy 0.676, sensitivity 0.987; CapsNet: AUROC 0.640, accuracy 0.506,
sensitivity 0.914). For seven-class primary site classification, initially high scores (accuracy 0.8553,
AUCs ≥ 0.97) were attributable to patient-level leakage, motivating strict patient-wise splitting
and explainability checks.
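Patient-wise splitting can be illustrated with a minimal sketch. The record layout, the function name `patient_wise_split`, and the 20% test fraction are illustrative assumptions, not the pipeline used in this thesis; the only point is that every slice from a given patient lands in exactly one partition, so no patient contributes to both training and evaluation.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (patient_id, slice_index).
Slice = Tuple[str, int]


def patient_wise_split(
    slices: Sequence[Slice], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Slice], List[Slice]]:
    """Split slices so that each patient appears in only one partition."""
    # Group all slices under the patient they came from.
    by_patient: Dict[str, List[Slice]] = defaultdict(list)
    for record in slices:
        by_patient[record[0]].append(record)

    # Shuffle patients (not slices) with a fixed seed for reproducibility.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    # Hold out a fraction of patients, at least one.
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [s for p in patients if p not in test_patients for s in by_patient[p]]
    test = [s for p in patients if p in test_patients for s in by_patient[p]]
    return train, test


# Toy data: 10 patients with 5 slices each.
slices = [(f"P{p:02d}", i) for p in range(10) for i in range(5)]
train, test = patient_wise_split(slices)
train_ids = {pid for pid, _ in train}
test_ids = {pid for pid, _ in test}
```

A slice-level random split of the same toy data would almost certainly place slices from one patient on both sides, which is the leakage pattern described above.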
Third, we propose a multi-view PET classifier that fuses axial, coronal, and sagittal planes via
cross-attention, exploiting complementary spatial context. Relative to single-view transformers,
sensitivity improved by up to 0.40, reaching 0.94. Interpretability is supported by DeepLIFT
with connected-component analysis (localisation sensitivity 0.41 at 1 false positive per slice), and
calibrated uncertainty estimates are obtained via test-time augmentation.
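The idea behind test-time augmentation can be sketched as follows. This is a simplified illustration under stated assumptions: `toy_model` is a hypothetical stand-in for a trained classifier, the flip augmentations are chosen only for brevity, and the sketch reports the spread of predictions across augmented views without performing any calibration step.

```python
import statistics
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]  # toy 2D image as nested lists


def identity(img: Image) -> Image:
    return [row[:] for row in img]


def flip_horizontal(img: Image) -> Image:
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    return [row[:] for row in img[::-1]]


def tta_predict(
    model: Callable[[Image], float],
    img: Image,
    augmentations: Sequence[Callable[[Image], Image]],
) -> Tuple[float, float]:
    """Return the mean prediction and its spread over augmented views."""
    probs = [model(aug(img)) for aug in augmentations]
    # Mean is the aggregated prediction; population std is the uncertainty proxy.
    return statistics.fmean(probs), statistics.pstdev(probs)


def toy_model(img: Image) -> float:
    # Hypothetical classifier: mean intensity of the left half, clipped to [0, 1].
    half = len(img[0]) // 2
    left = [v for row in img for v in row[:half]]
    return min(1.0, max(0.0, statistics.fmean(left)))


AUGS = [identity, flip_horizontal, flip_vertical]
symmetric = [[0.5, 0.5], [0.5, 0.5]]
lopsided = [[0.9, 0.1], [0.9, 0.1]]
mean_sym, spread_sym = tta_predict(toy_model, symmetric, AUGS)
mean_lop, spread_lop = tta_predict(toy_model, lopsided, AUGS)
```

In this toy setting the symmetric image yields identical predictions under every view (zero spread), while the lopsided image yields predictions that disagree, which is the signal a test-time augmentation scheme treats as higher uncertainty.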
Collectively, these contributions demonstrate that multi-view fusion, coupled with rigorous
evaluation and validated explainability, yields uncertainty-aware, interpretable models for HNSCC triage.
Future work includes external, multi-institutional validation, extension to CT-only fusion, PET/CT
multimodal integration, and assessment of generalisability to additional cancers.