Abstract
Recent advances in artificial intelligence (AI) have revealed patterns in pathology images that are imperceptible to human observers and that can improve diagnostic accuracy and decision support systems. However, progress has been limited by the scarcity of publicly available medical and veterinary images. To address this scarcity, we explore Instagram as a novel source of expert-annotated pathology images. Using a combination of classifiers, large language models, and manual filtering, we curated the IPATH dataset from Instagram, comprising 45,609 pathology image-text pairs. To demonstrate the value of this dataset, we developed a multimodal AI model, IP-CLIP, by fine-tuning a pre-trained CLIP model on IPATH. In a zero-shot classification task on an external histopathology dataset, IP-CLIP outperformed the original CLIP model, achieving an F1 score of 0.71 compared with 0.31 for the baseline. These findings demonstrate the effectiveness of the IPATH dataset and underscore the potential of social media data for developing robust AI models for medical and veterinary image classification and for improving diagnostic accuracy.
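For readers unfamiliar with CLIP-style zero-shot evaluation, the following is a minimal sketch of the general idea, not the authors' evaluation code. It assumes each class is described by a text prompt, that both the image and the prompts have already been mapped into a shared embedding space, and that the predicted label is the prompt with the highest cosine similarity. The three-dimensional embeddings, the class names `tumor`/`normal`, and the use of macro-averaged F1 are all hypothetical; the abstract does not state which F1 averaging was used.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_predict(image_emb: Sequence[float],
                      class_text_embs: Dict[str, Sequence[float]]) -> str:
    """Pick the class whose prompt embedding is most similar to the image embedding."""
    return max(class_text_embs, key=lambda c: cosine(image_emb, class_text_embs[c]))


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy embeddings standing in for CLIP text-encoder outputs
# of prompts such as "an H&E image of tumor tissue".
text_embs = {"tumor": [1.0, 0.0, 0.2], "normal": [0.0, 1.0, 0.1]}
# Hypothetical toy embeddings standing in for CLIP image-encoder outputs.
images = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.6, 0.5, 0.3]]
y_true = ["tumor", "normal", "normal"]

preds = [zero_shot_predict(img, text_embs) for img in images]
score = macro_f1(y_true, preds)
print(preds, round(score, 3))
```

Because no labeled training examples are used at inference time, fine-tuning on IPATH can only help by moving the image and text encoders toward a shared embedding space in which pathology images land closer to the correct descriptive prompt.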