Expertise
Dr Andrew Gilbert is an Associate Professor in Machine Learning at the University of Surrey, where he co-leads the interdisciplinary Centre for Creative Arts and Technologies (C-CATS). His research lies at the intersection of computer vision, generative modelling, and multimodal learning, with a particular focus on building interpretable and human-centred AI systems. His work aims to develop machines that not only see and recognise the world, but also understand and creatively respond to it.
Dr Gilbert has made significant contributions to the fields of video understanding, long-form video captioning, visual style modelling, and AI-driven story understanding. A distinctive feature of his research is its integration into the creative industries, applying technical advances to domains such as media production, performance capture, and digital arts. From training models to classify genre from movie trailers to designing systems that can generate synthetic images and narrative content, his work consistently pushes the boundaries of how AI can support and enhance human creativity.
He leads a vibrant and diverse team of PhD students, collaborating on cutting-edge projects in areas such as self-supervised learning from video, video diffusion models, and multimodal scene understanding. Many of these projects are conducted in close partnership with creative practitioners, industry partners, and other academic disciplines, reflecting Dr Gilbert’s commitment to interdisciplinary and impact-driven research.
In addition to his research leadership, Dr Gilbert is an active contributor to the UK computer vision community. He serves on the British Machine Vision Association (BMVA) Executive Committee, where he organises national technical meetings to foster collaboration between academia and industry. Through this work, he helps shape the research agenda for future AI systems that are explainable, responsible, and aligned with human values.