Abstract
The development of stable antibodies formed by compatible heavy (H) and light (L) chain pairs is crucial in both the in vivo maturation of antibody-producing cells and the ex vivo design of therapeutic antibodies. We present here a novel machine learning framework, ImmunoMatch, for deciphering the molecular rules governing the pairing of antibody chains. Built by fine-tuning an antibody-specific language model, ImmunoMatch learns from paired H and L sequences from single human B cells to distinguish cognate H-L pairs from randomly paired sequences. We find that the predictive performance of ImmunoMatch improves when separate models are trained on the two types of antibody L chains in humans, κ and λ, consistent with the in vivo mechanism of B cell development in the bone marrow. Using ImmunoMatch, we illustrate that refinement of H-L chain pairing is a hallmark of B cell maturation in both healthy and disease conditions. We further find that ImmunoMatch is sensitive to sequence differences at the H-L interface. ImmunoMatch focusses on H-L chain pairing as a specific, under-explored problem in antibody developability, and facilitates the computational assessment and modelling of stably assembled immunoglobulins towards large-scale optimisation of efficacious antibody therapeutics.