Abstract
Cross-lingual alignment of nuanced sociological concepts is crucial for comparative cross-cultural research, harmonising longitudinal studies, and leveraging knowledge from social science taxonomies (e.g., ELSST). However, aligning these concepts is challenging due to cultural context-dependency, linguistic variation, and data scarcity, particularly for low-resource languages. Existing methods often fail to capture domain-specific subtleties or require extensive parallel data. Grounded in a Vector Decomposition Hypothesis—positing separable domain and language components within embeddings, supported by observed language-pair specific geometric structures—we propose DLIR (Dual-Branch LoRA for Invariant Representation). DLIR employs parallel Low-Rank Adaptation (LoRA) branches: one captures core sociological semantics (trained primarily on English data structured by the ELSST hierarchy), while the other learns language invariance by counteracting specific language perturbations. These perturbations are modeled by Gaussian Mixture Models (GMMs) fitted on minimal parallel concept data using spherical geometry. DLIR significantly outperforms strong baselines on cross-lingual sociological concept retrieval across 10 languages. Demonstrating powerful knowledge transfer, English-trained DLIR substantially surpasses target-language (French/German) LoRA fine-tuning even on monolingual tasks. DLIR learns disentangled, language-robust representations, advancing resource-efficient multilingual understanding and enabling reliable cross-lingual comparison of sociological constructs.
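To make the dual-branch idea concrete, the following is a minimal PyTorch sketch of how two parallel low-rank adapters could be combined with a frozen encoder's embedding: one branch for domain (sociological) semantics and one for language invariance. The class names (`LoRABranch`, `DualBranchAdapter`), the placement of the adapters on the pooled sentence embedding rather than inside attention layers, and the rank/alpha defaults are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LoRABranch(nn.Module):
    """Standard low-rank adapter update: scale * (h @ A^T @ B^T)."""

    def __init__(self, dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, dim) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(dim, rank))          # up-projection (zero-init)
        self.scale = alpha / rank

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, dim) -> low-rank residual of the same shape
        return self.scale * (h @ self.A.T @ self.B.T)


class DualBranchAdapter(nn.Module):
    """Hypothetical dual-branch adapter: domain branch plus language-invariance
    branch, added residually to a frozen encoder's sentence embedding."""

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.domain_branch = LoRABranch(dim, rank)
        self.language_branch = LoRABranch(dim, rank)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: frozen-encoder embedding, shape (batch, dim)
        return h + self.domain_branch(h) + self.language_branch(h)


# Usage sketch: adapt 768-dim embeddings from a frozen multilingual encoder.
adapter = DualBranchAdapter(dim=768, rank=8)
embeddings = torch.randn(4, 768)  # stand-in for frozen-encoder outputs
adapted = adapter(embeddings)      # (4, 768) language-robust representations
```

In this reading, only the two low-rank branches are trained while the base encoder stays frozen; the language branch would be optimized against GMM-sampled language perturbations so that the combined representation remains stable across languages.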