Omni-C: Compressing Heterogeneous Modalities into a Single Dense Encoder

Kin Lau, Yasar Abbas Ur Rehman, Po Lai-Man, and Pedro Porto Buarque de Gusmão
Cornell University Library, arXiv.org
27/02/2026

Keywords: Audio data; Coders; Inference; Optimization; Parameters

Abstract
Recent multimodal systems often rely on separate expert encoders per modality, whose complexity and computational overhead scale linearly with the number of modalities. While unified Omni-models address this via Mixture-of-Experts (MoE) architectures with specialized experts and routing, they still inflate parameter counts and introduce routing overhead. In this paper, we propose Omni-C (Omni-Compress), a single dense Transformer-based encoder that learns competitive shared representations across heterogeneous modalities (images, audio, and text) through unimodal contrastive pretraining on large-scale unaligned data. By maximizing parameter sharing in the backbone and using lightweight modality-specific projection heads, Omni-C effectively mitigates inter-modality conflicts without requiring MoE layers, paired supervision, or routing. This design supports efficient deployment on memory-constrained systems via sequential modality processing and low-memory inference, eliminating the need for parallel expert loading or specialized hardware. Experiments show that Omni-C achieves performance comparable to expert models on unimodal and cross-modal tasks, with modest zero-shot degradation on audio and text that is largely recovered through lightweight linear probing or parameter-efficient fine-tuning. The unified architecture substantially reduces inference memory usage compared to multi-encoder baselines, advancing efficient and scalable multimodal learning.
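
Although this record does not include the paper's implementation, the design the abstract describes can be illustrated with a minimal PyTorch sketch: a single dense Transformer backbone shared across all modalities, lightweight per-modality projection heads, and sequential one-modality-at-a-time inference. All module names, dimensions, and the mean-pooling step below are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn

class OmniCStyleEncoder(nn.Module):
    """One dense Transformer backbone shared by all modalities; no MoE, no routing."""

    def __init__(self, dim: int = 256, depth: int = 4, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        # Lightweight modality-specific projection heads on top of the shared backbone.
        self.proj = nn.ModuleDict({
            "image": nn.Linear(dim, dim),
            "audio": nn.Linear(dim, dim),
            "text": nn.Linear(dim, dim),
        })

    def forward(self, tokens: torch.Tensor, modality: str) -> torch.Tensor:
        # `tokens`: (batch, seq_len, dim) embeddings; the per-modality
        # tokenizers/patch embedders are omitted from this sketch.
        feats = self.backbone(tokens)        # shared representation
        pooled = feats.mean(dim=1)           # simple mean pooling (an assumption)
        return self.proj[modality](pooled)   # per-modality projection

# Sequential modality processing: one input stream at a time through a single
# set of backbone weights, so no parallel expert encoders need to be loaded.
encoder = OmniCStyleEncoder().eval()
with torch.no_grad():
    for modality, batch in [("image", torch.randn(2, 196, 256)),
                            ("audio", torch.randn(2, 100, 256)),
                            ("text", torch.randn(2, 32, 256))]:
        emb = encoder(batch, modality)
        print(modality, tuple(emb.shape))

Because the backbone is dense and shared, memory at inference is dominated by one set of weights plus one modality's activations, in contrast to multi-encoder or MoE systems that hold several experts in memory.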
