Abstract
We study the problem of secure joint source-channel coding for multimodal semantic sources transmitted over noisy wiretap channels. The source model consists ofmmodalities (e.g., image, audio, and sensor data), all represented as random variables. The encoder observes independent and identically distributed samples of an arbitrary non-empty subset of modalities. The samples are encoded and transmitted over a discrete memoryless wiretap channel. The legitimate receiver reconstructs all modalities. We extend the rate-distortion-perception problem formulation to multimodal sources. We establish converse and achievability bounds on the fundamental limits of transmission rate, fidelity, and secrecy, under per-modality distortion and perception constraints, and per-subset equivocation constraints. We show that the fundamental limit for secrecy consists of three operationally distinct components: the level of compression, the secret key rate, and the statistics of the wiretap channel.