Abstract
Genes carry the core information for producing the functional molecules that drive cellular processes. This information is translated into action through multiple tightly regulated stages, such as transcription and translation. Throughout these processes, genes interact with various cellular components, such as transcription factors. The resulting gene products often form intricate networks, participating in signaling cascades or assembling into complex molecular structures. Capturing and understanding these interactions is crucial for deciphering cellular activities and has significant implications for the development of drugs and treatments for disease.
Traditional experimental approaches to studying gene interactions typically focus on specific molecular pairs, limiting the scope of the analysis. However, advances in transcriptomic technologies have enabled the exploration of gene interactions on a much larger scale. Among these, single-cell RNA sequencing (scRNAseq) has emerged as a powerful technique for measuring gene expression in individual cells across multiple populations, offering unprecedented insight into the complexity of gene interaction networks.
To extract meaningful interaction patterns from scRNAseq data, graphical modeling approaches have become popular. Unlike traditional correlation-based methods, which often capture both direct and indirect associations, graphical models aim to infer direct interactions by accounting for the influence of other genes. In this thesis, graphical modeling is employed to mitigate such third-party influences and identify direct gene-gene interactions. Because count data from scRNAseq experiments can contain an excess of zero counts (dropouts), a Zero-Inflated Negative Binomial (ZINB) framework is proposed to account for these dropout zeros while maintaining network estimation performance on data without dropouts. Through simulation benchmarks and experimental data analysis, the ZINB framework demonstrates robust performance in inferring gene networks in large-scale, high-dimensional settings.
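As a minimal sketch of what zero inflation means, assuming the common mixture parameterization (dropout probability pi, negative binomial mean mu and dispersion theta; this is not necessarily the parameterization used in the thesis's ZINB framework), the ZINB probability mass function adds extra mass at zero: P(Y = 0) = pi + (1 - pi) NB(0; mu, theta) and P(Y = y) = (1 - pi) NB(y; mu, theta) for y > 0. The function names below are hypothetical.

```python
import math


def nb_logpmf(y: int, mu: float, theta: float) -> float:
    """Log-pmf of a negative binomial with mean mu and dispersion (size) theta."""
    return (
        math.lgamma(y + theta)
        - math.lgamma(theta)
        - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_pmf(y: int, mu: float, theta: float, pi: float) -> float:
    """ZINB pmf (hypothetical helper): with probability pi the count is a
    dropout zero; otherwise it is drawn from NB(mu, theta)."""
    nb = math.exp(nb_logpmf(y, mu, theta))
    if y == 0:
        # Zeros come from two sources: dropouts and genuine NB zeros.
        return pi + (1.0 - pi) * nb
    return (1.0 - pi) * nb


# Compare the probability of observing a zero with and without inflation.
p0_nb = math.exp(nb_logpmf(0, mu=5.0, theta=2.0))
p0_zinb = zinb_pmf(0, mu=5.0, theta=2.0, pi=0.3)
```

With these illustrative parameters the zero probability rises from (2/7)^2, about 0.08, under the plain negative binomial to 5/14, about 0.36, under the zero-inflated model, which is the kind of excess of zeros a dropout-aware model is meant to absorb.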
Building on standard graphical modeling, the thesis further explores joint graphical modeling, which enables the simultaneous estimation of multiple related gene networks. The computational demands of current joint graphical modeling methods hinder their application to large-scale analyses. To address this challenge, a new approach called two-target linear covariance shrinkage is proposed, leading to the development of the JointStein framework. Simulation studies show that JointStein reduces computational time and memory requirements while maintaining relatively high performance compared with existing methods. Its potential is further illustrated in analyses of scRNAseq data from glioblastoma and malaria studies, which reveal relevant modulation of gene expression and gene interactions across different cell populations.
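To make the idea of two-target linear shrinkage concrete, the sketch below forms the convex combination (1 - lam1 - lam2) S + lam1 T1 + lam2 T2 of a sample covariance matrix S and two targets T1, T2, then converts the shrunken covariance into partial correlations, the standard measure of direct association in Gaussian graphical models. The choice of targets (a diagonal target and a covariance computed from two stacked populations), the fixed intensities lam1 and lam2, the random data, and all function names are illustrative assumptions, not details of the JointStein framework, which would presumably estimate the intensities from the data.

```python
import numpy as np


def two_target_shrinkage(S, T1, T2, lam1, lam2):
    """Convex combination (1 - lam1 - lam2) * S + lam1 * T1 + lam2 * T2."""
    if lam1 < 0 or lam2 < 0 or lam1 + lam2 > 1:
        raise ValueError("need lam1, lam2 >= 0 and lam1 + lam2 <= 1")
    return (1.0 - lam1 - lam2) * S + lam1 * T1 + lam2 * T2


def partial_correlations(sigma):
    """Partial correlations -w_ij / sqrt(w_ii * w_jj) from Omega = inv(sigma)."""
    omega = np.linalg.inv(sigma)
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


rng = np.random.default_rng(0)
n, p = 20, 50  # fewer cells than genes, so the sample covariance is singular
X_a = rng.standard_normal((n, p))  # hypothetical population A (cells x genes)
X_b = rng.standard_normal((n, p))  # hypothetical related population B

S_a = np.cov(X_a, rowvar=False)  # sample covariance of population A
S_pooled = np.cov(np.vstack([X_a, X_b]), rowvar=False)  # target 2: shared info
T_diag = np.diag(np.diag(S_a))  # target 1: diagonal (no gene-gene covariance)

# Hand-picked intensities for illustration only.
sigma_a = two_target_shrinkage(S_a, T_diag, S_pooled, lam1=0.3, lam2=0.3)
pcor_a = partial_correlations(sigma_a)
```

Because the diagonal target is positive definite and receives positive weight, the shrunken matrix is invertible even though S_a alone is not, which is what makes the partial correlations computable when genes outnumber cells.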