Large-Scale Mediation Effect Signal Detection

Mediation analysis is commonly used to assess the mediating role of DNA methylation in the causal pathway from an exposure to a clinical outcome. With high-dimensional mediators, it is important to identify the true mediators. In this paper, we focus on the problem of multiple testing for mediation effects with false discovery rate (FDR) control. Previous studies have tried to approximate the underlying true $p$-values; however, this is difficult because of the composite nature of the null hypothesis. Instead, we introduce a novel procedure named Mediation Identification by Splitting and Aggregation (MISA), a data-driven approach that constructs new test statistics with a marginal symmetry property under the null hypothesis and then uses this symmetry to derive a threshold. A notable feature of the proposed procedure is that no $p$-value is required. Moreover, the MISA procedure is computationally efficient: its construction uses only a single split of the data and a product of two mediation effect estimators. The method achieves exact FDR control in finite samples when the populations are symmetric and independent, regardless of the number of tests or the sample sizes. It also controls the FDR asymptotically for asymmetric populations, even when the test statistics are dependent. Our procedure can also detect group mediation effect signals. Empirical results show that the resulting method has better FDR control than existing testing methods, while maintaining reasonably good detection ability. We apply the proposed method to identify DNA methylation sites that mediate the effect of age at diagnosis on lung adenocarcinoma.
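To illustrate how a symmetry property can yield a threshold without $p$-values, the following is a minimal sketch of a generic symmetry-based selection rule, not the exact MISA construction. It assumes that, for each candidate mediator, one mediation effect estimate is available from each half of a single data split, and that the test statistic is the product of the two estimates, so that null statistics are roughly symmetric about zero while true signals tend to be large and positive. The names `symmetry_threshold`, `select_mediators`, `est_half1`, and `est_half2` are hypothetical, and the `1 +` correction in the numerator is an assumption borrowed from standard symmetry-based FDR rules.

```python
def symmetry_threshold(stats, alpha):
    """Smallest t > 0 whose estimated false discovery proportion is <= alpha.

    The count of statistics below -t serves as a proxy for the number of
    null statistics above t, which is justified only if null statistics
    are symmetric about zero (an assumption of this sketch).
    """
    candidates = sorted({abs(w) for w in stats if w != 0})
    for t in candidates:
        n_neg = sum(1 for w in stats if w <= -t)  # proxy for false discoveries
        n_pos = sum(1 for w in stats if w >= t)   # discoveries at threshold t
        if (1 + n_neg) / max(n_pos, 1) <= alpha:
            return t
    return float("inf")  # no threshold meets the target level


def select_mediators(est_half1, est_half2, alpha):
    """Return indices of mediators selected at nominal FDR level alpha.

    est_half1 and est_half2 are hypothetical per-mediator effect estimates
    computed on the two halves of a single data split.
    """
    stats = [a * b for a, b in zip(est_half1, est_half2)]
    t = symmetry_threshold(stats, alpha)
    return [j for j, w in enumerate(stats) if w >= t]


# Toy estimates (made up for illustration): mediators 0, 1, and 6 carry signal.
est_half1 = [3.0, 4.0, -1.0, 1.0, -1.0, 1.0, 5.0, 1.0]
est_half2 = [3.0, 3.0, 1.0, -1.0, 1.0, 1.0, 4.0, -2.0]
selected = select_mediators(est_half1, est_half2, alpha=0.4)
print(selected)  # [0, 1, 6]
```

In this toy example the products for the three signal mediators are large and positive, whereas the remaining products scatter around zero with both signs, so the negative tail provides the estimate of false discoveries that determines the threshold.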