--- title: "SMUT Package: Testing the Mediation Effect of Multiple SNPs on an Outcome Through a Mediator" author: "Wujuan Zhong" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{SMUT} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## 0. Installation You can install the released version of SMUT from [CRAN](https://CRAN.R-project.org) with: ```{r, echo=TRUE, eval=FALSE} install.packages("SMUT") ``` And the development version from [GitHub](https://github.com/) with: ```{r, echo=TRUE, eval=FALSE} # install.packages("devtools") devtools::install_github("wjzhong/SMUT") ``` ## 1. Overview SMUT package has functions to test for mediation effects of multiple SNPs (G) on a continuous, binary, count or time-to-event outcome (Y) through a continuous mediator (M). Besides SNPs data, these functions can also be applied to test mediation effects in other fields. In general, SMUT package has functions to test mediation effects of multiple continuous variables (G) on a continuous, binary, count or time-to-event outcome (Y) through a continuous mediator (M). ## 2. Multi-SNP Mediation Intersection-Union Test (SMUT) : Testing mediation effects of multiple SNPs on a continuous outcome through a continuous mediator SMUT method leverages intersection-union test (IUT) to decompose mediation effects into two separate regression models: a mediator model and an outcome model. ### 2.1 Mediator model $$M = \alpha_1 + X\iota^M + G\beta + \epsilon_1$$ where $M$ is a $n$ by $1$ vector of the mediator, $\alpha_1$ is the intercept, $G$ is a $n$ by $q$ matrix of the SNPs data, $\beta$ is a $q$ by $1$ vector of SNPs' effects on the mediator, $X$ is a $n$ by $p$ matrix of covariates data, $\iota^M$ is a $p$ by $1$ vector of covariates effects' on the mediator, $\epsilon_1$ is a $n$ by $1$ vector of error terms in the mediator model. ### 2.2 Outcome model $$Y = \alpha_2 + M\theta + X\iota^Y + G\gamma + \epsilon_2$$ where $Y$ is a $n$ by $1$ vector of the outcome, $\alpha_2$ is the intercept, $M$ is the mediator, $\theta$ is the mediator's effect on the outcome, $X$ is the covariates data, $\iota^Y$ is a $p$ by $1$ vector of covariates effects' on the outcome, $G$ is the SNPs data, $\gamma$ is a $q$ by $1$ vector of SNPs' effects on the outcome, $\epsilon_2$ is a $n$ by $1$ vector of error terms in the outcome model. ### 2.3 Testing $\beta$ in the mediator model and $\theta$ in the outcome model SMUT method adopts SKAT method (Wu et al., 2011) to test $H^\beta_0: \beta=0$ versus $H^\beta_1: \beta \neq 0$ in the mediator model and adopts Expectation-Maximization (EM) algorithm as well as score statistics to test $H^\theta_0:\theta=0$ versus $H^\theta_1:\theta \neq 0$ in the outcome model. Suppose the p value of testing $\beta=0$ is $p_\beta$ and the p value of testing $\theta=0$ is $p_\theta$. SMUT method uses the hypotheses $H_0: \beta\theta = 0$ versus $H_1: \beta\theta \neq 0$ to test the mediation effect. By leveraging IUT, the p value of the medition effect is $p_{IUT}=\max(p_\beta, p_\theta)$. ### 2.4 Example of running R function SMUT The example data has a genotype matrix (G) of 100 individuals with 200 SNPs. ```{r, echo=TRUE} library(SMUT) dim(Genotype_data) Genotype_data[1:3,1:4] ``` We generate one continuous mediator and one continuous outcome based on this genotype matrix, as well as two covariates. ```{r, echo=TRUE} N_individual = nrow(Genotype_data) N_SNPs = ncol(Genotype_data) set.seed(1) # generate two covariates X1 = rnorm(N_individual, 2, 3) X2 = sample(c(0,1), N_individual, replace = TRUE) X = cbind(X1, X2) # generate coefficients: iota_M, iota_Y, beta, theta and gamma iota_M = c(0.3,0.5) iota_Y = c(0.2,0.6) beta = rnorm(N_SNPs, 1, 2) theta = 1.2 gamma = rnorm(N_SNPs, 0.5, 2) # generate error terms e1 = rnorm(N_individual, 0, 1) e2 = rnorm(N_individual, 0, 1) # generate the mediator mediator = 1 + X %*% iota_M + Genotype_data %*% beta + e1 # generate the outcome outcome = 2 + mediator*theta + X %*% iota_Y + Genotype_data %*% gamma + e2 ``` We apply SMUT function to test the mediation effect. ```{r, echo=TRUE} result_continuous = SMUT(G = Genotype_data, mediator = mediator, outcome = outcome, covariates = X) print( unlist( result_continuous )) ``` The warning messages are generated by the SKAT function in the R package SKAT. From the result, we can see that the p value of the mediation effect (*p_value_IUT*) is 0.02595523, which is the maximum of the p value of testing $\beta$ (*p_value_beta*) and the p value of testing $\theta$ (*p_value_theta*). ### 2.5 References Full details of the SMUT method can be found in the manuscript: Zhong, W., Spracklen, C.N., Mohlke, K.L., Zheng, X., Fine, J. and Li, Y., 2019. Multi-SNP mediation intersection-union test. *Bioinformatics*, 35(22), pp.4724-4729. ## 3. Generalized Multi-SNP Mediation Intersection-Union Test (GSMUT) : Testing mediation effects of multiple SNPs on a continuous, binary, count or time-to-event outcome through a continuous mediator GSMUT method leverages intersection-union test (IUT) to decompose mediation effects into two separate regression models: a mediator model and an outcome model. ### 3.1 Mediator model The mediator model in the GSMUT method is the same as the one in the SMUT method. $$M = \alpha_1 + X\iota^M + G\beta + \epsilon_1$$ ### 3.2 Outcome model #### 3.2.1 For a continuous, binary or count outcome $$g\{E(Y|\gamma)\} = \alpha_2 + M\theta + X\iota^Y + G\gamma $$ where $g$ is the link function ,$Y$ is a $n$ by $1$ vector of the outcome, $\alpha_2$ is the intercept, $M$ is the mediator, $\theta$ is the mediator's effect on the outcome, $X$ is the covariates data, $\iota^Y$ is a $p$ by $1$ vector of covariates effects' on the outcome, $G$ is the SNPs data, $\gamma$ is a $q$ by $1$ vector of SNPs' effects on the outcome. #### 3.2.2 For a time-to-event outcome $$ \lambda(t)=\lambda_0(t)\exp(M\theta + X\iota^Y + G\gamma) $$ where $\lambda(t)$ is the hazard function, $\lambda_0(t)$ is an unspecified baseline hazard function. ### 3.3 Testing $\beta$ in the mediator model and $\theta$ in the outcome model GSMUT method adopts SKAT method to test $H^\beta_0: \beta=0$ versus $H^\beta_1: \beta \neq 0$ in the mediator model and adopts likelihood ratio statistics to test $H^\theta_0:\theta=0$ versus $H^\theta_1:\theta \neq 0$ in the outcome model. Suppose the p value of testing $\beta=0$ is $p_\beta$ and the p value of testing $\theta=0$ is $p_\theta$. GSMUT method uses the hypotheses $H_0: \beta\theta = 0$ versus $H_1: \beta\theta \neq 0$ to test the mediation effect. By leveraging IUT, the p value of the medition effect is $p_{IUT}=\max(p_\beta, p_\theta)$. In the manuscript, GSMUT is referred to as SMUT_GLM for a continuous, binary or count outcome and as SMUT_PH for a time-to-event outcome. ### 3.4 Example of running R function GSMUT The example data has a genotype matrix (G) of 100 individuals with 200 SNPs. ```{r, echo=TRUE} library(SMUT) dim(Genotype_data) Genotype_data[1:3,1:4] ``` We generate one continuous mediator, one binary outcome and one time-to-event outcome based on this genotype matrix, as well as two covariates. ```{r, echo=TRUE} N_individual = nrow(Genotype_data) N_SNPs = ncol(Genotype_data) set.seed(1) # generate two covariates X1 = rnorm(N_individual, 2, 3) X2 = sample(c(0,1), N_individual, replace = TRUE) X = cbind(X1, X2) # generate coefficients: iota_M, iota_Y, beta, theta and gamma iota_M = c(0.3,0.5) iota_Y = c(0.2,-0.6) beta = rnorm(N_SNPs, 0, 0.5) theta = 1 gamma = rnorm(N_SNPs, 0, 0.3) # generate error terms e1 = rnorm(N_individual, 0, 1) # generate the mediator mediator = 1 + X %*% iota_M + Genotype_data %*% beta + e1 # generate the binary outcome eta = 2 + mediator*theta + X %*% iota_Y + Genotype_data %*% gamma pi = 1/(1+exp( -(eta ) )) binary_outcome = rbinom(length(pi),size=1,prob=pi) # generate the time-to-event outcome based on Weibull baseline hazard v = runif(N_individual) lambda=0.01; rho=1; rateC=0.01 Tlat = (- log(v) / (lambda * exp( eta )))^(1 / rho) # censoring times C = rexp(N_individual, rate=rateC) # follow-up times and event indicators time = pmin(Tlat, C) status = as.numeric(Tlat <= C) survival_outcome = cbind(time,status) colnames(survival_outcome) = c("time","status") ``` We apply GSMUT function to test the mediation effect. ```{r, echo=TRUE} result_binary = GSMUT(G = Genotype_data, mediator = mediator, outcome = binary_outcome, covariates = X, outcome_type = "binary") print( unlist( result_binary ) ) result_survival = GSMUT(G = Genotype_data, mediator = mediator, outcome = survival_outcome, covariates = X, outcome_type = "survival") print( unlist( result_survival )) ``` The warning messages are generated by the SKAT function in the R package SKAT. From the result, we can see that the p value of the mediation effect (*p_value_IUT*) for the binary outcome is 0.000421673; p value of the mediation effect for the time-to-event outcome is 0.02470066. Here the p value of the mediation effect (*p_value_IUT*) is the maximum of the p value of testing $\beta$ (*p_value_beta*) and the p value of testing $\theta$ (*p_value_theta*). And *theta_hat* is the point estimate of the $\theta$ in the outcome model. ### 3.5 Options for the parameter outcome_type in the GSMUT function | Type of outcome | outcome_type | |:------:|:------:| | continuous | "continuous" | | binary | "binary" | | count | "count" | | time-to-event | "survival" | ### 3.6 References Full details of the GSMUT method can be found in the manuscript: Zhong, W., Darville, T., Zheng, X., Fine, J. and Li, Y., 2019. Generalized Multi-SNP Mediation Intersection-Union Test. *Submitted*