--- title: "Compare susie_rss variants" author: "Peter Carbonetto" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Compare susie_rss variants} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- In this vignette, we briefly illustrate the different ways [susie_rss][susie_rss] can be called, and draw connections between running `susie_rss` on summary data, and running `susie` on individual-level data. ```{r setup, include=FALSE} knitr::opts_chunk$set(collapse = TRUE,comment = "#",fig.width = 5, fig.height = 3,fig.align = "center", dpi = 120) ``` ```{r load-pkgs} library(susieR) ``` Simulate a data set with 200 samples and 1,000 variables, in which the only first 4 variables affect the outcome. ```{r simdata} set.seed(1) n <- 200 p <- 1000 beta <- rep(0,p) beta[1:4] <- 1 X <- matrix(rnorm(n*p),nrow = n,ncol = p) X <- scale(X,center = TRUE,scale = FALSE) y <- drop(X %*% beta + rnorm(n)) ``` Compute summary statistics $\hat{b}_j, \hat{s}_j$ and the correlation matrix, ${\bf R}$. These quantities will be provided as input to susie_rss. ```{r sumstats-no-standardize} ss <- univariate_regression(X,y) dat <- compute_suff_stat(X,y,standardize = FALSE) R <- cov2cor(dat$XtX) ``` The susie and susie_rss analyses produce the exact same results when the summary statistics `bhat`, `shat`, `var_y` and `n` are provided to susie_rss (and when `R` is an "in sample" correlation estimate---that is, when it was computed from the same matrix `X` that was used to obtain the other statistics). If the covariate effects are removed from the genotypes in univariate regression, the in-sample LD matrix should compute from the genotype residuals where the covariate effects have been removed. ```{r first-comparison, fig.height=3.5, fig.width=3} res1 <- susie(X,y,L = 10) res2 <- susie_rss(bhat = ss$betahat,shat = ss$sebetahat,R = R,n = n, var_y = var(y),L = 10,estimate_residual_variance = TRUE) plot(coef(res1),coef(res2),pch = 20,xlab = "susie",ylab = "susie_rss") abline(a = 0,b = 1,col = "skyblue",lty = "dashed") ``` When some but not all of these statistics are provided, the results should be similar, but not exactly the same. Next let's compare the susie and susie_rss outputs when ${\bf X}, y$ are *standardized* before computing the summary statistics (by "standardize", we mean that $y$ and the columns of ${\bf X}$ are each divided by the sample standard deviation so that they each have the same standard deviation). ```{r sumstats-standardize-1} ss <- univariate_regression(scale(X),scale(y)) dat <- compute_suff_stat(X,y,standardize = TRUE) R <- cov2cor(dat$XtX) ``` Then we compute the *z*-scores: ```{r sumstats-standardize-2} zhat <- ss$betahat/ss$sebetahat ``` When standardizing, providing susie_rss with summary data `z` (or `bhat`, `shat`), `R` and `n` is sufficient for susie_rss to recover the same results as susie: ```{r second-comparison, fig.height=3.5, fig.width=6, message=FALSE} res1 <- susie(scale(X),scale(y),L = 10) res2 <- susie_rss(bhat = ss$betahat,shat = ss$sebetahat,R = R,n = n, L = 10,estimate_residual_variance = TRUE) res3 <- susie_rss(zhat,R,n = n,L = 10,estimate_residual_variance = TRUE) layout(matrix(1:2,1,2)) plot(coef(res1),coef(res2),pch = 20,xlab = "susie", ylab = "susie_rss(bhat,shat)") abline(a = 0,b = 1,col = "skyblue",lty = "dashed") plot(coef(res1),coef(res3),pch = 20,xlab = "susie",ylab = "susie_rss(z)") abline(a = 0,b = 1,col = "skyblue",lty = "dashed") ``` When the residual variance is not estimated in susie_rss, the susie_rss results may be close to susie, but may no longer be exactly the same: ```{r third-comparison, fig.height=3.5, fig.width=3, message=FALSE} res4 <- susie_rss(zhat,R,n = n,L = 10) plot(coef(res1),coef(res4),pch = 20,xlab = "susie",ylab = "susie_rss") abline(a = 0,b = 1,col = "skyblue",lty = "dashed") ``` Whenever `R` is an "in sample" correlation matrix, we recommend estimating the residual variance. Without providing the sample size, `n`, the coefficients are interpreted as the "noncentrality parameters" (NCPs), and are (roughly) related to the susie parameters by a factor of $\sqrt{n}$: ```{r fourth-comparison, fig.height=3.5, fig.width=3, message=FALSE, warning=FALSE} res5 <- susie_rss(zhat,R,L = 10) plot(coef(res1),coef(res5)/sqrt(n),pch = 20,xlab = "susie", ylab = "susie_rss/sqrt(n)") abline(a = 0,b = 1,col = "skyblue",lty = "dashed") ``` Whenever possible, the sample size, or a reasonable estimate of the sample size, should be provided. [susie_rss]: https://stephenslab.github.io/susieR/reference/susie_rss.html