For each gene in sca, fits the hurdle model given by formula: a linear model for the continuous part (et>0) and a logistic model for the discrete part (et==0 vs et>0).
Returns an object of class ZlmFit
containing slots with the coefficients, variance-covariance matrices, etc.
After each gene, optionally runs the function named by 'hook' on the fit.
zlm(
formula,
sca,
method = "bayesglm",
silent = TRUE,
ebayes = TRUE,
ebayesControl = NULL,
force = FALSE,
hook = NULL,
parallel = TRUE,
LMlike,
onlyCoef = FALSE,
exprs_values = assay_idx(sca)$aidx,
...
)
formula: a formula with the measurement variable on the LHS and predictors present in colData on the RHS
sca: SingleCellAssay object
method: character vector, either 'glm', 'glmer' or 'bayesglm'
silent: silence common problems with fitting some genes
ebayes: if TRUE, regularize the variance using an empirical Bayes method
ebayesControl: list of parameters for the empirical Bayes procedure. See ebayes.
force: should we continue testing genes even after many errors have occurred?
hook: a function called on the fit after each gene (see the sketch after this argument list)
parallel: if TRUE and the mc.cores option is greater than 1, multiple cores will be used in fitting
LMlike: if provided, the model defined in this object will be used rather than the formula. This is intended for internal use.
onlyCoef: if TRUE, only an array of model coefficients will be returned (probably only useful for bootstrapping)
exprs_values: character or integer passed to `assay` specifying which assay to use for testing
...: arguments passed to the S4 model object upon construction, for example fitArgsC and fitArgsD, or coefPrior
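As a minimal sketch of a hook (hypothetical, not from the package documentation), one could simply record each gene's log-likelihood; this assumes logLik() is available for the fitted LMlike object passed to the hook, and the per-gene results are collected in the hookOut slot of the returned ZlmFit:

## Sketch (not run): a hypothetical hook recording each gene's log-likelihood
# llHook <- function(fit) logLik(fit)
# zlmHooked <- zlm(~ Stim.Condition, subset(vbetaFA, ncells==1)[1:10,], hook = llHook)
# zlmHooked@hookOut   # one entry per gene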
an object of class ZlmFit
with methods to extract coefficients, etc.,
OR, if data is a data.frame,
just a list of the discrete and continuous fits.
The empirical Bayes regularization of the gene variance assumes that the precision (1/variance) is drawn from a
gamma distribution with unknown parameters.
These parameters are estimated by considering the distribution of sample variances over all genes.
The procedure used for this is determined by ebayesControl, a named list with components 'method' (one of 'MOM' or 'MLE') and 'model' (one of 'H0' or 'H1').
Method 'MOM' uses a method-of-moments estimator, while 'MLE' uses the marginal likelihood.
Model 'H0' estimates the precisions using the intercept alone in each gene, while 'H1' fits the full model specified by formula.
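For instance (a sketch only, not run, reusing the vbetaFA data from the examples below), method-of-moments shrinkage under the full model could be requested explicitly with:

## Sketch (not run): explicit empirical Bayes settings
# zlm(~ Stim.Condition, subset(vbetaFA, ncells==1)[1:10,],
#     ebayes = TRUE, ebayesControl = list(method = 'MOM', model = 'H1'))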
ZlmFit-class, ebayes, GLMlike-class, BayesGLMlike-class
data(vbetaFA)
zlmVbeta <- zlm(~ Stim.Condition, subset(vbetaFA, ncells==1)[1:10,])
#>
#> Done!
slotNames(zlmVbeta)
#> [1] "coefC" "coefD" "vcovC"
#> [4] "vcovD" "LMlike" "sca"
#> [7] "deviance" "loglik" "df.null"
#> [10] "df.resid" "dispersion" "dispersionNoshrink"
#> [13] "priorDOF" "priorVar" "converged"
#> [16] "hookOut" "exprs_values"
# A matrix of coefficients
coef(zlmVbeta, 'D')['CCL2',]
#> (Intercept) Stim.ConditionUnstim
#> -3.8329217 -0.5108005
# An array of covariance matrices
vcov(zlmVbeta, 'D')[,,'CCL2']
#> (Intercept) Stim.ConditionUnstim
#> (Intercept) 0.1439196 -0.1254838
#> Stim.ConditionUnstim -0.1254838 0.9182853
waldTest(zlmVbeta, CoefficientHypothesis('Stim.ConditionUnstim'))
#> , , metric = lambda
#>
#> test.type
#> primerid cont disc hurdle
#> B3GAT1 0.9617702324 0.038250068 1.000020
#> BAX 7.2211565188 3.645901878 10.867058
#> BCL2 0.3766814067 2.202291748 2.578973
#> CCL2 0.8414775522 0.284135226 1.125613
#> CCL3 NA 3.548463195 NA
#> CCL4 NA 2.012308210 NA
#> CCL5 0.1746468538 0.862093478 1.036740
#> CCR2 5.3383734489 2.308408187 7.646782
#> CCR4 2.0437612666 0.003737042 2.047498
#> CCR5 0.0005534473 2.811952306 2.812506
#>
#> , , metric = df
#>
#> test.type
#> primerid cont disc hurdle
#> B3GAT1 1 1 2
#> BAX 1 1 2
#> BCL2 1 1 2
#> CCL2 1 1 2
#> CCL3 1 1 2
#> CCL4 1 1 2
#> CCL5 1 1 2
#> CCR2 1 1 2
#> CCR4 1 1 2
#> CCR5 1 1 2
#>
#> , , metric = Pr(>Chisq)
#>
#> test.type
#> primerid cont disc hurdle
#> B3GAT1 0.326741272 0.84494185 0.606524503
#> BAX 0.007204927 0.05620735 0.004367654
#> BCL2 0.539384664 0.13780573 0.275412150
#> CCL2 0.358974532 0.59400356 0.569608276
#> CCL3 NA 0.05960063 NA
#> CCL4 NA 0.15602777 NA
#> CCL5 0.676014596 0.35315351 0.595490308
#> CCR2 0.020860929 0.12867576 0.021853574
#> CCR4 0.152831343 0.95125460 0.359245545
#> CCR5 0.981231129 0.09356445 0.245059834
#>
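## The result above is a 3-D array indexed by primerid, test.type and metric,
## so single values can be pulled out by name (a sketch; output not shown):
# wt <- waldTest(zlmVbeta, CoefficientHypothesis('Stim.ConditionUnstim'))
# wt['CCL2', 'hurdle', 'Pr(>Chisq)']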
## Can also provide just a data.frame instead
data <- data.frame(x = rnorm(500), z = rbinom(500, 1, .3))
## Simulate a hurdle: a logistic model for expressed vs not expressed,
## and a Gaussian model for the expressed values
logit.y <- with(data, x*2 + z*2)
mu.y <- with(data, 10 + 10*x + 10*z + rnorm(500))
y <- (runif(500) < exp(logit.y)/(1 + exp(logit.y)))*1
y[y>0] <- mu.y[y>0]
data$y <- y
fit <- zlm(y ~ x+z, data)
summary.glm(fit$disc)
#>
#> Call:
#> NULL
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -53.823 -1.196 1.011 1.206 10.097
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.04928 0.14244 -0.346 0.729
#> x 2.14698 0.19496 11.012 < 2e-16 ***
#> z 2.19058 0.31755 6.898 5.26e-12 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 687.73 on 499 degrees of freedom
#> Residual deviance: 389.01 on 221 degrees of freedom
#> AIC: 395.01
#>
#> Number of Fisher Scoring iterations: 6
#>
summary.glm(fit$cont)
#>
#> Call:
#> NULL
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -2.60596 -0.70254 0.00196 0.77047 2.23471
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 9.99422 0.09416 106.14 <2e-16 ***
#> x 9.96465 0.07646 130.32 <2e-16 ***
#> z 10.01528 0.12713 78.78 <2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for gaussian family taken to be 1.062237)
#>
#> Null deviance: 22136.29 on 275 degrees of freedom
#> Residual deviance: 289.99 on 273 degrees of freedom
#> AIC: 804.9
#>
#> Number of Fisher Scoring iterations: 2
#>