Gene set analysis for hurdle model

Modules defined in sets are tested for average differences in expression from the "average" gene. By using bootstraps, the between-gene covariance of terms in the hurdle model is found, and is used to adjust for coexpression between genes. We drop genes if the coefficient we are testing was not estimible in original model fit in zFit or in any of the bootstrap replicates (evidenced an NA in the bootstrap array). This might yield overly conservative inference. Since bootstrapping is a randomized procedure, the degrees of freedom of a module (and its variance parameters) might differ from run-to-run. You might try setting var_estimate='modelbased' to relax this requirement by assuming independence between genes and then using the asymptotic covariance estimates, which are deterministic, but may result in overly-generous inference.

gseaAfterBoot(
  zFit,
  boots,
  sets,
  hypothesis,
  control = gsea_control(n_randomize = Inf, var_estimate = "bootall")
)

gsea_control(n_randomize = Inf, var_estimate = "bootall")

Arguments

zFit: object of class ZlmFit
boots: bootstraps of zFit
sets: list of indices of genes
hypothesis: a Hypothesis to test. Currently only one degree CoefficientHypothesis are supported.
control: parameters as provided by gsea_control. See details.
n_randomize: the number of genes to sample to approximate the non-module average expression. Set to Inf to turn off the approximation (the default).
var_estimate: the method used to estimate the variance of the modules, one of bootall, bootdiag, or modelbased.

Value

Object of class GSEATests, containing slots tests, 4D array and bootR, the number of boostrap replicates.

Functions

gsea_control: set control parameters. See Details.

`control`

control is a list with elements:

n_randomize, giving the number of genes to sample to approximate the non-module average expression. Set to Inf to turn off the approximation (the default).
var_estimate, giving the method used to estimate the variance of the modules. bootall uses the bootstrapped covariance matrices. bootdiag uses only the diagonal of the bootstrapped covariance matrix (so assuming independence across genes). modelbased assumes independence across genes and uses the variance estimated from the model.

Return Value

A 4D array is returned, with dimensions "set" (each module), "comp" ('disc'rete or 'cont'inuous), "metric" ('stat' gives the average of the coefficient, 'var' gives the variance of that average, 'dof' gives the number of genes that were actually tested in the set), "group" ('test' for the genes in test-set, "null" for all genes outside the test-set).

Examples

data(vbetaFA)
vb1 = subset(vbetaFA, ncells==1)
vb1 = vb1[,freq(vb1)>.1][1:15,]
zf = zlm(~Stim.Condition, vb1)
#> 
#> Done!
boots = bootVcov1(zf, 5)
sets = list(A=1:5, B=3:10, C=15, D=1:5)
gsea = gseaAfterBoot(zf, boots, sets, CoefficientHypothesis('Stim.ConditionUnstim'))
## Use a model-based estimate of the variance/covariance.
gsea_mb = gseaAfterBoot(zf, boots, sets, CoefficientHypothesis('Stim.ConditionUnstim'),
control = gsea_control(var_estimate = 'modelbased'))
calcZ(gsea)
#> , , metric = Z
#> 
#>    comp
#> set      cont      disc
#>   A -1.301014 -0.583344
#>   B -2.237545 -3.852267
#>   C       NaN -7.174852
#>   D -1.301014 -0.583344
#> 
#> , , metric = P
#> 
#>    comp
#> set       cont         disc
#>   A 0.22959122 0.5803073617
#>   B 0.08608739 0.0128445625
#>   C        NaN 0.0006287414
#>   D 0.22959122 0.5803073617
#> 
summary(gsea)
#>    set    cont_Z     cont_P    disc_Z       disc_P combined_Z  combined_P
#> 1:   C       NaN        NaN -7.174852 0.0006287414 -4.0832752 0.008350288
#> 2:   B -2.237545 0.08608739 -3.852267 0.0128445625 -1.7951462 0.138073721
#> 3:   A -1.301014 0.22959122 -0.583344 0.5803073617 -0.8522742 0.414748270
#> 4:   D -1.301014 0.22959122 -0.583344 0.5803073617 -0.8522742 0.414748270
#>    cont_effect disc_effect combined_adj
#> 1:         NaN   -1.503876   0.03340115
#> 2:  -0.8174711   -1.235057   0.27614744
#> 3:  -0.3400834   -0.184799   0.41474827
#> 4:  -0.3400834   -0.184799   0.41474827
# \dontshow{
stopifnot(all.equal(gsea@tests['A',,,],gsea@tests['D',,,]))
stopifnot(all.equal(gsea@tests['C','cont','stat','test'], coef(zf, 'C')[15,'Stim.ConditionUnstim']))
# }