There are three main functions that can be run within the CellRegMap package (detailed in our docs):

Association test (persistent effects)

The main functionality of CellRegMap is to investigate genotype-context (GxC) interactions and identify context-specific genetic effects on expression in cohort-scale single-cell data (see Interaction test below). However, to improve scalability, we recommend running the (computationally more intensive) interaction-test function only on a set of candidate eQTLs. In the original CellRegMap paper we consider eQTLs previously identified in the original studies[1,2], but it is now also possible to test for persistent eQTL effects within the CellRegMap framework itself, using the association-test function. In this case, the model can be cast as:

$y = W\alpha + g\beta_G + c + u + \epsilon$,

which is similar to the main model except for the GxC term, which is missing. Here, we test for a persistent effect only, i.e., $\beta_G \neq 0$.

CellRegMap function: run_association()

Interaction test (GxC effects)

This is the main test implemented in CellRegMap, where we test for GxC effects across cellular states and individual SNP variants. In this case we consider the full model:

$y = W\alpha + g\beta_G + g \odot \beta_{GxC} + c + u + \epsilon$

and test for $\beta_{GxC} \neq 0$. While in principal any SNP-gene pairs can be tested for GxC effects, we recommend running this test on a set of candidate eQTLs (either known a priori or identified using the Association test described above), or interesting (e.g., disease-linked) variants to improve statistical power.

CellRegMap function: run_interaction()

Estimation of effect sizes

Finally, CellRegMap can be used to estimate cell-level effect sizes driven by GxC effects for individual eQTLs ($\beta_{GxC}$), thus predicting the cells where those effects are detected. Generally, it makes sense to use this function to characterise eQTLs that show evidence of GxC effects, i.e., for which significant p-values were obtained when using the Interaction test above. The model is the same except for the term $c$, which is now modelled as fixed effects in order to estimate the GxC term itself.

CellRegMap function: estimate_betas()

For more details on the tests above and underlying assumptions we refer the reader to the Supplementary Methods available as part of the paper’s Appendix.

Simple usage example

The model is implemented in python. All vectors and matrices should be provided as numpy arrays, and there should be no flat (one-dimensional) arrays. If the shape of a vector is (n,) please reshape to (n,1).

Below, see a simple usage example with toy inputs:

from numpy import ones
from numpy.random import RandomState

from cellregmap import run_association, run_interaction, estimate_betas

random = RandomState(1)
n = 30                               # number of samples (cells)
p = 5                                # number of individuals
k = 4                                # number of contexts
y = random.randn(n, 1)               # outcome vector (expression phenotype, one gene only)
C = random.randn(n, k)               # context matrix (cells by contexts/factors)
W = ones((n, 1))                     # intercept (covariate matrix)
hK = random.randn(n, p)              # decomposition of kinship matrix (K = hK @ hK.T)
g = 1.0 * (random.rand(n, 1) < 0.2)  # SNP vector

## Association test
pv0 = run_association(y=y, G=g, W=W, E=C, hK=hK)[0]
print(f'Association test p-value: {pv0}')

## Interaction test
pv = run_interaction(y=y, G=g, W=W, E=C, hK=hK)[0]
print(f'Interaction test p-value: {pv}')

# Effect sizes estimation
betas = estimate_betas(y=y, G=g, W=W, E=C, hK=hK)
beta_G = betas[0]                         # persistent effect (scalar)
beta_GxC = betas[1][0]                    # GxC effects (vector)

print(f'persistent genetic effect (betaG): {betaG}')
print(f'cell-level effect sizes due to GxC (betaGxC): {betaGxC}')

For more guidelines and instructions on how to construct input files from real data, please visit the Input Files page.

For a workflow description language (WDL) pipeline to run CellRegMap across thousands of genes check the pipeline’s Github page and Dockerhub repo.

References

  1. Cuomo*, Seaton*, McCarthy* et al, Nature Communications, 2020 - Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression

  2. Jerber*, Seaton*, Cuomo* et al, Nature Genetics, 2021 - Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation