introduction.RmdGenotypeMixtures is a handy package that builds on the souporcell package (Heaton et al. 2020), to stitch together genotypes across multiple single cell genomics experiments with an overlapping mixture experimental design…
Install GenotypeMixtures from github. Requires devtools.
#devtools::install_github("bjstewart1/GenotypeMixtures")Load GenotypeMixtures
library(GenotypeMixtures)Experimental designs can be read in using this function if you point at a .csv Alternatively you can read in the .csv however you like, or construct from another file The experiments (10X channels (mixtures) should be rows, and the donors/genotypes should be columns. Membership is denoted by 1 vs 0.
#> reading experimental design
#> Using channel as id variables

We can also read in the locations of the souporcell directories The first column should be the mixture name, the second column should be the path to the soup or cell directory There is some built in dummy vcf files in the package for this vignette
#> channel
#> 1 mixtures_1
#> 2 mixtures_2
#> 3 mixtures_3
#> 4 mixtures_4
#> 5 mixtures_5
#> SOC_directory
#> 1 /Users/runner/work/_temp/Library/GenotypeMixtures/extdata/mixtures_1
#> 2 /Users/runner/work/_temp/Library/GenotypeMixtures/extdata/mixtures_2
#> 3 /Users/runner/work/_temp/Library/GenotypeMixtures/extdata/mixtures_3
#> 4 /Users/runner/work/_temp/Library/GenotypeMixtures/extdata/mixtures_4
#> 5 /Users/runner/work/_temp/Library/GenotypeMixtures/extdata/mixtures_5
Now we plug this into the main function which constructs a genotype cluster graph
#> checking files
#>
|
| | 0%
#> reading in VCF files
#>
#> ***** *** vcfR *** *****
#> This is vcfR 1.13.0
#> browseVignettes('vcfR') # Documentation
#> citation('vcfR') # Citation
#> ***** ***** ***** *****
#>
|
|======= | 10%
|
|============== | 20%
|
|===================== | 30%
|
|============================ | 40%
|
|=================================== | 50%
|
|========================================== | 60%
|
|================================================= | 70%
|
|======================================================== | 80%
|
|=============================================================== | 90%
|
|======================================================================| 100%
#> the membership graph is a collection of complete subgraphs as expected
Now we can plot the graph which stitches together the genotypes
genotype_clustering_output$graph_plot
Now we can plot the membership matrix which tells us which of our genotypes belongs to which mixtures
genotype_clustering_output$membership_plot
Now we can plot genotype VAFs - this is a useful diagnostic plot; matching genotypes should have their variants along the diagnonal. This is synthetic data, but real data should look reasonably similar to this
plot_cross_vaf(experiment_1_path = file_locations[2, 2],
experiment_2_path = file_locations[3,2],
experiment_1_name = file_locations[2,1],
experiment_2_name = file_locations[3,1])
We can map these computed genotypes back to the original genotypes in our experimental design
cluster_mapping <- membership_map(experimental_design = experimental_design,
graph_output = genotype_clustering_output)
tail(cluster_mapping)
#> channel SOC_cluster genotype_cluster genotype
#> 26 mixtures_2 mixtures_2_4 3 genotype_8
#> 27 mixtures_4 mixtures_4_4 3 genotype_8
#> 28 mixtures_5 mixtures_5_4 3 genotype_8
#> 29 mixtures_3 mixtures_3_4 1 genotype_9
#> 30 mixtures_4 mixtures_4_5 1 genotype_9
#> 31 mixtures_5 mixtures_5_5 1 genotype_9Finally we can assign single cells across our experiments to genotype - feed the output of membership_map() to cells_to_genotypes The output of this can be easily added to the metadata of your single cell experiment/seurat/anndata object
cell_assignments <- cells_to_genotypes(SOC_locations = file_locations,
membership_mat =cluster_mapping)
tail(cell_assignments)
#> barcodes status assignment channel genotype SOC_cluster
#> 23993 CAGCCGATCGTAGGAG-1 singlet 3 mixtures_5 genotype_7 mixtures_5_3
#> 23994 ACACTGAAGCTCCTTC-1 singlet 0 mixtures_5 genotype_2 mixtures_5_0
#> 23995 CATGCCTCACCTCGGA-1 singlet 5 mixtures_5 genotype_9 mixtures_5_5
#> 23996 CCTACCACAAATACAG-1 singlet 0 mixtures_5 genotype_2 mixtures_5_0
#> 23997 AGATTGCTCTCCTATA-1 singlet 0 mixtures_5 genotype_2 mixtures_5_0
#> 23998 CCAGCGACATGGTTGT-1 singlet 1 mixtures_5 genotype_4 mixtures_5_1The package can also output an experimental design with varying levels of density
dense_design <- make_overlapping_mixture(n_mixtures = 12, n_genotypes = 7, density = 1 )
medium_density_design <- make_overlapping_mixture(n_mixtures = 12, n_genotypes = 7, density = 0.5 )
sparse_design <- make_overlapping_mixture(n_mixtures = 12, n_genotypes = 7, density = 0 )This is a dense design
plot_experimental_design(dense_design)
#> Using channel as id variables
This is a medium design
plot_experimental_design(medium_density_design)
#> Using channel as id variables
This is a sparse design
plot_experimental_design(sparse_design)
#> Using channel as id variables