I recently ran across a situation that I think is going to be increasingly common as I do more and more single-cell analyses. Specifically, I had a project where the investigator had several experiments in related conditions that they want to merge and evaluate with a pseudotime analysis. I could not find any useful tools within Monocle itself for merging data (please correct me in the comments if I’m missing something). It looks as if you have to import a pre-merged seurat dataset.
Here is the workaround that I found [please note these commands are for Seurat v2, they will likely *not* work in v3]:
naive.data <- Read10X(data.dir = paste0(base_dir, "naive_outs/filtered_gene_bc_matrices/mm10/")) naive <- CreateSeuratObject(raw.data = naive.data, min.cells = 3, min.genes = 200, project = "10X_naive") lcmv.data <- Read10X(data.dir = paste0(base_dir, "lcmv_outs/filtered_gene_bc_matrices/mm10/")) lcmv <- CreateSeuratObject(raw.data = lcmv.data, min.cells = 3, min.genes = 200, project = "10X_lcmv") pygp.data <- Read10X(data.dir = paste0(base_dir, "pygp_outs/filtered_gene_bc_matrices/mm10/")) pygp <- CreateSeuratObject(raw.data = pygp.data, min.cells = 3, min.genes = 200, project = "10X_pygp") cd4.combined <- MergeSeurat(object1 = naive, object2 = lcmv, add.cell.id1 = "naive", add.cell.id2 = "lcmv") cd4.combined <- AddSamples(object = cd4.combined, new.data = pygp.data, add.cell.id = "pygp") cd4.combined@raw.data <- as.matrix(cd4.combined@raw.data) cd4.monocle <- importCDS(cd4.combined, import_all = TRUE)
Here, I am reading in 10X data using Seurat (v2) w/ the Read10X
function and then creating the Seurat object with CreateSeuratObject
.
Once this done I use MergeSeurat
to merge the first two experiments, and then AddSamples
to add in the final experiment. Then we can take advantage of the monocle function importCDS
to import the combined object into monocle.
Now there is one final problem and that is that the “orig.ident” field is blank:
To recover the original identity of each cell, we can use the updated cell names from the merged Seurat dataset (i.e., “naive_AAACTGAGAAACCGA”). We just need to split these and recover which experiment each cell came from with:
cellnames <- rownames(pData(cd4.monocle)) orig_ident <- sapply(strsplit(cellnames, split = "_"), '[', 1) pData(cd4.monocle)$orig.ident <- orig_ident
We do a strsplit
on the cellnames, splitting on underscore. The first value from the split in each case is assigned back into the ‘orig.ident’ field of the cell dataset object.
Now you’re ready to continue with the normal downstream analysis in monocle. With dimensionality reduction and clustering done (not shown), we can plot the calculated clusters side-by-side with the experiment of origin (from the merged seurat dataset):
library(cowplot) p1 <- plot_cell_clusters(cd4.monocle, color_by = 'orig.ident') p2 <- plot_cell_clusters(cd4.monocle) #color_by cluster is default behavior plot_grid(p1,p2)
And we get: