Seurat is a popular R package for working with single-cell RNA-seq (scRNA-seq) data. One of the very first filtering and quality-control steps in Seurat is to calculate the fraction of mitochondrial gene expression in each cell and filter out cells above a threshold. The tutorial provides the following code for doing this in human cells:
mito.genes = grep(pattern = "^MT-", x = rownames(x = pbmc@data), value = TRUE)
percent.mito = Matrix::colSums(pbmc@raw.data[mito.genes, ])/Matrix::colSums(pbmc@raw.data)
pbmc = AddMetaData(object = pbmc, metadata = percent.mito, col.name = "percent.mito")
VlnPlot(object = pbmc, features.plot = c("nGene", "nUMI", "percent.mito"), nCol = 3)
Creating a catalog of mitochondrial genes by searching with ‘grep’ for gene names that start with “MT-” works just fine for the human reference transcriptome. Unfortunately, it doesn’t work for mouse (at least for mm10, the reference assembly I’m working with). I see two workarounds.
The easiest is to change the regular expression in the “grep” command from “^MT-” to “^mt-”. A search through the mm10 reference (version 3.0.0) in the cellranger reference files shows that the mitochondrial genes there are labeled with a lowercase ‘mt-’ prefix instead.
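As a quick sanity check of the prefix difference, here is a minimal base-R sketch on a made-up vector of gene symbols (the `genes` vector is purely illustrative and is not taken from any reference file). It shows the human pattern finding nothing among mouse-style symbols, while the lowercase pattern does:

```r
# Toy gene symbols in the mouse (mm10-style) naming convention; illustrative only.
genes <- c("mt-Nd1", "mt-Co1", "mt-Cytb", "Actb", "Mtor", "Gapdh")

# The human pattern matches nothing in this vector.
human_hits <- grep(pattern = "^MT-", x = genes, value = TRUE)

# The lowercase pattern picks up the mitochondrial genes.
mouse_hits <- grep(pattern = "^mt-", x = genes, value = TRUE)

# A case-insensitive search covers either convention; the trailing
# hyphen in the pattern keeps genes such as "Mtor" from matching.
any_hits <- grep(pattern = "^mt-", x = genes, value = TRUE, ignore.case = TRUE)

print(human_hits)
print(mouse_hits)
```

Running the same two `grep` calls on `rownames()` of your own count matrix is a cheap way to confirm which prefix your reference uses before computing anything.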
A second, and perhaps more thorough, approach is to take advantage of the Broad Institute’s “Mouse MitoCarta 2.0” inventory of mitochondrial genes (you could follow the same procedure for human MT genes too).
If you build a list of the 100-200 genes with the strongest evidence for MT expression, it seems likely that you will capture true mitochondrial gene expression more accurately. Below is some code that uses “MitoCarta 2.0” (downloaded as a CSV file) for this procedure. You will need to load “tidyverse” to work with tibbles:
library(tidyverse)
library(Seurat)
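# Read the MitoCarta 2.0 table (exported to CSV) as a tibble; as.tibble() is deprecated in favor of as_tibble()
mouse_mito = as_tibble(read.csv("Mouse.MitoCarta2.0_page2.csv", header = TRUE))
# Keep the 100 highest-scoring genes, sorting first in case the file is not already ordered by score
mouse_mito = mouse_mito %>% select(Symbol, MCARTA2.0_score) %>% arrange(desc(MCARTA2.0_score)) %>% slice(1:100)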
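mito.genes = as.character(mouse_mito$Symbol)
# Drop genes that are absent from the count matrix, since indexing by a missing row name would fail
mito.genes = mito.genes[mito.genes %in% rownames(sample2@raw.data)]
# Fraction of each cell's counts that come from the selected genes
percent.mito = Matrix::colSums(sample2@raw.data[mito.genes,]) / Matrix::colSums(sample2@raw.data)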
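To make the arithmetic concrete, the sketch below repeats the same steps on a tiny made-up count matrix using only the Matrix package. The matrix `counts`, the gene list `candidate.genes`, and the 0.2 cutoff are all assumptions chosen for illustration; none of them come from MitoCarta or from a real dataset.

```r
library(Matrix)

# Toy counts: 4 genes (rows) x 3 cells (columns); all values are made up.
counts <- Matrix(
  c(10,  0,  5,   # Ndufa4
     5,  0, 20,   # Cox5a
    60, 30, 50,   # Actb
    25, 70, 25),  # Gapdh
  nrow = 4, byrow = TRUE, sparse = TRUE,
  dimnames = list(c("Ndufa4", "Cox5a", "Actb", "Gapdh"),
                  c("cell1", "cell2", "cell3"))
)

# Hypothetical mitochondrial gene list; "Atp5b" is deliberately absent from the matrix.
candidate.genes <- c("Ndufa4", "Cox5a", "Atp5b")

# Keep only genes present in the matrix, otherwise row indexing fails.
mito.genes <- candidate.genes[candidate.genes %in% rownames(counts)]

# Fraction of each cell's counts that come from the selected genes.
# drop = FALSE keeps a matrix even if only one gene survives the filter.
percent.mito <- Matrix::colSums(counts[mito.genes, , drop = FALSE]) /
  Matrix::colSums(counts)

# Keep cells at or below an illustrative cutoff of 0.2.
keep.cells <- names(percent.mito)[percent.mito <= 0.2]

print(percent.mito)
print(keep.cells)
```

Despite its name, `percent.mito` is a fraction between 0 and 1, so any cutoff should be expressed on that scale (for example 0.05 rather than 5).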