How to identify and compare cell types across multiple experiments using Cell Ranger and Seurat
This guided analysis supports 10x Genomics single-cell mouse and human scRNA-seq data. Refer to this document to prepare your files and set up your BioBox library before beginning your Guided Analysis. If you would like to analyze a single experiment, follow this tutorial here.
Step 1: Read Alignment using Cell Ranger
This step uses Cell Ranger to align your FASTQ reads to a reference transcriptome and generates the Seurat object you will use in the downstream analysis.
- Create a single cell experiment and upload your sequencing data.
- Create a project and, within that project, create an analysis and select "Guided Analysis".
- Select "Single Cell RNAseq Integrated Analysis - Cell Ranger + Seurat"
- Specify the number of groups you would like to compare in your analysis and give each group a name.
- e.g. to compare tissue from normal controls with GATA2-deficient tissue, you would create 2 groups, one named "Normal" and the other "GATA2 Deficient".
- Select the experiments you would like to be included in each group.
- Configure any Cell Ranger parameters you would like to modify and launch your analysis. More information regarding Cell Ranger and its customizable workflow parameters can be found here.
- Most important parameter: Reference Transcriptome. Select either GRCh38 (human) or GRCm38 (mouse).
- Expected Cells - Expected number of recovered cells. Default: 3,000 cells.
- Force Cells - Force the pipeline to use this number of cells, bypassing the cell detection algorithm. Only modify this parameter if you know exactly how many cells there are; if the number is overestimated, the downstream results may be corrupted. This is an optional parameter that should only be used if necessary.
- R1 and R2 length - Hard-trim the input R1 and R2 sequence to this length. Note that the length includes the Barcode and UMI sequences so do not set this below 26 for Single Cell 3′ v2 or Single Cell 5′. This is an optional parameter that should only be used if necessary.
- Chemistry - Assay configuration. NOTE: By default the assay configuration is detected automatically, which is the recommended mode. You should only specify the chemistry if automatic detection fails.
- This step will take upwards of 2 hours to complete depending on your file sizes.
Note: For your convenience and flexibility, all steps after Step 1 (Steps 2-7) can be rerun as many times as you like with updated parameters.
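For readers who want to see how the Step 1 parameters relate to Cell Ranger itself, the sketch below assembles a `cellranger count` command line in Python. It is an illustration only, not a description of how BioBox launches the pipeline: the helper name `build_cellranger_count_args`, the run id, and the paths are hypothetical, and the flags accepted can differ between Cell Ranger versions, so check the documentation for your release.

```python
import shlex


def build_cellranger_count_args(run_id, transcriptome, fastqs,
                                expect_cells=3000, force_cells=None,
                                r1_length=None, r2_length=None,
                                chemistry="auto"):
    """Return a `cellranger count` argument list for one experiment.

    Hypothetical helper: the parameter names mirror the Step 1 options.
    """
    # The barcode and UMI are read from R1, so trimming R1 below 26 bases
    # would cut into them for Single Cell 3' v2 and Single Cell 5'.
    if r1_length is not None and r1_length < 26:
        raise ValueError("r1_length must be at least 26")

    args = [
        "cellranger", "count",
        f"--id={run_id}",
        f"--transcriptome={transcriptome}",
        f"--fastqs={fastqs}",
        f"--expect-cells={expect_cells}",
        f"--chemistry={chemistry}",
    ]
    # Optional parameters are only added when explicitly requested.
    if force_cells is not None:
        args.append(f"--force-cells={force_cells}")
    if r1_length is not None:
        args.append(f"--r1-length={r1_length}")
    if r2_length is not None:
        args.append(f"--r2-length={r2_length}")
    return args


# Example with placeholder paths (assumptions, not real locations).
example_args = build_cellranger_count_args(
    "normal_rep1", "/refs/GRCh38", "/data/normal_rep1/fastqs")
print(shlex.join(example_args))
```

Leaving `force_cells` and the read lengths unset keeps the command on the defaults described above, which matches the recommendation to change them only when necessary.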
Step 2: Seurat Quality Control
In this step you will use quality control information, including Unique Feature Count, Percent Mitochondrial Count, and Reads Per Cell, to subset your data.
- Above the right hand panel you will find a drop down for each group you have created. For each group, use the right hand panel to filter the data on the basis of Unique Feature Count, Percent Mitochondrial Count, and Reads Per Cell.
- Use the plot drop down to swap between scatter and violin plots. Plots can be exported by selecting the cloud icon located in the top right hand corner of the plot. The three violin plots show the distribution of cells according to the following parameters:
- nFeature_RNA = the number of genes detected in each cell
- nCount_RNA = the number of molecules detected in each cell
- percent.mt = the percentage of transcripts that map to mitochondrial genes
More information regarding Seurat Quality Control and best practices can be found here and here. The Methods used to process this data will be available under the "Methods" section once this step is complete.
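To make the three QC metrics concrete, here is a minimal Python sketch that computes them for a single cell and applies a filter window. It is not the code BioBox or Seurat runs: the function names, the toy counts, and the thresholds are all assumptions chosen for illustration, and suitable cutoffs depend on your own data.

```python
def qc_metrics(counts):
    """Compute the three QC metrics for one cell.

    `counts` maps gene symbol -> molecule (UMI) count for that cell.
    """
    total = sum(counts.values())
    # Human mitochondrial genes start with "MT-", mouse with "mt-".
    mito = sum(n for gene, n in counts.items()
               if gene.upper().startswith("MT-"))
    return {
        "nFeature_RNA": sum(1 for n in counts.values() if n > 0),
        "nCount_RNA": total,
        "percent.mt": 100.0 * mito / total if total else 0.0,
    }


def passes_qc(metrics, min_features, max_features, max_percent_mt):
    """Return True if a cell falls inside the chosen QC window."""
    return (min_features < metrics["nFeature_RNA"] < max_features
            and metrics["percent.mt"] < max_percent_mt)


# Toy cells with made-up counts, for illustration only.
cells = {
    "cell_a": {"Gata2": 4, "Cd44": 3, "Actb": 12, "mt-Co1": 1},
    "cell_b": {"Gata2": 0, "Actb": 10, "mt-Co1": 6, "mt-Nd1": 4},
}
kept = [name for name, counts in cells.items()
        if passes_qc(qc_metrics(counts), min_features=2,
                     max_features=10, max_percent_mt=10.0)]
print(kept)
```

In this toy example `cell_b` is dropped because half of its molecules map to mitochondrial genes, which is the kind of cell the percent mitochondrial filter is meant to remove.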
Step 3: Seurat Linear Dimensional Reduction
This step allows you to reduce the technical noise in your data by keeping only the most informative principal components (PCs). The PCs are calculated through a Principal Component Analysis; a refresher on this statistical analysis can be found here. The elbow plot displays the standard deviation of each PC and helps you determine how many PCs are needed to capture the majority of the variation in the data. Most of the variation is captured by the PCs to the left of the "elbow", the point where the curve begins to level off towards the X axis.
- Use the right hand panel to specify the number of PCs you would like to move forward with and the cluster resolution that will be used to generate your downstream data. A lower cluster resolution will result in fewer and more defined clusters.
The Methods used to process this data will be available under the "Methods" section once this step is complete.
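The elbow is normally judged by eye, but the idea behind it can be sketched numerically. The Python snippet below takes the per-PC standard deviations shown on an elbow plot and returns the smallest number of PCs whose cumulative variance reaches a target fraction of the variance held by the computed PCs. This is one possible heuristic under stated assumptions, not the rule BioBox or Seurat applies; the function name, the 90% target, and the example values are hypothetical.

```python
def pcs_for_variance(stdevs, target=0.90):
    """Smallest number of PCs whose cumulative variance reaches `target`.

    `stdevs` are the per-PC standard deviations, ordered from PC1 onwards.
    The fraction is relative to the PCs supplied, not to all genes.
    """
    if not stdevs:
        raise ValueError("at least one standard deviation is required")
    # Variance is the square of the standard deviation.
    variances = [s * s for s in stdevs]
    total = sum(variances)
    running = 0.0
    for k, variance in enumerate(variances, start=1):
        running += variance
        if running / total >= target:
            return k
    return len(stdevs)


# Made-up standard deviations that level off after the third PC.
example_stdevs = [10, 6, 3, 1, 1]
print(pcs_for_variance(example_stdevs))
print(pcs_for_variance(example_stdevs, target=0.95))
```

A stricter target keeps more PCs, so it is worth comparing the number this returns with where the curve visibly flattens before committing to a cutoff.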
Step 4: Seurat Differential Expression Configuration
In this step you will evaluate the clusters generated in a UMAP and specify whether you would like additional cluster comparisons to be included in your differential expression analysis. By default, Seurat will identify all positive and negative markers of each cluster compared to all other cell clusters. If you would like additional comparisons to be generated between specific clusters, this is where you will specify them.
- Use the plot drop down to swap between a UMAP of a group overlay, cluster overlay, and group + cluster view.
- If you arrive at this step and are not satisfied with the clusters in your UMAP, go back to Step 3: Linear Dimensional Reduction and adjust your cluster resolution and/or PC cutoff. A lower cluster resolution will result in fewer and more defined clusters.
- After you have explored your differential expression insights in Step 5 and identified your cell type identities, you can always return to this step and rerun it to generate specific cluster comparisons.
- To learn more about Seurat Differential expression, read this article here.
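The difference between the default "cluster vs all" comparison and an additional cluster-to-cluster comparison can be illustrated with a small Python sketch. It computes a log2 fold change of mean expression with a pseudocount, which is a simplification and not Seurat's exact formula or statistical test; the function name, the pseudocount, and the toy values are assumptions.

```python
import math


def mean_log2_fold_change(expression, clusters, target, reference=None,
                          pseudocount=1.0):
    """log2 fold change of mean expression, target cluster vs reference.

    expression: gene -> list of per-cell expression values
    clusters:   list of cluster labels, one per cell, in the same order
    reference:  a cluster label, or None to compare against all other cells
    """
    inside = [i for i, c in enumerate(clusters) if c == target]
    if reference is None:
        outside = [i for i, c in enumerate(clusters) if c != target]
    else:
        outside = [i for i, c in enumerate(clusters) if c == reference]
    if not inside or not outside:
        raise ValueError("both sides of the comparison need at least one cell")

    result = {}
    for gene, values in expression.items():
        mean_in = sum(values[i] for i in inside) / len(inside)
        mean_out = sum(values[i] for i in outside) / len(outside)
        # The pseudocount avoids dividing by zero for unexpressed genes.
        result[gene] = math.log2(
            (mean_in + pseudocount) / (mean_out + pseudocount))
    return result


# Six toy cells in three clusters, with made-up expression values.
clusters = [0, 0, 1, 1, 2, 2]
expression = {
    "Cd44": [7.0, 7.0, 1.0, 1.0, 1.0, 1.0],
    "Actb": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
}
print(mean_log2_fold_change(expression, clusters, target=0))
print(mean_log2_fold_change(expression, clusters, target=0, reference=2))
```

Leaving `reference` unset corresponds to the default comparison of one cluster against all others, while passing a cluster label corresponds to a specific comparison you would add in this step.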
Steps 5-6: Seurat Differential Expression Analysis and Assigning Cell Identities
- On Step 5 "Differential Expression Analysis" you will have access to all differential expression files generated from the previous step and their insights. Each differential expression file corresponds to one cluster in your UMAP and is a comparison of that cluster to all other clusters, e.g. Cluster 1 vs all.
- Click on the eye icon next to the cluster to explore the markers and insights to determine the cell type identity.
- This will take you to the data table containing the differential expression file. You can filter the genes to subset your data on the basis of any property in the table e.g. log2 fold change and P value. To do this, select the drop down arrow next to the column header of the property you would like to apply filters to.
- The "insights" tab will contain a summary of all pathways and gene sets enriched in your dataset from EnrichR and Reactome. Explore the "Tissue and Cells" category in addition to the genes in your data file to identify your cell type. You may have to adjust the filters you have applied in the data table, explore different pathways in your insights, and look through the genes in your differential expression file to identify the cell type.
- Conserved Cell Type Markers. Conserved markers highlight nine markers that are conserved across all groups for each cluster. These may aid in identifying the cell type of each cluster.
- Select "inspect" next to the conserved cell type identities tab on the differential expression overview page. Select "Plots" and view a UMAP of the top 9 conserved genes for each cluster.
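The idea of filtering a differential expression table and then keeping only markers conserved across groups can be sketched in Python as follows. This is a simplified illustration, not the method Seurat or BioBox uses to combine evidence across groups: the function name, the thresholds, the ranking by the largest P value, and the toy statistics are all assumptions.

```python
def conserved_markers(per_group, max_p=0.05, min_log2fc=0.25, top_n=9):
    """Genes that pass the filters in every group, best-supported first.

    per_group: group name -> {gene: (log2 fold change, P value)} for one
               cluster compared against all other clusters in that group.
    """
    if not per_group:
        raise ValueError("at least one group is required")
    tables = list(per_group.values())
    # Only genes tested in every group can be conserved.
    shared = set(tables[0]).intersection(*tables[1:])

    kept = []
    for gene in shared:
        stats = [table[gene] for table in tables]
        if all(lfc >= min_log2fc and p <= max_p for lfc, p in stats):
            # Rank by the weakest evidence (largest P value) across groups.
            worst_p = max(p for _, p in stats)
            kept.append((worst_p, gene))
    kept.sort()
    return [gene for _, gene in kept[:top_n]]


# Made-up statistics for one cluster in two groups.
per_group = {
    "Normal": {
        "Cd44": (2.0, 0.001), "Ptprc": (1.2, 0.02),
        "Gata2": (1.5, 0.01), "Actb": (0.1, 0.5),
    },
    "GATA2 Deficient": {
        "Cd44": (1.8, 0.002), "Ptprc": (1.0, 0.004),
        "Gata2": (-0.4, 0.03), "Kit": (2.2, 0.001),
    },
}
print(conserved_markers(per_group))
```

Here `Gata2` is excluded because it is only a positive marker in one group, and `Kit` because it is absent from the other group's table, which leaves the genes that behave as markers in both.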
Prior to proceeding to Step 7 you will select the groups you would like to compare for the Group Differential Expression Analysis. This will allow you to explore plots displaying the average expression of genes within each cell type across groups.
Step 7: Group Differential Expression Analysis
Identify genes that are differentially expressed across cell types and groups.
- Each group comparison that you have selected will have a section on the overview page where you can explore all of the expression plots. For example, if you identified CD44+ as a cell type in Step 6, you will be able to see the average expression levels of genes associated with the CD44+ cluster across Group 1 and Group 2, in addition to any other comparisons you generated.
- Select "inspect" next to one of the cell types you have identified. This will take you to a scatter plot displaying the average expression of genes across the two groups you are comparing.
- Under the data tab you can filter the data on values such as average log2 fold change and P value by using the column headers in the table. Select the drop down arrow next to the column you would like to filter and apply filters to subset your data.
- Select "Plots" to visualize the expression levels of genes associated with a cell type in a scatter plot. Fully customize the plot to your liking using the right hand panel.
- Select "insights" to run a pathway enrichment on the genes found in your data table, e.g. filtering for genes that are upregulated and exploring their insights may highlight cell pathways and genes that are conserved across groups.
- An additional way to visualize changes in gene expression across groups is to use the "Expression Plots". The expression plots display the changes in gene expression across groups in a UMAP and a violin plot where the X axis corresponds to the cell identities you have defined in Step 6.
- Select "Expression Plots" from the Step 7 overview. Use the right hand panel to select the gene you would like to visualize. Use the "Plots" tab to swap between a UMAP and a violin plot.
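The scatter plot described in this step places each gene at its average expression in one group against its average expression in the other, for a single cell type. A minimal Python sketch of that calculation is shown below. The data layout, function names, and values are hypothetical, and it assumes every cell records a value for every gene.

```python
def average_expression_by_group(cells, cell_type):
    """Average expression per gene for one cell type, split by group.

    cells: list of dicts with keys "group", "cell_type", and "expression"
           (gene -> expression value). Every cell is assumed to list the
           same genes.
    """
    sums = {}
    counts = {}
    for cell in cells:
        if cell["cell_type"] != cell_type:
            continue
        group = cell["group"]
        counts[group] = counts.get(group, 0) + 1
        group_sums = sums.setdefault(group, {})
        for gene, value in cell["expression"].items():
            group_sums[gene] = group_sums.get(gene, 0.0) + value
    return {group: {gene: total / counts[group]
                    for gene, total in group_sums.items()}
            for group, group_sums in sums.items()}


def scatter_points(averages, group_x, group_y):
    """One (gene, x, y) point per gene present in both groups."""
    genes = sorted(set(averages[group_x]) & set(averages[group_y]))
    return [(gene, averages[group_x][gene], averages[group_y][gene])
            for gene in genes]


# Toy cells with made-up expression values.
cells = [
    {"group": "Normal", "cell_type": "CD44+",
     "expression": {"Cd44": 4.0, "Gata2": 2.0}},
    {"group": "Normal", "cell_type": "CD44+",
     "expression": {"Cd44": 6.0, "Gata2": 4.0}},
    {"group": "GATA2 Deficient", "cell_type": "CD44+",
     "expression": {"Cd44": 5.0, "Gata2": 0.5}},
    {"group": "GATA2 Deficient", "cell_type": "Other",
     "expression": {"Cd44": 0.0, "Gata2": 0.0}},
]
averages = average_expression_by_group(cells, "CD44+")
print(scatter_points(averages, "Normal", "GATA2 Deficient"))
```

Genes that sit on the diagonal of such a plot are expressed at similar levels in both groups, while genes far from it are candidates for differential expression between groups within that cell type.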
You can obtain a consolidated summary of your results by proceeding to the results summary. Once you proceed to the results summary, you will be unable to rerun intermediate steps, but you can revisit each step to continue exploring your insights and customizing and exporting plots.
Use the drop down to toggle between all seven steps to access the associated results and methods.
To explore each step individually, select "Hide Summary". Next to "Hide Summary" there is a drop down that you can use to access each step.
Select the step you would like to navigate to from the "View all Steps" drop down.