Guided Analysis: Single Cell RNA Sequencing

How to process your raw scRNA data from start to finish using the BioBox Guided Analysis

This guided analysis supports 10x Genomics single-cell mouse and human scRNA data. Refer to this document to prepare your files and set up your BioBox library before beginning your Guided Analysis.

Step 1: Cell Ranger and Seurat

This step uses Cell Ranger to align your FASTQ reads to a reference transcriptome and generates the Seurat object you will use in the downstream analysis.

  • Create a single-cell experiment and upload your sequencing data.
  • Create a project, create an analysis within that project, and select "Guided Analysis".
  • Select the scRNA experiment from your library that you would like to work with.
  • Configure any Cell Ranger parameters you would like to modify and launch your analysis. More information regarding Cell Ranger and its customizable workflow parameters can be found here.
    • Reference Transcriptome - The most important parameter. Select GRCh38 (human) or GRCm38 (mouse).
    • Expected Cells - Expected number of recovered cells. Default: 3,000 cells.
    • Force Cells - Force the pipeline to use this number of cells, bypassing the cell detection algorithm. Only modify this parameter if you know exactly how many cells there are and need to be very precise. If this number is overestimated, the downstream results may be corrupted. This is an optional parameter that should only be used if necessary.
    • R1 and R2 length - Hard-trim the input R1 and R2 sequences to this length. Note that the length includes the Barcode and UMI sequences, so do not set this below 26 for Single Cell 3′ v2 or Single Cell 5′. This is an optional parameter that should only be used if necessary.
    • Chemistry - Assay configuration. NOTE: By default the assay configuration is detected automatically, which is the recommended mode. Only specify the chemistry if automatic detection fails.
  • This step can take 2 hours or more to complete, depending on your file sizes.
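
For readers who want to see how the parameters above relate to a Cell Ranger run, the following is a minimal sketch that assembles a `cellranger count` argument list. The function name, the paths, and the way BioBox actually invokes Cell Ranger are assumptions made for illustration; the exact flags available also depend on your Cell Ranger version.

```python
from typing import List, Optional


def build_cellranger_count_args(
    run_id: str,
    fastq_dir: str,
    transcriptome_dir: str,
    expect_cells: int = 3000,
    force_cells: Optional[int] = None,
    r1_length: Optional[int] = None,
    r2_length: Optional[int] = None,
    chemistry: str = "auto",
) -> List[str]:
    """Assemble a `cellranger count` argument list (hypothetical helper)."""
    # 26 is the floor this guide gives for Single Cell 3' v2 and 5' libraries
    # (barcode + UMI); other chemistries may need a longer R1.
    if r1_length is not None and r1_length < 26:
        raise ValueError("R1 length must not be set below 26")

    args = [
        "cellranger", "count",
        f"--id={run_id}",
        f"--fastqs={fastq_dir}",
        f"--transcriptome={transcriptome_dir}",  # e.g. a GRCh38 or GRCm38 reference
        f"--chemistry={chemistry}",              # "auto" keeps automatic detection
    ]
    if force_cells is not None:
        # Force Cells bypasses cell detection, so the Expected Cells hint is skipped.
        args.append(f"--force-cells={force_cells}")
    else:
        args.append(f"--expect-cells={expect_cells}")
    if r1_length is not None:
        args.append(f"--r1-length={r1_length}")
    if r2_length is not None:
        args.append(f"--r2-length={r2_length}")
    return args
```

Leaving the optional arguments unset mirrors the recommendation above: automatic chemistry detection, no hard trimming, and the default of 3,000 expected cells.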

Step 2: Seurat Quality Control 

In this step you will use quality control information, including Unique Feature Count, Percent Mitochondrial Count, and Reads Per Cell, to subset your data.

  • Use the right-hand panel to filter the data on the basis of Unique Feature Count, Percent Mitochondrial Count, and Reads Per Cell.
  • Use the plot drop-down to switch between Scatter and Violin plots. Plots can be exported by selecting the cloud icon in the top right-hand corner of the plot. The three violin plots show the distribution of cells according to the following parameters:
    • nFeature_RNA = the number of genes detected in each cell
    • nCount_RNA = the number of molecules detected per cell
    • Percent Mitochondrial Count = the percentage of transcripts that map to mitochondrial genes
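
The filtering performed in this step can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the BioBox or Seurat implementation: the `CellQC` record and function names are hypothetical, and the thresholds are example values that you should replace with cutoffs chosen from your own violin and scatter plots.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CellQC:
    """Per-cell quality control metrics (hypothetical record type)."""
    barcode: str
    n_feature_rna: int   # number of genes detected in the cell
    n_count_rna: int     # number of molecules detected in the cell
    percent_mt: float    # percentage of transcripts mapping to mitochondrial genes


def percent_mitochondrial(mito_counts: int, total_counts: int) -> float:
    """Percentage of a cell's transcripts that map to mitochondrial genes."""
    if total_counts == 0:
        return 0.0
    return 100.0 * mito_counts / total_counts


def filter_cells(
    cells: List[CellQC],
    min_features: int = 200,      # example value only
    max_features: int = 2500,     # example value only
    max_percent_mt: float = 5.0,  # example value only
) -> List[CellQC]:
    """Keep cells whose metrics fall strictly inside the chosen thresholds."""
    return [
        cell for cell in cells
        if min_features < cell.n_feature_rna < max_features
        and cell.percent_mt < max_percent_mt
    ]
```

Cells with very few detected genes are often empty droplets, cells with unusually many are often doublets, and a high mitochondrial percentage often indicates stressed or dying cells, which is why all three bounds are applied together.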

More information regarding Seurat Quality Control and best practices can be found here and here. The Methods used to process this data will be available under the "Methods" section once this step is complete. 

Step 3: Seurat Linear Dimensional Reduction

This step allows you to reduce the technical noise associated with your data. The elbow plot helps you determine how many principal components (PCs) are needed to capture the majority of the variation in the data. The plot displays the standard deviation of each PC. The PCs are calculated through a Principal Component Analysis; a refresher on this statistical analysis can be found here. The majority of the variation is captured by the PCs that come before the point where the curve begins to level off toward the X axis, known as the "elbow".

  • Use the right-hand panel to specify the number of PCs you would like to move forward with and the cluster resolution that will be used to generate your downstream data. A lower cluster resolution will result in fewer, more defined clusters.
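
If you would like a numeric cross-check when reading the elbow plot, one simple heuristic is to count how many PCs are needed to reach a chosen share of the variance. The sketch below is an assumption-laden illustration, not what BioBox or Seurat does internally: the function name and the 90% default are hypothetical, and the share is measured only across the PCs you pass in, not across the full dataset.

```python
from typing import Sequence


def pcs_for_variance_share(stdevs: Sequence[float], share: float = 0.9) -> int:
    """Smallest number of leading PCs whose variance reaches `share` of the total.

    `stdevs` are the per-PC standard deviations shown on the elbow plot,
    ordered from PC1 onward. Variance is the square of the standard deviation.
    """
    if not stdevs:
        raise ValueError("at least one standard deviation is required")
    if not 0 < share <= 1:
        raise ValueError("share must be in the interval (0, 1]")

    variances = [s * s for s in stdevs]
    total = sum(variances)
    if total <= 0:
        raise ValueError("standard deviations must not all be zero")

    running = 0.0
    for n_pcs, variance in enumerate(variances, start=1):
        running += variance
        if running / total >= share:
            return n_pcs
    return len(variances)
```

Treat the result as a starting point and compare it with where the curve visibly flattens; the two will not always agree.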

The Methods used to process this data will be available under the "Methods" section once this step is complete. 

Step 4: Seurat Differential Expression Configuration 

In this step you will evaluate the clusters generated in a UMAP and specify whether you would like additional clusters to be compared in your differential expression analysis. By default, Seurat will identify all positive and negative markers of each cluster compared to all other cell clusters. If you would like additional comparisons to be generated between clusters, this is where you specify them.

  • If you arrive at this step and are not satisfied with the clusters in your UMAP, go back to Step 3: Linear Dimensional Reduction and adjust your cluster resolution and/or PC cutoff. A lower cluster resolution will result in fewer, more defined clusters.
  • After you have explored your differential expression insights in Step 5 and identified your cell type identities, you can always return to this step and rerun it to generate specific cluster comparisons.
  • To learn more about Seurat differential expression, read this article here.
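
The set of comparisons produced by this step can be pictured as a default "each cluster vs all others" list plus any extra cluster-to-cluster pairs you request. The following is a rough sketch of that bookkeeping only; the function names and the `ALL_OTHERS` label are hypothetical and do not reflect how BioBox stores its configuration.

```python
from typing import List, Sequence, Tuple

Comparison = Tuple[str, str]

# Hypothetical label for the default "one cluster vs all other clusters" reference group.
ALL_OTHERS = "all"


def default_comparisons(clusters: Sequence[str]) -> List[Comparison]:
    """One comparison per cluster: that cluster against all other clusters."""
    return [(cluster, ALL_OTHERS) for cluster in clusters]


def add_pairwise_comparisons(
    comparisons: Sequence[Comparison],
    clusters: Sequence[str],
    extra_pairs: Sequence[Comparison],
) -> List[Comparison]:
    """Append user-requested cluster-vs-cluster comparisons, skipping duplicates."""
    known = set(clusters)
    result = list(comparisons)
    for first, second in extra_pairs:
        if first not in known or second not in known:
            raise ValueError(f"unknown cluster in comparison: {first} vs {second}")
        if first == second:
            raise ValueError("a cluster cannot be compared with itself")
        if (first, second) not in result:
            result.append((first, second))
    return result
```

For example, requesting the extra pair ("1", "2") on a three-cluster dataset yields the three default comparisons followed by a single Cluster 1 vs Cluster 2 comparison.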

Steps 5-6: Seurat Differential Expression Analysis and Assigning Cell Identities

  • In Step 5, "Differential Expression Analysis", you will have access to all differential expression files and their insights generated from the previous step. Each differential expression file corresponds to one cluster in your UMAP and is a comparison of that cluster to all other clusters, e.g. Cluster 1 vs all.

  • Click on the eye icon next to the cluster to explore the markers and insights to determine the cell type identity.
    • This will take you to the data table containing the differential expression file. You can filter the genes to subset your data on the basis of any property in the table, e.g. log2 fold change and P value. To do this, select the drop-down arrow next to the column header of the property you would like to filter on.
    • The "insights" tab contains a summary of all pathways and gene sets enriched in your dataset from Enrichr and Reactome. Explore the "Tissue and Cells" category, in addition to the genes in your data file, to identify your cell type. You may have to adjust the filters you have applied in the data table, explore different pathways in your insights, and look through the genes in your differential expression file to identify the cell type.
  • Once you have identified a cell type for a cluster, select "Assigning Cell Identity" in the bottom right-hand corner of the page. This will take you to the UMAP, where you can assign your cell identities to a cluster.
  • Select "series" and change the title of the cluster to the cell identity you have determined. E.g. if you just explored Cluster 1 and determined its cell identity to be CD33+, change the name of Cluster 1 to CD33+. You can also customize the colour of the data points in the cluster by selecting the colour icon.
  • Use the back button to return to Step 5 (Differential Expression Analysis) and explore the insights associated with a different cluster to determine its identity. Repeat this process until all clusters have been updated in Step 6 with their appropriate cell identity.
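
The table filtering described for Step 5 (for example on log2 fold change and P value) amounts to something like the sketch below. It assumes each row is a dictionary using the column names Seurat gives its marker tables, `avg_log2FC` and `p_val_adj`; the column headers in your BioBox table may differ, and the cutoffs shown are illustrative defaults rather than recommendations.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def filter_markers(
    rows: List[Row],
    min_log2fc: float = 1.0,   # example cutoff only
    max_p_adj: float = 0.05,   # example cutoff only
) -> List[Row]:
    """Keep up-regulated, significant genes and rank them by fold change.

    Assumes each row has "avg_log2FC" and "p_val_adj" keys (Seurat-style
    column names; an assumption about the exported table).
    """
    kept = [
        row for row in rows
        if row["avg_log2FC"] >= min_log2fc and row["p_val_adj"] <= max_p_adj
    ]
    # Strongest positive markers first, which is usually the most useful
    # order when looking up cell type identities.
    return sorted(kept, key=lambda row: row["avg_log2FC"], reverse=True)
```

Loosening or tightening the two cutoffs changes how many candidate marker genes remain for each cluster, which is the same adjustment you make with the column filters in the data table.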

Once you are satisfied with your results, select "Proceed to Results Summary". This will provide you with a step-by-step summary of all plots and methods used to generate your data. Once your data has been summarized, it can no longer be updated or manipulated.

All differential expression files will be saved to your BioBox library. Use the drop-down in the upper right-hand panel to see the methods and plots generated with each step.
