How to explore and run pipelines with publicly available data

How to access, explore, filter and run pipelines with publicly available data.

 

Public data can be accessed through the Knowledge Engine. The Knowledge Engine is our gene search engine, where you can learn everything you would like to know about a gene in one place. Search for any human or mouse gene of interest and explore the overview, papers, and genes tabs. Under the "Datasets" tab you will find publicly available data from the ENCODE database. Each record in the table corresponds to one ENCODE Biosample. The Biosamples displayed in the table correspond to tissues and cell lines with total RNAseq data. More information about the ENCODE database can be found here.

We currently support human total RNAseq data from the ENCODE database.  Once you find the data you would like to work with, it can be added to your cart and you can run a pipeline in a project to process and analyze the data. 

This feature is currently in beta and we would love to hear your feedback. Please email support@biobox.io with any feedback or suggestions.

Finding data of interest and launching a pipeline

Filtering public data 

Within the "Datasets" tab, use the left-hand panel to filter the ENCODE Biosample records that you would like to see. The metadata for each record (e.g. cell type, disease, description) is shown in the table.

These records can be filtered by the properties listed in the left-hand panel. For example, to see only the cancer cell records, open the "Cell" drop-down in the left-hand panel and select the term "Cancer Cell". The term will be highlighted in blue and the table will be restricted to "Cancer Cell" records. Select the term again to remove the filter. To display everything except "Cancer Cell" records, select the minus icon next to "Cancer Cell"; this removes those records from the table.
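The include and exclude behaviour of the filter panel can be pictured with a small Python sketch. This is only an illustration of the logic, not BioBox code: the `filter_records` helper, the record layout, and the field names are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical record shape: one dict per Biosample row in the table.
Record = Dict[str, str]


def filter_records(
    records: List[Record],
    field: str,
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[Record]:
    """Keep records whose `field` is in `include` (if given) and not in `exclude`."""
    include_set = set(include or [])
    exclude_set = set(exclude or [])
    kept = []
    for record in records:
        value = record.get(field)
        # Selecting a term: only matching records stay in the table.
        if include_set and value not in include_set:
            continue
        # Minus icon: matching records are removed from the table.
        if value in exclude_set:
            continue
        kept.append(record)
    return kept


# Made-up example rows, for illustration only.
records = [
    {"biosample": "A", "cell": "Cancer Cell"},
    {"biosample": "B", "cell": "Stem Cell"},
    {"biosample": "C", "cell": "Cancer Cell"},
]

only_cancer = filter_records(records, "cell", include=["Cancer Cell"])
without_cancer = filter_records(records, "cell", exclude=["Cancer Cell"])
```

Here `only_cancer` mirrors selecting the term, and `without_cancer` mirrors selecting the minus icon next to it.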

  • Select the "Experiments" button located in the table to see all of the files and experiments for the records you are interested in. Ensure that the files of interest are either all paired end (two FASTQ files under each experiment) or all single end (one FASTQ file under each experiment).

  • Use the checkbox to select the Biosample records that you would like to work with and add them to your cart.
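If you keep your own list of the files under each experiment, the paired-end versus single-end check described above can be sketched as follows. The function names, experiment IDs, and file names are hypothetical; the only rule taken from this article is that two FASTQ files per experiment means paired end and one means single end.

```python
from typing import Dict, List


def read_layout(fastq_files: List[str]) -> str:
    """Classify one experiment by its number of FASTQ files."""
    count = len(fastq_files)
    if count == 1:
        return "single"
    if count == 2:
        return "paired"
    raise ValueError(f"expected 1 or 2 FASTQ files, got {count}")


def common_layout(experiments: Dict[str, List[str]]) -> str:
    """Return the layout shared by all experiments, or raise if they are mixed."""
    layouts = {read_layout(files) for files in experiments.values()}
    if len(layouts) != 1:
        raise ValueError(f"expected exactly one read layout, found: {sorted(layouts)}")
    return layouts.pop()


# Made-up experiment IDs and file names, for illustration only.
experiments = {
    "experiment_1": ["exp1_R1.fastq.gz", "exp1_R2.fastq.gz"],
    "experiment_2": ["exp2_R1.fastq.gz", "exp2_R2.fastq.gz"],
}

layout = common_layout(experiments)
```

A mixed selection (some experiments with one file, some with two) raises a `ValueError`, which corresponds to a selection that should be split up before it is added to the cart.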

Accessing your Cart and removing items 

The cart can be accessed from the cart icon located in the top right-hand corner of the application. Selecting the cart icon will show a preview of some of your items.

Selecting "Expand" will display all of the information associated with the Biosamples you have added to your cart. To remove items from your cart, select the items using the checkbox and then select the "Remove from cart" button. To export items as a CSV, select the specific items using the checkbox and then select "Export CSV". To export all items, simply select "Export CSV".
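The export behaviour (selected items only, or everything when nothing is selected) can be sketched in Python as below. This is an assumption-laden illustration, not the format BioBox produces: the `export_csv` helper and the column names `id`, `cell`, and `description` are hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional, Set

# Hypothetical column names; the real export may contain different fields.
FIELDNAMES = ["id", "cell", "description"]


def export_csv(items: List[Dict[str, str]], selected_ids: Optional[Set[str]] = None) -> str:
    """Return CSV text for the selected items, or for all items if none are selected."""
    if selected_ids:
        rows = [item for item in items if item["id"] in selected_ids]
    else:
        rows = list(items)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up cart contents, for illustration only.
cart = [
    {"id": "BS1", "cell": "Cancer Cell", "description": "example record 1"},
    {"id": "BS2", "cell": "Stem Cell", "description": "example record 2"},
]

everything = export_csv(cart)
only_second = export_csv(cart, {"BS2"})
```

The resulting text can be written to a file or loaded into a spreadsheet to keep a record of which Biosamples went into an analysis.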

Launching a pipeline with public data 

How to create a Gene Counts Matrix 

Once all of your files have been added to your cart, go to your workspace and create a project or open an existing project. From here, create an analysis to launch your pipeline. 

Select the STAR + Gene Counts Create BioBox pipeline. 

Configure the pipeline by selecting the read specificity (paired or single end), the reference genome, and the number of technical replicates. The number of technical replicates corresponds to the number of Biosamples you have added to your cart.
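As a rough sketch of how these three settings relate to the cart, the snippet below derives the replicate count from the number of Biosamples, as described above. The `PipelineConfig` class, the `config_from_cart` helper, the Biosample IDs, and the genome label are hypothetical placeholders; in the application these values are chosen in the configuration form.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PipelineConfig:
    read_layout: str           # "paired" or "single"
    reference_genome: str      # placeholder label, chosen in the form
    technical_replicates: int  # one per Biosample in the cart


def config_from_cart(
    cart_biosamples: List[str], read_layout: str, reference_genome: str
) -> PipelineConfig:
    """Build a config whose replicate count equals the number of Biosamples in the cart."""
    if read_layout not in ("paired", "single"):
        raise ValueError(f"unknown read layout: {read_layout!r}")
    if not cart_biosamples:
        raise ValueError("the cart is empty")
    return PipelineConfig(read_layout, reference_genome, len(cart_biosamples))


# Made-up Biosample IDs and genome label, for illustration only.
config = config_from_cart(["BS1", "BS2", "BS3"], "paired", "GRCh38")
```

With three Biosamples in the cart, `config.technical_replicates` is 3, which is the value you would enter in the form.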

On the following page you will add your files and configure the STAR workflow. Select the "Cart" tab. From here you can add your files to the pipeline.

Select the "View Experiment" button next to each Biosample and add your files to your pipeline.

If you are working with a lot of public data records, you can use the filters within the table to subset them. Once all of your files have been added to your pipeline, you can launch it.

Your data will be processed, and once the pipeline is complete you can visualize the results, explore them, and compare your insights with your own data. To learn how to visualize data and explore your insights, read this article here.