How to access, explore, filter and run pipelines with publicly available data.
Public data can be accessed through the Knowledge Engine. The Knowledge Engine is our gene search engine, where you can learn everything you would like to know about a gene in one place. Search for any human or mouse gene of interest and explore the overview, papers, and genes tabs. Under the "Datasets" tab you will find publicly available data from the ENCODE database. Each record in the table corresponds to one ENCODE Biosample. The Biosamples we display in the table correspond to tissues and cell lines with total RNAseq data. More information about the ENCODE database can be found here.
We currently support human total RNAseq data from the ENCODE database. Once you find the data you would like to work with, it can be added to your cart and you can run a pipeline in a project to process and analyze the data.
This feature is currently in beta and we would love to hear your feedback. Please email support@biobox.io with any feedback or suggestions.
Finding data of interest and launching a pipeline
Filtering public data
Within the "Datasets" tab, use the left-hand panel to filter the ENCODE Biosample records that you would like to see. The metadata corresponding to each record (for example, cell type, disease, and description) can be seen in the table.
These records can be filtered on the basis of the properties found in the left-hand panel. For example, if you would like to see only the cancer cell records, open the "Cell" drop-down in the left-hand panel and select the term "Cancer Cell". The term will be highlighted in blue and the table will display only "Cancer Cell" records. Select the term again to remove the filter. If you would like to display everything except "Cancer Cell" records, select the minus icon next to "Cancer Cell" to remove those records from your table.
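The include and exclude behaviour described above can be pictured with a small Python sketch. This is only an illustration of the filtering logic, not how the application is implemented: the record fields, the sample names, and the `filter_records` helper are all hypothetical.

```python
# Hypothetical records; the field names and values are illustrative only
# and do not reflect the real ENCODE or BioBox schema.
records = [
    {"biosample": "sample-A", "cell": "Cancer Cell"},
    {"biosample": "sample-B", "cell": "Stem Cell"},
    {"biosample": "sample-C", "cell": "Cancer Cell"},
    {"biosample": "sample-D", "cell": "Epithelial Cell"},
]


def filter_records(rows, field, include=(), exclude=()):
    """Keep rows whose `field` is in `include` (when given) and not in `exclude`."""
    include, exclude = set(include), set(exclude)
    kept = []
    for row in rows:
        value = row.get(field)
        if include and value not in include:
            continue  # an include filter is active and this row does not match it
        if value in exclude:
            continue  # the row was removed with the "minus" style filter
        kept.append(row)
    return kept


# Selecting the term: show only "Cancer Cell" records.
only_cancer = filter_records(records, "cell", include=["Cancer Cell"])

# Selecting the minus icon: show everything except "Cancer Cell" records.
without_cancer = filter_records(records, "cell", exclude=["Cancer Cell"])

print([r["biosample"] for r in only_cancer])
print([r["biosample"] for r in without_cancer])
```

Calling the helper with neither `include` nor `exclude` returns every record, which mirrors removing the filter.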
- Select the "Experiments" button located in the table to see all of the files and experiments from the records you are interested in. Ensure that the files of interest are either all paired end (two FASTQ files under each experiment) or all single end (one FASTQ file under each experiment).
- Use the checkbox to select the Biosample records that you would like to work with and add them to your cart.
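If you keep your own list of the experiments you plan to add, the paired-end versus single-end check can be sketched as below. This is a minimal sketch under stated assumptions: the experiment names, the file names, and the `read_layout` and `common_layout` helpers are hypothetical, and the only rule taken from this article is that two FASTQ files per experiment means paired end and one means single end.

```python
def read_layout(fastq_files):
    """Classify one experiment by how many FASTQ files it lists."""
    count = len(fastq_files)
    if count == 1:
        return "single"
    if count == 2:
        return "paired"
    raise ValueError(f"expected 1 or 2 FASTQ files, found {count}")


def common_layout(experiments):
    """Return the layout shared by all experiments, or raise if they are mixed."""
    if not experiments:
        raise ValueError("no experiments to check")
    layouts = {name: read_layout(files) for name, files in experiments.items()}
    distinct = set(layouts.values())
    if len(distinct) > 1:
        raise ValueError(f"mixed read layouts: {layouts}")
    return distinct.pop()


# Hypothetical experiments and file names, for illustration only.
experiments = {
    "experiment-1": ["exp1_R1.fastq.gz", "exp1_R2.fastq.gz"],
    "experiment-2": ["exp2_R1.fastq.gz", "exp2_R2.fastq.gz"],
}

print(common_layout(experiments))
```

A mixed selection raises an error instead of returning a layout, which is the situation to avoid before adding records to your cart.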
Accessing your Cart and removing items
The cart can be accessed from the cart icon located in the top right-hand corner of the application. Selecting the cart icon will show a preview of some of your items.
Selecting "Expand" will display all of the information associated with the Biosamples you have added to your cart. To remove items from your cart, select the items using the checkbox and then select the "Remove from cart" button. To export specific items as a CSV, select them using the checkbox and then select "Export CSV". To export all items, select "Export CSV" without selecting any items.
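The export rule above (selected items only, or everything when nothing is selected) can be illustrated with a short Python sketch. The cart contents, the column names, and the `export_csv` helper are assumptions made for illustration; the columns in the CSV produced by the application may differ.

```python
import csv
import io

# Hypothetical cart items; real exports may contain different columns.
cart = [
    {"id": "sample-A", "cell": "Cancer Cell", "description": "example record A"},
    {"id": "sample-B", "cell": "Stem Cell", "description": "example record B"},
    {"id": "sample-C", "cell": "Cancer Cell", "description": "example record C"},
]


def export_csv(items, selected_ids=None):
    """Return CSV text for the selected items, or for all items if none are selected."""
    if selected_ids:
        rows = [item for item in items if item["id"] in selected_ids]
    else:
        rows = list(items)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["id", "cell", "description"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Export two specific items, then the whole cart.
print(export_csv(cart, selected_ids={"sample-A", "sample-C"}))
print(export_csv(cart))
```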
Launching a pipeline with public data
How to create a Gene Counts Matrix
Once all of your files have been added to your cart, go to your workspace and create a project or open an existing project. From here, create an analysis to launch your pipeline.
Select the STAR + Gene Counts Create BioBox pipeline.
Configure the pipeline by selecting the read specificity (paired or single end), the reference genome, and the number of technical replicates. The number of technical replicates corresponds to the number of Biosamples you have added to your cart.
On the following page you will be adding your files and configuring the STAR workflow. Select the "Cart" tab. From here you will be able to add your files to the pipeline.
Select the "View Experiment" button next to each Biosample and add your files to your pipeline.
If you are working with a lot of public data records, you can use the filters within the table to subset them. Once all of your files have been added, you can launch your pipeline.
Your data will be processed, and once the pipeline is complete you can visualize the results, explore them, and compare your insights with your own data. To learn how to visualize data and explore your insights, read this article here.