Setting up your Library and Uploading Data

How to organize your BioBox library and upload your data

Customizing Models and Biological Replicates 

A model is the overarching category that connects all of your lab's data. A model can be anything from a cell line to a patient, mouse, or organoid. Model properties are any pieces of model information that you would like to keep track of, e.g. a cell line mutation, a treatment applied to the cell line, or a patient's date of birth. The properties that go into a model are unique to your lab and are fully customizable. Cell Line and Patient are BioBox example models that can be fully customized.

Each record that you create within a model is a biological replicate. For example, if you are working with a cell line model, each record within that model would be a unique cell line that your lab works with.


Step 1. Access the library by selecting the grid icon in the top left-hand corner of the application. Select Library > Organization. Go to the "Models" tab within the library and select "Create New Model", or use one of the BioBox standard models, Cell Line or Patient.

Step 2. If you are creating a new model or customizing an existing one, name your model and add the properties that you would like to track, e.g. mutational status, surgery date, or treatment. This information can later be appended to figures such as a heatmap.

Step 3. Save your model.

Step 4. Create new records in your model (biological replicates). You can add records one at a time or in bulk. For example, if your lab has 8 cell lines, these would be 8 records under the model Cell Line. If you already have this information saved in an Excel sheet, it can be copied and pasted into the table.
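
If your record information lives in a script rather than a spreadsheet, one way to prepare it for pasting is to write it out as tab-separated text, which is the form spreadsheet cells take when copied. The snippet below is only a sketch: the property names (Name, Mutation, Treatment) and the records are hypothetical, and whether the table expects a header row is an assumption you should check against your own model.

```python
import csv
import io

# Hypothetical records and property names; use the properties your model defines.
records = [
    {"Name": "CELL-001", "Mutation": "KRAS G12D", "Treatment": "DMSO"},
    {"Name": "CELL-002", "Mutation": "Wild type", "Treatment": "Drug A"},
]


def to_tsv(rows):
    """Return the rows as tab-separated text: one header line, then one line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(rows[0].keys()),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


table_text = to_tsv(records)
print(table_text)
```

Drop the header line before pasting if the table already shows the property names as column headers.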


Creating Experiments 

An experiment refers to the specific sequencing technique that was applied to one of your biological replicates. Experiments are categorized as Transcriptomic, Genomic, or Epigenomic sequencing techniques. The metadata associated with an experiment includes the type of experiment, the run type (paired-end or single-end), and the number of technical replicates.

Each biological replicate can have multiple experiments associated with it. For example, cell line CELL-001 could have been sent for total RNA sequencing, whole genome sequencing, and single-cell sequencing. These three experiments could each have a different number of technical replicates and different files associated with them.


Step 1: Go to the "Experiments" tab and select the type of experiment you would like to create. For example, to create a total RNAseq experiment, select the "Total RNAseq" experiment type under the subsection "Transcriptomics". Select the "Create" button; you will have the option to create one or more experiments. Selecting "Create Multiple" allows you to create several experiments at once by copying and pasting your metadata from Excel or typing it into the table.

Step 2: Fill out the table to create your experiments. You will be required to provide:

  • Experiment Name 
  • Species 
  • Derived from (Which model record the experiment corresponds to)
  • Run Type (Paired or Single end)
  • Number of technical replicates

Why do I need to provide an experiment name and why do they have to be unique? 

It is important that you provide unique names for all of your experiments because one biological replicate may be associated with multiple experiments. For example, the biological replicate CELL-001 may be associated with two different experiments:

  • 1 total RNAseq experiment 
  • 1 Whole Genome Sequencing experiment

To differentiate the files and downstream insights associated with each experiment, the experiments must have distinct names. The nomenclature used to name your experiments is completely up to you and your lab.
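
Before pasting a metadata table into "Create Multiple", it can help to check it for duplicate experiment names and empty required fields. The following is a minimal sketch, not part of BioBox: the field labels are taken from the list of required information above, the actual column headers in the table may differ, and the experiment names and values are made up for illustration.

```python
from collections import Counter

# Labels follow the required fields listed above; the real column headers may differ.
REQUIRED_FIELDS = [
    "Experiment Name",
    "Species",
    "Derived from",
    "Run Type",
    "Number of technical replicates",
]

# Hypothetical metadata for two experiments derived from the same model record.
experiments = [
    {"Experiment Name": "CELL-001_totalRNA", "Species": "Human",
     "Derived from": "CELL-001", "Run Type": "Paired",
     "Number of technical replicates": 3},
    {"Experiment Name": "CELL-001_WGS", "Species": "Human",
     "Derived from": "CELL-001", "Run Type": "Paired",
     "Number of technical replicates": 1},
]


def find_problems(rows):
    """Return a list of messages for duplicate names and missing required fields."""
    problems = []
    name_counts = Counter(row.get("Experiment Name", "") for row in rows)
    for name, count in name_counts.items():
        if name and count > 1:
            problems.append(f"duplicate experiment name: {name}")
    for index, row in enumerate(rows, start=1):
        for field in REQUIRED_FIELDS:
            if row.get(field) in (None, ""):
                problems.append(f"row {index}: missing {field}")
    return problems


problems = find_problems(experiments)
print(problems or "No problems found")
```

Here both experiments are derived from CELL-001, so only their distinct names keep their files and results apart.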


Uploading Raw Data


Step 1: Go to the "Data" tab within your library. Select "FASTQ" and then "Upload". You can drag in multiple FASTQ files to upload them simultaneously.

Step 2: Once all of your files have been selected, specify which experiment and technical replicate each file corresponds to.

Step 3: Wait for your FASTQ files to upload. Do not close or refresh the browser during this time, as doing so will cancel the upload. You can continue working on the platform while the files are uploading.


Uploading Processed Data 


Step 1: Go to the "Data" tab within your library. Select the tab corresponding to the type of processed data you would like to upload (Differential Expression, Gene Counts, VCF, BED, or BigWig).

Step 2: On that tab, select the upload button. Select the file from your computer and fill in the information required in the file uploader:

  • Gene Counts: Specify which column in your file corresponds to the "Gene ID Column", your Gene ID format (Gene Symbol or Ensembl), the units of your counts, and which experiments your file corresponds to.
  • Differential Expression: Specify which columns correspond to Gene ID, Fold Change, and P-Value, and which experiments your file corresponds to.
  • VCF: Specify the VCF type (SNV or SV) and which experiment the file corresponds to. 
  • BED: Specify the type of BED file you are uploading (Standard BED or BED6+4) and which experiment the file corresponds to.
  • BigWig: Specify the experiment the file corresponds to. 

Once a processed data file has been uploaded, it undergoes a data cleaning check to detect errors such as NAs or invalid cells. If errors are detected in the file, an X will appear in the status column. Select the X and then "Repair File" to begin the data cleaning process. Once the data has been cleaned and is ready for visualization within a project, a green check mark will appear in the status column. Learn more about the data cleaning process here.
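
If you would like to catch NAs or non-numeric cells before uploading, a quick local check can be sketched as below. This is not the BioBox data cleaning check, whose exact rules are not described here. It assumes a tab-separated gene counts file with a header row, a gene ID column named gene_id (a hypothetical name), and numeric values in every other column.

```python
import csv
import io

# Tokens treated as missing values in this sketch (an assumption, not BioBox's rules).
NA_TOKENS = {"", "na", "nan", "n/a", "null"}


def find_invalid_cells(handle, gene_id_column="gene_id", delimiter="\t"):
    """Return (line number, column, value) for each missing or non-numeric cell."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    invalid = []
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for column, value in row.items():
            if column is None:
                # The row has more cells than the header has columns.
                invalid.append((line_number, "<extra cells>", str(value)))
                continue
            text = (value or "").strip()
            if text.lower() in NA_TOKENS:
                invalid.append((line_number, column, text))
            elif column != gene_id_column:
                try:
                    float(text)
                except ValueError:
                    invalid.append((line_number, column, text))
    return invalid


# Hypothetical file contents: one NA and one non-numeric count.
sample = (
    "gene_id\tCELL-001_totalRNA\tCELL-002_totalRNA\n"
    "TP53\t120\t98\n"
    "KRAS\tNA\t45\n"
    "MYC\t310\tabc\n"
)

invalid_cells = find_invalid_cells(io.StringIO(sample))
for line_number, column, value in invalid_cells:
    print(f"line {line_number}, column {column}: {value!r}")
```

To check a real file, pass an open file handle instead of the in-memory sample, for example open("counts.tsv", newline=""), where counts.tsv stands in for your own file name.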