🗂️ Format Internal Data For Uploading

How to use our Python data adapters to format internal data for upload.

To make data ingestion as simple as possible, we have created several Python data adapters that convert your processed data files into JSON files compatible with the BioBox platform.

Each adapter will:

  • Handle the column mapping so that the experimental metadata required for each sequencing modality is available.

  • Produce two JSON files (objects and edges) that can be uploaded to the platform.

To use an adapter, create a copy of the adapter file, import your files, and execute the code. Once the objects and edges files have been created, follow the instructions here to upload them to the platform.
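Before uploading, it can help to confirm that both output files exist and parse as valid JSON. The following is a minimal sketch using only the Python standard library. The function name `check_json_outputs` and the file names in the usage comment are hypothetical, and the sketch makes no assumption about the internal layout of the objects and edges files beyond their being JSON.

```python
import json
from pathlib import Path


def check_json_outputs(objects_path, edges_path):
    """Return a dict mapping each path to its number of top-level items.

    Raises FileNotFoundError if a file is missing and
    json.JSONDecodeError if a file is not valid JSON.
    """
    summary = {}
    for path in (Path(objects_path), Path(edges_path)):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)  # fails loudly on malformed JSON
        # Lists and dicts both support len(); any other value counts as one item.
        summary[str(path)] = len(data) if isinstance(data, (list, dict)) else 1
    return summary


# Hypothetical usage; substitute the paths your adapter actually wrote:
# print(check_json_outputs("objects.json", "edges.json"))
```

An empty or unexpectedly small count is usually a sign that the column mapping did not match the input file.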

Single Cell RNAseq Adapter

The required input for this adapter is an h5ad file. This adapter will enable you to upload Single Cell RNAseq observations.
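Because h5ad files are stored as HDF5, a quick pre-flight check can catch a wrong or truncated input before running the adapter. The sketch below is not part of the BioBox adapters. The helper name `looks_like_h5ad` is hypothetical, and it only tests for the standard HDF5 signature at the start of the file, so it does not validate the AnnData contents.

```python
# The 8-byte signature that opens an HDF5 file.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_h5ad(path):
    """Return True if the file begins with the HDF5 signature.

    Assumption: the signature sits at byte offset 0, which is the common
    case; HDF5 files with a user block place it at a later offset and
    would be reported as False here.
    """
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
```

A result of `False` most often means the file is a different format (for example a CSV or loom export) that has been given an `.h5ad` extension.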

BioBox scRNAseq Adapter

Single Cell ATACseq Adapter

The required input for this adapter is an h5ad file. This adapter will enable you to upload Single Cell ATACseq observations.

BioBox Single Cell ATACseq adapter
Example ATACseq Data Schema

scrna.list_schema()

{'_meta': {'version': '0.0.1', 'date_updated': '2024-06-18 01:33:36.226002'},
 'name': 'SingleCellRNASeq Datapack - 2024-06-18 01:33:36.226002',
 'key': 'scrna:2024-06-18 01:33:36.226002',
 'description': 'SingleCellRNASeq Datapack created through Python SDK',
 'dependencies': ['Ensembl'],
 'concepts': {'Experiment': {'label': 'Experiment',
   'dbLabel': 'Experiment',
   'definition': 'Experiment of the sample tissue'},
  'SingleCellExperiment': {'label': 'SingleCellExperiment',
   'dbLabel': 'SingleCellExperiment',
   'definition': 'Single Cell Experiment of the sample tissue',
   'sco': 'Experiment'},
  'SingleCellRNAseqExperiment': {'label': 'SingleCellRNAseqExperiment',
   'dbLabel': 'SingleCellRNAseqExperiment',
   'definition': 'Single Cell RNAseq Experiment of the sample tissue',
   'sco': 'SingleCellExperiment'},
  'Sample': {'label': 'Sample',
   'dbLabel': 'Sample',
   'definition': 'Sample organism from which tissue was taken to be analyzed'},
  'CellBarcode': {'label': 'Cell Barcode',
   'dbLabel': 'CellBarcode',
   'definition': 'Individual cell from scRNA experiment, identified by barcode'}},
 'relationships': {'contains cell': {'from': 'SingleCellExperiment',
   'to': 'CellBarcode'},
  'expresses': {'from': 'CellBarcode', 'to': 'Gene'},
  'has experiment': {'from': 'Sample', 'to': 'Experiment'},
  'has cell type': {'from': 'CellBarcode', 'to': 'CellType'}}}
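In the schema above, some relationship endpoints (`Gene`, `CellType`) do not appear under `concepts`. A plausible reading, which is an inference and not something this page confirms, is that they are supplied by another datapack such as the listed `Ensembl` dependency. The sketch below shows one way to list such endpoints from a schema dictionary. The function name `undeclared_endpoints` is hypothetical, and the dictionary is a trimmed copy of the output above with the concept details omitted.

```python
def undeclared_endpoints(schema):
    """Return relationship endpoints that are not declared under 'concepts'."""
    declared = set(schema.get("concepts", {}))
    used = set()
    for relationship in schema.get("relationships", {}).values():
        used.update((relationship["from"], relationship["to"]))
    return sorted(used - declared)


# Trimmed copy of the schema shown above (concept details omitted).
schema = {
    "dependencies": ["Ensembl"],
    "concepts": {
        "Experiment": {},
        "SingleCellExperiment": {},
        "SingleCellRNAseqExperiment": {},
        "Sample": {},
        "CellBarcode": {},
    },
    "relationships": {
        "contains cell": {"from": "SingleCellExperiment", "to": "CellBarcode"},
        "expresses": {"from": "CellBarcode", "to": "Gene"},
        "has experiment": {"from": "Sample", "to": "Experiment"},
        "has cell type": {"from": "CellBarcode", "to": "CellType"},
    },
}

print(undeclared_endpoints(schema))  # ['CellType', 'Gene']
```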

ChIPseq Adapter

The required input for this adapter is a BED file. This adapter will enable you to upload ChIPseq observations.
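A BED file stores one interval per line, with the chromosome, a 0-based start, and an end-exclusive end in the first three columns. The following sketch reads those three columns so that a file can be sanity-checked before it is passed to the adapter. It is independent of the BioBox adapter. The function name `read_bed_intervals` is hypothetical, and it assumes tab-delimited input.

```python
def read_bed_intervals(path):
    """Return a list of (chrom, start, end) tuples from a tab-delimited BED file.

    Comment lines and 'track' / 'browser' header lines are skipped.
    Raises ValueError on rows with fewer than three columns or invalid coordinates.
    """
    intervals = []
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line or line.startswith(("#", "track", "browser")):
                continue
            fields = line.split("\t")
            if len(fields) < 3:
                raise ValueError(f"line {line_no}: expected at least 3 columns")
            chrom, start, end = fields[0], int(fields[1]), int(fields[2])
            if start < 0 or end < start:
                raise ValueError(f"line {line_no}: invalid interval {start}-{end}")
            intervals.append((chrom, start, end))
    return intervals
```

Any additional columns (such as the peak score and summit columns of a narrowPeak file) are ignored by this check.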

BioBox ChIPseq Adapter
Example ChIPseq Data Schema
{'_meta': {'version': '0.0.1', 'date_updated': '2024-06-24 14:47:37.050712'},
 'name': 'My ChIPSeq',
 'key': 'My ChIPSeq',
 'description': '',
 'concepts': {'NarrowPeak': {'label': 'NarrowPeak',
   'dbLabel': 'NarrowPeak',
   'definition': ''},
  'ChIPseq': {'label': 'ChIPseq', 'dbLabel': 'ChIPseq', 'definition': ''}},
 'relationships': {'has narrow peak': {'from': 'ChIPseq', 'to': 'NarrowPeak'},
  'peak start on': {'from': 'NarrowPeak', 'to': 'GenomicInterval'},
  'peak end on': {'from': 'NarrowPeak', 'to': 'GenomicInterval'},
  'assay target on': {'from': 'ChIPseq', 'to': 'Protein'},
  'has chipseq': {'from': 'Sample', 'to': 'ChIPseq'}}}

Genome Adapter

The required input for this adapter is a gzipped GTF (Gene Transfer Format) file. This adapter will enable you to upload a custom reference genome.
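A GTF file has nine tab-separated columns, with the feature type (for example `gene` or `transcript`) in the third. The sketch below counts feature types in a gzipped GTF as a quick check that the file decompresses and is well-formed before it is handed to the adapter. It is not the adapter's own parsing logic, and the function name `count_gtf_features` is hypothetical.

```python
import gzip
from collections import Counter


def count_gtf_features(path):
    """Count the feature types (column 3) in a gzipped GTF file.

    Lines starting with '#' and blank lines are skipped.
    Raises ValueError if a data line does not have exactly 9 columns.
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                raise ValueError(f"line {line_no}: expected 9 columns, got {len(fields)}")
            counts[fields[2]] += 1
    return counts
```

A file with no `gene` or `transcript` features is unlikely to produce the Gene and Transcript concepts that a genome datapack describes.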

BioBox Genome Adapter
Example Genome Data Schema
{'_meta': {'version': '0.0.1', 'date_updated': '2024-06-24 17:28:23.528499'},
 'name': 'Genome Datapack - homo sapiens 9606 (2024-06-24 17:28:23.528499)',
 'key': 'genome:9606:2024-06-24 17:28:23.528499',
 'description': 'Genome Datapack created through Python SDK',
 'concepts': {'Gene': {'label': 'Gene',
   'dbLabel': 'Gene',
   'definition': 'Gene encompassing all biotypes'},
  'Transcript': {'label': 'Transcript',
   'dbLabel': 'Transcript',
   'definition': 'Transcripts derived from gene'},
  'Protein': {'label': 'Protein',
   'dbLabel': 'Protein',
   'definition': 'Protein derived from gene'},
  'Genome': {'label': 'Genome',
   'dbLabel': 'Genome',
   'definition': 'Genome encompassing this data pack'},
  'GenomicInterval': {'label': 'Genomic Interval',
   'dbLabel': 'GenomicInterval',
   'definition': "Genomic Interval splitting the genome's chromosomal regions into sections of 1kbp"}},
 'relationships': {'genome contains interval': {'from': 'Genome',
   'to': 'GenomicInterval'},
  'next': {'from': 'GenomicInterval', 'to': 'GenomicInterval'},
  'transcribed to': {'from': 'Gene', 'to': 'Transcript'},
  'has translation': {'from': 'Transcript', 'to': 'Protein'}}}
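The schema describes `GenomicInterval` as 1 kbp sections of each chromosome, chained together by the `next` relationship. As an illustration of that idea, the sketch below computes which fixed-size intervals a feature overlaps and the consecutive pairs that would link them. The 0-based interval numbering and the function names are assumptions made for this example; how the platform actually keys its intervals is not stated here.

```python
INTERVAL_SIZE = 1000  # 1 kbp, as stated in the schema definition


def overlapping_intervals(start, end, size=INTERVAL_SIZE):
    """Return the indices of the fixed-size intervals overlapped by a feature.

    Coordinates are 1-based and inclusive, as in GTF files.
    Interval 0 covers positions 1..size, interval 1 covers size+1..2*size, etc.
    (the 0-based numbering is an assumption for this sketch).
    """
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates {start}-{end}")
    first = (start - 1) // size
    last = (end - 1) // size
    return list(range(first, last + 1))


def next_edges(indices):
    """Pair each interval index with its successor, mirroring a 'next' relationship."""
    return list(zip(indices, indices[1:]))


hits = overlapping_intervals(950, 2100)
print(hits)              # [0, 1, 2]
print(next_edges(hits))  # [(0, 1), (1, 2)]
```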
