⤴️Upload Internal Data
How to add internal data to your graph through the creation of a data package.
A data package is a collection of objects, the concepts they are associated with, and the relationships that exist between the objects. Concepts and relationships that you would like to leverage must be created before uploading. To make the data upload as easy as possible, we provide several Python data adapters that convert your processed data files to the required JSON format. Information on how to use our data adapters can be found here. Should you choose to format the data yourself without our data adapters, please follow the instructions outlined in this article.
Uploading your data
Within the Foundry, use the left-hand panel to select “Internal Data Package”. Select “Upload Data” and follow the stepper to create a new Data Package.
When creating a Data Package, you will have to provide the following information:
Name
Name of the data package you are creating
Description
Description of the data package you are creating
Object Data
Nodes
If using our Python data adapters, this would be the node.json file.
Node Payload Object Example
Requirements
All nodes must have a universally unique identifier. Generally, we recommend using uuid-v4.
All nodes must have 1 or more labels. These labels are synonymous with your ontology Concepts. Specifically, to make sure they get picked up in your knowledge graph, you must supply them exactly as you’ve specified in your DbLabel.
A properties object must be defined with the uuid and the displayName. Here the uuid is the same as the _id.
All property values must be primitives (string, number, boolean) or arrays of primitives.
It is your responsibility to ensure that labels used are semantically consistent with your Concept structure
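The requirements above can be sketched in Python as follows. This is a minimal illustration rather than the adapters' actual output: the `_id`, `uuid` and `displayName` fields come from the requirements, while the top-level key names `labels` and `properties`, the helper names, and the `Sample` label are assumptions made for the example.

```python
import uuid

PRIMITIVES = (str, int, float, bool)


def is_allowed_value(value):
    """True for a primitive, or for a list that contains only primitives."""
    if isinstance(value, PRIMITIVES):
        return True
    return isinstance(value, list) and all(isinstance(v, PRIMITIVES) for v in value)


def make_node(labels, display_name, extra_properties=None):
    """Build one node payload. Top-level key names are assumed, not confirmed."""
    if not labels:
        raise ValueError("a node needs at least one label")
    node_id = str(uuid.uuid4())  # uuid-v4, as recommended above
    properties = dict(extra_properties or {})
    properties["uuid"] = node_id  # same value as _id
    properties["displayName"] = display_name
    for key, value in properties.items():
        if not is_allowed_value(value):
            raise TypeError(f"property {key!r} is not a primitive or an array of primitives")
    return {"_id": node_id, "labels": list(labels), "properties": properties}


# Hypothetical label and properties, for illustration only.
node = make_node(["Sample"], "Liver biopsy 01", {"tissue": "liver", "replicates": [1, 2]})
```

Checking property values before writing the file catches nested objects early, which the primitives-only requirement would otherwise reject at load time.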
Concepts
Select from any of the concepts that exist within your graph that pertain to the labels you have provided in your node JSON schema.
If you have used a BioBox adapter to generate your file, run list_schema to see the concepts that should be selected.
Relationship Data
Relationship data describes the relationships between the objects in your node JSON schema. This is optional.
If using our data adapters, this would be the edge.json file.
Edge Payload Object Example
Requirements:
from.uuid and to.uuid must be ids of objects that exist or are expected to exist during loading
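As a rough sketch of the requirement above, the following Python builds edge payloads and lists endpoint ids that are not among the nodes you are about to upload. The nested `from.uuid` and `to.uuid` fields follow the requirement; the `label` and `properties` keys, the helper names, and the example ids and labels are assumptions.

```python
def make_edge(from_uuid, to_uuid, label, properties=None):
    """Build one edge payload. Keys other than from.uuid and to.uuid are assumed."""
    return {
        "from": {"uuid": from_uuid},
        "to": {"uuid": to_uuid},
        "label": label,
        "properties": dict(properties or {}),
    }


def unresolved_endpoints(edges, known_uuids):
    """Return the sorted endpoint ids that are not in known_uuids."""
    known = set(known_uuids)
    missing = set()
    for edge in edges:
        for end in ("from", "to"):
            if edge[end]["uuid"] not in known:
                missing.add(edge[end]["uuid"])
    return sorted(missing)


# Hypothetical ids and labels, for illustration only.
edges = [
    make_edge("sample-1", "ENSG00000141510", "EXPRESSES"),
    make_edge("sample-1", "sample-404", "RELATED_TO"),
]
missing = unresolved_endpoints(edges, ["sample-1", "ENSG00000141510"])
```

An id reported here is not necessarily an error, since it may belong to an object that already exists in your graph. The check only covers the ids you pass in.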
Relationships
Select from any of the relationships that exist within your graph that pertain to the relationship data you have provided.
If you have used a BioBox adapter to generate the file, run list_schema to see the relationships that could be selected.
File Formats
All nodes and edges should be written as .jsonl (newline-delimited JSON), with nodes and edges in separate files.
Best practice: namespace your files for clarity, e.g. gene_sets.node.jsonl.gz and gene_sets.edge.jsonl.gz.
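One way to produce such files with the Python standard library is sketched below. It assumes that gzip-compressed newline-delimited JSON is accepted, as the example file names suggest. The helper names and the sample records are hypothetical.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(records, path):
    """Write one JSON object per line to a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_jsonl_gz(path):
    """Read the file back as a list of dicts, e.g. to check it before upload."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical records; nodes and edges go into separate, namespaced files.
nodes = [{"_id": "n1", "labels": ["GeneSet"], "properties": {"uuid": "n1", "displayName": "Set 1"}}]
edges = [{"from": {"uuid": "n1"}, "to": {"uuid": "ENSG00000141510"}, "label": "CONTAINS"}]

out_dir = tempfile.mkdtemp()
node_path = os.path.join(out_dir, "gene_sets.node.jsonl.gz")
edge_path = os.path.join(out_dir, "gene_sets.edge.jsonl.gz")
write_jsonl_gz(nodes, node_path)
write_jsonl_gz(edges, edge_path)
```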
Preparing your data
If you are not using our Python adapters, please format your data as outlined below.
Expression Data
For bulk RNAseq, you will want to store the raw values and the TPM-adjusted values for each library. The graph representation looks like this:
How to prepare the data?
Quantified data should be serialized into an array of edges.
If you plan on quantifying and storing the results for the same library more than once, you should record the file_id or analysis_id on every edge for a given run, so that the runs can be told apart.
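A minimal sketch of this serialization is shown below. It assumes one edge per library and gene, pointing from the library to the gene, with the raw and TPM values stored as edge properties. The edge direction, the `EXPRESSES` label, the property names `raw` and `tpm`, and the example ids are all assumptions made for illustration.

```python
def expression_edges(library_uuid, quantification, analysis_id, label="EXPRESSES"):
    """Turn {gene_id: (raw, tpm)} for one library into a list of edge payloads."""
    edges = []
    for gene_id, (raw, tpm) in quantification.items():
        edges.append({
            "from": {"uuid": library_uuid},
            "to": {"uuid": gene_id},
            "label": label,  # hypothetical relationship name
            "properties": {
                "raw": raw,
                "tpm": tpm,
                "analysis_id": analysis_id,  # set on every edge of the run
            },
        })
    return edges


# Hypothetical quantification for one library, processed in two separate runs.
quant = {"ENSG00000141510": (120, 15.2), "ENSG00000012048": (30, 3.9)}
run_a = expression_edges("library-1", quant, "analysis-A")
run_b = expression_edges("library-1", quant, "analysis-B")
```

Because every edge carries its analysis_id, the two runs for `library-1` remain distinguishable even though they connect the same pair of objects.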
Differential Expression
How to prepare the data?
Create a node row for each Differential Expression Dataset. Keep track of the uuid you’ve assigned to it, as you’ll need to reference it later.
Assuming you know the samples (and their ids) that map to the experimental and reference groups, you will need to set up the edges that connect them to the dataset.
Repeat this process for the rest of your experimental group and your reference groups.
It is not recommended to have your edges point across different Concepts. Using these paths consistently will improve your ability to reuse them in your graph models later.
For each dataset, iterate through the rows and transform them into edges. All genes will have their uuid set to their corresponding Ensembl stable ID.
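The steps above could be sketched as follows. The relationship labels, the concept label, the edge direction (from the dataset outwards) and the row column names are assumptions, so substitute the Concepts and relationships defined in your own graph.

```python
import uuid


def edge(from_uuid, to_uuid, label, properties=None):
    """Build one edge payload. Keys other than from.uuid and to.uuid are assumed."""
    return {
        "from": {"uuid": from_uuid},
        "to": {"uuid": to_uuid},
        "label": label,
        "properties": dict(properties or {}),
    }


def build_de_package(display_name, experimental_ids, reference_ids, rows):
    """Return (dataset_node, edges) for one differential expression dataset."""
    dataset_id = str(uuid.uuid4())  # keep this id; every edge below references it
    node = {
        "_id": dataset_id,
        "labels": ["DifferentialExpressionDataset"],  # hypothetical concept label
        "properties": {"uuid": dataset_id, "displayName": display_name},
    }
    # One label per target Concept, so that no label points across Concepts.
    edges = [edge(dataset_id, s, "HAS_EXPERIMENTAL_SAMPLE") for s in experimental_ids]
    edges += [edge(dataset_id, s, "HAS_REFERENCE_SAMPLE") for s in reference_ids]
    for row in rows:
        stats = {k: v for k, v in row.items() if k != "gene_id"}
        # The gene uuid is its Ensembl stable ID.
        edges.append(edge(dataset_id, row["gene_id"], "HAS_DE_RESULT", stats))
    return node, edges


# Hypothetical sample ids and result rows, for illustration only.
rows = [{"gene_id": "ENSG00000141510", "log2_fold_change": 1.8, "adjusted_p_value": 0.003}]
de_node, de_edges = build_de_package(
    "Treated vs control", ["sample-1", "sample-2"], ["sample-3"], rows
)
```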
Single Cell RNAseq
For scRNA, we want to isolate each cell by its barcode and treat the cells similarly to RNAseq libraries.
How to prepare this data?
From .h5ad files where the cell barcode is available, transform the count columns of raw or log-normalized values and create nodes and edges similar to the process with bulk RNAseq, except here you are swapping Cells for Samples.
From any differential expression dataset where you’ve compared cell populations, connect the dataset directly to the cells. The groupings here don’t matter as much, because with that direct connection you can always identify along what separation the exp/ref was built. So long as you maintain the appropriate annotations for the Cell node(s), they can be used in the graph explorer and the graph models.