Data Packages

Epigenetic

ENCODE cCREs

The ENCODE project provides candidate cis-regulatory elements (cCREs), which are computationally inferred signatures of regulatory elements, together with their predicted target genes. The inference uses ChIP-seq of histone marks and CTCF binding across reference tissues, primary cells, and in vitro cell line models.

Concepts

  • Regulatory Feature

  • Cis Regulatory Element

  • Distal Enhancer Like

  • Proximal Enhancer Like

  • Promoter Like

Relationships

  • (Regulatory Feature) - predicted to regulate -> (Gene)

Expression

GTEx - Tissues

Bulk RNA-seq analysis of 50+ human tissue samples from the GTEx consortium. The tissue- and cell-based samples in these experiments are processed separately to avoid batch effects during normalisation. The samples in each group are then processed together to generate an expression table of normalised Transcripts Per Million (TPM) values for every gene in each tissue or cell type.

This data pack creates direct reference edges that represent baseline expression of a gene in a specific biological system. Because we are taking these values directly from cited sources, we do not track the sample metadata and experimental conditions.
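As a rough illustration of what a TPM value represents, the sketch below converts raw read counts for a single sample into TPM. It is not the GTEx pipeline: the function name `counts_to_tpm`, the gene labels, and the input numbers are hypothetical and chosen only to show the arithmetic (length-normalise each gene, then scale so the sample sums to one million).

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts for one sample into TPM.

    counts     -- dict mapping gene id -> raw read count
    lengths_bp -- dict mapping gene id -> transcript length in base pairs
    Both arguments are hypothetical inputs for this sketch.
    """
    if counts.keys() != lengths_bp.keys():
        raise ValueError("counts and lengths_bp must cover the same genes")

    # Step 1: reads per kilobase of transcript (length normalisation).
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}

    # Step 2: scale so that all values in the sample sum to one million.
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Made-up example values, not taken from GTEx.
example_tpm = counts_to_tpm(
    counts={"GENE_A": 10, "GENE_B": 20},
    lengths_bp={"GENE_A": 1000, "GENE_B": 2000},
)
```

Because TPM values within a sample always sum to one million, they describe the relative abundance of each gene in that sample, which is why they are a convenient unit for a baseline expression edge.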

Concepts

  • Tissue

  • Gene

  • Cell Type

Relationships

  • (Tissue) - baseline expression of -> (Gene)

  • (Cell Type) - baseline expression of -> (Gene)

Human Protein Atlas - General

The Human Protein Atlas is a Swedish-based program initiated in 2003 with the aim of mapping all human proteins in cells, tissues, and organs by integrating various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics, and systems biology. All data in the knowledge resource is open access, so scientists in both academia and industry can freely explore the human proteome.

This data package includes expression data from bulk RNA-seq in tissues as well as scRNA-seq libraries.

Concepts

  • Tissue

  • Cell Type

  • Disease

  • Gene

Relationships

  • (Tissue) - baseline expression of -> (Gene)

  • (Cell Type) - baseline expression of -> (Gene)

  • (Gene) - is a marker of -> (Cell Type)

  • (Gene) - has specificity for -> (Cell Type)

Molecular Interactions

BIOGRID

BioGRID is a biomedical interaction repository with data compiled through comprehensive curation efforts. The latest data release (version 4.4.233) is transformed into the following concepts and relationships:

Concepts

  • Protein

  • Transcript

Relationships

  • (Protein) - interacts with -> (Protein)

  • (Transcript) - rna binds with -> (Protein)

Metabolomics

CellphoneDB - Metabolites

The CellphoneDB database is a publicly available repository of curated receptors, ligands, and their interactions. Subunit architecture is included for both ligands and receptors, so heteromeric complexes are represented accurately.

Concepts

  • Ligand

  • Protein

  • Protein Domain

  • Protein Complex

Relationships

  • (Ligand) - has affinity for -> (Protein Domain)

  • (Ligand) - binds to -> (Protein Complex)

  • (Ligand) - inhibitory to -> (Ligand)

  • (Transcription Factor) - binds to -> (Receptor)

Pharmacogenomics

Comparative Toxicogenomics Database (CTD)

CTD is a robust, publicly available database that aims to advance understanding about how environmental exposures affect human health. It provides manually curated information about chemical–gene/protein interactions, chemical–disease and gene–disease relationships. These data are integrated with functional and pathway data to aid in development of hypotheses about the mechanisms underlying environmentally influenced diseases.

Concepts

  • Chemical

  • Gene

  • Disease

  • Protein

  • Phenotype

Relationships

CTD uses qualified edges, which are directly compatible with our knowledge graphs. There are two main connections: (Gene) -> (Chemical) and (Gene) -> (Disease). Qualifiers and modifiers on these edges allow a rich, highly descriptive vocabulary. For the full list see here.
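To make the idea of a qualified edge concrete, here is a minimal sketch of one possible in-memory representation. Everything in it is an assumption for illustration: the class `QualifiedEdge`, the predicate `affects`, the qualifier keys `direction` and `aspect`, and the placeholder node names are hypothetical and do not reflect the actual schema or the CTD vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class QualifiedEdge:
    """Hypothetical edge with free-form qualifiers attached to the predicate."""
    source: str
    predicate: str
    target: str
    qualifiers: Dict[str, str] = field(default_factory=dict)

    def describe(self) -> str:
        # Render qualifiers in insertion order next to the predicate.
        quals = ", ".join(f"{key}={value}" for key, value in self.qualifiers.items())
        middle = f"{self.predicate} [{quals}]" if quals else self.predicate
        return f"({self.source}) - {middle} -> ({self.target})"


# Placeholder identifiers and qualifier values, not real CTD records.
edge = QualifiedEdge(
    source="GENE_A",
    predicate="affects",
    target="CHEMICAL_X",
    qualifiers={"direction": "increases", "aspect": "expression"},
)
description = edge.describe()
```

Keeping the qualifiers separate from the predicate means a small set of base connections can carry many distinct meanings without multiplying relationship types.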
