Expression call download file documentation

Bgee provides calls of baseline presence/absence of expression, and of differential over-/under-expression, either for single species, or compared between species (orthologous genes in homologous organs). This documentation describes the format of these download files.

Single-species download files

Presence/absence of expression

Bgee provides calls of presence/absence of expression. Each call corresponds to a unique combination of a gene, an anatomical entity, and a life stage, with reported presence or absence of expression. Life stages describe development and aging. Only "normal" expression is considered in Bgee (i.e., no treatment, no disease, no gene knock-out, etc.). Bgee collects data of different types, from different studies, in different organisms, and summarizes all these data as unique gene/anatomical entity/developmental stage calls, with confidence information that notably takes potential conflicts into account.

Calls of presence/absence of expression are very similar to the data that can be reported using in situ hybridization methods; Bgee applies dedicated statistical analyses to generate such calls from EST, Affymetrix, and RNA-Seq data, with confidence information, and also collects in situ hybridization calls from model organism databases. This offers the possibility to aggregate and compare these calls of presence/absence of expression between different experiments, different data types, and different species, and to benefit from both the high anatomy coverage provided by low-throughput methods, and the high genomic coverage provided by high-throughput methods.

After presence/absence calls are generated from the raw data, they are propagated using anatomical and life stage ontologies:

  • calls of expression are propagated to parent anatomical entities and parent developmental stages. For instance, if gene A is expressed in the midbrain at young adult stage, it is also considered as expressed in the brain at adult stage.
  • calls of absence of expression are propagated to child anatomical entities (but not to child developmental stages). For instance, if gene A is reported as not expressed in the brain at young adult stage, it is also considered as not expressed in the midbrain at young adult stage. This propagation is only permitted when it does not contradict expression calls from the same data type. For instance, a report of absence of expression by RNA-Seq cannot coexist with a report of expression by RNA-Seq for the same gene, in the same anatomical entity and developmental stage, or in any child anatomical entity or child developmental stage.
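
The two anatomical propagation rules can be sketched in Python as follows. This is a minimal illustration, not Bgee's implementation: it assumes a hypothetical three-term anatomy (`PARENTS`), handles a single gene, data type and developmental stage, and omits stage propagation. With midbrain expression and brain absence reported by the same data type, expression is propagated up to the brain, the contradicting brain absence call is dropped, and only the forebrain inherits absence.

```python
# Hypothetical toy anatomy: each term maps to its direct parent terms.
PARENTS = {
    "brain": set(),
    "midbrain": {"brain"},
    "forebrain": {"brain"},
}


def ancestors(term, parents):
    """Return every ancestor of `term` (not including `term` itself)."""
    seen, stack = set(), list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def descendants(term, parents):
    """Return every descendant of `term` (not including `term` itself)."""
    return {t for t in parents if term in ancestors(t, parents)}


def propagate(present, absent, parents):
    """Propagate one gene's calls from one data type at one stage.

    Expression goes up to ancestors; absence goes down to descendants,
    except where the same data type reports (possibly propagated) expression.
    """
    propagated_present = set(present)
    for term in present:
        propagated_present |= ancestors(term, parents)
    propagated_absent = set()
    for term in absent:
        for target in {term} | descendants(term, parents):
            if target not in propagated_present:  # no contradiction allowed
                propagated_absent.add(target)
    return propagated_present, propagated_absent
```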

Call propagation allows a complete integration of the data, even when they are provided at different anatomical or developmental levels. For instance, suppose gene A is reported to be expressed in the midbrain dura mater at young adult stage, gene B is reported to be expressed in the midbrain pia mater at late adult stage, and gene C is reported as not expressed in the brain at adult stage. Thanks to call propagation, it is then possible to retrieve that, in the midbrain at adult stage, genes A and B are both expressed, while gene C is not.

Presence/absence calls are then filtered and presented differently depending on whether a simple file, or a complete file is used. Notably: simple files aim at providing summarized information over all data types, and only in anatomical entities and developmental stages actually used in experimental data; complete files aim at reporting all information, allowing for instance to retrieve the contribution of each data type to a call, in all possible anatomical entities and developmental stages.

Simple file

In simple files, propagated presence/absence expression calls are provided, but only calls in conditions of anatomical entity/developmental stage actually used in experimental data are displayed (no calls generated from propagation only).

Format description for single species simple expression file
Column | Content | Example
1 | Gene ID | FBgn0005427
2 | Gene name | ewg
3 | Anatomical entity ID | FBbt:00003404
4 | Anatomical entity name | mesothoracic extracoxal depressor muscle 66 (Drosophila)
5 | Developmental stage ID | FBdv:00005348
6 | Developmental stage name | prepupal stage P4(ii) (Drosophila)
7 | Expression | present
8 | Call quality | high quality

Example lines for single species simple expression file
Gene ID | Gene name | Anatomical entity ID | Anatomical entity name | Developmental stage ID | Developmental stage name | Expression | Call quality
FBgn0005533 | RpS17 | UBERON:0015230 | dorsal vessel heart | FBdv:00007124 | day 49 of adulthood (Drosophila) | present | high quality
FBgn0005536 | Mbs | FBbt:00003023 | adult abdomen (Drosophila) | UBERON:0000066 | fully formed stage | present | poor quality
FBgn0005558 | ey | FBbt:00001684 | embryonic/larval hemocyte (Drosophila) | FBdv:00005339 | third instar larval stage (Drosophila) | absent | high quality

Gene ID (column 1)

Unique identifier of gene from Ensembl.

Please note that for P. paniscus (bonobo) we use P. troglodytes genome (chimpanzee), and that for P. pygmaeus (Bornean orangutan) we use P. abelii genome (Sumatran orangutan). Only for those species (bonobo and Bornean orangutan), we modify the Ensembl gene IDs, to ensure that we provide unique gene identifiers over all species. It is therefore necessary, to obtain correct Ensembl gene IDs for those species, to replace gene ID prefix 'PPAG' with 'ENSPTRG', and 'PPYG' prefix with 'ENSPPYG'.
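
The ID fix described above is a simple prefix substitution. A minimal Python sketch (the function name `to_ensembl_id` is hypothetical, and the gene IDs used to exercise it are made up for illustration):

```python
# Prefix mapping stated in the documentation above (Bgee prefix -> Ensembl prefix).
PREFIX_FIXES = {"PPAG": "ENSPTRG", "PPYG": "ENSPPYG"}


def to_ensembl_id(bgee_gene_id: str) -> str:
    """Restore the Ensembl gene ID for bonobo and Bornean orangutan genes.

    IDs from all other species are returned unchanged.
    """
    for bgee_prefix, ensembl_prefix in PREFIX_FIXES.items():
        if bgee_gene_id.startswith(bgee_prefix):
            return ensembl_prefix + bgee_gene_id[len(bgee_prefix):]
    return bgee_gene_id
```

For example, `to_ensembl_id("PPAG00000012345")` returns `"ENSPTRG00000012345"`, while a human ID such as `ENSG00000000419` is left as-is.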

Gene name (column 2)

Name of the gene defined by Gene ID (column 1)

Anatomical entity ID (column 3)

Unique identifier of the anatomical entity, from the Uberon ontology.

Anatomical entity name (column 4)

Name of the anatomical entity defined by Anatomical entity ID (column 3)

Developmental stage ID (column 5)

Unique identifier of the developmental stage, from the Uberon ontology.

Developmental stage name (column 6)

Name of the developmental stage defined by Developmental stage ID (column 5)

Expression (column 7)

Call generated from all data types for Gene ID (column 1), in Anatomical entity ID (column 3), at Developmental stage ID (column 5). One of:

  • present: report of presence of expression, from Bgee statistical tests and/or from in situ data sources. See Call quality (column 8) for associated quality level.
  • absent: report of absence of expression, from Bgee statistical tests and/or from in situ data sources. In Bgee, calls of absence of expression are always discarded if there exists a contradicting call of expression, from the same data type and for the same gene, in the same anatomical entity and developmental stage, or in a child entity or child developmental stage. See Call quality (column 8) for associated quality level.
  • low ambiguity: there exists a call of expression generated from a data type, but there exists a call of absence of expression generated from another data type for the same gene in a parent anatomical entity at the same developmental stage. For instance, gene A is reported to be expressed in the midbrain at young adult stage from Affymetrix data, but is reported to be not expressed in the brain at young adult stage from RNA-Seq data.
  • high ambiguity: there exists a call of expression generated from a data type, but there exists a call of absence of expression generated from another data type for the same gene, anatomical entity and developmental stage. For instance, gene A is reported to be expressed in the midbrain at young adult stage from Affymetrix data, but is reported to be not expressed in the midbrain at young adult stage from RNA-Seq data.
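
To make the four summary values concrete, here is a simplified Python reading of the definitions above. It is a sketch, not Bgee's code: it assumes the per-data-type calls reported for the condition itself are already known, and takes as a separate, hypothetical input the data types whose absence call comes from a parent anatomical entity at the same stage.

```python
def summarize_expression(calls_here, absent_in_parent=frozenset()):
    """Combine per-data-type calls for one gene/entity/stage into a summary.

    calls_here: dict mapping data type -> "present", "absent" or "no data",
        for calls reported in this exact condition.
    absent_in_parent: data types reporting absence for the same gene in a
        parent anatomical entity at the same stage (hypothetical input).
    Returns None when no data type has data.
    """
    present = {t for t, call in calls_here.items() if call == "present"}
    absent = {t for t, call in calls_here.items() if call == "absent"}
    if present and absent:
        # Conflict between data types in the same condition.
        return "high ambiguity"
    if present and (set(absent_in_parent) - present):
        # Conflict with an absence call from a parent entity.
        return "low ambiguity"
    if present:
        return "present"
    if absent:
        return "absent"
    return None
```
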

Call quality (column 8)

Quality associated with the call in column Expression (column 7). One of:

  • high quality:
    • In case of report of expression, expression reported as high quality from Bgee statistical tests and/or from in situ data sources, with no contradicting call of absence of expression for the same gene, in the same anatomical entity and developmental stage (call generated either from multiple congruent data, or from single data).
    • In case of report of absence of expression, call reported as high quality from Bgee statistical tests and/or from in situ data sources. In Bgee, calls of absence of expression are always discarded if there exists a contradicting call of expression from the same data type for the same gene, in the same anatomical entity and developmental stage, or in a child entity or child developmental stage. This is why they are always considered to be of high quality.
  • poor quality: in case of report of expression, either expression was reported as low quality by Bgee statistical tests and/or in situ data sources, or there exists a conflict of presence/absence of expression for the same gene, anatomical entity and developmental stage, from different data of the same type (conflicts between different data types are treated differently, see ambiguity states in column Expression).
  • NA: when the call in column Expression is ambiguous.
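
A minimal sketch for extracting confident presence calls from a simple file. It assumes the file is tab-separated with a header line using the column names listed above; the function name is hypothetical.

```python
import csv


def high_quality_present(lines):
    """Yield (gene ID, anatomical entity ID, developmental stage ID) for each
    row of a simple expression file reporting 'present' with 'high quality'.

    `lines` is any iterable of text lines, e.g. a file opened with newline="".
    """
    # Assumes tab-separated values with a header row; csv also strips
    # double quotes if fields happen to be quoted.
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["Expression"] == "present" and row["Call quality"] == "high quality":
            yield row["Gene ID"], row["Anatomical entity ID"], row["Developmental stage ID"]
```

Applied to the three example lines above, only the RpS17 call in the dorsal vessel heart would be kept: the Mbs call has poor quality and the ey call reports absence.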

Complete file

The differences between simple and complete files are that, in complete files:

  • details of the expression status generated from each data type are provided.
  • all calls are provided, propagated to all possible anatomical entities and developmental stages, including conditions not annotated in experimental data (calls generated from propagation only).
  • a column indicates whether a call was generated from propagation only, or whether the anatomical entity/developmental stage was actually seen in experimental data (such a call is then also present in the simple file).

Format description for single species complete expression file
Column | Content | Example
1 | Gene ID | ENSDARG00000070769
2 | Gene name | foxg1a
3 | Anatomical entity ID | UBERON:0000955
4 | Anatomical entity name | brain
5 | Developmental stage ID | UBERON:0000113
6 | Developmental stage name | post-juvenile adult stage
7 | Expression | present
8 | Call quality | high quality
9 | Including observed data | yes
10 | Affymetrix data | present
11 | Affymetrix call quality | high quality
12 | Including Affymetrix observed data | yes
13 | EST data | present
14 | EST call quality | poor quality
15 | Including EST observed data | yes
16 | In situ data | present
17 | In situ call quality | high quality
18 | Including in situ observed data | yes
19 | RNA-Seq data | no data
20 | RNA-Seq call quality | no data
21 | Including RNA-Seq observed data | no

Example lines for single species complete expression file
Gene ID | Gene name | Anatomical entity ID | Anatomical entity name | Developmental stage ID | Developmental stage name | Expression | Call quality | Including observed data | Affymetrix data | Affymetrix call quality | Including Affymetrix observed data | EST data | EST call quality | Including EST observed data | In situ data | In situ call quality | Including in situ observed data | RNA-Seq data | RNA-Seq call quality | Including RNA-Seq observed data
ENSDARG00000000002 | ccdc80 | UBERON:0000965 | lens of camera-type eye | ZFS:0000033 | Hatching:Long-pec (Danio) | present | high quality | yes | no data | no data | no | no data | no data | no | present | high quality | yes | no data | no data | no
ENSDARG00000000175 | hoxb2a | UBERON:0004734 | gastrula | ZFS:0000017 | Gastrula:50%-epiboly (Danio) | absent | high quality | yes | absent | high quality | no | no data | no data | no | absent | high quality | yes | no data | no data | no
ENSDARG00000000241 | slc40a1 | UBERON:0000922 | embryo | ZFS:0000019 | Gastrula:Shield (Danio) | low ambiguity | NA | no | absent | high quality | no | no data | no data | no | present | high quality | no | no data | no data | no

Gene ID (column 1)

Unique identifier of gene from Ensembl.

Please note that for P. paniscus (bonobo) we use P. troglodytes genome (chimpanzee), and that for P. pygmaeus (Bornean orangutan) we use P. abelii genome (Sumatran orangutan). Only for those species (bonobo and Bornean orangutan), we modify the Ensembl gene IDs, to ensure that we provide unique gene identifiers over all species. It is therefore necessary, to obtain correct Ensembl gene IDs for those species, to replace gene ID prefix 'PPAG' with 'ENSPTRG', and 'PPYG' prefix with 'ENSPPYG'.

Gene name (column 2)

Name of the gene defined by Gene ID (column 1)

Anatomical entity ID (column 3)

Unique identifier of the anatomical entity, from the Uberon ontology.

Anatomical entity name (column 4)

Name of the anatomical entity defined by Anatomical entity ID (column 3)

Developmental stage ID (column 5)

Unique identifier of the developmental stage, from the Uberon ontology.

Developmental stage name (column 6)

Name of the developmental stage defined by Developmental stage ID (column 5)

Expression (column 7)

Call generated from all data types for Gene ID (column 1), in Anatomical entity ID (column 3), at Developmental stage ID (column 5). One of:

  • present: report of presence of expression, from Bgee statistical tests and/or from in situ data sources. See Call quality (column 8) for associated quality level.
  • absent: report of absence of expression, from Bgee statistical tests and/or from in situ data sources. In Bgee, calls of absence of expression are always discarded if there exists a contradicting call of expression, from the same data type and for the same gene, in the same anatomical entity and developmental stage, or in a child entity or child developmental stage. See Call quality (column 8) for associated quality level.
  • low ambiguity: there exists a call of expression generated from a data type, but there exists a call of absence of expression generated from another data type for the same gene in a parent anatomical entity at the same developmental stage. For instance, gene A is reported to be expressed in the midbrain at young adult stage from Affymetrix data, but is reported to be not expressed in the brain at young adult stage from RNA-Seq data.
  • high ambiguity: there exists a call of expression generated from a data type, but there exists a call of absence of expression generated from another data type for the same gene, anatomical entity and developmental stage. For instance, gene A is reported to be expressed in the midbrain at young adult stage from Affymetrix data, but is reported to be not expressed in the midbrain at young adult stage from RNA-Seq data.

Call quality (column 8)

Quality associated with the call in column Expression (column 7). One of:

  • high quality:
    • In case of report of expression, expression reported as high quality from Bgee statistical tests and/or from in situ data sources, with no contradicting call of absence of expression for the same gene, in the same anatomical entity and developmental stage (call generated either from multiple congruent data, or from single data).
    • In case of report of absence of expression, call reported as high quality from Bgee statistical tests and/or from in situ data sources. In Bgee, calls of absence of expression are always discarded if there exists a contradicting call of expression from the same data type for the same gene, in the same anatomical entity and developmental stage, or in a child entity or child developmental stage. This is why they are always considered to be of high quality.
  • poor quality: in case of report of expression, either expression was reported as low quality by Bgee statistical tests and/or in situ data sources, or there exists a conflict of presence/absence of expression for the same gene, anatomical entity and developmental stage, from different data of the same type (conflicts between different data types are treated differently, see ambiguity states in column Expression).
  • NA: when the call in column Expression is ambiguous.

Including observed data (column 9)

Values permitted: yes and no.

Defines whether a call was generated from propagation only, or whether this call in this anatomical entity/developmental stage condition was actually seen in experimental data (in which case, the call will also be present in the expression simple file).

In this column, the information is provided by considering all data types together.
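
Because column 9 separates observed calls from propagation-only calls, the observed rows of a complete file are the calls that also appear in the simple file. A minimal sketch, assuming tab-separated files with the header names listed above (the function name is hypothetical):

```python
import csv


def split_observed(lines):
    """Split rows of a complete expression file into calls observed in
    experimental data (column 9 'yes') and propagation-only calls ('no')."""
    observed, propagated_only = [], []
    for row in csv.DictReader(lines, delimiter="\t"):
        key = (row["Gene ID"], row["Anatomical entity ID"], row["Developmental stage ID"])
        if row["Including observed data"] == "yes":
            observed.append(key)
        else:
            propagated_only.append(key)
    return observed, propagated_only
```
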

Affymetrix data (column 10)

Call generated by Affymetrix data for Gene ID (column 1), in Anatomical entity ID (column 3), at Developmental stage ID (column 5). One of:

  • present: report of presence of expression from Bgee statistical tests. See Affymetrix call quality (column 11) for associated quality level.
  • absent: report of absence of expression from Bgee statistical tests, with no contradicting call of presence of expression generated by other Affymetrix probesets or chips for the same gene, in the same anatomical entity and developmental stage, or in a child entity or child developmental stage.
  • no data: no Affymetrix data available for this gene/anatomical entity/developmental stage (data either not available, or discarded by Bgee quality controls).

Affymetrix call quality (column 11)

Quality associated with the call in column Affymetrix data (column 10). One of:

  • high quality:
    • In case of report of expression, expression reported as high quality from Bgee statistical tests, with no contradicting call of absence of expression for the same gene, in the same anatomical entity and developmental stage, generated by other Affymetrix probesets or chips (meaning that the call was generated either from multiple congruent data, or from a single probeset/chip).
    • In case of report of absence of expression, call reported as high quality from Bgee statistical tests, with no contradicting call of presence of expression generated by other Affymetrix probesets or chips for the same gene, in the same anatomical entity and developmental stage, or in a child entity or child developmental stage.
  • poor quality: in case of report of expression, either expression was reported as low quality by Bgee statistical tests, or there exists a conflict of presence/absence of expression for the same gene, anatomical entity and developmental stage, generated from other Affymetrix probesets/chips.
  • no data: no Affymetrix data available for this gene/anatomical entity/developmental stage (data either not available, or discarded by Bgee quality controls).

Including Affymetrix observed data (column 12)

Values permitted: yes and no.

Defines whether a call was generated from propagation only, or whether this call in this anatomical entity/developmental stage condition was actually seen in experimental data (in which case, the call will also be present in the expression simple file).

In this column, the information is provided by solely considering Affymetrix data.

EST data (column 13)

Call generated by EST data for Gene ID (column 1), in Anatomical entity ID (column 3), at Developmental stage ID (column 5). Note that EST data are not used to produce calls of absence of expression. One of:

  • present: expression reported from Bgee statistical tests. See EST call quality (column 14) for associated quality level.
  • no data: no EST data available for this gene/anatomical entity/developmental stage (data either not available, or discarded by Bgee quality controls).

EST call quality (column 14)

Quality associated with the call in column EST data (column 13). One of:

  • high quality: expression reported as high quality from Bgee statistical tests.
  • poor quality: expression reported as poor quality from Bgee statistical tests.
  • no data: no EST data available for this gene/anatomical entity/developmental stage (data either not available, or discarded by Bgee quality controls).

Including EST observed data (column 15)

Values permitted: yes and no.

Defines whether a call was generated from propagation only, or whether this call in this anatomical entity/developmental stage condition was actually seen in experimental data (in which case, the call will also be present in the expression simple file).

In this column, the information is provided by solely considering EST data.

In situ data (column 16)

Call generated by in situ data for Gene ID (column 1), in Anatomical entity ID (column 3), at Developmental stage ID (column 5). One of:

  • present: report of presence of expression from in situ data sources. See In situ call quality (column 17) for associated quality level.
  • absent: report of absence of expression from in situ data sources, with no contradicting call of presence of expression generated by other in situ hybridization evidence lines for the same gene, in the same anatomical entity and developmental stage, or in a child entity or child developmental stage.
  • no data: no in situ data available for this gene/anatomical entity/developmental stage (data either not available, or discarded by Bgee quality controls).

In situ call quality (column 17)

Quality associated with the call in column In situ data (column 16). One of:

  • high quality:
    • In case of report of expression, expression reported as high quality from in situ data sources, with no contradicting call of absence of expression for the same gene, in the same anatomical entity and developmental stage (meaning that the call was generated either from multiple congruent in situ hybridization evidence lines, or from a single hybridization).
    • In case of report of absence of expression, call reported as high quality from in situ data sources, with no contradicting call of presence of expression generated by other in situ hybridization evidence lines for the same gene, in the same anatomical entity and developmental stage, or in a child entity or child developmental stage.
  • poor quality: in case of report of expression, either expression was reported as low quality by in situ data sources, or there exists a conflict of presence/absence of expression for the same gene, anatomical entity and developmental stage, generated from different in situ hybridization evidence lines.
  • no data: no in situ data available for this gene/anatomical entity/developmental stage (data either not available, or discarded by Bgee quality controls).

Including in situ observed data (column 18)

Values permitted: yes and no.

Defines whether a call was generated from propagation only, or whether this call in this anatomical entity/developmental stage condition was actually seen in experimental data (in which case, the call will also be present in the expression simple file).

In this column, the information is provided by solely considering in situ data.

RNA-Seq data (column 19)

Call generated by RNA-Seq data for Gene ID (column 1), in Anatomical entity ID (column 3), at Developmental stage ID (column 5). One of:

  • present: report of presence of expression from Bgee statistical tests. See RNA-Seq call quality (column 20) for associated quality level.
  • absent: report of absence of expression from Bgee statistical tests, with no contradicting call of presence of expression generated by other RNA-Seq libraries for the same gene, in the same anatomical entity and developmental stage, or in a child entity or child developmental stage.
  • no data: no RNA-Seq data available for this gene/anatomical entity/developmental stage (data either not available, or discarded by Bgee quality controls).

RNA-Seq call quality (column 20)

Quality associated with the call in column RNA-Seq data (column 19). One of:

  • high quality:
    • In case of report of expression, expression reported as high quality from Bgee statistical tests, with no contradicting call of absence of expression for the same gene, in the same anatomical entity and developmental stage, generated from other RNA-Seq libraries (meaning that the call was generated either from several libraries providing congruent results, or from a single library).
    • In case of report of absence of expression, call reported as high quality from Bgee statistical tests, with no contradicting call of presence of expression generated by other RNA-Seq libraries for the same gene, in the same anatomical entity and developmental stage, or in a child entity or child developmental stage.
  • poor quality: in case of report of expression, either expression was reported as low quality by Bgee statistical tests, or there exists a conflict of presence/absence of expression for the same gene, anatomical entity and developmental stage, generated from other RNA-Seq libraries.
  • no data: no RNA-Seq data available for this gene/anatomical entity/developmental stage (data either not available, or discarded by Bgee quality controls).

Including RNA-Seq observed data (column 21)

Values permitted: yes and no.

Defines whether a call was generated from propagation only, or whether this call in this anatomical entity/developmental stage condition was actually seen in experimental data (in which case, the call will also be present in the expression simple file).

In this column, the information is provided by solely considering RNA-Seq data.
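
The per-data-type columns (10 to 21) follow a fixed pattern of call, quality and observed flag, so the contribution of each data type to a call can be recovered from a parsed row. A minimal sketch, assuming the row is a dict keyed by the header names listed above (the function and constant names are hypothetical):

```python
# Column names as listed in the complete-file format description above.
DATA_TYPES = {
    "Affymetrix": ("Affymetrix data", "Affymetrix call quality", "Including Affymetrix observed data"),
    "EST": ("EST data", "EST call quality", "Including EST observed data"),
    "in situ": ("In situ data", "In situ call quality", "Including in situ observed data"),
    "RNA-Seq": ("RNA-Seq data", "RNA-Seq call quality", "Including RNA-Seq observed data"),
}


def data_type_breakdown(row):
    """Return {data type: (call, quality, observed?)} for one parsed row of a
    complete expression file, skipping data types with 'no data'."""
    out = {}
    for dtype, (call_col, quality_col, observed_col) in DATA_TYPES.items():
        if row[call_col] != "no data":
            out[dtype] = (row[call_col], row[quality_col], row[observed_col] == "yes")
    return out
```

For the hoxb2a example line, this yields an Affymetrix absence call obtained by propagation only and an in situ absence call actually observed in the gastrula.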

In complete files, Expression (column 7) is the same summary column as column 7 of the presence/absence simple file.

Over-/under-expression across anatomy or life stages

Bgee provides calls of over-/under-expression. A call corresponds to a gene, with significant variation of its level of expression, in an anatomical entity during a developmental stage, as compared to, either: i) other anatomical entities at the same (broadly defined) developmental stage (over-/under-expression across anatomy); ii) the same anatomical entity at different (precise) developmental stages (over-/under-expression across life stages). These analyses of differential expression are performed using Affymetrix and RNA-Seq experiments with at least 3 suitable conditions (anatomical entity/developmental stage), and at least 2 replicates for each; as for all data in Bgee, only "normal" expression is considered (i.e., no treatment, no disease, no gene knock-out, etc.).
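
The eligibility thresholds can be expressed as a small check. This sketch assumes that conditions with fewer than 2 replicates are simply not counted (rather than disqualifying the whole experiment), and that other suitability criteria (such as "normal" expression only) have already been applied; the function name is hypothetical.

```python
def eligible_for_diff_expression(replicates_per_condition, min_conditions=3, min_replicates=2):
    """Check the thresholds stated above for one Affymetrix or RNA-Seq experiment.

    replicates_per_condition maps a condition, e.g. an
    (anatomical entity ID, developmental stage ID) pair, to its replicate count.
    Assumption: conditions with too few replicates are simply not counted.
    """
    suitable = [cond for cond, n in replicates_per_condition.items() if n >= min_replicates]
    return len(suitable) >= min_conditions
```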

Bgee runs all possible differential expression analyses for each experiment independently, then collects all results and summarizes them as unique gene/anatomical entity/developmental stage calls, with confidence information; conflicts within each data type are resolved using a voting system weighted by p-values (conflicts between different data types are treated differently). This makes it possible to aggregate and compare these calls between different experiments, different data types, and different species.

Note that, as opposed to calls of presence/absence of expression, no propagation of differential expression calls is performed using anatomical and life stage ontologies.

Over-/under-expression calls are then filtered and presented differently depending on whether a simple file, or a complete file is used. Notably: simple files aim at providing summarized information over all data types; complete files aim at reporting all information, allowing for instance to retrieve the contribution of each data type to a call, or to retrieve all genes and conditions tested, including genes having no differential expression in these conditions.

Simple file

In simple files, only calls of over-expression and under-expression are provided, summarizing the contribution of each data type to the call.

Format description for single species simple differential expression file
Column | Content | Example
1 | Gene ID | ENSG00000000419
2 | Gene name | DPM1
3 | Anatomical entity ID | UBERON:0009834
4 | Anatomical entity name | dorsolateral prefrontal cortex
5 | Developmental stage ID | HsapDv:0000083
6 | Developmental stage name | infant stage (human)
7 | Differential expression | under-expression
8 | Call quality | high quality

Example lines for single species simple differential expression file
Gene ID | Gene name | Anatomical entity ID | Anatomical entity name | Developmental stage ID | Developmental stage name | Differential expression | Call quality
ENSG00000000003 | TSPAN6 | UBERON:0000922 | embryo | HsapDv:0000017 | Carnegie stage 10 (human) | over-expression | low quality
ENSG00000000419 | DPM1 | UBERON:0000922 | embryo | HsapDv:0000020 | Carnegie stage 13 (human) | under-expression | low quality
ENSG00000000457 | SCYL3 | UBERON:0000178 | blood | HsapDv:0000094 | 65-79 year-old human stage (human) | over-expression | low quality

Gene ID (column 1)

Unique identifier of gene from Ensembl.

Please note that for P. paniscus (bonobo) we use P. troglodytes genome (chimpanzee), and that for P. pygmaeus (Bornean orangutan) we use P. abelii genome (Sumatran orangutan). Only for those species (bonobo and Bornean orangutan), we modify the Ensembl gene IDs, to ensure that we provide unique gene identifiers over all species. It is therefore necessary, to obtain correct Ensembl gene IDs for those species, to replace gene ID prefix 'PPAG' with 'ENSPTRG', and 'PPYG' prefix with 'ENSPPYG'.

Gene name (column 2)

Name of the gene defined by Gene ID (column 1)

Anatomical entity ID (column 3)

Unique identifier of the anatomical entity, from the Uberon ontology.

Anatomical entity name (column 4)

Name of the anatomical entity defined by Anatomical entity ID (column 3)

Developmental stage ID (column 5)

Unique identifier of the developmental stage, from the Uberon ontology.

Developmental stage name (column 6)

Name of the developmental stage defined by Developmental stage ID (column 5)

Differential expression (column 7)

Call generated from all data types for Gene ID (column 1), in Anatomical entity ID (column 3), at Developmental stage ID (column 5). One of:

  • over-expression: the gene was shown in one or more analyses to have a significant over-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • under-expression: the gene was shown in one or more analyses to have a significant under-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • weak ambiguity: there exists a call of over-expression or under-expression generated from a data type, but another data type showed no significant variation of the level of expression of this gene in the same condition; or, a gene was shown to be never expressed in a condition by some analyses of a given data type, but other analyses of different data types produced a call of over-expression or of absence of differential expression for the same gene, in the same condition (note that conflicts where a data type produced an under-expression call in a condition, while another data type showed the same gene to be never expressed in that condition, do not produce a weak ambiguity call, but a call of under-expression low quality).
  • strong ambiguity: there exists a call of over-expression or under-expression generated from a data type, but there exists a call in the opposite direction generated from another data type for the same gene, anatomical entity and developmental stage. For instance, gene A is reported to be over-expressed in the midbrain at young adult stage from Affymetrix data, but is reported to be under-expressed in the midbrain at young adult stage from RNA-Seq data.

Call quality (column 8)

Confidence in the differential expression call provided in Differential expression (column 7). One of:

  • high quality: differential expression reported as high quality, with no contradicting call from the same type of analysis (across anatomy/across life stages) for the same gene, in the same anatomical entity and developmental stage (call generated either from multiple congruent analyses, or from a single analysis).
  • poor quality: differential expression reported as low quality, or there exists a conflict for the same gene, anatomical entity and developmental stage, from different analyses of the same data type (conflicts between different data types are treated differently). For instance, one analysis showed a gene to be over-expressed in a condition, while another analysis showed the same gene to be under-expressed or not differentially expressed in the same condition. Such conflicts are resolved by a voting system based on the number of conditions compared, weighted by p-value. Note that in one case, this quality level is used to reconcile conflicting calls from different data types: when a data type produced an under-expression call, while a different data type showed that the same gene was never seen as expressed in the same condition. In that case, the overall summary is under-expression low quality.
  • NA: no quality applicable when ambiguity state in Differential expression (column 7).


Complete file

The differences between simple and complete files are that, in complete files:

  • details of the contribution of each data type to the final calls are provided, notably the best p-values and the number of supporting/conflicting analyses.
  • calls representing absence of differential expression are provided, making it possible to determine all genes and conditions tested for differential expression.
Format description for single species complete differential expression file
Column	Content	Example
1	Gene ID	ENSMUSG00000093930
2	Gene name	Hmgcs1
3	Anatomical entity ID	UBERON:0002107
4	Anatomical entity name	liver
5	Developmental stage ID	UBERON:0000113
6	Developmental stage name	post-juvenile adult stage
7	Differential expression	over-expression
8	Call quality	high quality
9	Affymetrix data	over-expression
10	Affymetrix call quality	poor quality
11	Affymetrix best supporting p-value	0.0035659347
12	Affymetrix analysis count supporting Affymetrix call	1
13	Affymetrix analysis count in conflict with Affymetrix call	1
14	RNA-Seq data	over-expression
15	RNA-Seq call quality	high quality
16	RNA-Seq best supporting p-value	2.96E-8
17	RNA-Seq analysis count supporting RNA-Seq call	2
18	RNA-Seq analysis count in conflict with RNA-Seq call	0
Example lines for single species complete differential expression file
Gene ID	Gene name	Anatomical entity ID	Anatomical entity name	Developmental stage ID	Developmental stage name	Differential expression	Call quality	Affymetrix data	Affymetrix call quality	Affymetrix best supporting p-value	Affymetrix analysis count supporting Affymetrix call	Affymetrix analysis count in conflict with Affymetrix call	RNA-Seq data	RNA-Seq call quality	RNA-Seq best supporting p-value	RNA-Seq analysis count supporting RNA-Seq call	RNA-Seq analysis count in conflict with RNA-Seq call
ENSMUSG00000000001	Gnai3	UBERON:0000081	metanephros	MmusDv:0000027	Theiler stage 20 (mouse)	no diff expression	high quality	no diff expression	high quality	0.22166589	1	0	no data	no data	1.0	0	0
ENSMUSG00000000028	Cdc45	UBERON:0000992	female gonad	MmusDv:0000035	Theiler stage 26 (mouse)	under-expression	poor quality	under-expression	poor quality	6.386149E-4	1	1	no data	no data	1.0	0	0
ENSMUSG00000000031	H19	UBERON:0002037	cerebellum	MmusDv:0000036	Theiler stage 27 (mouse)	over-expression	high quality	over-expression	high quality	1.2336E-6	2	0	no data	no data	1.0	0	0
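As a minimal sketch of how these rows might be read programmatically, the Python code below parses a single-species complete file into typed records. It assumes the file is tab-separated, starts with a single header line, and has the 18 columns in the order listed above; the class and function names (DataTypeCall, DiffExprCall, parse_complete_file) are hypothetical and not part of any Bgee library.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class DataTypeCall:
    call: str            # over-expression, under-expression, no diff expression, or no data
    quality: str         # high quality, poor quality, or no data
    best_p_value: float  # 1.0 when there is no data for this data type
    supporting: int      # 0 when there is no data for this data type
    conflicting: int     # 0 when there is no data for this data type


@dataclass
class DiffExprCall:
    gene_id: str
    gene_name: str
    anat_entity_id: str
    anat_entity_name: str
    stage_id: str
    stage_name: str
    call: str
    quality: str
    affymetrix: DataTypeCall
    rna_seq: DataTypeCall


def _data_type_call(fields: List[str]) -> DataTypeCall:
    # The five per-data-type columns: call, quality, best p-value, supporting count, conflicting count.
    call, quality, p_value, supporting, conflicting = fields
    return DataTypeCall(call, quality, float(p_value), int(supporting), int(conflicting))


def parse_complete_file(handle: TextIO) -> Iterator[DiffExprCall]:
    """Yield one DiffExprCall per data line of a single-species complete file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header line
    for row in reader:
        if not row:
            continue
        yield DiffExprCall(*row[0:8],
                           affymetrix=_data_type_call(row[8:13]),   # columns 9-13
                           rna_seq=_data_type_call(row[13:18]))     # columns 14-18


# Usage on the second example line above (header shortened; the parser only skips it).
row = ["ENSMUSG00000000028", "Cdc45", "UBERON:0000992", "female gonad",
       "MmusDv:0000035", "Theiler stage 26 (mouse)", "under-expression", "poor quality",
       "under-expression", "poor quality", "6.386149E-4", "1", "1",
       "no data", "no data", "1.0", "0", "0"]
sample = io.StringIO("Gene ID\tGene name\n" + "\t".join(row) + "\n")
calls = list(parse_complete_file(sample))
print(calls[0].gene_name, calls[0].affymetrix.best_p_value)  # Cdc45 0.0006386149
```

Slicing the two blocks of five columns into one helper reflects the fact that the Affymetrix and RNA-Seq columns share the same layout.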
Gene ID (column 1)

Unique identifier of gene from Ensembl.

Please note that for P. paniscus (bonobo) we use the P. troglodytes (chimpanzee) genome, and that for P. pygmaeus (Bornean orangutan) we use the P. abelii (Sumatran orangutan) genome. For these two species only (bonobo and Bornean orangutan), we modify the Ensembl gene IDs to ensure that gene identifiers are unique across all species. To obtain the correct Ensembl gene IDs for these species, replace the gene ID prefix 'PPAG' with 'ENSPTRG', and the prefix 'PPYG' with 'ENSPPYG'.
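The prefix substitution described above can be sketched in Python as follows. The function name is hypothetical, and the gene IDs in the usage lines are made-up placeholders used only to illustrate the prefix swap.

```python
# Prefix mapping stated in the text: Bgee-modified prefix -> Ensembl prefix.
BGEE_TO_ENSEMBL_PREFIX = {
    "PPAG": "ENSPTRG",  # P. paniscus (bonobo), data mapped to the P. troglodytes genome
    "PPYG": "ENSPPYG",  # P. pygmaeus (Bornean orangutan), data mapped to the P. abelii genome
}


def to_ensembl_gene_id(gene_id: str) -> str:
    """Restore the Ensembl gene ID of a bonobo or Bornean orangutan gene; other IDs pass through."""
    for bgee_prefix, ensembl_prefix in BGEE_TO_ENSEMBL_PREFIX.items():
        if gene_id.startswith(bgee_prefix):
            return ensembl_prefix + gene_id[len(bgee_prefix):]
    return gene_id


# Placeholder IDs, for illustration only.
print(to_ensembl_gene_id("PPAG00000012345"))     # ENSPTRG00000012345
print(to_ensembl_gene_id("ENSMUSG00000000001"))  # ENSMUSG00000000001
```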

Gene name (column 2)

Name of the gene defined by Gene ID (column 1)

Anatomical entity ID (column 3)

Unique identifier of the anatomical entity, from the Uberon ontology.

Anatomical entity name (column 4)

Name of the anatomical entity defined by Anatomical entity ID (column 3)

Developmental stage ID (column 5)

Unique identifier of the developmental stage, from the Uberon ontology.

Developmental stage name (column 6)

Name of the developmental stage defined by Developmental stage ID (column 5)

Differential expression (column 7)

Call generated from all data types for Gene ID (column 1), in Anatomical entity ID (column 3), at Developmental stage ID (column 5). One of:

  • over-expression: the gene was shown in one or more analyses to have a significant over-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • under-expression: the gene was shown in one or more analyses to have a significant under-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • no diff expression: the gene was tested for differential expression in this condition, but was never shown to have a significant variation of expression as compared to the other conditions of the analyses.
  • weak ambiguity: there exists a call of over-expression or under-expression generated from one data type, but another data type showed no significant variation in the expression level of this gene in the same condition; or a gene was shown never to be expressed in a condition by some analyses of a given data type, but analyses of other data types produced a call of over-expression or of absence of differential expression for the same gene, in the same condition (note that a conflict where one data type produced an under-expression call in a condition, while another data type showed the same gene never to be expressed in that condition, does not produce a weak ambiguity call, but an under-expression call of poor quality).
  • strong ambiguity: there exists a call of over-expression or under-expression generated from a data type, but there exists a call in the opposite direction generated from another data type for the same gene, anatomical entity and developmental stage. For instance, gene A is reported to be over-expressed in the midbrain at young adult stage from Affymetrix data, but is reported to be under-expressed in the midbrain at young adult stage from RNA-Seq data.

This is the same differential expression summary column as in simple files (column 7 of the over-/under-expression simple file).
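The cross-data-type rules above can be summarized in a simplified Python sketch. It only combines the Affymetrix and RNA-Seq calls of a complete file; it deliberately ignores call quality, the within-data-type voting system, and the special case involving genes never seen as expressed (which relies on presence/absence data that this file does not contain). The function name is hypothetical.

```python
from typing import Optional

OVER = "over-expression"
UNDER = "under-expression"
NO_DIFF = "no diff expression"
NO_DATA = "no data"


def summarize_calls(affymetrix: str, rna_seq: str) -> Optional[str]:
    """Simplified summary of the Affymetrix and RNA-Seq calls for one gene and condition.

    Call quality and the special case of genes never seen as expressed are ignored.
    """
    calls = {c for c in (affymetrix, rna_seq) if c != NO_DATA}
    if not calls:
        return None                # neither data type tested this gene in this condition
    if len(calls) == 1:
        return calls.pop()         # a single data type, or congruent data types
    if calls == {OVER, UNDER}:
        return "strong ambiguity"  # calls in opposite directions
    return "weak ambiguity"        # over- or under-expression vs. no diff expression


# Third example line above (H19): Affymetrix over-expression, no RNA-Seq data.
print(summarize_calls("over-expression", "no data"))  # over-expression
```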

Call quality (column 8)

Confidence in the differential expression call provided in Differential expression (column 7). One of:

  • high quality: differential expression reported as high quality, with no contradicting call from the same type of analysis (across anatomy/across life stages) for the same gene, in the same anatomical entity and developmental stage (call generated either from multiple congruent analyses, or from a single analysis).
  • poor quality: differential expression reported as low quality, or there exists a conflict for the same gene, anatomical entity and developmental stage between different analyses of the same data type (conflicts between different data types are treated differently). For instance, one analysis showed a gene to be over-expressed in a condition, while another analysis showed the same gene to be under-expressed or not differentially expressed in the same condition. Such conflicts are resolved by a voting system based on the number of conditions compared, weighted by p-value. Note that in one case this quality level is also used to reconcile conflicting calls from different data types: when one data type produced an under-expression call, while a different data type showed that the same gene was never seen as expressed in the same condition. In that case, the overall summary is under-expression with poor quality.
  • NA: no quality is applicable when Differential expression (column 7) reports an ambiguity state.

This is the same differential expression quality column as in simple files (column 8 of the over-/under-expression simple file).

Affymetrix data (column 9)

Call generated from Affymetrix data for Gene ID (column 1), in Anatomical entity ID (column 3), at Developmental stage ID (column 5). One of:

  • over-expression: the gene was shown in one or more analyses to have a significant over-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • under-expression: the gene was shown in one or more analyses to have a significant under-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • no diff expression: the gene was tested for differential expression in this condition, but was never shown to have a significant variation of expression as compared to the other conditions of the analyses.
  • no data: no analysis of this data type compared the expression level of this gene in this condition.
Affymetrix call quality (column 10)

Confidence in the differential expression call provided in Affymetrix data (column 9). One of:

  • high quality: differential expression reported as high quality, with no contradicting call from the same type of analysis (across anatomy/across life stages) for the same gene, in the same anatomical entity and developmental stage (call generated either from multiple congruent analyses, or from a single analysis).
  • poor quality: differential expression reported as low quality, or there exists a conflict for the same gene, anatomical entity and developmental stage between different analyses of the same data type (conflicts between different data types are treated differently). For instance, one analysis showed a gene to be over-expressed in a condition, while another analysis showed the same gene to be under-expressed or not differentially expressed in the same condition. Such conflicts are resolved by a voting system based on the number of conditions compared, weighted by p-value. Note that in one case this quality level is also used to reconcile conflicting calls from different data types: when one data type produced an under-expression call, while a different data type showed that the same gene was never seen as expressed in the same condition. In that case, the overall summary is under-expression with poor quality.
  • no data: Affymetrix data (column 9) is also set to no data.
Affymetrix best supporting p-value (column 11)

Best p-value from the Affymetrix analyses supporting the Affymetrix call provided in Affymetrix data (column 9). Set to 1.0 if no Affymetrix data are available.

Affymetrix analysis count supporting Affymetrix call (column 12)

Number of Affymetrix analyses supporting the Affymetrix call provided in Affymetrix data (column 9). Set to 0 if no Affymetrix data are available.

Affymetrix analysis count in conflict with Affymetrix call (column 13)

Number of Affymetrix analyses in conflict, i.e., generating a call different from the call provided in Affymetrix data (column 9). Set to 0 if no Affymetrix data are available.

RNA-Seq data (column 14)

Call generated from RNA-Seq data for Gene ID (column 1), in Anatomical entity ID (column 3), at Developmental stage ID (column 5). One of:

  • over-expression: the gene was shown in one or more analyses to have a significant over-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • under-expression: the gene was shown in one or more analyses to have a significant under-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • no diff expression: the gene was tested for differential expression in this condition, but was never shown to have a significant variation of expression as compared to the other conditions of the analyses.
  • no data: no analysis of this data type compared the expression level of this gene in this condition.
RNA-Seq call quality (column 15)

Confidence in the differential expression call provided in RNA-Seq data (column 14). One of:

  • high quality: differential expression reported as high quality, with no contradicting call from the same type of analysis (across anatomy/across life stages) for the same gene, in the same anatomical entity and developmental stage (call generated either from multiple congruent analyses, or from a single analysis).
  • poor quality: differential expression reported as low quality, or there exists a conflict for the same gene, anatomical entity and developmental stage between different analyses of the same data type (conflicts between different data types are treated differently). For instance, one analysis showed a gene to be over-expressed in a condition, while another analysis showed the same gene to be under-expressed or not differentially expressed in the same condition. Such conflicts are resolved by a voting system based on the number of conditions compared, weighted by p-value. Note that in one case this quality level is also used to reconcile conflicting calls from different data types: when one data type produced an under-expression call, while a different data type showed that the same gene was never seen as expressed in the same condition. In that case, the overall summary is under-expression with poor quality.
  • no data: RNA-Seq data (column 14) is also set to no data.
RNA-Seq best supporting p-value (column 16)

Best p-value from the RNA-Seq analyses supporting the RNA-Seq call provided in RNA-Seq data (column 14). Set to 1.0 if no RNA-Seq data are available.

RNA-Seq analysis count supporting RNA-Seq call (column 17)

Number of RNA-Seq analyses supporting the RNA-Seq call provided in RNA-Seq data (column 14). Set to 0 if no RNA-Seq data are available.

RNA-Seq analysis count in conflict with RNA-Seq call (column 18)

Number of RNA-Seq analyses in conflict, i.e., generating a call different from the call provided in RNA-Seq data (column 14). Set to 0 if no RNA-Seq data are available.
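The per-data-type columns follow documented defaults when there is no data (best p-value 1.0, analysis counts 0), and a high quality call has no contradicting analysis of the same type. A minimal consistency check along these lines might look like the following; the function name and messages are hypothetical, and it checks only these two documented rules.

```python
from typing import List


def check_data_type_columns(call: str, quality: str, best_p_value: float,
                            supporting: int, conflicting: int) -> List[str]:
    """Return inconsistencies among the five columns describing one data type."""
    problems = []
    if call == "no data":
        # Defaults documented above for a data type without data.
        if quality != "no data":
            problems.append("call quality should be 'no data'")
        if best_p_value != 1.0:
            problems.append("best supporting p-value should be 1.0")
        if supporting != 0 or conflicting != 0:
            problems.append("analysis counts should be 0")
    elif quality == "high quality" and conflicting > 0:
        # High quality means no contradicting call from the same type of analysis.
        problems.append("a high quality call should have no conflicting analysis")
    return problems


# RNA-Seq columns of the first example line, then a made-up inconsistent combination.
print(check_data_type_columns("no data", "no data", 1.0, 0, 0))  # []
print(check_data_type_columns("over-expression", "high quality", 0.01, 1, 1))
```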


Multi-species download files

Bgee provides the ability to compare expression data between species, with great anatomical detail, using formal concepts of homology: orthology of genes and homology of anatomical entities. This makes it possible to perform accurate comparisons between species, even distant species for which the anatomy mapping might not be obvious.

  • homology of anatomical entities: when comparing multiple species, only anatomical entities homologous between all species compared are considered, that is, only anatomical entities derived from an organ that existed before the divergence of the species compared. This requires careful annotation of the homology history of animal anatomy. These annotations are described in a separate project maintained by the Bgee team; see the homology annotation project on GitHub.
    In practice, when comparing expression data between several species, the anatomical entities used are those with a homology relation valid for their Least Common Ancestor (LCA), or for any of its ancestral taxa. For instance, when comparing data between human and zebrafish, the LCA is the taxon Euteleostomi; as a result, annotations to this taxon are used, such as the homology relation between "tetrapod parietal bone" (UBERON:0000210) and "actinopterygian frontal bone" (UBERON:0004866), but also annotations to ancestral taxa, such as the annotation stating that "ophthalmic nerve" appeared in the Vertebrata common ancestor. Annotations to taxa more recent than the LCA are discarded, such as the annotation of the "forelimb" structure (UBERON:0002102), homologous in the Tetrapoda lineage.
  • orthology of genes: relations of orthology between genes are retrieved using OMA; when comparing several species, Bgee identifies their Least Common Ancestor (LCA) and retrieves genes that have descended from a single common ancestral gene in that LCA. Relations of orthology between genes are provided in Bgee through hierarchical orthologous groups files.
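The LCA-based selection of homology annotations can be illustrated with a toy Python sketch. The taxonomy below is a hypothetical, heavily simplified child-to-parent map (many intermediate taxa are omitted), and the annotations are the examples from the text above; Bgee's actual homology data and implementation are not reproduced here.

```python
from typing import Dict, List

# Hypothetical, heavily simplified taxonomy (child -> parent); many intermediate taxa are omitted.
PARENTS: Dict[str, str] = {
    "Homo sapiens": "Tetrapoda",
    "Tetrapoda": "Sarcopterygii",
    "Sarcopterygii": "Euteleostomi",
    "Danio rerio": "Actinopterygii",
    "Actinopterygii": "Euteleostomi",
    "Euteleostomi": "Gnathostomata",
    "Gnathostomata": "Vertebrata",
}


def lineage(taxon: str, parents: Dict[str, str]) -> List[str]:
    """Return the taxon followed by all of its ancestors, closest first."""
    path = [taxon]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def least_common_ancestor(species: List[str], parents: Dict[str, str]) -> str:
    """Closest taxon shared by the lineages of all species."""
    shared = set(lineage(species[0], parents))
    for other in species[1:]:
        shared &= set(lineage(other, parents))
    return next(t for t in lineage(species[0], parents) if t in shared)


def usable_annotations(annotations: Dict[str, str], species: List[str],
                       parents: Dict[str, str]) -> List[str]:
    """Keep homology annotations whose taxon is the LCA or one of its ancestors."""
    allowed = set(lineage(least_common_ancestor(species, parents), parents))
    return [entity for entity, taxon in annotations.items() if taxon in allowed]


# Annotation examples taken from the text above (entity -> taxon of the homology annotation).
annotations = {
    "tetrapod parietal bone (UBERON:0000210) / actinopterygian frontal bone (UBERON:0004866)": "Euteleostomi",
    "ophthalmic nerve": "Vertebrata",
    "forelimb (UBERON:0002102)": "Tetrapoda",
}
print(least_common_ancestor(["Homo sapiens", "Danio rerio"], PARENTS))  # Euteleostomi
print(usable_annotations(annotations, ["Homo sapiens", "Danio rerio"], PARENTS))
```

The forelimb annotation is dropped because Tetrapoda is more recent than the human/zebrafish LCA.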


OMA Hierarchical orthologous groups file

OMA Hierarchical orthologous groups files provide gene orthology relations, by grouping genes that have descended from a single common ancestral gene in the taxon of interest. The targeted taxon is provided in the file name. Orthologous genes are grouped by common OMA IDs, provided in the column OMA ID (column 1, see below).

Format description for OMA Hierarchical orthologous groups file
Column	Content	Example
1	OMA ID	10
2	Gene ID	ENSG00000105298
3	Gene name	CACTIN
Example lines for an OMA Hierarchical orthologous groups file
OMA ID	Gene ID	Gene name
98828	ENSG00000158473	CD1D
98828	ENSMUSG00000028076	Cd1d1
98828	ENSMUSG00000041750	Cd1d2
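Since all genes sharing an OMA ID belong to the same orthology group, a hierarchical orthologous groups file can be loaded into a mapping from OMA ID to member genes. This sketch assumes a tab-separated file with a single header line; the function name is hypothetical. OMA IDs are kept as plain strings because, as noted below, they only serve as grouping keys within a release.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def read_orthologous_groups(handle: TextIO) -> Dict[str, List[Tuple[str, str]]]:
    """Map each OMA ID to the (gene ID, gene name) pairs of its member genes."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header line: OMA ID, Gene ID, Gene name
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for row in reader:
        if row:
            oma_id, gene_id, gene_name = row[:3]
            groups[oma_id].append((gene_id, gene_name))
    return dict(groups)


# The example lines above.
sample = io.StringIO(
    "OMA ID\tGene ID\tGene name\n"
    "98828\tENSG00000158473\tCD1D\n"
    "98828\tENSMUSG00000028076\tCd1d1\n"
    "98828\tENSMUSG00000041750\tCd1d2\n"
)
groups = read_orthologous_groups(sample)
print(len(groups["98828"]))  # 3
```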
OMA ID (column 1)

Unique identifier of the OMA gene orthology group. Note that these identifiers are not stable between releases, and cannot be used to retrieve data from the OMA browser. They are provided solely to group data from orthologous genes belonging to the same orthology group. Genes that are members of an OMA gene orthology group can be retrieved through the associated hierarchical orthologous groups file.

Gene ID (column 2)

Unique identifier of gene from Ensembl.

Please note that for P. paniscus (bonobo) we use the P. troglodytes (chimpanzee) genome, and that for P. pygmaeus (Bornean orangutan) we use the P. abelii (Sumatran orangutan) genome. For these two species only (bonobo and Bornean orangutan), we modify the Ensembl gene IDs to ensure that gene identifiers are unique across all species. To obtain the correct Ensembl gene IDs for these species, replace the gene ID prefix 'PPAG' with 'ENSPTRG', and the prefix 'PPYG' with 'ENSPPYG'.

Gene name (column 3)

Name of the gene defined by Gene ID (column 2)


Over-/under-expression across anatomy or life stages in multiple species

Bgee provides calls of over-/under-expression. A call corresponds to a gene with a significant variation of its expression level in an anatomical entity during a developmental stage, as compared to either: i) other anatomical entities at the same (broadly defined) developmental stage (over-/under-expression across anatomy); or ii) the same anatomical entity at different (precise) developmental stages (over-/under-expression across life stages). These differential expression analyses are performed using Affymetrix and RNA-Seq experiments with at least 3 suitable conditions (anatomical entity/developmental stage) and at least 2 replicates for each; as for all data in Bgee, only "normal" expression is considered (i.e., no treatment, no disease, no gene knock-out, etc.).
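The experiment-level requirement above can be expressed as a simple check. This is a sketch under an assumed interpretation: conditions with fewer than 2 replicates are simply not counted, and no other suitability criterion (such as the restriction to "normal" expression) is modeled. The function name and the example experiment are hypothetical.

```python
from typing import Dict, Tuple

# A condition is an (anatomical entity ID, developmental stage ID) pair.
Condition = Tuple[str, str]


def eligible_for_diff_expression(replicates: Dict[Condition, int],
                                 min_conditions: int = 3, min_replicates: int = 2) -> bool:
    """True if at least `min_conditions` conditions each have at least `min_replicates` replicates."""
    suitable = [cond for cond, n in replicates.items() if n >= min_replicates]
    return len(suitable) >= min_conditions


# Hypothetical experiment: only two conditions have 2 or more replicates.
experiment = {
    ("UBERON:0000955", "UBERON:0000113"): 3,
    ("UBERON:0002107", "UBERON:0000113"): 2,
    ("UBERON:0000948", "UBERON:0000113"): 1,
}
print(eligible_for_diff_expression(experiment))  # False
```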

Bgee runs all possible differential expression analyses for each experiment independently, then collects all results and provides a summary as unique gene - anatomical entity - developmental stage calls, with confidence information; conflicts within each data type are resolved using a voting system weighted by p-values (conflicts between different data types are treated differently). This makes it possible to aggregate and compare these calls across different experiments, data types, and species.

In multi-species files, results are made comparable between orthologous genes, in homologous anatomical entities and comparable developmental stages: only genes sharing a common ancestral gene in the least common ancestor of the species compared are studied, only in anatomical entities sharing a homology relation between all species compared, and with data mapped to broad developmental stages shared across the animal kingdom (see use of homology in multi-species files).

Note that, as opposed to calls of presence/absence of expression, no propagation of differential expression calls is performed using anatomical and life stage ontologies.

Over-/under-expression calls are then filtered and presented differently depending on whether a simple file or a complete file is used. Notably:

  • simple files aim at providing one line per gene orthology group and homologous anatomical entities/developmental stage, and only for anatomical entities with a homology relation defined with a good level of confidence.
  • complete files aim at reporting all information, for each gene of the orthology groups, using all available homology relations between anatomical entities; they make it possible, for instance, to retrieve the contribution of each data type to a call, or to retrieve all genes and conditions tested, including genes with no differential expression in these conditions.


Simple file

In simple files, each line provides information for a gene orthology group in a condition (homologous anatomical entity/comparable developmental stage); columns then provide, for each species, the number of genes over-expressed, under-expressed, not differentially expressed or with inconclusive results, and with no data. As a result, the number of columns varies with the number of species compared.

In simple files, only lines with data in at least two species, and at least one over-expression or under-expression call in a species, are provided, and only for anatomical entities with a homology relation defined with a good level of confidence.

Relations of orthology between genes that are members of the same gene orthology group are provided through the associated hierarchical orthologous groups file.

Format description for multi-species simple differential expression file
Column	Content	Cardinality	Example
1	OMA ID	1	80
2	Anatomical entity IDs	1 or greater	UBERON:0001898
3	Anatomical entity names	1 or greater	hypothalamus
4	Developmental stage ID	1	UBERON:0000113
5	Developmental stage name	1	post-juvenile adult stage
6	Over-expressed gene count for species1 (e.g., Over-expressed gene count for Homo sapiens)	1	1
7	Under-expressed gene count for species1 (e.g., Under-expressed gene count for Homo sapiens)	1	0
8	Not diff. expressed gene count for species1 (e.g., Not diff. expressed gene count for Homo sapiens)	1	0
9	NA gene count for species1 (e.g., NA gene count for Homo sapiens)	1	0
10	Over-expressed gene count for species2 (e.g., Over-expressed gene count for Mus musculus)	1	1
11	Under-expressed gene count for species2 (e.g., Under-expressed gene count for Mus musculus)	1	0
12	Not diff. expressed gene count for species2 (e.g., Not diff. expressed gene count for Mus musculus)	1	0
13	NA gene count for species2 (e.g., NA gene count for Mus musculus)	1	0
...	Over-expressed gene count for speciesXX	1	...
...	...
(species*4 + 6)	Gene IDs	2 or greater	ENSG00000169057|ENSMUSG00000031393
(species*4 + 7)	Gene names	2 or greater	MECP2|Mecp2
Example lines for multi-species simple differential expression file
OMA ID	Anatomical entity IDs	Anatomical entity names	Developmental stage ID	Developmental stage name	Over-expressed gene count for Homo sapiens	Under-expressed gene count for Homo sapiens	Not diff. expressed gene count for Homo sapiens	NA gene count for Homo sapiens	Over-expressed gene count for Mus musculus	Under-expressed gene count for Mus musculus	Not diff. expressed gene count for Mus musculus	NA gene count for Mus musculus	Gene IDs	Gene names
93	UBERON:0000473	testis	UBERON:0000113	post-juvenile adult stage	0	1	0	0	0	1	0	0	ENSG00000162512|ENSMUSG00000025743	SDC3|Sdc3
93	UBERON:0000955	brain	UBERON:0000113	post-juvenile adult stage	1	0	0	0	1	0	0	0	ENSG00000162512|ENSMUSG00000025743	SDC3|Sdc3
93	UBERON:0001134	skeletal muscle tissue	UBERON:0000113	post-juvenile adult stage	0	1	0	0	0	1	0	0	ENSG00000162512|ENSMUSG00000025743	SDC3|Sdc3
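Because the number of columns depends on the number of species, a parser has to derive the species count from the header (the file has species*4 + 7 columns). The Python sketch below assumes a tab-separated file with a single header line whose per-species columns are named as in the example above; it also encodes the filtering rule stated earlier for simple files. Function names are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO

FIXED_COLUMNS = 5   # OMA ID, anatomical entity IDs/names, developmental stage ID/name
PER_SPECIES = 4     # over, under, not diff., NA gene counts
OVER_PREFIX = "Over-expressed gene count for "


def parse_simple_multispecies(handle: TextIO) -> List[dict]:
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    # Total columns = species*4 + 7, so the species count follows from the header length.
    n_species = (len(header) - FIXED_COLUMNS - 2) // PER_SPECIES
    species = [header[FIXED_COLUMNS + PER_SPECIES * i][len(OVER_PREFIX):]
               for i in range(n_species)]
    gene_ids_col = FIXED_COLUMNS + PER_SPECIES * n_species  # 0-based index of column species*4 + 6
    records = []
    for row in reader:
        if not row:
            continue
        counts: Dict[str, Dict[str, int]] = {}
        for i, name in enumerate(species):
            start = FIXED_COLUMNS + PER_SPECIES * i
            over, under, not_diff, na = (int(x) for x in row[start:start + PER_SPECIES])
            counts[name] = {"over": over, "under": under, "not_diff": not_diff, "na": na}
        records.append({
            "oma_id": row[0],
            "anat_entity_ids": row[1].split("|"),   # cardinality 1 or greater
            "stage_id": row[3],
            "counts": counts,
            "gene_ids": row[gene_ids_col].split("|"),
            "gene_names": row[gene_ids_col + 1].split("|"),
        })
    return records


def passes_simple_file_filter(counts: Dict[str, Dict[str, int]]) -> bool:
    """Data in at least two species, and at least one over-/under-expression call."""
    with_data = [c for c in counts.values() if c["over"] + c["under"] + c["not_diff"] > 0]
    has_diff_call = any(c["over"] + c["under"] > 0 for c in counts.values())
    return len(with_data) >= 2 and has_diff_call


# First example line above, with a header built for two species.
header = ["OMA ID", "Anatomical entity IDs", "Anatomical entity names",
          "Developmental stage ID", "Developmental stage name"]
for sp in ("Homo sapiens", "Mus musculus"):
    header += [OVER_PREFIX + sp, "Under-expressed gene count for " + sp,
               "Not diff. expressed gene count for " + sp, "NA gene count for " + sp]
header += ["Gene IDs", "Gene names"]
row = ["93", "UBERON:0000473", "testis", "UBERON:0000113", "post-juvenile adult stage",
       "0", "1", "0", "0", "0", "1", "0", "0",
       "ENSG00000162512|ENSMUSG00000025743", "SDC3|Sdc3"]
records = parse_simple_multispecies(io.StringIO("\t".join(header) + "\n" + "\t".join(row) + "\n"))
print(records[0]["counts"]["Mus musculus"]["under"])   # 1
print(passes_simple_file_filter(records[0]["counts"]))  # True
```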
OMA ID (column 1)

Unique identifier of the OMA gene orthology group. Note that these identifiers are not stable between releases, and cannot be used to retrieve data from the OMA browser. They are provided solely to group data from orthologous genes belonging to the same orthology group. Genes that are members of an OMA gene orthology group can be retrieved through the associated hierarchical orthologous groups file.

Anatomical entity IDs (column 2)

Unique identifiers of the homologous anatomical entities, from the Uberon ontology. Cardinality 1 or greater. When more than one anatomical entity is used, they are separated with the character |.

In most cases, the cardinality is 1, as most homologous anatomical entities compared in different species have not diverged enough to be described by different anatomical concepts. But the cardinality can sometimes be greater, when homologous anatomical entities are highly derived in the species compared and represented by distinct anatomical concepts.

For instance, when comparing expression data in human and zebrafish, the anatomical entity "bulbus arteriosus" (UBERON:0004152) would be considered, as it is believed to be homologous in the Euteleostomi lineage; since it is represented by the same anatomical term in both species, the cardinality of the value in this column would be 1. But homology relations between distinct anatomical concepts would also be considered, such as the homology between lung (UBERON:0002048) and swim bladder (UBERON:0006860): these organs are believed to descend from the same ancestral organ, which existed in the ancestor of Gnathostomata, but they are now sufficiently derived to be represented by different anatomical concepts in these species. The cardinality of the value in this column would then be 2, and the IDs of these anatomical entities would be separated by the character |, e.g., UBERON:0002048|UBERON:0006860.

Anatomical entity names (column 3)

Names of the anatomical entities defined by Anatomical entity IDs (column 2). Cardinality 1 or greater. When more than one anatomical entity is used, they are separated with the character |. See Anatomical entity IDs column description for more details.

Developmental stage ID (column 4)

Unique identifier of the developmental stage, from the Uberon ontology. For multi-species analyses, only broad developmental stages common to the species compared are used.

Developmental stage name (column 5)

Name of the developmental stage defined by Developmental stage ID (column 4)

Over-expressed gene count for speciesXX

Number of genes, members of the OMA orthologous gene group with ID provided in OMA ID (column 1), shown in one or more analyses to have a significant over-expression in this condition (Anatomical entity IDs (column 2), at Developmental stage ID (column 4)), as compared to the expression levels in other conditions of the analyses. This means that no conflicts were found between results generated from different data types (result generated either from a single data type, or from congruent analyses of different data types). Note that there can still be conflicts between different analyses within the same data type, but such conflicts are resolved by a voting system based on the number of conditions compared, weighted by p-value, in order to produce a single differential expression call taking into account all analyses from a given data type.

Please note that the list of all genes that are members of the OMA orthologous gene group with ID provided in OMA ID (column 1) is provided through the hierarchical orthologous groups file.

Under-expressed gene count for speciesXX

Number of genes, members of the OMA orthologous gene group with ID provided in OMA ID (column 1), shown in one or more analyses to have a significant under-expression in this condition (Anatomical entity IDs (column 2), at Developmental stage ID (column 4)), as compared to the expression levels in other conditions of the analyses. This means that no conflicts were found between results generated from different data types (result generated either from a single data type, or from congruent analyses of different data types). Note that there can still be conflicts between different analyses within the same data type, but such conflicts are resolved by a voting system based on the number of conditions compared, weighted by p-value, in order to produce a single differential expression call taking into account all analyses from a given data type.

Please note that the list of all genes that are members of the OMA orthologous gene group with ID provided in OMA ID (column 1) is provided through the hierarchical orthologous groups file.

Not diff. expressed gene count for speciesXX

Number of genes, members of the OMA orthologous gene group with ID provided in OMA ID (column 1), that were tested for differential expression in this condition (Anatomical entity IDs (column 2), at Developmental stage ID (column 4)), but that were never shown to have a significant variation of their level of expression as compared to the other conditions of the analyses, or for which conflicting results were generated from different data types.

Please note that the list of all genes that are members of the OMA orthologous gene group with ID provided in OMA ID (column 1) is provided through the hierarchical orthologous groups file.

NA gene count for speciesXX

Number of genes, members of the OMA orthologous gene group with ID provided in OMA ID (column 1), that were not tested for differential expression in this condition (Anatomical entity IDs (column 2), at Developmental stage ID (column 4)).

Please note that the list of all genes that are members of the OMA orthologous gene group with ID provided in OMA ID (column 1) is provided through the hierarchical orthologous groups file.

Gene IDs

IDs of the genes that are members of the OMA orthologous gene group with ID provided in OMA ID (column 1). Cardinality 2 or greater. IDs are separated with the character |.

This column is provided as additional information; members of OMA orthologous gene groups can be retrieved through the hierarchical orthologous groups file.

Gene names

Names of the genes that are members of the OMA orthologous gene group with ID provided in OMA ID (column 1). Cardinality 2 or greater. Names are separated with the character |.

This column is provided as additional information; members of OMA orthologous gene groups can be retrieved through the hierarchical orthologous groups file.


Complete file

In complete files, information for all genes is provided, in all conditions tested, for anatomical entities homologous between all species compared, and comparable broad developmental stages. As opposed to simple multi-species files, all homology relations available for the anatomical entities are considered, even homology hypotheses with low support; the Anatomy homology CIO ID and name columns give the level of confidence in the homology hypothesis used. Also, the number of columns in complete files is fixed, regardless of the number of species compared.

Relations of orthology between genes can be retrieved through the hierarchical orthologous groups file. Notably, this makes it possible to detect genes with no data for a condition: if a gene is listed as a member of an orthology group, but there is no call for this gene in a given condition, then no data are available for this gene in this condition.
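Following the approach described above, genes with no data can be found by comparing the members of each orthology group, taken from the hierarchical orthologous groups file, with the genes that have a call in a given condition. In this minimal sketch, calls are reduced to hypothetical (OMA ID, gene ID, anatomical entity IDs, developmental stage ID) tuples already read from the complete file; the function name is hypothetical, and conditions where no member gene has any call do not appear in the result.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Condition key: (OMA ID, anatomical entity IDs as written in the file, developmental stage ID).
Condition = Tuple[str, str, str]


def genes_without_data(hog_members: Dict[str, List[str]],
                       calls: Iterable[Tuple[str, str, str, str]]) -> Dict[Condition, Set[str]]:
    """For each condition with at least one call, list the group members lacking a call."""
    with_call: Dict[Condition, Set[str]] = {}
    for oma_id, gene_id, anat_entity_ids, stage_id in calls:
        with_call.setdefault((oma_id, anat_entity_ids, stage_id), set()).add(gene_id)
    return {cond: set(hog_members.get(cond[0], [])) - genes
            for cond, genes in with_call.items()}


# Suppose the complete file contained only the mouse and macaque Tjp1 lines of group 59,
# while the orthologous groups file lists three members for that group.
hog_members = {"59": ["ENSMUSG00000030516", "ENSMMUG00000017878", "ENSBTAG00000015398"]}
calls = [
    ("59", "ENSMUSG00000030516", "UBERON:0000948", "UBERON:0018241"),
    ("59", "ENSMMUG00000017878", "UBERON:0000948", "UBERON:0018241"),
]
print(genes_without_data(hog_members, calls))
# {('59', 'UBERON:0000948', 'UBERON:0018241'): {'ENSBTAG00000015398'}}
```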

Format description for multi-species complete differential expression file
Column	Content	Cardinality	Example
1	OMA ID	1	42865
2	Gene ID	1	ENSMMUG00000012094
3	Gene name	1	RAB17
4	Anatomical entity IDs	1 or greater	UBERON:0002037
5	Anatomical entity names	1 or greater	cerebellum
6	Developmental stage ID	1	UBERON:0018241
7	Developmental stage name	1	prime adult stage
8	Latin species name	1	Macaca_mulatta
9	Differential expression	1	under-expression
10	Call quality	1	high quality
11	Affymetrix data	1	no data
12	Affymetrix call quality	1	no data
13	Affymetrix best supporting p-value	1	1.0
14	Affymetrix analysis count supporting Affymetrix call	1	0
15	Affymetrix analysis count in conflict with Affymetrix call	1	0
16	RNA-Seq data	1	under-expression
17	RNA-Seq call quality	1	high quality
18	RNA-Seq best supporting p-value	1	8.82E-7
19	RNA-Seq analysis count supporting RNA-Seq call	1	1
20	RNA-Seq analysis count in conflict with RNA-Seq call	1	0
21	Anatomy homology CIO ID	1	CIO:0000003
22	Anatomy homology CIO name	1	high confidence from single evidence
Example lines for multi-species complete differential expression file
OMA ID	Gene ID	Gene name	Anatomical entity IDs	Anatomical entity names	Developmental stage ID	Developmental stage name	Latin species name	Differential expression	Call quality	Affymetrix data	Affymetrix call quality	Affymetrix best supporting p-value	Affymetrix analysis count supporting Affymetrix call	Affymetrix analysis count in conflict with Affymetrix call	RNA-Seq data	RNA-Seq call quality	RNA-Seq best supporting p-value	RNA-Seq analysis count supporting RNA-Seq call	RNA-Seq analysis count in conflict with RNA-Seq call	Anatomy homology CIO ID	Anatomy homology CIO name
59	ENSMUSG00000030516	Tjp1	UBERON:0000948	heart	UBERON:0018241	prime adult stage	Mus_musculus	over-expression	high quality	over-expression	high quality	0.0	5	0	no data	no data	1.0	0	0	CIO:0000004	medium confidence from single evidence
59	ENSMMUG00000017878	Tjp1	UBERON:0000948	heart	UBERON:0018241	prime adult stage	Macaca_mulatta	no diff expression	high quality	no data	no data	1.0	0	0	no diff expression	high quality	0.6239275	2	0	CIO:0000004	medium confidence from single evidence
59	ENSBTAG00000015398	ZO1	UBERON:0000948	heart	UBERON:0018241	prime adult stage	Bos_taurus	over-expression	high quality	no data	no data	1.0	0	0	over-expression	high quality	8.741838E-4	1	0	CIO:0000004	medium confidence from single evidence
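As a minimal sketch of how this tab-separated file might be read in Python, assuming the first line is a header row with the column names shown above, the standard `csv` module can map each line to a dictionary keyed by column name. The function name `read_calls` is hypothetical, and only the p-value and analysis-count columns are converted to numbers; every other value stays a string.

```python
import csv
from typing import Iterable, Iterator

# Numeric columns, assumed to carry exactly these names in the header row.
FLOAT_COLUMNS = (
    "Affymetrix best supporting p-value",
    "RNA-Seq best supporting p-value",
)
INT_COLUMNS = (
    "Affymetrix analysis count supporting Affymetrix call",
    "Affymetrix analysis count in conflict with Affymetrix call",
    "RNA-Seq analysis count supporting RNA-Seq call",
    "RNA-Seq analysis count in conflict with RNA-Seq call",
)


def read_calls(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per data row, keyed by the column names of the header row."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Convert numeric columns when present; other values remain strings.
        for col in FLOAT_COLUMNS:
            if col in row:
                row[col] = float(row[col])
        for col in INT_COLUMNS:
            if col in row:
                row[col] = int(row[col])
        yield row
```

For instance, opening a local copy of the file with `open(path, newline="")` and passing the handle to `read_calls` would iterate over its calls one dictionary at a time, where `path` is wherever the file was saved.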
OMA ID (column 1)

Unique identifier of the OMA gene orthology group. Note that these identifiers are not stable between releases, and cannot be used to retrieve data from the OMA browser. They are provided solely to group data from orthologous genes belonging to the same orthology group. The genes that are members of an OMA gene orthology group can be retrieved through the associated hierarchical orthologous groups file.
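Because OMA IDs only serve to group orthologous genes within one release, a typical use is to collect the calls of all orthologs in the same condition. The following sketch assumes rows are dictionaries keyed by column name (as produced by `csv.DictReader` with a tab delimiter); the function name is hypothetical.

```python
from collections import defaultdict


def group_orthologs_by_condition(rows):
    """Group call rows by (OMA ID, anatomical entity IDs, developmental stage ID).

    Each group holds the calls of orthologous genes in the same homologous
    anatomical entities at the same stage. OMA IDs are not stable between
    releases, so these keys should not be reused across file versions.
    """
    groups = defaultdict(list)
    for row in rows:
        key = (
            row["OMA ID"],
            row["Anatomical entity IDs"],
            row["Developmental stage ID"],
        )
        groups[key].append(row)
    return dict(groups)
```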

Gene ID (column 2)

Unique identifier of the gene, from Ensembl.

Please note that for P. paniscus (bonobo) we use the P. troglodytes (chimpanzee) genome, and that for P. pygmaeus (Bornean orangutan) we use the P. abelii (Sumatran orangutan) genome. For these two species only, we modify the Ensembl gene IDs to ensure that gene identifiers are unique across all species. To obtain the correct Ensembl gene IDs for these species, replace the gene ID prefix 'PPAG' with 'ENSPTRG', and the prefix 'PPYG' with 'ENSPPYG'.
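The prefix substitution described above can be written as a small helper. This is a sketch; `to_ensembl_gene_id` is a hypothetical name, and IDs that start with neither prefix are returned unchanged.

```python
# Prefix substitutions stated in this documentation for bonobo and Bornean orangutan.
PREFIX_MAP = {"PPAG": "ENSPTRG", "PPYG": "ENSPPYG"}


def to_ensembl_gene_id(gene_id: str) -> str:
    """Restore the Ensembl gene ID for bonobo and Bornean orangutan genes."""
    for bgee_prefix, ensembl_prefix in PREFIX_MAP.items():
        if gene_id.startswith(bgee_prefix):
            return ensembl_prefix + gene_id[len(bgee_prefix):]
    # Other species already use unmodified Ensembl IDs.
    return gene_id
```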

Please note that the list of all genes belonging to the OMA orthologous gene group identified in OMA ID (column 1) is provided in the hierarchical orthologous groups file. If a gene listed in that file has no call for the anatomical entities in Anatomical entity IDs (column 4) at the developmental stage in Developmental stage ID (column 6), it means that no data are available for this gene in this condition.
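Finding which members of an orthology group have no data in a given condition amounts to a set difference. This sketch assumes the member gene IDs of the group have already been extracted from the hierarchical orthologous groups file (parsing that file is not shown) and that rows are dictionaries keyed by column name; all names are hypothetical.

```python
def genes_without_data(group_gene_ids, rows, oma_id, anat_entity_ids, stage_id):
    """Return, sorted, the genes of an OMA group that have no call in a condition."""
    with_call = {
        row["Gene ID"]
        for row in rows
        if row["OMA ID"] == oma_id
        and row["Anatomical entity IDs"] == anat_entity_ids
        and row["Developmental stage ID"] == stage_id
    }
    # A group member without a call in this condition has no data there.
    return sorted(set(group_gene_ids) - with_call)
```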

Gene name (column 3)

Name of the gene defined by Gene ID (column 2).

Anatomical entity IDs (column 4)

Unique identifiers of the homologous anatomical entities, from the Uberon ontology. Cardinality 1 or greater. When more than one anatomical entity is used, they are separated with the character |.

In most cases, the cardinality is 1, because most homologous anatomical entities compared between species are not derived enough to be described by different anatomical concepts. The cardinality can be greater when homologous anatomical entities are highly derived in the compared species and are therefore represented by distinct anatomical concepts.

For instance, when comparing expression data in human and zebrafish, the anatomical entity "bulbus arteriosus" (UBERON:0004152) would be considered, as it is believed to be homologous in the Euteleostomi lineage; because it is represented by the same anatomical term in both species, the cardinality of the value in this column would be 1. Homology relations between distinct anatomical concepts would also be considered, such as the homology between lung (UBERON:0002048) and swim bladder (UBERON:0006860): these organs are believed to descend from the same ancestral organ, present in the ancestor of Gnathostomata, but are now derived enough to be represented by different anatomical concepts in these species. In that case, the cardinality of the value in this column would be 2, and the IDs of these anatomical entities would be separated by the character |, e.g., UBERON:0002048|UBERON:0006860.

Anatomical entity names (column 5)

Names of the anatomical entities defined by Anatomical entity IDs (column 4). Cardinality 1 or greater. When more than one anatomical entity is used, they are separated with the character |. See Anatomical entity IDs column description for more details.
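A sketch for splitting these multi-valued entries, assuming the names in column 5 are listed in the same order as the IDs in column 4 (an assumption, not stated explicitly above). The function name is hypothetical.

```python
def split_entities(ids_field: str, names_field: str) -> list[tuple[str, str]]:
    """Pair each anatomical entity ID (column 4) with its name (column 5)."""
    ids = ids_field.split("|")
    names = names_field.split("|")
    if len(ids) != len(names):
        raise ValueError(f"{len(ids)} IDs but {len(names)} names")
    # Assumes names appear in the same order as their IDs.
    return list(zip(ids, names))
```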

Developmental stage ID (column 6)

Unique identifier of the developmental stage, from the Uberon ontology. For multi-species analyses, only broad developmental stages common to the species being compared are used.

Developmental stage name (column 7)

Name of the developmental stage defined by Developmental stage ID (column 6).

Latin species name (column 8)

Latin name of the species to which the gene in Gene ID (column 2) belongs.

Differential expression (column 9)

Call generated from all data types for Gene ID (column 2), in Anatomical entity IDs (column 4), at Developmental stage ID (column 6). One of:

  • over-expression: the gene was shown in one or more analyses to have a significant over-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • under-expression: the gene was shown in one or more analyses to have a significant under-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • no diff expression: the gene was tested for differential expression in this condition, but was never shown to have a significant variation of expression as compared to the other conditions of the analyses.
  • weak ambiguity: there exists a call of over-expression or under-expression generated from one data type, but another data type showed no significant variation in the expression level of this gene in the same condition; or, a gene was never seen as expressed in a condition by some analyses of a given data type, but analyses of other data types produced a call of over-expression or of absence of differential expression for the same gene in the same condition (note that conflicts where one data type produced an under-expression call in a condition, while another data type showed the same gene to be never expressed in that condition, do not produce a weak ambiguity call, but an under-expression call of poor quality).
  • strong ambiguity: there exists a call of over-expression or under-expression generated from a data type, but there exists a call in the opposite direction generated from another data type for the same gene, anatomical entity and developmental stage. For instance, gene A is reported to be over-expressed in the midbrain at young adult stage from Affymetrix data, but is reported to be under-expressed in the midbrain at young adult stage from RNA-Seq data.
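To illustrate how the per-data-type calls relate to the summary call, here is a simplified sketch restricted to the two data types present in this file (Affymetrix and RNA-Seq). It deliberately ignores absence-of-expression evidence and quality levels, which the rules above also take into account, so it is not Bgee's actual implementation; `combine_calls` is a hypothetical name.

```python
DIRECTIONAL = {"over-expression", "under-expression"}


def combine_calls(affymetrix: str, rnaseq: str) -> str:
    """Simplified summary of two data-type calls (absence of expression ignored).

    Each argument is 'over-expression', 'under-expression',
    'no diff expression' or 'no data'.
    """
    calls = {c for c in (affymetrix, rnaseq) if c != "no data"}
    if not calls:
        raise ValueError("no data from any data type")
    if len(calls) == 1:
        # Data types agree, or only one data type has data.
        return calls.pop()
    if calls == DIRECTIONAL:
        # Opposite directions from different data types.
        return "strong ambiguity"
    # One directional call versus 'no diff expression'.
    return "weak ambiguity"
```

Applied to the three example lines above, this sketch gives over-expression (mouse), no diff expression (macaque) and over-expression (cow), matching column 9.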
Call quality (column 10)

Confidence in the differential expression call provided in Differential expression (column 9). One of:

  • high quality: differential expression reported as high quality, with no contradicting call from the same type of analysis (across anatomy/across life stages) for the same gene, in the same anatomical entity and developmental stage (call generated either from multiple congruent analyses, or from a single analysis).
  • poor quality: differential expression reported as low quality, or there exists a conflict for the same gene, anatomical entity and developmental stage between different analyses of the same data type (conflicts between different data types are treated differently). For instance, one analysis showed a gene to be over-expressed in a condition, while another analysis showed the same gene to be under-expressed or not differentially expressed in the same condition. Such conflicts are resolved by a voting system based on the number of conditions compared, weighted by p-value. Note that in one case, this quality level is used to reconcile conflicting calls from different data types: when one data type produced an under-expression call, while a different data type showed that the same gene was never seen as expressed in the same condition. In that case, the overall summary is under-expression, poor quality.
  • NA: no quality applicable, because Differential expression (column 9) is an ambiguity state.
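Since NA is reserved for ambiguity states, a consistency check on columns 9 and 10 could look like the following sketch (hypothetical function name; it assumes the only non-NA quality values are the two listed above).

```python
AMBIGUITY_STATES = {"weak ambiguity", "strong ambiguity"}


def quality_is_consistent(call: str, quality: str) -> bool:
    """Check that Call quality (column 10) fits Differential expression (column 9)."""
    if call in AMBIGUITY_STATES:
        return quality == "NA"
    return quality in {"high quality", "poor quality"}
```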
Affymetrix data (column 11)

Call generated from Affymetrix data for Gene ID (column 2), in Anatomical entity IDs (column 4), at Developmental stage ID (column 6). One of:

  • over-expression: the gene was shown in one or more analyses to have a significant over-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • under-expression: the gene was shown in one or more analyses to have a significant under-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • no diff expression: the gene was tested for differential expression in this condition, but was never shown to have a significant variation of expression as compared to the other conditions of the analyses.
  • no data: no analyses of this data type compared expression level of this gene in this condition.
Affymetrix call quality (column 12)

Confidence in the differential expression call provided in Affymetrix data (column 11). One of:

  • high quality: differential expression reported as high quality, with no contradicting call from the same type of analysis (across anatomy/across life stages) for the same gene, in the same anatomical entity and developmental stage (call generated either from multiple congruent analyses, or from a single analysis).
  • poor quality: differential expression reported as low quality, or there exists a conflict for the same gene, anatomical entity and developmental stage between different analyses of the same data type (conflicts between different data types are treated differently). For instance, one analysis showed a gene to be over-expressed in a condition, while another analysis showed the same gene to be under-expressed or not differentially expressed in the same condition. Such conflicts are resolved by a voting system based on the number of conditions compared, weighted by p-value. Note that in one case, this quality level is used to reconcile conflicting calls from different data types: when one data type produced an under-expression call, while a different data type showed that the same gene was never seen as expressed in the same condition. In that case, the overall summary is under-expression, poor quality.
  • no data: no data associated with Affymetrix data (column 11).
Affymetrix best supporting p-value (column 13)

Best p-value from the Affymetrix analyses supporting the Affymetrix call provided in Affymetrix data (column 11). Set to 1.0 if no Affymetrix data are available.

Affymetrix analysis count supporting Affymetrix call (column 14)

Number of Affymetrix analyses supporting the Affymetrix call provided in Affymetrix data (column 11). Set to 0 if no Affymetrix data are available.

Affymetrix analysis count in conflict with Affymetrix call (column 15)

Number of Affymetrix analyses in conflict with the call provided in Affymetrix data (column 11), i.e., generating a different call. Set to 0 if no Affymetrix data are available.

RNA-Seq data (column 16)

Call generated from RNA-Seq data for Gene ID (column 2), in Anatomical entity IDs (column 4), at Developmental stage ID (column 6). One of:

  • over-expression: the gene was shown in one or more analyses to have a significant over-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • under-expression: the gene was shown in one or more analyses to have a significant under-expression in this condition, as compared to the expression levels in other conditions of the analyses.
  • no diff expression: the gene was tested for differential expression in this condition, but was never shown to have a significant variation of expression as compared to the other conditions of the analyses.
  • no data: no analyses of this data type compared expression level of this gene in this condition.
RNA-Seq call quality (column 17)

Confidence in the differential expression call provided in RNA-Seq data (column 16). One of:

  • high quality: differential expression reported as high quality, with no contradicting call from the same type of analysis (across anatomy/across life stages) for the same gene, in the same anatomical entity and developmental stage (call generated either from multiple congruent analyses, or from a single analysis).
  • poor quality: differential expression reported as low quality, or there exists a conflict for the same gene, anatomical entity and developmental stage between different analyses of the same data type (conflicts between different data types are treated differently). For instance, one analysis showed a gene to be over-expressed in a condition, while another analysis showed the same gene to be under-expressed or not differentially expressed in the same condition. Such conflicts are resolved by a voting system based on the number of conditions compared, weighted by p-value. Note that in one case, this quality level is used to reconcile conflicting calls from different data types: when one data type produced an under-expression call, while a different data type showed that the same gene was never seen as expressed in the same condition. In that case, the overall summary is under-expression, poor quality.
  • no data: no data associated with RNA-Seq data (column 16).
RNA-Seq best supporting p-value (column 18)

Best p-value from the RNA-Seq analyses supporting the RNA-Seq call provided in RNA-Seq data (column 16). Set to 1.0 if no RNA-Seq data are available.

RNA-Seq analysis count supporting RNA-Seq call (column 19)

Number of RNA-Seq analyses supporting the RNA-Seq call provided in RNA-Seq data (column 16). Set to 0 if no RNA-Seq data are available.

RNA-Seq analysis count in conflict with RNA-Seq call (column 20)

Number of RNA-Seq analyses in conflict with the call provided in RNA-Seq data (column 16), i.e., generating a different call. Set to 0 if no RNA-Seq data are available.
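The default values documented for a data type without data (call quality "no data", p-value 1.0, both counts 0) can be checked with a sketch like the following. It assumes rows are dictionaries keyed by the column names listed above, with values either still strings or already converted to numbers; the function name is hypothetical.

```python
def no_data_defaults_ok(row: dict, data_type: str) -> bool:
    """Check the documented defaults when a data type has no data.

    data_type is the column-name prefix: 'Affymetrix' or 'RNA-Seq'.
    Rows where this data type has data are accepted as-is.
    """
    if row[f"{data_type} data"] != "no data":
        return True
    return (
        row[f"{data_type} call quality"] == "no data"
        and float(row[f"{data_type} best supporting p-value"]) == 1.0
        and int(row[f"{data_type} analysis count supporting {data_type} call"]) == 0
        and int(row[f"{data_type} analysis count in conflict with {data_type} call"]) == 0
    )
```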

Anatomy homology CIO ID (column 21)

Unique identifier from the Confidence Information Ontology (CIO), providing the confidence in the annotation of homology of the anatomical entities defined in Anatomical entity IDs (column 4). This ontology is an attempt to provide a means to capture the confidence in annotations. See the project home page for more details.

Anatomy homology CIO name (column 22)

Name of the CIO term defined by Anatomy homology CIO ID (column 21).
