Yuji Mo, Catherine Anderson, and Stephen D. Scott. A Study of Correlations Between the Definition and Application of the Gene Ontology. In Proceedings of BIOCOMP '11: The 2011 International Conference on Bioinformatics and Computational Biology, to appear. Las Vegas, Nevada, July 2011.


Abstract

When using the Gene Ontology (GO), nucleotide and amino acid sequences are annotated with terms from a structured, controlled vocabulary organized into a relational graph. The usage of this vocabulary (GO terms) in the annotation of these sequences may diverge from the relations defined in the ontology. We measure the consistency of the use of GO terms by comparing GO's defined structure to the terms' application. To do this, we first use synthetic data with different characteristics to understand how these characteristics influence the correlation values determined by various similarity measures. Using these results as a baseline, we find that the correlation between GO's definition and its application to real data is relatively low, suggesting that GO terms might not be applied in a manner consistent with the ontology's definition. In contrast, we find a sub-ontology of GO whose definition correlates well with its usage in UniProtKB.

Keywords: Gene Ontology, semantic similarity, Kendall's tau coefficient
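
As an illustration of the kind of rank correlation named in the keywords, the following is a minimal sketch of Kendall's tau computed between two sets of similarity scores for the same term pairs. It assumes the simple tau-a variant (ties count as neither concordant nor discordant), which may differ from the variant used in the paper. The function name `kendall_tau_a` and the scores in `structure_sim` and `usage_sim` are hypothetical and are not taken from the paper's data.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells whether both lists order
        # items i and j the same way (positive) or oppositely (negative).
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical similarity scores for the same five GO term pairs:
# one list derived from the ontology's structure, one from annotation usage.
structure_sim = [0.9, 0.7, 0.5, 0.3, 0.1]
usage_sim = [0.8, 0.9, 0.4, 0.2, 0.3]

tau = kendall_tau_a(structure_sim, usage_sim)
print(f"Kendall's tau-a: {tau:.2f}")
```

A value near 1 would indicate that the two measures rank the term pairs almost identically, a value near 0 little agreement, and a value near -1 nearly reversed rankings.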




Last modified 18 May 2011.