Gene Ontology analysis annotates gene products with a controlled, hierarchical vocabulary.
About Gene Ontology
Gene Ontology (GO) is a bioinformatics initiative to standardize the representation of the attributes of genes and gene products across all species. Gene products include, but are not limited to, transcripts, proteins, RNA splice variants and modified proteins. A controlled vocabulary is used for annotation, and the annotated data are collected and distributed across different databases for easy access.
Standardized descriptions of the functions of genes and gene products facilitate computer-driven information retrieval and the generation of new knowledge from omics-scale data. In computer science and bioinformatics, the term “ontology” means a “specification of a relational vocabulary”. In other words, an ontology consists of the terms used in a specific domain, the definitions of those terms and the defined relationships between them. The Gene Ontology is hierarchical and thus supports annotation and queries at different levels.
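The hierarchy means that a gene product annotated with a specific term is implicitly annotated with every ancestor of that term. The following is a minimal Python sketch of that idea. It uses hypothetical term IDs and a hand-written parent map, not the real GO graph, and all function names are illustrative.

```python
# Toy ontology: each term maps to its direct parents (a term may have several).
# The term IDs are hypothetical placeholders, not real GO identifiers.
PARENTS = {
    "T:binding": [],
    "T:nucleic_acid_binding": ["T:binding"],
    "T:dna_binding": ["T:nucleic_acid_binding"],
    "T:rna_binding": ["T:nucleic_acid_binding"],
}


def ancestors(term, parents=PARENTS):
    """Return the set of all terms reachable by following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def annotated_with(query, direct_annotations, parents=PARENTS):
    """Return products whose direct terms equal the query or descend from it."""
    hits = set()
    for product, terms in direct_annotations.items():
        if any(t == query or query in ancestors(t, parents) for t in terms):
            hits.add(product)
    return hits


# Hypothetical gene products with their most specific annotations.
DIRECT = {
    "geneA": {"T:dna_binding"},
    "geneB": {"T:rna_binding"},
    "geneC": {"T:binding"},
}

# A query at a broad level also returns products annotated at a narrower level.
print(sorted(annotated_with("T:nucleic_acid_binding", DIRECT)))
print(sorted(annotated_with("T:dna_binding", DIRECT)))
```

Storing only the most specific annotation and deriving the broader ones on demand keeps the annotation table small, at the cost of a graph traversal per query.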
The Gene Ontology Consortium builds the GOs and supports their use. The GO project has three goals. The first is to compile and provide the GOs: structured vocabularies describing domains of molecular biology. The domains currently covered are Molecular Function, Biological Process and Cellular Component, which are considered orthogonal to each other. Second, the project supports the use of GOs for the annotation of gene products. Third, the gene product-to-GO annotation sets are made publicly accessible through the GO database or other web resources, e.g. Ensembl. Thus, the community can access standardized annotations of gene products across multiple species. Moreover, the GO project focuses on developing vocabularies that describe the attributes of biological objects, not on naming the objects themselves. Two gene products may be associated with the same set of GO terms because they are involved in the same molecular phenomena.
In discovery proteomics, MS2 spectra generated by a mass spectrometer are used to identify the peptides present in a sample. With a suitable scoring algorithm, peptides derived from the same protein can be grouped, and a probability that the protein has been correctly identified can be assigned. After a cutoff value is applied, the result is usually a list of several hundred to a thousand proteins. This is more accurately termed the bottom-up approach, in contrast to the top-down approach. It is sometimes necessary to find out which kinds of proteins are enriched in the dataset, for example whether the set is enriched in transcription factors, chaperones or DNA repair factors. Answering this question requires GO analysis.
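The grouping and cutoff steps can be illustrated with a short Python sketch. The scoring rule here is an assumption made only for illustration: it treats peptide identifications as independent and is not the scoring algorithm of any particular search engine. The peptide names, protein IDs and probabilities are invented.

```python
from collections import defaultdict

# Hypothetical peptide identifications:
# (peptide label, protein ID, probability the peptide identification is correct).
PEPTIDE_HITS = [
    ("pep1", "protein1", 0.90),
    ("pep2", "protein1", 0.80),
    ("pep3", "protein2", 0.60),
    ("pep4", "protein3", 0.99),
]


def protein_probabilities(hits):
    """Group peptides by protein and combine their probabilities.

    Assumption: peptide identifications are independent, so a protein
    identification is wrong only if every one of its peptides is wrong.
    """
    wrong = defaultdict(lambda: 1.0)
    for _peptide, protein, prob in hits:
        wrong[protein] *= 1.0 - prob
    return {protein: 1.0 - w for protein, w in wrong.items()}


def apply_cutoff(probabilities, cutoff):
    """Return the sorted IDs of proteins at or above the probability cutoff."""
    return sorted(p for p, prob in probabilities.items() if prob >= cutoff)


probs = protein_probabilities(PEPTIDE_HITS)
print(apply_cutoff(probs, cutoff=0.95))
```

The list that survives the cutoff is the protein set that goes into the GO analysis.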
Because the set of proteins identified in an MS experiment is large, it is impractical to query online resources for the GO annotations of each gene product one at a time. Our GO analysis service is based on an in-house tool, backed by a GO database that is updated monthly, to perform the association. Once the association is complete, terms that are enriched in the dataset are identified by a dedicated algorithm. Alternatively, gene products associated with a specific term, such as transcription factor complex (GO:0005667), can be selected deliberately. GO analysis can answer questions such as: What types of proteins are enriched in a sample? Are particular types of proteins present in my sample?
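A batch association against a local annotation table might look like the Python sketch below. It is not the in-house tool itself. The annotation table, the protein IDs and the term "TERM:other" are hypothetical placeholders, and only GO:0005667 is taken from the text above.

```python
from collections import Counter

# Hypothetical local annotation table: protein ID -> set of term IDs.
# In practice this would be loaded from a locally stored GO annotation release.
GO_ANNOTATIONS = {
    "protein1": {"GO:0005667", "TERM:other"},
    "protein2": {"TERM:other"},
    "protein3": {"GO:0005667"},
}


def annotate(protein_ids, table=GO_ANNOTATIONS):
    """Attach terms to each protein; unknown proteins get an empty set."""
    return {pid: set(table.get(pid, ())) for pid in protein_ids}


def proteins_with_term(annotated, term):
    """Return the sorted IDs of proteins annotated with the given term."""
    return sorted(pid for pid, terms in annotated.items() if term in terms)


def term_counts(annotated):
    """Count how many proteins carry each term."""
    counts = Counter()
    for terms in annotated.values():
        counts.update(terms)
    return counts


identified = ["protein1", "protein2", "protein3", "protein4"]
annotated = annotate(identified)
print(proteins_with_term(annotated, "GO:0005667"))
print(term_counts(annotated))
```

Looking everything up in one local pass avoids a separate online query per protein, and the per-term counts are the input an enrichment test needs.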
Annotation of proteins with a list of GO terms
Estimation of the Fisher's exact test p-value, which gives the probability of observing the set of proteins associated with a given term by chance alone
Calculation of the relative enrichment factor (E-value), which is the frequency of a GO term's association with the proteins in a subset divided by its frequency in the whole list
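The two statistics listed above can be sketched in Python with the standard library alone. The sketch assumes a one-sided Fisher's exact test for over-representation and treats the subset as drawn from the whole list. The service's exact formulation may differ, and the function and parameter names are illustrative.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    k: proteins in the subset annotated with the term
    n: size of the subset
    K: proteins in the whole list annotated with the term
    N: size of the whole list
    Returns the probability of seeing k or more annotated proteins in a
    random subset of size n (the hypergeometric upper tail).
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def enrichment_factor(k, n, K, N):
    """Term frequency in the subset divided by term frequency in the whole list."""
    return (k / n) / (K / N)


# Illustrative numbers: 3 of 4 subset proteins carry the term,
# compared with 5 of 20 proteins in the whole list.
p_value = fisher_enrichment_p(k=3, n=4, K=5, N=20)
factor = enrichment_factor(k=3, n=4, K=5, N=20)
print(f"p-value = {p_value:.4f}, enrichment factor = {factor:.1f}")
```

An enrichment factor above 1 means the term is more frequent in the subset than in the whole list, and the p-value indicates how likely that excess is to arise by chance.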