Integrating genomic data across species boundaries is critical to exploiting earlier investment in genomics. Systematic attempts to do this have so far taken a single-species focus, e.g. annotating the genome of one species using functional data from a second. Because many different views could be applied to the combined data set, a generalised ‘warehousing’ approach will not succeed.

We will develop a new GRID-based system that captures the details of relationships between genomic data, within or across species, in a way that allows complex ad hoc queries to be run. We will also demonstrate that the underlying raw data can be combined so that all genomic communities draw maximum benefit from them.


  • To define controlled vocabularies describing:
    • Evolutionary relationships
    • Containment relationships
    • Nomenclature relationships relevant to comparative genomics
  • To develop drop-in wrappers for primary and comparative data sources, across a number of animal, plant and microbial species.
  • To implement a Web/GRID middleware layer that will support operations over the wrapped databases including the integration of data by reference to controlled vocabularies.
  • To demonstrate practical applications based on those web services:
    • To address biologically relevant questions, e.g. to assist in identifying candidate genes underlying QTL in farm animals or crop plant species
    • To use existing comparative genomics knowledge to infer further comparative observations and stimulate hypothesis-driven experiments.
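
The objectives above can be illustrated with a minimal sketch, which is not the project's actual design. It assumes that relationships are stored as simple subject/relation/target records typed by a controlled vocabulary, and that each wrapped data source exposes a single `find` operation. Every name in it (`RelationType`, `Relationship`, `InMemorySource`, `candidate_genes`) and every identifier in the example data is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RelationType(Enum):
    # Hypothetical controlled-vocabulary terms, one per relationship family
    # named in the objectives (evolutionary, containment, nomenclature).
    ORTHOLOGOUS_TO = "evolutionary"
    LOCATED_WITHIN = "containment"
    SYNONYM_OF = "nomenclature"


@dataclass(frozen=True)
class Relationship:
    subject: str            # e.g. a gene identifier
    relation: RelationType  # term from the controlled vocabulary
    target: str             # e.g. a QTL or a gene in another species


class InMemorySource:
    """Stand-in for a wrapped data source (assumption: a real wrapper
    would translate `find` into a query against the underlying database)."""

    def __init__(self, relationships):
        self._relationships = list(relationships)

    def find(self, relation, subject=None, target=None):
        # Return every record of the given type that matches the filters.
        return [
            r for r in self._relationships
            if r.relation is relation
            and (subject is None or r.subject == subject)
            and (target is None or r.target == target)
        ]


def candidate_genes(qtl_id, sources):
    """Map each gene located within a QTL to its orthologues elsewhere."""
    def query(relation, **filters):
        # Ask every wrapped source and pool the answers.
        return [r for s in sources for r in s.find(relation, **filters)]

    genes = [r.subject for r in query(RelationType.LOCATED_WITHIN, target=qtl_id)]
    return {
        gene: sorted(r.target for r in query(RelationType.ORTHOLOGOUS_TO, subject=gene))
        for gene in genes
    }


# Invented example data: one primary source and one comparative source.
primary = InMemorySource([
    Relationship("pig:GENE_A", RelationType.LOCATED_WITHIN, "pig:QTL_1"),
    Relationship("pig:GENE_B", RelationType.LOCATED_WITHIN, "pig:QTL_2"),
])
comparative = InMemorySource([
    Relationship("pig:GENE_A", RelationType.ORTHOLOGOUS_TO, "human:GENE_A1"),
    Relationship("pig:GENE_A", RelationType.ORTHOLOGOUS_TO, "cattle:GENE_A2"),
])

result = candidate_genes("pig:QTL_1", [primary, comparative])
print(result)
```

In this sketch the data stay in their separate sources and are combined only at query time through the shared vocabulary terms, which is one way to read the contrast the summary draws with a single generalised warehouse.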