Existing metadata services
This is a survey of available metadata standards relevant to the set of services offered by the West-Life VRE. Current workflows in structural biology often cannot be described unambiguously, because appropriate metadata standards for them are lacking. For example, while some standards exist for models that are built using structural data, there is no agreed ontology for the primary data processing, either at the level of integrated studies combining different technologies or even at the level of a single technique. In some cases, relevant ontologies exist but are not applied in the structural biology context. In other cases, relevant ontologies are still under development, for example for describing complexes and their components.
Schemata for expressing metadata:
- PROV-O http://www.citeulike.org/group/16801/article/13938133
- EMDB data model
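To give a feel for what a PROV-O description of a processing step looks like, the following is a minimal sketch that emits a few provenance statements as N-Triples. The `prov:` terms (`Entity`, `Activity`, `used`, `wasGeneratedBy`, `wasDerivedFrom`) come from the W3C PROV-O vocabulary; the `http://example.org/westlife/` namespace, the identifiers, and the function names are hypothetical and chosen only for illustration.

```python
# W3C namespaces (real); EX is a hypothetical namespace for this example.
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/westlife/"


def provenance_triples(output_id, activity_id, input_id):
    """Describe one processing step: an activity used an input and generated an output."""
    out, act, inp = EX + output_id, EX + activity_id, EX + input_id
    return [
        (inp, RDF_TYPE, PROV + "Entity"),
        (out, RDF_TYPE, PROV + "Entity"),
        (act, RDF_TYPE, PROV + "Activity"),
        (act, PROV + "used", inp),
        (out, PROV + "wasGeneratedBy", act),
        (out, PROV + "wasDerivedFrom", inp),
    ]


def to_ntriples(triples):
    """Serialise (subject, predicate, object) IRI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Illustrative identifiers only.
triples = provenance_triples("refined-model", "refinement-run-1", "reflection-data")
print(to_ntriples(triples))
```

A real deployment would more likely use an RDF library and persistent identifiers; plain tuples are used here only to keep the sketch self-contained.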
Services for saving and reading metadata:
Describing a recombinant protein sample involves describing the solution conditions (or crystallization conditions), purification method, expression method, and DNA construct. The mmCIF standard includes vocabulary for some of this, and is extensible. But there is a long way to go to record all the relevant information along the full chain of custody.
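The four parts of a sample description listed above can be sketched as a simple record, together with a check that reports which parts have not been captured. This is only an illustration: the class name, the field names, and the example values are hypothetical and are not mmCIF item names.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SampleRecord:
    # Field names are illustrative only; they are NOT mmCIF item names.
    dna_construct: Optional[str] = None
    expression_method: Optional[str] = None
    purification_method: Optional[str] = None
    solution_conditions: Optional[str] = None  # or crystallization conditions


def missing_fields(record: SampleRecord) -> List[str]:
    """Return the names of the fields that are empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Made-up example values, for illustration only.
record = SampleRecord(
    dna_construct="example construct, residues 1-250",
    expression_method="bacterial expression",
)
print(missing_fields(record))  # ['purification_method', 'solution_conditions']
```

A check of this kind makes gaps in the chain of custody visible at the point where a sample is registered, rather than at deposition time.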
Progress in conversion from PDB format to mmCIF
PDBx/mmCIF became the standard archive format for the Protein Data Bank in 2014. Coordinate data for large structures (containing >62 chains and/or >99999 ATOM records) cannot be held in a single file of the old PDB format. Nevertheless, many programs still support only PDB format for coordinate data. The table below shows the current situation for West-Life services. Note that some services use mmCIF for reflection data or for a chemical description of ligands, but the table does not cover these uses.
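The size limits mentioned above can be expressed as a small check that a service might run before choosing an output format. This is a minimal sketch: the function names are hypothetical, and it assumes the caller already knows the number of chains and atoms in the structure.

```python
# Limits of the legacy PDB format, as quoted in the text.
MAX_PDB_CHAINS = 62    # single-character chain IDs: A-Z, a-z, 0-9
MAX_PDB_ATOMS = 99999  # the atom serial number field is five columns wide


def fits_legacy_pdb(n_chains: int, n_atoms: int) -> bool:
    """Return True if a structure of this size fits in a single PDB-format file."""
    return n_chains <= MAX_PDB_CHAINS and n_atoms <= MAX_PDB_ATOMS


def choose_output_format(n_chains: int, n_atoms: int) -> str:
    """Prefer the legacy format only when the structure fits; otherwise use mmCIF."""
    return "pdb" if fits_legacy_pdb(n_chains, n_atoms) else "mmcif"


print(choose_output_format(4, 12000))    # pdb
print(choose_output_format(80, 250000))  # mmcif
```

A service that can write only PDB format could use the same check to reject oversized structures with a clear message instead of producing a truncated file.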
An extension to HADDOCK to create mmCIF outputs is planned.
Note on CCP4: Although the CCP4 coordinate library MMDB has handled mmCIF for many years, this functionality is not widely used by CCP4 applications and services. There is a current collaboration between CCP4 and Global Phasing to modernise the MMDB library, and increase its usage. Refmac provides mmCIF output files, so CCP4 pipelines can easily change to outputting mmCIF.
PDBj provides an online conversion tool.
| Portal name | Partner | Reads .pdb? | Reads .mmcif? | Writes .pdb? | Writes .mmcif? |
|---|---|---|---|---|---|
| SCIPIO WEB TOOLS | CSIC | Yes | No | No | No |
| ARP/wARP | EMBL | Yes | No (*except ligands*) | Yes | No |
| ViCi | EMBL | Yes, and SDF | No | Yes, and SDF | No |
Services marked with '*' have been converted to mmCIF with effort from West-Life.