Existing metadata services

From West-Life

This is a survey of available metadata standards relevant to the services offered by the West-Life VRE. Current workflows in structural biology cannot always be described unambiguously, because appropriate metadata standards for specifying them are lacking. For example, while some standards exist for models that are built using structural data, there is no agreed ontology for the primary data processing, either at the level of integrated studies combining different technologies or even at the level of a single technique. In some cases, relevant ontologies exist but are not applied in the structural biology context. In other cases, relevant ontologies are still under development, for example for describing complexes and their components.

Schemata for expressing metadata:

Services for saving and reading metadata:

Describing a recombinant protein sample means recording the solution conditions (or crystallization conditions), purification method, expression method, and DNA construct. The mmCIF standard includes vocabulary for some of this and is extensible, but much work remains before all the relevant information along the full chain of custody can be recorded.
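As an illustration of what such a sample description has to carry, here is a minimal Python sketch. The class name `SampleRecord`, its field names, and the helper `missing_fields` are hypothetical and are not taken from mmCIF or from any West-Life service; the example values are made up. It only shows how gaps in the chain of custody could be detected.

```python
from dataclasses import dataclass, asdict


@dataclass
class SampleRecord:
    """Hypothetical record of one recombinant protein sample.

    The four fields mirror the items listed in the text; the names are
    illustrative and do not correspond to mmCIF data items.
    """
    construct: str
    expression_method: str
    purification_method: str
    solution_conditions: str  # or crystallization conditions


def missing_fields(record: SampleRecord) -> list[str]:
    """Return the names of fields that were left empty."""
    return [name for name, value in asdict(record).items() if not value.strip()]


# Made-up example values, with the purification step left unrecorded.
sample = SampleRecord(
    construct="example construct",
    expression_method="example expression host",
    purification_method="",
    solution_conditions="example buffer",
)
print(missing_fields(sample))
```

A record with no missing fields would still need to be mapped onto an agreed vocabulary before it could be exchanged between services.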

Progress in conversion from PDB format to mmCIF

PDBx/mmCIF became the standard archive format for the Protein Data Bank in 2014. Coordinate data for large structures (containing more than 62 chains and/or more than 99999 ATOM records) cannot be held in a single file of the old PDB format. Nevertheless, many programs still support only PDB format for coordinate data. The table below shows the current situation for West-Life services. Note that some services use mmCIF for reflection data or for a chemical description of ligands, but this is not covered by the table below.
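The size limits quoted above can be expressed as a simple check. The sketch below is only an illustration: the function name `fits_legacy_pdb` is hypothetical, and it assumes that the 62-chain limit corresponds to single-character chain identifiers drawn from A-Z, a-z and 0-9, and that the atom limit corresponds to the five-digit atom serial number field of the PDB format.

```python
import string

# Assumption: one-character chain IDs from letters and digits give 62 chains.
MAX_CHAINS = len(string.ascii_letters + string.digits)
# Assumption: a five-digit serial number field allows at most 99999 atoms.
MAX_ATOMS = 99_999


def fits_legacy_pdb(n_chains: int, n_atoms: int) -> bool:
    """Return True if a structure of this size fits in one legacy PDB file."""
    return n_chains <= MAX_CHAINS and n_atoms <= MAX_ATOMS


print(fits_legacy_pdb(4, 12_000))    # small structure
print(fits_legacy_pdb(80, 150_000))  # large assembly, needs mmCIF
```

A service that only reads or writes PDB format cannot handle structures for which this check fails without splitting them across several files.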

An extension to HADDOCK to create mmCIF outputs is planned.

Note on CCP4: Although the CCP4 coordinate library MMDB has handled mmCIF for many years, this functionality is not widely used by CCP4 applications and services. There is a current collaboration between CCP4 and Global Phasing to modernise the MMDB library, and increase its usage. Refmac provides mmCIF output files, so CCP4 pipelines can easily change to outputting mmCIF.

PDBj provides an online conversion tool.

Portal name   Partner  reads .pdb?   reads .mmcif?          writes .pdb?  writes .mmcif?
AMPS NMR      CIRMPP   yes           no                     yes           no
FANTEN        CIRMPP   yes           no                     yes           no
XPLOR         CIRMPP   yes           no                     yes           no
CCP4-Ample    STFC     yes           yes                    yes           no
CCP4-Balbes   STFC     n/a           n/a                    yes           no
CCP4-Crank2   STFC     n/a           n/a                    n/a           n/a
CCP4-MrBump   STFC     n/a           n/a                    yes           no
CCP4-Shelx    STFC     n/a           n/a                    n/a           n/a
CCP4-Zanuda   STFC     yes           no                     yes           no
CS-ROSETTA3   UU       n/a           n/a                    yes           no
GROMACS       UU       yes           no                     yes           no
HADDOCK       UU       yes           no                     yes           no
UNIO          UU       yes           no                     yes           no
PRODIGY       UU       yes           yes                    n/a           n/a
DISVIS        UU       yes           yes                    n/a           n/a
POWERFIT      UU       yes           yes                    yes           no
PDB-REDO      NKI      yes           *yes*                  yes           *yes*
ARP/wARP      EMBL     yes           no (*except ligands*)  yes           no
DipCheck      EMBL     yes           *yes*                  yes           no
ViCi          EMBL     yes, and SDF  no                     yes, and SDF  no
AutoRickshaw  EMBL     yes           no                     yes           no

Services marked with '*' have been converted to mmCIF with effort from West-Life.