D6.2:Metadata design

From West-Life

Metadata analysis and design

The metadata of a dataset may contain information about:

  • the device on which the dataset was acquired,
  • the file format in which the RAW data are stored,
  • the date and time of acquisition,
  • the dataset's provenance and its ownership,
  • relations to projects, publications, other datasets, etc.,
  • the sample: which protein, structure, etc. the dataset contains information about.
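
The categories listed above could be grouped into a single record, for instance as follows. This is only a minimal sketch: the class name `DatasetMetadata`, all field names and the sample values are hypothetical and are not taken from any existing West-Life component.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical record with one field per metadata category listed above."""
    device: str                 # device the dataset was acquired on
    file_format: str            # format in which the RAW data are stored
    acquired: str               # date and time of acquisition (ISO 8601 text)
    owner: str                  # ownership
    provenance: str = ""        # where the dataset came from
    related: list[str] = field(default_factory=list)  # projects, publications, datasets
    sample: str = ""            # protein, structure, etc.


# Illustrative values only
record = DatasetMetadata(
    device="example spectrometer",
    file_format="JCAMP",
    acquired="2017-01-01T12:00:00",
    owner="example owner",
    related=["example project"],
    sample="example protein",
)

# Serialise the record, e.g. for returning it from a metadata service
encoded = json.dumps(asdict(record), sort_keys=True)
```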

device information

Metadata about the device are sometimes embedded in the RAW files of a dataset. For example, JCAMP/NUTS files carry some textual information about the device in the file header.

file format

Various formats are in use, e.g. binary little-endian JCAMP: header information precedes the binary data (usually a stream of 4-byte floats in the order RIRIRIRI...).
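
Reading such a stream could look roughly like the sketch below. It assumes that R and I denote the real and imaginary parts of each point and that the header length is already known; the function name `read_complex_points` and the sample header are hypothetical, and finding the real header length of a given file is outside the scope of this sketch.

```python
import struct


def read_complex_points(raw: bytes, header_size: int) -> list[complex]:
    """Decode little-endian 4-byte floats stored as R, I, R, I, ... after the header."""
    payload = raw[header_size:]
    # Ignore a trailing partial (R, I) pair, if any: one pair is 8 bytes
    usable = len(payload) - (len(payload) % 8)
    values = struct.unpack(f"<{usable // 4}f", payload[:usable])
    # Even positions hold the real parts, odd positions the imaginary parts
    return [complex(re, im) for re, im in zip(values[0::2], values[1::2])]


# Synthetic example: a made-up text header followed by two complex points
header = b"HYPOTHETICAL-HEADER\n"
body = struct.pack("<4f", 1.0, 2.0, 3.0, -4.0)
points = read_complex_points(header + body, len(header))
```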

data acquisition

file metadata, proposal dates



ownership, relations to project

ARIA API




A dataset entity can be presented as:

  1. a metadata structure, which can be browsed e.g. via a RESTful API to obtain details or links to the raw entries of the dataset;
  2. a directory in a virtual folder, where each entry is a subdirectory or a file; a proper mapping to WebDAV should be implemented, e.g. by downloading the dataset and exposing it through the current WebDAV interface;
  3. a file, where following the link to the file triggers preparation of a ZIP file containing the directory as defined above.
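
The third option, preparing a ZIP file from the dataset directory on request, might be sketched as follows. The function name `zip_dataset_directory` and the sample file names are hypothetical; the sketch builds the archive in memory, whereas a real service would probably stream it or cache it on disk.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def zip_dataset_directory(dataset_dir: Path) -> bytes:
    """Pack every file below dataset_dir into a ZIP archive, keeping relative paths."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(dataset_dir.rglob("*")):
            if path.is_file():
                # Store the path relative to the dataset root, with forward slashes
                archive.write(path, path.relative_to(dataset_dir).as_posix())
    return buffer.getvalue()


# Usage with a throwaway directory standing in for a dataset
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw").mkdir()
    (root / "raw" / "fid.bin").write_bytes(b"\x00" * 8)
    (root / "metadata.json").write_text("{}")
    archive_bytes = zip_dataset_directory(root)

names = sorted(zipfile.ZipFile(io.BytesIO(archive_bytes)).namelist())
```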

An example of an open data repository network in Europe is OpenAIRE [[1]]. To interlink with it, i.e. to register a West-Life portal and its public datasets in the OpenAIRE open access infrastructure database, the following guidelines must be met [[2]]. The CERIF standard is recommended as the format of dataset metadata, specifically the CERIF entity cfResultProduct. Datasets are linked with publications, funded projects, persons and organisations, and equipment. Additionally, a REST API must be implemented that returns the metadata of dataset entries under /resultproducts.

The XML format is mandatory by default; the JSON format is optional. As XML is generated by the current framework of the VF metadata service (ServiceStack), it should be investigated whether the generated format is automatically compliant with the XML Schema defined by the CERIF standard for the dataset (cfResultProduct) structure; otherwise, customization will be needed.

Possible database schema for dataset

Figure: database schema for the entities Dataset and DatasetEntry and the relation DatasetEntries.
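
Since the schema diagram itself is not reproduced here, the following is one possible reading of it, sketched with SQLite. Only the names Dataset, DatasetEntry and DatasetEntries come from the text; every column name and all sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical columns; only the three table names come from the text above
SCHEMA = """
CREATE TABLE Dataset (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    owner   TEXT,
    created TEXT
);
CREATE TABLE DatasetEntry (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    url  TEXT
);
CREATE TABLE DatasetEntries (
    dataset_id INTEGER NOT NULL REFERENCES Dataset(id),
    entry_id   INTEGER NOT NULL REFERENCES DatasetEntry(id),
    PRIMARY KEY (dataset_id, entry_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Illustrative rows: one dataset linked to two entries
conn.execute(
    "INSERT INTO Dataset (id, name, owner) VALUES (1, 'example dataset', 'example owner')"
)
conn.executemany(
    "INSERT INTO DatasetEntry (id, name, url) VALUES (?, ?, ?)",
    [(1, "fid.bin", None), (2, "acqu.txt", None)],
)
conn.executemany("INSERT INTO DatasetEntries VALUES (?, ?)", [(1, 1), (1, 2)])

# List the entries of dataset 1 through the relation table
rows = conn.execute(
    "SELECT e.name FROM DatasetEntry e "
    "JOIN DatasetEntries r ON r.entry_id = e.id "
    "WHERE r.dataset_id = ? ORDER BY e.name",
    (1,),
).fetchall()
entry_names = [row[0] for row in rows]
```

Modelling DatasetEntries as a separate relation table, as the caption suggests, would allow one entry to belong to several datasets.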