Documents of D6.2 namespace
- D6.2:Prototype Implementation
- Related topics:
- D6.2:Virtual Folder and West-Life Portal integration
- D6.2:Virtual Folder and Partner Portal integration
- D6.2:Virtual Folder and PDB Components integration
- D6.2:Virtual Folder and Cloud integration
- D6.2:Virtual Folder and Access to Dataset
- D6.2:Meeting and conferences notes
Metadata analysis and design
The metadata of a dataset may contain information about:
- the device on which the dataset was acquired,
- the file format in which the RAW data are stored,
- the date and time of acquisition,
- the dataset's provenance and its ownership,
- relations to projects, publications, other datasets, etc.,
- the sample: which protein, structure, etc. the dataset contains information about.
Metadata about the device are sometimes stored in the RAW files contained in a dataset. E.g. JCAMP/NUTS files carry some text information about the device in the file header.
Formats vary (binary, little-endian, JCAMP): header information precedes the binary data (usually a stream of 4-byte floats in the order RIRIRIRI...).
file metadata, proposal dates
ownership, relations to project
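The header-then-binary layout noted above can be sketched as follows. This is only an illustration under stated assumptions: the header length is passed in by the caller (the real offset depends on the concrete file format), the header is treated as ASCII text, and RIRIRIRI is read here as alternating real and imaginary values. The function name `split_header_and_data` is hypothetical.

```python
import struct


def split_header_and_data(raw: bytes, header_size: int):
    """Split raw file content into a text header and complex data points.

    Assumptions (not taken from any format specification):
    - the first `header_size` bytes are an ASCII text header,
    - the rest is a little-endian stream of 4-byte floats,
    - floats alternate between real (R) and imaginary (I) values.
    """
    header = raw[:header_size].decode("ascii", errors="replace")
    payload = raw[header_size:]

    # Use only complete R/I pairs; ignore any trailing partial bytes.
    count = len(payload) // 4
    count -= count % 2
    floats = struct.unpack(f"<{count}f", payload[: count * 4])

    # Pair up the alternating real and imaginary values.
    points = [complex(re, im) for re, im in zip(floats[0::2], floats[1::2])]
    return header, points
```

The text header returned by such a function is where device information would be looked for; how that text is structured is format-specific and not covered by this sketch.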
Dataset entity can be presented as:
- a metadata structure, which can be browsed e.g. via a RESTful API to obtain details or links to the raw entries of the dataset,
- a directory in the virtual folder, where each entry can be a subdirectory or a file; a proper mapping to WebDAV should be implemented, e.g. by downloading the dataset and exposing it through the current WebDAV interface,
- a file, where following the link to the file triggers preparation of a ZIP file containing the directory as defined above.
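The third presentation, a ZIP file prepared from the dataset directory, could look roughly like the sketch below. It assumes the dataset has already been downloaded to a local directory; the function name `zip_dataset_directory` and the paths are hypothetical and do not refer to the existing Virtual Folder code.

```python
import zipfile
from pathlib import Path


def zip_dataset_directory(directory: Path, archive_path: Path) -> Path:
    """Pack all files below `directory` into a ZIP archive.

    Entry names are stored relative to `directory`, so the archive
    mirrors the subdirectory/file layout of the dataset.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(directory.rglob("*")):
            # Skip directories and the archive itself, in case it is
            # written inside the dataset directory.
            if path.is_file() and path.resolve() != archive_path.resolve():
                archive.write(path, path.relative_to(directory).as_posix())
    return archive_path
```

In a service, the archive would typically be prepared on first request and the resulting file served for download.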
An example of an open data repository network in Europe is OpenAIRE. To register a West-Life portal and its public datasets in the OpenAIRE open access infrastructure database, the OpenAIRE guidelines must be met. The CERIF standard is recommended as the metadata format for datasets, specifically the CERIF entity cfResultProduct. Datasets are linked with publications, funded projects, persons and organisations, and equipment. Additionally, a REST API must be implemented to return the metadata of dataset entries under /resultproducts.
The XML format is mandatory by default; JSON is optional. Because XML is generated by the framework currently used by the VF metadata service (ServiceStack), it should be investigated whether the generated output automatically complies with the XML Schema defined by the CERIF standard for the dataset (cfResultProduct) structure; otherwise customization will be needed.
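To make the XML-by-default, JSON-optional requirement concrete, here is a minimal sketch of how a /resultproducts response body might be rendered in either format. It is written in Python for brevity rather than in the ServiceStack framework, and the element and field names (`resultproducts`, `resultproduct`, `name`) are placeholders only: they are NOT the CERIF cfResultProduct schema, which would have to be taken from the standard itself.

```python
import json
import xml.etree.ElementTree as ET


def render_result_products(datasets, fmt="xml"):
    """Render dataset metadata as XML (default) or JSON.

    `datasets` is a list of dicts with hypothetical keys "id" and "name".
    Element names are illustrative and not CERIF-compliant.
    """
    if fmt == "json":
        return json.dumps({"resultproducts": datasets})
    if fmt != "xml":
        raise ValueError(f"unsupported format: {fmt}")

    root = ET.Element("resultproducts")
    for dataset in datasets:
        item = ET.SubElement(root, "resultproduct", id=str(dataset["id"]))
        ET.SubElement(item, "name").text = dataset["name"]
    return ET.tostring(root, encoding="unicode")
```

Checking real output against the CERIF XML Schema would be a separate validation step and is not shown here.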
Possible database schema for dataset
Database schema for the entities Dataset and DatasetEntry and the relation DatasetEntries.
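One possible shape of such a schema is sketched below using SQLite. Only the three table names come from the text above; every column name (Id, Name, Owner, CreationDate, Url, Type) is an assumption made for illustration, as is modelling DatasetEntries as a link table between the two entities.

```python
import sqlite3

# Hypothetical columns; only the table names are taken from the document.
SCHEMA = """
CREATE TABLE Dataset (
    Id           INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    Owner        TEXT,
    CreationDate TEXT
);
CREATE TABLE DatasetEntry (
    Id   INTEGER PRIMARY KEY,
    Url  TEXT NOT NULL,
    Type TEXT
);
CREATE TABLE DatasetEntries (
    DatasetId      INTEGER NOT NULL REFERENCES Dataset(Id),
    DatasetEntryId INTEGER NOT NULL REFERENCES DatasetEntry(Id),
    PRIMARY KEY (DatasetId, DatasetEntryId)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables in the given database."""
    conn.executescript(SCHEMA)


def entry_urls(conn: sqlite3.Connection, dataset_id: int) -> list:
    """Return the URLs of all entries linked to one dataset."""
    rows = conn.execute(
        "SELECT e.Url FROM DatasetEntry e "
        "JOIN DatasetEntries r ON r.DatasetEntryId = e.Id "
        "WHERE r.DatasetId = ? ORDER BY e.Id",
        (dataset_id,),
    )
    return [row[0] for row in rows]
```

A separate link table allows one entry to belong to several datasets; if each entry belongs to exactly one dataset, a foreign key on DatasetEntry would be sufficient.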