STEP 3


Once users can view their virtual folder in the VRE, the next step is to interconnect the VRE with the e-science portals:

- in the VRE, view the status of your jobs on all portals
- in the VRE, view the metadata of all the jobs you launched

While user data is NOT stored in the VRE (the VRE uses the virtual folder from WP6 but stores nothing locally), job status and metadata are stored locally inside the VRE.


These are high-level ideas; more detailed technical specifications are still to be determined.

1. Expose the state of a job through an API

Goal: allow the VRE to monitor running jobs and keep track of their parameters, results, and the data used.


- Define a set of simple APIs that portals should be able to respond to.

- For each job, the portal should be able to provide a parameters file and a results file.

(Here, a job means one submission by a user on a portal, not one job submitted on a grid, since one submission can be split by the portal into many small grid jobs.)

Sample REST API:

GET /api/<job_id>/state
  -> returns the state of the job (Pending, Running, Failed, Succeeded, etc.) and basic monitoring info (running time, etc.)
  -> follows a standard format (JSON?)

GET /api/<job_id>/parameters
  -> returns a parameters file describing the input parameters of the job
  -> format depends on the portal, but it has to return something

GET /api/<job_id>/results
  -> if the job has succeeded, returns a results file describing the output of the job
  -> format depends on the portal, but it has to return something

(All these requests would be authenticated through the SSO.)
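
As a rough illustration of how the VRE could consume such an API, here is a minimal client-side sketch in Python. The portal URL, the job id, the example JSON fields and the bearer-token style of SSO authentication are assumptions for illustration, not an agreed specification.

# Minimal sketch of a VRE-side client polling a portal's job API.
# Portal URL, job id, JSON field names and the bearer-token SSO header
# are placeholders, not an agreed specification.
import requests

PORTAL = "https://portal.example.org"              # hypothetical portal base URL
JOB_ID = "1234"                                    # hypothetical job id
HEADERS = {"Authorization": "Bearer <sso-token>"}  # assumed SSO mechanism

# GET /api/<job_id>/state -> e.g. {"state": "Running", "running_time": 512}
state = requests.get(f"{PORTAL}/api/{JOB_ID}/state", headers=HEADERS).json()

# GET /api/<job_id>/parameters -> portal-specific parameters file
parameters = requests.get(f"{PORTAL}/api/{JOB_ID}/parameters", headers=HEADERS).content

# GET /api/<job_id>/results -> only meaningful once the job has succeeded
if state["state"] == "Succeeded":
    results = requests.get(f"{PORTAL}/api/{JOB_ID}/results", headers=HEADERS).content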



2. VRE file integration

   Goal: allow the portals to upload/download files directly from the storage backends managed through the virtual folder.


   At the moment, portals expect users to upload their data files through the web interface. It should be possible to enter the id (or path) of a file in the VRE instead.
   This would be a generalisation of what some portals already support for the PDB (the user enters a PDB id and the PDB file is downloaded directly from the PDB database).


   In addition, portals should upload the results to the virtual folder.
-> These results cannot be kept on the portals forever, but once uploaded to the virtual folder they count towards the user's general quota, so it is up to the user how long to keep them.

   - STFC and Luna contributed code to the virtual folder layer, which provides a unified interface to storage repositories (EUDAT, Google Drive, S3, Dropbox…).
Data uploads/downloads will happen directly between the portal and the storage backend, using the storage backend's native API. The VRE is only a central repository for the metadata of the files.

The process is basically:
- The portal makes a call to the virtual folder to determine where a file is stored (which storage backend, exact path).
- The portal retrieves the file directly from the storage backend.
The same goes for uploads, in the reverse direction.
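
A minimal sketch of this download flow from the portal side, assuming a hypothetical virtual folder endpoint that returns the location metadata of a file:

# Sketch of the download flow seen from a portal. The virtual folder
# endpoint, the response fields and the direct download URL are
# assumptions for illustration only.
import requests

VF = "https://vre.example.org/virtualfolder"   # hypothetical virtual folder API
FILE_ID = "abc123"                             # hypothetical file id in the VRE

# 1. Ask the virtual folder where the file is stored (metadata only).
location = requests.get(f"{VF}/files/{FILE_ID}/location").json()
# e.g. {"backend": "s3", "url": "https://s3.example.org/bucket/input.pdb"}

# 2. Retrieve the file directly from the storage backend, bypassing the VRE.
with open("input.pdb", "wb") as f:
    f.write(requests.get(location["url"]).content)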



3. Job submission from the VRE

   Goal: allow new jobs to be submitted from inside the VRE, and prepare support for workflows.

   Possible options:
   - the VRE shows the portals in a frame => not a lot of work for now, but it does not allow integrated workflows to be created.
   - portals accept job submissions using a parameters file, and the VRE has a way to create/manage parameters files => more work for the portals that do not support it yet, but it makes it possible to build workflows on top of it (see the sketch below).
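
A sketch of what the second option could look like, assuming a hypothetical submission endpoint on the portal and a parameters file prepared in the VRE:

# Sketch of the parameter-file submission option. The endpoint name,
# the payload layout and the vre:// path notation are assumptions,
# not an agreed interface.
import requests

PORTAL = "https://portal.example.org"              # hypothetical portal base URL
HEADERS = {"Authorization": "Bearer <sso-token>"}  # assumed SSO mechanism

payload = {
    # portal-specific parameters, created and managed in the VRE
    "parameters": {"forcefield": "amber99", "steps": 1000},
    # input referenced by its virtual-folder path instead of an upload
    "input_file": "vre://user42/project1/input.pdb",
}
response = requests.post(f"{PORTAL}/api/jobs", json=payload, headers=HEADERS)
job_id = response.json()["job_id"]  # id then used with the state/parameters/results API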


4. Job metadata storage


Neither the VRE nor the virtual folder will actually store files and results. The virtual folder appliance only provides a unified layer to fetch files from and push files to storage repositories.
However, the VRE will store job metadata, so that the user has one central view of their workflows and pipelines.

Concretely, portal operators would be asked to keep the VRE updated each time a job is submitted, via a REST API. The system would be based on the specifications of point 1, possibly with additional application-specific extensions.
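
As an illustration, the notification a portal sends to the VRE could look roughly like this; the VRE endpoint and the metadata fields shown are assumptions built on the API of point 1:

# Sketch of a portal pushing job metadata to the VRE at submission time.
# The VRE endpoint and the metadata fields are assumptions, not a
# defined interface.
import requests

VRE = "https://vre.example.org"                    # hypothetical VRE base URL
HEADERS = {"Authorization": "Bearer <sso-token>"}  # assumed SSO mechanism

metadata = {
    "portal": "portal.example.org",
    "job_id": "1234",
    "state": "Pending",
    "submitted": "2017-05-04T10:00:00Z",
    # application-specific extensions could be added here
}
requests.post(f"{VRE}/api/jobs", json=metadata, headers=HEADERS)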