Discussion with Phenomenal H2020

From West-Life

Summary of a discussion on September 20th 2017


Westlife: roughly 3M€ budget, 3 years duration
Phenomenal: roughly 8M€ budget, 3 years duration

*** Infrastructure (Westlife WP4 / Phenomenal WP5)

Westlife decided not to use containers because of doubts about the maturity of Docker with MPI and of Docker with shared storage. Instead, Westlife uses VMs, with DIRAC4EGI for job management. Phenomenal uses containers, with Kubernetes managing everything, including job dispatch. Phenomenal does not need MPI (its jobs are usually embarrassingly parallel), but it does use shared storage, with GlusterFS.
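To make the "embarrassingly parallel" point concrete, here is a minimal Python sketch. It is not taken from either project: `process_sample` and `run_all` are hypothetical names, and a local thread pool stands in for a real dispatcher such as Kubernetes. The point is only that each job depends on its own input and nothing else, so no message passing between jobs (and hence no MPI) is needed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def process_sample(sample_id: str) -> Tuple[str, int]:
    """Stand-in for one independent analysis job (hypothetical workload)."""
    # The job reads only its own input and never talks to another job.
    return sample_id, sum(ord(c) for c in sample_id)


def run_all(sample_ids: List[str], workers: int = 4) -> Dict[str, int]:
    """Dispatch every job independently and collect the results."""
    # The thread pool is an assumption standing in for a cluster scheduler.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_sample, sample_ids))


if __name__ == "__main__":
    print(run_all(["s1", "s2", "s3"]))
```

Because the jobs share nothing, the result is the same whatever the number of workers; only the wall-clock time changes.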

In both Westlife and Phenomenal, the tools existed before the start of the project. Westlife asked service operators to move to DIRAC4EGI for job management. Phenomenal asked its tool developers to build containers.

Phenomenal controls versioning with its own Docker container registry. The Westlife VM image pulls software from remote sources, so versioning is less tightly controlled (if I remember correctly).

In terms of SSO, Westlife uses Instruct ARIA, and Phenomenal uses ELIXIR SSO.

Overall, Phenomenal is a kind of deployable cluster instance that provides all the tools. The current portal is a public instance, and anyone can deploy their own. In short, Phenomenal is a deployable "packaged self-contained infrastructure".

Westlife is a bit different. The Westlife VM is of course deployable and aggregates Westlife services (it fetches binaries from each tool's repository, if I remember correctly), so tool developers do not maintain a Westlife-specific version of their package, at the cost of consistency. In addition, Westlife had the legacy web portals of those tools standardize on DIRAC4EGI and ARIA SSO.

So Phenomenal had tool developers package their tools as containers and provides an all-in-one cluster instance, while Westlife provides more of an aggregation layer and had legacy tool portals standardize their job management and SSO mechanisms.

The Westlife approach is lower-cost (which is logical, since the budget is smaller) and a little easier to maintain in the long run, but there is no consistency between tool packages: they are simply aggregated from their sources. Phenomenal, by contrast, provides containers for everything.

Note: some Westlife tools come with licences, which played a role in limiting the integration effort.


*** Virtual Research Environment (Westlife WP5 / Phenomenal WP6)

The Phenomenal virtual research environment (VRE) lets users submit jobs directly to the underlying Phenomenal public instance, and uses Galaxy for workflows. The Westlife VRE redirects users to the individual service portals to submit jobs there, and has no workflow mechanism.

Westlife provides a consolidated data management interface that connects to Dropbox, EUDAT, etc. Phenomenal is working on one.
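As an illustration of what a consolidated data management interface means, here is a minimal Python sketch of one folder-like view over several storage providers. It is an assumption-laden sketch and not the actual Westlife virtual folder API: `StorageBackend`, `InMemoryBackend` and `VirtualFolder` are hypothetical names, and the in-memory backend stands in for real providers such as Dropbox or EUDAT.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class StorageBackend(ABC):
    """Common interface every provider adapter would implement (hypothetical)."""

    @abstractmethod
    def list_files(self, prefix: str = "") -> List[str]: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class InMemoryBackend(StorageBackend):
    """Stand-in for a real provider; holds its files in a dict."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = dict(files)

    def list_files(self, prefix: str = "") -> List[str]:
        return sorted(name for name in self._files if name.startswith(prefix))

    def read(self, path: str) -> bytes:
        return self._files[path]


class VirtualFolder:
    """Routes 'mount/rest/of/path' to the backend mounted under 'mount'."""

    def __init__(self) -> None:
        self._mounts: Dict[str, StorageBackend] = {}

    def mount(self, name: str, backend: StorageBackend) -> None:
        self._mounts[name] = backend

    def _resolve(self, path: str) -> Tuple[str, StorageBackend, str]:
        mount, _, rest = path.strip("/").partition("/")
        if mount not in self._mounts:
            raise KeyError(f"no backend mounted at {mount!r}")
        return mount, self._mounts[mount], rest

    def list_files(self, path: str = "") -> List[str]:
        if path.strip("/") == "":
            return sorted(self._mounts)  # top level lists the mounts
        mount, backend, rest = self._resolve(path)
        return [f"{mount}/{name}" for name in backend.list_files(rest)]

    def read(self, path: str) -> bytes:
        _, backend, rest = self._resolve(path)
        return backend.read(rest)


if __name__ == "__main__":
    folder = VirtualFolder()
    folder.mount("dropbox", InMemoryBackend({"model.pdb": b"ATOM"}))
    folder.mount("eudat", InMemoryBackend({"run1/data.mtz": b"\x00"}))
    print(folder.list_files())
    print(folder.list_files("eudat/run1"))
```

In this kind of design, supporting another provider only means writing one more adapter behind the common interface, and nothing in it is tied to one project's tools.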


*** Sustainability

For Phenomenal, it's mainly about keeping the containers up to date and keeping up with new versions of Galaxy. ELIXIR SSO is expected to remain maintained.

For Westlife, it's mainly about maintaining the virtual folder API and the single-point-of-entry web portal (documentation etc.). The Westlife VM aggregates packages from existing repositories (so there is not much to do), and ARIA SSO is expected to remain maintained.


*** Conclusion

The impression I have is that the main collaboration opportunity would be the virtual folder, which provides common data management. There isn't much Westlife-specific stuff in it. Also, since the Westlife VM aggregates packages from Westlife partners, it might be able to aggregate Phenomenal tools as well. What I'm saying is that we shouldn't duplicate our efforts, since duplication decreases the likelihood that those efforts are maintained after the end of both projects.

These are just ideas I'm throwing around; they're not necessarily good. But I think it's worthwhile that we at least keep sharing how we cope with the sustainability requirement.