There are 3 topics to consider:
- Hardware provisioning: the community (i.e. the project partners) contributes hardware resources. We would like a single pool of resources for the project, i.e. a "federation" between project members. This pool would be organized as an EGI VO: resources will be merged into the existing enmr VO. Note: in terms of architecture, the web portals would submit jobs to a static grid registered with EGI, and the grid would scale elastically through cloud provisioning (a static grid with elastic scaling).
- Templating system: we want to pool partners' resources, which means each node must be usable for several different kinds of jobs. We therefore need a configuration templating system. We can use VM Images (e.g. a Gromacs VM Image) or Docker Images (which would run in a generic Docker VM Image). Docker containers are more lightweight and easier to use, BUT they can be problematic if:
  -- compute jobs run on multiple nodes (Docker and MPI do not play well together)
  -- compute jobs require a shared drive mount (Docker does not like that)
Note: if we go for VM templates, we should probably integrate with the EGI system: https://wiki.egi.eu/wiki/Federated_Cloud_Architecture#VM_Image_management
Note 2: if we preinstall all software on all nodes, we don't need a templating system. However, we will still need cloud technology (object storage) for storage. It is preferable NOT to keep preinstalling software on all nodes: as the software diversifies, configuration management will get harder.
- Dispatcher: the web portals need an endpoint to submit jobs to. Most dispatchers support only one kind of templating system: either pre-installed VM Images or Docker containers. For Docker, a strong candidate is Mesos (used at Twitter). For pre-installed VM Images, any classic grid dispatcher should be able to do the job.
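To make the "static grid with elastic scaling" idea from the hardware provisioning item concrete, here is a minimal sketch of the scaling rule. It assumes a simplified model where one job occupies one slot. The names `Pool` and `cloud_slots_needed` and all the numbers are hypothetical and do not correspond to any EGI or dispatcher API.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """Hypothetical description of the federated resource pool."""
    static_slots: int     # job slots on the static grid (always available)
    max_cloud_slots: int  # upper bound on slots that may be provisioned in the cloud


def cloud_slots_needed(pool: Pool, queued_jobs: int) -> int:
    """Return how many cloud slots to provision for the current queue.

    The static grid absorbs jobs first; only the overflow is sent to
    the cloud, and never more than the configured maximum.
    """
    overflow = max(0, queued_jobs - pool.static_slots)
    return min(overflow, pool.max_cloud_slots)


if __name__ == "__main__":
    pool = Pool(static_slots=100, max_cloud_slots=50)
    for queued in (80, 120, 500):
        print(queued, "queued jobs ->", cloud_slots_needed(pool, queued), "cloud slots")
```

In this model the cloud is only used when the queue exceeds what the static grid can hold, which is the behaviour we expect from whichever dispatcher is selected.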
So we have several trade-offs:
- If some portals require MPI, Docker containers are ruled out, and we should use VM Images for templating.
- If we use VM Images, all partners need to set up a cloud framework.
- If we have the choice between Docker containers and pre-installed VM Images, then we should probably go for the option that is closest to the standards of existing running facilities AND that minimizes maintenance for partners, since WestLife funding is not eternal.
We already have reports that some portals require MPI and shared storage, so Docker is probably not viable. This means we would use VM Images for the templating system. The architecture will probably be a static grid with elastic cloud scaling capabilities.
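The reasoning above can be summarised as a small decision rule. The sketch below is only illustrative: the `PortalRequirements` type, the `choose_templating` function and the portal names are hypothetical, and the rule covers only the two Docker blockers listed earlier (MPI and shared drive mounts).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PortalRequirements:
    """Hypothetical summary of what one web portal needs from compute nodes."""
    name: str
    needs_mpi: bool
    needs_shared_storage: bool


def choose_templating(portals: Iterable[PortalRequirements]) -> str:
    """Return "vm_image" if any portal hits a Docker blocker, else "open".

    "open" means both Docker containers and VM Images remain possible, and
    the choice falls back to the other criteria (closeness to existing
    facilities, maintenance cost for partners).
    """
    for portal in portals:
        if portal.needs_mpi or portal.needs_shared_storage:
            return "vm_image"
    return "open"


if __name__ == "__main__":
    # Illustrative portals only, not a survey of the real ones.
    portals = [
        PortalRequirements("portal-a", needs_mpi=False, needs_shared_storage=False),
        PortalRequirements("portal-b", needs_mpi=True, needs_shared_storage=True),
    ]
    print(choose_templating(portals))
```

A single portal with MPI or shared-storage needs is enough to push the whole pool towards VM Images, because every node must be able to run every kind of job.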
For the dispatcher, DIRAC4EGI is the lead candidate: it is simple and can use both grid and cloud resources. DIRAC4EGI can take care of dispatching AND cloud provisioning. EC3 has been mentioned too, but it is not yet clear whether it complements DIRAC4EGI or overlaps with it.