WP4 Scipion Use Case
- 1 Current status
- 2 Goals in West-Life
- 3 Proposed architecture overview
- 4 Technical description of the workflow
- 5 Components in detail
Scipion covers the whole workflow, about a dozen steps, of processing raw CryoEM data into a refined electron density map. Some of the workflow steps are automatic, while others require user interaction. The individual steps are implemented by various software tools, including third-party ones, and alternatives are provided (there is no one-size-fits-all choice; some approaches work better for some inputs and vice versa). The Scipion framework integrates these tools to work together smoothly. Details can be found on the Scipion documentation page.
In general, the processing starts with very large raw data (CryoEM "movies", up to terabytes) on which fairly lightweight, mostly I/O-bound computation is performed. In subsequent workflow steps the amount of data is reduced while the computational complexity increases, ending with many days of CPU time.
The implementation may use multiple computing nodes to distribute the load, but it relies on a shared POSIX filesystem. The filesystem is organized into projects -- self-contained folders where all input data, intermediate results, and the status of the workflow processing are stored.
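The self-contained project layout described above can be sketched as follows. The subfolder names are modelled on a typical Scipion project, but the exact layout here is illustrative, not authoritative:

```python
import os
import tempfile

# Illustrative subfolders of a self-contained project (names modelled on a
# typical Scipion project; the real layout may differ).
PROJECT_SUBDIRS = ["Runs", "Tmp", "Uploads", "Logs"]

def create_project(root, name):
    """Create a project skeleton on the shared POSIX filesystem."""
    project = os.path.join(root, name)
    for sub in PROJECT_SUBDIRS:
        os.makedirs(os.path.join(project, sub), exist_ok=True)
    # A status database records the state of the workflow processing.
    open(os.path.join(project, "project.sqlite"), "a").close()
    return project

path = create_project(tempfile.mkdtemp(), "demo_project")
```

Because everything the workflow needs lives under one folder, moving or sharing a project is a plain directory copy.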
Computations can be spawned through traditional batch systems (Oracle Grid Engine, Torque, Slurm, ...).
Some computationally demanding steps of the workflow support parallel execution with either threading or MPI, and some of the tools leverage GPU acceleration.
The full suite is implemented as a desktop application with an integrated GUI.
Recently, within the EGI MoBrain competence centre, a proof-of-concept deployment in the cloud was done.
It shows the general feasibility of the approach; however, the process is still largely manual and rather complicated for non-expert cloud users.
Scipion web tools
Specific pieces of the workflow are available as web tools, with no need to install software on the user's machine.
Currently those are the following:
My movie alignment
Corrects for global frame movements, as well as local within-frame movements, of your movies. Data requirements: upload of 1-2 TB (resumable upload is implemented, for the other web tools too). Computational requirements: 3-5 minutes with 1 CPU.
NOTE: one of the methods uses a GPU, so a GPU queue is in place in production.
My first map
Generates a first volume using particle averages, to be used later as a volume reference for refinement. Data requirements: upload of 0.5 MB. Computational requirements: 5 hours with 1 CPU.
My resolution map
Computes the local resolution of a 3D density map using ResMap. Data requirements: upload of 55 MB. Computational requirements: 1 minute with 1 CPU.
So far the web tools run on a single server. They are integrated with queue managers, so it is possible to perform the processing on a cluster.
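As a minimal sketch of that queue-manager integration (assuming Slurm; the partition names and script name are hypothetical), the web front-end only needs to render a submission command per request:

```python
import shlex

def submit_command(script, cpus=1, partition="main", gpu=False):
    """Build (not run) the sbatch command for one web-tool processing step.
    Partition names and the GPU resource string are illustrative."""
    cmd = ["sbatch", "--ntasks=1", f"--cpus-per-task={cpus}",
           f"--partition={'gpu' if gpu else partition}"]
    if gpu:
        cmd.append("--gres=gpu:1")   # e.g. the GPU-based movie-alignment method
    cmd.append(script)
    return " ".join(shlex.quote(c) for c in cmd)

print(submit_command("my_movie_alignment.sh", gpu=True))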
They have been successfully deployed to virtual machines in the cloud (the process is highly automated, but there is still room for improvement).
Impact on Scipion development
- User management (identification, authorization, data access)
- Possible data file transfer outside the standard SWT upload/download, for interoperability
- Moving files to run things on the "compute site" (desktop and web tools with high computational needs)
Goals in West-Life
Easy virtualized deployment
The West-Life infrastructure will be designed in a way that allows easy deployment of Scipion, without the need for extensive modifications or breaking its basic design principles (e.g. the shared project folder).
Vice versa, the Scipion code will be adapted to allow automated deployment, even in multiple instances (e.g. for different user groups).
Permanent and reliable data storage
The deployment scenario will include permanent (or at least semi-permanent) storage of data (the whole project folders, including the raw microscope data). The benefit is twofold:
- the user is no longer required to manage the data on local resources
- experiments are reproducible; they can be run again with a different choice of alternatives in particular steps, etc.
Leverage distributed infrastructure
These are the usual goals of any grid or cloud portal solution. The users don't need to run their own computing resources, and various groups can use the capacity of various sites according to negotiated contracts/agreements, with the technical solution supporting flexibility in those agreements.
Co-existence of desktop and web versions
Not all steps of the workflow have been ported to the web environment; the implementation will take time, and it may not even be feasible to port all the steps -- access via remote desktop can still be better for the user.
However, the web and desktop versions should share as much software and infrastructure as possible, and the design must allow gradual development of the web tools as alternatives to (or eventual replacements of) the corresponding desktop components.
Moreover, access to the same project folders by both web and desktop tools is desirable.
Proposed architecture overview
The proposed architecture is sketched in the following diagram.
There are two categories of sites, both expected to exist in multiple instances. In addition, shared West-Life services are leveraged.
The storage site provides permanent, fairly large data storage (up to petabytes) to accommodate the full CryoEM experiment data.
Besides storage, moderate-size computing resources (on the order of 1,000, not 100,000, CPU cores) are available locally through a cloud interface, with a sufficiently fast, low-latency network allowing access to the stored data through a POSIX filesystem mounted in the cloud-hosted VMs. Those are:
- front-end of the web tools,
- virtual cluster of worker nodes to process the computations of the workflow steps (elastic, managed by batch system),
- VMs running virtualized desktop tools.
Intensive calculation is expected to be offloaded from the storage site to virtual clusters hosted at other sites.
This is expected for the later workflow steps (e.g. 2D alignment, 3D refinement, etc.), which are computationally demanding but neither require huge inputs nor generate huge outputs. Therefore it is affordable to stage the data (subsets of the project folders) in and out; the time to transfer the data is amortised against the time to compute.
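A back-of-the-envelope check of that amortisation argument, with purely illustrative numbers:

```python
def stage_overhead(data_gb, bandwidth_gbps, compute_hours):
    """Fraction of total wall time spent staging data in and out."""
    transfer_s = 2 * data_gb * 8 / bandwidth_gbps  # in + out, GB -> Gb
    return transfer_s / (transfer_s + compute_hours * 3600)

# Illustrative numbers: 50 GB of particle data over a 1 Gb/s link and two
# days of computation keep the transfer well under 1% of the wall time.
print(f"{stage_overhead(50, 1, 48):.2%}")
```

The same arithmetic shows why the early, terabyte-scale steps are better kept at the storage site: their transfer time would dominate.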
The virtual clusters still need shared storage accessed as a POSIX filesystem mounted on the cloud nodes.
MPI among the cluster nodes must be supported to run parallel calculations.
Details on the management of the virtual clusters are TBD.
Management of remote compute clusters, and submission to them, is expected to be controlled by a common West-Life service.
Technical description of the workflow
The user accesses a chosen Scipion web server and creates a project there (either empty or a clone of an existing one). The corresponding Scipion project folder is created in the POSIX filesystem mounted from the site storage.
Authentication is done through a common West-Life mechanism (federated identities, TBD). The authenticated user becomes the project owner; further sharing (with other users or groups) can be set up as well.
Authorization -- the user belongs to the Scipion group, possibly with finer granularity (a sub-group authorized to use this portal instance).
A project-specific access URL for massive data transfer is created, either permanent (if the data transfer protocol supports strong authentication) or temporary. This URL points directly to the project folder on the site storage; it needn't pass through the web front-end.
Moderate-size uploads (up to gigabytes) are feasible (and easiest) through the web interface directly; however, bigger data sets (terabytes) must be uploaded with a specialized protocol (GridFTP, rsync, ...).
Appropriate AuthN/AuthZ must be implemented at the storage endpoint -- purely user-based access control (allowing access to the folder owner only) is sufficient. Alternatively, "secret" temporary URLs (or another poor man's solution) can be used if the storage technology does not support strong AuthN and/or propagation of AuthZ information.
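One possible shape of the "secret" temporary URL fallback is an HMAC-signed, time-limited link; the host name and query-string layout below are assumptions for illustration only:

```python
import hashlib
import hmac
import secrets
import time

SITE_SECRET = secrets.token_bytes(32)  # held only by the storage endpoint

def temporary_url(project, base="https://storage.example.org", ttl=3600):
    """Derive a secret, time-limited upload URL for a project folder.
    The host name and query parameters are illustrative assumptions."""
    expires = int(time.time()) + ttl
    sig = hmac.new(SITE_SECRET, f"{project}:{expires}".encode(),
                   hashlib.sha256).hexdigest()
    return f"{base}/{project}?expires={expires}&sig={sig}"

def verify(project, expires, sig):
    """Endpoint-side check: signature matches and the link has not expired."""
    good = hmac.new(SITE_SECRET, f"{project}:{expires}".encode(),
                    hashlib.sha256).hexdigest()
    return hmac.compare_digest(good, sig) and time.time() < expires
```

Since the signature binds the project name and expiry time, the endpoint needs no per-URL state, and a leaked link becomes useless once it expires.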
Processing with web tools
Once the input data are uploaded, processing with the web tools can start. The user can access the projects at this site that are owned by or shared with him/her.
To prevent overload of the web server, even lightweight calculations should be run through the batch system on the virtual cluster at the site. This cluster should support MPI for parallel execution. The shared POSIX filesystem is mounted on all nodes of the cluster.
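A minimal sketch of what the front-end could hand to that batch system for one MPI-parallel step (Slurm-style directives assumed; the paths and step command are hypothetical):

```python
def batch_script(step_cmd, nodes=2, tasks_per_node=8,
                 project_dir="/mnt/projects/demo"):
    """Render a minimal Slurm batch script that runs one workflow step under
    MPI inside the shared project folder (directives and paths illustrative)."""
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"cd {project_dir}",  # shared POSIX filesystem, mounted on all nodes
        f"mpirun -np {nodes * tasks_per_node} {step_cmd}",
    ])

print(batch_script("align_2d_step"))
```

Because every node sees the same project folder, no explicit data movement is needed between the front-end and the workers.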
Authorization: to keep things simple, we assume a single instance of the portal runs with a service identity authorized to access its area of the storage filesystems (where all the project folders are hosted), as well as to submit to the batch system of the local virtual cluster. Isolation of the users is done at the application level.
If this is not sufficient, multiple instances of both the web front-end and the virtual cluster can be spawned, each using a different service identity and a different area of the site storage.
The same service identity is used by the virtual cluster to access the shared filesystem. Again, isolation of users is done at the application level; the users do not access this cluster directly.
Spawn user's virtual desktop instance
Workflow steps not covered by the web tools (or not feasible to cover) are done with the "remote desktop" version of Scipion, running on dedicated VMs at the same site.
The remote desktop solution needs only an HTML5 browser on the client side. On the server side, the initial requirements are quite light in terms of computing power and network ports (HTTPS, with HTTP as a backup). If the remote desktop VMs have public IPs, the user can be redirected from the main web front-end. A proxy solution is also possible, though it implies a penalty in performance and complexity.
The VMs are instantiated from prepared VM images, possibly with postponed software deployment (to avoid maintaining complex, large VM images; the shared West-Life approach will be followed here).
The VM mounts the POSIX filesystem of the site storage to access the user's data.
As part of VM contextualization, the user's credentials will probably have to be delegated to the VM to access the data. Details TBD.
Submit remote job
In general, sites providing large storage may not be able to dedicate extensive computing resources to a single community. Therefore, it should be possible to offload heavyweight calculations (later steps of the workflow, e.g. 2D & 3D alignment) to other sites. Fortunately, these stages of the workflow no longer require large inputs, so it is feasible to copy the data in and out.
The mechanism is not clearly defined yet; just the following is expected to hold.
A common West-Life infrastructure manager service is expected to maintain the pool of available resources -- remote virtual clusters where jobs can be submitted, including the required software.
Upon job submission, the West-Life broker decides which site the job should be submitted to.
The required part of the project folder is copied to the remote site before the actual payload starts, and copied back after successful completion.
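The stage-in / compute / stage-out cycle could look roughly like this sketch, where a local copy stands in for the actual cross-site transfer and `payload` stands in for the real workflow step:

```python
import os
import shutil
import tempfile

def run_remote_step(project_dir, inputs, payload, outputs):
    """Sketch of offloading one workflow step: stage the needed subset of the
    project folder in, run the payload, stage the results back out."""
    scratch = tempfile.mkdtemp(prefix="stage_")   # remote cluster's storage
    for rel in inputs:                            # stage in the needed subset
        dst = os.path.join(scratch, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(project_dir, rel), dst)
    payload(scratch)                              # the actual computation
    for rel in outputs:                           # stage out after success
        shutil.copy2(os.path.join(scratch, rel),
                     os.path.join(project_dir, rel))
    shutil.rmtree(scratch)
```

In the real deployment the copies would go over GridFTP, rsync, or a similar protocol, and failures before the stage-out leave the project folder untouched.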
Service identities are used again in the case of submission from the web tools; delegated user credentials are used otherwise.
Components in detail
We will provide implementation details here as soon as we decide on them and start deploying.