WP4 Architecture

From West-Life
Goals of architecture consolidation

  • a uniform "West-Life" recipe for deploying a portal on the cloud and grid infrastructure
  • allow multiple instances of the same portal (load balancing ...)
  • a homogeneous approach across different portals
  • consolidated maintenance (expertise, documentation, ...) and easy migration (e.g. hardware upgrade)
  • ...

Current status

  • references to M4.1 and D5.2, brief description of each (internals rather than user-visible functionality)
  • very short summary of security issues (the architecture must not neglect them, but they are covered in detail in D4.2)

Proposed architecture overview

We start from the kickoff architecture schema:

Kickoff architecture schema

The main pieces are:

  • Virtual cluster to host the main portal web server and its 'local' cluster used for quick-turnaround tasks like pre- and postprocessing
  • Remote job submission to offload the main computation in reasonable chunks to other resources (grid & cloud)
  • VM pool management: when submitting to the cloud, keep a pool of available VMs
  • Software distribution and maintenance for both 'local' nodes (at the virtual cluster) and 'remote' ones
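
The "reasonable chunks" mentioned for remote job submission can be illustrated with a minimal sketch. It is an assumption for illustration only, not part of any existing portal: the function name `split_into_chunks` and the fixed `chunk_size` are hypothetical, and a real portal would probably size chunks by expected runtime rather than by item count.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(items: Sequence[T], chunk_size: int) -> List[List[T]]:
    """Split work items into consecutive chunks of at most chunk_size items.

    Hypothetical helper: each returned chunk would become one remote job.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [list(items[i:i + chunk_size]) for i in range(0, len(items), chunk_size)]
```

Each chunk would then be submitted as a single job to the grid or cloud, while pre- and postprocessing stay on the 'local' cluster.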

Virtual cluster

  • deploy as a whole on a single physical cluster
    • fast and reliable network among nodes
    • shared filesystem (NFS)
    • MPI possible
  • specialized nodes
    • web server itself
    • services (e.g. Torque server, Spark cluster manager, ...)
    • worker nodes
    • ...
  • high availability of critical nodes
  • elasticity (worker nodes on demand)
  • AuthN/Z
    • spawning the cluster
    • portal users (another story)
    • service identities to authenticate nodes among one another
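
To make the elasticity item more concrete, the following is a minimal sketch of how the number of worker nodes could follow the job queue. Everything in it is an assumption for illustration: the function `desired_workers`, its parameters, and the simple "jobs per worker" rule are hypothetical and not taken from any existing West-Life component.

```python
import math


def desired_workers(queued_jobs: int, jobs_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Return how many worker nodes should run for the current queue length.

    Hypothetical policy: one worker per jobs_per_worker queued jobs,
    clamped to the range [min_workers, max_workers].
    """
    if jobs_per_worker < 1:
        raise ValueError("jobs_per_worker must be at least 1")
    if not 0 <= min_workers <= max_workers:
        raise ValueError("need 0 <= min_workers <= max_workers")
    needed = math.ceil(queued_jobs / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))
```

A cluster manager could call such a function periodically and start or stop worker nodes to match the result; the lower bound keeps some capacity for quick-turnaround tasks, the upper bound caps resource use.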

Remote job submission

  • Grid: stick with existing solution (Dirac)
  • Cloud: TODO later, not critical in PY1

Remote VM pool management

  • TODO: related to cloud submission, not critical now
  • shared by multiple portals?
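
Since this part is still a TODO, the following is only a rough sketch of what "keeping a pool of available VMs" could mean. The class `VMPool`, the `start_vm` callback, and the `min_idle` threshold are all hypothetical names; no real cloud API is called, and the callback stands in for whatever mechanism would actually start a VM.

```python
from collections import deque
from typing import Callable, Deque


class VMPool:
    """Keep at least min_idle started VMs ready for incoming jobs (illustrative only)."""

    def __init__(self, start_vm: Callable[[], str], min_idle: int) -> None:
        self._start_vm = start_vm      # hypothetical callback returning a VM identifier
        self._min_idle = min_idle
        self._idle: Deque[str] = deque()
        self.replenish()

    def replenish(self) -> None:
        """Start VMs until the idle pool reaches the configured minimum."""
        while len(self._idle) < self._min_idle:
            self._idle.append(self._start_vm())

    def acquire(self) -> str:
        """Hand out an idle VM (or start one on demand), then refill the pool."""
        vm = self._idle.popleft() if self._idle else self._start_vm()
        self.replenish()
        return vm

    def release(self, vm: str) -> None:
        """Return a VM to the idle pool once its job has finished."""
        self._idle.append(vm)

    @property
    def idle_count(self) -> int:
        return len(self._idle)
```

If the pool were shared by multiple portals, one such pool object (or service) would serve all of them. The sketch never shuts down surplus VMs, which a real implementation would need to do.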

Data storage and transfer

  • subject of D4.4; however, it should appear here briefly, at least because of the Scipion use case

Software deployment and maintenance

  • covers both 'local' nodes, i.e. the virtual cluster, and remote nodes
  • grid case: use existing mechanisms
    • tags in CE info in BDII
    • CernVMFS to distribute updates?
  • third-party packages (whatever we use)
  • in-house stuff
    • portal pages
    • portal logic (workflow engine)
    • configurations
    • auxiliary software (scripts etc.)
  • either a VM image repository with heavy-weight updates, or generic 'light' VM images plus recipes to customize them (e.g. Puppet)

Available technology

  • gather information from Technology_Review
  • review relevant deliverables of Indigo project
  • assess usability of each of the pieces for our purpose

Proposed implementation version 1

  • what we plan to deploy this year
  • which decisions we made and why
  • experience and proposed revisions will be the subject of D4.3
  • MU: focus on virtualized portals (Cloudify + Puppet)
  • most of the D4.1 draft of May 19 fits here
  • ...