M11


The milestone was finalized in February 2016. A polished snapshot of this page, including appendices, is available at these links:

We keep updating this page with current status.

Inventory of available resources and testbed setup

Existing portals

Template of information to gather: File:Template survay.pdf

Portal name | Partner | File
AMPS-NMR | CIRMMP | File:Pq amber.docx
CCP4online | STFC | File:Pq ccp4online.docx
CS-ROSETTA3 | UU | File:Pq csrosetta3.docx
GROMACS | UU | File:Pq gromacs.docx
HADDOCK | UU | File:Pq haddock.docx
SCIPION WEB TOOLS | CSIC | File:Pq csic.odt
UNIO | UU | File:Pq unio.docx
ARP/wARP | EMBL | File:Pq arpwarp.docx
ViCi | EMBL | File:Pq vici.docx
AutoRickshaw | EMBL | File:Pq autorickshaw.docx
Crystallographic construct designer (CCD) | NKI | File:Pq ccd.docx
PDB_REDO | NKI | File:Pq pdb redo.docx
XPLOR-NIH | CIRMMP | File:Pq xplor.docx


Collected data

Summary

Data uploaded on job submission

  • Generally up to a few tens of MB of input data is required.
  • Exceptions:
    • Input data for Scipion can amount to 2 TB.

Data downloaded as the result of a job

  • Generally from MBs to GBs.
  • Exceptions:
    • CCP4online BALBES/MrBUMP downloads can reach tens of GB, but this is not a typical use case
    • Scipion downloads can reach 2 TB if the input data is included


Background data

  • Generally portals do not require any background data
  • Exceptions:
    • CCP4online – local copies of the PDB and the XXprod databases are used


Usage of local and remote resources

  • 6 portals use remote resources (EGI grid + gLite); only HADDOCK also sends some jobs via DIRAC (a submission sketch follows this list)
  • 5 portals use only local resources
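
As an illustration of the remote path, a minimal sketch of submitting a job through the DIRAC Python API is shown below. The executable name, sandbox contents, and job name are hypothetical placeholders and do not come from any of the portals above; an initialised DIRAC client environment and a valid enmr.eu proxy are assumed.

  from DIRAC.Core.Base import Script
  Script.parseCommandLine(ignoreErrors=True)   # initialise the DIRAC client environment

  from DIRAC.Interfaces.API.Dirac import Dirac
  from DIRAC.Interfaces.API.Job import Job

  # Hypothetical payload: a wrapper script plus an input archive (placeholders).
  job = Job()
  job.setName("westlife-test-job")
  job.setExecutable("run_payload.sh", arguments="input.tgz")
  job.setInputSandbox(["run_payload.sh", "input.tgz"])
  job.setOutputSandbox(["std.out", "std.err", "result.tgz"])

  result = Dirac().submitJob(job)
  if result["OK"]:
      print("Submitted DIRAC job", result["Value"])
  else:
      print("Submission failed:", result["Message"])

The gLite route would instead go through a JDL description submitted to a WMS or CREAM-CE; the sketch only covers the DIRAC case mentioned above.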


User submissions and grid jobs

  • Generally, hundreds to a few thousand user submissions translate into thousands or tens of thousands of grid submissions.
  • Exceptions:
    • HADDOCK – 25k user submissions translate into 7.5M grid jobs
    • CS-Rosetta3 – 67 user submissions translate into 189k grid jobs


MPI

  • Generally not used.
  • Exceptions:
    • Scipion
    • AutoRickshaw

Testbed

Resource inventory (last update 15 September 2017)

Cloud resources

Cloud name | Cloud framework | Institute | Phys. cores | Phys. RAM | GPUs | Block storage | Object storage | OCCI endpoint | West-Life VOs supported
INFN-PADOVA-STACK | OpenStack/Mitaka | INFN | 120 | 240 GB | 0 | 3.7 TB | - | https://egi-cloud.pd.infn.it:8787/occi1.1/ | enmr.eu
CESNET-MetaCloud | OpenNebula | MU | 60 (+400 best effort) | 360 GB (+2.5 TB) | 4x M2090 (shared) | 6 TB (+25 TB) | - | https://carach5.ics.muni.cz:11443/ | enmr.eu
IISAS-GPUCloud | OpenStack/Mitaka | IISAS | 96 | 384 GB | 8x K20 (shared) | 6 TB | - | https://nova3.ui.savba.sk:8787/occi1.1/ | enmr.eu
IISAS-Nebula | OpenNebula/5.0 | IISAS | 48 | 192 GB | 4x K20 (shared) | 6 TB | - | https://nebula2.ui.savba.sk:11443 | enmr.eu
STFC SCD Cloud | OpenStack/Mitaka | STFC | 64 (initially) | 256 GB | 0 | 6 TB | TBD | TBD | enmr.eu

CESNET-MetaCloud (Czech Republic) has signed an SLA with the EGI-Engage MoBrain Competence Centre (which includes the enmr.eu VO).

Comments:

  • For the time being, MU collaborates with CESNET to provide access to the resources as a single site. Only the MU resources available to West-Life are counted here, not the whole FedCloud site. A standalone MU site is planned by the end of 2016.
  • IISAS appears to support enmr.eu as well; we may negotiate access with them.
  • SURFsara provides cloud machines to test Scipion, but not via the EGI Federated Cloud; it is unclear why.
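
As a rough check that an OCCI endpoint from the table above is reachable, the sketch below queries its OCCI discovery interface ("/-/") over HTTPS using an X.509 VOMS proxy. Only the endpoint URL is taken from the table; the proxy path and CA directory are assumptions that depend on the local client setup.

  import requests

  # Endpoint taken from the cloud table above (CESNET-MetaCloud).
  ENDPOINT = "https://carach5.ics.muni.cz:11443"
  PROXY = "/tmp/x509up_u1000"                   # enmr.eu VOMS proxy file (assumed path)
  CA_PATH = "/etc/grid-security/certificates"   # IGTF CA directory (assumed path)

  # The OCCI HTTP rendering publishes its kinds and mixins at the "/-/" query interface.
  resp = requests.get(
      ENDPOINT + "/-/",
      cert=PROXY,               # the proxy file holds both certificate and key
      verify=CA_PATH,
      headers={"Accept": "text/plain"},
  )
  resp.raise_for_status()
  print(resp.text[:1000])       # first part of the advertised OCCI categories

In practice the rOCCI client is normally used for this; the raw HTTP call above only illustrates that the listed endpoints are plain OCCI services over HTTPS.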

Grid resources

We consider the existing EGI resources that are accessed in the legacy ways and available to the enmr.eu VO to be part of the West-Life testbed, because they can accept jobs from the community portals.

On the other hand, it is not worth building a separate parallel infrastructure in this way.

TODO: inventory of resources, maybe just through appropriate links to EGI

EGI production grid sites capacity (from EGI Vapor tool)

Subset of EGI production grid sites that have signed an SLA with the EGI-Engage MoBrain Competence Centre (which includes the enmr.eu VO):

  • 7 resource centres: INFN-PADOVA (Italy), RAL-LCG2 (UK), TW-NCHC (Taiwan), SURFsara (The Netherlands), NCG-INGRID-PT (Portugal), NIKHEF (The Netherlands), IFCA-LCG2 (Spain)
  • They have pledged 55 million hours of computing time and 61 TB of storage capacity for the entire year 2016, and 60 million hours of computing time and 250 TB of storage capacity until the end of 2017.

EGI prototype GPU-enabled grid sites:

  • A cluster with 3 nodes (2x Intel Xeon E5-2620v2), each with 2 NVIDIA Tesla K20m GPUs, is available at CIRMMP. It uses Torque 4.2.10 (compiled from source with the NVML libraries) with Maui 3.3.1 as batch system/scheduler, and its CREAM-CE endpoint is cegpu.cerm.unifi.it:8443/cream-pbs-batch
  • A cluster with 2 nodes (2x Intel Xeon E5530 and 2x Intel Xeon E5-2620v4), hosting 1 NVIDIA Tesla K40c and 4 NVIDIA Tesla K80 GPUs respectively, is available at QMUL. It uses Slurm as batch system, and its CREAM-CE endpoint is ice.esc.qmul.ac.uk:8443/cream-slurm-sl6_lcg_gpu


Non-EU sites

The enmr.eu VO is supported by non-EU sites through EGI interoperability mechanisms; a report is available at:

http://pos.sissa.it/archive/conferences/162/040/EGICF12-EMITC2_040.pdf


Access

TODO: consider restricting access to the testbed to a specific group "westlife" inside the enmr.eu VO; we probably do not want all portal users to be able to spawn VMs here. A possible check is sketched below.
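
A minimal sketch of the kind of gate this would allow is given below, assuming the proposed group is created as /enmr.eu/westlife (that name is an assumption taken from the TODO and the group does not exist yet) and that the standard voms-proxy-info client is installed.

  import subprocess

  # Proposed group name from the TODO above; an assumption, not an existing group.
  REQUIRED_FQAN = "/enmr.eu/westlife"

  def proxy_has_westlife_group():
      """Return True if the current VOMS proxy carries the westlife group FQAN."""
      out = subprocess.run(
          ["voms-proxy-info", "-fqan"],          # prints one FQAN per line
          capture_output=True, text=True, check=True,
      ).stdout
      return any(line.strip().startswith(REQUIRED_FQAN) for line in out.splitlines())

  if __name__ == "__main__":
      print("Testbed access allowed:", proxy_has_westlife_group())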


Monitoring (last update 15 September 2017)

Grid Resources Inventory from EGI Monitoring tools:

  • The Gstat service was decommissioned at the end of 2016

Cloud Resources from EGI Monitoring tools:


Accounting (last update 15 September 2017)

Grid Resources from EGI Accounting Portal

(data for the last 18 months; web access restricted to VO administrators)

Njobs-aug17.jpg

Cpuhrs-aug17.jpg

Roles-aug17.jpg

Cloud Resources from EGI Accounting Portal

(data for the last 18 months; web access restricted to VO administrators)

Nvms-aug17.jpg

Wtimevms-aug17.jpg