NEP-101 HEP Data-Intensive Distributed Cloud Computing
Technical Review
15 October 2014
Colin Leavett-Brown, University of Victoria

Agenda:
1. Percent Completion
2. Project Progress Discussion
3. Collaboration Opportunities
4. Summary

1. Percent Completion:

2. Project Progress Discussion:
– Batch Services (CloudScheduler)
– Software Distribution (CVMFS, Shoal, Squid)
– Storage Federation (UGR)
– VM Image Distribution (Glint)
– VM Image Optimization (CernVM3)

2.1. Project Progress Discussion, Batch Services:
- Monitoring & Diagnostics
  * Work continues
  * Nagios friendly
- Other Activities:
  * Belle-II production
  * OpenStack/Nova Fairshare Scheduler

2.1.1 Belle-II production:
• 2-15 October.
• Utilizing up to 7 clouds concurrently (including 3 commercial clouds).
• Averaging more than 1170 concurrent jobs (peaking at over 1950).

2.1.2 Belle-II production:
• 3rd highest producer in the world.

2.1.3 Fairshare Scheduler:
• Replacement for the standard OpenStack filter scheduler.
• Improves resource usage in a batch environment.
• When there are insufficient resources, queues requests rather than rejecting them.
• Allows the definition of project and user shares, e.g.:
  – project_shares={'ATLAS':30, 'Belle-II':30, 'HEP':5, 'staticVMs':10, 'testing':5}
  – user_shares={'p1':{'p1_u1':11, 'p1_u3':13}, 'p2':{'p2_u1':21, 'p1_u3':13}}
• Implements the SLURM fair-share algorithm, which tracks resource utilization
  (https://computing.llnl.gov/linux/slurm/priority_multifactor.html); see the
  sketch after this slide.

  USER         | PROJECT   | USER SHARE | PROJECT SHARE | FAIRSHARE (Vcpus) | FAIRSHARE (Memory) | actual vcpus usage | effec. vcpus usage | priority | VMs
  igable       | staticVMs | 1%         | 10.0%         | 0.142883896655    | 0.144670972976     | 2.6%               | 46.8%              | 2441     | 1
  crlb         | staticVMs | 1%         | 10.0%         | 0.0240432745384   | 0.0265300227472    | 88.3%              | 89.6%              | 426      | 8
  crlb         | testing   | 10.0%      | 10.0%         | 0.975288048768    | 0.982024799983     | 1.2%               | 1.2%               | 16627    | 0
  batchaccount | HEP       | 1%         | 10.0%         | 0.844486880279    | 0.807007869005     | 0.1%               | 2.7%               | 14093    | 0
  frank        | HEP       | 1%         | 10.0%         | 0.625423100715    | 0.551355623614     | 7.4%               | 7.5%               | 10113    | 2
  crlb         | HEP       | 1%         | 10.0%         | 0.837189731471    | 0.798172023564     | 0.3%               | 2.8%               | 13959    | 0

• Collaborating with developers from the Istituto Nazionale di Fisica Nucleare (INFN).
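Note: as an illustration of the fair-share bullet above, here is a minimal Python
sketch of the fair-share factor described in the linked SLURM documentation,
F = 2^(-effective_usage / normalized_share). fairshare_factor() and the example
numbers are assumptions for illustration only; this is not the Fairshare
Scheduler's actual code, and the real normalization of shares and usage may differ.

    # Minimal sketch of a SLURM-style fair-share factor (assumed helper,
    # not the Fairshare Scheduler's actual implementation).
    def fairshare_factor(norm_share, effective_usage):
        """F = 2**(-usage/share): near 1 for under-served users, near 0 for over-served."""
        if norm_share <= 0.0:
            return 0.0
        return 2.0 ** (-effective_usage / norm_share)

    # Shares as defined on the slide; a user's normalized share would be
    # derived from project_shares and user_shares (exact layout assumed).
    project_shares = {'ATLAS': 30, 'Belle-II': 30, 'HEP': 5, 'staticVMs': 10, 'testing': 5}

    # A user entitled to 10% who consumed 1.2% of recent vCPU time is
    # under-served, so the factor stays high and priority rises ...
    print(fairshare_factor(0.10, 0.012))   # ~0.92
    # ... while a user who consumed 88.3% sinks toward the bottom of the queue.
    print(fairshare_factor(0.10, 0.883))   # ~0.002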
2.2. Project Progress Discussion, Software Distribution:
- In production for 9 months
- Multiple geographically distributed ATLAS squid caches
- Adopted by CERN, Oxford, & others
- Included in CernVM3

2.3. Project Progress Discussion, Storage Federation:
- Supports WebDAV servers and ATLAS storage elements (SEs)
- ATLAS SE authentication via VOMS proxy; tested interactively
- Canadian sites configured
- Production testing waiting for a modification to Rucio/aria2c, the glue between the ATLAS DB, the user, and UGR
- Simulation tests being formulated

2.4. Project Progress Discussion, Image Distribution:
- Integrated with the OpenStack Dashboard
  * Adding HTTPS & branding
  * Moving to production
  * Packaging
  * Preparing for the OpenStack Summit in November
- Uses the OpenStack development architecture and Keystone authentication
- Supports Glance, EC2, & GCE

2.5. Project Progress Discussion, Image Optimization:
- Work continues on contextualization of CernVM-3:
  * Cloud type discovery.
  * Contextualization switching from a combination of puppet/cloud-init to pure
    cloud-init (see the sketch after this slide).
  * Collaborating with CERN and pushing code changes directly to the CERN repository.
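Note: to make the "pure cloud-init" point concrete, a minimal sketch of handing a
cloud-config user-data payload to an OpenStack instance with python-novaclient.
Everything specific here is a placeholder (credentials, endpoint, image and flavor
names, the CVMFS proxy URL, and the bootstrap script path); the actual CernVM-3
contextualization keys are not shown on the slide, and novaclient call details
vary by client version.

    # Sketch only: boot a CernVM-3 worker contextualized purely by cloud-init.
    import textwrap
    from novaclient import client

    # cloud-config payload consumed by cloud-init inside the VM at first boot.
    user_data = textwrap.dedent("""\
        #cloud-config
        write_files:
          - path: /etc/cvmfs/default.local        # hypothetical CVMFS site config
            content: |
              CVMFS_HTTP_PROXY="http://squid.example.org:3128"
        runcmd:
          - /usr/local/bin/start_worker.sh        # hypothetical worker bootstrap
        """)

    # Placeholder credentials and endpoint; Keystone authentication as on slide 2.4.
    nova = client.Client('2', 'USER', 'PASSWORD', 'TENANT',
                         'http://keystone.example.org:5000/v2.0')
    nova.servers.create(name='cernvm3-worker',
                        image=nova.images.find(name='cernvm3'),
                        flavor=nova.flavors.find(name='m1.medium'),
                        userdata=user_data)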
3. Collaboration Opportunities:
– Latest CloudScheduler, Glint, and UGR will be available on DAIR
– All source code developed by the project is on GitHub
– Seeking to have Glint installed at CERN, on WestGrid, and at other third-party sites
– Seeking to have Glint included as an OpenStack project
– Using code provided by CERN, OpenStack, and the open-source community

4. Summary:
– Project is on track and making good progress.
– Many of the pieces are already in production by the NEP-101 project group, CERN/ATLAS, Belle II, and CANFAR.