Large-scale distributed computing systems
Lecture 3: Infrastructures and deployment
January 2015
Johan Montagnat, CNRS, I3S, MODALIS
http://www.i3s.unice.fr/~johan/

Course overview
► 1. Distributed computing and models
► 2. Remote services and cloud computing
► 3. Infrastructures and deployment
► 4. Workload and performance modeling
► 5. Workflows
► 6. Authentication, authorization, security
► 7. Data management
► 8. Evaluation

Master Ubinet: Large-Scale Distributed Computing (3), Johan Montagnat

Course content
► 3. Infrastructures and deployment
▸ Research and production infrastructures
▸ Middleware development
▸ Deployment
▸ Operations

Recipe to make a distributed system
► Deploying, operating and using a large-scale distributed system is a long process
1. System developers design, develop and package middleware services
2. System administrators select the collection of interoperable services to run
3. System administrators deploy and operate the services
∙ Decide on service distribution
∙ Decide on operation policies
4. Application developers build applications against the available services
5. End users exploit the infrastructure through applications
► Fortunately, infrastructures are already being operated, e.g.
▸ the EGI European Grid (running the gLite middleware)
▸ the French Aladdin/Grid'5000 infrastructure (flexible middleware)

Basic components for large-scale, intensive computing

Resources management strategies
(Diagram: three strategies side by side. Cluster computing: an application tasks manager submits to a master (batch manager) driving worker nodes. Grid computing: a resources broker dispatches application tasks across sites X and Y. Cloud infrastructures: a cloud resources manager allocates resources across sites X and Y.)

Main middleware components
(Diagram: user requests reach the Workload Management service, which sends resource-allocation queries to the Information Service; sites publish information on their computing and storage resources, which the Information Service indexes; the Authorization & Authentication service mediates access to site resources; logging and real-time monitoring track the dynamic evolution of the system.)

Functionality overview
► Authentication and authorization
▸ Identification, security, access control
► Information service
▸ Large-scale distribution of resources, volatility
► Resource broker
▸ Load distribution, reliability, heterogeneity of computing resources
► Data Management Service
▸ Storage distribution, reliability, heterogeneity of storage resources
► Logging, monitoring, accounting
▸ Trace collection, system heartbeat, security enforcement, billing

User Interface
► System entry point
▸ May be the user workstation
▸ May be an intermediate host with a heavy client / specific OS installed
► Machine available to the user with middleware service clients installed
▸ Client-side only: no service runs there, no need for inbound connections
► The system interface may be implemented at multiple levels
▸ Command line interface
▸ API
▸ Service interface
▸ Portal
► Controls client credentials
Authentication and authorization
► Authentication: user identities may be established through different credentials
▸ Login / password
▸ Personal certificates (usually X.509) issued by a global certification authority
▸ Physical devices (fingerprint...)
► Authorization
▸ Access control to any resource (hardware, software, data...)
▸ Requires authentication
► Users belong to VOs
▸ User identifiers are registered in a VOMS (VO Management Service)
▸ The VOMS maps users to groups and roles
► Enables single sign-on to all grid services
► Identity delegation is often needed (for a service to act on behalf of the user)

Information System
► Collects / provides information on available resources (minimally computing and storage resources)
► Uses a standardized information schema
▸ GLUE (Grid Laboratory Uniform Environment) is standardized across different grids
► Information indexes are built and periodically updated
▸ Trade-off between freshness of information and status update frequency
▸ Information may be (and often will be) outdated
► Scalability is critical
▸ A large number of resources is federated
▸ Hierarchical distribution, caching...

Workload Management System
► User computational tasks (jobs) are described
▸ through a Job Description Language (task-based), or
▸ through a standard invocation interface (service-based)
► The Resource Broker matches job requirements with available resources and makes resource assignment decisions
▸ Uses information on available resources, data location, dynamic load, etc.
► Follows the job life cycle: input data transfer, execution and monitoring, output data transfer, etc.
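The brokering step above can be sketched in a few lines of Python. This is an illustrative sketch only: the dictionaries, field names and ranking criteria are invented for the example and are not part of any real WMS API.

```python
# Illustrative resource-brokering sketch: filter candidate resources
# by a job's requirements, then rank by data locality and dynamic load.
# All field names here are hypothetical, not a real middleware schema.

def broker(job, resources):
    """Return candidate resources, best first."""
    candidates = [
        r for r in resources
        if r["os"] == job["os"]
        and r["free_slots"] > 0
        and r["memory_mb"] >= job["min_memory_mb"]
    ]
    # Prefer the site already holding the job's input data, then low load.
    candidates.sort(key=lambda r: (job["input_site"] != r["site"], r["load"]))
    return candidates

resources = [
    {"site": "A", "os": "linux", "free_slots": 4, "memory_mb": 8192, "load": 0.9},
    {"site": "B", "os": "linux", "free_slots": 2, "memory_mb": 4096, "load": 0.2},
    {"site": "C", "os": "osx",   "free_slots": 8, "memory_mb": 8192, "load": 0.1},
]
job = {"os": "linux", "min_memory_mb": 4096, "input_site": "B"}
print([r["site"] for r in broker(job, resources)])  # → ['B', 'A']
```

A real broker would additionally follow the job life cycle (stage in the input sandbox, monitor execution, stage out results), which this snippet deliberately leaves out.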
Data Management System
► Infrastructure-wide file management system, built on heterogeneous local storage
► Different storage performance and uses
▸ Disks
▸ Tapes
▸ Mass storage devices...
► File catalogs provide a transparent view of distributed data files
► Additional services may provide data distribution, data transfer load balancing, and improved reliability and performance through replication, etc.
► Data access patterns matter

Metadata management
► The file catalogs use metadata associated with files
▸ File size, checksum...
► Other database services are commonly needed for storing user-defined metadata using more general relational schemas. They provide:
▸ A standard interface to different DBMSs
▸ A certificate-based authentication and authorization policy
► A centralized DBMS is adopted whenever possible
► Distributed relational database systems are challenging
▸ Data distribution policies
▸ Distributed query optimization
▸ Distribution challenges
∙ Coherency of data
∙ Concurrent updates
∙ Transaction management

Middleware examples

Global computing example: Condor
► Workload management over heterogeneous resources
► Delivers High Throughput Computing
▸ For many experimental sciences, computing throughput matters: the focus is not instantaneous computing power but the amount of computing that can be harnessed over a long period
▸ HTC is a 24-7-365 activity: fault tolerance is critical
► Batch-oriented system
▸ Batch processing extended with Job Control Languages to face grid heterogeneity
► Distributed computing IS difficult
▸ Team of ~35 faculty, full-time staff and students (U.
Wisconsin)
▸ Established in 1985
▸ Faces software/middleware engineering challenges in a UNIX/Linux/Windows/OS X environment

Matchmaking heterogeneous resources
► Run jobs in a variety of environments
▸ Local dedicated clusters (machine rooms)
▸ Local opportunistic (desktop) computers
▸ Grid environments; interfaces to other systems
(Diagram: a matchmaker mediates between jobs, e.g. "I need a Mac for this code to run", "I need a Linux box with 2 GB RAM", and the available desktop computers and dedicated clusters.)

Matchmaking
► Condor conceptually divides people into three groups (which may or may not be the same people)
▸ Job submitters
▸ Machine owners
▸ Pool (cluster) administrators
► Job submitter preferences
▸ "I need Linux, and I prefer faster machines"
► Machine owner preferences
▸ "I prefer jobs from the physics group"
▸ "I will only run jobs between 8pm and 4am"
▸ "I will only run certain types of jobs"
▸ Jobs can be preempted if something better comes along
► System administrator preferences
▸ When can jobs preempt other jobs?
▸ Which users have higher priority?
▸ Do some groups of users have allocations of computers?
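The preferences above feed into Condor's two-way matchmaking, which can be sketched as follows. This is a hedged illustration: real Condor evaluates ClassAd expressions, while here both sides' requirements are plain Python predicates and all names are made up.

```python
# Sketch of Condor-style two-way matchmaking (illustrative only):
# a match requires the job's requirements to hold on the machine AND
# the machine's requirements to hold on the job; among matches, the
# job's rank expression picks the preferred machine.

def matchmake(job, machines):
    matches = [
        m for m in machines
        if job["requirements"](m) and m["requirements"](job)
    ]
    return max(matches, key=job["rank"], default=None)

machines = [
    {"name": "m1", "opsys": "LINUX", "memory": 4096,
     "requirements": lambda j: j["group"] == "physics"},  # owner policy
    {"name": "m2", "opsys": "LINUX", "memory": 16384,
     "requirements": lambda j: True},  # runs anything
]
job = {
    "group": "biology",
    "requirements": lambda m: m["opsys"] == "LINUX" and m["memory"] >= 2048,
    "rank": lambda m: m["memory"],  # prefer more memory
}
best = matchmake(job, machines)
print(best["name"])  # → m2  (m1 only accepts physics jobs)
```

Preemption and fair-share policies, also mentioned above, would sit on top of this basic match loop.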
Matchmaking
► Matchmaking is two-way
▸ The job describes what it requires:
∙ "I need Linux && 8 GB of RAM"
▸ The machine describes what it provides:
∙ "I will only run jobs from the Physics department"
► Matchmaking allows preferences
▸ "I need Linux, and I prefer machines with more memory, but will run on any machine you provide me"
► ClassAds Job Description Language (JDL)
▸ Stating facts
∙ The job's executable is analysis.exe
∙ The machine's load average is 5.6
▸ Stating preferences
∙ "I require a computer with Linux"

ClassAds JDL
► ClassAds are semi-structured
▸ Attribute = Expression
▸ Schema-free, user-extensible
► Extensible declarations
▸ HasJava_1_4 = TRUE
▸ ShoeLength = 7
► Extensible matchmaking
▸ Requirements = OpSys == "LINUX" && HasJava_1_4 == TRUE
► Example
MyType = "Job"
TargetType = "Machine"
ClusterId = 1377
Owner = "roy"
Cmd = "analysis.exe"
Requirements = (Arch == "INTEL") && (OpSys == "LINUX") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize)
...

Matchmaking diagram
(Diagram: a matchmaker hosts the matchmaking service (negotiator) and the information service (collector); a condor_schedd runs the job queue service; machines advertise ClassAds of Type = "Machine" with their Requirements, including dynamic information such as load.)

Job submission
► Users submit jobs to a scheduler
▸ Jobs are described as ClassAds
▸ Each scheduler has a queue
▸ Schedulers / queues are not centralized
► The negotiator
▸ collects the list of computers
▸ contacts each schedd ("What jobs do you have to run?")
▸ compares each job to each computer to find a match
∙ Evaluates the requirements of job & machine in the context of both ClassAds
∙ If both evaluate to true, there is a match
► Fault-tolerant scheduler
▸ Resubmission
▸ Fail-over scheduler

Job submission
(Diagram: the matchmaker runs condor_negotiator, which performs matchmaking, and condor_collector, which stores ClassAds; condor_submit hands jobs to condor_schedd, which manages the queue; condor_shadow manages the job on the submit side, while condor_startd manages the computer and condor_starter manages the job on the execution side.)

Deployment example
► Always a single matchmaker (possibly with fail-over matchmakers)
► Possibly several schedulers: dedicated-cluster pool schedds, desktop schedds (many job queues), fail-over schedds

Job policies
► Implemented through ClassAd expressions
▸ periodic_remove, periodic_hold, periodic_release and on_exit_remove attributes
▸ periodic_remove = (JobStatus == 2) && ((CurrentTime - EnteredCurrentStatus) > 43200)
► Examples of user policies
▸ If my job runs too long, kill it
▸ If my job runs too long, run it somewhere else
▸ If my job finished with exit code 12, run it again
► Examples of system policies
▸ If a job has run more than 12 hours and is using more than 2 GB of RAM, then kill it
▸ How many batch slots per computer?
▸ Only run jobs during the day
▸ Authorized users (can control fair share)

Large-scale computing techniques
► Flocking
▸ Metascheduling: connect a scheduler to several Condor pools
► Condor-G
▸ Grid interface
▸ Including Condor-C, a Condor-to-Condor interface(!)
► Pilot jobs
▸ Job-based reservation of resources and application-level scheduling

Flocking
► Submit through a scheduler to several pools
▸ Share Condor pools between institutions
▸ Try to run on the local pool first, then try the remote pools
► Networking issues with private networks
▸ A communication broker may be needed if the scheduler cannot communicate directly with every execute node

Condor-G
► Submit jobs to other grid systems
▸ Minimal changes to the job description
► Grids
▸ Globus 2
▸ Globus 4
▸ Amazon EC2
▸ Nordugrid
▸ Unicore
▸ PBS
▸ LSF
▸ Condor
▸ ...
(Diagram: the matchmaker (negotiator, collector) receives ClassAds from a custom advertiser; the schedd queue hands jobs to condor_gridmanager, which submits them to the grid container.)

Condor-C
► Connects two Condor pools under different administrative domains
► Easier connectivity requirements
▸ Scheduler A only communicates with scheduler B and delegates the running of jobs (instead of contacting all startds)
▸ Schedd A still receives notifications of the delegated jobs' progress
(Diagram: condor_submit feeds schedd A (the job caretaker), whose gridmanager talks through condor-gahp to schedd B and its startds; the gahp isolates interaction with the grid system.)

Pilot jobs
► Pilot jobs are application-level scheduler jobs that, once executing on a grid resource, schedule other jobs on this resource
▸ Resource allocation through a batch system (bypasses the grid workload manager)
▸ Enables application-level scheduling
► Example: overlaying Condor on another system
▸ Submit startd as a grid job to start a new pilot
▸ Grow the Condor pool with the new startd daemon
► Limitations
▸ In push mode (e.g. Condor), the scheduler has to be able to open communication with the pilot jobs (or use a communication
broker)
▸ Security is tricky (whose job is run by the pilot?)
▸ System administrators do not like pilots much (although users just love them)

Condor pilot jobs
► startd can be run as a grid pilot job (Condor glide-in)
1. Submit pilot P (the job is Condor itself!)
2. Submit jobs J1, J2...
(Diagram: condor_submit sends pilot P through the schedd to the grid; once running, P joins the Condor central manager's pool and executes jobs J1, J2.)

From Pilots to Clouds
► Clouds: resource allocation
► Pilots: resource dedication through job submission
(Diagram: pilot submission over a grid, with an application tasks manager and a resources broker spanning sites X and Y, compared with cloud resource reservation through a cloud resources manager.)

Wrap-up
► Condor develops extensible job management interfaces
▸ Focus on scalability
▸ Tackles the challenges of large-scale clusters / grids
► Reliability, reliability, reliability
▸ Loosely coupled components
▸ Modularized code
▸ Many internal safeguards
► Mostly focused on job management
▸ Hardly any concern for the processed data
▸ No other grid services
▸ Not even workload balancing
► Not a complete middleware
▸ Condor is one middleware service

Global computing example: gLite
► gLite: "lightweight" middleware for grid computing
▸ http://www.glite.org/, docs, downloads, etc.
▸ Apache 2-like license
► gLite and the EGI grid
▸ The EGI infrastructure is a grid of clusters
▸ Interfaces to multiple batch systems, data managers, etc.
▸ Global computing model
► gLite middleware stack
▸ Reuses existing technologies
▸ Higher-level services
▸ Compatible with (some) standards
(Stack, top to bottom: gLite components; Virtual Data Toolkit (VDT): Condor, GT2 components (GSI, GridFTP); local systems (PBS, LSF...); OS (including SSL, Apache...))
gLite Grid Middleware Services
(Diagram: access through CLI and API; Security Services (authentication, authorization, auditing); Information & Monitoring Services (information & monitoring, application monitoring, accounting); Data Management (metadata catalog, file & replica catalog, storage element, data movement); Workload Management Services (computing element, workload management, job provenance, package manager); connectivity underneath.)

gLite Information system
► Lightweight Directory Access Protocol (LDAP)
▸ Directory: a specialized database optimized for reading and searching
▸ Holds descriptive, attribute-based information
▸ No complex DB transactions
▸ High-volume lookups
▸ Hierarchical distribution (e.g. c=UK, c=FR / o=CNRS / ou=I3S, ou=DR20), replication and caching, accepting temporary inconsistencies
► Berkeley Database Information Index (BDII)
▸ LDAP database updated externally
▸ Merges a number of sources using the LDAP Data Interchange Format
► Grid Laboratory Uniform Environment (GLUE) schema
▸ Common data model for all resources
▸ Contains information on Sites, Services, Computing and Storage

Workload Management System
► From job submission to job execution: pre-scheduling, meta-scheduling (WMS), scheduling

Workload Management System
► WMS = a collection of services

Workload Management System
► Helps the user access computing resources
▸ resource brokering
▸ management of input and output
► Different job types
▸ parametric jobs
▸ management of simple workflows (Directed Acyclic Graphs)
▸ support for MPI jobs
► Reliability
▸ Support for deep and shallow resubmission in case of failure
► Uses the information system
▸ Collection of information from many sources (BDII,
R-GMA...)
▸ Support for data management interfaces
► Support for file peeking during job execution (Job File Perusal)

Batch-oriented process
► Users describe tasks using the Condor ClassAds language
Executable = "application.exe";
Arguments = "-s -in config.data";
InputSandbox = {"application.exe"};
StdOutput = "out";
StdError = "err";
OutputSandbox = {"out", "err"};
InputData = {"lfn:/biomed/RadioImg08.dicom"};
DataAccessProtocol = {"gsiftp"};
Requirements = other.GlueCEUniqueID == "egee.fr";
▸ Executable program
▸ Command line parameters
▸ Input and output sandboxes
▸ Input and output data files
▸ Node selection constraints
► A Computing Element (CE) is the front-end of each cluster
▸ Manages batch system heterogeneity
▸ Performs global (cross-site) monitoring
► Job execution environment
▸ UNIX job running under a normal user account on a UNIX node
▸ The executable may be shipped with the job (input sandbox) or preinstalled (per site)
▸ The user environment/files are cleaned up after every job

Job information and workload mgt
► Logging and Bookkeeping (L&B) service
► Tracks jobs during their lifetime (in terms of events)
► The L&B Proxy provides faster, synchronous and more efficient access to L&B services for the Workload Management Services
► Support for "site reputability ranking"
▸ Maintains recent statistics of job failures at sites
▸ Feeds back to the WMS to facilitate planning

Local user mapping: glexec
► Users have no dedicated account on worker nodes
▸ Pool accounts are used
► glexec changes the local UID as a function of the user identity
► Enables a VO agent to control different user jobs

Data Management
► Disk or tape storage resources
► Global file catalog (LFC)
► Data Management System
▸ Critical
centralized database, with possible read-only replicas
► Common file access interface: SRM (Storage Resource Manager)
▸ Negotiable transfer protocols (GridFTP, gsidcap, RFIO, ...)
▸ Various SRM implementations from external projects
► LCG file access API
▸ Abstractions for Storage Element, File Catalog, Information System
(Diagram: a client uses the LCG file access service, which consults the LFC file catalog and talks SRM to storage systems 1 and 2, each exposing the common catalog and SRM interfaces over local storage.)

Metacomputing example: DIET
► DIET (Distributed Interactive Engineering Toolbox)
▸ http://graal.ens-lyon.fr/~diet
▸ Focus on distributed computing, scalability
▸ Some integration of data (data transfers for computation, no real data storage)
▸ Exploits a Remote Procedure Call (RPC) interface
► Architecture
▸ Each computing resource installs a Server Daemon (SeD)
▸ Hierarchical set of agents for load balancing: Leader Agents (LA), Master Agents (MA)

DIET components
► Client
▸ GridRPC client able to interface with DIET
► Server Daemon (SeD)
▸ Front-end to a computational server: GridRPC-enabled
▸ Manages a processor (single server) or a cluster (batch interface)
▸ Application software is pre-installed on SeDs
▸ Updates information on the node (CPU availability, memory...)
▸ Records a list of the locally available software
▸ Stores the list of the data available on the server (possibly with their distribution and the way to access them)
► Remote Procedure Calls
▸ Synchronous and asynchronous calls to remote procedures
▸ Wait / probe / cancel on asynchronous calls

DIET components
► Master Agent (MA)
▸ Distributed servers receiving client requests
▸ Makes scheduling decisions: picks the best computing resource
▸ Enables a direct connection between the client and the computer
► Leader Agent (LA)
▸ Transmits requests and information between MAs and SeDs
▸ Holds a list of requests
▸ Knows the servers that can solve a given problem, and related information
▸ A specific LA hierarchy is deployed depending on the network topology
► Flexible and scalable scheme
▸ Various SeDs, LAs and MAs can be implemented
▸ Hierarchical load distribution

Scalability
► Most trivial deployment: one MA, directly connected to all SeDs in the platform
► Multiple MAs
▸ Deploy multiple MAs with known "neighbors"
▸ Manage several sub-systems
▸ Accept more client connections
▸ An MA attempts to answer the requests of its clients; if the MA finds no resources, the request is forwarded to other MAs
▸ No client load distribution
► Hierarchy of LAs
▸ Hierarchical load distribution to handle very large systems
▸ Experimented with up to 10,000s of SeDs

Scheduling
► Customizable scheduler
▸ Round-robin, random, FAST...
▸ Possible scheduler plugins (application-specific)
► Scheduling information collected by CoRI
▸ Static: number of processors, memory, cache and disk performance...
▸ Dynamic: execution time prediction, time since last execution, CPU load, disk space...
► Performance estimation function
▸ Needed for smart scheduling
▸ Requires performance analysis of the application codes, which is not always possible
▸ Attempts to collect statistics from run history (NWS, FAST...)

CoRI/FAST
► Information collection: CoRI
▸ Modular/extensible collectors: CoRI-Easy, NWS, Ganglia, FAST
▸ The CoRI manager aggregates information for scheduling
▸ Extensible application-dependent metrics
► Probes: FAST
▸ Dynamic performance forecasting
▸ CPU monitoring
▸ Extended NWS network monitoring and forecasting
▸ Models the time and space needed by applications:
∙ Collects (procedure, machine, parameters) tuples
∙ Estimates the time for a given run

Data Management
► GridRPC model
▸ Data transmitted as procedure parameters
▸ No data persistence, no data management
► Data Tree Manager
▸ Enables persistent data on computing nodes
▸ Logical data manager hierarchy mapped onto the DIET agent hierarchy
▸ Physical data manager on each SeD, interfaced to a data mover service
▸ No data access control

Fault Tolerance
► Failure detection
▸ Heartbeat + timeout method
▸ WAN: unpredictable round-trip times, trade-off between long failure detection time and accuracy
► MA/LA failure recovery
▸ Agent topology adaptation on failure detection
► SeD failure recovery
▸ Potentially large amounts of computing time wasted
▸ Redundant computations: efficient but costly
▸ Checkpointing / restarting implemented at the service level
▸ Trade-off between checkpointing overhead and benefit

GoDIET
► DIET deployment tool
▸ XML description of the resources
▸ Desired agent and server hierarchy
▸ Additional services
▸ Writes configuration files
▸ Stages files to remote resources
▸ Timely launching of components
Deployment and operations

The EGI Grid example
► European Grid Infrastructure
► Production infrastructure
▸ Operates a large-scale, production-quality grid infrastructure for e-Science
▸ This is a 24-7-365 activity

In 2015:
> 50 countries
> 315 sites
> 450,000 CPU cores
> 280 PB of disk
> 120 PB of tape
> 200 VOs
> 20,000 users
> 45 Mjobs / month

Capacity
► Steady growth of the infrastructure since 2004
▸ CIC Portal: http://cic.gridops.org/
▸ Accounting Portal: http://www3.egee.cesga.es/
▸ An average of 1.6 Mjobs / day in 2012
▸ 4.4 million normalized CPU years since 2004
▸ 1.6 billion jobs processed since 2004

User communities
► Steady growth of the user base since 2004
▸ > 20,000 registered users

gLite Software Process
(Flowchart: software development produces packages that go through integration, then testing & certification; integration tests, functional tests and scalability tests each loop back to error fixing on failure; on success, releases (RPM-based distributions, with installation guide, release notes, etc.) move to pre-production deployment and then to the production infrastructure; serious problems feed back as technical input to development.)

EGI Operation
► Services
► Support structures & processes
► Training infrastructure
► Training activities

Grid management: structure
► Operations Coordination Centre (OCC)
▸ management, oversight of all operational and support activities
► Regional Operations Centres (ROC)
▸ providing the core of the support
infrastructure, each supporting a number of resource centres within its region
▸ Grid Operator on Duty
► Resource centres
▸ providing resources (computing, storage, network, etc.)
► Grid User Support (GGUS)
▸ at FZK; coordination and management of user support, single point of contact for users

Grid monitoring tools
► Tools used by the Grid Operator on Duty team to detect problems
► CIC portal: http://cic.gridops.org/
▸ single entry point
▸ integrated view of the monitoring tools
► Site Functional Tests (SFT)
► Service Availability Monitoring (SAM)
► Grid Operations Centre Core Database (GOCDB)
► GIIS monitor (Gstat)
► ...

Example report
► Availability = the fraction of time the site was up
► Reliability = Availability / scheduled availability

User support and feedback
► GGUS ticket submission and tracking portal
▸ Problem reporting, logging and traceability
▸ Collects, groups and prioritises requirements from the VOs and from within operations
▸ Dispatches tickets to the entities concerned (operations, middleware development, VO management...)
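The two report metrics above can be made concrete with a small worked example (all numbers are invented for illustration):

```python
# Worked example of the availability/reliability metrics above
# (illustrative numbers): availability is the fraction of total time
# a site was up; reliability discounts announced maintenance windows.

HOURS_IN_MONTH = 720
uptime_hours = 684
scheduled_downtime_hours = 24  # announced maintenance

availability = uptime_hours / HOURS_IN_MONTH
scheduled_availability = (HOURS_IN_MONTH - scheduled_downtime_hours) / HOURS_IN_MONTH
reliability = availability / scheduled_availability

print(f"availability = {availability:.3f}")  # → 0.950
print(f"reliability  = {reliability:.3f}")   # → 0.983
```

Reliability is higher than availability here because 24 of the 36 down hours were scheduled: a site is only penalized for unscheduled outages.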
The Aladdin / Grid'5000 example
► Research-oriented infrastructure
▸ Understand the behavior of grid systems
▸ Between production scale and simulation
▸ 7,700 CPU cores deployed over 9 sites in France (in 2014)

Aladdin / Grid'5000 architecture
► Multi-cluster environment
▸ One gateway per site
▸ Multiple clusters per gateway
► Multi-site user account management
▸ User accounts replicated on all sites
▸ NFS file sharing within each site
► Reserve resources, deploy anything
▸ Advance reservation of resource pools
▸ System image deployment on reserved nodes
▸ Hot rebooting of nodes (with a specified system image)
▸ Root access to all nodes
► Security
▸ Confined environment (only ssh traffic from outside to the gateways, no outbound connections from worker nodes)

Aladdin / Grid'5000 network
► Dedicated 10 Gbit/s links
► "Dark fiber" dedicated lambdas
► Monitoring tools

Tooling
► OAR2
▸ First-fit scheduler with resource matching (job/node properties)
▸ Advance reservation
▸ Batch and interactive jobs
▸ Walltime
▸ Hold and resume jobs
▸ Multiple queues with priorities
▸ Best-effort queues (for exploiting idle resources)
▸ Epilogue/prologue scripts
▸ No daemon on compute nodes; rsh and ssh as remote execution protocols
▸ Dynamic insertion/deletion of compute nodes
▸ Logging
► kdeploy
▸ System image deployment on reserved nodes

Tooling
► Monitoring tools
► Schedules

References
► EGEE/EGI
▸ E. Laure, H. Stockinger and K. Stockinger, "Performance Engineering in Data Grids", Concurrency and Computation: Practice & Experience, vol. 17(2-4), Feb.-April 2005.
► DIET
▸ E. Caron and F. Desprez, "DIET: A Scalable Toolbox to Build Network Enabled Servers on the Grid", International Journal of High Performance Computing Applications, 20(3):335-352, 2006.
► Condor
▸ D. Thain, T. Tannenbaum and M. Livny, "Distributed Computing in Practice: The Condor Experience", Concurrency and Computation: Practice and Experience, vol. 17(2-4):323-356, Feb.-April 2005.
► OAR
▸ N. Capit, G. Da Costa, Y. Georgiou, G. Huard, C. Martin, G. Mounié, P. Neyron and O. Richard, "A batch scheduler with high level components", in Cluster Computing and Grid, 2005.