20140430-Li-Challenges Encountered.pptx

SuperStack, Next Exit: Challenges on CC*IIE at UF

Xiaolin (Andy) Li
Associate Professor
Director, Scalable Software Systems Laboratory (S3Lab)
Area Chair of Computer Engineering Division
Department of Electrical and Computer Engineering
University of Florida, USA
http://www.s3lab.ece.ufl.edu/
Acknowledgement: NSF MRI, CC-NIE, GENI, CAREER, PetaApps
UF Campus CI Units
• RC = Research Computing
• UFIT = UF Information Technology
  – CNS: Computing & Networking Services
• FLR = Florida Lambda Rail
• SSERCA = Sunshine State Education and Research Computing Alliance
  – FAU, FIU, FSU, UCF, UF, UM, USF
  – FAMU, UNF, UWF

UF CI Goal
• Increase the research portfolio
  – From about $700 million per year
  – To $1 billion per year
• Meet a wide spectrum of research needs
  – Health, Engineering, Science
  – Agriculture
  – Business, Law
  – Computer Science and Engineering
    • Cloud, Big Data, Future Internet, Storage
    • Machine Learning, Data Mining, Bioinformatics, CDN
    • CPS, Internet of Things, Mobile Social Networks

RC Infrastructure Before 2010
• Compute system with 3,500 cores
• Storage system of 200 TB
• Networking of 20 Gbps
• Staff of 3.20 FTE
• 95 research groups from 29 departments
• 400 users
• 35 investors

RC Resources in 2014
• Compute system with 21,000 cores (x6)
• Storage system totaling 5 PB (x25)
• Networking of 200 Gbps (x10)
  – NSF CC-NIE (Gateway), MRI (CRNv2/GatorCloud)
  – ExoGENI Rack
  – FutureGrid
• Staff of 10.25 FTE (x3)
• UF Informatics Institute
• 327 research groups from 87 departments (x3)
• 1,067 users (x2.5)
• 150 investors (x4)
• Supporting over $300M in grant activity

Science DMZ: Campus Research Network
[Diagram: campus research network topology. FLR connects to National Lambda Rail, Internet2, and GENI (via Jacksonville) over 2x10 Gb/s links upgraded to 2x100 Gb/s. On campus, 100G/40G/10G links interconnect the SSRB campus datacenter (CNS Lab, GatorVisor), Physics (HPC Center - Phy, CMS/OSG), ECDC (HPC Center - ES), Larsen (HPC Center - Eng, HCS Lab, Hybrid Controller), NEB (S3Lab, Apps Controller), the CISE Lab (Nets Controller), and a new datacenter, with VM Cloud, Data Cloud, Cloud Green, and Cloud Orange behind a Cloud Portal. SDN switches form the data plane under a common SDN control plane: Phase 1 SDN at 40G/10G, Phase 2 SDN at 100G.]
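To make the SDN control plane concrete, below is a minimal, self-contained sketch (not the actual GatorCloud controller) of an OpenFlow-style first-match flow table that pins bulk science flows to the 100G Phase 2 path and defaults everything else to the Phase 1 fabric. The port numbers, host names, and path labels are hypothetical.

# Minimal sketch of flow steering on an SDN control plane (hypothetical;
# not the actual GatorCloud controller). Bulk science-data flows are
# matched by destination port and pinned to the 100G Phase 2 path;
# everything else defaults to the Phase 1 40G/10G fabric.

BULK_PORTS = {2811, 50000}    # hypothetical ports for bulk-transfer services

PHASE2_100G = "phase2-100G"   # high-capacity research path
PHASE1_40G = "phase1-40G"     # default campus fabric

flow_table = []  # ordered list of (match_fn, action) rules

def add_rule(match_fn, action):
    flow_table.append((match_fn, action))

def route(pkt):
    """Return the egress path for a packet dict; first match wins."""
    for match_fn, action in flow_table:
        if match_fn(pkt):
            return action
    return PHASE1_40G  # table-miss: default path

# Rule 1: bulk science flows go to the 100G path.
add_rule(lambda p: p["dst_port"] in BULK_PORTS, PHASE2_100G)
# Rule 2: traffic between HPC datacenters also gets the 100G path.
add_rule(lambda p: p["src"].startswith("hpc-") and p["dst"].startswith("hpc-"),
         PHASE2_100G)

if __name__ == "__main__":
    print(route({"src": "hpc-es", "dst": "hpc-eng", "dst_port": 22}))    # phase2-100G
    print(route({"src": "cise-lab", "dst": "dtn1", "dst_port": 2811}))   # phase2-100G
    print(route({"src": "desktop", "dst": "web", "dst_port": 443}))      # phase1-40G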
Major Data Centers at UF
• HiPerGator Supercomputer
  – Top500 ranking: #10 among US public universities, #14 among US universities, #493 among all machines listed
• CMS/OSG Physics HPC Centers
• ICBR: Interdisciplinary Center for Biotech Research
• CTSI: Clinical and Translational Science Institute
• ACIS/CAC Data Center
• CHREC Data Center
• NEB Data Center

Stakeholders
[Diagram: stakeholder map balancing security, labor, resources, cost, and usage across campus CI units (CNS, UFIT/CIO, RC/HPC) and research domains: Physics, Chemistry, Biology/Bioinformatics, Astronomy, MAE, Climate/Agriculture, EE/SmartGrid, Health, Energy, Big Data/HPC/CS&E research, science artifacts, and others.]
RC: Research Computing; CNS: Computing & Networking Services; CS&E: Computer Science & Engineering
Example Uses of RC Infrastructure
• The RC infrastructure adds value
• Complex multidisciplinary research endeavors:
  – CCMT = Center for Compressible Multiphase Turbulence, PI: S. Balachandar
  – SECIM = Southeast Center for Integrative Metabolomics, PI: A. Edison
  – Super-app for gene sequencing, PI: L. Moroz
  – Medical sensing, analytics, and knowledge fusion, PI: X. Li

CMS Experiment
• A $500M experiment
• [Image: CMS detector; highlighted in red is the endcap muon system, whose design and construction was led by UF.]

CMS Collaboration
• 38 countries, 182 institutions, 3,000 scientists and engineers
• US CMS Collaboration: 48 institutions, including 2 national labs and 46 universities
• UF CMS: with almost 40 scientists, UF is the 3rd-largest US CMS institution, behind only Fermilab and the Univ. of Wisconsin-Madison (and ahead of MIT, Princeton, Caltech, Cornell, CMU, and 39 others)

Compressible Multiphase Turbulence
Goals of the Center:
• To radically advance the field of compressible multiphase turbulence
• To advance predictive simulation science through high-performance computing
• To advance a co-design strategy that combines exascale emulation, exascale algorithms, and exascale M&S
• To educate students/postdocs in simulation science and place them at national laboratories

Integrative Metabolomics (SECIM)
[Diagram: SECIM workflow. Core 1 (Tim Garrett), Core 2 (Art Edison), and Core 3 (Rick Yost) feed data into UF RC Storage; processing/normalization runs via Galaxy and the command line; results flow to the DRCC at UCSD within 2 months; client meetings connect the PI and group members working on the project.]

Gene Sequencing from a Ship

Medical Sensing, Analytics, and Fusion
[Diagram: VitalCloud, a noncontact vital-sensing cloud. A 2.7 cm x 4.8 cm noncontact sensor platform streams readings over BLE into pub/sub brokers; near-line stream processing and advanced signal processing feed two-way knowledge fusion and context-aware medical diagnosis; an online query, interaction, and decision-making interface sits alongside an offline big-data engine, platform multiplexing, and a distributed file system on OpenStack services (Nova, Swift, Cinder, Glance, Keystone, Neutron). Features, patterns, and symptoms reach healthcare providers, patients and elders, and social friends/relatives through mobile apps, third-party apps, and web portals/browsers, with security and privacy mechanisms and data-exchange middleware linking to EHRs at medical facilities.]
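The sensor-to-cloud path above is pub/sub streaming through brokers. Below is a minimal sketch assuming an MQTT broker and the paho-mqtt 1.x client (the slides do not name the broker technology); the broker host, topic, and payload format are hypothetical.

# Minimal pub/sub sketch of the sensor-to-cloud path, assuming an MQTT
# broker and the paho-mqtt 1.x client; broker host, topic, and payload
# format are hypothetical, not taken from the slides.
import json
import time
import paho.mqtt.client as mqtt

BROKER = "broker.example.edu"   # hypothetical broker host
TOPIC = "vitalcloud/sensor/hr"  # hypothetical topic for a heart-rate stream

def on_message(client, userdata, msg):
    # Near-line consumer: parse the reading and hand it to stream processing.
    reading = json.loads(msg.payload)
    print("stream:", reading)

subscriber = mqtt.Client()
subscriber.on_message = on_message
subscriber.connect(BROKER)
subscriber.subscribe(TOPIC)
subscriber.loop_start()  # background network loop

publisher = mqtt.Client()
publisher.connect(BROKER)
for _ in range(3):
    # Noncontact sensor platform publishing one reading per second.
    publisher.publish(TOPIC, json.dumps({"hr_bpm": 72, "ts": time.time()}))
    time.sleep(1)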
Challenges & Opportunities
• Build services on top of this infrastructure
• Large need among non-traditional users
  – Traditional Linux command-line users are just a fraction
  – Turbo-charge the desktop
  – Seamless connection over networks to storage and compute
• Cloud is the way people expect services
  – Self-provision
  – Self-configure
  – Always on
  – Accessible from anywhere on any device

Emerging Programming Models for Big Data, Big Systems, Big Science
• Emerging frameworks and rapid innovations: CIEL, Pregel, Pig, Percolator, Dryad
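These frameworks differ widely in surface API, but most elaborate on a small dataflow core. As a minimal illustration of the MapReduce model that Hadoop popularized, here is a pure-Python word-count sketch of the programming model, not a distributed run:

# Minimal word count in the MapReduce style that Hadoop popularized:
# a pure-Python sketch of the programming model, not a distributed run.
from collections import defaultdict
from itertools import chain

def map_fn(line):
    # Map: emit (word, 1) pairs for each word in a line.
    return [(word, 1) for word in line.split()]

def reduce_fn(word, counts):
    # Reduce: sum all counts for one word.
    return word, sum(counts)

def mapreduce(lines):
    # Shuffle: group intermediate pairs by key.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map_fn(l) for l in lines):
        groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(mapreduce(["big data big science", "big system"]))
# {'big': 3, 'data': 1, 'science': 1, 'system': 1}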
Conventional Practices
• Relatively static environments
  – HPC administrators are responsible for maintaining the software stack
  – Use PBS/Torque for batch job execution
  – Focus on the MPI framework
  – (Optional) Maintain separate clusters for other frameworks, e.g., MapReduce, OpenStack

Inadequate Support: Mismatch Between the Requirements of Rapid Innovation and a Relatively Rigid Resource Configuration
• Users want to:
  – Try new frameworks for big data analytics
  – Innovate on current data analysis frameworks
  – Cooperate with other organizations
  – ……
• HPC clusters need to:
  – Reduce administrative operation overhead
  – Improve resource utilization
  – Enrich user experience
  – ……
• But the status quo offers:
q  No privileges to users; q  No guarantee for cross-­‐organiza=on compa=bility; q  No method for fine-­‐grained, dynamic resource alloca=on; q  …… Time for Change Current Sta;c, Par;;oned Target Unified, Mul;plexed, Dynamic Hadoop CIEL" Pregel
Time for Change
• Current: static, partitioned
• Target: unified, multiplexed, dynamic
• Frameworks: Hadoop, CIEL, Pregel, Pig, Percolator, Dryad, OpenStack, Torque
• Substrates: container, virtual machine, bare metal, OpenFlow

SuperStack: Unified Campus Cloud, a Software-Defined Ecosystem
*-as-a-Service: Compute, Storage, Network, Platform, Big Data, HPC, CPS, Cloud, AppStore/AppEngine (Res, Net, Sec)
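As a toy illustration of the unified, multiplexed, dynamic target, in contrast to statically partitioned clusters, the sketch below models a single node pool leased to frameworks on demand; all names and sizes are hypothetical:

# Toy sketch of the "unified, multiplexed, dynamic" target: one shared
# pool of nodes leased to frameworks on demand and returned when done,
# instead of a static per-framework partition. Names/sizes are hypothetical.

class UnifiedPool:
    def __init__(self, nodes):
        self.free = set(nodes)
        self.leases = {}  # framework -> set of nodes

    def acquire(self, framework, count):
        # Fine-grained, dynamic allocation: grant only what is free.
        if count > len(self.free):
            raise RuntimeError("pool exhausted; try a smaller request")
        grant = {self.free.pop() for _ in range(count)}
        self.leases.setdefault(framework, set()).update(grant)
        return grant

    def release(self, framework):
        # Return all of a framework's nodes to the shared pool.
        self.free |= self.leases.pop(framework, set())

pool = UnifiedPool({f"node{i}" for i in range(8)})
pool.acquire("hadoop", 4)       # MapReduce job spins up
pool.acquire("mpi", 3)          # MPI job shares the same pool
pool.release("hadoop")          # nodes go back for the next framework
print(len(pool.free), "nodes free")  # 5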