Slides (PDF)

Ganeti
Private Cloud as Google does it
· Helga Velroyen <[email protected]>
· Linuxtag Berlin, May 9th, 2014
A Ganeti Cluster
· Instance: a virtualization guest
· Node: a virtualization host
· Nodegroup: a homogeneous set of nodes
· Cluster: a set of nodes, managed as a collective, partitioned by nodegroups
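The four terms nest inside each other, which a small data model can make concrete. The following is only an illustrative sketch: the class and attribute names are assumptions chosen for this example and are not Ganeti's internal objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    """A virtualization guest."""
    name: str


@dataclass
class Node:
    """A virtualization host that runs instances."""
    name: str
    instances: List[Instance] = field(default_factory=list)


@dataclass
class NodeGroup:
    """A homogeneous set of nodes (for example, one rack)."""
    name: str
    nodes: List[Node] = field(default_factory=list)


@dataclass
class Cluster:
    """A set of nodes managed as one unit, partitioned by node groups."""
    name: str
    groups: List[NodeGroup] = field(default_factory=list)

    def nodes(self) -> List[Node]:
        # Each node sits in exactly one group, so flattening the groups
        # lists every node once.
        return [node for group in self.groups for node in group.nodes]

    def instances(self) -> List[Instance]:
        return [inst for node in self.nodes() for inst in node.instances]


# Hypothetical example data.
cluster = Cluster("example-cluster", [
    NodeGroup("rack1", [Node("node1", [Instance("web1")]), Node("node2")]),
    NodeGroup("rack2", [Node("node3", [Instance("db1"), Instance("db2")])]),
])
print([node.name for node in cluster.nodes()])
print([inst.name for inst in cluster.instances()])
```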
What can it do?
· Manage clusters of physical machines
· Deploy virtual machines on them
- Resiliency to failure (distributed storage)
- Live migration
- Ease of repairs and hardware swaps
- Cluster balancing
Ideas
· Interact with the cluster as an entity, instead of the individual machines.
· Make the entry barrier to virtualization as low as possible
- Easy to install/manage
- Lightweight (no "expensive" dependencies)
- No specialized hardware needed (e.g. SANs)
- Start small, grow big
· Scale to enterprise ecosystems
- Manage from 1 to ~200 host machines simultaneously
- Access to advanced features (distributed storage, live migration, cluster balancing)
Technologies
· Linux and standard utils (iproute2, bridge-utils, ssh)
· Hypervisors:
- Xen, KVM, LXC
· Storage:
- DRBD, LVM, file, distributed storage, Ceph/Gluster
· Programming languages:
- Python, Haskell
Controlling Ganeti
· Command line (*)
· RAPI (RESTful HTTP interface) (*)
· Web interfaces:
- Ganeti Web Manager: aimed at admins, but includes "self-service management" for users
- ganetimgr: simplified multi-cluster web manager for end users
- Synnefo: complete cloud service solution, OpenStack API compatible
(*) Programmable interfaces
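As an illustration of the programmable RAPI, the sketch below builds the URL of a resource collection and prepares a GET request without sending it. It assumes the default RAPI port 5080 and the version-2 resource prefix `/2/`; the host name and the helper names `rapi_url` and `list_request` are made up for this example.

```python
import urllib.request
from urllib.parse import urlunsplit

RAPI_PORT = 5080  # assumed default RAPI port


def rapi_url(master: str, resource: str, port: int = RAPI_PORT) -> str:
    """Build the URL of a version-2 RAPI resource such as 'instances'."""
    return urlunsplit(("https", f"{master}:{port}", f"/2/{resource}", "", ""))


def list_request(master: str, resource: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for a resource collection."""
    return urllib.request.Request(
        rapi_url(master, resource),
        headers={"Accept": "application/json"},
        method="GET",
    )


# Hypothetical master name; nothing is sent over the network here.
request = list_request("cluster.example.com", "instances")
print(request.get_method(), request.full_url)
```

Actually sending the request (for example with `urllib.request.urlopen`) additionally requires trusting the cluster's certificate and, for modifying operations, credentials; neither is shown here.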
Production cluster
As we use it in a Google Datacentre
[Diagram: a Ganeti cluster made up of several Ganeti node groups / racks, each holding a number of Ganeti nodes, with one node marked as the current master node. Labels: per-machine monitoring, SSH access, Remote API.]
Fleet at Google
[Diagram: the Ganeti fleet at Google]
· Fleet management: Virgil, Euripides, Dradis
· Sites: Office Zurich, Datacenter X, Datacenter Y, Datacenter Z
· Cluster types: Office, General, Ubiquity, Dedicated
· Maintenance windows: A or B for General and Dedicated clusters, none for Office and Ubiquity clusters
· Also labelled: VM transfer
Instance provisioning at Google
[Diagram: instance provisioning]
· Components: Virgil, Machine DB, Monitoring, RAPI interface
· Ganeti clusters of type General, Ubiquity, and Dedicated
· Labelled flows: alloc request, scan capacity, gather capacity
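The slide does not say how a cluster is chosen for an alloc request. Purely as a hypothetical sketch of the idea, the code below filters clusters by the requested type and picks the one with the most free capacity. The capacity metric, the selection rule, and all names are assumptions, not a description of Google's tooling.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ClusterCapacity:
    name: str
    cluster_type: str          # e.g. "General", "Ubiquity", "Dedicated"
    free_instance_slots: int   # hypothetical capacity metric


def pick_cluster(clusters: Iterable[ClusterCapacity],
                 wanted_type: str) -> Optional[ClusterCapacity]:
    """Return the matching cluster with the most free slots, or None."""
    candidates = [
        c for c in clusters
        if c.cluster_type == wanted_type and c.free_instance_slots > 0
    ]
    return max(candidates, key=lambda c: c.free_instance_slots, default=None)


# Hypothetical capacity data, as it might be gathered by monitoring.
capacity = [
    ClusterCapacity("cluster-a", "General", 3),
    ClusterCapacity("cluster-b", "General", 7),
    ClusterCapacity("cluster-c", "Dedicated", 10),
]
chosen = pick_cluster(capacity, "General")
print(chosen.name if chosen else "no capacity")
```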
Auto node repair at Google
[Diagram: Euripides, Virgil, the machine database, and a Ganeti cluster of Ganeti HW nodes, one of them broken HW. Numbered steps:]
· (1) Monitoring detects fault
· (2) Send machine to repairs
· (3) Mark machine broken
· (4) Tell cluster to evacuate the broken machine
· (5) Send to repairs
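The numbered labels on this slide describe a fixed sequence, which can be sketched as a small function. This is a toy model under stated assumptions: the state name "in repair", the round-robin evacuation, and the function name are invented for illustration, and the slide does not confirm which component performs which step.

```python
from typing import Dict, List


def auto_repair(machine_db: Dict[str, str],
                placement: Dict[str, List[str]],
                broken: str) -> List[str]:
    """Walk one machine through repair steps (1) to (5); return a step log."""
    log = [f"(1) fault detected on {broken}"]
    log.append(f"(2) repair requested for {broken}")

    machine_db[broken] = "broken"
    log.append(f"(3) {broken} marked broken")

    # (4) Evacuate: spread the instances round-robin over serving nodes.
    # Round-robin is a placeholder, not Ganeti's allocation logic.
    targets = [node for node in placement
               if node != broken and machine_db.get(node) == "serving"]
    if placement[broken] and not targets:
        raise RuntimeError("no serving node left to evacuate to")
    for index, instance in enumerate(placement[broken]):
        placement[targets[index % len(targets)]].append(instance)
    placement[broken] = []
    log.append(f"(4) {broken} evacuated")

    machine_db[broken] = "in repair"  # hypothetical state name
    log.append(f"(5) {broken} sent to repairs")
    return log


# Hypothetical cluster state.
machine_db = {"node1": "serving", "node2": "serving", "node3": "serving"}
placement = {"node1": ["a", "b", "c"], "node2": ["d"], "node3": []}
steps = auto_repair(machine_db, placement, "node1")
print("\n".join(steps))
```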
Auto node readd at Google
[Diagram: Euripides, Virgil, Dradis, the machine DB, and Ganeti HW nodes, one of them repaired HW. Numbered steps:]
· (1) Detects machine was repaired
· (2) Watches machine for 24hrs
· (3) Tells Virgil to reintegrate machine
· (4) Configure machine
· (5) Tell cluster to add it
· (6) Mark machine serving
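The 24-hour watch in step (2) acts as a gate before the machine is reintegrated. A minimal sketch of such a gate follows. The slide only states the 24-hour period; restarting the watch after a new fault, and the function and parameter names, are assumptions made for this example.

```python
from datetime import datetime, timedelta
from typing import Optional

WATCH_PERIOD = timedelta(hours=24)


def ready_to_readd(repaired_at: datetime,
                   last_fault_at: Optional[datetime],
                   now: datetime) -> bool:
    """True once the machine has run fault-free for the full watch period."""
    clean_since = repaired_at
    # Assumption: a fault seen after the repair restarts the watch.
    if last_fault_at is not None and last_fault_at > repaired_at:
        clean_since = last_fault_at
    return now - clean_since >= WATCH_PERIOD


repaired = datetime(2014, 5, 1, 12, 0)
print(ready_to_readd(repaired, None, datetime(2014, 5, 2, 11, 59)))
print(ready_to_readd(repaired, None, datetime(2014, 5, 2, 12, 0)))
```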
Ganeti 2.8, 2.9
2.8.4
· Downgrading
· Autorepair tool
· Hroller
· Improvements on storage, monitoring
2.9.6
· DRBD 8.4 support
· Continued work on monitoring, storage, hroller
Ganeti 2.10
2.10.3, available in Debian wheezy-backports and Debian jessie
· Cross-cluster instance moves:
- automatic node allocation on destination cluster
- convert disk templates on the fly
· Cluster balancing based on CPU load
· KVM: Hotplug support, direct access to RBD storage
· Ganeti upgrades!
Updates
In the past, updating Ganeti was a pain:
/etc/init.d/ganeti stop # on all nodes
apt-get install ganeti2=2.7.1-1 ganeti-htools=2.7.1-1 # on all nodes
/usr/lib/ganeti/tools/cfgupgrade # on master
/etc/init.d/ganeti start # on all nodes
gnt-cluster redist-conf # on master
# ... lots of other steps, depending on the version
# If something goes wrong, fix the mess manually.
From 2.10 on, Ganeti comes with a built-in upgrade mechanism:
apt-get install ganeti-2.11 # on all nodes
gnt-cluster upgrade --to 2.11 # on master
gnt-cluster upgrade --to 2.10 # to roll back
Note that you still have to install the new packages and uninstall the old ones manually.
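The upgrade and roll-back commands from this slide can be wrapped in a small script. The sketch below runs the upgrade on the master and issues the roll-back command if the upgrade exits with a non-zero status. That roll-back policy and the helper names are assumptions for illustration; the command runner is injectable so the logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def upgrade_cluster(new: str, old: str, run: Runner = run_command) -> bool:
    """Upgrade to `new`; on failure, roll back to `old`.

    Returns True if the upgrade succeeded. Packages for both versions
    must already be installed on all nodes.
    """
    if run(["gnt-cluster", "upgrade", "--to", new]) == 0:
        return True
    # Assumed policy: attempt the roll back shown on the slide.
    run(["gnt-cluster", "upgrade", "--to", old])
    return False


# Dry run with a fake runner that simulates a failing upgrade to 2.11.
calls = []


def fake_runner(cmd: Sequence[str]) -> int:
    calls.append(list(cmd))
    return 1 if cmd[-1] == "2.11" else 0


print(upgrade_cluster("2.11", "2.10", run=fake_runner))
print(calls)
```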
Ganeti 2.11
Current stable release, 2.11.0.
· RPC security: individual node certificates
· Compression for instance moves / backups / imports
· Configurable SSH ports per node group
· Gluster support (experimental)
Current and Future development
No guarantees!
· Network improvements (IPv6, more flexibility)
· Storage: more work on shared storage
· Heterogeneous clusters
· Improvements on cross-cluster instance moves
Google Summer of Code:
· Make LXC support production-ready
· Conversion between arbitrary disk templates
Open Source Events
Confirmed:
· Linuxcon Japan, Tokyo, May 20th 2014
· Ganeticon, Portland, Oregon, September
Not confirmed yet:
· Linuxcon North America, Chicago, August
· FrOSCon, St. Augustin, Germany, August
· LISA '14, Seattle, November
Thank You!
Questions?
· © 2010 - 2014 Google
· Use under GPLv2+ or CC-by-SA
· Some images borrowed / modified from Lance Albertson, Iustin
Pop, and Guido Trotter
· Some slides were borrowed / modified from Tom Limoncelli