Galera Performance and Scaling Analysis

LIA Project Proposal
Esan Wit
Jan-Willem Selij
February 21, 2014
1 Introduction
Standard replication as offered by MySQL is asynchronous, with a single master and multiple slaves. Although the slaves allow reads to scale, write scalability remains problematic. This is especially true when database constraints must be checked across tables.
Galera [1] is marketed as clustering software that enables scalable MySQL databases. Galera claims to deliver consistency, read and write scalability, no slave lag, and improved availability. It is designed to work as middleware with a patched MySQL version to enable Write-Set Replication (WSREP) between multiple master nodes. Galera uses a form of multi-master synchronous replication: all nodes in the cluster can be read from and written to at the same time, and synchronous replication ensures that all nodes have the same view of the database. This allows every node to accept writes while the cluster remains in a consistent state. We intend to analyze the behavior of Galera in large-scale clusters, as current research and benchmarks are limited to relatively small clusters of 1 to 4 nodes.
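A deliberately simplified toy model in Python can illustrate the difference between the two replication styles described above. It does not reflect how MySQL or Galera are implemented. All class and method names are hypothetical, and conflicting concurrent writes, failures and the network are ignored. The model only shows that asynchronous slaves can lag behind the master. With synchronous replication, by contrast, every node holds the same data once a write returns.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    data: dict = field(default_factory=dict)


class AsyncMasterSlave:
    """Toy asynchronous replication: one writable master, lagging slaves."""

    def __init__(self, n_slaves: int) -> None:
        self.master = Node("master")
        self.slaves = [Node(f"slave{i}") for i in range(n_slaves)]
        self.pending = []  # (key, value) changes not yet applied on slaves

    def write(self, key: str, value: str) -> None:
        # The commit returns as soon as the master has applied the change.
        self.master.data[key] = value
        self.pending.append((key, value))

    def replicate(self) -> None:
        # Later, the slaves catch up by applying changes in commit order.
        for key, value in self.pending:
            for slave in self.slaves:
                slave.data[key] = value
        self.pending.clear()


class SyncMultiMaster:
    """Toy synchronous multi-master replication: any node accepts a write,
    and the write is applied on every node before the call returns."""

    def __init__(self, n_nodes: int) -> None:
        self.nodes = [Node(f"node{i}") for i in range(n_nodes)]

    def write(self, via: int, key: str, value: str) -> None:
        origin = self.nodes[via]  # the node the client is connected to
        origin.data[key] = value
        for node in self.nodes:
            if node is not origin:
                node.data[key] = value


async_cluster = AsyncMasterSlave(n_slaves=2)
async_cluster.write("x", "1")
print(async_cluster.slaves[0].data.get("x"))  # None: the slave lags behind
async_cluster.replicate()
print(async_cluster.slaves[0].data.get("x"))  # 1

sync_cluster = SyncMultiMaster(n_nodes=3)
sync_cluster.write(via=2, key="x", value="1")
print([node.data.get("x") for node in sync_cluster.nodes])  # ['1', '1', '1']
```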
2 Related Work
Benchmarks published by Codership, the developer of Galera, use clusters of between 1 and 4 nodes [2]. Other work states that more realistic clusters consist of 3 to 6 or more nodes [3]. Research by Tkachenko shows that Galera incurs only minor overhead at small node counts [4].
The limitations of the replication used by Galera have not been tested at higher node counts. Although research shows that there is some overhead, neither its practical implications nor the actual limits are discussed.
3 Research Questions
Our main purpose is to determine the practical limitations of Galera in terms of performance and scalability. To help us determine this more objectively, we will address the following sub-questions:
• What is the performance of a normal-sized cluster?
• What is the performance of a cluster significantly larger than normal?
• What is the network overhead caused by replication in the cluster?
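The last question asks about network overhead. Assuming the nodes run Linux, one possible way to measure it is to sample the kernel's per-interface byte counters in /proc/net/dev before and after a benchmark run. The difference is then divided by the number of committed transactions. The function names, the interface name eth0 and the benchmark driver are hypothetical. The counters include all traffic on the interface, client traffic included. Comparing against a single-node run of the same benchmark therefore gives only a rough estimate of the share caused by replication.

```python
from pathlib import Path


def parse_proc_net_dev(text: str) -> dict:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        iface, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        # Eight receive columns come first, so transmitted bytes is fields[8].
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def read_counters() -> dict:
    """Snapshot the live counters (Linux only)."""
    return parse_proc_net_dev(Path("/proc/net/dev").read_text())


def bytes_per_transaction(before: dict, after: dict, iface: str,
                          transactions: int) -> tuple:
    """Average (received, transmitted) bytes per committed transaction."""
    rx = after[iface][0] - before[iface][0]
    tx = after[iface][1] - before[iface][1]
    return rx / transactions, tx / transactions


# Hypothetical usage on one node around a benchmark run:
#   before = read_counters()
#   committed = run_benchmark()  # hypothetical benchmark driver
#   after = read_counters()
#   print(bytes_per_transaction(before, after, "eth0", committed))
```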
4 Scope
As performance can be measured in many ways, we will use the number of transactions per second as our performance metric. We will also plot this against transaction size, as we suspect that network throughput may become a bottleneck.
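A back-of-the-envelope model shows why transaction size may matter. It rests on our own simplifying assumptions, not on measured Galera behavior:

- writes are spread evenly over `writers` nodes;
- each write-set of S bytes is sent separately (unicast) from its originating node to each of the other N - 1 nodes;
- headers, acknowledgements, CPU and disk are ignored.

Under these assumptions each writer must send (tps / writers) * S * (N - 1) bytes per second. That rate may not exceed its link throughput B, which gives tps <= B * writers / (S * (N - 1)). The function name, the 1 Gbit/s link and the write-set sizes below are hypothetical inputs.

```python
import math


def max_tps(link_bytes_per_s: float, write_set_bytes: float,
            nodes: int, writers: int = 1) -> float:
    """Upper bound on cluster-wide transactions per second imposed by the
    writers' outbound links, under the simplified unicast model (assumption).
    With full-duplex links, inbound traffic per node is never higher in this
    model, so the outbound link is the binding constraint."""
    if not 1 <= writers <= nodes:
        raise ValueError("writers must be between 1 and nodes")
    if nodes == 1:
        return math.inf  # nothing to replicate
    # (tps / writers) * write_set_bytes * (nodes - 1) <= link_bytes_per_s
    return link_bytes_per_s * writers / (write_set_bytes * (nodes - 1))


GIGABIT = 125_000_000  # 1 Gbit/s expressed in bytes per second

for size in (100, 1_000, 10_000):
    for n in (3, 6, 12, 24):
        bound = max_tps(GIGABIT, size, n, writers=n)
        print(f"write-set {size:>6} B, {n:>2} nodes: <= {bound:,.0f} tps")
```

When all nodes write, the bound approaches B / S as N grows. With a single writer, it falls as 1 / (N - 1). Real clusters may hit other limits before either bound.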
5 Approach
We first plan to conduct a brief literature study to determine typical cluster sizes and performance benchmarks. These will serve as a baseline. We then plan to scale up these experiments and analyze the resulting behavior. If time permits, we also intend to analyze the impact of link latency and throughput between the nodes on cluster performance.
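To compare the larger clusters with the baseline, the measured transactions per second could be expressed as speedup and scaling efficiency relative to the baseline cluster size. This metric is our suggestion rather than part of the plan, and the function name is hypothetical. An efficiency of 1.0 means that throughput grows linearly with the number of nodes.

```python
def scaling_report(tps_by_size: dict, baseline: int) -> dict:
    """Return {cluster_size: (speedup, efficiency)} relative to `baseline`.

    speedup    = tps(n) / tps(baseline)
    efficiency = speedup / (n / baseline); 1.0 means perfectly linear scaling
    """
    base_tps = tps_by_size[baseline]
    report = {}
    for n, tps in sorted(tps_by_size.items()):
        speedup = tps / base_tps
        report[n] = (speedup, speedup / (n / baseline))
    return report


# Fill in with measured transactions per second per cluster size, e.g.
#   scaling_report({3: tps_3, 6: tps_6, 12: tps_12}, baseline=3)
```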
6 Materials
To perform tests on larger clusters, we require an environment in which multiple machines can be deployed. We expect to test cluster sizes between 3 and 32 nodes, although, depending on the performance, more nodes may be required to reach the actual limits.
To analyze the impact of network latency and throughput on cluster performance, we need some control over the links between nodes. One possible approach is to rate-limit traffic on the machines themselves when they are connected by high-bandwidth links.
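Assuming Linux machines, one way to get this control is the kernel's netem queueing discipline, configured with the `tc` utility. Its `delay` option adds latency, and its `rate` option limits bandwidth; the latter is not available on very old kernels. The sketch below only builds and prints the commands unless `dry_run` is disabled, in which case running them requires root. The helper names, the interface name eth0 and the values are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional


def netem_command(iface: str, delay_ms: Optional[float] = None,
                  rate_mbit: Optional[float] = None) -> List[str]:
    """Build a `tc` command that adds latency and/or limits bandwidth on
    traffic leaving `iface` (egress only)."""
    cmd = ["tc", "qdisc", "replace", "dev", iface, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
    if rate_mbit is not None:
        cmd += ["rate", f"{rate_mbit}mbit"]
    return cmd


def clear_command(iface: str) -> List[str]:
    """Build a `tc` command that removes the emulation again."""
    return ["tc", "qdisc", "del", "dev", iface, "root"]


def run_tc(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command, or execute it (requires root) if dry_run is False."""
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)


# Hypothetical values: 10 ms extra latency and 100 Mbit/s on eth0.
run_tc(netem_command("eth0", delay_ms=10, rate_mbit=100))
# tc qdisc replace dev eth0 root netem delay 10ms rate 100mbit
run_tc(clear_command("eth0"))
# tc qdisc del dev eth0 root
```

Because netem shapes only outgoing traffic, applying a 10 ms delay on every node adds roughly 20 ms to each round trip between two nodes.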
7 Planning
Week 1: Literature study and performance analysis of a "normal"-sized cluster.
Week 2: Testing performance on larger clusters (2, 4, 8 and 16 times larger).
Week 3: Analyze the influence of link latency and throughput on cluster performance. Begin report.
Week 4: Work on report and presentation.
References
[1] Codership. Codership Galera cluster. http://www.codership.com/content/using-galera-cluster.
[2] Codership. Codership benchmarks. http://www.codership.com/info/benchmarks.
[3] Jay Janssen. Multicast replication in Percona XtraDB Cluster (PXC) and Galera. http://www.mysqlperformanceblog.com/2013/06/05/multicast-replication-in-percona-xtradb-cluster-pxc-and-galera/, 2013.
[4] Vadim Tkachenko. Benchmarking Galera replication overhead. http://www.mysqlperformanceblog.com/2011/10/13/benchmarking-galera-replication-overhead/, 2011.