
RozoFS® Performance
In Sequential and Random IO Workloads
Fizians SAS - April 2014
rozofs.com
rozofs.org
RozoFS® Benchmarking
With the explosion of data, storage difficulties are amplified and require companies to find new ways to manage the scalability of their infrastructure, while controlling costs and maintaining the performance required by their applications.
Scale-Out NAS is the latest evolution of the distributed file system: it lets your architecture scale while keeping management simple. But to provide high data protection it must manage redundancy.
The usual replication does not match the new scale of storage. Indeed, distributed storage involves lots of components likely to fail, and the 3 replicas needed to provide service continuity lead to extra costs that are unaffordable at a petabyte scale.
Everyone agrees that erasure coding is the right answer. It provides high data protection (actually better than replicas) with a redundancy between 0.3 and 0.5. However, everyone also agrees that erasure coding suffers from poor performance when dealing with small data, and thus dedicates this tremendous technology to large, cold object storage.
However, customers need a huge capacity of hot storage matching their existing infrastructures, which were not designed with object storage in mind.
Anticipating all of these new storage requirements, Fizians has designed both a distributed file system and a relevant erasure code. These two technologies were designed to work together, thus providing an efficient erasure-code-based Scale-Out NAS: RozoFS®!
RozoFS® Overview
RozoFS® is a scale-out NAS file system. RozoFS® aims to provide an open source, high performance and high availability scale-out storage software appliance for IO-intensive data center scenarios.
RozoFS® provides an easy way to scale to petabytes of storage using erasure coding. It was designed to provide very high availability levels with optimized raw capacity usage on heterogeneous commodity hardware.
RozoFS® provides a native open source POSIX file system, built on top of a usual out-of-band scale-out storage architecture. The RozoFS® specificity lies in the way data is stored. The data to be stored is translated into several chunks named projections using the Mojette Transform and distributed across storage devices in such a way that it can be retrieved even if several pieces are unavailable. On the other hand, chunks are meaningless alone. Redundancy schemes based on coding techniques like the one used by RozoFS® achieve significant storage savings compared to simple replication.
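As a rough illustration of that saving, the sketch below (plain shell arithmetic, not a RozoFS tool) compares the raw capacity needed to protect 100 TB of user data with 3-way replication against an erasure code carrying the 0.5 redundancy quoted above; the 100 TB figure and the variable name are illustrative assumptions.

#!/bin/bash
# Back-of-the-envelope capacity comparison (illustrative figures only)
USER_DATA_TB=100                                                        # usable data to protect
echo "3-way replication : $(( USER_DATA_TB * 3 )) TB raw"               # 2 extra full copies
echo "Erasure code x1.5 : $(echo "${USER_DATA_TB} * 1.5" | bc) TB raw"  # 0.5 redundancy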
Benchmark Goals
The main goal of this benchmark is to gather enough performance measurements of the RozoFS® software in both Random and Sequential IO workloads, and thus bring out both its versatility and its scalability.
In doing so, it demonstrates the non-blocking usage of erasure coding in a POSIX distributed Scale-Out NAS: RozoFS®.
These tests are primarily designed to measure the RozoFS® performance and its ability to get the best from the hardware it has to manage. They are not oriented towards the measurement of the given underlying infrastructure. Indeed, the platform used is unbalanced in terms of processing power and bandwidth compared to the number of disks.
Benchmark Setup
The benchmark environment is composed of a RozoFS® cluster of 4 identical nodes connected through a 10 Gb network. From these 4 nodes, 3 distinct volumes are configured, each of them gathering disks with different performance:
● SSD based;
● SAS 10K based;
● SAS 7.2K based.
3 exports (aka mountable file systems) were created, 1 on each volume. These exports were then mounted on each node of the cluster, ready to be tested.
(See Appendix A - Platform Configuration for more details.)
Illustration 1: benchmark environment.
Benchmarks are made with Iozone (www.iozone.org).
Random IO Benchmark Results
The Random IO benchmarks are only run on the file system relying on the SSD volume, which is the most relevant for this case. All of them use a 4 KB record size on 10 MB files. The following measures are made:
● On 1 node, from 1 to 16 simultaneous processes reading/writing;
● From 1 to 4 nodes, with 32 simultaneous processes reading/writing on each node (thus up to 128 simultaneous processes).
(See Appendix B - Running the Benchmark for more details).
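For reference, the wrapper script listed in Appendix B expands such a run into an Iozone distributed-mode command along the following lines (the client list file is the one generated by the script; this is a sketch, not a verbatim transcript of the runs):

# 4 KB random read/write (-i 2), 10 MB file per process, 32 processes,
# clients taken from the -+m file, results reported in operations per second (-O)
iozone -t 32 -i 2 -r 4k -s 10m -+m clients_list_iozone -R -c -+n -+u -C -w -O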
Random IO on a Single Node
[Chart: Single Node Random IO. IOPS (write and read) vs. number of processes.]
Illustration 2: 4K random read/write on a single node. From 1 to 16 files simultaneous.
File size (KiB) for each thread: 10,240
Record size used (KiB): 4
Nb. client node(s): 1
Columns: number of files read or written in parallel.

Series  | HDD | Measure                        | 1      | 2      | 4      | 8      | 16
1       | SSD | Write random throughput (IO/s) | 19,173 | 35,890 | 40,912 | 48,324 | 48,338
2       | SSD | Write random throughput (IO/s) | 19,300 | 35,649 | 44,354 | 49,162 | 50,946
3       | SSD | Write random throughput (IO/s) | 19,287 | 35,591 | 40,273 | 49,723 | 49,740
4       | SSD | Write random throughput (IO/s) | 19,231 | 36,086 | 46,398 | 48,114 | 46,211
5       | SSD | Write random throughput (IO/s) | 19,461 | 36,259 | 40,147 | 49,609 | 50,310
1       | SSD | Read random throughput (IO/s)  | 15,225 | 29,251 | 50,495 | 52,054 | 74,940
2       | SSD | Read random throughput (IO/s)  | 15,211 | 28,902 | 51,571 | 47,645 | 74,528
3       | SSD | Read random throughput (IO/s)  | 15,356 | 28,832 | 52,316 | 63,640 | 71,245
4       | SSD | Read random throughput (IO/s)  | 15,089 | 28,947 | 50,902 | 52,771 | 71,700
5       | SSD | Read random throughput (IO/s)  | 16,797 | 29,504 | 51,675 | 53,400 | 76,220
Average | SSD | Write random throughput (IO/s) | 19,290 | 35,895 | 42,417 | 48,987 | 49,109
Average | SSD | Read random throughput (IO/s)  | 15,535 | 29,087 | 51,392 | 53,902 | 73,727

Table 1: detailed results for single node 4K random read/write.
Random IO on Multiple Nodes
[Chart: Multiple Nodes Random IO. Cumulated IOPS (write and read) vs. number of nodes.]
Illustration 3: 4K random read/write aggregated from 1 to 4 nodes simultaneous.
File size (KiB) for each thread: 10,240
Record size used (KiB): 4
Nb. thread(s) per node: 32
Columns: number of nodes (total threads in parallel).

Series  | HDD | Measure                        | 1 (32)  | 2 (64)  | 3 (96)  | 4 (128)
1       | SSD | Write random throughput (IO/s) | 39,778  | 45,948  | 77,949  | 85,094
2       | SSD | Write random throughput (IO/s) | 44,676  | 75,317  | 87,137  | 82,615
3       | SSD | Write random throughput (IO/s) | 37,985  | 74,447  | 83,806  | 76,456
4       | SSD | Write random throughput (IO/s) | 44,948  | 74,365  | 80,380  | 83,317
5       | SSD | Write random throughput (IO/s) | 40,186  | 73,704  | 73,834  | 83,319
1       | SSD | Read random throughput (IO/s)  | 52,409  | 96,752  | 105,296 | 106,344
2       | SSD | Read random throughput (IO/s)  | 57,990  | 101,095 | 113,462 | 104,880
3       | SSD | Read random throughput (IO/s)  | 57,420  | 96,278  | 116,735 | 113,890
4       | SSD | Read random throughput (IO/s)  | 50,413  | 101,040 | 103,054 | 124,910
5       | SSD | Read random throughput (IO/s)  | 52,796  | 95,591  | 104,996 | 104,357
Average | SSD | Write random throughput (IO/s) | 41,515  | 68,756  | 80,621  | 82,160
Average | SSD | Read random throughput (IO/s)  | 54,205  | 98,151  | 108,709 | 110,876

Table 2: detailed results for multiple nodes 4K random read/write.
Sequential IO Benchmark Results
The Sequential IO benchmarks are only run on the file system relying on the SAS 7.2K volume, which is the most relevant for this case. All of them use a 64 KB record size on 100 MB files. The following measures are made:
● On 1 node, from 1 to 6 simultaneous processes reading/writing;
● From 1 to 4 nodes, with 5 simultaneous processes reading/writing on each node (thus up to 20 simultaneous processes).
(See Appendix B - Running the Benchmark for more details).
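As with the random runs, the Appendix B script turns this into an Iozone command roughly as follows (a sketch based on the script and the example invocations given there):

# 64 KB sequential test (-i 0), 100 MB file per process, 5 processes per node
iozone -t 5 -i 0 -r 64k -s 100m -+m clients_list_iozone -R -c -+n -+u -C -w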
Sequential IO on a Single Node
[Chart: Single Node Sequential IO. Throughput (KiB/s, write and read) vs. number of processes.]
Illustration 4: 64K sequential read/write on a single node. From 1 to 6 files simultaneous.
File size (KiB) for each thread: 102,400
Record size used (KiB): 64
Nb. client node(s): 1
Columns: number of files read or written in parallel.

Series  | HDD      | Measure                             | 1       | 2       | 3         | 4         | 5         | 6
1       | SAS 7.2K | Write sequential throughput (KiB/s) | 574,509 | 744,870 | 770,642   | 1,247,073 | 1,306,929 | 1,355,549
2       | SAS 7.2K | Write sequential throughput (KiB/s) | 572,893 | 743,575 | 761,399   | 1,094,711 | 1,171,431 | 1,130,293
3       | SAS 7.2K | Write sequential throughput (KiB/s) | 567,536 | 753,694 | 759,430   | 1,255,656 | 1,206,491 | 1,290,407
4       | SAS 7.2K | Write sequential throughput (KiB/s) | 560,705 | 731,469 | 754,287   | 1,213,220 | 1,194,163 | 1,337,038
5       | SAS 7.2K | Write sequential throughput (KiB/s) | 571,671 | 736,048 | 758,037   | 1,243,426 | 1,246,932 | 1,168,703
1       | SAS 7.2K | Read sequential throughput (KiB/s)  | 465,178 | 970,527 | 1,311,394 | 1,653,565 | 1,811,093 | 2,124,690
2       | SAS 7.2K | Read sequential throughput (KiB/s)  | 419,419 | 956,857 | 1,302,198 | 1,670,267 | 1,818,390 | 1,839,916
3       | SAS 7.2K | Read sequential throughput (KiB/s)  | 477,839 | 972,956 | 1,310,371 | 1,629,647 | 1,713,050 | 2,147,444
4       | SAS 7.2K | Read sequential throughput (KiB/s)  | 421,065 | 967,066 | 1,293,147 | 1,705,671 | 1,758,278 | 1,882,553
5       | SAS 7.2K | Read sequential throughput (KiB/s)  | 462,852 | 955,792 | 1,311,260 | 1,657,820 | 1,852,897 | 2,146,782
Average | SAS 7.2K | Write sequential throughput (KiB/s) | 569,463 | 741,931 | 760,759   | 1,210,818 | 1,225,189 | 1,256,398
Average | SAS 7.2K | Read sequential throughput (KiB/s)  | 449,271 | 964,639 | 1,305,674 | 1,663,394 | 1,790,742 | 2,028,277

Table 3: detailed results for single node 64K sequential read/write.
Sequential IO on Multiple Nodes
[Chart: Multiple Node Sequential IO. Cumulated throughput (KiB/s, write and read) vs. number of nodes.]
Illustration 5: 64K sequential read/write aggregated from 1 to 4 nodes simultaneous.
File size (KiB) for each thread: 102,400
Record size used (KiB): 64
Nb. thread(s) per node: 5
Columns: number of nodes (total threads in parallel).

Series  | HDD      | Measure                             | 1 (5)     | 2 (10)    | 3 (15)    | 4 (20)
1       | SAS 7.2K | Write sequential throughput (KiB/s) | 587,678   | 1,508,037 | 2,271,113 | 2,149,317
2       | SAS 7.2K | Write sequential throughput (KiB/s) | 1,225,142 | 2,435,336 | 3,028,145 | 3,267,827
3       | SAS 7.2K | Write sequential throughput (KiB/s) | 1,341,135 | 2,243,494 | 3,038,279 | 3,424,960
4       | SAS 7.2K | Write sequential throughput (KiB/s) | 1,198,583 | 2,332,085 | 3,210,770 | 3,612,069
5       | SAS 7.2K | Write sequential throughput (KiB/s) | 1,106,982 | 2,036,082 | 3,094,232 | 3,465,675
1       | SAS 7.2K | Read sequential throughput (KiB/s)  | 1,787,721 | 3,351,234 | 4,603,602 | 5,934,271
2       | SAS 7.2K | Read sequential throughput (KiB/s)  | 1,948,195 | 3,488,238 | 4,786,637 | 5,898,291
3       | SAS 7.2K | Read sequential throughput (KiB/s)  | 1,790,178 | 3,219,196 | 4,662,605 | 5,942,076
4       | SAS 7.2K | Read sequential throughput (KiB/s)  | 1,685,320 | 3,361,622 | 4,173,239 | 5,793,129
5       | SAS 7.2K | Read sequential throughput (KiB/s)  | 1,800,245 | 3,425,635 | 4,787,534 | 6,215,238
Average | SAS 7.2K | Write sequential throughput (KiB/s) | 1,091,904 | 2,111,007 | 2,928,508 | 3,183,970
Average | SAS 7.2K | Read sequential throughput (KiB/s)  | 1,802,332 | 3,369,185 | 4,602,723 | 5,956,601

Table 4: detailed results for multiple nodes 64K sequential read/write.
Conclusion
Given the scale of such systems, failure of a significant subset of the constituent nodes, as well as of other network components, is the norm rather than the exception. To enable a highly available overall service, it is thus essential both to tolerate short-term outages of some nodes and to provide resilience against permanent failures of individual components. Given the performance above, RozoFS® breaks the FEC technical limitations: not only can FEC be used for cold large data storage (object, archival, etc.), it can also deliver the benefits of FEC for all storage needs (multimedia post-production, virtualization, HPC, Big Data...).
Appendix A - Platform Configuration
The following section describes the hardware and software configuration used in the previous tests. We installed four Fujitsu RX300-S8 servers running Debian Wheezy; all four servers are used as storage servers and clients for RozoFS®, and two of them are also used to store the RozoFS® metadata (made highly available with DRBD and Pacemaker).
In this chapter we will describe the following components:
● Hardware configuration;
● RAID server configuration;
● File system configuration;
● Network configuration;
● RozoFS® configuration.
The following diagram describes the configuration:
Illustration 6: RozoFS® configuration.
Hardware Specifications
The following table describes the hardware characteristics of the four servers (based on Fujitsu hardware):

Server type          | Fujitsu RX300-S8 (R3008S0035FR)
CPU model name       | 2 x Intel Xeon CPU E5-2650 v2 @ 2.60 GHz (8 cores, 16 threads each)
Memory               | 64 GB
RAID card            | RAID Controller SAS 6Gbit/s 1GB (D3116C)
Virtual drive 0      | 11 x Seagate Constellation.2, SAS 6Gb/s, 1 TB, 2.5", 7200 RPM (ST91000640SS), RAID 5
Virtual drive 1      | 1 x Seagate Pulsar.2, SAS 6Gb/s, 100 GB, 2.5", MLC (ST100FM0002), RAID 0
Virtual drive 2      | 4 x WD Xe, SAS 6Gb/s, 900 GB, 2.5", 10000 RPM (WD9001BKHG), RAID 0
Ethernet controllers | Intel 82599EB 10-Gigabit SFI/SFP+ (2 x 10Gb); Intel I350 Gigabit Network (2 x 1Gb); Intel I350 Gigabit Network (4 x 1Gb)
RAID Specifications
This table describes the RAID HDD arrays configured on each server:

                 | Virtual drive 0 | Virtual drive 1 | Virtual drive 2
Number of drives | 11              | 1               | 4
RAID level       | 5               | 0               | 0
Strip size       | 64 KB           | 64 KB           | 64 KB
Read policy      | ReadAdaptive    | ReadAdaptive    | ReadAdaptive
Write policy     | WriteBack       | WriteThrough    | WriteThrough
IO policy        | Direct          | Direct          | Direct
Drive cache      | Enabled         | Enabled         | Enabled
Total size       | 9.091 TB        | 92.656 GB       | 3.272 TB
Partitioning and File Systems Specifications
The following tables summarize the partitioning and file systems used on the different RAID HDD arrays.

Partitioning and file systems used on virtual drive 0:

LV name  | Size      | Mountpoint                   | FS   | Mount options
root     | 332 MiB   | /                            | ext4 | errors=remount-ro
usr      | 8.38 GiB  | /usr                         | ext4 | defaults
var      | 2.79 GiB  | /var                         | ext4 | defaults
tmp      | 380 MiB   | /tmp                         | ext4 | defaults
swap_1   | 29.80 GiB | none                         | swap | sw
home     | 93.13 GiB | /home                        | ext4 | defaults
storages | 8.96 TiB  | /srv/rozofs/storage-sas-7.2K | xfs  | noatime, nodiratime, logbufs=8, logbsize=256k, largeio, inode64, swalloc, allocsize=131072k, nobarrier

Create options for the storages XFS file system: -f -d su=64k,sw=10 -l version=2,su=64k -i size=512
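Put together, the storages file system above would have been created and mounted with commands along these lines; the device path /dev/mapper/vg0-storages is an assumption used only for illustration:

# mkfs.xfs -f -d su=64k,sw=10 -l version=2,su=64k -i size=512 /dev/mapper/vg0-storages   # device path assumed
# mount -o noatime,nodiratime,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=131072k,nobarrier \
        /dev/mapper/vg0-storages /srv/rozofs/storage-sas-7.2K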
Partitioning and file systems used on virtual drive 1:

LV name         | Size      | Mountpoint              | FS   | Mount options
lv-ssd-storages | 88.02 GiB | /srv/rozofs/storage-ssd | xfs  | noatime, nodiratime, logbufs=8, logbsize=256k, largeio, inode64, swalloc, allocsize=131072k, nobarrier
lv-ssd-exports  | 4.63 GiB  | /srv/rozofs/exports-ssd | ext4 | user_xattr, acl, noatime

Create options:
- lv-ssd-storages (xfs): -f -d su=64k,sw=1 -l version=2,su=64k -i size=512
- lv-ssd-exports (ext4): -b 1024 -I 256 -i 4096 -E stride=64,stripe_width=64 -J size=400 -O dir_index,filetype,^extents
Partitioning and file systems used on virtual drive 2:

LV name             | Size      | Mountpoint                  | FS   | Mount options
lv-sas-10K-storages | 88.02 GiB | /srv/rozofs/storage-sas-10K | xfs  | noatime, nodiratime, logbufs=8, logbsize=256k, largeio, inode64, swalloc, allocsize=131072k, nobarrier
lv-ssd-exports      | 4.63 GiB  | /srv/rozofs/exports-sas-10K | ext4 | user_xattr, acl, noatime

Create options:
- lv-sas-10K-storages (xfs): -f -d su=64k,sw=1 -l version=2,su=64k -i size=512
- lv-ssd-exports (ext4): -b 1024 -I 256 -i 4096 -E stride=64,stripe_width=64 -J size=400 -O dir_index,filetype,^extents
Network Specifications
Network interfaces usage for the RozoFS® storage servers:

Interface type | Interface IDs | Usage                                                            | Bonding
1Gb            | 0 & 1         | RozoFS meta-data IO / cluster management / meta-data replication | Yes (LACP)
1Gb            | 2             | Not used                                                         | ---
1Gb            | 3             | Not used                                                         | ---
1Gb            | 4             | Not used                                                         | ---
1Gb            | 5             | Not used                                                         | ---
10Gb SFP+      | 6             | RozoFS data IO                                                   | No
10Gb SFP+      | 7             | RozoFS data IO                                                   | No
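For illustration, on Debian Wheezy the LACP bond over interfaces 0 and 1 corresponds to an /etc/network/interfaces stanza along these lines (the address is taken from the IP addressing plan below for rx300-01; the physical interface names eth0/eth1 are assumptions and the ifenslave package is required):

auto bond0
iface bond0 inet static
    address 1.1.0.81
    netmask 255.255.0.0
    bond-slaves eth0 eth1
    bond-mode 802.3ad
    bond-miimon 100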
IP addressing plan:

Server hostname | Interface IDs | Bonding    | IP               | MTU
rx300-01        | 0 & 1         | Yes (LACP) | 1.1.0.81/16      | 1500
rx300-01        | 6             | No         | 192.168.30.11/24 | 9000
rx300-01        | 7             | No         | 192.168.30.12/24 | 9000
rx300-02        | 0 & 1         | Yes (LACP) | 1.1.0.82/16      | 1500
rx300-02        | 6             | No         | 192.168.30.21/24 | 9000
rx300-02        | 7             | No         | 192.168.30.22/24 | 9000
rx300-03        | 0 & 1         | Yes (LACP) | 1.1.0.83/16      | 1500
rx300-03        | 6             | No         | 192.168.30.31/24 | 9000
rx300-03        | 7             | No         | 192.168.30.32/24 | 9000
rx300-04        | 0 & 1         | Yes (LACP) | 1.1.0.84/16      | 1500
rx300-04        | 6             | No         | 192.168.30.41/24 | 9000
rx300-04        | 7             | No         | 192.168.30.42/24 | 9000
Operating System and Tuning Options
The following table summarizes the operating system and the additional software used on each server:

Servers            | Operating system / kernel version | Additional software
rx300-01, rx300-02 | Debian Wheezy / 3.2.0-4-amd64     | rozofs-exportd, rozofs-storaged, rozofs-rozofsmount, rozofs-manager-lib, rozofs-manager-cli, rozofs-manager-agent, rozofs-rozodebug, drbd8-utils, pacemaker
rx300-03, rx300-04 | Debian Wheezy / 3.2.0-4-amd64     | rozofs-storaged, rozofs-rozofsmount, rozofs-manager-lib, rozofs-manager-cli, rozofs-manager-agent, rozofs-rozodebug
The I/O scheduler is the component of the Linux kernel which decides how the read and write buffers are to be queued for the underlying device. For the storaged and exportd daemons the most appropriate scheduler seems to be the deadline scheduler.
The I/O scheduler has been modified via the sysfs virtual file system:
# echo deadline > /sys/block/sda/queue/scheduler
# echo deadline > /sys/block/sdb/queue/scheduler
# echo deadline > /sys/block/sdc/queue/scheduler
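The active scheduler can be verified afterwards; the kernel shows the scheduler currently in use between brackets (on this kernel the available schedulers are typically noop, deadline and cfq):

# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq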
The following values have also been modified to provide additional latency benefits:
# echo '0' > /sys/block/sda/queue/iosched/front_merges
# echo '150' > /sys/block/sda/queue/iosched/read_expire
# echo '1500' > /sys/block/sda/queue/iosched/write_expire
# echo '4096' > /sys/block/sda/queue/nr_requests
# echo '4096' > /sys/block/sda/queue/read_ahead_kb
# echo '0' > /sys/block/sdb/queue/iosched/front_merges
# echo '150' > /sys/block/sdb/queue/iosched/read_expire
# echo '1500' > /sys/block/sdb/queue/iosched/write_expire
# echo '4096' > /sys/block/sdb/queue/nr_requests
# echo '4096' > /sys/block/sdb/queue/read_ahead_kb
# echo '0' > /sys/block/sdc/queue/iosched/front_merges
# echo '150' > /sys/block/sdc/queue/iosched/read_expire
# echo '1500' > /sys/block/sdc/queue/iosched/write_expire
# echo '4096' > /sys/block/sdc/queue/nr_requests
# echo '4096' > /sys/block/sdc/queue/read_ahead_kb
To make these values permanent so that they are automatically set at system startup, the sysfsutils package is installed and the configuration file /etc/sysfs.conf is modified on each server.
# apt-get install sysfsutils
Lines added to fle /etc/sysfs.conf:
# sda = SAS 7.2K
block/sda/queue/scheduler = deadline
block/sda/queue/iosched/front_merges = 0
block/sda/queue/iosched/read_expire = 150
block/sda/queue/iosched/write_expire = 1500
block/sda/queue/nr_requests = 4096
block/sda/queue/read_ahead_kb = 4096
# sdb = SSD
block/sdb/queue/scheduler = deadline
block/sdb/queue/iosched/front_merges = 0
block/sdb/queue/iosched/read_expire = 150
block/sdb/queue/iosched/write_expire = 1500
block/sdb/queue/nr_requests = 4096
block/sdb/queue/read_ahead_kb = 4096
# sdc = SAS 10K
block/sdc/queue/scheduler = deadline
block/sdc/queue/iosched/front_merges = 0
block/sdc/queue/iosched/read_expire = 150
block/sdc/queue/iosched/write_expire = 1500
block/sdc/queue/nr_requests = 4096
block/sdc/queue/read_ahead_kb = 4096
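On a running system these settings can be re-applied without a reboot, assuming the init script shipped by the Debian sysfsutils package:

# service sysfsutils restart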
The following kernel parameters have also been modified via the sysctl tool to increase performance:
# sysctl -w vm.swappiness=0
# sysctl -w net.core.rmem_max=134217728
# sysctl -w net.core.wmem_max=134217728
# sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
# sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"
# sysctl -w net.core.netdev_max_backlog=300000
# sysctl -w net.ipv4.tcp_moderate_rcvbuf=1
# sysctl -w net.unix.max_dgram_qlen=128
These settings take effect immediately but do not persist over a reboot. To make these values
permanent, we add the following values to /etc/sysctl.conf:
# No swap
vm.swappiness = 0
# Tuning for 10GbE
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_moderate_rcvbuf = 1
# Update nb. of datagrams that can be queued on a unix domain socket
net.unix.max_dgram_qlen = 128
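After editing /etc/sysctl.conf, the persisted values can be loaded immediately with:

# sysctl -p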
RozoFS® Configuration
As indicated above, the RozoFS® platform is configured with three separate storage volumes (one on SAS 7.2K HDDs, one on SAS 10K HDDs and one on SSDs) and one Rozo file system is created on each volume.
The storaged configuration files (/etc/rozofs/storage.conf) have been modified on each server as shown below:
/etc/rozofs/storage.conf on rx300-01:
/etc/rozofs/storage.conf on rx300-01:
threads = 8;
storio = "multiple";
listen = (
{
addr = "192.168.30.11"; port = 41001;
},
{
addr = "192.168.40.12"; port = 41001;
} );
storages = (
{
cid = 1; sid = 1; root = "/srv/rozofs/storage-ssd";
},
{
cid = 2; sid = 1; root = "/srv/rozofs/storage-sas-10K";
},
{
cid = 3; sid = 1; root = "/srv/rozofs/storage-sas-7.2K";
}, );
/etc/rozofs/storage.conf on rx300-02:
threads = 8;
storio = "multiple";
listen = (
{
addr = "192.168.30.21"; port = 41001;
},
{
addr = "192.168.40.22"; port = 41001;
} );
storages = (
{
cid = 1; sid = 2; root = "/srv/rozofs/storage-ssd";
},
{
cid = 2; sid = 2; root = "/srv/rozofs/storage-sas-10K";
},
{
cid = 3; sid = 2; root = "/srv/rozofs/storage-sas-7.2K";
}, );
/etc/rozofs/storage.conf on rx300-03:
threads = 8;
storio = "multiple";
listen = (
{
addr = "192.168.30.31"; port = 41001;
},
{
addr = "192.168.40.32"; port = 41001;
} );
storages = (
{
cid = 1; sid = 3; root = "/srv/rozofs/storage-ssd";
},
{
cid = 2; sid = 3; root = "/srv/rozofs/storage-sas-10K";
},
{
cid = 3; sid = 3; root = "/srv/rozofs/storage-sas-7.2K";
}, );
/etc/rozofs/storage.conf on rx300-04:
threads = 8;
storio = "multiple";
listen = (
{
addr = "192.168.30.41"; port = 41001;
},
{
addr = "192.168.40.42"; port = 41001;
} );
storages = (
{
cid = 1; sid = 4; root = "/srv/rozofs/storage-ssd";
},
{
cid = 2; sid = 4; root = "/srv/rozofs/storage-sas-10K";
},
{
cid = 3; sid = 4; root = "/srv/rozofs/storage-sas-7.2K";
}, );
The exportd configuration files (/etc/rozofs/export.conf) have been modified on rx300-01 and rx300-02 as shown below:
/etc/rozofs/export.conf on rx300-01 and rx300-02:
layout = 0;
volumes = (
{
vid = 1;
cids = (
{
cid = 1;
sids = (
{
sid = 1; host = "1.1.0.81";
},
{
sid = 2; host = "1.1.0.82";
},
{
sid = 3; host = "1.1.0.83";
},
{
sid = 4; host = "1.1.0.84";
} );
} );
},
{
vid = 2;
cids = (
{
cid = 2;
sids = (
{
sid = 1; host = "1.1.0.81";
},
{
sid = 2; host = "1.1.0.82";
},
{
sid = 3; host = "1.1.0.83";
},
{
sid = 4; host = "1.1.0.84";
} );
} );
},
{
vid = 3;
cids = (
{
cid = 3;
sids = (
{
sid = 1; host = "1.1.0.81";
},
{
sid = 2; host = "1.1.0.82";
},
{
sid = 3; host = "1.1.0.83";
},
{
sid = 4; host = "1.1.0.84";
} );
} );
} );
exports = (
{
eid = 1;
vid = 1;
root = "/srv/rozofs/exports-ssd/export-ssd-storages-ssd";
md5 = "";
squota = "";
hquota = "";
},
{
eid = 2;
vid = 2;
root = "/srv/rozofs/exports-ssd/export-ssd-storages-sas-10K";
md5 = "";
squota = "";
hquota = "";
},
{
eid = 3;
vid = 3;
root = "/srv/rozofs/exports-ssd/export-ssd-storages-sas-7.2K";
md5 = "";
squota = "";
hquota = "";
} );
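After changing these files, the daemons have to be restarted so the new configuration is taken into account; assuming the init scripts installed by the Debian packages are named after the daemons, this would be done roughly as follows:

# service rozofs-storaged restart    # on every storage node
# service rozofs-exportd restart     # on the metadata servers rx300-01 and rx300-02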
The system configuration file (/etc/fstab) has been modified on each server as shown below to mount the RozoFS® file systems automatically:
rozofsmount /mnt/[email protected]/export-ssd-storages-ssd rozofs \
exporthost=1.1.0.80,exportpath=/srv/rozofs/exports-ssd/export-ssd-storages-ssd,instance=0, \
rozofsstoragetimeout=15,rozofsstorclitimeout=30,rozofsshaper=0,rozofsmaxwritepending=4, \
rozofsminreadsize=4,noXattr,rozofsnbstorcli=2,_netdev
0
0
rozofsmount /mnt/[email protected]/export-ssd-storages-sas-10K rozofs \
exporthost=1.1.0.80,exportpath=/srv/rozofs/exports-ssd/export-ssd-storages-sas10K,instance=1, \
rozofsstoragetimeout=15,rozofsstorclitimeout=30,rozofsshaper=0,rozofsmaxwritepending=4, \
rozofsminreadsize=4,noXattr,rozofsnbstorcli=2,_netdev
0
0
rozofsmount /mnt/[email protected]/export-ssd-storages-sas-7.2K rozofs \
exporthost=1.1.0.80,exportpath=/srv/rozofs/exports-ssd/export-ssd-storages-sas7.2K,instance=2, \
rozofsstoragetimeout=15,rozofsstorclitimeout=30,rozofsshaper=0,rozofsmaxwritepending=4, \
rozofsminreadsize=4,noXattr,rozofsnbstorcli=2,_netdev
0
0
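With these entries in place, each RozoFS® file system can then be mounted on a node by its mount point (or all at once with mount -a):

# mount /mnt/[email protected]/export-ssd-storages-ssd
# mount /mnt/[email protected]/export-ssd-storages-sas-10K
# mount /mnt/[email protected]/export-ssd-storages-sas-7.2K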
Appendix B - Running the Benchmark
A simple bash script is used to launch the benchmarks. Its parameters are:
1. test to run: see man iozone
2. mount point index: according to MOUNTPOINTS in the script
3. record size: see man iozone
4. file size: see man iozone
5. number of nodes: how many nodes to run the bench on
6. number of threads per node: see man iozone
#!/bin/bash
TEST_TO_RUN=$1
MOUNTPOINT_IDX=$2
RECORD_SIZE=$3
FILE_SIZE=$4
NB_NODES=$5
NB_THREADS_BY_NODE=$6
IOZONE_BIN_PATH="/usr/bin/iozone"
LOCAL_OUTPUT_PATH="iozone_test_`date "+%Y%m%d_%Hh%Mm%Ss"`"
CONF_CLIENTS_PATH=${LOCAL_OUTPUT_PATH}/"clients_list_iozone"
SPECIAL_ARGS="-R -c -+n -+u -C -w"
SSH_USER="root"
NB_STORAGE_NODES=4
FREE_PAGECACHE=1
IOZONE_STD_OUTPUT_PATH=${LOCAL_OUTPUT_PATH}/"std_output-"${NB_NODES}"-node-"${NB_THREADS_BY_NODE}"-th-"${RECORD_SIZE}"-rs-"${FILE_SIZE}"-fs-"${TEST_TO_RUN}"-test.txt"
# set mount points according to platform
MOUNTPOINTS=(
"/mnt/[email protected]/export-ssd-storages-ssd"
"/mnt/[email protected]/export-ssd-storages-sas-10K"
"/mnt/[email protected]/export-ssd-storages-sas-7.2K"
)
# Create directory for store configuration file and output files for this test
[ ! -e ${LOCAL_OUTPUT_PATH} ] && mkdir ${LOCAL_OUTPUT_PATH}
# Create clients file for iozone
[ -e ${CONF_CLIENTS_PATH} ] && rm -f ${CONF_CLIENTS_PATH}
for node in $(seq ${NB_NODES}); do
for i in $(seq ${NB_THREADS_BY_NODE}); do
printf "1.1.0.8"${node}'\t'${MOUNTPOINTS[${MOUNTPOINT_IDX}]}'\t'${IOZONE_BIN_PATH}'\n' >> ${CONF_CLIENTS_PATH}
done;
done;
# Print iozone clients list file
echo "Generate clients file (${CONF_CLIENTS_PATH}): "
cat ${CONF_CLIENTS_PATH}
if [ ${FREE_PAGECACHE} -eq 1 ]; then
# Free pagecache on each storage server
for node in $(seq ${NB_STORAGE_NODES}); do
echo "Free pagecache on 1.1.0.8"${node}
ssh ${SSH_USER}@"1.1.0.8"${node} "sync ; echo 1 > /proc/sys/vm/drop_caches ; sync"
done;
fi
# Sleep time
sleep 10
# Compute nb. of threads
let NB_THREADS=${NB_NODES}*${NB_THREADS_BY_NODE}
# Launch iozone test
echo "Launch iozone test:"
echo "Test run: "${TEST_TO_RUN}
echo "Nb. of clients node: "${NB_NODES}
echo "Nb. of thread(s)/node: "${NB_THREADS_BY_NODE}
echo "Record size: "${RECORD_SIZE}
echo "File size: "${FILE_SIZE}
if [ ${TEST_TO_RUN} -eq 2 ]; then
echo "Random test"
# Give results in operations per second for random read/write test
iozone -t ${NB_THREADS} -i ${TEST_TO_RUN} -r ${RECORD_SIZE} -s ${FILE_SIZE} -+m ${CONF_CLIENTS_PATH} ${SPECIAL_ARGS} -O 2>&1 | tee ${IOZONE_STD_OUTPUT_PATH}
else
echo "Sequential test"
iozone -t ${NB_THREADS} -i ${TEST_TO_RUN} -r ${RECORD_SIZE} -s ${FILE_SIZE} -+m ${CONF_CLIENTS_PATH} ${SPECIAL_ARGS} 2>&1 | tee ${IOZONE_STD_OUTPUT_PATH}
fi
Examples:
● Random IO on a single node: ./iozone_test.sh 2 0 4k 10m 1 1
● Random IO on multiple nodes: ./iozone_test.sh 2 0 4k 10m 4 32
● Sequential IO on a single node: ./iozone_test.sh 0 1 64k 100m 1 5
● Sequential IO on multiple nodes: ./iozone_test.sh 0 1 64k 100m 4 5
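For instance, the single-node random IO sweep of Illustration 2 (1 to 16 processes) can be reproduced by looping over the thread counts; this is a sketch assuming the script above is saved as iozone_test.sh:

#!/bin/bash
# Replay the single-node 4K random IO sweep (test 2, SSD mount point, 10 MB files)
for nb_threads in 1 2 4 8 16; do
    ./iozone_test.sh 2 0 4k 10m 1 ${nb_threads}
done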