RozoFS® Performance in Sequential and Random IO Workloads
Fizians SAS - April 2014
rozofs.com - rozofs.org

RozoFS® Benchmarking

With the explosion of data, storage difficulties are amplified and force companies to find new ways to scale their infrastructure while controlling costs and maintaining the performance required by their applications. Scale-Out NAS is the latest evolution of the distributed file system, combining architectural scalability with simple management. To offer strong data protection, however, it must manage redundancy, and the usual replication does not match the new scale of storage. Distributed storage involves many components that are likely to fail, and the 3 replicas needed to guarantee service continuity lead to extra costs that are unaffordable at petabyte scale.

Everyone agrees that erasure coding is the right answer: it provides high data protection (actually better than replicas) with a redundancy overhead between 0.3 and 0.5. Everyone also agrees, however, that erasure coding suffers from poor performance when dealing with small data, which tends to confine this tremendous technology to large, cold object storage. Yet customers need a huge capacity of hot storage that fits their existing infrastructures, which were not designed with object storage in mind. Anticipating these new storage requirements, Fizians has designed both a distributed file system and a suitable erasure code. These two technologies were designed to work together, providing an efficient erasure-code-based Scale-Out NAS: RozoFS®!

RozoFS® Overview

RozoFS® is a scale-out NAS file system. It aims to provide an open source, high performance and high availability scale-out storage software appliance for disk-IO-intensive data center scenarios. RozoFS® provides an easy way to scale to petabytes of storage using erasure coding. It was designed to deliver very high availability levels with optimized raw capacity usage on heterogeneous commodity hardware. RozoFS® provides a native open source POSIX file system, built on top of a usual out-of-band scale-out storage architecture. The RozoFS® specificity lies in the way data is stored: the data to be stored is translated into several chunks named projections using the Mojette Transform and distributed across storage devices in such a way that it can be retrieved even if several pieces are unavailable. Taken alone, the chunks are meaningless. Redundancy schemes based on coding techniques like the one used by RozoFS® achieve significant storage savings compared to simple replication.
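To make the storage overhead concrete, here is a small worked comparison taking the figures quoted above at face value (a sketch based on the stated redundancy range, not a measurement from this benchmark):

\[
\text{replication (3 copies):}\quad C_{\mathrm{raw}} = 3\,C_{\mathrm{usable}}
\qquad\qquad
\text{erasure coding:}\quad C_{\mathrm{raw}} = (1+r)\,C_{\mathrm{usable}},\ \ 0.3 \le r \le 0.5
\]

For 1 PB of usable capacity this means 3 PB of raw disk with triple replication versus 1.3 to 1.5 PB with erasure coding of the kind used by RozoFS®.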
Benchmark Goals

The main goal of this benchmark is to gather enough performance measurements of the RozoFS® software in both random and sequential IO workloads to bring out its versatility and its scalability, and in doing so to demonstrate that erasure coding is not a blocking factor in a POSIX distributed Scale-Out NAS: RozoFS®.

These tests are primarily designed to measure RozoFS® performance and its ability to get the best from the hardware it has to manage. They are not oriented towards measuring the underlying infrastructure itself: the platform used is unbalanced in terms of processing power and bandwidth compared to the number of disks.

Benchmark Setup

The benchmark environment is composed of a RozoFS® cluster of 4 identical nodes connected through a 10 Gb network. From these 4 nodes, 3 distinct volumes are configured, each of them gathering disks with different performance:
● SSD based;
● SAS 10K based;
● SAS 7.2K based.

3 exports (aka mountable file systems) were created, 1 on each volume. These exports were then mounted on each node of the cluster, ready to be tested (see Appendix A - Platform Configuration for more details).

Illustration 1: benchmark environment.

Benchmarks are made with Iozone (www.iozone.org).

Random IO Benchmark Results

The Random IO benchmarks are only run on the file system relying on the SSD disk volume, which is the most relevant for this case. All of them use a 4 KB record size on 10 MB files. The following measures are made:
● On 1 node, from 1 to 16 simultaneous processes reading/writing;
● From 1 to 4 nodes, with 32 simultaneous processes reading/writing on each node (thus up to 128 simultaneous processes).
(See Appendix B - Running the Benchmark for more details.)
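As a sketch of how these runs are launched, the wrapper script listed in Appendix B is invoked with test 2 (the iozone random read/write test) and mount point index 0 (the SSD export); internally it runs iozone in throughput mode with -i 2 -r 4k -s 10m and reports operations per second with -O. Appendix B gives the 1-process and 32-processes-per-node invocations; the intermediate thread counts are assumed to follow the same pattern:

./iozone_test.sh 2 0 4k 10m 1 1     # 1 process on a single node
./iozone_test.sh 2 0 4k 10m 1 16    # 16 processes on a single node
./iozone_test.sh 2 0 4k 10m 4 32    # 32 processes on each of the 4 nodes (128 in total)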
Random IO on a Single Node

Illustration 2: 4K random read/write on a single node, from 1 to 16 simultaneous files (IOPS versus number of processes, write and read).

File size (KiB) for each thread: 10,240
Record size used (KiB): 4
Nb. of client node(s): 1
Nb. of threads in parallel (= files read or written in parallel): 1, 2, 4, 8, 16

Threads in parallel                                     1        2        4        8        16
Series 1 - SSD - Write random throughput (IO/s)    19,173   35,890   40,912   48,324   48,338
Series 2 - SSD - Write random throughput (IO/s)    19,300   35,649   44,354   49,162   50,946
Series 3 - SSD - Write random throughput (IO/s)    19,287   35,591   40,273   49,723   49,740
Series 4 - SSD - Write random throughput (IO/s)    19,231   36,086   46,398   48,114   46,211
Series 5 - SSD - Write random throughput (IO/s)    19,461   36,259   40,147   49,609   50,310
Series 1 - SSD - Read random throughput (IO/s)     15,225   29,251   50,495   52,054   74,940
Series 2 - SSD - Read random throughput (IO/s)     15,211   28,902   51,571   47,645   74,528
Series 3 - SSD - Read random throughput (IO/s)     15,356   28,832   52,316   63,640   71,245
Series 4 - SSD - Read random throughput (IO/s)     15,089   28,947   50,902   52,771   71,700
Series 5 - SSD - Read random throughput (IO/s)     16,797   29,504   51,675   53,400   76,220
Average  - SSD - Write random throughput (IO/s)    19,290   35,895   42,417   48,987   49,109
Average  - SSD - Read random throughput (IO/s)     15,535   29,087   51,392   53,902   73,727

Table 1: detailed results for single node 4K random read/write.

Random IO on Multiple Nodes

Illustration 3: 4K random read/write aggregated from 1 to 4 simultaneous nodes (cumulated IOPS versus number of nodes, write and read).

File size (KiB) for each thread: 10,240
Record size used (KiB): 4
Nb. of threads per node: 32
Nb. of threads in parallel: 32 (1 node), 64 (2 nodes), 96 (3 nodes), 128 (4 nodes)

Nodes (32 threads each)                                 1         2         3         4
Series 1 - SSD - Write random throughput (IO/s)    39,778    45,948    77,949    85,094
Series 2 - SSD - Write random throughput (IO/s)    44,676    75,317    87,137    82,615
Series 3 - SSD - Write random throughput (IO/s)    37,985    74,447    83,806    76,456
Series 4 - SSD - Write random throughput (IO/s)    44,948    74,365    80,380    83,317
Series 5 - SSD - Write random throughput (IO/s)    40,186    73,704    73,834    83,319
Series 1 - SSD - Read random throughput (IO/s)     52,409    96,752   105,296   106,344
Series 2 - SSD - Read random throughput (IO/s)     57,990   101,095   113,462   104,880
Series 3 - SSD - Read random throughput (IO/s)     57,420    96,278   116,735   113,890
Series 4 - SSD - Read random throughput (IO/s)     50,413   101,040   103,054   124,910
Series 5 - SSD - Read random throughput (IO/s)     52,796    95,591   104,996   104,357
Average  - SSD - Write random throughput (IO/s)    41,515    68,756    80,621    82,160
Average  - SSD - Read random throughput (IO/s)     54,205    98,151   108,709   110,876

Table 2: detailed results for multiple nodes 4K random read/write.

Sequential IO Benchmark Results

The Sequential IO benchmarks are only run on the file system relying on the SAS 7.2K disk volume, which is the most relevant for this case. All of them use a 64 KB record size on 100 MB files. The following measures are made:
● On 1 node, from 1 to 6 simultaneous processes reading/writing;
● From 1 to 4 nodes, with 5 simultaneous processes reading/writing on each node.
(See Appendix B - Running the Benchmark for more details.)

Sequential IO on a Single Node

Illustration 4: 64K sequential read/write on a single node, from 1 to 6 simultaneous files (throughput in KiB/s versus number of processes, write and read).

File size (KiB) for each thread: 102,400
Record size used (KiB): 64
Nb. of client node(s): 1
Nb. of threads in parallel (= files read or written in parallel): 1 to 6

Threads in parallel                                              1          2          3          4          5          6
Series 1 - SAS 7.2K - Write sequential throughput (KiB/s)   574,509    744,870    770,642  1,247,073  1,306,929  1,355,549
Series 2 - SAS 7.2K - Write sequential throughput (KiB/s)   572,893    743,575    761,399  1,094,711  1,171,431  1,130,293
Series 3 - SAS 7.2K - Write sequential throughput (KiB/s)   567,536    753,694    759,430  1,255,656  1,206,491  1,290,407
Series 4 - SAS 7.2K - Write sequential throughput (KiB/s)   560,705    731,469    754,287  1,213,220  1,194,163  1,337,038
Series 5 - SAS 7.2K - Write sequential throughput (KiB/s)   571,671    736,048    758,037  1,243,426  1,246,932  1,168,703
Series 1 - SAS 7.2K - Read sequential throughput (KiB/s)    465,178    970,527  1,311,394  1,653,565  1,811,093  2,124,690
Series 2 - SAS 7.2K - Read sequential throughput (KiB/s)    419,419    956,857  1,302,198  1,670,267  1,818,390  1,839,916
Series 3 - SAS 7.2K - Read sequential throughput (KiB/s)    477,839    972,956  1,310,371  1,629,647  1,713,050  2,147,444
Series 4 - SAS 7.2K - Read sequential throughput (KiB/s)    421,065    967,066  1,293,147  1,705,671  1,758,278  1,882,553
Series 5 - SAS 7.2K - Read sequential throughput (KiB/s)    462,852    955,792  1,311,260  1,657,820  1,852,897  2,146,782
Average  - SAS 7.2K - Write sequential throughput (KiB/s)   569,463    741,931    760,759  1,210,818  1,225,189  1,256,398
Average  - SAS 7.2K - Read sequential throughput (KiB/s)    449,271    964,639  1,305,674  1,663,394  1,790,742  2,028,277

Table 3: detailed results for single node 64K sequential read/write.
Sequential IO on Multiple Nodes

Illustration 5: 64K sequential read/write aggregated from 1 to 4 simultaneous nodes (cumulated throughput in KiB/s versus number of nodes, write and read).

File size (KiB) for each thread: 102,400
Record size used (KiB): 64
Nb. of threads per node: 5
Nb. of threads in parallel (= files read or written in parallel): 5 (1 node), 10 (2 nodes), 15 (3 nodes), 20 (4 nodes)

Nodes (5 threads each)                                            1          2          3          4
Series 1 - SAS 7.2K - Write sequential throughput (KiB/s)    587,678  1,508,037  2,271,113  2,149,317
Series 2 - SAS 7.2K - Write sequential throughput (KiB/s)  1,225,142  2,435,336  3,028,145  3,267,827
Series 3 - SAS 7.2K - Write sequential throughput (KiB/s)  1,341,135  2,243,494  3,038,279  3,424,960
Series 4 - SAS 7.2K - Write sequential throughput (KiB/s)  1,198,583  2,332,085  3,210,770  3,612,069
Series 5 - SAS 7.2K - Write sequential throughput (KiB/s)  1,106,982  2,036,082  3,094,232  3,465,675
Series 1 - SAS 7.2K - Read sequential throughput (KiB/s)   1,787,721  3,351,234  4,603,602  5,934,271
Series 2 - SAS 7.2K - Read sequential throughput (KiB/s)   1,948,195  3,488,238  4,786,637  5,898,291
Series 3 - SAS 7.2K - Read sequential throughput (KiB/s)   1,790,178  3,219,196  4,662,605  5,942,076
Series 4 - SAS 7.2K - Read sequential throughput (KiB/s)   1,685,320  3,361,622  4,173,239  5,793,129
Series 5 - SAS 7.2K - Read sequential throughput (KiB/s)   1,800,245  3,425,635  4,787,534  6,215,238
Average  - SAS 7.2K - Write sequential throughput (KiB/s)  1,091,904  2,111,007  2,928,508  3,183,970
Average  - SAS 7.2K - Read sequential throughput (KiB/s)   1,802,332  3,369,185  4,602,723  5,956,601

Table 4: detailed results for multiple nodes 64K sequential read/write.

Conclusion

Given the scale of such systems, failure of a significant subset of the constituent nodes, as well as of other network components, is the norm rather than the exception. To enable a highly available overall service, it is thus essential both to tolerate short-term outages of some nodes and to provide resilience against permanent failures of individual components. Given the performance reported above, RozoFS® breaks the usual FEC technical limitations: FEC is no longer restricted to large cold data storage (object storage, archival, etc.) but can bring its benefits to all storage needs (multimedia post-production, virtualization, HPC, Big Data, ...).

Appendix A - Platform Configuration

The following section describes the hardware and software configuration used in the previous tests. We installed four Fujitsu RX300-S8 servers running Debian Wheezy. The four servers are used as storage servers and clients for RozoFS®, and two of them also store the RozoFS® metadata (high availability with DRBD and Pacemaker). In this chapter we describe the following components:
● Hardware configuration;
● RAID server configuration;
● File system configuration;
● Network configuration;
● RozoFS® configuration.

The following diagram describes the configuration:

Illustration 6: RozoFS® configuration.
Hardware Specifications

The following table describes the hardware characteristics of the four servers (based on Fujitsu hardware):

Server type:          Fujitsu RX300-S8 (R3008S0035FR)
CPU model name:       2 x Intel Xeon CPU E5-2650 v2 @ 2.60GHz (8 cores, 16 threads per CPU)
Memory:               64 GB
RAID card:            RAID Controller SAS 6Gbit/s 1GB (D3116C)
Virtual drive 0:      Seagate Constellation.2, SAS 6Gb/s, 1TB, 2.5", 7200 RPM (ST91000640SS) - 11 drives - RAID 5
Virtual drive 1:      Seagate Pulsar.2, SAS 6Gb/s, 100GB, 2.5", MLC (ST100FM0002) - 1 drive - RAID 0
Virtual drive 2:      WD Xe, SAS 6Gb/s, 900GB, 2.5", 10000 RPM (WD9001BKHG) - 4 drives - RAID 0
Ethernet controllers: Intel 82599EB 10-Gigabit SFI/SFP+ (2 x 10Gb); Intel I350 Gigabit Network (2 x 1Gb); Intel I350 Gigabit Network (4 x 1Gb)

RAID Specifications

This table describes the RAID HDD arrays configured on each server:

Virtual drive ID    0              1              2
Number of drives    11             1              4
RAID level          5              0              0
Strip size          64 KB          64 KB          64 KB
Read policy         ReadAdaptive   ReadAdaptive   ReadAdaptive
Write policy        WriteBack      WriteThrough   WriteThrough
IO policy           Direct         Direct         Direct
Drive cache         Enabled        Enabled        Enabled
Total size          9.091 TB       92.656 GB      3.272 TB

Partitioning and File Systems Specifications

The following tables summarize the partitioning and file systems used on the different RAID HDD arrays.

Partitioning and file systems used on virtual drive 0:

LV name: root - Size: 332 MiB - Mountpoint: / - FS: ext4 - Mount options: errors=remount-ro
LV name: usr - Size: 8.38 GiB - Mountpoint: /usr - FS: ext4 - Mount options: defaults
LV name: var - Size: 2.79 GiB - Mountpoint: /var - FS: ext4 - Mount options: defaults
LV name: tmp - Size: 380 MiB - Mountpoint: /tmp - FS: ext4 - Mount options: defaults
LV name: swap_1 - Size: 29.80 GiB - Mountpoint: none - FS: swap - Mount options: sw
LV name: home - Size: 93.13 GiB - Mountpoint: /home - FS: ext4 - Mount options: defaults
LV name: storages - Size: 8.96 TiB - Mountpoint: /srv/rozofs/storage-sas-7.2K - FS: xfs
    Mount options: noatime, nodiratime, logbufs=8, logbsize=256k, largeio, inode64, swalloc, allocsize=131072k, nobarrier
    Create options: -f -d su=64k,sw=10 -l version=2,su=64k -i size=512

Partitioning and file systems used on virtual drive 1:

LV name: lv-ssd-storages - Size: 88.02 GiB - Mountpoint: /srv/rozofs/storage-ssd - FS: xfs
    Mount options: noatime, nodiratime, logbufs=8, logbsize=256k, largeio, inode64, swalloc, allocsize=131072k, nobarrier
    Create options: -f -d su=64k,sw=1 -l version=2,su=64k -i size=512
LV name: lv-ssd-exports - Size: 4.63 GiB - Mountpoint: /srv/rozofs/exports-ssd - FS: ext4
    Mount options: user_xattr, acl, noatime
    Create options: -b 1024 -I 256 -i 4096 -E stride=64,stripe_width=64 -J size=400 -O dir_index,filetype,^extents

Partitioning and file systems used on virtual drive 2:

LV name: lv-sas-10K-storages - Size: 88.02 GiB - Mountpoint: /srv/rozofs/storage-sas-10K - FS: xfs
    Mount options: noatime, nodiratime, logbufs=8, logbsize=256k, largeio, inode64, swalloc, allocsize=131072k, nobarrier
    Create options: -f -d su=64k,sw=1 -l version=2,su=64k -i size=512
LV name: lv-ssd-exports - Size: 4.63 GiB - Mountpoint: /srv/rozofs/exports-sas-10K - FS: ext4
    Mount options: user_xattr, acl, noatime
    Create options: -b 1024 -I 256 -i 4096 -E stride=64,stripe_width=64 -J size=400 -O dir_index,filetype,^extents
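For illustration, the create options in the tables above correspond to an mkfs.xfs invocation roughly like the following for the storage LV on virtual drive 0 (a sketch; the device path is hypothetical since the volume group name is not documented here):

# Hypothetical device path: the volume group name is not given in the tables above
mkfs.xfs -f -d su=64k,sw=10 -l version=2,su=64k -i size=512 /dev/<vg>/storages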
Network Specifications

Network interface usage on the RozoFS® storage servers:

Network interface type   Interface IDs   Usage                                                               Bonding
1Gb                      0 & 1           RozoFS meta-data IO / Cluster management / Meta-data replication    Yes (LACP)
1Gb                      2               Not used                                                            ---
1Gb                      3               Not used                                                            ---
1Gb                      4               Not used                                                            ---
1Gb                      5               Not used                                                            ---
10Gb SFP+                6               RozoFS data IO                                                      No
10Gb SFP+                7               RozoFS data IO                                                      No

IP addressing plan:

Server hostname   Interface IDs   Bonding      IP                 MTU
rx300-01          0 & 1           Yes (LACP)   1.1.0.81/16        1500
rx300-01          6               No           192.168.30.11/24   9000
rx300-01          7               No           192.168.30.12/24   9000
rx300-02          0 & 1           Yes (LACP)   1.1.0.82/16        1500
rx300-02          6               No           192.168.30.21/24   9000
rx300-02          7               No           192.168.30.22/24   9000
rx300-03          0 & 1           Yes (LACP)   1.1.0.83/16        1500
rx300-03          6               No           192.168.30.31/24   9000
rx300-03          7               No           192.168.30.32/24   9000
rx300-04          0 & 1           Yes (LACP)   1.1.0.84/16        1500
rx300-04          6               No           192.168.30.41/24   9000
rx300-04          7               No           192.168.30.42/24   9000

Operating System and Tuning Options

The following table summarizes the operating system and the additional software used on each server:

Server     Operating system / kernel version   Additional software
rx300-01   Debian Wheezy / 3.2.0-4-amd64       rozofs-exportd, rozofs-storaged, rozofs-rozofsmount, rozofs-manager-lib, rozofs-manager-cli, rozofs-manager-agent, rozofs-rozodebug, drbd8-utils, pacemaker
rx300-02   Debian Wheezy / 3.2.0-4-amd64       rozofs-exportd, rozofs-storaged, rozofs-rozofsmount, rozofs-manager-lib, rozofs-manager-cli, rozofs-manager-agent, rozofs-rozodebug, drbd8-utils, pacemaker
rx300-03   Debian Wheezy / 3.2.0-4-amd64       rozofs-storaged, rozofs-rozofsmount, rozofs-manager-lib, rozofs-manager-cli, rozofs-manager-agent, rozofs-rozodebug
rx300-04   Debian Wheezy / 3.2.0-4-amd64       rozofs-storaged, rozofs-rozofsmount, rozofs-manager-lib, rozofs-manager-cli, rozofs-manager-agent, rozofs-rozodebug

The I/O scheduler is the component of the Linux kernel that decides how read and write requests are queued for the underlying device. For the storaged and exportd daemons the most appropriate scheduler appears to be the deadline scheduler. The I/O scheduler has been changed via the sysfs virtual file system:

# echo deadline > /sys/block/sda/queue/scheduler
# echo deadline > /sys/block/sdb/queue/scheduler
# echo deadline > /sys/block/sdc/queue/scheduler

The following values have also been modified to provide additional latency benefits:

# echo '0' > /sys/block/sda/queue/iosched/front_merges
# echo '150' > /sys/block/sda/queue/iosched/read_expire
# echo '1500' > /sys/block/sda/queue/iosched/write_expire
# echo '4096' > /sys/block/sda/queue/nr_requests
# echo '4096' > /sys/block/sda/queue/read_ahead_kb

# echo '0' > /sys/block/sdb/queue/iosched/front_merges
# echo '150' > /sys/block/sdb/queue/iosched/read_expire
# echo '1500' > /sys/block/sdb/queue/iosched/write_expire
# echo '4096' > /sys/block/sdb/queue/nr_requests
# echo '4096' > /sys/block/sdb/queue/read_ahead_kb

# echo '0' > /sys/block/sdc/queue/iosched/front_merges
# echo '150' > /sys/block/sdc/queue/iosched/read_expire
# echo '1500' > /sys/block/sdc/queue/iosched/write_expire
# echo '4096' > /sys/block/sdc/queue/nr_requests
# echo '4096' > /sys/block/sdc/queue/read_ahead_kb
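The active scheduler can be checked per device with standard sysfs usage (shown here as a sketch; the output line is illustrative, the selected scheduler appearing in brackets):

# cat /sys/block/sda/queue/scheduler
noop [deadline] cfq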
To make these values permanent, so that they are automatically set at system startup, the sysfsutils package is installed and the configuration file /etc/sysfs.conf is modified on each server:

# apt-get install sysfsutils

Lines added to the file /etc/sysfs.conf:

# sda = SAS 7.2K
block/sda/queue/scheduler = deadline
block/sda/queue/iosched/front_merges = 0
block/sda/queue/iosched/read_expire = 150
block/sda/queue/iosched/write_expire = 1500
block/sda/queue/nr_requests = 4096
block/sda/queue/read_ahead_kb = 4096

# sdb = SSD
block/sdb/queue/scheduler = deadline
block/sdb/queue/iosched/front_merges = 0
block/sdb/queue/iosched/read_expire = 150
block/sdb/queue/iosched/write_expire = 1500
block/sdb/queue/nr_requests = 4096
block/sdb/queue/read_ahead_kb = 4096

# sdc = SAS 10K
block/sdc/queue/scheduler = deadline
block/sdc/queue/iosched/front_merges = 0
block/sdc/queue/iosched/read_expire = 150
block/sdc/queue/iosched/write_expire = 1500
block/sdc/queue/nr_requests = 4096
block/sdc/queue/read_ahead_kb = 4096

The following kernel parameters have also been modified via the sysctl tool to increase performance:

# sysctl -w vm.swappiness=0
# sysctl -w net.core.rmem_max=134217728
# sysctl -w net.core.wmem_max=134217728
# sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
# sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"
# sysctl -w net.core.netdev_max_backlog=300000
# sysctl -w net.ipv4.tcp_moderate_rcvbuf=1
# sysctl -w net.unix.max_dgram_qlen=128

These settings take effect immediately but do not persist across a reboot. To make them permanent, we add the following values to /etc/sysctl.conf:

# No swap
vm.swappiness = 0
# Tuning for 10GbE
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_moderate_rcvbuf = 1
# Update the number of datagrams that can be queued on a unix domain socket
net.unix.max_dgram_qlen = 128
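The values in /etc/sysctl.conf are read at boot; after editing the file they can also be reloaded immediately with standard sysctl usage (shown as a convenience, not part of the original procedure):

# sysctl -p /etc/sysctl.conf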
RozoFS® Configuration

As indicated above, the RozoFS® platform is configured with three separate storage volumes (one on SAS 7.2K HDDs, one on SAS 10K HDDs and one on SSDs), and one Rozo file system is created on each volume.

Storaged configuration files (/etc/rozofs/storage.conf) have been modified on each server as shown below.

/etc/rozofs/storage.conf on rx300-01:

threads = 8;
storio = "multiple";
listen = (
    { addr = "192.168.30.11"; port = 41001; },
    { addr = "192.168.40.12"; port = 41001; }
);
storages = (
    { cid = 1; sid = 1; root = "/srv/rozofs/storage-ssd"; },
    { cid = 2; sid = 1; root = "/srv/rozofs/storage-sas-10K"; },
    { cid = 3; sid = 1; root = "/srv/rozofs/storage-sas-7.2K"; }
);

/etc/rozofs/storage.conf on rx300-02:

threads = 8;
storio = "multiple";
listen = (
    { addr = "192.168.30.21"; port = 41001; },
    { addr = "192.168.40.22"; port = 41001; }
);
storages = (
    { cid = 1; sid = 2; root = "/srv/rozofs/storage-ssd"; },
    { cid = 2; sid = 2; root = "/srv/rozofs/storage-sas-10K"; },
    { cid = 3; sid = 2; root = "/srv/rozofs/storage-sas-7.2K"; }
);

/etc/rozofs/storage.conf on rx300-03:

threads = 8;
storio = "multiple";
listen = (
    { addr = "192.168.30.31"; port = 41001; },
    { addr = "192.168.40.32"; port = 41001; }
);
storages = (
    { cid = 1; sid = 3; root = "/srv/rozofs/storage-ssd"; },
    { cid = 2; sid = 3; root = "/srv/rozofs/storage-sas-10K"; },
    { cid = 3; sid = 3; root = "/srv/rozofs/storage-sas-7.2K"; }
);

/etc/rozofs/storage.conf on rx300-04:

threads = 8;
storio = "multiple";
listen = (
    { addr = "192.168.30.41"; port = 41001; },
    { addr = "192.168.40.42"; port = 41001; }
);
storages = (
    { cid = 1; sid = 4; root = "/srv/rozofs/storage-ssd"; },
    { cid = 2; sid = 4; root = "/srv/rozofs/storage-sas-10K"; },
    { cid = 3; sid = 4; root = "/srv/rozofs/storage-sas-7.2K"; }
);

Exportd configuration files (/etc/rozofs/export.conf) have been modified on rx300-01 and rx300-02 as shown below.

/etc/rozofs/export.conf on rx300-01 and rx300-02:

layout = 0;
volumes = (
    {
        vid = 1;
        cids = (
            {
                cid = 1;
                sids = (
                    { sid = 1; host = "1.1.0.81"; },
                    { sid = 2; host = "1.1.0.82"; },
                    { sid = 3; host = "1.1.0.83"; },
                    { sid = 4; host = "1.1.0.84"; }
                );
            }
        );
    },
    {
        vid = 2;
        cids = (
            {
                cid = 2;
                sids = (
                    { sid = 1; host = "1.1.0.81"; },
                    { sid = 2; host = "1.1.0.82"; },
                    { sid = 3; host = "1.1.0.83"; },
                    { sid = 4; host = "1.1.0.84"; }
                );
            }
        );
    },
    {
        vid = 3;
        cids = (
            {
                cid = 3;
                sids = (
                    { sid = 1; host = "1.1.0.81"; },
                    { sid = 2; host = "1.1.0.82"; },
                    { sid = 3; host = "1.1.0.83"; },
                    { sid = 4; host = "1.1.0.84"; }
                );
            }
        );
    }
);

exports = (
    { eid = 1; vid = 1; root = "/srv/rozofs/exports-ssd/export-ssd-storages-ssd"; md5 = ""; squota = ""; hquota = ""; },
    { eid = 2; vid = 2; root = "/srv/rozofs/exports-ssd/export-ssd-storages-sas-10K"; md5 = ""; squota = ""; hquota = ""; },
    { eid = 3; vid = 3; root = "/srv/rozofs/exports-ssd/export-ssd-storages-sas-7.2K"; md5 = ""; squota = ""; hquota = ""; }
);

The system configuration file (/etc/fstab) has been modified on each server as shown below to mount the RozoFS® file systems automatically:

rozofsmount /mnt/[email protected]/export-ssd-storages-ssd rozofs \
  exporthost=1.1.0.80,exportpath=/srv/rozofs/exports-ssd/export-ssd-storages-ssd,instance=0, \
  rozofsstoragetimeout=15,rozofsstorclitimeout=30,rozofsshaper=0,rozofsmaxwritepending=4, \
  rozofsminreadsize=4,noXattr,rozofsnbstorcli=2,_netdev 0 0

rozofsmount /mnt/[email protected]/export-ssd-storages-sas-10K rozofs \
  exporthost=1.1.0.80,exportpath=/srv/rozofs/exports-ssd/export-ssd-storages-sas-10K,instance=1, \
  rozofsstoragetimeout=15,rozofsstorclitimeout=30,rozofsshaper=0,rozofsmaxwritepending=4, \
  rozofsminreadsize=4,noXattr,rozofsnbstorcli=2,_netdev 0 0

rozofsmount /mnt/[email protected]/export-ssd-storages-sas-7.2K rozofs \
  exporthost=1.1.0.80,exportpath=/srv/rozofs/exports-ssd/export-ssd-storages-sas-7.2K,instance=2, \
  rozofsstoragetimeout=15,rozofsstorclitimeout=30,rozofsshaper=0,rozofsmaxwritepending=4, \
  rozofsminreadsize=4,noXattr,rozofsnbstorcli=2,_netdev 0 0
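With these entries in place, each file system can be brought up on a node with the standard mount command (a sketch; the original procedure does not spell out this step, and `mount -a` would equally pick up all /etc/fstab entries):

# mount /mnt/[email protected]/export-ssd-storages-ssd
# mount /mnt/[email protected]/export-ssd-storages-sas-10K
# mount /mnt/[email protected]/export-ssd-storages-sas-7.2K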
rozofsmount /mnt/[email protected]/export-ssd-storages-sas-7.2K rozofs \ exporthost=1.1.0.80,exportpath=/srv/rozofs/exports-ssd/export-ssd-storages-sas7.2K,instance=2, \ rozofsstoragetimeout=15,rozofsstorclitimeout=30,rozofsshaper=0,rozofsmaxwritepending=4, \ rozofsminreadsize=4,noXattr,rozofsnbstorcli=2,_netdev 0 0 RozoFS Performance in Sequential and Random IO Workload 18 Appendix B - Running the Benchmark A simple bash script is used to launch benchmarks. Parameters are: 1. test to run: see man iozone 2. mount point index: according to MOUNTPOINTS in the script 3. record size: see man iozone 4. fle size; see man iozone 5. number of nodes: how many nodes to perform bench on 6. number of thread by node: see man iozone #!/bin/bash TEST_TO_RUN=$1 MOUNTPOINT_IDX=$2 RECORD_SIZE=$3 FILE_SIZE=$4 NB_NODES=$5 NB_THREADS_BY_NODE=$6 IOZONE_BIN_PATH="/usr/bin/iozone" LOCAL_OUTPUT_PATH="iozone_test_`date "+%Y%m%d_%Hh%Mm%Ss"`" CONF_CLIENTS_PATH=${LOCAL_OUTPUT_PATH}/"clients_list_iozone" SPECIAL_ARGS="-R -c -+n -+u -C -w" SSH_USER="root" NB_STORAGE_NODES=4 FREE_PAGECACHE=1 IOZONE_STD_OUTPUT_PATH=${LOCAL_OUTPUT_PATH}/"std_output-"${NB_NODES}"-node-"$ {NB_THREADS_BY_NODE}"-th-"${RECORD_SIZE}"-rs-"${FILE_SIZE}"-fs-"${TEST_TO_RUN}"-test.txt" # set mount points according to platform MOUNTPOINTS=( "/mnt/[email protected]/export-ssd-storages-ssd" "/mnt/[email protected]/export-ssd-storages-sas-10K" "/mnt/[email protected]/export-ssd-storages-sas-7.2K" ) # Create directory for store configuration file and output files for this test [ ! -e ${LOCAL_OUTPUT_PATH} ] && mkdir ${LOCAL_OUTPUT_PATH} # Create clients file for iozone [ -e ${CONF_CLIENTS_PATH} ] && rm -f ${CONF_CLIENTS_PATH} for node in $(seq ${NB_NODES}); do for i in $(seq ${NB_THREADS_BY_NODE}); do printf "1.1.0.8"${node}'\t'${MOUNTPOINTS[${MOUNTPOINT_IDX}]}'\t'$ {IOZONE_BIN_PATH}'\n' >> ${CONF_CLIENTS_PATH} done; done; # Print iozone clients list file echo "Generate clients file (${CONF_CLIENTS_PATH}): " cat ${CONF_CLIENTS_PATH} if [ ${FREE_PAGECACHE} -eq 1 ]; then # Free pagecache on each storage server for node in $(seq ${NB_STORAGE_NODES}); do echo "Free pagecache on 1.1.0.8"${node} ssh ${SSH_USER}@"1.1.0.8"${node} "sync ; echo 1 > /proc/sys/vm/drop_caches ; sync" done; fi # Sleep time sleep 10 # Compute nb. of threads RozoFS Performance in Sequential and Random IO Workload 19 let NB_THREADS=${NB_NODES}*${NB_THREADS_BY_NODE} # Launch iozone test echo "Launch iozone test:" echo "Test run: "${TEST_TO_RUN} echo "Nb. of clients node: "${NB_NODES} echo "Nb. of thread(s)/node: "${NB_THREADS_BY_NODE} echo "Record size: "${RECORD_SIZE} echo "File size: "${FILE_SIZE} if [ ${TEST_TO_RUN} -eq 2 ]; then echo "Random test" # Give results in operations per second for random read/write test iozone -t ${NB_THREADS} -i ${TEST_TO_RUN} -r ${RECORD_SIZE} -s ${FILE_SIZE} -+m $ {CONF_CLIENTS_PATH} ${SPECIAL_ARGS} -O 2>&1 | tee ${IOZONE_STD_OUTPUT_PATH} else echo "Sequential test" iozone -t ${NB_THREADS} -i ${TEST_TO_RUN} -r ${RECORD_SIZE} -s ${FILE_SIZE} -+m $ {CONF_CLIENTS_PATH} ${SPECIAL_ARGS} 2>&1 | tee ${IOZONE_STD_OUTPUT_PATH} fi Examples: ● ● ● ● Random IO on a single node: ./iozone_test.sh 2 0 4k 10m 1 1 Random IO on multple nodes: ./iozone_test.sh 2 0 4k 10m 4 32 Sequental IO on a single node: ./iozone_test.sh 0 1 64k 100m 1 5 Sequental IO on multple nodes: ./iozone_test.sh 0 1 64k 100m 4 5 RozoFS Performance in Sequential and Random IO Workload 20