High Performance Parallel File Access
via Standard NFS v3
Kent Ritchie
Senior Systems Engineer
AVERE SYSTEMS, INC
910 River Ave
Pittsburgh PA 15212
averesystems.com
Why NFS v3?
• NFS was made for sharing
• Isn’t pNFS already available?
• NFS v3 is mature
• NFS v3 is ubiquitous
• NFS v3 is efficient
• NFS v3 can be high performance
Avere Edge-Core Architecture

[Diagram: clients and servers on the LAN mount the Avere Edge Filer, which connects across the LAN/WAN to one or more Core Filers (e.g. NetApp, Isilon, BlueArc). The Edge Filer adds performance: low-latency read, write & metadata ops.]

Edge Filer
• Performance optimized
• Low latency access from fast media (RAM, SSD, SAS)
• Accelerate read, write & metadata operations via coherent global cache
• Linear performance scaling through clustering
• Parallel file access via a token manager for cluster-wide file locking
Core Filer
• Capacity optimized
• SATA disks provide lowest cost and greatest density
• Any 3rd-party NFS v3 server (e.g. NetApp, Isilon, BlueArc, Nexenta, Solaris, Linux, etc.)
• Heterogeneous vendors in the same cluster
• Responsible for data protection & compliance functions
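
Note: the sketch below (Python, hypothetical names; not Avere's code) illustrates the read path this architecture implies. Hits are served from the Edge Filer's cache on fast media; only misses travel over standard NFS v3 to the Core Filer.

    class CoreFiler:
        """Any third-party NFS v3 server, modeled here as a dict of blocks."""
        def __init__(self, blocks):
            self.blocks = blocks              # {(file_id, block_no): bytes}
        def read(self, file_id, block_no):
            return self.blocks[(file_id, block_no)]

    class EdgeFiler:
        def __init__(self, core):
            self.core = core
            self.cache = {}                   # coherent cache on fast media
        def read(self, file_id, block_no):
            key = (file_id, block_no)
            if key not in self.cache:         # miss: one trip to the Core Filer
                self.cache[key] = self.core.read(file_id, block_no)
            return self.cache[key]            # hit: served at cache latency

    edge = EdgeFiler(CoreFiler({("f", 0): b"16KB block"}))
    edge.read("f", 0)                         # first read fills the cache
    edge.read("f", 0)                         # repeat reads never leave the edge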
Coherent Global Cache

• Block-level caching
  – Efficiently uses cache resources
  – Promote on access
  – Demote based on LRU/LFU
• Write caching
  – NVRAM for performance
  – Mirroring provides N+1 HA
  – SAS for aggregated writes
  – Asynchronous write-back, scheduled to maintain a copy on the Core filer
• Performance algorithms
  – Pre-fetch for sequential IO
  – DNLC for large directories
  – Negative caching accelerates lookups of non-existent files
  – Op forwarding

[Diagram: Nodes 1 … N each tier cached blocks across RAM, SSD, and SAS. A read of block 8 is served from RAM; a write of block 4 lands in NVRAM and is HA-mirrored to a peer node (4'); cooling blocks demote from RAM to SSD to SAS and promote back on access; dirty block 4 is asynchronously written back to the SATA-based Core Filer. Example with three files, each composed of multiple 16KB blocks.]
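
A minimal sketch (Python) of the promote/demote behavior described above, using plain LRU where the real cache also weighs access frequency, and omitting NVRAM mirroring and write-back. Names and capacities are invented for illustration.

    from collections import OrderedDict

    class Tier:
        def __init__(self, name, capacity, lower=None):
            self.name, self.capacity, self.lower = name, capacity, lower
            self.blocks = OrderedDict()       # ordering tracks recency (LRU)

        def put(self, key, data):
            self.blocks[key] = data
            self.blocks.move_to_end(key)      # mark most recently used
            if len(self.blocks) > self.capacity:
                victim, vdata = self.blocks.popitem(last=False)
                if self.lower is not None:    # demote coldest block one tier down
                    self.lower.put(victim, vdata)

    # RAM sits above SSD, SSD above SAS; a block accessed again is re-inserted
    # at the top tier (promotion), and cold blocks cascade downward.
    sas = Tier("SAS", capacity=1000)
    ssd = Tier("SSD", capacity=100, lower=sas)
    ram = Tier("RAM", capacity=10, lower=ssd)
    for block_no in range(25):
        ram.put(("file", block_no), b"...")   # oldest 15 blocks end up on SSD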
Parallel File Access

• Clustering
  – Coherent, distributed cache
  – Global access to data
  – Load balanced across nodes
• Globally shared media
  – Cache-2-cache transactions minimize Core filer traffic
  – Cache size scales linearly with FXTs
• Port Affinity
  – Blocks migrate to the node with most frequent access
  – Minimizes inter-node traffic
• Striping
  – Parallel storage of large files
• Replication
  – Parallel access to hot blocks

[Diagram: clients issue Read (3) against different cluster nodes (Node 1 … Node N); block 3 is promoted once from the Core Filer, then handed between nodes cache-2-cache so repeat reads never touch the Core Filer. Example with three files, each composed of multiple 16KB blocks.]
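
A sketch (Python, assumed names) of how a clustered read prefers a cache-2-cache copy from the owning peer over a trip to the Core Filer. The hash placement here is a stand-in for the real token manager and port-affinity migration.

    class Node:
        def __init__(self):
            self.cache = {}                   # this node's slice of the global cache

    def owner(nodes, key):
        # deterministic placement; the real cluster also migrates blocks toward
        # the node that touches them most often (port affinity)
        return nodes[hash(key) % len(nodes)]

    def read(node, nodes, core, file_id, block_no):
        key = (file_id, block_no)
        if key in node.cache:                 # local hit
            return node.cache[key]
        peer = owner(nodes, key)
        if key in peer.cache:                 # cache-2-cache transfer
            data = peer.cache[key]
        else:                                 # only a cold miss reaches the Core Filer
            data = core[key]
            peer.cache[key] = data
        node.cache[key] = data                # block migrates to the accessing node
        return data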
Parallel File Access (continued)

[Diagram: the same cluster under a hot-block workload. Clients on every node issue Read (1); block 1 is replicated cache-2-cache across Nodes 1 … N so all N nodes serve it in parallel, while blocks 2, 3, … N remain distributed across the cluster. Example with three files, each composed of multiple 16KB blocks.]
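
The replication step can be sketched the same way (Python; the threshold is invented for illustration): once one node sees enough reads of a block, copies are pushed to every peer so all N nodes can serve it in parallel.

    HOT_THRESHOLD = 100                       # reads seen at one node (assumed value)

    class Node:
        def __init__(self):
            self.cache, self.reads = {}, {}

    def record_read(node, nodes, key, data):
        node.reads[key] = node.reads.get(key, 0) + 1
        if node.reads[key] == HOT_THRESHOLD:  # block just became hot
            for peer in nodes:                # push copies cache-2-cache so every
                peer.cache.setdefault(key, data)  # node serves the block locally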
Scale Performance, not Capacity
Comparing 1,000,000 IOPS Solutions*

[Chart: $/IOPS at roughly 1M IOPS: Avere $2.3, NetApp $5.1, EMC Isilon $10.7]

Product Config                                          Throughput (IOPS)  Latency/ORT (ms)  List Price   $/IOPS  Disk Quantity  Rack Units  Cabinets
Avere FXT 3800 (32-node cluster, cloud storage config)  1,592,334          1.24              $3,637,500   $2.3    549            76          1.8
NetApp FAS 6240 (24-node cluster)                       1,512,784          1.53              $7,666,000   $5.1    1728           436         12
EMC Isilon S200 (140-node cluster)                      1,112,705          2.54              $11,903,540  $10.7   3360           288         7

*Comparing the top SPEC SFS results for a single NFS file system/namespace (as of 08Apr2013). See www.spec.org/sfs2008 for more information.
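
The $/IOPS column is simply list price divided by SPEC SFS throughput; a quick check in Python reproduces the figures in the table:

    systems = {
        "Avere FXT 3800":  (3_637_500, 1_592_334),
        "NetApp FAS 6240": (7_666_000, 1_512_784),
        "EMC Isilon S200": (11_903_540, 1_112_705),
    }
    for name, (price, iops) in systems.items():
        print(f"{name}: ${price / iops:.1f} / IOPS")
    # -> Avere $2.3, NetApp $5.1, EMC Isilon $10.7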
Additional Benefits

• Global Namespace
  – Single mount point
  – Logical path unchanged
  – Simplify namespace management *and* accelerate and scale performance
• FlashMove™
  – Non-disruptively move exports (e.g. /src) between Core Filers
• FlashMirror™
  – Mirror write data to two locations for disaster recovery
• FlashCloud™
  – Enable Parallel File Access on top of Object Stores
  – Support for AWS Glacier

[Diagram: clients mount a single Global Namespace (/ with /sales, /eng, /finance, /support and subtrees such as /src, /cust, /mech, /pipe, /staff, /fy2012, /hw, /sw) on the Avere FXT Edge Filer. The tree maps onto data-center Core Filer exports (NetApp:/cust, Isilon:/mech, BlueArc:/src, AWS S3 object storage, AWS Glacier); FlashMove relocates /src between Core Filers, and FlashMirror replicates /cust over the WAN to a remote site (/cust').]
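
A sketch (Python, hypothetical junction table) of the idea behind the Global Namespace and FlashMove: clients resolve one logical tree, and a junction can be repointed to a different Core Filer without changing the client-visible path.

    junctions = {
        "/eng/src":    ("BlueArc", "/src"),
        "/sales/cust": ("NetApp",  "/cust"),
        "/eng/mech":   ("Isilon",  "/mech"),
    }

    def resolve(logical_path):
        # longest-prefix match of the client path against the junction table
        for prefix in sorted(junctions, key=len, reverse=True):
            if logical_path == prefix or logical_path.startswith(prefix + "/"):
                core, export = junctions[prefix]
                return core, export + logical_path[len(prefix):]
        raise FileNotFoundError(logical_path)

    print(resolve("/eng/src/main.c"))            # ('BlueArc', '/src/main.c')
    junctions["/eng/src"] = ("NetApp", "/src")   # FlashMove: repoint the junction
    print(resolve("/eng/src/main.c"))            # same logical path, new Core Filer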
Avere GUI – Powerful Analytics

• Visibility into entire NAS environment
  – Current and historical IOPS, GB/s, and latency
  – Drill down on specific op types, clients, FXT nodes, and Core filers
  – Hot client and hot file listing

Figure 1: Consistent low latency delivered to clients, hiding latency spikes on Core filer
Figure 2: Hot clients sorted by activity (total ops) or rate (ops/sec over last 10 seconds)
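
The per-client rate in Figure 2 is a sliding-window computation; a minimal sketch in Python (assumed structure, not the GUI's code):

    import time
    from collections import defaultdict, deque

    WINDOW = 10.0                              # seconds, as in Figure 2
    ops = defaultdict(deque)                   # client -> timestamps of its ops

    def record_op(client):
        ops[client].append(time.monotonic())

    def hot_clients():
        now = time.monotonic()
        rates = {}
        for client, stamps in ops.items():
            while stamps and now - stamps[0] > WINDOW:
                stamps.popleft()               # drop ops outside the window
            rates[client] = len(stamps) / WINDOW
        return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)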
Thank You!
AVERE SYSTEMS, INC
5000 McKnight Road, Suite 404
Pittsburgh, PA 15237
(412) 635-7170
averesystems.com