
Portland: A Scalable Fault-Tolerant
Layer 2 Data Center Network Fabric
Radhika Niranjan Mysore, Andreas Pamboris, Nathan
Farrington, Nelson Huang, Pardis Miri, Sivasankar
Radhakrishnan, Vikram Subramanya, and Amin Vahdat
Department of Computer Science and Engineering, University of California, San Diego
Background
• Emerging need for massive-scale data centers
• Various design elements are required to achieve high performance, scalability, and fault tolerance in such environments
Problems
• VM migration support in traditional DC networks is weak: migrating a VM changes its IP address, which breaks pre-existing TCP connections and creates administrative overhead for connection handover among VM hosts
• Switches need to be configured before deployment
• Inefficient communication between physically distant hosts
• Forwarding loops cause inefficiency or, worse, paralysis of the network
• Physical connectivity failures interfere with existing unicast and multicast sessions
At data center scale, many of these problems stem from the existing IP and Ethernet protocols
Solution
• Portland
– An Ethernet-compatible L2 protocol that addresses the issues above
A Fat Tree Network
• The network topology targeted in this paper
• A topology commonly used in DC networks (see the size sketch below)
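
As a rough reference for the scale of this topology, the sketch below computes switch and host counts for a k-ary fat tree from the standard construction (k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)² core switches); the function name is illustrative, not from the paper.

# Sketch: sizes of a k-ary fat tree (standard construction; k must be even).
def fat_tree_sizes(k: int) -> dict:
    assert k % 2 == 0, "k must be even"
    return {
        "pods": k,
        "edge_switches": k * (k // 2),          # k/2 edge switches per pod
        "aggregation_switches": k * (k // 2),   # k/2 aggregation switches per pod
        "core_switches": (k // 2) ** 2,
        "hosts": k ** 3 // 4,                   # k/2 hosts per edge switch
    }

print(fat_tree_sizes(48))   # 48-port switches -> 27,648 hosts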
Portland Design
Fabric Manager
• A user process running on a dedicated machine somewhere in the network, responsible for:
– Assisting with ARP resolution
– Fault tolerance
– Multicast
• Assumptions
– The location of the Fabric Manager is transparent to each of the switches in the network
– The Fabric Manager is a core function in Portland; therefore it is replicated for redundancy
Portland Design
Positional Pseudo MAC Address
• A virtual MAC address that encodes the location of the host in the network
• Encoded as pod.position.port.vmid (layout sketched below)
– pod = pod number of the edge switch
– position = position of the edge switch within the pod
– port = switch port number the host is connected to
– vmid = virtual machine identifier (incremented for each VM added on the host)
1. A host is connected to an edge switch
2. The edge switch creates an address mapping table (PMAC to actual MAC) within itself for further forwarding
3. The edge switch registers the newly added host with the fabric manager
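
As a concrete illustration of the PMAC layout, the sketch below packs and unpacks the 48-bit pod.position.port.vmid encoding (16, 8, 8, and 16 bits respectively, as described in the paper); the helper names are illustrative, not the paper's code.

# Sketch: packing/unpacking a 48-bit PMAC with the pod.position.port.vmid layout
# (16 + 8 + 8 + 16 bits). Helper names are illustrative.
def encode_pmac(pod: int, position: int, port: int, vmid: int) -> bytes:
    assert pod < 2**16 and position < 2**8 and port < 2**8 and vmid < 2**16
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return value.to_bytes(6, "big")             # a PMAC is an ordinary 48-bit MAC address

def decode_pmac(pmac: bytes) -> dict:
    value = int.from_bytes(pmac, "big")
    return {"pod": value >> 32,
            "position": (value >> 24) & 0xFF,
            "port": (value >> 16) & 0xFF,
            "vmid": value & 0xFFFF}

pmac = encode_pmac(pod=2, position=1, port=0, vmid=1)
print(":".join(f"{b:02x}" for b in pmac))       # 00:02:01:00:00:01
print(decode_pmac(pmac))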
Portland Design
Proxy-based ARP
• Ethernet by default broadcasts ARP requests to all hosts in the same L2 domain -> inefficient
• In Portland, edge switches intercept ARP requests and resolve them through the fabric manager (see the sketch below)
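
The sketch below illustrates the proxy-ARP flow: the edge switch intercepts an ARP request and asks the fabric manager for the destination's PMAC instead of broadcasting, falling back to a broadcast only if the fabric manager has no mapping. Class and method names are illustrative, not the paper's code.

# Sketch of proxy-based ARP: the edge switch resolves ARP via the fabric manager
# instead of broadcasting. Names are illustrative.
class FabricManager:
    def __init__(self):
        self.ip_to_pmac = {}                    # filled in as edge switches register hosts

    def register(self, ip: str, pmac: str) -> None:
        self.ip_to_pmac[ip] = pmac

    def resolve(self, ip: str):
        return self.ip_to_pmac.get(ip)          # None -> mapping unknown

class EdgeSwitch:
    def __init__(self, fm: FabricManager):
        self.fm = fm

    def handle_arp_request(self, target_ip: str):
        pmac = self.fm.resolve(target_ip)
        if pmac is not None:
            return ("arp-reply", target_ip, pmac)   # unicast reply with the PMAC
        return ("broadcast", target_ip, None)       # rare fallback: flood the request

fm = FabricManager()
fm.register("10.2.1.1", "00:02:01:00:00:01")
print(EdgeSwitch(fm).handle_arp_request("10.2.1.1"))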
Portland Design
Distributed Location Discovery
• All switches broadcast a Location Discovery Message (LDM) on all of their ports at a regular interval
• A switch that receives an LDM processes it in its LDP listener thread: a newly connected switch infers its current position in the network, while existing switches update their forwarding tables (see the sketch below)
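
A minimal sketch of the discovery logic follows: each switch records the LDMs heard on its ports and infers its level from them (hosts never send LDMs, so silent ports imply an edge switch). Timeouts and the pod-number assignment performed by the fabric manager are omitted; all names are illustrative.

# Minimal sketch of Location Discovery: switches exchange Location Discovery Messages
# (LDMs) on every port and infer their own level from what they hear.
from dataclasses import dataclass
from typing import Optional

@dataclass
class LDM:
    switch_id: str
    level: Optional[int]        # 0 = edge, 1 = aggregation, 2 = core, None = unknown

class Switch:
    def __init__(self, switch_id: str, num_ports: int):
        self.switch_id = switch_id
        self.num_ports = num_ports
        self.level: Optional[int] = None
        self.heard: dict = {}                   # port -> most recent LDM received on it

    def make_ldm(self) -> LDM:
        return LDM(self.switch_id, self.level)  # sent out of every port each interval

    def on_ldm(self, port: int, ldm: LDM) -> None:
        self.heard[port] = ldm                  # LDP listener: record neighbor announcements

    def infer_level(self) -> None:
        """Run after a discovery interval has elapsed (timeouts omitted)."""
        if self.level is not None or not self.heard:
            return
        if len(self.heard) < self.num_ports:
            self.level = 0                      # hosts send no LDMs, so silent ports => edge
        elif any(l.level == 0 for l in self.heard.values()):
            self.level = 1                      # connected to edge switches => aggregation
        elif all(l.level == 1 for l in self.heard.values()):
            self.level = 2                      # only aggregation neighbors => core

edge = Switch("edge0", num_ports=4)
edge.on_ldm(0, LDM("agg0", level=None))         # LDM heard on only one of four ports
edge.infer_level()
print(edge.level)                               # 0 -> edge switch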
Portland Design
Unicast Fault Tolerant Routing
1. Link failure detection
2. The switch informs the Fabric Manager
3. The Fabric Manager updates the per-link connectivity matrix
4. The Fabric Manager informs all switches about the link failure (see the sketch below)
Communication overhead for failure detection: traditional routing protocols O(n²) vs. Portland O(n)
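
The sketch below walks through steps 1-4: a switch that stops hearing LDMs on a port reports the failed link to the fabric manager, which updates its connectivity matrix and notifies the switches, giving the O(n) message cost noted above. Names are illustrative, not the paper's code.

# Sketch of the fault notification flow (steps 1-4): one report to the fabric manager,
# then one update per switch, i.e. O(n) messages in total.
class FabricManager:
    def __init__(self, switches):
        self.switches = switches
        self.link_status = {}                   # per-link connectivity matrix: link -> "up"/"down"

    def report_failure(self, link) -> None:
        self.link_status[link] = "down"         # step 3: update the connectivity matrix
        for sw in self.switches:                # step 4: inform the switches
            sw.on_link_failure(link)

class Switch:
    def __init__(self, name: str):
        self.name = name
        self.fm = None
        self.down_links = set()

    def ldm_timeout(self, link) -> None:
        """Step 1: no LDM heard on this link within the timeout -> treat it as failed."""
        self.fm.report_failure(link)            # step 2: inform the fabric manager

    def on_link_failure(self, link) -> None:
        self.down_links.add(link)               # reroute around the failed link

switches = [Switch(f"sw{i}") for i in range(4)]
fm = FabricManager(switches)
for sw in switches:
    sw.fm = fm
switches[0].ldm_timeout(("sw0", "agg1"))
print(fm.link_status)                           # {('sw0', 'agg1'): 'down'}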
Implementation
• HW
– Switch * 20: 4-port NetFPGA PCI-card switches with a Xilinx FPGA for hardware extensions (1U dual-core 3.2 GHz Intel Xeon machines with 3GB RAM)
– OpenFlow used as the switch configuration software, with a 32-entry TCAM and a 32K-entry SRAM for flow table entries
– End host * 16: 1U quad-core 2.13GHz Intel Xeon machines with 3GB of RAM running Linux 2.6.18-92.1.18.el5
System architecture
[System architecture figure: replicated and synchronized Fabric Managers (FM) with a communication module, connected over a dedicated Fabric Manager network, controlling OpenFlow switches in the DC network via the OpenFlow protocol]
Evaluation
Convergence Time
• Convergence time with increasing faults
• Multicast convergence
• TCP convergence
Evaluation
Scalability
• Fabric manager control traffic
• CPU requirements for ARP requests
Conclusion
• Redundancy (replication) of the Fabric Manager