Improving Transport Design for WARP SDR Deployments

Krishna C. Garikipati, Kang G. Shin
SRIF '14, EECS
Real Time Computing Laboratory
Outline Ø Introduction
Ø  Problem of Latency
Ø  WARP Transport
Ø  Proposed Design
Ø  Evaluation
Ø  Extensions
Software Defined Radios
• Flexible radios that implement most of the hardware functions in software
• Indispensable for wireless research
[Image: NI FlexRIO Software Defined Radio Bundle]
SDR Elements
Processing
•  Host (GPP), FPGA, DSP
•  Signal processing library
Radio
•  Antennas
•  ADC/DACs, Transceivers
Image Credit: GNURadio
Other
•  Clocking, Transport, Memory, User I/O, Debug …
Image Credit: WARP
SDR Architecture
GPP-based
• Separate dedicated GPP for centralized processing
• Split functionality
• Requires external transport of radio samples
• Scalable
• E.g., USRP and GNURadio, WARP and WARPLab, SORA
Non-GPP based
• On-board dedicated or soft processor (standalone)
• Integrated
• Transport occurs through an internal bus
• Not scalable
• E.g., USRP embedded series, WARP 802.11 reference designs
SDR Architecture
[Figure: GPP/host machine (TX baseband processing of IQ samples) → write transport frames over PCIe/Ethernet/USB → FPGA board (ADC/DACs, converters; upconversion of the baseband signal) → transmitter → wireless channel → receiver → FPGA board (ADC/DACs, converters; downconversion to baseband) → read transport frames over PCIe/Ethernet/USB → GPP/host machine (RX baseband processing of IQ samples)]
SDR Example
WARP
• A popular SDR platform for wireless research and prototyping
• Self-contained with custom hardware and software designs (http://warpproject.org/)
WARP hardware
• Virtex-6 FPGA with 2 dual-band RF transceivers with a maximum bandwidth of 40 MHz
• 12-bit ADC/DACs, Gigabit Ethernet ports, shared clocking, extensions
[Image: WARP v3 node]
SDR Example
WARPLab
• Flexible MATLAB-based framework for developing wireless applications with large arrays of WARP nodes
• Supports rapid implementation of the PHY layer
• Utilizes WARPLab FPGA reference hardware designs and reference code (C and MATLAB) for software design
[Figure: WARPLab architecture. Host processor: MATLAB running WARPLab reference M-code over an Ethernet UDP transport. WARP hardware: a MicroBlaze running WARPLab reference C-code with an Ethernet driver, connected over a bus to the FPGA IP cores (WARPLab buffers, AGC).]
SDR Example
WARPLab library
Image Credit: WARP
• MATLAB commands for configuring WARP nodes
• Library modules: each is paired with a hardware and software design that runs on the node
• The transport module is responsible for message exchanges between the host and WARP nodes over Ethernet
Outline: Problem of Latency
Transport Latency
Definition
• For a given number of samples, the delay in reading radio samples from the hardware into host memory (userspace), or in writing them from host memory (userspace) to the hardware
[Figure: Timeline of read function. readIQ() in userspace issues a request() to the target hardware; the node performs DMA transfer + serialization; the samples cross the Ethernet link and are received with recvfrom(); userspace then performs floating-point conversion.]
[Figure: Timeline of write function. writeIQ() performs fixed-point conversion in userspace and sends the samples with sendto(); they cross the Ethernet link; the target hardware performs DMA transfer + serialization and returns an ack.]
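Both timelines include a conversion between the host's floating-point samples and the 16-bit fixed-point samples held on the node. A minimal sketch of that step follows, assuming full scale maps to [-1, 1) (value / 32768); the exact scaling and rounding used by WARPLab and CWARP are not given in these slides, and the function names are hypothetical.

```c
#include <complex.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed scaling: 16-bit value v <-> v / 32768.0, i.e. full scale is [-1, 1). */
#define FULL_SCALE 32768.0

/* Read path: 16-bit fixed-point I/Q -> double complex (floating-point conversion). */
void fixed_to_float(const int16_t *i_in, const int16_t *q_in,
                    double complex *out, size_t n)
{
    for (size_t k = 0; k < n; k++)
        out[k] = i_in[k] / FULL_SCALE + (q_in[k] / FULL_SCALE) * I;
}

/* Round half away from zero and saturate to the int16_t range. */
static int16_t to_fixed16(double x)
{
    double v = x * FULL_SCALE;
    v = (v >= 0.0) ? v + 0.5 : v - 0.5;
    if (v > INT16_MAX) v = INT16_MAX;
    if (v < INT16_MIN) v = INT16_MIN;
    return (int16_t)v;   /* cast truncates toward zero, completing the rounding */
}

/* Write path: double complex -> 16-bit fixed-point I/Q (fixed-point conversion). */
void float_to_fixed(const double complex *in, int16_t *i_out, int16_t *q_out,
                    size_t n)
{
    for (size_t k = 0; k < n; k++) {
        i_out[k] = to_fixed16(creal(in[k]));
        q_out[k] = to_fixed16(cimag(in[k]));
    }
}
```

Saturation matters on the write path: a sample of exactly 1.0 cannot be represented in 16 bits under this scaling and is clipped to 32767.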
Problem: Large WARP Deployments
MIMO Technology
• Large number of antennas (SDRs)
• Centralized processing in CoMP, Massive MIMO, etc.
Other Applications
• Indoor localization of wireless signals using an antenna array
[Images: Argos (64 antennas), 2012; antenna array]
WARPLab's transport latency increases linearly with the number of nodes, which makes it unsuitable for large deployments.
Problem: Strict Deadlines
Processing time in SDRs
• GPP processing time (RX and TX) + transport latency (receive and send)
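The processing-time budget above can be checked against a protocol deadline with a few lines of C. This is a minimal sketch: the struct and function names are hypothetical, and the numbers in the usage note are made up for illustration, not measurements.

```c
#include <stdbool.h>

/* Hypothetical per-iteration timing budget, all values in microseconds. */
typedef struct {
    double gpp_rx_us;   /* GPP receive-side processing */
    double gpp_tx_us;   /* GPP transmit-side processing */
    double read_us;     /* transport latency: reading samples from the nodes */
    double write_us;    /* transport latency: writing samples to the nodes */
} sdr_budget;

/* Processing time = GPP processing (RX + TX) + transport latency (receive + send). */
double sdr_total_us(const sdr_budget *b)
{
    return b->gpp_rx_us + b->gpp_tx_us + b->read_us + b->write_us;
}

/* True if one full RX/TX iteration finishes within the protocol deadline. */
bool meets_deadline(const sdr_budget *b, double deadline_us)
{
    return sdr_total_us(b) <= deadline_us;
}
```

For example, a hypothetical loop with 500 µs + 400 µs of GPP processing and 1000 µs + 900 µs of transport latency totals 2.8 ms: inside a ~3 ms LTE turnaround, far outside a 16 µs WiFi deadline. In WARPLab the two transport terms grow linearly with the number of nodes, so they consume more of the budget as deployments scale.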
Protocol requirements
• WiFi processing deadline < 16 µs
• LTE turnaround deadline ~3 ms
• Mobile channel measurements < 10 ms
Holy Grail: meeting protocol deadlines in large SDR setups
Objective
ü  Improve transport performance of large SDR
deployments using WARPLab reference design ü  Explore the implementation of accurate channel
measurements and practical wireless systems such
as LTE Only transport: SDR Acceleration on GPP is a whole different story !
Outline: WARP Transport
WARP Transport
Buffers
• 16-bit I and 16-bit Q samples
• Single buffer per RF chain
• Max buffer size = 32K samples (128 KB)
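The buffer size follows from 4 bytes per sample (16-bit I + 16-bit Q): 32,768 × 4 B = 131,072 B = 128 KB. The sketch below also counts how many fixed-size packets one buffer needs. It assumes every payload byte carries IQ data, ignoring any per-packet command or transport fields; the helper names are hypothetical.

```c
#include <stddef.h>

/* Each sample carries a 16-bit I and a 16-bit Q component. */
#define BYTES_PER_SAMPLE 4u

/* Bytes occupied by num_samples IQ samples. */
size_t buffer_bytes(size_t num_samples)
{
    return num_samples * BYTES_PER_SAMPLE;
}

/* Packets needed to move the buffer, assuming the whole payload_bytes of
   each packet is IQ data (an assumption; real packets also carry headers). */
size_t packets_per_buffer(size_t num_samples, size_t payload_bytes)
{
    size_t bytes = buffer_bytes(num_samples);
    return (bytes + payload_bytes - 1) / payload_bytes;   /* ceiling division */
}
```

Taking the 8960 B and 1464 B sizes from the multi-node benchmarks as IQ payloads, a full 32K buffer needs 15 and 90 packets respectively, which is one reason jumbo frames reduce per-packet overhead.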
UDP Protocol
• Fixed-size packets (non-streaming)
• UDP sockets for maximum speed
• Sequence numbers, checksums, and acks for reliable transfer
• Provision for timeouts
[Figure: baseband sampling → buffer → DMA transfer → Ethernet frames carrying transport, IP, and Ethernet headers]
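Sequence numbers and acks let the host detect lost UDP packets and ask for them again after a timeout. The sketch below shows only the receive-side bookkeeping: a per-transfer table of which sequence numbers have arrived. It is illustrative, not WARPLab's packet format or retransmission policy; MAX_PKTS and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PKTS 256   /* hypothetical cap on packets per buffer transfer */

/* Tracks which sequence numbers of one buffer transfer have arrived. */
typedef struct {
    size_t expected;          /* packets in this transfer (capped at MAX_PKTS) */
    bool   seen[MAX_PKTS];    /* seen[i] becomes true once packet i arrives */
} seq_tracker;

void tracker_init(seq_tracker *t, size_t expected)
{
    /* Transfers longer than MAX_PKTS are not handled by this sketch. */
    t->expected = expected < MAX_PKTS ? expected : MAX_PKTS;
    memset(t->seen, 0, sizeof t->seen);
}

/* Record an arriving packet; duplicates are harmless, out-of-range numbers ignored. */
void tracker_mark(seq_tracker *t, size_t seq)
{
    if (seq < t->expected)
        t->seen[seq] = true;
}

/* Copy up to max_out missing sequence numbers into out (candidates for a
   retransmission request after a timeout); returns how many were copied. */
size_t tracker_missing(const seq_tracker *t, size_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < t->expected && n < max_out; i++)
        if (!t->seen[i])
            out[n++] = i;
    return n;
}
```

After a timeout, the host would hand the list from tracker_missing() to whatever retransmission request the protocol provides; an empty list means the buffer is complete and can be acknowledged.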
WARP Transport
Ethernet
• Link speed = 1 Gbps (2x Ethernet ports)
• Xilnet library with DMA
• Support for jumbo frames (9 KB)
Transport code
• WARPLab reference M-code
• MEX implementation of UDP transport
• Single-threaded instantiation
• Sequential read/write of buffers
WARP Transport Testbed
•  16x WARPv3 boards
•  HP ProCurve 6600 Switch (48x1GbE,
4x10GbE)
•  32-core Intel(R) Xeon(R) E5-2660 CPU
(HT enabled), 128GB RAM
•  Dual-port 10GbE card
•  WARPLab 7.4, MATLAB 2012b
•  Ubuntu 12.04 LTS
WARP Transport
Single-node benchmarks (32K samples)
• Max. theoretical transfer rate on a 1 Gbps link = 31.25 Msps
Function | Packet size (bytes) | Line throughput (Kpps) | Line throughput (Mbps) | Calls per second | Transfer rate (Msps)
Read     | 1508                | 30.83                  | 373.2                  | 193.3            | 6.3
Read     | 9004                | 13.57                  | 972.8                  | 314.2            | 10.3
Write    | 1508                | 9.8                    | 118.4                  | 71.1             | 2.3
Write    | 9004                | 13.67                  | 979.9                  | 336.7            | 11.0
• The 9004-byte rows saturate the line rate (~1 Gbps)
• Transfer rates stay below the maximum rate due to overheads
• Transport is the processing bottleneck in WARPLab!
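The maximum rate and the transfer-rate column follow from simple arithmetic, assuming 32 bits per sample and 32K = 32,768 samples per call: 1 Gbps / 32 bits = 31.25 Msps, and, e.g., 193.3 calls/s × 32,768 samples ≈ 6.3 Msps. The helper names below are hypothetical.

```c
/* Each IQ sample is 32 bits (16-bit I + 16-bit Q). */
#define BITS_PER_SAMPLE 32.0

/* Upper bound on the sample rate over a link, ignoring all header overhead. */
double max_rate_msps(double link_gbps)
{
    return link_gbps * 1e3 / BITS_PER_SAMPLE;   /* Gbps -> Mbps, then per-sample bits */
}

/* Effective transfer rate when each call moves samples_per_call samples. */
double transfer_rate_msps(double calls_per_sec, double samples_per_call)
{
    return calls_per_sec * samples_per_call / 1e6;
}
```

max_rate_msps(1.0) gives 31.25, and transfer_rate_msps(336.7, 32768) gives about 11.0, the best write row in the table.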
WARP Transport
Multiple-node benchmarks
• Total read latency averaged over 10^3 runs (negligible variance)
[Plot: total read latency (ms, 0 to 80) vs. number of WARP nodes (2 to 16) for packet size/buffer size 1464B/32K, 1464B/16K, 8960B/32K, 8960B/16K; latency increases linearly with the number of nodes]
WARP Transport
Multiple-node benchmarks
• Total write latency averaged over 10^3 runs
[Plot: total write latency (ms, 0 to 250) vs. number of WARP nodes (2 to 16) for packet size/buffer size 1464B/32K, 1464B/16K, 8960B/32K, 8960B/16K; latency increases linearly with the number of nodes]
Outline: Proposed Design
Proposed Design
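Code Refactoring
• Standard C/C++ instead of MEX
• Standalone WARP driver
• Improved interface for further modifications
• Extensible to other WARPLab modules
Transport functions
```c
#include <complex.h>

/* Initialize the WARP nodes and their sockets (node_sock: array of length numNodes). */
void nodes_initialize(int *node_sock, int numNodes);

/* Read num_samples IQ samples, starting at start_sample, from buffer buffer_id of node node_id. */
void readIQ(double complex *samples, int start_sample, int num_samples,
            int node_sock, int node_id, int buffer_id, int host_id);

/* Write num_samples IQ samples, starting at start_sample, to buffer buffer_id of node node_id. */
void writeIQ(double complex *samples, int start_sample, int num_samples,
             int node_sock, int node_id, int buffer_id, int host_id);

/* Send a trigger to the nodes. */
void sendTrigger(void);

/* Disable the nodes (node_sock: array of length numNodes). */
void nodes_disable(int *node_sock, int numNodes);
```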
Proposed Design
Transport Parallelism
• Read/write calls are independent
• Multi-threaded implementation
• Utilize the multi-core processor
• OpenMP API (C/C++ extensions)
[Figure: code organization. Applications (RX processing, TX processing, latency measurement) call multiRead() and multiWrite(), which build on readIQ(), writeIQ(), sendTrigger(), nodesInit(), and nodesDis() in warp_functions.c and warp_transport.c, on top of the WARPLab UDP transport.]
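Because read calls to different nodes are independent, a multi-node read can be a single OpenMP parallel loop. The sketch below is hypothetical: multi_read_sketch() is not CWARP's multiRead(), and read_node_stub() stands in for a real per-node readIQ() so the loop can run without hardware. Compile with OpenMP enabled (e.g. gcc -fopenmp) to get parallel execution; without it the pragma is ignored and the loop runs sequentially with the same result.

```c
#include <complex.h>

/* Stand-in for a per-node readIQ(): a real version would request num_samples
   samples from the node over UDP. Here it writes the node id into every
   sample so the result of the parallel loop can be checked. */
static void read_node_stub(double complex *samples, int num_samples, int node_id)
{
    for (int k = 0; k < num_samples; k++)
        samples[k] = (double)node_id;
}

/* Hypothetical multi-node read: one loop iteration per node, spread across
   threads. Iteration n writes only buffers[n], so no locking is needed. */
void multi_read_sketch(double complex **buffers, int num_nodes, int num_samples)
{
    #pragma omp parallel for schedule(static)
    for (int n = 0; n < num_nodes; n++)
        read_node_stub(buffers[n], num_samples, n);
}
```

The transport prototypes pass a node_sock on every call, which fits this design: each thread can work on its own node's socket instead of contending for a shared one.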
Proposed Design
Network Design
• Support the combined transfer rate of multiple nodes
• High-capacity link at the host: 10 Gbps
• Switch is 1GbE/10GbE compliant
• Suitable for up to 10 WARP nodes (each node at line rate)
[Figure: host processor connected over a 10 Gbps link to a 1-GbE/10-GbE switch; WARP nodes connected to the switch over 1 Gbps links]
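The "up to 10 nodes" figure is link arithmetic: ten 1 Gbps nodes at line rate exactly fill one 10 Gbps host link. The sketch below turns this into an idealized lower bound on the duration of a parallel transfer, assuming 32-bit samples and ignoring headers, protocol turnarounds, and queuing. It is a back-of-the-envelope model, not a measured result, and the function names are hypothetical.

```c
/* Nodes that can all run at full line rate before the host link(s) saturate. */
int nodes_at_line_rate(int host_links, double host_link_gbps, double node_link_gbps)
{
    return (int)(host_links * host_link_gbps / node_link_gbps);
}

/* Idealized lower bound (ms) on moving num_samples 32-bit samples from each of
   num_nodes nodes in parallel: limited by the slower of one node's link and
   the shared host link(s). Headers and queuing are ignored. */
double parallel_transfer_ms(int num_nodes, double num_samples, int host_links,
                            double host_link_gbps, double node_link_gbps)
{
    double bits_per_node = num_samples * 32.0;
    double t_node = bits_per_node / (node_link_gbps * 1e9);              /* seconds */
    double t_host = num_nodes * bits_per_node / (host_links * host_link_gbps * 1e9);
    return 1e3 * (t_node > t_host ? t_node : t_host);
}
```

For 16 nodes × 32K samples, the bound is about 1.68 ms with one 10 Gbps host link and about 1.05 ms with two, where each node's 1 Gbps link becomes the limit.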
Proposed Design
Beyond 10 nodes
• Additional 10 Gbps link at the host
• Reduces queuing (congestion) delay
• Static routing between the two links for load balancing (the host has two separate IP addresses)
[Figure: host processor connected over 10 Gbps links to a 1-GbE/10-GbE switch; WARP nodes connected to the switch over 1 Gbps links]
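One simple way to split nodes statically across the two host addresses is round-robin by node index, so each 10 Gbps link carries about half of the traffic. The slides do not say how CWARP assigns nodes to links; the mapping rule and the IP addresses below are hypothetical.

```c
#include <stddef.h>

#define NUM_HOST_LINKS 2

/* Hypothetical host addresses, one per 10 Gbps interface. */
static const char *const HOST_IPS[NUM_HOST_LINKS] = { "10.0.0.250", "10.0.1.250" };

/* Static round-robin split: node n uses host link n % num_links, so with two
   links even-numbered nodes go to the first address and odd-numbered nodes
   to the second. */
size_t host_link_for_node(size_t node_id, size_t num_links)
{
    return node_id % num_links;
}

/* Host address a given node should exchange packets with. */
const char *host_ip_for_node(size_t node_id)
{
    return HOST_IPS[host_link_for_node(node_id, NUM_HOST_LINKS)];
}
```

A static split needs no per-packet decisions on the host, at the cost of not adapting if some nodes carry more traffic than others.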
Proposed Design
CWARP: https://github.com/gkchai/cwarp
Outline: Evaluation
Evaluation
Comparison with WARPLab
[Plot: total duration (ms, 0 to 50) vs. number of WARP nodes (2 to 16) for M write and M read (WARPLab) and C write and C read (CWARP), showing the latency reduction]
Evaluation
32K samples
[Plot: total duration (ms, 1 to 3) vs. number of WARP nodes (2 to 16) for write and read with one 10 Gbps link and with 2x10 Gbps links; the additional 10 Gbps link reduces queuing delay]
Evaluation
16K samples
[Plot: total duration (ms, 0.8 to 1.4) vs. number of WARP nodes (2 to 16) for write and read with one 10 Gbps link and with 2x10 Gbps links; annotation: low-bandwidth LTE is possible?]
Outline: Applications
Signal Processing Libraries
Advantages of CWARP
• Fast!
• Readily built as shared libraries of existing SDR frameworks
• Can be compiled to be processor-specific (thread libraries)
• Cross-SDR-platform compatible
Research
Adaptive transport control
• Variable sample rate, quantization
• Effect of (lossy or lossless) baseband sample compression
• Study of network load
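One simple way to study quantization and compression is uniform requantization: keeping only the top 8 bits of each 16-bit component halves the bytes per sample (4 to 2), and so the transport payload, at the cost of added quantization noise. This is a hypothetical lossy scheme for illustration, not one the slides specify. Integer division is used instead of a right shift because right-shifting a negative value is implementation-defined in C.

```c
#include <stddef.h>
#include <stdint.h>

/* Lossy sketch: keep the top 8 bits of each 16-bit component. Integer
   division truncates toward zero, so the result is well-defined for
   negative values and always fits in int8_t. */
void requantize_16_to_8(const int16_t *in, int8_t *out, size_t n)
{
    for (size_t k = 0; k < n; k++)
        out[k] = (int8_t)(in[k] / 256);
}

/* Host-side expansion back to 16 bits; the discarded low bits are lost. */
void expand_8_to_16(const int8_t *in, int16_t *out, size_t n)
{
    for (size_t k = 0; k < n; k++)
        out[k] = (int16_t)(in[k] * 256);
}
```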
Mobility measurements
• Fine-grained evaluation of the wireless channel in large MIMO systems
• Moving away from trace-based evaluation of PHY protocols
[Figure: transport: compression, prioritization, rate control]
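[Figure: channel measurement timeline. The host PC writes samples to TX_NODES (WriteIQ), sends PC triggers, and the nodes perform Tx/Rx while samples are read back from RX_NODES (ReadIQ), repeatedly over time. Labeled durations: 700 μs, 20 μs, 100 μs, and ≈650 μs. Channel measurement period ≈ 0.8 ms.]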
K. C. Garikipati, K. G. Shin, “Measurement-Based Transmission Schemes for Network MIMO,” ACM MobiHoc 2014
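The ≈0.8 ms measurement period can be checked by adding up one iteration. This assumes (a reading of the figure, not something the slides state) that the 20 µs, 100 µs, and ≈650 µs labels are the trigger, Tx/Rx, and ReadIQ phases of each measurement, and that the 700 µs WriteIQ happens once up front. Under that assumption one period is about 0.77 ms, well under the 10 ms mobile-channel requirement. The function names are hypothetical.

```c
/* One channel-measurement iteration, all times in microseconds. Mapping the
   figure's labels onto these phases is an assumption. */
double measurement_period_us(double trigger_us, double txrx_us, double read_us)
{
    return trigger_us + txrx_us + read_us;
}

/* How many such measurements fit in one second. */
double measurements_per_sec(double period_us)
{
    return 1e6 / period_us;
}
```

With measurement_period_us(20, 100, 650) = 770 µs, this gives roughly 1,300 channel snapshots per second.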
Thank you