Future State Architecture for Electronic Trading

Graeme Burnett SMIEEE, MACM, MComSoc
Presented to the London Quant Group
8th May 2014
Market Data Overview
Faster Feeds
• 10GbE feeds are the de facto standard; 40GbE is already here
• 100GbE very soon
• FPGA used to deliver market data (Activ Financial, NASDAQ)
• Others will follow soon.
Software Feed Handlers
• Software-based handlers can barely handle 10GbE.
• Typical performance ~10-15 microseconds in/out on average - check the outliers, though
• No use for HFT or trading, as missed ticks mean poor risk figures
• Platform has hardware and scheduling jitter
• Cartesian matrix support problem
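The advice to check the outliers can be made concrete with a small sketch. The samples below are synthetic and purely illustrative (they are not measurements from any feed handler), and `latency_summary` is a hypothetical helper. It shows how an average in the 10-15 microsecond range can coexist with a far worse tail.

```python
import statistics


def latency_summary(samples_us):
    """Return mean, median, 99th percentile and max of latency samples (microseconds)."""
    ordered = sorted(samples_us)
    # Nearest-rank percentile: rank = ceil(0.99 * n), computed in integer arithmetic.
    rank = max(1, -(-99 * len(ordered) // 100))
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p99": ordered[rank - 1],
        "max": ordered[-1],
    }


# Synthetic, illustrative samples (an assumption, not real data):
# 985 ticks handled in 10 us and 15 ticks stalled for 300 us.
samples = [10.0] * 985 + [300.0] * 15
summary = latency_summary(samples)
print(summary)  # mean is about 14.35 us, yet p99 and max are both 300 us
```

The mean alone would suggest a healthy handler; the 99th percentile exposes the stalls that cause missed ticks.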
Hardware Feed Handlers
• Principal players: Fixnetix (sophisticated functionality, rules, session management)
• Novasparks - pure market data, deterministic 1.8 microsecond interface to core with no jitter
• Market consolidating - also-rans: Celoxica, Enyx, Fiberblaze, custom designs et al.
Copyright © 2013 Graeme Burnett. All rights reserved.
FPGA Market Data Feeds
NASDAQ TotalView-ITCH 4.1 FPGA feed
• 9KB Jumbo Frames
• ~100 ITCH messages per frame @ 40 Gbps
• 125 million bytes per second
• ~1 million ITCH messages per second - 1 message per microsecond
• Parsing in software with 100% reliability is impossible (even at 10GbE)
• Minimum server jitter is 3 microseconds
• Add PCI transfer buffering, OS scheduling, cache misses, TLB misses etc.
• SSE won't help much
• FPGA-based parsing is mandatory for FIX/ITCH messages
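As a rough check on the figures above, the per-message time budget can be worked out directly. This sketch assumes a 9KB frame means 9,000 bytes and ignores Ethernet/IP/UDP header overhead; both are simplifying assumptions, and `message_budget` is a hypothetical helper name.

```python
FRAME_BYTES = 9_000               # 9KB jumbo frame (assumed to be 9,000 bytes)
MESSAGES_PER_FRAME = 100          # ~100 ITCH messages per frame (from the slide)
FEED_BYTES_PER_SEC = 125_000_000  # 125 million bytes per second (from the slide)
SERVER_JITTER_NS = 3_000          # minimum server jitter of 3 microseconds (from the slide)


def message_budget(frame_bytes, messages_per_frame, bytes_per_sec):
    """Return (bytes per message, messages per second, nanoseconds per message)."""
    bytes_per_message = frame_bytes / messages_per_frame
    messages_per_sec = bytes_per_sec / bytes_per_message
    nanos_per_message = 1e9 / messages_per_sec
    return bytes_per_message, messages_per_sec, nanos_per_message


bytes_per_msg, msgs_per_sec, ns_per_msg = message_budget(
    FRAME_BYTES, MESSAGES_PER_FRAME, FEED_BYTES_PER_SEC
)
# Number of message intervals that elapse during one minimum-jitter stall.
messages_during_stall = SERVER_JITTER_NS / ns_per_msg
print(bytes_per_msg, round(msgs_per_sec), round(ns_per_msg), round(messages_during_stall, 1))
```

Under these assumptions each message is about 90 bytes, which gives roughly 1.4 million messages per second, or about 720 ns per message. That is the same order as the slide's figure of one message per microsecond, and it means a single 3 microsecond stall spans about four message intervals.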
High Frequency Trading Software Techniques
Atomics
• Lock-free (Shavit et al/Fraser et al)
• Disruptor (n-m queue with back pressure - Thompson et al)
Software Engineering Paradigms
• Sinks, Sources and Actors (Xcelerit)
• Work-stealing queue (generic)
• Asynchronous threading with user space locking
• Hash algorithm optimisation
Intel Intrinsics
• Streaming SIMD Extensions (SSE)
• ASCII/int conversions (3x speed-up)
• Cache management (prefetching)
Software Transactional Memory
• Distributed Order Management
Program, Memory and Cache Management
• Prefaulting (TLB/Huge Pages/mlock)
• __builtin_expect() (branch hints, L1 instruction cache misses)
• False sharing (L2 cache misses)
• Instruction timing
• Assembler analytics
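The queue-with-back-pressure idea behind the Disruptor can be sketched as a single-producer/single-consumer ring buffer driven by two sequence counters. This is only an illustration of the index arithmetic, not the Disruptor's actual implementation: the class and method names are hypothetical, and a production version would be written in a language with atomics, use acquire/release ordering on the counters, and pad them onto separate cache lines to avoid false sharing.

```python
class RingBuffer:
    """Single-producer/single-consumer ring with sequence counters and back pressure."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self._slots = [None] * size
        self._mask = size - 1  # power-of-two size lets us wrap with a bit mask
        self._head = 0         # next sequence the producer will write
        self._tail = 0         # next sequence the consumer will read

    def try_publish(self, item):
        # Back pressure: refuse to overwrite a slot the consumer has not read yet.
        if self._head - self._tail == len(self._slots):
            return False
        self._slots[self._head & self._mask] = item
        self._head += 1  # advance the sequence only after the slot is written
        return True

    def try_consume(self):
        # Returns None when the ring is empty, so None is not a valid item here.
        if self._tail == self._head:
            return None
        item = self._slots[self._tail & self._mask]
        self._tail += 1
        return item


ring = RingBuffer(4)
accepted = [ring.try_publish(tick) for tick in range(5)]
print(accepted)            # the fifth publish is rejected because the ring is full
print(ring.try_consume())  # consuming frees a slot for the producer
```

The producer only ever writes `_head` and the consumer only ever writes `_tail`, which is what makes the lock-free variant possible with one writer per counter.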
HFT Software Techniques continued.
Operating System
• Kernel bypass
• CPU pinning (SCHED_FIFO etc.)
• Customised kernels/schedulers
• Data plane hacking (DPDK), Intel TBB
Networking
• Multicast
• IGMP snooping
• Xorp
• HSRP/NAT avoidance
• Firewall/switch bypass
• Customised NIC drivers
NIC Techniques
• Flow steering
• Receive Side Scaling
• Ethernet packet access (ef_vi, VMA)
Hardware Techniques
• DDIO (L3 cache injection)
• Jitter reduction (platform interrupt analysis - sysjitter/ftq)
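Platform jitter analysis, in the spirit of tools such as sysjitter, boils down to spinning on a clock and recording any gap between successive reads that is longer than expected. The sketch below illustrates that idea only; it is not how sysjitter or ftq are implemented, `measure_jitter` and its parameters are hypothetical, and the Python interpreter adds noise of its own. A real measurement would run native code pinned to an isolated core.

```python
import time


def measure_jitter(iterations=200_000, threshold_ns=10_000):
    """Spin reading a monotonic clock and record gaps of at least threshold_ns."""
    gaps_ns = []
    max_gap_ns = 0
    prev = time.perf_counter_ns()
    for _ in range(iterations):
        now = time.perf_counter_ns()
        delta = now - prev
        if delta > max_gap_ns:
            max_gap_ns = delta
        if delta >= threshold_ns:
            # The loop was knocked off the core (interrupt, scheduler, SMI, ...)
            # or otherwise stalled for at least threshold_ns.
            gaps_ns.append(delta)
        prev = now
    return {
        "iterations": iterations,
        "max_gap_ns": max_gap_ns,
        "interruptions": len(gaps_ns),
        "gaps_ns": gaps_ns,
    }


report = measure_jitter(iterations=50_000)
print(report["max_gap_ns"], report["interruptions"])
```

Comparing the report before and after a change such as pinning the process or moving interrupts off the core gives a quick indication of whether the change reduced jitter.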
FPGA Technology SWOT
Strengths
• Fast execution
• Parallelisation
• Deterministic performance
• Low Power - lower operational costs.
Weaknesses
• Complexity
• Development languages are hard to master
• Long development cycle
• Long verification cycle
• Poor track record in Finance
Opportunities
• Greatly enhanced performance
• Significantly less energy use
Threats
• Software Engineers - they think they can become hardware engineers overnight
• Choosing inappropriate problems to solve (e.g. trading strategies)
FPGA based Trading Use Cases
• Session Management (timed sign-in, re-sign-in, group cancel)
• BGP/IGMP session management and address re-advertisement
• A/B Line arbitration
• Simulsend: Route diversity for fibre/microwave
• Protocol Conversion: FIX/ITCH to binary translation
• Common format conversions: Accelio, C structure, Protocol Buffers, MessagePack, LBM, Thrift
• Symbol Shredding, Flow steering, Market Data QOS, Temporal Queues
• Multicast Emission - rebroadcasting
• Market Map with full depth in user space memory
• Rules Engine: Risk checks, Kill Switch
• Crossing Engine - deterministic, accurate timestamp
• VWAP/TWAP/Volatility/Real-time Risk
• High-accuracy Packet Time Stamping
• Flow Capture (drop copy, flow notarisation)
• Transactional Order Manager using ePCIe and Non-Transparent Bridge
• Virtualisation: data de-duplication and versioning
• Throttle Management
• Templatised Trading
• TCP offload
• Exchange, Network, Platform jitter collection and analytics
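Of the use cases above, A/B line arbitration is the easiest to illustrate in a few lines: the same sequenced feed arrives on two redundant lines, and the arbiter forwards whichever copy of each sequence number arrives first. The sketch below is a software illustration under assumed conventions: the `(line, seq, payload)` tuple layout and the `arbitrate` name are hypothetical, and a hardware design would track a bounded window of sequence numbers rather than an unbounded set.

```python
def arbitrate(arrivals):
    """Yield the first-arriving copy of each sequence number; drop later duplicates.

    `arrivals` is an iterable of (line, seq, payload) tuples in arrival order.
    """
    seen = set()  # unbounded here for simplicity; real designs use a sliding window
    for line, seq, payload in arrivals:
        if seq in seen:
            continue  # the other line already delivered this message
        seen.add(seq)
        yield line, seq, payload


# Illustrative arrival order: line A drops sequence 2, line B fills the gap.
arrivals = [
    ("A", 1, "add order"),
    ("B", 1, "add order"),
    ("B", 2, "execute"),
    ("A", 3, "cancel"),
    ("B", 3, "cancel"),
]
forwarded = list(arbitrate(arrivals))
print(forwarded)
# [('A', 1, 'add order'), ('B', 2, 'execute'), ('A', 3, 'cancel')]
```

The downstream consumer sees one gap-free stream even though neither line delivered every message first.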
Multicore Technology SWOT
Strengths
• Many cores make light work
• Parallelisation
• Low Power - lower operational costs
• Good IO capacity
Weaknesses
• Poor single thread performance (~20% slower)
• Optimisation hard (vectorisation, SSE, AVX)
• Misunderstood by developer community
• Poor track record in Finance (Sun Niagara)
Opportunities
• Greatly enhanced performance
• Asynchronous threads programming model
• Significantly less energy use
Threats
• Software Engineers
• Lack of development environments
• 3x increase in software development cycle
GPU Technology SWOT
Strengths
• Many cores make light work
• Parallelisation
Weaknesses
• Memory Model
• Optimisation hard
• Misunderstood by developer community
• Lock-in
• Keeping cores busy
• Poor IO capability
Opportunities
• Greatly enhanced performance
• Lower operational costs
• Significantly less energy use
Threats
• OpenCL
• FPGA (much lower power)
Future State
• Hybrid hardware/software systems
• GPU/FPGA/Multicore/SoC/ARM/Phi
• Embedded strategies (Terra/Lua/OpenCL)
• ePCIe/NTB architectures
• Synthetic fill-rate graded trading venues
• Analogue Trading
• Geodesic Trading
• Trade Notarisation
• Ultra-high accuracy time
• Binary market data
• SaaS marketplace
• Intelligence-based trading