Future State Architecture for Electronic Trading
Graeme Burnett SMIEEE, MACM, MComSoc
Presented to the London Quant Group, 8th May 2014

Market Data Overview

Faster Feeds
• 10GbE feeds are the de facto standard, 40GbE is here
• 100GbE very soon
• FPGAs used to deliver market data (Activ Financial, NASDAQ)
• Others will follow soon

Software Feed Handlers
• Software-based handlers can barely handle 10GbE
• Typical performance ~10-15 microseconds in/out on average - check the outliers, though
• No use for HFT or trading, as missed ticks mean poor risk figures
• Platform has hardware and scheduling jitter
• Cartesian matrix support problem

Hardware Feed Handlers
• Principal players: Fixnetix (sophisticated functionality, rules, session management)
• Novasparks - pure market data, deterministic 1.8 microsecond IF to core with no jitter
• Market consolidating - also-rans: Celoxica, Enyx, Fiberblaze, custom designs et al

Copyright © 2013 Graeme Burnett. All rights reserved.

FPGA Market Data Feeds

NASDAQ TotalView-ITCH 4.1 FPGA feed
• 9KB Jumbo Frames
• ~100 ITCH messages per frame @ 40 Gbps
• 125 million bytes per second
• ~1 million ITCH messages per second - 1 message per microsecond
• Parsing in software with 100% reliability is impossible (even at 10GbE)
• Minimum server jitter is 3 microseconds
• Add PCI transfer buffering, OS scheduling, cache misses, TLB misses etc
• SSE won’t help much
• FPGA-based parsing is mandatory for FIX/ITCH messages

High Frequency Trading Software Techniques

Atomics
• Lock-free (Shavit et al/Fraser et al)
• Disruptor (n-m queue with back pressure - Thompson et al)

Software Engineering Paradigms
• Sinks, Sources and Actors (Xcelerit)
• Work-stealing queue (generic)
• Asynchronous threading with user space locking
• Hash algorithm optimisation

Intel Intrinsics
• Streaming SIMD Extensions (SSE)
• ASCII/int conversions (x3 speed-up)
• Cache management (prefetching)
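To illustrate the ASCII/int conversion item under Intel Intrinsics, here is a minimal sketch of parsing a fixed-width, 8-digit ASCII field with SSE2 intrinsics. The function names `parse8` and `parse8_scalar` are hypothetical, the input is assumed to be exactly eight characters in '0'..'9' with no validation, and a scalar fallback is used where SSE2 is unavailable. It is not the implementation behind the speed-up quoted on the slide, and no performance figure is claimed for it.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical reference version: parse exactly 8 ASCII digits, no validation.
inline std::uint32_t parse8_scalar(const char* s) {
    std::uint32_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v = v * 10u + static_cast<std::uint32_t>(s[i] - '0');
    }
    return v;
}

// Hypothetical SIMD version: same contract as parse8_scalar.
inline std::uint32_t parse8(const char* s) {
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    // Load the 8 characters into the low half of a 128-bit register.
    __m128i chunk = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(s));
    // Convert '0'..'9' to 0..9 in each byte.
    __m128i digits = _mm_sub_epi8(chunk, _mm_set1_epi8('0'));
    // Widen the low 8 bytes to eight 16-bit lanes.
    __m128i wide = _mm_unpacklo_epi8(digits, zero);
    // Combine adjacent digits: d0*10 + d1, giving four 32-bit values in 0..99.
    const __m128i w10 = _mm_setr_epi16(10, 1, 10, 1, 10, 1, 10, 1);
    __m128i pairs = _mm_madd_epi16(wide, w10);
    // Narrow back to 16-bit lanes so the multiply-add can be applied again.
    __m128i packed = _mm_packs_epi32(pairs, zero);
    // Combine adjacent pairs: p0*100 + p1, giving two 32-bit values in 0..9999.
    const __m128i w100 = _mm_setr_epi16(100, 1, 100, 1, 0, 0, 0, 0);
    __m128i quads = _mm_madd_epi16(packed, w100);
    std::uint32_t hi = static_cast<std::uint32_t>(_mm_cvtsi128_si32(quads));
    std::uint32_t lo = static_cast<std::uint32_t>(
        _mm_cvtsi128_si32(_mm_srli_si128(quads, 4)));
    return hi * 10000u + lo;
#else
    return parse8_scalar(s);
#endif
}
```

The design choice is to replace the loop-carried multiply of the scalar version with two multiply-add steps over all digits at once. Whether that pays off depends on field width and on how the bytes reach the core.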
Software Transactional Memory
• Distributed Order Management

Program, Memory and Cache Management
• Prefaulting (TLB/Huge Pages/mlock)
• __builtin_expect() (L1 cache misses)
• False sharing (L2 cache misses)
• Instruction timing
• Assembler analytics

HFT Software Techniques continued

Operating System
• Kernel bypass
• CPU pinning (SCHED_FIFO etc)
• Customised kernels/schedulers
• Data plane hacking (DPDK), Intel TBB

Networking
• Multicast
• IGMP snooping
• Xorp
• HSRP/NAT avoidance
• Firewall/switch bypass
• Customised NIC drivers

NIC Card Techniques
• Flow steering
• Receive Side Scaling
• Ethernet packet access (ef_vi, VMA)

Hardware Techniques
• DDIO (L3 cache injection)
• Jitter reduction (platform interrupts analysis - sysjitter/ftq)

FPGA Technology SWOT

Strengths
• Fast execution
• Parallelisation
• Deterministic performance
• Low power - lower operational costs

Weaknesses
• Complexity
• Development languages are hard to master
• Long development cycle
• Long verification cycle
• Poor track record in finance

Opportunities
• Greatly enhanced performance
• Significantly less energy use

Threats
• Software engineers - they think they can become hardware engineers overnight
• Choosing inappropriate problems to solve (e.g. trading strategies)
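Two items from Program, Memory and Cache Management above, `__builtin_expect()` and false sharing, can be sketched together. The snippet below is illustrative only: the names `PaddedCounter`, `Stats`, `record` and `FEED_UNLIKELY` are hypothetical, and the 64-byte cache line is an assumption that holds on typical x86-64 parts but should be checked for the target platform.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines (typical for x86-64).
constexpr std::size_t kCacheLine = 64;

// Branch hint: tells GCC/Clang which way the branch usually goes so the
// common path is laid out as the fall-through. No-op on other compilers.
#if defined(__GNUC__) || defined(__clang__)
#define FEED_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define FEED_UNLIKELY(x) (x)
#endif

// Each counter occupies its own cache line, so a thread updating one
// counter does not invalidate the line holding the other (false sharing).
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLine,
              "counter should fill exactly one assumed cache line");

// Hypothetical per-feed statistics updated by different threads.
struct Stats {
    PaddedCounter received;
    PaddedCounter dropped;
};

// Hypothetical hot-path hook: sequence gaps are assumed to be rare,
// so that branch is marked unlikely.
inline void record(Stats& s, bool sequence_gap) {
    if (FEED_UNLIKELY(sequence_gap)) {
        s.dropped.value.fetch_add(1, std::memory_order_relaxed);
        return;
    }
    s.received.value.fetch_add(1, std::memory_order_relaxed);
}
```

A caller would invoke `record(stats, gap_detected)` once per message. The padding trades memory for isolation, and the branch hint only helps if gaps really are rare on the feed in question.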
FPGA based Trading Use Cases

• Session Management (timed sign-in, re-sign-in, group cancel)
• BGP/IGMP session management and address re-advertisement
• A/B Line arbitration
• Simulsend: Route diversity for fibre/microwave
• Protocol Conversion: FIX/ITCH to binary translation
• Common format conversions: Accelio, C structure, Protocol Buffers, MessagePack, LBM, Thrift
• Symbol Shredding, Flow steering, Market Data QOS, Temporal Queues
• Multicast Emission - rebroadcasting
• Market Map with full depth in user space memory
• Rules Engine: Risk checks, Kill Switch
• Crossing Engine - deterministic, accurate timestamp
• VWAP/TWAP/Volatility/Real-time Risk
• High-accuracy Packet Time Stamping
• Flow Capture (drop copy, flow notarisation)
• Transactional Order Manager using ePCIe and Non-Transparent Bridge
• Virtualisation: data de-duplication and versioning
• Throttle Management
• Templatised Trading
• TCP offload
• Exchange, Network, Platform jitter collection and analytics

Multicore Technology SWOT

Strengths
• Many cores make light work
• Parallelisation
• Low Power - lower operational costs
• Good IO capacity

Weaknesses
• Poor single thread performance (~20% slower)
• Optimisation hard (vectorisation, SSE, AVX)
• Misunderstood by developer community
• Poor track record in Finance (Sun Niagara)

Opportunities
• Greatly enhanced performance
• Asynchronous threads programming model
• Significantly less energy use

Threats
• Software Engineers
• Lack of development environments
• 3x increase in software development cycle

GPU Technology SWOT

Strengths
• Many cores make light work
• Parallelisation

Weaknesses
• Memory Model
• Optimisation hard
• Misunderstood by developer community
• Lock-in
• Keeping cores busy
• Poor IO capability

Opportunities
• Greatly enhanced performance
• Lower operational costs
• Significantly less energy use
Threats
• OpenCL
• FPGA (much lower power)

Future State

• Hybrid hardware/software systems
• GPU/FPGA/Multicore/SoC/ARM/Phi
• Embedded strategies (Terra/Lua/OpenCL)
• ePCIe/NTB architectures
• Synthetic fill-rate graded trading venues
• Analogue Trading
• Geodesic Trading
• Trade Notarisation
• Ultra-high accuracy time
• Binary market data
• SaaS marketplace
• Intelligence-based trading