A Power-Efficient FPGA-Based Mixture-of-Gaussian (MoG) Background Subtraction for Full-HD Resolution

Hamed Tabkhi, Majid Sabbagh, Gunar Schirner
Department of Electrical and Computer Engineering, Northeastern University

Mixture of Gaussians (MoG)

1) Background Subtraction
· Compute-intense kernel early in the vision flow
· Extracts ForeGround (FG) pixels from the BackGround (BG) scene
[Figure: vision flow from the input pixel stream through pre-processing, background subtraction (foreground pixels), object detection (object metadata), and object tracking (object position)]

2) Mixture of Gaussian (MoG)
· Adaptive learning-based BG tracking for static camera position
[Figure: per-pixel MoG model. Labels: Gaussian (t-1), Gaussian Components, pixel(i,j) of frame (t), Gaussian (t), Foreground Pixel (Yes/No)]

3) MoG Computation and Communication Demands
· 1080p60 in SW infeasible (24.3 GOPs)
  - 20 (float) or 13 (integer) Blackfin DSP cores
· 32 bits per Gaussian parameter
  - (weight, mean, standard deviation)

Image size    GOPs    Blackfin cores    Bandwidth [MB/s]    LPDDR2 utilization
1920*1080     24.3    13                7440                Saturated
1280*960      14.4    8                 4380                70%

System Exploration

1) Design Flow
· Starting from system specification
  - Captured in an SLDL
[Figure: design flow. Labels: Specification (SCE), Algo Tuning, Operation Width Sizing, Quality Validation, RTL Design (Hand-Crafted Design), Deep Pipelining, Coarse-grain parallelism, RTL validation (Xilinx I-Sim), Synthesis and Mapping (Xilinx Synthesis Tool), Timing Validation (Xilinx Time Analyzer), FPGA Implementation, Bitstream Loader, Execution, Execution Validation (Xilinx Chip-Scope)]
[Figure: MoG specification. Labels: Gray Pixels, Gaus. Param, G-Update 0 (sync. point), G-Update 1, G-Update 2, Weight Normalization, FG Detection, FG Mask]

2) Parameter Precision, Bandwidth / Quality Trade-off
· Precision adjustment of Gaussian parameters
· Transfer/store only relevant bits of model
· Reduce bandwidth at cost of quality
· Quality assessment using MS-SSIM
  - PSNR less expressive
· Pareto front for selecting configuration, e.g.:
  - 95% quality at 63% bandwidth
[Figure: precision adjustment around the MoG background subtraction block (Pixel Stream In, Pixel Stream Out). On write, each 32-bit Gaussian parameter is reduced to its N Most Significant Bits (32-bit to N-bit). On read, the N bits are restored to 32 bits with the Least Significant Bits filled with 00...0 (N-bit to 32-bit).]

MoG Computation Realization

1) RTL Design
· Hand-crafted RTL implementation
  - Guided by system-level exploration / specification
· Full-HD (148.5 MHz)
· System pipeline (77 stages)
  - Macro pipeline (7 stages)
[Figure: MoG Micro-Pipeline in HDL (77 pipes). Inputs: Input Pixels, Gaussian Parameters. For each of Gaussian0, Gaussian1, Gaussian2: Match Detection, Weight Update, Mean Update, Standard Deviation Update. These are followed by Weight Normalization and Foreground Detection. Outputs: Updated Gaussian Parameters. Cycle labels in the figure: 2 Cycles, 1 Cycle, 24 Cycles, 24 Cycles, 21 Cycles.]

SoC MoG Communication Realization

1) Communication Components
· Independent traffic management
  - Separate clock domains
· Dedicated Gaussian sizing unit
  - Transferring only important bits
· 2 DMA channels for Gaussian parameters
· Dedicated interconnect for burst transfer
· Async FIFOs
  - Bridge clock domains
  - Compensate for slow interconnect (148.5 MHz pixel vs. 125 MHz bus)
[Figure: SoC architecture. HDMI input clock domain: HDMI Receiver, Window Of Interest, RGB to Gray, delay buffers, G-param Read / De-Comp., Mixture of Gaussians (MoG) Pixel Pipeline (65 Cycles), Foreground Detection, Updated Gaussian Parameters, G-param Write / Compress, HDMI Sender (FG-Masked Pixels Stream). AXI Stream clock domain: async FIFOs, DMA Read Channel (Read G-params), DMA Write Channel (Write G-params), Memory Interface, DDR 128-bit width @ 125 MHz, AXI Lite @ 100 MHz, ARM Cortex-A9 Sub-System. Stream labels: Pixel Stream @ HDMI clock, Parameter Stream @ HDMI clock, Parameter Stream @ AXI clock. Width labels: 1-bit, 8-bit, 128-bit, 160-bit. Source: simh.trailing-edge.com]
[Figure: MoG system integration. Labels: Virtual Component Identification, HDMI Signals, Input Pixel Stream @ 148.5 MHz, Gaussian Parameters, VDMA, Sync., Delay BUFF, MoG, ForeGround (FG) Mask. Cycle labels: 2 Cycles, 3 Cycles.]

Experimental Results

1) Zynq 7020 Realization
· Design spreads over chip
· Significant routing overhead
· 34% DSP slice utilization
· Resolution limited to 1080p @ 30 FPS
  - Due to the Zynq peak memory bandwidth limitation: 4.2 GB/s

2) Power Consumption
· 600x more power efficient than SW (Cortex A9)
· 480 mW on-chip power
· Only 19% for computation
· 67% for transferring Gaussian parameters
[Figure: on-chip power breakdown. First chart: Communication (67%), Computation (19%), Stream Pixels. Second chart categories: static, Async. FIFOs, Adjustment, AXI_data, DMA, AXI clk.]

3) Functional Evaluation
· Correct FG detection for many scenes
  - Scenes with different complexity
  - Indoor/outdoor, multiple moving objects
· >99% similarity [MS-SSIM] to specification model
[Figure: Original Scene and ForeGround (FG) mask]

Publication
· H. Tabkhi, R. Bushey, G. Schirner, "Algorithm and Architecture Co-Design of Mixture of Gaussian (MoG) Background Subtraction for Embedded Vision", Proceedings of the Asilomar Conference on Signals, Systems, and Computers (AsilomarSSC), Nov. 2013.
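The precision adjustment of Gaussian parameters described above (keep only the N most significant bits for transfer and storage, then restore to 32 bits by zero-filling the least significant bits) can be illustrated with a minimal Python sketch. It assumes each parameter is held as an unsigned 32-bit word; the function names `compress_param` and `decompress_param` and the value N = 20 are hypothetical and chosen only for illustration, and the poster's hardware sizing unit may differ in detail.

```python
WORD_BITS = 32  # the poster states 32 bits per Gaussian parameter


def compress_param(value32: int, n_bits: int) -> int:
    """Keep only the n_bits most significant bits of a 32-bit word (32-bit to N-bit)."""
    if not 1 <= n_bits <= WORD_BITS:
        raise ValueError("n_bits must be between 1 and 32")
    return (value32 & 0xFFFFFFFF) >> (WORD_BITS - n_bits)


def decompress_param(value_n: int, n_bits: int) -> int:
    """Restore a 32-bit word by zero-filling the dropped low bits (N-bit to 32-bit)."""
    if not 1 <= n_bits <= WORD_BITS:
        raise ValueError("n_bits must be between 1 and 32")
    return (value_n << (WORD_BITS - n_bits)) & 0xFFFFFFFF


if __name__ == "__main__":
    n = 20  # illustrative precision only, not a configuration taken from the poster
    original = 0xDEADBEEF
    stored = compress_param(original, n)
    restored = decompress_param(stored, n)
    print(f"stored   = {stored:#x}")
    print(f"restored = {restored:#010x}")
    print(f"bits transferred per parameter: {n}/{WORD_BITS} = {n / WORD_BITS:.1%}")
```

The round trip is lossy by design: the restored word differs from the original only in the dropped low bits, which is the quality cost that the bandwidth saving is traded against.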
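The pipeline stages named in the poster (match detection, weight update, mean update, standard deviation update, weight normalization, foreground detection) follow the general shape of a textbook MoG update for one gray pixel. The sketch below is a simplified floating-point version of that textbook scheme, not the authors' RTL: the learning rate `alpha`, match threshold `match_k`, background weight threshold `bg_weight`, initial deviation `init_std`, and the variance floor are all assumptions introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Gaussian:
    weight: float
    mean: float
    std: float


def mog_update(pixel, gaussians, alpha=0.01, match_k=2.5,
               bg_weight=0.3, init_std=30.0):
    """Update the per-pixel Gaussians in place; return True if the pixel is foreground."""
    # Match detection: first component whose mean is within match_k deviations.
    matched = None
    for i, g in enumerate(gaussians):
        if abs(pixel - g.mean) <= match_k * g.std:
            matched = i
            break

    # Weight, mean, and standard deviation updates.
    for i, g in enumerate(gaussians):
        if i == matched:
            g.weight += alpha * (1.0 - g.weight)
            diff = pixel - g.mean
            g.mean += alpha * diff
            var = g.std ** 2 + alpha * (diff * diff - g.std ** 2)
            g.std = math.sqrt(max(var, 1.0))  # assumed floor keeps std from collapsing
        else:
            g.weight *= 1.0 - alpha

    # No match: replace the weakest component with one centered on the pixel.
    if matched is None:
        weakest = min(range(len(gaussians)), key=lambda i: gaussians[i].weight)
        gaussians[weakest] = Gaussian(alpha, float(pixel), init_std)

    # Weight normalization (the synchronization point across components).
    total = sum(g.weight for g in gaussians)
    for g in gaussians:
        g.weight /= total

    # Foreground detection: unmatched, or matched to a low-weight component.
    return matched is None or gaussians[matched].weight < bg_weight


if __name__ == "__main__":
    model = [Gaussian(0.8, 100.0, 5.0),
             Gaussian(0.15, 200.0, 10.0),
             Gaussian(0.05, 50.0, 10.0)]
    print(mog_update(101, model))  # close to the dominant background mode
    print(mog_update(250, model))  # far from every component
```

Because every pixel carries its own set of Gaussians that must be read, updated, and written back each frame, this per-pixel state is what drives the memory traffic the poster reports.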