MoG - Northeastern University

A Power-Efficient FPGA-Based Mixture-of-Gaussian (MoG) Background
Subtraction for Full-HD Resolution
Hamed Tabkhi, Majid Sabbagh, Gunar Schirner
Department of Electrical and Computer Engineering
Northeastern University
System Exploration
Mixture of Gaussians (MoG)
1) Design Flow
1) Background Subtraction
Design Flow
Starting from system specification
- Captured in an SLDL
Compute intense kernel early in vision flow
Extracts ForeGround (FG) pixels from BackGround (BG) scene
Quality Validation
Specification
(SCE)
Algorithm Tuning
RTL Design
Operation Width Sizing
(Hand-Crafted Design)
Deep Pipelining
- Coarse-grain parallelism
RTL validation
RTL
Processed
Input
BackGround
Subtraction
Pre-processing
Pixels
Stream
Object
Detection
Foreground
Pixels
Object
MetaData
MoG specification
Object
Object
Tracking
Synthesis and Mapping
(Xilinx Synthesis Tool)
Position
Gaus.
Param
Weight
Normalization
G-Update 0
(sync. point)
G-Update 1
Gray
Pixels
2) Mixture of Gaussian (MoG)
FG
Detection
FG
Mask
FPGA Implementation
Gaus.
Param
Bitstream Loader
Timing Validation
(Xilinx Time Analyzer)
Execution Validation
Execution
G-Update 2
(Xilinx Chip-Scope)
Adaptive learning-based BG tracking for static camera position
Gaussian (t-1)
Gaussian
Components
2) Parameter Precision, Bandwidth / Quality Trade-off
Gaussian (t)
(Xilinx I-Sim)
pixel(i,j) of
frame (t)
Foreground
Pixel(Yes/No)
· Precision adjustment of Gaussian parameters
· Transfer/store only relevant bits of model
· Reduce bandwidth at cost of quality
· Quality assessment using MS-SSIM
- PSNR less expressive
· Pareto front for selecting configuration, e.g.:
- 95% quality at 63% bandwidth
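Selecting a configuration from a Pareto front can be sketched as follows. This is only an illustration: the point (0.63, 0.95) is the example quoted above, while every other (bandwidth, quality) pair and both function names are hypothetical placeholders, not measurements from this work.

```python
from typing import List, Optional, Tuple

# (bandwidth fraction, quality score). Only (0.63, 0.95) comes from the
# poster; the remaining points are made up for illustration.
Point = Tuple[float, float]
HYPOTHETICAL_POINTS: List[Point] = [
    (1.00, 1.00), (0.75, 0.98), (0.63, 0.95),
    (0.63, 0.90), (0.50, 0.80), (0.56, 0.78),
]


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point beats on both axes."""
    front = []
    for bw, q in points:
        dominated = any(
            obw <= bw and oq >= q and (obw < bw or oq > q)
            for obw, oq in points
        )
        if not dominated:
            front.append((bw, q))
    return sorted(front)


def cheapest_meeting(front: List[Point], min_quality: float) -> Optional[Point]:
    """Lowest-bandwidth front point that reaches the quality target."""
    candidates = [p for p in front if p[1] >= min_quality]
    return min(candidates) if candidates else None


front = pareto_front(HYPOTHETICAL_POINTS)
choice = cheapest_meeting(front, min_quality=0.95)
```

With these placeholder points, a 95% quality target selects the 63% bandwidth configuration, mirroring the example quoted above.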
Foreground Detection
N Bits
N Bits
Most Significant Bits
(MSBs)
3) MoG Computation and Communication Demands
· 1080p60 in SW infeasible (24.3 GOPs)
- Would require 20 (float) or 13 (integer) Blackfin DSP cores
· 32 bits per Gaussian parameter
- (weight, mean, standard deviation)
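The demand figures above can be put in per-pixel terms with a little arithmetic. The sketch below assumes three Gaussian components per pixel, which is read off the diagram labels (Gaussian0 to Gaussian2) rather than stated in the text. The function names are illustrative.

```python
def ops_per_pixel(gops: float, width: int, height: int, fps: int) -> float:
    """Operations per pixel implied by a total GOPs figure."""
    return gops * 1e9 / (width * height * fps)


def model_bytes_per_pixel(num_gaussians: int = 3,      # assumed from the diagram
                          params_per_gaussian: int = 3,  # weight, mean, std. dev.
                          bits_per_param: int = 32) -> int:
    """Storage for the full-precision background model of one pixel."""
    return num_gaussians * params_per_gaussian * bits_per_param // 8


full_hd_ops = ops_per_pixel(24.3, 1920, 1080, 60)          # 1080p60
full_hd_model_bytes = 1920 * 1080 * model_bytes_per_pixel()
```

Under these assumptions the 24.3 GOPs figure corresponds to roughly 195 operations per pixel, and the full-precision model of one Full-HD frame occupies about 74.6 MB.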
Image size   GOPs   Blackfin cores   Bandwidth [MB/s]   LPDDR2 utilization
1920*1080    24.3   13               7440               Saturated
1280*960     14.4   8                4380               70%
[Figure: precision adjustment of Gaussian parameters. On write (32-bit to N-bit), only the N most significant bits (MSBs) of each 32-bit parameter are kept; on read (N-bit to 32-bit), the least significant bits are filled with 00...0.]
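The precision adjustment (keep the N most significant bits of each 32-bit parameter, pad with zeros when reading back) can be sketched in a few lines. This assumes parameters are handled as unsigned 32-bit words; the function names are illustrative, and the poster does not state which N was finally chosen.

```python
def compress(param: int, n_bits: int) -> int:
    """32-bit to N-bit: keep only the n_bits most significant bits."""
    if not 1 <= n_bits <= 32:
        raise ValueError("n_bits must be between 1 and 32")
    return (param & 0xFFFFFFFF) >> (32 - n_bits)


def decompress(stored: int, n_bits: int) -> int:
    """N-bit to 32-bit: restore the MSBs and zero-fill the LSBs."""
    if not 1 <= n_bits <= 32:
        raise ValueError("n_bits must be between 1 and 32")
    return (stored << (32 - n_bits)) & 0xFFFFFFFF


def bandwidth_fraction(n_bits: int) -> float:
    """Share of the full-precision parameter traffic that remains."""
    return n_bits / 32


stored = compress(0xDEADBEEF, 20)   # 0xDEADB
restored = decompress(stored, 20)   # 0xDEADB000
```

As a plausibility check only: keeping 20 of 32 bits leaves 62.5% of the parameter traffic, which is close to the 63% bandwidth figure quoted for the example configuration.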
MoG
Background
Subtraction
Pixel
Stream Out
Pixel Stream
In
MoG Computation Realization
MoG Micro-Pipeline in HDL (77 pipes)
1) RTL Design
· Hand-crafted RTL implementation
- Guided by system-level exploration / specification
· Full-HD (148.5 MHz)
· System pipeline (77 stages)
- Macro pipeline (7 stages)
Input Pixels
Gaussian Parameters
[Figure: MoG micro-pipeline. Each of the three Gaussian components (Gaussian0, Gaussian1, Gaussian2) passes through Match Detection, Weight Update, Mean Update and Standard Deviation Update. Stage latency labels, in order of appearance: 2 cycles, 1 cycle, 24 cycles, 24 cycles.]
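The stage names in the pipeline diagram (match detection, weight, mean and standard deviation updates, followed by weight normalization and foreground detection) correspond to the usual per-pixel MoG update. The software sketch below follows a simplified textbook formulation, not the authors' RTL. The learning rate, match threshold, background weight threshold and initial values are all assumed constants chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List

ALPHA = 0.01                # learning rate (assumed)
MATCH_SIGMAS = 2.5          # match if within 2.5 std. dev. (assumed)
BG_WEIGHT_THRESHOLD = 0.25  # matched weight above this means background (assumed)
INIT_WEIGHT, INIT_STD, MIN_STD = 0.05, 30.0, 1.0  # assumed


@dataclass
class Gaussian:
    weight: float
    mean: float
    std: float


def update_pixel(pixel: float, model: List[Gaussian]) -> bool:
    """Update the model in place; return True if the pixel is foreground."""
    # Match detection: pick the highest-weight component that matches.
    matched = None
    for i, g in enumerate(model):
        if abs(pixel - g.mean) <= MATCH_SIGMAS * g.std:
            if matched is None or g.weight > model[matched].weight:
                matched = i
    # Weight update for every component; mean/std only for the match.
    for i, g in enumerate(model):
        hit = 1.0 if i == matched else 0.0
        g.weight = (1.0 - ALPHA) * g.weight + ALPHA * hit
        if i == matched:
            diff = pixel - g.mean
            g.mean += ALPHA * diff
            var = (1.0 - ALPHA) * g.std ** 2 + ALPHA * diff ** 2
            g.std = max(math.sqrt(var), MIN_STD)
    # No match: replace the weakest component with one centred on the pixel.
    if matched is None:
        weakest = min(range(len(model)), key=lambda i: model[i].weight)
        model[weakest] = Gaussian(INIT_WEIGHT, pixel, INIT_STD)
    # Weight normalization.
    total = sum(g.weight for g in model)
    for g in model:
        g.weight /= total
    # Foreground detection.
    return matched is None or model[matched].weight < BG_WEIGHT_THRESHOLD
```

A pixel close to a dominant component is classified as background, while a pixel that matches no component replaces the weakest Gaussian and is reported as foreground.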
SoC
MoG Communication Realization
· Independent traffic management
- Separate clock domains
· Dedicated Gaussian sizing unit
- Transferring only important bits
· 2 DMA channels for Gaussian parameters
· Dedicated interconnect for burst transfer
· Async FIFOs
- Bridge clock domains
- Compensate for slow interconnect (148.5 MHz pixel vs. 125 MHz bus)
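Why a 125 MHz bus can serve a 148.5 MHz pixel stream comes down to average rates: the pixel clock also ticks during blanking, so the FIFOs only have to absorb the bursts. The sketch below assumes standard 1080p60 timing (2200 x 1125 total raster) and one 128-bit bus word per pixel per DMA channel. The latter is inferred from the width labels in the block diagram, not stated in the text.

```python
PIXEL_CLOCK_HZ = 148_500_000  # HDMI pixel clock for 1080p60
BUS_CLOCK_HZ = 125_000_000    # AXI stream clock
BUS_WIDTH_BITS = 128          # AXI stream width


def active_pixel_rate(width: int, height: int, fps: int) -> int:
    """Average number of visible pixels per second."""
    return width * height * fps


def bus_can_keep_up(width: int, height: int, fps: int,
                    words_per_pixel: int = 1) -> bool:
    """True if the bus clock covers the average per-channel word rate."""
    return active_pixel_rate(width, height, fps) * words_per_pixel <= BUS_CLOCK_HZ


def channel_peak_bytes_per_s() -> int:
    """Peak throughput of one 128-bit channel at the bus clock."""
    return BUS_CLOCK_HZ * BUS_WIDTH_BITS // 8


# Total raster including blanking reproduces the pixel clock.
raster_rate = 2200 * 1125 * 60
average_rate = active_pixel_rate(1920, 1080, 60)
```

Under these assumptions the average demand is about 124.4 M words/s per channel, just under the 125 MHz bus clock, with each channel peaking at 2 GB/s.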
21 Cycles
Weight Normalization
HDMI
Receiver
Window of Interest
RGB to
Gray
Del.
buff
G-param
Read /
De-Comp.
8-bits
Width
160-bits
Width
Mixture of Gaussians (MoG)
Pixel Pipeline
(65 Cycles)
128-bits
Width
Source: simh.trailing-edge.com
Pixel Stream
@ HDMI clock
128-bits
Width
Parameter Stream
@ HDMI clock
Parameter Stream
@ AXI clock
1-bits
Width
160-bits
Width
HDMI Input Clock Domain
A-sync
FIFO
AXI Stream Clock Domain
128-bits
Width
DMA Read Channel
(Read G-params)
Foreground
Detection
Updated
Gaussian
Parameters
Del.
buff
G-param
Write /
Compress
HDMI
Sender
FG-Masked
Pixels
Stream
A-sync
FIFO
DMA Write Channel
(Write G-params)
Memory
Interface
DDR
128-bit width @ 125 MHz
AXI Stream
AXI Lite
@ 100 MHz
ARM Cortex-A9 Sub-System
3) Functional Evaluation
1) Zynq 7020 Realization
Virtual Component
Identification
ForeGround (FG)
Mask
VDMA Sync. Delay BUFF
2 Cycles
HDMI Signals Delay BUFF
Experimental Results
3 Cycles
MoG System integration
Input
Pixel
Stream
@148.5 MHz
1) Communication Components
Gaussian
Parameters
Design spreads over chip
Significant routing overhead
34% DSP slice utilization
Resolution limited to 1080p @ 30 FPS
· Correct FG detection for many scenes
- Scenes with different complexity
- Indoor/outdoor, multiple moving objects
· >99% similarity [MS-SSIM] to specification model
Original Scene
ForeGround (FG)
· Due to the Zynq peak memory bandwidth limit of 4.2 GB/s
2) Power Consumption
· 600x more power efficient than SW (Cortex A9)
· 480 mW on-chip power
- Only 19% for computation
- 67% for transferring Gaussian parameters
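The percentages above translate into absolute power as follows. This is plain arithmetic on the quoted figures; the "remainder" entry is simply what the two quoted shares leave over and is not attributed to any component here.

```python
TOTAL_ON_CHIP_MW = 480.0
QUOTED_SHARES = {
    "computation": 0.19,
    "gaussian_parameter_transfer": 0.67,
}


def breakdown_mw(total_mw: float, shares: dict) -> dict:
    """Convert fractional shares into milliwatts, plus the unattributed rest."""
    parts = {name: total_mw * share for name, share in shares.items()}
    parts["remainder"] = total_mw - sum(parts.values())
    return parts


power = breakdown_mw(TOTAL_ON_CHIP_MW, QUOTED_SHARES)
```

With the quoted numbers, computation accounts for about 91 mW and Gaussian parameter transfer for about 322 mW of the 480 mW total.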
[Figure: power breakdown charts. Categories: Communication, Computation, Stream Pixels; component labels: static, Async. FIFOs, Adjustment, AXI_data, DMA, AXI clk. Shares stated in the text: 67% communication, 19% computation.]
Publication
H. Tabkhi, R. Bushey, G. Schirner, "Algorithm and Architecture Co-Design of Mixture of Gaussian (MoG)
Background Subtraction for Embedded Vision", Proceedings of the Asilomar Conference on Signals, Systems,
and Computers (AsilomarSSC), Nov. 2013.