Vision BOF : Aug14

Open Standard APIs for
Vision and Camera Processing
Neil Trevett
Vice President Mobile Ecosystem, NVIDIA
President, Khronos Group
© Copyright Khronos Group 2014 - Page 1
Khronos Connects Software to Silicon
Open Consortium creating
ROYALTY-FREE, OPEN STANDARD
APIs for hardware acceleration
Defining the roadmap for
low-level silicon interfaces
needed on every platform
Graphics, compute, rich media,
vision, sensor and camera
processing
Rigorous specifications AND
conformance tests for cross-vendor portability
Acceleration APIs
BY the Industry
FOR the Industry
Well over a BILLION people use Khronos APIs
Every Day…
Khronos Standards
3D Asset Handling
- 3D authoring asset interchange
- 3D asset transmission format
with compression
Visual Computing
- 3D Graphics
- Heterogeneous Parallel Computing
Over 100 companies defining royalty-free
APIs to connect software to silicon
Acceleration in HTML5
- 3D in browser – no Plug-in
- Heterogeneous computing for JavaScript
Sensor Processing
- Vision Acceleration
- Camera Control
- Sensor Fusion
Visual Computing = Graphics PLUS Vision
[Diagram: Vision Processing and Graphics Processing, linked by Imagery and Data]
Enhanced sensor and vision capability deepens the interaction between real and virtual worlds
Real-time GPU Compute
Research project on CUDA-enabled laptop
High-Quality Reflections, Refractions, and Caustics in Augmented
Reality and their Contribution to Visual Coherence
P. Kán, H. Kaufmann, Institute of Software Technology and Interactive
Systems, Vienna University of Technology, Vienna, Austria
https://www.youtube.com/watch?v=i2MEwVZzDaA
Mobile Visual Computing = New Experiences
Need for advanced sensors and the GPU throughput to process them
• Computational Photography and Videography
• Face, Body and Gesture Tracking
• 3D Scene/Object Reconstruction
• Augmented Reality
Vision Pipeline Challenges and Opportunities
Growing Camera Diversity
Capturing color, range and lightfields
• Camera sensors >20MPix
• Novel sensor configurations
• Stereo pairs
• Plenoptic Arrays
• Active Structured Light
• Active TOF
Flexible sensor and camera control to generate required image stream
Diverse Vision Processors
Driving for high performance and low power
• Camera ISPs
• Dedicated vision IP blocks
• DSPs and DSP arrays
• Programmable GPUs
• Multi-core CPUs
Use best processing available for image stream processing – with code portability
Sensor Proliferation
Diverse sensor awareness of the user and surroundings
• Light / Proximity
• 2 cameras
• 3 microphones
• Touch
• Position
- GPS
- WiFi (fingerprint)
- Cellular trilateration
- NFC/Bluetooth Beacons
• Accelerometer
• Magnetometer
• Gyroscope
• Pressure / Temp / Humidity
Control/fuse vision data by/with all other sensor data on device
Vision Processing Power Efficiency
• Depth sensors = significant processing
- Generate/use environmental information
• Wearables will need ‘always-on’ vision
- With smaller thermal limit / battery than phones!
• GPUs have 10x the imaging power efficiency of CPUs
- GPUs architected for efficient pixel handling
• Traditional cameras have dedicated hardware
- ISP = Image Signal Processor – on all SOCs today
• Potential for dedicated sensor/vision silicon
- Can trigger full CPU/GPU complex
• SOCs have space for more transistors
- But can’t turn on at same time = Dark Silicon
But how to program specialized processors?
Performance and Functional Portability
[Chart: Power Efficiency vs. Computation Flexibility. Dedicated Hardware: X100; GPU Compute: X10; Multi-core CPU: X1. Callouts: Advanced Sensors, Wearables]
OpenVX – Power Efficient Vision Acceleration
• Out-of-the-Box vision acceleration framework
- Low-power, real-time, mobile and embedded
• Performance portability for diverse hardware
- ISPs, Dedicated vision blocks,
DSPs and DSP arrays, GPUs, Multi-core CPUs
• Suited for low-power, always-on acceleration
- Can run solely on dedicated vision hardware
• Foundational API for vision acceleration
- Can be used by middleware or applications
• Complementary to OpenCV
- Which is great for prototyping
• Khronos open source sample implementation
- To be released with final specification
[Diagram: Application; OpenCV open source library; Other higher-level CV libraries; Open source sample implementation; Hardware vendor implementations]
OpenVX Graphs – The Key to Efficiency
• Vision processing directed graphs for power and performance efficiency
- Each Node can be implemented in software or accelerated hardware
- Nodes may be fused by the implementation to eliminate memory transfers
- Processing can be tiled to keep data entirely in local memory/cache
• VXU Utility Library for access to single nodes
- Easy way to start using OpenVX by calling each node independently
• EGLStreams can provide data and event interop with other Khronos APIs
- BUT use of other Khronos APIs is not mandated
[Diagram: Example OpenVX Graph. Native Camera Control, four OpenVX Nodes, Downstream Application Processing]
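The efficiency argument above (fused nodes, tiled processing, no intermediate images in memory) can be illustrated with a toy sketch. This is not the OpenVX API: Node, Graph, verify, process and run_unfused are hypothetical names, the graph is a simple linear chain, and every node is assumed to be a per-pixel operation so that fusing is trivially valid.

```python
# Illustrative sketch only: a toy chain of image-processing nodes.
# Node, Graph and run_unfused are hypothetical names, not the OpenVX API.
from typing import Callable, List

Pixel = int
Image = List[List[Pixel]]


class Node:
    """A per-pixel operation; a real node could map to software or hardware."""

    def __init__(self, name: str, op: Callable[[Pixel], Pixel]):
        self.name = name
        self.op = op


class Graph:
    """A linear chain of nodes, verified once and then executed many times."""

    def __init__(self, nodes: List[Node]):
        self.nodes = nodes
        self.fused = None

    def verify(self) -> None:
        # Fuse all nodes into one per-pixel function so that no
        # intermediate image is written back to memory between nodes.
        ops = [n.op for n in self.nodes]

        def fused(p: Pixel) -> Pixel:
            for op in ops:
                p = op(p)
            return p

        self.fused = fused

    def process(self, image: Image, tile_rows: int = 2) -> Image:
        if self.fused is None:
            raise RuntimeError("call verify() before process()")
        out: Image = []
        # Walk the image in horizontal tiles; each tile passes through
        # the whole fused chain before the next tile is touched.
        for start in range(0, len(image), tile_rows):
            tile = image[start:start + tile_rows]
            out.extend([[self.fused(p) for p in row] for row in tile])
        return out


def run_unfused(nodes: List[Node], image: Image) -> Image:
    """Memory-based execution: every node reads and writes a full image."""
    for n in nodes:
        image = [[n.op(p) for p in row] for row in image]
    return image


nodes = [
    Node("brighten", lambda p: min(p + 10, 255)),
    Node("threshold", lambda p: 255 if p >= 128 else 0),
]
graph = Graph(nodes)
graph.verify()
frame = [[100, 120, 130], [118, 117, 250], [0, 127, 128]]
result = graph.process(frame)
```

Both execution styles give the same pixels; the difference is only in how many full-size intermediate images are produced along the way, which is the saving a graph-aware implementation may be able to exploit.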
OpenVX 1.0 Function Overview
• Core data structures
- Images and Image Pyramids
- Processing Graphs, Kernels, Parameters
• Image Processing
- Arithmetic, Logical, and statistical operations
- Multichannel Color and BitDepth Extraction and Conversion
- 2D Filtering and Morphological operations
- Image Resizing and Warping
• Core Computer Vision
- Pyramid computation
- Integral Image computation
• Feature Extraction and Tracking
- Histogram Computation and Equalization
- Canny Edge Detection
- Harris and FAST Corner detection
- Sparse Optical Flow
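As an example of why functions like these are "readily accelerated", here is a minimal sketch of integral image computation in plain Python. The function names integral_image and box_sum are illustrative choices, not OpenVX identifiers, and the image is assumed to be a rectangular list of rows.

```python
def integral_image(img):
    """Return s where s[y][x] is the sum of img[0..y][0..x], inclusive."""
    h = len(img)
    w = len(img[0]) if h else 0
    s = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Running row sum plus the integral value directly above.
            s[y][x] = row_sum + (s[y - 1][x] if y > 0 else 0)
    return s


def box_sum(s, x0, y0, x1, y1):
    """Sum of the inclusive rectangle (x0, y0)-(x1, y1) from four lookups."""
    total = s[y1][x1]
    if x0 > 0:
        total -= s[y1][x0 - 1]
    if y0 > 0:
        total -= s[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += s[y0 - 1][x0 - 1]
    return total


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
table = integral_image(img)
lower_right = box_sum(table, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```

Once the table is built, any box sum costs a constant number of lookups regardless of box size.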
OpenVX Specification Evolution
- OpenVX 1.0 defines framework for creating, managing and executing graphs
- Focused set of widely used functions that are readily accelerated
- Implementers can add functions as extensions
- Widely used extensions adopted into future versions of the core
Example Graph - Stereo Machine Vision
[Diagram: OpenVX Graph with inputs Camera 1 and Camera 2, each passing through Stereo Rectify with Remap. Other nodes: Compute Depth Map (User Node), Detect and track objects (User Node), Image Pyramid, Delay, Compute Optical Flow. Output: Object coordinates]
Tiling extension enables user nodes (extensions) to also optimally run in local memory
OpenVX and OpenCV are Complementary
            | OpenCV | OpenVX
Governance  | Community driven open source with no formal specification | Formal specification defined and implemented by hardware vendors
Conformance | No conformance tests for consistency and every vendor implements different subset | Full conformance test suite / process creates a reliable acceleration platform
Portability | APIs can vary depending on processor | Hardware abstracted for portability
Scope       | Very wide: 1000s of imaging and vision functions, multiple camera APIs/interfaces | Tight focus on hardware accelerated functions for mobile vision; use external camera API
Efficiency  | Memory-based architecture: each operation reads and writes memory | Graph-based execution: optimizable computation, data transfer
Use Case    | Rapid experimentation | Production development & deployment
OpenVX Participants and Timeline
• Provisional 1.0 specification released November 2013 for industry feedback
- An update to the provisional spec published in July
• OpenVX 1.0 final release planned for 2014
- With conformance tests
• Itseez is working group chair (the convener of OpenCV)
- Qualcomm and TI are specification editors
NVIDIA VisionWorks Uses OpenVX
• VisionWorks library contains diverse vision and imaging primitives
• Leverages OpenVX for optimized primitive execution
• Can extend VisionWorks nodes through CUDA accelerated primitives
• Provided with sample library of fully accelerated pipelines
[Diagram: Applications and Middleware; Vision Pipeline Samples (Object Detection, …, SLAM) and 3rd Party Pipelines; VisionWorks Framework; VisionWorks Primitives (Classifier, Corner Detection, …) and 3rd Party CUDA Libraries; Tegra K1]
OpenCL – Portable Heterogeneous Computing
• Portable Heterogeneous programming of diverse compute resources
- Targeting supercomputers -> embedded systems -> mobile devices
• One code tree can be executed on CPUs, GPUs, DSPs and hardware
- Dynamically interrogate system load and balance work across available processors
• OpenCL = Two APIs and C-based Kernel language
- Platform Layer API to query, select and initialize compute devices
- Kernel language - Subset of ISO C99 + language extensions
- C Runtime API to build and execute kernels
[Diagram: OpenCL Kernel Code running across multiple devices: CPU, CPU, GPU, DSP, HW]
OpenCL as Parallel Language Backend
• JavaScript binding for initiation of OpenCL C kernels
• Language for image processing and computational photography
• MulticoreWare open source project on Bitbucket
• Embedded array language for Haskell
• Java language extensions for parallelism
• River Trail: Language extensions to JavaScript
• Compiler directives for Fortran, C and C++
• PyOpenCL: Python wrapper around OpenCL
• Harlan: High level language for GPU programming
SPIR
Standard Portable
Intermediate Representation
(extending LLVM for parallel computation)
SPIR 2.0 Released here at SIGGRAPH
OpenCL provides vendor optimized,
cross-platform, cross-vendor access to
heterogeneous compute resources
Mixamo - Avatar Videoconferencing
• Real time facial animation capture on mobile – ported directly from PC
• Animate an avatar while conferencing
• Full GPU acceleration of vision processing using OpenCL
NVIDIA Tegra K1 Development Board
Khronos APIs for Vision Processing
GPU Compute Shaders (OpenGL 4.X and OpenGL ES 3.1)
Pervasively available on almost any mobile device or OS
Easy integration into graphics apps – no compute API interop needed
Program in GLSL not C
Limited to acceleration on a single GPU
General Purpose Heterogeneous Programming Framework
Flexible, low-level access to any devices with OpenCL compiler
Single programming and run-time framework for CPUs, GPUs, DSPs, hardware
Open standard for any device or OS – being used as a backend by many languages and frameworks
Needs full compiler stack and IEEE precision
Out of the Box Vision Framework - Operators and graph framework library
Can run on dedicated hardware – no compiler needed
Easier performance portability to diverse hardware
Suited for low-power, always-on acceleration
Fixed set of operators – but can be extended
It is possible to use OpenCL or GLSL to build OpenVX Nodes on programmable devices
Kari Pulli, NVIDIA Research
Advanced Camera Control Use Cases
• High-dynamic range (HDR) and computational flash photography
- High-speed burst with individual frame control over exposure and flash
• Subject isolation and depth detection
- High-speed burst with individual frame control over focus
• Rolling shutter elimination
- High-precision intra-frame synchronization between camera and motion sensor
• Augmented Reality
- 60Hz, low-latency capture with motion sensor synchronization
- Multiple Region of Interest (ROI) capture
- Synchronized stereo sensors for scene scaling
- Detailed feedback on camera operation per frame
• Time-of-flight or structured light depth camera processing
- Aligned stacking of data from multiple sensors
Typical Imaging Pipeline
[Diagram: Lens, Color Filter Array, CMOS sensor (Bayer output), Pre-processing, Image Signal Processor (ISP) (RGB / YUV output), Post-processing, App; with lens, sensor, aperture control]
• Pre-processing is non-existent in basic use-cases
• Pre- and Post-processing can be done on CPU, GPU, DSP…
• ISP controls camera via 3A algorithms
Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)
High Dynamic Range (HDR)
• HDR works by combining differing exposures into the same image
• A variety of methods for HDR, based on application
- Multiple frame HDR (requires frame memory)
- Interlace HDR
- Multiple Zone HDR
[Diagram: Short exposure, optional mid exposure and long exposure feed HDR processing]
• HDR requires
- Precise control over camera parameters (exposure)
- Fast capture and processing of multiple images
- Note: with interlace HDR, only 1 image is needed
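A minimal sketch of multiple-frame HDR merging, to make "combining differing exposures into the same image" concrete. It assumes a linear sensor response, already-aligned frames and known exposure times; hat_weight and merge_exposures are illustrative names, and the weighting scheme is one common choice rather than a method the slides prescribe.

```python
def hat_weight(value, max_value=255):
    """Trust mid-range pixels most; near-black and clipped pixels least."""
    return min(value, max_value - value) + 1  # +1 keeps every weight non-zero


def merge_exposures(frames, exposure_times, max_value=255):
    """Weighted average of per-frame radiance estimates (value / exposure time)."""
    height, width = len(frames[0]), len(frames[0][0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            num = 0.0
            den = 0.0
            for frame, t in zip(frames, exposure_times):
                v = frame[y][x]
                w = hat_weight(v, max_value)
                num += w * (v / t)  # scale back to a common radiance unit
                den += w
            out[y][x] = num / den
    return out


# One-row example: a short exposure and a long exposure that is 8x longer.
short_frame = [[10, 128]]
long_frame = [[80, 255]]  # second pixel is clipped in the long exposure
radiance = merge_exposures([short_frame, long_frame], [1.0, 8.0])
```

In the second pixel the clipped long-exposure sample receives the minimum weight, so the merged value stays close to the short-exposure estimate. This is why precise, known exposure parameters per frame matter.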
Image stitching, panoramic images
• Made with
• Requires processing of multiple images
• Requires position / geometry information
• Requires control of camera (e.g. AE lock)
Typical Burst Sequence Applications
Pipelined Sensor Model
• Traditional one-shot sensor model
- Need to know which parameters were used
- → reset pipeline between shots → slow
• Viewfinding / video mode:
- Pipelined, high frame rate
- Settings changes take effect later
• Need new model for Computational
Photography
- Need parameterized SEQUENCE of images
to feed advanced algorithms
• Real image sensors are pipelined
- While one frame exposing
- Next one is being prepared
- Previous one is being read out
Need for Camera Control API - OpenKCAM
• Advanced control of ISP and camera subsystem – with cross-platform portability
- Generate sophisticated image stream for advanced imaging & vision apps
• No platform API currently fulfills all developer requirements
- Portable access to growing sensor diversity: e.g. depth sensors and sensor arrays
- Cross sensor synch: e.g. synch of camera and MEMS sensors
- Advanced, high-frequency per-frame burst control of camera/sensor: e.g. ROI
- Multiple input, output re-circulating streams with RAW, Bayer or YUV Processing
[Diagram: Defines control of Sensor, Color Filter Array, Lens, Flash, Focus, Aperture; Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF); Image Signal Processor (ISP); EGLStreams; Image/Vision Applications]
OpenKCAM API Requirements
• Provide functional portability for advanced camera applications
- Reduce extreme fragmentation for ISVs wanting more than point and shoot
• Application control over ISP processing (including 3A)
- Including multiple, re-entrant ISPs
• Control multiple sensors with synch and alignment
- E.g. Stereo pairs, Plenoptic arrays, TOF or structured light depth cameras
• Enhanced per frame detailed control
- Format flexibility, Region of Interest (ROI) selection
• Global timing & synchronization
- E.g. Between cameras and MEMS sensors
• Flexible processing/streaming
- Multiple input and output streams with RAW, Bayer or YUV Processing
- Streaming of rows (not just frames)
Enable advanced camera functionality not available on current platforms
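To illustrate what global timing and synchronization between cameras and MEMS sensors makes possible, here is a small sketch that estimates a motion-sensor reading at a camera frame timestamp by linear interpolation. It assumes both devices stamp samples on one shared clock; sample_at and the sample data are hypothetical and not part of any OpenKCAM draft.

```python
from bisect import bisect_left


def sample_at(timestamps, values, t):
    """Linearly interpolate a sensor reading at time t.

    timestamps must be sorted ascending and share a clock with t.
    Times outside the recorded range clamp to the nearest sample.
    """
    if not timestamps:
        raise ValueError("no sensor samples")
    i = bisect_left(timestamps, t)
    if i == 0:
        return values[0]
    if i == len(timestamps):
        return values[-1]
    t0, t1 = timestamps[i - 1], timestamps[i]
    w = (t - t0) / (t1 - t0)
    return values[i - 1] + w * (values[i] - values[i - 1])


# Hypothetical gyroscope samples (microseconds, rad/s) and a frame timestamp.
gyro_t = [0, 5000, 10000]
gyro_rate = [0.0, 0.10, 0.30]
frame_timestamp = 7500
rate_at_frame = sample_at(gyro_t, gyro_rate, frame_timestamp)
```

Without a shared time base the interpolation weight is meaningless, which is one reason the requirements call out timing explicitly.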
OpenKCAM is FCAM-based
• FCAM (2010) Stanford/Nokia, open source
• Capture stream of camera images with precision control
- A pipeline that converts requests into image stream
- All parameters packed into the requests - no visible state
- Programmer has full control over sensor settings for each frame in stream
• Control over focus and flash
- No hidden daemon running
• Control ISP
- Can access supplemental
statistics from ISP if available
• No global state
- State travels with image requests
- Every pipeline stage may have different state
- Enables fast, deterministic state changes
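The request-based model described above can be sketched as a toy simulation. This is not FCAM or OpenKCAM code: Request, Frame, PipelinedSensor and the three-stage depth are assumptions made for illustration. The point is that each frame comes back tagged with exactly the parameters that produced it, with no global camera state.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """All capture parameters travel with the request (no global state)."""
    exposure_us: int
    gain: float
    flash: bool = False


@dataclass(frozen=True)
class Frame:
    """A captured frame, tagged with the request that produced it."""
    request: Request
    index: int


class PipelinedSensor:
    """Toy three-stage sensor: configure, expose, read out."""

    DEPTH = 3

    def __init__(self):
        self._stages = deque([None] * self.DEPTH, maxlen=self.DEPTH)
        self._count = 0

    def tick(self, request=None):
        """Advance one frame time; return the frame leaving the pipeline, if any."""
        done = self._stages[-1]
        # With maxlen set, appendleft drops the oldest entry on the right.
        self._stages.appendleft(request)
        if done is None:
            return None
        frame = Frame(done, self._count)
        self._count += 1
        return frame


# A burst with per-frame control: short, long, then short with flash.
burst = [
    Request(exposure_us=1000, gain=1.0),
    Request(exposure_us=8000, gain=1.0),
    Request(exposure_us=1000, gain=1.0, flash=True),
]
sensor = PipelinedSensor()
frames = []
for req in burst + [None] * PipelinedSensor.DEPTH:
    frame = sensor.tick(req)
    if frame is not None:
        frames.append(frame)
```

The pipeline never has to be reset between shots: settings differ per request, frames arrive at full rate after the initial latency, and the application can always tell which parameters were used.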
OpenKCAM Design Philosophy
• C-language API starting from proven designs
- e.g. FCAM, Android camera platform
• Design alignment with widely used hardware standards
- e.g. MIPI CSI
• Focus on mobile, power-limited devices
- But do not preclude other use cases such as automotive, DSLR…
• Minimize overlap and maximize interoperability with other Khronos APIs
- But other Khronos APIs are not required
• Provide support for vendor-specific extensions
Potential Adoption on Android
• Android Exposes Java camera APIs to developers
- Controls underlying Camera HAL
• Camera HAL v1 API simplified basic point and shoot apps
- Difficult or impossible to do much else
• Camera HAL v3 API is a fundamentally different API
- Streams-based to enable more sophisticated camera applications
OpenKCAM builds on FCAM with a goal of
being forward compatible with Android
architecture
Camera API
Open source
project developed
by Nokia and
Stanford
HAL V3 adopts many
FCAM ideas and can use
EGL in its implementation
OpenKCAM may be used to IMPLEMENT Android
Camera HAL – and provide an advanced native
camera API in NDK
Participating Companies and Milestones
Group charter
approved
Specification
ratification
3Q14
Apr13
Jul13
Sample
implementation and
tests
1Q15
OpenKCAM Working Group
• Royalty free API for portable access to advanced mobile camera functionality
- Reduce fragmentation and encourage more advanced camera applications
• Control for the new wave of sensors to enable advanced imaging and vision
- Multiple sensors, depth cameras, synchronized sensors
• Provide sophisticated camera functionality not available on today’s platforms
- But work to enable easy adoption by platform vendors
• Eager to contribute? Join Khronos OpenKCAM WG!
- http://www.khronos.org/camera
• Mikaël Bourges-Sévenier
- [email protected]
Neil Trevett, NVIDIA
Sensor Industry Fragmentation …
© Copyright Khronos Group 2014 - Page 34
Sensor Types
• Basic sensor data:
- Acceleration, Magnetic Field, Angular Rates
- Pressure, Ambient Light, Proximity, Temperature, Humidity, RGB light, UV light
- Heart rate, Blood Oxygen Level, Skin Hydration, Breathalyzer
• Sensor fusion
- Orientation (Quaternion or Euler Angles), Gravity, Linear Acceleration
- Position
• Context awareness
- Device Motion: general movement of the device: still, free-fall, …
- Carry: how the device is being held by a user: in pocket, in hand, …
- Posture: how the body holding the device is positioned: standing, sitting, step, …
- Transport: about the environment around the device: in elevator, in car, …
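As a small example of how fused outputs such as Gravity and Linear Acceleration can be derived from basic sensor data, the sketch below low-pass filters raw accelerometer samples. The function name split_gravity and the smoothing factor alpha are assumptions for illustration; real fusion middleware typically also uses gyroscope and magnetometer data.

```python
def split_gravity(samples, alpha=0.8):
    """Split accelerometer samples into gravity and linear acceleration.

    samples: sequence of (x, y, z) readings in m/s^2.
    alpha:   low-pass smoothing factor; closer to 1 means slower gravity updates.
    Returns a list of (gravity, linear_acceleration) tuples, one per sample.
    """
    gravity = list(samples[0])  # seed the filter with the first reading
    out = []
    for sample in samples:
        # Low-pass filter: gravity changes slowly, so keep most of the old estimate.
        gravity = [alpha * g + (1 - alpha) * a for g, a in zip(gravity, sample)]
        # Whatever is left after removing gravity is treated as device motion.
        linear = [a - g for a, g in zip(sample, gravity)]
        out.append((tuple(gravity), tuple(linear)))
    return out


# A device lying still, face up: only gravity on the z axis.
still = [(0.0, 0.0, 9.81)] * 5
results = split_gravity(still)
final_gravity, final_linear = results[-1]
```

For a stationary device the linear acceleration stays at zero and the gravity estimate matches the raw reading, which is the kind of derived signal an app could request without knowing which physical sensors are present.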
Low-level Sensor Abstraction API
Advanced Sensors Everywhere
- Multi-axis motion/position, quaternions, context-awareness, gestures, activity monitoring, health and environmental sensors
Apps Need Sophisticated Access to Sensor Data
- Without coding to specific sensor hardware
Sensor Discoverability
Sensor Code Portability
Apps request semantic sensor information
- StreamInput defines possible requests, e.g.
- Read Physical or Virtual Sensors e.g. “Game Quaternion”
- Context detection e.g. “Am I in an elevator?”
StreamInput processing graph provides optimized sensor data stream
High-value, smart sensor fusion middleware can connect to apps in a portable way
Apps can gain ‘magical’ situational awareness
StreamInput: Platform Integration
[Diagram: Applications; OS Sensor APIs (E.g. Android SensorManager or iOS CoreMotion); Middleware (E.g. Context-awareness engines, gaming engines); Sensors and Sensor Hubs]
Flexible native API to integrate where needed depending on existing platform sensor stacks
Low-level native API defines portable access to fused sensor data stream and context-awareness
Sensor OSP Announcement
• Proposal to converge OSP (Open Sensor Platform) APIs with StreamInput
- Sensor Platforms is StreamInput Spec Editor
EGL 1.5 Released at GDC 2014
• EGL 1.5 brings functionality from
multiple extensions into core
- Increased reliability and portability
• EGLImages
- Sharing textures and renderbuffers
• Context Robustness
- Defending against malicious code
• EGLSync objects
- Improved OpenGL / OpenCL interop
• Platform extensions
- Standardized interactions for multiple OS e.g. Android and 64-bit platforms
• sRGB colorspace rendering
API Interop: EGL provides efficient transfer of data and events between Khronos APIs
Application Portability: EGL abstracts graphics context management, surface and buffer binding and rendering synchronization
[Diagram: Applications; OS and Display Platforms]
Potential EGL Future Directions
• EGLImageStream extensions are very powerful today
- But need wider implementation in drivers
- Stream other types of data – unformatted buffers for metadata and more
- GPU-to-GPU streaming and invoking client API activities directly from other client
APIs without CPU intervention
• Separation of traditional context/surface functionality from “hub” functionality
• Support for new Khronos APIs where appropriate
- Streaming video + image processing + display use case
Khronos APIs for Augmented Reality
AR needs not just advanced sensor processing, vision
acceleration, computation and rendering - but also for
all these subsystems to work efficiently together
[Diagram: subsystems that must work together for AR]
• MEMS Sensors and Sensor Fusion
- Precision timestamps on all sensor samples
• Advanced Camera Control and stream generation
• Vision Processing
• Application on CPUs, GPUs and DSPs
• Audio Rendering
• 3D Rendering and Video Composition on GPU
• EGLStream stream data between APIs
Summary
• Khronos is building a family of interoperating APIs for portable and
power-efficient vision processing
• OpenVX 1.0 has been provisionally released and non-members are invited to
provide feedback on the forums
- http://www.khronos.org/message_boards/forumdisplay.php/110-OpenVX-General
• OpenKCAM and StreamInput APIs are currently in design and complement and
integrate with OpenVX
• Any company is welcome to join Khronos to influence the direction of mobile
and embedded vision processing!
- $15K annual membership fee for access to all Khronos API working groups
- Well-defined IP framework protects your IP and conformant implementations
• www.khronos.org
- [email protected]