Open Standard APIs for Vision and Camera Processing
Neil Trevett
Vice President Mobile Ecosystem, NVIDIA
President, Khronos Group
© Copyright Khronos Group 2014 - Page 1

Khronos Connects Software to Silicon
Open Consortium creating ROYALTY-FREE, OPEN STANDARD APIs for hardware acceleration
Defining the roadmap for low-level silicon interfaces needed on every platform
Graphics, compute, rich media, vision, sensor and camera processing
Rigorous specifications AND conformance tests for cross-vendor portability
Acceleration APIs BY the Industry FOR the Industry
Well over a BILLION people use Khronos APIs Every Day…

Khronos Standards
Over 100 companies defining royalty-free APIs to connect software to silicon
3D Asset Handling
  - 3D authoring asset interchange
  - 3D asset transmission format with compression
Visual Computing
  - 3D Graphics
  - Heterogeneous Parallel Computing
Acceleration in HTML5
  - 3D in browser – no Plug-in
  - Heterogeneous computing for JavaScript
Sensor Processing
  - Vision Acceleration
  - Camera Control
  - Sensor Fusion

Visual Computing = Graphics PLUS Vision
Enhanced sensor and vision capability deepens the interaction between real and virtual worlds
[Diagram labels: Vision Processing; Imagery; Data; Graphics Processing]
Real-time GPU Compute research project on CUDA-enabled laptop:
High-Quality Reflections, Refractions, and Caustics in Augmented Reality and their Contribution to Visual Coherence
P. Kán, H.
Kaufmann, Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria
https://www.youtube.com/watch?v=i2MEwVZzDaA

Mobile Visual Computing = New Experiences
Need for advanced sensors and the GPU throughput to process them
  - Computational Photography and Videography
  - Face, Body and Gesture Tracking
  - 3D Scene/Object Reconstruction
  - Augmented Reality

Vision Pipeline Challenges and Opportunities
Growing Camera Diversity: capturing color, range and lightfields
• Camera sensors >20MPix
• Novel sensor configurations
• Stereo pairs
• Plenoptic Arrays
• Active Structured Light
• Active TOF
Flexible sensor and camera control to generate required image stream
Diverse Vision Processors: driving for high performance and low power
• Camera ISPs
• Dedicated vision IP blocks
• DSPs and DSP arrays
• Programmable GPUs
• Multi-core CPUs
Use best processing available for image stream processing – with code portability
Sensor Proliferation: diverse sensor awareness of the user and surroundings
• Light / Proximity
• 2 cameras
• 3 microphones
• Touch
• Position
  - GPS
  - WiFi (fingerprint)
  - Cellular trilateration
  - NFC/Bluetooth Beacons
• Accelerometer
• Magnetometer
• Gyroscope
• Pressure / Temp / Humidity
Control/fuse vision data by/with all other sensor data on device

Vision Processing Power Efficiency
• Depth sensors = significant processing
  - Generate/use environmental information
• Wearables will need ‘always-on’ vision
  - With smaller thermal limit / battery than phones!
• GPUs have x10 CPU imaging power efficiency
  - GPUs architected for efficient pixel handling
• Traditional cameras have dedicated hardware
  - ISP = Image Signal Processor – on all SOCs today
• Potential for dedicated sensor/vision silicon
  - Can trigger full CPU/GPU complex
[Chart annotation: Advanced Sensors]
But how to program specialized processors?
Performance and Functional Portability
• SOCs have space for more transistors
  - But can’t turn on at same time = Dark Silicon
[Chart: Power Efficiency (X1, X10, X100) against Computation Flexibility for Dedicated Hardware, GPU Compute and Multi-core CPU; annotation: Wearables]

OpenVX – Power Efficient Vision Acceleration
• Out-of-the-Box vision acceleration framework
  - Low-power, real-time, mobile and embedded
• Performance portability for diverse hardware
  - ISPs, Dedicated vision blocks, DSPs and DSP arrays, GPUs, Multi-core CPUs
• Suited for low-power, always-on acceleration
  - Can run solely on dedicated vision hardware
• Foundational API for vision acceleration
  - Can be used by middleware or applications
• Complementary to OpenCV
  - Which is great for prototyping
• Khronos open source sample implementation
  - To be released with final specification
[Diagram labels: Application; OpenCV open source library; Other higher-level CV libraries; Open source sample implementation; Hardware vendor implementations]

OpenVX Graphs – The Key to Efficiency
• Vision processing directed graphs for power and performance efficiency
  - Each Node can be implemented in software or accelerated hardware
  - Nodes may be fused by the implementation to eliminate memory transfers
  - Processing can be tiled to keep data entirely in local memory/cache
• VXU Utility Library for access to single nodes
  - Easy way to start using OpenVX by calling each node independently
• EGLStreams can provide data and event interop with other Khronos APIs
  - BUT use of other Khronos APIs is not mandated
[Diagram, Example OpenVX Graph: Native Camera Control; four OpenVX Nodes; Downstream Application Processing]

OpenVX 1.0 Function Overview
• Core data structures
  - Images and Image Pyramids
  - Processing Graphs, Kernels, Parameters
• Image Processing
  - Arithmetic, Logical, and statistical operations
  - Multichannel Color and BitDepth Extraction and
Conversion
  - 2D Filtering and Morphological operations
  - Image Resizing and Warping
• Core Computer Vision
  - Pyramid computation
  - Integral Image computation
• Feature Extraction and Tracking
  - Histogram Computation and Equalization
  - Canny Edge Detection
  - Harris and FAST Corner detection
  - Sparse Optical Flow
OpenVX Specification Evolution
  - OpenVX 1.0 defines framework for creating, managing and executing graphs
  - Focused set of widely used functions that are readily accelerated
  - Implementers can add functions as extensions
  - Widely used extensions adopted into future versions of the core

Example Graph - Stereo Machine Vision
[Diagram, OpenVX Graph: Camera 1 and Camera 2, each with Stereo Rectify with Remap; Compute Depth Map (User Node); Detect and track objects (User Node); Image Pyramid; Compute Optical Flow; Delay; Object coordinates]
Tiling extension enables user nodes (extensions) to also optimally run in local memory

OpenVX and OpenCV are Complementary
Governance
  - OpenCV: Community driven open source with no formal specification
  - OpenVX: Formal specification defined and implemented by hardware vendors
Conformance
  - OpenCV: No conformance tests for consistency and every vendor implements different subset
  - OpenVX: Full conformance test suite / process creates a reliable acceleration platform
Portability
  - OpenCV: APIs can vary depending on processor
  - OpenVX: Hardware abstracted for portability
Scope
  - OpenCV: Very wide; 1000s of imaging and vision functions; multiple camera APIs/interfaces
  - OpenVX: Tight focus on hardware accelerated functions for mobile vision; use external camera API
Efficiency
  - OpenCV: Memory-based architecture; each operation reads and writes memory
  - OpenVX: Graph-based execution; optimizable computation, data transfer
Use Case
  - OpenCV: Rapid experimentation
  - OpenVX: Production development & deployment

OpenVX Participants and Timeline
• Provisional 1.0 specification released November 2013 for industry feedback
  - An update to the provisional
spec published in July
• OpenVX 1.0 final release planned for 2014
  - With conformance tests
• Itseez is working group chair (the convener of OpenCV)
  - Qualcomm and TI are specification editors

NVIDIA VisionWorks Uses OpenVX
• VisionWorks library contains diverse vision and imaging primitives
• Leverages OpenVX for optimized primitive execution
• Can extend VisionWorks nodes through CUDA accelerated primitives
• Provided with sample library of fully accelerated pipelines
[Diagram labels: Applications and Middleware; Vision Pipeline Samples (Object Detection, …, SLAM); 3rd Party Pipelines; VisionWorks Framework; VisionWorks Primitives (Classifier, Corner Detection, …); 3rd Party CUDA Libraries; Tegra K1]

OpenCL – Portable Heterogeneous Computing
• Portable Heterogeneous programming of diverse compute resources
  - Targeting supercomputers -> embedded systems -> mobile devices
• One code tree can be executed on CPUs, GPUs, DSPs and hardware
  - Dynamically interrogate system load and balance work across available processors
• OpenCL = Two APIs and C-based Kernel language
  - Platform Layer API to query, select and initialize compute devices
  - Kernel language - Subset of ISO C99 + language extensions
  - C Runtime API to build and execute kernels across multiple devices
[Diagram: OpenCL with Kernel Code on each of GPU, DSP, HW, CPU, CPU]

OpenCL as Parallel Language Backend
  - JavaScript binding for initiation of OpenCL C kernels
  - Language for image processing and computational photography
  - MulticoreWare open source project on Bitbucket
  - Embedded array language for Haskell
  - Java language extensions for parallelism
  - River Trail: Language extensions to JavaScript
  - Compiler directives for Fortran, C and C++
  - PyOpenCL: Python wrapper around OpenCL
  - Harlan: High level language for GPU programming
  - SPIR: Standard Portable Intermediate Representation (extending LLVM for parallel
computation)
SPIR 2.0 Released here at SIGGRAPH
OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources

Mixamo - Avatar Videoconferencing
• Real time facial animation capture on mobile – ported directly from PC
• Animate an avatar while conferencing
• Full GPU acceleration of vision processing using OpenCL
NVIDIA Tegra K1 Development Board

Khronos APIs for Vision Processing
GPU Compute Shaders (OpenGL 4.X and OpenGL ES 3.1)
  - Pervasively available on almost any mobile device or OS
  - Easy integration into graphics apps – no compute API interop needed
  - Program in GLSL not C
  - Limited to acceleration on a single GPU
General Purpose Heterogeneous Programming Framework
  - Flexible, low-level access to any devices with OpenCL compiler
  - Single programming and run-time framework for CPUs, GPUs, DSPs, hardware
  - Open standard for any device or OS – being used as backend by many languages and frameworks
  - Needs full compiler stack and IEEE precision
Out of the Box Vision Framework - Operators and graph framework library
  - Can run on dedicated hardware – no compiler needed
  - Easier performance portability to diverse hardware
  - Suited for low-power, always-on acceleration
  - Fixed set of operators – but can be extended
It is possible to use OpenCL or GLSL to build OpenVX Nodes on programmable devices

Kari Pulli, NVIDIA Research

Advanced Camera Control Use Cases
• High-dynamic range (HDR) and computational flash photography
  - High-speed burst with individual frame control over exposure and flash
• Subject isolation and depth detection
  - High-speed burst with individual frame control over focus
• Rolling shutter elimination
  - High-precision intra-frame synchronization between camera and motion sensor
• Augmented Reality
  - 60Hz, low-latency capture with motion sensor synchronization
  - Multiple
Region of Interest (ROI) capture
  - Synchronized stereo sensors for scene scaling
  - Detailed feedback on camera operation per frame
• Time-of-flight or structured light depth camera processing
  - Aligned stacking of data from multiple sensors

Typical Imaging Pipeline
[Diagram labels: Lens; Color Filter Array; CMOS sensor; Bayer; Pre-processing; Image Signal Processor (ISP); RGB; YUV; Postprocessing; App; Lens, sensor, aperture control]
• Pre-processing is non-existent in basic use-cases
• Pre- and Post-processing can be done on CPU, GPU, DSP…
• ISP controls camera via 3A algorithms
  - Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)

High Dynamic Range (HDR)
• HDR works by combining differing exposures into the same image
• A variety of methods for HDR, based on application
  - Multiple frame HDR (requires frame memory)
  - Interlace HDR
  - Multiple Zone HDR
• HDR requires
  - Precise control over camera parameters (exposure)
  - Fast capture and processing of multiple images
  - Note: with interlace HDR, only 1 image is needed
[Diagram labels: Short exposure; Optional mid exposure; Long exposure; HDR processing]

Image stitching, panoramic images
• Made with
• Requires processing of multiple images
• Requires position / geometry information
• Requires control of camera (e.g.
AE lock)

Typical Burst Sequence Applications

Pipelined Sensor Model
• Traditional one-shot sensor model
  - Need to know which parameters were used, so the pipeline is reset between shots (slow)
• Viewfinding / video mode:
  - Pipelined, high frame rate
  - Settings changes take effect later
• Need new model for Computational Photography
  - Need parameterized SEQUENCE of images to feed advanced algorithms
• Real image sensors are pipelined
  - While one frame is exposing
  - Next one is being prepared
  - Previous one is being read out

Need for Camera Control API - OpenKCAM
• Advanced control of ISP and camera subsystem – with cross-platform portability
  - Generate sophisticated image stream for advanced imaging & vision apps
• No platform API currently fulfills all developer requirements
  - Portable access to growing sensor diversity: e.g. depth sensors and sensor arrays
  - Cross sensor synch: e.g. synch of camera and MEMS sensors
  - Advanced, high-frequency per-frame burst control of camera/sensor: e.g. ROI
  - Multiple input, output re-circulating streams with RAW, Bayer or YUV Processing
[Diagram: Defines control of Sensor, Color Filter Array, Lens, Flash, Focus, Aperture; Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF); Image Signal Processor (ISP); EGLStreams; Image/Vision Applications]

OpenKCAM API Requirements
• Provide functional portability for advanced camera applications
  - Reduce extreme fragmentation for ISVs wanting more than point and shoot
• Application control over ISP processing (including 3A)
  - Including multiple, re-entrant ISPs
• Control multiple sensors with synch and alignment
  - E.g. Stereo pairs, Plenoptic arrays, TOF or structured light depth cameras
• Enhanced per frame detailed control
  - Format flexibility, Region of Interest (ROI) selection
• Global timing & synchronization
  - E.g.
Between cameras and MEMS sensors
• Flexible processing/streaming
  - Multiple input and output streams with RAW, Bayer or YUV Processing
  - Streaming of rows (not just frames)
Enable advanced camera functionality not available on current platforms

OpenKCAM is FCAM-based
• FCAM (2010) Stanford/Nokia, open source
• Capture stream of camera images with precision control
  - A pipeline that converts requests into image stream
  - All parameters packed into the requests - no visible state
  - Programmer has full control over sensor settings for each frame in stream
• Control over focus and flash
  - No hidden daemon running
• Control ISP
  - Can access supplemental statistics from ISP if available
• No global state
  - State travels with image requests
  - Every pipeline stage may have different state
  - Enables fast, deterministic state changes

OpenKCAM Design Philosophy
• C-language API starting from proven designs
  - e.g. FCAM, Android camera platform
• Design alignment with widely used hardware standards
  - e.g.
MIPI CSI
• Focus on mobile, power-limited devices
  - But do not preclude other use cases such as automotive, DSLR…
• Minimize overlap and maximize interoperability with other Khronos APIs
  - But other Khronos APIs are not required
• Provide support for vendor-specific extensions

Potential Adoption on Android
• Android Exposes Java camera APIs to developers
  - Controls underlying Camera HAL
• Camera HAL v1 API simplified basic point and shoot apps
  - Difficult or impossible to do much else
• Camera HAL v3 API is a fundamentally different API
  - Streams-based to enable more sophisticated camera applications
OpenKCAM builds on FCAM with a goal of being forward compatible with Android architecture
FCAM: Open source project developed by Nokia and Stanford
HAL V3 adopts many FCAM ideas and can use EGL in its implementation
OpenKCAM may be used to IMPLEMENT Android Camera HAL – and provide an advanced native camera API in NDK
[Diagram label: Camera API]

Participating Companies and Milestones
[Timeline: milestones Group charter approved, Specification ratification, Sample implementation and tests; dates Apr13, Jul13, 3Q14, 1Q15]

OpenKCAM Working Group
• Royalty free API for portable access to advanced mobile camera functionality
  - Reduce fragmentation and encourage more advanced camera applications
• Control for the new wave of sensors to enable advanced imaging and vision
  - Multiple sensors, depth cameras, synchronized sensors
• Provide sophisticated camera functionality not available on today’s platforms
  - But work to enable easy adoption by platform vendors
• Eager to contribute? Join Khronos OpenKCAM WG!
  - http://www.khronos.org/camera
• Mikaël Bourges-Sévenier
  - [email protected]

Neil Trevett, NVIDIA

Sensor Industry Fragmentation …

Sensor Types
• Basic sensor data:
  - Acceleration, Magnetic Field, Angular Rates
  - Pressure, Ambient Light, Proximity, Temperature, Humidity, RGB light, UV light
  - Heart rate, Blood Oxygen Level, Skin Hydration, Breathalyzer
• Sensor fusion
  - Orientation (Quaternion or Euler Angles), Gravity, Linear Acceleration
  - Position
• Context awareness
  - Device Motion: general movement of the device: still, free-fall, …
  - Carry: how the device is being held by a user: in pocket, in hand, …
  - Posture: how the body holding the device is positioned: standing, sitting, step, …
  - Transport: about the environment around the device: in elevator, in car, …

Low-level Sensor Abstraction API
Apps Need Sophisticated Access to Sensor Data
  - Without coding to specific sensor hardware
Apps request semantic sensor information
  - StreamInput defines possible requests, e.g.
  - Read Physical or Virtual Sensors e.g. “Game Quaternion”
  - Context detection e.g. “Am I in an elevator?”
StreamInput processing graph provides optimized sensor data stream
  - High-value, smart sensor fusion middleware can connect to apps in a portable way
  - Apps can gain ‘magical’ situational awareness
Advanced Sensors Everywhere
  - Multi-axis motion/position, quaternions, context-awareness, gestures, activity monitoring, health and environmental sensors
Sensor Discoverability
Sensor Code Portability

StreamInput: Platform Integration
Applications
OS Sensor APIs (E.g. Android SensorManager or iOS CoreMotion)
Middleware (E.g.
Context-awareness engines, gaming engines)
Flexible native API to integrate where needed depending on existing platform sensor stacks
Low-level native API defines portable access to fused sensor data stream and context-awareness
[Diagram labels: Sensor, Sensor, …, Sensor, Sensor; Hub, Hub]

Sensor OSP Announcement
• Proposal to converge OSP (Open Sensor Platform) APIs with StreamInput
  - Sensor Platforms is StreamInput Spec Editor

EGL 1.5 Released at GDC 2014
• EGL 1.5 brings functionality from multiple extensions into core
  - Increased reliability and portability
• EGLImages
  - Sharing textures and renderbuffers
• Context Robustness
  - Defending against malicious code
• EGLSync objects
  - Improved OpenGL/OpenCL interop
• Platform extensions
  - Standardized interactions for multiple OS e.g. Android and 64-bit platforms
• sRGB colorspace rendering
API Interop: EGL provides efficient transfer of data and events between Khronos APIs
Application Portability: EGL abstracts graphics context management, surface and buffer binding and rendering synchronization
[Diagram labels: Applications; OS and Display Platforms]

Potential EGL Future Directions
• EGLImageStream extensions are very powerful today
  - But need wider implementation in drivers
  - Stream other types of data – unformatted buffers for metadata and more
  - GPU-to-GPU streaming and invoking client API activities directly from other client APIs without CPU intervention
• Separation of traditional context/surface functionality from “hub” functionality
• Support for new Khronos APIs where appropriate
  - Streaming video + image processing + display use case

Khronos APIs for Augmented Reality
AR needs not just advanced sensor processing, vision acceleration, computation and rendering - but also for all these subsystems to work efficiently together
MEMS Sensors
Sensor Fusion
Application on CPUs, GPUs and DSPs
Vision Processing
Precision timestamps on all sensor samples
Advanced Camera Control and stream generation
Audio Rendering
EGLStream - stream data between APIs
3D Rendering and Video Composition on GPU

Summary
• Khronos is building a family of interoperating APIs for portable and power-efficient vision processing
• OpenVX 1.0 has been provisionally released and non-members are invited to provide feedback on the forums
  - http://www.khronos.org/message_boards/forumdisplay.php/110-OpenVX-General
• OpenKCAM and StreamInput APIs are currently in design and complement and integrate with OpenVX
• Any company is welcome to join Khronos to influence the direction of mobile and embedded vision processing!
  - $15K annual membership fee for access to all Khronos API working groups
  - Well-defined IP framework protects your IP and conformant implementations
• www.khronos.org
  - [email protected]
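
Illustration: request-based capture on a pipelined sensor
The "Pipelined Sensor Model" and "OpenKCAM is FCAM-based" slides describe a model in which every capture request carries its own parameters and the sensor pipeline holds no global state. The following is a minimal Python sketch of that idea, not FCAM or OpenKCAM code. The names Shot, Frame, SensorPipeline and run_burst are hypothetical, and the three-stage depth (prepare, expose, read out, one stage per frame time) is an assumption drawn from the slide text. It only aims to show that a burst with different settings per frame can run at full frame rate while each returned frame stays tagged with the exact settings that produced it.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass(frozen=True)
class Shot:
    """Hypothetical capture request; all sensor settings travel with it."""
    exposure_us: int
    gain: float = 1.0
    flash: bool = False


@dataclass(frozen=True)
class Frame:
    """A captured frame tagged with the exact request that produced it."""
    shot: Shot
    tick_done: int  # frame time at which read-out finished


class SensorPipeline:
    """Toy model of a pipelined sensor: prepare -> expose -> read out."""

    STAGES = 3  # assumed depth, one stage per frame time

    def __init__(self) -> None:
        self._pending: Deque[Shot] = deque()
        self._stages: List[Optional[Shot]] = [None] * self.STAGES
        self._tick = 0

    def submit(self, shot: Shot) -> None:
        """Queue a request; there is no global camera state to modify."""
        self._pending.append(shot)

    def step(self) -> Optional[Frame]:
        """Advance one frame time; return the frame read out, if any."""
        self._tick += 1
        incoming = self._pending.popleft() if self._pending else None
        # Every request moves one stage forward, carrying its own settings.
        self._stages = [incoming] + self._stages[:-1]
        done = self._stages[-1]
        return Frame(done, self._tick) if done is not None else None


def run_burst(shots: List[Shot]) -> List[Frame]:
    """Submit a burst and step the pipeline until every frame is back."""
    pipe = SensorPipeline()
    for shot in shots:
        pipe.submit(shot)
    frames: List[Frame] = []
    while len(frames) < len(shots):
        frame = pipe.step()
        if frame is not None:
            frames.append(frame)
    return frames


if __name__ == "__main__":
    # Illustrative HDR-style bracket: three exposures, flash on the last.
    bracket = [Shot(1000), Shot(8000), Shot(64000, flash=True)]
    for f in run_burst(bracket):
        print(f.tick_done, f.shot)
```

In this toy model a three-shot bracket returns its first frame after three frame times and the remaining frames on consecutive frame times, with no pipeline reset between shots, which is the behaviour the one-shot model on the slide cannot provide.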