WSI Search Algorithm Update: - Digital Pathology Association

WSI Search Algorithm Update:
Short and intermediate term
capabilities with CBIR search
as an exemplar
Ulysses J. Balis, M.D.
Professor of Pathology
Director, Division of Pathology Informatics
Department of Pathology
University of Michigan
[email protected]
Disclosure
• Consultant: 2DP, Inc.
• Consultant: Lorxx, LLC
However, no content in this presentation is
related to either of these entities.
@dpatweet
#PV14
Outline
• Some predictions (both short and long-term)
• General overview of image search technology
• Overview of a specific exemplar
• Some demonstrations
• Review of automation in AP as an enabler of the
overall digital adoption lifecycle
• Closing thoughts
The Present (WSI)
• Early-mid point on the technology adoption curve
• WSI being utilized for “niche” areas with 100%
workflow remaining somewhat elusive, owing to
perceived lack of cost effectiveness combined
with limited reimbursement models
• Experience with 100% workflow yet to be
accumulated
• Adoption rate still shows opportunity for growth
2019
• Early adopters will demonstrate feasibility of
100% digital workflow
• FDA certification issue settled (US)
• Some inroads in shared repositories for
teaching and consultation (mostly regional)
• No use of WSI data for diagnostic prescreening
• Reimbursement models incomplete
2024
• Initial clinical trials of digital WSI pre-screening
• LCM seamlessly integrated into clinical workflow models in support of
personalized medicine
• National and international shared WSI repositories increase in popularity
and use.
• Cloud-based modality workflow models begin to emerge as an improved
model for delivery of high-quality surgical pathology diagnostics
• Pathology completes the transition to all-digital WSI workflow
• Data standards issues are resolved
• Reimbursement models mature
• Emergence in Japan and Europe of completely automated turnkey
diagnostic solutions, addressing dire pathologist shortages
• Training programs midway in retooling to teach digital diagnostic
techniques and skills
2034
• Pathology completes the transition to national and
international models for modality-specific workflow
• Global WSI practice consortia now common
• Advanced computational diagnostics / image analysis
skills are now an integral component of surgical
pathology skills / routine workflow
• Fusion of pathology digital image data with other
high-throughput modalities is well under way
• Training programs mature in their incorporation of
digital WSI content into curricula
• Interstate / Inter-provincial and International
reciprocity issues for pathologist licensing resolved.
2042
• Singularity of all digital imaging and high
throughput modalities
• Pathologists serve as the anchoring point for
high-throughput testing /reporting with digital
imaging continuing to serve as a high-utility tool
• Near total maturity of personalized medicine
• Development of personalized diagnostic devices,
as a feature set of the now-ubiquitous PDA
• Computational Singularity
What just took place?
• Human Cognitive Perspective
– About 30 seconds of focused viewing
– 25 seconds of feature extraction, image segmentation and
individual feature classification
– 20 seconds of cognitive review of long-term memory, in order
to reach a decision
– Perception of linear experience
• Equivalent Computational Perspective
– 1800 individual 25+ megapixel image field analysis operations
– 40+ Gb of image data transfer
– ~10^15 distinct image feature comparisons representing ~10^20
PetaFLOP computational throughput
– Thoroughly parallel computational operation
Human Cognitive Image
Recognition/Classification
• Remarkably streamlined / real-time task
• Seamlessly blends segmentation, feature
recognition, feature prioritization, and then
statistical hierarchical classification into a
unified effortless process
• Very efficient/accurate for qualitative
assessment
• Less accurate for quantitative assessment
Various Use Cases of Image Searching In
Pathology
• Find all cases in a repository that match the current
case, based on a region of interest
• Extract associated metadata from matching images,
providing some measure of equivalence or prediction
for the current case (all quantitative operations)
– Biological potential of malignancy (e.g. survival)
– Extracted Kaplan-Meier statistics
– Historical responses to therapeutic agents
– Association with genomic data already known for the
image-matched cohort of cases (in essence, the
constitutive image features can become a proxy for
previously established multi-dimensional correlates
between morphology and the molecular basis of disease)
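As a minimal sketch of the metadata aggregation described above, the following computes a Kaplan-Meier survival estimate from the follow-up records of an image-matched cohort. The record layout (months of follow-up plus an event flag), the function name and the sample values are hypothetical illustrations and are not taken from the presentation.

```python
from collections import Counter


def kaplan_meier(records):
    """Return [(time, survival_probability)] at each distinct event time.

    records: iterable of (time, event) pairs, where event is True for an
    observed death and False for a censored case (hypothetical layout).
    """
    records = list(records)
    deaths = Counter(t for t, event in records if event)
    exits = Counter(t for t, _ in records)  # deaths plus censorings per time
    at_risk = len(records)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]  # everyone leaving at time t drops out of the risk set
    return curve


# Hypothetical follow-up data (months, death observed?) for matched cases.
matched_cohort = [(6, True), (9, False), (12, True), (12, True), (20, False), (24, True)]
for months, s in kaplan_meier(matched_cohort):
    print(f"{months:>3} months: S(t) = {s:.3f}")
```

In a real repository the cohort would come from the cases returned by the image query, so the quality of the estimate depends directly on how well the image match selects biologically comparable cases.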
So why is image-based query so
difficult?
1. Multiple Concurrently Informative
Length Scales
The immense length scale range of WSI Data…
2. Potentially Large Image Repository Size that
Exceeds both human cognition and computational throughput
(at least, without special software and tools)
[Figure: data-set sizes by number of data elements, on a logarithmic scale]
Data sets shown, in the order labeled on the slide: Population-Level NGS Time-Series Data; NGS Time-Series Data; Single NGS Data Set; Library of Whole Slide Images; Single Whole Slide Image Data; Expression Array Data; Tissue Microarray Study Expression Data; Time-Series Routine Lab Studies; Comprehensive Chemistry + CBC; Chem 7; Single Analyte
Scale markers: 10^20, 10^15, 10^12, 10^11, 10^8, 10^5 and 10^3 - 10^4 data elements
• 250 data elements: encoded data including prognostic data likely present
• 28 data elements: encoded data present beyond human cognitive limit
• 7 data elements: limit of experiential threshold of encoded data extraction
• 1 data element: simple linear inference model
Marked thresholds: Supercomputing Threshold; Threshold for complete cognitive data extraction
3. A contemporary cohort of
practitioners in pathology who are
generally unaware of machine vision
techniques and numerical methods that
can be applied to digital image subject
matter.
Some Salient Technology History
Corona Satellite Image Program
(1959-1972)
Film based, but with digital assistance in
its latest phase.
The challenge of image search as
experienced with this project provided
the first insights as to the difficulty of
this type of computational problem.
Modern Remote Sensing Era
(1972-present)
Contemporary Commoditized
Offerings in Image-Based Search
Contemporary “Image Search”
tools are looking for a scaled
replica of an exact match. Any
deviation from the original will
cause the search algorithm to fail.
In this case, the algorithm
succeeds because we have
provided an exact match to an
image that it has already classified.
As histology is not an exercise of
making exact matches, the entire
contemporary generation of image
search tools is ineffective for
querying histology repositories.
In this case, the algorithm fails
because we have provided a novel
image which has not been previously
encountered.
The natural human concept of
“likeness” is not yet encapsulated in
this algorithm.
“Likeness” is an essential and desirable
feature of any ultimately useful
histology image search algorithm.
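The difference between exact matching and "likeness" can be illustrated with a small sketch. It contrasts a fingerprint lookup, which succeeds only on byte-identical data, with a ranking based on the similarity of a crude feature vector. The intensity histogram, the cosine measure and the 8-pixel sample "images" are assumptions chosen for brevity; they are not the method used by any tool shown in the presentation.

```python
import hashlib
import math


def exact_key(pixels):
    """Fingerprint that only matches byte-identical pixel data."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def intensity_histogram(pixels, bins=4):
    """Normalized histogram of 8-bit intensities: a very crude feature vector."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // 256] += 1
    total = len(pixels)
    return [c / total for c in counts]


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical 8-pixel grayscale "images" standing in for a repository.
library = {
    "case_A": [10, 20, 30, 200, 210, 220, 230, 240],
    "case_B": [120, 125, 130, 135, 140, 120, 125, 130],
}
query = [12, 18, 33, 198, 214, 219, 228, 243]  # resembles case_A, not identical

# Exact-match search: the novel query has no identical entry in the index.
index = {exact_key(p): name for name, p in library.items()}
print("exact match:", index.get(exact_key(query)))

# Likeness search: rank repository entries by feature similarity instead.
q = intensity_histogram(query)
ranked = sorted(
    library,
    key=lambda name: cosine_similarity(q, intensity_histogram(library[name])),
    reverse=True,
)
print("most alike:", ranked[0])
```

The exact-match lookup finds nothing for the novel query, while the similarity ranking still places the visually closest case first, which is the behavior a histology search needs.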
Searching Libraries of Pathology
Images with Images
• The availability of digital whole slide data sets
represents an enormous opportunity to carry out new
forms of numerical and data-driven query, in search
modes that are not based on textual, ontological or
lexical matching
– Search image repositories with whole images or image regions of
interest
– Carry out search in real-time via use of scalable computational
architectures
– Higher order space bioinformatics searches can finally include
quantitative histology (e.g. combined search of histology, radiology
and genomic repositories offers a significant potential for enhanced
statistical power)
• Known as Content-Based Image Retrieval (CBIR)
[Figure: extraction from image repositories based upon spatial information, or analysis of data in the digital domain (…001011010111010111..); resultant heat map with gallery of matching images and associated diagnostic / decision support data]
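A region-of-interest query that yields a heat map can be sketched as follows. This is a toy illustration under stated assumptions, not the algorithm demonstrated in the presentation: the "slide" is a 4x4 grid of grayscale values, each tile is described only by its mean and standard deviation, and the function names are hypothetical.

```python
import math


def tile_features(image, top, left, size):
    """Mean and standard deviation of one square tile (a toy feature vector)."""
    values = [image[r][c] for r in range(top, top + size) for c in range(left, left + size)]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return (mean, math.sqrt(variance))


def similarity_heat_map(image, query_features, size):
    """Score every non-overlapping tile against the query.

    A score of 1.0 means the tile's features equal the query's; scores fall
    toward 0 as the Euclidean distance between feature vectors grows.
    """
    rows, cols = len(image), len(image[0])
    heat = []
    for top in range(0, rows - size + 1, size):
        row = []
        for left in range(0, cols - size + 1, size):
            dist = math.dist(query_features, tile_features(image, top, left, size))
            row.append(1.0 / (1.0 + dist))
        heat.append(row)
    return heat


# Hypothetical 4x4 grayscale "slide": bright tiles top-left and bottom-right.
slide = [
    [200, 210, 20, 25],
    [205, 215, 30, 22],
    [20, 30, 198, 212],
    [25, 28, 207, 213],
]
roi = tile_features(slide, 0, 0, 2)  # region of interest chosen by the user
heat = similarity_heat_map(slide, roi, 2)
for row in heat:
    print(" ".join(f"{score:.2f}" for score in row))
```

Because every tile is scored independently, the loop parallelizes trivially, which is one reason scalable architectures suit this kind of search. A production system would replace the two-number feature with far richer descriptors and would search many slides rather than one.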
Colon Cancer
Automatic Detection of Malignant Tissue
Automatic Detection of Benign Tissue
Automation in the Collective AP Labs…
Demonstrations
Closing Thoughts
• Image search for histology is no more than three years
away from production use.
• With image search capability will come the added
benefits of prior cases metadata aggregation and
analytics, enabling:
– Survival estimation
– Biological potential
– Response to therapy potential
Acknowledgement and thanks: Jerome Cheng, Jason Hipp, and John Blau