Role of Data in Healthcare

The Role of Data in Healthcare Innovation
Abel Kho MD, MS
[email protected]
May 2nd 2014
Outline
• Electronic Health Records (EHRs) as an enabling data platform
• Genetic studies (bench to bedside)
• Population studies: “Planting trees to see the forest”
• Addressing data privacy
Changes in Adoption of Basic and Comprehensive EHR Systems
DesRoches CM, Charles D, Furukawa MF, et al. Adoption of Electronic Health Records Grows Rapidly, But Fewer Than Half of US Hospitals Had At Least A Basic System in 2012. Health Aff (Millwood). 2013;32(8).
EHR Adoption and Meaningful Use
• 1,807 providers enrolled (goal: 1,486)
• 1,751 enrolled providers (118% of goal) are live on their EHR products
• 1,228 enrolled providers (83% of goal) have achieved Meaningful Use (MU)
• CHITREC-enrolled providers have received almost $20M in EHR Incentive Program funds
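The two percentages on this slide only work out if the denominator is the enrollment goal (1,486) rather than the 1,807 enrolled providers. A quick arithmetic check, written as a small Python sketch with variable names of our own choosing:

```python
# Figures from the slide. Which denominator the slide used is our inference.
ENROLLED = 1_807
GOAL = 1_486
LIVE_ON_EHR = 1_751
ACHIEVED_MU = 1_228


def pct(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a whole-number percentage."""
    return round(100 * numerator / denominator)


print(pct(LIVE_ON_EHR, GOAL), pct(ACHIEVED_MU, GOAL))          # 118 83
print(pct(LIVE_ON_EHR, ENROLLED), pct(ACHIEVED_MU, ENROLLED))  # 97 68
```

Only the goal-based denominator reproduces the 118% and 83% shown; against the enrolled count the figures would be 97% and 68%.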
Type II Diabetes Case Algorithm
* Abnormal lab = random glucose > 200 mg/dl, fasting glucose > 125 mg/dl, or hemoglobin A1c ≥ 6.5%.
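The footnoted lab criterion can be written as a simple predicate. This is a minimal sketch covering only the abnormal-lab box; the rest of the case algorithm (a flowchart on the original slide) is not reproduced, and the function and parameter names are hypothetical. Note the mixed thresholds: the glucose cutoffs are strict (>), while the A1c cutoff is inclusive (≥).

```python
from typing import Optional


def is_abnormal_lab(
    random_glucose_mg_dl: Optional[float] = None,
    fasting_glucose_mg_dl: Optional[float] = None,
    hba1c_percent: Optional[float] = None,
) -> bool:
    """Return True if any available result meets the slide's abnormal-lab definition.

    Missing results (None) are ignored rather than treated as abnormal.
    """
    if random_glucose_mg_dl is not None and random_glucose_mg_dl > 200:
        return True
    if fasting_glucose_mg_dl is not None and fasting_glucose_mg_dl > 125:
        return True
    if hba1c_percent is not None and hba1c_percent >= 6.5:
        return True
    return False


print(is_abnormal_lab(hba1c_percent=6.5))          # True (A1c cutoff is inclusive)
print(is_abnormal_lab(random_glucose_mg_dl=200))   # False (glucose cutoff is strict)
print(is_abnormal_lab(fasting_glucose_mg_dl=126))  # True
```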
Type II Diabetes Control Algorithm
Mega-Analysis (adjusted)
TCF7L2: 3,353 cases, 3,352 controls
Kho et al. Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study. JAMIA 2012.
eMERGE Sample Size
Site | eMERGE I participants (enrolled/targeted) | eMERGE I genotyped | eMERGE II participants (enrolled/targeted) | eMERGE II genotyped | eMERGE I & II genotyped
GHC/UW | 2,820 | 2,789 | 3,561 | 786 | 3,575
Marshfield | 20,000 | 4,210 | 20,000 | 777 | 4,987
Mayo | 3,769 | 3,755 | 19,000 | 3,185 | 6,940
NU | 10,500 | 1,907 | 10,500 | 3,055 | 4,962
VU | 70,000 | 6,055 | 140,000 | 27,173 | 33,228
Geisinger | N/A | N/A | 19,650 | 4,191 | 4,191
Mt. Sinai | N/A | N/A | 21,000 | 16,000 | 16,000
CCMC/CHB | N/A | N/A | 40,051 | 5,586 | 5,586
CHOP | N/A | N/A | 40,000 | 8,000 | 8,000
Total | 107,089 | 18,716 | 313,762 | 68,753 | 87,469
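The eMERGE sample size table is internally consistent: each total is its column sum, and at every site the combined eMERGE I & II genotyped count equals eMERGE I genotyped plus eMERGE II genotyped. A short check, with the rows transcribed by hand (N/A stored as None):

```python
# Rows transcribed from the eMERGE sample size table; None stands for N/A.
# Column order: eMERGE I participants, eMERGE I genotyped,
#               eMERGE II participants, eMERGE II genotyped,
#               eMERGE I & II genotyped.
SITES = {
    "GHC/UW": (2_820, 2_789, 3_561, 786, 3_575),
    "Marshfield": (20_000, 4_210, 20_000, 777, 4_987),
    "Mayo": (3_769, 3_755, 19_000, 3_185, 6_940),
    "NU": (10_500, 1_907, 10_500, 3_055, 4_962),
    "VU": (70_000, 6_055, 140_000, 27_173, 33_228),
    "Geisinger": (None, None, 19_650, 4_191, 4_191),
    "Mt. Sinai": (None, None, 21_000, 16_000, 16_000),
    "CCMC/CHB": (None, None, 40_051, 5_586, 5_586),
    "CHOP": (None, None, 40_000, 8_000, 8_000),
}
REPORTED_TOTALS = (107_089, 18_716, 313_762, 68_753, 87_469)


def column_total(col: int) -> int:
    """Sum one column, skipping N/A entries."""
    return sum(row[col] for row in SITES.values() if row[col] is not None)


def combined_matches(row) -> bool:
    """Check that I & II genotyped = I genotyped (0 if N/A) + II genotyped."""
    return (row[1] or 0) + row[3] == row[4]


print([column_total(c) for c in range(5)] == list(REPORTED_TOTALS))  # True
print(all(combined_matches(row) for row in SITES.values()))          # True
```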
Distributed Common Identity For Integration of Regional Health Data (DCIFIRHD)
HealthLNK Data Description
HealthLNK Patient Count
[Figure: patients, in millions (axis 0.0 to 8.0), for Total Chicago Patients: non-deduplicated, de-duplicated, and visit data]
Chicago only (n = 1,492,144)
Characteristic | Value
White | 408,241 (27.4%)
Black | 521,972 (35.0%)
Asian | 49,597 (3.3%)
American Indian / Alaska Native | 15,780 (1.1%)
Pacific Islander | 3,168 (0.2%)
Other/Unknown/Declined | 350,805 (23.5%)
Hispanic (ethnicity) | 247,231 (16.6%)
Median age (years) | 42
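Each percentage in the Chicago-only table is the count divided by the Chicago-only total (n = 1,492,144); Hispanic ethnicity is listed separately from race and uses the same denominator. A quick consistency check, sketched in Python with names of our own choosing:

```python
N_CHICAGO = 1_492_144  # Chicago-only patients

# Counts and the rounded percentages printed on the slide.
REPORTED = {
    "White": (408_241, 27.4),
    "Black": (521_972, 35.0),
    "Asian": (49_597, 3.3),
    "American Indian / Alaska Native": (15_780, 1.1),
    "Pacific Islander": (3_168, 0.2),
    "Other/Unknown/Declined": (350_805, 23.5),
    "Hispanic (ethnicity)": (247_231, 16.6),
}


def percent_of_total(count: int, total: int = N_CHICAGO) -> float:
    """Share of the Chicago-only cohort, in percent."""
    return 100 * count / total


for group, (count, slide_pct) in REPORTED.items():
    # A value rounded to one decimal is consistent if the exact share is within 0.05 points.
    ok = abs(percent_of_total(count) - slide_pct) <= 0.05
    print(f"{group}: {percent_of_total(count):.2f}% (slide: {slide_pct}%) {'ok' if ok else 'MISMATCH'}")
```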
Sample size/cohort comparison by residential ZIP code, BRFSS* vs. HealthLNK
Source | Min | Median | Mean | Max
IL BRFSS, Chicago 2011 respondents | 4 | 15 | 16 | 33
HealthLNK, patients with a 2010 visit | 1,339 | 10,031 | 9,270 | 21,289
*CDC Behavioral Risk Factor Surveillance System survey, Chicago sub-sample from the Illinois dataset.
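Each row of the comparison summarizes per-ZIP sample sizes: records are tallied by residential ZIP code, and the min, median, mean, and max are taken over those tallies. A minimal sketch with hypothetical toy input (not BRFSS or HealthLNK data); the function name is our own:

```python
from collections import Counter
from statistics import mean, median
from typing import Dict, Iterable


def per_zip_summary(zip_codes: Iterable[str]) -> Dict[str, float]:
    """Min, median, mean and max of the number of records per residential ZIP code.

    `zip_codes` holds one ZIP code per survey respondent or patient.
    ZIP codes with no records do not appear, so they are not counted as zeros.
    """
    counts = list(Counter(zip_codes).values())
    return {"min": min(counts), "median": median(counts),
            "mean": mean(counts), "max": max(counts)}


# Hypothetical toy input: three ZIP codes with 4, 15 and 33 records.
toy = ["60601"] * 4 + ["60611"] * 15 + ["60614"] * 33
print(per_zip_summary(toy))
```

The gap between the two rows (a median of 15 respondents per ZIP code versus roughly 10,000 patients) is what makes small-area estimates from EHR data feasible where survey samples are too thin.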
Diabetes prevalence estimate by residential ZIP code
$$\text{Percent} = \frac{\#\,\text{patients with} > 1\ \text{diabetes mellitus diagnosis code or lab criteria met}}{\#\,\text{patients with a visit in 2006--2010}} \times 100$$
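The prevalence formula amounts to a grouped count. A minimal sketch, assuming each patient record already carries a residential ZIP code, a flag for a 2006-2010 visit, and a flag for meeting the diabetes definition (diagnosis-code or lab criteria). The record type and field names are hypothetical, and we assume the numerator counts only patients who are also in the denominator, which the slide does not state explicitly:

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Patient(NamedTuple):
    # Hypothetical record layout for this sketch.
    zip_code: str
    visit_2006_2010: bool
    meets_diabetes_criteria: bool  # > 1 DM diagnosis code or lab criteria met


def prevalence_by_zip(patients: Iterable[Patient]) -> Dict[str, float]:
    """Percent of patients seen in 2006-2010 who meet the diabetes criteria, per ZIP."""
    seen = defaultdict(int)      # denominator: patients with a visit in 2006-2010
    diabetic = defaultdict(int)  # numerator: those patients meeting the criteria
    for p in patients:
        if not p.visit_2006_2010:
            continue
        seen[p.zip_code] += 1
        if p.meets_diabetes_criteria:
            diabetic[p.zip_code] += 1
    return {z: 100.0 * diabetic[z] / seen[z] for z in seen}


cohort = [
    Patient("60601", True, True),
    Patient("60601", True, False),
    Patient("60602", True, False),
    Patient("60602", False, True),  # no 2006-2010 visit, so excluded entirely
]
print(prevalence_by_zip(cohort))  # {'60601': 50.0, '60602': 0.0}
```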
The variability within a ZIP code can be as large as, or larger than, the variability between ZIP codes.
Pah AR, Behrens JJ, Goel S, Kho AN. Unzipping Zip Codes: A Methodology to Assign Deidentified Health Data to Smaller Geographic Localities. AMIA CRI 2014.
Difference in median household income, from the American Community Survey (ACS)
Need to disaggregate patient data from the ZIP code level to evaluate small-area effects
Input data:
• Patient records with demographics (age, gender, race)
• Census data at the block group level
Methodology:
• Monte Carlo simulation to distribute patient cases
• Fit the simulation data with a semivariogram
• Create a kriged surface using the semivariogram
Output: a probabilistic contour map of patient cases
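A minimal, standard-library-only sketch of the first two methodology steps, under our own assumptions: cases in each ZIP code are assigned at random to that ZIP's census block groups in proportion to the census population of the matching demographic stratum, and the averaged counts are then summarized with a classical empirical semivariogram. The data layout and function names are hypothetical, and this is not the authors' published code. Fitting a variogram model and kriging the surface (the third step) is left out; that is usually done with a geostatistics package.

```python
import math
import random
from collections import defaultdict


def simulate_block_group_counts(zip_counts, block_groups, n_runs=1000, seed=0):
    """Monte Carlo disaggregation of ZIP-level case counts to block groups.

    zip_counts:   {(zip_code, stratum): number of cases}
    block_groups: {zip_code: [(block_group_id, {stratum: census population})]}
    Returns the mean simulated case count per block group over n_runs.
    """
    rng = random.Random(seed)
    totals = defaultdict(float)
    for members in block_groups.values():
        for bg_id, _ in members:
            totals[bg_id] = 0.0  # report block groups that never receive a case
    for _ in range(n_runs):
        for (zip_code, stratum), n_cases in zip_counts.items():
            members = block_groups[zip_code]
            ids = [bg_id for bg_id, _ in members]
            # Weight each block group by its census population in this stratum.
            weights = [pops.get(stratum, 0) for _, pops in members]
            for bg_id in rng.choices(ids, weights=weights, k=n_cases):
                totals[bg_id] += 1
    return {bg_id: total / n_runs for bg_id, total in totals.items()}


def empirical_semivariogram(points, bin_width, n_bins):
    """Classical estimate gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)) per distance bin.

    points: [(x, y, value)], e.g. block group centroids and simulated counts.
    Returns [(bin_center, gamma)] for each distance bin that contains pairs.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, (xi, yi, zi) in enumerate(points):
        for xj, yj, zj in points[i + 1:]:
            b = int(math.hypot(xi - xj, yi - yj) // bin_width)
            if b < n_bins:
                sums[b] += (zi - zj) ** 2
                counts[b] += 1
    return [((b + 0.5) * bin_width, sums[b] / (2 * counts[b]))
            for b in range(n_bins) if counts[b]]


# Toy example: 10 cases aged 65+ in one ZIP, split across two block groups
# whose 65+ populations are 300 and 100.
means = simulate_block_group_counts(
    {("60601", "65+"): 10},
    {"60601": [("bg1", {"65+": 300}), ("bg2", {"65+": 100})]},
)
print(means)  # roughly {'bg1': 7.5, 'bg2': 2.5}
print(empirical_semivariogram([(0, 0, 1.0), (1, 0, 3.0), (0, 1, 1.0)], 1.5, 2))
```

Averaging over many runs is what turns a single random assignment into a probabilistic estimate; the per-run spread could likewise be kept to express uncertainty.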
Example: Diabetes in Chicago
Health records from 7 healthcare institutions in Chicago¹
Examining diabetes cases (types 1 and 2)² from 2010
190,069 total cases
Population data from Census 2010
¹ HealthLNK
² A. Elixhauser, C. Steiner, D. R. Harris, and R. M. Coffey. Comorbidity Measures for Use with Administrative Data. Medical Care, 36(1):8–27, January 1998.
Probabilistic maps from simulation
[Figure panels: Raw data vs. Simulation alone]
[Figure panels: Simulation alone vs. Simulation + kriged surface]
But how good is it?
Scraped listings of houses for sale from Zillow
Each house has features we use the way we use patient demographics:
• Beds
• Baths
• Price
Each house also has an exact address, which we use to quantify performance
Applied the same methodology to 656 houses across 9 ZIP codes
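The slides do not say which error measure was used. Because each house's true location is known, one simple option, shown here purely as an illustration, is to compare the true number of houses in each small area with the number the disaggregation assigns to it on average, for example via the mean absolute error. The function name and data layout are hypothetical:

```python
from typing import Dict


def mean_absolute_error(true_counts: Dict[str, float],
                        simulated_counts: Dict[str, float]) -> float:
    """Mean absolute difference between true and simulated counts per area.

    true_counts:      {area_id: count derived from exact addresses}
    simulated_counts: {area_id: mean count from the disaggregation}
    An area missing from either dict is treated as having a count of 0.
    """
    areas = set(true_counts) | set(simulated_counts)
    if not areas:
        raise ValueError("no areas to compare")
    return sum(abs(true_counts.get(a, 0) - simulated_counts.get(a, 0))
               for a in areas) / len(areas)


# Hypothetical counts for two block groups.
print(mean_absolute_error({"bg1": 8, "bg2": 2}, {"bg1": 7.5, "bg2": 2.5}))  # 0.5
```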
Looking at housing data
Advantages of this method
• Aggregation at the ZIP code level can obscure small-area effects
• Probabilistic methods re-capture this detail without requiring detailed patient health information
• Finer spatial resolution enables “hot-spot” detection
• Results can be re-aggregated to meaningful geographic areas (e.g., community areas)
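Once estimates exist at the block group level, re-aggregating them is a grouped sum. A minimal sketch, assuming a lookup table from block group to community area; the IDs are hypothetical, and where real boundaries do not nest cleanly an area-weighted split would be needed instead:

```python
from collections import defaultdict
from typing import Dict


def reaggregate(block_group_values: Dict[str, float],
                block_group_to_area: Dict[str, str]) -> Dict[str, float]:
    """Sum block-group-level estimates into larger areas (e.g. community areas).

    Assumes every block group lies entirely inside one area.
    """
    totals: Dict[str, float] = defaultdict(float)
    for bg_id, value in block_group_values.items():
        totals[block_group_to_area[bg_id]] += value
    return dict(totals)


# Hypothetical block group IDs and mapping.
estimates = {"bg1": 7.5, "bg2": 2.5, "bg3": 1.0}
mapping = {"bg1": "Loop", "bg2": "Loop", "bg3": "Near North Side"}
print(reaggregate(estimates, mapping))  # {'Loop': 10.0, 'Near North Side': 1.0}
```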
The Monte Carlo (MC) portion is available at:
https://bitbucket.org/adamrpah/geographic-record-disaggregation
GIS portion coming shortly
Addressing Data Privacy
HIPAA Expert Determination (abridged)
Certify via “generally accepted statistical and scientific principles & methods, that the risk is very small that the information could be used, alone or in combination with other reasonably available information, by the anticipated recipient to identify the subject of the information.”
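The HIPAA text does not prescribe a particular statistical method. One measure often used as an input to this kind of assessment (not a regulatory requirement) is k-anonymity: the size of the smallest group of records that share the same quasi-identifiers, where a small k signals higher re-identification risk. A minimal sketch; the field names are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def smallest_group_size(records: Iterable[Dict[str, str]],
                        quasi_identifiers: Sequence[str]) -> int:
    """Return k: the size of the smallest set of records that share identical
    values for every quasi-identifier field."""
    groups = Counter(tuple(r[f] for f in quasi_identifiers) for r in records)
    if not groups:
        raise ValueError("no records")
    return min(groups.values())


# Hypothetical, coarsened records.
records = [
    {"zip3": "606", "age_band": "40-49", "sex": "F"},
    {"zip3": "606", "age_band": "40-49", "sex": "F"},
    {"zip3": "607", "age_band": "50-59", "sex": "M"},
    {"zip3": "607", "age_band": "60-69", "sex": "M"},
]
print(smallest_group_size(records, ["zip3", "age_band", "sex"]))  # 1
print(smallest_group_size(records, ["zip3", "sex"]))              # 2
```

Dropping the age band raises k from 1 to 2 in the toy data, illustrating how generalizing quasi-identifiers lowers re-identification risk.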
Summary
• Comprehensive capture of contiguous EHR data can be a powerful engine for new discoveries
• Methods exist to “unlock” data from where it resides while protecting privacy/identity
Acknowledgements
• Northwestern University: Katie Jackson, Jess Behrens, Adam Pah, Sara Lake, Satyender Goel
• UIC: Bill Galanter, John Lazaro, Denise Hynes, Neil Bahroos, Jerry Krishnan
• University of Chicago Medical Center: David Meltzer, Chris Lyttle, Ben Vekhter
• Cook County Hospital and Clinics: Bill Trick, Amanda Grasso
• Alliance of Chicago: Erin Kaleba, Andrew Hamilton, Fred Rachman
• Rush University Medical Center: Bala Hota, Shannon Sims
• Loyola University: Ron Price, Rich Kennedy
• Vanderbilt University: Brad Malin
• UIC Intern team: Ariadna Garcia, Pravin Babu Karuppaiah, Shazia Sathar, Ulas Keles (Sid Battacharya, Faculty mentor)
• Becker Friedman Institute: Jörn Boehnke, John Eric Humphries, Scott Kominers (Harvard)