Design Considerations for Visualizing Large EMR Data

Jianping Li, Chun-Fu Wang, and Kwan-Liu Ma, Fellow, IEEE
University of California, Davis
Abstract— The explosive growth of electronic medical record (EMR) data has led to a major unmet need for efficient data retrieval, presentation, analysis, and decision support. We have been developing and evaluating visual-based strategies for efficient organization and
analysis of population medical data from multiple sources to study a particular disease and associated treatments and outcomes. We
believe the resulting medical record review system will significantly enhance the ability of physicians and clinical researchers to utilize
the massive EMR data available to them for improving overall health care.
Index Terms—Data management, visual analytics, interactive visualization, EMR
1 INTRODUCTION
Electronic Medical Records (EMRs) provide rich data about individual patients and the opportunity to study a particular disease on a large
population of patients. However, current EMR systems do not provide
adequate support for fully exploiting the data. The growing size, high
dimensionality, heterogeneity and complexity of EMRs demand a new
set of tools for extracting information of interest from the data. Interactive visualization is one such tool [11] and this paper discusses some
of the design considerations for building a visual analysis tool to study
large EMR data.
A basic EMR system consists of two parts, namely the database
(back end) and interface (front end). The interface allows physicians
to access the database, and is typically graphical and based on multiple
tabs with lists of information along with filters for refining searches.
The interface closely models the underlying database, as opposed to
the tasks that physicians must perform; that is, the interface is built
with the data in mind, not the physician. For instance, physicians often
want to know when a chronic disease was first diagnosed, which seems
like a simple task, but in practice this requires manually searching
through many electronic notes. There are many other examples of
conceptually simple tasks that require extensive work on the part of
the physician. Therefore, a key observation is that there is a large gap
between the high-level tasks that physicians perform and the low-level
database access that current EMR systems provide. This results in a
relatively large workload for a physician reviewing patient records.
The core problem is that physicians have very limited time to perform
this extensive review. Therefore, designing a task-oriented process
effectively supported by visual summarization of relevant data and an
intuitive, easy-to-use user interface is an essential topic.
Supporting a variety of interactive operations on very large and
noisy temporal event data is challenging. Many existing visualization
systems assume all the data can fit in memory, which is no longer feasible. Previous work addressing the big data aspect of EMRs uses data
warehouse methodologies, where data from multiple EMR systems are
first collected in a central relational database and then data marts are
constructed according to analytics applications for fast access. Such
a strategy, however, does not scale with the data size. Furthermore,
existing EMR data review systems do not support interactive visualization tasks well. In fact, for visual analytics applications in general,
considerations of data organization for efficient visual-based querying
and interaction are lacking. How to manage and prepare EMR data to
better support common analysis and visualization tasks is thus also an
essential topic.
Proc. of IEEE VIS 2014 Workshop on Visualization of Electronic Health
Records. Copyright retained by the authors.
Previous research efforts have addressed many issues in visualizing patient records and temporal sequence data [10, 13, 3, 2, 8, 14,
7, 6]. Lifelines2 [13] expedites the pattern-finding process by aligning records by a sentinel event, whereas PatternFinder [9] increases the
query capability by introducing chained events and time spans. EventFlow [7] focuses on overview-specific tasks and defines a process to
visually simplify the query summary. However, the above methods rely
on a well-defined hypothesis to query and narrow down the scope,
so they are less suitable for exploratory analysis. The system we aim to
develop should allow physicians or clinical researchers to make visual
narratives of the EMRs in the process of data exploration. For example, we have been investigating how to employ hierarchical clustering
methods and visual encoding strategies to bring out hidden structures
for exploring causal relationships [12]. The resulting visualization will
also help compare different patient groups, determine critical factors
to a particular disease, and help direct further analyses. Furthermore,
we place a focus on supporting cohort studies of EMR data and ensuring interactivity. In this short paper, we present our preliminary studies
on cohort visualization and discuss data management solutions.
2 DRIVING DATA SET
Our work has used a data set obtained from the Taiwan National Health
Insurance Database (NHIDB), which contains ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification) codes for disease identification as well as drug and procedure
codes. From the one million patients in the data set, we extracted those associated
with Chronic Kidney Disease (CKD) between 1998 and
2011. There are a total of 14,567 such patients. Each record
stores the date and patient ID, along with the disease, drug, and procedure codes.
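As a minimal sketch of the record layout and cohort extraction described above, the following Python snippet models one record with the fields named in the text and selects the patients who have at least one CKD-coded record in the study window. The `Record` class, its field names, the sample rows, and the use of the ICD-9-CM 585 code family to identify CKD are illustrative assumptions; the extraction criteria actually used for this data set are not specified here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """One record; the class and field names are hypothetical."""
    patient_id: str
    visit_date: date
    disease_codes: tuple          # ICD-9-CM codes, e.g. ("585.3", "401.9")
    drug_codes: tuple = ()
    procedure_codes: tuple = ()


# Assumption: CKD is identified by the ICD-9-CM 585.x code family.
CKD_PREFIX = "585"


def extract_cohort(records, start_year=1998, end_year=2011, prefix=CKD_PREFIX):
    """Return IDs of patients with at least one matching record in the window."""
    cohort = set()
    for r in records:
        in_window = start_year <= r.visit_date.year <= end_year
        if in_window and any(c.startswith(prefix) for c in r.disease_codes):
            cohort.add(r.patient_id)
    return cohort


def records_for(records, cohort):
    """Keep every record (CKD-coded or not) of the selected patients."""
    return [r for r in records if r.patient_id in cohort]


# Made-up rows, for illustration only.
sample = [
    Record("p1", date(2003, 5, 1), ("401.9",)),
    Record("p1", date(2005, 2, 9), ("585.3",), drug_codes=("A21",)),
    Record("p2", date(2006, 7, 3), ("250.00",)),
    Record("p3", date(1996, 1, 4), ("585.9",)),
]
cohort = extract_cohort(sample)
print(sorted(cohort))                    # ['p1']
print(len(records_for(sample, cohort)))  # 2
```

Keeping all records of the selected patients, rather than only the CKD-coded ones, is what makes it possible to look at other diseases, drugs, and procedures before and after the diagnosis.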
With appropriate tools, this dataset will allow us to understand the
course of a disease by exploring trajectories of patient groups (cohorts), which are timelines consisting of multidimensional, high-variance variables derived from diseases, drugs, or procedures. We call
such variables “factors”. CKD is known for its correlation with a
wide range of factors, and its comorbidities vary across time stages.
3 DESIGN CONSIDERATIONS
For a visual-based cohort study of such a dataset, we are presented
with several challenges. First, direct visualization of all the patients
can easily lead to overplotting, as shown by Figure 1. Second, in this
dataset, there exist tens of thousands of factors pertinent to the CKD
patients. It is not apparent how to discriminate and visualize these
factors over time for bringing out structures of interest in the data. The
visualization in Figure 1 provides an overview and could be used to
guide further analysis, but it is too complex to be used by itself. It
would be useful to select, aggregate, and visualize factors associated
with patient groups. We discuss a few approaches to such operations
and the design of an interface to match the intended tasks in [12]. To
verify the usefulness of the resulting visualizations, we have conducted
a few case studies followed by a small user study.
Fig. 1. Visualization of 14,567 Chronic Kidney Disease (CKD) patients’
symptoms from 1997-2011. All patients are aligned around the middle
vertical line (0y) based on the dates they were diagnosed with CKD.
3.1 Case Studies
Figure 1 provides a broad overview of the data, but the visualization
is hard to interpret. We simplified the visualization by always splitting
patients into two cohorts. Note that the trajectory of each patient is
aligned by the time when the patient was first diagnosed with CKD,
which is marked as 0y. 2y means two years after the CKD diagnosis.
A cohort of patients is represented as a vertical rectangular box with
its height representing the cohort size and color indicating the different
nature of the disease comorbidities in the cohort. A ribbon connecting
one rectangle to another on its right represents the transition of patients
from one cohort to another.
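The alignment and cohort-flow encoding described above can be sketched as follows. The snippet computes, for each year offset relative to a patient's first CKD diagnosis, which of two cohorts the patient belongs to (with or without a given factor), and then derives the box heights (cohort sizes) and ribbon widths (transition counts). The function names, the input layout, the 365-day approximation of a year, the rule that a patient counts as having the factor from its first recorded occurrence onward, and the sample data are all assumptions made for illustration, not a description of our implementation.

```python
from collections import Counter
from datetime import date


def year_offset(d, anchor):
    """Approximate whole-year offset from the anchor, using 365-day years.

    Floor division places dates before the anchor in negative years.
    """
    return (d - anchor).days // 365


def cohort_by_offset(diagnosis, first_factor, offsets):
    """Map each offset to {patient: bool}; True means 'has the factor by then'.

    diagnosis:    {patient_id: date of first CKD diagnosis} (the 0y anchor)
    first_factor: {patient_id: date the factor was first recorded}
    """
    membership = {}
    for k in offsets:
        membership[k] = {}
        for pid, dx in diagnosis.items():
            onset = first_factor.get(pid)
            membership[k][pid] = onset is not None and year_offset(onset, dx) <= k
    return membership


def boxes_and_ribbons(membership, offsets):
    """Count cohort sizes per offset and transitions between adjacent offsets."""
    boxes, ribbons = Counter(), Counter()
    for k in offsets:
        for has in membership[k].values():
            boxes[(k, has)] += 1
    for a, b in zip(offsets, offsets[1:]):
        for pid, has in membership[a].items():
            ribbons[(a, has, b, membership[b][pid])] += 1
    return boxes, ribbons


# Made-up dates, for illustration only.
diagnosis = {"p1": date(2005, 1, 10), "p2": date(2007, 6, 1), "p3": date(2003, 3, 15)}
first_htn = {"p1": date(2003, 6, 1), "p2": date(2008, 1, 1)}
offsets = [-2, -1, 0, 1]

membership = cohort_by_offset(diagnosis, first_htn, offsets)
boxes, ribbons = boxes_and_ribbons(membership, offsets)
print(boxes[(0, True)], boxes[(0, False)])  # 2 1
print(ribbons[(-1, False, 0, True)])        # 1
```

In this sketch each `boxes` entry corresponds to one rectangle and each `ribbons` entry to one ribbon between adjacent time steps; how these counts are laid out and colored is a separate design decision.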
Figure 2 shows the results of simplifying the visualization by focusing on the groups of patients with and without hypertension (HTN)
over time. We also mark those who died and those who had hemodialysis (HD) in black. We can see the number of patients getting HTN
increased over time and became the majority two years before being
diagnosed with CKD. The prevalence of HTN is irreversible, and it
can be verified with the visualization because no patient moves from
the cohort with HTN to the cohort without HTN. Finally, the patients
with HTN seem to have a higher mortality rate and to be more likely to need
hemodialysis.
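The two observations above (no patient moving out of the HTN cohort, and different outcome rates in the two cohorts) correspond to simple checks over per-year cohort membership. The following sketch assumes a hypothetical `membership` mapping from year offset to a per-patient flag and a set of patients with a recorded outcome; the names and the sample values are made up for illustration and do not reproduce the numbers behind Figure 2.

```python
def reverse_transitions(membership, offsets):
    """List (patient, from_offset, to_offset) for patients who leave the factor cohort."""
    moved_back = []
    for a, b in zip(offsets, offsets[1:]):
        for pid, has in membership[a].items():
            if has and not membership[b][pid]:
                moved_back.append((pid, a, b))
    return moved_back


def outcome_rate(members, with_outcome):
    """Fraction of a cohort that has the outcome; None for an empty cohort."""
    members = set(members)
    if not members:
        return None
    return len(members & set(with_outcome)) / len(members)


# Made-up membership flags (True = patient has the factor at that offset).
offsets = [-1, 0, 1]
membership = {
    -1: {"p1": True, "p2": False, "p3": False, "p4": False},
    0: {"p1": True, "p2": True, "p3": False, "p4": False},
    1: {"p1": True, "p2": True, "p3": False, "p4": True},
}
hemodialysis = {"p1", "p4"}  # hypothetical outcome set

last = membership[offsets[-1]]
with_factor = [p for p, has in last.items() if has]
without_factor = [p for p, has in last.items() if not has]

print(reverse_transitions(membership, offsets))     # []
print(outcome_rate(with_factor, hemodialysis))      # 2 of 3 patients
print(outcome_rate(without_factor, hemodialysis))   # 0.0
```

One caveat on the design: if membership is derived with a "first occurrence onward" rule, the reverse-transition check passes by construction, so it is only informative when membership is computed independently for each period. The rates are descriptive proportions and say nothing about statistical significance.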
Figure 3 shows a visualization of the CKD patients with and without proteinuria. Again, the number of patients receiving hemodialysis
is marked in black. A larger increase in proteinuria patients can be
observed over the 2-4 year period before being diagnosed with CKD.
Receiving hemodialysis seems to be less dependent on having proteinuria or not. In Figure 4, we compare the use of two drugs: MA1 and
A21. We can see patients started using A21 much earlier than using
MA1. The medication did not seem to have a particular impact on the
patients.
We used these case study results to conduct a survey with nine
subjects. Table 1 lists their job titles and areas of study/practice. One
objective of the study was to find out whether the subjects found the flow diagram visualization of the CKD data helpful. Almost all subjects
found the visualizations in Figure 2 and Figure 3 helpful. For most of
them, it was their first time seeing EMR data this way. They could immediately understand what each visualization shows. For example, they
could tell from the visualization the correlation between medications and
hemodialysis intervention. However, a few subjects had some difficulty interpreting the visualizations in Figure 4. It turned out that they were
confused by the crossing of the two groups, which is really a result
of the layout. That is, this artificial pattern drew unnecessary attention from the subjects. We will therefore revise our design to avoid
introducing patterns or structures that are not in the data.
Fig. 2. Visualization of those CKD patients with and without hypertension (HTN) over time. All patients are aligned by the year (i.e., Year
0) they were diagnosed with CKD. Top: Highlighting the number of patient deaths in black. Bottom: Highlighting the number of patients receiving
hemodialysis (HD) in black.
Fig. 3. Visualization of those CKD patients with and without proteinuria.
The number of patients receiving hemodialysis (HD), indicated in black,
seems to be independent of whether they have proteinuria or not.
Fig. 4. Visualization of those CKD patients with and without using MA1
over time (Top) and with and without using A21 (Bottom). The number
of patients receiving hemodialysis (HD) is indicated in black.
Table 1. Subjects in the user study.
Subject  Job Title            Area of Study/Practice
1        Assistant Professor  Health Informatics
2        Postdoc              Medical Informatics
3        Pharmacist           Pharmacy
4        Physician assistant  Internal Medicine
5        Project manager      Medical Informatics
6        Researcher           Health Informatics
7        Graduate student     Clinical Laboratory Science
8        Medical student      Medicine
9        Medical student      Biology
3.2 Data Management Considerations
Interactivity is key to most visual analysis tasks, which often
involve repeated filtering, aligning, aggregating, and rendering operations. A few promising approaches have been introduced to enhance
the interactivity of visual-based data querying and analysis. For example, Ermac [15] provides a declarative language for constructing interactive visualizations, and imMens [5] relies on aggregation and precomputation. EMR data, which have nested array and multi-value
attributes, require special treatment. According to our experimental
studies, conventional relational databases cannot efficiently support all
of these operations. We have adopted NoSQL [4] as the underlying
database of our data management system because it offers more flexibility when we need to model different types of data such as physicians’
notes, drug and procedure codes, image data, etc. NoSQL is also more
scalable for handling large data because it can be used with distributed
computing architectures.
We have been experimentally studying our data management strategies to support efficient processing of complex EMR data. We consider time as a unique characteristic of the data, as Aigner et
al. have done in their conceptual framework for designing visualization systems for time-oriented data [1]. Unlike previous work, which
largely focuses on inter-record analysis [13, 6], we aim to better support inter-cohort analysis, which demands advanced data management
solutions to support fast data aggregation and reduction. The discussion of the specific design decisions for achieving the required interactivity is beyond the scope of this paper.
4 CONCLUSION
There are many different ways to make use of the rapidly growing
EMR data. Our study so far only addresses rather narrow aspects of the
overall design problem for building a capable tool for studying large
EMR data. However, our preliminary findings, the testbed system,
and our continuing dialogue with the users of EMR data will keep our
work moving in the right direction.
Our work promises a new means for physicians and medical specialists to more easily make sense of large amounts of data about their
patients for making important decisions. The technology introduced
will also make physicians rethink how they do their work, potentially
leading to new directions and opportunities for optimizing their overall efficiency and productivity. We believe some of our designs will
also be applicable to other temporal sequence data and heterogeneous
data.
ACKNOWLEDGMENTS
This research is sponsored in part by the U.S. National Science Foundation and the UC Davis RISE Program. Thanks to Miss Chih-Wei Huang
and Professor Yu-Chuan Li at Taipei Medical University for providing
us with the CKD dataset and assisting us in conducting the survey.
REFERENCES
[1] W. Aigner and S. Miksch. Towards a conceptual framework for visual
analytics of time and time-oriented data. In Proceedings of Simulation
Conference, pages 721–729, 2007.
[2] A. Bui, D. R. Aberle, and H. Kangarloo. Timeline: Visualizing integrated patient records. IEEE Transactions on Information Technology in
Biomedicine, 11(4):462–473, 2007.
[3] T. Gschwandtner, W. Aigner, K. Kaiser, S. Miksch, and A. Seyfang. CareCruiser: Exploring and visualizing plans, events, and effects interactively.
In Proceedings of IEEE Pacific Visualization Symposium, pages 43–50,
2011.
[4] J. Han, E. Haihong, G. Le, and J. Du. Survey on NoSQL database. In
Proceedings of the 6th International Conference on Pervasive Computing and Applications (ICPCA),
pages 363–366, 2011.
[5] Z. Liu, B. Jiang, and J. Heer. imMens: Realtime visual querying of big
data. Computer Graphics Forum, 32(3):421–430, 2013.
[6] M. Monroe, R. Lan, H. Lee, C. Plaisant, and B. Shneiderman. Temporal
event sequence simplification. IEEE Transactions on Visualization and
Computer Graphics, 19(12):2227–2236, 2013.
[7] M. Monroe, K. Wongsuphasawat, C. Plaisant, B. Shneiderman, J. Millstein, and S. Gold. Exploring point and interval event patterns: Display
methods and interactive visual query. Technical Report
HCIL-2012-06, University of Maryland, 2012.
[8] V. Nair, M. Kaduskar, P. Bhaskaran, S. Bhaumik, and H. Lee. Preserving
narratives in electronic health records. In Proceedings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages
418–421, 2011.
[9] C. Plaisant, S. Lam, B. Shneiderman, M. S. Smith, D. Roseman, G. Marchand, M. Gillam, C. Feied, J. Handler, and H. Rappaport. Searching electronic health records for temporal patterns in patient histories: A case
study with Microsoft Amalga. In Proceedings of AMIA Annual Symposium, pages 601–605, 2008.
[10] C. Plaisant, R. Mushlin, A. Snyder, J. Li, D. Heller, and B. Shneiderman. Lifelines: Using visualization to enhance navigation and analysis
of patient records. In Proceedings of the AMIA Symposium, pages 76–80,
1998.
[11] B. Shneiderman, C. Plaisant, and B. W. Hesse. Improving healthcare with
interactive visualization. Computer, 46(5):58–66, 2013.
[12] C.-F. Wang, J. Li, K.-L. Ma, C.-W. Huang, and Y.-C. Li. A visual analysis
approach to cohort study of electronic patient records. In Proceedings of
IEEE BIBM 2014 (to appear), November 2014.
[13] T. D. Wang, C. Plaisant, A. Quinn, R. Stanchak, B. Shneiderman, and
S. Murphy. Aligning temporal data by sentinel events: Discovering patterns in electronic health records. In Proceedings of the ACM SIGCHI
Conference on Human Factors in Computing Systems (CHI ’08), pages
457–466, 2008.
[14] K. Wongsuphasawat and D. Gotz. Exploring flow, factors, and outcomes
of temporal event sequences with the Outflow visualization. IEEE Transactions on Visualization and Computer Graphics, 18(12):2659–2668,
2012.
[15] E. Wu, L. Battle, and S. R. Madden. The case for data visualization management systems. In The 40th International Conference on Very Large
Data Bases, pages 903–906, 2014.