
Overview of MedNLP-2
Eiji Aramaki (Kyoto University)
Mizuki Morita (The University of Tokyo)
Tomoko Ohkuma (Fuji Xerox)
Yoshinobu Kano (Shizuoka University)
Why are we dealing with Medical records?
Medical records contain rich clinical
information in text form
BUT: The amount is more than any one researcher
can handle → requires ICT (NLP)
Our goal is to develop fundamental NLP techniques for the
medical field.
ALSO, we aim to develop methodology and publish
standard tools for medical NLP.
Who are the organizers?
[Figure: organizer profiles. Labels: NLP researcher; bioinformatics; medical NLP; NLP researcher (Fuji Xerox), company viewpoint; framework and tool sharing]
• The organizers include both academic researchers and a
company member
• They cover various fields (not only pure NLP/IR but
also bioinformatics & framework building)
• Because MedNLP targets two aspects, computer
science & medical application, this team is well suited
to covering both
Overview
• Background
• Material
• Task Design
• Overview of Task 1
• Overview of Task 2
• What's Next
One & Only Non-English Medical
Shared Task
• Medical Shared Tasks
– ImageCLEFmed (2005-): Image
– i2b2 NLP (2006): English
– TREC Medical Records Track (2011): English
– CLEFeHealth (2013): English
– MedNLP (2011): Japanese (& non-English)
• Why are so few non-English medical tasks
available?
Medical Records Contain Private
Information
• In the US, HIPAA clearly defines what counts as
private information: 18 items (name,
telephone number, e-mail address, face
photograph, ...)
– Once the private information is removed, the records can be
used freely
→ SO: many English health records are
available
• In contrast, Japan is still conservative
• We do not have such a clear privacy
guideline for medical text
• This is a major barrier to research
use of medical text
To break the barrier:
2 types of dummy medical records
• (1) Dummy (virtual) Records
– We asked volunteer MDs to write
records for dummy (virtual) patients
– Then, we purchased the records
• (2) Exam Texts
– Question texts of the National Medical Exam (“医師国家試験”)
for doctors.
The exam basically consists of multiple-choice questions,
like the SAT in the US or the “center shiken” in Japan
Most questions are given in the form of a short
sentence. BUT…
Exam Texts
Some questions contain rich information about a
patient; these are called “case-based questions”
Their style is very similar to a clinical record
So, we converted these questions into a corpus
The conversion process is twofold:
Question-style expressions, such as the multiple-choice options,
are removed
We also add annotations, such as named entities and date/time expressions, to the corpus
Part A. Medical Examination for Doctors (2005)
Quantity & Quality of Dummy data
Clinical domain | MedNLP-1 | MedNLP-2
Disorders of the Alimentary Tract | 4 | 19
Liver, Biliary Tract & Pancreas | 2 | 12
Cardiovascular System | 12 | 23
Endocrinology, Metabolism & Nutrition | 5 | 17
Disorders of the Kidney & Urinary Tract | 4 | 14
Immune System & Immune-Mediated Injury | 5 | 17
Disorders of the Hematopoietic System | 1 | 13
Infectious Disease | 6 | 15
Disorders of the Respiratory System | 11 | 26
TOTAL | 50 | 156
While MedNLP-1 does not cover
several clinical domains sufficiently,
MedNLP-2 covers all domains
To validate the quality, we asked MDs and non-medical readers
to pick out the dummy records from
a mixed corpus (10 dummy records + 10 real records)
Accuracy
Medical (physician) (n=2): 60.0%
Non-medical (n=3): 56.3%
It was hard to distinguish them,
even for MDs
Task Design
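What kind of task is required? Milestones:
Raw text: 京大病院来院5日前から腹が痛むとのこと (roughly: "Reports abdominal pain since 5 days before visiting Kyoto University Hospital")
1. De-identification (MedNLP-1): ■■大病院来院5日前から腹が痛むとのこと
2. NER (MedNLP-1, MedNLP-2): ■■大病院来院5日前から腹痛とのこと
3. Coding (MedNLP-2): ■■大病院来院5日前からR104とのこと
4. Decision support
MedNLP-2 targets the 2nd & 3rd steps.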
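The first three milestones can be sketched as simple string rewriting on the slide's example sentence. This is only a toy illustration: the three lookup tables (`IDENTIFIERS`, `COMPLAINT_TO_DISEASE`, `DISEASE_TO_ICD`) and the function names are hypothetical and hold just enough entries to reproduce the example; real systems use trained models and full dictionaries.

```python
# Toy sketch of the milestone steps, using the slide's example sentence.
# All three lookup tables are hypothetical stand-ins for real resources.
IDENTIFIERS = ["京"]                         # strings to mask (assumed)
COMPLAINT_TO_DISEASE = {"腹が痛む": "腹痛"}    # surface expression -> disease name
DISEASE_TO_ICD = {"腹痛": "R104"}             # disease name -> ICD-10 code


def deidentify(text):
    """Step 1: mask identifying strings with the placeholder used on the slide."""
    for item in IDENTIFIERS:
        text = text.replace(item, "■■")
    return text


def normalize_complaints(text):
    """Step 2: rewrite a complaint expression as a disease name."""
    for surface, disease in COMPLAINT_TO_DISEASE.items():
        text = text.replace(surface, disease)
    return text


def assign_codes(text):
    """Step 3: replace each known disease name with its ICD-10 code."""
    for disease, code in DISEASE_TO_ICD.items():
        text = text.replace(disease, code)
    return text


raw = "京大病院来院5日前から腹が痛むとのこと"
step1 = deidentify(raw)
step2 = normalize_complaints(step1)
step3 = assign_codes(step2)
for line in (step1, step2, step3):
    print(line)
```

Chaining the steps shows why the order matters: coding relies on the normalized disease name produced by the step before it.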
[Figure: output example for a given raw text, MedNLP-1 vs. MedNLP-2]
Participants
Participants increased!
Task | MedNLP-1 | MedNLP-2
De-identification | 6 groups (11 systems) | -
NER | 11 groups (15 systems) | 10 groups (24 systems)
ICD-coding | - | 9 groups (19 systems)
Free | 1 group (1 system) | 2 groups (2 systems)
The number of groups is
the same as in the
previous MedNLP-1
The number of systems
increased considerably
• Surprisingly, in total, MedNLP-2 had 12 groups and 45 systems!
• One of the most active tasks in NTCIR
• More surprisingly: the ICD-coding task, which is a medical-specific task,
also received almost 20 submissions.
• This indicates that NLP researchers are paying close attention to finding a way to
reach medical applications.
List of MedNLP-2 Participants
Academic (Japan):
• JAIST
• Hokkaido University
• Kyoto University
• Okayama Prefectural University
• The University of Tokyo
• Nara Institute of Science and Technology
• Yasuda Women's College
Academic (overseas):
• National Central University (Taiwan)
• Chaoyang University of Technology (Taiwan)
• Nanjing University (China)
• Academia Sinica (Taiwan)
• Dublin City University (Ireland)
Company:
• Nihon Unisys, Ltd.
• Hitachi, Ltd. (Central Research Laboratory)
• NTT Science and Core Technology Laboratory Group
Participants have various
backgrounds, just like the organizers
We are very happy to have five
submissions from overseas
→ Although the material is
in Japanese only, the task does not
depend on the language.
Overview of Task 1
Complaint and diagnosis extraction task
(in short, the NER task)
Two types of NER Task
(1) NER ONLY
• Given a raw text, find the disease names
Input: 腹痛は認められず ("Stomachache is not found")
Output: disease name 腹痛 (stomachache)
(2) NER + MODALITY
• Given a raw text, find the disease names & their modality
Input: 腹痛は認められず ("Stomachache is not found")
Output: disease name 腹痛 (stomachache), modality: Negative
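The NER + MODALITY setting can be sketched as dictionary matching followed by a cue search. This is a minimal illustration, not a participant system: the term list `DISEASE_TERMS`, the cue pattern `NEGATION_CUES`, and the rule that a cue after the term negates it are all assumptions made for the example.

```python
import re

# Hypothetical mini-lexicon and negation cues; real systems learn these from data.
DISEASE_TERMS = ["腹痛"]
NEGATION_CUES = re.compile(r"認めず|認められず|なし")


def tag(sentence):
    """Return (term, start, end, modality) for each lexicon term found."""
    results = []
    for term in DISEASE_TERMS:
        start = sentence.find(term)
        if start == -1:
            continue
        end = start + len(term)
        # Assumption: a negation cue anywhere after the term negates it.
        following = sentence[end:]
        modality = "negative" if NEGATION_CUES.search(following) else "positive"
        results.append((term, start, end, modality))
    return results


print(tag("腹痛は認めず"))
print(tag("腹痛は認められず"))
print(tag("腹痛あり"))
```

Both example sentences from the slide yield the same span with a negative modality, while a sentence without a cue is tagged positive.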
MedNLP-1 << MedNLP-2
At first glance, MedNLP-2 improved considerably
MedNLP-1 (2011): 15 groups over baseline
MedNLP-2 (2014): 20 groups over baseline
The accuracy of the best system did not improve!
85% is still the maximum
→ we need a breakthrough
On the other hand, the average performance improved considerably.
This shows that participants had already learned the best approach
from the previous MedNLP-1, and used it
→ We successfully raised the level of NLP in this field
STILL, we can improve
modality detection
• In modality detection, performance
diverged across systems
• Several systems struggle with
negation.
• Detection of suspicion is especially
difficult: half of the systems score below 50% (F-measure).
• The next challenge of this task is
how to deal with such rare
modalities
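For reference, the F-measure quoted above can be computed as below. This is a sketch under an assumption: an entity counts as correct only if its term, span and modality all match the gold annotation exactly; the official MedNLP evaluation script may use different matching rules. The function name `prf` and the sample tuples are hypothetical.

```python
def prf(gold, predicted):
    """Precision, recall and F-measure over exact-match entity tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical annotations: (term, start, end, modality)
gold = {("腹痛", 0, 2, "negative"), ("発熱", 5, 7, "positive")}
predicted = {("腹痛", 0, 2, "negative"), ("頭痛", 9, 11, "positive")}
print(prf(gold, predicted))
```

With rare modalities such as suspicion, a handful of missed entities moves recall sharply, which is one reason scores diverge so much between systems.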
Overview of Task 2
ICD coding task
(in short, the coding task)
ICD-Coding Task
2 ways to join
(1) TASK2ONLY
• Given a text with the disease names marked, assign an ICD code to
each of them
(2) TOTAL TASK
• Given a text without any annotation, find the disease
names and assign an ICD code to each of them
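In the TASK2ONLY setting, a simple baseline is dictionary lookup with a string-similarity fallback. The sketch below assumes a tiny hypothetical dictionary `MASTER` (standing in for a resource such as a disease-name master), character-bigram Dice similarity, and an arbitrary threshold of 0.5; none of these choices is taken from a participant system.

```python
# Hypothetical mini-dictionary: disease name -> ICD-10 code.
MASTER = {"腹痛": "R104", "頭痛": "R51", "高血圧症": "I10"}


def bigrams(text):
    """Character bigrams; a one-character string is kept as a single unit."""
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def dice(a, b):
    """Dice coefficient between the bigram sets of two strings."""
    set_a, set_b = bigrams(a), bigrams(b)
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def assign_code(term, threshold=0.5):
    """Exact match first, then the most similar entry above the threshold."""
    if term in MASTER:
        return MASTER[term]
    best = max(MASTER, key=lambda entry: dice(term, entry))
    return MASTER[best] if dice(term, best) >= threshold else None


print(assign_code("腹痛"))    # exact match
print(assign_code("高血圧"))  # near match to 高血圧症
print(assign_code("骨折"))    # no entry is similar enough
```

The threshold trades coverage for precision: lowering it codes more terms but risks assigning codes to unrelated names.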
Divergence in performance
Scores range from 30% to 70%: the difference is 40 points
This is rare in recent shared tasks
Even more divergence in TASK2ONLY
Scores range from 30% to 80%: the difference is 50 points
Because
• Everything is unknown in a new task
– What kind of tool or method is good?
• Supervised or unsupervised
– What kind of resource is good?
• Extra corpus
• Disease name Dictionary
– What is the “ICD-Coding” task all about?
• Multi labeling
• Document classification
• Term similarity design
Methods
Group | Method | Tool | Resource | Approach
B | RNN | word2vec | MEDIS Hyojun Byomei Master, ICD-10 English dictionary | Supervised
C | SVM | Brown clustering, word2vec | Wikipedia | Supervised
D | Distance in ICD tree hierarchy | - | MEDIS Hyojun Byomei Master | -
E | Full-text search | Lucene, Google Translate | MEDIS Hyojun Byomei Master, ICD-10 English dictionary | Unsupervised
F | Pattern match | - | MEDIS Hyojun Byomei Master | Similarity design
G | Pattern match | Brown clustering | - | Unsupervised
H | Logistic regression | - | MEDIS Hyojun Byomei Master, LSD, T-Jisyo, MedDRA/J | -
J | Rule | - | MEDIS Hyojun Byomei Master | Rule
K | Full-text search, exact match | Apache Solr | - | Unsupervised
• A wide variety of tools and methods are utilized, including state-of-the-art tools such as word2vec and RNN
• The most popular resource is “MEDIS Hyojun Byomei Master”, BUT: half of the groups do not use it
• An interesting approach (using English resources via machine translation) is utilized
• STILL: rule-based approaches are employed
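One of the listed methods relies on distance in the ICD tree hierarchy. The sketch below shows one way such a distance could be defined; it is not Group D's actual method. As a loud simplification, it treats every prefix of a code as a tree node (R → R1 → R10 → R104), whereas the real ICD-10 hierarchy uses chapters and code-range blocks. The function names are hypothetical.

```python
def ancestors(code):
    """Path from the first letter down to the code: R104 -> R, R1, R10, R104."""
    return [code[:i] for i in range(1, len(code) + 1)]


def tree_distance(a, b):
    """Number of edges between two codes in the simplified prefix tree."""
    path_a, path_b = ancestors(a), ancestors(b)
    shared = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        shared += 1
    # Climb from each code to the deepest shared node (or a virtual root).
    return (len(path_a) - shared) + (len(path_b) - shared)


print(tree_distance("R104", "R101"))  # siblings under R10
print(tree_distance("R104", "I10"))   # different chapters
```

A distance like this lets a system give partial credit to, or prefer, a code that lies near the correct one in the hierarchy rather than treating all wrong codes as equally wrong.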
We’d like to discuss the details in the MedNLP Session
(to be held the morning of the day after tomorrow, 9:20-)
Please join us
Overview of Task 3
-- Free Task --
What is the Free Task?
• MedNLP has a unique task, the FREE task, in which
participants design their own tasks freely (Any task is
welcome!)
• We designed this task because we are frequently told
“We’d like to join MedNLP, but the MedNLP task is NOT our
target task” or “We do not have the ability to
develop NLP systems”
• To accommodate such groups, we proposed this task
• However, the "Free task" is much too open-ended
– An NTCIR reviewer said “I'm a little pessimistic about
whether anything concrete will come of this.”
• I am not so pessimistic, because 2 groups joined this
task and presented interesting work.
F-group
(Word Dictionary for Patients)
Many medical terms are difficult
for non-medical people, including patients and NLP
researchers, to understand.
To help them understand medical
terms, the group built a word dictionary for
non-medical people
L-group
(Investigation of Dictionary Coverage)
Does ATOK cover the
corpus?
Conclusion
Summary
Item | MedNLP-1 | MedNLP-2
Corpus amount | 50 documents | 150 documents
Material | Dummy Records | Dummy Records, Medical Doctor Exam.
Task | De-identification, NER, Free | NER, ICD-coding, Free
# of systems | 12 groups (27 systems) | 12 groups (45 systems)
MedNLP-2 improved:
• Providing a larger corpus
• Designing more complex tasks
Although the number of groups is the same,
the number of systems increased
Acknowledgment
Adviser
MASUICHI Hiroshi, Ph.D.
Annotator
SHIKATA Shuko
KUBO Kay
SHIMAMOTO Yumiko