Slide 1

Presentation Title:
Semantic Computing and
Standard Data Category Registry
9th Open Forum on Metadata Registries
Harmonization of Terminology, Ontology and Metadata
20th – 22nd March, 2006, Kobe, Japan.
Day: 20060322
Slot No. K3
Name: HASIDA Koiti (ISO/TC37/SC4/TDG3 Convener)
Organization: AIST & GSK
Semantic Gap

People and computers don't share meaning
and value.
– We don't understand computers.
– Computers don't understand us.

So they cannot collaborate well.
9th Open Forum for Metadata Registry, Kobe, 2006
We Don't Understand Computers.
(Computers Don't Understand Themselves, either.)
– I installed Service Pack 2 on my PC running Windows XP. Since then I cannot connect to the wireless LAN. Why?
– I cannot remove a strange line in MS Word.
– We cannot coordinate workflow systems with each other in our intranet.
Computers Don't Understand Us.
I cannot find the information I want. The search engine returns a lot of irrelevant information and little relevant information.
– The computer doesn't know what exactly I want to know.
Web sites are very hard to keep easy to use.
– The computer doesn't know what the Web content means.
Performance improved by banning intracorporate e-mails.
– E-mails poorly reflect contexts of real human communication.
Semantic Computing
= Semantics-Oriented Architecture
Glassbox Computer
– design and operation of computer systems through semantics shared with people
– semantic model of data and process
Straightforward provision of services meaningful to people
People can understand, compose, and improve software.
– emergent total optimization by accumulation of improvements by many users
[Diagram: layered architecture with four layers: Ubiquitous Info. Service, Semantic Service, Semantic Platform, Ubiquitous Platform. Component labels: agent, device, home info. appliance, translation, enterprise, project management, summarization, accounting, ITS, behavior mining, retrieval, semantic authoring, possible-world simulation, semantic Web service, ontology, ad-hoc wireless network, dialog, network robot, spatial reasoning, planning, speech, vision, multiagent architecture, semantic annotation, grid, sensor net, privacy, security.]
Ontology
Ontology of Patent Claim
Each `claim' class instance has one or more `constituent' properties with `technology' class instances as values.
The `claim' class subsumes the `Jepson-type claim' class.
[Diagram: legend: class (concept), property. Classes: claim, technology, Jepson-type claim, other claim. Properties: constituent+, about*, presupposes.]
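The two statements on this slide (a claim has one or more constituents whose values are technologies, and `claim' subsumes `Jepson-type claim') can be sketched as Python classes. This is only an illustration under assumptions: the class and field names are hypothetical, `presupposes` is modeled as an optional link to another claim, and the `about*` property is left out because the extracted slide does not show which classes it connects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Technology:
    """An instance of the `technology' class."""
    name: str


@dataclass
class Claim:
    """An instance of the `claim' class."""
    constituents: List[Technology]  # constituent+ : one or more values

    def __post_init__(self) -> None:
        # Enforce the "+" cardinality stated on the slide.
        if not self.constituents:
            raise ValueError("a claim needs at least one constituent")


@dataclass
class JepsonTypeClaim(Claim):
    """Subclass of Claim, so every Jepson-type claim is also a claim."""
    presupposes: Optional[Claim] = None  # assumed: the other claim it builds on


# Hypothetical instances, not taken from the slides.
base = Claim(constituents=[Technology("mass analyzer")])
improved = JepsonTypeClaim(
    constituents=[Technology("subslit"), Technology("voltage controller")],
    presupposes=base,
)
print(isinstance(improved, Claim))  # subsumption check
```

Modeling subsumption as subclassing keeps the check a plain `isinstance` call; an RDF/OWL encoding would express the same thing with a subclass axiom.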
Semantic Structure of Patent Claim
[Diagram: semantic graph of Jepson-type claim 0, a mass spectroscope (0). Edge labels: constituent, about, presupposes, enables, purpose, description, constraint.]
Constituents:
– ion source (1)
– mass analyzer (2)
– electron detector (3)
– ion-electron converter (4)
– subslit (10)
– voltage controller (12)
Descriptions:
– extract ion a from (1)
– (2) separates a
– (2) extracts ion b
– (4) converts b to electron c
– (3) detects c and extracts as electric signal
– place (10) between (2) and (4)
– (12) determines Vs and Vc according to V0
Constraint:
Vs = V0 - k1
Vc = V0 - k2
V0 = ion-extraction voltage on (1)
Vs = voltage on (10)
Vc = converter voltage on (4)
k1 and k2 are constants
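Translation … Two-Day Work
Japanese source:
検索質問Qのノードxごとに、リンクy-zがデータベースDに含まれてyのラベルがLであるようなノードyとノードz∈F(x)が存在するような、ラベルLのリストを、表示部に表示する
(Gloss: for each node x of search query Q, display on the display unit the list of labels L for which there exist a node y and a node z∈F(x) such that link y-z is contained in database D and the label of y is L.)
Wrong translation:
displaying, on a display unit, a list of labels L in which are present a node z∈F(x) and a node y of which a link y-z is contained in the database D and of which the label y is L, for each of the nodes x of a search question Q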
Explicit Semantic Structure
[Diagram: the same graph in Japanese and English. Nodes are short texts; edges are labeled 量化 (quantify) and 内包 (intension).]
Japanese:
– 検索質問Qの各ノードx
– Lのリストを表示部に表示する
– z∈F(x)。 データベースDがリンクy-zを含む。 yのラベルがLである。
English:
– each node x in retrieval query Q
– display the list of L on the display unit
– z∈F(x). Database D contains link y-z. The label of y is L.
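One reading of this structure is a set comprehension: for each node x of Q, the list to display contains every label L for which some link y-z in D has z in F(x) and y labeled L. The sketch below assumes that D is a set of ordered (y, z) pairs, that F returns a set of nodes, and that labels are held in a dictionary; the function names and the toy data are hypothetical and do not come from the slides.

```python
def labels_for(x, links, F, label):
    """Labels L such that some link (y, z) in `links` has z in F(x) and label[y] == L."""
    reachable = F(x)
    return sorted({label[y] for (y, z) in links if z in reachable})


def display_lists(query_nodes, links, F, label):
    """Quantify over each node x of the query and build the list to display for it."""
    return {x: labels_for(x, links, F, label) for x in query_nodes}


# Toy data (assumed for illustration only).
links = {("y1", "z1"), ("y2", "z2"), ("y3", "z1")}
label = {"y1": "author", "y2": "title", "y3": "year"}
neighbours = {"x1": {"z1"}, "x2": {"z2"}}


def F(x):
    return neighbours[x]


result = display_lists(["x1", "x2"], links, F, label)
print(result)
```

Because the quantifier and the defining conditions are separate pieces, each maps to one short clause in code, just as each maps to one short sentence in either language.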
Semantic Authoring
The Right Question about Semantic Annotation
How to make many people do semantic annotation (in place of machines)?
How to raise intellectual productivity of people/society?
Traditional Authoring
[Diagram: human content → authoring → document → content recovered by human and by computer. Labels: information loss; linearization cost; huge knowledge needed; inaccurate (low accuracy).]
Semantic Authoring
[Diagram: human content → semantic authoring → coarse-grain graphical content → fine-grain graphical content → content recovered by human and by computer. Labels: easy & accurate; little information loss; no linearization cost; accurate.]
Coarse-Grain Graphical Content
Result of semantic authoring
Easy for people to understand and compose
– explicit logical structure
– no intersentential order
[Diagram: nodes "I was hungry.", "I had had a lunch.", "I had a snack.", "I became full."; edges: one concession, three causes.]
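A graph with explicit relations and no intersentential order can be held as a set of (source, relation, target) triples. The sketch below is an assumption-laden illustration: the extracted slide keeps only the node texts and edge labels, so the endpoints chosen here are inferred from the example sentences, and only three of the four edges are modeled.

```python
# Each edge is (source sentence, relation, target sentence).
# Endpoints are assumed; the slide text preserves only the labels.
edges_in_one_order = [
    ("I had had a lunch.", "concession", "I was hungry."),
    ("I was hungry.", "causes", "I had a snack."),
    ("I had a snack.", "causes", "I became full."),
]
edges_in_another_order = list(reversed(edges_in_one_order))

# Sets ignore insertion order, so both authoring orders give the same content.
graph_a = set(edges_in_one_order)
graph_b = set(edges_in_another_order)


def relations_from(graph, sentence):
    """Outgoing (relation, target) pairs of one node, sorted for stable output."""
    return sorted((rel, tgt) for (src, rel, tgt) in graph if src == sentence)


print(graph_a == graph_b)
print(relations_from(graph_a, "I was hungry."))
```

A linear text has to commit to one sentence order, whereas the two edge lists here compare equal once they are treated as sets.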
Fine-Grain Graphical Content
automatic analysis of coarse-grain graphical content
retrieval, translation, summarization, etc.
too fine for human browsing/editing
[Diagram: the same example decomposed into word-level nodes (I, have, lunch, hungry, snack, become, full) linked by edges labeled agt, obj, aen, gol, concession, and causes.]
Semantic Authoring is Easier than Text Composition (1/2)
[Diagram: nodes "I was hungry.", "I had had a lunch.", "I had a snack.", "I became full."; edges: one concession, three causes.]
Semantic Authoring is Easier
than Text Composition (2/2)

A text synonymous with the graph in the
previous page:
*

I had had a lunch. But I was hungry, and
so I had a snack. Then I became full.
This relation is hard to reflect in the text.
I had had a lunch but I was hungry.
So I had a snack. Then I became full.
Semantic Authoring
Authoring based on ontologies, together with explicit semantic structures
Easier authoring of better content than with MS Word, etc.
Accurate semantic structure in resulting content
– short text in box
– rhetorical structure
– anaphora/coreference
Improvement of Document Quality by Idea Processor
Yagishita's (1998) experiment
[Diagram: compose network-type content by idea processor → compose text based on the network-type content.]
Fewer oversights
– more points covered
Deeper thoughts
– longer inference chains
Traditional Idea Processor
No standardized relations
– Only the author or participants of brainstorming can understand.
– hard to share and reuse
Cost of text composition
– big apparent cost → limited spread
Semantic Authoring
Standardization of relations
– ISO/TC37/SC4/TDG3
– easy to share and reuse
– retrieval, summarization, translation, etc.
Automatic text generation
– small cost → wide spread
Scalability
section
paragraph
paragraph
paragraph
Upgrading Semantic Levels in Software Architecture
window system → semantic authoring
operating system → semantic platform
file system → RDF database
ISO/TC37/SC4/TDG3
Semantic Content Representation
ISO/TC37
Terminology and Other Language Resources
SC1: Principles and Methods
SC2: Terminography and Lexicography
SC3: Computer Applications for Terminology
– ISO 12620: Data Categories
SC4: Language Resources Management
ISO/TC37/SC4
Language Resources Management
– Chair: Laurent Romary
– Secretariat: Key-Sun Choi
WG1: Basic descriptors and mechanisms for language resources (Laurent Romary)
WG2: Representation schemes (Kiyong Lee)
– Multimodal meaning representation scheme
WG3: Multilingual text representation
WG4: Lexical resources/database (Nicoletta Calzolari)
WG5: Workflow of LR management
Thematic Domain Group
ISO/TC37/SC4/Ad Hoc TDGs
TDG1: Metadata (Peter Wittenburg)
TDG2: Morphosyntax (Gil Francopoulo)
TDG3: Semantic Content Representation (Koiti Hasida)
– Discourse relations (Koiti Hasida)
– Dialogue acts (Harry Bunt)
– Referential structures and links (Laurent Romary)
– Logico-semantic relations (Scott Farrar)
– Temporal entities and relations (Kiyong Lee)
– Semantic roles and argument structure (Thierry Declerck)
– More?
Expected Products
Not ISs (International Standards) in ISO's official sense
But Standard Registries of Data Categories
– discourse relations, dialogue acts, etc.
Scope of TDG3

Semantics, Abstracting Syntax Away
– Semantic DCs usable with various annotation
schemes
• We’re not writing annotation manuals.
– We don't care about syntax-semantics mapping, syntactic markup and markables, etc.

Deliverables
– Concrete Data Category Registries
• semantic types of function words/morphemes and their
taxonomy
– not full dictionaries or encyclopedias
– Documents on These DCs
Criteria on DC Registry

Purpose
– annotation/interpretation
• Inter-Annotator Agreement
– authoring/composition/description
• Descriptive Convenience

General Requirement
– ease of selection
• clarity and coverage
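The slides do not say how inter-annotator agreement would be measured. One common choice for two annotators is Cohen's kappa, which corrects the observed agreement for agreement expected by chance. The sketch below assumes that measure; the function name and the sample annotations are hypothetical, with the category names borrowed from the discourse relations shown in earlier slides.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same category by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own category frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of four relation instances.
annotator_1 = ["causes", "causes", "concession", "concession"]
annotator_2 = ["causes", "concession", "concession", "concession"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(kappa)
```

In this example the annotators agree on three of four items (0.75) and chance agreement is 0.5, so kappa is 0.5. A registry whose categories are clearer and easier to select should push this value up.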
Collaborative Semantic Authoring
Discussion-Supporting Groupware
[Diagram: discussion graph. Issue: "How to eliminate illegal bike-parking?" Solutions: "Remove illegally parked bikes immediately." and "Prepare more bike-parking lots." Other nodes: "That is not profitable.", "We have to keep them for six months.", "We don't have enough space to keep them." Edge labels: solution (2), con (2), causes (2).]
Collaborative Semantic Authoring
Traditional Groupware
– IBIS, Coordinator, Open Meeting, etc.
– improved efficiency and quality of discussion
• reduced redundancy
• simultaneous utterances
• better coverage of important points
• deeper discussion
– weakness: usable only for group work
Collaborative SA
– seamless unification of individual SA as a major usual task and group work
– the above merits + advanced retrieval, summarization, etc.
From e-Mails to Collaborative SA
Perspicuous semantic structure develops.
No spam.
TODO
– user-account maintenance
Knowledge-Circulating Society
Knowledge Circulation
social sharing, reuse, and extended reproduction of knowledge
participation of everybody in every situation
[Diagram: general public users (producers, consumers, mediators) connected to a shared DB by "provision of knowledge" and "acquisition of knowledge".]
Semantic Enterprise System
System Design and Operation Based on Business-Process Semantics
Incremental and emergent total optimization (in the sense of Enterprise Architecture)
– accumulation of improvements by users
– Integration of business operation, regulation, and computer system
Transparent and fair procurement
Knowledge Circulation in Research (Past)
Knowledge-Circulation period > 2 years
Papers are hard to read/write.
[Diagram: cycle of research, writing, paper submission, review, publication, evaluation.]
(Future)
Collaborative creation of huge graphical content
Publication of sentences rather than papers
Fast knowledge circulation
– In a week?
Evaluation better than IF and CI
– Network analysis
visualization
retrieval, translation, summarization
e-Knowledge Government

Limitation of representative system
– increasing diversity and complexity of social problems

Involvement of all the citizens
– collection and analysis of public opinions and knowledge
– policy making and consensus building

Given effective discussion by all the people:
– no need for representative/indirect democracy
– compositional democracy ・・・ KAWAKITA Jiro
– deliberative democracy

IT-based support
– retrieval, summarization, translation, etc.
– Weblog not sufficient
• no systematic support to formation of long inference chains