2014-06-Falquet-Knowledge-engineering-techniques-for-the

Knowledge engineering techniques for the
creation of a semantic digital edition of
Saussure's manuscripts
Gilles Falquet, Luka Nerima, Massimo Brero
1. Storage, visualization, annotation, and transcription of manuscripts from Ferdinand de Saussure
2. Digital scholarly publishing of manuscripts
Université de Genève - CUI
Fribourg Workshop – 27.02.2014
A system for the visualization, annotation, and
transcription of manuscripts from Ferdinand de
Saussure
Swiss linguist (1857 – 1913)
Famous for
•  modern linguistics
•  structuralism
•  Cours de linguistique générale
Very few publications in his lifetime, but 15'000 sheets of paper given to libraries (Harvard, Paris, Geneva)
Aims of the project
A usable tool for researchers
1. Visualization
2. Annotation
3. Transcription
of manuscripts from F. de Saussure
Typical (human) task: Reconstructing the reading order
Main concepts
[Diagram: writing surface, covered surface, pictures, zone, annotation, transcription, transcription element]
Data/Knowledge Model
Represent
•  basic metadata about manuscripts
   •  location, date, image file, ...
•  (scientific) transcriptions
•  annotations
•  semantic annotations
Available on the semantic web
•  expressed in RDF/S
•  stored in an RDF triple store
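A minimal sketch of how basic manuscript metadata could be written as RDF triples. The namespace and the property names (location, date, imageFile) are assumptions made for illustration; the slides do not give the actual vocabulary.

```python
# Hypothetical namespace; the project's real base URI is not given in the slides.
BASE = "http://example.org/saussure/"


def uri(local_name: str) -> str:
    """Wrap a local name in the (assumed) namespace, N-Triples style."""
    return f"<{BASE}{local_name}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def manuscript_triples(local_name: str, location: str, date: str, image_file: str):
    """Return (subject, predicate, object) triples for one manuscript.

    The three property names are illustrative, not the project's schema.
    """
    subject = uri(local_name)
    return [
        (subject, uri("location"), literal(location)),
        (subject, uri("date"), literal(date)),
        (subject, uri("imageFile"), literal(image_file)),
    ]


def to_ntriples(triples) -> str:
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # "undated" is a placeholder value, not a fact about the manuscript.
    print(to_ntriples(manuscript_triples(
        "ms_fr_03951_10_f028v_029", "BGE", "undated",
        "ms_fr_03951_10_f028v_029.tif")))
```

The resulting lines could be loaded into any RDF triple store.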
From classification numbers to URIs
Semantic web => universal identification (URI)
•  library classification number → URI
Example (BGE)
•  Shelf mark (cote): Ms. fr. 3951/10, f. 28
•  File name: ms_fr_03951_10_f028v_029.tif
URIs:
•  x:ms_fr
•  x:ms_fr_03951
•  x:ms_fr_03951_10
•  x:ms_fr_03951_10_f028v_029
•  x:ms_fr_03951_10_f028v_029-DOT-jp2
•  x:ms_fr_03951_10_f028v_029_Z_001
•  x:ms_fr_03951_10_f028v_029_Z_001_annot_001
•  x:ms_fr_03951_10_f028v_029_Z_001_Shape_001
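The naming scheme above can be sketched as a few string functions. This is an illustration inferred from the single example on the slide, not the project's code: the function names are hypothetical, and it assumes every file stem has the form collection (two parts), volume, item, folio parts.

```python
def ancestor_uris(stem: str, prefix: str = "x:") -> list[str]:
    """Return the URIs from the collection level down to the image stem.

    Assumes a stem such as 'ms_fr_03951_10_f028v_029'.
    """
    parts = stem.split("_")
    if len(parts) < 5:
        raise ValueError(f"unexpected file stem: {stem!r}")
    # Cut after the collection (2 parts), the volume, the item, and the full stem.
    cuts = [2, 3, 4, len(parts)]
    return [prefix + "_".join(parts[:n]) for n in cuts]


def image_uri(stem: str, extension: str = "jp2", prefix: str = "x:") -> str:
    """URI of an image file; the dot of the extension is spelled '-DOT-'."""
    return f"{prefix}{stem}-DOT-{extension}"


def zone_uri(stem: str, zone: int, prefix: str = "x:") -> str:
    """URI of a numbered zone on a picture."""
    return f"{prefix}{stem}_Z_{zone:03d}"


def annotation_uri(stem: str, zone: int, annotation: int, prefix: str = "x:") -> str:
    """URI of a numbered annotation attached to a zone."""
    return f"{zone_uri(stem, zone, prefix)}_annot_{annotation:03d}"


def shape_uri(stem: str, zone: int, shape: int, prefix: str = "x:") -> str:
    """URI of a numbered shape that delimits a zone."""
    return f"{zone_uri(stem, zone, prefix)}_Shape_{shape:03d}"


if __name__ == "__main__":
    stem = "ms_fr_03951_10_f028v_029"
    for u in ancestor_uris(stem):
        print(u)
    print(image_uri(stem))
    print(annotation_uri(stem, 1, 1))
```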
Data Model
System / User Interface
Manuscript visualization
IIP Image server
•  Tiles
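To give an idea of what serving tiles involves, here is a rough sketch of the tile grid of a multi-resolution image pyramid. The 256-pixel tile size, the halving between levels, and the image dimensions are assumptions for illustration; the slides do not state the server's actual configuration.

```python
import math


def tile_grid(width: int, height: int, tile_size: int = 256) -> tuple[int, int]:
    """Number of tile columns and rows needed to cover an image."""
    return math.ceil(width / tile_size), math.ceil(height / tile_size)


def pyramid(width: int, height: int, tile_size: int = 256):
    """List (width, height, columns, rows) per level, smallest level first.

    Each level is assumed to halve the previous one, stopping once the
    whole image fits in a single tile.
    """
    levels = []
    w, h = width, height
    while True:
        cols, rows = tile_grid(w, h, tile_size)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break
        w, h = max(1, math.ceil(w / 2)), max(1, math.ceil(h / 2))
    levels.reverse()
    return levels


if __name__ == "__main__":
    # Hypothetical scan dimensions.
    for level, (w, h, cols, rows) in enumerate(pyramid(1000, 600)):
        print(level, w, h, cols * rows)
```

A viewer then requests only the tiles that intersect the visible area at the current zoom level, rather than the full image.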
Creating Annotations (texts or concepts)
Navigation in the corpus
Full text search
System Architecture
Image import
Web server / front end (REST)
Back end
Storage control (updates, authentication)
Example: Inserting a new annotation
•  Insert request sent to the RDF server
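Such an insert request could take the form of a SPARQL `INSERT DATA` update. The sketch below only builds the request text; the prefix IRI and the property names `x:annotates` and `x:text` are hypothetical, since the slides do not show the actual request.

```python
def build_insert_request(annotation: str, zone: str, text: str,
                         prefix_iri: str = "http://example.org/saussure/") -> str:
    """Build a SPARQL update that attaches a text annotation to a zone.

    `annotation` and `zone` are prefixed names such as 'x:..._Z_001'.
    The properties x:annotates and x:text are illustrative assumptions.
    """
    # Escape characters that would break a double-quoted SPARQL literal.
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return (
        f"PREFIX x: <{prefix_iri}>\n"
        "INSERT DATA {\n"
        f"  {annotation} x:annotates {zone} ;\n"
        f'    x:text "{escaped}" .\n'
        "}"
    )


if __name__ == "__main__":
    print(build_insert_request(
        "x:ms_fr_03951_10_f028v_029_Z_001_annot_001",
        "x:ms_fr_03951_10_f028v_029_Z_001",
        "illustrative annotation text"))
```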
Usability Testing
Methodology
•  14 users (linguists, librarians, ...)
•  13 tasks (4 scenarios)
   •  find a manuscript, create an annotation, ...
•  Measurements:
   •  number of completed tasks
   •  time to complete each task
   •  user satisfaction: System Usability Scale (SUS) questionnaire
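For reference, the standard SUS scoring rule can be sketched as follows: ten items answered on a 1 to 5 scale, odd-numbered items contributing (answer - 1), even-numbered items contributing (5 - answer), and the sum multiplied by 2.5 to give a score from 0 to 100. The function name is ours; the sample answers are not data from the study.

```python
def sus_score(responses: list[int]) -> float:
    """Compute the System Usability Scale score for one respondent."""
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 answers")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each answer must be between 1 and 5")
    total = 0
    for item, answer in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (answer - 1) if item % 2 == 1 else (5 - answer)
    return total * 2.5


if __name__ == "__main__":
    # A neutral answer (3) on every item yields the midpoint of the scale.
    print(sus_score([3] * 10))
```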
Results
[Charts: task completion by task; task completion by user. Values marked on the charts: 100%, 85%, 50%]
Satisfaction evaluation
[Charts: SUS scores by question; SUS scores by user. Value marked on the charts: 68]
Demo Site
fds.unige.ch/iipmooviewer/homepage.php
Digital scholarly publishing of manuscripts
a knowledge representation and management model ...
and a system for the digital edition of large corpora of original works
Context and goals
Digital Critical Edition – current state
•  based on paper critical edition
•  DCE of Nietzsche, Peirce, Wittgenstein
•  other obstacles:
•  no scientific catalogue
Digital edition of Saussure’s manuscripts project
•  to provide a cooperative edition platform for the next 20 years
•  to use computers as convergence and mediation tools
•  the scientific catalogue and the critical edition will be the outputs
Digital editions as knowledge networks
[Diagram, built up over three slides: manuscripts, transcriptions, articles/monographs, terminologies, and ontologies, connected by semantic indexes, alignments, and inferred relations]
Knowledge modeling challenge
To represent the current state of our knowledge about the manuscripts:
different types of resources
•  direct transcriptions
•  scholarly transcriptions
•  related terminologies, ontologies, dictionaries
•  annotations
•  ...
and resource interconnections
•  semantic indexes
•  text alignments / ontology alignments
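As a small illustration of how such interconnections can yield new relations, the sketch below composes two links: a transcription that transcribes a manuscript, and a semantic index that ties the transcription to ontology concepts. The relation names and identifiers are hypothetical and the rule is only one possible inference.

```python
def infer_manuscript_concepts(transcribes: dict[str, str],
                              indexed_by: dict[str, set[str]]) -> dict[str, set[str]]:
    """Infer manuscript-to-concept links from two existing link sets.

    transcribes: transcription id -> manuscript id
    indexed_by:  transcription id -> concepts found by a semantic index
    """
    inferred: dict[str, set[str]] = {}
    for transcription, concepts in indexed_by.items():
        manuscript = transcribes.get(transcription)
        if manuscript is None:
            continue  # an index entry without a known source manuscript
        inferred.setdefault(manuscript, set()).update(concepts)
    return inferred


if __name__ == "__main__":
    # Placeholder identifiers, not entries from the actual corpus.
    transcribes = {"t1": "m1", "t2": "m1", "t3": "m2"}
    indexed_by = {"t1": {"concept:A"}, "t2": {"concept:B"}, "t4": {"concept:C"}}
    print(infer_manuscript_concepts(transcribes, indexed_by))
```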
Operations
[Diagram: operations connecting manuscripts, transcriptions, articles/monographs, and ontologies: handwriting recognition, MLU (multiword lexical unit) extraction, semantic indexing, ontology alignment]
Operations
•  Alignment operations: find correspondences between elements of different resources (align ontologies, align texts at the sentence or term level).
•  Enrichment operations: create new resources that describe an existing one (add transcriptions to manuscript pictures, extract collocations from texts, create a semantic index).
•  Specific to each type of resource
•  Based on OCR, NLP, AI algorithms
Challenge: define a minimal and expressive set of operations
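As one concrete instance of an enrichment operation, here is a deliberately simple sketch of collocation candidate extraction by counting adjacent word pairs. Plain frequency counting stands in for the NLP techniques mentioned above; it is not the method used in the project, and the sample sentence is ours.

```python
import re
from collections import Counter


def bigram_candidates(text: str, min_count: int = 2) -> list[tuple[str, int]]:
    """Return adjacent word pairs that occur at least `min_count` times.

    Tokenization is a naive split on word characters, which is enough to
    show the shape of the operation: text in, new resource (a list of
    candidate multiword units) out.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(zip(tokens, tokens[1:]))
    return [(" ".join(pair), n)
            for pair, n in counts.most_common()
            if n >= min_count]


if __name__ == "__main__":
    sample = ("The linguistic sign is arbitrary. "
              "The linguistic sign unites a concept and a sound image.")
    print(bigram_candidates(sample))
```

A realistic extractor would add part-of-speech filtering and an association measure, but the input and output types would stay the same, which is what matters when defining a minimal set of operations.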
System/Workbench for linguists/knowledge engineers
•  Transcription acquisition
   •  crowdsourcing
•  Indexing
   •  word spotting, handwriting recognition?
•  Knowledge network operations
   •  NLP techniques for multiword lexical unit extraction
   •  terminology extraction
   •  semantic indexing
   •  resource alignment (existing ontologies, terminologies, ...)
   •  define operation workflows
   •  define virtual (hyper) document generation
Thank you
Questions?