Kimmo Rossi European Commission DG CONNECT.G3

Connecting Europe Facility
Automated Translation (AT)
What we do
Unit CNECT.G3 – Data Value Chain
• FP7/CIP/H2020 project portfolio: Big Data,
analytics, language technology
• Contractual Public-Private Partnership (cPPP) on
Data Value (signed 13 Oct 2015)
• European data value chain strategy (see the
Communication "Towards a thriving data-driven
economy")
• Public Sector Information (PSI) directive, Open
Data Portal
CEF
What is CEF?
• A funding programme for building and deploying infrastructures
• transport networks
• energy networks
• communication networks (broadband)
• Digital Services
• Deploys mature technologies to build, enable and operate pan-European Digital Services (as well as roads and energy grids)
• Intended to contribute to growth, competitiveness, jobs and social
inclusion
What CEF is not
• research or innovation (Horizon 2020 is for that)
• technology development
CEF
Parts of the CEF programme
• CEF Transport
• CEF Energy
• CEF Telecommunication
• Broadband
• Digital Services
Legal instruments
• CEF regulation – covers the entire programme
• CEF Telecommunication Guidelines (Broadband and Digital
Services): briefly describe the "projects of common
interest" (i.e. the individual digital services)
CEF – Characteristics
Alignment of approaches and solutions across different initiatives,
such as
• CEF
• ISA programme
• Horizon 2020
• European Interoperability Framework for European public services
(EIF)
• European/international standards
Relies on broad cooperation between the EC, Member States and the
stakeholders of the digital services
Builds on large-scale pilots and other predecessor platforms
arising from earlier programmes (e.g. CIP, ISA, FP7), such as:
• PEPPOL, STORK, epSOS, eCODEX, SPOCS
• Europeana
• Safer Internet
• MT@EC
CEF – Important Concepts
Building blocks:
Basic digital service infrastructures which are key enablers to be
reused in more complex digital service infrastructures.
Digital service infrastructures (DSIs):
Networked services delivered electronically, typically over the
internet, providing trans-European interoperable services of common
interest for citizens, businesses and/or governments, and which are
composed of core service platforms and generic services.
Core service platforms: central hubs of digital service
infrastructures that aim to ensure trans-European connectivity, access
and interoperability, and which are open to Member States and may
be open to other entities.
Generic services: gateway services linking one or more national
infrastructures to core service platforms.
CEF – Building blocks
CEF regulation foresees the following Building blocks:
a. Electronic identification and authentication: to enable cross-border
recognition and validation of e-identification and e-signature.
b. Electronic delivery of documents: services for the secured,
traceable cross-border transmission of electronic documents.
c. Automated translation: machine translation engines and
specialised language resources including the necessary tools and
programming interfaces needed to operate the pan-European digital
services in a multilingual environment.
d. Electronic invoicing: secure electronic exchange of invoices.
CEF – Other DSIs
In addition to the "building blocks", the CEF regulation
identifies the following DSIs:
• Cultural heritage platform (Europeana)
• Safer Internet service infrastructure
• Access to public sector information ("Open Data portal")
• eProcurement
• eHealth
• Cyber security
• Interconnection of Business registers
• Business Mobility
• eJustice
• Online Dispute Resolution system (ODR)
• Exchange of Social Security Information (EESSI)
CEF – Automated Translation (AT)
Rationale
• Automated Translation (AT) is a "building block"
• AT will serve the other Digital Services Infrastructures in CEF
• AT = whatever it takes to make DSIs actually multilingual
Features
• Adaptable machine translation and relevant language resources
are central
• Other likely key areas: computer-assisted translation (CAT), content
management systems (CMS), terminology, semantic
interoperability, interfaces to various systems and data types
• Human element is essential: service provision, quality control,
validation, post-editing, on-demand response...
Not all of these features are necessarily funded by CEF!
National dimension: CEF implies and encourages partnerships with
Member States and regions, e.g. using structural funds for language
resources, technology and translation on a "local" basis.
CEF – Automated Translation (AT)
Problem statement
• Pan-European public services address the whole EU, as opposed to
national online public services
• Less than half of Europeans know any English
• 90% of EU web users prefer to use their own language in online
services
• Pan-European online services serve users who speak 24 different
languages and who share no single common language
• Human translation is too expensive and too slow for the intended
text volumes (e.g. eJustice portal, Europeana) and use scenarios
(e.g. online communication in ODR)
• Available online translators (e.g. Google) have gaps in language
coverage (especially with the smaller EU languages as target language)
and are not secure (e.g. for patient data in eHealth systems)
CEF – Automated Translation (AT)
What it will deliver
• AT will make the other DSIs multilingual (it is a "building block")...
• ...by deploying mature language technologies in a secure platform
Features, elements, users
• Machine translation requires a lot of computing power and storage
space
• Adaptable machine translation and relevant language resources
are central and will determine the quality of translation
• Interfaces to various systems and data types are important
• Human service element is essential: service provision, quality
control, validation, post-editing, on-demand response...
• AT will build on and extend the already operational MT@EC system,
which serves some DSIs (e.g. IMI since 2012) and individual
users at EU institutions (7M pages translated)
CEF – Automated Translation (AT)
Planned Actions – Work Programme 2014
• Core platform
• computing, storage and transfer capacity and DSI interfaces
to run an initial service for 3 well-defined DSIs for 2 years
• human resources to maintain a continuous service,
including handling of language resources, adapting machine
translation engines, managing workflows
• Basic language resources for automated translation
• language resources are the "raw material" for generating AT
systems
• The 2014 WP covers setting up a coordination mechanism to
maximise the use of existing language resources and to
organise collaboration (involving Member States) with
language resource providers
CEF AT – stakeholders, contributors
Language industry
• providers of language technology, especially MT
• language service providers
Language competence centres: provision of language resources,
tools, validation and evaluation
Member States/regions: provision of language resources,
validation and evaluation, coordinating local/national/regional
programmes and initiatives with CEF to reach critical mass
CEF AT – What we can offer
Further development of the MT@EC service, in collaboration with
DGT
• to improve translation quality for CEF languages
• to adapt MT@EC for new domains
• to experiment with selected "cases" (pilot DSIs)
Assistance to Member State administrations
• one workshop in each MS in 2015
• training material for understanding MT, for identifying and
processing language resources
• one CEF AT user conference each year
• consultancy for individual cases (e.g. for pilot users)
Setting up governance for language resource collection
• a board with representation of DSI stakeholders and Member
States
Funding (by procurement) for:
• acquisition of language resources & translation technology
• acquisition of computing resources and infrastructure
CEF AT – What we need from you
Data & Data & Data!
• ideally: aligned multilingual text corpora
• any multilingual corpora (texts & translations)
• terminologies (ideally: multilingual)
Requirements, feedback
• we need to understand the linguistic use scenarios of each
DSI, and the specific needs of each language community
• evaluation & validation of CEF AT services
Collaboration
• participation/representation in the board
• participation in the events
• facilitating the workshops (co-hosting etc.)
• improving your data (cleaning, aligning, annotating...)
CEF AT platform
Vision
[Diagram. "The service": DSIs; a dispatcher managing MT requests by language, domain…; MT engines. "From data to engines": language resources (multilingual corpora, monolingual corpora, NLP tools, other); collect and clear; engines factory. Labels: secure (and performing), quality, customisable.]
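In the platform vision, a dispatcher manages MT requests from the DSIs and routes them to MT engines by language and domain. The following is a minimal Python sketch of that routing idea only. The class and field names, the language codes, the stub engines and the fallback to a general-domain engine are all assumptions made for illustration; they are not the actual CEF AT or MT@EC design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical engine interface: a callable mapping source text to a translation.
Engine = Callable[[str], str]


@dataclass(frozen=True)
class MTRequest:
    """Hypothetical shape of a translation request sent by a DSI."""
    dsi: str          # requesting service, e.g. "eJustice"
    source_lang: str  # language code, e.g. "fi"
    target_lang: str  # language code, e.g. "en"
    domain: str       # e.g. "legal", "culture"
    text: str


class Dispatcher:
    """Routes MT requests to engines keyed by (source, target, domain).

    Assumed policy: if no engine is registered for the requested domain,
    fall back to a general-domain engine for the same language pair.
    """

    GENERAL = "general"

    def __init__(self) -> None:
        self._engines: Dict[Tuple[str, str, str], Engine] = {}

    def register(self, source_lang: str, target_lang: str,
                 domain: str, engine: Engine) -> None:
        """Make an engine available for one language pair and domain."""
        self._engines[(source_lang, target_lang, domain)] = engine

    def dispatch(self, request: MTRequest) -> str:
        """Pick the best matching engine and translate the request text."""
        key = (request.source_lang, request.target_lang, request.domain)
        engine = self._engines.get(key)
        if engine is None:  # no domain-adapted engine: try the general one
            engine = self._engines.get(
                (request.source_lang, request.target_lang, self.GENERAL))
        if engine is None:
            raise LookupError(
                f"no engine for {request.source_lang}->{request.target_lang}")
        return engine(request.text)


if __name__ == "__main__":
    # Stub engines stand in for real MT systems.
    dispatcher = Dispatcher()
    dispatcher.register("fi", "en", "general", lambda t: f"[fi-en general] {t}")
    dispatcher.register("fi", "en", "legal", lambda t: f"[fi-en legal] {t}")

    print(dispatcher.dispatch(MTRequest("eJustice", "fi", "en", "legal", "Hei")))
    # prints: [fi-en legal] Hei
    print(dispatcher.dispatch(MTRequest("Europeana", "fi", "en", "culture", "Hei")))
    # no "culture" engine registered, so this prints: [fi-en general] Hei
```

Keying engines by language pair and domain mirrors the "by language, domain…" routing named in the diagram. Falling back to a general-domain engine is only one possible policy; a real service might instead reject the request or queue it for human handling.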
Recap: CEF AT platform in WP 2014
• IT and service specifications based on an analysis
of requirements for at least 3 mature DSIs;
• Extend MT@EC to provide small-scale customised
automated translation services for 3 DSIs;
• Set up a coordination mechanism, and promote and
support effective pooling of language resources
for automated translation.
CEF AT future objectives
• Launch the full core service platform (scale up,
set up a domain adaptation "factory")
• Build on the user needs and requirements
analysis delivered by 2014 actions
• Implement and adapt the automated translation
services for the remaining CEF DSIs
• Extend work on language resource collection in
the framework of the coordination mechanism set
up for this purpose by 2014 actions
Thank you!
[email protected]