Kimmo Rossi
European Commission, DG CONNECT.G3

Connecting Europe Facility
Automated Translation (AT)

What we do
Unit CNECT.G3 – Data Value Chain
• FP7/CIP/H2020 project portfolio: Big Data, analytics, language technology
• Contractual Public-Private Partnership (cPPP) on Data Value (signed 13 Oct 2014)
• European Data value chain strategy (see: Communication "Towards a thriving data-driven economy")
• Public Sector Information (PSI) directive, Open Data Portal

CEF
What is CEF?
• A funding programme for building and deploying infrastructures
  • transport networks
  • energy networks
  • communication networks (broadband)
  • Digital Services
• Deploys mature technologies to build, enable and operate pan-European Digital Services (and roads and energy grids)
• Intended to contribute to growth, competitiveness, jobs and social inclusion
What CEF is not
• research or innovation (Horizon 2020 is for that)
• technology development

CEF
Parts of CEF programme
• CEF Transport
• CEF Energy
• CEF Telecommunication
  • Broadband
  • Digital Services
Legal instruments
• CEF regulation – covers the entire programme
• CEF Telecommunication Guidelines – Broadband and Digital Services – describes (briefly) the "projects of common interest" (= individual digital services)

CEF – Characteristics
Alignment of approaches and solutions across different initiatives, such as
• CEF
• ISA programme
• Horizon 2020
• European Interoperability Framework for European public services (EIF)
• European/international standards
Relies on broad cooperation between the EC, Member States and stakeholders of the digital services
Builds on large-scale pilots and other predecessor platforms arising from earlier programmes (e.g. CIP, ISA, FP7), such as:
• PEPPOL, STORK, epSOS, eCODEX, SPOCS
• Europeana
• Safer Internet
• MT@EC

CEF – Important Concepts
Building blocks: Basic digital service infrastructures which are key enablers to be reused in more complex digital service infrastructures.
Digital service infrastructures (DSIs): Networked services to be delivered electronically, typically over the internet, providing trans-European interoperable services of common interest for citizens, businesses and/or governments, and which are composed of core service platforms and generic services.
Core service platforms: Central hubs of digital service infrastructures aiming to ensure trans-European connectivity, access and interoperability, and which are open to Member States and may be open to other entities.
Generic services: Gateway services linking one or more national infrastructures to core service platforms.

CEF – Building blocks
The CEF regulation foresees the following building blocks:
a. Electronic identification and authentication: to enable cross-border recognition and validation of e-identification and e-signature.
b. Electronic delivery of documents: services for the secured, traceable cross-border transmission of electronic documents.
c. Automated translation: machine translation engines and specialised language resources, including the necessary tools and programming interfaces needed to operate the pan-European digital services in a multilingual environment.
d. Electronic invoicing: secure electronic exchange of invoices.

CEF – Other DSIs
In addition to the "building blocks", the CEF regulation identifies the following DSIs:
• Cultural heritage platform (Europeana)
• Safer Internet service infrastructure
• Access to public sector information ("Open Data portal")
• eProcurement
• eHealth
• Cyber security
• Interconnection of Business registers
• Business Mobility
• eJustice
• Online Dispute Resolution system (ODR)
• Exchange of Social Security Information (EESSI)

CEF – Automated Translation (AT)
Rationale
• Automated Translation (AT) is a "building block"
• AT will serve the other Digital Service Infrastructures in CEF
• AT = whatever it takes to make DSIs actually multilingual
Features
• Adaptable machine translation and relevant Language Resources are central
• Other likely key areas: CAT, CMS, terminology, semantic interoperability, interfaces to various systems and data types
• Human element is essential: service provision, quality control, validation, post-editing, on-demand response...
Not all of these features are necessarily funded by CEF!
National dimension: CEF implies and encourages partnerships with Member States and regions, e.g. use of structural funds for language resources, technology and translation on a "local" basis.

CEF – Automated Translation (AT)
Problem statement
• Pan-European public services address the whole EU, as opposed to national online public services
• Fewer than half of Europeans know any English
• 90% of EU web users prefer to use their own language in online services
• Pan-European online services serve users who speak 24 different languages and who do not share any single common language
• Human translation is too expensive and too slow for the intended text volumes (e.g. eJustice portal, Europeana) and use scenarios (e.g. online communication in ODR)
• Available online translators (e.g. Google) have gaps in language coverage (especially for the small EU languages as target languages) and are not secure (e.g. for patient data in eHealth systems)

CEF – Automated Translation (AT)
What it will deliver
• AT will make the other DSIs multilingual (it is a "building block")...
• ...by deploying mature language technologies in a secure platform
Features, elements, users
• Machine translation requires a lot of computing power and storage space
• Adaptable machine translation and relevant Language Resources are central, and will determine the quality of translation
• Interfaces to various systems and data types are important
• Human service element is essential: service provision, quality control, validation, post-editing, on-demand response...
• AT will build on and extend the functional MT@EC system, which already serves some DSIs (e.g. IMI since 2012) and individual users at EU institutions (7M pages translated)

CEF – Automated Translation (AT)
Planned Actions – Work Programme 2014
• Core platform
  • computing, storage and transfer capacity and DSI interfaces to run an initial service for 3 well-defined DSIs for 2 years
  • human resources to maintain a continuous service, including handling of language resources, adapting machine translation engines, managing workflows
• Basic language resources for automated translation
  • language resources are the "raw material" for generating AT systems
  • 2014 WP covers setting up a coordination mechanism to maximise the use of existing language resources and to organise collaboration (involving Member States) with language resource providers

CEF AT – stakeholders, contributors
Language industry
• providers of language technology, especially MT
• language service providers
Language competence centres: provision of language resources, tools, validation and evaluation
Member States/regions: provision of language resources, validation and evaluation, coordinating local/national/regional programmes and initiatives with CEF to reach critical mass

CEF AT – What we can offer
Further development of the MT@EC service, in collaboration with DGT
• to improve translation quality for CEF languages
• to adapt MT@EC for new domains
• to experiment with selected "cases" (pilot DSIs)
Assistance to Member State administrations
• one workshop in each MS in 2015
• training material for understanding MT, for identifying and processing language resources
• one CEF AT user conference each year
• consultancy for individual cases (e.g. for pilot users)
Setting up governance for language resource collection
• board with representation of DSI stakeholders and Member States
Funding (by procurement) for:
• acquisition of language resources & translation technology
• acquisition of computing resources and infrastructure

CEF AT – What we need from you
Data & Data & Data!
• ideally: aligned multilingual text corpora
• any multilingual corpora (texts & translations)
• terminologies (ideally: multilingual)
Requirements, feedback
• we need to understand the linguistic use scenarios of each DSI, and the specific needs of each language community
• evaluation & validation of CEF AT services
Collaboration
• participation/representation in the board
• participation in the events
• facilitating the workshops (co-hosting etc.)
• improving your data (cleaning, aligning, annotating...)

CEF AT platform
Vision
[Diagram; labels recovered from the slide]
• The service (managing MT requests): DSIs, DISPATCHER, MT engines (by language, domain…)
• Engines factory: from data to engines
• Language resources (collect and clear): multilingual corpora, monolingual corpora, NLP tools, other
• Qualities shown: SECURE (and performing), QUALITY, CUSTOMISABLE

Recap: CEF AT platform in WP 2014
• IT and service specifications based on an analysis of requirements for at least 3 mature DSIs;
• Extend MT@EC to provide small-scale customised automated translation services for 3 DSIs;
• Set up a coordination mechanism and promote and support effective pooling of language resources for automated translation.
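
The Vision slide names a dispatcher that manages MT requests from DSIs and selects MT engines "by language, domain…". The following is only a minimal sketch of that routing idea, not a description of MT@EC or of the CEF AT platform. All names (`Dispatcher`, `register`, `dispatch`, `GENERAL`) are hypothetical, the engines are stub functions, and the fallback to a general-domain engine is an assumption that the slides do not state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# Hypothetical types for illustration only; this is not the MT@EC or CEF AT API.
Engine = Callable[[str], str]
Key = Tuple[str, str, str]  # (source language, target language, domain)

GENERAL = "general"  # assumed label for an engine without domain adaptation


@dataclass
class Dispatcher:
    """Routes a translation request to an engine by language pair and domain."""

    engines: Dict[Key, Engine] = field(default_factory=dict)

    def register(self, src: str, tgt: str, domain: str, engine: Engine) -> None:
        """Make an engine available for one language pair and domain."""
        self.engines[(src, tgt, domain)] = engine

    def select(self, src: str, tgt: str, domain: str) -> Optional[Engine]:
        """Prefer a domain-adapted engine; otherwise try the general one (assumed policy)."""
        engine = self.engines.get((src, tgt, domain))
        if engine is None:
            engine = self.engines.get((src, tgt, GENERAL))
        return engine

    def dispatch(self, text: str, src: str, tgt: str, domain: str = GENERAL) -> str:
        """Translate text with the selected engine, or fail if the pair is not covered."""
        engine = self.select(src, tgt, domain)
        if engine is None:
            raise LookupError(f"no engine for {src}->{tgt} (domain: {domain})")
        return engine(text)


# Usage with stub engines that only tag the text instead of translating it.
dispatcher = Dispatcher()
dispatcher.register("fi", "en", GENERAL, lambda t: f"[fi>en general] {t}")
dispatcher.register("fi", "en", "ejustice", lambda t: f"[fi>en ejustice] {t}")

print(dispatcher.dispatch("teksti", "fi", "en", "ejustice"))  # domain-adapted engine
print(dispatcher.dispatch("teksti", "fi", "en", "ehealth"))   # falls back to general engine
```

Keying engines on (source, target, domain) keeps the request path independent of how the engines are built, which matches the separation between "the service" and the "engines factory" on the slide.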

CEF AT future objectives
• Launch the full core service platform (scale up, set up domain adaptation "factory")
• Build on the user needs and requirements analysis delivered by 2014 actions
• Implement and adapt the automated translation services for remaining CEF DSIs
• Extend work on language resource collection in the framework of the coordination mechanism set up for this purpose by 2014 actions

Thank you!
[email protected]