This document presents the lessons learned, best practices and results of the ODE II project, aimed at creating an open data infrastructure in the metropolitan area of Amsterdam, The Netherlands.

Amsterdam Open Data Exchange II
Knowledge dissemination for the ODE II Project

Ron van der Lans, Jasper Soetendal, Ronald Siebes, Rinke Hoekstra, Tom Demeyer, Chris van Aart, Job Spierings, Fernando Flores García, Manos Tsagkias, Arjan Nusselder, Maarten Marx

September 2014

INTRODUCTION
Ron van der Lans (Amsterdam Economic Board), Jasper Soetendal (Amsterdam Economic Board)

The Open Data Exchange (ODE) project started in 2012, commissioned by the Amsterdam Economic Board. The project aims at strengthening the economy of the Amsterdam metropolitan area by unlocking available (public) data sources to citizens and businesses. By using this data, citizens, businesses, research institutions and other parties are enabled to develop services that would previously have been impossible or too expensive. The results are built and implemented in such a way that third parties can easily use them to build their own applications and services. It is expected that this will create an important boost for the Amsterdam open data development community.

The project does this by creating an open data infrastructure in the metropolitan area. On the one hand, the local government is actively approached and supported in opening up the data it creates, possesses and maintains. On the other hand, potential users of the data are activated by organising app development contests (Apps 4 Amsterdam) and seminars (app bootcamps).

GOAL FOR ODE II
In 2014, the goal for the ODE II project is to develop three show cases that emphasize the power of open data applications and to lay the foundation for a sustainable implementation of open data in Amsterdam. Data providers and users around three cases (mobility, transparency and tourism) are brought together and facilitated by creating an application that is tailored to their needs. These applications solve an actual problem, creating a sustainable business case. Data providers experience how their data is used in practice, and users experience improved convenience, new features and/or decreasing costs. These solutions are built upon the foundations of available platforms as well as the Linked Open Data Core, and are carefully connected to other ongoing innovation projects.

IN THIS DOCUMENT
This document focuses on the lessons learned, best practices and results of the ODE II project. It is split into the five work packages that are defined in this project:

WP2: Getting the Data
WP3: Linked Open Data Core
WP4: Use case Mobility
WP5: Use case Tourism
WP6: Use case Transparency

PARTICIPANTS ODE II
These organisations are part of the ODE II project team:

Amsterdam Economic Board
Waag Society
VU University Amsterdam
University of Amsterdam
2CoolMonkeys

ERDF
Investing in your future. This project is partly financed by the European Regional Development Fund of the European Union.

CONTENTS
Introduction
Goal for ODE II
In this document
Participants ODE II
ERDF
Contents
Getting Open Data
Open Data: What's in it for me?
Lessons learned
Challenges to overcome
Five step approach to identify and prioritize Open Data
Results: 389 data sets
Integrating Open Data
Analyse your risks
Aim for a jam-session and not a concert
Mobility – Saving lives with Open Data
The P2000 monitor
Linked open data fire hazard assessment
Tourism
Collecting Tourism Data
A recommendation service
Process-logic backend design
Demo: AmsterTinder App
Backend visualization
Transparency
Status of the data
Data enrichment
Applications

Photo on the cover page taken from http://citysdk.waag.org/buildings by Bert Spaan (Waag Society).

GETTING OPEN DATA
WORK PACKAGE 2
Jasper Soetendal (Amsterdam Economic Board), Ron van der Lans (Amsterdam Economic Board)

Providing Open Data is not just uploading files to a web server. It is a fundamental change in the information process of an organisation. This work package focused on the transition from closed data to open data, including aspects like privacy, security and standardisation, and, in the end, on getting as much relevant open data as possible.

In this work package we stimulated, supported and encouraged multiple organisations within and outside the City of Amsterdam to think and act on the topic of 'Open Data'. It is estimated that there are over 10,000 datasets within the city, of which only a fraction is currently open. But not all this data is relevant for developers, businesses and citizens. Making the right choices in prioritising the process of opening up data was an important part of this work package.

Open Data is a matter of supply and demand. Supplying open data cannot be done without a clear view of the demand side, so developers, businesses and citizens need to be involved in this process. Within the City of Amsterdam, we played the role of facilitator for this process.

In this chapter we present the main lessons learned in this work package, the main challenges to overcome, and a five step approach to the identification and prioritization of Open Data.
OPEN DATA: WHAT'S IN IT FOR ME?
In the process of getting as much open data as possible, one of the first questions for all parties involved is: "Why open data? What's in it for me?" This question can best be answered using a two-step approach: regarding open data as a goal in itself, and using open data as a means for regular business targets.

Open Data as a goal
Opening data can be seen as a goal in itself, based on an organisation's enthusiasm for open data, as part of its policy or because of legislation. By opening data, there are four main benefits (the first three benefits and descriptions are copied from http://opengovernmentdata.org/):

1. Transparency. In a well-functioning, democratic society citizens need to know what their government is doing. To do that, they must be able to freely access government data and information and to share that information with other citizens.

2. Releasing social and commercial value. In a digital age, data is a key resource for social and commercial activities. Everything from finding your local post office to building a search engine requires access to data, much of which is created or held by government. By opening up data, government can help drive the creation of innovative business and services that deliver social and commercial value.

3. Participatory governance. Much of the time citizens are only able to engage with their own governance sporadically, maybe just at an election every 4 or 5 years. By opening up data, citizens are enabled to be much more directly informed and involved in decision-making. This is more than transparency: it is about making a full "read/write" society, not just knowing what is happening in the process of governance but being able to contribute to it.

4. Efficiency. Opening up data to the public may lead to internal efficiency as well. Since everybody can access the data, it means that the data is available internally as well. In environments (like Amsterdam) where the information system is decentralised and distributed, this may lead to enhanced efficiency. We have seen numerous examples and opportunities in Amsterdam.

Open Data as a means
In addition to these benefits of opening data, it is important to acknowledge the fact that open data can be a means to reach your regular business targets, or those of other stakeholders involved. For example, at the Amsterdam Department of Transport (DIVV), open data has contributed to the targets of their communication department. Because of the open data that was published and the apps that were built upon this data, costs for traditional media (print, advertisements, booklets, etc.) could be cut.

LESSONS LEARNED
While trying to get as much data open as possible in Amsterdam for the last two years, these are the most important lessons we have learned.

Lesson 1 - Start bottom up
Open Data starts with the enthusiastic initiatives of the early adopters. Let a thousand flowers bloom and support these initiatives. Connect the early adopters and share their experiences and best practices.

Lesson 2 - Action first, policies later
The best practices of Open Data rarely start with policies and documents. They start with action and experimentation, backed by management commitment.

Lesson 3 - Know "what's in it for me" for all stakeholders
For each Open Data project, it is important to know all interests of the stakeholders involved. For data owners, data providers, developers and end users it should be clear what their role is and what's in it for them (see the paragraph on the "Open Data Game" below).

Facilitating the "Open Data Game"
Providing Open Data is about setting up an environment where government, business and citizens can cooperate and profit from transparency, efficiency and creating economic value.
For each specific case, the interests of these stakeholders should be aligned to create a favourable setting for cooperation. This is why we use what we call the "Open Data Game" at the start of any project, initiative or idea where open data is involved. Basically it is a stakeholder analysis, putting all cards on the table, to define the pros and cons for all parties involved. It involves creating a balanced business proposition for all stakeholders, including (local) governments, businesses, developers and citizens. While in one case developers can be asked to invest in building their own apps because of expected income from end users, in other cases the main benefits may be for another party, which should then cover the main costs of the project.

Lesson 4 - Involve the end user
Not all data can be opened at once, so the open data activities should be prioritised. Involve all possible users (developers, colleagues, citizens) from the start and let them help you prioritize the datasets.

Lesson 5 - The business case: internal efficiency
Do not underestimate the internal profits that can be realised by opening up your data. In Amsterdam it turns out that the internal efficiency of sharing data within the city is the most important profit.

Lesson 6 - Practice what you preach: use your data
If you preach the benefits of open data, practice it as well within your own organisation. Information driven intervention, evidence based policy, big data analysis: the opportunities are huge!

Lesson 7 - Open Data is just a simple last step in a perfectly organised information process
The real problem is that the information process isn't perfectly organised. While trying to open up data, you'll have to deal with a lot of problems that aren't really connected to open data. Be very clear about this to all stakeholders: these aren't the costs of Open Data, these are the costs of getting your information process together.

CHALLENGES TO OVERCOME
While trying to convince other parties to open up their data, challenges are faced. These are the most important challenges and some measures to overcome them.

Challenge 1 - Privacy challenges
Data that is privacy sensitive can't be published as open data, but aggregated or anonymised data can. Make sure to involve a trusted party that has the expertise to cope with these challenges. In Amsterdam, the Department of Research & Statistics provided support to overcome these challenges.

Challenge 2 - Organisation/business challenges
There might be good organisational reasons why data can't or won't be opened. In these cases, the sum of all benefits in the "Open Data Game" should be profitable enough for the data provider to open up the data. If it isn't, it will be difficult to convince them to do so anyway.

Challenge 3 - Quality challenges
Opening up data may reveal quality issues in the data. In some cases these data issues need to be addressed before publishing, but in most cases open data provides a great way of improving your data quality. Input from developers, user generated content and mashing up the data with other sources will improve the data quality.

Challenge 4 - Political challenges
Opening up data can be scary for politicians and civil servants. So Open Data activities should be clearly supported and communicated by the mayor, aldermen or other influential persons.

Challenge 5 - Technical challenges
Providing data should be as easy as possible. Do not increase thresholds by setting conditions on formats, standards or technologies. Use workarounds if necessary; focus on getting the data first.

FIVE STEP APPROACH TO IDENTIFY AND PRIORITIZE OPEN DATA
In addition to the lessons learned and challenges to overcome, we have a practical approach for identifying open data for organisations. To get to a prioritized list of data that can be opened, we define five steps.

Step 1 - Listing available data
The first step is creating a list of all data in the organisation. While it may be impossible to get an exhaustive list of all data that is used, try to be as complete as possible. And do include data that is currently unavailable but can be generated or collected, if it might be relevant for re-use.

Start this list by setting up a logical structure for it, for example grouped by department, process or products, and fill it in with the most obvious data. Next, set up a session with colleagues for each of the departments/processes/products. Make sure there are enough post-it notes and let everybody write down the data they generate, process, use, publish or would like to have. These sessions should be like brainstorm sessions: there's no time for difficulties and problems, it's just about thinking what data is available or could be available.

Step 2 - Estimate of 'coolness' for reuse
In these sessions, when all data is written down on post-it notes, the second step is to jointly make an estimate of the 'coolness' for reuse of each dataset. For this, we use the 'Coolwall', which might be familiar from the TV programme Top Gear. Use a wall or flipchart with an 'uncool' side and a 'cool' side. Now it's up to the group of attendees, for all post-its containing a data set, to come up with possible (re)use for this data (by external developers, citizens, businesses, etc.). Together you decide its place on the cool wall, from uncool to cool.

At the end of this session, the cool wall shows an overview of all your data, ordered from 'uncool' to 'cool' from a reuse perspective. This order will be used in step 5 as the vertical axis of the prioritization matrix.
Step 3 - Current status of data
After the 'cool wall' session, all the data should be listed in an Excel sheet, to be extended with extra information about each dataset. In this Excel sheet you list (and track) information on the data sets on two important aspects: the current status of the data, and what should be done to publish this data as open data. Add columns for each of the aspects below and talk to data owners and experts to fill in this table for all datasets.

Current status of the dataset:
- Current availability of the data (local PC, internal, shared, published, etc.)
- Data owner and responsibility for the data
- Location of data storage and current data format
- Level of detail of the data stored (temporal, location, details, etc.)
- Frequency of updates

What should be done to publish the data as open data?
- Do data quality and up-to-dateness meet the standards for publishing?
- What technical adjustments need to be made to publish the data?
- What organisational adjustments need to be made to publish the data?
- What are possible risks and issues if published (privacy, competition, etc.)?
- Estimated effort (time and money) needed to publish the data

The last aspect, the estimated time and money needed to publish the data, is used in the last step, where this estimation forms the horizontal axis, running from datasets that are very complex to publish to data sets that are easy to publish.

Step 4 - Test 'coolness' with potential users
The fourth step is to test the 'coolness' factor, which was estimated in step 2, with 'real' potential users. In Amsterdam, we organised an 'Open Data Café' for the Department of Transport. For this event, 100 participants were invited from different target groups (developers, business, co-workers, etc.), who all got 10 stickers in a colour specific to their group. In multiple 15-minute sessions, data owners presented their data, how this data could be used, and whether the data could be easily published (we used a green/yellow/red traffic light indication). After these sessions, all participants could use their stickers to indicate what data they wanted the most. They could divide them over multiple datasets, or go all-in on one data set. At the end of the event, as soon as the stickers were translated into bar charts, the results were presented. Now the estimated 'coolness' was enhanced with the 'real' coolness, judged by those who would actually use the data.

While a less time consuming approach can be used, like organising a session with a dozen potential users or setting up an online questionnaire, it is important to involve the actual users of the data in this step.

Step 5 - Prioritize your data
The last step is to use all information from the previous steps to create a prioritization matrix. The matrix has two axes: on the horizontal axis the information from step 3 is used, listing data from very hard (complex, expensive, time consuming) to open to very easy to open. On the vertical axis the outcomes of step 4 are used: the higher the demand for a dataset (the very 'cool' ones), the higher it is placed on this axis. Using these axes, all data sets can be placed in the matrix, as in the example with data sets 'A' to 'J'. The datasets in the top right corner have the highest priority, the datasets in the lower left corner the lowest priority. Traversing the matrix from top right to lower left, the prioritization for all datasets can be defined.
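The prioritization logic of step 5 can also be captured in a few lines of code. The sketch below is a minimal illustration of how a demand score (from step 4) and an effort score (from step 3) could be combined into a ranking; the dataset names, the scores and the simple scoring rule are invented for illustration and are not part of the original ODE II toolset.

```python
# Minimal sketch of the prioritization matrix: each dataset gets a demand
# score (the 'coolness' from step 4, higher is better) and an effort score
# (time/money needed to publish, from step 3, higher is harder). Datasets
# are then ranked from the top-right corner of the matrix (high demand,
# low effort) downwards. All names and numbers are illustrative.

datasets = {
    "A": {"demand": 9, "effort": 2},
    "B": {"demand": 7, "effort": 8},
    "C": {"demand": 3, "effort": 1},
    "D": {"demand": 8, "effort": 5},
}

def priority(entry):
    # Simple heuristic: demand counts positively, effort negatively.
    return entry["demand"] - entry["effort"]

ranked = sorted(datasets.items(), key=lambda kv: priority(kv[1]), reverse=True)
for name, scores in ranked:
    print(f"{name}: demand={scores['demand']}, effort={scores['effort']}, "
          f"priority={priority(scores)}")
```

In practice the weighting of demand against effort is a policy choice; the matrix mainly serves to make that trade-off visible and discussable.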
RESULTS: 389 DATA SETS
As a result of the effort in work package 2, the number of available data sets in Amsterdam has grown to 389, distributed across 10 themes.

INTEGRATING OPEN DATA
WORK PACKAGE 3
Ronald Siebes (VU University Amsterdam), Rinke Hoekstra (VU University Amsterdam), Tom Demeyer (Waag Society)

Work package 3 concerns the approach to integrating Open Data in the ODE II project. This chapter of the booklet focuses on the work that has to be done before one starts coding. During the project it became clear that the technical challenge is easy compared to the challenge of predicting the reliability of the services that provide the Open Data one tries to integrate.

ANALYSE YOUR RISKS
Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control (Auer, S. R.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. (2007). "DBpedia: A Nucleus for a Web of Open Data". The Semantic Web, Lecture Notes in Computer Science 4825, p. 722). This freedom for the receiver morally implies the same level of freedom for the provider. There is often no Service Level Agreement that the source is bound to. The data is provided 'as-is', with no legal implications for the quality of the data, nor its availability. Any application builder and policy maker who plans to use Open Data should be aware that it often lacks any QoS (Quality of Service) guarantees from the providers.

One needs to have a clear answer to the following questions in order to determine whether the planned investment to include a certain source is wise:

1: When does it start to hurt?
Some data updates rapidly, like live traffic data, and some is fairly static, like road signs. If your application is dependent on live data, it quickly becomes a problem if the source stops providing. A fall-back option has to be thought of beforehand in case the source has too big a risk of defaulting. If the data is quite static, one has the luxury of looking for another solution when the problem occurs.

2: How important is a data source to the overall functionality of your application?
In case your application becomes useless when one type of data becomes unavailable, this data obviously is very important. Think of a weather application where temperature data is a key ingredient and air pressure a nice extra. If the temperature data is a 'must-have' and the source unreliable, it is wise to have access to alternative sources. Regarding the system architecture it is key to also keep dependencies as loose as possible ('loose coupling' system design); more on this in the next section.

3: Who is the source providing the data, and why for free?

Academics
The academic infrastructure and personnel to provide Open Data are often bound to a specific project that funds them. One needs to investigate the duration of the project and the likelihood that the funding continues in one way or another after that deadline. Be aware that the end of an academic project often results in a rapid deterioration of the functionality provided by the source.

(Semi) governmental organisations
Procedures and activities followed and performed by (semi) governmental organisations are more stable (i.e. unlikely to change) than those of any other type of organisation. Once something got into the system of activities, it is likely to stay that way. That is something to keep in mind when determining the risk of future availability of the open data that is provided.

Idealistic communities
The concerns about the ongoing radioactive pollution from the crippled Fukushima nuclear facility initiated a large community effort to provide real-time sensor data (http://blog.safecast.org/). This is one of many examples where idealistic motivations lie behind the provision of Open Data. The main risk is that the community will stop providing if the organisation becomes suspicious and/or the problem becomes uninteresting or irrelevant. Try to predict how long an idealistic community will provide the data you need.

Commercial entities
In most cases a company will only continue providing data as long as it is bound to do so by contract or needs to in order to make money. If 'somebody' paid for the provision of this data to the public (like a government), the company is obliged by a contract from which you benefit. If this is not the case, it is important to know why a company is sharing that data, or in other words, how they benefit from that. One example is product information, where the open data about products is free to use because it will increase the sales of the products themselves.

To summarize: if you have a great idea based on integrating existing live Open Data, it would be a pity if your effort goes in vain when the data suddenly disappears, which could have been prevented by a decent risk analysis.
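Question 2 above recommends loose coupling and alternative sources for 'must-have' data. A minimal sketch of that idea, assuming hypothetical provider URLs and a JSON response format (neither is an actual ODE II or third-party endpoint):

```python
import requests

# Sketch of a loosely coupled data source with a fall-back. The application
# only calls get_temperature(); which provider answers is an internal detail,
# so an unreliable source can be swapped out without touching the app.
# Both URLs and the response format are placeholders, not real services.
PROVIDERS = [
    "https://primary.example.org/temperature",
    "https://backup.example.org/temperature",
]

def get_temperature(city):
    for url in PROVIDERS:
        try:
            response = requests.get(url, params={"city": city}, timeout=5)
            response.raise_for_status()
            return float(response.json()["celsius"])
        except (requests.RequestException, KeyError, ValueError):
            continue  # provider unavailable or malformed: try the next one
    return None  # every provider failed; the caller decides how to degrade
```

The point of the pattern is that the risk analysis is reflected in the architecture: the application does not depend on any single provider staying online.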
AIM FOR A JAM-SESSION AND NOT A CONCERT

Choreography vs. orchestration
The terms orchestration and choreography describe two aspects of emerging standards for creating business processes from multiple Web services. The two terms overlap somewhat, but orchestration refers to an executable business process that can interact with both internal and external Web services. Orchestration always represents control from one party's perspective. This distinguishes it from choreography, which is more collaborative and allows each involved party to describe its part in the interaction (Chris Peltz, "Web Services Orchestration and Choreography", Computer, vol. 36, no. 10, pp. 46-52, Oct. 2003, doi:10.1109/MC.2003.1236471).

Within a consortium of data providers, software engineers and researchers, all have a shared goal and commitment. Orchestration is in this case possible, because every player knows their role in the team, guided by the conductor. The Open Data domain is different in its nature. Choreography is the way to deal with players without a centrally organized authority. There you collaborate in a flexible and unpredictable way, with the shared goal of an attractive performance.

Within the ODE II project we chose a mixed approach. Some of the Open Data services were provided by the consortium partners. For these we had the opportunity to specify the API and were very flexible in developing the most efficient workflows (i.e. orchestration). Other services, provided by third parties, are often provided 'as-is', and on these we have little influence.

Figure: Integration architecture.

MOBILITY – SAVING LIVES WITH OPEN DATA
WORK PACKAGE 4
Chris van Aart (2CoolMonkeys)

The Fire department has to assess emergency situations in a matter of minutes. Amsterdam contains about 600,000 unique objects such as buildings, railways, roads, tunnels and an airport. Around 10,000 incidents happen per year. Most of these incidents, such as small fires, lift jams and traffic accidents, are routine. For 200 large objects detailed emergency plans are compiled. For other objects, the fire department has to assess the situation on the spot, on the basis of a variety of data from various sources. When a building is on fire, can we tell the fire department how many people, what kind of people, and what hazardous goods are inside?

In this work package two use cases are explored around available open data sources. The idea is that if the data is useful for the Fire department, it is also available for the local government, researchers and the general public.

THE P2000 MONITOR
The first system, the P2000 monitor, filters real-time P2000 alarm messages about buildings and enriches them with open data about construction year, function, permits, and whether the address is known by other emergency services. The system gives coloured labels (from green to red), so that the fire department is informed about possible general risks. The system can be consulted at http://p2000.citysdk.nl.

Figure: Screenshot of p2000.citysdk.nl.
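As an illustration of the monitor's enrichment-and-labelling idea, the sketch below enriches a single alarm message and assigns a colour label. The message format, the lookup values and the thresholds are invented for this example; the actual implementation behind p2000.citysdk.nl is not reproduced here.

```python
# Sketch of the P2000 monitor idea: take an incoming alarm message about a
# building, enrich it with open data about the address, and attach a
# green/orange/red risk label. Lookup values and thresholds are hypothetical.

def lookup_building(address):
    # Placeholder for calls to the open data sources mentioned above
    # (construction year, function, permits, known by other services).
    return {
        "construction_year": 1910,
        "function": "horeca",
        "hazardous_permit": False,
        "known_by_other_services": True,
    }

def risk_label(building):
    score = 0
    if building["construction_year"] < 1940:
        score += 1  # assumption: older buildings carry higher structural risk
    if building["function"] in {"horeca", "factory"}:
        score += 1
    if building["hazardous_permit"]:
        score += 2
    if building["known_by_other_services"]:
        score += 1
    return "red" if score >= 3 else "orange" if score >= 1 else "green"

message = {"capcode": "1234567", "address": "Dam 1, Amsterdam", "type": "fire"}
building = lookup_building(message["address"])
print(message["address"], "->", risk_label(building))
```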
LINKED OPEN DATA FIRE HAZARD ASSESSMENT
The second system answers questions concerning detailed risks about a location. Imagine one system that can answer any question about buildings, surroundings, roads and the whereabouts of people. Given an emergency call, the fire department can ask ad-hoc questions about the incident and its environment.

Typical questions, which citizens can also inspect themselves:

1. What type and function has the building I live or work in? This can be answered by (i) the BAG (http://www.kadaster.nl/web/artikel/productartikel/BAGExtract.htm): construction year and destination (e.g. house, shop or factory), and (ii) the Function Map (http://maps.amsterdam.nl/functiekaart/?LANG=en): horeca, office, parking, shop, etc.

2. Are permits given for constructional exceptions at my place? This can be answered by (i) residence permits (http://www.centrum.amsterdam.nl/algemene_onderdelen/contact/vergunningen/), e.g. for a roof terrace or to change the entrance, and (ii) company permits, e.g. for the usage of explosive materials.

3. Cheap insulation material burns. This can be determined by energy labels from Agentschap NL (from A to G) (http://maps.amsterdam.nl/energie_gaselektra/?LANG=en).

4. Are there oxygen tanks? This can be inspected via the National Risk Map (http://www.risicokaart.nl).

5. Extensive use of electricity can be inspected via the Alliander open data on energy consumption (http://www.liander.nl/liander/opendata/index.htm).

Figure: Examples of Open Data Solutions.
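Because these sources are (or can be) published as linked open data, the five questions above can in principle be answered with ad-hoc queries over a combined graph. A minimal sketch using the SPARQLWrapper library, with an invented endpoint and vocabulary purely for illustration (the project's actual endpoints and schema are not documented here):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint and vocabulary; these do not point at a real service.
sparql = SPARQLWrapper("https://example.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
PREFIX ex: <https://example.org/vocab#>
SELECT ?year ?function WHERE {
  ?building ex:address "Dam 1, Amsterdam" ;
            ex:constructionYear ?year ;
            ex:function ?function .
}
""")

# Each binding answers part of question 1: construction year and function.
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["year"]["value"], row["function"]["value"])
```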
TOURISM
WORK PACKAGE 5
Job Spierings (Waag Society), Arjan Nusselder (University of Amsterdam), Jasper Soetendal (Amsterdam Economic Board), Fernando Flores García (University of Amsterdam), Manos Tsagkias (University of Amsterdam), Chris van Aart (2CoolMonkeys), Ronald Siebes (VU University Amsterdam)

The challenge in the Tourism lab was to enable tourists and travellers to get quick and focused access to a selection of data from a myriad of sources. An extra hurdle is that users typically have little to no affinity or experience with the specific surroundings. This means users will have relatively little context to interpret data, which needs to be addressed when developing visualisations and applications.

Amsterdam is one of Europe's top ten tourist cities. Geographically and socio-economically, Amsterdam and its surrounding municipalities form a single entity. In total, more than 2.5 million people live within this area. In the economic field in particular, Amsterdam wishes to compete strongly with other cities such as Copenhagen, Hamburg and Milan. This region presents itself under the name Amsterdam Metropolitan Area to international companies, as well as to business visitors and tourists (http://www.iamsterdam.com/en-GB/amsterdam-marketing/about-amsterdam-marketing/faq). As a huge economic sector that caters to many different target groups, tourism offers an (almost) endless list of available data, but collecting and dealing with these data is extra complicated.

In this chapter the creation of a prototype application is described. The app is built for a scenario in which a tourist would like to receive context-based recommendations on events and activities in the city. Based on the available data this has been kept relatively simple: can the user get an overview of nearby indoor events when it is raining, or events aimed at families with children, or points of interest to visit based on sites that have been visited by other tourists?

COLLECTING TOURISM DATA
For the prototype we wanted to create, we needed two sources of data: data about the 'supply' of tourism (info on venues and events, including descriptions, images, dates, etc.) and data about the actual 'use' of all that Amsterdam has to offer. The first source was already available as open data for some time, both through AmsterdamOpenData.nl and ArtsHolland.

Data on the actual usage is needed to 'link' the points of interest together, to recommend places by saying 'people who visited A, also visited B and C'. Getting this data was a big challenge in which we only succeeded to some extent. Data from the usage of the "I Amsterdam Card" would have been a great starting point, since this card covers (and tracks) the usage of over 100 of the most visited places in Amsterdam. However, after numerous discussions Amsterdam Marketing (the organisation that offers the "I Amsterdam Card") decided not to provide this source data. As an alternative, we have used earlier reports of Amsterdam Marketing and other sources to create a simulated dataset of visits in Amsterdam. This data was used for the Recommender service and provides useful results. Nevertheless, if we were to get the raw data from the "I Amsterdam Card", the Recommender service would be far more accurate than it is today.

Lessons learned on collecting data
Thinking out of the box, not limited by the actual limitations, the tourism sector offers an almost unlimited range of opportunities for new ideas, business models and applications. There is a lot of information on the points of interest in Amsterdam, but it is quite a task to match, map and combine all this content. There is hardly any detailed structured data on points of interest in Amsterdam on which a decision system can be based. Information on whether a museum is interesting for kids, whether it is best visited in the morning or the afternoon, in sunny times or on rainy days, etc. is not available. Even more 'factual' data, like the number of visitors per year, accessibility, type of collection, etc., is not available or very limited. If this detailed structured data were available, really nice and advanced applications could be built. All data can be simulated, but there is nothing like the real data.

A RECOMMENDATION SERVICE
A recommendation service for Amsterdam venues was implemented for use with Amsterdam Marketing visitor data. This visitor data can be interpreted as pairs of venues that are both visited by the same tourist. In the absence of raw data, known relations between venues were used to generate a sample dataset over a three-day time period.

Initially the recommendation algorithm was targeted at popularity and at distinguishing between time periods, e.g. the most visited venues within a week are the best recommendations. On the sample data the time component had no real effect. As a first result the recommendations made sense, but the large difference in absolute visits overly promoted a few well known museums. After some deliberation, it was decided to also look into the relative popularity. An additional normalisation step was added with respect to the total number of visitors. Recommendations taking only these relative popularities into account showed different venues, but sometimes promoted venues that were hardly visited but perhaps coincidentally related. It is unclear if this is mostly a result of the sample data, or if it would also be present in the real raw data. As a middle ground, the final recommendation takes into account the top 25 popular venues and reorders these based on their relative popularity. The result is a recommendation service that conforms to knowledge about the behaviour of Amsterdam Card users.
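A minimal sketch can make the approach described above concrete: count co-visits, take the most popular co-visited venues, and re-rank them by relative popularity. The venue names, counts and the exact scoring in the sketch are illustrative; the actual service may differ in its details.

```python
from collections import Counter

# visits: pairs of venues visited by the same (anonymous) tourist.
# These example pairs are made up and do not come from real card data.
visits = [("Rijksmuseum", "Van Gogh Museum"),
          ("Rijksmuseum", "Anne Frank House"),
          ("Van Gogh Museum", "Anne Frank House"),
          ("Rijksmuseum", "Van Gogh Museum")]

co_visits = Counter()
popularity = Counter()
for a, b in visits:
    co_visits[(a, b)] += 1
    co_visits[(b, a)] += 1
    popularity[a] += 1
    popularity[b] += 1

def recommend(venue, top_n=25):
    # Take the top_n most co-visited venues (absolute counts), then re-rank
    # them by relative popularity: co-visits divided by the candidate's
    # overall popularity, so that huge museums do not dominate every list.
    candidates = [(other, count) for (v, other), count in co_visits.items() if v == venue]
    candidates.sort(key=lambda x: x[1], reverse=True)
    shortlist = candidates[:top_n]
    shortlist.sort(key=lambda x: x[1] / popularity[x[0]], reverse=True)
    return [other for other, _ in shortlist]

print(recommend("Rijksmuseum"))
```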
PROCESS-LOGIC BACKEND DESIGN
The Process-logic backend contains a set of rules that formulates queries according to the (user) preference variables and external information, like the weather and geo-locations. The current demo application is an initial version, developed together with a pre-processing step on the available data that enriches the event data (e.g. with tags such as "for-children" and "indoor").

DEMO: AMSTERTINDER APP
The AmsterTinder app demo is intended as an end-user application for day trippers visiting Amsterdam. Based on a simple user profile, a recommendation for a day is given out of 100 popular venues. The venues are enhanced with open data about the type of venue, location and intended visitors. The prototype is built as an HTML5/JavaScript responsive web app in combination with a MySQL/PHP back-end webserver. The Leaflet map environment is responsible for the map and marker display. The prototype makes use of the following services:

- DBpedia, for venue enrichment
- Logic Module, for filtering venues based on the profile of the user
- Recommender service, for recommending other venues based on a selected venue
- OpenStreetMap, for topographic map display
- CitySDK, for venue information

Screenshots:
Fig. 1: Preference panel, where the user can specify the type of audience, transport, weather and time of day.
Fig. 2: Suggestions returned by the Recommendation Engine, sorted by relevance.
Fig. 3: Description of a selected venue.
Fig. 4: List of enrichments in RDF triples provided by the enrichment backend.
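To make the role of the Logic Module and the Process-logic backend more concrete, the sketch below filters enriched venue records with two simple rules derived from the user profile and the weather. The records, tags and rules are invented for illustration; the prototype's actual rule set is not reproduced here.

```python
# Sketch of the Process-logic idea: enriched venue/event records are filtered
# by rules derived from the user profile and external context such as the
# weather. Records, tags and rules below are illustrative only.
venues = [
    {"name": "NEMO Science Museum", "tags": {"indoor", "for-children"}},
    {"name": "Vondelpark", "tags": {"outdoor", "for-children"}},
    {"name": "Concertgebouw", "tags": {"indoor"}},
]

def filter_venues(venues, profile, weather):
    selected = []
    for venue in venues:
        if profile.get("with_children") and "for-children" not in venue["tags"]:
            continue  # rule: families only get child-friendly venues
        if weather == "rain" and "indoor" not in venue["tags"]:
            continue  # rule: when it rains, prefer indoor venues
        selected.append(venue["name"])
    return selected

print(filter_venues(venues, {"with_children": True}, weather="rain"))
# -> ['NEMO Science Museum']
```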
BACKEND VISUALIZATION
In addition to the end-user frontend prototype, a prototype backend dashboard has been developed by the UvA. The dashboard aims to determine ways to give high-level management insight into tourist movements, events and activities in Amsterdam.

Technologies:
- OpenStreetMap for map-based visualization
- JavaScript + jQuery + CSS for getting data from services and performing basic presentation of the data
- D3.js (JavaScript library) for developing the dashboard visualizations showing statistics

Work to do for developing a real application:
- Obtain aggregated information from the ArtsHolland endpoint over a large temporal timeframe, to be able to show trends in the major events and tourist statistics.
- Obtain and integrate real-time data from the I Amsterdam Card application.
- Integrate other real-time information sources such as Twitter feeds.

Lessons learned:
- Start working with real data and real users as soon as possible, allowing short design/develop/evaluate cycles.
- There are many data sources with potential relevance; many of them need to be integrated and related before correlations can be found and conclusions drawn.
- To get insight into the complex data streams involved in tourist and event data, interactive visualizations are essential.

TRANSPARENCY
WORK PACKAGE 6
Arjan Nusselder (University of Amsterdam), Maarten Marx (University of Amsterdam)

We look at the use case Transparency and Democracy through text that is produced by and for politicians. The main data set consists of written questions and answers from the Amsterdam Municipality Council. To showcase the possibilities of structured data, Dutch Parliamentary Proceedings are analysed as well. Our approach uses automated text analysis to enrich the content of the documents. Applications of the use case, which further "open up" the data, are built on top of these enriched political documents.

Looking at textual documents, one distinction we can make is between transparency within documents and transparency between documents. Within documents, data can be made more accessible by, for instance, facilitating search. Between documents, data can be compared and linked to other, new data sources. Both are made possible by the text analysis, where each document is summarised by its most relevant terms and the named entities detected therein.

STATUS OF THE DATA
Before the data is processed, it is good to look at the "openness" of the data. The two data sets differ in the way they are available, and in the level at which the analysis is consolidated.

Municipality written questions
The written questions are available from roughly 2010. They contain questions and answers from the municipality council. The total size of the data set is relatively small. Documents are freely available online, in PDF format. Some metadata can be extracted, such as the publication date (although it is not always available). There is no explicit text structure: it is visible and readable for humans, but not for computers. Many of the textual annotations will be done per document.

Parliamentary proceedings
The proceedings of parliament (Handelingen der Staten-Generaal) have been digitally published since 1995. They contain a (slightly redacted) transcript of the oral questions and discussions in the House of Commons and Senate. The total size of the data is quite large. Documents are freely available online, in XML format (and HTML, PDF, ODT). A lot of metadata is readily available about the meetings (date, session number, topic, etc.). The data contains a lot of explicit structure: text is present in small chunks, all text is attributed to speakers, etc. Many of the textual annotations will be done per paragraph.

DATA ENRICHMENT
Before the data is used in applications, it is processed through several steps.

1. Scraping / download data
Documents are published in many formats. For the Amsterdam municipality data, the officially published PDF files are downloaded.

2. Extract text
Many documents are easy to read and understand for people, but not so much for computer tools. The textual content of the (visually oriented) PDF documents is extracted and stored separately.

3. Add structure
Known implicit structure can be made explicit. XML is a useful structured data format that is used to this end. The second data set, the governmental proceedings, is already available as XML and is downloaded as such.

4. Text analysis
The text in the document is analysed word by word by natural language processing software. Words and phrases are tagged with their part of speech and corresponding lemmas, named entities (typically proper names) are detected, and distinctive terms (relative to other documents) are determined.

5. Enrich data
Known terms and named entities are annotated with links to Wikipedia. Named entities which are recognized as locations are linked to their geo-coordinates. In the proceedings, known politicians (and parties) are explicitly identified as unique people.

6. Presentation
The structured, annotated data is made accessible online, through search interfaces, document viewers and visualizations showing several aggregates.
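Steps 4 and 5 can be illustrated with a short sketch. It uses spaCy's Dutch model as a stand-in for the NLP software used in the project (the report does not name the actual tooling), and it reduces the Wikipedia enrichment to a naive link construction.

```python
import spacy

# Sketch of steps 4-5: tag lemmas and part of speech, detect named entities,
# and attach a (here: naive) Wikipedia link. spaCy's Dutch model is used as a
# stand-in; the project's actual NLP pipeline is not specified above.
nlp = spacy.load("nl_core_news_sm")  # requires: python -m spacy download nl_core_news_sm

text = "De burgemeester van Amsterdam beantwoordde vragen over Schiphol."
doc = nlp(text)

for token in doc:
    print(token.text, token.lemma_, token.pos_)

for ent in doc.ents:
    # Naive enrichment: link each detected entity to a Wikipedia page.
    print(ent.text, ent.label_,
          "https://nl.wikipedia.org/wiki/" + ent.text.replace(" ", "_"))
```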
APPLICATIONS
We present five tools that were implemented using the data sets. They are examples of the many tools that can rather easily be built on top of the machine-readable and enriched data. These range from data exploration to fact checking.

Document summaries
With the documents and analyses available, a first application is to create and visualise a summary of each document. A popular method for such visualisations is a word cloud. The terms shown here are from a document about the local amateur football club AFC. Each term is lemmatised, and only verbs, nouns and adjectives are included. The sizes signify how representative a word is for this document (compared to the other documents in the municipality data set).

Summaries can go further than standard word clouds. Our document viewer, for instance, also lists entities, ordered by count, with an internal link to their first occurrence. The summary can thus serve as a document index on subjects. We emphasize that all summaries are created fully automatically. This is cost-effective and fast, but the summaries may contain strange or even wrong terms for a given document.

Search with descriptions
After documents are made available as open data, a good next step is to facilitate search within those documents. Most search engines will show documents with a short text snippet where the query term was found, but the relevance of a document might not be directly clear from just a snippet. The document summaries can be used to create a short indicator of the topic of a document. In the search results shown here we present the summary terms in red (the file name is given in green). Several roles of Amsterdam Schiphol Airport can be distinguished using the summary terms, such as an employment provider, a country border, or a destination within bird colonies.

Link to discussed entities
The more structured governmental proceedings can sometimes be quite long, even for a single topic. When reading through such documents, it can be useful both to have a quick overview of the current subject and to find related information online. The example shown displays a single paragraph of text, with the detected named entities listed explicitly below it. Each entity links to Wikipedia for background information.
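Both the word-cloud summaries and the red summary terms in the search results rely on scoring how distinctive a term is for one document relative to the rest of the collection. A minimal TF-IDF-style sketch of such a score, over toy documents (the project's actual scoring may differ):

```python
import math
from collections import Counter

# Toy documents standing in for the municipality written questions.
docs = {
    "doc1": "voetbal club afc veld wedstrijd club".split(),
    "doc2": "schiphol vliegtuig geluid overlast schiphol".split(),
    "doc3": "club overlast horeca vergunning".split(),
}

def distinctive_terms(doc_id, docs, top_n=3):
    # Term frequency weighted by inverse document frequency: terms that are
    # frequent in this document but rare elsewhere score highest.
    tf = Counter(docs[doc_id])
    n_docs = len(docs)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for words in docs.values() if term in words)
        scores[term] = (count / len(docs[doc_id])) * math.log(n_docs / df)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(distinctive_terms("doc1", docs))
```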
Automated news queries The last application combines several different annotations and an external data source, for an automated news query. When questions are asked in the Lower House during a “question hour”, related news articles are often published by newspapers. A third party, The European Media Monitor, collects news articles and has these available for querying. 21