Amsterdam Open Data Exchange II
Knowledge dissemination for the ODE II Project

This document presents the lessons learned, best practices and results of the ODE II project, aimed at creating an open data infrastructure in the metropolitan area of Amsterdam, The Netherlands.

Ron van der Lans, Jasper Soetendal, Ronald Siebes, Rinke Hoekstra, Tom Demeyer, Chris van Aart, Job Spierings, Fernando Flores García, Manos Tsagkias, Arjan Nusselder, Maarten Marx

September 2014
INTRODUCTION
Ron van der Lans (Amsterdam Economic Board), Jasper Soetendal (Amsterdam Economic Board)
The Open Data Exchange (ODE) project started in 2012, commissioned by the Amsterdam Economic Board. The project aims at strengthening the economy of the Amsterdam metropolitan area by unlocking available (public) data sources to citizens and businesses. By using this data, citizens, businesses, research institutions and other parties are enabled to develop services that previously would not have been possible or would have been too expensive.

The project does this by creating an open data infrastructure in the metropolitan area. On the one hand, the local government is actively approached and supported in opening up the data it creates, possesses and maintains. On the other hand, potential users of the data are activated by organising app development contests (Apps 4 Amsterdam) and seminars (app bootcamps).

The results are built and implemented in such a way that third parties can easily use them to build their own applications and services. It is expected that this will create an important boost for the Amsterdam open data development community.

GOAL FOR ODE II
In 2014, the goal for the ODE II project is to develop three show cases that emphasise the power of open data applications and to lay the foundation for a sustainable implementation of open data in Amsterdam.
Data providers and users around three cases (mobility, transparency and tourism) are brought together and facilitated by creating an application that is tailored to their needs.
These applications solve an actual problem, creating a sustainable business case. Data providers experience how their data is used in practice, and users experience improved convenience, new features and/or decreasing costs.
These solutions are built upon the foundations of available platforms as well as the Linked Open Data Core, and are carefully connected to other ongoing innovation projects.

IN THIS DOCUMENT
This document focuses on the lessons learned, best practices and results of the ODE II project. It is split into the five work packages that are defined in this project:
- WP2: Getting the Data
- WP3: Linked Open Data Core
- WP4: Use case Mobility
- WP5: Use case Tourism
- WP6: Use case Transparency

PARTICIPANTS ODE II
These organisations are part of the ODE II project team:
- Amsterdam Economic Board
- Waag Society
- VU University Amsterdam
- University of Amsterdam
- 2CoolMonkeys

ERDF
Investing in your future. This project is partly financed by the European Regional Development Fund of the European Union.
CONTENTS
INTRODUCTION
  Goal for ODE II
  In this document
  Participants ODE II
  ERDF
GETTING OPEN DATA
  Open Data: What's in it for me?
  Lessons Learned
  Challenges to overcome
  Five step approach to identify and prioritize Open Data
  Results: 389 data sets
INTEGRATING OPEN DATA
  Analyse your risks
  Aim for a jam-session and not a concert
MOBILITY – SAVING LIVES WITH OPEN DATA
  The P2000 monitor
  Linked open data fire hazard assessment
TOURISM
  Collecting Tourism Data
  A recommendation service
  Process-logic backend design
  Demo: AmsterTinder App
  Backend visualization
TRANSPARENCY
  Status of the data
  Data enrichment
  Applications

Photo on cover page taken from http://citysdk.waag.org/buildings by Bert Spaan (Waag Society).
GETTING OPEN DATA
WORK PACKAGE 2
Jasper Soetendal (Amsterdam Economic Board), Ron van der Lans (Amsterdam Economic Board)
Providing Open Data is not just uploading files to
a web server. It’s a fundamental change in the
information process of an organisation. This work
package focused on the transition from closed
data to open data, including aspects like privacy, security and standardisation, and, in the end, on getting as much relevant open data as possible.
In this work package we stimulated, supported and encouraged multiple organisations within and outside the City of Amsterdam to think and act on the topic of 'Open Data'. It is estimated that there are over 10,000 datasets within the city, of which only a fraction is currently open. Not all of this data is relevant for developers, businesses and citizens, so making the right choices in prioritising the process of opening up data was an important part of this work package.
Open Data is a matter of supply and demand. Supplying open data cannot be done without a clear view of the demand side, so developers, businesses and citizens need to be involved in this process.
Within the City of Amsterdam, we played the role of facilitator in this process. In the remainder of this chapter we present the main lessons learned, the challenges to overcome, and a best-practice five-step approach for identifying and prioritising open data.

OPEN DATA: WHAT'S IN IT FOR ME?
In the process of getting as much open data as possible, one of the first questions for all parties involved is: "Why open data? What's in it for me?" This question can best be answered using a two-step approach: regarding open data as a goal in itself, and using open data as a means to reach regular business targets.

Open Data as a goal
Opening data can be seen as a goal in itself, based on an organisation's enthusiasm for open data, as part of its policy or because of legislation. Opening data brings four main benefits1:

1. Transparency
In a well-functioning, democratic society citizens need to know what their government is doing. To do that, they must be able to freely access government data and information and to share that information with other citizens.

2. Releasing social and commercial value
In a digital age, data is a key resource for social and commercial activities. Everything from finding your local post office to building a search engine requires access to data, much of which is created or held by government. By opening up data, government can help drive the creation of innovative businesses and services that deliver social and commercial value.

3. Participatory governance
Much of the time citizens are only able to engage with their own governance sporadically, maybe just at an election every four or five years. By opening up data, citizens are enabled to be much more directly informed and involved in decision-making. This is more than transparency: it is about creating a full "read/write" society, not just knowing what is happening in the process of governance but being able to contribute to it.

4. Efficiency
Opening up data to the public may lead to internal efficiency as well. Since everybody can access the data, it means that the data is available internally too. In environments (like Amsterdam) where the information system is decentralised and distributed, this may lead to enhanced efficiency. We have seen numerous examples of, and opportunities for, this in Amsterdam.

Open Data as a means
In addition to these benefits of opening data, it is important to acknowledge that open data can also be a means to reach your regular business targets, or those of other stakeholders involved.
For example, at the Amsterdam Department of Transport (DIVV), open data has contributed to the targets of its communication department. Because of the open data that was published and the apps that were built upon this data, costs for traditional media (print, advertisements, booklets, etc.) could be cut.

1 The first three benefits and descriptions are copied from http://opengovernmentdata.org/.
LESSONS LEARNED
These are the most important lessons we have learned while trying to get as much data opened as possible in Amsterdam over the last two years.
Lesson 1 - Start bottom up
Open Data starts with the enthusiastic initiatives of the early adopters. Let a thousand flowers bloom and support these initiatives. Connect the early adopters and share their experiences and best practices.
This is why we use what we call the "Open Data Game" at the start of any project, initiative or idea where open data is involved. It is essentially a stakeholder analysis, putting all cards on the table, to define the pros and cons for all parties involved.
Lesson 2 - Action first, policies later
The best practices of Open Data rarely start with policies
and documents. They start with action and
experimentation, backed by management commitment.
Lesson 3 - Know "what's in it for me" for all stakeholders
For each Open Data project, it is important to know the interests of all stakeholders involved. For data owners, data providers, developers and end users it should be clear what their role is and what is in it for them (see the earlier paragraph on the "Open Data Game").
Facilitating the "Open Data Game"
Providing Open Data is about setting up an environment where government, business and citizens can cooperate and profit from transparency, efficiency and the creation of economic value.
For each specific case, the interests of these stakeholders should be aligned to create a favourable setting for cooperation. This involves creating a balanced business proposition for all stakeholders, including (local) governments, businesses, developers and citizens.
While in one case developers can be asked to invest in building their own apps because of expected income from end users, in other cases the main benefits may lie with another party, which should then cover the main costs of the project.
Lesson 4 - Involve the end user
Not all data can be opened at once, so the open data
activities should be prioritised. Involve all possible users
(developers, colleagues, citizens) from the start and let
them help you prioritize the datasets.
Lesson 5 - The business case: internal efficiency
Do not underestimate the internal gains that can be realised by opening up your data. In Amsterdam it turns out that the internal efficiency of sharing data within the city is the most important benefit.
Lesson 6 - Practice what you preach: use your data
If you preach the benefits of open data, practice it within your own organisation as well. Information-driven intervention, evidence-based policy, big data analysis: the opportunities are huge!
Lesson 7 - Open Data is just a simple last step in a perfectly organised information process
The real problem is that the information process usually is not perfectly organised. While trying to open up data, you will have to deal with a lot of problems that are not really connected to open data. Be very clear about this to all stakeholders: these are not the costs of Open Data, these are the costs of getting your information process in order.
CHALLENGES TO OVERCOME
While trying to convince other parties to open up their data, challenges are faced. These are the most important challenges and some measures to overcome them.

Challenge 1 - Privacy challenges
Data that is privacy sensitive cannot be published as open data, but aggregated or anonymised data can. Make sure to involve a trusted party that has the expertise to cope with these challenges. In Amsterdam the Department of Research & Statistics provided support to overcome them.

Challenge 2 - Organisation/business challenges
There might be good organisational reasons why data cannot or will not be opened. In these cases, the sum of all benefits in the "Open Data Game" should be profitable enough for the data provider to open up the data. If it is not, it will be difficult to convince them to do so anyway.

Challenge 3 - Quality challenges
Opening up data may reveal quality issues in the data. In some cases these issues need to be addressed before publishing, but in most cases open data provides a great way of improving your data quality. Input from developers, user-generated content and mashing up the data with other sources will improve the data quality.

Challenge 4 - Political challenges
Opening up data can be scary for politicians and civil servants. Open Data activities should therefore be clearly supported and communicated by the mayor, aldermen or other influential persons.

Challenge 5 - Technical challenges
Providing data should be as easy as possible. Do not increase thresholds by setting conditions on formats, standards or technologies. Use workarounds if necessary; focus on getting the data first.

FIVE STEP APPROACH TO IDENTIFY AND PRIORITIZE OPEN DATA
In addition to the lessons learned and challenges to overcome, we have a practical approach for identifying open data within organisations. To get to a prioritised list of data that can be opened, we define five steps.

Step 1 - Listing available data
The first step is creating a list of all data in the organisation. While it may be impossible to get an exhaustive list of all data that is used, try to be as complete as possible. Do include data that is currently unavailable but can be generated or collected if it might be relevant for re-use.
Start this list by setting up a logical structure for it, for example grouped by department, process or product, and fill it in with the most obvious data. Next, set up a session with colleagues for each of the departments/processes/products. Make sure there are enough post-it notes and let everybody write down the data they generate, process, use, publish or would like to have. These sessions should be like brainstorm sessions: there is no time for difficulties and problems, it is just about thinking about what data is available or could be available.

Step 2 - Estimate of 'coolness' for reuse
In these sessions, when all data is written down on post-it notes, the second step is to jointly make an estimate of the 'coolness' for reuse of each dataset. For this, we use the 'Coolwall', which might be familiar from the TV programme Top Gear. Use a wall or flipchart with an 'uncool' side and a 'cool' side.
Now it is up to the group of attendees, for each post-it containing a data set, to come up with possible (re)uses of this data (by external developers, citizens, businesses, etc.). Together you decide its place on the cool wall, from uncool to cool.
At the end of this session, the cool wall shows an overview of all your data, ordered from 'uncool' to 'cool' from a reuse perspective. This order will be used in step 5 as the vertical axis of the prioritisation matrix.

Step 3 - Current status of data
After the 'cool wall' session, all the data should be listed in an Excel sheet, to be extended with extra information on each dataset. In this sheet you list (and track) information on the data sets on two important aspects: the current status of the data and what should be done to publish this data as open data.
Add columns for each of the aspects below and talk to data owners and experts to fill in this table for all datasets.
- Current status of the dataset
  o Current availability of the data (local PC, internal, shared, published, etc.)
  o Data owner and responsibility for the data
  o Location of data storage and current data format
  o Level of detail of the data stored (temporal, location, details, etc.)
  o Frequency of updates
- What should be done to publish the data as open data?
  o Do data quality and up-to-dateness meet the standards for publishing?
  o What technical adjustments need to be made to publish the data?
  o What organisational adjustments need to be made to publish the data?
  o What are possible risks and issues if published (privacy, competition, etc.)?
- Estimated effort (time and money) needed to publish the data
The last aspect, the estimated time and money needed to publish the data, is used in the last step, where this estimation forms the horizontal axis, from datasets that are very complex to publish to data sets that are easy to publish.

Step 4 - Test 'coolness' with potential users
The fourth step is to test the 'coolness' factor, which was estimated in step 2, with 'real' potential users.
In Amsterdam, we organised an 'Open Data Café' for the Department of Transport. For this event, 100 participants were invited from different target groups (developers, business, co-workers, etc.), who all got 10 stickers in a colour specific to their group. In multiple 15-minute sessions, data owners presented their data, how this data could be used and whether the data could be easily published (we used a green/yellow/red traffic light indication). After these sessions all participants could use their stickers to indicate which data they wanted the most. They could divide them over multiple datasets, or go all-in on one data set.
At the end of the event, the results were presented, as soon as the stickers had been translated into bar charts. Now the estimated 'coolness' was enhanced with the 'real' coolness, judged by those who would actually use the data.
While a less time-consuming approach can be used, like organising a session with a dozen potential users or setting up an online questionnaire, it is important to involve the actual users of the data in this step.

Step 5 - Prioritize your data
The last step is to use all information from the previous steps to create a prioritisation matrix. An example is shown below.
The matrix has two axes. On the horizontal axis the information from step 3 is used, listing data from very hard (complex, expensive, time consuming) to open to very easy to open. On the vertical axis the outcomes of step 4 are used: the higher the demand for a dataset (the very 'cool' ones), the higher it is placed on this axis.
Using these axes, all data sets can be placed in the matrix, as the example shows for data sets 'A' to 'J'. The datasets in the top right corner have the highest priority, datasets in the lower left corner the lowest priority. Traversing the matrix from top right to lower left, the prioritisation for all datasets can be defined.
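As an illustration of how the effort estimates from step 3 and the demand scores from step 4 can be combined into such a prioritisation, the sketch below shows one possible scoring in Python. The field names, scores and datasets are invented for the example and are not part of the project deliverables.

# Minimal sketch: turn step-3 effort estimates and step-4 demand scores into a
# prioritised list, mimicking a walk through the matrix from the top-right
# corner to the lower-left corner. Fields and data are illustrative only.

from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    coolness: int   # step 4: demand score, e.g. number of stickers (higher = more wanted)
    effort: int     # step 3: estimated effort to publish, 1 (easy) .. 5 (very hard)

def priority(ds: Dataset) -> int:
    """Higher is better: high demand combined with low publishing effort."""
    ease = 6 - ds.effort            # invert effort so that 'easy' scores high
    return ds.coolness * ease       # simple product; a weighted sum would also work

datasets = [
    Dataset("Parking tariffs", coolness=42, effort=1),
    Dataset("Building permits", coolness=35, effort=3),
    Dataset("Tree registry", coolness=12, effort=2),
    Dataset("Waste collection routes", coolness=28, effort=5),
]

for ds in sorted(datasets, key=priority, reverse=True):
    print(f"{ds.name:25s} demand={ds.coolness:3d} effort={ds.effort} priority={priority(ds)}")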
RESULTS: 389 DATA SETS
As a result of the effort in work package 2, the number of available data sets in Amsterdam has grown to 389. See the figure below for the distribution across the ten themes.
INTEGRATING OPEN DATA
WORK PACKAGE 3
Ronald Siebes (VU University Amsterdam), Rinke Hoekstra (VU University Amsterdam), Tom Demeyer (Waag Society)
Work package 3 concerns the approach to integrating Open Data in the ODE II project. This chapter of the booklet focuses on the work that has to be done before one starts coding. During the project it became clear that the technical challenge is easy compared to the challenge of predicting the reliability of the services that provide the Open Data one tries to integrate.
ANALYSE YOUR RISKS
Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control2. This freedom for the receiver morally implies the same level of freedom for the provider: there is often no Service Level Agreement that the source is bound to. The data is provided 'as is', with no legal implications for the quality of the data, nor for its availability.
Any application builder or policy maker who plans to use Open Data should be aware that it often lacks any QoS (Quality of Service) guarantees from the providers.
One needs to have a clear answer to the following questions in order to determine whether the planned investment in including a certain source is wise.

1: When does it start to hurt?
Some data updates rapidly, like live traffic data, and some is fairly static, like road signs. If your application is dependent on live data, it quickly becomes a problem if the source stops providing it. A fall-back option has to be thought of beforehand in case the source carries too big a risk of defaulting. If the data is quite static, one has the luxury of looking for another solution when the problem occurs.

2: How important is a data source to the overall functionality of your application?
If your application becomes useless when one type of data becomes unavailable, this data obviously is very important. Think of a weather application where temperature data is a key ingredient and air pressure a nice extra. If the temperature data is a 'must-have' and the source unreliable, it is wise to have access to alternative sources. Regarding the system architecture, it is key to keep dependencies as loose as possible (i.e. 'loose coupling' system design); more on this in the next section.

3: Who is the source providing the data, and why for free?

Academics
The academic infrastructure and personnel needed to provide Open Data are often bound to a specific project that funds them. One needs to investigate the duration of the project and the likelihood that the funding continues in one way or another after that deadline. Be aware that the end of an academic project often results in a rapid deterioration of the functionality provided by the source.

(Semi-)governmental organisations
Procedures and activities followed and performed by (semi-)governmental organisations are more stable (i.e. unlikely to change) than those of any other type of organisation. Once something has become part of the system of activities, it is likely to stay that way. That is something to keep in mind when determining the risk around the future availability of the open data they provide.

Idealistic communities
Concerns about the ongoing radioactive pollution from the crippled Fukushima nuclear facility initiated a large community effort to provide real-time sensor data3. This is one of many examples where idealistic motivations lie behind the provision of Open Data. The main risk is that the community will stop providing the data if the organisation becomes suspicious and/or the problem becomes uninteresting or irrelevant. Try to predict how long an idealistic community will provide the data you need.

Commercial entities
In most cases a company will only continue providing data as long as it is bound to do so by contract, or as long as it makes money from doing so. If 'somebody' paid for the provision of this data to the public (like a government), the company is obliged by a contract from which you benefit. If this is not the case, it is important to know why a company is sharing the data, or in other words, how it benefits from doing so. One example is product information, where the open data about products is free to use because it will increase the sales of the products themselves.

To summarise: if you have a great idea based on integrating existing live Open Data, it would be a pity if your effort were to go in vain because the data suddenly disappears, something a decent risk analysis could have prevented.

AIM FOR A JAM-SESSION AND NOT A CONCERT
Choreography vs. orchestration
The terms orchestration and choreography describe two aspects of emerging standards for creating business processes from multiple Web services. The two terms overlap somewhat, but orchestration refers to an executable business process that can interact with both internal and external Web services. Orchestration always represents control from one party's perspective. This distinguishes it from choreography, which is more collaborative and allows each involved party to describe its part in the interaction6.
Within a consortium of data providers, software engineers and researchers, all have a shared goal and commitment. Orchestration is possible in this case because every player knows their role in the team, guided by the conductor. The Open Data domain is different in nature: choreography is the way to deal with players without a centrally organised authority. There you collaborate in a flexible and unpredictable way with the shared goal of an attractive performance.
Within the ODE II project we chose a mixed approach. Some of the Open Data services were provided by the consortium partners. For these we had the opportunity to specify the API and were very flexible in developing the most efficient workflows (i.e. orchestration). Other services, provided by third parties, are often provided 'as is', and we have little influence on them.

2 Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. (2007). "DBpedia: A Nucleus for a Web of Open Data". The Semantic Web. Lecture Notes in Computer Science 4825, p. 722.
3 http://blog.safecast.org/
6 Chris Peltz, "Web Services Orchestration and Choreography," Computer, vol. 36, no. 10, pp. 46-52, Oct. 2003, doi:10.1109/MC.2003.1236471.
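To make the 'loose coupling' advice from the risk analysis concrete, the sketch below shows one possible way to hide interchangeable data sources behind a common interface with a fall-back, in Python. The class names and the stand-in sources are hypothetical and are not part of the ODE II code base.

# Minimal sketch of loose coupling with a fall-back source, assuming two
# hypothetical temperature providers. The application only depends on the
# TemperatureSource interface, so an unreliable open data source can be
# swapped out or backed up without touching the application logic.

from abc import ABC, abstractmethod

class TemperatureSource(ABC):
    @abstractmethod
    def current_temperature(self, city: str) -> float:
        """Return the current temperature in degrees Celsius."""

class PrimaryOpenDataSource(TemperatureSource):
    def current_temperature(self, city: str) -> float:
        # In a real application this would call the open data API;
        # here we simulate an outage to demonstrate the fall-back.
        raise ConnectionError("primary source unavailable")

class BackupSource(TemperatureSource):
    def current_temperature(self, city: str) -> float:
        return 17.5  # stand-in value from an alternative provider

class FallbackTemperatureSource(TemperatureSource):
    """Tries each configured source in order until one succeeds."""

    def __init__(self, *sources: TemperatureSource) -> None:
        self.sources = sources

    def current_temperature(self, city: str) -> float:
        last_error = None
        for source in self.sources:
            try:
                return source.current_temperature(city)
            except Exception as exc:  # a 'must-have' input, so keep trying
                last_error = exc
        raise RuntimeError("no temperature source available") from last_error

if __name__ == "__main__":
    source = FallbackTemperatureSource(PrimaryOpenDataSource(), BackupSource())
    print(source.current_temperature("Amsterdam"))  # falls back to 17.5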
Figure: Integration architecture
MOBILITY – SAVING LIVES WITH OPEN DATA
WORK PACKAGE 4
Chris van Aart (2CoolMonkeys)
The fire department has to assess emergency situations in a matter of minutes.
Amsterdam contains about 600,000 unique objects such as buildings, roads, railways, tunnels, and an airport. Around 10,000 incidents happen per year. Most of these incidents, such as small fires, lift jams and traffic accidents, are routine. For 200 large objects detailed emergency plans are compiled. For other objects, the fire department has to assess the situation on the spot, on the basis of a variety of data from various sources.
In this work package two use cases are explored around available open data sources. The idea is that if the data is useful for the fire department, it is also available to the local government, researchers and the general public.

THE P2000 MONITOR
The first system, the P2000 monitor, filters real-time P2000 alarm messages about buildings and enriches them with open data about construction year, function, permits, and whether the address is known to other emergency services. The system assigns coloured labels (from green to red) so that the fire department is informed about possible general risks. It can be consulted at http://p2000.citysdk.nl.
When a building is on fire, can we tell the fire department how many people, what kind of people and which hazardous goods are inside?
Screenshot of p2000.citysdk.nl
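A minimal sketch of the kind of enrichment and labelling described above is given below. It is illustrative Python only; the lookup function, data fields and risk rules are assumptions for the example and do not reflect the actual implementation behind p2000.citysdk.nl.

# Minimal sketch: enrich a P2000-style alarm message about a building with
# open data and derive a coarse risk label (green/orange/red). The lookup
# function, fields and thresholds are illustrative assumptions.

def lookup_building(address):
    """Stand-in for a call to open building data (construction year, function, permits)."""
    sample = {
        "Damrak 1": {"construction_year": 1898, "function": "hotel", "permits": ["roof terrace"]},
        "Spuistraat 10": {"construction_year": 1975, "function": "office", "permits": []},
    }
    return sample.get(address)

def risk_label(building):
    """Very coarse rules, purely to illustrate the idea of colour-coded labels."""
    if building is None:
        return "orange"  # unknown building: no data to reassure the crew
    if building["construction_year"] < 1920 or building["function"] in {"hotel", "hospital"}:
        return "red"     # old construction or potentially many people present
    return "green"

def enrich_alarm(message):
    building = lookup_building(message["address"])
    return {**message, "building": building, "label": risk_label(building)}

if __name__ == "__main__":
    alarm = {"type": "fire", "address": "Damrak 1", "time": "2014-09-01T14:32"}
    print(enrich_alarm(alarm))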
LINKED OPEN DATA FIRE HAZARD ASSESSMENT
The second system answers questions concerning detailed risks about a location. Imagine one system that can answer any question about buildings, surroundings, roads and the whereabouts of people. Given an emergency call, the fire department can ask ad-hoc questions about the incident and its environment.

Examples of Open Data Solutions
Typical questions citizens can inspect themselves (a small aggregation sketch follows the source list below):
1. What type and function has the building I live or work in? This can be answered by (i) BAG7: construction year and destination (e.g. house, shop or factory), and (ii) the Function Map8: horeca, office, parking, shop, etc.
2. Have permits been given for constructional exceptions at my place? This can be answered by (i) residence permits9, e.g. for a roof terrace or a changed entrance, and (ii) company permits, e.g. for the usage of explosive materials.
3. Cheap insulation material burns; this can be determined via energy labels from Agentschap NL (from A to G)10.
4. Are there oxygen tanks? This can be inspected via the national risk map11.
5. Extensive use of electricity can be inspected via Alliander's energy consumption data12.

7 http://www.kadaster.nl/web/artikel/productartikel/BAGExtract.htm
8 http://maps.amsterdam.nl/functiekaart/?LANG=en
9 http://www.centrum.amsterdam.nl/algemene_onderdelen/contact/vergunningen/
10 http://maps.amsterdam.nl/energie_gaselektra/?LANG=en
11 http://www.risicokaart.nl
12 http://www.liander.nl/liander/opendata/index.htm
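As a rough illustration of how these separate sources could be combined per address, here is a hedged Python sketch. The lookup functions, endpoint names and returned fields are invented placeholders and do not describe the real BAG, Function Map, permit, energy label, risk map or Alliander interfaces.

# Minimal sketch: answer the five questions above for one address by combining
# several hypothetical per-source lookup functions into a single fire-hazard
# profile. All functions and values below are invented example data.

def bag_info(address):            return {"construction_year": 1931, "destination": "shop"}
def function_map(address):        return {"function": "horeca"}
def permits(address):             return ["roof terrace"]
def energy_label(address):        return "F"   # A (good) .. G (poor insulation)
def risk_map_objects(address):    return ["oxygen tanks nearby"]
def electricity_usage(address):   return 42000  # kWh per year, illustrative

def hazard_profile(address):
    """Aggregate the individual open data answers into one dictionary."""
    return {
        "address": address,
        "building": {**bag_info(address), **function_map(address)},
        "permits": permits(address),
        "energy_label": energy_label(address),
        "risk_objects": risk_map_objects(address),
        "electricity_kwh_per_year": electricity_usage(address),
    }

if __name__ == "__main__":
    from pprint import pprint
    pprint(hazard_profile("Nieuwmarkt 4"))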
TOURISM
WORK PACKAGE 5
Job Spierings (Waag Society), Arjan Nusselder (University of Amsterdam), Jasper Soetendal (Amsterdam Economic Board),
Fernando Flores García (University of Amsterdam), Manos Tsagkias (University of Amsterdam), Chris van Aart
(2CoolMonkeys), Ronald Siebes (VU University Amsterdam)
The challenge in the Tourism lab was to enable tourists and travellers to get quick and focused access to a selection of data from a myriad of sources. An extra hurdle is that users typically have little to no affinity or experience with the specific surroundings. This means users will have relatively little context in which to interpret data, which needs to be addressed when developing visualisations and applications.
COLLECTING TOURISM DATA
For the prototype we wanted to create, we needed two
sources of data: data about the ‘supply’ of tourism (info on
venues and event, including descriptions, images, dates,
etc.) and the actual ‘use’ of all that Amsterdam has to
offer. The first source was already available as open data
for some time, both through AmsterdamOpenData.nl and
ArtsHolland.
Data on the actual usage is needed to ‘link’ the points of
interest together, to recommend places by saying ‘people
who visited A, also visited B and C’. Getting this data was
a big challenge in which we only succeeded to some
extent. Data from the usage of the “I Amsterdam Card”
would have been a great starting point, since this card
covers (and tracks) the usage on over 100 of the most
visited places in Amsterdam.
However, after numerous discussions Amsterdam
Marketing (the organisation that offers the “I Amsterdam
Card”) decided to not provide this source data. So as an
alternative, we have used some earlier reports of
Amsterdam Marketing and other sources to create a
simulated dataset of visits in Amsterdam. This data was
used for the Recommender-service and provides useful
results. Nevertheless, if we would get the raw data from
the “I Amsterdam Card”, the Recommender-service would
be far more accurate than it is today.
Amsterdam is one of Europe's top ten tourist cities.
Geographically and socio-economically, Amsterdam and
its surrounding municipalities form a single entity. In
total, more than 2.5 million people live within this area. In
the economic field in particular, Amsterdam wishes to
compete strongly with other cities such as Copenhagen,
Hamburg and Milan. This region presents itself under the
name Amsterdam Metropolitan Area to international
companies, as well as to business visitors and tourists. 13
As a huge economic sector, that caters to many different
target groups, there is not only an (almost) endless list of
data available but collecting, and dealing with these data
is extra complicated.
In this chapter the creation of a prototype application is
described. The app is built for a scenario in which a tourist
would like to receive context based recommendation on
events and activities in the city. Based on available data
this has been kept relatively simple: Can the user get an
overview of nearby indoor events when it is raining, or
events aimed at families with children. Or POI’s to visit
based on sites that have been visited by other tourists.
Lessons learned on collecting data



http://www.iamsterdam.com/en-GB/amsterdammarketing/about-amsterdam-marketing/faq
13
14
Thinking out of the box, not limited by the actual
limitations, the tourism sector offers an almost
unlimited range of opportunities for new ideas,
business models and applications.
There’s a lot of information on the Points of Interests
in Amsterdam, but it is quite a task to match, map and
combine all this content.
There’s hardly detailed structured data on Points of
Interest in Amsterdam on which a decision system
can be based. Information on whether a museum is

museums. After some deliberation, it was decided to also
look into the relative popularity.
interesting for kids, if it can be best visited in the
morning or the afternoon, in sunny times or rainy
days, etc. is not available. But even more ‘factual’ data
like number of visitors per year, accessibility, type of
collection, etc. is not available or very limited. If this
detailed structured data would be available, really
nice and advanced applications can be built.
All data can be simulated, but there’s nothing like the
real data.
A RECOMMENDATION SERVICE
A recommendation service for Amsterdam venues was implemented for use with Amsterdam Marketing visitor data. This visitor data can be interpreted as pairs of venues that were both visited by the same tourist. Initially the recommendation algorithm was targeted at popularity and at distinguishing between time periods, e.g. the most visited venues within a week are the best recommendations.
In the absence of raw data, known relations between venues were used to generate a sample dataset over a three-day time period. On this sample data the time component had no real effect. As a first result the recommendations made sense, but the large difference in absolute visits overly promoted a few well-known museums. After some deliberation, it was decided to also look into the relative popularity.
An additional normalisation step was added with respect to the total number of visitors. Recommendations taking only these relative popularities into account showed different venues, but sometimes promoted venues that were hardly visited and perhaps only coincidentally related. It is unclear whether this is mostly a result of the sample data, or whether it would also be present in the real raw data. As a middle ground, the final recommendation takes the top 25 popular venues into account and reorders these based on their relative popularity.
The result is a recommendation service that conforms to what is known about the behaviour of Amsterdam Card users.
PROCESS-LOGIC BACKEND DESIGN
The process-logic backend contains a set of rules that formulates queries according to the (user) preference variables and external information, like the weather and geo-locations.
The current demo application is an initial version, developed together with a pre-processing step that enriches the available event data (e.g. with tags such as "for-children" and "indoor").
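As an illustration of what such preference-driven filtering rules might look like, here is a small hedged Python sketch. The tags, preference fields and events are invented for the example and do not describe the project's actual rule set.

# Minimal sketch: filter enriched events using user preferences and external
# context such as the weather. Tags like "indoor" and "for-children" mirror
# the pre-processing enrichment described above; the data is invented.

events = [
    {"name": "NEMO Science Museum", "tags": {"indoor", "for-children"}, "location": (52.374, 4.912)},
    {"name": "Vondelpark open-air theatre", "tags": {"outdoor"}, "location": (52.358, 4.868)},
    {"name": "Rijksmuseum", "tags": {"indoor"}, "location": (52.360, 4.885)},
]

def matching_events(preferences, weather):
    """Apply simple rules: avoid outdoor events when it rains,
    and require child-friendly tags for family audiences."""
    results = []
    for event in events:
        if weather == "rain" and "indoor" not in event["tags"]:
            continue
        if preferences.get("audience") == "family" and "for-children" not in event["tags"]:
            continue
        results.append(event["name"])
    return results

if __name__ == "__main__":
    prefs = {"audience": "family", "transport": "walking", "time_of_day": "afternoon"}
    print(matching_events(prefs, weather="rain"))  # -> ['NEMO Science Museum']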
DEMO: AMSTERTINDER APP
The AmsterTinder App demo is intended as an end-user application for day trippers visiting Amsterdam. Based on a simple user profile, a recommendation for a day is given out of 100 popular venues. The venues are enhanced with open data about the type of venue, location and intended visitors.
The prototype is built as an HTML5/JavaScript responsive web app in combination with a MySQL/PHP back-end web server. The Leaflet map environment is responsible for the map and marker display.
The prototype makes use of the following services:
- DBpedia for venue enrichment
- Logic module for filtering venues based on the profile of the user
- Recommender service, for recommending other venues based on a selected venue
- OpenStreetMap for topographic map display
- CitySDK for venue information

Screenshots
Fig 1: Preference panel, where the user can specify the type of audience, transport, weather and time of day.
Fig 2: Suggestions returned by the Recommendation Engine, which are sorted by relevance.
Fig. 3: Description of a selected venue
Fig. 4: List of enrichments in RDF triples provided by the enrichment backend
BACKEND VISUALIZATION
In addition to the end-user frontend prototype, a prototype backend dashboard has been developed by UvA. The dashboard aims to determine ways to give high-level management insight into tourist movements, events and activities in Amsterdam.

Technologies
- OpenStreetMap for map-based visualisation
- JavaScript + jQuery + CSS for getting data from services and performing basic presentation of the data
- D3.js (JavaScript library) for developing the dashboard visualisations showing statistics

Work to do for developing a real application
- Obtain aggregated information from the ArtsHolland endpoint over a large temporal timeframe, to be able to show trends in the major events and tourist statistics.
- Obtain and integrate real-time data from the I Amsterdam Card application.
- Integrate other real-time information sources such as Twitter feeds.

Lessons learned
- Start working with real data and real users as soon as possible, allowing short design/develop/evaluate cycles.
- There are many data sources with potential relevance; many of them have to be integrated and related before correlations can be found and conclusions drawn.
- To get insight into the complex data streams involved in tourist and event data, interactive visualisations are essential.
TRANSPARENCY
WORK PACKAGE 6
Arjan Nusselder (University of Amsterdam), Maarten Marx (University of Amsterdam)
We look at the use case Transparency and Democracy through text that is produced by and for politicians. The main data set consists of written questions and answers from the Amsterdam Municipality Council. To showcase the possibilities of structured data, Dutch Parliamentary Proceedings are analysed as well.
Our approach uses automated text analysis to enrich the content of the documents. Applications of the use case, which further "open up" the data, are built on top of these enriched political documents.
Looking at textual documents, one distinction we can make is between transparency within documents and transparency between documents. Within documents, data can be made more accessible by, for instance, facilitating search. Between documents, data can be compared and linked to other, new data sources. Both are made possible by the text analysis, in which each document is summarised by its most relevant terms and by the named entities detected therein.

STATUS OF THE DATA
Before the data is processed, it is good to look at the "openness" of the data. The two data sets differ in the way they are available, and in the level at which the analysis is consolidated.

Municipality written questions
The written questions are available from roughly 2010. They contain questions and answers from the municipality council.
- The total size of the data set is relatively small.
- Documents are freely available online.
- Documents are available in PDF format.
- Some metadata can be extracted, such as the publication date (although it is not always available).
- There is no explicit text structure. It is visible/readable for humans, but not for computers.
- Many of the textual annotations will be done per document.

Parliamentary proceedings
Proceedings of parliament (Handelingen der Staten-Generaal) have been published digitally since 1995. They contain a (slightly redacted) transcript of the oral questions and discussions in the House of Commons and Senate.
- The total size of the data is quite large.
- Documents are freely available online.
- Documents are available in XML format (and HTML, PDF, ODT).
- A lot of metadata about the meetings (date, session number, topic, etc.) is readily available.
- The data contains a lot of explicit structure. Text is present in small chunks, all text is attributed to speakers, etc.
- Many of the textual annotations will be done per paragraph.

DATA ENRICHMENT
Before the data is used in applications, it is processed through several steps.

1. Scraping / download data
Documents are published in many formats. For the Amsterdam municipality data, the officially published PDF files are downloaded.

2. Extract text
Many documents are easy to read and understand for people, but not so much for computer tools. The textual content of the (visually oriented) PDF documents is extracted and stored separately.

3. Add structure
Known implicit structure can be made explicit. XML is a useful structured data format that is used to this end. The second data set, the governmental proceedings, is already available as XML and is downloaded as such.

4. Text analysis
The text in each document is analysed word by word by natural language processing software. Words and phrases are tagged with their part of speech and corresponding lemmas, named entities (typically proper names) are detected, and distinctive terms (relative to other documents) are determined.

5. Enrich data
Known terms and named entities are annotated with links to Wikipedia. Named entities which are recognised as locations are linked to their geo-coordinates. In the proceedings, known politicians (and parties) are explicitly identified as unique persons.

6. Presentation
The structured, annotated data is made accessible online, through search interfaces, document viewers and visualisations showing several aggregates.
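The "distinctive terms" determined in step 4 can be thought of as a tf-idf style weighting. The sketch below is a hedged, simplified illustration in Python; it does not reproduce the tokenisation, lemmatisation or part-of-speech filtering of the NLP toolchain actually used in the project, and the sample documents are invented.

# Minimal sketch of "distinctive terms": score words in one document higher
# when they are frequent there but rare in the rest of the collection
# (a tf-idf style weighting). The documents below are invented examples.

import math
import re
from collections import Counter

documents = {
    "afc.txt": "AFC speelt voetbal op het veld in Amsterdam Zuid",
    "schiphol.txt": "Schiphol groeit en biedt werkgelegenheid in Amsterdam",
    "begroting.txt": "de begroting van Amsterdam wordt besproken in de raad",
}

def tokens(text):
    return re.findall(r"\w+", text.lower())

doc_tokens = {name: tokens(text) for name, text in documents.items()}
doc_freq = Counter(word for words in doc_tokens.values() for word in set(words))
n_docs = len(documents)

def distinctive_terms(name, top_n=5):
    counts = Counter(doc_tokens[name])
    scores = {
        word: (count / len(doc_tokens[name])) * math.log(n_docs / doc_freq[word])
        for word, count in counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

if __name__ == "__main__":
    print(distinctive_terms("afc.txt"))  # terms shared by all documents score zero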
APPLICATIONS
We present five tools that were implemented using the data sets. They are examples of the many tools that can rather easily be built on top of the machine-readable and enriched data. They range from data exploration to fact checking.

Document summaries
With the documents and analyses available, a first application is to create and visualise a summary of each document. A popular method for such visualisations is a word cloud. The terms shown here are from a document about local amateur football club AFC. Each term is lemmatised, and only verbs, nouns and adjectives are included. The sizes signify how representative a word is for this document (compared to the other documents in the municipality data set).
Summaries can go further than standard word clouds. Our document viewer, for instance, also lists entities, ordered by count, with an internal link to their first occurrence. The summary can thus serve as a document index on subjects. We emphasise that all summaries are created fully automatically. This is cost-effective and fast, but the summaries may contain strange or even wrong terms for a given document.

Search with descriptions
After documents are made available as open data, a good next step is to facilitate search within those documents. Most search engines will show documents with a short text snippet where the query term was found, and the relevance of a document might not be directly clear from just such a snippet.
The document summaries can be used to create a short indicator of the topic of a document. In the search results shown here we present the summary terms in red (the file name is given in green). Several roles of Amsterdam Schiphol Airport can be distinguished using the summary terms, such as an employment provider, a country border, or a destination within bird colonies.

Link to discussed entities
The more structured governmental proceedings can sometimes be quite long, even for a single topic. When reading through such documents, it can be useful both to have a quick overview of the current subject and to find related information online.
The example shown displays a single paragraph of text, with the detected named entities listed explicitly below it. Each entity links to Wikipedia for background information.

Time line charts
Larger data sets, spanning a more significant length of time, are often well suited for chronological time line visualisations. One example shown here is a plot of the number of occurrences of the entity Libya in the proceedings of parliament from 2001 to 2012. A significant spike is visible at the beginning of 2011, when an important event took place that was widely discussed internationally.
Time lines can be deceptive and require interpretation to have meaning. They are an interesting possibility for exploration, however. The chart tool contains a direct link to search for documents containing the given entity within the specific time range. It is easy to imagine a new combined tool that shows, for instance, a summary of a set of documents for a given time period.
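Such a time line essentially boils down to counting entity mentions per period. The short Python fragment below, with invented annotation records, illustrates the aggregation step; it is not the project's charting code.

# Minimal sketch: turn per-document entity annotations into yearly counts
# that can feed a time line chart. The annotation records are invented.

from collections import Counter

annotations = [
    {"entity": "Libya", "date": "2011-03-21"},
    {"entity": "Libya", "date": "2011-04-02"},
    {"entity": "Libya", "date": "2004-06-11"},
    {"entity": "Schiphol", "date": "2011-03-21"},
]

def yearly_counts(entity):
    years = (a["date"][:4] for a in annotations if a["entity"] == entity)
    return dict(sorted(Counter(years).items()))

if __name__ == "__main__":
    print(yearly_counts("Libya"))  # {'2004': 1, '2011': 2}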
Automated news queries
The last application combines several different annotations and an external data source to form an automated news query.
When questions are asked in the Lower House during a "question hour", related news articles are often published by newspapers. A third party, the European Media Monitor (EMM), collects news articles and makes them available for querying.
From the structured proceedings data, we know several things. First, the questions are present in the document as text elements attributed to the specific person asking the question. Second, the date of the meeting determines when news articles are likely to have been published. Finally, the document summary describes the likely topic and key political actors of the news articles.
This information is combined to create a query to the EMM service. In words, the queried news articles must contain at least the name of the speaker who asked the question, should be published around the date of the meeting, and should contain at least one of the top ten distinctive terms. The returned set of news articles is then further analysed to group similar articles.
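A hedged sketch of how such a query could be assembled from the annotations is given below. The record layout, speaker name, terms and the query format are illustrative only; they do not describe the actual European Media Monitor API.

# Minimal sketch: build a news query from a proceedings fragment, combining
# the speaker, the meeting date and the top distinctive terms, as described
# above. The record layout and query format are illustrative assumptions.

from datetime import date, timedelta

question = {
    "speaker": "Jan Jansen",               # invented example politician
    "meeting_date": date(2011, 3, 22),
    "distinctive_terms": ["Libya", "no-flyzone", "evacuatie", "NAVO", "resolutie",
                          "Kadhafi", "sancties", "ambassade", "missie", "mandaat"],
}

def build_news_query(q, window_days=3, top_terms=10):
    start = q["meeting_date"] - timedelta(days=window_days)
    end = q["meeting_date"] + timedelta(days=window_days)
    terms = " OR ".join(f'"{t}"' for t in q["distinctive_terms"][:top_terms])
    # Articles must mention the speaker, fall in the date window, and
    # contain at least one of the distinctive terms.
    return {
        "must_contain": q["speaker"],
        "published_between": (start.isoformat(), end.isoformat()),
        "any_of_terms": terms,
    }

if __name__ == "__main__":
    print(build_news_query(question))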