2. Übung - Hu

Informationsintegration
Übung 2
SS 2016
Yvonne Lichtblau
Wettbewerb
Übung 1
Yvonne Lichtblau – Informationsintegration SS 2016
2
Top Verdiener Sport 2014
Top Verdiener Sport 2014
10
9
8
Floyd Mayweather
Manny Paquiao
Christiao Roaldo
Lyonel Messy
Roger Federer
LeBron James
Kevin Durant
Phil Michelson
Tiger Woods
Kobe Bryant
7
6
5
4
3
2
US
PH
PT
AR
CH
US
US
US
US
US
1
0
Gruppe1
Gruppe2
Gruppe3
Gruppe4
Gruppe5
Gruppe6
Yvonne Lichtblau – Informationsintegration SS 2016
Gruppe8
3
5
7
5
3
7
6
6
6
7
7
Medaillengewinner Frauen Sport 200m Rücken
Medaillengewinner Frauen Sport 200m Rücken
10
9
8
Diana Mocanu
Roxana Mărăcineanu
Miki Nakao
Kirsty Coventry
Stanislawa Komarowa
Antje Buscschulte
Reiko Nakamura
Margaret Hoelzer
Krisztina Egerszegi
Elaine Tanner
7
6
5
4
3
2
RO
RO
JP
ZW
RU
DE
JP
US
HU
CA
1
0
Gruppe1
Gruppe2
Gruppe3
Gruppe4
Gruppe5
Gruppe6
Yvonne Lichtblau – Informationsintegration SS 2016
Gruppe8
4
4
5
7
7
3
6
7
6
7
5
Top Fußballer Instagram
Top Fußballer Instagram
10
9
Cristiano Ronaldo
Neymar Junior
Lionel Messi
David Beckham
James Rodriguez
Gareth Bale
David Luiz
Zlatan Ibrahimovic
Ronaldinho
Luis Suárez
8
7
6
5
4
3
2
PT
BR
AR
GB
CO
GB
BR
SE
BR
UY
1
0
Gruppe1
Gruppe2
Gruppe3
Gruppe4
Gruppe5
Gruppe6
Gruppe8
UK = GB
Yvonne Lichtblau – Informationsintegration SS 2016
5
6
5
6
5
5
4
6
6
7
6
Formel 1 nach Anzahl Titeln
Formel 1 nach Anzahl Titeln
10
9
8
Micha Schumacher
Juan Manuel Fangio
Alain Prost
Sebastian Vettel
Jack Brabham
Jackie Stewart
Niki Lauda
Nelson Piquet
Ayrton Senna
Lewis Hamilton
7
6
5
4
3
2
1
DE
AR
FR
DE
AU
GB
AT
BR
BR
GB
0
Gruppe1
Gruppe2
Gruppe3
Gruppe4
Gruppe5
Gruppe6
Yvonne Lichtblau – Informationsintegration SS 2016
Gruppe8
6
5
6
6
7
5
4
7
7
6
3
Top 10 Athleten Frauen 2009
Top 10 Athleten Frauen 2009
10
9
8
Serena Williams
Annika Sörenstam
Yelena Isinbayeva
Justine Henin
Lisa Leslie
Venus Williams
Marta
Lorena Ochoa
Diana Taurasi
Kerri Walsh
7
6
5
4
3
2
1
US
SE
RU
BE
US
US
BR
MX
US
US
0
Gruppe1
Gruppe2
Gruppe3
Gruppe4
Gruppe5
Gruppe6
Yvonne Lichtblau – Informationsintegration SS 2016
Gruppe8
7
6
6
4
6
7
5
3
5
6
5
Triathleten mit meisten Olympiasiegen
Triathleten mit meisten Olympiasiegen
10
9
8
Bevan Docherty
Stephan Vuckovic
Jan Frodeno
Simon Whitfield
Sven Riederer
Jan Rehula
Hamish Carter
Eneko Llanos
Andreas Raelert
Pete Jacobs
7
6
5
4
3
2
1
NZ
DE
DE
CA
CH
CZ
NZ
ES
DE
AU
0
Gruppe1
Gruppe2
Gruppe3
Gruppe4
Gruppe5
Gruppe6
Yvonne Lichtblau – Informationsintegration SS 2016
Gruppe8
8
6
5
7
4
4
3
7
5
6
3
Rallye Mexico 2016
Rallye Mexico 2016
10
9
8
Seebastien Ogier
Jari Matti Latvala
Thierry Neuville
Dani Sordo
Mads Ostberg
Eric Camilli
Andreas Mikkelsen
Ott Tänak
Benito Guerra jr.
Hayden Paddon
7
6
5
4
3
2
1
FR
FI
BE
ES
NO
FR
NO
EE
MX
NZ
0
Gruppe1
Gruppe2
Gruppe3
Gruppe4
Gruppe5
Gruppe6
Yvonne Lichtblau – Informationsintegration SS 2016
Gruppe8
9
3
6
3
4
3
2
3
5
2
4
World Cup Dressur Gewinner
World Cup Dressur Gewinner
10
9
8
Valentina Truppa
IT
Tinne Vilhelmson-Silfv SE
Anky van Grunsven NL
Isabell Werth
DE
Steffen Peters
US
Edward Gal
NL
Adelinde Cornelissen NL
Helen Langehanenberg DE
Charlotte Dujardin
GB
Peter Minderhoud
NL
7
6
5
4
3
2
1
0
Gruppe1
Gruppe2
Gruppe3
Gruppe4
Gruppe5
Gruppe6
Yvonne Lichtblau – Informationsintegration SS 2016
Gruppe8
10
3
4
2
5
2
3
4
4
3
1
Segeln Olympia 2012
Segeln Olympia 2012
10
9
8
Tom Slingsby
Pavlos Kontides
Rasmus Myrgrend
Tonči Stipanović
Andrew Murdoch
Simon Grotelöschen
Alejandro Foglia
Juan Ignacio Maegli
Toni Wilhelm
Richard Stauffacher
7
6
5
4
3
2
AU
CY
SE
HR
NZ
DE
UY
GT
DE
CH
1
0
Gruppe1
Gruppe2
Gruppe3
Gruppe4
Gruppe5
Gruppe6
Yvonne Lichtblau – Informationsintegration SS 2016
Gruppe8
11
3
2
4
2
1
4
6
2
4
3
Erfolgreiche E-Sportler
Erfolgreiche E-Sportler
10
9
8
Saahil Arora
Sumail Hassan
Jiao Wang
Zhang Pan
Clement Ivanov
Gustav Magnusson
Danil Ishutin
Kuro Takhasomi
Lee Jae-dong
Manuel Schenkhuizen
7
6
5
4
3
2
1
US
PK
CN
CN
EE
SE
UA
DE
KR
NL
0
Gruppe1
Gruppe2
Gruppe3
Gruppe4
Gruppe5
Gruppe6
Yvonne Lichtblau – Informationsintegration SS 2016
Gruppe8
12
2
2
4
2
1
3
0
4
3
2
Wettbewerb - Korrektheit
Gesamtergebnis
100
90
80
70
60
50
40
30
20
10
0
Gruppe1
Gruppe2
Gruppe3
Gruppe4
Gruppe5
Gruppe6
Gruppe8
1. Platz: Gruppe8 (93%)
2. Platz: Gruppe1 (78%)
3. Platz: Gruppe4 (73%)
Yvonne Lichtblau – Informationsintegration SS 2016
13
Wettbewerb - Geschwindigkeit
benötigte Zeit für 100 Personen
140
120
Zeit in sec
100
80
60
40
20
0
Gruppe 1
Gruppe2
Gruppe3
Gruppe4
Gruppe5
Gruppe6
Gruppe8
1. Platz: Gruppe1 (6.4s)
2. Platz: Gruppe2 (16,6s)
3. Platz: Gruppe5 (43,4s)
Yvonne Lichtblau – Informationsintegration SS 2016
14
Vorstellung
Lösungen
Übung 1
Yvonne Lichtblau – Informationsintegration SS 2016
15
Übung 2
Database Setup/
Basic Queries
Yvonne Lichtblau – Informationsintegration SS 2016
16
Background
We start with genes, their location,
and their function
Types of information:
●
Genes: Have a taxon ID
(organism), have an ID, have a
preferred name (symbol), have
multiple other names (synonyms),
have multiple functional
annotations, have a connected
protein (with a protein_id and a
protein_version_id), have a status,
are on a chromosome, have a start
and end position, and a
chromosomal location
Yvonne Lichtblau – Informationsintegration SS 2016
http://thinnergene.com
17
Background
●
●
Genes function: Are described by a taxonomy of terms which forms a
DAG; each term has an ID, a name, a description, and can be
annotated to multiple genes (Gene Ontology)
Gene – function relationship:
Has an evidence code
Yvonne Lichtblau – Informationsintegration SS 2016
18
Task 1: Create the Schema
●
●
Create a relational model for the information described
on the previous page
Implement this model in the Postgres database
●
●
One account per group, access information will be sent
Login from gruenau2:
psql ­h delphi ­U gruppeX ­p 5432 ●
Datataypes: Look at the data
●
●
Be conservative
Length: Guess a reasonable number which leaves some buffer
Yvonne Lichtblau – Informationsintegration SS 2016
19
Task 2: Fill the schema (partly)
●
From the web page download the following files:
●
●
●
●
●
●
gene2refseq (see slide 17)
[tax_id, geneID, status, protein_accession, protein_accession_version,
start, end]
gene_info (see slide 17)
[tax_id, geneID, symbol, synonyms, chromosome, map location]
gene2go (see slide 18)
[tax_id, geneID, goID, evidence]
README
Import the given attributes into your schema
● This requires some processing:
● gene2refseq and gene_info should be merged
● gene_info.synonyms should be normalized
The DAG structure of the GO will become relevant in
assignment 3, but is not considered in this exercise!!
Yvonne Lichtblau – Informationsintegration SS 2016
20
Task 3: Some queries
●
Formulate and answer the following queries:
●
How many genes does your gene table have? (13,644,002)
●
How many relationships between genes and a GO term are there?
●
How many distinct GO terms are annotated to at least one gene?
●
How many gene names are present (symbol or snyonym)?
●
How many synonyms are assigned to more than one gene?
Yvonne Lichtblau – Informationsintegration SS 2016
21
Submit
●
●
●
●
By Monday, 23.05.2016, 23:59 pm
Send by mail to: [email protected]
(questions are also welcome!)
Submit:
● Create Table statements
● Relational schema as image
(tables as boxes, foreign key – primary key links as arcs)
● Queries and answers of task 3
● Notes of your workflow as PDF
(preprocessing and import of tables)
Criteria for passing the exercise:
Submit everything from the list and
correct answer of each query.
Yvonne Lichtblau – Informationsintegration SS 2016
22