Finding Syntactic Characteristics of Surinamese Dutch

Erik Tjong Kim Sang
Meertens Institute
12 June 2014
Goal
Evaluating automatic methods for finding syntactic differences
between Surinamese Dutch and standard Dutch
Approach
1. Select two texts: one written in Surinamese Dutch and one written in
standard Dutch
2. Process the two texts with a syntactic parser for Dutch
3. Compare the two parse results: can we find frequent syntactic
constructions in Surinamese Dutch which are infrequent in standard
Dutch?
Technical details
We used the syntactic parser Alpino: a state-of-the-art parser for Dutch
We collected the frequencies of the syntactic constructions and
compared these with the t-test:
$$t = \frac{f_1 - f_2}{\sqrt{f_1 + f_2}}$$
where $f_1$ is the frequency of a construction in Surinamese Dutch and
$f_2$ its frequency in standard Dutch
The higher the t-score, the more unusual the construction is
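The comparison step can be sketched in a few lines of Python. The function below implements the formula as given on the slide; the function names, the fallback for constructions with zero counts and the toy counts are assumptions made for illustration, not the code used for the talk.

```python
import math
from collections import Counter


def t_score(f1: int, f2: int) -> float:
    """t = (f1 - f2) / sqrt(f1 + f2), as given on the slide."""
    if f1 + f2 == 0:
        return 0.0  # assumption: a construction seen in neither text gets score 0
    return (f1 - f2) / math.sqrt(f1 + f2)


def rank_constructions(counts1: Counter, counts2: Counter):
    """Score every construction seen in either text, highest t-score first."""
    keys = set(counts1) | set(counts2)
    scored = [(t_score(counts1[k], counts2[k]), k) for k in keys]
    return sorted(scored, reverse=True)


# Made-up counts, only to show the call; they are not results from the talk.
surinamese = Counter({"ga hd/vc kom": 9, "huis hd/det het": 4})
standard = Counter({"huis hd/det het": 5})

for score, construction in rank_constructions(surinamese, standard):
    print(f"{score:7.3f}  {construction}")
```

A Counter returns 0 for missing keys, so constructions that occur in only one of the two texts need no special handling.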
Chosen text for Surinamese Dutch (from dbnl.nl)
...Als je ’em zo zijn straat zag binnen komen fietstrappen!
Krio, krio, krio..., niet om naar te luisteren! Volgens
zijn stuur ging hij naar links! Volgens zijn wiel reed
hij kaarsrecht rechtdoor. Maar Bo’s schuinse pet wees
weer een andere kant op.
Aaj, ik zag al sins hoelang! Meneertje slaat damp,
stinkender dan koeiekak! Hij is dronken als een wat! Ná
sakasaka ellendeling dati!
Mamsi joeg die kindertjes uit haar weg. Ze had
voorop ’t erf onder die amandelboom urenlang staan
schoonmaken. Eerst zogenaamd erf bezemen. Daarna
die kindertjes gras laten trekken...
Chosen reference text (from dbnl.nl)
...Aan de voorkant van het prachtige grote huis waren
echter niet alle vensters meer gesloten. Op de eerste
verdieping was er een geopend raam en daar stond
de 17-jarige Elza en keek uit over het groene gazon
dat zich voor het huis uitstrekte tot aan de waterkant
waarlangs de brede Surinamerivier langzaam stroomde.
Een heerlijke ochtend, het begin van een fijne dag.
Vandaag 11 oktober 1765 ging de familie naar Joden-Savanna
voor de 65e verjaardag van grootmama. Dat
zou de volgende dag zijn, de 12e oktober en tegelijk ook
de verjaardag van de synagoge op Joden-Savanna...
Foreseen pitfalls
Differences between texts may have other causes than language
differences:
• author style
• story genre
• story topic
• story time setting
• ...
The automatic parser may not detect interesting syntactic aspects
because it is developed to handle contemporary standard Dutch
Vocabulary results
t-score   f1     f2   token
0.99944   1796   0    z’n
0.99925   1338   0    nie
0.99921   1270   0    d’r
0.99863   728    0    Bo
0.99839   621    0    fo
0.99830   586    0    em
0.99778   450    0    Mamsi
0.99714   349    0    Gusta
0.99653   287    0    Baas
0.99620   262    0    wou
0.99600   249    0    Willy
0.99507   202    0    Laila
0.99502   200    0    ...!
0.99444   179    0    dinges
0.99401   166    0    Couplet
0.99390   163    0    Aaj
0.99296   141    0    baja
0.99275   137    0    Faader
0.99180   121    0    Coola
0.99153   117    0    néks
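The t-scores listed in the vocabulary table are all just below 1, whereas the formula under Technical details is unbounded (for f1 = 1796 and f2 = 0 it gives about 42.4). The short check below, which is not part of the original talk, compares a few listed rows with that formula and with one hypothetical bounded scaling, (f1 − f2)/(f1 + f2 + 1). The scaling is a guess that happens to reproduce the listed values; the slides do not say how the scores were normalised.

```python
import math

# (listed score, f1, f2, token) copied from rows of the vocabulary table
ROWS = [
    (0.99944, 1796, 0, "z’n"),
    (0.99925, 1338, 0, "nie"),
    (0.99921, 1270, 0, "d’r"),
    (0.99863, 728, 0, "Bo"),
    (0.99620, 262, 0, "wou"),
]


def slide_formula(f1: int, f2: int) -> float:
    """(f1 - f2) / sqrt(f1 + f2), the formula given under Technical details."""
    return (f1 - f2) / math.sqrt(f1 + f2)


def bounded_guess(f1: int, f2: int) -> float:
    """Hypothetical scaling (f1 - f2) / (f1 + f2 + 1); not stated in the slides."""
    return (f1 - f2) / (f1 + f2 + 1)


for listed, f1, f2, token in ROWS:
    print(f"{token:6} listed={listed:.5f} "
          f"slide={slide_formula(f1, f2):8.3f} "
          f"guess={bounded_guess(f1, f2):.5f}")
```

Because the two variants can rank constructions differently when both counts are non-zero, it is worth knowing which one a given table uses before comparing scores across tables.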
Syntactic rules
We analyse sentences with dependency rules:
Steven sings sad songs
generates the syntactic analysis:
name        Steven   is a subject    of verb   sings
adjective   sad      is a modifier   of noun   songs
noun        songs    is an object    of verb   sings
We also look up the lemma of each word, its basic word form:
Steven → Steven; sings → sing; sad → sad; songs → song
Syntactic results: Lemma Relation Lemma
t-score   Lemma Relation Lemma
0.98780   ding hd/det dat
0.98529   soort hd/mod van
0.98113   ma tag/nucl ben
0.98077   erf hd/det zijn
0.97872   erf hd/det dat
0.97561   ander hd/det die
0.97500   al cmp/mod ook
0.97436   kind hd/det die
0.97297   verkoop hd/obj1 erf
0.97059   boom hd/det die
0.96875   van hd/obj1 erf
0.96774   oog hd/det je
0.96774   kijk hd/mod daar
0.96774   jongen hd/det die
0.96774   hoofd hd/det je
0.96552   ben hd/su erf
0.96000   zeg hd/mod zo
0.96000   in hd/obj1 me
0.96000   broek hd/det zijn
0.95946   ga hd/vc kom
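The dependency rules and the four pattern types used in the result tables (Lemma Relation Lemma, PoS Relation Lemma, PoS Relation PoS, Lemma Relation PoS) can be illustrated with a small sketch. It assumes a simplified, hand-written XML tree in the style of Alpino's dependency output (nested node elements with rel, pos and lemma attributes). The sample tree, the function names and the restriction to pairs that involve a hd child are assumptions for illustration, not the code used for the talk.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written, simplified tree for "Steven sings sad songs" (an assumption;
# real parser output carries more attributes and other relation types).
SAMPLE_XML = """
<node cat="smain" rel="--">
  <node rel="su" pos="name" lemma="Steven" word="Steven"/>
  <node rel="hd" pos="verb" lemma="sing" word="sings"/>
  <node cat="np" rel="obj1">
    <node rel="mod" pos="adj" lemma="sad" word="sad"/>
    <node rel="hd" pos="noun" lemma="song" word="songs"/>
  </node>
</node>
"""


def head_of(node):
    """Return the child marked rel="hd", or None if there is none."""
    return next((c for c in node if c.get("rel") == "hd"), None)


def lexical_head(node):
    """Descend through hd children until a word node (one with a lemma)."""
    while node is not None and node.get("lemma") is None:
        node = head_of(node)
    return node


def triples(node):
    """Yield (head word, relation, dependent word) for each head/sibling pair."""
    hd = head_of(node)
    head = lexical_head(hd)
    for child in node:
        if head is not None and child is not hd:
            dep = lexical_head(child)
            if dep is not None:
                yield head, "hd/" + child.get("rel"), dep
        yield from triples(child)


def patterns(head, rel, dep):
    """The four lemma/PoS combinations of one dependency pair."""
    return {
        "Lemma Relation Lemma": f"{head.get('lemma')} {rel} {dep.get('lemma')}",
        "PoS Relation Lemma": f"{head.get('pos')} {rel} {dep.get('lemma')}",
        "PoS Relation PoS": f"{head.get('pos')} {rel} {dep.get('pos')}",
        "Lemma Relation PoS": f"{head.get('lemma')} {rel} {dep.get('pos')}",
    }


root = ET.fromstring(SAMPLE_XML.strip())
counts = {}
for head, rel, dep in triples(root):
    for kind, pattern in patterns(head, rel, dep).items():
        counts.setdefault(kind, Counter())[pattern] += 1

for pattern in counts["Lemma Relation Lemma"]:
    print(pattern)
```

Counting each pattern type separately, once per text, gives the frequencies f1 and f2 that the t-score compares.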
Example sentences
Pattern: ga hd/vc kom
hij heb vermoeden dat die Bo ga kom
he has suspicion that that Bo goes come
Pattern: zeg hd/mod zo
Droomboek zeg zo, dus Vrouw Couplet ook.
Droomboek says so, so Mrs Couplet too.
Syntactic results: PoS Relation Lemma
t-score   PoS Relation Lemma
0.98361   verb hd/su hond
0.98361   verb hd/su erf
0.97561   noun hd/det jullie
0.97500   prep hd/obj1 soort
0.97222   noun hd/mod schoon
0.97143   verb hd/vc breek
0.96774   verb hd/su boom
0.96698   noun hd/mod daar
0.96667   comp dlink/nucl met
0.96552   verb nucl/tag vind
0.96429   prep hd/obj1 hoed
0.96296   comp dlink/nucl dan
0.96296   prep hd/obj1 broek
0.96000   verb hd/predc baas
0.96000   comp dp/dp ga
0.96000   comp dlink/nucl laat
0.96000   comp dlink/nucl dat
0.95652   adv dp/dp met
0.95455   prep hd/obj1 dood
0.95455   comp dlink/nucl te
Example sentence
Pattern: comp dp/dp ga
want iemand van me familie ga kom!
because someone of my family goes coming!
Syntactic results: PoS Relation PoS
t-score   PoS Relation PoS
0.97378   comp dlink/nucl noun
0.96875   adj hd/ld prep
0.96689   det hd/mod noun
0.95775   comp dlink/nucl prep
0.95652   comp dp/dp det
0.95455   prep nucl/tag tag
0.95288   det hd/mod name
0.94330   comp dlink/nucl adv
0.93333   noun tag/nucl noun
0.92857   pron hd/mod noun
0.92308   pron dp/dp prep
0.91667   tag tag/nucl det
0.91667   adv rhd/body comp
0.91096   comp dlink/nucl comp
0.90909   pron dp/dp adv
0.90909   adj hd/obj2 noun
0.90625   prep hd/predc comp
0.90566   comp nucl/tag tag
0.89552   adj dp/dp adv
0.89157   adv dp/dp noun
Example sentence
Pattern: adv rhd/body comp
waar dat ze staande loerde
where that she standingly peeked
Syntactic results: Lemma Relation PoS
t-score   Lemma Relation PoS
0.99495   ma tag/nucl verb
0.99020   dan dp/dp noun
0.99000   zijn hd/mod noun
0.98947   dan dp/dp verb
0.98592   soort hd/mod prep
0.98214   ma dp/dp verb
0.97778   baas hd/app name
0.97561   maar dlink/nucl noun
0.97297   ma tag/nucl noun
0.96774   ma dp/dp noun
0.96667   ma tag/nucl adv
0.96667   dan hd/mod comp
0.96429   zie dp/dp noun
0.96429   hoor dp/dp noun
0.96000   kijk dp/dp noun
0.95833   want dlink/nucl noun
0.95783   en dlink/nucl noun
0.95652   Dan dp/dp verb
0.95652   ma tag/nucl adj
0.95652   zijn hd/mod name
Example sentence
Pattern: dan dp/dp verb
dan kijk hoe ze wegmanoevreert
then look how she leaves
Concluding remarks and future work
We have shown that automatic methods can assist in finding syntactic
differences between texts
A paper about this work can be found on http://ifarm.nl/erikt/papers
The techniques described in this talk will be built into the Nederlab
framework
THE END