Finding Syntactic Characteristics of Surinamese Dutch
Erik Tjong Kim Sang
Meertens Institute
12 June 2014

Goal

Evaluating automatic methods for finding syntactic differences between Surinamese Dutch and standard Dutch

Approach

1. Select two texts: one written in Surinamese Dutch and one written in standard Dutch
2. Process the two texts with a syntactic parser for Dutch
3. Compare the two parse results: can we find frequent syntactic constructions in Surinamese Dutch which are infrequent in standard Dutch?

Technical details

We used the syntactic parser Alpino: a state-of-the-art parser for Dutch

We collected the frequencies of the syntactic constructions and compared these with the t-test:

$t = \frac{f_1 - f_2}{\sqrt{f_1 + f_2}}$

where $f_1$ is the frequency of a construction in Surinamese Dutch and $f_2$ its frequency in standard Dutch

The higher the t-score, the more unusual the construction is

Chosen text for Surinamese Dutch (from dbnl.nl)

...Als je ’em zo zijn straat zag binnen komen fietstrappen! Krio, krio, krio..., niet om naar te luisteren! Volgens zijn stuur ging hij naar links! Volgens zijn wiel reed hij kaarsrecht rechtdoor. Maar Bo’s schuinse pet wees weer een andere kant op. Aaj, ik zag al sins hoelang! Meneertje slaat damp, Ná stinkender dan koeiekak! Hij is dronken als een wat! sakasaka ellendeling dati! Mamsi joeg die kindertjes uit haar weg. Ze had voorop ’t erf onder die amandelboom urenlang staan schoonmaken. Eerst zogenaamd erf bezemen. Daarna die kindertjes gras laten trekken...

Chosen reference text (from dbnl.nl)

...Aan de voorkant van het prachtige grote huis waren echter niet alle vensters meer gesloten. Op de eerste verdieping was er een geopend raam en daar stond de 17-jarige Elza en keek uit over het groene gazon dat zich voor het huis uitstrekte tot aan de waterkant waarlangs de brede Surinamerivier langzaam stroomde. Een heerlijke ochtend, het begin van een fijne dag. Vandaag 11 oktober 1765 ging de familie naar JodenSavanna voor de 65e verjaardag van grootmama. Dat zou de volgende dag zijn, de 12e oktober en tegelijk ook de verjaardag van de synagoge op Joden-Savanna...

Foreseen pitfalls

Differences between texts may have other causes than language differences:

• author style
• story genre
• story topic
• story time setting
• ...

The automatic parser may not detect interesting syntactic aspects because it was developed to handle contemporary standard Dutch

Vocabulary results

t-score   f1     f2   token
0.99944   1796   0    z’n
0.99925   1338   0    nie
0.99921   1270   0    d’r
0.99863   728    0    Bo
0.99839   621    0    fo
0.99830   586    0    em
0.99778   450    0    Mamsi
0.99714   349    0    Gusta
0.99653   287    0    Baas
0.99620   262    0    wou
0.99600   249    0    Willy
0.99507   202    0    Laila
0.99502   200    0    ...!
0.99444   179    0    dinges
0.99401   166    0    Couplet
0.99390   163    0    Aaj
0.99296   141    0    baja
0.99275   137    0    Faader
0.99180   121    0    Coola
0.99153   117    0    néks

Syntactic rules

We analyse sentences with dependency rules:

Steven sings sad songs

generates the syntactic analysis:

name        Steven   is a subject    of verb   sings
adjective   sad      is a modifier   of noun   songs
noun        songs    is an object    of verb   sings

We also look up the lemma of each word, that is, its basic word form:

Steven → Steven; sings → sing; sad → sad; songs → song

Syntactic results: Lemma Relation Lemma

t-score   Lemma     Relation   Lemma
0.98780   ding      hd/det     dat
0.98529   soort     hd/mod     van
0.98113   ma        tag/nucl   ben
0.98077   erf       hd/det     zijn
0.97872   erf       hd/det     dat
0.97561   ander     hd/det     die
0.97500   al        cmp/mod    ook
0.97436   kind      hd/det     die
0.97297   verkoop   hd/obj1    erf
0.97059   boom      hd/det     die
0.96875   van       hd/obj1    erf
0.96774   oog       hd/det     je
0.96774   kijk      hd/mod     daar
0.96774   jongen    hd/det     die
0.96774   hoofd     hd/det     je
0.96552   ben       hd/su      erf
0.96000   zeg       hd/mod     zo
0.96000   in        hd/obj1    me
0.96000   broek     hd/det     zijn
0.95946   ga        hd/vc      kom

Example sentences

Pattern: ga hd/vc kom
hij heb vermoeden dat die Bo ga kom
he has suspicion that that Bo goes come

Pattern: zeg hd/mod zo
Droomboek zeg zo, dus Vrouw Couplet ook.
Droomboek says so, so Mrs Couplet too.

Syntactic results: PoS Relation Lemma

t-score   PoS    Relation     Lemma
0.98361   verb   hd/su        hond
0.98361   verb   hd/su        erf
0.97561   noun   hd/det       jullie
0.97500   prep   hd/obj1      soort
0.97222   noun   hd/mod       schoon
0.97143   verb   hd/vc        breek
0.96774   verb   hd/su        boom
0.96698   noun   hd/mod       daar
0.96667   comp   dlink/nucl   met
0.96552   verb   nucl/tag     vind
0.96429   prep   hd/obj1      hoed
0.96296   comp   dlink/nucl   dan
0.96296   prep   hd/obj1      broek
0.96000   verb   hd/predc     baas
0.96000   comp   dp/dp        ga
0.96000   comp   dlink/nucl   laat
0.96000   comp   dlink/nucl   dat
0.95652   adv    dp/dp        met
0.95455   prep   hd/obj1      dood
0.95455   comp   dlink/nucl   te

Example sentence

Pattern: comp dp/dp ga
want iemand van me familie ga kom!
because someone of my family goes coming!

Syntactic results: PoS Relation PoS

t-score   PoS    Relation     PoS
0.97378   comp   dlink/nucl   noun
0.96875   adj    hd/ld        prep
0.96689   det    hd/mod       noun
0.95775   comp   dlink/nucl   prep
0.95652   comp   dp/dp        det
0.95455   prep   nucl/tag     tag
0.95288   det    hd/mod       name
0.94330   comp   dlink/nucl   adv
0.93333   noun   tag/nucl     noun
0.92857   pron   hd/mod       noun
0.92308   pron   dp/dp        prep
0.91667   tag    tag/nucl     det
0.91667   adv    rhd/body     comp
0.91096   comp   dlink/nucl   comp
0.90909   pron   dp/dp        adv
0.90909   adj    hd/obj2      noun
0.90625   prep   hd/predc     comp
0.90566   comp   nucl/tag     tag
0.89552   adj    dp/dp        adv
0.89157   adv    dp/dp        noun

Example sentence

Pattern: adv rhd/body comp
waar dat ze staande loerde
where that she standingly peeked

Syntactic results: Lemma Relation PoS

t-score   Lemma   Relation     PoS
0.99495   ma      tag/nucl     verb
0.99020   dan     dp/dp        noun
0.99000   zijn    hd/mod       noun
0.98947   dan     dp/dp        verb
0.98592   soort   hd/mod       prep
0.98214   ma      dp/dp        verb
0.97778   baas    hd/app       name
0.97561   maar    dlink/nucl   noun
0.97297   ma      tag/nucl     noun
0.96774   ma      dp/dp        noun
0.96667   ma      tag/nucl     adv
0.96667   dan     hd/mod       comp
0.96429   zie     dp/dp        noun
0.96429   hoor    dp/dp        noun
0.96000   kijk    dp/dp        noun
0.95833   want    dlink/nucl   noun
0.95783   en      dlink/nucl   noun
0.95652   Dan     dp/dp        verb
0.95652   ma      tag/nucl     adj
0.95652   zijn    hd/mod       name

Example sentence

Pattern: dan dp/dp verb
dan kijk hoe ze wegmanoevreert
then look how she leaves

Concluding remarks and future work

We have shown that automatic methods can assist in finding syntactic differences between texts

A paper about this work can be found on http://ifarm.nl/erikt/papers

The techniques described in this talk will be built into the Nederlab framework

THE END
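
Sketch: scoring and ranking constructions

The comparison step of the approach (count each construction in both texts, score it, list the highest scores first) might be sketched in Python as follows. This is a minimal illustration and not the code used for the talk. It assumes the formula on the Technical details slide reads (f1 − f2) / √(f1 + f2); the function names and the toy counts are invented for the example, and the triples only mimic the Lemma Relation Lemma layout of the result tables. The scores tabulated in the slides all lie below 1, so they appear to have been scaled or computed differently from that formula; the sketch follows the formula as printed and is not expected to reproduce the tabulated values.

```python
import math
from collections import Counter


def t_score(f1: int, f2: int) -> float:
    """Score a construction as (f1 - f2) / sqrt(f1 + f2).

    f1: frequency in the Surinamese Dutch text
    f2: frequency in the standard Dutch reference text
    """
    total = f1 + f2
    if total == 0:
        # Assumption: a construction seen in neither text gets a neutral score.
        return 0.0
    return (f1 - f2) / math.sqrt(total)


def rank_constructions(surinamese, standard, top=10):
    """Return the `top` constructions with the highest t-score.

    Both arguments map a construction (here a hypothetical
    (lemma, relation, lemma) triple) to its frequency in one text.
    """
    constructions = set(surinamese) | set(standard)
    scored = [
        (t_score(surinamese.get(c, 0), standard.get(c, 0)), c)
        for c in constructions
    ]
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top]


if __name__ == "__main__":
    # Toy counts invented for illustration; they are NOT taken from the talk.
    surinamese_counts = Counter({
        ("ga", "hd/vc", "kom"): 25,
        ("zeg", "hd/mod", "zo"): 16,
        ("huis", "hd/det", "het"): 30,
    })
    standard_counts = Counter({
        ("huis", "hd/det", "het"): 34,
        ("zeg", "hd/mod", "zo"): 2,
    })
    for score, construction in rank_constructions(surinamese_counts, standard_counts):
        print(f"{score:8.3f}  {' '.join(construction)}")
```

Keeping the counts in plain mappings means the same ranking function can be reused for the other pattern types in the slides (PoS Relation Lemma, PoS Relation PoS, Lemma Relation PoS) by changing only what is used as the key.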