Natural Language Processing for Internet Security: the AMiCA project
V. Hoste, W. Daelemans, G. De Pauw, E. Lefever, B. Desmet, S. Schulz, B. Verhoeven & C. Van Hee

Rationale
• Young people spend a lot of time online
• Online environments are not without risks
• Unfeasible for stakeholders to keep track of potentially harmful situations
• Protection: detect and curate threats

AMiCA Goals
• Detection and filtering of unwanted and illegal online content
• Cross-media analysis (text, image, video)
• Context and profile analysis
• Aggregated data => quantitative information on risk incidence
• Embedded monitoring and privacy by design

Project overview
[Diagram. Labels: Grounding (issues and risks of social media use; urgent demand for automatic monitoring; manual monitoring infeasible because of information overload); Core technologies (Text Analytics; Image Processing & Audio Mining); Development (AMiCA kernel; Platform; Dataflow management; Context mining & analysis; Cross-media analysis); Validation: 3 use cases (Automutilation & suicidal behavior; Transgressive sexual behavior; Cyberbullying)]

Text analytics

Normalisation
• Translate noisy language into its canonical form
• Approaches: spelling correction, machine translation, G2P2G, classification, …

Original: hey sarahke tis al lang gelde dak hier ng op ben geweest ma hey bffl eh ;)
Normalized: hey sarahke het is al lang geleden dat ik hier nog op ben geweest maar hey best friends for life he ;)

Profiling
• Automatic extraction of information about the author of a text: identity, gender, age, educational level, personality, etc.
• Challenges: single out feature types and discriminative methods that are able to efficiently deal with large author set sizes, small data sizes, and a variety of topics and genres

Deep text analytics
• Text analysis pipeline that automatically analyzes text up to the level of discourse
• Modules that deal with non-propositional aspects of meaning (e.g. modality, negation), necessary for filtering and mining social media
• Script: temporal sequence of event frames with different roles (participants, action, location, time, …)
• Script detection through an ensemble of classifiers trained on the detection of participant features and their interactions

Transgressive sexual behaviour: script with series of event frames in which participants (minor, adult) experience a number of “grooming” steps
Cyberbullying: script with series of event frames in which participants (bully, bystander, victim) experience a number of interactions

Frame-based detection with the support of
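
The normalisation step described above (translating noisy language into its canonical form) can be illustrated with a minimal dictionary-lookup sketch. This is not the AMiCA implementation: the poster lists spelling correction, machine translation, G2P2G and classification as approaches, and the simple lookup here stands in for none of them in particular. The `LEXICON` table and the `normalize` function are hypothetical names, and the entries are taken only from the example pair on the poster.

```python
# Hypothetical lookup table built from the single example pair on the poster.
# A real system would learn such mappings from data rather than hard-code them.
LEXICON = {
    "tis": "het is",
    "gelde": "geleden",
    "dak": "dat ik",
    "ng": "nog",
    "ma": "maar",
    "bffl": "best friends for life",
    "eh": "he",
}


def normalize(text, lexicon=LEXICON):
    """Replace each known noisy token with its canonical form.

    Tokens are split on whitespace; anything not in the lexicon
    (including emoticons such as ';)') is passed through unchanged.
    """
    tokens = text.split()
    return " ".join(lexicon.get(tok.lower(), tok) for tok in tokens)


noisy = "hey sarahke tis al lang gelde dak hier ng op ben geweest ma hey bffl eh ;)"
print(normalize(noisy))
```

A plain lookup cannot handle unseen spellings or context-dependent forms, which is presumably why the poster names statistical approaches rather than a fixed word list.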