Trends in Data Mining and Machine Learning 2014
Dariusz Brzeziński
Poznan University of Technology

• KDD & ECML: How-To
• Large-scale processing
• Conformal prediction
• The science of annotation
• Other topics
• Applications
• Summary

• Data Mining
• Big Data Mining
• Big Data Science

“Big data mining is not about data mining per se”
Jimmy Lin (Univ. Maryland, Twitter)
• It’s a lot of mundane tasks
• Large-p and large-n
• Understanding and cleaning data
• Good integration over fancy algorithms
• Scaling up over new methods

• Exact data reduction algorithm
• Eliminate attributes without changing the final model

• Key idea: fast Lasso estimation
• Some attribute coefficients will be zeroed out
• DPC: Dual Projection onto Convex Set
• Calculating the dual formulation of Lasso is difficult
• Providing a good estimate is easier
• The estimate gives a close underestimate of the set of rejected attributes
$\theta^*(\lambda) \in \Theta$

• A similar technique can be used to speed up SVM
Compilation of papers available at: http://www.public.asu.edu/~jwang237/screening.html

Lasso Screening Rules via Dual Polytope Projection, J. Wang, J. Zhou, P. Wonka, J. Ye, NIPS 2013.
Safe Screening with Variational Inequalities and Its Application to Lasso, J. Liu, Z. Zhao, J. Wang, J. Ye, ICML 2014.
Scaling SVM and Least Absolute Deviations via Exact Data Reduction, J. Wang, P. Wonka, J. Ye, ICML 2014.
An Efficient Algorithm for Weak Hierarchical Lasso, Y. Liu, J. Wang, J. Ye, SIGKDD 2014.
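
A minimal sketch of the screening idea, not the authors' implementation. It assumes the Lasso objective 0.5·‖y − Xβ‖² + λ‖β‖₁ and the basic dual-projection rule in this form: with λ_max = max_i |x_iᵀy|, feature i can be dropped at λ < λ_max when |x_iᵀy| / λ_max < 1 − ‖x_i‖·‖y‖·(1/λ − 1/λ_max). The function names, the plain coordinate-descent solver, and the toy data are all illustrative assumptions.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lambda_max(columns, y):
    # Smallest lambda for which every Lasso coefficient is zero.
    return max(abs(dot(x, y)) for x in columns)


def dpp_keep(columns, y, lam):
    """Basic screening test; False marks a feature that is safe to drop at lam."""
    lam_max = lambda_max(columns, y)
    if lam >= lam_max:
        return [False] * len(columns)
    # Bound on how far the dual optimum can move between lam_max and lam.
    radius = math.sqrt(dot(y, y)) * (1.0 / lam - 1.0 / lam_max)
    return [abs(dot(x, y)) / lam_max >= 1.0 - math.sqrt(dot(x, x)) * radius
            for x in columns]


def lasso_cd(columns, y, lam, sweeps=500):
    """Coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    beta = [0.0] * len(columns)
    residual = list(y)
    for _ in range(sweeps):
        for j, x in enumerate(columns):
            sq = dot(x, x)
            if sq == 0.0:
                continue
            rho = dot(x, residual) + sq * beta[j]
            # Soft-thresholding update for coordinate j.
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / sq
            delta = new - beta[j]
            if delta != 0.0:
                residual = [r - delta * xi for r, xi in zip(residual, x)]
                beta[j] = new
    return beta


# Hypothetical toy data: only the first two features carry signal.
random.seed(0)
n, p = 40, 15
columns = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
y = [2.0 * a - 1.5 * b + random.gauss(0.0, 0.1)
     for a, b in zip(columns[0], columns[1])]

lam = 0.8 * lambda_max(columns, y)
keep = dpp_keep(columns, y, lam)
beta_full = lasso_cd(columns, y, lam)
beta_small = lasso_cd([x for x, k in zip(columns, keep) if k], y, lam)
print("features kept:", sum(keep), "of", p)
```

The point of the comparison at the end: solving on the reduced feature set should give the same coefficients as solving on all features, because the dropped features have zero coefficients anyway.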

• An evaluation framework suitable for high-risk applications
• Confidence intervals for all possible outcomes
• Based on randomness and hypothesis testing
• Can be used with classifiers and regressors
• Developed by Vovk, Shafer, and Gammerman

• Online setting
• Requires a nonconformity (strangeness) measure
• Prediction steps:
    – Given a sequence of labeled data S and a test object x
    – For each possible label y:
        • Compute the nonconformity scores for each point in the sequence $S \cup \{(x, y)\}$
        • Find the p-value $P_y$
    – Include y in the prediction region $\Gamma^{\varepsilon}(S, x)$ iff $P_y > \varepsilon$

Strangeness for k-NN
Strangeness for SVM
• In an offline setting, requires a calibration set
    – Inductive conformal prediction
    – Cross-conformal prediction
    – Bootstrap conformal prediction

• Applications in sensing, medicine, biology, security, computer vision, civil engineering, and other fields
• Extensions:
    – Active Learning
    – Model Selection
    – Feature Selection
    – Anomaly Detection
    – Change Detection
    – Quality Assessment

Conformal Predictions for Reliable Machine Learning, V. N. Balasubramanian, S. Ho, V. Vovk, ECML 2014.
Tutorial website with references: http://www.iith.ac.in/~vineethnb/cptutorial/index.html
Tutorial slides: https://dl.dropboxusercontent.com/u/16632828/Conformal%20Prediciton%20Tutorial%20SlidesECML2014.pdf

“Algorithms last shorter than what they work on”
Eduard Hovy (Carnegie Mellon University)
• Is annotation the most boring thing in the world?
• Annotation must be:
    – Fast… to produce enough material
    – Consistent… enough to support learning
    – Deep… enough to be interesting
• What is required:
    – Simple procedure
    – Several people
    – Attention to the source theory

• Human annotation services
    – Amazon Mechanical Turk
    – CrowdFlower
    – ATLAS.ti
    – QDAP
• What UI? How complex a task?
• How many annotators? What price?

• Seven questions of annotation
    – Selecting a corpus
    – Instantiating the theory
    – Designing the interface
    – Selecting and training annotators
    – Designing and managing the annotation procedure
    – Validating results
    – Delivering and maintaining the product

Towards a ‘Science’ of Corpus Annotation: A New Methodological Challenge for Corpus Linguistics, E. Hovy, J. Lavid, International Journal of Translation, 22 (1), 2010.
Toward a Science of Annotation, E. Hovy, MLSMA 2014 (Tutorial slides, I’ve got a copy)

• Comparing machine learning and social science
• Predicting vs theory testing
• Correlation vs causation
• How to combine these worlds
• Great talk by Sendhil Mullainathan
http://videolectures.net/kdd2014_mullainathan_machine_learning/

Box drawings
• Regularization on number of boxes
• Slow exact algorithm
• Fast approximation via clustering
Box Drawings for Learning with Imbalanced Data, S. T. Goh, C. Rudin, KDD 2014.

Bayesian Decision Lists
• Posterior distribution over possible decision lists
• Focus on sparsity
Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model, B. Letham, C. Rudin, Tech Report.

Reject classification
• Classifiers with reject option
• Best reject classifier ≠ Best classifier with reject option
Combination of One-Class Support Vector Machines for Classification with Reject Option, B. Hanczar, M. Sebag, ECML 2014.

Pattern number estimation
• Fast estimation of the expected number of patterns based on min-sup
• Can be used to create a chart based on a range of min-sups
Fast estimation of the pattern frequency spectrum, M. van Leeuwen, A. Ukkonen, ECML 2014.

Etsy
• Unique goods marketplace
• Capturing aesthetic preferences
• Topic modeling
    – Items as words
    – User favorites as documents
    – Styles as topics
• Coherent styles without any image processing
• Locality sensitive hashing for nearest neighbor search
• Used for trend detection and recommendation
Style in the Long Tail: Discovering Unique Interests with Latent Variable Models in Large Scale Social E-Commerce, D. Hu, R. Hall, J. Attenberg, KDD 2014.

Call of Duty
• Game analytics
    – 4.6 billion hours played
    – 6.5 trillion shots
    – 227 billion grenades
    – 386 billion kills
    – 30 million players
• Map and weapon balancing
• “Boosting” detection and learning
• Feature engineering and process scaling (GBM)
Machine Learning and Data Mining in Call of Duty, Arthur von Eschen, ECML 2014.
(Great talk, no slides… the only summary I found is here: http://inside-bigdata.com/2014/05/30/data-science-activision/)

Learning about meetings
• Questions:
    – Can we detect when key decisions are made?
    – Is there a pattern of interactions?
    – How long will the meeting last?
    – Do persuasive words exist?
• Dataset: http://groups.inf.ed.ac.uk/ami/download/
Learning about meetings, B. Kim, C. Rudin, ECML 2014.

Mobile application usage
• Which app will the user start next?
• Experiment with Amazon Mechanical Turk
• Dataset: http://www.idiap.ch/dataset/mdc
Conditional Log-linear Models for Mobile Application Usage Prediction, J. Kim, T. Mielikäinen, ECML 2014.
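
To make the "which app will the user start next?" question concrete, here is a rough sketch of a conditional log-linear model in its simplest form: P(next app | previous app) is a softmax over one weight per (previous, next) pair, fitted by gradient ascent on the log-likelihood. This is not the model from the paper; using only the previous app as context, the function names, and the toy sessions are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def next_app_probs(weights, apps, prev):
    """P(next app | previous app) as a softmax over pair weights."""
    scores = [weights[(prev, app)] for app in apps]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {app: e / total for app, e in zip(apps, exps)}


def train(sessions, epochs=200, lr=0.5):
    """Gradient ascent on the average conditional log-likelihood."""
    apps = sorted({app for session in sessions for app in session})
    pairs = [(s[i], s[i + 1]) for s in sessions for i in range(len(s) - 1)]
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for prev, nxt in pairs:
            probs = next_app_probs(weights, apps, prev)
            for app in apps:
                # Observed indicator minus its expectation under the model.
                observed = 1.0 if app == nxt else 0.0
                grad[(prev, app)] += observed - probs[app]
        for key, g in grad.items():
            weights[key] += lr * g / len(pairs)
    return weights, apps


# Hypothetical usage logs; each list is one session of app launches.
sessions = [
    ["mail", "browser", "maps"],
    ["mail", "browser", "mail", "browser"],
    ["maps", "mail", "browser"],
]
weights, apps = train(sessions)
probs = next_app_probs(weights, apps, "mail")
print(max(probs, key=probs.get), probs)
```

A richer context (time of day, location, the last few apps) would enter the same way, as extra indicator features with their own weights inside the softmax.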

Windflow
• Aircraft aloft
• Planes as sensors
• Much better wind predictions
http://windflow.azurewebsites.net/

PROOF
• Biomarker construction
• Heart, lung and kidney failures
• Quick prevention
• Large-p-small-n problem
https://www.cs.ubc.ca/~rng/researchprojects.html

• GiveDirectly
• DataKind.org
• SumAll.org
• UN Global Pulse
• NYC Analytics
• UNICEF
• Crisis Text Line
• DonorsChoose.org

• Social Networks
• Graphical Models (Weisfeiler-Lehman)
• MOOC Mining
• ADMM (Gradient descent optimization)
• Evaluation
• Active Learning
• Multilabel classification
• Deep learning
• Privacy

• Everything should be large and social
• Data mining is leaning towards data science
• Lots of preprocessing methods
• Sparse screening
• Conformal prediction
• Social applications
http://videolectures.net/kdd2014_newyork/

Thank you!