Exam #1 study topics

CS 491/591 – Machine Learning – Spring 2014
Review Subjects for Exam #1
Be sure to review the following:
• LFD Lectures: 1–7, 9 (don’t worry about VC details)
Note that lecture videos + slides are available on the LFD website
• LFD Text: chapters 1–3 (don’t worry about: VC details, bias-variance details)
• Carver Slides: ML Intro, Perceptrons, Prob & Stats, Concept Learning, Linear Models
• Homework #1 (PLA and Delta Rule)
Topics to be covered on Exam #1:
1. Introduction to machine learning
• What is machine learning?
• LFD's three conditions: (1) a pattern exists, (2) we cannot pin it down mathematically,
(3) we have data on it.
• Key issue: the function (etc.) to be learned is unknown and unknowable, so how can we
judge that one has learned well?
• Five characteristics used to “categorize” ML techniques: e.g., target format,
information/feedback, etc.
• Inductive learning: what it is (i.e., inducing function from examples).
• Supervised learning: labeled training data, need for teacher/expert.
• Supervised vs. unsupervised learning vs. reinforcement learning.
• Offline vs. online learning.
• Other “types” of learning: decision tree learning, neural network learning, etc.
2. Supervised/Inductive Learning Basics
• Inductive learning terminology: input set/space, training data, examples,
attributes/features, target function, hypothesis set/space, hypotheses,
classification, regression, etc.
• Learning as search through hypothesis space.
• Training vs. testing, learning curves (see the sketch after this list).
• Learning functions (y = f(x)) vs. distributions (p(y|x)) (noisy/stochastic data).
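A minimal, hypothetical sketch (toy data, simple linear fit; not from the course materials) of training vs. testing: the training set is used to choose a hypothesis g, and a held-out test set estimates its out-of-sample error. Repeating the fit for growing training-set sizes traces out a learning curve.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200))   # noisy target

    X_train, y_train = X[:150], y[:150]   # training data: used to choose the hypothesis g
    X_test,  y_test  = X[150:], y[150:]   # test data: used only to estimate out-of-sample error

    def fit_linear(Xs, ys):
        """Simple linear classifier via a least-squares (pseudo-inverse) fit."""
        Xb = np.hstack([np.ones((len(Xs), 1)), Xs])   # prepend x0 = 1 for the bias weight
        return np.linalg.pinv(Xb) @ ys

    def error(w, Xs, ys):
        """Fraction of misclassified points (binary classification error)."""
        preds = np.sign(np.hstack([np.ones((len(Xs), 1)), Xs]) @ w)
        return float(np.mean(preds != ys))

    w = fit_linear(X_train, y_train)
    print("training error:", error(w, X_train, y_train))
    print("test-set estimate of out-of-sample error:", error(w, X_test, y_test))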
3. Perceptron Learning
• What is the hypothesis space for perceptrons? (Hyperplanes)
• Notion of linearly separable data/problems.
• Perceptron Learning Algorithm: basics, issues (see the sketch after this list).
• Delta Rule: differences with PLA, advantages/disadvantages.
• Gradient descent: basic approach, differentiability, termination.
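A minimal sketch of PLA and the Delta Rule (the Homework #1 algorithms), assuming numpy, labels y in {-1, +1}, and a bias weight via x0 = 1; the function names and parameters below are illustrative, not the official homework code.

    import numpy as np

    def pla(X, y, max_iters=1000):
        """Perceptron Learning Algorithm: pick any misclassified point and update
        w <- w + y*x. Terminates only if the data are linearly separable."""
        Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend x0 = 1 for the bias weight
        w = np.zeros(Xb.shape[1])
        for _ in range(max_iters):
            wrong = np.where(np.sign(Xb @ w) != y)[0]
            if len(wrong) == 0:
                break                               # all points classified correctly
            i = np.random.choice(wrong)             # any misclassified point will do
            w = w + y[i] * Xb[i]
        return w

    def delta_rule(X, y, eta=0.01, epochs=100):
        """Delta rule: gradient descent on the (differentiable) squared error of the
        linear output w.x, so it still moves toward a best fit even when the data are
        not separable; it stops after a fixed number of epochs rather than at zero error."""
        Xb = np.hstack([np.ones((len(X), 1)), X])
        w = np.zeros(Xb.shape[1])
        for _ in range(epochs):
            grad = 2 * Xb.T @ (Xb @ w - y) / len(y)  # gradient of the mean squared error
            w = w - eta * grad                       # step opposite the gradient
        return w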
4. Linear Learning Models
• Classification vs. regression.
• PLA and Pocket algorithm.
• Linear regression: closed-form solution (see the sketch after this list).
• Logistic regression (classification): sigmoid, MLE, gradient descent, probabilities.
• Error measures: classification/binary error, squared error, cross-entropy error (MLE).
• Maximum likelihood estimation: choose θ to maximize the likelihood p(D|θ).
• Gradient descent: convex error functions.
• Using linear models for nonlinear functions: nonlinear transforms.
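A minimal sketch, assuming numpy and labels y in {-1, +1}, of the two fitting procedures above: the closed-form (pseudo-inverse) linear-regression solution and logistic regression trained by gradient descent on the cross-entropy (maximum-likelihood) error; names are illustrative.

    import numpy as np

    def add_bias(X):
        return np.hstack([np.ones((len(X), 1)), X])    # prepend x0 = 1

    def linear_regression(X, y):
        """Closed form: w = pinv(X) y, i.e., w = (X^T X)^{-1} X^T y."""
        return np.linalg.pinv(add_bias(X)) @ y

    def logistic_regression(X, y, eta=0.1, epochs=1000):
        """Minimize Ein(w) = (1/N) sum ln(1 + exp(-y_n w.x_n)) by gradient descent;
        this is the cross-entropy error, equivalent to maximum likelihood."""
        Xb = add_bias(X)
        w = np.zeros(Xb.shape[1])
        for _ in range(epochs):
            grad = -(Xb * (y / (1 + np.exp(y * (Xb @ w))))[:, None]).mean(axis=0)
            w = w - eta * grad
        return w

    def predict_prob(w, X):
        """P(y = +1 | x) from the logistic model, via the sigmoid 1 / (1 + e^(-s))."""
        return 1 / (1 + np.exp(-(add_bias(X) @ w)))

A nonlinear transform fits the same framework: replace x with Φ(x) (e.g., append x1², x2², x1·x2) before calling either routine; the model stays linear in the weights.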
5. Computational Learning Theory
• Is learning feasible? What is the problem, and why is it an issue?
• Ein vs. Eout.
• Two questions of learning: (1) Eout(g) ≈ Ein(g) and (2) Ein(g) ≈ 0.
• PAC learning: probably approximately correct.
• Hoeffding’s Inequality: what it is, how it applies to learning, importance.
• Terminology: dichotomies, growth function, break point.
• Break point leads to polynomial growth function mH(N).
• VC dimension: what it is, vs. break point.
• Vapnik-Chervonenkis (VC) Inequality: P[|Ein(g) − Eout(g)| > ε] ≤ 4 mH(2N) e^(−ε²N/8);
what it tells us about learning feasibility, training set size, etc.
• VC dimension vs. degrees of freedom, VC dim. for the perceptron, etc.
• Generalization bound (VC inequality in terms of ε and δ):
let δ be the maximum probability of a bad hypothesis, i.e., P[|Ein(g) − Eout(g)| > ε] ≤ δ;
then ε = sqrt( (8/N) ln(4 mH(2N) / δ) ) (see the sketch after this list).
• Trade-offs: Ein vs. Eout, model complexity vs. ability to generalize.
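A small worked sketch of the generalization bound above, assuming the polynomial bound mH(N) ≤ N^dvc + 1 and dvc = 3 (the perceptron in two dimensions); the numbers are illustrative only.

    import numpy as np

    def vc_epsilon(N, dvc, delta):
        """epsilon = sqrt((8/N) * ln(4 * mH(2N) / delta)), with mH(2N) bounded by (2N)^dvc + 1."""
        mH = (2.0 * N) ** dvc + 1.0
        return np.sqrt(8.0 / N * np.log(4.0 * mH / delta))

    # With probability at least 1 - delta, |Eout(g) - Ein(g)| <= epsilon.
    for N in (100, 1000, 10000, 100000):
        print(N, round(vc_epsilon(N, dvc=3, delta=0.05), 3))

The bound is loose for small N but shrinks roughly like sqrt(ln N / N), which is why, for any fixed VC dimension, enough data guarantees Ein ≈ Eout.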
6. Concept Learning and Version Space Search
• What concept learning is and how it relates to classification (essentially the same task).
• Concept representations: constraint vectors vs. logic expressions,
? for “don’t care” (any value is OK) (see the sketch after this list).
• Learning as search through hypothesis space.
• Hypothesis space is structured by a general-to-specific ordering, enabling efficient search.
• Version space approach to represent subsets of hypothesis space: S and G.
• Inductive bias: limited representational power of hypotheses, trade-offs.
• Problems with CL algorithms: they require consistency with the data, so noise/errors
are a problem.
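A minimal sketch of the Find-S flavor of concept learning, assuming hypotheses are constraint vectors with '?' meaning “don’t care” (any value is OK) and noise-free training data; the attributes and values below are invented for illustration.

    def find_s(examples):
        """Return the most specific constraint-vector hypothesis consistent with
        the positive examples; negative examples are ignored by Find-S."""
        h = None
        for x, label in examples:
            if label != '+':
                continue
            if h is None:
                h = list(x)                    # first positive example: most specific h
            else:
                # generalize each constraint just enough to cover x
                h = [hi if hi == xi else '?' for hi, xi in zip(h, x)]
        return h

    examples = [
        (('sunny', 'warm', 'normal', 'strong'), '+'),
        (('sunny', 'warm', 'high',   'strong'), '+'),
        (('rainy', 'cold', 'high',   'strong'), '-'),
    ]
    print(find_s(examples))   # -> ['sunny', 'warm', '?', 'strong']

The full version-space (candidate-elimination) approach additionally maintains the general boundary G by specializing it on negative examples; because these algorithms require consistency with every example, a single noisy label can eliminate the true concept.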