Sample Exam Solutions (part 2)

• What is overfitting and why is it bad? Explain for a particular type of
model of your choice how you can prevent overfitting.
– model learns the training data and not the general concept
– very good training performance, very bad test performance
– model does not generalise
– to prevent overfitting in kNN, increase k (the number of neighbours), which smooths the decision boundary (see the sketch below)
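– a minimal Python sketch (scikit-learn and the synthetic dataset are assumptions for illustration, not part of the question) showing that a larger k narrows the gap between training and test performance:

    # Illustrative only: compare training vs. test accuracy of kNN as k grows.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for k in (1, 5, 15):
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        # k = 1 typically memorises the training set (perfect training score,
        # weaker test score); larger k reduces that gap
        print(k, knn.score(X_train, y_train), knn.score(X_test, y_test))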
• How do you train a decision tree from training data? Describe the techniques you would use and where they fit in.
– recursive procedure
– repeatedly split the training data into smaller subsets
– split points are the nodes of the tree
– one way of building trees is to use entropy and information gain
– entropy measures the information content, in this context the additional information we need to make the classification
– information gain is the difference in entropy before and after a split,
i.e. how much information do we gain by being given the value for
an attribute
– at each node (i.e. for each subset of the data), compute the entropy of the current subset and the weighted entropies of the subsets each candidate split would produce
– compute the information gain for each attribute from these entropies
– choose the attribute with the highest information gain to split on (see the sketch below)
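– a sketch of computing entropy and information gain in Python (the toy data and attribute names are made up for illustration):

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Shannon entropy (in bits) of a list of class labels
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(rows, labels, attribute):
        # entropy before the split minus the weighted entropy after it
        before = entropy(labels)
        subsets = {}
        for row, label in zip(rows, labels):
            subsets.setdefault(row[attribute], []).append(label)
        after = sum(len(s) / len(labels) * entropy(s)
                    for s in subsets.values())
        return before - after

    # toy data: each row is (outlook, windy), labels are the class
    rows = [("sunny", "yes"), ("sunny", "no"),
            ("rainy", "yes"), ("rainy", "no")]
    labels = ["no", "no", "yes", "yes"]
    # split on the attribute with the highest information gain
    best = max(range(2), key=lambda a: information_gain(rows, labels, a))
    print(best)  # 0: splitting on outlook separates the classes perfectly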
• What is a linear regression model? What are the restrictions that make
it linear?
– a linear regression model combines a set of coefficients with the features
– each coefficient is multiplied with its feature value, and the sum of these products (plus an intercept) is taken as the predicted value
– the model must be linear in the coefficients; there is no restriction on the feature values, which may themselves be non-linear transformations (see the sketch below)
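– a minimal sketch of such a prediction (the coefficient and feature values are made up):

    # prediction = intercept + sum of coefficient * feature products;
    # the model is linear because each coefficient enters this sum exactly
    # once, even if a feature value is itself a non-linear transform (x**2)
    coefficients = [0.5, -1.2, 3.0]  # one weight per feature (assumed values)
    intercept = 0.1
    features = [2.0, 1.0, 0.5]

    prediction = intercept + sum(w * x for w, x in zip(coefficients, features))
    print(prediction)  # 0.1 + 1.0 - 1.2 + 1.5 = 1.4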
• You are given a dataset with 100 instances and two classes. One of the
classes occurs in only 2 cases. You have the choice of training a 3-nearest
neighbour classifier or a Bayesian model. Which one is likely to give better
performance and why?
– the key point here is the 2 minority cases vs. the 3-neighbour majority vote
– for the kNN classifier to choose the smaller class at all, at least 2 of the 3 nearest neighbours must belong to it, i.e. both of its instances have to be in the neighbourhood
– depending on the distribution of the instances, this may even be impossible
– a Bayesian model does not have this restriction and is therefore likely to have better performance (see the sketch below)
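– a sketch of the scenario (the synthetic data and the choice of GaussianNB as the Bayesian model are assumptions):

    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    # 98 majority instances around the origin, 2 isolated minority instances
    X = np.vstack([rng.normal(0, 1, (98, 2)), [[4.0, 4.0], [-4.0, -4.0]]])
    y = np.array([0] * 98 + [1] * 2)

    for model in (KNeighborsClassifier(n_neighbors=3), GaussianNB()):
        # 3-NN can only output class 1 where both minority points are among
        # the 3 nearest neighbours; the Bayesian model has no such restriction
        preds = model.fit(X, y).predict(X)
        print(type(model).__name__, int((preds == 1).sum()),
              "minority predictions")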
• Describe one meta-learning technique and how it improves the performance of a base learner or a set of base learners.
– e.g. ensembles: train several models and let them vote on the decision
– if one of the classifiers has bad performance on a particular area of the
feature space, the ensemble does not suffer as long as the performance
of the other classifiers is good
– idea is that a sufficient number of models will model the “real” underlying principle, even if not all of them manage to do so
– we are “hedging our bets” (see the sketch below)
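– a minimal sketch of a voting ensemble (the base learners and dataset are illustrative choices):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, random_state=0)
    ensemble = VotingClassifier([
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ])
    # each base learner votes and the majority label wins, so a model that
    # is weak in one region of the feature space can be outvoted by the rest
    print(ensemble.fit(X, y).score(X, y))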
• What is the most important parameter you have to choose when using
nearest-neighbour classifiers? How would you determine the best value
for this parameter on a given data set?
– the number of neighbours to consider for classification
– this can be determined by trying different values, e.g. via cross-validation on the training data, and picking the one with the best performance (see the sketch below)
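– a sketch of such a search (the candidate values and synthetic dataset are assumptions):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=200, random_state=0)
    # estimate each candidate k by 10-fold cross-validated accuracy
    scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                 X, y, cv=10).mean()
              for k in (1, 3, 5, 7, 9, 15)}
    best_k = max(scores, key=scores.get)
    print(best_k, scores[best_k])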
• What are the differences between cross-validation and bootstrapping?
– in cross-validation, a data set is divided into n folds of approximately equal size; training is done on n − 1 folds and testing on the remaining fold, for all n possible combinations
– in bootstrapping, instances are sampled randomly (with replacement) from the whole set to form the training set; the instances that were never drawn form the test set
– CV is a more principled way of partitioning a data set, bootstrapping
is random
– in particular there is no guarantee that the partitions will be different
when bootstrapping is done repeatedly
– CV guarantees that, by the end, every instance has been used n − 1 times for training and exactly once for testing (see the sketch below)
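– a sketch contrasting the two schemes on a tiny index set (n = 5 folds and the data size are illustrative):

    import numpy as np
    from sklearn.model_selection import KFold

    data = np.arange(10)

    # cross-validation: disjoint folds, every instance tests exactly once
    for train_idx, test_idx in KFold(n_splits=5).split(data):
        print("CV   train:", train_idx, "test:", test_idx)

    # bootstrap: sample with replacement; instances never drawn
    # ("out-of-bag", about 37% on average) form the test set
    rng = np.random.default_rng(0)
    train_idx = rng.integers(0, len(data), size=len(data))
    test_idx = np.setdiff1d(np.arange(len(data)), train_idx)
    print("Boot train:", np.sort(train_idx), "test:", test_idx)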