• What is overfitting and why is it bad? Explain for a particular type of model of your choice how you can prevent overfitting.
– the model learns the training data itself rather than the general concept
– very good training performance, very bad test performance
– the model does not generalise
– to prevent it for kNN, increase k (the number of neighbours), so that the prediction is based on more instances and is less sensitive to individual noisy points
• How do you train a decision tree from training data? Describe the techniques you would use and where they fit in.
– recursive procedure: repeatedly split the training data into smaller subsets
– the split points are the nodes of the tree
– one way of building trees is to use entropy and information gain
– entropy measures information content, in this context the additional information we need to make the classification
– information gain is the difference in entropy before and after a split, i.e. how much information we gain by being given the value of an attribute
– at each node (for each subset of the data), compute the entropy of the current state and the entropies given more information
– compute the information gain for each attribute based on that
– choose the attribute with the highest information gain to split on (see the entropy/information-gain sketch below)
• What is a linear regression model? What are the restrictions that make it linear?
– a linear regression model combines a set of coefficients with the features
– for example, each feature value is multiplied by a coefficient and the sum of those products is taken as the predicted value
– the model must be linear in the coefficients to be linear; there is no restriction on the feature values, which may themselves be nonlinear transformations of the inputs (see the linear regression sketch below)
• You are given a dataset with 100 instances and two classes. One of the classes occurs in only 2 cases. You have the choice of training a 3-nearest-neighbour classifier or a Bayesian model. Which one is likely to give better performance and why?
– the key point here is the 2 minority cases vs. the 3 neighbours
– for the kNN classifier to choose the smaller class at all, at least 2 of the 3 neighbours must belong to it, i.e. both of its instances have to be in the neighbourhood
– depending on the distribution of the instances, this may even be impossible
– a Bayesian model does not have this restriction and is therefore likely to give better performance
• Describe one meta-learning technique and how it improves the performance of a base learner or a set of base learners.
– e.g. ensembles: train several models and let them vote on the decision
– if one of the classifiers performs badly on a particular area of the feature space, the ensemble does not suffer as long as the performance of the other classifiers is good there
– the idea is that a sufficient number of models will model the "real" underlying principle, even if not all of them manage to do so
– we are "hedging our bets" (see the voting sketch below)
• What is the most important parameter you have to choose when using nearest-neighbour classifiers? How would you determine the best value for this parameter on a given data set?
– the number of neighbours k to consider for classification
– determine it by trying different values, e.g. on a held-out validation set or via cross-validation, and keeping the value with the best performance (see the k-selection sketch below)
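To make the entropy/information-gain procedure concrete, here is a minimal Python sketch; the play/windy toy data is invented for illustration.

    import math
    from collections import Counter

    def entropy(labels):
        """Shannon entropy (in bits) of a collection of class labels."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(labels, attribute_values):
        """Entropy before the split minus the weighted entropy of the
        subsets produced by splitting on one attribute."""
        n = len(labels)
        subsets = {}
        for value, label in zip(attribute_values, labels):
            subsets.setdefault(value, []).append(label)
        after = sum(len(s) / n * entropy(s) for s in subsets.values())
        return entropy(labels) - after

    # Toy data (invented): does the "windy" attribute help predict "play"?
    play  = ["yes", "yes", "yes", "no", "no", "no"]
    windy = [False, False, False, True, True, True]
    print(information_gain(play, windy))  # 1.0 bit: a perfectly informative split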
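The following sketch shows why "linear" refers to the coefficients rather than the features: fitting a model with an x^2 feature by ordinary least squares is still linear regression, because the prediction is a linear function of the weights. The data here is synthetic.

    import numpy as np

    # The model y ~ w0 + w1*x + w2*x**2 is still *linear* regression: the
    # prediction is a linear function of the coefficients w, even though the
    # feature x**2 is a nonlinear transformation of the input.
    rng = np.random.default_rng(0)
    x = np.linspace(-1, 1, 50)
    y = 2.0 - 3.0 * x + 0.5 * x**2 + rng.normal(scale=0.1, size=x.shape)

    X = np.column_stack([np.ones_like(x), x, x**2])  # design matrix
    w, *_ = np.linalg.lstsq(X, y, rcond=None)        # ordinary least squares
    print(w)  # approximately [2.0, -3.0, 0.5]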
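A minimal sketch of the voting-ensemble idea; the three base classifiers are hypothetical stand-ins for trained models.

    from collections import Counter

    def ensemble_predict(models, x):
        """Majority vote over the predictions of several base classifiers."""
        votes = [model(x) for model in models]
        return Counter(votes).most_common(1)[0][0]

    # Hypothetical base classifiers, here just functions from input to label.
    # A model that is wrong in some region of the feature space is outvoted
    # as long as most of the others are right there.
    models = [lambda x: "A", lambda x: "A", lambda x: "B"]
    print(ensemble_predict(models, x=None))  # "A": two of the three models agree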
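A sketch of choosing k by sweeping candidate values and measuring accuracy on held-out data, using a hand-rolled kNN on synthetic two-class data; in practice cross-validation would give a more reliable estimate than this single split.

    import numpy as np

    def knn_predict(X_train, y_train, x, k):
        """Classify x by majority vote among its k nearest training points."""
        dist = np.linalg.norm(X_train - x, axis=1)
        nearest = y_train[np.argsort(dist)[:k]]
        values, counts = np.unique(nearest, return_counts=True)
        return values[np.argmax(counts)]

    def accuracy(X_tr, y_tr, X_val, y_val, k):
        preds = [knn_predict(X_tr, y_tr, x, k) for x in X_val]
        return np.mean(np.asarray(preds) == y_val)

    # Synthetic two-class data, split into training and validation parts.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    idx = rng.permutation(100)
    train, val = idx[:70], idx[70:]

    # Sweep odd values of k and keep the one with the best held-out accuracy.
    best_k = max(range(1, 16, 2),
                 key=lambda k: accuracy(X[train], y[train], X[val], y[val], k))
    print(best_k)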
• What are the differences between cross-validation and bootstrapping?
– in cross-validation, the data set is divided into n folds of approximately equal size; training is done on n − 1 folds and testing on the remaining fold, for all n possible combinations
– in bootstrapping, instances are sampled randomly (with replacement) from the whole set; the sampling determines the train/test split, with the instances never drawn forming the test set
– CV is a more principled way of partitioning a data set, bootstrapping is random
– in particular, there is no guarantee that the partitions will be different when bootstrapping is done repeatedly
– CV guarantees that, at the end, every instance has been used n − 1 times for training and exactly once for testing (see the partitioning sketch below)
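The difference between the two schemes is visible directly in how they index the data; a small numpy sketch, where n = 12 instances and 3 folds are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(0)
    n, n_folds = 12, 3
    indices = rng.permutation(n)

    # Cross-validation: a deterministic partition into disjoint folds; every
    # instance is tested exactly once and used n_folds - 1 times for training.
    for fold in np.array_split(indices, n_folds):
        train = np.setdiff1d(indices, fold)
        print("CV test fold:", np.sort(fold), "train size:", len(train))

    # Bootstrapping: draw n instances with replacement as the training set;
    # the instances never drawn ("out-of-bag") form the test set, and the
    # split comes out differently on every repetition.
    sample = rng.integers(0, n, size=n)
    oob = np.setdiff1d(np.arange(n), sample)
    print("bootstrap out-of-bag test set:", oob)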