Abstract
Machine learning (ML) is mainly concerned with identifying patterns in data in order to predict and classify. Previously, we used linear regression and logistic regression for these tasks. However, we encountered them in the context of statistical learning (SL), where they are widely used for inference about populations and for testing hypotheses. Typically, SL relies on assumptions such as normality, homoscedasticity, independence and others, whereas ML often disregards them. We continue with supervised learning approaches (i.e. the response is known). Please make sure you have completed the statistical learning chapter before proceeding. In particular, we will look at tree-based methods such as the decision tree and the random forest. Then we will look at a nearest neighbour approach.