CLASSIFICATION OF OUTCOMES OF ENGLISH PREMIER LEAGUE MATCHES USING MACHINE LEARNING MODELS

Abstract
Football is one of the most widely followed sports in the world. Researchers are often interested in analysing the results of football matches in order to predict or classify match outcomes from a set of variables. Most existing prediction and classification models rely either on a single selected variable or on a large number of variables. Too few variables cannot predict accurately, while too many make the model hard to interpret (a lack of parsimony). This work used feature selection methods to reduce sixteen football-related independent variables to six for classifying the outcome variable (home win, away win, or draw) across five seasons of English Premier League matches. As expected, a home win was the modal outcome in all five seasons. The Kruskal-Wallis test showed that the median outcome was not the same across the five seasons. Four machine learning (ML) models were then used to classify the outcome from the six best variables recommended by the feature selection. Furthermore, the first-half and second-half results were used to classify the final outcome. Five performance metrics indicated that the ML models classified the outcomes well, and cross-validation was used to guard against over-fitting. Bookmakers may find this research of interest, as it identifies some variables as key to classifying the outcomes of football matches.
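As an illustration of the season-to-season comparison mentioned above, the following is a minimal sketch of the tie-corrected Kruskal-Wallis H statistic in plain Python. The outcome coding (0 = away win, 1 = draw, 2 = home win), the function name `kruskal_wallis_h`, and the toy season data are assumptions made for this example only; they are not taken from the study and do not reproduce its data or results.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    counts = Counter(pooled)

    # Mid-ranks: tied values share the average of the ranks they occupy.
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        ties = counts[value]
        rank_of[value] = next_rank + (ties - 1) / 2
        next_rank += ties

    # Uncorrected H from the rank sum of each group.
    weighted_rank_sums = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    h = 12 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)

    # Tie correction; it is zero only when every observation is identical.
    correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical coded outcomes for three invented seasons (not the paper's data):
# 0 = away win, 1 = draw, 2 = home win.
toy_seasons = [
    [2, 2, 1, 0, 2, 1],
    [0, 1, 2, 2, 0, 0],
    [2, 2, 2, 1, 2, 0],
]
h_statistic = kruskal_wallis_h(toy_seasons)
print(f"H = {h_statistic:.3f}")
```

Under the null hypothesis, H is approximately chi-square distributed with k - 1 degrees of freedom, where k is the number of groups, so a comparison of five seasons would use 4 degrees of freedom. Because the outcome takes only three values, ties are heavy and the tie correction matters.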