Last modified: 2021-02-25
Abstract
Players win or lose as a team; however, individual players play an important role in promoting teams and in advertising, so identifying the best players is an important task. This research uses machine learning to predict a player's level of importance, following the CRISP-DM (Cross-Industry Standard Process for Data Mining) method and using the WEKA (Waikato Environment for Knowledge Analysis) software. The research collects and cleans 2018-19 regular-season performance data from NBA.com and ESPN.com for 601 players. The features used for machine learning are games played, games started, minutes played, points scored, offensive rebounds, defensive rebounds, rebounds, assists, steals, blocks, turnovers, assist-to-turnover ratio, player efficiency, and the NBA's real plus-minus (RPM) statistic. The RPM statistic, computed by sports analysts, is used to classify each player's importance into four levels: High (H), with an RPM of +1.51 and above; Average (A), with 0 to +1.50; Low (L), with -0.01 to -3; and Very Low (VL), with -3.1 and lower. Under 10-fold cross-validation, the Logistic, Artificial Neural Network, and Random Forest algorithms perform relatively well, with classification accuracies of 68%, 67%, and 66%, respectively, compared with 25% accuracy for a random guess among the four levels. With more data from other seasons, better accuracy may be achievable. When new performance data arrive, the classifier models can predict players' importance levels without the involvement of sports analysts. This lets teams know who their most important players are and, therefore, who their most marketable players are.
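The RPM cut-offs stated in the abstract can be illustrated with a minimal Python sketch. The function name `classify_rpm` and the constant names are hypothetical and are not taken from the study, which used WEKA. The abstract also does not say how values falling between the stated ranges (for example, between -3.1 and -3) are handled, so the boundary treatment below is an assumption.

```python
# Illustrative sketch only: names and boundary handling are assumptions,
# not the study's actual labeling code.
LEVELS = ("H", "A", "L", "VL")


def classify_rpm(rpm: float) -> str:
    """Map an RPM value to an importance level using the abstract's cut-offs."""
    if rpm >= 1.51:
        return "H"   # High: +1.51 and above
    if rpm >= 0.0:
        return "A"   # Average: 0 to +1.50
    if rpm >= -3.0:
        return "L"   # Low: -0.01 to -3
    return "VL"      # Very Low: -3.1 and lower (assumed to include anything below -3)


# Expected accuracy of a uniform random guess over the four levels.
RANDOM_GUESS_ACCURACY = 1 / len(LEVELS)

if __name__ == "__main__":
    for rpm in (2.3, 0.75, -1.2, -4.0):
        print(rpm, classify_rpm(rpm))
```

With four levels, a uniform random guess is correct a quarter of the time, which matches the 25% baseline quoted in the abstract.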