
Predict WAR with Classic Stats

Abstract

Wins Above Replacement (WAR) is a widely used metric in baseball, but it is complex to calculate. We therefore built a model that predicts a position player's WAR from only 12 easily accessible classic stats. The final model is a stacking ensemble of the two best-performing base models, random forest and XGBoost, which were selected from 8 algorithms by random search over their hyperparameters. The final model performs strongly not only on the season it was trained on but also on other seasons. We expect the model, and the machine learning methodology behind it, to bring innovative change to baseball.


Structure of Final Model
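
The structure can be illustrated in code as a stacking regressor whose base learners are a random forest and an XGBoost model. The paper's own implementation is not reproduced here; the sketch below is a minimal illustration using scikit-learn's StackingRegressor, in which the column names for the 12 classic stats, the data file batting.csv, the ridge meta-learner, and all hyperparameter values are assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from xgboost import XGBRegressor

# Hypothetical column names for the 12 classic stats; the paper's
# actual feature set may differ.
CLASSIC_STATS = ["G", "PA", "AB", "R", "H", "2B", "3B",
                 "HR", "RBI", "SB", "BB", "SO"]

df = pd.read_csv("batting.csv")  # assumed data file
X, y = df[CLASSIC_STATS], df["WAR"]

# Stacking ensemble: random forest and XGBoost as base learners,
# with a ridge regression as the meta-learner (an assumed choice).
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=500, random_state=0)),
        ("xgb", XGBRegressor(n_estimators=500, learning_rate=0.05,
                             random_state=0)),
    ],
    final_estimator=Ridge(),
    cv=5,  # meta-learner trains on out-of-fold base predictions
)
stack.fit(X, y)
war_pred = stack.predict(X)
```

Setting cv=5 means the meta-learner is fit on out-of-fold base-model predictions, which keeps the base models' in-sample overfitting from leaking into the stack.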



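The abstract also notes that the two base models were chosen from 8 algorithms by random search over hyperparameters. A hedged sketch of such a selection step follows, using scikit-learn's RandomizedSearchCV; the four candidate models shown, their search spaces, and the RMSE scoring choice are illustrative assumptions rather than the authors' exact setup, and X and y come from the previous sketch.

```python
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVR
from xgboost import XGBRegressor

# Four of the eight candidate algorithms, with assumed search spaces;
# the paper's full candidate list and ranges are not reproduced here.
candidates = {
    "rf": (RandomForestRegressor(random_state=0),
           {"n_estimators": randint(100, 1000),
            "max_depth": randint(3, 20)}),
    "xgb": (XGBRegressor(random_state=0),
            {"n_estimators": randint(100, 1000),
             "learning_rate": uniform(0.01, 0.3)}),
    "gbr": (GradientBoostingRegressor(random_state=0),
            {"n_estimators": randint(100, 1000),
             "learning_rate": uniform(0.01, 0.3)}),
    "svr": (SVR(),
            {"C": uniform(0.1, 100),
             "epsilon": uniform(0.01, 1.0)}),
}

results = {}
for name, (model, space) in candidates.items():
    search = RandomizedSearchCV(model, space, n_iter=50, cv=5,
                                scoring="neg_root_mean_squared_error",
                                random_state=0, n_jobs=-1)
    search.fit(X, y)  # X, y as defined in the previous sketch
    results[name] = search.best_score_

# Keep the two best-scoring models as the stacking base learners
# (scores are negative RMSE, so higher is better).
best_two = sorted(results, key=results.get, reverse=True)[:2]
```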
