tim.ald.red

Full-stack software developer since 2025

Baseball Data

January 2025
#Project
GitHub repo

A Python notebook created for my CODELancashire studies.

Named 'the best presentation by a student ever' by my tutor.

Tools

Python, NumPy, pandas, Matplotlib, Jupyter

The mission

Choose any dataset we wanted, use the Python we'd learned to manipulate the data, then present the results to the class.

What I did

  • Found a database of baseball statistics ranging from the 1800s to the modern day. (Source: Kaggle)
  • Cleaned up the data: fixed columns that had mixed data types and removed records that didn't contribute to the project, which made the notebook faster and improved the quality of the analysis.
  • Used sorting, filtering and aggregating to find the most effective players in various categories across single seasons, eras and full careers.
  • Graphed the data to identify how trends have changed over time.
  • Invented a custom formula to identify my number one player of all time.
  • Added headings, commentary, photos and GIFs to improve clarity, explain my thinking and make the presentation more viewer-friendly.
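
To give a flavour of the cleaning, aggregating and trend steps above, here is a minimal pandas sketch. It is not the code from the notebook: the sample rows, the column names (`player`, `year`, `at_bats`, `hits`, `home_runs`), the helper functions and the `min_at_bats` cut-off are all made up for illustration, and the real Kaggle dataset may be laid out quite differently.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Kaggle data.
# The column names and values here are illustrative assumptions.
raw = pd.DataFrame({
    "player": ["A", "A", "B", "B", "C"],
    "year": [1901, 1902, 1901, 1902, 1902],
    "at_bats": [500, "450", 20, "480", "n/a"],  # mixed ints, strings and junk
    "hits": [150, 140, 5, 160, 3],
    "home_runs": [10, 12, 0, 30, 1],
})


def clean(df, min_at_bats=100):
    """Fix the mixed-type column and drop seasons with too little data."""
    out = df.copy()
    # Convert everything to numbers; unparseable values become NaN.
    out["at_bats"] = pd.to_numeric(out["at_bats"], errors="coerce")
    out = out.dropna(subset=["at_bats"])
    # Assumed cut-off: ignore seasons with very few at-bats.
    out = out[out["at_bats"] >= min_at_bats]
    return out.reset_index(drop=True)


def career_totals(df):
    """Aggregate seasons into careers, ranked by home runs."""
    totals = df.groupby("player")[["at_bats", "hits", "home_runs"]].sum()
    totals["batting_avg"] = totals["hits"] / totals["at_bats"]
    return totals.sort_values("home_runs", ascending=False)


def yearly_home_runs(df):
    """Total home runs per year, ready for plotting as a trend line."""
    return df.groupby("year")["home_runs"].sum()


seasons = clean(raw)
careers = career_totals(seasons)
trend = yearly_home_runs(seasons)
```

With Matplotlib installed, calling `trend.plot()` on the last result would draw the kind of change-over-time chart described above.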

The outcome

Not only did I get a 10/10 score from my tutor Andre, but he said: "That was the best presentation by a student, like... ever."

(and also "It was almost good enough to make baseball seem interesting.")