Catchers Get Bigger

I’m fairly confident all Major League Baseball players have gotten bigger over time, but I decided to look specifically at catchers, using the newest version of the Lahman Baseball Database to find their average weight by the decade in which they debuted. The database lists a single weight per player, so we can’t be certain what anyone weighed at his debut, but we’re only looking at large trends. I also required every catcher in the list to have caught at least 200 career games, counting only seasons in which he caught at least 10.

Distributions of MLB catcher weights by debut decade, with averages and number of players inset.

We can also look only at the average weights per decade to get a clearer sense of the overall trend.

Average weights of MLB catchers by debut decade.

There’s a pronounced increase in the 1940s and again in the 1990s through 2000s, the latter being when players started eating balanced breakfasts.

Technical Details

I first ran this query in the Lahman Database loaded on my computer.

WITH
	"catchers" AS (
		SELECT
			People.playerID,
			People.nameFirst,
			People.nameLast,
			MAX(People.weight) AS "weight",
			-- Career games caught, counting only the seasons kept by the WHERE clause
			SUM(Appearances.G_c) AS "gamesCaught",
			SUBSTRING(People.debut, 1, 4) AS "debutYear"
		FROM
			People
			INNER JOIN Appearances ON Appearances.playerID = People.playerID
		WHERE
			-- Keep only seasons with at least 10 games caught
			Appearances.G_c >= 10
			-- Skip players with no listed weight
			AND People.weight > 0
		GROUP BY
			People.playerID,
			People.nameFirst,
			People.nameLast,
			People.debut
	)
SELECT
	"playerID",
	"nameFirst",
	"nameLast",
	"weight",
	"gamesCaught",
	"debutYear"
FROM
	"catchers"
WHERE
	"gamesCaught" >= 200
ORDER BY
	"debutYear"

I exported the resulting data as catchers.csv and used a Jupyter Notebook for the rest.
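
As an aside, the selection the query performs can also be sketched in pandas, which may help if the Lahman tables are only on hand as DataFrames. This is a rough equivalent and not what I actually ran: the function name select_catchers and its parameters are made up for illustration, and it assumes the People and Appearances tables are already loaded with the same column names the query uses.

```python
import pandas as pd


def select_catchers(people: pd.DataFrame, appearances: pd.DataFrame,
                    min_season_games: int = 10,
                    min_career_games: int = 200) -> pd.DataFrame:
    """Approximate the SQL query: career games caught per player, then filter."""
    # Keep only player-seasons with enough games at catcher
    seasons = appearances[appearances['G_c'] >= min_season_games]

    # Sum games caught across the remaining seasons
    games = (seasons.groupby('playerID', as_index=False)['G_c']
                    .sum()
                    .rename(columns={'G_c': 'gamesCaught'}))

    # Attach biographical columns; an inner merge drops unmatched players
    bio = people[['playerID', 'nameFirst', 'nameLast', 'weight', 'debut']]
    out = games.merge(bio, on='playerID', how='inner')

    # Drop players with no listed weight or too few career games caught
    out = out[(out['weight'] > 0) & (out['gamesCaught'] >= min_career_games)]

    # Debut year is the first four characters of the debut date string
    out = out.assign(
        debutYear=pd.to_numeric(out['debut'].astype(str).str[:4], errors='coerce')
    )

    columns = ['playerID', 'nameFirst', 'nameLast',
               'weight', 'gamesCaught', 'debutYear']
    return out[columns].sort_values('debutYear').reset_index(drop=True)
```

The notebook code below starts from the exported catchers.csv rather than from this function.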

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
from os import path


DATA_DIR = '/Users/markrichard/Downloads'

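df = pd.read_csv(path.join(DATA_DIR, 'catchers.csv'))

# The query exports debutYear but no Decade column, so derive the
# debut decade here (e.g. 1987 -> 1980)
df['Decade'] = (df['debutYear'] // 10) * 10

# Remove 2020 partial decade
df = df[df['Decade'] != 2020]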

### MAKE DISTRIBUTIONS ###

# Create base plot
g = sns.displot(df, 
                x='weight',
                kind='kde',
                col='Decade',
                col_wrap=3,
                fill=True,
                common_norm=False,
                aspect = 1.75)

# Calculate and add average lines
decade_stats = df.groupby('Decade')['weight'].agg(['mean', 'count']).round(1)

for decade, ax in zip(g.col_names, g.axes.flat):
    if decade in decade_stats.index:
        mean_weight = decade_stats.loc[decade, 'mean']
        player_count = decade_stats.loc[decade, 'count']
        
        # Add vertical line at mean
        ax.axvline(mean_weight, color='red', linestyle='--', linewidth=2, alpha=0.8)
        
        # Add text annotation
        ax.text(0.02, 0.98, f'Avg: {mean_weight} lbs\nN: {player_count}', 
                transform=ax.transAxes, 
                verticalalignment='top',
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))

plt.tight_layout()
plt.show()

### MAKE AVERAGES SCATTER PLOT ###

plt.figure(figsize=(10, 6))

# Simple scatter with consistent sizing
plt.scatter(decade_stats.index, decade_stats['mean'], 
           s=100, 
           alpha=0.8, 
           color='steelblue',
           edgecolors='white',
           linewidth=2)

# Connect points with a line
plt.plot(decade_stats.index, decade_stats['mean'], 
         color='steelblue', 
         alpha=0.6,
         linewidth=2)

# Clean styling
plt.xlabel('Decade')
plt.ylabel('Average Weight (lbs)')
plt.title('Average MLB Catcher Weight by Debut Decade')
plt.grid(True, alpha=0.2)


plt.tight_layout()
plt.show()
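
To put numbers on the jumps visible in the line chart, one option is to difference the decade averages. A minimal sketch, assuming a DataFrame with the same Decade and weight columns as df above; the helper name decade_changes is hypothetical, and the values in the usage lines are toy numbers for illustration, not results from the Lahman data.

```python
import pandas as pd


def decade_changes(df: pd.DataFrame) -> pd.DataFrame:
    """Mean weight per debut decade plus the change from the previous decade."""
    # groupby sorts by Decade, so rows come out in chronological order
    stats = df.groupby('Decade')['weight'].agg(['mean', 'count'])
    # diff() subtracts the previous row's mean; the first decade has none (NaN)
    stats['change'] = stats['mean'].diff()
    return stats.round(1)


# Toy values for illustration only, not real player data
toy = pd.DataFrame({
    'Decade': [1930, 1930, 1940, 1940, 1950],
    'weight': [180, 184, 190, 194, 193],
})
print(decade_changes(toy))
```

Note that diff() compares each row with the one before it, so a decade with no qualifying catchers would make the next change span twenty years instead of ten.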
