Baseball’s best and worst umpires, by the numbers – Daily News
“It’s a bad strike zone,” I thought.
Could it have been the center-field camera angle on Monday night in San Diego? No, the looks on the players’ faces ruled that out. They were veteran hitters, AJ Pollock and Steven Souza Jr., and they winced back at plate umpire Jordan Baker. I could tell it was no ordinary strike zone. It was bad.
Testing this hypothesis, traditionally, is a pain in the neck. The quickest method is to open the MLB app or website. There you can see the pitches plotted against a two-dimensional strike zone, then cycle through the most memorable at-bats for each blatant call. Sometimes this offers proof of a misleading camera angle. It also invites a lot of confirmation bias. To avoid it, you can’t just search suspicious at-bats; you have to understand the umpire’s entire body of work on a given night.
The Statcast search engine helps. With a few clicks, it lets you pull up every pitch called a strike and every pitch called a ball after each game ends. It’s a decent tool for evaluating raw accuracy, but there is something to be said for an umpire’s consistency. Were Souza and Pollock looking over their shoulders on Monday because pitches that had previously been called balls were suddenly called strikes? That is difficult to assess visually with Statcast. You need something more.
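For readers who want to try the raw-accuracy check themselves, here is a minimal Python sketch. It assumes pitch records shaped like Statcast search results (a `description` outcome label plus `plate_x`, `plate_z`, `sz_top` and `sz_bot` in feet), and it assumes a pitch counts as a rulebook strike if any part of a ball roughly 2.9 inches across touches the zone. The constants and the set of "called" labels are approximations for illustration, not Statcast's or Ump Scorecards' own code.

```python
from dataclasses import dataclass

# Approximate constants (assumptions, not official Statcast values), in feet.
PLATE_HALF_WIDTH_FT = 17 / 2 / 12  # home plate is 17 inches wide
BALL_RADIUS_FT = 1.45 / 12         # a baseball is roughly 2.9 inches across

# Outcome labels treated as taken pitches the umpire had to call (assumed set).
CALLED = {"called_strike", "ball", "blocked_ball"}


@dataclass
class Pitch:
    description: str  # e.g. "called_strike", "ball", "swinging_strike"
    plate_x: float    # horizontal position at the plate, feet from center
    plate_z: float    # height at the plate, feet
    sz_top: float     # top of the batter's zone, feet
    sz_bot: float     # bottom of the batter's zone, feet


def in_rulebook_zone(p: Pitch) -> bool:
    """True if any part of the ball, not just its center, touches the zone."""
    return (abs(p.plate_x) <= PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT
            and p.sz_bot - BALL_RADIUS_FT <= p.plate_z <= p.sz_top + BALL_RADIUS_FT)


def raw_accuracy(pitches: list[Pitch]) -> float:
    """Share of called (non-swing) pitches whose call matches the rulebook zone."""
    called = [p for p in pitches if p.description in CALLED]
    if not called:
        raise ValueError("no called pitches")
    correct = sum((p.description == "called_strike") == in_rulebook_zone(p)
                  for p in called)
    return correct / len(called)


# Hypothetical pitches, for illustration only.
game = [
    Pitch("called_strike", 0.0, 2.5, 3.4, 1.6),    # middle-middle: correct strike
    Pitch("ball", 1.5, 2.5, 3.4, 1.6),             # far outside: correct ball
    Pitch("called_strike", 1.2, 2.5, 3.4, 1.6),    # outside: missed call
    Pitch("swinging_strike", 0.0, 2.5, 3.4, 1.6),  # a swing, so ignored
]
print(raw_accuracy(game))  # 2 of 3 called pitches correct, about 0.667
```

Swings are filtered out first because the umpire only rules on pitches the batter takes.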
Even access to the clubhouse after the game has its limits. Sometimes a player or manager has the ability to reflect honestly in the moment and the frankness to speak openly about it with reporters. I’ve heard a few variations on “I watched the video and I was wrong; it was a good call,” or “Despite my reaction, the strike zone was pretty consistent all night.” More often than not, you realize that your search for a reliable narrator shouldn’t start in the clubhouse, where emotions tend to color every opinion.
Ethan Singer, a student at Boston University, shared my frustration. With a background in computer science and statistics, he has a very useful skill set for seeking objective truth about balls and strikes. Last summer, the 19-year-old began writing a program that uses Statcast data to fill the knowledge gap.
The result is Ump Scorecards, a popular Twitter account whose content has recently migrated to a website of the same name. The Twitter account has nearly 90,000 followers, including major league players, coaches, managers, analysts, front office executives, scouts and broadcasters: just about anyone with skin in the game. Singer said some umpires have even contacted him independently as the account grew in popularity. Their feedback informed changes he made to his code. It also informed his opinion of umpires.
“On a small sample, the difference between 100 and 80 percent accuracy is probably not that big,” Singer said. “At some level, someone of the same ability could have those two scores given the same locations. It is just a very difficult task.”
Singer judiciously scores umpires for both consistency and accuracy. His data confirmed my initial suspicion about Baker’s strike zone Monday night in San Diego. At 91%, his accuracy was below the league average of 94%. But Baker was consistent on 96% of his calls, the average for a major league umpire.
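Consistency is harder to pin down than accuracy, and the method behind Singer’s consistency number isn’t spelled out here. As a plainly swapped-in stand-in, this Python sketch judges each call against the umpire’s own other calls that night: a call is “consistent” if it agrees with the majority of its k nearest called pitches. The nearest-neighbour rule, the input layout and the default k=3 are assumptions for illustration, not Ump Scorecards’ method.

```python
import math


def consistency(calls: list[tuple[float, float, bool]], k: int = 3) -> float:
    """Share of calls that agree with the majority of their k nearest neighbours.

    calls: (plate_x, plate_z, is_strike) for every called pitch in one game.
    Each call is compared with the umpire's other calls, not the rulebook zone.
    """
    agree = 0
    for i, (x, z, strike) in enumerate(calls):
        # Distance from this pitch to every other called pitch in the game.
        others = sorted(
            (math.dist((x, z), (ox, oz)), other_strike)
            for j, (ox, oz, other_strike) in enumerate(calls)
            if j != i
        )
        nearest = others[:k]
        strike_votes = sum(s for _, s in nearest)
        # Strict majority; an odd k avoids ties (a tie is treated as a ball).
        majority_strike = 2 * strike_votes > len(nearest)
        agree += majority_strike == strike
    return agree / len(calls)


# Hypothetical game: strikes clustered over the plate, balls well outside,
# plus one ball called in the heart of the umpire's own zone.
calls = [
    (0.0, 2.5, True), (0.2, 2.6, True), (-0.2, 2.4, True), (0.1, 2.3, True),
    (0.05, 2.5, False),
    (1.5, 2.5, False), (1.6, 2.6, False), (1.4, 2.4, False),
]
print(consistency(calls))  # 7 of 8 calls agree with their neighbours: 0.875
```

The point of the design is that a pitch can be inconsistent without being inaccurate: the stray ball above is flagged only because the umpire called everything around it a strike.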
The most-viewed scorecards on Singer’s site aren’t always the worst ones. When we spoke last week, a June 6 game between the New York Yankees and the Boston Red Sox, called by Gabe Morales, was trending. Morales was on his game that night. He correctly called all 57 pitches thrown inside the rulebook strike zone as strikes. Only four balls outside the rulebook zone were called strikes (98% accuracy), and his consistency was even more remarkable: only one of the 115 pitches thrown was called incorrectly relative to the strike zone Morales established.
Think about the most memorable ball/strike calls from the last game you watched. Chances are, an umpire’s worst calls come to mind more easily than the best. I think this helps explain why the most accurately called games are the most popular on Singer’s website and Twitter feed. Accuracy is hidden from view, a surprise until the data is laid out before our eyes. Inaccuracy is more obvious.
“Usually the majority of the trending games on the site are trending because they are highly accurate,” Singer said.
For all that Ump Scorecards says about fans, it says even more about the umpires themselves.
The average umpire is 94% accurate to the rulebook strike zone. Of the umpires who had called at least 10 games this season through Tuesday, only two have a season accuracy greater than 95.0%: Tripp Gibson and Alan Porter. The least accurate are Rob Drake and Ron Kulpa (both 91.5%). The most consistent is Pat Hoberg (98.3%); the least consistent is Hunter Wendelstedt (92.9%).
This might help explain why we hear players complain about consistency more than rulebook accuracy: not only is consistency more essential to a player’s understanding of the strike zone, but it also tends to vary more from game to game.
Ump Scorecards also quantifies bias for or against specific teams. The most favored team in baseball? The Texas Rangers. The least? The Dodgers.
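Ump Scorecards’ bias figures come from its own model. As a much simpler illustration of the idea, one could tally which team benefited from each missed call. In this hypothetical Python sketch, a wrong called strike helps the fielding team and a wrong called ball helps the batting team; counting every miss equally, rather than weighting it by how much it might have changed the game, is a simplification, and the team abbreviations are made up for the example.

```python
from collections import Counter


def net_favor(missed_calls: list[tuple[str, str, bool]]) -> Counter:
    """Net missed calls in each team's favor (positive = favored).

    missed_calls: (batting_team, fielding_team, was_called_strike) for each
    call that disagreed with the rulebook zone.
    """
    favor: Counter = Counter()
    for batting, fielding, called_strike in missed_calls:
        # A bad strike call hurts the hitter; a bad ball call hurts the pitcher.
        helped, hurt = (fielding, batting) if called_strike else (batting, fielding)
        favor[helped] += 1
        favor[hurt] -= 1
    return favor


# Hypothetical misses from one game.
misses = [
    ("LAD", "SD", True),   # LAD hitter rung up on a pitch off the plate
    ("SD", "LAD", False),  # SD hitter gets a ball on a pitch in the zone
    ("LAD", "SD", True),
]
favor = net_favor(misses)
print(favor["SD"], favor["LAD"])  # 3 -3
```

Summed over a season, a tally like this would show which clubs tend to come out ahead on the umpires’ mistakes.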
This affirms what has been called the “compassion strike zone.” The data shows that the strike zone shrinks to its smallest area when a batter is behind in an 0-and-2 count, and expands when a pitcher is behind 3-and-0. Human compassion regularly enters the umpiring equation, favoring the underdog, so teams that are best at throwing strikes should, in theory at least, receive the least compassion from plate umpires. By their records, the Dodgers (44-29) and Rangers (26-47) are representative opposites.
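One rough way to see the compassion strike zone in data is to group called pitches by count and measure how much area the umpire treats as a strike. In this Python sketch, called pitches are binned into square grid cells, and a cell counts toward a count’s zone when most of the called pitches in it were strikes. The 3-inch cell size, the input layout and the sample pitches are assumptions for illustration, not how Ump Scorecards measures zone size.

```python
import math
from collections import defaultdict


def zone_area_by_count(calls, cell_ft: float = 0.25) -> dict:
    """Approximate called-strike area (square feet) for each ball-strike count.

    calls: iterable of (balls, strikes, plate_x, plate_z, is_strike) tuples.
    """
    # tally[count][cell] = [called strikes, called pitches] in that grid cell
    tally = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for balls, strikes, x, z, is_strike in calls:
        cell = (math.floor(x / cell_ft), math.floor(z / cell_ft))
        counts = tally[(balls, strikes)][cell]
        counts[0] += is_strike
        counts[1] += 1
    # A cell is "in the zone" for a count if most of its calls were strikes.
    return {
        count: sum(1 for s, n in cells.values() if 2 * s > n) * cell_ft ** 2
        for count, cells in tally.items()
    }


# Hypothetical calls: the same edge pitch is a ball at 0-2 but a strike at 3-0.
calls = [
    (0, 2, 0.1, 2.6, True), (0, 2, 0.8, 2.6, False),
    (3, 0, 0.1, 2.6, True), (3, 0, 0.8, 2.6, True),
]
print(zone_area_by_count(calls))  # {(0, 2): 0.0625, (3, 0): 0.125}
```

With real data, many more pitches per count would be needed before the cell-by-cell majorities mean much.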
What does all of this mean for the future of balls and strikes called by humans? I put the question to Singer. He favors MLB instituting an automated strike zone if the technology can improve on human accuracy. He is also trying to integrate historical PITCHf/x data into his website to create a multi-year database.
The data generally gets less accurate the older it is. Even the most recent data uses a two-dimensional plot of a three-dimensional ball flying through a three-dimensional strike zone. (Singer carefully calls his work a “best guess.”) But the general trend line is clear, and it brings the debate over automated balls and strikes into focus.
Statcast records pitches wrongly called balls after they sliced through the middle of the plate, and pitches thrown extremely high, low or wide of the plate that were wrongly called strikes. These are the most glaring and memorable mistakes a plate umpire can make. They have fallen sixfold since 2008. We don’t know how consistent those calls were with the strike zone established for each game (I’ll leave that to Singer), but today’s umpires are far more accurate than they were 12 years ago. That much seems obvious.
The question is, how much more accurate can humans get at this task? If they can’t reach 100%, how much accuracy is enough? A generation ago, 94% might have been an impossible dream, but umpires have made it happen. Thanks to Singer, that has never been clearer.