My earliest memory of playing soccer — sorry, but “football” meant something very different in Fort Worth, Texas — is when I was four years old. We played a sophisticated positional game that involved a swarm of toddlers running wherever the ball went and kicking each other in the ankles for, you know, tactical reasons.
Feeling a little overwhelmed by it all, I came up with a plan. The next time the ball bounced my way, I sat down on it and refused to move. Toddlers swarmed. Ankles were kicked. Didn’t matter, I wasn’t budging. Finally the poor ref, whose training hadn’t equipped him for this, whistled the first foul he could think of: a handball, pretty much the only body part I wasn’t actually touching it with. Where’s VAR when you need it?
So yeah, fine, I was never a good player. But I think the ball-squatting impulse came from something that’s still true about me, which is that I’ve always wanted to slow down this complex sport and make it make sense. There’s too much to keep track of. No matter how smart you are or how closely you pay attention, it’s impossible to hold even one full match in your head, let alone thousands.
That’s why I like football data.
The truth is, though, I wasn’t always an analytics guy. I’m not sure people even used that word around the time I fell in love with Pep Guardiola’s Barcelona, the first team that organised the game in a way that got me excited about it. Xavi and Andres Iniesta looked like they could stop time in their heads, work out a dozen possible lines of play like old men studying a chessboard in the park, and pick the right pass before ever touching the ball. I desperately wanted to understand how they did it.
I watched so many games. I watched them obsessively while living in Mexico City, Brazil, New York, and along the Texas border. I watched them on dive bar TVs, on illegal Russian streams, in plazas packed with tens of thousands during the World Cup. I downloaded every game I could find and watched them over and over again. I read endlessly about the sport. One night in Ciudad Juarez, several Tecates deep, a friend who was sick of listening to me talk about football said, “Man, you should do this for a living.”
Instead of taking good advice, I went to law school. Maybe I was still dreaming of tracking down that kids’ referee and appealing the unjust handball to CAS or something, I don’t know. But it was while I was a bored law student that I first got my hands on event data.
The stats you’ve seen don’t start their lives as numbers. Most football data consists of “events,” line-by-line records of everything notable that happens on the ball during a match. Actual humans watch the games and punch in when and where every pass or shot or tackle or whatever happened, who did it, what the outcome was, and so on.
There’s a lot that you can do with that information. At first, people counted events: Manchester United took 14 shots; Paul Scholes completed 88 per cent of his passes. That beats trying to remember everything, but it doesn’t always mean a whole lot on its own. Some passes are harder than others; some shots are more likely to score. Critics rightly pointed out that football is a lot messier than, say, baseball, and numbers need context and careful interpretation.
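For the curious, here is roughly what counting events looks like in practice. This is a minimal sketch, not any data provider's real format: the field names (`team`, `player`, `type`, `outcome`) and the handful of records are made up for illustration.

```python
from collections import Counter

# Made-up event records. The field names are hypothetical,
# not the schema of any actual data provider.
events = [
    {"minute": 3, "team": "Home", "player": "A", "type": "pass", "outcome": "complete"},
    {"minute": 4, "team": "Home", "player": "A", "type": "pass", "outcome": "incomplete"},
    {"minute": 4, "team": "Away", "player": "C", "type": "tackle", "outcome": "won"},
    {"minute": 9, "team": "Home", "player": "B", "type": "shot", "outcome": "saved"},
    {"minute": 12, "team": "Home", "player": "A", "type": "pass", "outcome": "complete"},
    {"minute": 20, "team": "Away", "player": "C", "type": "shot", "outcome": "goal"},
    {"minute": 31, "team": "Home", "player": "B", "type": "shot", "outcome": "off target"},
    {"minute": 40, "team": "Home", "player": "A", "type": "pass", "outcome": "complete"},
]


def shots_by_team(events):
    """Count shot events for each team."""
    return Counter(e["team"] for e in events if e["type"] == "shot")


def pass_completion(events, player):
    """Return the share of a player's passes that were completed, or None if no passes."""
    passes = [e for e in events if e["type"] == "pass" and e["player"] == player]
    if not passes:
        return None
    completed = sum(e["outcome"] == "complete" for e in passes)
    return completed / len(passes)


print(shots_by_team(events))
print(pass_completion(events, "A"))
```

Notice that nothing here knows how hard any of those passes were or how good the shots were, which is exactly the critics' point.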
One way to add context to data is to build models, and you’ve probably seen some of those too. Instead of just counting up shots, expected goals (xG) models use information about where and how each shot was taken to estimate its likelihood of scoring. That’s useful. A player’s xG is typically a better measure of his goalscoring talent than actual goals, for example, since shots happen a lot more often than goals and this sport is full of weird bounces and luck that cause problems for small samples.
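To make that concrete, here is a toy sketch of the idea. Everything in it is an assumption for illustration: the coefficients are invented rather than fitted, the logistic form is one common modelling choice rather than any particular provider's method, and real models are trained on many thousands of shots with far more information than distance and whether the shot was a header.

```python
import math

# Invented coefficients, for illustration only. A real model would
# estimate values like these from a large set of historical shots.
INTERCEPT = -0.5
DISTANCE_COEF = -0.12  # effect of each extra metre from goal
HEADER_COEF = -0.8     # assumes headers are harder to score than kicked shots


def shot_xg(distance_m, is_header=False):
    """Estimate a scoring probability between 0 and 1 with a logistic function."""
    z = INTERCEPT + DISTANCE_COEF * distance_m + (HEADER_COEF if is_header else 0.0)
    return 1.0 / (1.0 + math.exp(-z))


# Made-up shots for one hypothetical player.
shots = [
    {"distance_m": 8.0, "is_header": False, "goal": True},
    {"distance_m": 22.0, "is_header": False, "goal": False},
    {"distance_m": 6.0, "is_header": True, "goal": False},
]

# A player's xG is the sum of the probabilities of his shots.
total_xg = sum(shot_xg(s["distance_m"], s["is_header"]) for s in shots)
goals = sum(s["goal"] for s in shots)

print(f"goals: {goals}, xG: {total_xg:.2f}")
```

The comparison between the two totals is the interesting part. Over a handful of shots they can differ a lot, which is the small-sample problem described above.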
If you read The Athletic regularly, this isn’t news to you. Mark Carey and the gone-but-not-forgotten Tom Worville have done a great job making analytics more familiar to a mainstream audience. They’ve even introduced readers to the kind of model I think you’ll be hearing a lot about next: expected threat and others like it, which estimate the probability of a goal at any moment in a game. Football is so much more than what happens around the box, and to get the most out of data we have to look beyond shots.
To be honest, football is more than any kind of data, since the records we have don’t come close to capturing everything that happens on the pitch. Stats still need context and interpretation. They also need stories, because spreadsheets are boring and football should be fun. Good data journalism brings all these things together: curiosity, context, storytelling, and facts (or as close to them as we can get). The goal is to learn things about the game and have a good time doing it.
A little over a year ago I left the law and launched a newsletter called space space space that covered everything from data and tactics to how to find your midfield personality type. Spending all my time watching and writing about football has been the best job I’ve ever had. But journalism works best as a team sport, and when I got a chance to work with so many of the best writers in the business here at The Athletic, I leapt at it. In this job, I won’t just be digging through numbers for my own amusement but for yours, too: all of you, hundreds of thousands of subscribers, no matter which team you support. That’s pretty cool.
I don’t think there’s another publication out there with the scale to cover this massive sport and an interest in doing it with the kind of depth and nuance that The Athletic does. “Senior Football Analytics Writer” isn’t a job you see every day. I hope it’s a job that can help make almost anything you read from us that much better.
After all, there’s a lot happening on football pitches all over the world. Wouldn’t it be nice sometimes to sit down and take the measure of it all?
(Top photo: Getty Images; design: Sam Richardson)