Ten years ago, on Nov. 15, 2009, one fourth-down decision and the analysis that followed changed the NFL as we knew it.
Two of the greatest quarterbacks in league history were playing in prime time, as Tom Brady’s New England Patriots led Peyton Manning’s undefeated Indianapolis Colts by six points with just over two minutes to play. The Patriots found themselves in a fourth-and-2 from their own 28-yard line. That’s an automatic punt, right? Not for coach Bill Belichick. Rather than give the ball back to an in-his-prime Manning and an Indianapolis offense that averaged 363.1 yards per game that season, Belichick chose to try to pick up the first down.
Brady swung a short pass out right to Kevin Faulk, but the running back gained only one yard, coming up short and turning the ball over. The Colts got the ball back, scored in four plays and won the game 35-34.
“We tried to win the game on that play,” Belichick told reporters afterward. “I thought we could make the yard. We had a good play, we completed it. I don’t know how we couldn’t get a yard.”
Fans were shocked. Sports media unleashed on Belichick, condemning him for his brazen and highly unconventional decision. Many people just didn’t understand it. But the numbers proved Belichick actually made the right call.
Let’s back up just a little bit. For decades, analysts tried unsuccessfully to crack the puzzle of football. We were able to do a few basic things, like make pretty good team-ranking systems and, from those, game predictions. We were also beginning to understand how much more important passing is than running the ball. But that was pretty much it — until a breakthrough in the 2009 offseason. It was then, due to a confluence of data, modern modeling techniques and computing power, that we were first able to successfully build what’s known as a win probability (WP) model. A WP model tells you how likely each team is to win a game considering all the critical factors of time, score, down, distance and yard line, along with additional circumstances.
When looking at Belichick’s choice, we see the numbers side with what he did. Going for it presented the Patriots with a 79% win probability, based on conversion probability and the potential of giving up a resulting touchdown if they came up short. Punting, however, would have yielded just a 70% win probability when factoring in average net punt yardage and the likelihood of the Colts scoring on the possession thereafter. Belichick’s decision gave him an extra 9 percentage points of win probability.
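The comparison above is an expected-value calculation, and a minimal sketch of it might look like the following. The three input probabilities are illustrative assumptions chosen so the arithmetic lands on the 79% and 70% figures quoted here; they are not stated in this article. The sketch also simplifies by treating a conversion as a near-certain win and a Colts touchdown as a loss.

```python
def go_for_it_wp(p_convert, wp_if_converted, wp_if_failed):
    """Expected win probability of a fourth-down attempt."""
    return p_convert * wp_if_converted + (1 - p_convert) * wp_if_failed


# Illustrative assumptions, not values taken from the article.
P_CONVERT = 0.60               # chance of converting fourth-and-2
P_COLTS_TD_SHORT_FIELD = 0.53  # Colts score a TD after a failed attempt
P_COLTS_TD_AFTER_PUNT = 0.30   # Colts score a TD after an average punt

# Simplification: converting effectively ends the game (WP = 1.0),
# and the Patriots win whenever the Colts fail to score a touchdown.
wp_go = go_for_it_wp(P_CONVERT, 1.0, 1 - P_COLTS_TD_SHORT_FIELD)
wp_punt = 1 - P_COLTS_TD_AFTER_PUNT
edge_points = (wp_go - wp_punt) * 100

print(f"go: {wp_go:.0%}, punt: {wp_punt:.0%}, edge: {edge_points:.0f} points")
```

Under these assumed inputs, the attempt comes out ahead even though it fails 40% of the time, because the punt still leaves Manning a realistic chance to score.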
Until that November night, nobody in football cared about analytics. Coaches robotically followed the tired convention of kicking or punting on every fourth down until it was painfully obvious that there was no other alternative. But then the “Belichick fourth-and-2” happened, and people started paying attention to the numbers. Suddenly, fourth downs became synonymous with analytics. Teams started adding consultants and full-time analysts to help them make better decisions using win probability. The dam had broken, and the conversation changed overnight, as The Ringer’s Kevin Clark wrote on Wednesday. It was the first time that modern analytics could be shown to truly impact the way the game was played. But the real story is much more complicated.
So how have coaches changed their approaches to fourth downs over the past several years, and what impact has it had on the game? Let’s answer some of the bigger questions about the win probability model, final-down decision-making and what it all means.
What impact did Belichick’s decision have around the league?
Teams had already been evolving on fourth downs long before 2009. And there was actually no abrupt change in fourth-down behavior following Belichick’s fateful decision. Decision-making continued to gradually improve, and for the league as a whole, it has really been in only the past two to three seasons that the evolution has accelerated.
It’s more difficult to measure how well teams do with fourth-down decisions than you might imagine. It’s tempting to simply measure how often teams choose to go for it rather than kick or punt, but that can be misleading. Every fourth down is unique in terms of situation, and each one must be analyzed in context. All other things being equal, for example, going for it with four yards to go is a different decision than with one yard to go, and going for it up by four points is a different decision than going for it while trailing by four.
To more reasonably assess final-down decision-making, I devised “win probability forfeit” a few years ago. It analyzes every fourth down and compares the win probability if a team goes for it against the win probability if they kick or punt. It spits out WP forfeit, or the difference in win probability if a team makes the wrong decision. In other words, it’s how much a team cost itself in terms of chance to win the game by making the wrong call. It also allows us to see how fourth-down decision-making has changed over the years.
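As a rough sketch of how WP forfeit could be tallied, the snippet below scores each fourth down as the gap between the better option and the option actually chosen. The `FourthDown` record, its field names and the sample plays are hypothetical; in practice the two win probabilities would come from a WP model, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class FourthDown:
    """Hypothetical record for one fourth-down play."""
    wp_go: float       # model win probability if the team goes for it
    wp_kick: float     # model win probability if it punts or kicks
    went_for_it: bool  # what the coach actually chose


def wp_forfeit(play):
    """Win probability given up by the choice; 0.0 when the call was right."""
    chosen = play.wp_go if play.went_for_it else play.wp_kick
    best = max(play.wp_go, play.wp_kick)
    return best - chosen


# Made-up plays for illustration only.
plays = [
    FourthDown(wp_go=0.79, wp_kick=0.70, went_for_it=True),   # right call
    FourthDown(wp_go=0.55, wp_kick=0.52, went_for_it=False),  # forfeits 0.03
    FourthDown(wp_go=0.40, wp_kick=0.46, went_for_it=False),  # right call
]

total_forfeit = sum(wp_forfeit(p) for p in plays)
print(f"total WP forfeit: {total_forfeit:.2f}")
```

Summing these per-play values over a season gives a total measured in games, which is the unit used in the year-by-year figures.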
How significant are the changes in fourth-down decision-making?
In 2001, teams on average forfeited about a half-game per season by choosing to kick and punt too often. That might not sound like much, but an extra win every other season could turn an eight-win also-ran team into a wild-card team, or a division winner into a 1- or 2-seed.
By 2018, though, the average team halved its error, forfeiting only a quarter of a game per year (or one win every fourth season). And in 2019? The league is on pace to forfeit an average of less than a fifth of a game. That’s a good deal of progress.
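The "one win every N seasons" framing is just the reciprocal of the per-season forfeit. A small sketch, using the rounded figures quoted above (the 2019 value is an upper bound, since the text says less than a fifth of a game):

```python
def seasons_per_lost_win(games_forfeited_per_season):
    """How many seasons it takes for the forfeit to add up to one full win."""
    return 1 / games_forfeited_per_season


# Rounded per-season forfeits quoted in the text; 2019 is an upper bound.
season_forfeit = {2001: 0.50, 2018: 0.25, 2019: 0.20}

for year, games in season_forfeit.items():
    print(f"{year}: one win every {seasons_per_lost_win(games):.0f} seasons")
```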
Total Win Probability (WP) Forfeit on 4th downs by year. Teams have been improving steadily since at least 2001.
While it’s tough to tell due to some year-to-year variance in the totals from the relatively small sample sizes a 16-game season provides, it appears that the decline in WP forfeit has been accelerating since 2017. The projection for 2019 would be the lowest WP forfeit in the data set.
How have the decisions changed?
The magnitude (or cost) of each error also continues a steady decline. Coaches’ improvement has typically occurred in the most beneficial situations. When coaches do the wrong thing, it costs them an average of 0.8% WP forfeit in 2019. In 2001, that was over 1.1%. (In other words, on average every time a coach punts or kicks when he should have gone for it, it costs his team about a 1% chance to win the game.)
In non-obvious situations, or the situations other than when coaches simply have no alternative but to go for it because of time and score, the trend in error rate follows a path similar to that of WP forfeit, but you could make a good case that the declining trend there really began in 2010.
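The two measures discussed here, the average cost per error and the error rate in non-obvious situations, can be sketched as follows. The tuple layout, the `is_obvious` flag and the sample values are hypothetical; how a situation gets classified as obvious is an assumption left outside this snippet.

```python
def error_rate_non_obvious(plays):
    """Share of non-obvious fourth downs where the wrong call was made.

    plays: list of (wp_forfeit, is_obvious) tuples (hypothetical layout).
    """
    non_obvious = [forfeit for forfeit, is_obvious in plays if not is_obvious]
    if not non_obvious:
        return 0.0
    errors = [forfeit for forfeit in non_obvious if forfeit > 0]
    return len(errors) / len(non_obvious)


def mean_cost_per_error(plays):
    """Average WP forfeit across the plays where the wrong call was made."""
    costs = [forfeit for forfeit, _ in plays if forfeit > 0]
    return sum(costs) / len(costs) if costs else 0.0


# Made-up season sample for illustration only.
sample = [
    (0.000, False),
    (0.012, False),
    (0.000, True),   # forced to go for it by time and score
    (0.008, False),
    (0.000, False),
    (0.010, False),
]

print(f"error rate: {error_rate_non_obvious(sample):.0%}")
print(f"mean cost per error: {mean_cost_per_error(sample):.1%}")
```

Keeping the two numbers separate matters because a team can cut its error rate in low-stakes spots without much change in total WP forfeit.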
‘Non-Obvious’ Error rates on 4th downs by Season. It’s possible that Belichick’s 4th and 2 decision 10 years ago sparked an improvement.
Personally, I think the noise in the data prevents us from drawing that conclusion, especially because the WP forfeit measure shows a much steadier decline through that same period. But perhaps the most plausible explanation is that following 2009, some teams started becoming more aggressive on fourth down but predominantly in inconsequential situations — such as when the outcome had all but been decided.
Are these developments the same from team to team?
No, the improvements in fourth-down decision-making are not shared equally among teams. Some teams were quicker to adopt better approaches to decision-making than others, while some have yet to show any change. And a few could even be seen as going in the wrong direction.
Although nearly all teams now have an analytics staff, that doesn’t mean game-day decisions are driven by analytics. According to the trends we’ve seen in WP forfeit, here is how I’d rate teams over the past few seasons in their use of analytics in decision-making:
Early adopters: Browns, Chiefs and Ravens
Improving: Bengals, Bills, Buccaneers, Dolphins, Falcons, Packers and Steelers
Late to the party: Colts, Eagles, Jets, Raiders, Texans, Vikings and 49ers
Not improving: Bears, Broncos, Cardinals, Chargers, Giants, Lions, Panthers, Patriots, Redskins, Saints, Seahawks and Titans
Going the wrong way: Cowboys, Jaguars and Rams
That’s just my interpretation of the trend lines, which are admittedly noisy after dividing up the data into 32 slices. I know many of the teams listed under “Not improving” or “Going the wrong way” have made some strides toward embracing analytics, but it might just take some time before it shows up on the field.
So why were fourth-down decisions only slowly improving before teams started seriously analyzing them?
I think teams have always very slowly adapted to changes in the game through intuition and trial and error, but that leaves them years, sometimes decades, behind the optimum. Football is an extremely complex sport. The higher the degree of complexity in a system, the more conservative the decision-makers need to be. The complexity makes it impossible to directly solve for the optimum solutions, so change is evolutionary. People rely on tried and true rules of thumb, passed down from the generation above.
The WP model and the general approach fostered by an analytical mindset have changed that. The optimum can now be solved for. Although there might not have been an immediate shift in team behaviors following the Belichick fourth-and-2, the adoption of analytics around the league really began following the discussion surrounding the event. And 10 years later, the recent acceleration in fourth-down improvements is a direct consequence of that fateful decision.