A Simple, New Fantasy Hockey Scoring System

Abstract: Based on the analysis of the relationship between common fantasy hockey stats and actual team success, I devise a new fantasy hockey scoring system that leaves out stats that do not have an obvious contribution to success. The result is a simple system that will lend itself to new fantasy players.

A hockey game, like most sports, is won or lost because of one thing: goals. Your opponent can out-hit, out-shoot, and out-skate you, but the only category that matters in the standings is how many times you and the other team put the puck in the net.

In our age of non-stop stats though, this gets lost. Turn on any given game on TV and you will be informed of power play percentage, how many shots each player has blocked, or how long any given player has spent on the ice. We largely fail to realize how much any of these numbers actually leads to goals and ultimately the team points that determine the standings.

This is no different in fantasy hockey, where hits, penalty minutes, and shorthanded points all matter more than perhaps they should. Leagues that use category scoring take one step forward by dividing stats up, but two steps back by giving goals as much weight as faceoff wins—both are important, but they are not equally important.

I wanted to create a scoring system that weights stats based on how often they actually occur relative to each other, leaves out all the stats that lack a strong connection to team points, and is simple.

Goalies

To get a better sense of what we are looking at, let’s start with goalies. A goalie’s job is simple, at least to describe: Stop the puck from going into the net. This is why our new scoring system has just two categories: Goals Allowed and Saves.

Most leagues use goalie wins, shutouts, and losses too. Aren’t those important?

Wins and losses are important, but they are largely beyond a goalie’s control. Like wins and losses for pitchers in baseball, goalies are at the mercy of their team’s offense if they want to earn a win. A goalie who allows two goals on forty shots had a great night regardless of whether his team won or lost, so why should he receive extra points (or have them taken away) because of how his team’s offense (and the other goalie) performed at the other end of the rink?

I’m still not sold, isn’t winning the game the point, regardless of how many goals are allowed? Why is this better than current systems?

The standard goalie points system on ESPN will give a goalie .2 points per save, five points for a win, three for a shutout, and -1 for each goal allowed. This means an average night for a goalie (3 goals and 27 saves) will be worth either 2.4 or 7.4 points. That five-point swing has nothing to do with anything the goalie did (or failed to do). In comparison, an offensive player would get five points for a goal and an assist, so we are giving/docking a goalie the equivalent of a good night on offense for things beyond his control. You might as well flip a coin for his five points.*

Note: It is not quite a coin flip over the whole season. It could be argued that goalies on better teams will get more wins, so a team’s offensive quality should be taken into consideration when drafting a goalie. This is true, but it works against the reason many play fantasy sports, where individual players and not teams are drafted. A player’s team will always affect his stats in some way, but we should work to diminish these effects, not amplify them. Simply put, it is not fair to punish a goalie because his teammates cannot score.

Under ESPN’s settings, if a goalie stops all 30 shots they face during a game, they will not only get the six points for their saves (a two-goal night for offensive players), they’ll get five more points for the win, and three more on top of that for the shutout. That’s a 14-point night. A shutout for a goalie is great, but for an offensive player to match that he would need to score four goals and have an assist. There are far more shutouts than there are four-goal, one-assist games. A good goalie is a valuable asset to a team, but few, if any, are worth more than the top offensive stars in the game.

So what will goalie points be worth?

On an average night an NHL team will score 2.7 goals on 30 shots. Therefore, win or loss, the average goalie performance is three goals allowed (we’ll round up, which will benefit the goalies in the long run) and 27 saves. We can set such a game equal to zero fantasy points, which conveniently means that we can make each save worth one point and each goal allowed worth -9.

That may appear drastic at first, but remember, nothing hurts your chances to win a game more than allowing a goal, and it is after all, the goalie’s sole job. It also means a shutout does not need an extra bonus—without any points taken away, the goalie is going to earn a lot of points. A bad night, say five goals allowed, is not going to be pretty. On the other hand, a goalie on a team that gives up a lot of shots will reap the benefits of having a good game with so many extra saves.
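To make the comparison concrete, here is a minimal Python sketch of both goalie formulas. The ESPN values are the ones quoted above (0.2 per save, 5 per win, 3 per shutout, -1 per goal allowed). The function and parameter names are made up for illustration and do not come from any fantasy platform, and treating every zero-goal game as a shutout is my simplifying assumption.

```python
def espn_goalie_points(saves, goals_allowed, win=False):
    """Goalie points under the ESPN values quoted in this article."""
    points = 0.2 * saves - 1 * goals_allowed
    if win:
        points += 5
    if goals_allowed == 0:
        # Assumption: any game with zero goals allowed earns the shutout bonus.
        points += 3
    return points


def new_goalie_points(saves, goals_allowed):
    """Goalie points under the proposed system: +1 per save, -9 per goal allowed."""
    return 1 * saves - 9 * goals_allowed


# The "average night" from the article: 27 saves, 3 goals allowed.
print(f"{espn_goalie_points(27, 3):.1f}")            # 2.4
print(f"{espn_goalie_points(27, 3, win=True):.1f}")  # 7.4
print(new_goalie_points(27, 3))                      # 0
```

Under the new values the result depends only on saves and goals allowed, so the five-point swing for the win disappears.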

If you don’t understand the intuition behind why goalie stats are what they are, you should probably just stop reading now before the offensive stats make your head explode.

Offense

Offensive stats are much more difficult to value for a number of reasons. You can analyze a baseball game batter-by-batter and if you have enough games, you can figure out how much each play contributed to a team’s likelihood to win the game. And those numbers can be used to create a fantasy league with points based on reality. Hockey is much tougher to break down play-by-play. Does a player help his team more by dumping a puck in or trying to pass it to his teammate? There are more questions than we can currently track to come up with a good answer: what is the score, how much time is left, does his team need a line change, what is the chance the pass is picked off?

What Should Count for Offensive Players?

Before I started handing out points for anything, I wanted to see how strong a connection there is between team success and the common fantasy stats (see my detailed breakdown here). Scoring goals, for example, is good. Other than goal differential, goals scored has a stronger correlation to team points than any other common stat, which is, of course, exactly what you would expect. Goals against are second.

Perhaps more interesting is the lack of connection we find between team points and hits, blocked shots, or penalty minutes—which are all common fantasy stats. This does not mean that taking a penalty is good, but taking more penalties will not automatically make you a bad team. There is also a very weak correlation between blocked shots and goals allowed. Announcers often praise blocking shots—and blocking a shot is better than not blocking it—but the best teams are not blocking shots, they are preventing the other team from getting a shot off in the first place.

Are You Really Saying Blocked Shots and Hits Don’t Help a Team?

Not at all. I am saying that hockey players do things whose value cannot be quantified accurately. Team strategy plays a huge role in what a player’s stats will look like. If a team never hit anyone, took a penalty, or blocked a shot… well, it would be interesting. At first glance I would not expect such a team to do that well, but it is possible they would be masters of puck control and a success. I would love to see someone try it.

So for this league, we are going to stick with four categories we know have a stronger correlation to a team’s chance of winning: Goals, assists, faceoff wins, and shots.* We will also be docking players for each game they play. There is more explanation on all that in a moment. Before that, let’s talk point values.

Note: Correlation doesn’t mean causation, but I would be shocked if any of these do not cause a team to be more successful.

OK, Let’s Talk About Point Values…

Coming up with point values is not an exact science, but relative to each other they fit pretty well. We made a goal against -9 points for goalies, and we will stick to nine points per goal for offensive players. We will work with that nine-point baseline for everything else.

Scoring is not evenly distributed throughout all of the players. There were almost 18,000 goals and assists in 2013-14, but half of them came from the top 20% of the players. Coincidentally the top 20%, or about 160 players, will be about the size of our fantasy league.

Because our league will be so reliant on goals and assists, it will be forward dominated. This is a big reason why most leagues include blocked shots, hits, and PIMs in the first place: without them, most defensemen would be valueless, which is more or less what they will be in this league. In reality, defensemen are of great importance, but that importance simply does not translate to stats very well. As I mentioned, the best teams are preventing shots from being taken in the first place, not blocking them. But there is no stat that tells us how many shots a player has prevented.

Rather than making up arbitrary values for them,* we are going to leave them out. So throwing out the defensemen leaves us with 12 forwards per game. An average game (3 goals on 30 shots) means that the average game for a forward will consist of 0.25 goals on 2.5 shots with 0.425 assists.

Note: By including blocked shots, hits, and PIMs we may actually be skewing their importance in reality. Again, the correlation between these stats and team success is non-existent.

Because there are 1.7 times as many assists as goals, we’ll make goals 1.7 times the value of assists: That’s about 5.3 points per assist… ok, we’ll make it five points because round numbers. Like goals, shots will retain the same value they have against goalies, which is one point each. There are twice as many faceoffs in an average game (60) as there are shots (30), so we’ll make a faceoff win 0.5 points and a faceoff loss -0.5 points (those numbers may sound off, but that’s shots on goal, not shots attempted).
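The weighting rule in that paragraph is simple: the more often something happens, the less each occurrence is worth. Here is a rough Python sketch of that arithmetic, using only the figures quoted above (9 points per goal, 1.7 assists per goal, 30 shots and 60 faceoffs per game). The helper name `scale_by_frequency` is hypothetical.

```python
GOAL_VALUE = 9          # points per goal, carried over from the goalie side
SHOT_VALUE = 1          # points per shot on goal
ASSISTS_PER_GOAL = 1.7  # assists handed out for every goal
SHOTS_PER_GAME = 30     # shots on goal, as quoted in the article
FACEOFFS_PER_GAME = 60  # faceoffs, as quoted in the article


def scale_by_frequency(base_value, base_count, other_count):
    """Value of the more frequent event, shrunk by how much more often it occurs."""
    return base_value * base_count / other_count


assist_raw = scale_by_frequency(GOAL_VALUE, 1, ASSISTS_PER_GOAL)
faceoff_raw = scale_by_frequency(SHOT_VALUE, SHOTS_PER_GAME, FACEOFFS_PER_GAME)

print(round(assist_raw, 2))  # 5.29, which the article rounds down to 5
print(faceoff_raw)           # 0.5
```

The rounding to whole or half points is a judgment call for simplicity, not something the math forces.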

Remember our average game for goalies was worth zero points, which means that our aforementioned average game for an offensive player should be the same. This means that we need to add our average values: 0.25 goals, 0.425 assists, and 2.5 for shots to get an average game of 3.175 points… er, three because round numbers. (Faceoffs are left out because a 50% game would be average and worth zero points.) Therefore each game played will be worth -3 points.

League Settings

Because games can have such dramatic swings in points, particularly for goalies, it may not be best to do a head-to-head weekly format with these point values. I prefer roto leagues, as all teams finish based on their point totals rather than who they happened to play on any given week, which can lead to fluky (and frustrating) outcomes.

A team consisting of about 12 offensive players and 2 goalies (the typical NHL team size) should make things simple enough for newer players to grasp and it will mean all of the players will have positive point totals on the year.

Conclusion

This fantasy league would obviously not be right for everyone, but like Reese’s Peanut Butter Cups, there is no right way to fantasy sport. Some defensemen will still be worth having on your team, but they will largely be irrelevant. The positive is that it is simple to understand and the point values are based more in reality than most leagues, which may mean newcomers to fantasy hockey and those with more knowledge of offensive stars will enjoy it more than traditional systems.

Point Values

Goalies

  • -9 per goal allowed
  • 1 per save

Skaters

  • 9 per Goal
  • 5 per Assist
  • 1 per Shot on Goal
  • .5 per faceoff win
  • -.5 per faceoff loss
  • -3 per game played
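For anyone who wants to try these values on real box scores, the skater list above can be written as a small Python function. This is only a sketch: the names are mine, and I am assuming `shots_on_goal` is whatever your stat source reports (in official NHL stats a goal also counts as a shot on goal).

```python
SKATER_POINTS = {
    "goals": 9,
    "assists": 5,
    "shots_on_goal": 1,
    "faceoff_wins": 0.5,
    "faceoff_losses": -0.5,
    "games_played": -3,
}


def skater_points(goals=0, assists=0, shots_on_goal=0,
                  faceoff_wins=0, faceoff_losses=0, games_played=1):
    """Fantasy points for a skater, using the point values listed above."""
    stats = {
        "goals": goals,
        "assists": assists,
        "shots_on_goal": shots_on_goal,
        "faceoff_wins": faceoff_wins,
        "faceoff_losses": faceoff_losses,
        "games_played": games_played,
    }
    return sum(SKATER_POINTS[name] * count for name, count in stats.items())


# One game: 1 goal, 1 assist, 4 shots on goal, 6 faceoffs won, 4 lost.
# 9 + 5 + 4 + 3 - 2 - 3 = 16
print(skater_points(goals=1, assists=1, shots_on_goal=4,
                    faceoff_wins=6, faceoff_losses=4))  # 16.0
```

Passing season totals along with `games_played` works the same way as a single game.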

A Quote and A List: Racist Presidents Edition

Welcome to A Quote and a List, in which a quote and a list are presented, free of commentary, for your consideration.

A Quote

“The president [Barack Obama] is the most racist president there has ever been in America. He is purposely trying to use race to divide Americans.”

– Ben Stein

And A List… of US Presidents Who Owned Slaves

  1. George Washington
  2. Thomas Jefferson
  3. James Madison
  4. James Monroe
  5. Andrew Jackson
  6. Martin Van Buren
  7. William Henry Harrison
  8. John Tyler
  9. James Polk
  10. Zachary Taylor
  11. Andrew Johnson
  12. Ulysses S. Grant

This has been a quote and a list.

In Hockey, More Isn’t Always Better

“A statistician is concerned what baseball statistics ARE. I had no concern with what they are. I didn’t care, and I don’t care, whether Mike Schmidt hit .306 or .296 against left-handed pitching. I was concerned with what the statistics MEAN.” – Bill James

It is a standard of today’s sporting world that we see a lot of stats throughout a broadcast. In addition to the standard goals and assists, we see plenty during hockey games. Blocked shots numbers, penalty kill percentage, a team’s average age. What does not seem to come up too much is: So what?

How much do penalties actually hurt a team? If my team blocks a lot of shots that means they are a great defensive team, right? Physical play is vital to success! Why aren’t they hitting more guys!? Based on the games I have watched, it would be reasonable to assume these things matter. I have never heard an announcer advocate shying away from physical play or scold a defender for putting his body between the goal and a shot.

But here’s the thing… On average, teams who block more shots allow more goals. More penalty minutes have a negligible effect on goals allowed. And there is no connection between the number of hits a team dishes out and how successful they are. Knowing what the numbers are means nothing if we interpret them wrong.

Descriptions and Predictions

We need to be conscious of which stats are descriptive and which are predictive. Most stats are descriptive because they tell us what has already happened: Alex Ovechkin scored three goals in his last game. The danger comes in when we imply stats like this have some sort of predictive power: Ovechkin is on a roll right now, he scored three goals in the last game. That is a redundant statement. He is “on a roll” because he scored three goals in his last game, but saying he is “on a roll” is hinting that he will maintain his high level of play in his next game. He could score three goals or zero; what he did last game does not matter.

This is similar to statements thrown around all of the time like, “This team needs to be more physical in the second period.” It’s subtle, but the implication here is that being physical will lead to greater success. Is that true though?

There won’t be much math here, but to give you a sense of what we are looking at, let’s find R-squared (the square of the correlation coefficient) between our two factors. For example, we would expect teams with a high goal differential (goals scored minus goals allowed) to do well. If we graph the two over the last three full seasons, we get this:

Goal Differential

Things are as we expected. As goal differential goes up, team points go up. More importantly, our R-squared is 0.87. The closer R-squared is to 1, the more strongly the two values are connected; conversely, if it is 0, there is no linear relationship between the two. Keep in mind that correlation does not mean causation though.
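If you would rather check an R-squared yourself than trust a spreadsheet trendline, here is a minimal Python sketch. For a straight-line fit between two variables, R-squared is the square of the Pearson correlation coefficient, which is what this computes. The function name is mine, and the sample numbers are made up purely to show the call; they are not the team data behind the graphs.

```python
from math import sqrt


def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("one of the lists never changes, so R-squared is undefined")
    r = cov / sqrt(var_x * var_y)
    return r * r


# Made-up illustration only: five imaginary teams.
goal_diff = [-40, -15, 0, 20, 45]
team_points = [70, 85, 90, 100, 112]
print(round(r_squared(goal_diff, team_points), 2))
```

Feed it one list of team totals for a stat and one list of team points, in the same team order.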

This is the relationship between number of hits and team points:

Hits

Which is to say, there is no relationship. Things are all over the place. So why does this matter?

As far as I can tell, this contradicts the general perception about physical play among fans, who see aggressive play and hard hitting as things that good hockey teams do. An article on NHL.com said of the 2013-14 Blue Jackets, “[Coach Todd Richards] not only insisted the Blue Jackets play fast, get in on the forecheck and play responsibly, but he also wanted to play hard in every zone. That message was heard loud and clear. Columbus set a franchise record with more than 2,500 hits this season.”

Of course they’re good, they hit a lot of people! That League-leading number of hits resulted in 93 points and a trip to the playoffs. Do you know who was dead last in hits? Chicago, who had 107 points and advanced two rounds further in the post-season than Columbus, who were knocked out in the first round.

In this case the writer was not actually wrong in what he wrote (we’ll come back to that in a moment), but you can start to see the problem with thinking descriptive stats are predictive. An announcer or writer being wrong might not be a very big deal, but if a coach designs a strategy under the impression that it will result in better outcomes, he might not have a job for very long. As fans, the more we know about what has an impact on the game, the better we can analyze play and spend more time studying what matters.

What Actually Matters?

Let’s take a look at a few things you hear thrown around during hockey games like, “This team needs to get on the board first.” We have info on that, which tells us that teams who scored first won 68% of the time over the past three full seasons. Perhaps they do not need to score first, but there is clear evidence to suggest it is an advantage. Unfortunately, things are not always so clear.

Announcers often credit a team’s success to how well they perform on special teams. There are a few ways to say this and the phrasing makes a difference. The key factor here seems to be that teams who score more goals are more successful. Full stop. League-wide 20% of goals scored come with a man advantage, but whether a team puts 15% or 27% of their goals in the net while on the power play does not make a difference. As long as the puck ends up in the net, they are all worth one point. (Remember this before you make power play points a category in your fantasy league.)

When it comes to being down a man, there is no correlation between penalty minutes and team points or goals allowed. Rather than PIMs, we should pay attention to penalty kill percentage. Teams will kill anywhere from 75 to 90% of penalties against them, so a penalty will hurt a team with a bad penalty kill more than one with a good one. No matter how good you are, the number of penalties you take will increase the number of goals you allow while short-handed, but the teams who take the fewest penalties will not always have the best PK% either.

What about age? Are young legs more important than experience? No, it does not matter much.

Age

Can you score more goals just by shooting more? With an R-squared of .59, shot percentage and number of goals scored are about as strongly correlated as anything else we have seen, other than goal differential (not that .59 is particularly high). But with an R-squared of .11, the number of shots a team takes hardly has any connection to the number of goals they score. And even with that loose connection: On average, the more shots your team takes, the lower their shot percentage gets, not higher. The next time you are yelling “SHOOT!” at your television, remember, setting up a high-percentage shot is probably worth a few extra seconds.

On the other end of the spectrum, what about blocking shots? Surely that helps to decrease goals? Not really. Again, the relationship hardly exists, but surprisingly teams who block more shots actually allow more goals, on average. Announcers often praise blocking shots—and blocking a shot is better than not blocking it—but there are more alternatives to blocking the shot than letting it go through, like not allowing the other team the chance to get a shot off in the first place.

Blocking Shots

We Need to Watch to Understand the Stats

Things are not so cut and dried as we might want to believe, which can be tough to admit if you have been watching a sport for a few decades. Should we stop showing or even counting stats without predictive power? No way, descriptive stats add a completely different dimension to watching a hockey game. We just need to be aware of them in context; to know that more is not necessarily better. We should not let the stats distract us because they exist: We should not hit guys more because they count hits, we should hit them because it is part of our team’s strategy.

Can physical play help teams? Of course it can, but it is just as important to realize that success is possible without physical play (or with fewer blocked shots) as well. The Blue Jackets found success by hitting guys all over the place, the Blackhawks found it by doing just the opposite. There is more than one way to win; teams rarely need to do anything. Basketball coach Stan Van Gundy talked about the same situation in the NBA:

Everybody’s gotten into these generalizations that you need free throws, shots at the rim, and threes. That’s all well and good, but if you don’t have guys who can shoot the threes, that doesn’t help you a lot. The Celtics won the championship in 2008 and they took more mid- and long-range twos than anybody in the League and they shot them better than anybody in the League because that’s what they had as a team. It has to be part of an overall philosophy that fits your personnel…

Announcers often fall into the trap that Van Gundy pointed out and mislead viewers by oversimplifying the situation. If we, as fans, hear an announcer say, “This team should be doing X more,” we should figure out if X actually does have a connection to the team’s strategy. After all, it is the announcers who get to watch more hockey than anyone, which should allow them to find the numbers that are most relevant to a team’s playing style and pick up on when they are falling short of their game plan.

Which Goal is the Biggest in a Hockey Game?

We intuitively know that not all goals are equal in terms of helping our team win. If we are winning 7-0, we already have such a massive lead that scoring an eighth goal is not going to increase the likelihood of us winning much. But how many goals will our team have to score to pick up a win? And which goal is the biggest?

From their listing of the 1230 games from the 2013-14 season, Hockey‑Reference.com can help us figure this out. We (er, Excel) can count the number of games in which each team scored X number of goals, which looks like this:

Games Per Goals Scored

This same info tells us that the average team scores 2.74 goals in each game. And while we’re at it, the standard deviation is 1.58. So teams are scoring 2, 3, or 4 goals in about 2/3 of their games. Not exactly breaking news.

Let’s get to the wins though. Unsurprisingly, as goals increase so too does win percentage:

Win Percentage Per Goal

I probably do not have to explain that teams who never scored never won and teams who scored six or more times always won. That is the part we understand before looking at any numbers; we are after what happens in between. Teams who scored once only won 8.4% of the time, which is not surprising considering they would need to shut out the other team. Teams who scored twice, the most common goal total, won just under one-third of their games. Scoring that second goal increases our win percentage by 23 points (8.4 to 31.8), which makes sense given the added leeway; we can still give up a goal and get the W.

Our biggest jump in win percentage, however, comes with our third goal. Whereas two-goal teams won just under one-third of their games, three-goal teams won just under two-thirds. It is a 32-point jump in win percentage, which is a larger boost than any other goal will give a team on average. Teams that score four goals get another nice 19-point boost, up to an 82% chance to win. Beyond that, teams win such a high percentage of the time that there is not much room for an extra increase and we arrive back at the intuition we began with: scoring six goals or more means you are going to win, at least for the 2013-14 season. Here is all of the data if you’re curious:

Goals Scored, Win Percent Data
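I did the counting in Excel, but the same tabulation is easy to sketch in Python if you have a list of team-games. The function names and the `(goals_scored, won)` layout are my own assumptions about how you might store the data, and the tiny sample is invented just to show the call; it is not the 2013-14 data.

```python
from collections import defaultdict


def win_pct_by_goals(games):
    """Map each goal total to the win percentage of teams that scored that many.

    `games` is an iterable of (goals_scored, won) pairs, one per team per game.
    """
    played = defaultdict(int)
    won = defaultdict(int)
    for goals, is_win in games:
        played[goals] += 1
        if is_win:
            won[goals] += 1
    return {g: 100.0 * won[g] / played[g] for g in sorted(played)}


def biggest_jump(pcts):
    """Return (goal_total, jump) for the largest rise over the next-lower goal total."""
    goals = sorted(pcts)
    if len(goals) < 2:
        raise ValueError("need at least two different goal totals")
    # Compares neighboring goal totals that actually appear in the data.
    jumps = [(g, pcts[g] - pcts[prev]) for prev, g in zip(goals, goals[1:])]
    return max(jumps, key=lambda pair: pair[1])


# Invented sample, for illustration only.
sample = [(1, False), (1, False), (2, True), (2, False), (3, True), (4, True)]
print(win_pct_by_goals(sample))
```

With a full season loaded in, `biggest_jump` is the one-line answer to the question in the title.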

All we have done is quantify what we already knew: Scoring more goals increases our chances of winning. Perhaps you will cheer a little more after that third goal from now on though.

A Guide to Good Graphs

There is a lot of data out there nowadays and whether it is an article or a PowerPoint, a good visualization helps to make sense of it. Unfortunately there are a lot of bad graphs out there too. There are a few reasons for that. The first is that few people really think about graphs (like we’re about to). The second is that most of the graphs people see are garbage, so they would not really know how to make a good one even if they wanted to.

Here are a few suggestions to hopefully fix that.

First, we have to pick what kind of graph we are going to use. Microsoft Excel gives you more than enough options to choose from. Bar, column, line, pie, area, and our old friend the scatter. Frankly, some of these should never be used (I’m looking at you, Bubble with a 3-D Effect). I don’t care if you think your classmates will be blown away with a donut chart, column and line will take you pretty far and that is fine. Why? We don’t want your classmates to notice the graph.

Rule One: It’s not about the graph

Graph or otherwise, every time you communicate with another person you should have a simple task in mind. What is the point I am trying to get across here? When we are finished this should be as clear as possible.

The people who create the special effects in movies spend months working on computer-generated imagery. If they do their job well you will not notice any of their effects because they will look so real you will be absorbed in the story. We want people to see the information you are showing via the graph, not the graph itself.

Peyton Manning recently set a new NFL record for touchdown passes, so let’s make a graph to show how he compares to the other top TD throwers. If you put the info into Excel, highlight it, and click Column Graph it gives you this:

TD1

I have a negative physical reaction to graphs that look like that. A few paragraphs from now I hope you will too. It’s vulgar. I don’t blame Excel, it needs some sort of default, but that doesn’t mean it is good enough for us to slap a title on and use. The onus is on us to know that this is a starting point and it will take some work to make it look good.

Let’s fix this up some. Our first rule was that it’s about the info, not the graph. In this situation, we are using the graph as a means to display the information in a better way than a list:

  1. Manning – 510
  2. Favre – 508
  3. Marino – 420
  4. Brees – 374
  5. Brady – 372
  6. Tarkenton – 342
  7. Elway – 300
  8. Moon – 291
  9. Unitas – 290
  10. Testaverde – 275

As we can see though, our default graph is not much of an improvement over the list. It is clear Manning and Favre are the top two, but we could have seen that on the list. Our graph does show that they lead everyone else by a pretty good margin, but it’s difficult to say by just how many (or how far apart the two of them are) so in some respects the list is actually a better way to portray the info. We can change that though.

Let’s make this big enough to see, without distorting the content. Make your screen look something like this:

TD2

Pro Tip: If you are making more than one graph for a project, keep them all the same size.

This also applies if you are putting your graph into a PowerPoint presentation. It should not cover 100% of the slide, but the graph should be the only thing on the slide, so make it big enough to see.

Next we’re going to delete the “Series 1” label. We only have one thing we are talking about here, touchdown passes, so we do not need to point that out anywhere other than the title. The second thing we need to do is add that title, which should be as brief as possible (we’ll be using the Layout options under Chart Tools a lot).

Like a lot of writing, the title should be as brief as possible without leaving out any key info. Rarely will you start with “The.” You do not have to say “top ten” because there are only ten names on the graph. Unless there is some meaningful threshold, it is implied. Abbreviations, like TD in this case, are fine. So let’s use “Most TD Passes, NFL History.” Also, the comma is your friend in graph titles.

TD3

Rule Two: Delete all irrelevant information

There is a reason why so many people love the design of Apple products: they get rid of everything they can. Jonathan Ive, one of Apple’s lead designers, said of the iPad, “In many ways it’s the things that are not there that we are most proud of.” We don’t want people to spend time figuring out how the graph works, we want them to be absorbing the information. Remember rule one.

Look at this bad graph I found:

Bad3Dlinegraph

Some guy thought adding a 3D effect would make his graph look complex and him look smart. What it did was make the info in the graph impossible to read and him look like a doofus. Quick question: What was the profit made on hammers in February? If it takes you more time to figure that out than read this sentence, you fail. And you will fail, because the angle is such that you have no chance to figure it out no matter how long you stare at it.

It is easy to make a graph look 3D, to add gradients, to bevel edges, or to use drop shadows. Do Not Do That. I have nothing against drop shadows, but they do one thing on graphs: Distract the audience from the information. Let’s add some drop shadows:

TD4

Does it make it look nicer? If you are saying yes, you have forgotten rule one. If there is drop shadow, people are going to be looking at drop shadow, and if they are looking at drop shadow, they are not paying attention to the information we are trying to show.

Same thing goes for 3D, only it is worse.

TD5

Not only does it make the graph more difficult to read, but again the perspective distorts how the columns line up—note that Elway’s bar looks like it is under 300, even though he has thrown 300. What’s the point of having the graph if it is not accurate?

But, you might protest, without any added effects it looks plain and boring. Does the iPhone look boring with its one button? Because they still are not struggling to sell those. This is not to say we are going to leave it like this, because it is difficult to read.

TD6

Brief Detour: Best Graph Ever

This is not to say we can never use complicated graphs if our audience is comprised of people willing to take the time to digest the info. (The 1% of the time your audience is this select you will know it, the other 99% of the time we should strive to make things as simple as possible.) One of the most famous graphs ever made was for a select audience. It is this one about Napoleon’s army:

Minard

A guy called Charles Joseph Minard came up with that in 1869 (he was probably not using Excel) and one guy said it “may well be the best statistical graphic ever drawn.” Another guy wrote a whole freaking book about it. A third guy made a video you can watch about it:

That’s a great YouTube channel, by the way.

On first look it can be confusing, but Minard knew that anybody into Napoleon enough would take the time to digest the info. Even with five pieces of info, he was still able to make it simple enough that most people could understand it after a few minutes of explanation. What if Minard had thrown a drop shadow on there?

MinardDropShadow

The author regrets having to alter such a great graphic, but felt it was important for educational purposes.

Add much? Didn’t think so.

Rule Three: Details are important

It can take a while to understand the following: Font matters. It has a much bigger impact than you may think. The packaging of our graph is important—it is not just what we are saying, but how we say it. Trying to get cute by using a creative font often makes things more difficult to read. So let’s pick a clearer and larger font for our names and numbers. And we can do the same thing for our title (although it doesn’t necessarily have to be the same one).

This graph gives us more whitespace on the right side, so we can also move the title over a bit. It does not need to be smashed against the top of the box; give it some room on all sides. The title is going to be bold by default, but get rid of that because it is clearly the title and does not need any further emphasis. Having the title look good and in a better location than the default can go a long way in the overall presentation of the graph. Sweat the small stuff.

TD7

And now for our most drastic departure from the mainstream world of terrible graphs yet: Data Labels. We noticed earlier that it is difficult to see the real difference between Marino and Favre or even Manning and Favre, so why not include the actual numbers? You can find the Data Label button under the Layout tab (I almost always use Outside End because you are going to be looking at the top of the columns most of the time).

We have plenty of space to include these numbers on a chart with only ten columns, so let’s make them the same font as our names. More importantly, now that we can see specifically how many TDs each guy has thrown, our vertical axis is no longer necessary. Remember Rule Two. Let’s delete it and the horizontal lines. This gives the reader the option to look at the specific numbers or absorb the general comparisons as presented by the columns.

TD8

Now we’re getting somewhere. Look different than most column graphs you’ve seen? Good. Let’s make sure we include our source on this, which you should always have. Technically your source is http://www.pro-football-reference.com/leaders/pass_td_career.htm but remember that brevity is key, so we can get away with Pro-Football-Reference.com. If your teacher wants you to include some absurd URL that’s three lines long, use a URL shortener or grit your teeth and make it small; such criteria make your graph look bad, but luckily do not exist in the real world.

This last suggestion may just be personal preference, but I usually make the background black and keep the bars a darker color. At this point, we can throw in a few final touches, like changing the color of the three guys on the list who are still active (football fans should pick up on this immediately), noting when we made the graph, and we have it:

TD9

Compare that to where we started and we can see that not all graphs are created equal:

TD1

It took some time for us to get to as simple a graph as possible. When you think you are finished, it’s not a bad idea to ask yourself what the effect of deleting each element would be, while remembering our first rule: it’s about the info, not the graph. All of the text is brief and easier to read, we can tell exactly how many TD passes each guy threw, the addition of our data labels allowed us to remove the vertical axis, and our columns still allow us to view the comparison of QBs visually.

Line Graphs

Congratulations, you have made it through column graphs! Let’s move on to line graphs, although the vast majority of things we discussed already will apply to every graph you make regardless of what type it is. To reiterate the three rules: It’s about the info not the graph, get rid of anything that isn’t a must-keep, and make sure it looks good.

We’ll switch to baseball for our line graph (sorry if you don’t like sports, but they’re full of stats that can be used to practice graphs with). You have probably heard of Ted Williams, who was one of the best hitters ever for the Boston Red Sox. Let’s make a graph to see just how good he was.

We often evaluate baseball players in terms of averages, rather than absolute numbers like our touchdown passes, so let’s compare Williams’s yearly On-Base Average (aka the percentage of his plate appearances in which he reached base instead of making an out) to the league’s average OBA from each season he played. Williams missed three seasons to fight in World War II and two more to fly planes during the Korean War.

Williams1

Here is another spot where we can immediately see that size makes a difference.

Williams2

I have also used the Line with Markers chart, which I prefer when we’re looking at fewer than 20 points. You can see that without the markers, it becomes difficult to tell one year from the next.

Williams3

Before we get too far, we should look at why we didn’t use another bar graph. You can show the same info in multiple different ways, after all, so there is some subjectivity as to which type of graph looks the best given certain information. It is never a bad idea to look through the different styles (Excel makes that easy enough) before you go too far. Here is the same info in a column graph:

Williams4

It is quickly obvious there are more columns than we had in our first graph, which makes things much tighter and tougher to read. We can still tell that Williams was always an above average hitter. This is helped by the fact that the League average OBA stays steady the whole time. If it were going up and down, it would be much tougher to keep the two straight, especially if Williams had a few below-average seasons. There is not always a clear-cut way, but what we are trying to portray to our audience is the key to deciding: Which style makes it easiest to see Williams’s stats compared to the League average?

Now that we have established a line graph is the way to go we can make the changes that we did with our column graph: Larger, more readable text. Let’s only include every-other-year on the horizontal axis, because that’ll make it less jumbled and it’s still easy to understand. Let’s also change the vertical axis units to the standard way OBA is portrayed (i.e. .400 rather than 0.4). We can make our lines thicker while we’re at it.
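If you ever build the same labels outside of Excel, those two axis tweaks can be sketched in a few lines of Python. This is only an illustration: the function names `format_oba` and `every_other_label` are made up for this example, and the list of years is a placeholder rather than the actual data behind the graph.

```python
def format_oba(value: float) -> str:
    """Format a rate such as 0.4 in the usual baseball style, '.400'."""
    text = f"{value:.3f}"
    # Drop the leading zero only for values below 1 (0.400 -> .400).
    return text[1:] if text.startswith("0.") else text


def every_other_label(years: list[int]) -> list[str]:
    """Keep the label for every second year and blank out the rest."""
    return [str(year) if index % 2 == 0 else "" for index, year in enumerate(years)]


# Placeholder span of seasons, for illustration only.
years = list(range(1939, 1947))

print(every_other_label(years))
print([format_oba(tick / 10) for tick in range(2, 7)])
```

Blanking a label instead of dropping the year keeps every data point on the line while thinning out the text along the axis.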

Since Williams played for the Red Sox his whole career, let’s change our color scheme to match their dark blue and red. If we were working with fewer years I would use data labels and eliminate the vertical axis, but they become too chaotic at some point. The title looks nicer below the lines where there is more space too. Don’t be afraid to move something from where it usually is; people will recognize the title from its font size and style no matter its location.

Williams5

Our goal here is to show a general overview of Ted Williams’s career and we have done that: Anyone can take a quick look and see that he was far above average every season. If we don’t just want to inform, like we did with the TD graph, but persuade our audience that Williams is the best hitter ever, we can make a few slight changes to drive that point home.

First, we can use black markers on the years in which he led the league in OBA. We can also switch the horizontal axis to show his age to make it more personal. This helps us notice that he was 24, 25, and 26 years old during the three seasons he was fighting in WWII—these are prime years for most ballplayers. Not only did Williams miss those seasons, but he returned in 1946 and did not miss a beat. The legend also fits in nicely at the bottom, rather than relegating it over to the side.

One feature that you may think is missing is the vertical axis title. We have the horizontal axis labeled as Age because there may be some confusion over those particular numbers. I would not have added a label for the year, for instance, because people assume it is a year when they see “1948.” Because the title of the graph states that we are looking at Williams’s OBA though, I see no reason to repeat that same info in an axis title.

Williams6

Don’t Do This: Skewing Info

There is an element to both of our graphs that might look better if changed, particularly to someone new to graphs. For example, what if I changed the vertical axis of our TD Pass graph so this happened:

TDskew

By making the minimum of the vertical axis 250, our column heights have a much larger range. But even while all of the numbers remain the same, our information becomes skewed. Peyton Manning’s column becomes four or five times the size of John Elway’s or Vinnie Testaverde’s columns, but he has not even thrown twice the number of TD passes they did.
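The distortion is easy to check with a little arithmetic. The Python sketch below compares how tall two columns look with the axis starting at 0 versus 250. Elway’s 300 comes from earlier in this post; the 500 for the leader is a round number picked purely for illustration, not Manning’s actual total, and the function names are invented for this example.

```python
def drawn_height(value: float, axis_min: float) -> float:
    """Height of a column as drawn when the vertical axis starts at axis_min."""
    return max(value - axis_min, 0.0)


def visual_ratio(a: float, b: float, axis_min: float) -> float:
    """How many times taller column a looks than column b.

    Assumes b is above axis_min; otherwise column b has no height at all.
    """
    return drawn_height(a, axis_min) / drawn_height(b, axis_min)


elway = 300   # from the text above
leader = 500  # hypothetical round number, NOT an actual career total

print(visual_ratio(leader, elway, axis_min=0))    # ratio of the real numbers
print(visual_ratio(leader, elway, axis_min=250))  # ratio of the drawn columns
```

With these illustrative numbers the real ratio is under two, but the columns drawn from a 250 baseline differ by a factor of five, which is the same kind of gap described above.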

For some reason FOX News has become known for this type of skewed graph, like this one about people apprehended at the border between the US and Mexico:

BadFOXgraph

Based on the size of the bars, it looks like the number of apprehensions has tripled from 2011 to 2013, but if we look at the numbers we see that is not even close to being the case. You might argue that this is why the numbers are there, but I would counter that if you want to give the numbers, then the graph becomes unnecessary. The purpose of the graph is to give a visual of the numbers relative to each other, which when done properly looks like this:

BorderGraph

Obviously the numbers have still gone up, but they have not tripled. It is common to skew graphs (or data in general) like this on purpose to make an argument look better. There is always some balance: We did not, for example, have a vertical axis that went all the way up to 1 on the Williams graph. By making the maximum .600 were we artificially inflating his numbers? Because nobody ever has an OBA above .600, I would argue that we were not. The point of that graph was to compare Williams against the League average, which we did.

The border patrol graph is comparing the yearly numbers to each other though. Each column’s size relative to the others is what matters and what was skewed. You should never purposely skew a graph to make an argument look better. To the contrary, if a graph helps you to see an argument is weak, then maybe you should reevaluate your position.

More on Graph Goals

We looked at using data labels to make specific numbers clearer. Sometimes we can improve a graph by not using them though. Check out this graph that shows the Pittsburgh Steelers’ points for and against over the last decade:

SteelersPPG

I forgot to note that this was only through the first few weeks of the 2014 season, an element that should have been included.

I purposely left off the markers and data labels on each year because my goal with this graph was to show trends more than any specific numbers. And with a quick glance you can see that it is the defense that has dropped off more than the offense.

This is one of my favorite graphs from the most recent baseball season:

PiratesChase

Two things to note: The title is in the form of a question, which makes it longer, but immediately more engaging than “Pirates Chase Percentage.” The second thing is that including the league average line gives it much more context. A single line immediately answers the question “OK, is that good?” in a case where most people will not know what is good or bad.

So one more time: Have a specific goal as to what you want your audience to learn from your graph, delete all info that does not lead to that goal, and make what is left look good. Happy graphing!

Song and a Quote

The effect of music is so very much more powerful and penetrating than is that of the other arts, for these others speak only of the shadow, but music of the essence.

- Arthur Schopenhauer