A Guide to Good Graphs

There is a lot of data out there nowadays, and whether it is in an article or a PowerPoint, a good visualization helps make sense of it. Unfortunately, there are a lot of bad graphs out there too. There are a couple of reasons for that. The first is that few people really think about graphs (like we’re about to). The second is that most of the graphs people see are garbage, so they wouldn’t know how to make a good one even if they wanted to.

Here are a few suggestions to hopefully fix that.

First, we have to pick what kind of graph we are going to use. Microsoft Excel gives you more than enough options to choose from: bar, column, line, pie, area, and our old friend the scatter. Frankly, some of these should never be used (I’m looking at you, Bubble with a 3-D Effect). I don’t care if you think your classmates will be blown away by a donut chart; column and line will take you pretty far, and that is fine. Why? We don’t want your classmates to notice the graph.

Rule One: It’s not about the graph

Graph or otherwise, every time you communicate with another person you should have a simple task in mind. What is the point I am trying to get across here? When we are finished this should be as clear as possible.

The people who create the special effects in movies spend months working on computer-generated imagery. If they do their job well you will not notice any of their effects because they will look so real you will be absorbed in the story. We want people to see the information you are showing via the graph, not the graph itself.

Peyton Manning recently set a new NFL record for touchdown passes, so let’s make a graph to show how he compares to the other top TD throwers. If you put the info into Excel, highlight it, and click Column Graph it gives you this:

TD1

I have a negative physical reaction to graphs that look like that. A few paragraphs from now I hope you will too. It’s vulgar. I don’t blame Excel; it needs some sort of default, but that doesn’t mean the default is good enough for us to slap a title on and use. The onus is on us to know that this is a starting point and it will take some work to make it look good.

Let’s fix this up some. Our first rule was that it’s about the info, not the graph. In this situation, we are using the graph as a means to display the information in a better way than a list:

  1. Manning – 510
  2. Favre – 508
  3. Marino – 420
  4. Brees – 374
  5. Brady – 372
  6. Tarkenton – 342
  7. Elway – 300
  8. Moon – 291
  9. Unitas – 290
  10. Testaverde – 275

As we can see though, our default graph is not much of an improvement over the list. It is clear Manning and Favre are the top two, but we could have seen that on the list. Our graph does show that they lead everyone else by a pretty good margin, but it’s difficult to say by just how many (or how far apart the two of them are) so in some respects the list is actually a better way to portray the info. We can change that though.

Let’s make this big enough to see, without distorting the content. Make your screen look something like this:

TD2

Pro Tip: If you are making more than one graph for a project, keep them all the same size.

This also applies if you are putting your graph into a PowerPoint presentation. It should not cover 100% of the slide, but the graph should be the only thing on the slide, so make it big enough to see.

Next we’re going to delete the “Series 1” label. We only have one thing we are talking about here, touchdown passes, so we do not need to point that out anywhere other than the title. The second thing we need to do is add that title (we’ll be using the Layout options under Chart Tools a lot).

Like a lot of writing, the title should be as brief as possible without leaving out any key info. Rarely will you start with “The.” You do not have to say “top ten” because there are only ten names on the graph. Unless there is some meaningful threshold, it is implied. Abbreviations, like TD in this case, are fine. So let’s use “Most TD Passes, NFL History.” Also, the comma is your friend in graph titles.

TD3

Rule Two: Delete all irrelevant information

There is a reason why so many people love the design of Apple products: they get rid of everything they can. Jonathan Ive, one of Apple’s lead designers, said of the iPad, “In many ways it’s the things that are not there that we are most proud of.” We don’t want people to spend time figuring out how the graph works; we want them absorbing the information. Remember Rule One.

Look at this bad graph I found:

Bad3Dlinegraph

Some guy thought adding a 3D effect would make his graph look complex and him look smart. What it did was make the info in the graph impossible to read and him look like a doofus. Quick question: What was the profit made on hammers in February? If it takes you more time to figure that out than to read this sentence, you fail. And you will fail, because the angle is such that you have no chance to figure it out no matter how long you stare at it.

It is easy to make a graph look 3D, to add gradients, to bevel edges, or to use drop shadows. Do Not Do That. I have nothing against drop shadows, but they do one thing on graphs: Distract the audience from the information. Let’s add some drop shadows:

TD4

Does it make it look nicer? If you are saying yes, you have forgotten rule one. If there is drop shadow, people are going to be looking at drop shadow, and if they are looking at drop shadow, they are not paying attention to the information we are trying to show.

Same thing goes for 3D, only it is worse.

TD5

Not only does it make the graph more difficult to read, but again the perspective distorts how the columns line up—note that Elway’s bar looks like it is under 300, even though he has thrown 300. What’s the point of having the graph if it is not accurate?

But, you might protest, without any added effects it looks plain and boring. Does the iPhone look boring with its one button? Because they still are not struggling to sell those. This is not to say we are going to leave the graph like this, because it is still difficult to read.

TD6

Brief Detour: Best Graph Ever

This is not to say we can never use complicated graphs if our audience is made up of people willing to take the time to digest the info. (The 1% of the time your audience is this select, you will know it; the other 99% of the time we should strive to make things as simple as possible.) One of the most famous graphs ever made was for a select audience. It is this one about Napoleon’s army:

Minard

A guy called Charles Joseph Minard came up with that in 1869 (he was probably not using Excel) and one guy said it “may well be the best statistical graphic ever drawn.” Another guy wrote a whole freaking book about it. A third guy made a video you can watch about it:

That’s a great YouTube channel, by the way.

On first look it can be confusing, but Minard knew that anybody into Napoleon enough would take the time to digest the info. Even with five pieces of info, he was still able to make it simple enough that most people could understand it after a few minutes of explanation. What if Minard had thrown a drop shadow on there?

MinardDropShadow

The author regrets having to alter such a great graphic, but felt it was important for educational purposes.

Add much? Didn’t think so.

Rule Three: Details are important

It can take a while to understand the following: Font matters. It has a much bigger impact than you may think. The packaging of our graph is important—it is not just what we are saying, but how we say it. Trying to get cute by using a creative font often makes things more difficult to read. So let’s pick a clearer and larger font for our names and numbers. And we can do the same thing for our title (although it doesn’t necessarily have to be the same one).

This graph gives us more whitespace on the right side, so we can also move the title over a bit. It does not need to be smashed against the top of the box; give it some room on all sides. The title is going to be bold by default, but get rid of that because it is clearly the title and does not need any further emphasis. Having the title look good and in a better location than the default can go a long way in the overall presentation of the graph. Sweat the small stuff.

TD7

And now for our most drastic departure from the mainstream world of terrible graphs yet: Data Labels. We noticed earlier that it is difficult to see the real difference between Marino and Favre or even Manning and Favre, so why not include the actual numbers? You can find the Data Label button under the Layout tab (I almost always use Outside End because you are going to be looking at the top of the columns most of the time).

We have plenty of space to include these numbers on a chart with only ten columns, so let’s make them the same font as our names. More importantly, now that we can see specifically how many TDs each guy has thrown, our vertical axis is no longer necessary. Remember Rule Two. Let’s delete it and the horizontal lines. This gives the reader the option to look at the specific numbers or absorb the general comparisons as presented by the columns.
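If you ever end up outside Excel, the same idea carries over. Here is a rough sketch in Python of what "labels instead of an axis" means: every bar is scaled from zero and the actual number sits at the end of it, with no axis or gridlines at all. The function name `labeled_bars` and the `#` bars are just made up for this illustration, and the bars run sideways because that is easier to do in text.

```python
# Career TD passes from the list earlier in this post.
TD_PASSES = [
    ("Manning", 510), ("Favre", 508), ("Marino", 420), ("Brees", 374),
    ("Brady", 372), ("Tarkenton", 342), ("Elway", 300), ("Moon", 291),
    ("Unitas", 290), ("Testaverde", 275),
]


def labeled_bars(data, width=50):
    """Return one text line per item: name, a bar scaled from zero, and the value."""
    longest_name = max(len(name) for name, _ in data)
    biggest = max(value for _, value in data)
    lines = []
    for name, value in data:
        # Bar length is proportional to the value itself, so the baseline is zero.
        bar = "#" * round(value / biggest * width)
        # The data label goes right at the end of the bar; no axis is needed.
        lines.append(f"{name:<{longest_name}}  {bar} {value}")
    return lines


print("Most TD Passes, NFL History")
print("\n".join(labeled_bars(TD_PASSES)))
```

The reader can either read the number or just eyeball the bar lengths, which is the same choice our Excel graph gives them.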

TD8

Now we’re getting somewhere. Look different than most column graphs you’ve seen? Good. Let’s make sure we include our source on this, which you should always have. While technically this is your source: http://www.pro-football-reference.com/leaders/pass_td_career.htm remember that brevity is key, so we can get away with Pro-Football-Reference.com. If your teacher wants you to include some absurd URL that’s three lines long, use a URL shortener or grit your teeth and make it small; such a requirement makes your graph look bad, but luckily does not exist in the real world.

This last suggestion may just be personal preference, but I usually make the background black and keep the bars a darker color. At this point, we can throw in a few final touches, like changing the color of the three guys on the list who are still active (football fans should pick up on this immediately), noting when we made the graph, and we have it:

TD9

Compare that to where we started and we can see that not all graphs are created equal:

TD1

It took some time for us to get to as simple a graph as possible. When you think you are finished, it’s not a bad idea to ask yourself what the effect of deleting each element would be, while remembering our first rule about having a point to get across. All of the text is brief and easier to read, we can tell exactly how many TD passes each guy threw, the addition of our data labels allowed us to remove the vertical axis, and our columns still allow us to view the comparison of QBs visually.

Line Graphs

Congratulations, you have made it through column graphs! Let’s move on to line graphs, although the vast majority of things we discussed already will apply to every graph you make regardless of what type it is. To reiterate the three rules: It’s about the info not the graph, get rid of anything that isn’t a must-keep, and make sure it looks good.

We’ll switch to baseball for our line graph (sorry if you don’t like sports, but they’re full of stats that can be used to practice graphs with). You have probably heard of Ted Williams who was one of the best hitters ever for the Boston Red Sox. Let’s make a graph to see just how good.

We often evaluate baseball players in terms of averages, rather than absolute numbers like our touchdown passes, so let’s compare Williams’s yearly On-Base Average (aka the percentage of his plate appearances in which he reached base instead of making an out) to the league’s average OBA from each season he played. Williams missed three seasons to serve in World War II and most of two more to fly planes during the Korean War.

Williams1

Here is another spot where we can immediately see that size makes a difference.

Williams2

I have also used the Line with Markers chart, which I prefer when we’re looking at fewer than 20 points. You can see that without the markers, it becomes difficult to tell one year from the next.

Williams3

Before we get too far, we should look at why we didn’t use another column graph. You can show the same info in multiple different ways, after all, so there is some subjectivity as to which type of graph looks the best given certain information. It is never a bad idea to look through the different styles (Excel makes that easy enough) before you go too far. Here is the same info in a column graph:

Williams4

It is quickly obvious there are more columns than we had in our first graph, which makes things much tighter and tougher to read. We can still tell that Williams was always an above average hitter. This is helped by the fact that the League average OBA stays steady the whole time. If it were going up and down, it would be much tougher to keep the two straight, especially if Williams had a few below-average seasons. There is not always a clear-cut way, but what we are trying to portray to our audience is the key to deciding: Which style makes it easiest to see Williams’s stats compared to the League average?

Now that we have established a line graph is the way to go we can make the changes that we did with our column graph: Larger, more readable text. Let’s only include every-other-year on the horizontal axis, because that’ll make it less jumbled and it’s still easy to understand. Let’s also change the vertical axis units to the standard way OBA is portrayed (i.e. .400 rather than 0.4). We can make our lines thicker while we’re at it.
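For anyone building the labels by hand rather than through Excel’s axis options, those two tweaks are simple enough to sketch in Python. This is only an illustration: the helper names `format_oba` and `every_other` are invented here, and the year range assumes the axis runs continuously from 1939 to 1960.

```python
def format_oba(value):
    """Write an average the baseball way: three decimals, no leading zero."""
    text = f"{value:.3f}"          # 0.4 -> "0.400"
    return text[1:] if text.startswith("0") else text


def every_other(labels):
    """Keep every other label and blank the rest, so the points stay where they are."""
    return [label if i % 2 == 0 else "" for i, label in enumerate(labels)]


# Assumed axis range for this example: every season from 1939 through 1960.
years = [str(year) for year in range(1939, 1961)]

print(format_oba(0.4))          # .400
print(every_other(years)[:4])   # ['1939', '', '1941', '']
```

Blanking the skipped labels, instead of dropping those years entirely, keeps every data point on the line while thinning out the text underneath it.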

Since Williams played for the Red Sox his whole career, let’s change our color scheme to match their dark blue and red. If we were working with fewer years I would use data labels and eliminate the vertical axis, but they become too chaotic at some point. The title looks nicer below the lines where there is more space too. Don’t be afraid to move something from where it usually is; people will recognize the title from its font size and style no matter its location.

Williams5

Our goal here is to show a general overview of Ted Williams’s career and we have done that: Anyone can take a quick look and see that he was far above average every season. If we don’t just want to inform, like we did with the TD graph, but persuade our audience that Williams is the best hitter ever we can make a few slight changes to drive that point home.

First, we can use black markers on the years in which he led the league in OBA. We can also switch the horizontal axis to show his age to make it more personal. This helps us notice that he was 24, 25, and 26 years old during the three seasons he missed for WWII, which are prime years for most ballplayers. Not only did Williams miss those seasons, but he returned in 1946 and did not miss a beat. The legend also fits in nicely at the bottom, rather than being relegated to the side.

One feature that you may think is missing is the vertical axis title. We have the horizontal axis labeled as Age because there may be some confusion over those particular numbers. I would not have added a label for the year, for instance, because people assume it is a year when they see “1948.” Because the title of the graph states that we are looking at Williams’s OBA though, I see no reason to repeat that same info in an axis title.

Williams6

Don’t Do This: Skewing Info

There is an element to both of our graphs that might look better if changed, particularly to someone new to graphs. For example, what if I changed the vertical axis of our TD Pass graph so this happened:

TDskew

By making the minimum of the vertical axis 250, our column heights have a much larger range. But even while all of the numbers remain the same, our information becomes skewed. Peyton Manning’s column becomes roughly five times the size of John Elway’s and ten times the size of Vinnie Testaverde’s, but he has not even thrown twice the number of TD passes they did.
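You can check how badly a cut-off axis distorts things with a little arithmetic. Here is a quick sketch in Python, assuming the axis starts at exactly 250; the function name `drawn_ratio` is made up for the example.

```python
def drawn_ratio(a, b, axis_min=0):
    """How many times taller a's column is drawn than b's, given where the axis starts."""
    if min(a, b) <= axis_min:
        raise ValueError("both values must sit above the axis minimum")
    # A column is only drawn from the axis minimum up to its value.
    return (a - axis_min) / (b - axis_min)


manning, elway = 510, 300

print(round(drawn_ratio(manning, elway), 2))                # 1.7 (axis starts at zero)
print(round(drawn_ratio(manning, elway, axis_min=250), 2))  # 5.2 (axis starts at 250)
```

With the axis at zero, the columns show the true 1.7-to-1 relationship. Start the axis at 250 and the picture says 5.2-to-1, even though no number changed.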

For some reason FOX News has become known for this type of skewed graph, like this one about people apprehended at the border between the US and Mexico:

BadFOXgraph

Based on the size of the bars, it looks like the number of apprehensions has tripled from 2011 to 2013, but if we look at the numbers we see that is not even close to being the case. You might argue that this is why the numbers are there, but I would counter that if you want to give the numbers, then the graph becomes unnecessary. The purpose of the graph is to give a visual of the numbers relative to each other, which when done properly looks like this:

BorderGraph

Obviously the numbers have still gone up, but they have not tripled. It is common to skew graphs (or data in general) like this on purpose to make an argument look better. There is always some balance: We did not, for example, have a vertical axis that went all the way up to 1 on the Williams graph. By making the maximum .600 were we artificially inflating his numbers? Because almost nobody has ever had an OBA above .600, I would argue that we were not. The point of that graph was to compare Williams against the League average, which we did.

The border patrol graph is comparing the yearly numbers to each other though. Each column’s size relative to the others is what matters and what was skewed. You should never purposely skew a graph to make an argument look better. To the contrary, if a graph helps you to see an argument is weak, then maybe you should reevaluate your position.

More on Graph Goals

We looked at using data labels to make specific numbers clearer. Sometimes we can improve a graph by not using them though. Check out this graph that shows the Pittsburgh Steelers’ points for and against over the last decade:

SteelersPPG

I forgot to note that this was only through the first few weeks of the 2014 season, an element that should have been included.

I purposely left off the markers and data labels on each year because my goal with this graph was to show trends more than any specific numbers. And with a quick glance you can see that it is the defense that has dropped off more than the offense.

This is one of my favorite graphs from the most recent baseball season:

PiratesChase

Two things to note: The title is in the form of a question, which makes it longer, but immediately more engaging than “Pirates Chase Percentage.” The second thing is that including the league average line gives it much more context. A single line immediately answers the question “OK, is that good?” in a case where most people will not know what is good or bad.

So one more time: Have a specific goal as to what you want your audience to learn from your graph, delete all info that does not lead to that goal, and make what is left look good. Happy graphing!
