Interview: Cole Nussbaumer on Google, what businesses need and what’s hard to unlearn

Cole Nussbaumer

If you’re looking online for data visualization training, it’s likely you’ve come across Cole Nussbaumer. That’s what happened to me when I registered for what turned out to be her very first public workshop, after years of teaching at Google and at other organizations. She describes her views and teachings in various articles, but here she’s agreed to speak more about her personal journey and what she has observed from her clients.

Posted in interview, learning | 2 Comments

Concentric Circles on a Map, version 3

After my attempt at improving the concentric circles, Stephen Few was kind enough to provide more feedback: he still doesn’t like them.

Your experiments with concentric circles are interesting and it’s clear that you’re having fun exploring this, but the new version doesn’t seem to work any better than the first, even though you’ve eliminated the one annoying illusion. We’re still left trying to compare areas, and even though inner circles make this slightly easier, the comparison still requires too much effort and time. Also, to my eyes the patterns formed by the concentric circles are hard to look at–similar to targets on a gun range–which make me a bit dizzy. I appreciate your efforts to find a better solution, but I doubt that concentric circles will prove useful.

The concentric circles don’t make me dizzy, but I agree with the core of the arguments. If the goal was to equal bricks in ease of reading, then yes, it is a failed attempt. But if the goal was to improve on plain shapes and colors to display quantities on a map, then it seems like a fair addition to the data visualization arsenal. Circles exist and are regularly used on maps, and this is a suggestion to make them a little more precise.

I made a few more tweaks to the concentric circles. Here is version 3. I am starting to think that they should have been called “circular gridlines” ever since I replaced the colored circumferences with a colored area.

circles v3

The smallest circle has disappeared because the spacing was not constant: the small circle represented 1 unit; the next circle, 5 units; and the third, 10 units (1-5-10…). Now the interval is constant at 5 units (0-5-10-15). The result makes clear that the area grows much faster than the radius. I double and triple checked my numbers, but it seems that the inner circle is really the same area as each of the two rings. Truly, these areas are counter-intuitive.
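The equal-area claim above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming each gridline encloses an area proportional to its value (5, 10, 15 units); the function names `gridline_radii` and `band_areas` are made up for this illustration.

```python
import math


def gridline_radii(interval, count):
    """Radii of circles whose areas are interval, 2*interval, ..., count*interval."""
    return [math.sqrt(k * interval / math.pi) for k in range(1, count + 1)]


def band_areas(radii):
    """Area of the inner disc, then of each ring between consecutive gridlines."""
    areas = []
    previous = 0.0
    for r in radii:
        full = math.pi * r ** 2
        areas.append(full - previous)
        previous = full
    return areas


radii = gridline_radii(interval=5, count=3)
areas = band_areas(radii)
for r, a in zip(radii, areas):
    print(f"gridline radius {r:.3f} -> band area {a:.3f}")
```

The three radii come out at roughly 1.26, 1.78 and 2.19, yet the inner disc and both rings each cover 5 units of area, which is why the rings look so thin next to the center.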

I now use white gridlines with the vague hope that they will be less dizzying, if that is truly a problem. The downside is that we can’t see when the value reaches a multiple of 5 units, like with the faint grey gridline, only when it exceeds it. That’s why I have a third row in the example above, to show more than one gridline at work. One is not limited to intervals of 5, of course, and a different interval would certainly work better in some cases.

Time to put version 3 on a map and compare with the plain circles.

So, which one seems clearer? For testing purposes, compare Texas and Louisiana. Washington and Oregon. Oklahoma and New Mexico. In these cases, the circular gridlines help me establish which one is larger, something I can’t quite do with the plain circles.

Stephen Few has very high standards, which I respect and wish I could meet, and he wrote that he will not endorse a method that uses area to encode data. Still, this is not about getting his coveted approval, but contributing to and engaging with the larger data visualization community. I would be interested to hear what you think and to see the result if some of you ever test the concentric circles on a real project.

Posted in dataviz, learning, originals | 3 Comments

Concentric Circles on a Map, version 2.0

In response to Stephen Few’s challenge to do better than his bricks to display quantities on a map, I proposed the concentric circles. He shot down version 1.0:

Thanks for exploring the possibilities of concentric circles. Unfortunately, as you’ve seen, when there are more than three or four concentric circles, we cannot perceive the quantities by subitizing; we must attentively count them, which is very difficult to do because they are close to one another and hard to differentiate. Even with attention, it is very difficult to see the difference between a set of seven vs. a set of eight, and so on. Also, notice that sets of closely packed concentric circles beyond a small number create an annoying visual illusion of partially overlapping circles at the four cardinal positions (top, bottom, left, and right). You can see this especially in your map example. Even though this doesn’t work, it was definitely worthwhile to make the attempt. Thanks for the contribution.

This is fair criticism, so I went back to the drawing board. My goal is not to match or exceed the bricks, which I think do a fine job on the preattentive side, but rather to improve on the circles to convey quantities on a map. These were my challenges.

  • Get rid of the optical illusion.
  • Preserve the capacity to overlap.
  • Make the quantities easier to perceive.

Here are the concentric circles version 2.0.

Image

The color is now on the area instead of the stroke, and there is a circular gridline every 5 units.

This design is less busy and does not create the optical illusion of version 1 at smaller sizes and lower resolutions.

circles 1 and 2

The concentric circles can still overlap and preserve their shape.

circles 2 overlap

And the circular gridline lets us see when certain thresholds are crossed on the circle, something that is not possible on a plain circle.

circles threshold

It is not possible to interpolate precisely between gridlines. Columns and bar charts suffer from a similar problem, but they hold two advantages. The first is that they generally have a gridline that exceeds the length of the longest column or bar.

columns

The concentric circles 2.0 could do the same thing.

circles outer gridline

I don’t want to discard this solution entirely, but I am concerned that we will perceive the outer limit more than the colored area and overestimate the size of the circles. The cost seems to outweigh the benefit.

The second advantage of the bar is that the distance between the gridlines is constant. With circles, as is well known, the distance between the circumferences of concentric circles whose areas increase by equal intervals gets smaller as the area grows. It is unlikely that people will adjust their perception of the distance and scale between each circular gridline.
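To put numbers on the shrinking distance between circular gridlines, here is a small sketch. It assumes gridlines at equal area intervals of 5 units, starting from the center; `gridline_gaps` is a hypothetical helper name, not something from the original charts.

```python
import math


def gridline_gaps(interval, count):
    """Radial distance between consecutive gridlines at equal area intervals.

    The first gap is measured from the center (area 0) to the first gridline.
    """
    radii = [math.sqrt(k * interval / math.pi) for k in range(count + 1)]
    return [outer - inner for inner, outer in zip(radii, radii[1:])]


gaps = gridline_gaps(interval=5, count=4)
for k, gap in enumerate(gaps, start=1):
    print(f"gap before gridline {5 * k:>2}: {gap:.3f}")
```

With these assumptions the gaps are roughly 1.26, 0.52, 0.40 and 0.34: each step adds the same 5 units of area but less and less radius, whereas on a bar chart every 5-unit step would add the same length.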

column and circle

I am not sure how much of a problem this is, considering that we are not aiming for the precision of a table, but rather for a visual method that allows a fair approximation. Still, the approximation is likely inferior to that of the column and bar charts.

The contribution of the concentric circles is that they make this confusing property of areas visible, while the plain circles do not.

circumference and area

Enough parading, time to put the concentric circles to work on a map.

Compare with the plain circles.

circles map

So, is it easier to visually estimate quantities with the concentric circles? The slight difference between Arizona and California seems more visible with the concentric circles, and easier to perceive than with version 1.0. The difference between Oklahoma and Louisiana, at least to my eye, is perceptible with the concentric circles, but barely with the plain circles.

Click below to see some other experiments that I discarded or keep for later versions.


Posted in dataviz, learning, originals | 2 Comments

Concentric Circles on a Map

Stephen Few introduced “bricks” a few days ago as a new way to display quantities on a map. Using circles is tricky because humans are not skilled at distinguishing areas. Few suggests that we can distinguish the shape of the bricks arrangement and hence count them more easily (“preattentively”).

The problem arises when bricks overlap, as demonstrated by Andy Cotgreave, from Tableau, and acknowledged by Stephen Few.

Few invited – challenged – his readers to build on his proposal, so I would like to add concentric circles to the discussion.

circles

It seems to me that the main issue with plain circles, the one that Few is addressing with bricks, is that they barely send a signal when they grow. A circle representing 8 units looks a lot like one representing 7 or 9 units.

7-8-9 circles

7-8-9 empty circles
Bricks, on the other hand, adopt a new shape with each unit, so it is very easy to see the difference.

Concentric circles do not go as far as bricks to show each unit, but they convey it more clearly than plain circles.

7-8-9 concentric circles

In the example above, each increase of one unit translates into one more concentric circle. Much like the plain circles, the concentric circles convey quantity by their size, adding to it the number of circles and the aspect of the outer rings. As the area grows in a linear fashion, the corresponding increase in radius diminishes and the circumference grows denser.
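For readers who want to try this, the construction can be sketched as a short script that writes SVG markup. It is only an illustration under stated assumptions: one stroked circle per unit, the k-th circle enclosing k times `unit_area` square pixels, and a function name (`concentric_svg`), default area and color that are all invented here rather than taken from the original graphics.

```python
import math


def concentric_svg(units, unit_area=100.0, stroke="steelblue"):
    """Return SVG markup with one stroked circle per unit of quantity.

    The k-th circle encloses k * unit_area square pixels, so the outer
    rings get closer together as the quantity grows.
    """
    if units < 1:
        raise ValueError("units must be at least 1")
    radii = [math.sqrt(k * unit_area / math.pi) for k in range(1, units + 1)]
    center = radii[-1] + 2  # small margin so the outer stroke is not clipped
    size = 2 * center
    circles = "".join(
        f'<circle cx="{center:.2f}" cy="{center:.2f}" r="{r:.2f}" '
        f'fill="none" stroke="{stroke}" stroke-width="1"/>'
        for r in radii
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{size:.2f}" height="{size:.2f}">{circles}</svg>'
    )


svg_7 = concentric_svg(7)
svg_8 = concentric_svg(8)
```

Saving `svg_7` and `svg_8` to `.svg` files and opening them side by side gives a quick way to judge whether the extra ring is noticeable at a given size.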

The advantage over bricks is that concentric circles can overlap and keep their identity.

overlapping concentric

Putting them on a map shows how their size and appearance combine to convey quantities. For instance, to my eye at least, California seems a little bigger than Arizona and, indeed, it is by 1 unit. A clearer example would be Colorado and Wyoming, where the 1 unit difference is unmistakable.

concentric circles map

Concentric circles have their downsides.

  • Above 5 circles, it is very difficult to count them. It is unlikely that a reader would rely on this method to compare quantities precisely.
  • The stroke can become problematic at greater sizes, when the difference between the circles is very small.
  • The visual is more complex than plain circles. This may create issues especially when interacting with certain backgrounds.
  • The distinction between sizes is less clear than bricks. Testing might tell us if it is any better than plain circles.

I have made some quick tests with variations in colors and full circles with white strokes. Neither seems to work as well as empty circles with colored strokes.

concentric circles variations

It might well be that this method is already well known, although a quick search did not yield results. I’d welcome any pointer to earlier examples. It might also be that this method is adding less than it takes away. In any case, I thought I would add one more idea to this interesting discussion started by Stephen Few.

Posted in dataviz, learning, originals | 1 Comment

It’s not you, it’s PowerPoint

I really want to know the source of this brilliant caricature.

Genius: HikingArtist.com

PowerPoint is so despised that “death by PowerPoint” has become a common expression. It might be more literal than you think, given its role in the loss of the Columbia shuttle and its negative impact on the conduct of war. Even the creators of PowerPoint don’t like how it’s used today.

Reliant Robin

If PowerPoint was a car.

Yet PowerPoint has its defenders. Many argue that users can build good presentations with it if they know how to use it properly.

Fine, but that does not make it any less evil. In fact, it is at the root of why PowerPoint is evil:

It is designed to create bad presentations by default.

If PowerPoint was a coffee mug.


That is, to create a good presentation, you have to avoid the pitfalls that PowerPoint puts in your way.

I am surprised that presentation professionals, who are so sensitive to design, do not level this criticism at PowerPoint more often. They know that a presentation’s design helps the audience understand the right messages. It is Microsoft’s job to design PowerPoint so that users will stumble upon good presentation practices.

If PowerPoint was a software. Oh, wait...


(For an example of how software design can make good people do bad things, see the initial comments of Stephen Few on Tableau 8.)

Let’s go through a few basic features of the PowerPoint interface (2011, the latest for Mac) and how they contradict professional advice on how to make good presentations. I’ll even make some constructive suggestions.

The first screen

notes

A face only a mother could love.

Click on the familiar red icon and you’ll get this welcome screen. It invites you to type the title of your presentation, as if you knew at the beginning. It also makes you feel inadequate if you don’t have a subtitle. Can you imagine Word starting by telling you what part of your text you need and should write first?

Solution: Present an interface that encourages the user to develop their story first. Unfortunately, PowerPoint has become the first thing that people open when they have a presentation to prepare. Deal with it. Instead of starting with a slide, what about starting with some story-making interface where people would be able to note down their points and rearrange them, like post-its? In the meantime, make the speaker’s notes section much more prominent so that users write down all that crosses their mind there, rather than in their slides.

Slide layout options

Choice paralysis? You might be a good presenter.


The very first icon in the very first ribbon encourages you to add slides and to choose layout options for them. Let’s look for one of the best options: a full screen picture. There? No. There? Nope. Hum. There? Non. Oh well: It doesn’t exist.

Solution: Check out the layout options of the Apple Aperture photo albums. They look like good presentations by masters like Garr Reynolds. Full size images, big titles. Why is it that PowerPoint does not have a single such layout?

Title and content slide

A well-known hierarchy: dot, dash, dot, dash, French quotation mark.


The bullet points! The dreaded, overused bullet points, the staple of lists without hierarchy, of paragraphs to read while a speaker speaks, of half-formed ideas on printed decks — it’s what PowerPoint suggests you should be using in its bread-and-butter slide. It gets worse: when entering text, hit tab and you go down one level, with a random choice of “em dash” as the next thing after “round bullet” (who died and made the round bullet king?) and… reduces text size! Take that, large and sparse text. Hey user, why don’t you just fix that yourself since we all know it’s bad form anyway?

Solution: Do not put bullets by default. If a user needs them, they are at the exact same place as in Word. Make users work just a little for doing the wrong thing.

Text auto-resize

Mucho dolor, indeed.


PowerPoint gets you started with a fairly large text size: 32 points. Good: this is in line with the advice to limit the amount of text on your slides. But what happens when you write more than a text box can contain? PowerPoint takes no offense; in fact, it takes care of the problem by automatically reducing the size of your text!

How is that supportive of restraining the number of words on a slide? How is this not a temptation, a signal to the user that there’s no point in limiting the amount of text? That’s the source of the slideument, right there.

Solution: No more text auto-resize. When users put too much text on a slide, they should have to manually override the software.

Themes

Which of these themes look good to you? Maybe better not to answer.


Microsoft ran a competition to find the ugliest themes possible. Is there another explanation? (Would “Couture” mind getting out of the way please so that we can add some content?) Maybe they asked their software engineers to design the themes. Or Steve Ballmer.

Solution: Include elegant, restrained but also relevant themes. Call the black and white one “Brightly lit room”. This might even suggest to the user that a good presenter knows in advance what the room looks like.

Images

Help me choose between the bomb and the hourglass!


PowerPoint is a visual aid, so this has to be good. Kudos to Microsoft: nearly half of their layout options propose a direct link to pictures. And double kudos because it links to your own photo library, whose pictures might not look professional but are surely original.

On the other hand, there is no link to a professional photography website, such as Corbis or iStockPhoto. There is, however, a link to their breathtaking (as in “taking the life out of you”) clip art. Seeing a piece of Microsoft clip art in a presentation has become a reminder that you’re not following your dreams.

Solution: Give good habits to your users and offer access to 20 free photos with each copy of MS Office, in partnership with a professional photo company. Heck, Microsoft, Bill Gates owns Corbis. Hint, hint!

Charts

I lost my appetite.


PowerPoint uses the Excel charts and all Excel defaults are wrong. It glorifies the pie chart and proposes 3D versions of just about every chart. Fixing Excel charts can be a full-time job, and a profitable one at that. How can we blame casual users for using the most accessible designs?

Solution: Fix Excel charts. The 2013 version seems like a step in the right direction, although I haven’t used it. But that’s for another post.

Well well

MS Word has not corrupted writing. Maybe it has to do with a blank page and a series of fairly relevant tools. If Word was designed like PowerPoint, the text would be all caps by default, the first page would ask you the title of your book and you’d have three bullets waiting for you.

Let’s not defend a bad tool. Let’s ask Microsoft for a tool with the right affordances. The interaction of users and their tools is critical to the outcome. Audiences deserve better than the amateur-PowerPoint duo.

Posted in originals, presentation, software | 3 Comments

Diving with a view

Part II of my observations from the World Bank Data Dive on poverty and corruption.

It might start with the data, but for me the fun is in the analysis, especially visual. I had in fact joined the group fighting corruption because they seemed the most likely to need data exploration and visualization.

Below is the result of a long day’s work, more or less. I wish I had a graph shining the light of integrity on collusion, coercion or some other evil, but no. Slowed down by data issues, we did not make it that far. I can’t say that I’m satisfied with any graph I’ve done over the week-end but then again, I’ve done them.

Board approvals per month

The first one happened while I was idly playing with the project data. By déformation professionnelle, I looked at the number of projects that the Board of Directors had approved per month. With July at the top, it is clear that there is a rush to approve more projects towards the end of the fiscal year, in June.

Is it possible that more cases of corruption happen in projects approved in May and June because the staff takes less time to conduct the due diligence? This question opened the Pandora’s box of linking the disbarment data and the project data. If we were to find project characteristics that lead to a higher likelihood of corruption, it could orient the preventive work of the integrity team. It was too much to resist and became our undoing as we spent hours trying to recreate that link, leaving available data sets unused.

While the true wizards were working on said link, I continued to explore the project data visually. My original graph showed cumulative approvals for 66 years. What if this bunching is an old problem and the Board now approves a constant number of projects per month? I needed a trend.

Trend share WB board approvals

I’m afraid this is my best effort of the week-end. About 800 data points are visible, with a clear enough message: the trend has worsened over the decades and the Board approves a growing share of projects towards the end of the fiscal year. The months with a larger share have gotten an increasing share, and vice versa. Since the mid-1980s, the share has reached 30% regularly in June. This is about 3.6 times as much as would be expected from an equal distribution per month (1/12 ≈ 8.33%). This finding confirmed that it was still worth exploring the impact of this share of approvals on the due diligence of individual projects. Unfortunately, the data materialized too late and the link was never explored.
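The share calculation behind a trend graph like this one can be sketched in a few lines. This is only an illustration: the dates below are made up, the function name `monthly_shares` is hypothetical, and the sketch groups by calendar year rather than by the Bank’s fiscal year. The real project file is not reproduced here.

```python
from collections import Counter
from datetime import date


def monthly_shares(approval_dates):
    """Return {year: {month: share of that year's approvals}}."""
    per_year = {}
    for d in approval_dates:
        per_year.setdefault(d.year, Counter())[d.month] += 1
    return {
        year: {m: counts[m] / sum(counts.values()) for m in range(1, 13)}
        for year, counts in per_year.items()
    }


EVEN_SHARE = 1 / 12  # what each month would get if approvals were spread evenly

# Made-up sample: 3 approvals in June and 7 spread over other months.
sample = [date(1990, 6, 1)] * 3 + [
    date(1990, m, 15) for m in (1, 2, 3, 4, 5, 7, 12)
]
shares = monthly_shares(sample)
june = shares[1990][6]
ratio = june / EVEN_SHARE
print(f"June share: {june:.1%} vs even share: {EVEN_SHARE:.2%}")
```

In this toy sample June holds 30% of the year’s approvals, against an even share of one twelfth, so the ratio is 3.6.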

We did get an original data set though: the historical list of firms and individuals disbarred by the World Bank. I’m afraid I did nothing worth sharing with it. A few bar graphs showing the number of firms, the average number of days of disbarment per country. No corruption fighting histogram in there, no revolutionary radar graph.

In lieu, here are two of the most interesting visualizations I’ve seen. The first one is a network diagram of the bidders on World Bank contracts built by Nick Violi with data that he scraped himself (wow). It draws no conclusion, but it makes me curious. What are these clusters? I don’t even know what the colors mean, but I’d like to know why some clusters are all yellow, some are mostly blue and some are mixed. G11 is an interesting node, as it bids on few things but then bids across two clusters. What kind of company can it be? This is the kind of exploratory visualization that makes me want to dive into the data.


Much credit to: Nick Violi @nvioli Master scraper and rad coder.

The second is from a team exploring UNDP’s resources allocation. In a scatter plot, it compares the overhead with the expenses of, apparently, hundreds of projects. It might look like a Caribbean hurricane to you, but to me the resulting distribution of the data is surprisingly elegant. The two measures have expenses in common, which accounts for the slope pattern. The horizontal cut-off at 1.0 is due to budget limits (or so one hopes). The color overlay provides a nice analytical tool, suggesting to the reader where to look and how to interpret the data. There are a few startling findings already. A surprising number of projects have spent 2-3 times as much in overhead as in operations. Despite the high number of outliers, there is a strong concentration of projects around the target of spending 100% of budget and keeping the overhead low, which suggests good planning and lean implementation.

World Bank DataDive UNDP Capacity & Performance

Credit: Monique Williams, Harlan Harris, Dennis D. McDonald, Kezia Charles, Keren Charles, Terence Rose, Kent Rahman, Adell Mendes, Joshua Tokle, Jean-Ezra Yeung, George Fenton, Josh De La Rosa, et al.

This graph would benefit from some graphic design flair. The overlay text should be readable and aligned everywhere. The overlay colors could be more visible and helpful. I’d be curious to experiment with empty circles instead of semi-transparent ones. The vertical text could be made horizontal. The light grey frame could be removed.

Knowing the conditions in which these graphs were produced, I wouldn’t take the data for granted, nor draw any hard conclusion. But they might inspire a few in-depth analyses. Have a look at a few more on this Tumblr.

Thank you but mostly congratulations to the organizers at the World Bank and DataKind. For an event so open, it is impressive how purposeful it felt. A special thanks to the two data ambassadors of our group, Sisi Wei and Taimur Sajid. I hope that the World Bank, UNDP and other organizers and participants will benefit from the event. I know I did.

Posted in dataviz, events, open data | 3 Comments

It starts with the data


Photo credit: @worldbankdata

What good can data do?

The World Bank and DataKind set out to further explore this question during the Data Dive held March 16 and 17 in Washington DC (#data4good). People  who rarely work together — coders, quants, data visualizers, procurement experts, economists, lawyers, students, senior managers, open data evangelists — ended up at the same table for 36 hours of intense work, united by their love of data. The goals were attractive. How can we measure poverty more often and more accurately? Can we detect fraud by looking at the data?

Photo credit: Jake Porway @jakeporway


It was my first participation and the first thing that I learnt is that bringing your desktop computer to the land of laptops makes for a good conversation piece and several tweets.

The second lesson is rather a reminder: all data visualization starts with data gathering and verification. Hold your horses, get the data right. Delayed gratification is the best anyway. And delay our gratification, we did.

The World Bank has some rich and reliable data sets and, indeed, they directed us to a file with 77 dimensions for 13,628 World Bank projects between 1947 and 2013. One million data points for your viewing pleasure. The list of disbarred firms was less enthusing: it had only firms currently disbarred, no historical data and the grounds for disbarment had typos and structure problems. Thankfully, the wizardry of Taimur, Sameer and Jayesh meant that about halfway through the day we had a historical list scraped from the Wayback Machine of Archive.org. The following morning, the grounds for disbarment were clean.

Data Dive World Bank March 2013

It was not as silent as it looks. Photo credit: Neil Fantom.

But the real problem was the missing link between these two data sets. The disbarment list contains no information about the project for which the firm or individual was disbarred. Without it, it is impossible to explore the characteristics of projects for which cases are detected. This information exists somewhere and in fact, it could be manually garnered from the determinations, made publicly available in scanned PDFs, a data person’s nightmare. Still, our three aforementioned wizards put their brains and digits to it, found some intermediary data set and, at the very end of the event, we had a debarment list with project names. I won’t link to it, however, as we did not have time to verify either the methodology or the results, and this is delicate information to get wrong.

The event started Friday night, with some speeches and mingling, and finished Sunday morning with presentations. So it’s about 12-13 hours of work on Saturday, from 10 am to 11 pm. Receiving instructions, understanding the topic, seeing the data sets, thinking up questions for the data, figuring out the problems, brainstorming solutions, weeding out the wrong ones, implementing the promising ones, seeing and checking the results took our group most of this precious time. We never got to the point where we could ask the questions we had early in the day. Reflecting upon the experience now, maybe we should have limited our questions to those that could be answered by the existing data. Make that the third lesson.

The data providers that make the data public would benefit from releasing it in the right format, sparing users a lot of the scraping. Webpages like this were certainly created from a database in the first place and yet we had one person spend the whole day just recreating it. World Bank: share the database. Since it is public information anyway, keep the master file on the server, update it right there.

These data issues are commonplace at such events, we’re told. I can believe it from my personal experience with data. I’m sure it’s fun to be a data visualizer fed with perfect data, but I have yet to encounter such a situation. Learning to test and clean the data is still, today, a skill that a data visualizer needs. Jon Schwabish recently started a discussion on Twitter concluding that data processing is a defining skill of a data visualization expert and I can only agree.

Posted in dataviz, events, open data | 5 Comments