In Appreciation of Measures That Tell Stories


Not long ago, usability measures and Web analytics were few and far between. The usual standards amounted to little more than task completion, error rates, and click streams. Yet, they served us well.

Some years ago, I relayed one telling measure, how many clicks it took to find a book, to clients at a large metropolitan library group, and the room fell silent. Finding a book on a library website should have been, as my father was fond of saying, “as easy as shooting fish in a barrel.” In our test sessions, however, it took eight of 12 participants an average of 6.25 clicks to find John Grisham’s book, A Painted House. The benchmark for the task was one click.

All but a couple of the participants meandered through pages looking for the best-selling book without feeling they were progressing toward their goal. Some participants clicked 18 or 20 times before giving up. Of all the performance data in our 147-page report, this one piece of information, the number of clicks it took to find the Grisham book, moved the client to take action.

“The Three Cs”

This was back in 2002. Now, of course, we have more measures in our toolbox. Or, at least, we should. While the old standards are still useful, the digital spaces we try to improve have become much more complex, and so too have clients’ expectations for functionality and a return on their investment.

Whether you call this a Web 2.0 era or not, there is no disputing that most clients these days care more than they ever did before about the “Three Cs”: Customers, Competitors, and Conversion. Click streams have made room for bounce rates, search analytics, and so much more. If we play our cards right, we can reduce and synthesize the raw data and give our clients more meaningful information that foments action.

Emblematic Measures Have Teeth

Of all the data we report, certain measures are more meaningful than others. I call these more meaningful data emblematic measures. In dictionary terms, something emblematic is a “visible symbol of something abstract” that is “truly representative.”

In our presentation to the library group, the measure for the Grisham task was emblematic. That is, the measure was representative of the library website’s greater inadequacies: its failure to fulfill the basics of its fundamental purpose and meet its customers’ needs. In turn, the measure was understandable to the client on a visceral level because it was firmly planted in their business objectives.

“Emblematic measures ensure that the data are always in the service of the business,” writes Avinash Kaushik, author of Web Analytics: An Hour a Day. “I can’t tell you how many times I run into data simply existing for the sake of data with people collecting and analyzing data with the mindset that the sole reason for the business’s existence is so that it can produce data (for us to analyze!).”

However, not all of the measures we deliver to clients are emblematic, nor should they be. Emblematic measures need to epitomize the entire study’s findings eloquently and elegantly. In layman’s terms, emblematic measures are a lot like the best line from a classic movie: It’s not the only line, but it’s the one that is remarkable, memorable, and eminently quotable.

Emblematic measures are also far from prescriptive, static, or context-free. With every bit of user experience research we conduct, and on each and every site, the measures will surely vary, given the context of testing, the sample, the tasks assigned, the business objectives for the site, the functionality being studied, and so on.

Therein lies one challenge of our daily work.

The Site Abandonment Measure

Fast forward from 2002 to Summer 2006. During a usability test of a philanthropic extranet for a large foundation, we measured the occurrence of something we had seen happening a million times. We used to think it was just too obvious to formalize and report to clients.

But this time, we found our emblematic measure.

We call this measure a Site Abandonment Measure (SAM). We define a SAM as the percentage (or number) of participants who give up on a specific task (or set of tasks), leave a site altogether, and turn to another source—any source—to get a task done. Put simply, it’s the “I quit—I’ve had it with your site” rate.
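
For readers who want the arithmetic spelled out, here is a minimal sketch of how a SAM could be tallied from session notes. The record structure, task names, and data below are hypothetical, invented purely for illustration: a task’s SAM is the share of participants who abandoned the site while attempting it, and an overall SAM is the average of those per-task figures.

    from collections import defaultdict

    # Hypothetical per-participant, per-task outcomes noted during moderated sessions.
    # "abandoned" marks a participant who gave up on the task and left the site entirely.
    observations = [
        {"participant": "P01", "task": "recommend_grant", "abandoned": True},
        {"participant": "P02", "task": "recommend_grant", "abandoned": False},
        {"participant": "P01", "task": "find_tax_return", "abandoned": False},
        {"participant": "P02", "task": "find_tax_return", "abandoned": True},
        # ...one record per participant per task...
    ]

    def site_abandonment_measures(obs):
        """Return each task's SAM (a percentage) and the average SAM across tasks."""
        attempted = defaultdict(int)
        abandoned = defaultdict(int)
        for o in obs:
            attempted[o["task"]] += 1
            abandoned[o["task"]] += int(o["abandoned"])
        per_task = {t: 100.0 * abandoned[t] / attempted[t] for t in attempted}
        overall = sum(per_task.values()) / len(per_task)
        return per_task, overall

    per_task_sam, overall_sam = site_abandonment_measures(observations)
    print(per_task_sam)           # {'recommend_grant': 50.0, 'find_tax_return': 50.0}
    print(round(overall_sam, 1))  # 50.0 for the toy data; 38.6 in the foundation study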

When we asked our 15 participants to make a recommendation for a grant in support of a local Special Olympics team, 53 percent of the sample abandoned the task altogether. Participants told us they would complete the task elsewhere (usually by phoning the Special Olympics or the foundation directly).

We also found the SAM was significant for informational tasks. When we asked participants to get the latest tax return for the Special Olympics group, 40 percent of the sample left the site altogether and went directly to the Special Olympics site for the information.

Overall, the SAM for the foundation’s site was 38.6 percent across the ten key tasks on the extranet. This showing was pretty dismal, especially given the context of our research. We were, after all, testing an extranet whose sole purpose was to let users manage their philanthropic funds, not an e-commerce site where the concern is click-through rates on ads. (There are no formal usability standards for unacceptable SAMs, as far as we know.)

This means that, on average, for any one task, about six of our 15 participants (38.6 percent of 15 is roughly 5.8) agreed to take on the task we asked of them, went through the first motions, and then eventually gave up not only on the task, but on the entire site.

When we presented the findings to the client, the show-stopper was the Special Olympics task and the corresponding SAM. How could they have laid down cold, hard cash for a site that failed to let over half of the test participants make a grant recommendation online?

SAMs vs. SARs

SAMs may bring to mind Web analytics and their main use on e-commerce sites. Under the hood, of course, the data is as different as a Porsche from a Prius.

Web analytics, such as conversion rates and the narrower site abandonment rates (SARs) that measure user interaction with shopping carts, leverage quantitative data extracted from transactional logs to measure macro-level interactions across a large sample of users. SAMs, on the other hand, use behavioral data from one-on-one test sessions to measure micro-level interactions with a small set of representative users.
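
To make the contrast concrete, here is a minimal sketch of how a shopping-cart SAR might be computed from a transactional log. The event names and log layout are assumptions made for this example, not any particular analytics product’s format; the calculation runs over every logged session, whereas a SAM is tallied from the dozen or so participants we actually observe.

    # Hypothetical page-view log from a transactional system: (session_id, page) pairs.
    pageviews = [
        ("s1", "cart"), ("s1", "checkout"), ("s1", "order_confirmation"),
        ("s2", "cart"), ("s2", "checkout"),
        ("s3", "cart"),
        # ...a real log would hold thousands of sessions...
    ]

    def shopping_cart_sar(views):
        """Percentage of sessions that reach the cart but never reach order confirmation."""
        carted, converted = set(), set()
        for session, page in views:
            if page == "cart":
                carted.add(session)
            elif page == "order_confirmation":
                converted.add(session)
        if not carted:
            return 0.0
        return 100.0 * len(carted - converted) / len(carted)

    print(round(shopping_cart_sar(pageviews), 1))  # 66.7 for the toy log above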

As a measure in our toolbox, SAMs can tell us things about users that SARs cannot. When users “think aloud” during usability sessions, SAMs can give us some of the story behind the quantitative measure. They can capture qualitative data about users’ frustrations, annoyances, barriers, and solutions. (Granted, there is always the issue of self-report in usability test sessions.)

According to Kaushik, there are, of course, emblematic Web analytics, too. Bounce rate, which measures the percentage of visitors who see only one page and then leave, is a frequent example.

“Everyone (from the CEO on down) gets this metric right away and can understand if it is good or bad,” Kaushik says. “It is so hard to acquire traffic and everyone cares about the percentage of traffic that leaves right away. When I report that ‘your bounce rate is 60 percent,’ it simply horrifies the client and drives them to ask questions to take action.”
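
As a worked illustration of that definition, bounce rate reduces to a one-line calculation over visit data. The numbers below are invented purely for the example.

    # Hypothetical visits, recorded as the number of pages each visitor viewed.
    pages_per_visit = [1, 5, 1, 2, 1, 7, 1, 3, 1, 1]

    # Bounce rate: the share of visits that saw only one page and then left.
    bounce_rate = 100.0 * sum(1 for p in pages_per_visit if p == 1) / len(pages_per_visit)
    print(f"{bounce_rate:.0f}%")  # prints "60%" for this toy data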

What’s in a Name?

Relying on a sexy metric or one type of usability measure alone is not always a sure way to reach a client with a call for change, though. The underlying data also has to speak to clients. This means practitioners have to work at breathing life into the data they package and deliver.

Kaushik recounts a story about taking existing metrics and segments and simply renaming them to make them more emblematic: “We were measuring five customer segments: (1) those who see less than one page on a site, (2) those who see three pages or less, (3) those who see more than three pages and did not buy, (4) those who place an order, and (5) those who buy more than once. These were valuable segments and something worth analyzing, but the internal clients would simply not connect with the segments until we renamed them to ‘Abandoners,’ ‘Flirters,’ ‘Browsers,’ ‘One-off-wonders,’ and ‘Loyalists.’”

The simple change in how the data was communicated had a huge impact by creating a story around it. Kaushik’s client had a greater understanding and instantly began asking how they could turn Flirters into Loyalists.

Hitting Clients Right between the Eyes

Sophocles wrote, “The truth is always the strongest argument.” Likewise, many practitioners rely on data to provide the best approximations of truth they can. With so much of our research focused on striving for accurate representations of something as amorphous, varied, and hotly debated as user behavior, we are a profession usually awash in data, practicing a less-than-perfect science.

When I was in graduate school, we discussed construct validity (http://books.google.com/books?id=eAdbEn-yZbcC&pg=PA190&lpg=PA190&dq=babbie+and+construct+validity&source=web&ots=k8tB76zIaW&sig=6-ww5WOHJhLKFk5siib3qUYheis#PPA190,M1). Construct validity refers to the extent to which a test offers approximate evidence that a certain measure (e.g., the task of finding a library book) accurately reflects the quality or construct (the proficiency of users in carrying out a frequently conducted task on a library site) that is to be measured.

It is essential, of course, to weigh the validity of the tasks we develop and the results delivered. But collecting all of the “right” data is not always enough.

“The problem is that we are so immersed in data in our professional or academic worlds that, to a great extent, we become disconnected with reality,” Kaushik says, “especially when we lose touch with the business side of things and we lose touch with customers and base our analysis on how four people in a lab carried out a task.”

Do your rigorous research justice by communicating the data in such a way that it reveals any significant shortcomings. No matter the size of your project, look for the emblematic measures. They will allow you to tell stories that hit clients right between the eyes and move them to action.

Acknowledgements

Many thanks to Avinash Kaushik for his email interview for this article on October 4 and 5, 2007.

Kaushik is the author of the book Web Analytics: An Hour a Day, writes the blog Occam’s Razor, and is the founder of Market Motive, a Silicon Valley startup that focuses on online marketing education. He is also the Analytics Evangelist for Google.

7 comments

  1. I see your point, how to impress the stakeholders (and how to keep our jobs).
    Thanks a lot for the article.

  2. This article is very well-informed; Avinash Kaushik is one of web analytics’ great “gurus” and almost undoubtedly the best source in the industry for this topic. I appreciate the discussion of the importance of properly representing insights to stakeholders – your SAM packs a one-two punch that would beat any SAR hands-down, and I’d love to see more people using more meaningful measures like these! There’s so much more value to be gained when you can use multiple sources of user data to paint a picture for the stakeholders.

  3. Great article. I recently interviewed Avinash Kaushik on how to measure the success of an e-commerce website.

    From the interview, when asked about the most important metrics: “Behavior: Loyalty, Recency, Depth of Visit and Length of Visit, Bounce Rate, Top Entry Pages, Top Referring URLs and Search Keywords.
    Experience: Task Completion Rates, Likelihood to Recommend.” (http://www.avangate.com/interviews/avinash-kaushik_7.htm)
    He also insists on improving the user experience with A/B testing, multivariate testing, and web analytics.

    I think this should be the NEW Web Analytics. Attention shouldn’t be concentrated only on what actions users are performing on a website but also on the cause of those actions and on what users are trying to accomplish.

  4. As Andrea said, the distinction between SAM and SAR is a great one to make, but I think stopping at just distinguishing between them doesn’t go quite far enough. These two measurements can actually complement each other, and that’s where the real value of combining UXP techniques with Web analytics data lies. While you talk about how SAM can extract some meaning from a raw SAR, the SAR can also support the SAM. If your SAR is close to your SAM, then that offers a strong argument that what you observed with a small sample of users during user testing bears out across the entire user community (maybe not technically “construct validity,” but damn close!).

  5. Absolutely, Fred. I entirely agree with you. There is a lot of richness and power in multiple measures, as Andrea also suggested in her post. The SAM and SAR combination could be a very meaningful one with one measure informing the other one, as you point out–dare I say emblematic ;-). I think one of the dangers we all face is that, at times, we try to do “textbook” work and work within the confines of providing an impeccably accurate measure (whatever the measure may be, e.g., bounce rates, clickstreams, SAMs, etc.). Subsequently, we lose sight of the bigger picture and what stakeholders really want to know. Thanks for posting your comment.

  6. Great comment. Thank you for passing along this clever strategy for presenting data to stakeholders.

  7. It’s a shame the Site Abandonment Measure is not more heavily publicised. Gone are the days when site owners are only concerned with the loading time of a web site being under 8 seconds. Gone are the days when site owners are only concerned with how usable their website is and that their products can be readily viewed or purchased on their site.

    What they should NOW be VERY concerned about is how closely the site abandonment metric is related to the ABILITY-TO-CONTACT-A-HUMAN-BEING metric. Many companies out there have made their websites so very advanced that they have made it virtually impossible to find the simplest thing: a contact number for a person.

    I wonder how many people out there have gone to a website for a telephone number so that they can talk to a human being, and no matter how flash or advanced the website is, they don’t find anything. THAT should be the focus of attention these days.

    Fickle, us Humans; we want automation, but we want Joe Smith’s personal desk phone number if we get stuck.

    And as ever, the largest corporations are the largest culprits, without mentioning any names micro~cough~soft.
