User testing is typically a sort of fishing trip. We lower a lure (the user) into the water (the application or site) and see what critters (defects) bite. This is a valuable and time-tested approach. But when we start fishing for defects, we are left with some tough questions. For instance: When are we finished? How many defects do we need to find before we have fully tested the site or application? If we find a defect, how do we know how severe it is, and by what measure? In iterative testing, how do we compare results from testing the current version with results from earlier versions?
A productive way to address these questions, and to incorporate user testing into the design process, is to articulate up front the key issues you are investigating and to predict what users will do in certain situations. Imagine that you have been asked to review the user experience of a consumer shopping website and your first reaction is “They’ll never find the pricing page.” I’m suggesting that turning that hunch into an explicit prediction or hypothesis will improve the testing and make the findings more relevant to the design team.
Here’s how to do it
Whether you are testing your own design or someone else’s, start by defining questions you want answered. Describe the assumptions implicit in the design. Make predictions about users’ behavior and develop hypotheses about what they will do. That’s the first step. Then, structure your testing to address those hypotheses. That way, whatever the result, you have specific, relevant information about the design.
My colleague Dianne Davis and I developed this approach over several years of testing websites and applications. We start with the idea that every design, implicitly or explicitly, predicts how people will respond and behave. Articulating those predictions and thinking of them as hypotheses focuses our usability research and allows us to “test the design,” not the users.
Testing the design
The first step is to generate hypotheses. We begin with an extensive review of the site or application. We use relevant heuristics in our review, of course. We ground the review in the site or application’s goals and philosophy, as defined by its designers and as documented in project literature (requirements, specifications, creative brief, etc.). This is critical because we want the review to reveal whether the site conforms to, or deviates from, its design intent. It’s important to engage the design team in a dialogue at this point to clarify their goals and articulate their predictions about user behavior. That way, the results of the testing will answer their questions, not ours. (If we are testing our own design, then we use this opportunity to reflect critically on it and identify its key predictions and assumptions.)
This initial review yields what you’d expect: many “defects” and questions about user behavior. It lets us spot areas that we think will cause difficulties for users, aspects of the design that may not deliver on design goals, and features or regions that may go unfound or unused. We don’t treat these “defects” as certain; rather, we treat them as potential problems and as guesses about what users will do. Each issue is no more than a prediction, but it is a testable one. Our instincts or prior experience may “tell us” that an issue is critical and will affect user experience and task completion. But we won’t really be sure until we test with users. That’s where hypotheses come into the picture.
The predictions and guesses become an informal set of hypotheses about user behavior. For example, our review of an automotive manufacturer’s site suggested that users would not see or use a link that opened a pop-up window with detailed photographs of a product, even when the user was looking for that type of image. So our hypothesis was: Most users will not find the pop-up window. If most users do indeed find and use the link, then our hypothesis is disproved and the design’s prediction supported. If they don’t find it, we know what we need to fix. In another study, we tested a redesigned interface for paint-colour-mixing software. We expected that users would have trouble because the application lacked indicators of the sequence of steps in the task; we hypothesized that users would not know what to do next, and so identified an area that needed to be examined.
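To give a sense of how these predictions can be kept track of, here is a minimal sketch (in Python; the field names and example entries are invented for illustration, not drawn from the actual studies) of review findings recorded as testable hypotheses, each tied to a design goal and to the tasks that will exercise it:

    # A minimal sketch: review findings captured as testable hypotheses.
    # Field names and example entries are illustrative only.
    from dataclasses import dataclass, field

    @dataclass
    class Hypothesis:
        id: str                  # short handle observers can refer to
        design_goal: str         # the goal or assumption the design embodies
        prediction: str          # what we expect users to do (or fail to do)
        tasks: list[str] = field(default_factory=list)  # tasks that expose users to the issue

    hypotheses = [
        Hypothesis(
            id="H1",
            design_goal="Detailed product photos are reachable from the vehicle page",
            prediction="Most users will not find the pop-up photo link",
            tasks=["Find close-up photos of the interior trim"],
        ),
        Hypothesis(
            id="H2",
            design_goal="The colour-mixing sequence is self-evident",
            prediction="Users will not know which step to perform next",
            tasks=["Mix a custom colour and save it"],
        ),
    ]

Even kept in a spreadsheet rather than code, a structure like this makes each prediction explicit enough to be confirmed or contradicted by what users actually do.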
Testing users testing the design
If our hypotheses relate to types of users, that may affect sample selection and study design. For example, in testing an educational site we predicted that a Flash-based animated introduction would be of interest to one part of the audience (kids) but a distraction to another part (teachers). So we were sure to include sufficient users in each category and to compare how these two groups used this feature. (By the way, we were wrong. The kids all clicked “Skip Intro.”)
Hypotheses or predictions about user behavior help us develop and refine usage scenarios and tasks. Of course, core tasks are derived from the project’s use cases or scenarios (if they exist). But in addition, we include tasks and situations that will expose users to the problems and opportunities we have identified. In this way, the users are “testing the design.” The automotive site mentioned above had loads of information on vehicles, but the rich-media featured content was a key element of the design. So we made sure that our research design included tasks that invited users to search for that sort of information. In contrast, we did not generate hypotheses or tasks around other information that was less significant to the marketing strategy.
Watching and listening
This approach also has an impact on how we observe users and collect data. While we note all defects or user problems that we see, we are particularly interested in user behavior around the “pinch points” and “rabbit holes” that we’ve identified. With multiple observers, the hypotheses also serve as common points of reference. If we all go in looking at the same things, we can more easily compare and synthesize our findings.
User observation methods are based on an ethnographic methodology in which the observer works hard to bracket their biases and expectations. The method I am suggesting complements this traditional approach in two ways. First, it supplements traditional observation methods rather than replacing them. When conducting tests with users, we keep our eyes open and record relevant user behavior, whether or not it relates to our predictions. We always find previously unidentified defects and see users do unexpected things. If you watch and listen, users will reveal interesting things about the tools, devices, and products they use.
A second reason that hypotheses complement observation is that being explicit about expectations helps guard against biases on critical issues. Ethnography is hard to do well because it’s difficult to be aware of your biases and easy to find evidence to support them. In the Handbook of Usability Testing, Jeffrey Rubin advises having specific testing goals in mind: “If not, there is a tremendous tendency to find whatever results one would like to find” (p. 95). Starting with hypotheses or predictions provides a framework for consciously assessing and interpreting user testing data.
Sifting and reporting
One thing about qualitative data: there is a lot of it around. Data reduction is always a challenge, and assimilating exhaustive lists of defects can be daunting for the usability specialist and the rest of the team. Aside from the number of issues, there is the problem of organizing them. The hypotheses you start with can provide a meaningful way to group the results as you interpret them. This way, results are organized around design goals and related user behaviors, rather than around interface features, making them more relevant and pointing out underlying design relationships. The hypothesis-testing approach also helps determine what to fix, because findings and recommendations are easier to prioritize when you are mapping them against previously set goals. If you are doing iterative testing, the hypotheses help you triangulate between phases of testing and see whether and how design changes affect user behavior.
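As a rough illustration of that kind of data reduction (hypothetical Python again; the records below are invented, not real study data), session observations can be tagged with the hypothesis they bear on and grouped for reporting, with genuinely unanticipated findings kept in their own bucket:

    # A rough sketch: group session observations by the hypothesis they bear on,
    # so findings are reported against design goals rather than as a flat defect
    # list. The records below are invented for illustration.
    from collections import defaultdict

    # (hypothesis_id, participant, severity, note) -- None marks an unanticipated finding
    observations = [
        ("H1", "P03", "high", "Scrolled past the photo link twice without noticing it"),
        ("H1", "P07", "medium", "Found the link only after prompting"),
        ("H2", "P03", "high", "Asked 'what do I do now?' after choosing a base colour"),
        (None, "P05", "low", "Misread the shade names in the palette"),
    ]

    by_hypothesis = defaultdict(list)
    for hyp_id, participant, severity, note in observations:
        by_hypothesis[hyp_id or "unanticipated"].append((severity, participant, note))

    for hyp_id, notes in by_hypothesis.items():
        print(f"{hyp_id}: {len(notes)} related observation(s)")
        for severity, participant, note in notes:
            print(f"  [{severity}] {participant}: {note}")

Across iterative rounds of testing, the same hypothesis identifiers also make it straightforward to line up results from one version of the design against the next.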
When it comes to reporting, I have found that I get the attention of designers and developers when I present findings in terms of the goals and ideas they are pursuing. Since the hypotheses are built around their design vision, I am starting in a strong position. And, when I have actually observed users’ behaviour related to those goals, I can speak with greater confidence about the impact of design decisions. If finding the pricing page is an acknowledged part of the site’s task model, and I see that users can’t find it, designers are forced to ask what went wrong and are less likely to blame the user.
Not the null hypothesis
It may seem that having hypotheses will bias you towards seeing certain behaviours and not others—that the observer will try to confirm their hypothesis. But that is not how an empirical method works. In fact, we don’t go out to prove hypotheses, but to test them. We’ve got to be open to whatever we see.
This hypothesis-testing approach to usability is not a true experimental methodology. In applied usability work you typically do not have the time to develop metrics and test their reliability, the luxury to assign users randomly to conditions, or the budget to test with sufficient numbers to use statistical tests of significance. You also want to cast a wide net and not restrict your observation to the hypotheses, as you might in a controlled study.
User testing remains a naturalistic (if not “natural”) research method founded on passive observation. So it’s important not to micro-manage the user’s experience and to remain open to whatever they do. And we don’t present our findings in terms of proved or disproved hypotheses, because we don’t have (or want to impose) the strict controls needed for a true experimental design. But thinking in terms of hypotheses or predictions brings some of the rigour of empirical methods and helps focus the usability effort.
Though it’s rarely described in terms of hypotheses, I expect that in practice usability work often already proceeds this way. For example, Susan Dray and David Siegel suggest beginning usability studies with a thorough review of the system or product to aid in “prioritizing the design issues and user tasks,” and note that it is important to have “a good idea of the key areas to be probed” (p. 28).
Design as hypothesis
I find that by basing hypotheses on a site or application’s goals, I can integrate usability testing into the design process and generate relevant, action-oriented findings. In this way, usability doesn’t stifle creativity; it focuses it.
One reason this approach works, in my view, is that every design is a hypothesis. The designer is consciously or unconsciously predicting the user’s behaviour: “If I put a button here and make it look like this, the user will see it and know that they should click it next.” Making such predictions explicit focuses your usability reviews and research, and allows you to test the design, not the users.
Dray, Susan M., and David A. Siegel (1999). “Penny-Wise, Pound-Wise: Making Smart Trade-offs in Planning Usability Studies.” Interactions, ACM, May–June 1999, pp. 25–30.
Rubin, Jeffrey (1994). Handbook of Usability Testing. John Wiley and Sons. ISBN 0471594032.