Don’t Test Users, Test Hypotheses

“Observe your users.” That’s a maxim most interface designers and user experience professionals subscribe to. But how do you “observe”? What do you look for? When testing websites or applications, I’ve found that generating hypotheses about user behavior helps inform the observation process, structure data collection and analysis, and organize findings. It also keeps you honest by making explicit what you are looking for.

User testing typically consists of a sort of fishing trip. We lower a lure (the user) into the water (the application or site) and see what critters (defects) bite. This is a valuable and time-tested approach. But when we start fishing for defects, we are left with some tough questions. For instance: When are we finished? How many defects do we need to find before we have fully tested the site or application? If we find a defect, how do we know how severe it is, and by what measure? In iterative testing, how do we compare results from the test of the current version with results from testing earlier versions?

A productive way to address these issues and to incorporate user testing into the design process is by articulating, up front, the key issues you are investigating and predicting what users will do in certain situations. Imagine that you have been asked to review the user experience of a consumer shopping website and your first reaction is “They’ll never find the pricing page.” I’m suggesting that turning that hunch into an explicit prediction or hypothesis will improve the testing and the relevance of the findings for the design team.

Here’s how to do it

Whether you are testing your own design or someone else’s, start by defining questions you want answered. Describe the assumptions implicit in the design. Make predictions about users’ behavior and develop hypotheses about what they will do. That’s the first step. Then, structure your testing to address those hypotheses. That way, whatever the result, you have specific, relevant information about the design.
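The steps above amount to building a small test plan around explicit predictions. As a minimal sketch, a hypothesis could be recorded as a structured note that pairs a design goal with a prediction and the task that will probe it. The field names and example text here are illustrative assumptions, not part of the article’s method:

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    design_goal: str   # the goal or assumption embedded in the design
    prediction: str    # what we expect users to do
    task: str          # the task that will expose users to the issue
    observations: list = field(default_factory=list)  # notes per session

# An example plan built around the pricing-page hunch from the article.
plan = [
    Hypothesis(
        design_goal="Pricing should be reachable from any product page",
        prediction="Most users will not find the pricing page",
        task="Ask the participant to find the price of a specific product",
    ),
]

for h in plan:
    print(h.prediction)
```

Writing hypotheses down in a form like this, whatever the medium, makes it easy to check after each session which predictions have been addressed and which still need evidence.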

My colleague Dianne Davis and I developed this approach over several years of testing websites and applications. We start with the idea that every design, implicitly or explicitly, predicts how people will respond and behave. Articulating those predictions and thinking of them as hypotheses focuses our usability research and allows us to “test the design,” not the users.

Testing the design

The first step is to generate hypotheses. We begin with an extensive review of the site or application. We use relevant heuristics in our review, of course. We ground the review in the site or application’s goals and philosophy, as defined by its designers and as documented in project literature (requirements, specifications, creative brief, etc.). This is critical because we want the review to reveal whether the site conforms to, or deviates from, the design. It’s important to engage the design team in a dialogue at this point to clarify their goals and articulate their predictions about user behavior. That way, the results of the testing will answer their questions, not ours. (If we are testing our own design, then we use this opportunity to reflect critically on it and identify its key predictions and assumptions.)

This initial review yields what you’d expect: many “defects” and questions about user behavior. It lets us spot areas that we think will cause difficulties for users, aspects of the design that may not deliver on design goals, and features or regions that may go unfound or unused. We don’t treat these “defects” as certain, but rather as potential problems and as guesses about what users will do. We recognize that each of these issues is no more than a prediction, but one that is testable. Our instincts or prior experience may “tell us” that an issue is critical and will affect user experience and task completion. But we won’t really be sure until we test with users. That’s where hypotheses come into the picture.

The predictions and guesses become an informal set of hypotheses about user behavior. For example, our review of an automotive manufacturer’s site suggested that users would not see or use a link that opened a pop-up window with detailed photographs of a product, even when the user was looking for those types of images. So our hypothesis was: Most users will not find the pop-up window. If most users do indeed find and use the link, then our hypothesis is disproved, and the design prediction supported. If they don’t find it, we know what we need to fix. In another study, we tested a redesigned interface for paint-colour-mixing software. We expected that users would have trouble because the application lacked indicators of the sequence involved in the task; we hypothesized that users would not know what to do next, and identified an area that needed to be examined.

Testing users testing the design

In the second step of our hypothesis testing approach, we do research with users, employing traditional usability interviewing, observation, and measurement methods. But throughout, the drive to examine the hypotheses and answer our questions shapes the process: from sampling strategy through task selection, observation and interviewing, data analysis, and, ultimately, reporting.

If our hypotheses relate to types of users, that may affect sample selection and study design. For example, in testing an educational site we predicted that a Flash-based animated introduction would be of interest to one part of the audience (kids) but a distraction to another part (teachers). So we were sure to include sufficient users in each category and to compare how these two groups used this feature. (By the way, we were wrong. The kids all clicked “Skip Intro.”)

Hypotheses or predictions about user behavior help us develop and refine usage scenarios and tasks. Of course, core tasks are derived from the project’s use cases or scenarios (if they exist). But in addition, we include tasks and situations that will expose users to the problems and opportunities we have identified. In this way, the users are “testing the design.” The automotive site mentioned above had loads of information on vehicles, but the rich-media, featured content was a key element of the design. So we made sure that our research design included tasks that invited users to search for that sort of information. In contrast, we did not generate hypotheses or tasks to address users accessing other information that was of less significance to the marketing strategy.

Watching and listening

This approach also has an impact on how we observe users and collect data. While we note all defects or user problems that we see, we are particularly interested in user behavior around the “pinch points” and “rabbit holes” that we’ve identified. With multiple observers, the hypotheses also serve as common points of reference. If we go in looking at the same things we can more easily compare and synthesize findings.

User observation methods are based on an ethnographic methodology in which the observer works hard to bracket their biases and expectations. The method I am suggesting complements this traditional approach in two ways. First, it is not intended to replace traditional observation methods, but to supplement them. When conducting tests with users, we keep our eyes open and record relevant user behavior, whether or not it relates to our predictions. We always find previously unidentified defects and see users do unexpected things. If you watch and listen, users will reveal interesting things about the tools and devices and products they use.

A second reason that hypotheses complement observation is that being explicit about expectations helps guard against biases on critical issues. Ethnography is hard to do well because it’s difficult to be aware of your biases and easy to find evidence to support them. In his Handbook of Usability Testing, Jeffrey Rubin advises having specific testing goals in mind. “If not, there is a tremendous tendency to find whatever results one would like to find.” (p. 95) Starting with hypotheses or predictions provides a framework for consciously assessing and interpreting user testing data.

Sifting and reporting

One thing about qualitative data: there is a lot of it. Data reduction is always a challenge, and assimilating exhaustive lists of defects can be daunting for the usability person and the rest of the team. Aside from the sheer number of issues, there is the problem of organizing them. The hypotheses you start with can provide a meaningful way to group the results as you interpret them. This way, results are organized around design goals and related user behaviors, rather than around interface features, making them more relevant and pointing out underlying design relationships. The hypothesis-testing approach also helps determine what to fix, because findings and recommendations are easier to prioritize when you are mapping them against previously set goals. If you are doing iterative testing, the hypotheses help you triangulate between phases of testing and see whether and how design changes affect user behavior.
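Grouping findings under the hypotheses they relate to, rather than under interface features, can be sketched very simply. The finding texts, participant labels, and hypothesis names below are invented for the example:

```python
from collections import defaultdict

# Raw session notes, each tagged with the hypothesis it bears on.
# Notes that fit no prediction still get recorded, under a catch-all tag.
findings = [
    ("pricing-page", "P3 scrolled past the pricing link twice"),
    ("popup-photos", "P1 never opened the detail photos"),
    ("pricing-page", "P5 used site search to reach pricing"),
    ("unanticipated", "P2 was confused by the cart icon"),
]

# Group the notes by hypothesis for analysis and reporting.
by_hypothesis = defaultdict(list)
for hypothesis, note in findings:
    by_hypothesis[hypothesis].append(note)

for hypothesis, notes in sorted(by_hypothesis.items()):
    print(f"{hypothesis}: {len(notes)} finding(s)")
```

The “unanticipated” bucket matters: it keeps the analysis honest by preserving everything that fell outside the predictions, which is exactly the open observation the article insists on.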

When it comes to reporting, I have found that I get the attention of designers and developers when I present findings that are put in terms of the goals and ideas they are pursuing. Since the hypotheses are built around their design vision, I am starting in a strong position. And, when I have actually observed users’ behaviour related to those goals, I can speak with greater confidence on the impact of design decisions. If finding the pricing page is an acknowledged part of the site’s task model, and I see that users can’t find it, designers are forced to ask what went wrong and less likely to blame the user.

Not the null hypothesis

It may seem that having hypotheses will bias you towards seeing certain behaviours and not others—that the observer will try to confirm their hypothesis. But that is not how an empirical method works. In fact, we don’t go out to prove hypotheses, but to test them. We’ve got to be open to whatever we see.

This hypothesis-testing approach to usability is not a true experimental methodology. In applied usability work you typically do not have the time to develop metrics and test their reliability, the luxury to assign users randomly to conditions, or the budget to test with sufficient numbers to use statistical tests of significance. You also want to cast a wide net and not restrict your observation to the hypotheses, as you might in a controlled study.

User testing remains a naturalistic (if not “natural”) research method founded on passive observation. So, it’s important not to micro-manage the user’s experience and remain open to whatever they do. And we don’t present our findings in terms of proved or disproved hypotheses because we don’t have (or want to impose) the strict controls needed for a true experimental design. But thinking in terms of hypotheses or predictions helps bring some of the rigour of empirical methods and helps focus the usability effort.

Though it’s rarely referred to in terms of hypotheses, I expect that in practice, usability often already proceeds in this way. For example, Susan Dray and David Siegel suggest beginning usability studies with a thorough review of the system or product to aid in “prioritizing the design issues and user tasks,” and recommend that it’s important to have “a good idea of the key areas to be probed” (p.28).

Design as hypothesis

I find that by basing hypotheses on a site or application’s goals, I can integrate usability testing into the design process. By thinking in terms of hypotheses based on design goals I can generate relevant, action-oriented findings. In this way, usability doesn’t stifle creativity, it focuses it.

One reason this approach works, in my view, is that every design is a hypothesis. The designer is consciously or unconsciously predicting the user’s behaviour: “If I put a button here and make it look like this, the user will see it and know that they should click it next.” Making such predictions explicit focuses your usability reviews and research, and allows you to test the design, not the users.

Dray, Susan M. and David A. Siegel (1999). “Penny-Wise, Pound-Wise: Making Smart Trade-offs In Planning Usability Studies”, in Interactions. ACM, May-June 1999, pp. 25-30.

Rubin, Jeffrey (1994). Handbook of Usability Testing. John Wiley and Sons, ISBN 0471594032.

Avi Soudack does usability consulting, information and instructional design, writing and editing, and research on educational media. A former market researcher and teacher, he’s been helping producers, designers, and others improve their communications and new technology efforts for almost twenty years. He’s been designing interactive instruction and information products for the last ten. His goal is to work in a bright room, whenever possible. His website is


  1. Great article!
    Yes, prototypes we design are usually based on hypotheses… hopefully based on some good experience and a knowledge/systematic approach gained from various resources (requirements, etc.).

    I also wanted to point out the various degrees of hypotheses in prototype designs during usability testing… I wonder if anybody has a good recommendation or comment on getting our hypotheses a little closer to the solution…


  2. Rather than hypotheses, I typically define a set of key questions that I hope to answer through any study. These questions help me focus the study and determine the appropriate methodologies I should use to answer the questions. They’re also a useful way to get agreement from the team about the goal of the study. I think that organizing the research around questions instead of hypotheses helps avoid any appearance of bias by the researcher.

  3. You can use a hypothesis-based approach without stipulating tasks. Depends on the hypothesis. And you can even listen to users (in focus groups, in think-alouds, in the previous testing session) to help you form your hypothesis. That’s what I like to do, because it can show up where users and design concepts diverge. I just like to go back and watch the users and see if they do as they say.
    AleX… I guess some of the difference in approaches, between what Mark Hurst suggests and I propose, is due to the whens and the whys… when in the development cycle the testing is done and why the research is being conducted.

  4. Great article. When we put a prototype through testing, it’s funny how often we talk about it as if we’re testing the users when in fact we are testing the preconceptions we put into our designs.

    When designing, we think “this oughtta work!”, or “I wonder if this will work?”. Before testing, we can ask the IA and GUI design team members what aspects of their design they argued about the most, or which aspects they felt diverged from widely-accepted best practices. Writing these down is easy – these are, in effect, the designers’ hypotheses. If testing only had one goal, it would be to confirm or deny the success of these design hypotheses. Perhaps it’s just obvious to many in the community, but somehow I feel that this article proposes a fresh way of looking at the purpose of testing.

  5. Interesting…I’m wondering if you share the hypotheses with the rest of the development team before testing. I think if I told my clients what I was going to find and then proceeded to find it, they would think I was being a smart aleck while I was presenting the hypotheses (“The product is so obviously bad from my looking at it, why are we even testing it?”), looking only to confirm my preconceived ideas while testing (fostering suspicion that the test was designed to make the marketing VP’s ideas look bad), and having wasted their time and money when I told them that I discovered what I said I’d find (the less numerous surprise discoveries seeming less valuable among all the stuff that was already suspected). Consider that the rest of the team usually thinks a system that they can figure out is OK for everybody else, too, and that it’s much easier to argue with the tester’s heuristics and intuitions than it is with reactions of real people discovered through testing.

    I think of usability testing as a way to show other members of the team in human terms that design decisions have consequences–delight and productivity vs. confusion and frustration. I think too much of a preview would be upstaging this important phase. In other words, I think it’s more valuable to make the problems with the design that lasting impression than to show that we are so smart we knew they were there all along.

  6. Brian, two responses to the issue you raise: 1. framing hypotheses; 2. relating to teams.

    1. You can always make your hypothesis a positive one: e.g. “the user *will* find the pricing page.” By doing that you are making explicit the hypothesis that is embedded in the design. Then you all know what you are looking for, and if the user does not find it, the design hypothesis was not proven. Your finding then suggests where rework is needed. (I didn’t mean to imply that you only test for problems you expect to see; rather that you test to see if the users accomplish what they are expected to.)

    2. The dynamics of working with teams always require tact and good communication skills, and will vary, as you know, according to the nature of your relationship to the team. If it is a close one, then you may be able to get the team involved in spotting the potential problems themselves, essentially helping to build those hypotheses. In a more external, consulting relationship, I try to get the team to define what ‘success’ is for them. Their definition helps me form hypotheses that structure my work, even if I don’t share the actual hypotheses with them beforehand. That way, when I do the testing and report findings, I’m looking at issues that are important to them.

    p.s. I think the issue you identify — of the usability person being perceived as a ‘smarty-pants’ know-it-all — is a real problem, and wonder what others do to address it.

  7. Great ‘crossover’ piece. While most B&A articles are of little interest for my needs (classification, card sorts, et al. — all specific to one practical dimension of IA), this was a good summarization of tidbits floating around in my head, but not in any way confirmed/solidified. Thanks for organizing my mind — and for contributing to a good perspective on activities necessary for any design work.

Comments are closed.