I have developed and practiced a card-based system that allows me to evaluate a classification outside of its implementation. It is simple, requiring little input from individual users (10 minutes from 20 users is not a significant amount of time for them, but provides me with a significant amount of feedback). Using this technique means that I can focus my in-depth usability testing on interface issues, rather than the classification.
A bit of history
It started while I was working for the Australian Bureau of Statistics. I poked my nose into someone else’s project. I’d heard about a new hierarchy being designed for our external website and decided to have a look.
After much discussion with the creators, making some changes, and addressing some remaining concerns, I said the fateful words, “We should test this.” They agreed wholeheartedly – as long as I did all of the work.
I was already in contact with a lot of customers, and knew I would be able to get people to participate. That was the easy part. The difficult part was figuring out how to test it. In this case, I faced the following problems:
- The development work hadn’t started, and the timetable was already tight. I couldn’t wait until the system was developed to test it on screen (and if I waited, there was little chance of getting any changes implemented).
- The classification was for an external site and, due to the technical infrastructure, we couldn’t get a prototype on a live site to test.
- I didn’t want to bring customers into the office to test, as I would only need a short time from them. Yet I didn’t really want to lug around a laptop to customers’ offices either.
- I knew there were usability issues with the current site and didn’t want the existing problems to impact the test of the classification system.
Given these constraints, and after practicing a few times, I developed what I call “Card-Based Classification Evaluation.” This testing method can be organized and run quickly without a lot of overhead, yet can be very effective in ensuring that your classification will help your users find what they need.
OK. Computer off, pens out.
Here’s what you need to run your own card-based classification evaluation:
- A proposed classification system or proposed changes to an existing system. Some uncertainty, mess, and duplication are OK.
- A large set of scenarios that will cover information-seeking tasks using the classification.
- A pile of index cards and a marker.
- Someone to scribe for you.
Go through your classification and number it as shown in the example below, as far into the hierarchy as you need:
Here’s an example of the hierarchy I tested for the Australian Bureau of Statistics:
1.1 Balance of Payments
1.1.1 Current Account/Capital Account
1.1.2 Exchange Rates
1.1.3 Financial Accounts
1.2 Business Expectations
1.3 Business Performance
1.4 Economic Growth
2. Environment and Energy
Next, transcribe your classification system onto the index cards. On the first index card, write the Level 1 headings and numbers. If you need to use more than one index card, do so. Write large enough that the card can be read at a distance by someone sitting at a desk.
Repeat for Level 2 onward, with just one level on each card or set of cards. Bundle all the cards with elastic bands.
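The numbering-and-transcription steps above can also be sketched in code. Here is a minimal illustration that numbers a nested classification and groups the entries into one “card” per level; the hierarchy is a small fragment of the ABS example, not the full classification:

```python
# A fragment of the classification as a nested dict (labels from the
# ABS example above; the full hierarchy would have many more entries).
hierarchy = {
    "Economy": {
        "Balance of Payments": {
            "Current Account/Capital Account": {},
            "Exchange Rates": {},
            "Financial Accounts": {},
        },
        "Business Expectations": {},
    },
    "Environment and Energy": {},
}

def number_levels(tree, prefix=""):
    """Yield (level, number, label) for every node, depth first."""
    for i, (label, children) in enumerate(tree.items(), start=1):
        number = f"{prefix}{i}"
        yield (number.count(".") + 1, number, label)
        yield from number_levels(children, prefix=number + ".")

# Group entries by level: one card (or set of cards) per level.
cards = {}
for level, number, label in number_levels(hierarchy):
    cards.setdefault(level, []).append(f"{number} {label}")

for level in sorted(cards):
    print(f"Level {level} card: {cards[level]}")
```

Each printed line corresponds to what you would hand-write on one card (or set of cards) for that level.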
On a separate set of index cards, write your scenarios. For the ABS example, a scenario might be “What is the current weekly income?” or “Which Australian city has the highest crime rate?” On one corner of each card, write a letter (A, B, C, etc.) to represent the scenario.
Running the evaluation
Arrange 10-15 minute sessions with each participant. How you make arrangements will vary depending on your audience, so I’ll leave it to you to figure out the best way. In an office situation, I sometimes let people know that I’ll be around at a particular time, and that I’ll come to talk with them. This saves people the worry of meeting me at an exact time.
For each evaluation, take a minute or two to introduce the exercise, and let the participant know why you are doing it. Remind them that it will only take a short time, and give them any other background information they need. These few minutes give them a chance to settle, and can provide you with some initial feedback.
My usual intro goes:
“Hi, I’m Donna Maurer, and this is [my colleague], and we work for [my department]. I’m working on a project that involves making improvements to our website. I’ve done some research and have come up with a different way of grouping information. But before it is put onto the computer, I want to make sure that it makes sense for real people. I need about 15 minutes of your time to do an exercise. You don’t need any background information or special skills.”
(They usually laugh at the “real people” part and nod hesitantly at the end.)
“I’m going to ask you to tell me where you would look for some information. On these cards is a set of things that I know people currently look for on the website. I’ll ask you to read an activity; then I’ll show you a list. I want you to tell me where in the list you would look first to find the information.
“I’ll then show you some more cards that follow the one that you choose, and get you to tell me where you would look next. If it doesn’t look right, choose another, but we won’t do more than two choices. This isn’t a treasure hunt; it is me checking that I’ve put things in places that are sensible for you. Don’t worry if this sounds strange – once we have done one, you’ll see what I mean. And if there are tasks on those cards that you wouldn’t do or don’t make sense, just skip them.
“[My colleague] is going to take some notes as we go.”
They usually still look hesitant at this point. But don’t worry – they figure it out after the first card.
Presenting the scenarios
Put the top-level card on the table, and check that the participant is reading the first scenario. Ask, “Out of these choices, where would you look for that information?”
The participant points or tells you his choice. For that item, put the second-level card on the table. Ask, “Where would you look now?”
For each item the participant chooses, put the corresponding card on the table until you get to the lowest level of your classification.
Have your colleague write down the scenario ID and classification number. Then change the scenario card, pick up all the cards except the top level, and start again.
During a scenario, if a participant looks hesitant, ask him if he’d like to choose again. I only offer two choices at any level because I want to see where the person would look first, not get him to hunt through the classification endlessly. Record all the participant’s choices.
Do this for about 10 minutes, or until the participant seems tired. I usually get through 10-15 scenarios. Wrap up briefly: If you noticed something unusual or think the participant may have something more to add, talk about it. Don’t just stop and say goodbye – let the participant wind down. Thank the person, and then move on to the next participant. Start the scenarios where you left off in the previous session. If you start from the beginning every time, you may never get through all your scenarios.
You’ll probably want to practice this procedure a bit before you do your first real test. It takes a few tries to get the hang of presenting the cards and re-bundling them, particularly with a big classification.
Record your results in a spreadsheet. In the first column of the spreadsheet, list the classification (by number or number and name). Across the top row, list the scenario IDs.
Mark the intersection of each scenario and the classification item selected by the participant. I usually use capital letters for first choices and lowercase letters for second choices. If I’m testing more than one group of participants, I use different letters for each group. (In the example below, I have used X and O to represent two different groups of participants.)
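If you want to tally the results electronically rather than by hand, the spreadsheet layout described above can be sketched as a small script. The scenario IDs, item numbers, and marks below are invented for illustration (uppercase marks are first choices, lowercase are second choices, and X/O distinguish two participant groups):

```python
# Each record: (scenario ID, classification item chosen, mark).
# These values are made up to illustrate the layout, not real data.
records = [
    ("B", "1.2.3", "X"), ("B", "1.2.3", "O"), ("B", "1.2.3", "X"),
    ("C", "1.2.3", "X"), ("C", "2.1", "O"), ("C", "1.4", "x"),
]

# Build item -> scenario -> concatenated marks, mirroring the
# spreadsheet: classification down the side, scenario IDs across the top.
matrix = {}
for scenario, item, mark in records:
    row = matrix.setdefault(item, {})
    row[scenario] = row.get(scenario, "") + mark

# Print the grid; clusters of marks in one cell show agreement.
scenarios = sorted({s for s, _, _ in records})
print("item\t" + "\t".join(scenarios))
for item in sorted(matrix):
    print(item + "\t" + "\t".join(matrix[item].get(s, "") for s in scenarios))
```

A cell with many marks (like “XOX”) is exactly the cluster you look for in the next step.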
For scenarios or parts of the classification that work well, you will see clusters of responses. For instance, in Scenario B in the table above, the Xs and Os are in a cluster on item 1.2.3, indicating that all the participants chose that item. Assuming the choice is the one you wanted them to make, a cluster means that your classification works well in that scenario. In scenarios where the appropriate choice is less clear, people may look in a range of places (scenario C in the table), and the responses will be more spread out.
There will be some labels that participants consistently and confidently select, some that every participant ponders over, and some that no participant selects. Keep the first, consider the second, and ask questions about the third. Think about why some things worked well: Was it because of good labelling? Straightforward concepts? Similarity to an existing system? Think about why other things didn’t work: Was the scenario able to be interpreted in more than one way? Was there no obvious category to choose? Were there labels that people skipped entirely?
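To make “clustered” versus “spread out” concrete, you can compute, for each scenario, the share of first choices that landed on the most popular item. This is a simple agreement measure of my own devising for this sketch, not part of the original method, and the data is invented:

```python
from collections import Counter

# First choices per scenario (illustrative data only).
first_choices = {
    "B": ["1.2.3", "1.2.3", "1.2.3", "1.2.3"],  # everyone agreed
    "C": ["1.2.3", "2.1", "1.4", "2.1"],        # responses spread out
}

# agreement[scenario] = (most popular item, fraction who chose it).
agreement = {}
for scenario, items in first_choices.items():
    top_item, top_count = Counter(items).most_common(1)[0]
    agreement[scenario] = (top_item, top_count / len(items))

for scenario, (item, share) in agreement.items():
    print(f"Scenario {scenario}: {share:.0%} chose {item}")
```

A share near 100% is a tight cluster worth keeping; a low share flags a scenario or label that needs questions in the wrap-up.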
Don’t be afraid to revise and iterate your classification until you are happy with the results (actually, until the participants are happy with the results). Change labels and groupings, reword ambiguous scenarios, and ask further questions in the wrap-up. A bit of shuffling at this point is much easier than changing a navigation system after implementation.
At the end of the evaluation, I usually write up a short report for the project team, which I also send out to the participants. This is a good chance to show them that you are doing something with their input, and maybe an opportunity to get some extra feedback.
A few other things to keep in mind:
One of the major benefits of this technique is its speed. Because it is fast and done on paper, you can:
- Get people to participate easily – even busy people can spare a few minutes.
- Get a lot of participants (which is necessary to show the clustering of responses).
- Cover many scenarios and much of the classification.
- Change the classification as you go or test alternatives on the fly. This is a good reason to write out the cards rather than type them – it makes them much easier to change. (You can see that I did this in the example above. The numbering system is not sequential.)
- Rerun the evaluation whenever you make changes.
With this method, not only do you see whether your classification will be successful, you gather valuable information about how people think. Some participants will race through the exercise, hardly giving you time to get the cards out. Some will ponder and consider all of the options. Some take the new classification at face value, while others think about where things may be in an existing scheme. Some learn the classification quickly and remember where they saw things previously. Take notes of participants’ comments – they are always useful.
One trick with this process is to get people comfortable quickly so they will perform well for the short time they participate. Pay attention to the non-verbal signals they give you, and adjust your introduction and pace for each participant. (For instance, this evaluation can work with vision-impaired participants, by reading the options to them.)
The wrap-up at the end is especially useful for getting additional feedback from participants. If I am having trouble with a label, I often ask what it means to them. If there is a section of the classification that they didn’t choose (which may be because it is a new concept), I explain what might be there and see how they respond.
Make sure your classification goes down to a fairly detailed level, not just broad categories. Even though you may tell participants that this isn’t an information-seeking exercise, people are pleased when they “find it.” For this reason, it is also worth starting with an “easy” scenario.
I have used this method mostly with internal audiences, where it is easy to get people to give you a few minutes. My experience with external audiences has only been with existing customers, but there are many ways to get other external participants. For example, you could run this evaluation as part of a focus group or product demonstration.
To date, the only criticism of this technique I’ve heard is that it just tests known-item searching (where users know what they are looking for), and that it doesn’t test unknown-item searches (where the user is looking for something less well-defined). But if you have figured out how to test unknown-item searching, please let me know!
So far, I’m pleased with the results of this technique. Getting a lot of feedback from many participants has proven to be amazingly useful, both for developing the best classification system for users, and getting the important people involved and interested.
In her spare time Donna tutors Human Computer Interaction at the University of Canberra, studies for a Masters in Internet Communication, and maintains a weblog, imaginatively called DonnaM, about IA, usability, and interaction design.