Using Site Evaluations to Communicate with Clients


How do you prove your worth to clients in today’s difficult economy? One of the tools in my arsenal that has proved tremendously effective is a website evaluation (or assessment). Performed as part of a sales proposal, a site assessment can help you speak knowledgeably about solutions to your potential client’s problems. As part of a “discovery” phase of a project, it can help uncover opportunities for improvement. Additionally, it can serve as a benchmark to be tested against later in the design process. Because many clients understand ratings, site assessments early in a project can help you and your clients speak the same language, establishing a base vocabulary you can reference later when you do user research, personas, card sorts, and usability tests.

Over the last few years and umpteen site evaluations, I’ve developed some templates that I customize as needed. I’ll describe these templates, when and why I use them, and how I present the assessments, and I’ll discuss some specific cases where site assessments have helped me convey my arguments to clients. If you already know what an assessment is, skip ahead to the Three basic templates section. Otherwise, a brief introduction follows.

What’s an assessment?

Two kinds of assessments
A current state assessment evaluates one site, typically the site you are redesigning. I use these assessments to help define problem areas as well as successes.

A competitive assessment is a bit different. This type of assessment helps set an industry “marker” by looking at what the competition is up to, what features and functionalities are standard, and how others have solved the same problems you might be tasked with. Depending on particulars, it’s a look at a bunch of sites, none of which you are designing.

I start with an Excel spreadsheet divided into categories. Each category has a number of attributes underneath, questions that the evaluator must answer. Each attribute receives a score, and scores are averaged to come up with a total for the category. Since scoring is so subjective, I make specific comments to argue my point of view. If a point is particularly engaging (in a positive or negative light), I pull a screenshot that illustrates it.

Rating scale
I like to use a scale of 1 to 5 (No capability=1, Average capability=3, Best practices=5), but you can also use a high/medium/low scale. And sometimes numbers seem too set in stone, or are bound to cause controversy with your client. In those cases, I use Harvey balls, with an empty ball signifying no capability and a filled-in ball best practice.
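The scoring model described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original spreadsheet templates: the category and attribute names are hypothetical, and the Harvey-ball mapping is one possible way to render a 1–5 score.

```python
# Sketch of the assessment scoring model: each attribute gets a 1-5 score,
# and category totals are simple averages. Attribute names are hypothetical.

SCALE = {1: "No capability", 3: "Average capability", 5: "Best practices"}

scores = {
    "Navigation": {"Consistent across site": 4, "Answers 'Where am I?'": 2},
    "Labeling":   {"Clear terminology": 3, "Avoids jargon": 5},
}

def category_average(attributes):
    """Average the attribute scores to get the category total."""
    return sum(attributes.values()) / len(attributes)

def harvey_ball(score, scale_max=5):
    """Map a 1-5 score onto a five-step Harvey-ball glyph."""
    balls = ["○", "◔", "◑", "◕", "●"]  # empty = no capability, filled = best practice
    return balls[round((score - 1) / (scale_max - 1) * 4)]

for category, attrs in scores.items():
    avg = category_average(attrs)
    print(f"{category}: {avg:.1f} {harvey_ball(avg)}")
```

When numbers feel too set in stone for a client, swapping the numeric column for the Harvey-ball column keeps the same underlying data while softening the presentation.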

Deciding on your point of view will help determine your category headings. Are you evaluating only information architecture? Then your categories would be subsets of IA, such as navigation, taxonomy, and information organization. Are you looking at a particular feature? Your categories would reflect how that feature functions.

Notes on adapting
It’s also worth thinking about why you should do this analysis. What are your strengths? Where does your expertise lie? And, concurrently, what kind of results do you anticipate? Has the client come to you with regard to a particular problem?

Three basic templates

I start with three base templates and modify them depending on the project and on my role, sometimes down to very specific attributes. For example, on a financial services assessment, I added macro-level questions about specific content items I knew were required: real-time quotes, interactive chart features, etc. For the same assessment, I also looked at features the client specifically wanted us to improve, such as search, personalization, and customer service.

Template 1: IA
My IA template looks closely at information architecture/interaction design details. It really picks at the nitty gritty details of navigation, classification, and interface. Remember all those “rules” we’ve struggled for so long to practice? These rules are my attributes. Using this template is a forum for showing my clients my main area of expertise. It also helps them see their site (or their competitors’ sites) from my point of view.

For example, some categories and attributes are:

Information organization

  • Is information presented in a logical structure?
  • Is it clear how site components and sections are related? Are content/function areas logically/intuitively grouped?


Navigation

  • Is the navigation consistent across the site?
  • Does it help users answer the three fundamental navigation questions: Where am I? Where have I been? Where can I go?


Labeling

  • Is terminology clear and consistent across the site (including navigation elements)?
  • Does the site avoid the use of incomprehensible industry jargon?

User interface

  • Does the site support both experienced and novice users effectively?
  • Are visual cues easily understood and consistent?

Using this base template helps me offer my clients solutions to their specific problems. If the navigation is inconsistent, I can describe the benefits of persistent global links. Combined with other analytical tools such as log files, this evaluation can inform a sales proposal to improve the client’s IA, and can also help my company create a very specific statement of work.

Template 2: Proposal
When I am performing an assessment as part of a proposal effort, however, I often expand the IA assessment and evaluate from the larger customer experience point of view. The assessment forms the basis for my company’s arguments about which features need to be enhanced, which need more emphasis, and so on. This template starts with the IA categories and includes some important others.

For example:

Content strategy

  • Is the content relevant to the target audience?
  • Is the content up-to-date?


Branding

  • Is the brand identity clearly presented?
  • Is the brand applied consistently across the site?

Visual design

  • Is there a consistent use of iconography and visual design language?
  • Are visual elements used effectively to show relative importance of various screen elements? Do the most important things stand out?

I have successfully used this template for large enterprise clients in financial services, media, retail, and healthcare, as well as smaller, more unique clients such as museums. Because much of my work is a team effort, this assessment helps clarify why there are so many different people on the team and how we are all interdependent. The assessment’s worth can be increased by adding technical and strategic categories, helping my company prove its value even more. No matter how inclusive the assessment becomes, the message is always the same: as a user, here’s what I see, and here’s how my team can improve the customer experience.

Template 3: Lifecycle
Another approach that I’ve found to be powerful with clients takes a different perspective, by changing the focus of the assessment. By using the lifecycle of customer experience—Attraction, Orientation, Interaction, Retention, Advocacy—as your categories, you can apply your own vision and opinion within a concrete framework. This template is the most “consultanty” of all, but I’ve found that when clients aren’t able to articulate their goals, this assessment can help direct the conversation. It’s also the most holistic view of an assessment, viewing a site as an “experience.”


Attraction

  • Is my attention drawn by my initial experience, and do I feel welcomed and familiar?


Orientation

  • Can I quickly understand what I can do here, and do I feel invited to look at different options?


Interaction

  • Does the content of the site engage me? Is it possible to discover new information?


Retention

  • Does the site offer a way to help me know when special offers are coming? Is there a way to gain from other people’s experiences? Does it link to complementary offerings?


Advocacy

  • What is the site doing to keep me coming back? What is the site doing to encourage me to share my experience with others?

Some examples
A current client asked my company to bid on a project for a digital photography site geared toward teenage girls. To choose which sites to evaluate, I broke the assessments into three tiers: business competitors, functionality competitors, and demographic interests. Since the client was already familiar with our team, I didn’t need to establish my competency with user-centered design principles, and since they were looking primarily for our point of view, I decided to make template 3, Lifecycle, my starting point. I added high-level views from the other two templates and customized some categories according to functionality.

Then I assessed about five photography sites, looking particularly at the image upload features. I also checked out six community sites for teenage girls, as well as a few sites that already combined community and photography. Overall I looked at about 15 sites. I already knew that to engage this client the pace needed to be quick and the visuals needed to be showy, so I kept the comments short, sweet, and positive, and made liberal use of screenshots (see Figures 1 and 2). Using these assessments, I was able to describe successes and failures succinctly, helping to round out our client’s vision and goals for the new project.

Another time, as part of a current state market evaluation, I looked at ten different financial services sites, evaluating everything from features and functionality to branding. This client wanted cold hard facts and needed to prove the cost benefits of our project to his superiors. So this assessment used numbers to rank features, with each site receiving an average score per reviewed category. The assessment accomplished two things: first, it reinforced our client’s view of the current marketplace and competition; and second, it pointed out a hole in a competitor’s online offering, an opportunity our client thought they could leverage to get ahead in the market.

Time spent on assessments varies depending on the complexity of the site and how deep the assessment is. Typically I spend an hour at a minimum; however, I can spend a whole day doing a really involved assessment. For fairness, when comparing multiple sites for a project I try to spend the same amount of time on each.

Notes on presenting, or How to bring the message home
In many cases clients are impressed with a spreadsheet full of data. In other cases it’s necessary to add some punch—to tell it like it really is. In these situations, I use screenshots with callouts to demonstrate both good and bad examples (which is especially valuable during a client presentation or as part of a proposal). It’s all about what story you want to tell (see Figure 3).

I like to start off with something good: find something that works really well and point it out. Then I can start in on the problems. Alternatively, split your conclusions into a few main points and begin with a summary page. Illustrate your arguments with visual examples, and use your screenshots. Keep your language positive, and focus on solutions to the problems.

Last thoughts
Keep in mind that an assessment is a truly adaptable tool. And it’s not limited to IAs. A business analyst can look at how well the site is reaching its ROI goals. A brand strategist can see how the messaging performs. A team can assess a site at the beginning of a project (averaging everyone’s scores), and again at the end to see what’s improved. Remember, though, that an assessment is just one part of the toolkit. It needs to be used in conjunction with everything else: user research, personas, flows, etc. If a site scores poorly on a feature in an assessment, for example, user research on the subject can uncover whether that feature is important to users or not.

Dorelle Rabinowitz has over 15 years’ experience working as an information architect, designer, producer, and storyteller in new and old media. She has been at SBI and Company (formerly Scient) for three years as a lead information architect. Before that she produced an award-winning website called Our Stories, and was the design director for the National Media Department at Coopers & Lybrand Consulting. A graduate of the Interactive Telecommunications Program at NYU’s Tisch School of the Arts, her personal storytelling projects can be found on her website. Dorelle also holds a BFA in graphic design from the Rhode Island School of Design.

Consolidated Assessment


A user-research approach which integrates our best tools into a single session

There are several research tools at our disposal for understanding user behavior. But how many times do we get the chance to spend as much time on research as we think is required? Since resource constraints are so common these days, we need ways to make ourselves more efficient on the job. There are two ways to do this: collect better results with less testing time, or collect expanded information without increasing testing time.

One way to gain this efficiency is to integrate three techniques (scenario design, card sorting, and participatory design) into a single session. “Consolidated Assessment” isn’t necessarily the right phrase for what is discussed here, but it’s the first one that comes to mind. It’s like cross-country skiing, which works several large muscle groups and the heart all at the same time; this kind of combined exercise is much quicker than exercising each muscle separately. NordicTrack TV commercials claim that you can get twice the workout in half the time. I’ll stake a similar claim for this approach.

The findings from this type of evaluation lean strongly toward the behavioral side of user research, which studies what people do. On the other end of the spectrum, which this technique doesn’t study as closely, is attitudinal research, which studies what people think. Attitudinal research is more closely related to marketing and branding and is better explored through other testing approaches.

The consolidated testing approach is helpful for several reasons:

  • A test environment more reflective of users’ real-life activities. For example, pure card sorting is a ‘laboratory technique’ used to evaluate users’ mental models; it’s not an activity, or even a thought process, that a normal user would go through.
  • A livelier, more engaging evaluation environment for respondents. Respondents tend to zone out after spending too much time on a single monotonous activity; if they become bored, they glaze over and don’t contribute much. A little variety during evaluations keeps respondents engaged and providing good feedback.
  • More meaningful results. Rather than using more respondents to test smaller sections of the site, we can understand a broader slice of a user’s behavior by performing a holistic evaluation.
  • Improved efficiency:
    • Logistics planning and recruiting only need to happen once, a huge reduction in overhead
    • A single test script
    • One findings presentation

Now that I’ve explained why the approach is so useful, let’s discuss the research technique in more detail. There is no one correct way to conduct a consolidated assessment, so you’ll want to explore variations that are best suited to your particular project environment.

The 80/20 principle: Pareto’s Principle
To understand the evolution of this approach, it’s important to understand my general philosophy of user research and site design: the 80/20 principle. (Lou Rosenfeld has also been discussing and presenting this topic on his site.) In the 19th century, Italian economist Vilfredo Pareto quantified a relationship between production and producers in this way: 80% of production comes from 20% of producers. If ever there was a principle that applies beyond its original definition, this is it. I can only imagine how delighted Vilfredo would be to know his work is quoted on websites about site architecture and design.

By extending the 80/20 principle to the Internet, one may make general assumptions like:

  • 20% of a site’s content accounts for 80% of its use. Think front page on vs. older archived news.
  • 20% of a site’s features account for 80% of use cases. Think buying and selling stocks on E*Trade vs. using the Black-Scholes option pricing calculator.
  • 20% of a site’s errors account for 80% of poor user experiences. Think not being able to find a unique Hotmail e-mail address under 10 letters vs. trying to add too many names to the address book.

And, carrying the extension one step further to user research, it’s reasonable to assume the following: a well-planned consolidated assessment contributes insight into a majority of user experience issues, including scenario development, site organization, and site/page flow. Effectively understanding and designing for these most important factors (the 20%) is a much better use of time than spinning wheels (the 80%) over the minutiae that fall outside of most common use cases. Don’t worry, you’ll always be able to come back and iron out those smaller points later.
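As a quick sanity check of these assumptions, a few lines of Python can measure how concentrated a site’s use actually is. This is an illustrative sketch only: the page paths and view counts below are invented, whereas in practice the counts would come from your log files.

```python
# Check the 80/20 assumption against page-view counts.
# `page_views` is made-up data standing in for real log-file numbers.

page_views = {"/": 8000, "/news": 700, "/archive/2001": 500,
              "/archive/2000": 400, "/about": 250, "/contact": 150}

def pareto_share(counts, top_fraction=0.2):
    """Return the share of total use accounted for by the top `top_fraction` of items."""
    ordered = sorted(counts.values(), reverse=True)
    top_n = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:top_n]) / sum(ordered)

share = pareto_share(page_views)
print(f"Top 20% of pages carry {share:.0%} of traffic")
```

If the share comes out well below 80%, the site’s use is more evenly spread and the 80/20 shortcut deserves more caution on that project.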

Ideal uses for the technique
This approach works best with sites that involve goal-driven users who arrive with a purpose. They have an idea of what they are looking for, so it is an example of known-item or known-task site usage. That means sites built around scenarios and activities rather than simple document retrieval (for which pure card sorting is well suited). To achieve their pre-specified goals, users have to complete specific tasks; we’re not talking about finding articles, but about going through a set of steps to complete a task.

This 80/20 approach to consolidated assessment works well for general consumer sites and applications where the risks are relatively low or easily remedied. I don’t trust mission-critical applications that involve national security, management of large sums of money, or diagnosis of medical conditions to systems built using the 80/20 principle. Those need to be tested, and tested again; no shortcuts there. But those aren’t the kind of projects I am talking about here.

Two-fold improvement is a reasonable expectation
By combining a variation of card sorting with scenario based participatory design, we can improve our efficiency and our research yield almost two-fold. That means the feedback collected is twice as valuable, or that it’s collected in half the amount of time.

If the basics of these techniques are old hat, skip ahead to the Consolidated Assessments, explained by example section for a description of how combining scenario development, card sorting, and participatory design can shorten the research time and improve the yield.

For people new to the field of user research, you should know that sometimes there are no earthshaking insights uncovered. Sometimes the testing reveals things that you knew all along. When that happens, consider it validation and reinforcement, two things you can never have too much of.

If the ideas of scenario development, card sorting and participatory design are new or you would like a refresher, continue on.

Scenario Development – summarized
Scenarios document the activities and information needs people work through to complete a task. A thorough scenario takes into account what helps people make progress, what holds them back, and the different routes people can take to reach a goal; those are the things to look for. Some scenarios are very predictable, with only one or two variations, while others are full of twists and turns.

Traditional scenario development approaches may involve activities like contextual inquiry, which is a clinical way of saying that a researcher is going to watch someone as they work in their natural environment. The researcher is hoping to learn what situations a person encounters during an activity. This is understanding through observation.

Contextual inquiry can be a lengthy and costly process because of all the observation and recording that is necessary. It’s also sometimes an annoyance to the person being watched. Even though a skilled observer can be subtle, the very nature of the observation process likely changes the way the observed person works.

Sure, it can yield effective results, but there is also opportunity for misinterpretation. Sometimes it’s simply easier to ask about a process than it is to spend time ‘shadowing’ a person as they work. That’s not to say there isn’t a time and place for more formal inquiry, it’s just not always appropriate.

Card Sorting – summarized
Card sorting is so simple a six-year-old could do it. Actually, that’s how old I was when I first started card sorting, in the late 1970s. Not that I’ve been in the research field that long; card sorting just seemed a natural thing to do with my baseball card collection. On an almost weekly basis, I’d reorganize my cards. Usually I’d lay them out all over the floor and then get to work. Sometimes I’d sort by team (Go Orioles), or by position (all first basemen), or by year, or by card brand… Card sorting for research isn’t really much different.

In short, the technique is a simple exercise in grouping like items that share attributes. Sorts can be based on documents, concepts, tools, similar tasks, or just about anything that can be grouped. But it’s most often used to figure out navigation categories and which items belong to them. Or it is used to establish document or merchandise categories and related items. To sound like a real clinician, throw around terms like mental model or cognitive relationship. They are simply terms that describe the way people think about items in their mind.

Sorts can reveal four specific things:

  • Items that are consistently grouped together
  • Outliers that are inconsistently grouped
  • Items that can be grouped in multiple categories
  • Titles/headings for groups of like items

You will find that a 100% fit between all items in a group can’t always be established. No problem. As long as the groupings make sense to a majority of users, the activity can be considered successful (there’s that 80/20 principle again).
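The four findings above can be pulled out of raw sort data by counting how often each pair of cards lands in the same group across respondents. The sketch below is a minimal illustration; the card names and the three sample sorts are hypothetical.

```python
# Tally card-sort results across respondents: each respondent's sort is a
# list of groups (sets of cards); count how often each pair of cards ends
# up in the same group. Card names are made up for illustration.
from itertools import combinations
from collections import Counter

sorts = [
    [{"Flights", "Hotels"}, {"Travel tips", "City guides"}],
    [{"Flights", "Hotels", "Travel tips"}, {"City guides"}],
    [{"Flights", "Hotels"}, {"Travel tips", "City guides"}],
]

pair_counts = Counter()
for respondent in sorts:
    for group in respondent:
        for pair in combinations(sorted(group), 2):
            pair_counts[pair] += 1

# Pairs grouped together by a majority of respondents are solid groupings;
# the rest are outliers worth a follow-up question in the debrief.
majority = len(sorts) / 2
for pair, n in pair_counts.most_common():
    label = "consistent" if n > majority else "outlier"
    print(pair, n, label)
```

Cards that show high counts against several different partners are the ones that plausibly belong in multiple categories, which answers the third finding in the list above.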

Participatory Design – summarized
Participatory design is exactly what it sounds like. Participants actually design the site and pages, with the moderator helping to guide them through the process. That’s not to say you sit them down in front of Dreamweaver and watch them hack at it. Rather, a design moderator works with the respondent to sketch out, in rough terms, page layouts and process flows. It might be a matter of determining what items belong on a page and the relative prominence they should receive. Or it might be a matter of walking through a process, such as finding and purchasing merchandise or trading stocks, and sequencing the key steps the user must complete.

I imagine there are some similarities to the process of working with a police sketch artist to come up with a composite image of a perpetrator, although I’m not personally familiar with how that works.

Assuming these three activities happen during the course of a project, they are usually handled separately. But can you imagine the value and time savings of combining them?

Consolidated Assessments, explained by example
To illustrate, let’s use the example of a travel website that includes travel planning tools and destination content.

Broad research goal: The broad research goal is to understand how users approach the process of online travel planning, the issues they encounter along the way, and how those issues can be resolved online.

We gain insight by:

  • Developing scenarios and use cases
  • Learning what content and tools people need to understand and complete tasks
  • Having users sequence page flows and prioritize page elements

At the conclusion of the research, we will have generated the following artifacts:

  • Representative scenario narratives
  • Site flow diagrams
  • Page level schematics for key pages in the scenario
  • Prioritized content grouped by activity

Audience assumptions: Users of travel sites usually arrive with specific goals that are either task-related or fact-finding. Even if a user lands on the site by chance, let’s assume the site clearly states what it offers and the user is engaged enough to jump right in and explore.

Our sample travel site offers the following main features:

Online booking, email alerts for price reductions, a vacation planning wizard, and a very sophisticated search.

In addition to these important features, the site offers interesting content as well: syndicated content for major cities, travel tips, and maps.

Since this kind of site is targeted toward a specific group of users, it’s important that respondents are well screened. The respondents must have a realistic need for what the site offers so they can help build accurate use cases. Scenario development doesn’t work if respondents are required to fantasize about the hypothetical or work through scenarios that don’t apply to them. We want realistic scenarios with personal relevance to the respondent. In this situation, friends, family and co-workers are usually poor substitutes for a properly qualified respondent.

New way: 3 steps
Step 1: Scenario definition and selection
One goal of testing is to identify generalizations or recurring themes. To that end we might keep the scenarios flexible but still focused. We would provide users with a list of five to seven travel planning scenarios, and they would select three with which they identify.

These scenarios are brief, including little more than a few sentences about the goal and another sentence or two about what constitutes the end of the scenario. An example might read as follows:

You and your wife are planning a vacation. You have one week in July, $2500, and want to go somewhere warm where English is widely spoken.

Variation 1: Ask them to add details to the supplied scenario so that it’s more relevant to them. The more realistic the scenario, the easier it is for respondents to provide firsthand feedback. They might mention facts about sites they want to see, foods they want to eat, or maladies they want to avoid.

Variation 2: Ask them to describe, from scratch, a situation that is current and relevant to them. This gives a lot more latitude to the scenario and allows more focus on exploring new scenarios rather than validating existing ones.

Once we have a few scenarios established, we need to figure out how users will work through them. Since these scenarios are goal driven, we need to learn what information is needed for the user to achieve their goal. The goal here is to research and book a vacation.

As researchers, what can we expect to collect and understand from the scenario development part of the evaluation?

  • The information respondents believe they need to reach the goal; note that some information comes from external offline sources. For example, people are open to travel suggestions from friends and word of mouth. Just because it doesn’t happen online doesn’t mean it’s not worth knowing about —we still need to capture that information.
  • The tools respondents believe they need to reach the goal
  • The sequence of steps respondents might follow
  • The respondent’s level of confidence that they have completed the task accurately

Step 2: Identifying required content and tools (card sorting variant)
Although the focus of traditional card sorting is grouping like items, sometimes that’s not appropriate. Our goal here, which is loftier, is to relate content items to specific tasks, not each other. In other words, we are grouping information with corresponding tasks/activities.

Using this consolidated approach, users identify the information and tools they need to complete the tasks in the scenario before beginning the card sorting activity. The respondent tells us which questions need to be answered and where they expect to find the answers. They could be given a pool of article titles and travel planning tools and asked to choose those which they think would be required, as well as those which they think would be helpful.

As researchers, we must keep a keen eye open during the observation. Watch how users sequence their information needs. Which resources do they seek first? What information/tools do they need to get to the answer? Which are helpful but not required? Which are dependent on other activities/information? What do they go offline for? What’s extraneous? What’s missing?

Variation 1: Don’t provide any cards. Rather, have the users tell you exactly the information they need to complete the task.
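One lightweight way to record this step is to tally, per task, which items respondents marked as required versus merely helpful. This is a sketch under stated assumptions: the task, the item names, and the three sample responses are invented for illustration.

```python
# Tally the card-sorting variant above: for a given scenario task, each
# respondent tags content items as "required" or "helpful". Names are
# hypothetical placeholders, not from a real study.
from collections import Counter

task = "Book a warm-weather vacation"
responses = [
    {"required": {"Flight search", "Hotel search"}, "helpful": {"City guides"}},
    {"required": {"Flight search", "Hotel search", "Budget calculator"}, "helpful": set()},
    {"required": {"Flight search", "Hotel search"}, "helpful": {"Travel tips"}},
]

required = Counter()
helpful = Counter()
for r in responses:
    required.update(r["required"])
    helpful.update(r["helpful"])

# Items most respondents marked required belong on the task's critical path;
# frequently "helpful" items are candidates for secondary placement.
print("Required:", required.most_common())
print("Helpful:", helpful.most_common())
```

Items that never appear in either tally are candidates for the “what’s extraneous?” question, and information respondents name that isn’t in the card pool answers “what’s missing?”.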

Step 3: Participatory paper design
Once we have an idea of important content, data, and functionality, we work with users to define page/activity flows, then we build basic page schematics that support these tasks.

We first ask the user to construct a logical task sequence that incorporates the key steps in the scenario they defined. Then we request that they “build” pages with a template that has basic global elements in place (navigation, promotional space, footer, etc). Users then draw the other key items onto the page.

Variation 1: Allow users to sketch their ideal page layouts from scratch (e.g., a travel booking tool and teasers to interesting articles on the front page).

Variation 2: Show users sections of sites that offer similar features and have them select the ‘best of breed’ approaches and build a page using the cut-and-paste method. Sort of like Colorforms, where people and objects are slapped down onto a background (e.g., the booking engine from Expedia, the promotion section from, and content from Lonely Planet).

By the time several users have gone through the same scenario, the research team should have enough feedback to know the critical content/tools and the key steps of the flow.

The evaluation would probably work just as well by altering the order of steps 2 (Card Sort) and 3 (Participatory Design). A reasonable case could be made for either positioning. If the page flows and layouts are drawn out first, then the content could be organized then dropped onto the pages. If the content is identified first, the pages could be constructed as containers for the content. Either way works.

By and large, this type of testing has worked well for me, both in terms of getting good insight, and in terms of communicating findings to clients. Clients appreciate the fact that respondents create their own scenarios, which gives the whole process an extra level of authenticity. By keeping your ears open, you will find that people will say all sorts of interesting and useful things during the evaluation. Clients love quotes.

In my experience, while results have been mostly positive, there was one time when things didn’t work so well. I wasn’t able to get the respondents to describe the scenario in enough detail to finish the rest of the activities in the evaluation. They were stuck in a rut and I wasn’t able to nudge them out of it. Had I done a practice run or two with the scripts (piloting) before testing, I would have known this. Always pilot the test scripts before the evaluation and have alternative activities ready if needed.

If you have a chance to apply this technique, I’d be interested in hearing what works well for you, what doesn’t, and any suggestions for how to refine it. In particular, I’m interested to know how well it works in areas outside of travel. Drop me a line, or discuss it in the forum.

Suggestions for the moderator:

  • Pre-test the evaluation script to make sure that it flows well and that it can be completed within the timeframe users will be given.
  • Don’t lead users to conclusions, only facilitate discussion.
  • Don’t let users spend too much time revisiting their responses/recommendations. Their gut reactions are usually right. More time does not usually bring us to more accurate responses.
  • Remain flexible. Don’t be afraid to divert from the test plan as long as you’re collecting quality feedback.

Sample test plan:
Respondents: Recruit six and expect to test five. One of the six will most likely be a no-show or not much of a communicator.

Materials Required: Labeled cards for sorting. Participatory design materials: Sketch paper (graph paper works well for those who like to draw in the lines). Big sticky poster board works well too. Possible A/V recording equipment for a highlights reel.

Time: 1.5 – 2 hours
The evaluation should move relatively quickly. This format isn’t designed for introspection. Rather, we want to run with user gut reactions. Once they make a choice, they need to stay with it, rather than overanalyze and edit.

10 minutes for orientation
20-30 minutes for scenario development (3 scenarios)
20-30 minutes for sorting
25-35 minutes for participatory design
20 minutes for debrief / wrap-up

Report Highlights: There will be no shortage of discussion material to include in the final report. Direct user quotes, photos, and artifacts go a long way to constructing a compelling report. Here is an example outline for your report:

  • Introduction
  • Methodology Overview
  • Executive summary of entire evaluation
  • Scenario development highlights
    • Consistencies
    • Inconsistencies
    • Surprises
  • Sorting highlights
    • Consistencies
    • Inconsistencies
    • Surprises
  • Participatory design highlights
    • Consistencies
    • Inconsistencies
    • Surprises
  • Actionable recommendations for IA and content

Seth Gordon uses his understanding of user research and IA to improve user experiences and solve business problems. He has recently completed consulting projects for the Nielsen Norman Group and Razorfish. Visit him at, where there isn’t a drop of content about user experience.

Recording Screen Activity During Usability Testing

by:   |  Posted on

Recording what users do is a crucial aspect of usability testing. One of the most useful recordings you can make is a video of screen activity, capturing everything on the screen much like a VCR: the mouse moving, pages scrolling, links being clicked, search terms being typed, and so on.

A visual record of these mouse movements, keystrokes, and other activities is most useful for usability testing. While there is no substitute for good observational skills, it can be difficult to remember everything that happened during the test. Having a visual record not only reminds you of what happened, it allows for more detailed analysis after the test and comparisons between individuals.

Recording screen activity doesn’t necessarily cost much. Three Windows-based software programs—Lotus ScreenCam, TechSmith Camtasia, and Hyperionics HyperCam—range between $30 and $150, and all have free trial versions available for download so you can try before you buy. All three offer good performance, but unfortunately, I can only recommend two, since the third is no longer being actively developed by its maker.

How to record screen activity
Before we get to the review, let’s take a brief look at the three ways of recording screen activity: a camcorder, a VCR, or software. All the tools described in this article use the software approach, but to understand the benefits and drawbacks it’s useful to compare all three methods.

  1. Camcorder—This is the simplest method. Put your camcorder on a tripod, point it at the screen and record. Although simple, the resulting video will be a bit fuzzy and hard to read. It’s useful for getting an idea of what the user did, but it can be difficult (sometimes impossible) to read small text.
  2. VCR—If your video card has a TV-out option (a feature that’s fairly common on modern video cards) you can probably connect it to a VCR and record directly to tape. The result should be an improvement on the camcorder method, but because the resolution and sharpness of a television is lower than a computer screen the result will still be fuzzy and downgraded from the original image. To get something readable you’ll need to limit your screen resolution to 800×600 at most, preferably 640×480.
  3. Software—In the software solution a program runs in the background, silently capturing everything that appears on the screen and saving it to a video file. The result is a perfect recording with no loss of detail. Each frame of the resulting video could serve as a screenshot. Indeed, that’s one way to think of how the software works: taking series of screenshots and stringing them together into a techno-flipbook (of course the technical details are more involved).
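The techno-flipbook idea can be sketched in a few lines. This is an illustration only, not how any of the reviewed products is actually implemented; `grab_frame` is a stand-in for whatever screen-grabbing call your platform provides.

```python
import time

def record(grab_frame, fps=10, duration=5.0):
    """Capture frames on a fixed schedule and string them together
    into a 'techno-flipbook': a list of (timestamp, frame) pairs."""
    interval = 1.0 / fps
    frames = []
    start = time.monotonic()
    next_shot = start
    while next_shot - start < duration:
        # Sleep until the next frame is due, then grab the screen.
        delay = next_shot - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        frames.append((time.monotonic() - start, grab_frame()))
        next_shot += interval  # fixed schedule; a slow grab simply falls behind
    return frames
```

A real recorder would compress each frame and append it to a video file on the fly; holding raw frames in memory like this would exhaust RAM within seconds at realistic resolutions.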

The software approach is the most appealing, but traditionally it’s had one huge drawback: performance. The software has to capture and compress an immense amount of data in real time, without slowing down the machine. When I tested these programs on older hardware they would sometimes bog down so much it took ten seconds for a pull-down menu to appear.

In my tests the performance problem vanished when testing on a 1 GHz machine and a good video card. As I write this, 1 GHz machines are near the bottom end of the scale for desktop PCs. Hardware requirements are no longer the hurdle they used to be.

There is one obvious limitation to the software approach—it will only record what happens on the screen. It won’t record users themselves. If you want to learn something from the body language and physical movements of the user then you’ll still need a camcorder.

Features and requirements
This article arose out of a research project I was doing on how people search. For this project I developed the following set of software requirements. They should satisfy most usability testing situations:

  • Record at 10 frames-per-second and 800×600 in 16-bit color with no noticeable impact on system performance. Obviously lower frame rates, resolution, and color depth would improve performance, but this was my bare minimum. Much of the web doesn’t look good in 8-bit color and even participants in a research project shouldn’t be forced to suffer the indignities of 640×480. While I was willing to settle for 5 frames-per-second, I was hoping for 10 or more. Given a fast enough machine all three programs were able to meet this requirement.
  • Unobtrusive recording. I wanted the capture software to be invisible during recording. I didn’t want users to be distracted or feel anxiety by being constantly reminded of the recording. Most of the tools didn’t completely disappear when recording, but they all reduced to a small and unobtrusive icon in the toolbar.
  • Low cost. I couldn’t spend more than a few hundred bucks.
  • Pause, Fast Forward, and Rewind. Some of the tools use a special video format and thus a special program for playing the video. The playback tool needed to have a pause feature, and preferably fast-forward and rewind.
  • Immediate playback. My project used a technique known as retrospective verbal reports, more commonly called a “think after.” In this technique the user is recorded while doing the assigned task. When the task is complete they are shown the video and asked to think aloud. For think afters it’s best to watch the video immediately after the test to minimize forgetting. The only program that had problems here was ScreenCam, which required a minute or two to write out the video file after recording. Even for the think after protocol this wasn’t a showstopper.

Those were my required features. There were a few other features I was also interested in but they weren’t critical.

  • Record Sound. All three products can record an audio track along with the video. Of course this requires even more computing power. Since I needed to record video for only part of the session, but audio for the entire thing (participants were interviewed after the think after session), I went analog and used a tape recorder for the audio recording. I didn’t need this feature, but you might.
  • Hotkeys. To minimize futzing with the program during test sessions I wanted hotkeys for important commands like Record, Pause, Play, and Stop. All of the programs had hotkeys. I found this to be a useful feature.
  • Record “raw” data. My dream program would have recorded a separate data stream of every keystroke, every mouse click, every URL visited, and so on. It would have time stamped each event, and automatically correlated it with the video. None of the programs did anything close to this so I had to record this data by hand by reviewing the video. One possible solution here is using a “spyware” program to record this raw data stream and then manually correlate them. I never seriously investigated this option.
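The event half of that “dream program” is easy enough to approximate in sketch form. The class below is hypothetical (none of the reviewed tools offers anything like it): it timestamps each event relative to the moment recording starts, so entries can be lined up against the video afterward. Feeding it real keystrokes and clicks would still require a separate hook utility.

```python
import time

class EventLog:
    """Timestamp raw events (keystrokes, clicks, URLs visited)
    relative to the start of the recording, for later correlation
    with the captured video."""
    def __init__(self):
        self.start = time.monotonic()
        self.events = []

    def log(self, kind, detail):
        # Seconds since recording began; this matches video time as
        # long as the log and the recorder are started together.
        offset = time.monotonic() - self.start
        self.events.append((offset, kind, detail))

    def dump(self):
        return ["%07.2f  %s: %s" % e for e in self.events]
```

For example, calling `log.log("url", "http://www.google.com")` after each navigation yields a time-coded list to read alongside the video.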

Curiously, none of the tools I investigated were designed for usability testing. They’re mainly used for creating tutorial videos and software demos. This means they have a lot of other features that look nifty, but for someone engaged in usability testing they are thoroughly useless and so I’ve ignored them here.

Testing the Software
I wound up testing three software packages: Lotus ScreenCam, TechSmith Camtasia, and Hyperionics HyperCam.

I tested the products on three different machines with differing capabilities. (Note: I was only able to test ScreenCam on Machine A because ScreenCam only runs on Windows 95, 98, and NT.)

                 Machine A                  Machine B                  Machine C
Processor        200 MHz (Pentium Pro)      333 MHz (AMD K6-2 333)     1 GHz (AMD Duron)
RAM              64 MB                      320 MB                     256 MB
Video card       Matrox Millennium          Matrox Millennium II       ATI Radeon 8500
Video card RAM   8 MB                       16 MB                      64 MB
Windows          NT 4.0, Service Pack 6a    2000, Service Pack 2       2000, Service Pack 2

My test procedure was as follows:

  • Set the display to 800×600 and 16-bit color.
  • Set the frame capture rate to 15 frames per second.
  • Start recording.
  • Start Internet Explorer, maximize the browser to fill the entire screen, and begin browsing the Web.
  • If the performance is not acceptable:
    • Reduce the frame rate until either performance is acceptable or the frame rate is 5 frames per second. Never go lower than 5 frames per second.
    • If performance still suffers and the frame rate has been reduced to 5 frames per second, reduce the color depth to 8 bits (i.e., 256 colors). Keep the resolution at 800×600.
    • If it still doesn’t work, reduce the resolution to 640×480. Keep the color depth at 8-bits and the frame rate at 5 frames per second.
    • If it still doesn’t work give up on the program and have a nice cup of tea. I have found that second flush Darjeeling from the Margaret’s Hope Tea Estate to be particularly relaxing.
  • If performance is acceptable:
    • Continue browsing for about five minutes. Visit sites with long pages (so I can scroll), complex layouts, forms, and other features. My standard routine was a few searches on Google, Yahoo, Amazon, Salon, and CNN.
    • Repeat the test at 1024×768. If that works, move up to 1280×1024.
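The step-down logic in the procedure above amounts to an ordered list of configurations to try. This sketch simply encodes that order; the intermediate frame rates between 15 and 5 are my own choice (the article only fixes the endpoints), and `works` stands in for an actual recording trial on your hardware.

```python
def settings_ladder():
    """Configurations in the order the test procedure tries them:
    frame rate drops first, then color depth, then resolution."""
    ladder = []
    # Start at 15 fps, 800x600, 16-bit; reduce the frame rate first.
    for fps in (15, 12, 10, 8, 5):
        ladder.append((fps, (800, 600), 16))
    # At the 5 fps floor, reduce the color depth to 8-bit.
    ladder.append((5, (800, 600), 8))
    # Last resort: drop the resolution to 640x480.
    ladder.append((5, (640, 480), 8))
    return ladder

def first_acceptable(works):
    """Return the first configuration that records acceptably,
    or None (time for a nice cup of tea)."""
    for config in settings_ladder():
        if works(config):
            return config
    return None
```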

The following aspects of the test environment should also be noted:

  • The browser cache was cleared before each test.
  • No proxy servers were used.
  • The Internet connection was a 384 Kbps ADSL line.
  • Only video was recorded. All of the tools can optionally record an audio track. Camtasia and HyperCam can also add sounds and visual effects to certain events like mouse clicks. None of these features were used.

Lotus ScreenCam
Version tested: Lotus ScreenCam for NT
Price: $86

ScreenCam is a story of good news and bad news.

The good news is that it offers excellent performance. When compared with Camtasia and HyperCam on the same machine it had the highest frame capture rate while having the least impact on overall system performance.

The bad news is that according to the web site “there are no plans to create a version of ScreenCam to work on Windows 2000 or Windows XP.” In other words, ScreenCam is a dead product, though you can still buy it. ScreenCam is only available for Windows 95 and Windows NT. It will also run on Windows 98 and variants like Win98 SE and ME, but won’t work with certain video cards (see the website for details).

Results from Machine A (200 MHz)
ScreenCam was the clear winner on Machine A, the oldest and slowest system I tested on. It had the least impact on system performance and captured the most data. The resulting video was smooth and sharp. However, to get good performance at 800×600 I had to reduce the color depth to 8 bits. It worked at 16-bit color but it was noticeably slower. Pages were slower to display and scrolling felt chunky. It worked, but not very well.

If you’re stuck with an old 200 MHz machine and it’s running an older version of Windows, then ScreenCam is definitely your best bet. Even so, you may be forced to go with 8-bit color depending on the system speed.

Results from Machine B (333 MHz)
ScreenCam was not tested on Machine B because it doesn’t run on Windows 2000.

Results from Machine C (1 GHz)
ScreenCam was not tested on Machine C because it doesn’t run on Windows 2000.

Details about ScreenCam
For ScreenCam to work you need to install special ScreenCam video drivers. I found this surprisingly painless, but it’s unique to ScreenCam. Neither Camtasia nor HyperCam requires special drivers. These video drivers are the reason for ScreenCam’s superior performance, enabling ScreenCam to access the video display through low-level operating system calls. The downside to this approach is that ScreenCam must be rewritten to support each version of Windows. That’s why it works on Windows 95 and NT, and most versions of 98 (depending on the video card), but not at all on Windows 2000 or XP.

ScreenCam records data to a special file format that can only be played back using the ScreenCam player. The player can be downloaded for free and runs on any version of Windows. The good news here is that while you can only record on Windows 95/98/NT, you can play ScreenCam recordings on any version of Windows, including Windows XP.

When recording is finished, ScreenCam needs to spend time processing and creating the final video. The amount of time this takes depends on the length of the recording. For a nine-minute test at 800×600 and 8-bit color, ScreenCam spent approximately 70 seconds “processing data.” This processing creates a temporary file that can then be played back. But this file still needs to be saved if you want to keep it. In my test this took an additional 30 seconds.

It’s possible to convert ScreenCam videos to standard AVI movie files, but I don’t recommend it. My nine-minute test produced a 58 MB ScreenCam file. When I converted this to an AVI file at 10 frames per second the resulting file was 2.5GB.
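That 2.5 GB figure is about what simple arithmetic predicts for uncompressed frames, which suggests the AVI export applies little or no compression. A rough check, assuming the recording’s 800×600 resolution and 8-bit color (one byte per pixel):

```python
# Rough size of a nine-minute, 10 fps recording stored as raw frames.
frames = 9 * 60 * 10               # 5,400 frames
bytes_per_frame = 800 * 600 * 1    # 8-bit color = one byte per pixel
total_gb = frames * bytes_per_frame / 1024**3
print(round(total_gb, 2))          # prints 2.41, close to the observed 2.5 GB
```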


The good:

  • Better performance than either Camtasia or HyperCam.
  • Can be used even on older hardware.
  • ScreenCam player is free and runs on all versions of Windows.


The bad:

  • Only supports Windows 95, Windows NT, and most versions of Windows 98.
  • No longer being developed. No plans to support Windows 2000 or XP.
  • Requires special video driver (easy to install).
  • Uses a proprietary video format and converting to standard formats like AVI creates huge files.

The bottom line:
ScreenCam had the best performance of any program tested, but the lack of support for Windows 2000 and XP makes it hard to recommend. It’s probably the best choice if you’re stuck with older hardware running Windows 95, 98, or NT.

TechSmith Camtasia
Version tested: 3.02
Price: $150

Camtasia offers excellent performance, the richest feature set, and it runs on all versions of Windows. On Machine C, the fast machine in my test group at 1 GHz, Camtasia had no troubles recording 15 frames per second at resolutions up to 1280×1024 in 16-bit color. Even at 1600×1200 it was able to record 15 frames with only a hint of sluggishness. Camtasia also performed well on the 333MHz machine B. It had no troubles at 800×600 and was only slightly sluggish at 1024×768.

There are only two downsides to Camtasia. It has a lot of features that you probably don’t need for usability testing, and it’s by far the most expensive tool in this review. At $150 it’s almost double the price of ScreenCam and five times the cost of HyperCam.

Results from Machine A (200 MHz)
Camtasia didn’t run particularly well on this machine, but it did run. In 16-bit color at 800×600 I was able to capture 5 frames per second, but just barely. The cursor would flash constantly as the machine tried to keep up, pages loaded slowly, and scrolling felt sluggish. It worked, but it was far too slow for usability testing.

Dropping to 8-bit color made a noticeable improvement. Although performance was much improved, I couldn’t increase the frame rate significantly. I was barely able to capture five frames a second at 1024×768 in 8-bit color.

ScreenCam was definitely better on this system (which is admittedly ancient). Camtasia was almost good enough to be usable at 8-bit color on this machine, but not quite.

Results from Machine B (333 MHz)
Camtasia had no troubles capturing the required 15 frames per second at 800×600 in 16-bit color. Bumping the resolution up a notch to 1024×768 was acceptable, though there was a noticeable pause when loading pages. Performance wasn’t quite smooth, but it was usable. For someone used to browsing the web over a 56k modem the pauses would probably seem normal. At higher resolutions Camtasia began to bog down.

Still, this was a significant improvement. Machine B is roughly 60 percent faster overall than machine A, but where Camtasia was just barely able to capture 5 frames per second at 800×600 on machine A, it grabbed 15 frames a second on machine B with no performance impact and even worked well at 1024×768.

Results from Machine C (1 GHz)
Camtasia performed flawlessly on this machine. It recorded 15 frames per second at resolutions up to 1600×1200. There was a slight sluggishness at the highest resolution, but nothing significant. The machine was still perfectly usable. At lower resolutions there were no performance degradations.

Details about Camtasia
When you buy Camtasia you actually get three pieces of software: Camtasia Recorder for recording the video, Camtasia Player for playing the videos, and Camtasia Producer, a basic video editing tool.

Camtasia also requires that you install a special Camtasia video codec called TSCC (it’s free). Using TSCC dramatically reduces the size of captured video files without any loss in image quality. One of my Camtasia tests ran 19.5 minutes at 800×600 in 16-bit color. The resulting video file was 36.8 MB. Installing the codec is easy and quick (and doesn’t require rebooting your system).
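To put that 36.8 MB figure in perspective, compare it against the raw data captured. The frame rate for this particular run isn’t stated, so the sketch below assumes the 15 fps target used elsewhere in the tests:

```python
# Raw data for 19.5 minutes at 800x600, 16-bit color (2 bytes/pixel), 15 fps.
frames = int(19.5 * 60 * 15)        # 17,550 frames
raw_bytes = frames * 800 * 600 * 2  # about 16.8 GB uncompressed
captured_mb = 36.8                  # size of the TSCC-compressed file
ratio = raw_bytes / (captured_mb * 1024**2)
print(round(ratio))                 # prints 437: roughly a 400:1 reduction
```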

An important trick to using Camtasia is the “hardware acceleration” setting. It’s counter-intuitive, but turning hardware acceleration off results in a dramatic performance improvement. With hardware acceleration on, Machine B was chunky and sluggish at 800×600. When I turned it off, this sluggishness vanished.

The hardware acceleration option is actually a Windows setting and has to do with your video card. Camtasia has an option to automatically disable acceleration when you start recording and re-enable it when recording ends.

Camtasia will automatically attempt to determine the best video and audio capture rates. For my tests I elected to set these values manually, but I also ran tests to see how the auto-detect feature worked. No complaints here.

Unlike ScreenCam, the Camtasia video was available immediately after recording. No post-processing was required for straight video. If sound is being recorded, Camtasia records it in a separate file. When recording is stopped, Camtasia merges the two files. During a five-minute test, merging the audio and video streams took about 15 seconds on the 333 MHz Machine B.

Camtasia has a wealth of other features. I won’t go into all of them, but here are the highlights:

  • You can choose to capture the entire screen, a single window, or a specific region of the screen.
  • Although Camtasia includes a free Camtasia player, you don’t need to use it. Any video player will work so long as you have the TSCC codec installed.
  • Camtasia Producer is a video editing tool for combining, editing, and otherwise munging your videos. None of the other tools included something like this.
  • Camtasia also sells a Software Development Kit (SDK) “to allow you to easily add screen recording functionality into your Windows application.” The SDK is available as a separate product. None of the other tools offer a similar package.


The good:

  • Excellent performance.
  • Excellent features.
  • Easiest to use of the programs tested.
  • Supports all versions of Windows, except for Windows 95 (but does support Windows 95 OSR2).


The bad:

  • The most expensive tool reviewed. At $150 it’s almost twice the cost of ScreenCam and five times more than HyperCam.
  • Includes features you probably don’t need for usability testing (like Camtasia Producer).

The bottom line:
Camtasia offers the best blend of performance, features, and ease of use among the programs tested. It runs on every version of Windows (except the original Windows 95) and installation is a snap. The only drawback is price, but at $150 it’s still within the range of almost every budget. Highly recommended.

Hyperionics HyperCam
Version tested: 1.70.03
Price: $30

HyperCam is by far the cheapest of the products tested, yet it probably has all the features you need for usability testing. It offers slightly less performance than Camtasia, but at one-fifth the price. Almost any machine you buy today will have enough spare computing power to make up the difference. The biggest drawback to HyperCam is that it’s a little harder to configure properly, but its quirks are minor and, considering the price, you may be willing to live with them.

Results from Machine A (200 MHz)
HyperCam performed almost as well as Camtasia on this machine. It was barely able to capture 5 frames a second at 800×600 in 16-bit color. It performed much better at 8-bit color. As with Camtasia it wasn’t great, but it did work at the reduced color depth and at modest frame rates, though not well enough to use for usability testing.

Results from Machine B (333 MHz)
HyperCam required a bit of coaxing to get it working properly on this machine. Once I got the settings right, which took some fiddling (more on this below), it captured 15 frames per second at 800×600 in 16 bit color. At 1024×768 I could do no better than 11 frames per second, but performance was smooth. Overall Camtasia performed better on this machine, but HyperCam’s performance was certainly acceptable.

Results from Machine C (1 GHz)
On this, the fastest test machine, the difference between Camtasia and HyperCam was almost negligible. HyperCam had no troubles with the base requirement of 15 frames per second at 800×600 and 16-bit color. Even at 1024×768, 1280×1024, and 1600×1200 HyperCam was able to capture a full 15 frames per second with little or no performance problems.

Details about HyperCam
HyperCam has most of the same features and options as Camtasia, but I found it a little harder to use. For example, HyperCam lets you capture either a window or any rectangular region of the screen. Camtasia does this too, but it also has a one button feature for capturing the entire screen. To capture the entire screen in HyperCam you have to first define a region which covers the entire screen and then press record.

Admittedly this is a little thing. But there are three other “little things” related to performance that I found frustrating. Once I figured them out HyperCam worked like a champ, but until I figured them out HyperCam left me unimpressed.

The first of the “little things” is hardware acceleration. Like Camtasia, HyperCam works best if the video hardware acceleration is turned off. Unlike Camtasia you have to muck around with the Windows display properties to turn this off, then run HyperCam, and when you’re finished recording you have to turn it back on. Camtasia has a “Disable display acceleration during capture” checkbox that automatically disables acceleration when you start recording and enables it when recording is finished. A small but helpful touch.

The second little thing is the frame capture rate. Camtasia will automatically try to determine the best frame capture rate for your system. You can also set it manually, and if you set it too high Camtasia will automatically drop frames and keep recording (though the system will probably slow down).

HyperCam takes a different approach to frame capture rates. First, there is no auto-configuration option—you must set the frame rate manually. This isn’t a big deal, but if you set the frame rate too high HyperCam will start recording, then stop suddenly and display an error message saying the frame rate is too high. In my tests I started at fifteen frames per second and lowered the frame rate step-by-step until HyperCam stopped complaining.

The third little thing is the video codec. HyperCam lets you select which video codec to use for the recording. Since most users (including me) know nothing about video codecs, HyperCam has an autoselect feature which is “Strongly Recommended.” Unfortunately, HyperCam was much slower than Camtasia when I chose autoselect.

Wanting to give HyperCam a fair shake I decided to try other codecs. Scanning the list I saw an entry for the “Techsmith Screen Capture Codec.” This is the codec that Camtasia installed (TSCC). When I tried recording with TSCC, HyperCam’s performance shot up to the point where it ran almost as fast as Camtasia.

In other words, HyperCam by itself has some performance problems, but you can overcome these problems by using the TSCC codec from Camtasia. I have been unable to find any reason why this would not be allowed. The TSCC codec from Camtasia is available as a free download and I had no technical difficulties using it with HyperCam.


The good:

  • Inexpensive. At only $30, it’s one-fifth the cost of Camtasia.
  • Supports all versions of Windows.
  • Performs almost as well as Camtasia as long as you’re using Camtasia’s TSCC codec.


The bad:

  • Not as many goodies and features as Camtasia, but probably enough for the usability professional.
  • Harder to use and configure for decent performance.

The bottom line:
My first impression of HyperCam was that for $30 I was getting what I paid for. But once I fiddled with it and found the “secret” of using Camtasia’s TSCC codec, I was entirely satisfied. Unless you need the extra features of Camtasia, HyperCam will probably do the job (but download the trial version and test it to make sure).

Summary and recommendations
Before you make a decision I strongly recommend that you download these programs and try them yourself. It’s the only way to be sure you’ll get acceptable performance on your hardware. The trial versions are free, installation is a snap, and running some basic tests will take just a few minutes.

Camtasia is clearly the best of the bunch, but it’s also the most expensive. With a bit of fiddling, HyperCam will perform almost as well as Camtasia for a fraction of the cost. Camtasia has a lot more features, especially since it includes a basic editing and production tool, but for usability testing the programs are roughly equivalent when it comes to features.

I can’t recommend ScreenCam. While it used to be the gold standard in this area, it’s now a dead product with no future.

Choosing between Camtasia and HyperCam is difficult. I preferred Camtasia for its ease of use. It’s slightly faster and more polished than HyperCam. Still, HyperCam is a bargain at $30 and it’s probably worth the fiddling required to make it perform well.

                                     ScreenCam        Camtasia          HyperCam

Purchase Options
Cost (USD), single copy              $86.00           $149.95           $30.00
Free trial for download?             Yes (15 days)    Yes (30 days)     Yes (no time limit, but all videos are stamped with a message saying it’s unregistered)
Buy online?                          Yes              Yes               Yes
Site license available               Unsure           Yes               Yes
Educational discount                 Unsure           Yes               No

Platform Support
Windows 95                           Yes              Yes (OSR2 only)   Yes
Windows 98 (incl. 98 SE and ME)      Yes (may not work with all video cards)   Yes   Yes
Windows NT                           Yes              Yes               Yes
Windows 2000                         No               Yes               Yes
Windows XP                           No               Yes               Yes

Recording Features
Hot-keys to start, stop, and pause   Yes              Yes               Yes
Record sound                         Yes              Yes               Yes
Record full screen                   Yes              Yes               Yes
Record any region                    No               Yes               Yes
Set frame capture rate               No               Yes               Yes
Choose codec used for recording      N/A              Yes               Yes
Hide when recording                  Yes              Yes               Yes

Playback Features
Pause                                Yes              Yes               Yes
Fast Forward                         Yes              Yes               Yes
Reverse                              No               Yes               Yes

Special player required?
  ScreenCam: Yes. The player is a free download and runs on all versions of Windows, including 2000 and XP. You can only record on 95/98/NT, but you can play back on anything.
  Camtasia: A special player is available as a free download, but any video player will do as long as you’ve got the TSCC codec installed. The codec is a free download.
  HyperCam: Any video player will work. If playing back on a different machine, you must have the codec that was used for recording installed.

Karl Fast was an information architect at Argus Associates. He is currently pursuing a Ph.D. in information visualization at the University of Western Ontario.

Customer Experience Meets Online Marketing at Brand Central Station

by:   |  Posted on

“Marketing” covers a broad range of activities that together represent how easy you can make it for people to do business with you. Customers need to discover your offering, learn about it, compare it, and form a desire for it that is strong enough to compel them to shell out hard-earned cash to acquire it.

“Customer Service” activities revolve around how easy you make it for people to buy and use your offerings. Once they’ve shelled out the cash for the product, can they figure out how to use it? Can you and do you help them?

“Customer Experience” is all about how your prospective and current customers perceive your company, based on the effort they had to expend accomplishing the above tasks. If the word “brand” pops into your head, you may go to the head of the class.

The sum total of an individual’s experiences with your company will color his perception of it and build a brand image in his mind. A brand is not a name. A brand is not a positioning statement. It is not a marketing message, a jingle, or a logo.

A brand is something that lives apart from what the company plans, because it is the culmination of all of the interactions that all the people in a marketplace have with the firm:

  • A person sees an ad and forms an impression.
  • She looks up information on the web, and her impression changes.
  • She calls the firm and talks to the receptionist, and her impression changes.
  • She is put on hold and hears the music and “Your call is important to us.”
  • She talks to a sales rep.
  • She waits for the materials to arrive.
  • She reads the materials.
  • She talks to her colleagues about the product.
  • She reads about the firm in the financial pages.
  • She reads product reviews.
  • She makes the purchase.
  • She sees and feels the product packaging.
  • She tries to use the product.
  • She calls customer service.
  • She talks to her friends about her experience.

There are only a few touch points where a company can exercise any serious control over that brand-building series of customer interactions: the advertisement, the marketing materials, the packaging, the product itself, and the web.

When a prospective or current customer calls your firm, the receptionist may not be having a great day, and that fact may make itself heard across the phone line. Your sales rep may be more worried about a commission and a house payment than listening closely to a potential client. Once a product is launched and on the shelves, you can only hope for great product reviews.

To influence our customers’ opinions and impressions about our goods and services, we want to use the most consistent, trustworthy, high-impact means we can to build a brand. Since we can’t control our employees’ moods, a company’s website is one of the most powerful tools available.

Your website represents your company in a very visceral way. Like advertising, you can use colors, pictures, copy, voice, music, and more to communicate your desired brand. Many companies stop at this cosmetic level and overload their sites with animations, a practice that has given rise to a new addition to our common lexicon: Skip Intro.

Your website also paints a picture of whether your firm is open and generous with information, or reticent and secretive. It shows how willing you are to inform, how hard you are willing to work to educate, and how freehanded you are with information about your abilities and goals. But a much more subtle message than all of these is how well your website actually works.

Think about the last time you praised a company to a friend or associate. You probably used words like, “professional,” “easy to work with,” “capable,” and “on the ball.” Wouldn’t we like all the firms we do business with to have those attributes? Are people who visit your website left with the impression that your firm is on the ball?

Most companies that bother to interview or survey site visitors ask questions that don’t quite dig beneath the surface of customer experience, such as:

  • Did you like the visual appearance and layout?
  • Did you like the content?
  • Did you like the style, tone and character of the site?

Questions about customers’ responses to site characteristics are valuable, to be sure, but they are missing the customer experience side of the equation: what is it like to use the site? Customer experience questions are more along these lines:

  • Why did you visit our site?
  • Did you find what you were after?
  • Did you run into any trouble?
  • Was it fast enough?
  • What did you like best?
  • What did you like least?
  • Are there any features you like on competitor sites?
  • What would you like to change on ours?
  • What one button would you add?

These are akin to asking a customer if they felt the company was easy to work with, professional, and on the ball.

Customer experience, the building of a brand image in the mind of the customer, is the culmination of all touch points, be they outbound corporate communications or a conversation over the backyard fence with a neighbor who just found a series of 404s on your site.

Because your website is so interactive, and because it is such a strong indicator of how much you devote to potential customers, it becomes one of the most important tools you can use to impress and delight customers and convince them that your firm is the most qualified and the most deserving of their business.

Your website is also the single most measurable means of communicating with the world. In my consulting work with large corporations, I have only discovered a few that understand the brand-building power of the web well enough to bother measuring their success there in a serious way. These few understand that there are quantitative ways to calculate the answers to questions like:

  • How good are we at attracting attention to our site?
  • How easy is it for people to find what they’re looking for?
  • How quickly are we improving the conversion process?
  • What is the value of adding additional content?
  • What impact does online customer service have on lowering costs and increasing customer satisfaction?
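Questions like these can be answered from ordinary session logs. Here is a minimal sketch in Python; the session data, sources, and event names are invented for illustration and do not come from the article:

```python
# Hypothetical session records; a real analytics pipeline would supply these.
from collections import Counter

sessions = [
    {"source": "banner_ad", "events": ["land", "search", "view_product", "purchase"]},
    {"source": "organic",   "events": ["land", "search", "leave"]},
    {"source": "banner_ad", "events": ["land", "view_product", "leave"]},
]

# "How good are we at attracting attention?" -- sessions per traffic source.
traffic = Counter(s["source"] for s in sessions)

# "How easy is it for people to find what they're looking for?" --
# fraction of searching sessions that go on to view a product.
searched = [s for s in sessions if "search" in s["events"]]
found = sum("view_product" in s["events"] for s in searched)
find_rate = found / len(searched)

# "How quickly are we improving the conversion process?" -- track this
# ratio over time; here it is a single snapshot.
conversion_rate = sum("purchase" in s["events"] for s in sessions) / len(sessions)

print(traffic, find_rate, conversion_rate)
```

The point is only that each question reduces to counting events, which is why the answers are quantitative at all.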

Customer experience is no longer an entirely soft science.

Measuring your customers’ experience on your site is neither simple nor inexpensive. But a handful of forward-thinking companies are already assigning numerical values to customer experiences, and doing so will allow them to make the most of the web.

At Hewlett-Packard, for example, the team responsible for web metrics tracks what they call the “Seven Recipes”:

Optimize lead generation
Which attention-getting methods are bringing in the most—and the best—traffic to your website? You can get millions of people to show up by promising free money, but if they’re not qualified buyers, you’re wasting time and resources and giving your brand a black eye. HP tracks leads from the banner ad to the purchase page to make sure their ads are making sales, not just building traffic.

Tune your pages
Are your pages helping people find what they’re after or sending them away? HP measures the number of people who click through each page. They also measure how quickly pages load, and whether customers have good things to say when surveyed about specific pages, rethinking pages that fall below the average threshold.
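Once you have a per-page score, flagging pages that fall below the average threshold is mechanical. A hedged sketch; the page names and scores are made up, and "score" could equally be a survey rating, click-through rate, or load time:

```python
# Invented per-page scores; substitute whatever metric you actually collect.
pages = {"/home": 4.2, "/support": 2.1, "/products": 3.8, "/drivers": 2.9}

# Pages scoring below the site-wide average are candidates for rethinking.
avg = sum(pages.values()) / len(pages)
rethink = sorted(p for p, score in pages.items() if score < avg)

print(avg, rethink)
```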

Optimize navigation
How hard is it for a visitor to get from a landing page to a conversion (shopping cart) page? How many clicks? How much reading? Perhaps even more importantly, are people taking the path you laid out for them? Or, in their confusion, are they wandering through other pages to get there? Is there a lot of “pogo-sticking” going on, where people click back and forth endlessly from a main navigational page to various content pages?
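Pogo-sticking can be detected mechanically from a session's click path. The function below is an illustrative assumption, not HP's actual analysis; it counts mid-path returns to a designated "hub" navigational page:

```python
def pogo_stick_count(path, hub):
    """Count returns to the hub page sandwiched between content pages,
    i.e. the hub -> content -> hub -> content bouncing pattern."""
    bounces = 0
    # Slide a three-page window over the click path.
    for prev, curr, nxt in zip(path, path[1:], path[2:]):
        if curr == hub and prev != hub and nxt != hub:
            bounces += 1
    return bounces

# Hypothetical click path: the visitor keeps falling back to the home page.
path = ["home", "products", "home", "support", "home", "printers", "cart"]
print(pogo_stick_count(path, "home"))
```

A high count across many sessions suggests the hub's links are not describing their destinations well.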

Enhance search
Do people find what they’re looking for when they use your internal search engine? Does it point them to a press release about your product instead of the product home page? HP hardwires some product page links to specific search terms to improve findability.
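One common way to hardwire links of this kind is a "best bets" table consulted before the full-text engine. This is a guess at the general technique, not HP's implementation; the terms and URLs are invented:

```python
# Assumed "best bets" mapping from search terms to product pages.
best_bets = {
    "laserjet": "/products/laserjet",
    "ink": "/supplies/ink",
}

def search(query, fallback=lambda q: f"/search?q={q}"):
    """Send known product terms straight to the product page,
    falling back to the ordinary full-text engine otherwise."""
    return best_bets.get(query.strip().lower()) or fallback(query)

print(search("LaserJet"), search("warranty terms"))
```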

Optimize merchandising
HP constantly watches what gets put into their shopping carts. Which items sell together? What’s the best way to notify a customer who is interested in Product A that they might also be interested in Product B?
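Which items sell together can be estimated by counting co-occurring pairs across carts. A toy market-basket sketch; the products and cart contents are fabricated for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping carts, each a set of products.
carts = [
    {"printer", "ink", "paper"},
    {"printer", "ink"},
    {"laptop", "mouse"},
    {"printer", "paper"},
]

# Count how often each unordered pair of products lands in the same cart.
pairs = Counter(
    frozenset(p) for cart in carts for p in combinations(sorted(cart), 2)
)

best_pair, count = pairs.most_common(1)[0]
print(sorted(best_pair), count)
```

The most frequent pairs are candidates for "you might also be interested in" promotions.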

Tune entry points
Landing pages are the bridge between your advertising and your persuasion pages. HP looks at click density maps of each landing page to see if the dispersion ratio is in line with the rest of the site. If people arrive at a landing page and click through (disperse) to different sorts of content than what you intended, your promotion is missing the mark.

Analyze sales funnel
Given a three-, four-, or ten-step process to find, choose, configure, and purchase something on your site, where are the trap doors that people fall through on their way to check out? Where do most people abandon the process?
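Finding the trap doors amounts to counting how many sessions reach each funnel step and locating the biggest drop. A sketch with invented step names and session data:

```python
# Assumed funnel steps and sessions; real ones come from your logs.
funnel = ["search", "product_page", "configure", "cart", "checkout"]
sessions = [
    ["search", "product_page", "configure", "cart", "checkout"],
    ["search", "product_page", "configure"],
    ["search", "product_page", "configure"],
    ["search", "product_page"],
]

# How many sessions reach each step of the funnel.
reached = [sum(step in s for s in sessions) for step in funnel]

# The step with the largest loss relative to the previous step.
drop = [(funnel[i + 1], reached[i] - reached[i + 1]) for i in range(len(funnel) - 1)]
worst_step, lost = max(drop, key=lambda d: d[1])

print(reached, worst_step, lost)
```

In this fabricated data most visitors abandon at the cart step, which is where you would look first.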

Throughout their analyses, HP has a very straightforward methodology. They review the reported numbers, surmise the reasons for those numbers, decide what to change in order to affect those numbers, and then closely measure the results of those changes.

Can you measure whether your customers are impressed with your brand because they are having a good experience on your website? There’s no spreadsheet you can download that will answer all your questions. But we’re definitely making progress.

Jim Sterne is an internationally-known speaker and a consultant to Fortune 500 companies, with 20 years experience in sales and marketing, and the author of “Web Metrics: Proven Methods for Measuring Web Site Success.”

What You Should Know About Prototypes for User Testing

by:   |  Posted on

There are several important factors to consider when you are planning to do prototyping for user testing. You will want to make careful choices about fidelity, level of interactivity and the medium of your prototype.

Degree of fidelity
Fidelity is the degree of closeness to the “depth, breadth and finish of the intended product” (Hakim & Spencer). Opinions vary a great deal on how much a prototype should resemble the final version of your design. Usability practitioners like Barbara Datz-Kauffold and Shawn Lawton Henry are champions for low fidelity—the sketchier the better! Meanwhile, Jack Hakim and Tom Spencer advocate a medium- to high-fidelity approach that gives users a closer approximation of a finished version. You’ll want to decide on the right approach based on the needs of your project.

Low fidelity
You can use hand-drawn sketches to create a paper prototype. If you go this route, you may also want to help your users get into the spirit of things during the test by creating a complete low-fidelity, paper environment. This could include a cardboard box made to look like a computer and an object to hold to point and click with. These techniques help users to suspend their disbelief and get their imaginations involved so that they can better visualize the interface. The advantage of using rough sketches is that users will have an easier time suggesting changes. They may even grab a pen and start making their own changes (Datz-Kauffold and Henry).

In theory, low-fidelity sketches are also a time-saver, but this really depends on your point of view. Personally, I like to draw diagrams and wireframes in Visio where I can revise and move things around without erasing and redrawing. If you prefer to work this way too, and if time allows, you can always have those Visio drawings hand-traced or use them as a roadmap for making sketches to test with. You might even find a graphics tool with a filter that will convert a Visio-generated graphic into a hand-drawn sketch with wavy lines.

High fidelity
This approach takes you as close as possible to a true representation of the user interface—screen-quality graphics. All of the blanks on the page are filled in, and it looks good. However, you might not have all of the technical or backend problems worked out yet, or you might have only a small part of the entire site rendered. That’s why it’s still considered a prototype. For example, it might consist of a small series of Photoshop images or HTML pages with just enough functional links to convey the feel of the site’s flow. You may need to enlist the help of a graphic designer or web developer to build these in a reasonable amount of time. Advocates for high-fidelity prototypes argue that they are easier for users to understand just by looking at them. There is no disbelief to overcome, and it is easier to determine when users really do not understand the design. If you choose a high-fidelity prototype, make sure that you have enough of the design fleshed out so that users can complete several tasks. Decide on these tasks early, so you know which areas of the design need to be represented for your tests. Otherwise, you will be in for a great deal of preparation work.

Medium fidelity
In the grand tradition of Goldilocks, I find myself drawn to the middle approach. A medium-fidelity approach tends to include some visual design and a level of detail somewhere between high and low fidelity. Does this sound familiar? As an information architect, I’m accustomed to creating wireframes I can hand off to decision-makers, graphic designers, web developers and programmers. An information architecture wireframe is NOT graphic design. I swear, it’s really not!!! But… I’ll admit that it has enough visual design to convey a rough version of the user interface. Because I create these with drawing tool software, they tend to have more polish than hand-drawn diagrams. Hakim and Spencer are champions for medium-fidelity prototypes because they fit more seamlessly into the design process while providing more realism for users. I found this to be true during a project to design a search interface for Egreetings with my colleagues at Argus. I created rough draft wireframes for the prototype, and after testing I revised them for use in my deliverables.

Level of interactivity
Interactivity describes how your prototype behaves. Does your prototype react to user inputs with feedback? Can users “click” on something to go to another page or fill in a form? Will buttons appear to depress and drop-down menus work?

Static prototypes
A prototype used for testing is static if it consists of pages or page elements that are shown to users but provide no feedback. It can sometimes work well to show a page to a user and ask them to explain it to you or to guess where they can go from there. In this kind of test, the user interprets the prototype rather than interacts with it. This is a good way to validate your design by checking to make sure users understand it. It’s also easy to score this sort of test when you have a standard list of questions to ask about each page.

Automated prototypes
Automated prototypes allow users to make choices that cause changes. The testing prototype provides the user with feedback. Elements are “clickable” and forms can be filled out. The interface reacts to the user while the tester observes. One way to do this is to create the prototype in HTML or some application that allows interactive elements, such as Flash, Visual Basic or even PowerPoint.

Another way to achieve a kind of pseudo-automated interactivity when you have chosen a paper prototype is to pretend (Datz-Kauffold and Henry). Have you ever seen children at play pretend that they are driving a car by setting up chairs for the front and back seats, drawing a dashboard on a cardboard box, and using a Frisbee for the steering wheel? If you have set up the right environment for your users, you can ask them to pretend scraps of paper on a table are their computer screen. When they “click” on a drop-down menu by touching the element with a pointer, a tester assigned to the role of the computer provides feedback by swapping the closed menu for an open one that shows choices. The “computer” may need to write on some elements before showing them to the user, i.e., “Your search retrieved 523,621 hits.” It takes a few minutes to get test participants used to the idea, but if you encourage them to have fun with it you will learn a great deal. You can also easily try out different possible reactions to user input.

This method worked well during the Egreetings project. We especially emphasized the technique of asking the users to click and then providing feedback. We found it useful to laminate the screen components so we didn’t need to produce a clean copy of the test for every subject. The users could write on the laminated pieces with thin whiteboard markers when making selections and entering search criteria. Of course, this meant that we needed to take careful notes, since the pieces had to be erased between test subjects.

Here are some other tips to try for low-fidelity testing with simulated interactivity:

  • Bring extra paper so you or the respondent can sketch out an idea if the opportunity arises.
  • As with any user test, it really helps to ask the respondent to think aloud.
  • If you have the luxury, bring a team of three to the test: someone to take notes, someone to play the “computer” and another to facilitate.
  • Use a piece of posterboard as your “screen.”
  • Cut your design into separate pieces or zones as appropriate and ask the user to rearrange them in the order they prefer.
  • Attach the folder tabs that come with hanging files to components so they are easier to grab.
  • Invite users to throw away or cross out components that they don’t think are important.
  • Number the pieces so that you can easily refer to them in your notes and keep them organized.
  • If you do decide to bring separate copies of the test materials for each session, tape down the components to a larger piece of paper as arranged by each user so you have these artifacts to analyze later.

Prepare a kit for yourself containing:

  • Scissors and tape,
  • Different sizes and varieties of sticky notes (which make great drop-down menus),
  • Markers and pens in various colors and sizes,
  • Paper clips and binder clips for keeping slips of paper organized, and
  • Objects that the user can pretend are the mouse pointer, such as a feather or a small toy.

Medium
There are many possible combinations to choose from for building your prototype. One of the first choices to make is whether you want to have your prototype viewed on an actual computer screen or if you’ll be working on a tabletop with a paper prototype. Believe it or not, fidelity and interactivity are independent of the medium you choose. It’s probably most natural to think of the extreme cases. An automated HTML prototype is often high-fidelity and, of course, the medium is a computer screen. Likewise, a natural medium for a low-fidelity automated interactive prototype is hand-drawn sketches on paper. However, you can also have the following:

  • Low- to medium-fidelity wireframes built in PowerPoint that show only lines and boxes with text, whose animation features provide automated interactivity,
  • Static Photoshop prototype pages shown to users on a computer screen, or
  • Same as above, but printed out in color on paper.

Mixing the variables
You can mix these three variables (fidelity, interactivity and medium) in many different combinations. The exact combination you choose should match the goals you determine for your testing. Possible goals for an IA prototype include:

  • Testing the effectiveness of labels and icons.
  • Finding out the right balance of depth and breadth of a topical hierarchy.
  • Determining the right options to offer for narrowing a search.
  • Choosing the most important metadata elements to show on a search results screen.
  • Settling the question of whether your target audience accomplishes tasks better with a task-oriented organization scheme or with a topical organization scheme.

If you live and breathe NetObjects Fusion and don’t have much time, your preference might be to create a medium-fidelity prototype. That way you could test that sitemap you are working on using some rough placeholder graphics or text instead of the finished graphic design. How you mix the variables depends on the time and budget you have available, as well as your work style. Try experimenting with different approaches to learn how prototyping will work best with your design process.

For more information

  • “Evaluating Information Architecture,” Steve Toub (2000).
  • UPA 2000 Proceedings:
    #28 – “Waving Magic Wands: Interaction Techniques to Improve Usability Testing Low-Fidelity Prototypes,” Barb Datz-Kauffold & Shawn Lawton Henry.
    #32 – “Prototyping for Usability,” Jack Hakim & Tom Spencer.
  • “Prototyping for Tiny Fingers,” Marc Rettig, Communications of the ACM, Vol.37, No.4 (April 1994). (ACM Membership required)
  • “Using Paper Prototypes to Manage Risk,” User Interface Engineering.
Chris Farnum is an information architect with over four years’ experience, and is currently with Compuware Corporation. Three of those years were spent at Argus Associates working with Lou Rosenfeld and Peter Morville, the authors of Information Architecture for the World Wide Web.

Slate: Calculated Refinement or Simple Inertia

by:   |  Posted on

Before we get started, I just wanted to note that my comments are intended to supplement the diagram, rather than vice versa. So be sure to download the PDF version of the diagram to get a full understanding. That said…

No matter how you look at it, publishing content on the web daily is a lot of work. From an information architecture perspective, a daily web publication presents challenges and possibilities no newspaper editor ever had to face. As one of the longest-running daily publications on the web, Slate has dealt with these issues for years. But it is unclear whether the site’s current architecture is the result of calculated refinement or simple inertia.

The architectural decisions here demonstrate one key assumption about the site’s content: the ‘shelf life’ of any given article is about seven days. Navigating to a piece during those first seven days is fairly easy; after that, it becomes very hard.

At a glance, the high-level architecture seems fairly straightforward. But a closer look reveals that the five primary ‘sections’ exist only in the tables of contents. These categories appear nowhere else on the site—not even on the articles themselves. Furthermore, the classification of articles into these categories only persists for seven days from the date of publication. After that, the section to which a piece belonged is forgotten.

Note the absence of an ‘archive’ area. The only access to articles more than seven days old is through the advanced search page. In place of a browsable archive, Slate offers canned searches by “department” and by author. The author list page works well enough, though it is only useful if the user already knows the name of the author of a desired piece; and in that case, the search interface would be sufficient.

The department list page has a greater burden to bear. As the only persistent classification scheme employed on the site, the department list is the only element that can provide the reader with a sense of the range of content and subject matter covered on the site. But the page currently falls far short of this goal. What the user faces here is nothing more than a very long list that makes no distinction between limited-run features like “Campaign ’98”; occasional, semi-regular features like Michael Kinsley’s “Readme”; and ongoing staples like “Today’s Papers.”

This problem is only exacerbated by the fact that, by and large, the department titles are too clever by half. Even the savviest user could be forgiven for having trouble remembering whether Slate’s roundup of opinions from movie critics was filed under “Critical Mass” or “Summary Judgment.” The cute titles would be fine if the site provided some sort of context for what was to be found inside; as it is, providing a plain list of titles like “Flame Posies”, “Varnish Remover”, and “In the Soup” does little to help readers find specific items or even get a general sense of what the site has to offer.

Letter-sized diagram (PDF, 41K)

Note: The date on the diagram indicates when the snapshot of the system was taken. Slate may be substantially different now.

Finally, I wanted to find out what sites you’d like to see me diagram in the future. You can post your suggestions here.

Jesse James Garrett is one of the founders of Adaptive Path, a user experience consultancy based in San Francisco. His book “The Elements of User Experience” is forthcoming from New Riders.

Got Usability? Talking with Jakob Nielsen

by:   |  Posted on
Photo of Jakob Nielsen

Jakob Nielsen is the usability guru who hardly needs an introduction. But for the sake of completeness we’ll mention he’s the co-founder of the California-based consultancy, Nielsen Norman Group, and has been crusading against bad web design for years through his biweekly column, The Alertbox, and his numerous books. He’s brought usability to the attention of the general public, but within the user experience community he’s been criticized by those who say he emphasizes a puritanical view of utilitarianism that excludes other dimensions of user experience. Oh, and did we mention he’s the man who launched a thousand parody sites?

So is Nielsen the defender of ease-of-use or the enemy of creativity? We talked to the controversial Dane, and you might be surprised…

B&A: What are some of the toughest design challenges on the web today?

Nielsen: I think it’s to get a really big jump in usability. We can make a website that shows a few things quite well, if you have a few products. We can also do a huge database that you can search, and it works reasonably well.

But I don’t think we really have a handle on getting the average person through the vast number of things that a website can offer. If you narrow it down and show a few things, yes, if you assume that they are capable of doing a lot of data manipulation. But I think there’s a large number of cases that do not fall into one of those two categories. You can go to CNN and see the five big headlines of the day, and that works fairly well. You can go to Amazon and you can buy my book, for example, if you know the name of the book. But in the intermediate case of having a website with 10,000 articles and finding the one that’s right for you, which is quite often the case on a tech support website … basically doesn’t work at all.

B&A: What types of research interest you the most?

Nielsen: How to get usability out to the masses. When I say masses, I mean web designers, not users. Right now we have about 30 million websites, and we will have up to 100 million in three to five years. That’s a large number of design projects. How many usability people are there in the world who are in any way qualified? At the most, maybe 10,000 or so.

Therefore, we know that we’re not going to have this number of web projects done according to the recommended old methodology. So, even what I’ve been pushing in the past—more efficient, quick usability methodologies—is not good enough when you have that number of design projects. We need to have several orders of magnitude improvement in the efficiency of usability to really impact that number of design projects. Can we do things like encapsulate usability knowledge in guidelines such that an average designer can actually apply them?

B&A: What do you feel is the relationship between a usability professional and a designer?

Nielsen: I think they could play two different roles: either that of an editor and a writer, or a professor and a student.

In the more integrated projects, which is the preferred way to do it, I think it’s more like the editor and the writer, where the designer will come up with things just as the writer would write the article, and the editor will make it better, will know what the readers need and how to present it in a good way and help the writer improve their article. I have never met a professional writer who didn’t like to have a good editor. There often seems to be a conflict between designers and usability people, but I think that once you conceptualize it as the usability person helping to improve the design, then I think it goes away.

But you’re going to have a lot of designers who don’t have a usability professional in their team. So the vast majority of them just have to learn what the principles are that work well with users from usability professionals, and then it becomes more of an educational mission. So the relationship is more like that of the professor and the student. The student is the one who has to go do it at the end of the day, but the professor is the one who has the knowledge, having done all the research in the past, and can tell the student what works well.

B&A: How do you react to designers who have strong feelings about usability in one way or another?

Nielsen: I think that designers who don’t want usability are misguided because it’s really just a way of helping them achieve a better design. Some of them just reject the goal of having a design that’s easy to use. If you don’t have the goal of a design as actually trying to accomplish something, then you’re more in the art world, and if the project doesn’t have a goal, then maybe that’s appropriate—design for design’s sake. But if you do design to actually accomplish something, then I’d argue that it has to be easy to use, so I don’t think that it’s appropriate to reject the goal of usability if your project has to accomplish something. Design is creating something that has a purpose in life; art is creating for the sake of creating — that’s my distinction between those two terms.

Whether they want to get usability from someone who knows about it, or whether they want to find it out themselves … can be debatable. How did any of us become usability specialists in the first place? Only by doing a lot of the research and studies. Any designer could do that as well if they bothered. They don’t have to get it from us, but then I would argue that they would need to do it themselves.

B&A: Is there a particular reason you advocate for using guidelines? I’ve heard people say that it comes off as overly dogmatic to simply have a huge list of guidelines.

Nielsen: Experience says that usually these work — usually, but not always. Usability guidelines always need to be applied with a certain amount of understanding as to when they apply and when they don’t apply. If a set of guidelines is written well, then usually they will apply, and it will be the exception when they don’t apply. You have to acknowledge that on one hand it may be that only 90 percent of the guidelines apply … so you can’t violate all guidelines, you can only violate some if you have a good reason to do so.

Some people may not understand the difference between a guideline and a standard. A standard is something that is 100 percent firm, and a guideline is something that is usually right — that’s why it’s called a guideline.

B&A: What’s the difference between a standard, a guideline, and a heuristic?

Nielsen: You get even more vague when you get into the area of heuristics. Heuristics are things that are rules of thumb, so they are very vague and very broad. At the same time, they are very powerful, because they can explain a lot of different phenomena, but that explanation has to be done with a lot of insight, and that is what’s more difficult. One of the lessons from a lot of my research is that heuristic evaluations indicate how to adjust an interface relative to these general principles of good usability. It’s fairly difficult to do well. Anybody could do it to some extent, but they couldn’t necessarily do it very well, and you have to have a large amount of experience to do it well.

On the average design project today, they don’t have that amount of usability expertise on their team, and therefore we’ve got to give them something more complete that is easier for them to deal with. It’s a matter of the usability of the usability principles, really. If we make them more specific, they become more concrete, they’re easier to interpret, and … easier for the designers to judge when they do not apply.

B&A: What’s the difference between someone doing a heuristic evaluation solo versus doing it in a team?

Nielsen: The way I developed heuristic evaluations back in the 1980s was meant to be an interaction between solo and the team, because you first do it individually, and then you combine a few people who have done the heuristic evaluation. That’s done very rarely, because it’s rare that a project team will have that many people on board who really know about usability.

“(I)t’s not a matter of intuition. It’s a matter of being very good at pattern matching, being able to spot small things, and hold together the big picture of what that really means.”

A common mistake about heuristics is thinking that it’s just a list of complaints. It’s not a list of complaints, it’s a list of issues relating back to the underlying fundamental principles. When you say that this button is wrong or this flows wrong, you say it’s wrong because it violates this well-known usability principle. And then, of course, people can argue. They can say, “no, it does not violate this principle,” and then you would have a discussion about that, which is a great method of illuminating and getting insight into the design.

B&A: What are the most important skills for a usability specialist to have?

Nielsen: I would say experience. It’s an unfortunate thing to say, because you can’t acquire experience other than by doing it. This is a discipline where you will always start off being bad and you end up being good. You only get to be good by slogging through several initial projects where you didn’t do that well, and then you get better and better. I think that being a truly great usability specialist comes from having 10 years of experience and having seen a very large number of different designs, different technologies, different types of users — a very broad variety of experience.

The benefit of usability, though, is that it is such a powerful method, and the return on investment is so huge, that even if you don’t do that great a job at it — maybe you don’t get a return of 100-to-1 and you only get a return of 20-to-1 — that’s still a huge return on investment. Even the very first usability project someone does, where they mess up everything, is still going to be positive, and it’s going to be a great learning experience for them personally, and their team is going to get value out of the investment as well. Just keep doing it and doing it and doing it.

It’s very much of an analytical and interpretive discipline as well. Intuition is completely the wrong word to use — it’s not a matter of intuition. It’s a matter of being very good at pattern matching, being able to spot small things, and hold together the big picture of what that really means. That’s where experience helps you — it helps you to do pattern matching and match patterns you’ve seen before, and the more things you’ve seen before, the better you can do that.

There’s definitely a big evangelizing and propaganda component as well, so having good communication skills is very important too.

B&A: Are there any usability specialists you particularly admire or whom you took guidance from?

Nielsen: I did actually. I’ll say that two of them are actually colleagues at my company, Don Norman and Bruce Tognazzini. They are two incredibly great people. Another one I’d like to mention who’s now retired is John Gould. He worked at IBM in the 1980s. He developed a lot of the early approaches and for any question you could come up with he’d say, “OK, you can do a study of that.” He was just such an empirical guy that it was incredible.

Another person is Tom Landauer, who worked at Bell for many, many years. I was privileged to work with him for four years when I worked there as well. He was very much on the measurement side: “We can quantify this. We can estimate these things.”

I’d like to mention one more person … I never worked with, Ted Nelson, who was the guy who kind of invented hypertext. He got me into this feeling that we shouldn’t accept computers being difficult, that computers can be a personal empowerment tool. I read a lot of his writings when I was in grad school. His writing is really what got me going in this area in the first place back in the 1970s.

B&A: How many users do you yourself observe in the average month?

Nielsen: I probably sit with too few users, actually. Probably less than 10. It ought to be many more. In my own defense, I’ll say that I’ve done it for many years, and the learning is cumulative. I run a lot of projects where someone else will sit with the user, but I’ll still monitor very closely what goes on. I would still say that it’s very important to sit with the user as well. People should continue to do that forever — you never get enough of that. In particular, for someone who’s starting out in usability, I would say 20 or 30 a month would be a good goal to have, so that you can try to run a study every week.

B&A: Will there be new methodologies for user research in the future, or will we keep refining the ones we have right now?

Nielsen: I think mainly we will keep refining the ones we have. Of course, you never know if some completely new thing will come up, but I think it’s not likely. The classic methodology was developed in the 1970s and early 1980s. John Gould was one of the big people doing that and I learned a lot from him. That was pretty much established by then: how to do measurement studies and all that.

“Usability has very much seemed like a black art … Many things are testable, but at the same time we have to broaden the scope to make it even cheaper, even more accessible, get even more people doing it.”

Then, in the late 1980s, I reacted a bit against my own mentors and said, “These are all great methods, but they take too long, and a lot of projects won’t do them if they’re not at a big, rich company like IBM.” So, we developed discount usability methodologies, which was a faster way of doing these things.

Since 1990 there hasn’t been that much change. I think it’s pretty slow-moving because it doesn’t relate to technology, which changes all the time. It relates to humans and the process of accommodating human needs, which doesn’t change very much.

B&A: Do you ever feel like discount usability methods can be misused?

Nielsen: I think there could be cases where someone does a heuristic evaluation without truly understanding the principles. Or you might have someone who tests one user and says, “Let’s go with that.” But in general I think that the methods are so powerful that they actually hold up pretty well even if they’re abused.

I recently read somebody who criticized the idea of doing studies with a small number of users, with the argument that you cannot judge the severity of the usability problems because you don’t have enough observations to know the frequency with which each occurs. This is a circular argument, a self-fulfilling prophecy, because it assumes that the only way you can judge the severity of a problem is by having a statistically accurate assessment of its frequency. I’m arguing that after having observed it a few times, you can, with the insight that comes from experience, estimate the severity pretty well — good enough anyway. The real issue in severity ratings is that you’ve got to do a cost-benefit analysis.

B&A: What’s your take on information architecture?

Nielsen: The first question I have is what it really even is. I tend to operate under the definition that it’s the structuring of an information space. I view that as being different from information design, which has to deal with how you present the information once you’ve found it, or interaction design, which is a matter of flow through a transaction or task. I know that some people like to use the words information architecture to apply to everything, which is what I would tend to call user experience. That’s purely a matter of what terminology you feel like using. I tend to think that user experience is built of these components: how are things structured, how it is presented, how do you flow through it, and other things like how is it advertised.

B&A: What’s next for you and the Nielsen Norman Group?

Nielsen: Trying to drive usability more broadly toward that larger set of design firms, really trying to encapsulate it to make it more portable. Usability has very much seemed like a black art. I myself have often said, “Well, you can just test that.” Well, that is true. Many things are testable, but at the same time we have to broaden the scope to make it even cheaper, even more accessible, get even more people doing it.

There’s another trend as well which is tackling deeper issues that have been neglected in the past that need to be more in the forefront. Things like users with disabilities, international users, much more focus on task analysis and field studies — those are some of the other things we’re pushing now.

Recently I’ve been pushing the notion of doing discount field studies. Field studies don’t need to consist of five anthropologists taking a year to do a project. We’ve had a seminar at our conference on simplified field studies, which I personally think is a good seminar. But, empirical data shows that people don’t want to do this. You can go to the conference and see people crammed into sessions on everything else, but then you go into the field studies seminar and there’s only 30 people or so. We are pushing it, but we’re not getting enough acceptance of this idea of the simplified field study.

B&A: Who do you think does a good job dealing with content online?

Nielsen: Very few actually. I can’t come up with any great examples — it’s still so print-oriented. My own articles aren’t that great either, actually. I’m very verbose in my writing style. It needs to be very punchy and very short, and it’s very hard to write that way.

There’s more linking happening today with all of the weblogs, which is kind of nice, but I think the commentary is often not that great. The reason is that I think weblogs tend to emphasize this stream of consciousness posting style, which I don’t think is good—that’s not respectful of the readers’ time. What’s good about weblogs is that they’ve broadened the number of authors, but at the same time they’ve removed that feeling that the writing is really being edited.

B&A: If you weren’t doing usability, what do you think you’d be doing?

Nielsen: I would probably be a university professor of something or other. When I think back to when I was a kid, I had a lot of different interests and things I was good at, which I think was one of the reasons I ended up in usability. You have to be good at communicating, you have to know about technology, you have to understand interaction and human behavior. There are all these different angles that pull together very nicely in usability. It’s good for a person who’s broad in the types of things they’re good at.

I might have ended up as a historian, I might have been a mathematician, I don’t know. I think that being a professor is the most likely. The reason I got into usability is that it’s a discipline that gets interesting when you go into the actual practice of it. There’s actually not that much theory, and the theory itself isn’t that exciting.

Chad Thornton works as a Usability Specialist in the User Experience Group at Intuit. He has done similar work at Achieva, the American Museum of Natural History, and Pomona College, where he received his degree in Biology.

Yahoo! Mail: Simplicity Holds Up Over Time


Email was one of the first applications to move to the web, and the first to find widespread popularity among users.

In many respects, email is the ideal web application: it’s an application that people often need access to when they’re away from their “home” environment, and the core user tasks (reading and writing) are easily accommodated with standard HTML interface elements.

As a result, it should come as little surprise that the basic flow of Yahoo! Mail has hardly changed at all since the portal first acquired the RocketMail service in 1997. But rather than offering an outdated solution to the web-based email problem, Yahoo! Mail demonstrates the lasting effectiveness of a simple approach.

The application is extremely conservative with page designs. Almost all user interaction takes place across only three pages: the “message list” folder view, the “message display” page, and the “compose” page.

Another demonstration of this conservative approach is in the site’s error handling. The entire application contains only one standalone error page (the “no account found” page in the login flow), and this seems more likely to be the result of a back-end limitation than a deliberate design choice.

A few awkward spots do appear in the flow. An empty search result set returns a search result page with a “no messages found” message, rather than bringing the user directly back to the query interface to retry the search.

Downloading attachments is a two-step process, which seems like one step too many. The dichotomy between viewing and editing contact information in the address book seems like an artificial distinction whose purpose is unclear. But these are really minor quibbles; overall, Yahoo! Mail is a model of streamlined interaction design.

Yahoo! Mail diagram
Poster-sized diagram (PDF, 37K) | Letter-sized diagram (PDF, 100K)

Note: The date on the diagram indicates when the snapshot of the system was taken. Yahoo! Mail may be substantially different now.

Jesse James Garrett has been working in the Internet industry since 1995. In the information architecture community, Jesse is recognized as a leading contributor to the development of this rapidly evolving discipline.

The Evolving Homepage: The Growth of Three Booksellers


Web design is expensive. Web designers earn upwards of $50,000 a year;[1] information architects earn even more.[2] During the heyday of web design—the late 1990s—designing a large commercial website could cost as much as designing a medium-sized building. During this period, commercial websites were created and then often completely replaced with redesigned versions a short time later. Today the redesigning continues, albeit at a slower pace. What is the return on this design investment? A report on online ROI from Forrester finds that many commercial sites fail to even try to measure the effectiveness of design changes.[3]

What lessons have we learned about how design improves the interface between customers and companies?

The web has been with us for about a decade now. We’ve seen some obvious trends, such as greater use of multimedia, search engines, and increasingly sophisticated markup techniques. But these trends were facilitated by changes in technology. What lessons have we learned about how design improves the interface between customers and companies? Perhaps we can start by asking how websites have actually changed over time, and from that we can learn how websites should change in the future.

To start working toward an answer, I compared three eCommerce sites:,, and Much of the media’s coverage of these websites, especially of, discusses the business models, corporate cultures, and finances of the companies. Since the medium of interaction with these companies is the website, it’s ironic that the media rarely critiques the site design and its effect on business performance.

Because it is the homepage that carries the most responsibility for guiding customers, I examined the homepages of all three sites from a number of years, using screenshots from the Web Archive.[4] Presumably these large retailers had a great deal to gain, and lose, with these substantial online ventures. By comparing design decisions over time among the three sites, I hoped to discover lessons from their extensive and expensive design experience.

The companies
Competition is fierce in the online bookselling market, currently erupting in offers of “free shipping.” All three companies have annual revenues in the billions of dollars.

Barnes and Noble, which runs a large chain of stores in the United States, claims the largest audience reach of any bricks-and-mortar company with an Internet presence.[5] Yet both they and Borders were put on the defensive when Amazon’s growth rocketed. During December 2001, attracted over 10 million unique visitors,[6] compared to Amazon’s 40 million visitors.[7]

Borders is the second largest bricks-and-mortar book chain in the U.S.[8] In April 2001, after operating their own online bookstore for several years, Borders announced an agreement to use Amazon’s eCommerce platform to power a co-branded website.

Amazon claims to be the leading online shopping site, having expanded their selection to music, video, auctions, electronics, housewares, computers, and more.[9] By February of 2002, Amazon, which had pursued a get-big-quick strategy typical of internet companies in the late 1990s, announced its first profitable quarter.[10]

I first studied these sites quantitatively, looking for clear trends over time. I then critiqued them more qualitatively, based on my own experience as both an in-house website designer and an information architecture consultant.

There are many criteria that could be examined in such a study. I limited myself to those that would, I hoped, reveal as much as possible about the business intent of the design. I looked at criteria such as the type and size of layout, the type and amount of navigation, the amount of images and text, and functionality specific to the industry. Detailed results can be seen in the attached spreadsheet (PDF, 75k).


Chart showing growth in length of homepages over time.
Note: Missing data due to imperfect records at the Web Archive.

All three sites use very long screens to display content on their homepages.
Using a browser window with a constant width, we can compare the vertical size of each site (all screen references assume an 800 by 600 pixel monitor). The homepage grew from a vertical size of about 917 pixels in 1996 to over 3,000 pixels in 1999. Barnes and Noble’s homepage has hovered around 1,500 pixels for the last several years. Amazon’s homepage, which began at only 626 vertical pixels in 1995, stands at roughly 2,156 pixels today. In a web browser, that equals five scrolling screens of information. homepage above the fold, 1999 above the fold (1999) Click to enlarge.
Barnes and Noble homepage above the fold, 1999
Barnes and Noble above the fold (1999) Click to enlarge.
Amazon homepage above the fold, 1999
Amazon above the fold (1999) Click to enlarge.

Note: Incomplete web pages are due to imperfect records at the Web Archive.
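As an illustrative aside (not part of the original analysis), the “five scrolling screens” figure can be reproduced from the page heights above. The visible viewport height is an assumption: a 600-pixel monitor minus roughly 170 pixels of browser chrome leaves about 430 pixels of page.

```python
# Rough sketch: how many viewport-sized "screens" a homepage spans.
# VISIBLE_PX is an assumption (600px monitor minus ~170px of browser
# chrome); the page heights are the article's reported measurements.
VISIBLE_PX = 430

def scrolling_screens(page_height_px, visible_px=VISIBLE_PX):
    """Number of screens of scrolling, rounded to the nearest whole screen."""
    return round(page_height_px / visible_px)

print(scrolling_screens(2156))  # Amazon, 2002: 5 screens
print(scrolling_screens(1500))  # Barnes and Noble, 2002: 3 screens
```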

All three sites evolved to use three-column layouts.
In 1995 and 1996 respectively, Amazon and used single-column layouts. By 1999, both of these sites, as well as Barnes and Noble, used three-column layouts.

Amazon has consistently placed more links above the fold.
In 1999, the Borders site displayed only about eight links “above the fold” (the top portion of the screen that is viewable without scrolling). Both Barnes and Noble and Amazon had significantly more links above the fold in 1999, 30 and 48 respectively. Amazon averaged 43 links above the fold between 1999 and 2002 versus only 27 links for Barnes and Noble during the same period.

Through the years, the density of links on was half of that on Barnes and Noble or Amazon.
The density of links has varied over time, but as of 2002 both Barnes and Noble and Amazon stood at about one link for every 15 vertical pixels of screen real estate. Historically, the highest link density at was one link for every 28 vertical pixels.
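The density figures above reduce to a simple vertical-pixels-per-link ratio. As a sketch, the link counts below are hypothetical values chosen only to reproduce the reported densities; they are not measurements from the article.

```python
# Link density expressed as vertical pixels per link: a lower number
# means a denser page. Link counts here are hypothetical examples.
def pixels_per_link(page_height_px, link_count):
    return page_height_px / link_count

# A 2,156px page with ~144 links works out to ~15px per link
# (Amazon-like density); a 2,800px page with 100 links works out
# to 28px per link (Borders-like density).
print(round(pixels_per_link(2156, 144)))  # 15
print(round(pixels_per_link(2800, 100)))  # 28
```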

Amazon communicates using images and links rather than text descriptions.
From 1999 through 2001, Amazon used more images and fewer text descriptions than Barnes and Noble. In 2002, both sites used about 560 words per page, yet the density of words was 33 percent lower on Amazon; Amazon distributes the words across the page as links rather than bunching them together in paragraphs. Over time, Barnes and Noble is becoming more like Amazon in this respect.

All sites eventually included navigation targeted at specific audiences.
Audience-based navigation—navigation labeled for a particular audience—appeared on in 1998, on Barnes and Noble in 2000, and on Amazon as early as 1999.

Invitations to subscribe to an email newsletter were offered inconsistently. didn’t include this feature until 1998. Barnes and Noble included it only in 1998 and 2001. Only Amazon consistently included this feature from 1995 to 2002.

Online and offline design
So what lessons can we learn about how these sites changed over time? How has design contributed to Amazon’s high growth and significant lead over the others? In general, Amazon found a winning formula and applied it consistently over time. In my mind, the successful design elements emulated offline shopping experiences in many ways.

Personally, I was surprised at how long these homepages had grown. Combined with the three-column layout, each page contains a great deal of information. This is quite like the perceptual experience of browsing in a physical store. When you walk down an aisle in a bricks-and-mortar store you can visually scan the shelves quite quickly. On these websites, the long, scrolling pages are analogous to aisles (major groupings of items) and the columns are analogous to shelves (more specific groupings of items). With a similarly natural, efficient motion, a visitor can scroll down the page and visually scan the three columns of product listings.

Amazon homepage (January 2002)
Barnes and Noble homepage (January 2002)

Amazon’s higher number and density of links, and placement of those links above the fold, also reminds me of the aggressive product positioning in a physical store. It’s like walking into a food market and immediately being overwhelmed with rows and rows of colorful fresh fruit, stimulating our eyes and engaging our appetites.

The prominent use of images and sparse use of text on Amazon again harks back to physical objects with simple labeling.

The arrival of navigation intended for specific audiences seemed inevitable. Especially for the book market, a children’s section was developed surprisingly late on these sites given the disproportionately high revenues that come from children’s books in traditional shopping venues.

In general, many of the functions of these pages have become commodities: search engines, shopping carts, authentication and store locators. But Amazon’s extensive personalization sets this site apart functionally. Personalization mimics a personal shopper or a local store employee who knows you. While the online recommendations aren’t always right on, neither is a human assistant.

Rate of change
Many studies have found that our performance using a software application improves over time as we become familiar with its interface. Gerald Lohse and his associates translated this finding into the realm of eCommerce websites using statistical analysis.[11] They also found that website visitors learn to use a site more efficiently over time and that this increases their purchase rate. In simpler terms, familiar sites are easier for people to use, so familiar sites are where visitors will make purchases.

It follows that sites that can be learned more quickly will more quickly become familiar, increasing the amount of purchases. So a faster learning rate equals a higher purchase rate.

Furthermore, Lohse found that familiarity with a particular website makes visitors less likely to switch to a competitive site because of the effort and time needed to become familiar with another site. He refers to this behavior as “cognitive lock-in.” Essentially, we are creatures of habit. He applied this analysis to several eCommerce websites by measuring the number of visits per person, length of sessions, and timing and frequency of purchases. He found the learning rate significantly faster at Amazon than at Barnes and Noble.
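Lohse’s “learning rate” follows the classic power law of practice: the effort a task takes falls off as a power of the number of repetitions. The coefficients in this sketch are invented for illustration, not Lohse’s estimates.

```python
# Power law of practice: time to complete a task shrinks with repeat
# visits, T(n) = T1 * n**(-b), where b is the learning rate.
# t_first and the learning rates below are illustrative values only.
def task_time(n_visits, t_first=100.0, learning_rate=0.3):
    return t_first * n_visits ** (-learning_rate)

# A site that is learned faster (higher b) becomes familiar sooner,
# which by Lohse's argument raises purchase rates and cognitive lock-in.
fast_site = task_time(10, learning_rate=0.4)
slow_site = task_time(10, learning_rate=0.2)
assert fast_site < slow_site  # the faster-learned site costs less effort
```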

The rate of design change supports this finding. Amazon had no major redesigns from 1999 to 2002, only adapting their design gradually to changing needs. Barnes and Noble significantly altered their navigation in 2000 and 2001. implemented major homepage changes in 1998 and 2000. Fewer redesigns make it easier for visitors to remain familiar with the site.

Many design elements on these websites are reminiscent of physical store layout, an approach to web design we should investigate further. Like physical stores, those designs should only change gradually to keep visitors buying. Continued analysis of other sites will hopefully confirm or refute these findings.

It may be a fallacy to state, “Amazon is a successful business, therefore their website design is successful,” since many factors have contributed to their business success. And yet it’s hard to imagine them having such great success with a mediocre site. A similar eCommerce site launching today could do worse than examine and emulate the design elements that Amazon utilizes.

View all End Notes
Victor Lombardi writes, designs, and counsels, usually in New York City. His personal website is