Measuring the Success of a Classification System

Written by: Iain Barker

When working with government and large private organizations on complex information systems, project managers and business representatives often demand early-stage validation that the proposed classification system provides the user-friendly solution they are charged with delivering. They also require this validation in a format that will be engaging for senior business stakeholders.

I developed the following enhancement to Donna Maurer’s “card-based classification evaluation technique” as a direct response to a client that wanted to engage with the process of restructuring their content-heavy intranet. My client knew the current classification structure was ineffective at enabling users to find the information they required, but they felt the process of developing an alternate structure would be complicated and contentious due to differences of opinion between senior stakeholders. My client requested quantitative data to validate that the proposed classification system was an improvement on the existing structure. They also had tight timescales and budgetary constraints.

Having previously used card-based classification evaluation to obtain qualitative insights into labeling and the general effectiveness of a classification system, I felt there was an opportunity to enhance the technique and deliver just the kind of information my client demanded without breaking their budgetary constraints.

Key differences between this and standard card-based classification evaluation

The two key differences between this and standard card-based classification evaluation are the way in which the captured data is analyzed, and that the technique should be conducted in a number of rapid iterations throughout the development of the classification system so that any improvements can be identified.

How to conduct card-based classification evaluation

Card-based classification evaluation should be conducted in exactly the same way as described in Donna Maurer’s article. For those unfamiliar with the technique, what follows is a précis of how it is conducted. For more detailed instructions, see Donna’s article.

  1. Transfer the top 3 or 4 levels of the classification system you wish to evaluate onto index cards. On the first card, put all of the top-level categories. On each subsequent card, place the next level of classification labels.

    Top 3 or 4 levels of classification system

  2. On another set of index cards, write and number around 15 common information-seeking tasks, one task per index card.

    Common information-seeking tasks
  3. Arrange ten to fifteen 30-minute one-on-one sessions with representative users of the system. I find that this number allows you to conduct the sessions over a single day, enabling rapid progress.
  4. To conduct each session you should:
    1. Introduce the activity (ensuring you inform the participant that you are not testing them, but the proposed system).
    2. Show the participant the top-level card and ask them where they would go to complete each scenario (never let them know whether their response is correct or not).
    3. Make a note of the last selection each participant makes for each task. For example, Task A – 1.12.4.
    4. Capture any general comments the participant makes about the classification labels.
    5. Thank the participant for their involvement.
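
To make the captured data concrete, here is a minimal sketch in Python of one way to record it. The task wordings, participant IDs, and location paths are all illustrative, not part of the technique itself.

    # Each recorded response is the numbered path of the participant's last
    # selection (e.g. "1.12.4"); a participant may make up to two attempts.
    tasks = {
        "A": "Find the expenses claim form",      # illustrative task wording
        "B": "Find the IT support phone number",
    }

    # results[task_id][participant_id] -> attempted locations, in order
    results = {
        "A": {"P1": ["1.12.4"], "P2": ["2.3.1", "1.12.4"]},
        "B": {"P1": ["3.1.2"], "P2": ["2.4.1"]},
    }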

Analyzing the results

To analyze the results, it is easiest to use a spreadsheet application. Create a separate column for each of the following:

  • The tasks
  • The correct location(s) of the information (or where you think it should be)
  • A column to record each user’s results, i.e. one column per user
  • Columns for analyzing the results (these are explained in detail later)

A screenshot of an analysis spreadsheet is included below.

Analysis spreadsheet

For each task define where in the classification system the correct result would be found. This is easy when evaluating an existing classification system, but for new structures this forces you to make a decision. If the correct answer could be found in multiple places, capture all the possible locations.

For each task record the location in which each participant believed they would find the information they required. If the user made two attempts, enter both numbers separated by a comma.

Once the results have been input, use a color-coding system to make the results easy to scan. I highlight first-time correct responses in bright blue, second-time correct responses in green and incorrect responses in yellow.
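
As a sketch of the logic behind this color coding, reusing the illustrative Python structures above, each response can be scored against one or more valid locations per task. The function name and return values are my own shorthand, not part of the technique.

    # One or more valid locations per task (capture all possible locations).
    correct_locations = {
        "A": {"1.12.4"},
        "B": {"3.1.2", "2.4.1"},   # information findable in two places
    }

    def score(task_id, attempts):
        """Return 'first' (blue), 'second' (green) or 'wrong' (yellow)."""
        valid = correct_locations[task_id]
        if attempts and attempts[0] in valid:
            return "first"
        if len(attempts) > 1 and attempts[1] in valid:
            return "second"
        return "wrong"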

Spreadsheet details

Having entered the raw data, you can analyze it to generate results that indicate whether or not the proposed classification system is working. The following are key pieces of analysis that can be extracted from the data:

  • Percent correct on first attempt: the total number of first-time correct answers divided by the total number of participants
  • Percent correct on first or second attempt: the total number of first- or second-time correct answers divided by the total number of participants

Other data, such as percent of participants making the correct first choice from the top-level navigation, can also be interesting as it can indicate where in the classification system problems lie.
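
Continuing the hypothetical structures above, the following sketch shows how these figures might be computed. Here each percentage is calculated per task across participants; a round-level summary would simply pool all tasks.

    def task_metrics(task_id):
        """Percent correct on first attempt, and on first or second attempt."""
        outcomes = [score(task_id, a) for a in results[task_id].values()]
        n = len(outcomes)
        first = 100 * outcomes.count("first") / n
        either = 100 * (outcomes.count("first") + outcomes.count("second")) / n
        return first, either

    def top_level_correct(task_id):
        """Percent whose first click was in a correct top-level category."""
        valid_tops = {loc.split(".")[0] for loc in correct_locations[task_id]}
        attempts = list(results[task_id].values())
        hits = sum(1 for a in attempts if a and a[0].split(".")[0] in valid_tops)
        return 100 * hits / len(attempts)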

I then use a traffic light system to give an immediate visual indication of the results of this analysis. I color the cells in the following way:

  • red for 0-40% correct
  • amber for 40-60% correct
  • dark green for 60-80% correct
  • light green for over 80%.
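
Expressed as code, the banding might look like the sketch below. The list above leaves the 40%, 60%, and 80% boundaries ambiguous, so assigning each boundary to the higher band is my own assumption.

    def traffic_light(percent):
        """Map a success percentage to a cell color."""
        if percent >= 80:      # boundary assignment is an assumption
            return "light green"
        if percent >= 60:
            return "dark green"
        if percent >= 40:
            return "amber"
        return "red"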

Spreadsheet details

What the results mean

The results are intended as a guide to support what may otherwise be a subjective analysis. Don’t focus on the specific percentages; instead look at the color bands.

  • Lots of red in the ‘getting first click right’ cells suggests that there is a problem with the top-level categories. Poor top level labeling alone can affect the success rates for the entire classification system. Consider whether the entire information architecture needs to be restructured, or whether the problem can be resolved by creating more meaningful top-level navigation labels.
  • Green in the ‘getting first click right’ column but red in the ‘overall success rate’ column suggests that the top-level categories are good, but that there is poor distinction between the second-level categories, or that the labeling requires attention.
  • Green ‘correct first time’ suggests that it is time to start thinking about the next phase of the project.
  • Streaks of green and red suggest that the classification system is working very well for some tasks, but very poorly for others. More work is required.

Iterative evaluations

By conducting this technique at intervals during the development of the classification system you can assess the progress being made.

The following screenshot shows a summary of the results of four rounds of this technique, conducted iteratively during the rapid development of a classification system for an intranet. As you can see from the dates, three iterations were conducted in a seven-day period, and the percentage of people able to get to the right location on a first or second attempt improved from 39% to 81%.

Spreadsheet details

On the spreadsheet, the first round of evaluations was conducted on the existing site. This served as a benchmark against which the new classification system was measured. Subsequent iterations were on different versions of the redeveloped classification system.

The spreadsheet highlights the following:

  • the poor initial quality of the classification system
  • the improvements that were made with each iteration
  • the relatively limited improvements made after a number of iterations, indicating the diminishing returns of continued refinement
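
To make the round-on-round comparison concrete, here is a small sketch reusing the traffic_light function from the earlier sketch. Only the benchmark and final figures are shown, because those are the percentages quoted in the text.

    # Percent correct on first or second attempt, per round.
    rounds = [
        ("Round 1 (existing site, benchmark)", 39),
        ("Round 4 (final iteration)", 81),
    ]
    for label, pct in rounds:
        print(f"{label}: {pct}% -> {traffic_light(pct)}")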

Last thoughts

The data produced by this technique should be used as a guide to support qualitative assessments made during the development of a new classification system. If conducted iteratively during the development of a classification system, the technique can provide a sense of the progress being made.

Please note: I am careful to say “a sense of the progress being made.” I am not a statistician and am thus cautious when presenting numeric information. I believe it may be possible to use the technique to produce statistically valid data, but I have only used it to produce indicative data to support and help communicate my qualitative assessment of the success of a classification system.

Obviously there are many other methods of enabling people to find the specific content they require (searching, A-Z indexes, etc.), so I recommend against zealously repeating this technique until you obtain 100% success. Instead use the technique to support your instincts and the other card sorting activities conducted during the development of a classification system.

I have found that the results from this technique are very useful for communicating progress to stakeholders and other parties who wouldn’t usually engage with the process of restructuring a classification system.

I am very interested to hear your opinions and the experiences of those who try this technique.