What is the Holy Grail of information architects? It’s the secret that will help them develop and maintain a centralized, user-centered information architecture for a large, distributed organization—the kind made up of all sorts of autonomous, bickering business units that have their own goals, their own sites, their own infrastructures, their own users, and their own ideas of how to go about things.
Before you protest: yes, we understand that you probably don’t have the same resources at your disposal as Microsoft’s team does. But we think everyone can learn from their efforts; what they’re doing today is what most intranets will be doing in three to five years, for two reasons. First, MSWeb’s approach is flexible enough to be customized for many large organizations. Second, knowing Microsoft, it’s a reasonable bet that the good ideas described here will soon find their way into Microsoft’s product offerings (if they haven’t already) and into your IT department. So perhaps you’ll own a piece of this approach in the not-too-distant future. Let’s preview it here so you’ll be ready.
Challenges for the user
Like Microsoft itself, MSWeb is insanely huge and distributed. Let’s use some numbers to paint a picture of the situation. MSWeb contains:
- 3,100,000+ pages
- Content created by and for over 50,000 employees who work in 74 countries
- 8,000+ separate intranet sites
With apologies to Herbert Hoover, Microsoft has put a web server in practically every employee’s pot. Employees, in turn, have responded by embracing the technology, as you’d expect from one of the world’s largest technology companies, and by churning out an impossibly huge volume of content.
But if you’re a typical Microsoft employee, these numbers also represent a bit of a problem. Microsoft estimates that a typical employee spends 2.31 hours per day engaging with information, and that 50 percent of that time is spent looking for it. Although you already know how ambivalent we are about using such calculations to estimate actual costs to the organization, we think these numbers show that at least some valuable employee time is being wasted flailing about in this huge environment in search of information.
Here are just a few examples of how this chaotic environment hurts Microsoft employees.
Where to begin?—This is your typical case of “silo hell.” With as many as 8,000 possibilities available, employees have a hard time determining where they should begin looking for the information they need. While some starting points are obvious—check the human resources site for information on your medical insurance or 401(k) plan—other areas, such as technical information, are scattered throughout Microsoft’s intranet environment.
Inconsistent navigation systems—Navigation systems vary from site to site and employ many different labeling schemes, so users must reorient themselves each time they encounter a new one. Not only does this inhibit navigation, it also muddles the user’s sense of place.
Same concept, different labels—Because different labels are used for the same concepts, users miss important information when they don’t search or browse under all the possible labels for a concept. For example, users may search for “Windows 2000” without realizing that they also need to hunt for “Microsoft Windows 2000,” “Win 2000,” “Win2000,” “Win2k,” “Win 2k,” and “w2k.”
Different concepts, same label—Conversely, a term doesn’t always mean what you think it does. For example, ASP can mean “active server pages,” “application service providers,” or “actual selling price.” And the term “Merlin” has been used as the code name for three very different products.
Ignorance is not bliss—Often, users are happy to find any relevant information at all. But in a knowledge-intensive environment like Microsoft’s, users are much more demanding: their jobs depend on finding the best information possible. As a result, employees often get frustrated because they don’t know when to stop searching. Is the content simply not there? Is a server down somewhere? Or did they just enter a poor search query?
It’s not hard to see how a typical employee could burn through 1.155 hours a day (half of those 2.31 hours) just looking for information. In short, Microsoft employees face an expansive and confusing information environment that’s about as intimidating as the Web itself.
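Both labeling problems described above are classic controlled-vocabulary territory. Variants of one concept can be grouped into an equivalence set (often called a synonym ring) so that a search for any of them is expanded to all of them, while a homograph like ASP can be split into qualified senses, a common thesaurus convention, so users can pick the one they mean. The sketch below is a minimal illustration under those assumptions, not MSWeb’s implementation: the tables and the `expand_query` and `senses` functions are hypothetical, and only the example labels come from the text.

```python
from typing import Dict, List

# Hypothetical equivalence table (synonym ring): preferred term -> variant labels.
# The variants are taken from the Windows 2000 example in the text.
SYNONYM_RINGS: Dict[str, List[str]] = {
    "Windows 2000": ["Microsoft Windows 2000", "Win 2000", "Win2000",
                     "Win2k", "Win 2k", "w2k"],
}

# Hypothetical homograph table: ambiguous label -> qualified, distinct concepts.
# The ASP expansions come from the text.
HOMOGRAPHS: Dict[str, List[str]] = {
    "asp": ["ASP (active server pages)",
            "ASP (application service providers)",
            "ASP (actual selling price)"],
}

# Reverse index built once: any lowercased label -> its preferred term.
LABEL_TO_PREFERRED: Dict[str, str] = {
    label.lower(): preferred
    for preferred, variants in SYNONYM_RINGS.items()
    for label in [preferred, *variants]
}


def expand_query(query: str) -> List[str]:
    """Same concept, different labels: return every known label for the query's concept."""
    preferred = LABEL_TO_PREFERRED.get(query.strip().lower())
    if preferred is None:
        return [query]  # unknown term: search for it as typed
    return [preferred, *SYNONYM_RINGS[preferred]]


def senses(label: str) -> List[str]:
    """Different concepts, same label: return the qualified senses to choose from."""
    return HOMOGRAPHS.get(label.strip().lower(), [label])


print(expand_query("win2k"))  # one query now covers all seven labels
print(senses("ASP"))          # candidates for a "Did you mean...?" prompt
```

Keeping the variants in one table and deriving the reverse index from it means a new variant only has to be added in one place.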
Challenges for the information architect
The flip side of this problem is how these numbers affect the people responsible for creating Microsoft’s content or aggregating it into portals. Let’s make another comparison to the broader Web. Building and maintaining the Yahoo! portal has been a huge undertaking, spanning years and a gigantic collection of content: the Web as a whole. MSWeb is a portal too, and though 8,000 sites is a much more manageable number than what Yahoo! faces, consider the varying motives and concerns of those who own and maintain those independent sites. While Yahoo! can now get away with charging sites for inclusion in its directory, Microsoft can’t charge or compel site owners within the company to register. Instead, the MSWeb team has to create incentives for participation in its model. That is no easy task, because the owners of the intranet’s various sites are too distracted by other concerns (such as serving their own constituencies) to consider how their site fits into the bigger picture of Microsoft’s intranet.
When a site is brought into the MSWeb fold, it comes with its own information architecture. Its organization and labeling systems and other tricky information architecture components must be integrated into the broader MSWeb architecture or be replaced altogether. For example, as many as 50 different variants of product vocabularies had been created in the Microsoft intranet environment. Fixing such problems is a messy and complicated challenge for any information architect.
And it gets even worse: all of those Microsoft intranet sites are backed up by a technical architecture of some sort. Some are designed, built, and maintained by in-house technical staff and are quite advanced and elaborate. At the other extreme are sites maintained by hand or by a simple tool like MS FrontPage. The technology architectures that support the Microsoft intranet environment vary widely in complexity, and the MSWeb team must determine ways to normalize and simplify the environment to make content management easier and more efficient. Additionally, many of these technology architectures are not designed to support a portal or any other sort of enterprise-wide information architecture, so that’s another crucial factor the MSWeb team must account for.
Does your head hurt yet?
We like taxonomies, whatever they are
Four years ago, many heads were already throbbing at Microsoft, and an odd and often misunderstood term, “taxonomies,” began to be heard in corridors at Redmond. Although they share a common “X,” “taxonomies” and “sexy” are two words that aren’t often seen together in public. So when “taxonomies” becomes a common part of everyday conversation, it’s a sure sign that an organization is ready for a deeper look into information architecture.
So Microsoft’s MSWeb team heard the word and knew that the time had come for a more ambitious approach to improving MSWeb. The team—populated by an impressive mix of information scientists, designers, technologists, and politically savvy managers—began to consider what users meant when they called for better (or any) taxonomies. Instead of the traditional biology-inspired definition, Microsoft’s employees thought of taxonomies as constructs that would help them search, browse, and manage intranet content more effectively.
In response, the MSWeb team developed a more generalized operating definition of taxonomies that would be more in line with how other employees were using the term. This flexibility—the willingness to speak the language of clients, rather than rigidly clinging to a “correct” but ultimately unpopular meaning—was key. It set the tone for successful communications between the MSWeb team and its clients throughout the organization.
Three flavors of taxonomies
The team defined taxonomies as any set of terms that shared some organizing principle. For example, descriptive vocabularies were seen as controlled vocabularies that described a specific domain (e.g., geography, or products and technologies) and included variant terms for the same concept. Metadata schema were collections of labeled attributes for a document, not unlike a catalog record. Category labels were sets of terms to be used for the options of navigation systems. These three areas comprised the foundation of the MSWeb approach. Better searching, browsing, and managing of information would be achieved by designing taxonomies that could be shared throughout the enterprise.
Descriptive vocabularies for indexing
Developing terms to manually index important pieces of content seemed a smart move for the MSWeb team. It would complement automated indexing by the search engine, which was then the primary means of making the site’s content available. But creating and applying descriptive vocabularies is expensive, especially within an information environment as large as Microsoft’s, and there are many different ways to index content. So half the battle was selecting which vocabularies would deliver the most value to the organization as a whole.
The MSWeb team considered a number of issues when deciding which vocabularies to develop. Not surprisingly, characteristics of the content drove many of the decisions.
Search log analysis—Queries from MSWeb’s search query logs are stored in an SQL database, so they can be searched and analyzed easily. Search log analysis helped the MSWeb team gauge users’ content needs in their own words and determine appropriate vocabulary terms. Studying the most common queries also gave the team a good overview of which content areas were generally most valuable to users.
Availability—The team looked for decent controlled vocabularies that had already been developed in-house or that were available commercially. Vivian Bliss, MSWeb Knowledge Management Analyst, puts it simply: “Don’t reinvent the wheel!” If there’s a useful vocabulary out there, it’s much cheaper to license and adapt it than to create a new one. Unfortunately, most of the required vocabularies were very specific to Microsoft’s content, and had to be custom-built in-house.
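Because the search log described above lives in an SQL database, a “most common queries” report is a single aggregate query. The sketch below uses SQLite from Python’s standard library as a stand-in for whatever database actually holds the log; the `search_log` table, its columns, and the sample rows are hypothetical, not MSWeb’s schema or data. Normalizing case and surrounding whitespace before grouping keeps “Win2k” and “win2k ” from being counted as different queries.

```python
import sqlite3
from typing import List, Tuple


def top_queries(conn: sqlite3.Connection, limit: int = 10) -> List[Tuple[str, int]]:
    """Return the most frequent queries, case- and whitespace-normalized, with counts."""
    sql = """
        SELECT LOWER(TRIM(query)) AS q, COUNT(*) AS n
        FROM search_log
        GROUP BY q
        ORDER BY n DESC, q ASC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


# Illustrative only: an in-memory table standing in for the real query log.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_log (query TEXT, logged_at TEXT)")
conn.executemany(
    "INSERT INTO search_log (query, logged_at) VALUES (?, ?)",
    [
        ("401k", "2001-03-01"),
        ("Win2k", "2001-03-01"),
        ("win2k ", "2001-03-02"),
        ("expense report", "2001-03-02"),
        ("401K", "2001-03-03"),
        ("win2k", "2001-03-03"),
    ],
)
print(top_queries(conn, limit=3))
```

The top of such a report is also a natural place to spot variant terms (like “win2k”) that belong in a descriptive vocabulary.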
Other decisions were driven by business context. The MSWeb team considered such issues as:
Politics—The team was careful to talk with content stakeholders about what they felt they needed to make their content more accessible. In some cases, stakeholders were interested both in information architecture concepts and in committing to working with the MSWeb team. Others were interested in neither. Through such discussions, it became apparent which stakeholders were ready to participate and which weren’t.
Applicability—Some vocabularies were too specific to have broad value for users across the company. The MSWeb team instead focused on vocabularies with broader appeal and value.
After taking all of these considerations into account, Microsoft narrowed its vocabulary development to the following vocabularies:
- Geography
- Proper names
- Organization and business unit names
- Product, standards, and technology names
- Subject
Developing some of these vocabularies was trickier than you might think. Geography, for example, had to be split into two separate vocabularies: general place names, and locations of Microsoft installations. On the other hand, developing the subject vocabulary was simpler than it might have been, because it was limited primarily to equivalence relationships. The MSWeb team hasn’t added extensive hierarchical and associative relationships; doing so would require a huge effort and take resources away from developing other vocabularies that could provide broad benefits right away. (In the future, the team plans to address these other relationships selectively as time and resources permit.)
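A vocabulary record that reflects this scoping decision might reserve slots for the standard thesaurus relationships but populate only the equivalence one for now. The sketch below is hypothetical: the `Term` and `Vocabulary` classes and their field names are our own, the entry reuses the Windows 2000 variants from earlier in this article, and the linear scan in `resolve` favors clarity over speed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    """One controlled-vocabulary entry."""
    preferred: str
    use_for: List[str] = field(default_factory=list)   # equivalence: variant labels
    broader: List[str] = field(default_factory=list)   # hierarchical: deferred for now
    related: List[str] = field(default_factory=list)   # associative: deferred for now


@dataclass
class Vocabulary:
    name: str
    terms: Dict[str, Term] = field(default_factory=dict)

    def add(self, term: Term) -> None:
        self.terms[term.preferred] = term

    def resolve(self, label: str) -> Optional[str]:
        """Map a preferred or variant label to its preferred term, using equivalence only."""
        wanted = label.strip().lower()
        for term in self.terms.values():
            if wanted == term.preferred.lower() or wanted in (v.lower() for v in term.use_for):
                return term.preferred
        return None


products = Vocabulary("Product, standards, and technology names")
products.add(Term("Windows 2000", use_for=["Win2k", "w2k", "Win 2000"]))
# Geography would be two separate Vocabulary objects (general place names and
# Microsoft installation locations), each built the same way.
print(products.resolve("W2K"))
```

Adding hierarchical or associative relationships later means filling in `broader` and `related` without changing how existing entries are stored.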
Metadata schema
Developed hand in hand with controlled vocabularies, metadata schema specify which metadata elements should be used to describe or catalog a content resource. While Microsoft’s descriptive vocabularies were driven by content and context, its metadata schema were informed more by users and content.
The MSWeb team developed a single schema that has value for both MSWeb and other intranet sites. Borrowing from the Dublin Core Metadata Element Set (see http://dublincore.org), MSWeb’s schema was intended to be “stripped down” enough that content owners would actually use it to describe their resources, resulting in more records and therefore a more useful collection of content. The schema’s simplicity was balanced against the goal of providing enough descriptive information to augment users’ searching and browsing.
The team also had to ensure that records produced using the schema would include fields useful for resource description, display, and integration with other parts of the information architecture (namely by integrating with search results and browsing schemes). The process used to develop this metadata schema was, in the words of one team member, “down and dirty.” Although more polished methodologies exist, sufficient resources were not available at the time for this initial schema development project. For this reason, it was important to structure the schema to include both a required “core” set of fields and the flexibility to support future extensions of the schema by other business units. To date, seven other major portals are using the metadata schema, and many have extended and customized it for their own context.
The schema’s core fields are:
- URL Title—The name of the resource
- URL Description—A brief description of the resource; suitable for display in a search result
- URL—The address of the resource
- ToolTip—Text displayed for a mouseover
- Comment—Administrative information that helps manage a record (not seen by the end user)
- Contact Alias—The name of the person responsible for this resource
- Review Date—The date that the resource should be next reviewed (default setting is six months from when the record was created or last updated)
- Status—The record’s status; e.g., “active” (the default), “deleted,” “inactive,” and “suggestion”; used for content management purposes
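To show how the core fields and their defaults fit together, here is a minimal record type. The allowed status values and the six-month review default come from the list above; the Python attribute names, the `last_updated` field used to compute the default review date, the `add_months` helper, and the sample values are assumptions for illustration.

```python
import calendar
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

VALID_STATUSES = {"active", "deleted", "inactive", "suggestion"}


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of a shorter month."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass
class CoreRecord:
    url_title: str
    url_description: str
    url: str
    tooltip: str
    contact_alias: str
    comment: str = ""                      # administrative; never shown to end users
    last_updated: date = field(default_factory=date.today)  # hypothetical helper field
    review_date: Optional[date] = None     # defaults to six months after last_updated
    status: str = "active"

    def __post_init__(self) -> None:
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        if self.review_date is None:
            self.review_date = add_months(self.last_updated, 6)


# Hypothetical sample record.
record = CoreRecord(
    url_title="Windows 2000 FAQ",
    url_description="Answers to common questions about Windows 2000.",
    url="http://example.com/w2k-faq",
    tooltip="Windows 2000 FAQ",
    contact_alias="jdoe",
    last_updated=date(2000, 8, 31),
)
print(record.review_date, record.status)
```

Clamping the day matters for a six-month default: a record last updated on August 31 gets a review date of February 28 (or 29) rather than an invalid date.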
The schema has been commonly extended with these optional fields:
- Strongly Recommended—Flags resources that are especially appropriate
- Products—Terms from the product, standards, and technology names vocabulary that describe the subject matter of the resource
- Category Label—Terms from the vocabulary of category labels; used to ensure that the resource is listed under the appropriate label in the site’s navigation system
- Keywords—Terms from descriptive vocabularies used to describe the resource
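Because Products, Category Label, and Keywords draw on controlled vocabularies, an extended record can be checked before it is accepted. A minimal sketch, assuming hypothetical field names, tiny stand-in vocabularies, and records that store preferred terms rather than variants; it also shows one way to honor the core-plus-extensions design by letting fields a business unit adds on its own pass through unchecked.

```python
from typing import Any, Dict, List, Set

# Hypothetical controlled vocabularies keyed by the extension field that uses them.
VOCABULARIES: Dict[str, Set[str]] = {
    "products": {"Windows 2000"},
    "category_label": {"Human Resources"},  # illustrative label only
    "keywords": {"Windows 2000"},
}


def validate_extensions(fields: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the optional fields are usable.

    Fields not named in VOCABULARIES are allowed, since business units may add their own.
    """
    problems: List[str] = []
    for name, allowed in VOCABULARIES.items():
        for term in fields.get(name, []):
            if term not in allowed:
                problems.append(f"{name}: {term!r} is not a preferred term")
    if not isinstance(fields.get("strongly_recommended", False), bool):
        problems.append("strongly_recommended must be True or False")
    return problems


print(validate_extensions({"products": ["Windows 2000"], "strongly_recommended": True}))
print(validate_extensions({"products": ["Win2k"], "team_site": "x"}))
```

Rejecting “Win2k” here is deliberate: variants belong in the vocabulary’s equivalence entries, while records point at the preferred term.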
MSWeb began to use the metadata schema to create resource records in 1999; since then, over one thousand records have been created. These fuel the immensely useful “Best Bets” search results and hold huge potential for improving areas such as content management. We’ll describe the role of both metadata schema and “Best Bets” at Microsoft in greater detail later in this chapter.
Category labels
The third type of taxonomy, labels for the categories in site-wide navigation systems, was geared toward giving users of Microsoft intranet sites navigational context. Category labels help users know where they are and where they can go. The MSWeb team employed a user-centered process for designing navigation systems, relying on such useful standbys as card sorting and contextual inquiry. In [this screenshot], the category labels are shown on the left-hand side of the screen. Descriptions of nodes, displayed on the right-hand side, help catalogers choose the appropriate category label.
The MSWeb team also turned its user-centered design process and expertise into a service that other site owners could use. As collaboration with other sites increases, a “standard” intranet navigation system will eventually be created, likely combining predetermined intranet-wide options (another “core”) with a locally determined selection of choices (“extensions”) informed by a shared set of guidelines. For now, the transitional stage of raising awareness and providing support to other site owners is considered a great leap forward, and a prerequisite to further navigation standardization.
How it comes together
The impact of all three taxonomies is clear from the MSWeb search results shown [here]. Category labels provide contextual navigation at the end of each “Best Bet” result (the first two displayed) and populate the “categories” site-wide navigation system on the left-hand side. Below that, the “terms” area displays two variants of the search term that come directly from the descriptive vocabularies. The “Best Bet” search results themselves are drawn from resource records based on a metadata schema.
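The flow just described can be sketched end to end: a query is mapped to its preferred term through a descriptive vocabulary, matched against active metadata records, and each hit is displayed with its category labels. This is a hypothetical sketch, not MSWeb’s code: the ranking rule (Strongly Recommended first), every name, and all sample records are assumptions, and the “terms” panel of variant suggestions is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical variant -> preferred-term map drawn from a descriptive vocabulary.
PREFERRED: Dict[str, str] = {
    "windows 2000": "Windows 2000",
    "win2k": "Windows 2000",
    "w2k": "Windows 2000",
}


@dataclass
class Record:
    title: str
    url: str
    status: str = "active"
    strongly_recommended: bool = False
    products: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    category_labels: List[str] = field(default_factory=list)


def best_bets(query: str, records: List[Record]) -> List[Record]:
    """Match the query's preferred term against the subject fields of active records."""
    term = PREFERRED.get(query.strip().lower(), query.strip())
    hits = [r for r in records
            if r.status == "active" and (term in r.products or term in r.keywords)]
    # Strongly recommended first; sorted() is stable, so ties keep catalog order.
    return sorted(hits, key=lambda r: not r.strongly_recommended)


def format_result(r: Record) -> str:
    """Render a result with its category labels appended for contextual navigation."""
    labels = " | ".join(r.category_labels)
    return f"{r.title} <{r.url}>" + (f"  [{labels}]" if labels else "")


# Illustrative records only; titles, URLs, and labels are invented.
RECORDS = [
    Record("Windows 2000 FAQ", "http://example.com/faq",
           products=["Windows 2000"], category_labels=["Products"]),
    Record("Windows 2000 deployment guide", "http://example.com/deploy",
           strongly_recommended=True, keywords=["Windows 2000"],
           category_labels=["Products", "Support"]),
    Record("Windows 2000 beta notes", "http://example.com/beta",
           status="inactive", products=["Windows 2000"]),
]

for r in best_bets("w2k", RECORDS):
    print(format_result(r))
```

Filtering on the record’s Status field is what lets the “inactive” beta notes drop out of results without being deleted from the catalog.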
In the next issue (Sept. 9)—Beyond taxonomies: selling services, benefits to user and what’s next for MSWeb
Peter Morville is President and Founder of Semantic Studios, a leading information architecture and knowledge management consulting firm.