MSWeb: An Enterprise Intranet #1

What is the Holy Grail of information architects? It’s the secret that will help them develop and maintain a centralized, user-centered information architecture for a large, distributed organization—the kind made up of all sorts of autonomous, bickering business units that have their own goals, their own sites, their own infrastructures, their own users, and their own ideas of how to go about things.

It’s nearly impossible to develop a successful information architecture against a backdrop of explosive content growth, content ROT, and the political twists and turns common in any organization. And, we’re sorry to say, we don’t have the Holy Grail. But we’ve had the privilege of getting up close to a large number of corporate intranets. And the best approach we’ve seen so far is that taken by Microsoft’s intranet portal (MSWeb) team.

Before you protest, we admit that yes, we understand that you probably don’t have the same resources at your disposal as does Microsoft’s team. But we think everyone can learn from their efforts; what they’re doing today is what most intranets will be doing in three to five years, for two reasons. First, MSWeb’s approach is flexible enough to be customized for many large organizations. And second, knowing Microsoft, it’s a reasonable bet that the good ideas described here will soon enough find their way into Microsoft’s product offerings (if they haven’t already) and into your IT department. So perhaps you’ll own a piece of this approach in the not-too-distant future. Let’s preview it here so you’ll be ready.

Challenges for the user
Like Microsoft itself, MSWeb is insanely huge and distributed. Let’s use some numbers to paint a picture of the situation. MSWeb contains:

  • 3,100,000+ pages
  • Content created by and for over 50,000 employees who work in 74 countries
  • 8,000+ separate intranet sites

With apologies to Herbert Hoover, Microsoft has put a web server in practically every employee’s pot. Employees, in turn, have responded by embracing the technology, as you’d expect from one of the world’s largest technology companies, and by churning out an impossibly huge volume of content.

But if you’re a typical Microsoft employee, these numbers also represent a bit of a problem. Microsoft estimates that a typical employee spends 2.31 hours per day engaging with information, and 50 percent of that time is used looking for that information. Although you already know how ambivalent we are about using such calculations to estimate actual costs to the organization, we think these numbers show that at least some valuable employee time is being wasted flailing about in this huge environment in search of information.

Here are just a few examples of how this chaotic environment hurts Microsoft employees.

Where to begin?—This is your typical case of “silo hell.” With as many as 8,000 possibilities available, employees have a hard time determining where they should begin looking for the information they need. While some starting points are obvious—check the human resources site for information on your medical insurance or 401(k) plan—other areas, such as technical information, are scattered throughout Microsoft’s intranet environment.

Inconsistent navigation systems—Navigation systems are quite inconsistent because they employ many different labeling schemes. Therefore, users are confused each time they encounter a new one. Not only does this inhibit navigation, it also muddles the user’s sense of place.

Same concept, different labels—Because different labels are used for the same concepts, users miss out on important information when they don’t search or browse for all the possible labels for those concepts. For example, users may search for “Windows 2000” without realizing that they also need to hunt for “Microsoft Windows 2000,” “Win 2000,” “Win2000,” “Win2k,” “Win 2k,” and “w2k.”

Different concepts, same label—Conversely, a term doesn’t always mean what you think it does. For example, ASP can mean “active server pages,” “application service providers,” or “actual selling price.” And the term “Merlin” has been used as the code name for three very different products.

Ignorance is not bliss—Often, users are happy when they get any relevant information. But in a knowledge-intensive environment like Microsoft’s, users are much more demanding—their jobs depend on finding the best information possible. In this case, employees often get frustrated because they don’t know when to stop searching. Is the content simply not there? Or is a server down somewhere? Or maybe they didn’t enter a good search query?

It’s not hard to see how a typical employee’s 1.155 hours per day might get burned up. In short, Microsoft employees face an expansive and confusing information environment that’s about as intimidating as the Web itself.

Challenges for the information architect
The flip side of this problem is how these numbers affect the people who are responsible for making Microsoft’s content, or aggregating that content into portals. Let’s make another comparison to the broader Web. Building and maintaining the Yahoo! portal has been a huge undertaking, spanning years and a gigantic collection of content—the Web as a whole. MSWeb is a portal too, and though 8,000 sites is a much more manageable number than what Yahoo! faces, consider the varying motives and concerns of those who own and maintain those independent sites. While Yahoo! can now get away with charging sites for inclusion in its directory, Microsoft can’t charge or compel site owners within the company to register. Instead, the MSWeb team has to create incentives for participation in its model. But the owners of the intranet’s various sites are too distracted by other concerns (such as serving their own constituencies) to consider how their site fits into the bigger picture of Microsoft’s intranet.

When a site is brought into the MSWeb fold, it comes with its own information architecture. Its organization and labeling systems and other tricky information architecture components must be integrated into the broader MSWeb architecture or be replaced altogether. For example, as many as 50 different variants of product vocabularies had been created in the Microsoft intranet environment. Fixing such problems is a messy and complicated challenge for any information architect.

And it gets even worse: all of those Microsoft intranet sites are backed up by a technical architecture of some sort. Some are designed, built, and maintained by in-house technical staff and are quite advanced and elaborate. At the other extreme are sites maintained by hand or by a simple tool like MS FrontPage. The technology architectures that support the Microsoft intranet environment vary widely in complexity, and the MSWeb team must determine ways to normalize and simplify the environment to make content management easier and more efficient. Additionally, many of these technology architectures are not designed to support a portal or any other sort of enterprise-wide information architecture, so that’s another crucial factor the MSWeb team must account for.

Does your head hurt yet?

We like taxonomies, whatever they are
Four years ago, many heads were already throbbing at Microsoft. And an odd and often misunderstood term—“taxonomies”—began to be heard in corridors at Redmond. Although they share a common “X,” “taxonomies” and “sexy” are two words that aren’t often seen together in public. So when “taxonomies” become a common part of everyday conversation, it’s a sure sign that an organization is ready for a deeper look into information architecture.

So Microsoft’s MSWeb team heard the word and knew that the time had come for a more ambitious approach to improving MSWeb. The team—populated by an impressive mix of information scientists, designers, technologists, and politically savvy managers—began to consider what users meant when they called for better (or any) taxonomies. Instead of the traditional biology-inspired definition, Microsoft’s employees thought of taxonomies as constructs that would help them search, browse, and manage intranet content more effectively.

In response, the MSWeb team developed a more generalized operating definition of taxonomies that would be more in line with how other employees were using the term. This flexibility—the willingness to speak the language of clients, rather than rigidly clinging to a “correct” but ultimately unpopular meaning—was key. It set the tone for successful communications between the MSWeb team and its clients throughout the organization.

Three flavors of taxonomies
The team defined taxonomies as any set of terms that shared some organizing principle. For example, descriptive vocabularies were seen as controlled vocabularies that described a specific domain (e.g., geography, or products and technologies) and included variant terms for the same concept. Metadata schema were collections of labeled attributes for a document, not unlike a catalog record. Category labels were sets of terms to be used for the options of navigation systems. These three areas comprised the foundation of the MSWeb approach. Better searching, browsing, and managing of information would be achieved by designing taxonomies that could be shared throughout the enterprise.

Descriptive vocabularies for indexing
Developing terms to manually index important pieces of content seemed a smart proposition for the MSWeb team. It would complement automated indexing by the search engine, which was currently the primary means of making the site’s content available. But creating and applying descriptive vocabularies is an expensive proposition, especially within an information environment as large as Microsoft’s. And there are so many different ways to index content. So half the battle was in selecting which vocabularies would deliver the most value to the organization as a whole.

The MSWeb team considered a number of issues when deciding which vocabularies to develop. Not surprisingly, characteristics of the content drove many of the decisions.

Search Log Analysis—Queries from MSWeb’s search query logs are stored in an SQL database, and can therefore be searched and more easily analyzed. Search log analysis helped the MSWeb team gauge user content needs in their own words and determine appropriate vocabulary terms. Studying the search log’s most common queries also helped the team get a good overview of which content areas were generally most valuable to users.

Availability—The team looked for decent controlled vocabularies that had already been developed in-house or that were available commercially. Vivian Bliss, MSWeb Knowledge Management Analyst, puts it simply: “Don’t reinvent the wheel!” If there’s a useful vocabulary out there, it’s much cheaper to license and adapt it than to create a new one. Unfortunately, most of the required vocabularies were very specific to Microsoft’s content, and had to be custom-built in-house.
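The search log analysis described above can be sketched in a few lines. This is only an illustration, not MSWeb’s actual tooling: the article says the logs live in an SQL database but does not describe its layout, so the `search_log` table, its `query` column, and the sample queries below are all assumptions. The sketch uses Python’s built-in `sqlite3` module and counts queries after normalizing case and surrounding whitespace.

```python
import sqlite3


def top_queries(conn, limit=10):
    """Return (query, hits) pairs for the most frequent logged queries."""
    sql = """
        SELECT LOWER(TRIM(query)) AS q, COUNT(*) AS hits
        FROM search_log
        GROUP BY LOWER(TRIM(query))
        ORDER BY hits DESC, q ASC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


def demo_connection():
    """Build an in-memory log with made-up sample queries (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE search_log (query TEXT NOT NULL)")
    sample = ["Win2k", "win2k ", "401k", "ASP", "asp", "WIN2K", "benefits"]
    conn.executemany(
        "INSERT INTO search_log (query) VALUES (?)", [(s,) for s in sample]
    )
    return conn


if __name__ == "__main__":
    for query, hits in top_queries(demo_connection(), limit=3):
        print(f"{hits:>3}  {query}")
```

A ranked list like this is a starting point for choosing vocabulary terms: the most frequent queries show, in users’ own words, which concepts most need a preferred label.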

Other decisions were driven by business context. The MSWeb team considered such issues as:

Politics—The team was careful to talk with content stakeholders about what they felt they needed to make their content more accessible. In some cases, stakeholders were interested both in information architecture concepts and in committing to working with the MSWeb team. Others were interested in neither. Through such discussions, it became apparent which stakeholders were ready to participate and which weren’t.

Applicability—Some vocabularies were too specific to have broad value for users across the company. The MSWeb team instead focused on vocabularies with broader appeal and value.

After taking all of these considerations into account, Microsoft narrowed its vocabulary development to the following vocabularies:

  • Geography
  • Languages
  • Proper names
  • Organization and business unit names
  • Subjects
  • Product, standards, and technology names

Developing some of these vocabularies was trickier than you might think. Geography, for example, had to be split into two separate vocabularies: general place names, and locations of Microsoft installations. On the other hand, the subject vocabulary development was simpler than it might have been: its development was constrained to addressing primarily equivalence relationships. The MSWeb team hasn’t added extensive hierarchical and associative relationships; that would require a huge effort and take resources away from developing other vocabularies that could provide broad benefits right away. (In the future, the team does plan to selectively address these other relationships as time and resources permit.)
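To make the idea of equivalence relationships concrete, here is a minimal sketch of a vocabulary that maps variant terms to one preferred term. The data structure and function names are hypothetical and are not drawn from MSWeb; only the “Windows 2000” variants come from the examples earlier in this article.

```python
# Preferred term -> variant terms (an equivalence relationship only;
# no hierarchical or associative relationships are modeled here).
EQUIVALENTS = {
    "Windows 2000": [
        "Microsoft Windows 2000", "Win 2000", "Win2000",
        "Win2k", "Win 2k", "w2k",
    ],
}


def build_index(equivalents):
    """Map every preferred term and variant (case-folded) to its preferred term."""
    index = {}
    for preferred, variants in equivalents.items():
        for term in [preferred, *variants]:
            index[term.casefold()] = preferred
    return index


def preferred_term(term, index):
    """Look up the preferred term, ignoring case and extra whitespace."""
    return index.get(" ".join(term.split()).casefold())


def expand(term, equivalents, index):
    """Return every label worth searching for; unknown terms pass through."""
    preferred = preferred_term(term, index)
    if preferred is None:
        return [term]
    return [preferred, *equivalents[preferred]]


if __name__ == "__main__":
    index = build_index(EQUIVALENTS)
    print(preferred_term("W2K", index))
    print(expand("win2k", EQUIVALENTS, index))
```

With a mapping like this, a user who types one label can be shown content indexed under any of the others, which addresses the “same concept, different labels” problem without requiring a full thesaurus.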

Metadata schema
Developed hand-in-hand with controlled vocabularies, metadata schema describe which metadata to use to describe or catalog a content resource. While Microsoft’s descriptive vocabularies were driven by content and context, metadata schema were informed more by issues of users and content.

The MSWeb team developed a single schema that has value for both MSWeb and other intranet sites. Borrowing from the Dublin Core Metadata Element Set, MSWeb’s schema was intended to be sufficiently “stripped down” so that content owners would use it to describe resources, resulting in more records and therefore a more useful collection of content. The schema’s simplicity was balanced with the goal of providing enough descriptive information to augment searching and browsing by users.

The team also had to ensure that records produced using the schema would include fields useful for resource description, display, and integration with other parts of the information architecture (namely by integrating with search results and browsing schemes). The process used to develop this metadata schema was, in the words of one team member, “down and dirty.” Although more polished methodologies exist, sufficient resources were not available at the time for this initial schema development project. For this reason, it was important to structure the schema to include both a required “core” set of fields and the flexibility to support future extensions of the schema by other business units. To date, seven other major portals are using the metadata schema, and many have extended and customized it for their own context.

The schema’s core fields are:

  • URL Title—The name of the resource
  • URL Description—A brief description of the resource; suitable for display in a search result
  • URL—The address of the resource
  • ToolTip—Text displayed for a mouseover
  • Comment—Administrative information that helps manage a record (not seen by the end user)
  • Contact Alias—The name of the person responsible for this resource
  • Review Date—The date that the resource should be next reviewed (default setting is six months from when the record was created or last updated)
  • Status—The record’s status; e.g., “active” (the default), “deleted,” “inactive,” and “suggestion”; used for content management purposes

The schema has been commonly extended with these optional fields:

  • Strongly Recommended—Flags resources that are especially appropriate
  • Products—Terms from the product, standards, and technology names vocabulary that describe the subject matter of the resource
  • Category Label—Terms from the vocabulary of category labels; used to ensure that the resource is listed under the appropriate label in the site’s navigation system
  • Keywords—Terms from descriptive vocabularies used to describe the resource
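One way to picture a resource record built from these core and optional fields is as a small data class. This is a sketch under stated assumptions rather than MSWeb’s actual implementation: the Python attribute names and types are our own, “six months” is approximated as 182 days, and the status list contains only the four example values named above, which may not be exhaustive.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Only the example statuses listed above; the real list may be longer.
STATUSES = {"active", "deleted", "inactive", "suggestion"}


def default_review_date(today=None):
    """Roughly six months out (182 days is an assumption)."""
    return (today or date.today()) + timedelta(days=182)


@dataclass
class ResourceRecord:
    # Core fields
    url_title: str
    url_description: str
    url: str
    contact_alias: str
    tooltip: str = ""
    comment: str = ""  # administrative note, not shown to end users
    review_date: date = field(default_factory=default_review_date)
    status: str = "active"
    # Commonly added optional fields
    strongly_recommended: bool = False
    products: list = field(default_factory=list)
    category_labels: list = field(default_factory=list)
    keywords: list = field(default_factory=list)

    def __post_init__(self):
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")

    def needs_review(self, today=None):
        """True once the review date has arrived."""
        return (today or date.today()) >= self.review_date


if __name__ == "__main__":
    record = ResourceRecord(
        url_title="Benefits Overview",                     # hypothetical record
        url_description="Summary of medical and 401(k) plans",
        url="http://hrweb.example/benefits",               # placeholder URL
        contact_alias="jdoe",
    )
    print(record.status, record.review_date, record.needs_review())
```

Keeping the required core small while leaving room for optional fields mirrors the “core plus extensions” structure that lets other business units adopt the schema without being forced into a single mold.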

MSWeb began to use the metadata schema to create resource records in 1999; since then, over one thousand records have been created. These fuel the immensely useful “Best Bets” search results and hold huge potential for improving areas such as content management. We’ll describe the role of both metadata schema and “Best Bets” at Microsoft in greater detail later in this chapter.

Category labels
The third type of taxonomy—labels for the categories in site-wide navigation systems—was geared toward providing users of Microsoft intranet sites with navigational context. Category labels help users know where they are and where they can go. The MSWeb team employed a user-centered process for designing navigation systems, relying upon such useful standbys as card sorting and contextual inquiry. In [this screenshot], the category labels are shown on the left-hand side of the screen. Descriptions of nodes, displayed on the right-hand side, help catalogers choose the appropriate category label.

The initial set of category labels was developed solely for the MSWeb portal’s navigation system. But because the portal is so widely used and because the revised navigation represented a major upgrade for many users, the owners of other intranet sites began to approach the MSWeb team for assistance in developing their own navigation systems.

The MSWeb team responded by making its user-centered design process and expertise into a service that other site owners could utilize. As collaboration with other sites increases, a “standard” intranet navigation system will eventually be created, likely a combination of predetermined intranet-wide options (e.g., another “core”) and a locally determined selection of choices (“extensions”) that would be informed by a shared set of guidelines. For now, the transitional stage of raising awareness and providing support to other site owners is considered a great leap forward, and a prerequisite to further navigation standardization.

How it comes together
The impact of all three taxonomies is clear from the MSWeb search results shown [here]. Category labels provide contextual navigation at the end of each “Best Bet” result (the first two displayed) and populate the “categories” site-wide navigation system on the left-hand side. Below that, the “terms” area displays two variants of the search term that come directly from the descriptive vocabularies. The “Best Bet” search results themselves are drawn from resource records based on a metadata schema.
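The interplay described above can be sketched as a toy lookup: a descriptive vocabulary resolves the user’s query to a preferred term, resource records carrying that term as a keyword become “Best Bets,” and each record’s category label supplies the contextual navigation. Everything in this sketch is hypothetical, including the records, the category names, and the rule that strongly recommended records sort first; the article does not describe how MSWeb actually ranks or stores Best Bets.

```python
# Variant -> preferred term (descriptive vocabulary; sample data only).
VARIANTS = {
    "windows 2000": "Windows 2000",
    "win2k": "Windows 2000",
    "win 2k": "Windows 2000",
    "w2k": "Windows 2000",
    "benefits": "Benefits",
}

# Resource records based on a metadata schema (all records are made up).
RECORDS = [
    {"title": "Windows 2000 Support Site", "keywords": {"Windows 2000"},
     "category": "Products and Technologies", "strongly_recommended": False,
     "status": "active"},
    {"title": "Windows 2000 Deployment Guide", "keywords": {"Windows 2000"},
     "category": "Products and Technologies", "strongly_recommended": True,
     "status": "active"},
    {"title": "Old Windows 2000 Beta Notes", "keywords": {"Windows 2000"},
     "category": "Products and Technologies", "strongly_recommended": False,
     "status": "inactive"},
    {"title": "Benefits Overview", "keywords": {"Benefits"},
     "category": "Human Resources", "strongly_recommended": False,
     "status": "active"},
]


def best_bets(query, records=RECORDS, variants=VARIANTS):
    """Return active records indexed under the query's preferred term."""
    cleaned = " ".join(query.split())
    term = variants.get(cleaned.casefold(), cleaned)
    matches = [
        r for r in records
        if r["status"] == "active" and term in r["keywords"]
    ]
    # Stable sort: strongly recommended records first, original order otherwise.
    return sorted(matches, key=lambda r: not r["strongly_recommended"])


if __name__ == "__main__":
    for record in best_bets("W2K"):
        print(f'{record["title"]}  [{record["category"]}]')
```

The point of the sketch is the division of labor: the vocabulary handles variant labels, the records hold the curated descriptions, and the category labels tell users where each result lives.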

MSWeb’s “three taxonomies” approach is steeped in traditional library science, which isn’t surprising considering the backgrounds of many of those on the MSWeb team. But it’s important to note how willing the team was to abandon the traditional library science concepts that didn’t make sense in the intranet environment. For example, the team did not try to create “traditional” thesauri for its metadata schema and category label taxonomies. Other standards familiar to the LIS community, such as Dublin Core, weren’t initially adopted for MSWeb’s metadata schema because they were not appropriate at the time (although the Dublin Core schema may be partially or completely adopted by MSWeb at some point).

In the next issue (Sept. 9)—Beyond taxonomies: selling services, benefits to user and what’s next for MSWeb

Louis Rosenfeld is an independent information architecture consultant.

Peter Morville is President and Founder of Semantic Studios, a leading information architecture and knowledge management consulting firm.


  1. Amazing article. It should be a lesson to everyone on how NOT to run an intranet.

    It should also be a lesson in how NOT to make a broken intranet work. Relying on multiple taxonomies and meta-schema is a huge undertaking and simply won’t work. We looked at taxonomies for our intranet about five years ago. Later we realised that they will never accurately reflect content.

    Trying to get a search engine (or portal) to make an intranet work will always just replicate the problems you have finding information on the internet.


  2. Umm… Yow.

    I might also add that you’ve only read an excerpt of a much longer chapter (more of which will run right here in B&A). Maybe read the whole thing; *then* you’ll have even more fuel to flame us with. 😉

  3. Wow. Super informative.

    Often projects of such scope are sinking in information. You provided a compelling case study for a large project in a broad organization. I appreciate the fact that you went into such detail about the steps that the team went through to come up with a final, workable solution.

    Thanks L&P.

    p.s. a copy of the 2nd edition is now on order.

  4. While this is an interesting approach, I would also question whether providing a search engine on over a million pages, and 8000 separate intranets is really a solution.

    Surely the issue is information overload, not necessarily difficulties in finding information. And what of the quality, accuracy and relevance of all these pages?

    I would expect that a thorough review of the content (no small task, of course) would end up deleting 75% of the pages…

    Anyway, all good food for thought.

  5. Thanks to Lyle for rant and attack. I read a 3,000 word article (approx) then respond. You flame me after only reading my six sentences. Such is life I guess.

    I’ll answer Lyle’s question by re-iterating that taxonomies simply do not work. Period. They are fundamentally flawed because they ultimately rely on a subjective (human) interpretation of which category/ies (from the taxonomy) to use. The more complex the taxonomy the greater the degree of confusion. MSWeb’s three huge taxonomies are simply a grotesque example of this.

    I’ll go further and say that it is not possible to fix a broken intranet. If it has already descended into an internet-like anarchy then it is too late. The solution sounds painful and time-consuming – tear it down and rebuild it.

    Naturally the suppliers of packages/consultancy advertised as “making your (broken) intranet work” (and their drones) will not agree with me.

    As for Lyle’s little dig at my team’s inability to execute a plan properly? In the last five years I’ve designed, implemented and managed three large corporate intranets. All delivered on-time and under-budget. So there 🙂

    Webauk (Contentious? Moi?)

  6. I agree that the “sticky plaster” approach, i.e. putting a taxonomy/searchengine/portal on top of an unorganised and chaotic information store is not the answer.
    However, I believe that implementing a taxonomy across the information architecture of the content management system, the portal, and the refinement of the search engine results will provide consistency and organisation to the information and address the overload problem in an organisation by providing context and filtering capabilities.
    The key to a successful taxonomy is that it be dynamic and flexible.
    Anyone who truly wants to solve information overload shouldn’t expect a quick fix and the investment in a taxonomy will be worth it. This has been proven by over 50 years of information science research.

  7. To clarify one aspect of this: navigational (or category) taxonomies are only one of the types of taxonomies that are used in improving information retrieval at Microsoft. Many directories (such as Yahoo) rely on this approach exclusively. The larger of the taxonomies used by Microsoft on both the Intranet and Internet sites take a more thesaural approach to map concepts together.

    Is there subjectivity? Of course. But there are also measurable results that show that the cumulative value of these approaches has had a positive effect. Your mileage may vary. Our mileage has been pretty good.


  8. Webauk, I am happy you were on-time and under budget, but that is not a great measure of success. Use and findability of information is a major need for an Intranet. Measuring these elements and finding vast improvement in these areas would be something worth boasting.

    Taxonomies are a strong method for adding to positive experiences, on which many an Intranet’s experience is based on users finding information to get their job done (among many other information sharing tasks). Finding folks that can build a taxonomy properly or adequately is not easy, but it is getting easier as more folks are getting trained. Your customers and the users of the Intranet may thank you more than money back up front. The money saved in the long run is in the lack of lost time trying to find information or better yet having the information needed within easy grasp. Lost time is the killer in any organization and the Intranet where information is easily findable is a great way to cut the missing information money pit that eats profit and resources.

    Taxonomies may not be the perfect answer to every situation, but ruling out a great resource from a developer’s toolbelt on one poor implementation or mis-matched solution hurts the clients who depend on us to provide them helpful solutions. Learning why many find great success with including taxonomies in their solution sets may be very helpful. It could be worth giving taxonomies a second chance as well as choosing other metrics of success.

    –Just trying to help.

  9. Weighing in a little late on this, but hopefully with some additional information that might help in thinking about this very difficult problem. The question is not really about taxonomies, but how taxonomies are used. What Microsoft did very well was to embed taxonomies into a set of service offerings that have become part of the intranet. They did so by dedicating technical, intellectual, and political resources to the problem, and making sure that those resources were targeted at the needs of the users and stakeholders in the organization.

    They also went after low-hanging fruit, by integrating the taxonomy with their intranet search engine, and using it to consistently expose the right answers to the most common queries being asked- the results were high user satisfaction, and definitive proof that the approach was working. This was the key to getting others to realize the advantage of the services, and begin to use them, which of course gets buy-in and longer term support for the initiative.

    Discussion of taxonomies too often focuses on the taxonomies themselves, not on the ultimate purpose for their creation. By staying aware of what they were trying to accomplish with the taxonomy, and making it the means rather than the end, Microsoft was able to accomplish a great deal in a very short period of time.

    A side note about the right or wrong way to build an intranet- whether a centralized or decentralized approach is taken, it’s very likely that there will be a decentralized intranet behind the scenes, since it’s so easy to build (not maintain!!!). It’s a natural extension of the old water cooler/hallway conversation, and pretty difficult to prevent. Given that reality, approaches like Microsoft’s are a good way to skim the best of that back-channel conduit and bring it into the “official” channel- use both for what they are best at, and integrate them at the point of use. It’s what knowledge management is all about- integrating tacit and explicit knowledge to build a base for creation of new knowledge.

  10. In defense of Webauk a bit, I would say that Taxonomies do not work…ALONE. If you merely approached the problem by pushing a taxonomy the company would look at you and say, “ya and what do I do with it?” The difference in our approach was to apply other tools: we built back-end management tools and integrated the information into a usable UI for search and browse. We made taxonomies understandable. It took a triumvirate of skills to enable the solutions to work. Good development, to make the platform stable, while allowing flexibility. Good management of the taxonomies so that they are relevant and useful, and usable designs (UI) that expose the information and convey the content to the user so that the knowledge can be extracted.

  11. >I’ll answer Lyle’s question by re-iterating that taxonomies
    >simply do not work. Period. They are fundamentally flawed
    >because they ultimately rely on a subjective (human)
    >interpretation of which category/ies (from the taxonomy)
    >to use. The more complex the taxonomy the greater the
    >degree of confusion.

    Of course, taxonomies have helped to solve information organization and retrieval problems for hundreds of years. We might not have developed approaches to quickly and accurately apply them to intranets yet, but there’s certainly nothing special about intranet information that makes it inappropriate to classify with taxonomies.

    As to the “problem” of human subjectivity, forget it. You think that just because you’re using (for example) an automated classification tool that you’ve eliminated or even reduced it? You’re simply moving the responsibility for decisions from the human information classifier to the equally human programmer.

    As to the on time and under budget claim, well, I need you as a project manager! Unfortunately I’ve never seen an intranet, under budget or not, which was useful or usable. Never. I bet the MS Intranet is still a god-awful hard thing to use, despite improvements made by the techniques in the article.

  12. So Webauk, your supposedly better answer for how TO run or fix a broken intranet would be what exactly?…

    Maybe your team/company just couldn’t execute the plan properly?

    Taxonomies aren’t trivial (or sexy). People should not tread lightly down that road. Do all sites or companies need a taxonomy? Likely not, but many could greatly benefit from one that’s well executed and maintained.

    Saying taxonomies never accurately reflect content is like saying Knowledge Management systems never accurately reflect what’s in peoples’ heads. It’s not about perfection, but delivering SOMETHING THAT’S BETTER THAN TODAY, HAS BENEFITS, and VALUE.

    Sum of all critics:
    Taxonomies don’t work
    + Search is broken
    + Flash is 99% bad
    + Banner ad blindness
    + People don’t read online
    + Dot coms will fail
    + Global warming
    + Asteroids will hit the earth
    + Eating at Fast Food restaurants will make you fat
    Total: Give in, curl up and die.

    The best we can do is to identify “best practices” and give them a whack. Your mileage may vary.

    Lou & Peter: Great info — thanks!

  13. Yikes. I wasn’t sure if I wanted to get the 2nd edition, but I think I have to now.

    It’s encouraging to know that some of the problems I’ve encountered here in my little world are reflected in the big boys backyards also. There was some good stuff in there – hope the rest is as enlightening.

  14. Webauk, all I was trying to say is that there are many reasons why a taxonomy implementation may fail – no need to catalog them all here. I wasn’t trying to insult your team. I’ve worked on a number of products/sites that’ve failed in the long run — and I’ve learned along the way. Oh and I’ve had many successful ones too…

    I wasn’t attacking you, but merely pointing out that you say taxonomies don’t work, and then don’t backup your claim or offer anything better as a solution. The whole point of the MSWeb piece is to outline a case study where TAXONOMIES DID WORK. You might want to go back and re-read that first before panning the whole idea. I’ve worked on taxonomy implementations on a large intranet and can say it has helped some users some of the time — it’s not been a silver bullet. Our implementation isn’t perfect by any measure, but it was an improvement over what we had. As far as “tearing down” an intranet – you clearly don’t understand what that means in a large corporate setting — intranets can be hundreds or thousands of *sites* — starting over is out of the question. Day to day business and budgets won’t allow for it even if you could manage the sheer scope of it.

    On-time and on-budget have nothing to do with users or usability. So there. 🙂

Comments are closed.