Unraveling the Mysteries of Metadata and Taxonomies

Christina Wodtke of Boxes and Arrows interviews Samantha Bailey (former Argonaut and current lead IA for Wachovia Corporation’s Wachovia.com website) about Information Architecture, her dream process and the mysteries of metadata and taxonomies.

B&A: Let’s get meta – you come from the Argus LIS-flavored school of IA. What is your definition of Information Architecture?

SB: I’m going to pull this answer directly from an article I just wrote: “While it is unlikely that any two practicing information architects will give identical definitions of the term, there is consensus that information architecture has organization at its root. Basing my understanding on Morville and Rosenfeld’s approach, I define information architecture as ‘the art and science of organizing information so that it is findable, manageable, and useful.’ This definition is a content-intensive interpretation, indicating my bias that information architecture skills are most critical in content-rich environments. It also draws on the information retrieval roots of library science, emphasizing the importance of being able to find that which one seeks, whether known or unknown. Finally, information architecture is a user-centered discipline, understanding that usability is at the heart of a successful information-based interaction.”

B&A: What skills does one need to become a good IA?

SB: On an ongoing basis and in terms of basic personality traits, good IAs need to be inquisitive, problem/solution oriented, and dedicated to continual learning. The field is so new that there isn’t a set body of knowledge that you can learn in full and then have “mastered.” I think there is certainly a body of knowledge that an IA needs to pursue and absorb, which lays a foundation upon which to build.

In terms of the fields that I think most profoundly influence IA and are the best fodder for ongoing learning: Library and information science (my bias, obviously), HCI, cognitive psychology, ethnography and linguistics are among those I consider most critical.

Additionally, all of us need sales/marketing skills so that we can promote the field and continue inserting information architecture practices into processes that have been around long enough and are well established enough that it can take some work to make room for the IA piece.

B&A: If someone wrote you having just gotten their BA, perhaps in English or philosophy, and wanted to become an IA, what would you tell them?

SB: I actually have a BA in philosophy, so it doesn’t appear to get in the way of pursuing IA too much. I guess I’d recommend reading as much as possible; there’s such a rich reading list now, and so many people with great insights. When I first became interested in IA, Lou [Rosenfeld] & Peter [Morville] hadn’t written their book yet, and IA was more nebulous. The ambiguity was appealing to me, as I was attracted to being part of something that was in the process of being formed. At times it also felt somewhat insubstantial; we were making it up, and sometimes there was a lurking sense that it lacked legitimacy for the very reason that it hadn’t been codified.

In addition to the reading, join the SIGIA listserv, find a discussion group, look for a mentor. And of course there is working on actual information architectures: your own site, volunteer projects, student projects. I wasn’t clear about what I wanted to do, career-wise, immediately after college, so I worked for several years. I’m really glad about that, as it made it easier to be confident and to be taken more seriously. By the time I got my master’s degree and my first “real” IA position, I had real-world life and work experience. While it’s important to have rather specific skills in classification and user-centered design methodology, I think good IAs (like many good librarians) are often generalists at heart: people who have a love of learning and a tendency to be interested in practically anything that comes their way. I recommend throwing yourself in the way of whatever learning opportunities strike you as even remotely relevant.

B&A: You recently joined a large financial institution. What are some of the differences you’ve seen between being a consultant and being an employee?

SB: There are both similarities and differences. Perhaps the biggest surprise has been in the area of sales/business development. As a consultant, I was never fond of the part of my job that involved business development (e.g., marketing the company, bringing in business via sales calls, structuring projects to enhance future business opportunities, etc.). But I knew it was a critical part of my role as a consultant and, more particularly, as a consultant in a small start-up. So, when I joined a very specific department in a large company, I thought my bus dev days were behind me. And, indeed, I no longer have direct sales responsibilities. There aren’t calls to sit in on, RFPs to respond to, proposals to defend, etc., but my sales/marketing role remains a critical part of my new job. In this role, I’m selling something a bit different. Instead of selling a specific company/group of individuals, or a specific methodology or “secret recipe,” I’m now selling information architecture as a discipline that is critical to successful web design and that can be successfully fit into the company’s existing processes without too much pain. So, I’m changing my attitude about business development; from something that consultants or folks in small companies do to something that everyone has to do, in some way or another, all the time.

There is also, of course, the innie vs. outtie issue, which has been discussed on SIGIA. As a consultant, you see the pros and cons of being an outtie depending on the nature of the project. It can be a benefit to be removed, because you’re not bogged down and swayed by existing politics, and yet it can also be a negative, as you may not fully understand the complexity of the environment and can put your foot in your mouth past the ankle before you even realize you’ve goofed. As an innie, there are pros and cons as well, and they’re often of an opposite nature: you have your finger on the pulse of the politics, but you may not command the respect that a consultant’s “outsider” status conveys.

The biggest thing I miss about being a consultant is being able to “go home” both in the course of the project and at the end of the project. It was fascinating to be able to see, and sometimes even be part of, radically different organizations, as a consultant, knowing that in the end I was associated with my own, comparatively comfortable and particularly well-loved company. It could be bittersweet at the end of long, successful projects, but I’ve made great contacts and friends from those projects, and it was always fantastic to be able to finish up a project where the personalities hadn’t meshed as well and sink back into my own “family” of colleagues.

The thing that I’m most looking forward to, as an “innie,” is the issue of ownership and follow-through. As a consultant, I frequently left a project after the design phase and before implementation. That impacted the sense of pride and ownership of the final design, as well as the opportunity to influence the implementation process (in essence “eating our own dog food” when design elements that seemed strong on paper or in concept prove weak in action).

B&A: What are some of the unique challenges financial sites offer?

SB: There are several. Security and issues of trust exist on virtually all sites, especially e-commerce sites, but in an online banking environment issues of security are paramount, and security needs that impinge upon the technological back end supersede other drivers.

Another challenge I’m facing is the extremely complex nature of this site due to the fact that Wachovia is the nation’s 4th largest bank. We have both “retail” (the personal finance related banking you and I do) and “wholesale” (complex corporate and institutional banking) elements. In addition, Wachovia Securities is our brokerage arm, so from both wholesale and retail perspectives there are brokerage-related issues beyond traditional banking services. For example, our site is supporting both the features you’d find in an online bank and the features you’d find at a site like Schwab or Vanguard. This size and complexity issue leads to a number of impacts. The two most pressing are 1) it is quite hard to accurately define our users and narrow them into discrete personas and 2) it is very challenging to navigate the internal features of the bank (e.g., wanting to default to the bank’s organizational structure as the site’s organizational structure before gaining clarity as to what the bank’s organizational structure is and how it functions).

B&A: What’s the relationship between knowledge management and IA, if any?

SB: It depends. One thing it depends on is how you define knowledge management. I define knowledge management pretty loosely: first, as the pursuit of maximizing your organization’s functionality by enhancing communication about, and sharing of, both tacit and explicit knowledge; and second, as the process of codifying this knowledge into a system or repository. The communication and capture piece may be the most critical aspect of KM, and I don’t know how much of a role IA can play in that aspect. When it comes to codifying knowledge into a system, of course, IA will play a critical role in creating an information system that functions as well as it can.

B&A: Can you tell me the difference between metadata and keywords?

SB: Metadata, at its broadest, is descriptive information about information. In the traditional library world, metadata is most commonly thought of as the big 3 from the traditional card (now online) catalog: Author, Title, Subject. But there are other fields as well: year published, publisher, shelf list number (administrative info for the library). In the online world, we use metadata for administrative purposes (to know when a document is “stale” and needs to be updated or deleted, or to know the nature of a file so we know whether we have the correct software to open it) and for retrieval purposes (the subject or keyword).
There are roughly three ways to think about, or classify, metadata:

  1. Intrinsic – information that can be extracted directly from an object (e.g., file name, size)
  2. Administrative/Management – information used to manage the document (e.g., author, date created, date to be reviewed)
  3. Descriptive – information that describes the object (e.g., title, subject, audience)

So, metadata can be quite varied: it may support retrieval (author, title, subject), it may support administration (call number, stale date), or both. As you can see, these categories are not mutually exclusive: administrative data could be used for retrieval purposes (if the system supported that usage), and we could debate whether “author” was administrative, descriptive or possibly even intrinsic, as with a piece of artwork.
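
To make the three kinds concrete, here is a minimal Python sketch of a single page's metadata record. The class name, field names, the 90-day review window and the sample values are all hypothetical, and filing author under administrative metadata simply follows the list above; as noted, it could be argued into another bucket. The two methods show the same record serving an administrative purpose (flagging stale content) and a retrieval purpose (matching a subject term).

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class PageMetadata:
    """Hypothetical metadata record, grouped into the three kinds listed above."""
    # Intrinsic: can be read directly off the object
    file_name: str
    size_bytes: int
    # Administrative/management: used to manage the document
    author: str
    created: date
    review_after_days: int
    # Descriptive: says what the object is about
    title: str
    subjects: list[str] = field(default_factory=list)
    audience: str = ""

    def is_stale(self, today: date) -> bool:
        """Administrative use: has the page passed its review date?"""
        return today > self.created + timedelta(days=self.review_after_days)

    def has_subject(self, term: str) -> bool:
        """Retrieval use: case-insensitive match against the subject terms."""
        return term.lower() in (s.lower() for s in self.subjects)


page = PageMetadata(
    file_name="rates.html", size_bytes=18_432,
    author="Web Team", created=date(2002, 1, 15), review_after_days=90,
    title="Current Mortgage Rates", subjects=["Mortgages", "Interest rates"],
    audience="retail customers",
)
print(page.is_stale(date(2002, 6, 1)))  # True: the review date was 2002-04-15
print(page.has_subject("mortgages"))    # True
```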

That leaves us with keywords: what are they? Well, they’re a kind of descriptive metadata, generally describing the nature of the information. Keywords may be extracted directly from the text or they may be extrapolated, that is, selected because they describe the text (subject, topic). The context in which keywords are selected and used is important for this reason. Keywords are by their nature fairly granular: a specific word applied to a specific item, often a narrow subset of a document (like a page or a paragraph), but even this granularity can vary in specificity (e.g., does the keyword describe the element in question specifically or generally?). Keywords are typically used for retrieval, as opposed to administration.
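
The difference between extracted and extrapolated keywords can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production indexer: the stopword list and the cue-to-subject mapping are hypothetical, and real extraction would use something smarter than raw word counts.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "for", "in", "on", "is", "your", "with"}

# Hypothetical mapping from words in the text to assigned subject terms.
ASSIGNED_TERMS = {
    "mortgage": "Home loans",
    "refinance": "Home loans",
    "ira": "Retirement planning",
}


def extract_keywords(text: str, top_n: int = 3) -> list[str]:
    """Extracted keywords: the most frequent non-stopword words in the text itself."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def assign_keywords(text: str) -> set[str]:
    """Extrapolated keywords: subject terms chosen because they describe the text,
    even though the terms themselves need not appear in it."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {term for cue, term in ASSIGNED_TERMS.items() if cue in words}


sample = "Refinance your mortgage today. Mortgage rates for refinance loans are low."
print(extract_keywords(sample))  # ['refinance', 'mortgage', 'today']
print(assign_keywords(sample))   # {'Home loans'}
```

The naive extractor picks up “today,” which says nothing about the page's subject, while the assigned term “Home loans” never appears in the text at all; that gap is one reason the context in which keywords are chosen matters.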

When keywords are applied to HTML pages (which is generally done for descriptive and retrieval purposes), they are typically applied via a meta tag. This may be what has led to some confusion around the difference between metadata and keywords. The meta tag fields in HTML were meant to capture all sorts of metadata, and some are used to capture quite a wide array of information. The keywords field seems to be the most commonly used and best known of the meta tag fields.
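
To see where those keywords live in a page, here is a small sketch that uses Python's standard-library html.parser to pull the terms out of a keywords meta tag. The sample page and the MetaKeywordsParser class are hypothetical, and the sketch assumes the common convention of comma-separated terms in the content attribute.

```python
from html.parser import HTMLParser


class MetaKeywordsParser(HTMLParser):
    """Collects comma-separated terms from <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "keywords":
            terms = attr_map.get("content", "").split(",")
            self.keywords.extend(t.strip() for t in terms if t.strip())


html_doc = """<html><head>
<title>Checking Accounts</title>
<meta name="description" content="Compare our checking accounts.">
<meta name="keywords" content="checking, online banking, bill pay">
</head><body>...</body></html>"""

parser = MetaKeywordsParser()
parser.feed(html_doc)
print(parser.keywords)  # ['checking', 'online banking', 'bill pay']
```

The description tag is ignored here; only the keywords field is collected, which mirrors how that one meta field tends to be treated as “the” metadata of a page.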

B&A: How about the difference between taxonomies and hierarchies?

SB: Ah, taxonomies vs. hierarchies. Near and dear to my heart – I’ve just written an article on the uses (and misuses) of the term “taxonomy.” You probably know this, but just in case I’ll give a brief history lesson. Taxonomies have been around for a long time – they are hierarchical schemes for classifying things. Aristotle developed a system of classification in the fourth century BC. “Modern” methods of taxonomic classification are attributed to Linnaeus, who introduced his methodology in the 1700s. Linnaeus was a botanist, and taxonomy is generally associated with biology and systematics. Other disciplines have borrowed the term taxonomy from the hard sciences to describe their classification systems, so it wasn’t a completely novel act when folks working on the Internet stumbled upon it as a good term for describing what they were doing online. I first encountered the term in 1999 while doing some work with Ernst & Young (management consulting seems to have been enamored of the term in this context early on) and was completely baffled, as I had only been familiar with the term from my biology courses and had never encountered it in my library science/information science work or reading. Doing more exploration, I concluded that when people were talking about taxonomy on the web they were often talking about the traditional LIS definitions for classification schemes, controlled vocabularies, or thesauri. (I went on a brief mission to convince the Argonauts that we should educate our clients about the LIS terms, but it was more or less a failure, so around 2000 I caved and began using the term taxonomy myself. Now the term has become so widely used that I think it has genuine validity of its own on the web.)

On the web, we tend to play fast and loose with terminology, and that’s true here as well. A strict interpretation of the definition of taxonomy would demand that the scheme be a pure hierarchy with one-to-one relationships: items can be in one place and one place only in the scheme (think of the animal kingdom or a family tree). But I’ve met people who are very comfortable with the concept of a polyhierarchical taxonomy, polyhierarchy being the concept that something can “live” in more than one place in a hierarchy. The most common example of this is “piano” in a scheme of musical instruments; it is both a stringed instrument and a percussion instrument.
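
One way to make the strict-versus-polyhierarchical distinction concrete is to record each term's parent categories and count them. The scheme below is a hypothetical fragment built around the piano example, and the helper names are illustrative only; the sketch assumes the scheme contains no cycles.

```python
# Each term maps to the set of its parent categories (broader terms).
# Hypothetical fragment of a musical-instrument scheme.
instruments: dict[str, set[str]] = {
    "Stringed instruments": {"Musical instruments"},
    "Percussion instruments": {"Musical instruments"},
    "Violin": {"Stringed instruments"},
    "Drum": {"Percussion instruments"},
    "Piano": {"Stringed instruments", "Percussion instruments"},
}


def is_strict_taxonomy(parents: dict[str, set[str]]) -> bool:
    """Strict interpretation: every term has exactly one parent."""
    return all(len(ps) == 1 for ps in parents.values())


def polyhierarchical_terms(parents: dict[str, set[str]]) -> list[str]:
    """Terms that 'live' in more than one place in the scheme."""
    return sorted(term for term, ps in parents.items() if len(ps) > 1)


def paths_to_root(term: str, parents: dict[str, set[str]]) -> list[list[str]]:
    """Every path from a term up to the top of the scheme (assumes no cycles)."""
    if term not in parents:
        return [[term]]  # a top-level category
    return [[term] + path
            for p in sorted(parents[term])
            for path in paths_to_root(p, parents)]


print(is_strict_taxonomy(instruments))      # False
print(polyhierarchical_terms(instruments))  # ['Piano']
print(paths_to_root("Piano", instruments))
# [['Piano', 'Percussion instruments', 'Musical instruments'],
#  ['Piano', 'Stringed instruments', 'Musical instruments']]
```

Removing “Percussion instruments” from the piano's parents would make the scheme pass the strict test, at the cost of users who look for the piano under percussion.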

Here are a couple definitions:

Traditional definition:

“Taxonomy, a sub-field of biology concerned with the classification of organisms according to their differences and similarities, still uses many of Linnaeus’ original categories. Today the major categories are kingdom, phylum, class, order, family, genus, and species.”
(http://www.ensc.sfu.ca/people/grad/brassard/personal/THESIS/node19.html)

Taxonomy on the web:

“A correlation of the different functional languages used by the enterprise to support a mechanism for navigating and gaining access to the intellectual capital of the enterprise.” One of the more carefully justified definitions of taxonomy comes from research done by Alan Gilchrist and Peter Kibbey of TFPL, a leading taxonomy consulting firm. The definition can be found in the executive summary of the report “Taxonomies for Business: Access and Connectedness in a Wired World.”
(http://www.tfpl.com/consultancy/taxonomies/_report_/taxonomy_report.html)

B&A: What about categories, where do they fit in?

SB: Categories are groupings of like elements (often by subject, but also by other criteria, like form). The groupings that make up taxonomies and classification schemes are categories.

B&A: So where does the thesaurus come in?

SB: You won’t be surprised to find that I have a classic IA’s answer to this question: it depends. 🙂 A thesaurus is an information retrieval tool that excels at making connections between concepts. Information retrieval thesauri are almost the opposite of the thesauri we were introduced to in elementary school. Those thesauri took a word and exploded it outward, so that when we got absolutely sick of writing “brown” we learned that we could substitute the more exotic “sienna.” At its most basic, an information retrieval thesaurus brings concepts together, grouping and clumping like terms. As a result, the document that mentions the brown crayon and the separate document that discusses the sienna Crayola are both pulled together in an information system that has a thesaurus applied to it.

There are 3 primary relationships that thesauri clarify: equivalence relationships (synonyms, variations; as with brown/sienna above), hierarchical relationships (broader and narrower, or more general and more specific), and associative relationships (related terms). In the classical sense, you only had a thesaurus if all 3 relationships were explicated, but on the Web people have been open to using the word thesaurus when they’re talking about just one or two of the relationships.
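
The three relationship types map naturally onto a small data structure. The Python sketch below is hypothetical: the Term and Thesaurus classes are invented for illustration, and they borrow the familiar UF/BT/NT/RT shorthand (used for, broader, narrower, related). It shows how an equivalence lookup pulls “sienna” back to the preferred term “Brown,” which is what lets two differently worded documents be retrieved together.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """A preferred term plus its thesaurus relationships (hypothetical structure)."""
    preferred: str
    used_for: set[str] = field(default_factory=set)  # equivalence (UF): non-preferred variants
    broader: set[str] = field(default_factory=set)   # hierarchical (BT): more general terms
    narrower: set[str] = field(default_factory=set)  # hierarchical (NT): more specific terms
    related: set[str] = field(default_factory=set)   # associative (RT): related terms


class Thesaurus:
    def __init__(self, terms: list[Term]):
        self.terms = {t.preferred: t for t in terms}
        # Map every entry word, preferred or not, to its preferred term.
        self.use: dict[str, str] = {}
        for t in terms:
            self.use[t.preferred.lower()] = t.preferred
            for variant in t.used_for:
                self.use[variant.lower()] = t.preferred

    def preferred_term(self, word: str) -> Optional[str]:
        """Equivalence lookup: the preferred term for any entry word, or None."""
        return self.use.get(word.lower())

    def related_concepts(self, word: str) -> set[str]:
        """The preferred term plus its narrower and related terms."""
        pref = self.preferred_term(word)
        if pref is None:
            return set()
        t = self.terms[pref]
        return {pref} | t.narrower | t.related


thesaurus = Thesaurus([
    Term("Colors", narrower={"Brown"}),
    Term("Brown", used_for={"sienna"}, broader={"Colors"}, related={"Crayons"}),
])

print(thesaurus.preferred_term("Sienna"))            # Brown
print(sorted(thesaurus.related_concepts("colors")))  # ['Brown', 'Colors']
print(sorted(thesaurus.related_concepts("sienna")))  # ['Brown', 'Crayons']
```

A scheme that filled in only the used_for sets would be one of the “one relationship” thesauri mentioned above; the classical definition expects all three kinds of links to be populated.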

B&A: Can you get all these things to work together in some way?

SB: Yes! There are a variety of different ways (some of this may be semantic, of course, depending on how strictly you want to interpret the terminology). Here’s an example: you might have a site that employs a high-level taxonomy or classification scheme (think Yahoo!). If the taxonomy is polyhierarchical, thesaural relationships could be employed as part of the taxonomy (e.g., Movies: see Film). The thesaurus might also be used to show associative relationships for individual records (e.g., Final Fantasy, see also: Japanese anime). A thesaurus could also be used behind the scenes to enhance the search technology. For example, the taxonomy might only display movies and film, but the search engine might use the thesaurus to tell the user who searches for “movie” that the results returned were based on documents indexed by the preferred term “film.” The search engine might also use the thesaurus to create search zones, returning results for searches of “8mm” from the documents indexed as relating to film before the other documents.
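
The two search-side uses described above, telling the user which preferred term was searched and zoning the results, can be sketched as follows. Everything here is a hypothetical illustration: the three documents, the USE and ZONES tables and the search function are invented for the example, and a real engine would rank far more carefully.

```python
# Hypothetical collection: each document has full text plus preferred index terms.
DOCS = {
    "wrench-sizes": {"text": "Metric wrench sizes from 6mm to 8mm", "index": {"hardware"}},
    "home-movies": {"text": "Digitizing old 8mm home movies", "index": {"film"}},
    "film-noir": {"text": "A short history of film noir", "index": {"film"}},
}

# Hypothetical thesaurus fragments.
USE = {"movie": "film", "movies": "film", "motion pictures": "film"}  # entry term -> preferred term
ZONES = {"8mm": "film"}  # query term -> index term whose documents should rank first


def search(query: str) -> tuple[str, list[str]]:
    """Return a note for the user and the ids of matching documents."""
    q = query.lower()
    preferred = USE.get(q)
    if preferred is not None:
        # Equivalence: search the index under the preferred term and say so.
        hits = [doc_id for doc_id, doc in DOCS.items() if preferred in doc["index"]]
        note = f'Showing results for the preferred term "{preferred}"'
    else:
        # Otherwise fall back to a plain full-text match.
        hits = [doc_id for doc_id, doc in DOCS.items() if q in doc["text"].lower()]
        note = ""
    zone = ZONES.get(q)
    if zone is not None:
        # Zoning: a stable sort moves documents indexed under the zone term to the front.
        hits.sort(key=lambda doc_id: zone not in DOCS[doc_id]["index"])
    return note, hits


print(search("movie"))  # ('Showing results for the preferred term "film"', ['home-movies', 'film-noir'])
print(search("8mm"))    # ('', ['home-movies', 'wrench-sizes'])
```

Without the zoning step, the “8mm” search would list the wrench page first simply because it comes first in the collection; the thesaurus-derived zone moves the film document ahead of it.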

B&A: Does every site need all this stuff?

SB: No, definitely not all this stuff. These are concepts that can be leveraged as tools to support classification and retrieval. It’s basically the same as with search: not all sites need a search engine, for example. Setting aside the religious war between Jared [Spool] & Jakob [Nielsen], there is the reality that some sites seem to work quite well without search engines (e.g., Gap.com) while other sites are greatly enhanced by them (e.g., Amazon).

But every site needs some of this stuff, perhaps. It’s very difficult to have a functional site that doesn’t have some kind of approach to organization, usually in the form of a classification scheme, regardless of whether it’s a hierarchical taxonomy (a place for everything and everything in one place only), a polyhierarchical taxonomy (a Yahoo!-like scheme where items can be placed in more than one category), or a flat classification scheme (as with the simplest brochure sites).

B&A: What about software: can you think of software that could benefit from architecting their information?

SB: A topic worthy of a book, undoubtedly. When I’m looking at information architecture for content I tend to focus on classification, navigation, labeling and search, and there are certainly aspects of almost all of these in software programs. Labeling is a huge issue in the functionality of software products, especially because we tend to be dealing with extremely narrow and deep structures with software. Good labels (even in the form of rollovers for icons) can make a significant difference in the users’ ability to understand and use the tools. (An interesting side note here is that novice or infrequent users generally have more success with broad and shallow schemes, something that doesn’t tend to work especially well with software interfaces.)

B&A: What is your dream process for creating an architecture?

SB: Dream process, hmmm. Well first it begins with assembling a great team. I’d need to have a sense of the parameters to know what size team to go with, but at Argus we had great success with fairly small teams even for rather significantly sized projects. The best teams are a mix of skills, experience and personality. I tend to be drawn to the bottom-up elements of IA (e.g., content analysis, vocabulary control, indexing, etc.) so I tend to look for people with top-down skills (strategy, heuristics) to balance my approach.

After assembling the team, my dream project would have a dream context: clearly defined scope and goals with clients who value information architecture and are prepared to be advocates in their organization (this would be true whether I was an innie or an outtie; there’s generally some kind of client and stakeholder who can pave the way). But don’t go thinking the dream project would run perfectly smoothly; it would still have enough challenges to keep things interesting. I like projects that are daunting but not impossible.

So, let’s see: team, clients. Then I’d have the team sit down and hammer out a process that had a mixture of things we were comfortable with/had done before and had a high degree of confidence with and a few things we wanted to try out/experiment with. And once we had a rough road map we’d dive in and do the work.

B&A: There is a lot of talk about semantic webs and self-organizing systems (automated IA, in other words). Meanwhile our community is talking about getting into Experience Design or getting MBAs… can you see a future where there are no information architects, just machines and people who know what they do?

SB: I recently had a conversation with Matt Jones, IA for the BBC (his weblog is http://www.blackbeltjones.com/), about this very topic, in a more here-and-now way. Matt was arguing that he didn’t want information architects at the BBC; he wanted multidisciplinary staff members who were skilled in the discipline of information architecture. I took the position that in a world of ever-increasing specialization, coupled with corporate environments that ask people to take on ever more responsibilities with restricted schedules and budgets, we desperately need an individual in the IA role, both to look out for IA-specific issues and to evangelize. A sort of Lorax role: I am the Information Architect, I speak for the…labeling scheme and the organization structure and the search/browse system and so on and so forth. But that’s today, and you’re really asking about tomorrow.

In the library world there have long been whispers that automation will replace the need for librarians; it was even part of Autonomy’s ad campaign a few years ago. I think that there is a human tendency to both intrigue and scare ourselves with the idea that our creations will make us obsolete. And it is true that automation results in dramatic change. However, instead of making librarians obsolete, technology and automation, in my experience, tend to replace the routine tasks, leaving the more subtle, often more interesting, challenges to be performed by people. So, in the big picture, I have no doubt that automation and technical developments will change the nature of our work as information architects over time. But people have been bending their minds to the nature of, and need for, organizing information for a long, long time, whether as librarians or records managers or database administrators. Right now it’s a very thrilling time: we have a new medium and a new discipline, and a lot of work ahead of us teasing apart what it all means. So, yes, I think our work will evolve and change dramatically, but I don’t think the role is going to go away anytime soon.

B&A: So what is the future of Information Architecture?

SB: The gazillion-dollar question that leaves me tongue-tied and tempted to blurt out “heck if I know!” But I think your question about the semantic web and self-organizing systems hints at the answer: the immediate future requires stabilizing our role in the academic and business communities and identifying the key challenges and problems that we want to solve in the next 10 years. I think we’ll continue to see a weaving of old, new and newer: advancing technology with respected, well-understood concepts and evolving thinking. Whatever the future of Information Architecture turns out to be, I’m excited about being part of the work as it unfolds.

Christina Wodtke is the founder of Boxes and Arrows. Her day job is Partner at Carbon IQ, a small user-experience agency in San Francisco, where she designs information architectures and conducts user research in the quest to create more usable, effective and profitable products.

7 comments

  1. I’m a designer with a keen interest in developing a deeper understanding of the what, how and why of IA. Admittedly, most discussions on the subject sail right over my head. I try to focus, but all I seem to catch is “blah blah information blah, structures blah blah…”

    This discussion was different. I think I get it now.
    My thanks to you, Christina, and to Samantha Bailey for presenting such a thorough and interesting exploration of the past, present and future of IA.

    I feel like organizing something.

  2. > I think good IAs (like many good librarians) are often generalists at heart: people who have a love of learning and a tendency to be interested in practically anything that comes their way.

    Amen, sister. That’s why I like what I do. The world benefits from a generalist. Sad thing, though, is that some large firms don’t see that there is such great value in someone who can wear many hats, e.g., do IA and also write code. In such cases, you’re either with the IAs or you’re an IT person. Sigh.

    > I’ve just written an article on the uses (and misuses) of the term “taxonomy.”

    Was interesting to hear your thoughts about the misuse of the term taxonomy and your efforts to get people in the industry to understand the LIS terms. I work in a corporate library, and we too have adopted the term taxonomy because our customers bought into the term “business taxonomy” early on. I think the problem with the use of the term is that it implies something specific that people don’t necessarily want: a hierarchical structure for classifying aboutness. Using a hierarchy to organize topics is rigid, and if you support polyhierarchy (multiple parents or broader terms) I don’t think you are technically using a taxonomy any more; you’re using a thesaurus. Is my understanding right? Am I being pedantic? It gets troublesome for me as an IA with an LIS bias to deal with terms like taxonomy and faceted classification, because my LIS experience tells me that they’re one thing, and then I read the blogs and people are defining things much more loosely.

    Was great to hear your innie vs. outtie perspective, by the way. A great discussion.

  3. From what I have observed, if you start to create relationships between the words, then you are building a thesaurus. I guess my understanding of taxonomy is from the hierarchical camp.

  4. Right, ML, connecting related terms (RT) is something you do in thesauri. But my understanding is also that multiple broader terms (BT) aren’t usually implied with taxonomies. That is to say, each node in a hierarchy only lives under one branch with one parent, whereas, with a thesaurus, a term can live in several places in a topic tree.

    I like to point to Amy J. Warner’s example of the term “viral pneumonia”, which you can search in the MeSH browser. That term lives in several places in the subject tree. Because MeSH supports multiple BTs as well as RTs, it is not technically a taxonomy; it’s a thesaurus, right? I dunno. Maybe I should let it go. Please correct me if you have seen or heard of a different application or understanding of taxonomies, though, because I yearn to understand how businesses are using the term.

  5. MeSH is actually a Subject Heading classification scheme (another subject heading scheme you may have encountered is LCSH, Library of Congress Subject Headings). Subject headings are their own substrata of controlled vocabularies, and they’re generally created for a specific purpose (e.g., to index items in the Library of Congress collection by subject). When subject headings are very broad and widely developed, they are often useful beyond their originally intended use–hence MeSH is used by medical libraries and LCSH is used by many college and university libraries. Note the level of granularity in a subject heading scheme–you’re typically looking to describe an entire item (usually a book) with 2-4 descriptors, so this isn’t going to be as granular as a back-of-the-book index.

    Different subject heading schemes function in different ways; in the case of MeSH and LCSH I’m inclined to say they’re more like taxonomies than thesauri, but that’s something of a gut feel and is debatable–so it’s safest to say they’re a slightly different beast altogether.

    But they are a controlled vocabulary–as are both taxonomies and thesauri.

    You may want to check out:
    http://www.carl.org/tlc/crs/shed0014.htm

  6. Terrific interview. Samantha explains complex concepts very clearly. I especially like the answer to this question: “Can you tell me the difference between metadata and keywords?”

    I totally agree with this statement:
    “I’m changing my attitude about business development; from something that consultants or folks in small companies do to something that everyone has to do, in some way or another, all the time.”

  7. I just attended Bella Hass Weinberg’s annual Thesaurus Design Seminar in New York yesterday. Dr. Weinberg was the chair of Z39.19 (Guidelines for the Construction, Format, and Management of Monolingual Thesauri) and has consulted on many large thesaurus projects such as the AAT.

    Weinberg said that taxonomy is simply used today to mean “classification”. And while MeSH is a subject heading list, functionally it is the same as a thesaurus. She used MeSH alongside ERIC to illustrate how thesauri interpret the guidelines, both successfully and poorly. (MeSH is included as a source in the Unified Medical Language System’s Metathesaurus.) She indicated that one of the main differences between thesauri and subject heading lists is that thesauri are used for post-coordination and SH lists for pre-coordination, and that pre-coordination is essential for print environments.

    In any case, it appears that the term taxonomy is used and understood loosely in the LIS profession to mean, effectively, thesaurus. So I’m going with the attitude of “If you can’t beat them, join them.” It is important, however, to understand the nuances of these terms and methodologies as an information professional, which is why I (and three colleagues from my office) go to seminars like these.