All About Facets & Controlled Vocabularies

Information architects are fascinated with faceted classification and its application to information architecture problems. However, facets remain difficult to understand and there are few options for learning about them.

This is the first in a series of articles that aims to correct this situation. We intend to explain both facets and the more general concept of controlled vocabularies. We want to make the subject accessible to those who don’t have advanced degrees in library and information science. Furthermore, we want to show how these concepts can be applied to solve information architecture problems for the Web and other digital information environments.

The concept of faceted classification is decades old, and controlled vocabularies go back even further. Consequently, a great deal has already been written about the subject. But these writings are not always helpful to the practicing IA. Some are too simple, others too academic. Most are hard to find, and many were written decades before this Web thing happened.

Throughout this series we will strive to be:

Practical. We will give you a practical guide to controlled vocabularies and faceted classification. We will not only explain the concepts but also show you how to apply them to real information architecture problems.
Readable. Too much of the existing literature is hard to understand. It may be comprehensible to someone with a master’s in library and information science, but this excludes a large number of practicing IAs (and we know some librarians who don’t understand this stuff). We will use plain talk to explain it, without dumbing it down.
Relevant. We will make this relevant to the Web and other digital information environments. A great deal was written about this topic in the 1950s and 60s. It’s excellent material, but back then transistors were still a pretty neat idea. We believe that faceted classification has even more applications today than it did back then.
Accessible. Everything will be published here on Boxes & Arrows: on the web, easy to access, and free. While a lot has been written on this topic, it’s often hard to obtain. For example, B.C. Vickery’s excellent book, “Faceted Classification: A Guide to the Construction and Use of Special Schemes,” was written in 1960 and is rather difficult to obtain today (at least one of the authors has resorted to finding a copy in a library and, in desperation, photocopying the whole thing).

The plan
Our main goal is to explain faceted classification. However, a faceted classification scheme is actually a special case of what are called controlled vocabularies. To properly explain facets we will begin with this more general topic and work our way up to facets.

Our travels through this strange land will include the following:

Controlled Vocabularies. In the first full article in the series we’ll describe controlled vocabularies in general. We’ll talk about what they are and how they work.
Synonym Rings & Authority Files. Before moving on to facets, we’ll describe these simpler forms of controlled vocabularies. In many situations they are the more useful solution because they’re easier to create, implement, and maintain. Sometimes they’re not enough, and it’s time to step up to facets.
Facets & Facet Analysis. With the fundamentals in place we will move on to the heart of our subject. This will take a while, but it’ll be worth it. We’ll also take time to describe facet analysis, the process used to develop facets.
Interface Issues. A long-standing weak point of controlled vocabularies is how to use them effectively in an interface. This is particularly true of facets. We’ll explore these issues and give you the best advice we can.
Decision Factors. Not every project calls for a full-blown faceted solution. Sometimes a synonym ring is better. How do you know? We’ll cover some guidelines for making those decisions.
Future Directions. There are some interesting new applications related to facets and controlled vocabularies such as XFML (http://www.xfml.org/) and Topic Maps (http://www.topicmaps.org/). We hope to cover these as well.

Some final thoughts
That’s a lot. And yes, we’re ambitious. But no, we aren’t writing the definitive treatise on the subject. Our aim is to make this complex and important subject accessible to practicing information architects.

We view this as a collaborative effort. We anticipate many questions. We’ll answer these through the discussion features of Boxes & Arrows. We also plan to address the bigger questions you have in subsequent columns. Let us know what you want to know, and we’ll do our best to provide you with answers.

Karl Fast is a PhD student in library and information science at the University of Western Ontario. He also has a master’s in LIS. His graduate work has included courses on organization of information, subject analysis, thesaurus construction, and facet analysis.

Fred Leise, president of ContextualAnalysis, LLC, is an information architecture consultant providing services in the areas of content analysis and organization, user experience design, taxonomy and thesaurus creation, and website and back-of-book indexing.

Mike Steckel is an Information Architect/Technical Librarian for International SEMATECH in Austin, TX.

17 comments

Anonymous says:

March 10, 2011 at 3:35 pm

Has the Facet Classification article ever been posted? I was unable to find it, but would really like an intro to it through your great way of explaining things.
ML says:

December 10, 2002 at 10:49 am

I think the most difficult part of applying a CV or faceted classification is actually integrating it with one or more information systems and implementing a workflow to support it. You’re lucky if you’re dealing with only 3 or 4 people, but if you have an intranet with hundreds of users, then you have a little problem.

Other challenges that come up are cost, maintenance, technical infrastructure (data models), and the balance between the expertise used during construction and the content authors’ experience and skill in applying it.
Nick Finck says:

December 10, 2002 at 1:24 pm

One of the issues that concerns me about faceted classification is the ability to properly produce such a classification system using existing architecture. By architecture I mean database systems and backend technologies. Rational has a database that may be the only fit aside from using something like Oracle with complicated customizations. Any other ideas about what kinds of databases this could be directly applied to?
Phil Murray says:

December 10, 2002 at 2:11 pm

I look forward to the series of articles. It’s interesting how people start discussions of various forms of classification from different perspectives. Some start with “taxonomies” as the most general case. You start with “controlled vocabularies.” I’m most comfortable with Dagobert Soergel’s “knowledge organization systems.” (See “Taxonomy of Knowledge Organization Sources/Systems” at the Networked Knowledge Organization Systems/Services (NKOS) Web site, http://nkos.slis.kent.edu.) Hope that link is still good.
Suzanne Sheppard says:

December 11, 2002 at 7:26 am

I am surprised that the term “thesaurus” has yet to come up. Please do put it into context for us in your series of articles, as it is another term that confuses people in this space.
I am looking forward to the rest of the articles. Are you able to provide the titles in advance? What is your timescale for writing them? As you can tell, I will find it very useful to see other IAs’ views on this.
In response to Nick Finck on tools you may be interested to see the toolset provided at http://www.schemalogic.com.
Good luck.
Karl Fast says:

December 11, 2002 at 10:35 am

Here are some answers to questions about the upcoming facet articles:

1. It will take a while for us to cover all the material. Six months, maybe more. Depends on a few factors.

2. The first article will be posted next week.

3. The second article is in the works. You should see it in January.

4. We do hope to address terminology. It’s a big problem. The irony is that the field of controlled vocabularies lacks a controlled vocabulary. There is no consistent, clear use of terminology. We can’t fix this, but we will try to explain things.

5. We also hope to address the integration issues. This is a significant issue, as noted in the comments above. Our first priority is fundamentals independent of implementation. If we have the time and energy, we’ll deal with implementation.

6. This is evolutionary. We haven’t created a detailed outline. We want to be responsive to what people have questions about.

Hope this helps.

–karl
Mike Steckel says:

December 11, 2002 at 2:50 pm

I wanted to take a moment and build on what Karl said. I like his numbering thing, so I will use it also.

1. Terminology: I see confusion about this everywhere in the CV world. Often people ask about it as though they want to know the “correct answer.” There is an implicit “top-down” assumption at the core of this question, as though there is an authority out there who will say, “This is what a taxonomy is” or “This is what a thesaurus is.” I think there is an emerging consensus in the IA field, drawn from the polar bear book, the Blueprints book, Amy Warner, etc. The first article will reinforce the terms used in these places. Outside the IA world, where we do most of our communicating, people may use very different terms to describe the same thing. Here is where terminology becomes more “bottom up.” Once we understand the various ways to slice the question, translating terms from other fields becomes easier. Furthermore, it may improve the understanding that people from other fields take away after interacting with IAs. The question will not be resolved anytime soon, if such a thing is possible.

2. Implementation: I expected a lot of questions about implementation. Frankly, I would like the series to help people think about controlled vocabularies more deeply and wait a little before thinking about implementation. Not that we won’t address it at all, but a more complete understanding of the nature of controlled vocabularies will help an IA communicate with vendors, IT staff, etc., and make us less dependent on being fed information by others. Sometimes, I suspect, we implement too quickly. Oftentimes, the subtleties of CVs cause us to miss important ideas, or act on faulty assumptions, that we don’t discover until our urge to implement makes fixing them more difficult. The other problem with discussing implementation is that each conversation is extremely dependent on a particular situation as far as technology, strategy, and users are concerned. It becomes difficult to compare oranges with oranges.

I wouldn’t be surprised if the best part of the series takes place here in the discussion area!

— Mike Steckel
M says:

December 12, 2002 at 10:00 am

Mike, on your second point, I have to somewhat disagree. I think understanding why you need the CV in the first place and what it means to create it for your system (tech, workflow, etc.) is very important. I agree the needs are not like comparing oranges to oranges. For business justification, you have to show the project’s real costs… otherwise, how will the business fully support you? Questions about ROI and value propositions will come up as well. I can’t imagine building a CV as just an intellectual exercise; I have to see the options for direct application to be on board.
Mike Steckel says:

December 12, 2002 at 12:44 pm

I should clarify a little. Why you need a CV in the first place is hugely important; I am in no way disagreeing with that. This series of articles and its discussion space are a unique place to slow down and think about the larger questions of CVs. I am not saying you should do this for a real-world CV. I am saying that every CV has a huge number of implementation variables, so many that writing and talking about CVs can get overwhelming. My theory is that this is why there are so few “How to create a CV” articles out there (though we have one on the way in the series). Each implementation answers a unique situation, which can make communication between projects, each with its own language, difficult. What is often lacking is an understanding of the larger philosophical questions that will make your CV better. Sometimes we move too quickly to “What kind of software should we use?” when we have not thoroughly considered “Which type of CV should we use?” If you consider the CV question from a variety of angles while reading the B&A articles, developing your business justifications should become easier once you get to your particular project.
ML says:

December 12, 2002 at 1:20 pm

Mike, I agree. The literature out there doesn’t really look at the bigger-picture questions you mention. With that in mind, the other question that always gets asked is how big and how deep this thing will be (scope-type questions).

Going through this thread, I’m getting totally excited about what these articles will provoke…
Mike Steckel says:

December 12, 2002 at 2:44 pm

Well now you are making me nervous!

As far as scope goes, I would review Karl’s comments above. We have an idea, but are going to try to stay open to new directions. It will be an improvisational piece in some ways too.
Jon says:

December 15, 2002 at 7:50 pm

“Information architects are fascinated with faceted classification and its application to information architecture problems.”

Quite honestly, I’m not in the least bit fascinated, because I don’t have a clue what said classification is. Your brief didn’t whet my appetite, because it didn’t give even the slightest clue of what you are talking about. Care to drop a hint? What is a faceted classification? I’m sure I’ve just stuck a big foot in my pseudo-intellectual mouth, but I really don’t care if I come across as an ignorant sap. I haven’t the first idea what this thread is about.

—

A small follow-up: try posting a comment in IE6 and leaving out your email address. Then, on the screen that appears next, try adding your email address … with JavaScript errors turned on, verbose. Such pain.
Karl Fast says:

December 16, 2002 at 6:01 am

Oh, I’m sorry. I wrote that IAs are fascinated with faceted classification because it’s a popular topic on SIGIA-L and in other IA venues. Of course you’re right: not everybody knows what faceted classification is.

The basic idea of classification is to organize things so that you can put similar things together. Think of classification in biology. In libraries, classification is used to arrange books so that related books sit together on the shelf. Find a book on backpacking and next to it will be other books on backpacking, canoeing, and other outdoor adventure activities. This is what the Dewey Decimal Classification system does.

The problem with traditional classification is that it assumes you’re arranging physical objects. That book on backpacking can only be in one place. There might be five places where it could go, but you have to pick the “best” one. That’s limiting.

Faceted classification is more suited to the flexibility of the computer environment. Instead of classifying the objects, you come up with concepts that describe these objects. Then you organize and arrange these concepts into what are called facets, and build relationships between the concepts. Don’t worry about the details of this; we’ll cover that in the articles.

The key point is that in faceted classification you’re not organizing things, you’re organizing the concepts which describe those things. This turns out to be harder than it sounds. But it’s also immensely useful for developing information architectures. We’re going to tell you all about it over the next few months.
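To make the contrast concrete, here is a minimal Python sketch, assuming a toy collection of three books and three invented facets (activity, season, format). The titles, facet names, terms, and the `find` helper are hypothetical illustrations, not part of any real classification scheme. On the shelf each book gets exactly one location; in the faceted version each book is described by concepts drawn from several facets and can be found through any combination of them.

```python
# Hypothetical toy example: titles, facets, and terms are illustrative only.

# Traditional classification: each book has exactly one place on the shelf.
shelf_location = {
    "Winter Backpacking Basics": "Outdoor recreation > Backpacking",
    "Canoe Routes of Ontario": "Outdoor recreation > Canoeing",
    "Trail Cooking": "Cookery > Outdoor cookery",
}

# Faceted classification: organize the concepts into facets first...
facets = {
    "activity": {"backpacking", "canoeing", "cooking"},
    "season": {"summer", "winter"},
    "format": {"guidebook", "how-to"},
}

# ...then describe each book with concepts from each facet (several allowed).
books = {
    "Winter Backpacking Basics": {
        "activity": {"backpacking"}, "season": {"winter"}, "format": {"how-to"},
    },
    "Canoe Routes of Ontario": {
        "activity": {"canoeing"}, "season": {"summer"}, "format": {"guidebook"},
    },
    "Trail Cooking": {
        "activity": {"cooking", "backpacking", "canoeing"},
        "season": {"summer", "winter"},
        "format": {"how-to"},
    },
}


def find(**criteria):
    """Return sorted titles matching every requested facet term (AND across facets)."""
    # Only terms listed in a facet are accepted.
    for facet, term in criteria.items():
        if term not in facets.get(facet, set()):
            raise ValueError(f"{term!r} is not a term in facet {facet!r}")
    return sorted(
        title
        for title, description in books.items()
        if all(term in description.get(facet, set()) for facet, term in criteria.items())
    )


print(shelf_location["Trail Cooking"])                 # one fixed place only
print(find(activity="backpacking"))                    # found via the activity facet
print(find(activity="canoeing", format="guidebook"))   # facets combined freely
```

Here `find(activity="backpacking")` returns both Trail Cooking and Winter Backpacking Basics, even though the shelf puts Trail Cooking under cookery. Rejecting terms that are not listed in a facet is only a rough stand-in for the "controlled" part of a controlled vocabulary.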

As mentioned in the introductory piece, we’re going to start with the more fundamental concept of controlled vocabularies and then work our way up to facets. The first article will be posted in a few days.
Vivian Bliss says:

December 16, 2002 at 9:32 am

Excellent! I look forward to the forthcoming articles and to participating in the discussions. I completely agree with Mike Steckel on the importance of understanding CVs before asking about software. In a software company, keeping people from jumping to the software question is a bit difficult. Sometimes the effort succeeded. Sometimes it did not.
Paul Lawson says:

December 27, 2002 at 1:30 pm

Marvellous! Especially the evolutionary way in which you, as a group, hope (by design) that you and your respondents will go about this.

David Landes (1999:201) spoke of three critical reasons why the Industrial Revolution succeeded where and when it did. The reasons appear to continue to apply, somewhat later, in this non-geographic space and asynchronous time.

1. Autonomy (emphasis by Landes) of intellectual enquiry.
2. Creation (my emphasis, his words) of a language of proof… understood across national and cultural boundaries.
3. The invention of invention… the routinization (his emphasis) of research and (my emphasis, his words) its diffusion.

You have commenced something valuable and my instinct is that ‘we’ will co-learn something necessary. Who the ‘we’ turn out to be must fascinate you. Which ‘ants’, which ‘sugar’, which pattern? I’ll be staying tuned.
Peggy Lillis says:

June 2, 2003 at 11:22 am

I’m looking forward to the series of articles. When will more be posted?
John Howe says:

October 29, 2003 at 6:15 am

Excellent, readable overview of CVs! Maybe you (or the B&A editors) could make the series more usable, though, by adding navigation at the top and bottom of each article/page: *numbered* page titles, with proposed future (in-progress) articles included and flagged as such. Browsing and skimming through the articles, I got lost and flailed around trying to figure out whether I had missed the culminating article on facets.

Thanks again for your *generous* contribution.
