18 thoughts on “What Is A Controlled Vocabulary?”

  1. The use of the word “ontology” to describe the creation of knowledge domains actually pre-dated the W3C. It was actively used in the AI field in the 80s (and probably earlier) by people designing expert systems, work which involves more than controlled vocabularies. In creating models, one is also creating theories about how the models will be designed (for example, describing how spatial and temporal concepts shall be represented). Many people active in ontology work for the past three decades have backgrounds in philosophy and mathematical logic. Some philosophers disdain “applied ontology” work, but it seems like a reasonable extension. The epistemological aspects are part of the knowledge elicitation processes within ontological engineering, so it’s all related. 🙂

  2. This is a great article! I’m going to ask my students to read it. I’m an instruction librarian and many of my students have a tough time with the concept of “controlled vocabulary.”

  3. It might be useful to point out that the term relationships you describe are specified in a formal standard: “Construction of Monolingual Thesauri” — ANSI Standard Z39.19.

    There are relatively inexpensive software applications to support construction and management of thesauri. Most, if not all, support the ANSI standard. See http://www.asindexing.org/site/thessoft.shtml.

    For those interested in “faceted classification,” Peter Van Dijck and I have just started a Yahoo mailing list at http://groups.yahoo.com/group/facetedclassification.


  4. Just so people know, we’ll be getting to standards later in the series (I think). And we’ll definitely be covering facets (which was the impetus for these articles).

    For now though, we’re covering first principles.


  5. While we are on the subject of vocabulary, I’m going to share my experience of reading this article…

    In Britain, and maybe other English-speaking countries, the ‘self-explanatory’ example of Gap’s labelling is actually bewildering. Pants are underwear in the UK. The US sub-set ‘trousers’ is the UK super-set for all the jeans, dungarees, slacks, etc. mentioned here. That’s for about 57 million people, probably more.

    I was gently amused by the images the article conjured, but then realised my point was not as trivial as I first thought. Once the science of pattern-making is explained, the next step is the art of choosing the appropriate content. ‘Pants’ is not a culturally sensitive choice of example… but it does raise the question of how one deals best with clusters of users holding not only variant, but contradictory, semantic hierarchies.

    I look forward to acknowledgement of cultural aspects in the promised future column on building CVs.

  6. Excellent point, Ann! I even lived in England for a year and had an embarrassing incident involving a misunderstanding of the meaning of “pants” (which I will not replay here), and it still didn’t register with me when doing the article. It is very difficult to overcome our cultural understandings, and your point is far from trivial. Sites should probably assume their audience is a global one. It also raises a good point about user testing. Often, we test people from within our own culture and don’t realize we have used an example with a different meaning elsewhere until someone from another part of the world points it out to us. Thanks.

  7. One advantage of a controlled vocabulary is that you could easily switch terms such as trousers and pants, depending on whether the user prefers British or American English. I suppose one would require metadata to implement that kind of conversion, but you certainly can’t do it without a CV.

    Even without hierarchy, having a set of standard terms also makes information more accessible to people who don’t speak the language well, or are simply unfamiliar with the jargon. When I don’t know what’s going on, seeing consistent labels makes me more confident in my understanding and I trust the presentation much more.
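
A minimal sketch of the term switching described in comment 7, assuming each concept in the vocabulary has a stable identifier and one preferred label per locale. The concept IDs, locale codes, and the `label_for` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical controlled vocabulary: one stable ID per concept,
# one preferred label per locale. IDs and labels are illustrative only.
PREFERRED_LABELS = {
    "garment.trousers": {"en-US": "pants", "en-GB": "trousers"},
    "garment.underwear": {"en-US": "underwear", "en-GB": "pants"},
}


def label_for(concept_id: str, locale: str, fallback: str = "en-US") -> str:
    """Return the preferred label for a concept in the given locale.

    Falls back to the `fallback` locale when no label exists for `locale`.
    """
    labels = PREFERRED_LABELS[concept_id]
    return labels.get(locale, labels[fallback])


print(label_for("garment.trousers", "en-US"))
print(label_for("garment.trousers", "en-GB"))
```

Because content is tagged with the concept ID rather than the word itself, the same word ("pants") can safely label two different concepts in two locales.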

  8. Controlled vocabularies are being used on web sites and in search engines to aid site search, and are also closely related to so-called “knowledge ontologies”, which are integral to the W3C’s Semantic Web initiative. See:



    Work on knowledge ontologies is also at an advanced stage in medical informatics. See (eg):

  9. The example may confuse those not accustomed to thinking carefully about hierarchies. Several kinds of pants, e.g., “Casual pants,” are shown as Narrower Terms under Men’s pants, Women’s pants, and Children’s pants. This treatment is both illogical and unhelpful to users. For example, if your user has navigated to “Men’s pants,” you don’t want to tell him/her that “Casual pants” are a kind of men’s pants, when casuals will include women’s and children’s too. Perhaps the authors are assuming a vocabulary would include three different terms that look identical but require the next broader term (e.g., “Men’s pants”) to specify their meaning. But if you’re operating this way, you’ve gone beyond controlled vocabulary into classifications and notations.

  10. Before being co-opted by people of the W3C, et al., the word ontology used to mean, and still means to me, the metaphysical study of the nature of being and existence. Ontology is related to metaphysics, which is the study of being and knowing, as well as epistemology, which is the philosophical theory of knowledge.

    What the W3C and others mean when they use (or, more accurately, misuse) the word is the phrase ‘knowledge domain’. A knowledge domain is a controlled vocabulary and associated phrasings, i.e., representation language, used to express the content of a particular domain or field of knowledge. When the W3C and others discuss ontology, they are not discussing ontology at all. Actually, they’re moving closer to the concept of epistemology.

    This confusion should be nipped in the bud. It’s akin to the mistaken use of the word terror, as in ‘terror attacks’ when people, whether in government or in media, mean ‘terrorist attacks’.

  11. This business about taxonomies today being different from Linnaean taxonomies is something I’ve read several times today in different places. It must be in somebody’s book that way, but it isn’t correct.

    The terminology I ran across on an EDS ecommerce site is taxon and infon. Taxons are decisions. They result in branches. And, infons, the leaves of the decision tree, are the things being sorted out into the categories established by the taxons.

    In Linnaean taxonomy, you have to ask the question “Hair?” to get to mammals. In your example, I have to ask “Man, woman, or child?” to get to Men’s Pants. So there is no difference in terms of what is going on.

    Hierarchy is a matter of a parent and child like car and engine. Hierarchies only get wide when there are decisions embedded in them. Even where I have a taxonomy like Truck, Ford Truck, Chevy Truck, I am asking Ford or Chevy. The only time you don’t ask a question is if there is only one child.

    In electronics wires can lead from one point to many. That is represented by the same kind of lines you drew between your entities in your taxonomy. Where they intersect constitutes a “wired OR.” This means that there is a decision constituted by some logic.

    We ask customers to navigate our taxonomies. They do this by asking questions. In e-commerce, shopping is navigation. Shopping is asking questions like: Where is the men’s department? Where are the pants? Does anything fit? Those same questions need to be in the taxonomy if our customers are going to have an experience congruent with their prior experience.

  12. In response to David and his comments about taxonomies, the word “taxonomy” is a problematic one in this field. Taxonomy has become sexy and somewhat generic. This has created much confusion (even I get confused). Personally, I’d like to fit the word with some concrete galoshes and toss it over a bridge.

    Taxonomy has been adopted by businesses to mean, roughly, a classification scheme. Now the purpose of a taxonomy is to classify and organize, and so this somewhat synonymous use of the word is understandable.

    However, classification in library and information science is a vast and rich subject that extends well beyond how a taxonomy, even a Linnaean taxonomy, classifies things. The famous Dewey Decimal system is basically a hierarchical classification system, but it’s not a taxonomy.

    In our articles we are talking about “controlled vocabularies.” A taxonomy is a type of CV, consisting of preferred terms, all of which are connected in a hierarchy or a polyhierarchy. Terms in a taxonomy may also exhibit associative relationships, but they don’t have to. If they do have associative relationships, we usually call the result a thesaurus.

    To put it a bit differently, the controlled vocabulary is the generic type of “classification” system (I use that word loosely here). Synonym rings, authority files, taxonomies, and thesauri are all types of controlled vocabularies. What separates them is the types of term relationships they support. The simplest is a synonym ring. The most sophisticated is a thesaurus. A taxonomy is at the high end.

    We are working on a glossary of terms for this field, organized as a hypertext controlled vocabulary. It should be published soon and should help clear a lot of this up. I hope!
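
The distinction drawn in comment 12 can be sketched roughly in code, assuming a vocabulary is modelled as a list of terms carrying three kinds of relationship: equivalence (variants), hierarchical (broader terms), and associative (related terms). The `Term` class and `vocabulary_type` function are hypothetical names, and the sketch deliberately lumps synonym rings and authority files together, which a fuller model would separate.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    preferred: str
    variants: set[str] = field(default_factory=set)  # equivalence relationships
    broader: set[str] = field(default_factory=set)   # hierarchical relationships
    related: set[str] = field(default_factory=set)   # associative relationships


def vocabulary_type(terms: list[Term]) -> str:
    """Name the vocabulary type by the richest relationship it uses."""
    if any(t.related for t in terms):
        return "thesaurus"
    if any(t.broader for t in terms):
        return "taxonomy"
    return "synonym ring or authority file"


pants = Term("pants", variants={"trousers", "slacks"})
jeans = Term("jeans", broader={"pants"})
belts = Term("belts", related={"pants"})

print(vocabulary_type([pants]))
print(vocabulary_type([pants, jeans]))
print(vocabulary_type([pants, jeans, belts]))
```

Allowing more than one entry in `broader` is what makes a polyhierarchy possible in this model.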


  13. I have just discovered your series of articles. Excellent writing! By the way: GAP has now implemented a search on its website. Unfortunately they are not using a controlled vocabulary… Search for “dungarees” and you will get the message “We’re sorry, there were no results found for ‘dungarees’ in all departments.”
    Kind regards, Felix
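
A minimal sketch of how a synonym ring could prevent the empty result described in comment 13, assuming the search expands each query to every term in its ring before matching. The ring contents, the catalog entries, and the function names are illustrative assumptions, not a description of how Gap's search works.

```python
# Hypothetical synonym rings: every term in a ring is treated as equivalent.
SYNONYM_RINGS = [
    {"jeans", "dungarees", "denims"},
    {"pants", "trousers", "slacks"},
]


def expand_query(query: str) -> set[str]:
    """Return the query plus all terms that share a synonym ring with it."""
    q = query.strip().lower()
    for ring in SYNONYM_RINGS:
        if q in ring:
            return set(ring)
    return {q}


def search(query: str, catalog: list[str]) -> list[str]:
    """Return catalog entries containing any term from the expanded query."""
    terms = expand_query(query)
    return [item for item in catalog if any(t in item.lower() for t in terms)]


catalog = ["Men's Classic Jeans", "Khaki Slacks"]
print(search("dungarees", catalog))
```

Substring matching is used here only to keep the sketch short; a real site search would match against indexed terms.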
