<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Boxes and Arrows &#187; Fred Leise</title>
	<atom:link href="http://boxesandarrows.com/author/fredleise/feed/" rel="self" type="application/rss+xml" />
	<link>http://boxesandarrows.com</link>
	<description>Boxes and Arrows is devoted to the practice, innovation, and discussion of design; including graphic design, interaction design, information architecture and the design of business.</description>
	<lastBuildDate>Tue, 18 Jun 2013 16:03:09 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>Content Analysis Heuristics</title>
		<link>http://boxesandarrows.com/content-analysis-heuristics/</link>
		<comments>http://boxesandarrows.com/content-analysis-heuristics/#comments</comments>
		<pubDate>Mon, 12 Mar 2007 07:40:36 +0000</pubDate>
		<dc:creator>Fred Leise</dc:creator>
				<category><![CDATA[Deliverables]]></category>
		<category><![CDATA[Deliverables and Documentation]]></category>
		<category><![CDATA[Methods]]></category>
		<category><![CDATA[Process and Methods]]></category>

		<guid isPermaLink="false">http://boxesandarrows.com/content-analysis-heuristics/</guid>
		<description><![CDATA[Many Web professionals consider content inventories critical parts of most projects. Are there certain specific things to look for during a content inventory? Fred Leise definitely thinks so. He proposes a set of content analysis heuristics and discusses how to utilize each one.]]></description>
				<content:encoded><![CDATA[<p>Most website designers are aware that an important part of understanding the background of any website redesign project is performing a content inventory as well as a content analysis.</p>
<p>After all, authorities Lou Rosenfeld and Peter Morville include this famous Venn diagram in their classic <i>Information Architecture for the World Wide Web</i>:</p>
<p><img width="277" height="209" align="right" src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/content-analysis/leise.iavenn.jpg" alt="leise.iavenn.jpg" /></p>
<p>Clearly, we are supposed to understand the current website content before we begin the process of redefining and reorganizing the website.</p>
<p>So we all dutifully go through the website and prepare a content inventory spreadsheet capturing page titles, details of page content, and so on. <img width="574" height="542" align="right" src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/content-analysis/leise.contentinventory.sample.jpg" alt="leise.contentinventory.sample.jpg" title="leise.contentinventory.sample.jpg" /></p>
<p>Each content inventory contains a different set of columns and fields; each has a purpose specific to the needs of the particular site being analyzed. Sarah Rice has developed <a href="http://iainstitute.org/tools/download/RiceContentInventory.xls">another example(xls)</a>  that&rsquo;s available as part of <a href="http://iainstitute.org/tools/">the IAInstitute&rsquo;s tools project</a>.</p>
<p>Sarah&rsquo;s version captures additional information from the site, uses an indented format for capturing the page titles at different hierarchical levels and uses color coding to indicate content types, external links and open questions.</p>
<p>So doing a content inventory is all well and good, but what exactly is it about the content that we are supposed to understand? What are we supposed to tell our client, other than that the website has 4,321 pages, of which 358 are dead-ends, 427 have no page titles, 27 have content that has expired, there are 432 different document templates in use, and there are 17 distinct document types?</p>
<p>In her 2002 article on rearchitecting the PeopleSoft website<sup><a href="#fn1">1</a></sup>, Chiara Fox noted that document inventories and analyses form part of bottom-up IA. &ldquo;It deals with the individual documents and files that make up the site, or in the case of a portal, the individual sub-sites. Bottom-up methods look for the relationships between the different pieces of content, and uses metadata to describe the attributes found. They allow multiple paths to the content to be built.&rdquo;</p>
<p>Certainly content relationships are important, as is the development of appropriate metadata to describe content, but are there specific things we can look for during a content inventory? In the remainder of this article, I hope to show that the answer is a resounding &ldquo;Yes.&rdquo;</p>
<h3>Content Analysis Heuristics</h3>
<p>
In the fall of 2006, I was working on a navigation taxonomy project for a major media industry client that was redesigning its public-facing website. It was while preparing the content analysis report for that client that I developed the following set of 11 heuristics for analyzing website content.</p>
<ul>
<li>Collocation</li>
<li>Differentiation</li>
<li>Completeness</li>
<li>Information scent</li>
<li>Bounded horizons</li>
<li>Accessibility</li>
<li>Multiple access paths</li>
<li>Appropriate structure</li>
<li>Consistency</li>
<li>Audience-relevance</li>
<li>Currency</li>
</ul>
<p>These heuristics provide an important way to organize my report and help me identify significant problems that I might not otherwise notice. They provide qualitative results and indicate general trends, but are not statistically valid in the strict sense.</p>
<p>While you can use heuristics for any kind of website or intranet, regardless of size or content, certain heuristics may be less applicable for some sites. For example, a game site that is designed to encourage users&rsquo; exploration may not present bounded horizons. In fact, it would be doing gamers a disservice to let them know the entire game path from the start. So some evaluation is necessary as to whether or not (or how strongly) a specific heuristic should apply to the site you are designing.</p>
<p>Each of these heuristics will be discussed in detail in turn.</p>
<h3>Collocation</h3>
<p><i>Bring together items with similar content or items about the same topic in one area.</i></p>
<p>Users should be able to find all relevant content easily. Accordingly, collect related content in one area, or at the least, make it accessible through one area.  While the exact way content is related may differ (e.g., by document type, by subject, by author, by date), the information that users will want to find in one place should be in one place.</p>
<p>Obviously, if the quantity of content is large enough, users may have to visit different subsections to view all of the related content. In that case, the content organization itself should make it easy for users to understand which areas are related and how. When those areas are viewed together, they will provide a unified picture of the product or subject of interest.</p>
<p>The important point here is to not have &ldquo;dangling&rdquo; content that lives in one area perhaps because of historical growth of the website, while most of the related content is accessible in another area.</p>
<h3>Differentiation</h3>
<p><i>Place dissimilar items or items about different subject areas in different content areas. Use navigation labels for different areas that clearly indicate those differences.</i></p>
<p>One of the typical ways that websites break this guideline is in the use of Frequently Asked Questions. FAQs often bring together a wide variety of topics on issues that are important for users. Perhaps website creators think they are making it easier for users to find information when they put everything &ldquo;important&rdquo; in one place.</p>
<p>The problem for the user is that their search for specific information becomes like looking for the proverbial needle in a haystack. Unless FAQs use a well-thought-out topical arrangement, users may have to read through every question in a long list to find the particular information they are looking for. How much better it would be to separate this content into meaningful sections!</p>
<p>The World Bank&rsquo;s website is one example of an organized set of FAQs. They use four main topics and clearly identify secondary subject areas for each. Yet even this example is not totally successful in using a good topical arrangement, as the &ldquo;Ask the Expert&rdquo; section contains the usual miscellany of important information without topic differentiation.</p>
<p style="text-align: center;"><img width="399" height="395" src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/content-analysis/leise.worldbankfaq.jpg" alt="leise.worldbankfaq.jpg" /><br />
<a href="http://web.worldbank.org/WBSITE/EXTERNAL/EXTSITETOOLS/0,,contentMDK:20062181~pagePK:98400~piPK:98424~theSitePK:95474,00.html">&ldquo;World Bank Website: <span class="caps">FAQ</span> Section, February 2007&rdquo;</a></p>
<h3>Completeness</h3>
<p><i>All content mentioned or linked to should exist.</i></p>
<p>In this day and age, there is no excuse for the 404 Error, Page Not Found. Nor is there any excuse for the &ldquo;Under Construction&rdquo; sign on a page. If the content doesn&rsquo;t exist, don&rsquo;t lead the user to where it might be sometime in the future.</p>
<p>If you mention a related topical area, be sure that content is actually on the website. Directing users to non-existent information simply breaks their trust in the website.</p>
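<p>As a rough illustration, the completeness check can be run against the content inventory itself. The sketch below assumes a hypothetical inventory that maps each page to the internal links found on it; the page paths and the <code>find_missing_targets</code> helper are invented for the example, and it makes no network requests.</p>

```python
# Hypothetical inventory: each page path maps to the internal links found on it.
inventory = {
    "/": ["/products/", "/about/", "/news/"],
    "/products/": ["/products/widget/", "/support/"],
    "/about/": ["/"],
    "/products/widget/": ["/products/", "/accessories/"],
}


def find_missing_targets(pages):
    """Return {source_page: [links whose target is not in the inventory]}."""
    known = set(pages)
    missing = {}
    for source, links in pages.items():
        broken = [link for link in links if link not in known]
        if broken:
            missing[source] = broken
    return missing


report = find_missing_targets(inventory)
for source, broken in sorted(report.items()):
    print(source, "links to missing content:", ", ".join(broken))
```

<p>Each entry in the report is a candidate 404 or a promise of content that does not exist. A real audit would also need to confirm that every inventoried page actually responds.</p>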
<h3>Information Scent</h3>
<p><i>Content labels should be appropriately descriptive of content so that users know they are on the proper path to finding the information they are looking for. Content labels should therefore also reflect information collocation and differentiation.</i></p>
<p>The idea of information scent was first developed by Peter Pirolli, Stuart K. Card, and Mija M. Van Der Wege of the famous Xerox Palo Alto Research Center (PARC). In their paper [2], they note that, &ldquo;Information scent is provided by the proximal cues perceived by the user that indicate the value, cost of access, and location of distal information content. In the context of foraging for information on the World Wide Web, for example, information scent is often provided by the snippets of text and graphics that surround links to other pages. The proximal cues provided by these snippets give indications of the value, cost, and location of the distal content on the linked page.&rdquo;</p>
<p>Simply put, a good website will provide users with strong clues as to the content that can be found by clicking on a specific link.</p>
<p>In his Alertbox column of June 30, 2003 [3], Jakob Nielsen says, &ldquo;ensure that links and category descriptions explicitly describe what users will find at the destination. Faced with several navigation options, it&rsquo;s best if users can clearly identify the trail to the prey and see that other trails are devoid of anything edible.</p>
<p>&ldquo;Don&rsquo;t use made-up words or your own slogans as navigation options, since they don&rsquo;t have the scent of the sought-after item. Plain language also works best for search engine visibility: searching provides a literal match between the words in the user&rsquo;s mind and the words on your site.&rdquo;</p>
<h3>Bounded Horizons</h3>
<p><i>A site&rsquo;s users should be able to easily understand the breadth of content they are looking at.</i></p>
<p>While a labyrinthine website that leads users along a single, linear path through groves of rambling information might be appropriate for a conceptual artist&rsquo;s site, such a principle for organizing content is useless in most cases.</p>
<p>Users should be able to identify in relatively short order the depth and breadth of relevant content. <br />
Providing good navigation cues and a strong hierarchical structure when appropriate means that users quickly learn how long their search for information may take. They can thus make an informed decision whether to continue content exploration on your site or to abandon ship and continue elsewhere.</p>
<h3>Accessibility</h3>
<p><i>Users should be able to access the content they want through the browsing hierarchy or by using search.</i></p>
<p>It may seem obvious, but I have seen sites where search is so poor and navigation hierarchy so limited that it is hit-or-miss whether users can find what they seek. Often, information is hidden by contextual links to content areas not exposed in the main navigation. You are no doubt devoting considerable time and effort to creating content. Let users find it.</p>
<h3>Multiple Access Paths</h3>
<p><i>Because users think about content in different ways, they should be able to take multiple paths to get to specific content.</i></p>
<p>Facets provide one of the important ways to provide multiple paths to content. I&rsquo;m looking for a blue coat to go with my gray suit. Or I want a wool sweater, because my cotton one won&rsquo;t cut it in Boulder, Colorado. My wife says it <i>has</i> to be Prada. Size, color, material, designer: each can be the most important way for someone to find an item or some content.</p>
<p>While faceted navigation schemes are often useful for e-commerce sites, they can also be especially useful for information-rich sites. You might provide search filters by document type, date, or author in addition to subject. For scientists, methodology or researcher often become more important than subject in finding relevant research papers.</p>
<p>Multiple access paths provide greater findability for more users.</p>
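<p>To make the idea concrete, here is a minimal Python sketch of faceted filtering. The catalog records, the &ldquo;Acme&rdquo; designer, and the <code>filter_by_facets</code> helper are assumptions made for illustration; only the facet names come from the example above.</p>

```python
# Hypothetical catalog; each key other than "name" is a facet.
catalog = [
    {"name": "Pea coat", "color": "blue", "material": "wool", "designer": "Prada"},
    {"name": "Crew sweater", "color": "gray", "material": "wool", "designer": "Acme"},
    {"name": "T-shirt", "color": "blue", "material": "cotton", "designer": "Acme"},
]


def filter_by_facets(items, **facets):
    """Keep the items whose values match every requested facet."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in facets.items())
    ]


# Different users start from different facets and can still reach the same item.
by_color = filter_by_facets(catalog, color="blue")
by_material = filter_by_facets(catalog, material="wool")
both = filter_by_facets(catalog, color="blue", material="wool")
```

<p>Because no facet is privileged, the shopper who starts with color and the one who starts with material can both arrive at the same coat.</p>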
<h3>Appropriate Structure</h3>
<p><i>Organization of content should (1) match users&rsquo; mental models of the information space and (2) support the differences in users&rsquo; information-seeking behaviors: known-item finding; exploratory browsing; unknown information finding; and refinding.</i></p>
<p>Whether you have multiple access paths or a single hierarchy, the organization and structure of your site should be appropriate to both the nature of the content and to your users.<br />
As with many of these heuristics, there is no single &ldquo;best&rdquo; approach. Rather, based on your knowledge of business context, users, and content, determine whether content access structures are valid for the specific context.</p>
<h3>Consistency</h3>
<p><i>Whenever possible, content structures in similar content areas should be consistent.</i></p>
<p>If all of your products have accessories, they should be accessible through similar links or tabs or icons. Consistency enables users to more quickly build a mental model of your site and to understand how to find information.</p>
<p>Think of the rather complex page structure on Amazon.com for a book:</p>
<ul>
<li>Cover illustration</li>
<li>Title</li>
<li>Author</li>
<li>List price</li>
<li>Savings</li>
<li>Availability</li>
<li>Delivery information</li>
<li>New/used copies</li>
<li>Customers also bought</li>
<li>Editorial reviews</li>
<li>Product details</li>
<li>What customers ultimately buy</li>
<li>Help others find this book</li>
<li>Customer tags</li>
<li>Customer reviews</li>
<li>Customer discussion</li>
<li>Listmania</li>
<li>Recently viewed items</li>
<li>Similar items by category</li>
<li>Similar items by subject</li>
</ul>
<p>Who in their right mind would create such a structure? Obviously people who did lots of research on their users. Why does this structure work? Because once we have seen it, we know that we will see it again and again and again. Power users of Amazon.com probably know exactly how many turns of their mouse&rsquo;s scroll wheel it takes to get to the information they want.</p>
<p>This book product page may be long and complex, but it is consistent in structure and format. We know what to expect. A good website provides users with a consistent experience.</p>
<h3>Audience-Relevance</h3>
<p><i>Content organization allows different audience segments to easily find relevant content.</i></p>
<p>This heuristic is especially important if your site&rsquo;s audience comprises multiple distinct segments, such as holiday travelers and business travelers, or students and faculty. In some cases, it might be appropriate to use audience segment as the primary way to organize information.</p>
<p>Additionally, audience relevance may be legally mandated. Drug websites, for instance, are governed by <span class="caps">FDA</span> regulations dictating that prescribing information should be available only to health-care professionals, not the general public.</p>
<p>However, even with a relatively unitary audience, you want to be sure that the site&rsquo;s labeling system is appropriate for its users. It is also important that the site mirror how users think about the site&rsquo;s content.</p>
<h3>Currency</h3>
<p><i>Content should be kept up to date.</i></p>
<p>Nothing frustrates a user more than finding that the information you provide is out of date: you don&rsquo;t make that product any more, that color is out of stock, or that drug is no longer indicated for that condition.</p>
<p>Put an expiration date on all content through your <span class="caps">CMS</span>, thus ensuring that it is reviewed for currency on a regular basis. That is a good way to ensure that your website provides users with information that is still valid.</p>
<p>Another way to ensure currency is to have a good website maintenance plan in place. Such a plan should cover, among other things: who is responsible for content reviews, extraordinary internal and external events that should automatically trigger a content review, and how users or content authors can suggest a review.</p>
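<p>A minimal sketch of such a review check follows. It assumes a hypothetical CMS export in which every record carries a <code>review_by</code> date; the field name, the sample records, and the <code>due_for_review</code> helper are all invented for the example and do not describe any particular CMS.</p>

```python
from datetime import date

# Hypothetical CMS export: each record carries the date by which it must be reviewed.
content = [
    {"url": "/products/widget/", "review_by": date(2007, 1, 15)},
    {"url": "/about/", "review_by": date(2007, 6, 30)},
    {"url": "/drugs/indications/", "review_by": date(2006, 11, 1)},
]


def due_for_review(records, today):
    """Return the URLs whose review date has passed, most overdue first."""
    overdue = [record for record in records if record["review_by"] <= today]
    overdue.sort(key=lambda record: record["review_by"])
    return [record["url"] for record in overdue]


overdue_urls = due_for_review(content, today=date(2007, 3, 12))
print(overdue_urls)
```

<p>Passing <code>today</code> in as a parameter keeps the check repeatable, which is useful when the list feeds a scheduled maintenance report.</p>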
<h3>Conclusion</h3>
<p>Although the above eleven heuristics provide good qualitative information, you may find it helpful to add a five-point scale (derived from the Likert scale), indicating how well the site under analysis conforms to the heuristic:<br />
1. Strongly deviates from the heuristic<br />
2. Deviates from the heuristic<br />
3. Neither deviates nor conforms to the heuristic<br />
4. Conforms to the heuristic<br />
5. Strongly conforms to the heuristic</p>
<p>Providing such a scale may help the client understand the results of your content analysis better than a purely descriptive report.</p>
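<p>One possible way to tabulate such ratings is sketched below in Python. The scale labels are the ones listed above; the sample ratings and the <code>summarize</code> helper are hypothetical and serve only to show the weakest areas surfacing first.</p>

```python
# The five-point scale described above.
SCALE = {
    1: "Strongly deviates from the heuristic",
    2: "Deviates from the heuristic",
    3: "Neither deviates nor conforms to the heuristic",
    4: "Conforms to the heuristic",
    5: "Strongly conforms to the heuristic",
}

# Hypothetical ratings for an imaginary site.
ratings = {
    "Collocation": 4,
    "Differentiation": 2,
    "Information scent": 2,
    "Audience-relevance": 3,
}


def summarize(scores):
    """Return (heuristic, score) pairs, lowest score first, then by name."""
    for heuristic, score in scores.items():
        if score not in SCALE:
            raise ValueError("score for " + heuristic + " must be 1 to 5")
    return sorted(scores.items(), key=lambda pair: (pair[1], pair[0]))


summary = summarize(ratings)
for heuristic, score in summary:
    print(heuristic + ":", score, "-", SCALE[score])
```

<p>Sorting by score puts the heuristics that most need attention at the top of the report. The numbers remain qualitative judgments, not measurements.</p>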
<p>Whether you use a numerical scale in discussing the results or not, provide your client with a written content analysis heuristics report. You can offer the analysis as part of your content inventory or content analysis report, or you can create a separate document entirely. The report should include sections describing your evaluation of the website using each of the heuristics (if applicable). In discussing each heuristic, indicate how well the site meets the heuristic in general and then note instances for improvement, or places where the site does not conform to the heuristic at all.</p>
<p>The following are several sections from an actual content analysis report that used these heuristics (modified to mask the company&rsquo;s identity).</p>
<p style="margin-left: 40px;">Although [company].com is relatively good at gathering like content into one area, there are a number of exceptions. For example, information on money and vacations is available as a content sub-area under both the Money and the Vacations topical areas. However, the content is different in each place. In essence, there are two separate areas dealing with the same subject of money and vacations.</p>
<p style="margin-left: 40px;">The most significant problem area with regards to collocation is the Specials section, which offers much content that would be best distributed among and combined with other areas of the site.</p>
<p style="margin-left: 40px;">&hellip;</p>
<p style="margin-left: 40px;">The [company].com Specials section is the primary place where the principle of differentiation is not observed. It combines subject areas such as health, relationships and travel, along with a number of the company&rsquo;s special projects. Because this content area contains such disparate information, users may not always spend enough time looking through it to find relevant information.</p>
<p style="margin-left: 40px;">&hellip;</p>
<p style="margin-left: 40px;">Because labels on the website often reflect a supportive and encouraging emotive vocabulary, those labels sometimes obscure important information. For example, it is doubtful that a user looking at &ldquo;Tips for Living&rdquo; would realize that there is information on home decoration and time management in that section.</p>
<p style="margin-left: 40px;">&hellip;</p>
<p style="margin-left: 40px;">[Company].com generally supports exploratory browsing and unknown information finding. However, shortcomings in the search results (a limit of only 21 results) sometimes make it difficult for users to find specific information.</p>
<p style="margin-left: 40px;">&hellip;</p>
<p style="margin-left: 40px;">[Company].com is not always good at providing access to audience-relevant information. For example, teachers may be totally unaware of the fact that there are classroom videos and teaching aids available in the Library section.</p>
<p>By arming the client with such information, you give them more well-structured ideas about how to improve their website. And that, after all, is the goal of our work.</p>
<p><b>Endnotes</b><br />
<a name="fn1"></a>[1] Fox, Chiara, <a href="http://www.boxesandarrows.com/view/re_architecting_peoplesoft_com_from_the_bottom_up">&ldquo;Re-architecting PeopleSoft.com from the bottom-up&rdquo;</a> in Boxesandarrows.com, June 16, 2002.<br />
[2] Pirolli, Peter, Stuart K. Card and Mija M. Van Der Wege, <a href="http://www.inxight.com/pdfs/info_scent.pdf">&ldquo;The Effect of Information Scent on Searching Information Visualizations of Large Tree Structures&rdquo;</a>.<br />
[3] Nielsen, Jakob, <a href="http://www.useit.com/alertbox/20030630.html">&ldquo;Information Foraging: Why Google Makes People Leave Your Site Faster&rdquo;</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://boxesandarrows.com/content-analysis-heuristics/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Controlled Vocabularies: A Glosso-Thesaurus</title>
		<link>http://boxesandarrows.com/controlled-vocabularies-a-glosso-thesaurus/</link>
		<comments>http://boxesandarrows.com/controlled-vocabularies-a-glosso-thesaurus/#comments</comments>
		<pubDate>Mon, 27 Oct 2003 19:38:40 +0000</pubDate>
		<dc:creator>Fred Leise</dc:creator>
				<category><![CDATA[Methods]]></category>
		<category><![CDATA[Process and Methods]]></category>
		<category><![CDATA[Special topic: Search and Metadata]]></category>

		<guid isPermaLink="false">http://boxesandarrows.com/controlled-vocabularies-a-glosso-thesaurus/</guid>
		<description><![CDATA[In part 4 of the continuing series on controlled vocabularies and faceted classification, the authors present a glossary of terms to help cut through the verbiage often found in this field. And this glossary is more than just a list of terms. The glossary is itself a controlled vocabulary.]]></description>
				<content:encoded><![CDATA[<pullquote>&#8220;There is a singular lack of vocabulary control in the field of controlled vocabularies.&#8221;<br /><i>&#8212; Bella Hass Weinberg</i></pullquote><i>This is part 4 in our continuing series on controlled vocabularies and faceted classification. Previous parts in the series include:</p>
<p><a href="http://www.boxesandarrows.com/view/all_about_facets_controlled_vocabularies">All About Facets and Controlled Vocabularies</a> (series introduction)<br />1. <a href="http://www.boxesandarrows.com/view/what_is_a_controlled_vocabulary_">What is a Controlled Vocabulary?</a><br />2. <a href="http://www.boxesandarrows.com/archives/creating_a_controlled_vocabulary.php">Creating a Controlled Vocabulary</a><br />3. <a href="http://www.boxesandarrows.com/archives/synonym_rings_and_authority_files.php">Synonym Rings and Authority Files</a></i></p>
<p><span class="subhead">Introduction</span></p>
<p>&#8220;There is a singular lack of vocabulary control in the field of controlled vocabularies,&#8221; Bella Hass Weinberg, professor of library science at St. John&#8217;s University in New York, is fond of saying.</p>
<p>To help you cut through the maze of verbiage often found in this field, we have created a glossary of terms.</p>
<p>The glossary reflects our usage of terms in the articles of this series. But this glossary is more than just a list of terms. We wanted it to serve as an illustration of what a controlled vocabulary looks like (we are fond of killing multiple birds with multiple stones).</p>
<p>Accordingly, the glossary is itself a controlled vocabulary, more specifically a thesaurus. So you will find all of the standard features of any thesaurus: broader, narrower, and variant term indicators, as well as scope notes. In this case, however, the scope notes provide the definition of the particular glossary term being presented.</p>
<p><span class="subhead">Glosso-Thesaurus</span></p>
<p>The following standard abbreviations are used in the glosso-thesaurus.</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>BT</b> = Broader Term <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>NT</b> = Narrower Term <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>RT</b> = Related Term (&#8220;See also&#8221;) <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>SN</b> = Scope Note <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>UF</b> = Used For <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b><span class="caps">USE</span></b> = &#8220;See&#8221; (Refers reader from variant term to vocabulary term.)</p>
<hr />
<p><a name="alternateterm"></a><b>Alternate Term</b></p>
<ul><span class="caps">USE</span> <a href="#variantterm">Variant Term</a></ul>
<hr />
<p><a name="associativerelationship"></a><b>Associative Relationship</b></p>
<ul>SN The connection between related <a href="#vocabularyterm">vocabulary terms</a>. That is, related terms are connected through an associative relationship.</p>
<p>BT <a href="#termrelationship">Term Relationship</a><br />RT <a href="#equivalencerelationship">Equivalence Relationship</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#hierarchicalrelationship">Hierarchical Relationship</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#relatedterm">Related Term</a></ul>
</p>
<hr />
<p><a name="authorityfile"></a><b>Authority File</b></p>
<ul>SN A flat (non-hierarchical) list containing <a href="#preferredterm">preferred terms</a>. May include <a href="#variantterm">variant terms</a>. Essentially, an authority file is a <a href="#synonymring">synonym ring</a> with the preferred term identified for each concept.</p>
<p>BT <a href="#controlledvocabulary">Controlled Vocabulary</a><br />RT <a href="#synonymequivalencelist">Synonym Equivalence List</a></ul>
</p>
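<p>As an illustration of that last point, the following Python sketch builds an authority file from synonym rings by naming one preferred term per ring. The sample terms beyond &ldquo;cat,&rdquo; &ldquo;feline,&rdquo; and &ldquo;running shoe,&rdquo; and the <code>build_authority_file</code> helper, are hypothetical.</p>

```python
# Hypothetical synonym rings: each set groups terms that refer to one concept.
synonym_rings = [
    {"cat", "feline", "kitty"},
    {"running shoe", "sneaker", "trainer"},
]

# Naming one preferred term per ring turns the rings into an authority file.
preferred_terms = ["cat", "running shoe"]


def build_authority_file(rings, preferred):
    """Map every term, variant or preferred, to its preferred term."""
    lookup = {}
    for ring, term in zip(rings, preferred):
        if term not in ring:
            raise ValueError(term + " is not a member of its synonym ring")
        for variant in ring:
            lookup[variant] = term
    return lookup


authority = build_authority_file(synonym_rings, preferred_terms)
print(authority["feline"])
```

<p>A tagging or search tool could consult such a lookup to replace whatever variant a user types with the preferred term before matching content.</p>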
<hr />
<p><a name="broaderterm"></a><b>Broader Term</b></p>
<ul>SN The superordinate word in an inclusion or <a href="#hierarchicalrelationship">hierarchical relationship</a>. A class or category term. Abbreviated in displays as &#8220;BT.&#8221; The inverse of broader term is <a href="#narrowerterm">narrower term</a>. For example, &#8220;shoe&#8221; is a broader term than &#8220;running shoe.&#8221; Broader terms are sometimes referred to as &#8220;parent&#8221; terms. </p>
<p>UF Parent Term <br />RT <a href="#hierarchicalrelationship">Hierarchical Relationship</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#narrowerterm">Narrower Term</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#relatedterm">Related Term</a></ul>
</p>
<hr />
<p><a name="cardsorting"></a><b>Card Sorting</b></p>
<ul>SN An exercise that can be used to help create a <a href="#controlledvocabulary">controlled vocabulary</a>. In a card sort, users are asked to group cards into like categories or to name categories of like items. Card sorting can be used to compile lists of variant terms or to verify the relationships in a hierarchy. For additional information, see <a href="http://www.boxesandarrows.com/archives/cardbased_classification_evaluation.php">Card-Based Classification Evaluation</a> by Donna Maurer or the <a href="http://www.iawiki.net/CardSorting">IAWiki page on card sorting</a>. </p>
<p>RT <a href="#freelisting">Free Listing</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#hierarchy">Hierarchy</a></ul>
</p>
<hr />
<p><a name="childterm"></a><b>Child Term</b></p>
<ul><span class="caps">USE</span> <a href="#narrowerterm">Narrower Term</a></ul>
<hr />
<p><a name="controlledvocabulary"></a><b>Controlled Vocabulary</b></p>
<ul>SN A subset of <a href="#naturallanguage">natural language</a> that is used to tag documents and then to find content through navigation or search. Use of a controlled vocabulary increases consistency in tagging and can help match users&#8217; natural language with <a href="#preferredterm">preferred terms</a>. Abbreviated as &#8220;CV.&#8221; </p>
<p>Controlled vocabularies exhibit the following relationships:</p>
<p><a href="#synonymring">Synonym ring</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ <a href="#preferredterm">Preferred terms</a> =<br /><a href="#authorityfile">Authority file</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ <a href="#broaderterm">Broader</a> and <a href="#narrowerterm">narrower terms</a> =<br /><a href ="#taxonomy">Taxonomy</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ <a href="#relatedterm">Related terms</a> =<br /><a href="#thesaurus">Thesaurus</a></p>
<p>NT <a href="#authorityfile">Authority File</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#facetedclassification">Faceted Classification</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#synonymequivalencelist">Synonym Equivalence List</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#synonymring">Synonym Ring</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#taxonomy">Taxonomy</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#thesaurus">Thesaurus</a></p>
<p>RT <a href="#naturallanguage">Natural Language</a></ul>
</p>
<hr />
<p><a name="entryterm"></a><b>Entry Term</b></p>
<ul><span class="caps">USE</span> <a href="#variantterm">Variant Term</a></ul>
<hr />
<p><a name="equivalencerelationship"></a><b>Equivalence Relationship</b></p>
<ul>SN The connection between terms in a <a href="#synonymring">synonym ring</a>, or between <a href="#preferredterm">preferred terms</a> and <a href="#variantterm">variant terms</a>. Terms that exhibit an equivalence relationship refer to the same concept. For example, &#8220;cat&#8221; and &#8220;feline&#8221; are often considered as being equivalent. </p>
<p>BT <a href="#termrelationship">Term Relationship</a> <br />RT <a href="#associativerelationship">Associative Relationship</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#hierarchicalrelationship">Hierarchical Relationship</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#variantterm">Variant Term</a></ul>
</p>
<hr />
<p><a name="exhaustivity"></a><b>Exhaustivity</b></p>
<ul>SN The range of concept coverage of <a href="#vocabularyterm">vocabulary terms</a> in a <a href="#controlledvocabulary">controlled vocabulary</a>. If the vocabulary terms cover all of the concepts included in the content under consideration, then the controlled vocabulary is exhaustive. </p>
<p>RT <a href="#specificity">Specificity</a></ul>
</p>
<hr />
<p><a name="facet"></a><b>Facet</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">SN A</span></span></span></span> fundamental category by which an object or concept may be described. For example, a child&#8217;s ball may be described using the facets of size, weight, shape, color, texture, material and price. </p>
<p>RT <a href="#facetanalysis">Facet Analysis</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#facetedclassification">Faceted Classification</a></ul>
</p>
<hr />
<p><a name="facetanalysis"></a><b>Facet Analysis</b></p>
<ul>SN The process of analyzing content to determine appropriate <a href="#facet">facets</a> and <a href="#vocabularyterm">vocabulary term</a> <a href="#termrelationship">relationships</a>, using &#8220;one characteristic of division at a time, to produce homogeneous, mutually-exclusive groups.&#8221; * </p>
<p>RT <a href="#facet">Facet</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#facetedclassification">Faceted Classification</a></p>
<p>* Aitchison, Jean, Alan Gilchrist, and David Bawden (2002). Thesaurus Construction and Use: A Practical Manual. 4th ed. Chicago: Fitzroy-Dearborn, pg. 70.</ul>
</p>
<hr />
<p><a name="facetedclassification"></a><b>Faceted Classification</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">SN A</span></span></span></span> <a href="#controlledvocabulary">controlled vocabulary</a> that divides <a href="#vocabularyterm">vocabulary terms</a> into <a href="#facet">facets</a>. </p>
<p>BT <a href="#controlledvocabulary">Controlled Vocabulary</a></ul>
</p>
<hr />
<p><a name="freelisting"></a><b>Free Listing</b></p>
<ul>SN A method of vocabulary development in which users are asked to &#8220;name all the [x] you know.&#8221; Free listing can identify core terms in a <a href="#controlledvocabulary">controlled vocabulary</a>, as well as <a href="#variantterm">variant terms</a>. For additional information, see <a href="http://www.boxesandarrows.com/archives/beyond_cardsorting_freelisting_methods_to_explore_user_categorizations.php">Beyond cardsorting: Free-listing methods to explore user categorizations</a> by Rashmi Sinha.</p>
<p>RT <a href="#cardsorting">Card Sorting</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#userwarrant">User Warrant</a></ul>
</p>
<hr />
<p><a name="granularity"></a><b>Granularity</b></p>
<ul>SN The level of <a href="#specificity">specificity</a> with which content is described. The more granular, the more specific. </p>
<p>RT <a href="#specificity">Specificity</a></ul>
</p>
<hr />
<p><a name="hierarchy"></a><b>Hierarchy</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">SN A</span></span></span></span> collection of <a href="#vocabularyterm">vocabulary terms</a> that show levels of superordination and subordination. Hierarchies comprise <a href="#broaderterm">broader terms</a> and <a href="#narrowerterm">narrower terms</a>. Hierarchies may be tested using <a href="#cardsorting">card sorting</a>. </p>
<p>RT <a href="#cardsorting">Card Sorting</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#polyhierarchy">Polyhierarchy</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#taxonomy">Taxonomy</a></ul>
</p>
<hr />
<p><a name="hierarchicalrelationship"></a><b>Hierarchical Relationship</b></p>
<ul>SN The connection between <a href="#broaderterm">broader</a> and <a href="#narrowerterm">narrower terms</a> in a <a href="#taxonomy">taxonomy</a> or <a href="#thesaurus">thesaurus</a>. </p>
<p>BT <a href="#termrelationship">Term Relationship</a> <br />RT <a href="#associativerelationship">Associative Relationship</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#broaderterm">Broader Term</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#equivalencerelationship">Equivalence Relationship</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#narrowerterm">Narrower Term</a></ul>
</p>
<hr />
<p><a name="literarywarrant"></a><b>Literary Warrant</b></p>
<ul>SN The inclusion of a <a href="#vocabularyterm">vocabulary term</a> in a <a href="#controlledvocabulary">controlled vocabulary</a> based on its appearance in one or more content items. For example, a medical text may use the term &#8220;oncology.&#8221; Based on literary warrant, that term would be included in the controlled vocabulary even though the general public uses the term &#8220;cancer.&#8221; </p>
<p>RT <a href="#userwarrant">User Warrant</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#vocabularyterm">Vocabulary Term</a></ul>
</p>
<hr />
<p><a name="narrowerterm"></a><b>Narrower Term</b></p>
<ul>SN The subordinate word in an inclusion or <a href="#hierarchicalrelationship">hierarchical relationship</a>. A member or part. Abbreviated in displays as &#8220;NT.&#8221; For example, &#8220;running shoe&#8221; is a narrower term than &#8220;shoe.&#8221; Narrower terms are sometimes referred to as &#8220;child&#8221; terms. The inverse of narrower term is <a href="#broaderterm">broader term</a>. </p>
<p>UF Child Term <br />RT <a href="#broaderterm">Broader Term</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#hierarchicalrelationship">Hierarchical Relationship</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#relatedterm">Related Term</a></ul>
</p>
<hr />
<p><a name="naturallanguage"></a><b>Natural Language</b></p>
<ul>SN Language as it is spoken; language in everyday use. </p>
<p>RT <a href="#controlledvocabulary">Controlled Vocabulary</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#userwarrant">User Warrant</a></ul>
</p>
<hr />
<p><a name="nonpreferred"></a><b>Non-preferred Term</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">USE</span></span></span></span> <a href="#variantterm">Variant Term</a></ul>
<hr />
<p><a name="parentterm"></a><b>Parent Term</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">USE</span></span></span></span> <a href="#broaderterm">Broader Term</a></ul>
<hr />
<p><a name="polyhierarchy"></a><b>Polyhierarchy</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">SN A</span></span></span></span> <a href="#hierarchy">hierarchy</a> in which some <a href="#vocabularyterm">vocabulary terms</a> have more than one <a href="#broaderterm">broader term</a>. For example, &#8220;Rome&#8221; might be a narrower term under both &#8220;European capitals&#8221; and &#8220;Italian cities&#8221; in a geographic vocabulary. </p>
<p>RT <a href="#hierarchy">Hierarchy</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#taxonomy">Taxonomy</a></ul>
</p>
<hr />
<p><a name="precision"></a><b>Precision</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">SN A</span></span></span></span> ratio that measures the success of a search. Precision is defined mathematically as the number of relevant items returned by a search divided by the total number of items returned by the search. Thus, a search that returned only relevant items would have a precision of 1.0. </p>
<p>Precision usually has an inverse relationship to <a href="#recall">recall</a>. That is, increasing the precision of a search usually decreases the recall. Precision can be increased by increasing the <a href="#specificity">specificity</a> of <a href="#vocabularyterm">vocabulary terms</a>. For more information, see:</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="http://www.iawiki.net/action=browse&#38;id=RecallVsPrecision&#38;oldid=PrecisionVsRecall">IAWiki: &#8220;Recall vs. Precision&#8221;</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="http://www.tbray.org/ongoing/When/200x/2003/06/22/PandR">Ongoing: &#8220;On Search: Precision and Recall&#8221;</a></p>
<p>RT <a href="#recall">Recall</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#specificity">Specificity</a></ul>
</p>
<hr />
<p><a name="preferredterm"></a><b>Preferred Term</b></p>
<ul>SN The <a href="#vocabularyterm">vocabulary term</a> in a <a href="#controlledvocabulary">controlled vocabulary</a> used to tag content. </p>
<p>RT <a href="#broaderterm">Broader Term</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#narrowerterm">Narrower Term</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#variantterm">Variant Term</a></ul>
</p>
<hr />
<p><a name="recall"></a><b>Recall</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">SN A</span></span></span></span> ratio that measures the success of a search. Recall is defined mathematically as the number of relevant items returned by a search divided by the total number of relevant items in the collection. Thus, a search that returned all the relevant items in a collection would have a recall of 1.0. </p>
<p>Recall can be increased by the use of <a href="#synonymring">synonym rings</a> and <a href="#variantterm">variant terms</a>. Recall usually has an inverse relationship to <a href="#precision">precision</a>. That is, increasing the recall of a search usually decreases the precision. For more information, see:</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="http://www.iawiki.net/action=browse&#38;id=RecallVsPrecision&#38;oldid=PrecisionVsRecall">IAWiki: &#8220;Recall vs. Precision&#8221;</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="http://www.tbray.org/ongoing/When/200x/2003/06/22/PandR">Ongoing: &#8220;On Search: Precision and Recall&#8221;</a></p>
<p>RT <a href="#precision">Precision</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#variantterm">Variant Terms</a></ul>
</p>
<hr />
<p><a name="relatedterm"></a><b>Related Term</b></p>
<ul>SN <a href="#vocabularyterm">Vocabulary terms</a> in a <a href="#controlledvocabulary">controlled vocabulary</a> that are closely related. That is, they refer to closely related concepts. Abbreviated in displays as &#8220;RT.&#8221; Related terms may, for example, exhibit the following relationships: </p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;field of study/objects studied <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;operation/agent <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;action/product of action <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;concepts/properties <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;agent/counter-agent <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;concept/opposite</p>
<p>For additional information, see the section on associative relationships in <a href="http://www.boxesandarrows.com/archives/what_is_a_controlled_vocabulary.php">What is a Controlled Vocabulary?</a> by Karl Fast, Mike Steckel and Fred Leise.</p>
<p>UF &#8220;See Also&#8221; Term <br />RT <a href="#associativerelationship">Associative Relationship</a></ul>
</p>
<hr />
<p><a name="scopenote"></a><b>Scope Note</b></p>
<ul>SN (1) A definition of a <a href="#preferredterm">preferred term</a> in a <a href="#controlledvocabulary">controlled vocabulary</a>. (2) An indication of restrictions in meaning or other clarification needed for the proper use of the preferred term. Abbreviated in displays as &#8220;SN.&#8221; Examples of scope notes are provided throughout this glossary. </p>
<p>RT <a href="#preferredterm">Preferred Term</a></ul>
</p>
<hr />
<p><a name="seealso"></a><b>&#8220;See Also&#8221; Term</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">USE</span></span></span></span> <a href="#relatedterm">Related Term</a></ul>
<hr />
<p><a name="specificity"></a><b>Specificity</b></p>
<ul>SN The exactness with which a <a href="#vocabularyterm">vocabulary term</a> covers a concept. Thus, in considering the concept &#8220;dog,&#8221; the term &#8220;canine&#8221; is more specific than &#8220;animal.&#8221; Increasing specificity of vocabulary terms increases <a href="#precision">precision</a> and <a href="#granularity">granularity</a>, but may decrease <a href="#recall">recall</a>. </p>
<p>RT <a href="#exhaustivity">Exhaustivity</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#granularity">Granularity</a></ul>
</p>
<hr />
<p><a name="synonymequivalencelist"></a><b>Synonym Equivalence List</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">SN A</span></span></span></span> <a href="#synonymring">synonym ring</a> or an <a href="#authorityfile">authority file</a>.</p>
<p>BT <a href="#controlledvocabulary">Controlled Vocabulary</a> <br />RT <a href="#authorityfile">Authority File</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#synonymring">Synonym Ring</a></ul>
</p>
<hr />
<p><a name="synonymring"></a><b>Synonym Ring</b></p>
<ul>SN One of the simplest of <a href="#controlledvocabulary">controlled vocabularies</a>. Includes only a list of <a href="#equivalencerelationship">equivalent</a> terms. When one of the terms is searched, the synonym ring returns results as if the complete set of terms were searched. </p>
<p>BT <a href="#controlledvocabulary">Controlled Vocabulary</a><br />RT <a href="#equivalencerelationship">Equivalence Relationship</a><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#synonymequivalencelist">Synonym Equivalence List</a></ul>
</p>
<hr />
<p><a name="taxonomy"></a><b>Taxonomy</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">SN A</span></span></span></span> <a href="#controlledvocabulary">controlled vocabulary</a>, the <a href="#preferredterm">preferred terms</a> of which are all connected in a <a href="#hierarchy">hierarchy</a> or <a href="#polyhierarchy">polyhierarchy</a>. Terms in a taxonomy may exhibit <a href="#equivalencerelationship">equivalence</a> or <a href="#hierarchicalrelationship">hierarchical relationships</a>. </p>
<p>BT <a href="#controlledvocabulary">Controlled Vocabulary</a> <br />RT <a href="#hierarchicalrelationship">Hierarchical Relationship</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#hierarchy">Hierarchy</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#polyhierarchy">Polyhierarchy</a></ul>
</p>
<hr />
<p><a name="term"></a><b>Term</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">USE</span></span></span></span> <a href="#vocabularyterm">Vocabulary Term</a></ul>
<hr />
<p><a name="termrelationship"></a><b>Term Relationship</b></p>
<ul>SN The type of association between <a href="#vocabularyterm">vocabulary terms</a>. Terms may be broader, narrower, related or variant, exhibiting <a href="#hierarchicalrelationship">hierarchical</a>, <a href="#associativerelationship">associative</a> or <a href="#equivalencerelationship">equivalence relationships</a>. </p>
<p>NT <a href="#associativerelationship">Associative Relationship</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#equivalencerelationship">Equivalence Relationship</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#hierarchicalrelationship">Hierarchical Relationship</a> <br />RT <a href="#broaderterm">Broader Term</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#narrowerterm">Narrower Term</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#relatedterm">Related Term</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#variantterm">Variant Term</a></ul>
</p>
<hr />
<p><a name="thesaurus"></a><b>Thesaurus</b>; pl. Thesauri</p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">SN A</span></span></span></span> <a href="#controlledvocabulary">controlled vocabulary</a> that indicates <a href="#preferredterm">preferred terms</a> and <a href="#variantterm">variant terms</a>. In addition to the <a href="#equivalencerelationship">equivalence relationship</a>, <a href="#vocabularyterm">vocabulary terms</a> in a thesaurus exhibit both <a href="#hierarchicalrelationship">hierarchical</a> and <a href="#associativerelationship">associative relationships</a>. These three relationships are called &#8220;standard thesaural relationships.&#8221; Thesauri are usually considered the most complex of controlled vocabularies. </p>
<p>BT <a href="#controlledvocabulary">Controlled Vocabulary</a> <br />RT <a href="#associativerelationship">Associative Relationship</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#equivalencerelationship">Equivalence Relationship</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#hierarchicalrelationship">Hierarchical Relationship</a></ul>
</p>
<hr />
<p><a name="userwarrant"></a><b>User Warrant</b></p>
<ul>SN The inclusion of a <a href="#vocabularyterm">vocabulary term</a> in a <a href="#controlledvocabulary">controlled vocabulary</a> based on use by users. Such terms can be identified through search log analysis or <a href="#freelisting">free listing</a>. </p>
<p>RT <a href="#freelisting">Free Listing</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#literarywarrant">Literary Warrant</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#vocabularyterm">Vocabulary Term</a></ul>
</p>
<hr />
<p><a name="variantterm"></a><b>Variant Term</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">SN A</span></span></span></span> <a href="#vocabularyterm">vocabulary term</a> that means nearly the same thing as a <a href="#preferredterm">preferred term</a>. Variant terms are used in the <a href="#controlledvocabulary">controlled vocabulary</a> to provide entry terms that lead to preferred terms. Variant terms may include synonyms, lexical variants, quasi-synonyms and abbreviations. Variant terms are sometimes referred to as &#8220;entry terms.&#8221; The collection of all variant terms may be referred to as the &#8220;entry vocabulary.&#8221; </p>
<p>UF Alternate Term <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Entry Term <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Non-preferred Term <br />RT <a href="#equivalencerelationship">Equivalence Relationship</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#preferredterm">Preferred Term</a></ul>
</p>
<hr />
<p><a name="vocabularyterm"></a><b>Vocabulary Term</b></p>
<ul><span class="caps"><span class="caps"><span class="caps"><span class="caps">SN A</span></span></span></span> word or phrase in a <a href="#controlledvocabulary">controlled vocabulary</a>. It may be a <a href="#preferredterm">preferred term</a> or <a href="#variantterm">variant term</a>. Vocabulary terms may exhibit several types of <a href="#termrelationship">term relationships</a>. </p>
<p>UF Term <br />NT <a href="#preferredterm">Preferred Term</a> <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#variantterm">Variant Term</a></ul>
<p><end></end>
<p><img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/art_end.gif" alt="" title="" width="8" height="8" /></p>
<p><morebox><br /><b>References</b>
<ul>
<li><a href="http://www.boxesandarrows.com/archives/all_about_facets_controlled_vocabularies.php">All About Facets and Controlled Vocabularies</a> (series introduction)</li>
<li><a href="http://www.boxesandarrows.com/archives/what_is_a_controlled_vocabulary.php">What is a Controlled Vocabulary?</a></li>
<li><a href="http://www.boxesandarrows.com/archives/creating_a_controlled_vocabulary.php">Creating a Controlled Vocabulary</a></li>
<li><a href="http://www.boxesandarrows.com/archives/synonym_rings_and_authority_files.php">Synonym Rings and Authority Files</a></li>
</ul>
<p><b>Bibliography</b>
<ul>
<li>Pidcock, Woody (2003). <a href="http://www.metamodel.com/article.php?story=20030115211223271">&#8220;What are the differences between a vocabulary, a taxonomy, a thesaurus, an ontology, and a meta-model?&#8221;</a>.</li>
<li>Hagedorn, Kat (2000). <a href="http://argus-acia.com/white_papers/ia_glossary.pdf">The Information Architecture Glossary. Ann Arbor, MI: Argus Center for Information Architecture</a>.</li>
<li>IA Wiki (2003). <a href="http://www.iawiki.net/IAGlossary">IA Glossary</a>.</li>
</ul>
<p></morebox></p>
<p><biobox><a href="http://www.boxesandarrows.com/people/archives/karl_fast.php">Karl Fast</a> is a PhD student in library and information science at the University of Western Ontario. He also has a master&#8217;s in <span class="caps"><span class="caps"><span class="caps"><span class="caps">LIS</span></span></span></span>. His graduate work has included courses on organization of information, subject analysis, thesaurus construction, and facet analysis.</p>
<p><a href="http://www.boxesandarrows.com/people/archives/fred_leise.php">Fred Leise,</a> president of <a href="http://www.contextualanalysis.com">ContextualAnalysis, <span class="caps"><span class="caps"><span class="caps"><span class="caps">LLC</span></span></span></span>,</a> is an information architecture consultant providing services in the areas of content analysis and organization, user experience design, taxonomy and thesaurus creation, and website and back-of-book indexing.</p>
<p><a href="http://www.boxesandarrows.com/people/archives/mike_steckel.php">Mike Steckel</a> is an Information Architect/Technical Librarian for International <span class="caps"><span class="caps"><span class="caps"><span class="caps">SEMATECH</span></span></span></span> in Austin, TX. </biobox></p>
]]></content:encoded>
			<wfw:commentRss>http://boxesandarrows.com/controlled-vocabularies-a-glosso-thesaurus/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Synonym Rings and Authority Files</title>
		<link>http://boxesandarrows.com/synonym-rings-and-authority-files/</link>
		<comments>http://boxesandarrows.com/synonym-rings-and-authority-files/#comments</comments>
		<pubDate>Tue, 26 Aug 2003 21:08:53 +0000</pubDate>
		<dc:creator>Fred Leise</dc:creator>
				<category><![CDATA[Findability]]></category>
		<category><![CDATA[Methods]]></category>
		<category><![CDATA[Process and Methods]]></category>
		<category><![CDATA[Special topic: Search and Metadata]]></category>

		<guid isPermaLink="false">http://boxesandarrows.com/synonym-rings-and-authority-files/</guid>
		<description><![CDATA[In part 3 of the continuing series on controlled vocabularies and faceted classification, the authors explain synonym rings and authority files and how their use can bridge the gap between natural language and complex controlled vocabularies (taxonomies and thesauri).]]></description>
				<content:encoded><![CDATA[<pullquote>
<p>&#8220;Synonym rings and authority files are simple tools that can bridge the gap between natural language and complex controlled vocabularies (taxonomies and thesauri) quite nicely.&#8221;</p>
</pullquote>
<p><em>This is Part 3 in our continuing series on controlled vocabularies and faceted classification. Previous parts in the series include:</em></p>
<ul class="nobullets">
<li><a href="http://www.boxesandarrows.com/view/all_about_facets_amp_controlled_vocabularies">All About Facets and Controlled Vocabularies</a> (series introduction)</li>
<li>1. <a href="http://www.boxesandarrows.com/view/what_is_a_controlled_vocabulary_">What is a Controlled Vocabulary?</a></li>
<li>2. <a href="http://www.boxesandarrows.com/view/creating_a_controlled_vocabulary">Creating a Controlled Vocabulary</a></li>
</ul>
<p>As any connoisseur of duct tape knows, when you need to get a job done, the simplest tool is often your best friend. This is as true for controlled vocabularies (CVs) as it is for home repair. Remember that <a href="http://www.boxesandarrows.com/archives/what_is_a_controlled_vocabulary.php">our goal for CVs</a> is to &#8220;impose some order to facilitate agreement between the concepts within the site and the vocabulary of the person [natural language] using it.&#8221;</p>
<p>But that doesn&#8217;t mean the CV has to be complicated. Resources do not always allow for a full-fledged thesaurus, and often such a large undertaking is not necessary. Synonym rings and authority files are simple tools that can bridge the gap between natural language and complex controlled vocabularies (taxonomies and thesauri) quite nicely. We can explain how synonym rings work by way of an example. </p>
<p>International SEMATECH, the semiconductor research consortium, had a searching problem. Documents were uploaded to a private research website in a highly decentralized manner. Member company employees from all over the world had the ability to upload their own research documents and meeting presentations to the website. </p>
<p>A look at the search logs, however, revealed that people&#8217;s search terms were retrieving only a fraction of the documents they were trying to find. The problem was inconsistent terminology. A review of the metadata found that those uploading information were as likely to call silicon &#8220;Si&#8221; as they were to spell out the whole name, &#8220;silicon.&#8221; There were many similar examples. Besides chemical symbols, users were both searching for and uploading documents with acronyms (&#8220;PSM&#8221; vs. &#8220;Phase Shift Mask&#8221;) and simple variants in spelling (&#8220;low K dielectrics&#8221; vs. &#8220;low-K dielectrics&#8221; vs. &#8220;lowk dielectrics&#8221;). </p>
<p>The way the system previously worked, a user who searched for &#8220;Si,&#8221; &#8220;PSM,&#8221; or &#8220;low K dielectrics&#8221; would get only exact matches. In other words, they would miss documents that had &#8220;Silicon,&#8221; &#8220;Phase Shift Mask,&#8221; or &#8220;low-K dielectrics&#8221; in their metadata. Furthermore, they would get enough hits so they might not have realized that some relevant documents were missing (if they had gotten zero hits, they might have suspected something was wrong and tried another term). </p>
<p>It was our assumption that when users searched one term, they intended to find the entire set of documents related to that concept. But trying to get such an organization to adopt a style guide for metadata was not viable. The solution was to install a synonym ring into our search engine, <a href="http://technet.oracle.com/products/text/content.html">Oracle Text.</a></p>
<h2>What the synonym ring does</h2>
<p>A synonym ring connects a series of terms together and treats them all as equivalent for search purposes. When a user enters &#8220;PSM,&#8221; for instance, the search term will be sent through the synonym ring to see if there are any equivalent terms. For &#8220;PSM&#8221; we would find &#8220;Phase Shift Mask&#8221; as a synonym. The search engine would then retrieve all documents with either &#8220;PSM&#8221; or &#8220;Phase Shift Mask&#8221; in their metadata. The searcher would get the complete set of relevant documents as though they had searched both terms (something few people would think to do). </p>
<p>If there is no match in the synonym list, the search is simply sent through the index as usual and any documents with &#8220;PSM&#8221; are returned. The synonym ring goes into effect only when there is a matching synonym for the term entered into the search box by the user.</p>
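<p>The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Oracle Text implements it: the names <code>SYNONYM_RINGS</code>, <code>expand_query</code>, <code>search</code> and the sample documents are all hypothetical, and matching is done on lower-cased exact keywords.</p>

```python
# Hypothetical synonym rings: each set holds terms treated as equivalent.
SYNONYM_RINGS = [
    {"psm", "phase shift mask"},
    {"si", "silicon"},
    {"low k dielectrics", "low-k dielectrics", "lowk dielectrics"},
]


def expand_query(term, rings=SYNONYM_RINGS):
    """Return every term to search for, including the one entered."""
    key = term.strip().lower()
    for ring in rings:
        if key in ring:
            return set(ring)
    # No matching ring: the search goes through with the term as entered.
    return {key}


def search(term, documents, rings=SYNONYM_RINGS):
    """Return ids of documents whose metadata holds any equivalent term."""
    wanted = expand_query(term, rings)
    return sorted(
        doc_id
        for doc_id, keywords in documents.items()
        if wanted.intersection(k.lower() for k in keywords)
    )


# Hypothetical metadata: document id mapped to its keyword list.
DOCUMENTS = {
    "doc1": ["PSM", "lithography"],
    "doc2": ["Phase Shift Mask"],
    "doc3": ["Silicon"],
}

print(search("PSM", DOCUMENTS))
```

<p>In this sketch a search for &#8220;PSM&#8221; finds both doc1 and doc2, while a term that appears in no ring is searched unchanged.</p>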
<p>Although getting a synonym ring up and running sounds pretty simple, the difficulties often come from trying to answer a simple question: &#8220;What is a synonym?&#8221; The example above was a clear case of synonyms: an acronym and the full name of the object. It is not always this simple. Synonyms are generally two words with the same or very similar meanings. Sounds simple, but how similar is similar enough? True synonyms are a rare thing. </p>
<h2>What is a synonym?</h2>
<p>Some synonyms may appear to be pretty straightforward. These include:</p>
<ul>
<li>Acronyms: BBC, British Broadcasting Corporation; MPG, miles per gallon</li>
<li>Variant spellings: cancelled, canceled; honor, honour</li>
<li>Scientific terms versus popular use terms: acetylsalicylic acid, aspirin; lilioceris, lily beetle</li>
</ul>
<p>But synonyms, in general, quickly become more difficult. Are &#8220;medicine&#8221; and &#8220;drugs&#8221; synonyms? Are &#8220;fired&#8221; and &#8220;laid off&#8221;? What about &#8220;forest&#8221; and &#8220;woods&#8221; or &#8220;arid&#8221; and &#8220;dry&#8221;? With these examples, it is more difficult to say for sure. To answer the question about whether two terms are synonyms, you often have to consider the overall content of your site, as well as the site&#8217;s context and its users.</p>
<p>In our <a href="http://www.boxesandarrows.com/archives/what_is_a_controlled_vocabulary.php">first article</a>,  we gave the following example of a synonym (which demonstrates the equivalence relationship): </p>
<p><img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/cv_1.jpg" width="343" height="43" alt="Example of a preferred term" /></p>
<p>But one could easily argue that these are not true synonyms. You may be looking for information about Elizabeth Taylor only during the time she was married to Larry Fortensky. In this case &#8220;Elizabeth Fortensky&#8221; might be the only part of the ring you would be interested in. Expanding the results by including results for both &#8220;Elizabeth Warner&#8221; and &#8220;Elizabeth Fortensky&#8221; would reduce the precision of the search results.</p>
<p>When creating a synonym ring, or any controlled vocabulary, you will spend a lot of time evaluating near synonyms. What guidelines should you use for making these decisions?</p>
<h2>Recall and precision</h2>
<p>Information architecture in the real world is all about the tradeoffs, right? Librarians have long been aware of the tradeoffs one makes between a search system that is broad and one that is specific. A search system that is broad is one with high recall, while one that is very narrow is one with high precision. Let&#8217;s look at these two terms a little more closely.</p>
<p>Recall is often represented as a ratio: </p>
<p class="indent">number of retrieved relevant documents / all relevant documents in a collection</p>
<p>Recall measures how many of the relevant documents are returned to the user. When you are searching a system with high recall, you are able to get a comprehensive set of documents returned, but you increase the possibility that less relevant documents will also get returned. This is great when you want to look through a large number of documents to make sure you have seen everything on a certain topic. Techniques for increasing recall include a synonym ring, stemming (some search engines will automatically return &#8220;jumping&#8221; and &#8220;jumps&#8221; when someone searches &#8220;jump&#8221;), and wildcards.</p>
<p>Precision, like recall, is often represented as a ratio:</p>
<p class="indent">number of retrieved relevant documents / total number of documents retrieved</p>
<p>You want to return all relevant documents to each user. So why not return all documents in your system for every search? That way you can be sure that every single relevant document is returned to the user, right? Well, true, but you&#8217;re also returning many irrelevant documents at the same time, making it harder for users to find what they want. </p>
<p>Precision measures how many of the returned documents are relevant. When you are searching a system with high precision, your results are specific to your search. This is closer to a known-item search. You want only relevant search results and are less tolerant of getting some irrelevant results mixed in.</p>
<p>You can increase search precision by using specific indexing terms (&#8220;Ferrari&#8221; and not &#8220;sports car&#8221;), little or no stemming, word proximity operators (how closely words appear next to each other), and search zones.</p>
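<p>To make the two ratios concrete, here is a small Python sketch that computes both for a single search. The function name <code>precision_and_recall</code> and the ten-document collection are hypothetical, and the sketch assumes you already know which documents are relevant, which is rarely true in practice.</p>

```python
def precision_and_recall(retrieved, relevant):
    """Compute (precision, recall) for one search.

    retrieved: ids the search returned
    relevant:  ids in the collection that are actually relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved.intersection(relevant)
    # Guard against dividing by zero when either set is empty.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection of ten documents, four of them relevant.
collection = range(1, 11)
relevant = {1, 2, 3, 4}

narrow = precision_and_recall({1, 2}, relevant)      # few, exact results
broad = precision_and_recall(collection, relevant)   # return everything

print(narrow, broad)
```

<p>The narrow search scores a precision of 1.0 but a recall of only 0.5; returning the whole collection pushes recall to 1.0 while precision drops to 0.4.</p>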
<p>Measuring the recall and precision of a particular search engine can be <a href="http://www.tbray.org/ongoing/When/200x/2003/06/22/PandR">difficult</a>. Measuring recall and precision using hard numbers is questionable. Relevance is difficult to quantify since it is inconstant (even during the course of a single search, relevance may change) and subjective. </p>
<p>A better way to get a handle on precision and recall is to collect responses from your users. What do people complain about? Do they say, &#8220;I get too many results&#8221;? This really means, &#8220;I get too many irrelevant results,&#8221; and is a sign that your recall might be too high. Do people say, &#8220;I know it is in there, but I can&#8217;t find it&#8221; or &#8220;I get no hits for too many searches&#8221;? If so, your precision might be too high. Just remember that recall and precision tend to be inversely related: as one goes up, the other usually goes down. You will need to strike a balance.</p>
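<p>If you do want rough numbers for a handful of test searches, the two ratios are easy to compute once someone has judged which documents are relevant. The sketch below is only an illustration: it takes recall to be the number of relevant documents retrieved divided by the number of relevant documents in the system, and the function name and document IDs are invented for the example. Keep in mind that the relevance judgments themselves remain subjective.</p>

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search, given sets of document IDs."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved.intersection(relevant)  # retrieved documents that are relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: documents 1-4 are relevant to the query.
relevant_docs = {1, 2, 3, 4}

# A narrow search returns two documents, both relevant.
narrow = precision_recall({1, 2}, relevant_docs)

# A broad search returns every relevant document plus four irrelevant ones.
broad = precision_recall({1, 2, 3, 4, 5, 6, 7, 8}, relevant_docs)

print(narrow)  # (1.0, 0.5)
print(broad)   # (0.5, 1.0)
```

<p>The two sample searches show the trade-off in miniature: the narrow search has perfect precision but misses half of the relevant documents, while the broad search finds everything at the cost of precision.</p>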
<h2>Authority files</h2>
<p>So now that we know what a synonym ring is, we can define an authority file. An authority file is similar to the synonym ring, with the addition of one type of term relationship. Instead of all of the terms being equal, one term is identified as the preferred term and the others are considered variant terms.</p>
<p>Authority files help with tagging content consistently. Catalogers for large library collections have long used authority files to find approved terms for describing an item. When they get a book about the Italian city of Firenze and another one about the Italian city of Florence, they use one of the names (based on prescribed rules) and describe all books in the collection about the city using a single, consistent term. </p>
<p>Similarly, in most major academic libraries, all books about &#8220;Native Americans&#8221; and &#8220;American Indians&#8221; are described with the term &#8220;Indians of North America.&#8221; When someone performs a subject search on &#8220;Native Americans&#8221; they get a note that says something like &#8220;This term is indexed as INDIANS OF NORTH AMERICA.&#8221; The authority file is the place you go to find which term is the heading (the main term) and which term is the cross reference (the variant term).</p>
<p>A more typical example on a website might work like this: Let&#8217;s say you have a website devoted to comic books. It would be great if when someone typed &#8220;Caped Crusader&#8221; or &#8220;Dark Knight&#8221; into the search box, they got results for &#8220;Batman.&#8221; In this example, &#8220;Batman&#8221; and &#8220;Caped Crusader&#8221; would not be considered equivalent terms; the authority file would explain their relationship. You would not want to identify each Batman comic book with all three terms, just the main term. But when a user entered &#8220;Caped Crusader,&#8221; you would want the system to convert their term to &#8220;Batman&#8221; and return the appropriate results.</p>
<p>The relationships among the terms could be expressed like this:</p>
<p><img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/synonym_rings_and_authority_files/steckel_082503_2.gif" width="292" height="213" alt="Example of an authority file relationship" /></p>
<p>Or in the language of a controlled vocabulary, like this:</p>
<div class="indent">
<p>Batman<br />USE FOR: Dark Knight, Caped Crusader</p>
<p>Caped Crusader<br />USE Batman</p>
<p>Dark Knight<br />USE Batman</p>
</div>
<p>Another way that people use authority files is to reinforce a correct term and to discourage an incorrect term. The Polar Bear Book uses the example of how drugstore.com corrects the spelling of Tylenol using an authority file. If you enter &#8220;tilenol&#8221; into the site&#8217;s search box, you get the results for &#8220;Tylenol.&#8221; Users will see the correct spelling prominently displayed, which will remind them how the word is really spelled. Maybe they will remember the correct spelling in the future.</p>
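<p>In code, an authority file can be as simple as a lookup table from each variant to its preferred term. The following is a minimal sketch, not a description of how any particular search engine does it; the names <code>AUTHORITY_FILE</code>, <code>build_use_index</code>, and <code>resolve</code> are made up for the illustration, and matching is assumed to be case-insensitive.</p>

```python
# Hypothetical authority file: each preferred term lists its variant terms.
AUTHORITY_FILE = {
    "Batman": ["Dark Knight", "Caped Crusader"],
    "Tylenol": ["tilenol"],
}


def build_use_index(authority_file):
    """Map every term, lowercased, to its preferred term (the USE reference)."""
    index = {}
    for preferred, variants in authority_file.items():
        index[preferred.lower()] = preferred
        for variant in variants:
            index[variant.lower()] = preferred
    return index


def resolve(query, index):
    """Return the preferred term for a query, or the query unchanged if unknown."""
    return index.get(query.strip().lower(), query)


USE_INDEX = build_use_index(AUTHORITY_FILE)
print(resolve("Caped Crusader", USE_INDEX))  # Batman
print(resolve("tilenol", USE_INDEX))         # Tylenol
print(resolve("Robin", USE_INDEX))           # Robin
```

<p>Content is tagged only with the preferred term, and the lookup runs on the user&#8217;s query before the search is submitted. A term that is not in the file passes through untouched.</p>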
<h2>Guidelines for implementation</h2>
<p>When putting a synonym ring or authority file in place, consider the following guidelines:</p>
<ul>
<li>
<p>Show users how their search term was changed or added to by the system and exactly what was searched. At International SEMATECH, a search for Silicon would look like this:</p>
<p><img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/synonym_rings_and_authority_files/steckel_082503_3.gif" width="594" height="132" alt="Example showing the user exactly how the search was submitted." /></p>
<p>The line under the search box tells users exactly how the search was submitted. When users understand how their term is expanded to include synonyms, they have a better understanding of how the site works. When done well, explanations can also increase confidence that users have in the system, since it shows them that the system understands what they are looking for.</p>
</li>
<li>Keep the display simple. Include a search box at the top of the page so users can edit their terms if they see they have made a mistake. Try to follow the prescient words of the old poem:<br />
<blockquote><p>Give me a look, give me a[n inter] face,<br />That makes simplicity a grace;</p>
<p>&#8212; Ben Jonson (slightly modified) [<a href="http://www.bartleby.com/100/146.9.html">http://www.bartleby.com/100/146.9.html</a>]</p>
</blockquote>
</li>
<li>Try to characterize your content and the way your users understand it. At International SEMATECH, the majority of the synonym ring we use is made up of acronyms, since the scientific community seems to love creating and communicating with them. The content is also very narrow and scientific. There is not a great deal of the mushy language that comes from the general culture; most of it is very well defined. A general rule: The broader the content your site covers, the more you will find yourself dealing with near synonyms. Try to make similar evaluations of the content you are searching.</li>
<li>Review search logs every day to look for new terms and synonyms. Is someone looking for an acronym that is not on your list? Try to find out what it means and make sure the next person looking for it gets the correct results.</li>
</ul>
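<p>The first guideline, showing users how their search was submitted, might be sketched like this. The rings, the wording of the message, and the function names are all assumptions made for the example; they are not taken from the International SEMATECH system.</p>

```python
# Hypothetical synonym rings: every term in a ring is treated as equivalent.
SYNONYM_RINGS = [
    {"silicon", "si"},
    {"chemical mechanical planarization", "cmp"},
]


def expand_query(query, rings):
    """Return the query term followed by any other members of its ring."""
    term = query.strip().lower()
    for ring in rings:
        if term in ring:
            others = sorted(ring - {term})  # sorted so the display is stable
            return [term] + others
    return [term]


def describe_search(query, rings):
    """Build the explanatory line shown under the search box."""
    terms = expand_query(query, rings)
    return "Your search was submitted as: " + " OR ".join(terms)


print(describe_search("Silicon", SYNONYM_RINGS))
# Your search was submitted as: silicon OR si
```

<p>Because a synonym ring has no preferred term, the user&#8217;s own word is simply listed first and the remaining ring members follow.</p>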
<h2>Conclusions</h2>
<p>Synonym rings and authority files are simple, common-sense ways to help users connect the various semantic concepts that are inherently intertwined with the term they choose. They are particularly good for large decentralized sites that are search dominant and have little centralized control over content. </p>
<p>Most of us know by now that users tend to use a small number of words for each search. They should not be forced to consider all the synonyms their search terms might have. <a href="http://www.tbray.org/ongoing/When/200x/2003/06/24/IntelligentSearch">Tim Bray</a> said it well: &#8220;If you need to know about cow farming, you&#8217;re probably also searching for cattle ranching, beef (or dairy) production, and Kuhbauernhof, whether you know it or not.&#8221;</p>
<p><img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/art_end.gif" alt="" title="" width="8" height="8" /></p>
<p><morebox></p>
<ul>
<li><a href="http://www.boxesandarrows.com/archives/all_about_facets_controlled_vocabularies.php">All About Facets and Controlled Vocabularies</a> (series introduction)</li>
<li><a href="http://www.boxesandarrows.com/archives/what_is_a_controlled_vocabulary.php">What is a Controlled Vocabulary?</a></li>
<li><a href="http://www.boxesandarrows.com/archives/creating_a_controlled_vocabulary.php">Creating a Controlled Vocabulary</a></li>
<li><a href="/files/banda/Bibliography.htm">An Annotated Bibliography</a></li>
</ul>
<p></morebox></p>
<p><biobox><a href="http://www.boxesandarrows.com/people/archives/karl_fast.php">Karl Fast</a> is a PhD student in library and information science at the University of Western Ontario. He also has a master&#8217;s in LIS. His graduate work has included courses on organization of information, subject analysis, thesaurus construction, and facet analysis.</p>
<p><a href="http://www.boxesandarrows.com/people/archives/fred_leise.php">Fred Leise,</a> president of <a href="http://www.contextualanalysis.com">ContextualAnalysis, LLC,</a> is an information architecture consultant providing services in the areas of content analysis and organization, user experience design, taxonomy and thesaurus creation, and website and back-of-book indexing.</p>
<p><a href="http://www.boxesandarrows.com/people/archives/mike_steckel.php">Mike Steckel</a> is an Information Architect/Technical Librarian for International SEMATECH in Austin, TX.</biobox></p>
]]></content:encoded>
			<wfw:commentRss>http://boxesandarrows.com/synonym-rings-and-authority-files/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Creating a Controlled Vocabulary</title>
		<link>http://boxesandarrows.com/creating-a-controlled-vocabulary/</link>
		<comments>http://boxesandarrows.com/creating-a-controlled-vocabulary/#comments</comments>
		<pubDate>Mon, 07 Apr 2003 22:43:18 +0000</pubDate>
		<dc:creator>Fred Leise</dc:creator>
				<category><![CDATA[Methods]]></category>
		<category><![CDATA[Process and Methods]]></category>
		<category><![CDATA[Special topic: Search and Metadata]]></category>

		<guid isPermaLink="false">http://boxesandarrows.com/creating-a-controlled-vocabulary/</guid>
		<description><![CDATA[You have probably heard IAs discussing the benefits of their latest taxonomy project and how you should be implementing one. But <i>how</i>, you might wonder, can you get started?  In the next installment about Controlled Vocabularies, our authors go into detail about one methodology.]]></description>
				<content:encoded><![CDATA[<pullquote>&#8220;Creating a clear plan early on can save you a lot of trouble down the road and minimize unwelcome surprises. The broad strokes of CV design are like any other type of design: planning and preparation are essential, fundamental steps in producing a good design.&#8221;</pullquote><p>You have probably heard IAs discussing the benefits of their latest taxonomy project and how you should be implementing one. But <i>how</i>, you might wonder, can you get started? </p>
<p>This article describes a process for building your own controlled vocabulary (CV). A <a href="http://www.boxesandarrows.com/archives/what_is_a_controlled_vocabulary.php">previous article</a> discussed the concept of a CV&#8212;the &#8220;what.&#8221; This article focuses on the &#8220;how.&#8221;</p>
<p>In this article we are looking at a process for creating any kind of controlled vocabulary. While our ultimate goal in this series is to <a href="http://www.boxesandarrows.com/archives/all_about_facets_controlled_vocabularies.php">explain facets</a>, the details of facet analysis will be described in a future article. At this point, we are still exploring fundamental concepts and techniques.</p>
<p>There are many ways to create a controlled vocabulary. What follows is just one methodology. Also, keep in mind that many of the steps described here are not discrete units. When you actually create a CV, some steps may overlap.</p>
<p>Now, let&#8217;s get started. Imagine we are a company that sells camping gear, and we want to create a controlled vocabulary for our ecommerce site. </p>
<p><span class="subhead">1. Develop a strategy. What do you want your controlled vocabulary to do?</span><br />
The natural inclination when developing a CV is to start by gathering potential terms. But first, you need to consider a wide range of questions. Creating a clear plan early on can save you a lot of trouble down the road and minimize unwelcome surprises. The broad strokes of CV design are like any other type of design: planning and preparation are essential, fundamental steps in producing a good design.</p>
<p>First, what kind of CV do you need? The answer depends on a variety of issues. Start by thinking about some general questions such as these:</p>
<ul>
<li>What do you want your CV to accomplish?</li>
<li>Do you want the CV to integrate with your navigation system?</li>
<li>Are you planning on using the CV to improve searching? To improve browsing? Both?</li>
<li>Are you planning to show term relationships in your search results?</li>
<li>How much vocabulary control do you want to provide? Synonym ring? Facets? What level of vocabulary control is appropriate?</li>
</ul>
<p>Second, think about your dependencies:</p>
<ul>
<li><p><b>Content</b> &#8211; Consider this in two parts: specificity and stability.</p>
<p>Specificity: If you are selling camping gear, are you selling 7-10 styles of tent or 100 styles? If you are selling 100 styles, you will need terms that are more specific and more exhaustive. This is because you will need to further differentiate among tents that are similar. The more items that are similar, the more specific you need to be.</p>
<p>Stability: Do the concepts and names for them change often? Do people generally call the same concept (or item or product) by the same name? In our example, we would ask if there are a lot of variant terms for the kinds of items we&#8217;re selling. What will be your method for keeping up with changing terminology?</p></li>
<li><p><b>Technology</b> &#8211; There are two pieces to this one: tools and integration. Each will help you think about implementation early on.</p>
<p>Tools: Think about where the CV will ultimately sit. Do you have a CMS that will be involved? Will you be uploading your CV into a search engine? What software will you use to hold your terms: a thesaurus maintenance program like <a href="http://www.multites.com/">Multites</a>, <a href="http://www.termtree.com.au/">Term Tree</a>, or <a href="http://www.lexico.com/">Lexico</a>? Or will you be creating it in Excel? Also consider tools you might use while gathering your terms. Many people collect their terms in a large Excel spreadsheet, others use Post-it Notes, and sometimes even a wiki works nicely.</p>
<p>Integration: How will your CV be integrated with the other pieces of your system? If the CV is going to be used in multiple applications, you need to consider the requirements of each. Be sure you talk to someone in IT and outline what your goals are.</p></li>
<li><b>Users</b>  &#8211; CV design is a user-centered process. You must <i>understand</i> the target audience before setting down your terms. Who is the target audience for the site? The general public? Experienced campers? Are they web-savvy? How do they shop? Do they tend to buy one item at a time or several items at once?  Do they need to do a lot of research before they buy? In other words, good, standard user-centered design methods, such as interviews and observation, are appropriate. </li>
<li><b>Maintenance</b> &#8211; Who from the organization will maintain your controlled vocabulary? What amount of time can they spend on this task? What is their training? If you decide to create a highly complex controlled vocabulary that your high school intern is going to maintain, you will have to provide additional training for that person.  This is also a user-centered design issue, but along a different axis. Above we talked about a process that is extroverted: it looks towards the external users of the system. Here our axis is introverted: it looks towards the internal users of the system, the creators and maintainers of the vocabulary.</li>
</ul>
<p>At this point, any normal person will say to himself, &#8220;Geez! Enough with the questions! Let me get on with creating my controlled vocabulary!&#8221; Resist this urge and stick with the discovery process; developing a strategy is important. You will probably change some of your answers as the project develops, but considering these questions up front will prevent you from wasting time later on. </p>
<p><span class="subhead">2. Start gathering terms. What are the terms used to describe your content?</span><br />
 Now you are ready to start gathering your terms. Your goal here, considering the constraints and strategies that came out of Step 1, is to identify the terms that will bring the most success to your user population, enabling them to find exactly the information they need.</p>
<p>This is where the process becomes a little bit like &#8220;The Newlywed Game.&#8221; In this TV game show, the contestants are newly married couples. While one half of the couple is in a soundproof room, the host asks the remaining partner some intimate questions (often about &#8220;making whoopee&#8221;). Later, they reunite the couple and ask the other partner the same questions to see how well their answers match up. The couple with the most matched answers wins the big prize. The underlying questions for this game include &#8220;How well do the two sides of this relationship know each other?&#8221; and &#8220;How well can one half of the couple guess the answer the other half will give?&#8221;</p>
<p>To win the big prize of increased content <a href="http://www.boxesandarrows.com//archives/the_age_of_findability.php">findability</a>, your site must describe your content in the terms that best match those terms the users are <i>likely</i> to use. When your partner (the user) comes out of that soundproof booth, you want to feel confident that you have provided the terms he will use on your site. </p>
<p>There are lots of great ways to get started with this process.</p>
<p><b>A. Look inward.</b> What are the terms you already use to describe items on your site? If you are selling something, what are you selling? Look at each item and start generating terms to describe the object. What are the concepts the terms cover? List them. If we were doing a thesaurus for camping gear, we might start with something like: backpacks, tents, bug spray, etc. Then consider alternative terms you might use for each item.</p>
<p>Consider the level of granularity you want to use to reach your target audience or need to use based on the number of similar items you sell. If the target audience for your camping gear CV is beginning campers, you might distinguish thick sleeping bags from your thinner options by making a distinction by season (as in &#8220;winter&#8221; and &#8220;summer&#8221; bags). However, if you are targeting expert campers, you may need to describe your bags as &#8220;2-season&#8221; or &#8220;3-season&#8221; bags, in terms of insulating material (goose down, Polarguard, PrimaLoft), or by the temperature ratings. You don&#8217;t need to describe the entire field of camping gear; you need only describe your content in terms that will resonate with your target audience.</p>
<p>There is a danger here, however. Don&#8217;t look inward and exclude the additional options for gathering terms described below. It is important to get outside of your own understanding of terms and their concepts. Be sure to follow the next steps as well.</p>
<p><b>B. Look outward.</b> Where are people using terms related to your content?  You might review competitors&#8217; sites, journals or magazines on your subject matter, or discussions by subject experts on the web. For example, if you are looking for terms about camping gear, you might look here:</p>
<p><a href="http://directory.google.com/Top/Shopping/Recreation/Outdoors/?il=1">http://directory.google.com/Top/Shopping/Recreation/Outdoors/?il=1</a></p>
<p>Look at the sites on the list and note how they describe items that you also sell. Are there relevant variant terms you didn&#8217;t include from the looking inward step?</p>
<p>Consider the differences between <a href="http://www.rei.com">REI</a> and <a href="http://www.mec.ca">MEC</a> (Mountain Equipment Co-op, a Canadian outdoor equipment store). Note the differences and similarities between the terms they use. In this example, we have shown only the terms for their top-level categories; you should dig deeper and find out what terms they use for sub-categories and individual items.</p>
<table cellpadding="5" align="left">
<tr>
<td valign=top><img alt="mec-categories.jpg" src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/creating_a_controlled_vocabulary/mec-categories.jpg" width="164" height="351" border="0" hspace="5"/></td>
<td valign=top><img alt="rei-categories.jpg" src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/creating_a_controlled_vocabulary/rei-categories.jpg" width="145" height="462" border="0" hspace="5" /></td>
</tr>
</table>
<p>Sometimes, someone may already have developed a similar controlled vocabulary that you can use or modify. When this happens, we recommend that you perform an exuberant dance of joy. This won&#8217;t work for our camping gear example, but if you are building a large controlled vocabulary on another topic, you might want to see if you could borrow from one of the controlled vocabularies here:</p>
<p>— <a href="http://www.asindexing.org/site/refbooks.shtml">American Society of Indexers</a><br />
— <a href="http://www.willpower.demon.co.uk/thesbibl.htm">Publications on thesaurus construction and use</a></p>
<p>More than likely, you will need to simplify anything you use from one of these lists, but they might be worth reviewing. Often, just the exercise of reviewing other CVs can be helpful in discovering ways to improve your own.</p>
<p>But be careful. Borrowing terms from other sites can muddy your own particular site&#8217;s strategy. Don&#8217;t borrow so much that your message gets confused or loses distinction.</p>
<p><b>C. Log files.</b> If you already offer search, an easy option is to review your log files. Log files are goldmines of valuable customer information. They will give you an idea of what people think they might find on your site, as well as the words they use to describe what they are looking for. If you can get the file to display search results (as in 8 hits, 0 hits, etc.), you can see how successful people are. Or, reproduce the searches yourself to determine if people are getting relevant hits. See how <a href="http://www.fastcompany.com/online/43/kozuh.html">Nordstrom&#8217;s benefited from this technique</a>.</p>
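<p>As a rough sketch of what that review can look like, the script below tallies queries and flags the ones that returned nothing. It assumes a hypothetical log format of one search per line, with the query and the hit count separated by a tab; real search logs vary widely, so the parsing step would need to be adapted to yours.</p>

```python
from collections import Counter

# Hypothetical log lines: the query and the number of hits, separated by a tab.
SAMPLE_LOG = [
    "sleeping bag\t8",
    "mummy bag\t0",
    "Sleeping Bag\t8",
    "bivy sack\t0",
    "mummy bag\t0",
]


def summarize_log(lines):
    """Count how often each query appears and how often it returned no hits."""
    searches = Counter()
    zero_hits = Counter()
    for line in lines:
        query, hits = line.rsplit("\t", 1)
        query = query.strip().lower()  # treat "Sleeping Bag" and "sleeping bag" alike
        searches[query] += 1
        if int(hits) == 0:
            zero_hits[query] += 1
    return searches, zero_hits


searches, zero_hits = summarize_log(SAMPLE_LOG)
print(searches["sleeping bag"])  # 2
print(zero_hits.most_common())   # [('mummy bag', 2), ('bivy sack', 1)]
```

<p>Frequent zero-hit queries such as the made-up &#8220;mummy bag&#8221; above are strong candidates for variant terms, because they are words your users actually typed.</p>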
<p><b>D. Ask people.</b> Is there a way to ask users what they look for on your site? How would they describe your site&#8217;s contents? </p>
<p>Throughout Step 2, you are building into your CV what librarians call &#8220;user warrant.&#8221; This means that a term &#8220;is justified for inclusion in an index (or CV) only if it is of interest to the users of the information service.&#8221; (Lancaster, 26). Your CV will have high user warrant if the terms you include are real terms that people use to describe your content. If you include a lot of terms you suspect people might use, but that did not actually show up during your research, you will lower the user warrant. You are taking a risk: You may be unnecessarily muddying your CV.</p>
<p>At the end of this process you should have a large number of terms describing your site&#8217;s content. </p>
<p><span class="subhead">3. Establish preferred terms, variants and hierarchies. How do the pieces fit together?</span><br />
 After Step 2, we are left with what is essentially a big bucket of unrelated terms. Now we start to put like terms together and identify each one&#8217;s relationships. For each term, ask what is the broader (more general) term? What are the narrower (more specific) terms? If you are using terms to establish a navigation system, is this a preferred term or a variant? Your controlled vocabulary will start to come together as context is added to each term.</p>
<p>Using our camping gear example, a traditional CV notation for the terms we have collected about sleeping bags might look like this:</p>
<p>Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;BT Camping Equipment<br />
&nbsp; &nbsp; &nbsp; &nbsp;NT Down Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;NT Synthetic Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;NT Family Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;NT Cold Weather Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp;NT 2-Season Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp;NT 3-Season Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;NT Ultralight Sleeping Bags</p>
<p>(BT = broader term; NT = narrower term)</p>
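<p>If you plan to load the hierarchy into software, the same notation translates directly into a small data structure. This is a sketch under one simplifying assumption, namely that each term has exactly one broader term; the variable and function names are ours, not part of any standard.</p>

```python
# Narrower-term (NT) relationships from the sleeping bag example above.
NARROWER = {
    "Camping Equipment": ["Sleeping Bags"],
    "Sleeping Bags": [
        "Down Sleeping Bags",
        "Synthetic Sleeping Bags",
        "Family Sleeping Bags",
        "Cold Weather Sleeping Bags",
        "Ultralight Sleeping Bags",
    ],
    "Cold Weather Sleeping Bags": [
        "2-Season Sleeping Bags",
        "3-Season Sleeping Bags",
    ],
}


def broader_terms(narrower):
    """Derive each term's broader term (BT) by inverting the NT mapping."""
    return {nt: bt for bt, nts in narrower.items() for nt in nts}


def path_to_top(term, broader):
    """Walk BT links from a term up to the top of the hierarchy."""
    path = [term]
    while path[-1] in broader:
        path.append(broader[path[-1]])
    return path


BROADER = broader_terms(NARROWER)
print(" > ".join(reversed(path_to_top("3-Season Sleeping Bags", BROADER))))
# Camping Equipment > Sleeping Bags > Cold Weather Sleeping Bags > 3-Season Sleeping Bags
```

<p>Storing only the NT links and deriving the BT links keeps the two directions from drifting apart. The path from the top term down to the current one is also the raw material for a breadcrumb trail if the CV drives navigation.</p>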
<p>Some in your group might say, &#8220;Hey, sleeping bags should go under Backpacking Equipment, not Camping Equipment.&#8221; A perfectly good assertion. Somehow, you will need to decide this issue. Can &#8220;Sleeping Bags&#8221; be in both places? Should the term live in one place in the CV with a cross-reference from the other location? Maybe there is a distinction among different kinds of sleeping bag that you had not previously considered.</p>
<p>It might be a good time to do some research. For instance, ask yourself, &#8220;How do REI and MEC describe their sleeping bags?&#8221;</p>
<table cellpadding=5>
<tr>
<td valign=top class="articlebody">MEC does it like this:<br />
<a href="http://www.boxesandarrows.com/archives/images/040703_CV/mec-sleeping-bags.php" onclick="window.open('http://www.boxesandarrows.com/archives/images/040703_CV/mec-sleeping-bags.php', 'popup', 'width=758,height=490,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/creating_a_controlled_vocabulary/mec-sleeping-bags-thumb.jpg" width="200" height="129" border="0" /></a><br /><span class="caption">Click to enlarge</span></td>
<td valign=top class="articlebody">REI takes a completely different approach:<br />
<a href="http://www.boxesandarrows.com/archives/images/040703_CV/rei-sleeping-bags.php" onclick="window.open('http://www.boxesandarrows.com/archives/images/040703_CV/rei-sleeping-bags.php', 'popup', 'width=592,height=424,scrollbars=no,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0'); return false"><img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/creating_a_controlled_vocabulary/rei-sleeping-bags-thumb.gif" width="200" height="143" border="0" /></a><br /><span class="caption">Click to enlarge</span></td>
</tr>
</table>
<p>The differences are striking. The main ones include the following:</p>
<ul>
<li><b>Depth:</b> The most obvious distinction is how REI goes for increased depth, whereas MEC uses a shallower category set.</li>
<li><b>Term Choice:</b> REI uses the general term &#8220;Sleeping Gear,&#8221; whereas MEC uses &#8220;Sleeping Bags.&#8221; What&#8217;s interesting is that both sites classify terms for related materials&#8212;pillows, stuff sacks, and so on&#8212;as narrower terms, yet only REI uses the more generic term &#8220;Sleeping Gear&#8221; to describe this breadth.</li>
<li><b>Broader Terms:</b> REI has &#8220;Sleeping Gear&#8221; as a narrower term under the top-level term &#8220;Camp/Hike.&#8221; MEC also has a similar top-level term&#8212;&#8220;Hiking/Camping Gear&#8221;&#8212;but instead of making &#8220;Sleeping Bags&#8221; a narrower term they put it at the same level.</li>
<li><b>Bags and Pads:</b> MEC puts sleeping pads as a narrow term under sleeping bags. REI doesn&#8217;t put them below sleeping bags, but at the same level in the hierarchy.</li>
</ul>
<p>Which is better? That&#8217;s difficult to say. REI is more sophisticated in their categorization, probably because of their larger product line. While REI&#8217;s scheme is more sophisticated, it&#8217;s also more complicated, so perhaps the simplicity of the MEC approach is better. Most likely, these differences are the result of differing strategies.</p>
<p>Our intention here is not to suggest which is better, only to show how even a simple situation can give many alternative answers. Certainly one can find much to like about these schemes. But in each case, improvements can be made. They are muddled. Concepts are mixed and matched haphazardly. There are questions about scalability and future directions. Material, temperature, gender, and age are combined in surprising and inconsistent ways. For example, why does MEC put sleeping pads as a narrower term of sleeping bag when they are obviously related, yet distinct items? And we&#8217;re still confused about the distinction between these two terms in the REI scheme: &#8220;Kids&#8217; Camping Bags&#8221; and &#8220;Kids Backpacking Bags.&#8221;</p>
<p>We will return to this example in a future article showing how facets can clarify this situation. But let&#8217;s not get too far ahead of ourselves.</p>
<p>For now, the question is: How do you clarify these issues? How do you make these difficult decisions? Making these decisions can quickly get messy in a group environment. Perhaps you need to ask a smaller team to consider the question and report back to the larger group. Doing some analysis, as we did with MEC and REI, and looking at your own strategy should help clarify what it is you want to do. However you decide your questions, be sure to note why you made the decision you did (for more on this, see Step 5).</p>
<p>We have been arguing that a good CV design process is essentially a user-centered process. Getting feedback from users will give you a great deal of insight into the problems we have raised.</p>
<p>A simple and commonly used method of getting feedback is called card sorting. Find some people whom you consider to be your target users. Give them cards with examples of items for sale on your site and ask them to arrange them into groups of like objects, or objects that they believe should be together. Then ask them to label their groups of cards. Look for patterns among their responses, compare the results to your original content labels, and make any necessary adjustments. For some good additional materials on card sorting, <a href="http://www.iawiki.net/CardSorting">see the IA Wiki</a>. Yes, it really is that simple and effective.</p>
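<p>One simple way to look for those patterns is to count how often each pair of cards ends up in the same group. The sketch below uses invented results from three participants; the card names and the <code>pair_counts</code> function are assumptions for the illustration, and a real study would have more cards and more people.</p>

```python
from collections import Counter
from itertools import combinations

# Hypothetical results: each participant's sort is a list of groups of cards.
SORTS = [
    [["tent", "tarp"], ["sleeping bag", "sleeping pad", "pillow"]],
    [["tent", "tarp", "sleeping pad"], ["sleeping bag", "pillow"]],
    [["tent"], ["tarp"], ["sleeping bag", "sleeping pad", "pillow"]],
]


def pair_counts(sorts):
    """Count how many participants placed each pair of cards in the same group."""
    counts = Counter()
    for groups in sorts:
        for group in groups:
            # Sort each group so a pair is always stored in alphabetical order.
            for pair in combinations(sorted(group), 2):
                counts[pair] += 1
    return counts


counts = pair_counts(SORTS)
print(counts[("pillow", "sleeping bag")])        # 3
print(counts[("sleeping bag", "sleeping pad")])  # 2
print(counts[("tarp", "tent")])                  # 2
```

<p>Pairs that nearly everyone groups together are good candidates for siblings under one broader term. Pairs with split results, like sleeping bags and sleeping pads here, are the ones worth discussing.</p>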
<p><span class="subhead">4. Identify the &#8220;see also&#8221; terms. What else might be interesting to your target audience?</span><br />
 In most cases, related terms need to be identified only for large projects. If you are working on an ecommerce site, here is a way to connect related products that people might buy at the same time. In other words, you need to identify places where interest in one item might lead to interest in another. If your site users are buying camping boots, do they need socks? If they are buying backpacks, would they be interested in water bottles? Often, these are what the <a href="http://www.amazon.com/exec/obidos/tg/detail/-/0596000359/ref%3Dnosim/boxesandarrows-20">Polar Bear book</a> calls &#8220;contextual navigation&#8221; (116-118).</p>
<p>To get you started here, think about these possible relationships when considering related terms: </p>
<ul>
<li>process/agent (camp fires/matches); </li>
<li>action/product of action (baking/cakes); </li>
<li>agent/counteragent (allergies/antihistamine); </li>
<li>raw material/product (wool/sweater).</li>
</ul>
<p>Putting this idea of cross-selling into traditional CV notation might look something like this:</p>
<p>Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;BT Camping Equipment<br />
&nbsp; &nbsp; &nbsp; &nbsp;NT Down Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;NT Synthetic Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;NT Family Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;NT Cold Weather Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp;NT 2-Season Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp;NT 3-Season Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;NT Backpacking Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp;NT Expedition Class Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp;NT Ultralight Sleeping Bags<br />
&nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp;RT Backpacks<br />
&nbsp; &nbsp; &nbsp; &nbsp;&nbsp; &nbsp; &nbsp; &nbsp;RT Ultralight Backpacking<br />
&nbsp; &nbsp; &nbsp; &nbsp;RT Sleeping Bag Liners<br />
&nbsp; &nbsp; &nbsp; &nbsp;RT Sleeping Pads<br />
&nbsp; &nbsp; &nbsp; &nbsp;RT Stuff Sacks<br />
&nbsp; &nbsp; &nbsp; &nbsp;RT Pillows</p>
<p>(RT = related term)</p>
<p>What constitutes a related term? That is something for you to decide. Try to strike the right balance between suggesting options and overwhelming a user with choices. You might want to run the card sorting exercise again, this time giving people a list of items on cards and asking, for each item, whether there are any objects from your inventory that they might look for when purchasing it. Adjust your CV accordingly.</p>
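<p>Related terms are conventionally reciprocal: if sleeping bags point to pillows, pillows should point back to sleeping bags. A minimal sketch of storing the pairs that way follows. The pair list is taken from the example above, while the function names and the cap of three suggestions are assumptions meant to illustrate the balance between suggesting and overwhelming.</p>

```python
# Related-term (RT) pairs for "Sleeping Bags" from the example above.
RELATED_PAIRS = [
    ("Sleeping Bags", "Sleeping Bag Liners"),
    ("Sleeping Bags", "Sleeping Pads"),
    ("Sleeping Bags", "Stuff Sacks"),
    ("Sleeping Bags", "Pillows"),
]


def build_related(pairs):
    """Store each RT pair in both directions so the relationship is reciprocal."""
    related = {}
    for a, b in pairs:
        related.setdefault(a, set()).add(b)
        related.setdefault(b, set()).add(a)
    return related


def see_also(term, related, limit=3):
    """Return a capped, alphabetized list of suggestions for a product page."""
    return sorted(related.get(term, set()))[:limit]


RELATED = build_related(RELATED_PAIRS)
print(see_also("Sleeping Bags", RELATED))  # ['Pillows', 'Sleeping Bag Liners', 'Sleeping Pads']
print(see_also("Pillows", RELATED))        # ['Sleeping Bags']
```

<p>Alphabetical order is only a placeholder here. In practice you would probably rank the suggestions by what your card sorting or sales data says people actually buy together.</p>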
<p><span class="subhead">5. Establish a record of the rules you are using if you are creating a large thesaurus. </span><br />
I suspect most CV creators do not take the time to do this, and that is unfortunate. Remember all those decisions about what term goes where? Review the decisions you made and record what the decision was and why you made it. This will enable you to maintain consistency as your CV changes and expands. This makes your system easier to learn, and consequently, training your staff is easier. This is especially important for keeping categories pure if multiple people will be adding terms to content. It also makes for better decision-making in the future. </p>
<p>I am reminded of an <a href="http://slate.msn.com/?id=102916">interview with cellist Yo-Yo Ma</a> who told some students, &#8220;If you make specific choices in the music, we hear them.&#8221; He added later in the class, &#8220;If you don&#8217;t make specific choices, we don&#8217;t hear them.&#8221; This is as true for the actions of a controlled vocabulary as it is for a piece of music. Be aware of the assumptions you are making and make them conscious choices; users will &#8220;hear&#8221; them.</p>
<p>Some possible questions to consider here are: When do you include a new term? What constitutes a relationship or RT? When do you delete terms? What is the basis for choosing a preferred term?  When are terms singular or plural? Nouns or verbs? How will you deal with punctuation? </p>
<p>A place to look for generating issues you might want to consider is the <a href="http://www.niso.org/standards/standard_detail.cfm?std_id=518">ANSI/NISO standard for thesaurus construction</a>. Reviewing these guidelines and deciding what is relevant to your particular situation will help ensure the best possible outcome for your CV creation process. Now is also a great time to review the assumptions you made in Step 1.</p>
<p><span class="subhead">6. Implement.</span><br />
This step is difficult to write about because implementation is extremely dependent on your specific context. The other steps are not easy, but in the real world implementation is often the most difficult. It is also something the literature on CVs rarely tackles in a meaningful way. For now, we will take the metaphorical 50,000-foot view.</p>
<p>If you are using your controlled vocabulary for developing a menu for navigation or categories for browsing, continue your user testing. At this stage you can present a more complete version for users to evaluate. If you have completed some testing earlier, this should involve only minor changes to your CV.</p>
<p>If you are using your controlled vocabulary for searching, get ready for more work: Tweaking the algorithms for a search engine is a difficult job involving lots of tradeoffs. It will also require a good relationship with your IT staff (good thing you started this already in Step 1!). A lot of difficult decisions will need to be made. Examples include how you use punctuation, Boolean operators (when to use AND and when to use OR connectors), and recall versus precision. Multiple word terms can sometimes be difficult (if your CV term is &#8220;walking staff&#8221; and the user enters &#8220;Walking Staff Wood,&#8221; does he get any variant terms for &#8220;walking staff?&#8221;). Your solution will depend on the search engine you are using, the audience, the content, and the tradeoffs you need to make to get your project up and running. </p>
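<p>One possible way to handle the multiple-word case is to look for CV terms as whole phrases inside the query before falling back to single words. The sketch below assumes a simple variant-to-preferred mapping; the names <code>VARIANTS</code> and <code>find_cv_terms</code> and the variants &#8220;hiking stick&#8221; and &#8220;trekking pole&#8221; are hypothetical, and a real search engine will have its own mechanism for this.</p>

```python
# Hypothetical variant-to-preferred mapping; real entries come from your CV.
VARIANTS = {
    "walking staff": "walking staff",
    "hiking stick": "walking staff",
    "trekking pole": "walking staff",
}


def find_cv_terms(query):
    """Return preferred terms whose variants appear as whole phrases in the query."""
    words = query.lower().split()
    found = []
    # Slide a window over the query, trying longer phrases first.
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            phrase = " ".join(words[start:start + size])
            preferred = VARIANTS.get(phrase)
            if preferred and preferred not in found:
                found.append(preferred)
    return found


print(find_cv_terms("Walking Staff Wood"))
```

<p>Here the query is lowercased and split on whitespace only; how to treat punctuation and leftover words such as &#8220;wood&#8221; is one of the tradeoffs you will still have to decide.</p>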
<p><span class="subhead">7. Test and evaluate.</span><br />
You have done some testing during the CV creation process; now it is time to make sure the assumptions you have made throughout the process are correct when you consider the implementation as a whole. </p>
<p>Start with yourself. Use the site to find various types of information based on assumptions you made earlier. Can you identify which content goes in which slot pretty easily? Can you search and get the results you expect? If using your CV to improve searching, enter a term and carefully look at the first page of returns. Are these the results you want your users to get for this search term?</p>
<p>After you feel like the CV is working as you believe it should, contact some outsiders and ask them to use your site. Do your terms reflect the concepts these people are searching for? Are they getting the results they expect? Are your terms too broad or too narrow? Remember, you are not always going to be successful. This is another time to keep the 80/20 rule in mind.</p>
<p><span class="subhead">8. Go back and refine.  What can be improved?</span><br />
A controlled vocabulary is never finished. The goal of the initial creation of your CV is simply to create a system for controlling vocabulary that is agile, easy to update, consistent in both scope (what is covered) and granularity (how deeply it is covered), and helps users find what they are looking for. </p>
<p>However, maintenance is required to keep your CV viable and usable. Constant monitoring, evaluation, and tweaking are critical. This may require daily reviews of search logs, regular testing with users, regular conversations with subject specialists, or other analysis. One of the arguments against using a controlled vocabulary is that it requires so much time to maintain that it cannot keep up with the changing terminology of the given field. Therefore, constant analysis is key to success. The list of improvements you can imagine needing to make will always be long, but don&#8217;t lose sight of the smaller, daily &#8220;housecleaning&#8221; tasks.</p>
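<p>A search log review can start very simply, for example by counting the queries that returned nothing, since those are candidates for new variant terms. The sketch below assumes a log already reduced to (query, result count) pairs; the log format, the sample entries, and the name <code>zero_hit_queries</code> are made up for illustration.</p>

```python
from collections import Counter

# Hypothetical log format: (query, number_of_results) pairs.
# The entries are invented sample data, not real search traffic.
SEARCH_LOG = [
    ("dungarees", 0),
    ("jeans", 42),
    ("Dungarees", 0),
    ("slacks", 0),
    ("shirts", 17),
]


def zero_hit_queries(log):
    """Count queries that returned nothing, most frequent first."""
    counts = Counter(query.lower() for query, hits in log if hits == 0)
    return counts.most_common()


print(zero_hit_queries(SEARCH_LOG))
```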
<p>There is a lot of talk about how controlled vocabularies improve a site&#8217;s information architecture. If you decide to create one, however, it is important to realize that an effective controlled vocabulary involves regular maintenance. Doing it right will keep you aware of the dynamic developments of your content and close to the language of your users and their information needs.</p>
<p><img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/art_end.gif" alt="" title="" width="8" height="8" /></p>
<p><end></end><br />
<morebox>
<ul>
<li>Cooper, Alan (1999). <a href="http://www.amazon.com/exec/obidos/tg/detail/-/0672316498/ref=nosim/boxesandarrows-20">The Inmates are Running the Asylum: Why High-Tech Products Drive Us Crazy and How to Restore the Sanity</a>. SAMS publishing: Indianapolis, IN.</li>
<li>Lancaster, F.W. (1986). <a href="http://www.amazon.com/exec/obidos/tg/detail/-/0878150064/ref=nosim/boxesandarrows-20">Vocabulary Control for Information Retrieval</a> (2nd Edition). Information Resources Press: Arlington, VA.</li>
<li>Rosenfeld, Louis, &#038; Morville, Peter. (2002). <a href="http://www.amazon.com/exec/obidos/tg/detail/-/0596000359/ref%3Dnosim/boxesandarrows-20">Information Architecture for the World Wide Web: Designing large scale web sites.</a> (2nd Edition). O&#8217;Reilly &#038; Associates: Sebastopol, CA.</li>
<li><a href="/files/banda/Bibliography.htm">An Annotated Bibliography</a></li>
</ul>
</morebox><biobox><a href="http://www.boxesandarrows.com/people/archives/karl_fast.php">Karl Fast</a> is a PhD student in library and information science at the University of Western Ontario. He also has a master&#8217;s in LIS. His graduate work has included courses on organization of information, subject analysis, thesaurus construction, and facet analysis.
<p><a href="http://www.boxesandarrows.com/people/archives/fred_leise.php">Fred Leise,</a> president of <a href="http://www.contextualanalysis.com">ContextualAnalysis, LLC,</a> is an information architecture consultant providing services in the areas of content analysis and organization, user experience design, taxonomy and thesaurus creation, and website and back-of-book indexing.</p>
<p><a href="http://www.boxesandarrows.com/people/archives/mike_steckel.php">Mike Steckel</a> is an Information Architect/Technical Librarian for International SEMATECH in Austin, TX. </biobox></p>
]]></content:encoded>
			<wfw:commentRss>http://boxesandarrows.com/creating-a-controlled-vocabulary/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>What Is A Controlled Vocabulary?</title>
		<link>http://boxesandarrows.com/what-is-a-controlled-vocabulary/</link>
		<comments>http://boxesandarrows.com/what-is-a-controlled-vocabulary/#comments</comments>
		<pubDate>Mon, 16 Dec 2002 23:29:17 +0000</pubDate>
		<dc:creator>Fred Leise</dc:creator>
				<category><![CDATA[Methods]]></category>
		<category><![CDATA[Process and Methods]]></category>
		<category><![CDATA[Special topic: Search and Metadata]]></category>

		<guid isPermaLink="false">http://boxesandarrows.com/what-is-a-controlled-vocabulary/</guid>
		<description><![CDATA[Finding the right words to communicate the message of your website can be one of the most difficult parts of developing it. Our authors guide you through the concepts behind a well-designed controlled vocabulary and discuss the pros and cons of its development.]]></description>
				<content:encoded><![CDATA[<pullquote>&#8220;A controlled vocabulary is a way to insert an interpretive layer of semantics between the term entered by the user and the underlying database to better represent the original intention of the terms of the user.&#8221;</pullquote>The most effective communication occurs when all parties involved agree on the meaning of the terms being used. Consequently, finding the right words to communicate the message of your website can be one of the most difficult parts of developing it. </p>
<p>When we converse, we speak in &#8220;natural language.&#8221; This is language in all its raw, rich, gooey glory. When we organize our information and label it, however, there is so much richness, variance, and confusion in terminology that we often need to impose some order to facilitate agreement between the concepts within the site and the vocabulary of the person using it. </p>
<p>This order can come through a controlled vocabulary. Amy Warner <a href="http://www.lexonomy.com/publications/aTaxonomyPrimer.html">defines</a> a controlled vocabulary (CV) as &#8220;organized lists of words and phrases, or notation systems, that are used to initially tag content, and then to find it through navigation or search.&#8221; This means that a CV is a type of metadata that functions as a &#8220;subset of natural language&#8221; (Wellisch); it is not how we normally speak. Using a CV is also a way to overtly display relationships among the various concepts that your site covers in order to increase findability. The most basic, and often overlooked, form of controlled vocabulary is a consistent labeling system. If you are careful to call the same thing, or the same concept, by the same name everywhere on your site, you are using a very simple controlled vocabulary. And you&#8217;re also ensuring that your users start developing a mental model of the information they can find. </p>
<p>A controlled vocabulary is a way to insert an interpretive layer of semantics between the term entered by the user and the underlying database to better represent the original intention of the terms of the user. Consider what happens when you do not use a controlled vocabulary. An uncontrolled vocabulary simply uses the natural language of the documents and matches that with the natural language of the user. This is extremely specific, and it gives the user exactly what they ask for. Sounds great, right? Consider, however, a site about chemistry, where many of the documents use the name of the element (&#8220;iron&#8221;), and many use the chemical symbol of the element (&#8220;Fe&#8221;). Using an uncontrolled vocabulary, the results will only include the terms entered by the user. If the user entered &#8220;Fe&#8221; in the search box, he will not get any of the results for documents that use the term &#8220;iron.&#8221; There is a good chance the user is missing some documents he would like to have. Very few users will enter both terms, and many will be reviewing their results thinking they are seeing the results from all relevant documents.</p>
<p><span class="subhead">The equivalence relationship</span><br />
You probably are aware of certain categories or items on your site that might go by multiple names. You realize that if you said &#8220;automobiles&#8221; on your homepage and &#8220;cars&#8221; on the next page, users might get confused. Users will start to wonder if there is a difference between the two terms. Instead you choose &#8220;automobiles&#8221; and don&#8217;t use &#8220;cars&#8221; at all. In this case &#8220;automobiles&#8221; is the term you prefer to use throughout your site. We call this the &#8220;preferred term.&#8221; &#8220;Cars&#8221; is a variant term, a different word representing the same concept.  Or, consider this example:</p>
<p> <img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/cv_1.jpg" width=343 height=43 alt="Example of a preferred term">    </p>
<p>Here, each term refers to the same concept, Elizabeth Taylor (your preferred term). We could tell our system, when people ask for &#8220;Elizabeth Burton&#8221; use &#8220;Elizabeth Taylor.&#8221; This is more traditionally expressed using standard CV notation as:</p>
<p>Elizabeth Fortensky USE Elizabeth Taylor<br />
Elizabeth Taylor UF Elizabeth Fortensky  (UF = Used For)</p>
<p>Or even this:</p>
<p>Liz Taylor USE Elizabeth Taylor<br />
Elizabeth Taylor UF Liz Taylor</p>
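<p>To make the reciprocal nature of USE and UF concrete, here is a minimal sketch that stores only the USE references and derives the UF lists from them. The names <code>USE</code>, <code>used_for</code>, and <code>resolve</code> are illustrative assumptions; the terms come from the examples above.</p>

```python
# Variant terms taken from the examples above, mapped to the preferred term.
USE = {
    "Elizabeth Fortensky": "Elizabeth Taylor",
    "Liz Taylor": "Elizabeth Taylor",
}


def used_for(use_map):
    """Derive the reciprocal UF lists from the USE references."""
    uf = {}
    for variant, preferred in use_map.items():
        uf.setdefault(preferred, []).append(variant)
    return uf


def resolve(term, use_map):
    """Return the preferred term for `term`, or `term` itself if it has no USE entry."""
    return use_map.get(term, term)


print(resolve("Liz Taylor", USE))
print(used_for(USE))
```

<p>Storing one direction and deriving the other is only one design choice; it has the advantage that the two halves of each pair cannot drift apart.</p>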
<p>Think about Gap&#8217;s web page (<a href="http://www.gap.com">http://www.gap.com</a>). We already know what they sell (they have excellent branding), and most of their content is generally referred to by the same terms as used in our general culture. In other words, people consistently say &#8220;jeans,&#8221; &#8220;pants,&#8221; and &#8220;shirts.&#8221; Even though you might get the occasional person using the word &#8220;dungarees&#8221; or &#8220;slacks,&#8221; nearly everyone would see &#8220;jeans&#8221; and know what the category referred to (the visuals help support this too). Furthermore, Gap does not carry hundreds of pairs of jeans that must somehow be distinguished from one another. If you examine the natural language people use when talking about Gap&#8217;s products, there&#8217;s an unusually small amount of term variance. Content like this works great in the very simple organization system used on the Gap site. It works so well that they do not even need to offer search; this is very unusual for an ecommerce site. What they have is a system in which all of the concepts are consistently labeled using language familiar to their users. They&#8217;re lucky. Few sites have the option to work in this way.</p>
<p>Let&#8217;s say, however, that gap.com decided to offer search. Then they would somehow need to translate the natural language of search into the controlled language of the website. People search in the same language they speak, natural language, so a more advanced controlled vocabulary needs to take the concepts of your users (natural language) and match them to the concepts expressed in the language of your website (controlled vocabulary). That means if the developers of the site began to see that people were searching for &#8220;dungarees&#8221; and getting zero hits, they would need to create a way to tell the system, &#8220;when someone searches for &#8216;dungarees,&#8217; give them the results for &#8216;jeans.&#8217;&#8221; In the language of a controlled vocabulary, &#8220;jeans&#8221; becomes the preferred term and &#8220;dungarees&#8221; is a variant term, and they have an equivalence relationship. This can be a powerful tool for increasing findability. </p>
<p>There are many examples of the situations that alternate terms cover. Here are a few:
<ul>
<li>synonyms (two words with the same meaning, like &#8220;jeans&#8221; and &#8220;dungarees&#8221;)</li>
<li>homonyms (words that sound the same, but have different meanings, like &#8220;bank&#8221; the financial institution and &#8220;bank&#8221; the side of a stream or river) </li>
<li>common misspellings </li>
<li>changes in content (e.g., countries that change their name or have multiple spellings)</li>
<li>identifying &#8220;Best Bets&#8221; or the most popular pages associated with a certain term (<a href="http://www.BBC.com">http://www.BBC.com</a> is great at this)</li>
<li>connecting a woman&#8217;s married name to her maiden name</li>
<li>connecting abbreviations to the full word (e.g., NY and New York, the chemical symbol Si with the element Silicon)</li>
</ul>
<p>There are two types of synonym equivalence lists: synonym rings and authority files. Synonym rings are generally used for searching behind the scenes as a way to connect the various terms for a concept. A ring can be used to say, &#8220;when someone searches for &#8216;Si,&#8217; give them all documents containing either &#8216;Si&#8217; or &#8216;Silicon.&#8217;&#8221; However, what happens when you want to display one of these terms in your navigation? Then you will need to pick one to be your preferred term. Now, you have an authority file. In each of the above examples, different terms may be used, but each one represents the same concept. They are tied together and given meaning by making their equivalence relationship explicit.</p>
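<p>The difference between the two lists can be sketched in a few lines. In the hedged example below, the synonym ring expands a search behind the scenes, while the authority file adds a preferred term for display. The sample documents and the names <code>SYNONYM_RINGS</code>, <code>AUTHORITY</code>, <code>expand</code>, and <code>search</code> are assumptions made for illustration; only the Si/Silicon pair comes from the text.</p>

```python
# Hypothetical documents; only the Si/Silicon pair comes from the text.
DOCUMENTS = {
    1: "Si wafers are cut from a single crystal",
    2: "Silicon is the basis of most semiconductors",
    3: "Iron ore is smelted in a blast furnace",
}

# A synonym ring: every term is equal, none is preferred.
SYNONYM_RINGS = [{"si", "silicon"}]

# An authority file adds one preferred term per ring for display.
AUTHORITY = {"si": "Silicon", "silicon": "Silicon"}


def expand(term):
    """Return the term plus every other member of its ring, if it has one."""
    term = term.lower()
    for ring in SYNONYM_RINGS:
        if term in ring:
            return ring
    return {term}


def search(term):
    """Return ids of documents containing any member of the term's ring."""
    wanted = expand(term)
    return sorted(
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if wanted.intersection(text.lower().split())
    )


print(search("Si"))
print(AUTHORITY["si"])
```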
<p><span class="subhead">Hierarchical relationships: broader and narrower terms</span><br />
If your content is more complex, for instance if you sold only pants and you had hundreds of types, you might require more from your controlled vocabulary.  <fig image="http://www.boxesandarrows.com/archives/images/121602_CV/cv_2.jpg" width=161 height=271 alt="Jumble of terms" align="left" hspace="5" caption="Figure 2: Terms related to &#8220;pants.&#8221;" />The natural language we use to describe the concept of &#8220;pants&#8221; quickly enlarges as &#8220;pants&#8221; becomes more specific. In other words, &#8220;slacks,&#8221; &#8220;khakis,&#8221; &#8220;jeans,&#8221; &#8220;trousers,&#8221; &#8220;corduroys,&#8221; and other kinds of pants will all need to be differentiated so users don&#8217;t have to rummage through pages and pages of search results for the word &#8220;pants,&#8221; when pants are your whole inventory.
<p>What will help is a systematic way to map out the different terms so people quickly find the specific kinds of pants they are interested in. What you need is a hierarchy showing the broader terms (BTs), the narrower terms (NTs), and the variant terms (most often displayed as &#8220;USE&#8221; and &#8220;UF&#8221; for Used for). These will show which terms are subsets of larger, broader concepts. You are starting off with a jumble of words that are all related to &#8220;pants&#8221; in some way. It might look something like Figure 2.</p>
<p>We have a bucket we can call &#8220;Pants&#8221; and inside are a lot of terms with a relationship to the concept of pants. In this example, &#8220;pants&#8221; is the broader term, and the kinds of pants refer to subsets of the whole universe of pants. In a controlled vocabulary, we might reconfigure the chart above to look like this:</p>
<p> <img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/what_is_a_controlled_vocabulary_/cv_3.gif" width=592 height=360 alt="Taxonomy of the concept of Pants">        </p>
<p>This is what people are increasingly calling a Taxonomy. This term makes traditional librarians a little uncomfortable, but we are learning to live with it. Originally it was a term for biological classifications (Genus, species, etc.), but it has quickly become a standard word for describing hierarchies. </p>
<p>The standard CV notations used to express hierarchical relationships are NT (narrower term) and BT (broader term). Using this notation, the term &#8220;Women&#8217;s Pants&#8221; would be expressed like this:</p>
<p>Women&#8217;s Pants<br />
&nbsp;&nbsp;BT Pants<br />
&nbsp;&nbsp;NT Casual Pants<br />
&nbsp;&nbsp;NT Dress Pants<br />
&nbsp;&nbsp;NT Sports Pants</p>
<p>There is a lot you can do with this hierarchical arrangement. It can help you formulate your homepage navigation. It could improve your searching and browsing. It can help users broaden and narrow their search results quickly by showing them where each set of results fits into the site&#8217;s hierarchy (see Keith Instone&#8217;s &#8220;<a href="http://keith.instone.org/breadcrumbs/">attribute breadcrumbs</a>&#8221; for more examples). Generally, few sites need to go beyond the level of a taxonomy, but it might be useful to see the next level of complexity in controlled vocabularies.</p>
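<p>As one illustration of what the hierarchy makes possible, the sketch below stores only the BT link for each term and derives both the NT list and a breadcrumb trail from it. The names <code>BROADER</code>, <code>narrower</code>, and <code>breadcrumb</code> are hypothetical; the terms come from the &#8220;Women&#8217;s Pants&#8221; example above.</p>

```python
# Each term points to its broader term (BT); the top term has none.
# "Women's Pants" and its narrower terms come from the example above.
BROADER = {
    "Women's Pants": "Pants",
    "Casual Pants": "Women's Pants",
    "Dress Pants": "Women's Pants",
    "Sports Pants": "Women's Pants",
}


def narrower(term):
    """Return the narrower terms (NT) of `term`, in alphabetical order."""
    return sorted(nt for nt, bt in BROADER.items() if bt == term)


def breadcrumb(term):
    """Walk the BT links upward and return the path from the top term down."""
    path = [term]
    while path[-1] in BROADER:
        path.append(BROADER[path[-1]])
    return " / ".join(reversed(path))


print(breadcrumb("Dress Pants"))
print(narrower("Women's Pants"))
```

<p>This assumes each term has a single broader term; a vocabulary that allows several would need a different structure.</p>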
<p><span class="subhead">Associative relationships: related terms</span><br />
How far can I extend the pants example? Oh, quite far. Let&#8217;s say that you are a research institute that studies pants (ridiculous, I know, but stay with me). You not only study pants themselves, but the materials they are made from, their history, how they are manufactured, and more. Your institute might do well to take the time to develop what Peter Morville has called the &#8220;<a href="http://www.asis.org/Conferences/Summit2001/preconference.html">Rolls Royce of controlled vocabularies</a>&#8221;&#8212;a thesaurus. A thesaurus shows all of the relationships described so far (BT, NT, and UF), but will also include related terms (RT). This is an associative relationship. It shows how one term is associated with another.</p>
<p>If a user looked to your institute for research on jeans, you would be able to give them that term embedded in a rich series of relationships. An example of the range of relationships would be expressed like this using the standard format for thesauri:  </p>
<p>Jeans<br />
&nbsp;&nbsp;BT Pants<br />
&nbsp;&nbsp;NT Levis<br />
&nbsp;&nbsp;NT Wranglers<br />
&nbsp;&nbsp;UF Dungarees<br />
&nbsp;&nbsp;UF Waist Overalls<br />
&nbsp;&nbsp;RT Denim<br />
&nbsp;&nbsp;RT Overalls</p>
<p>Denim is related to Jeans, but not hierarchically. It is not a type of jeans, nor is one a subset of the other. Yet someone interested in one term might be interested in the other because they are related concepts. In the interface, you might identify &#8220;Denim&#8221; as a &#8220;see also&#8221; option for &#8220;Jeans.&#8221; If users looked for the term &#8220;Denim&#8221; in the thesaurus they might see something like this:</p>
<p>Denim<br />
&nbsp;&nbsp;BT Fabrics<br />
&nbsp;&nbsp;NT Ring Spun<br />
&nbsp;&nbsp;NT Dark Indigo<br />
&nbsp;&nbsp;NT Stonewash<br />
&nbsp;&nbsp;RT Jeans</p>
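<p>The two entries above can be sketched as a small data structure, with one function for the &#8220;see also&#8221; links and another that checks whether each RT is mirrored in the other term&#8217;s entry. The names <code>THESAURUS</code>, <code>see_also</code>, and <code>missing_reciprocal_rts</code> are illustrative assumptions, and terms without their own entry (such as &#8220;Overalls&#8221;) are simply skipped by the check.</p>

```python
# The two thesaurus entries shown above, keyed by relationship type.
THESAURUS = {
    "Jeans": {
        "BT": ["Pants"],
        "NT": ["Levis", "Wranglers"],
        "UF": ["Dungarees", "Waist Overalls"],
        "RT": ["Denim", "Overalls"],
    },
    "Denim": {
        "BT": ["Fabrics"],
        "NT": ["Ring Spun", "Dark Indigo", "Stonewash"],
        "RT": ["Jeans"],
    },
}


def see_also(term):
    """Return the related terms to offer as "see also" links."""
    return THESAURUS.get(term, {}).get("RT", [])


def missing_reciprocal_rts():
    """List (term, related) pairs where `related` has an entry but no RT back to `term`."""
    missing = []
    for term, entry in THESAURUS.items():
        for related in entry.get("RT", []):
            if related in THESAURUS and term not in see_also(related):
                missing.append((term, related))
    return missing


print(see_also("Jeans"))
print(missing_reciprocal_rts())
```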
<p>The Denim example alone could be filled with many additional terms, and it is easy to see how well this would accommodate user browsing (and &#8220;<a href="http://www.gseis.ucla.edu/faculty/bates/berrypicking.html">berrypicking</a>&#8221;). This is also one of the dangers of creating associative relationships: knowing when to stop. This relationship is also the most difficult and subjective of all the relationships in a CV. You are identifying a relationship between two concepts that may not be obvious. On an Amazon product page, when the page identifies an item that others have purchased along with the one being displayed, Amazon is identifying a potentially useful associative relationship. </p>
<p>To push the concept a little further, if a user is interested in a paper from your pants institute on the &#8220;Hemingway wore khakis&#8221; advertisement from Gap, they might also be interested in a paper you have on how it was really Rock Hudson&#8217;s subtle use of khakis that made &#8220;A Farewell to Arms&#8221; such a great movie. The connection between the two documents, the intersection of the concepts of &#8220;Hemingway&#8221; and &#8220;khakis,&#8221; is less direct than the Denim example above. This is expanding the concept of &#8220;related terms&#8221; farther than many would be prepared to go, but it is an option. </p>
<p><span class="subhead">Internal uses of controlled vocabularies</span><br />
So far, we have focused on how controlled vocabularies help the user, but there are also benefits to the organization using the CV. Here are a few:
<ul>
<li>CVs can help with category analysis or keeping your categories distinct.</li>
<li>CVs can help establish a site&#8217;s navigation.</li>
<li>CVs can be the basis for personalization features.</li>
<li>CVs can help with preparation for CMS or knowledge management projects, since many of these require this sort of structure to your content to do their magic.</li>
<li>CVs get the organization using the same language as the users (which should result in better communication with them).</li>
<li>CVs can help the organization (and the user) understand what concepts your site covers. Your controlled vocabulary is in reality a &#8220;concept map&#8221; of what is on your site.</li>
</ul>
<p>While controlled vocabularies can be powerful, by themselves they are not the magic pill that will cure what ails your site. CVs are a lot of work, they are often difficult and time consuming to maintain, and they can be very political. Some skepticism toward all metadata is a healthy thing (everyone still reading this should see <a href="http://www.well.com/~doctorow/metacrap.htm">Metacrap</a>). As with anything important, there are a lot of people who are doing it loudly and badly.  </p>
<p><span class="subhead">Conclusion</span><br />
Human beings are natural makers of patterns. That is how we understand what our senses are taking in. When people visit your site, they will immediately begin trying to understand what they see. A well-designed and regularly updated controlled vocabulary can help connect the concepts your users have in their heads to the concepts you present on your site. That is when real communication will occur. </p>
<p>Next in the series: <a href="http://www.boxesandarrows.com/archives/creating_a_controlled_vocabulary.php">How to create a controlled vocabulary</a>.</p>
<p><end></end></p>
<p><morebox>
<ul>
<li>Wellisch, Hans. <a href="http://www.amazon.com/exec/obidos/tg/detail/-/082420882X/ref=nosim/boxesandarrows-20">Indexing from A to Z</a>. New York: H.W. Wilson, 1995. p.214</li>
<li>Amy J. Warner <a href="http://www.lexonomy.com/publications/aTaxonomyPrimer.html">Taxonomy Primer</a></li>
<li><a href="http://www.BBC.com">http://www.BBC.com</a></li>
<li>Keith Instone&#8217;s <a href="http://keith.instone.org/breadcrumbs/">attribute breadcrumbs</a></li>
<li><a href="http://www.asis.org/Conferences/Summit2001/preconference.html">ASIS Summit 2001</a></li>
<li><a href="http://www.gseis.ucla.edu/faculty/bates/berrypicking.html">Berrypicking</a></li>
<li><a href="http://www.well.com/~doctorow/metacrap.htm">Metacrap</a></li>
<li><a href="/files/banda/Bibliography.htm">An Annotated Bibliography</a></li>
</ul>
<p></morebox><biobox><a href="http://www.boxesandarrows.com/people/archives/karl_fast.php">Karl Fast</a> is a PhD student in library and information science at the University of Western Ontario. He also has a master&#8217;s in LIS. His graduate work has included courses on organization of information, subject analysis, thesaurus construction, and facet analysis.</p>
<p><a href="http://www.boxesandarrows.com/people/archives/fred_leise.php">Fred Leise,</a> president of <a href="http://www.contextualanalysis.com">ContextualAnalysis, LLC,</a> is an information architecture consultant providing services in the areas of content analysis and organization, user experience design, taxonomy and thesaurus creation, and website and back-of-book indexing.</p>
<p><a href="http://www.boxesandarrows.com/people/archives/mike_steckel.php">Mike Steckel</a> is an Information Architect/Technical Librarian for International SEMATECH in Austin, TX. </biobox></p>
]]></content:encoded>
			<wfw:commentRss>http://boxesandarrows.com/what-is-a-controlled-vocabulary/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>All About Facets &amp; Controlled Vocabularies</title>
		<link>http://boxesandarrows.com/all-about-facets-controlled-vocabularies/</link>
		<comments>http://boxesandarrows.com/all-about-facets-controlled-vocabularies/#comments</comments>
		<pubDate>Mon, 09 Dec 2002 22:44:53 +0000</pubDate>
		<dc:creator>Fred Leise</dc:creator>
				<category><![CDATA[Findability]]></category>
		<category><![CDATA[Process and Methods]]></category>
		<category><![CDATA[Special topic: Search and Metadata]]></category>

		<guid isPermaLink="false">http://boxesandarrows.com/all-about-facets-controlled-vocabularies/</guid>
		<description><![CDATA[Information architects are fascinated with faceted classification and its application to information architecture problems. Our three authors present a series of in-depth articles covering faceted classification and controlled vocabularies and their practical application.]]></description>
				<content:encoded><![CDATA[<pullquote>&#8220;Our aim is to make this complex and important subject accessible to practicing information architects.&#8221;</pullquote>Information architects are fascinated with faceted classification and its application to information architecture problems. However, facets remain difficult to understand and there are few options for learning about them. </p>
<p>This is the first in a series of articles that aims to correct this situation. We intend to explain both facets and the more general concept of controlled vocabularies. We want to make the subject accessible to those who don&#8217;t have advanced degrees in library and information science. Furthermore, we want to show how these concepts can be applied to solve information architecture problems for the Web and other digital information environments.</p>
<p>The concept of faceted classification is decades old, and controlled vocabularies go back even further. Consequently a great deal has already been written about the subject. But these writings are not always helpful to the practicing IA. Some are too simple, others too academic. Most are hard to find, and many were written decades before this Web thing happened.</p>
<p>Throughout this series we will strive to be:</p>
<ul>
<li><b>Practical.</b> We will give you a practical guide to controlled vocabularies and faceted classification. We will not only explain the concepts, we will show you how to apply them in solving real information architecture problems.</li>
<li><b>Readable.</b> Too much of the existing literature is hard to understand. It may be comprehensible to someone with a master&#8217;s in library and information science, but this excludes a large number of practicing IAs (and we know some librarians who don&#8217;t understand this stuff). We will use plain talk to explain this stuff, without dumbing it down.</li>
<li><b>Relevant.</b> We will make this relevant to the Web and other digital information environments. A great deal was written about this topic in the 1950s and 60s. It&#8217;s excellent material, but back then transistors were still a pretty neat idea. We believe that faceted classification has even more applications today than it did back then.</li>
<li><b>Accessible.</b> Everything will be published here on Boxes &amp; Arrows: on the web, easy to access, and free. While a lot has been written on this topic, it&#8217;s often hard to obtain. For example, B.C. Vickery&#8217;s excellent book, &#8220;Faceted Classification: A Guide to the Construction and use of Special Schemes&#8221; was written in 1960 and is rather difficult to obtain today (at least one of the authors has resorted to finding a copy in a library and, in desperation, photocopying the whole thing).</li>
</ul>
<p><span class="subhead">The plan </span><br />
Our main goal is to explain faceted classification. However, a faceted classification scheme is actually a special case of what are called controlled vocabularies. To properly explain facets we will begin with this more general topic and work our way up to facets.</p>
<p>Our travels through this strange land will include the following:</p>
<ul>
<li><b>Controlled Vocabularies.</b> In the first full article in the series we&#8217;ll describe controlled vocabularies in general. We&#8217;ll talk about what they are and how they work. </li>
<li><b>Synonym Rings &amp; Authority Files.</b> Before moving on to facets, we&#8217;ll describe these simpler forms of controlled vocabularies. There are many situations where they are more useful solutions because they&#8217;re easier to create, implement, and maintain. Sometimes they&#8217;re not enough and it&#8217;s time to step up to facets.</li>
<li><b>Facets &amp; Facet Analysis.</b> With the fundamentals in place we will move on to the heart of our subject. This will take a while, but it&#8217;ll be worth it. We&#8217;ll also take time to describe facet analysis, the process used to develop facets.</li>
<li><b>Interface Issues.</b> A long-standing weak point of controlled vocabularies is how to use them effectively in an interface. This is particularly true of facets. We&#8217;ll explore these issues and give you the best advice we can.</li>
<li><b>Decision Factors.</b> Not every project calls for a full-blown faceted solution. Sometimes a synonym ring is better. How do you know? We&#8217;ll cover some guidelines for making those decisions.</li>
<li><b>Future Directions.</b> There are some interesting new applications related to facets and controlled vocabularies such as <a href="http://www.xfml.org/">XFML</a> (http://www.xfml.org/) and <a href="http://www.topicmaps.org/">Topic Maps</a> (http://www.topicmaps.org/). We hope to cover these as well.</li>
</ul>
<p><span class="subhead">Some final thoughts</span><br />
That&#8217;s a lot. And yes, we&#8217;re ambitious. But no, we aren&#8217;t writing the definitive treatise on the subject. Our aim is to make this complex and important subject accessible to practicing information architects.</p>
<p>We view this as a collaborative effort. We anticipate many questions. We&#8217;ll answer these through the discussion features of Boxes &amp; Arrows. We also plan to address the bigger questions you have in subsequent columns. Let us know what you want to know, and we&#8217;ll do our best to provide you with answers.</p>
<p><end></end></p>
<p><biobox><a href="http://www.boxesandarrows.com/people/archives/karl_fast.php">Karl Fast</a> is a PhD student in library and information science at the University of Western Ontario. He also has a master&#8217;s in LIS. His graduate work has included courses on organization of information, subject analysis, thesaurus construction, and facet analysis.</p>
<p><a href="http://www.boxesandarrows.com/people/archives/fred_leise.php">Fred Leise,</a> president of <a href="http://www.contextualanalysis.com">ContextualAnalysis, LLC,</a> is an information architecture consultant providing services in the areas of content analysis and organization, user experience design, taxonomy and thesaurus creation, and website and back-of-book indexing.</p>
<p><a href="http://www.boxesandarrows.com/people/archives/mike_steckel.php">Mike Steckel</a> is an Information Architect/Technical Librarian for International SEMATECH in Austin, TX. </biobox></p>
<p><img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/art_end.gif" alt="" title="" width="8" height="8" /></p>
]]></content:encoded>
			<wfw:commentRss>http://boxesandarrows.com/all-about-facets-controlled-vocabularies/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Improving Usability with a Website Index</title>
		<link>http://boxesandarrows.com/improving-usability-with-a-website-index/</link>
		<comments>http://boxesandarrows.com/improving-usability-with-a-website-index/#comments</comments>
		<pubDate>Mon, 15 Jul 2002 12:00:01 +0000</pubDate>
		<dc:creator>Fred Leise</dc:creator>
				<category><![CDATA[Interfaces]]></category>
		<category><![CDATA[Methods]]></category>

		<guid isPermaLink="false">http://boxesandarrows.com/improving-usability-with-a-website-index/</guid>
		<description><![CDATA[Indexes are important information-finding tools that can enhance usability. Site indexes provide direct, easily scannable links to meaningful, yet highly granular, chunks of content. But there&#8217;s more to them than people often assume.]]></description>
				<content:encoded><![CDATA[<p>Indexes are important information-finding tools that can enhance website usability. They offer easy scanning for finding known items, they provide entry points to content using the users&#8217; own vocabulary and they provide access to concepts discussed, but not named, in the text. Perhaps most importantly, site indexes provide direct access to granular chunks of information without the need for traversing multiple links in a hierarchy.</p>
<p><span class="subhead">What are indexes?</span><br />
Before I explore how website indexes can improve usability, let&#8217;s start with background knowledge that will help show how they fit into the broader picture, especially since indexes have more to them than people often assume.
<pullquote>Although great strides have been made with the technology, automatic classification tools come nowhere near the human brain in terms of accuracy in evaluating text.</pullquote>According to Nancy Mulvaney an index is &#8220;a structured sequence—resulting from a thorough and complete analysis of text—of synthesized access points to all the information contained in the text.&#8221;<a href="javascript://" onClick="window.open('siteindex071502_notes.html', 'popup', 'width=400,height=300,scrollbars=auto,resizable=no,toolbar=no,directories=no,location=no,menubar=no,status=no,left=0,top=0')"><span class="sup">1</span></a></p>
<p>What are the important points about this definition? First, that the index is a sequence, that is, it has a known order of items. While most indexes are arranged alphabetically, other orders are possible, such as numerical (for a parts list index) or chronological (for a timeline). But the index isn&#8217;t just a list of entries, it is structured. In other words, the index shows relationships between various subjects, thus leading users to more specific or related topics that might meet their information needs more closely.</p>
<p>Most importantly for the construction of an index, a human has looked at and analyzed the text. Although great strides have been made with the technology, automatic classification tools come nowhere near the human brain in terms of accuracy in evaluating text. There is simply too much contextual meaning that texts carry, too much social and cultural knowledge that, while not stated in the text, needs to be accounted for when creating the index. Certainly no computer can yet understand the actual meaning of all texts. </p>
<p>Mulvaney&#8217;s final point is that the index comprises access points to all the information contained in the text. An index contains all significant mentions of people, places, things and ideas. Important here is the idea of significance. An index should lead users to relevant material, to significant content chunks that provide useful information, rather than to passing mentions of words.</p>
<p>Thus, indexes are not concordances&#8212;lists of every occurrence of every word in a text. This is the primary reason why indexes are, in certain cases, much more valuable than searches. Search results are often overwhelming or even useless; the fact that a word or phrase is mentioned in the text does not mean that the subject is discussed in the text. And it is the discussion that provides information for the user.</p>
<p><span class="subhead">How do indexes increase usability?</span><br />
Indexes, as flat lists of terms, are easily scannable. Users need only use their browser&#8217;s scroll bar to navigate through the entire index. (Large indexes often provide alphabetical anchor links at the top of the index, which take users quickly to the portion of the index they need to use.) There are no multiple levels to navigate, nor must users decide which branch of a hierarchy to click on, which often results in their missing information they are looking for or taking longer to find it. In fact, the easy scannability of the index on a single page is an important argument against having separate pages for each letter of the alphabet, whenever possible.</p>
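<p>As a rough illustration of the alphabetical anchor links mentioned above, the following Python sketch groups a flat list of index labels by initial letter. The labels and the function name <code>group_by_letter</code> are hypothetical, chosen only for this example; the letters that come out are the ones a single-page index might offer as jump links.</p>

```python
from itertools import groupby

# Hypothetical index labels, used only for illustration.
labels = ["training", "Alliance partners", "cancer", "oncology", "careers"]


def group_by_letter(labels):
    """Return (letter, entries) pairs in alphabetical order, ignoring case."""
    ordered = sorted(labels, key=str.casefold)
    return [
        (letter, list(entries))
        for letter, entries in groupby(ordered, key=lambda s: s[0].upper())
    ]


groups = group_by_letter(labels)
# The letters that actually occur become the anchor links at the top of the page.
letters = [letter for letter, _ in groups]
```

<p>Only letters that have entries appear in the result, so the jump links never lead to an empty section.</p>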
<p>Through the use of multiple access points or &#8220;see&#8221; references, indexes help translate the vocabulary of the users to that of a text. In this example, for instance:<br />
<blockquote>cancer. See oncology</p></blockquote>
<p>The index is telling the user that this site does have information on cancer, but that it uses the term &#8220;oncology&#8221; to represent this concept. And, if users click on the link, the index will bring them directly to the relevant information about that term.</p>
<p>&#8220;See also&#8221; references can lead users to additional or more specific information that might more closely meet their information needs. Every reference librarian knows that many users come to them with ill-formed queries. &#8220;See also&#8221; references assist users by helping them think about the information they are looking for.<br />
<blockquote>training. See also online training; web-based training</p></blockquote>
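<p>One way to picture the two kinds of references is as a pair of lookup tables. The sketch below is a minimal Python illustration, not a description of any indexing tool: the table names <code>SEE</code> and <code>SEE_ALSO</code> and the function <code>resolve</code> are assumptions, and the entries simply echo the two examples above.</p>

```python
# Hypothetical cross-reference tables; the entries echo the examples above.
SEE = {"cancer": "oncology"}
SEE_ALSO = {"training": ["online training", "web-based training"]}


def resolve(term):
    """Follow a 'see' reference to the preferred term and list related terms."""
    preferred = SEE.get(term, term)        # translate user vocabulary, if needed
    related = SEE_ALSO.get(preferred, [])  # suggestions for narrowing the query
    return preferred, related
```

<p>Calling <code>resolve("cancer")</code> leads to the preferred term, while <code>resolve("training")</code> keeps the term and adds the related headings a user might want instead.</p>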
<p>Indexes are especially useful in &#8220;known-item finding,&#8221; those cases where users know specifically what they are looking for (or what information they saw previously and want to get back to). They simply find the term in the index and click on the link to go directly to the information. No need to drill down through multiple site levels or try to remember what path they took before.</p>
<p>Indexes can also serve an important function by leading users to concepts discussed but not specifically mentioned in the text. For example, a good indexer analyzing a paragraph that talks about Alpo and Purina Dog Chow might add an index entry for &#8220;pet nutrition.&#8221; Such intellectual analysis and synthesis adds significant value for users. Automated indexing tools fail at providing this kind of added value.</p>
<p>A site index acts as an important complement to the site map or table of contents. Where the latter look at the high-level (or top-down) organization of information on the site, indexes look at the bottom-up view, that is, at specific, granular information chunks.</p>
<p><span class="subhead">When should a site index be used?</span><br />
Clearly, small sites have little need for indexes. Usually the navigation labels and page titles themselves will be enough for users to find the information they need (assuming that labels have been well thought out and provide an appropriate information scent).</p>
<p>For extremely large sites, with millions of pages, including everything in the index would be so time consuming and labor intensive as to be uneconomical. In addition, the resulting index would be almost impossible to scan. However, such sites can be improved and their usability increased by providing an index that directs users to the set of information that is most used or that most users need to do their jobs efficiently. </p>
<p>Most mid-sized sites, with hundreds or thousands of pages can benefit from the additional navigation that site indexes offer and can be indexed in a reasonable amount of time at a reasonable cost.</p>
<p><span class="subhead">How are website indexes created?</span><br />
Indexing, no matter what the material under consideration, consists of two steps. First, the content is analyzed to establish indexable concepts. Second, terms (or labels) for those concepts are created or selected. In website indexing, the URL for the page on which the information resides is captured and used to turn the index term into a hypertext link. For best results, a human needs to perform the content analysis. </p>
<p>There is software available, such as <a href="http://www.html-indexer.com/">HTML Indexer,</a> that helps automate the index preparation process by spidering a site and creating a preliminary version of an index using page titles and named anchors. The indexer then needs to massage those results to create a truly useful index.</p>
<p>Indexers can also create a site index using regular indexing software. CINDEX, MACREX and Sky Professional are the programs most used by professional indexers to assist with important, but time-consuming, housekeeping tasks such as alphabetizing entries, checking spelling or verifying cross references. After the initial index entries have been created, they can then be copied or output (with embedded HTML coding) into a content management system&#8217;s index page template for later publishing to the website itself. </p>
<p>That process was the one I used to create the site index for <a href="http://www.peoplesoft.com/">PeopleSoft, Inc.&#8217;s website,</a> which won an Australian Society of Indexers Web Index Award 2002–2004. Here, for example is the simple link code used to create the fourth line in the PeopleSoft site index illustrated below:<br />
<blockquote>&#60;a href=&#34;/corp/en/about/pspartner/apply/apply_partner.asp&#34;&#62;Alliance partners, applying to become&#60;/a&#62;&#60;br&#62;</p></blockquote>
<p>Special codes (available in most indexing programs) were used to &#8220;hide&#8221; the HTML coding so that the program alphabetized only the actual index labels themselves.</p>
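<p>The effect of hiding the markup can be approximated outside an indexing program. The Python sketch below is an assumption-laden stand-in, not the mechanism CINDEX, MACREX or Sky actually use: it strips tags with a simple regular expression and sorts on what remains. The two <code>/hypothetical/</code> paths and their labels are invented for illustration; only the PeopleSoft entry comes from the example above.</p>

```python
import re

# Matches any HTML tag; a deliberately simple pattern for this sketch.
TAG = re.compile(r"<[^>]+>")


def sort_key(entry):
    """Sort on the visible label only, ignoring markup and letter case."""
    return TAG.sub("", entry).casefold()


entries = [
    '<a href="/hypothetical/b.asp">Benefits</a><br>',
    '<a href="/corp/en/about/pspartner/apply/apply_partner.asp">'
    'Alliance partners, applying to become</a><br>',
    '<a href="/hypothetical/z.asp">about the company</a><br>',
]

# Without the key, the entries would be ordered by their href values instead.
ordered = sorted(entries, key=sort_key)
visible = [TAG.sub("", e) for e in ordered]
```

<p>A plain sort of the raw lines would order them by URL path, which is why the markup has to be kept out of the comparison.</p>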
<p><fig href="/files/banda/improving_usability_with_a_website_index/siteindex_img1.html" pop_width="582" pop_height="531" pop_scroll="auto" image="http://www.boxesandarrows.com/archives/images/071502_siteindex/peoplesoft_site_index-thumb.gif" width="291" height="266" border="0" align="left" caption="PeopleSoft.com site index" /></a>Having the site index use the conventions of back-of-book indexes, for example, indented subheads, makes it instantly recognizable for users. If they have any familiarity at all with using indexes, they will feel right at home with your site index. And that helps make for good usability.</p>
<p><span class="subhead">Creating index labels</span><br />
Label terms for indexes may be created by one of two different methods, depending on whether indexing is being carried out in a &#8220;closed&#8221; system or an &#8220;open&#8221; system. </p>
<p>In the former, nothing other than the text itself needs to be considered. The indexer derives index labels using &#8220;literary warrant&#8221; from the terminology used in the website itself and adjusts the labels as necessary for whatever reason.</p>
<p>Alternatively, in an open system, the indexer selects terms from a previously created list of terms that exists separately from the text itself. These term lists may be authority files, simple lists of approved terms, or thesauri, which show relationships between terms (related terms, broader terms or narrower terms) that help the indexer select the most appropriate term to describe the specific text being analyzed. Open system indexing is used in cases where it is necessary to ensure consistency among multiple, related sites or to control vocabulary in a single large, complex site with multiple authors.</p>
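<p>In an open system, part of the consistency check can be mechanical. The Python sketch below assumes a tiny, hypothetical authority file and a helper named <code>check_terms</code>; it only separates proposed labels into approved terms and ones that need an indexer&#8217;s review, and it does not model thesaurus relationships.</p>

```python
# Hypothetical authority file: the only labels an indexer may assign.
AUTHORITY_FILE = {"oncology", "online training", "web-based training"}


def check_terms(proposed):
    """Split proposed labels into approved terms and ones needing review."""
    approved = sorted(t for t in proposed if t in AUTHORITY_FILE)
    needs_review = sorted(t for t in proposed if t not in AUTHORITY_FILE)
    return approved, needs_review
```

<p>Labels that land in the review list are candidates either for a cross-reference to an approved term or for addition to the authority file itself.</p>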
<p><span class="subhead">Who should create site indexes? </span><br />
Whenever possible, a professional indexer should be hired. Such individuals are thoroughly experienced in analyzing content, accounting for user terminology and in creating an appropriate index structure. </p>
<p>The American Society of Indexers has an indexer locator on its <a href="http://www.asindexing.org/locator/start.cfm">website</a> through which you can find indexers with experience in indexing web/HTML documents.</p>
<p>Corporate librarians often have training or experience in indexing and can also be important resources in identifying individuals with indexing skills.</p>
<p><span class="subhead">Index maintenance</span><br />
Once you have created a fabulous site index and have tested it to ensure that all its links work properly, you need to have an index maintenance policy in place. You will need to consider such things as: How often does the index get updated? Who decides when newly created information gets included? When does ROT (redundant, outdated or trivial information) get removed? Who is responsible for updating the index?</p>
<p>Keeping this important information access tool up to date will help ensure that your site&#8217;s users continue to find what they need when they need it.</p>
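<p>Some of these maintenance questions lend themselves to a periodic audit. The Python sketch below is a minimal illustration under stated assumptions: the index is held as a mapping from label to URL path, the list of current pages is available from some other source, and all of the paths, labels and the function name <code>audit</code> are hypothetical.</p>

```python
# Hypothetical data: index entries (label to URL path) and the current site pages.
index_entries = {
    "Alliance partners, applying to become": "/partners/apply.html",
    "Training, web-based": "/training/web.html",
}
current_pages = {"/partners/apply.html", "/news/2002.html"}


def audit(index_entries, current_pages):
    """Return labels whose target is gone and pages that no entry points to."""
    stale = sorted(
        label for label, url in index_entries.items() if url not in current_pages
    )
    unindexed = sorted(current_pages - set(index_entries.values()))
    return stale, unindexed


stale, unindexed = audit(index_entries, current_pages)
```

<p>The stale labels are candidates for removal or relinking, and the unindexed pages still need a human decision about whether they contain anything significant enough to index.</p>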
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td background="http://www.boxesandarrows.com/images/hr_3dotline.gif"><img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/space.gif" width="1" height="1"></td>
</tr>
</table>
<table width="100%" border="0" cellspacing="0" cellpadding="10" bgcolor="#F2F2F2">
<tr>
<td><span class="moreinfohead">For more information:</span>
<div class="moreinfo">
<ul>
<li>The American Society of Indexers maintains a page on its website listing indexing courses and workshops: <a href="http://www.asindexing.org/site/courses.shtml">http://www.asindexing.org/site/courses.shtml</a></li>
<li>Anderson, James D. Guidelines for Indexes and Related Information Retrieval Devices (NISO Technical Report 2, NISO-TR02-1997). Bethesda, Maryland: NISO Press, 1997.</li>
<li>The Chicago Manual of Style. 14th ed. Chicago: The University of Chicago Press, 1993.</li>
<li>Mulvaney, Nancy. Indexing Books. Chicago: The University of Chicago Press, 1994.</li>
<li>Wellisch, Hans H. Indexing from A to Z. Bronx, New York: H. W. Wilson Company, 1991.</li>
<li>American Society of Indexers: <a href="http://www.asindexing.org">http://www.asindexing.org</a></li>
<li>Australian Society of Indexers: <a href="http://www.aussi.org">http://www.aussi.org</a></li>
<li>CINDEX indexing software: <a href="http://www.indexres.com/">http://www.indexres.com</a></li>
<li>MACREX indexing software: <a href="http://www.macrex.com">http://www.macrex.com</a></li>
<li>Sky indexing software: <a href="http://www.sky-software.com">http://www.sky-software.com</a></li>
</ul>
</div>
</td>
</tr>
</table>
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td background="http://www.boxesandarrows.com/images/hr_3dotline.gif"><img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/space.gif" width="1" height="1"></td>
</tr>
</table>
<table width="100%" border="0" cellspacing="0" cellpadding="10" bgcolor="#F2F2F2">
<tr>
<td><span class="bio"><a href="http://www.boxesandarrows.com/people/archives/fred_leise.php">Fred Leise,</a> president of <a href="http://www.contextualanalysis.com">ContextualAnalysis, LLC,</a> is an information architecture consultant providing services in the areas of content analysis and organization, user experience design, taxonomy and thesaurus creation, and website and back-of-book indexing.</span></td>
</tr>
</table>
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td background="http://www.boxesandarrows.com/images/hr_3dotline.gif"><img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/space.gif" width="1" height="1"></td>
</tr>
</table>
<p><img src="http://www-boxesandarrows-com.zippykid.netdna-cdn.com/files/banda/art_end.gif" alt="" title="" width="8" height="8" /></p>
]]></content:encoded>
			<wfw:commentRss>http://boxesandarrows.com/improving-usability-with-a-website-index/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
