Long Tails and Short Queries - Boxes and Arrows

“People don’t understand their own information behaviors, and they don’t really understand much about search or the web, so they will have to learn. It could take generations.”

Amanda Spink is one of the smartest people working on user behavior while using web search, yet when I mentioned her name to a friend who’s spent the last year working on the search user experience, he had never heard of her. The design community is woefully undereducated about search, and is often prone to redesigning Google and postulating what Yahoo! is doing wrong, rather than working to understand why search engines have chosen to do what they do. I suppose this shouldn’t be surprising, though, considering Spink’s work is more often seen in scholarly journals, such as New Directions in Human Information Behavior and Journal of the American Society for Information Science and Technology (brought to you by the same folks who bring you the IA Summit, yet rarely cracked by the working folks).

In order to help correct this problem, I shyly contacted my hero by email, and overcame the time difference between sunny California and even sunnier Brisbane, Australia, with a series of email questions.

Christina Wodtke: When I joined Yahoo!, I had never worked on search before. Your article, “From E-Sex to E-Commerce: How Search Changes” was one of the most valuable in beginning to get my mind around the problem of search. Since reading that article, however, I haven’t seen much change from your findings. In your opinion, are users changing their search behavior, or are they still following the same patterns you found when you analyzed Excite’s data?

Amanda Spink: You make a good observation. Since 1997, our findings have come from analyzing large-scale web user data gathered from commercial web companies, including Excite, Ask Jeeves, Alltheweb.com, Alta Vista, Vivisimo, and Dogpile. Our research since 1997 shows some trends and changing patterns in general searching. However, looking at more recent data from Vivisimo and Dogpile, most web queries are still short—2 to 3 terms, and sessions include little query modification and are generally 2-3 queries in length.

Few people use advanced search features, and many queries include spelling and other mistakes that adversely affect the search results. People look at only a few result pages—not beyond the first or second results pages.

A small number of terms are used with high frequency, and many terms are used once. Web queries are very diverse in topic and some, such as people’s names, are unique.
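The log measures cited in this answer (terms per query, queries per session) are simple to compute once a log is grouped by session. The following is a minimal sketch, not Spink's method: the `LOG` rows, the `(session_id, query)` layout, and the `summarize` function are all assumptions made for illustration, and real search logs vary by engine.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log rows: (session_id, query). Invented for illustration.
LOG = [
    ("s1", "used cars"),
    ("s1", "used cars brisbane"),
    ("s2", "sailboats"),
    ("s3", "cheap flights sydney"),
    ("s3", "cheap flights"),
    ("s3", "flight deals sydney"),
]


def summarize(log):
    """Return mean terms per query and mean queries per session."""
    sessions = defaultdict(list)
    for session_id, query in log:
        # Whitespace splitting is a simplification of real tokenization.
        sessions[session_id].append(query.split())
    term_counts = [len(terms) for queries in sessions.values() for terms in queries]
    return {
        "mean_terms_per_query": mean(term_counts),
        "mean_queries_per_session": mean(len(queries) for queries in sessions.values()),
    }


stats = summarize(LOG)
print(stats)
```

On this toy log the averages land in the same two-to-three range the interview describes, but only because the sample rows were written that way.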

CW: You’re referring to the “long tail” of a Zipf curve? How does this affect search engines’ strategies in providing relevant results?

AS: The long tail makes “relevant” retrieval very difficult, especially if users are only providing the search engine with two to three words on which to base relevance judgments. Despite the inherent “interactive” nature of search, much of search is not very interactive. The talk about personalization is an attempt to obtain more information from the user to help determine relevance.
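To make the long tail concrete, one can rank terms by frequency and compare the share of usage taken by the top few terms with the share of distinct terms that appear only once. This is a hedged sketch: the `QUERIES` list is invented, and `term_profile` and `head_size` are hypothetical names, not anything from the research discussed here.

```python
from collections import Counter

# Hypothetical queries, invented for illustration only.
QUERIES = [
    "weather", "weather brisbane", "cheap flights", "weather today",
    "amanda spink", "cheap hotels", "sailboats michigan classes",
]


def term_profile(queries, head_size=2):
    """Rank terms by frequency; report head coverage and single-use terms."""
    counts = Counter(term for q in queries for term in q.lower().split())
    total = sum(counts.values())
    head = counts.most_common(head_size)
    singletons = [term for term, n in counts.items() if n == 1]
    return {
        "distinct_terms": len(counts),
        # Fraction of all term occurrences covered by the most frequent terms.
        "head_share": sum(n for _, n in head) / total,
        # Fraction of distinct terms that occur exactly once (the tail).
        "singleton_share": len(singletons) / len(counts),
    }


profile = term_profile(QUERIES)
print(profile)
```

A high `singleton_share` alongside a sizable `head_share` is the pattern the interview calls the long tail: a few terms used often, and many terms used once.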

CW: So are users changing their overall behavior at all?

AS: We are seeing a growth in more complex search behaviors. More people are searching for information using more than one search. This might mean repeat searches of the same query over time or modifying the queries in successive searches over time. Many people are multitasking or searching for information on more than one topic during a search session. People’s information needs are often quite complex in their home and work environments.

CW: How are they handling that?

AS: During multitasking search they may include two or more topics in one window, or open new windows for each topic and run searches concurrently.

CW: Could you talk a bit more about “complex search behaviors”?

AS: People’s information-seeking behavior can often be long and complex. Imagine a person is looking for information on cars. He conducts one search on one search engine, looks at the results, and tries another search engine, or goes back to the first search engine and repeats the same search with the same search terms, or he may add or remove some terms (query reformulation). This is called successive searching.

In addition, research shows that people often search for more than one topic during their interaction with a search engine. They may batch their topics due to time constraints or new topics may evolve during their search session. This is called multitasking search.

Both phenomena are examples of more complex behaviors beyond the one-topic, one-search paradigm that most search engines assume.
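The two behaviors can be told apart, roughly, by comparing each query in a session with the one before it. The sketch below uses shared terms as a stand-in for "same topic". That heuristic is an assumption made here for illustration, not the researchers' coding scheme, and `classify_session` and its labels are hypothetical.

```python
def classify_session(queries):
    """Label each query relative to the one before it.

    Assumption: shared terms mean the same topic. This is a crude
    stand-in for the topic judgments a researcher would make.
    """
    labels = []
    previous = None
    for query in queries:
        terms = set(query.lower().split())
        if previous is None:
            labels.append("initial")
        elif terms == previous:
            labels.append("repeat")         # same query run again
        elif terms & previous:
            labels.append("reformulation")  # terms added or removed
        else:
            labels.append("new topic")      # possible multitasking
        previous = terms
    return labels


session = ["used cars", "used cars", "used cars brisbane", "sailing lessons"]
labels = classify_session(session)
print(labels)
```

Repeats and reformulations correspond to successive searching, while a "new topic" label inside one session hints at multitasking. Synonyms and rephrasings with no shared words would be mislabeled by this approach.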

CW: If queries are still so short, what are some of the more successful disambiguation tools used by the search engine? Vivisimo and Clusty offer algorithmically generated groupings and present them as narrowing tools. But last time I was testing these tools with users, the narrowing options were essentially invisible to them. And despite Jakob Nielsen’s assurance (admittedly in 1999) that longer search boxes produce longer queries, I’ve never seen it happen. Are there better and worse ways to encourage richer queries from users?

AS: So far no commercial tool seems to be effective at helping users on a large scale. Search engines have not used longer boxes, so no one really knows what would happen on a large scale if text boxes were changed. The best way to encourage richer queries is to train users and expect them to put more effort into their search behavior. Search engines need to put more demands on the users. People don’t understand their own information behaviors, and they don’t really understand much about search or the web, so they will have to learn. It could take generations.

CW: Really? Many folks who revamp search, either by adding Google, Yahoo!, or another vendor, seem to be leaning toward long entry boxes. I’m thinking about CNN, NY Times.com, CNET.

AS: The Google and CNN text boxes may be a little bigger or longer than average, but not substantially longer. How about a structured textbox, like an electronic library catalog interface? How about a textbox that is 3 inches by 3 inches with lots of space for people to express themselves? If you give people a small text box, you’re probably constraining their expression of their information problem.

People need to feel they should play around with search and experiment. All they can really do at present is squeeze in a few words, press search, and look at a list of websites, a list that gives little indication of what the websites mean or how they are ranked. One major problem is that [designers of] search engines tend to think that one technique will do it! What they need to do is test combinations of many techniques, such as clustering, relevance feedback, etc. There is no silver bullet here.

CW: When you speak of training users, I’ve found that very challenging, much more with search than any other Internet paradigm. I’ve been in a lab with a fellow who has used Google for five years, and he never realized “cached” was there until I pointed it out. How can we train people who have “banner blindness” for most of the page?

AS: I think this is a major problem for the web industry. How to train billions of people? Whoever comes up with the best solution for that question may capture huge market share. The paradigm needs to change. Search is challenging and interactive, and maybe a “game” paradigm would help.

CW: The short-query phenomenon is fascinating. In a lab, I have asked people why they typed, say, “sailboats,” and they’ll say, “Oh, I’m interested in taking classes next summer when we’re up at the lake in Michigan,” yet none of the words made it into their query. Any insights?

AS: Our research shows that the most effective search terms are those submitted by the user, those from a user’s interaction with another person about their topic, and terms they identify on the screen from the retrieved output. Stimulating users to talk with someone or something (an agent) about their information problem helps them generate terms and look at the results for additional terms.

CW: Hey, those sound like classic reference librarian techniques!

AS: One area that some web developers are exploring is classic reference librarian techniques. It’s an obvious area to explore to understand information behavior and how librarians have helped people with their information problems.

CW: “E-sex and E-commerce” was referring to a topical shift in searches. Are you continuing to see changes in what people are searching for?

AS: I think it’s important not to assume that what “people are searching for” means just what happens on U.S.-based search engines. There are major differences emerging in search in different global regions. For the more U.S.-based search engines, the topics seem to have stabilized somewhat, with business and e-commerce related searches being the largest category, followed by people, places and things, computers, and medical/health. Sex/porn and entertainment are now a smaller proportion of searches. From what I’ve seen about the Chinese search engines (e.g., Baidu), users are looking for entertainment and gaming. One could say that the Chinese search engine users are where the U.S. users were 5-10 years ago. As more Chinese business information is accessible via Baidu, the search topics may change. Also, currently less than 10% of the Chinese population search the web, so as that number increases, topics may change.

CW: There are endless articles these days about search privacy, and Google giving information to the feds. Is the ordinary person on the street worried about that?

AS: This is an important area for everyone. If search engines and the web are becoming the primary tool by which people are expected to access information, then privacy, and the practices of governments in regulation and of companies, are crucial. It is much like the way we saw telephones and TV in the past, as involving privacy, commercial, and government interests. Also, because search is now ubiquitous, politicians will seek to gain political advantage or grounds for industry regulation. Ordinary people should be concerned about how political and commercial information policies will affect their access to the web.

CW: And you are studying the evolution of human information behavior. How far back are you going? Medieval libraries? Cavemen looking for the right painting?

AS: Obviously humans evolved information behaviors in preliterate societies, through cave art, etc. Information behaviors evolved to help humans compete and cooperate, as the technologies evolved from cave art to the web. In fact, people may not change their information behaviors, but may have evolved over time to utilize a greater capacity for more complex information behaviors.

The Spink and Currier paper talks about the information behaviors of Darwin, Napoleon, and Casanova, all very effective people at finding and using information. And many people have found what we write about Casanova fascinating!

CW: Now you are being cruel! I’m going to have to renew my library card. Can you predict trends in behavior from your research? What’s next on the horizon?

AS: What’s next on the horizon is developing an understanding of how human information behaviors have evolved over human history, how they evolve over a person’s lifetime, how their search interactions develop over time, and how search in the aggregate is evolving over time. In other words, we need more longitudinal studies.

CW: Any bits of advice to practitioners about to attack the search tools on their sites? Lessons from web search?

AS: A key problem for practitioners is the lack of computer people trained in information and web retrieval, web design, and web usability. There is a lack of well-trained people and not many industry consultants who really understand search. Search is much harder than most people think, and the design of effective search tools is even harder. Practitioners need to really test any search engines they consider buying. Many companies claim that their search engines are effective and the best, but provide little real evidence for their claims.

Be careful of the search engine that promises effectiveness and superiority based on a “single” feature, e.g., linking or clustering. There is no silver bullet feature. We don’t yet have search engines that have adopted a more holistic attitude based on a real understanding of search, people’s information behavior, and what is really effective. Whoever takes that path effectively will gain competitive advantage in the marketplace.

For More Information

Spink, A., & Currier, J. (2006). Emerging evolutionary framework for human information behavior. In: A. Spink & C. B. Cole (Eds.), New Directions in Human Information Behavior. Berlin: Springer (pp. 13-31).

Spink, A., & Cole, C. B. (2006). Human information behavior: Integrating diverse approaches and information use. Journal of the American Society for Information Science and Technology, 57(1) 25-35.

Spink, A., & Currier, J. (2006). Toward an evolutionary perspective on human information behavior: An exploratory study. Journal of Documentation, 62(2), 171-193.

5 comments

Anonymous says:

October 18, 2006 at 2:12 am

Nice.

Having researched search in some depth over the past year and a half, I have to add that search, and the experience of performing a search, is so very dependent on how the content owner approaches the challenges of:

a) Organising their content (their collection, or collections)

b) Indexing their collection

c) Presenting search results

d) Adding value to the entire experience in regards to pre- and post-search functionality and usability

Fundamentally these elements contribute to how effective a site search is, as opposed to the wider public search engines’ effectiveness.

A simple example of c) and d) is “pre-canned” results. Set by a site admin team these can add incredible weight to how users find the right result first time. This in itself points to just how all the primary skills of IA lend themselves to a better site search tool.

These pre-set results are triggered by query patterns and can cover misspellings, adapt to varying approaches to locating similar answers, and also provide a gentle nudge to the user so that they can build a better picture of what a site structure really does contain.
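As a rough illustration of pre-set results triggered by query patterns, the sketch below maps regular expressions to admin-chosen pages, with one pattern written to tolerate a common misspelling. The `BEST_BETS` rules, the placeholder paths, and the `best_bet` function are all hypothetical; a real site search product would have its own way of configuring this.

```python
import re

# Hypothetical rules set by a site admin team: (pattern, pre-set result).
# The result paths are placeholders, not real URLs.
BEST_BETS = [
    # Tolerates "accomodation", "acommodation", and the correct spelling.
    (re.compile(r"\bac+om+odation\b", re.IGNORECASE), "/stay/accommodation"),
    # Covers a few ways of asking the same question.
    (re.compile(r"\b(?:opening|open)\s+(?:hours|times)\b", re.IGNORECASE),
     "/visit/opening-hours"),
]


def best_bet(query):
    """Return the first pre-set result whose pattern matches, else None."""
    for pattern, result in BEST_BETS:
        if pattern.search(query):
            return result
    return None


print(best_bet("Accomodation near campus"))
print(best_bet("what are your opening hours"))
print(best_bet("sailboats"))
```

A query with no matching rule falls through to `None`, at which point the ordinary ranked results would be shown on their own.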

Additionally, if there was any one overriding lesson I’ve learnt, it’s that search functionality needs constant (and consistent) review in order to remain a great tool for any site. Constantly reviewing search logs and relating them to media or national events, marketing initiatives or site changes must be folded into regular cycles of content reviews and usage tracking.

Skipping this essential work means that not only do you miss identifying badly returned result sets, ill-judged ranking weights and (frankly) badly indexed content – but you also miss a chance to understand and adapt to the user experience.
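One simple form such a log review could take is a report of the queries that returned nothing, ordered by how often they occur. This is only a sketch under assumptions: the `SEARCH_LOG` rows and the `(query, number_of_results)` layout are invented, and `zero_result_queries` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical log rows: (query, number_of_results). Invented for illustration.
SEARCH_LOG = [
    ("accomodation", 0),
    ("parking", 12),
    ("Accomodation ", 0),
    ("open day", 3),
    ("anual report", 0),
]


def zero_result_queries(log):
    """Count queries that returned nothing, most frequent first."""
    counts = Counter(
        query.strip().lower() for query, hits in log if hits == 0
    )
    return counts.most_common()


report = zero_result_queries(SEARCH_LOG)
print(report)
```

Frequent entries in a report like this are natural candidates for a pre-set result or an indexing fix.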

Cheers

Brian
Anonymous says:

October 19, 2006 at 11:57 am

I’m no expert on searching, but I am somewhat of a developer (web site). I don’t think the problem lies entirely with the search engine and the users. It certainly doesn’t help when web developers throw a bunch of random, unrelated keywords into the site so as to pop up on searches more frequently. To a search engine there’s really no way to differentiate between these sites because they’re based mainly on the descriptors and keywords given to them by the code on the site. So I think a lot, if not more, of the responsibility falls on the developers, not just the individual users and the search engines.

I do feel that refining searching is very important. We live in a world based on time and the more we can get done in the least amount of time the better we are. So, of course, when it comes to searching I’d love to see new techniques to filter out the obvious unrelated sites and become better at showing me “best matches”.

Good read, thank you.
Christina Wodtke says:

October 24, 2006 at 11:50 pm

You are in luck: Peter Norvig recently gave a talk on Search and AI. It’s extremely technical, but fascinating, and Parc Forum has it archived. http://www.parc.xerox.com/events/forum/archive.php
Anonymous says:

October 31, 2006 at 11:12 am

In my experience ‘training users’ never works in a self service medium like the Web.

I suspect that the real problem is that most search interfaces don’t encourage multiple queries. We know that search is an iterative process, and in the real world, people do have long conversations with each other when searching. But they don’t do this online.

Part of the problem is search results: they are not presented as a dialogue. Users’ attention is focused on the result list and the cues to encourage users to modify the search are overwhelmed. So the ‘dialogue’ is limited to ‘Do you have x?’, ‘No.’, ‘Goodbye’.

I see very few interfaces that progressively reveal advanced features (instead users are given a choice of one text box or every conceivable control). Progressive disclosure encourages dialogue.

As designers, I think it’s our job to understand how to present features and interactions so that users see their value. In a sense, this is ‘training’ – not through instruction, but through environment.
Anonymous says:

November 29, 2006 at 2:59 am

Perhaps one of my biggest quibbles with some of this is when IAs or designers focus on search as a stand-alone solution or a closed-loop feature set for “finding information.” Search is but one key piece of a larger findability strategy that users employ to meet any number of disparate needs. By diving into traditional IA/UX initiatives like content classification, search interfaces, feedback messaging, refinement/sorting tools, etc., I find that a large portion of what search really is can be completely missed.

I’d suggest that the following should be considered:

1) Determining where search fits in the set of archetypal users’ offline and online finding behaviors
2) Understanding the emotional, physical and cognitive contexts within which a user comes to a website to find information and how these factors may affect the perception of what search is and the expectation of what search will deliver
3) How search functionality is perceived as integrating with other on-site finding features like browsing or exploring functionality
4) How the website’s UX articulates or implies what sort of information can be found by using search

Insights from these threads of research should then inform a search model that can be tested and refined. Only then should the work of designing the UX and UI start.

Finally, one should also consider that information finding or search does not simply stop when a target is recognized among a list of distractors. Users must have an opportunity to “acquire” or “encode” the information for search to really be useful. As such, search results should empower users to be able to act on the information that they just spent some effort finding.
