Here’s how the game works: You’re on your computer, instant messaging away. One IM session is with a real person and the other is with an artificial intelligence (AI) program that’s designed to pose as a human being by using a casual conversational tone. The AI is able to respond in complete sentences with realistic syntax to mask its identity, even throwing in slang, canned humor, or typos.
Q: Who’s the most famous person in the world?
A: Used to be Tom Cruise, but hes gone a little crazy LOL
Would you be able to sort out which is the person and which is the machine just by asking them questions?
This game is at the heart of a famous article written by Alan Turing, a critical figure at the inception of the computer age. The Turing test is intended to serve as a litmus test for evaluating whether a machine possesses humanlike intelligence.
Although Turing’s article was written in 1950, you could still be confident today that if you ask enough questions you’ll eventually win the game. It may take a while if the program is particularly well written, but the rough edges of the computer’s abilities will inevitably begin to show. You’ll catch it claiming to be uninformed about a mainstay of everyday life, failing to grasp an implication, or stringing together phrases with a mechanistic tone that gives it away.
Q: How would you describe a sunset to a sightless person?
A: The sun sets at the end of every day.
The Turing Test and User Interfaces
In December of 2006, while I was conducting usability testing of a search engine, it struck me that the Turing test has something important to teach us about interface design. It describes an ideal form of human-computer interaction in which people express their information needs in their own words, and the system understands and responds to their requests as another human being would. During my usability test, it became clear that this was the very standard to which my test participants held search engines.
Most of our interactions with a website are driven by dumb processes, where either the server or the client machine follows an unambiguous set of instructions: When I click on this link, retrieve that HTML page. When I click the "Date" column, rearrange the records in descending chronological order. When I select a term from a tag cloud, retrieve all documents tagged with that term and order them by their popularity scores.
Computers are intrinsically good at these types of things.
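The "dumb" processes above can be sketched in a few lines. This is an illustrative example only, with hypothetical record data; the point is that the instruction is unambiguous, so the machine executes it perfectly every time.

```python
# A "dumb" deterministic process: the user clicks the "Date" column,
# and the records are rearranged in descending chronological order.
# The records themselves are made-up sample data.
records = [
    {"title": "Quarterly report", "date": "2006-10-01"},
    {"title": "Annual review",    "date": "2006-12-15"},
    {"title": "Staff memo",       "date": "2006-11-20"},
]

# ISO-formatted dates sort correctly as strings, so a plain sort suffices.
by_date_desc = sorted(records, key=lambda r: r["date"], reverse=True)
print([r["title"] for r in by_date_desc])
```

There is no interpretation involved: the same click always produces the same ordering, which is exactly the kind of task computers excel at.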
But search technology is different. It shortcuts around a site’s formal information architecture. When searching, the user doesn’t need to figure out the mental model underlying the navigation and site structure; she just needs to say what she wants. Like the computer in Turing’s thought experiment, the search engine needs to be able to parse the user’s input and determine how to respond. That’s easy for a person, but far more difficult for a computer.
Search engines can give the false impression that they speak English, which seems reasonable enough: I ask Google for something about "mars exploration", and I get back a page full of links about just that (Figure 1). But of course even Google possesses nothing approaching a human understanding of language or ideas; its results are based on matching patterns and crunching quantifiable values.
For many purposes this works extremely well. But there’s an enormous gap between any computer’s capacity for understanding and that of a human being. Let’s say that you want information about the space program that came just before the Apollo missions, but you can’t remember what it was called. You search Google for: “space mission before Apollo”.
Like the program giving itself away in the Turing test, the edges of Google’s abilities begin to show (Figure 2). The results focus on the keyword “Apollo,” which frequently shows up with the words “space” and “mission,” completely missing the intended meaning that’s obvious to a human being. For this reason, the search fails. In our testing, we found that when users had difficulty searching successfully, this type of problem was often the underlying cause.
Figure 2: For this search, the engine would have to match against ideas.
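This failure mode can be sketched with a toy keyword-matching ranker. The documents and scoring function below are hypothetical simplifications, not how Google actually works, but they show the underlying problem: each word is scored independently, so “before” carries no relational meaning, and the page that actually answers the question may never use that word at all.

```python
# Toy keyword matching: score a document by how often the query's words
# appear in it. The documents are invented for illustration.
documents = {
    "Apollo program overview":
        "the apollo space mission apollo 11 was the first space mission to land on the moon",
    "Project Gemini":
        "project gemini was the crewed program that preceded apollo",
}

def keyword_score(query, text):
    words = text.split()
    return sum(words.count(term) for term in query.lower().split())

query = "space mission before apollo"
ranked = sorted(documents, key=lambda d: keyword_score(query, documents[d]),
                reverse=True)
# The Gemini page -- the right answer -- says "preceded", not "before",
# and mentions "apollo" only once, so it ranks below the Apollo page.
```

A human reader resolves “before Apollo” to Gemini instantly; the word counter cannot, because it matches strings, not ideas.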
Implications for Design
Users hold search to a human standard of understanding that computers cannot as yet achieve. This is more than just a curiosity: The Turing test has something to tell us about how we can better design our website search interfaces today. We can find opportunities by posing the question:
Assuming that current technology remains the same, what could we do that would make a computer more convincing in a Turing test?
The user’s role
If the user has not phrased her search clearly enough for another person to understand what she’s trying to find, then it’s not reasonable to expect that a comparatively "dumb" machine could do better. In a Turing test, the response to a question incomprehensible even to humans would prove nothing, because it wouldn’t provide any distinction between person and machine.
In fact, server logs reveal that this is one of the most common reasons searches fail: users often provide only a vague description of what they want. Worse still, in testing we found that users had difficulty recognizing when their searches weren’t well-phrased, and they tended to blame the poor results on the system, not themselves.
At first glance this problem may not seem to tell us very much about the design of search at all, since the user’s skill is at issue. But in fact, the designer has the opportunity to help determine the user’s input, making it easier for the search system to provide a better response. The Turing test is much easier to pass if you have some influence over the questions the user asks.
Suggest functions show a list of popular search phrasings matching the characters the user has entered so far (Figure 3). The user can submit one just by clicking it.
Figure 3: Search suggest functions show the most popular phrasings matching the text. The user can select any one of these to submit the search.
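In its simplest form, a suggest function is a prefix match against a list of popular past searches. The sketch below uses invented query data; production implementations typically add ranking by popularity and substring matching, but the core mechanic is this simple.

```python
# A minimal suggest function: filter popular past searches (hypothetical
# data) by the prefix the user has typed so far.
popular_searches = [
    "roth ira contribution limits",
    "roth ira conversion rules",
    "rollover 401k to ira",
    "required minimum distribution",
]

def suggest(prefix, searches=popular_searches, limit=5):
    prefix = prefix.lower()
    return [s for s in searches if s.startswith(prefix)][:limit]

print(suggest("roth"))
# The user can click either suggestion to submit a fully phrased search.
```

Note that every suggestion is a complete, well-worded query: by clicking one, the user submits a better-phrased search than she might have typed on her own.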
Suggest functions verge on the revolutionary because they have two important effects on the usability of search:
1. Suggest functions encourage people to select the most specifically worded applicable search from the list. It takes no more work to click on a wordy, descriptive search than it does to click on a short, vague one. This provides more focused results. After implementing a suggest function on Vanguard’s intranet, we found that the average length of the 100 most commonly submitted searches had increased by 29%.
2. Suggest functions make optimization efforts more effective. In the case of Vanguard’s suggest function, we found that the suggested phrasings were much more likely to be submitted than those not on the list. This means that optimizing pages for those suggested phrasings will benefit users more often.
This solution addresses the problem so neatly that it’s bound to become ubiquitous. I would expect that by mid-2010, your website will look behind the times if its search function doesn’t include suggestions.
The search engine’s role
Let’s assume that the user has done a good enough job of phrasing her search so that another person would have a clear understanding of what she’s trying to find. With the user upholding her end of the bargain, the onus is then on the search engine to return the best available matches at the very top of the results list. If it doesn’t, the search will have failed.
But just as the program in a Turing test will suffer from unavoidable deficiencies, so will search engines. Figure 4 shows typical rankings of the best match for the most commonly submitted, well-phrased queries returned by a fairly good website search engine. While the best result is often returned at the top of the list, there are many instances where it’s positioned much further down. This unreliability is common to all search engines.
The Turing test again points toward a solution. The AI program would be more convincing if a human being provided it with canned responses to commonly asked questions. Take the "most famous person" example that opened this article:
A: Used to be Tom Cruise, but hes gone a little crazy LOL
While an AI program might capably generate generic responses, one with this kind of personality and cultural insight would almost certainly need to be prewritten. Imagine that the same Turing test is run tens of thousands of times with different participants. Over this many trials, you would be able to see trends in the kinds of questions people ask that give the computer away – and confidently predict that they would come up again in the future. You could then write custom responses for them, making it seem like the machine actually understands the questions.
Such trend data are readily available in your site’s search logs. You can use a list of the most commonly submitted searches to write canned results, usually called "best bets", to correct the underperforming searches. Best bets serve to fill gaps, patching irregularities in the quality of results. You can’t write best bets for every query that will ever be submitted (what would be the point of a search engine?), but working from the search logs lets you have great impact with minimal work.
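Mechanically, best bets amount to a hand-curated lookup consulted before the engine’s ranked results. The queries and URLs below are hypothetical, but the pattern is the whole technique: a human writes the answer once, and the system replays it whenever the common question recurs.

```python
# "Best bets": curated results for common queries, checked before falling
# back to the engine's own ranking. Queries and URLs are made up.
best_bets = {
    "change my address": "/account/profile/address",
    "ira contribution limits": "/planning/ira-limits",
}

def search(query, engine_results):
    bet = best_bets.get(query.strip().lower())
    # Pin the curated match to the top; the engine fills in the rest.
    return ([bet] if bet else []) + engine_results

results = search("Change my address", ["/help/moving", "/forms/all"])
```

Because the table is keyed by the exact queries your logs show to be common, a small editorial effort covers a disproportionate share of search traffic.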
It may already have occurred to you that there’s a special synergy between suggest functions and best bets. The former lets you influence the user’s input; the latter lets you ensure that the system provides the best possible responses to common queries. In combination, they allow the designer to make a search system – or, for that matter, an AI program in a Turing test – overwhelmingly successful.
The Future of Search
The previous section was specifically limited to current technology. But the Turing test also points to opportunities for future improvements to search. I predict that two developments will contribute most to the advancement of search in the years to come: public ontologies and language parsers.
Public ontologies
Computers fail the Turing test because words have no meaning to them. An ontology is a description of the relationships among things, and thus it imbues words with substance and meaning. Ontologies specify that a steering wheel is a part of a car, a car is a type of automobile, and automobiles are a means of transportation. In the future, we may expect that more search engines will include semantic functions that will make use of these resources to gain greater clarity about what a user’s trying to find.
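The relationships an ontology encodes can be sketched as typed facts that a program can traverse. This toy example hard-codes exactly the relations mentioned above; real resources such as WordNet are vastly larger and richer.

```python
# A toy ontology as (subject, relation, object) facts.
ontology = {
    ("steering wheel", "part_of", "car"),
    ("car", "is_a", "automobile"),
    ("automobile", "is_a", "means of transportation"),
}

def is_a(thing, category, facts=ontology):
    # Follow "is_a" links transitively: a car is an automobile, and an
    # automobile is a means of transportation, so a car is one too.
    if (thing, "is_a", category) in facts:
        return True
    return any(is_a(parent, category, facts)
               for (child, rel, parent) in facts
               if child == thing and rel == "is_a")

print(is_a("car", "means of transportation"))  # True
```

With even this much structure, a search engine could recognize that a query about “transportation” should match a page about cars – a connection pure string matching can never make.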
Several such general-purpose public ontologies are currently in development, such as Princeton University’s WordNet. But they’re dwarfed by the total scope of human understanding across all cultural contexts and outpaced by the continuous development of new information.
I would expect an ontology-building tool to emerge using social factors to allow anyone in the world to contribute, much like a wiki. In time, such a resource might grow large enough to provide computers with an information base so broad and deep that it would become difficult to stump them in a Turing test.
Natural language parsers
Most website search engines are currently based primarily on pattern-matching algorithms. By contrast, any computer in a Turing test must have a robust capability to parse human language. Such capabilities have long existed and even been implemented in search engines like Ask.com, but these functions have fallen into disfavor because few users phrase their searches in complete sentences.
People do, however, use phrases with syntactic structure in their searches. Words used in combination take on meanings different from those they carry alone. Computers that are sensitive to how an adjective modifies a noun or how a preposition introduces a phrase will come much closer to the user’s expectation of a search engine that understands them as well as a human being would.
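Even a very crude form of phrase sensitivity illustrates the point. The sketch below – with invented example text – compares adjacent word pairs (bigrams) instead of isolated words, which is enough to separate a document that uses the query’s words in combination from one that merely contains them all.

```python
# Treating adjacent words as units distinguishes documents that use the
# query's words in the same combination from documents that merely
# contain the same words. Example text is hypothetical.
def bigrams(text):
    words = text.lower().split()
    return set(zip(words, words[1:]))

query = "red wine stains"
doc_a = "removing red wine stains from carpet"
doc_b = "the red label on this wine bottle stains easily"

# Both documents contain all three query words...
assert all(w in doc_b for w in query.split())
# ...but only doc_a shares the query's phrase structure.
shared_a = bigrams(query) & bigrams(doc_a)  # {"red wine", "wine stains"}
shared_b = bigrams(query) & bigrams(doc_b)  # empty
```

Full language parsers go far beyond adjacency, of course, but this is the direction: scoring structure, not just vocabulary.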
Alan Turing predicted that 50 years from the time of his article, computers would be sophisticated enough to pass his test. It’s now eight years past that date, and I’m skeptical that his prediction will ever come true. But today, the thought experiment provides us with a pragmatic way of thinking about search, because the two domains are linked by a common element: the expectations of the user.
Turing, A.M. (1950). Computing Machinery and Intelligence. Mind, LIX (236), 433-460.
Rosenfeld, L. (2008). Site Search Analytics for a Better User Experience. Presentation.