Here’s how the game works: You’re on your computer, instant messaging away. One IM session is with a real person and the other is with an artificial intelligence (AI) program that’s designed to pose as a human being by using a casual conversational tone. The AI is able to respond in complete sentences with realistic syntax to mask its identity, even throwing in slang, canned humor, or typos.
Q: Who’s the most famous person in the world?
A: Used to be Tom Cruise, but hes gone a little crazy LOL 😉
Would you be able to sort out which is the person and which is the machine just by asking them questions?
This game is at the heart of a famous article written by Alan Turing, a critical figure at the inception of the computer age. The Turing test is intended to serve as a litmus test for evaluating whether a machine possesses humanlike intelligence.
Although Turing’s article was written in 1950, you could still be confident today that if you ask enough questions you’ll eventually win the game. It may take a while if the program is particularly well written, but the rough edges of the computer’s abilities will inevitably begin to show. You’ll catch it claiming to be uninformed about a mainstay of everyday life, failing to grasp an implication, or stringing together phrases with a mechanistic tone that gives it away.
Q: How would you describe a sunset to a sightless person?
A: The sun sets at the end of every day.
Gotcha.
The Turing Test and User Interfaces
In December of 2006, while I was conducting usability testing of a search engine, it struck me that the Turing test has something important to teach us about interface design. It describes an ideal form of human-computer interaction in which people express their information needs in their own words, and the system understands and responds to their requests as another human being would. During my usability test, it became clear that this was the very standard to which my test participants held search engines.
Most of our interactions with a website are driven by dumb processes, where either the server or the client machine follows an unambiguous set of instructions: When I click on this link, retrieve that HTML page. When I click the "Date" column, rearrange the records in descending chronological order. When I select a term from a tag cloud, retrieve all documents tagged with that term and order them by their popularity scores.
Computers are intrinsically good at these types of things.
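A trivial sketch makes the point (the records and field names below are invented for illustration): re-sorting by date is nothing more than following a rule.

    # Hypothetical records; clicking the "Date" column just re-sorts them.
    # No interpretation of the user's intent is required.
    from datetime import date

    records = [
        {"title": "Annual report",    "date": date(2007, 12, 31)},
        {"title": "Quarterly report", "date": date(2008, 3, 31)},
        {"title": "Press release",    "date": date(2008, 6, 15)},
    ]

    # Descending chronological order: newest record first.
    records.sort(key=lambda r: r["date"], reverse=True)

    for r in records:
        print(r["date"].isoformat(), r["title"])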
But search technology is different. It shortcuts around a site’s formal information architecture. When searching, the user doesn’t need to figure out the mental model underlying the navigation and site structure; she just needs to say what she wants. Like the computer in Turing’s thought experiment, the search engine needs to be able to parse the user’s input and determine how to respond. That’s easy for a person, but far more difficult for a computer.
Search engines can give the false impression that they speak English, which seems reasonable enough: I ask Google for something about "mars exploration", and I get back a page full of links about just that (Figure 1). But of course even Google possesses nothing approaching a human understanding of language or ideas; its results are based on matching patterns and crunching quantifiable values.
For many purposes this works extremely well. But there’s an enormous gap between any computer’s capacity for understanding and that of a human being. Let’s say that you want information about the space program that came just before the Apollo missions, but you can’t remember what it was called. You search Google for: “space mission before Apollo”.
Just as the program gives itself away in the Turing test, the edges of Google’s abilities begin to show (Figure 2). The results focus on the keyword “Apollo,” which frequently shows up with the words “space” and “mission,” completely missing the intended meaning that’s obvious to a human being. For this reason, the search fails. In our testing, we found that when users had difficulty searching successfully, this type of problem was often the underlying cause.
Figure 2: For this search, the engine would have to match against ideas.
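To see why, consider a toy scoring function of the kind keyword engines build on. The documents and scoring below are invented, and real engines are far more sophisticated, but the failure mode is the same:

    # Count how many query words appear in each document.
    docs = {
        "Apollo 11 mission overview": "the apollo space mission landed on the moon",
        "Project Mercury history": "project mercury was the first human spaceflight program",
    }

    def keyword_score(query, text):
        query_words = set(query.lower().split())
        return sum(1 for word in text.lower().split() if word in query_words)

    query = "space mission before Apollo"
    for title, text in docs.items():
        print(title, keyword_score(query, text))

    # The Apollo page scores 3 ("space", "mission", "apollo"); the Mercury page,
    # which is the answer a person would give, scores 0. The word "before" carries
    # the whole meaning of the query, and it counts for nothing here.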
Implications for Design
Users hold search to a human standard of understanding that computers cannot as yet achieve. This is more than just a curiosity: The Turing test has something to tell us about how we can better design our website search interfaces today. We can find opportunities by posing the question:
Assuming that current technology remains the same, what could we do that would make a computer more convincing in a Turing test?
The user’s role
If the user has not phrased her search clearly enough for another person to understand what she’s trying to find, then it’s not reasonable to expect that a comparatively "dumb" machine could do better. In a Turing test, the response to a question incomprehensible even to humans would prove nothing, because it wouldn’t provide any distinction between person and machine.
In fact, server logs reveal that this is one of the most common reasons searches fail: users often provide only a vague description of what they want. Worse still, in testing we found that users had difficulty recognizing when their searches weren’t well-phrased, and they tended to blame the poor results on the system, not themselves.
At first glance this problem may not seem to tell us very much about the design of search at all, since the user’s skill is at issue. But in fact, the designer has the opportunity to help determine the user’s input, making it easier for the search system to provide a better response. The Turing test is much easier to pass if you have some influence over the questions the user asks.
Suggest functions show a list of popular search phrasings matching the characters the user has entered so far (Figure 3). The user can submit one just by clicking it.
Figure 3: Search suggest functions show the most popular phrasings matching the text. The user can select any one of these to submit the search.
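Under the hood, a suggest function can be as simple as a prefix match against the most popular queries mined from the search logs. A minimal sketch, with an invented query list:

    # Popular queries, ordered by how often they're submitted (invented examples).
    popular_queries = [
        "roth ira contribution limits",
        "roth ira conversion rules",
        "rollover 401k to ira",
        "required minimum distribution calculator",
    ]

    def suggest(prefix, queries=popular_queries, limit=10):
        """Return the most popular phrasings that start with what the user has typed."""
        prefix = prefix.lower().strip()
        return [q for q in queries if q.startswith(prefix)][:limit]

    print(suggest("roth"))
    # ['roth ira contribution limits', 'roth ira conversion rules']

A production version would run server-side against a much larger list, but the principle is the same: the user picks from well-formed queries instead of inventing one.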
Suggest functions verge on the revolutionary because they have two important effects on the usability of Search:
1. Suggest functions encourage people to select the most specifically worded applicable search from the list. It takes no more work to click on a wordy, descriptive search than it does to click on a short, vague one. This provides more focused results. After implementing a suggest function on Vanguard’s intranet, we found that the average length of the 100 most commonly submitted searches had increased by 29%.
2. Suggest functions make optimization efforts more effective. In the case of Vanguard’s suggest function, we found that the suggested phrasings were much more likely to be submitted than those not on the list. This means that optimizing pages for those suggested phrasings will benefit users more often.
This is a solution that solves a problem so neatly it’s bound to become ubiquitous. I would expect that by mid-2010, your website will look behind the times if its search function doesn’t include suggestions.
The search engine’s role
Let’s assume that the user has done a good enough job of phrasing her search so that another person would have a clear understanding of what she’s trying to find. With the user upholding her end of the bargain, the onus is then on the search engine to return the best available matches at the very top of the results list. If it doesn’t, the search will have failed.
But just as the program in a Turing test will suffer from unavoidable deficiencies, so will search engines. Figure 4 shows typical rankings of the best match for the most commonly submitted, well-phrased queries returned by a fairly good website search engine. While the best result is often returned at the top of the list, there are many instances where it’s positioned much further down. This unreliability is common to all search engines.
Figure 4: Typical rankings of the best available match for commonly submitted, well-phrased searches.
The Turing test again points toward a solution. The AI program would be more convincing if a human being provided it with canned responses to commonly asked questions. Take the "most famous person" example that opened this article:
A: Used to be Tom Cruise, but hes gone a little crazy LOL 😉
No program could capably generate a response with this kind of personality and cultural insight on its own; it would almost certainly need to be prewritten by a human being. Imagine that the same Turing test is run tens of thousands of times with different participants. Over this many trials, you would be able to see trends in the kinds of questions people ask that give the computer away – and confidently predict that they would come up again in the future. You could then write custom responses for them, making it seem like the machine actually understands the questions.
Such trend data are readily available in your site’s search logs. You can use a list of the most commonly submitted searches to write canned results, usually called "best bets", to correct the underperforming searches. Best bets serve to fill gaps, patching irregularities in the quality of results. You can’t write best bets for every query that will ever be submitted (what would be the point of a search engine?), but working from the search logs lets you have great impact with minimal work.
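In code, best bets amount to little more than a lookup that runs before the engine does. A minimal sketch, assuming a hand-curated mapping built from the logs (the queries, URLs, and fallback_search stub are all invented):

    # Curated "best bet" results for the most common, well-phrased queries.
    best_bets = {
        "change my address": ["/account/profile/address"],
        "space mission before apollo": ["/articles/project-mercury"],
    }

    def fallback_search(query):
        # Stand-in for the real keyword-matching engine.
        return ["/search/results?q=" + query.replace(" ", "+")]

    def search(query):
        key = query.lower().strip()
        # Best bets are pinned above whatever the engine returns.
        return best_bets.get(key, []) + fallback_search(query)

    print(search("Space mission before Apollo"))
    # ['/articles/project-mercury', '/search/results?q=Space+mission+before+Apollo']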
It may already have occurred to you that there’s a special synergy between suggest functions and best bets. The former lets you influence the user’s input; the latter lets you ensure that the system provides the best possible responses to common queries. They’re especially effective in combination, allowing the designer to shape a search system – or, for that matter, an AI program facing a Turing test – so that it’s overwhelmingly successful.
The Future of Search
The previous section was specifically limited to current technology. But the Turing test also points to opportunities for future improvements to search. I predict that two developments will contribute most to the advancement of search in the years to come: public ontologies and natural language parsers.
Public ontologies
Computers fail the Turing test because words have no meaning to them. An ontology is a description of the relationships among things, and thus it imbues words with substance and meaning. Ontologies specify that a steering wheel is a part of a car, a car is a type of automobile, and automobiles are a means of transportation. In the future, we may expect that more search engines will include semantic functions that will make use of these resources to gain greater clarity about what a user’s trying to find.
Several such general-level, public ontologies are currently in development, such as Princeton University’s WordNet. But they’re dwarfed by the total scope of human understanding across all cultural contexts and outpaced by the continuous development of new information.
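WordNet is easy to experiment with. The sketch below uses NLTK’s interface to it (this assumes the nltk package and its WordNet data are installed, and the exact synset names vary by WordNet version):

    from nltk.corpus import wordnet as wn   # requires nltk and its WordNet data

    car = wn.synset("car.n.01")             # the "automobile" sense of "car"

    # Is-a relationships: a car is a kind of motor vehicle, and so on up the hierarchy.
    print([s.name() for s in car.hypernyms()])

    # Part-whole relationships: the parts a car is understood to have.
    print([s.name() for s in car.part_meronyms()])

Even this small amount of structure gives a search engine something to reason with: a query about “vehicles” can plausibly match a page about cars.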
I would expect an ontology-building tool to emerge using social factors to allow anyone in the world to contribute, much like a wiki. In time, such a resource might grow large enough to provide computers with an information base so broad and deep that it would become difficult to stump them in a Turing test.
Natural language parsers
Most website search engines are currently based primarily on pattern-matching algorithms. By contrast, any computer in a Turing test must have a robust capability to parse human language. Such capabilities have long existed and even been implemented in search engines like Ask.com, but these functions have fallen into disfavor because few users phrase their searches in complete sentences.
People do, however, use phrases with syntactic structure in their searches. Words used in combination take on meanings different from the ones they carry alone. Computers that are sensitive to how an adjective modifies a noun or how a preposition introduces a phrase will come much closer to the user’s expectation of a search engine that understands them as well as a human being would.
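Off-the-shelf toolkits already expose this kind of analysis. A minimal sketch using NLTK’s part-of-speech tagger (assuming nltk and its tokenizer and tagger resources are installed; tag output may vary slightly):

    import nltk

    query = "space mission before Apollo"
    tokens = nltk.word_tokenize(query)
    print(nltk.pos_tag(tokens))
    # Something like: [('space', 'NN'), ('mission', 'NN'), ('before', 'IN'), ('Apollo', 'NNP')]
    # The IN tag marks "before" as a preposition, a signal that the user wants what
    # preceded Apollo rather than pages where these words merely co-occur.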
Conclusion
Alan Turing predicted that 50 years from the time of his article, computers would be sophisticated enough to pass his test. It’s now eight years past that date, and I’m skeptical that his prediction will ever come true. But today, the thought experiment provides us with a pragmatic way of thinking about search, because the two domains are linked by a common element: the expectations of the user.
References
Turing, A.M. (1950). Computing Machinery and Intelligence. Mind, LIX (236), 433-460.
Rosenfeld, L. (2008). Site Search Analytics for a Better User Experience. Presentation.
Comments

I love the unique perspective of applying the Turing Test to UI design. The end results are dead-on — easier, more intuitive, more “human” — critical to an ever-evolving and improving user experience. I am recommending this article to all of my readers this weekend in my Weekend Reading…
http://tpgblog.com/2008/08/29/the-product-guys-weekend-reading-august-29-2008/
Keep up the good work.
Jeremy Horn
The Product Guy
http://tpgblog.com
“I would expect an ontology-building tool to emerge using social factors to allow anyone in the world to contribute, much like a wiki. In time, such a resource might grow large enough to provide computers with an information base so broad and deep that it would become difficult to stump them in a Turing test.”
Which makes me really tempted to create a public ontology where we use Turing tests to identify gaps in the knowledge. I wonder if there’s already an open-source Turing-test discussion engine in php floating out on the net…
ugh, too many ideas, not enough time.
Anne,
I’ve been thinking along similar lines. I think there could be various word games that serve as a frontend for a public ontology database. Luis von Ahn’s “Verbosity” certainly fits the bill (play it at http://www.gwap.org), as could a Family Feud-style game (“Name something you keep in a medicine cabinet”).
Thanks for the comments,
John
Hi John,
You’ve hit it on the head with Ontologies, almost!
Have you read ‘On Intelligence’ by Jeff Hawkins of Palm fame? (http://en.wikipedia.org/wiki/On_Intelligence)
He outlines a Memory Prediction Framework as a way to develop intelligent machines. In the context of Hawkins’ framework, a computer should only need to learn a minimum set of relationships before it can begin predicting things (like search requests) on its own.
A publicly generated list of ontologies is simply a massive ‘cheat sheet’, and there is nothing intelligent about that. If the computer were truly intelligent, we wouldn’t have to put in any (public) effort at all.
James
James,
Thanks so much for the comment, I’ve put Hawkins’ book on order through my library. From the Wikipedia article the theory sounds fascinating, though I have to admit that I’m skeptical the memory-prediction framework could provide a complete answer. Hawkins seems to be describing an intelligence that isn’t “artificial” in the conventional sense, but that emulates a theoretical model of how animal brains get the job done. That’s admirably ambitious, but do we have a sense of how far it can get us? By contrast, it’s reasonably clear that a semantic/ontological approach could get us very far indeed, and provide great raw material for any number of AI applications.
A really interesting discussion, I appreciate the post.
John
If there are search solutions for websites that understand human language, why is it so hard to make a search engine act the same way?
Casey,
It is a confusing situation, but it all comes down to what we mean when we say “understand”. Yes, some search solutions are programmed to parse human language. They can recognize that a particular word is a transitive verb, and that the noun preceding it is its subject while the noun after it is its object. They can recognize tense, conjugation, number, and gender in words. But that’s very different from saying that they understand what you’re talking about. They don’t, and getting them to would be a much more difficult challenge.
Let’s say that I gave you a sentence in another language: “Blek felk floop”. I could then tell you that “felk” is a verb, “blek” is its subject, and “floop” is its object. From that, you could conclude that blek has felked floop, or is felking floop, or plans to felk floop at some point in the future. You could count how many times the word “floop” appears in a document. You have a basis for processing the sentence’s structure, yet you still have no idea what I’m actually talking about; you don’t understand it. It’s the same for a search engine.
But if I translated it into “Tigers eat apples”, all of a sudden you understand me. The word “apple” alone brings with it an image, a shape, a feeling in your hand, a taste and smell, colors, varieties, ciders, pies, strudels. You already know that it’s a fruit, it’s a living thing, it’s something you eat, that it contains seeds and is covered in skin. You can recognize that the sentence is untrue (tigers are carnivores, which don’t eat apples). You understand all of that, while a computer understands none of it.
So there’s a vast gulf between the ways human beings experience the world and talk about it, and the way computers do. But people like you and me nonetheless expect search engines to understand us as well as another human being would, and that makes the human standard the correct benchmark for assessing search quality.
Thanks so much for the great question, I hope I was able to clear it up.
John
Thank you for the explanation; your article is a very interesting take on search engines. The question is: is it possible to enable search engines to understand humans?
Casey,
I would say that true understanding is an impossibility, though philosophers and computer scientists may differ on that point. But I think that there’s a great deal of work that can be done to make search engines behave as though they do understand human language. And the closer we can get to that, the more satisfying the search experience will be.
John
I totally agree with you… I really enjoyed reading your article, thank you!
John,
I also enjoyed your article. I am not an expert in this field but what you say seems to make absolute sense. Is the way forward then to get humans to understand search engines more?
A lot of people still haven’t grasped how metadata works. I work in the media and I recently overheard a conversation where someone made this point. The speaker bemoaned the fact that he couldn’t find a programme on a famous poet only to discover that the programme’s metadata didn’t include the poet’s name!
John,
Thank you for the article. Very thought provoking. I couldn’t help but think that a lot of design today is actually flipping the Turing test on its head. Rather than trying to build a computer to sound like a human, we’re trying to encourage humans to sound like computers. Boolean search is a prime example of that. Suggest functions are a more sophisticated form of it.
Another thought as well: with any human interaction, we typically allow for give and take. This seems to go against expectations when dealing with computers. I might ask a human “so what can you tell me about X?” And my interlocutor would respond “I’m not sure what you mean?” To this I would try alternate forms of getting across my idea (using other words, explaining the context, etc.) With computers there seems to be a higher expectation of “give me an answer right away.”
Patrick & Anthony,
I want to be careful to say that people hold search engines to a human standard, but that’s ultimately unattainable. A function like search suggest eases the burden on people to formulate more effective queries, and best bets directly introduce a human hand into the results. Neither one is making the machine any smarter, just patching its deficiencies.
In some cases, I do think that it’s a good idea to also get people working in more machine-friendly ways. Jared Spool had a great presentation years ago comparing the handwriting recognition systems of the Apple Newton and the Palm Pilot. The Newton allowed people to write in their own longhand, and the machine did its best to try to make sense of it. It often got it wrong, but that’s entirely to be expected since it can often be difficult for one human being to understand another’s handwriting. How could we possibly expect a stupid machine to do better?
The Palm Pilot, though, asked people to modify their writing just a little bit and follow a more machine-readable standard. The result was much more reliable handwriting recognition. People can be asked to adapt a bit if it’s really going to result in major improvements in the quality of the results.
I would emphasize that both human beings and machines have their burdens in a search transaction. The person must submit a query that’s at least good enough that a reasonable person could understand what they’re trying to find. That could mean defining the scope of the query, or correctly saying that they want this AND that, this OR that, this but NOT that. It’s okay to require that the person express their needs precisely and logically. But if the user has fulfilled that minimum requirement, then the burden shifts to the system to provide a response that’s as good as what a reasonable person would provide. It’s then okay to hold the search engine to the human standard.
Hope that makes sense. I really appreciate your thoughtful comments.
John
Reading all of this gives us a great deal of enthusiasm for a practical walkthrough of Turing’s ideas. Quantifying search and mapping it onto relationships is really a great way to go about it. Specifically, when we talk along the lines of ontological relationships, there is also a wide range of analysis of how the brain associates its thinking when we perform a physical search. I would suggest reading the book “Phantoms in the Brain” by Ramachandran, V. S. & Blakeslee, S. (1998), which discusses the behavioural patterns of human search. Even AI-based algorithms follow similar patterns to produce a more focused and pointed search.
I strongly agree with your article where you spoke about the future of search, including ontologies and natural language parsers. This is of course being researched worldwide, including by me 🙂, to bring algorithmic search to the next level of human thinking, which could one day match user intelligence.
Thanks for such a great article…
Hi John,
I really liked your article. Another angle that I wanted to bring attention to (from a UI perspective) is the problem of interaction modality, and how it changes expectations and makes the problem of user expectations worse. I used to work with voice user interfaces a lot, and I have seen this very same problem arise there. If the system sounded very humanlike and had some functions for search, lookup, or shortcuts, people just assumed that the search commands or any other interaction commands would be executed with high competency. The modality affected their expectations very much. The fact that a system spoke back to them with replies to their queries made people assign intelligence to it, and when the system did not perform as expected they felt let down, which is similar to a system failing the Turing test. So I think the modality of interaction will have a huge impact on user expectations as well, which is another thing the designer should account for when designing an interactive system.