Cues, The Golden Retriever


In every waking moment, our brains are processing the stimuli in our environment and responding, consciously and unconsciously, to what is going on around us. This may mean something simple like stopping automatically at a crosswalk based on the color of the traffic signal. Or it may mean something more deliberate, like deciding to turn left after orienting yourself by reading a street sign.

Both consciously and unconsciously, we also make decisions while interacting in an onscreen environment. We move automatically during routine tasks and through familiar interfaces. But what do we do when the interaction onscreen requires a very deliberate and thoughtful interaction—how do we determine the correct response to the stimulus? We need cues to help us draw from our experience and carry out an acceptable response. Cues are like little cognitive helper elves who prompt us toward a suitable interaction, reminding us of what goes where, when, and how. Cues can be singular reminders, like a string tied around your finger, or they can be contextual reminders, like remembering that you also need carrots when you are shopping for potatoes and onions in a supermarket.

When we’re arranging content and designing interactions for the onscreen environment, providing cues for users helps them interact more effectively and productively. Increased customer satisfaction, job performance, e-commerce, safety, and cognitive efficacy rely on deliberate interaction with the technology and thus easily benefit from the smart use of cues.

I’d like to frame a discussion of cues by touching on a mixture of topics including memory, a few theories from cognitive psychology, and multimedia research. It may get a little dry, but stick with me. The integration of these three areas not only affects how information is encoded and retrieved, it influences how and when cues might best be used.

Remembering Memory

Let’s refresh your memory on the topic of memory—stuff you probably already know. This is the foundation of how and why cuing is effective.

First, there’s the idea of encoding and retrieval (or recall). Encoding is converting information into a form usable in memory. And we tend to encode only as much information as we need to know. This is a safety valve for over-stimulation of the senses as well as a way of filtering out what we don’t need for later retrieval. Retrieval is bringing to mind for specific use a piece of information already encoded and stored in memory.

Memory is generally labeled long-term memory and short-term memory (or working memory, in cognitive psychology parlance). Our working memory holds a small amount of information for about 20 seconds for the purpose of manipulation—deciding what to do with sensory input from one’s environment or with an item of information recently retrieved from long-term memory. The familiar rule is that humans have the capacity to hold seven items (plus or minus two) in working memory. In contrast, long-term memory is considered limitless and information is stored there indefinitely. Information from working memory has the potential to become stored in long-term memory.

The Integration of Multimedia and Memory

Ingredient 1
By its nature, interaction in an onscreen environment can be considered multimedia. At the very least, visual elements (images, application windows, the cursor, etc.) are combined with verbal elements (semiotics, language, aural narration, etc.). These are called modalities, and the human mind processes them differently, using different neurological channels. This is called dual coding: images and words create separate representations for themselves in the brain[3]. This is important because cues unique to a given modality can be used to better retrieve information originally processed with that modality. For example, color coding the shapes of the states on a map as red or blue helps us store the political leanings of a given state for later recall: the shape of the state triggers our memory of the color.

In a “real world” environment, stimuli from the visual and verbal modalities (among others) guide the way we interact with that environment—influencing our working memory and long-term memory. These stimuli can get to be a lot of work for the little grey cells and it helps when the two modalities share the load—the cognitive load—of processing information. The same is true for the onscreen environment as well.

Ingredient 2
Cognitive load[1] describes the tasks imposed on working memory by information or stimuli from the environment, in our case the onscreen environment. How much information can be retained in working memory—how much can we encode before our working memory is full and new information has no place to go? And if it escapes working memory, chances are slim that the information will make it into long-term memory.

So what happens when a modality is limited by cognitive load? In short, the working memory gets full fast. Encoding, cuing, and retrieval are affected. The interaction onscreen impacts the encoding necessary for later recall, particularly when different modalities are vying for attention. A limited working memory makes it difficult to absorb multiple modes of information simultaneously[2].

But if the modalities complement one another, more information can be processed when they work in tandem than would be possible using a single modality. A large body of research exploring the use of multimedia and computers yields a couple of useful general guidelines:

  1. When presenting information onscreen, text and visuals are not as effective as seeing visuals and hearing narration.
  2. If text is the chosen way to convey verbal information, it should be in close proximity to the visual element it is related to (like labels on a map).

A big no-no is narration that duplicates the text visible onscreen. This is a bad practice because the brain works too hard mediating continuity between the two cognitive channels; the reader is distracted from the content by the mechanics of constantly comparing text and voice. It actually detracts from successful encoding. Naturally, if the encoding is faulty, any cue meant to support later recall of that information is compromised.


Okay, now let’s look at cuing a bit more closely. The idea of cues and cuing is more formally known as encoding specificity, a theory pioneered by Endel Tulving. Memories are retrieved from long-term memory by means of retrieval cues; a large number of memories are stored in the brain and are not currently active, but a key word or visual element might instantly call up a specific memory from storage. In addition, the most effective retrieval cues are those stimuli stored concurrently with the memory of the experience itself[5]. (This implies that most cues are external to the individual and we’ll accept this characteristic for the sake of this discussion.) Citing a popular example, the words “amusement park” might not serve to retrieve your memory of a trip to Disneyland because during your visit you didn’t specifically think of it as an amusement park. You simply thought of it as “Disneyland.” So the word “Disneyland” is the cue that retrieves the appropriate gleeful memory from all the other memories warehoused in your brain.

It’s important to note two chief categories of cues—discrete or contextual. In other words, it may be that a user is being asked to respond directly to an onscreen prompt, or she may be interacting with the technology in a certain way because of the elements present in her onscreen setting. Most of us are probably familiar with the Visio interface and can recognize it instantly. When we’re working in it, we automatically use its features without thinking about the act of using its features. When concentrating on a project, we grab an item from a stencil, move it onto the workspace, size it, label it, etc. We don’t use Visio to try to re-sample a photograph’s resolution or check a hospital patient’s vital signs—we “remember” that Visio is capable of certain functionality because of the cues surrounding us in the Visio environment. This is an example of contextual cuing.

Reminiscing about Disneyland is one thing, but some tasks and interactions require more cognitive load to complete and the cues should be employed appropriately. For example, onscreen controls for a large piece of machinery, one which is dangerous when used incorrectly, require an operator’s focused attention. Cues provided in such an onscreen environment need to be deliberate and explicit. For example, a large red stop sign icon appears onscreen to warn the operator that he has forgotten a safety procedure.

External cues such as work environment, physical position, or teaming around a table may also affect interaction onscreen. If we anticipate the physical environment in our designs, we can control the cues onscreen to accommodate the users in that environment. In our large machinery example, perhaps onscreen cues are related to observing its movement or the sounds it makes. Or if crucial interaction needs to take place in a busy or noisy environment, like punching your numbers into an ATM, discrete and/or contextual cues which accommodate that external environment appear onscreen.

Cues also need to be salient and germane—they need to have meaning and relevance appropriate for the situation, task, or environment. They need to fit into the schema[4] of the interaction. Schema can also be regarded as a semantic network[6], where information is held in networks of interlinking concepts; it’s linked to related things that have meaning to us. Tim Berners-Lee says “a piece of information is really defined only by what it’s related to, and how it’s related.” So naturally the cue that recalls such a piece of information will need to be related to it, too.

The use of meaningful cues is tied to how memory functions. Memory is bolstered when its meaning is more firmly established by linking it to related things. This is because it’s less work for the short-term memory to plug new information into an existing schema: if the new information is encoded relative to its context, the cue that retrieves the information should also be related to that context. A rather glib example might be memorizing several new varieties of wine using colored grape icons to represent different flavors. When recalling those wines, cues in the form of smiling farm animals would do no good in helping you select a wine that goes well with spaghetti.

Humans are fallible, though, and sometimes even the best thought-out cues may not be effective. For example, if the context or subject matter is unfamiliar, cues which rely on it will not be helpful. In fact, sometimes the context is so unfamiliar that cues are not recognized for what they are; if information is not recognized as relevant or meaningful, it will be disregarded. People are better at recalling information that fits into their own existing schemas. There’s a semantic network unique to each of us. Fortunately, Tulving (1974) assures us, “even when retrieval of a target event in the presence of the most ‘obvious’ cue fails, there may be other cues that will provide access to the stored information” (p. 75). One preventative measure against designing ineffective cues is a thorough usability study. Or we may provide cues that address more than one modality. Each situation is as unique as its context, so it’s not possible to make recommendations here; the issue of ineffective cues can arise and it is important for us to acknowledge the risk (and any potential fallout!).

One general prescription for the symptom of ineffective cues is to provide the cue immediately before the desired recall, either immediately preceding interaction or positioned near the recall artifact (e.g., password field or bank account number field). In other words, cues need to prime the information they are designed to help retrieve. Another strategic method of cuing is pattern completion—the ability to recall complete memory from partial cues. The simple act of grouping items may be a sufficient retrieval cue. It may even help establish a context or schema for the user, thus increasing the subsequent effectiveness of your cuing system.

Related form and function in the onscreen environment can also act as cues. Context-dependent menus are a perfect example of this, like the grouping of drawing tools in Word. The four-sided icon represents the function for drawing boxes. The same icon indicates very different functions in other Word tool palettes (or in other applications), yet the user doesn’t have to remember exactly what each of the four-sided icons does: their context is the cue for reminding the user of their function. An easy text-based example might be placing an arts festival event with an ambiguous title in the same column onscreen that lists similar events.

Jason Withrow’s B&A article Cognitive Psychology & IA: From Theory to Practice explores this idea in greater detail.

Another cuing strategy is one mentioned above in passing, the use of mixed-modality cues. This strategy draws on the advantages of splitting the cognitive load between two encoding systems[2][3]. Cues for one modality can be presented in another modality if the original encoding matches that set-up (i.e., an image-text mix is the cue for recall of the same image-text mix). A perfect example is discussed in Ross Howard’s article on what he terms ambient signifiers. Audio is piped over the PA of a large transportation network. Each train station in a large city has a unique audio melody associated with it. As Howard points out, not only is the destination station’s audio a cue to get off the train, the commuters memorize the melody for the station prior to their destination, priming them for their actual destination. This is an interesting example because it also takes into account the environment in which the stimulus-response cue is introduced. With preoccupied or bored daily commuters crowding onto a train stopping at homogeneous-looking stations, what cues might help them successfully get home? The computer game Myst used a similar technique by using sound cues to help the intrepid player solve puzzles.

But what happens when elements of the onscreen environment are really similar (or ubiquitous)? Our brains err toward efficiency: events and elements that are similar are generally encoded in terms of their common features rather than their distinctive characteristics. This is great for helping us fold new information into existing schemas and contexts. But it interferes when the IAs and designers need the user to distinguish between the similar events or elements. This situation is described by interference theory, which states that the greater the similarity between two events, the greater the risk of interference. So it becomes a balancing act: maintaining continuity across the interactive environment while at the same time establishing a distinction between elements you want the user to retain. Something as simple as color-coding might be a means of distinguishing information onscreen. Position may be another. Think of a process being taught or conveyed on a training website, a process whose stages have big bold numbers respectively highlighted across the top or side of an interface. Not only does this help with chunking (breaking the information into digestible bits to avoid an unreasonable cognitive load), but when enacting the process later, like on a factory floor, it’s easier to visualize the numbers and remember the correct procedure.
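
For the numbered-stages idea, here is a minimal Python sketch of chunking. It assumes the process is already a flat, ordered list of steps; the function name `chunk_steps`, the limit of three steps per stage, and the sample machine-shop procedure are all invented for illustration and are not drawn from any of the studies cited here.

```python
def chunk_steps(steps, max_per_stage=3):
    """Group an ordered list of steps into numbered stages.

    Returns a list of (stage_number, steps_in_stage) tuples, with at most
    max_per_stage steps in each stage. The limit is an assumed design
    choice, not a researched threshold.
    """
    if max_per_stage < 1:
        raise ValueError("max_per_stage must be at least 1")
    return [
        (stage_number, steps[start:start + max_per_stage])
        for stage_number, start in enumerate(
            range(0, len(steps), max_per_stage), start=1
        )
    ]


# Hypothetical procedure for a piece of shop machinery.
procedure = [
    "Put on goggles", "Check the guard", "Clear the bed",
    "Load the stock", "Set the speed", "Start the spindle",
    "Make the cut", "Stop the spindle", "Remove the part",
]

for number, stage in chunk_steps(procedure):
    print(f"Stage {number}: " + "; ".join(stage))
```

The stage number travels with each chunk so the interface can show it as the big bold positional cue, which is the part the operator is meant to visualize later.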

Two notable phenomena are related to using position onscreen as a cuing strategy. The primacy effect is the increased tendency to recall information committed to memory first, and the recency effect means that items memorized last are also easier to recall. This may influence how the information is organized on a web page and how the cues might be used. (By the way, recency items fade sooner than do primacy items.) One example might be a corporate intranet website with crucial information buried in a feature article. If you place that information in a single sentence synopsis at the top of the home page, you may plant the important points more permanently than forcing the readers to sift through the longer article. Any cues related to that information will likely be more effective.
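
One way to act on the primacy and recency effects when ordering a short list (say, links on an intranet home page) is sketched here in Python. It assumes the items arrive already ranked by importance; the function name `place_by_serial_position` and the sample headlines are hypothetical. Because recency items fade sooner, the sketch gives the top-ranked item the first slot and the runner-up the last slot.

```python
def place_by_serial_position(items_by_priority):
    """Reorder items so the two most important sit first and last.

    items_by_priority must be ordered from most to least important.
    The top item takes the first position (primacy), the second item
    takes the last position (recency), and the rest fill the middle in
    their original order. Lists shorter than three are returned as-is.
    """
    if len(items_by_priority) < 3:
        return list(items_by_priority)
    first, second, *rest = items_by_priority
    return [first, *rest, second]


# Hypothetical intranet headlines, ranked by importance.
headlines = [
    "Benefits enrollment closes Friday",
    "New badge policy starts Monday",
    "Cafeteria menu update",
    "Parking lot repaving schedule",
]

for position, headline in enumerate(place_by_serial_position(headlines), start=1):
    print(position, headline)
```

This is only a starting point; whether the last position really helps on a given page is something a usability study would have to confirm.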


Philosophy from 10,000 Feet Up

There’s a Chinese proverb that says “the palest ink is better than the sharpest memory.” I include this proverb because the palest ink serves as metaphor for how even the most understated of cues employed in an onscreen environment can be an effective recall or feedback strategy. And this strategy nurtures the perception that the computing technology is in concord with what is natural for the human user.

It’s been encouraging to watch the evolution of computing technology move away from forcing the human user to adapt to its form, function, architecture, and singularity. The continued momentum toward a more human-centered, ubiquitous interaction environment is encouraging. Humans are very dependent on the dynamics of stimulus-response cues in their natural environment; it’s important to establish a similar dynamic as we take part in designing interaction within their technological environment. The conscientious use of cues is not a panacea, of course. Because the use of cues onscreen mirrors the common stimulus-response paradigm which humans are used to in the natural world, however, it’s one of the more effective tools we can use when we design interactions.



[1] Sweller, J., & Chandler, P. (1994). “Why some material is difficult to learn.” Cognition and Instruction 12(3): 185-233.

[2] Mayer, R. E., & Moreno, R. (2003). “Nine Ways to Reduce Cognitive Load in Multimedia Learning.” Educational Psychologist 38(1): 43-52.

[3] Paivio, A. (1986). “Dual coding theory.” Mental representations: A dual coding approach. New York, Oxford University Press: 53-83.

[4] Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals and understanding: An inquiry into human knowledge structures. Hillsdale, NJ, Lawrence Erlbaum Associates.

[5] Tulving, E. (1974). “Cue-dependent forgetting.” American Scientist 62(1): 74-82.

[6] Collins, A. M., & Quillian, M. R. (2004). “The structure of semantic memory.” In Douglass Mook (ed.) Classic experiments in psychology. Westport, Conn.: Greenwood Press: 209-216.


  1. Your article represents the type that should more often appear here in B&A; one that addresses the foundational perceptual and mental constructs of information architecture. But you have provided so much information that, for me, it was hard to take it all in while reading. I believe the reason for this was my effort to transfer the information to long-term memory. It’s just the sort of stuff we should all be applying daily.

    One question I have: You stated, “When presenting information onscreen, text and visuals are not as effective as seeing visuals and hearing narration.” This is quite a blanket assertion. Do you really think it applies comprehensively? For example, what about complicated graphs, where portions of the graph need to be identified? I would think that labels work better than verbal narration in those (and other) cases. In any case, some examples of the research you cited would be helpful.

  2. Thanks for the terrific article. It amazes me how applying basic cognitive psychology can improve our understanding of most anything we humans interact with. A number of your points reminded me of theories we discussed in my Instructional Design and Technology program.

    Much of what your article touches on is how we prepare a user for interaction and then, once the user is engaged, how we deliver content that is meaningful and effective. This reminds me a great deal of Gagne’s 9 Events of Instruction:

    1. Gain attention (Stimuli activates receptors)

    2. Inform learners of objectives (Creates level of expectation for learning)

    3. Stimulate recall of prior learning (Retrieval and activation of short-term memory)

    4. Present the content (Selective perception of content)

    5. Provide “learning guidance” (Semantic encoding for storage long-term memory)

    6. Elicit performance (practice)

    7. Provide feedback (Reinforcement and assessment of correct performance)

    8. Assess performance (Retrieval and reinforcement of content as final evaluation)

    9. Enhance retention and transfer to the job (Retrieval and generalization of learned skill to new situation)

    While hopefully your average UI doesn’t require the learning curve that these steps assume, I find it interesting that some of the more fundamental steps are present in almost all human interactions.

    1. Gain attention (Stimuli activates receptors)
    3. Stimulate recall of prior learning (Retrieval and activation of short-term memory)
    6. Elicit performance (practice)
    7. Provide feedback (Reinforcement and assessment of correct performance)

  3. Thank you both for the complimentary responses.

    Indeed it is a blanket assertion. But it’s not universal dogma, simply a general guideline. Or let’s call it a “starting point.” Each online situation merits careful consideration as to the best use of multimedia. Starting with these rules of thumb, we then determine if the rule stands or if another treatment is called for. For your example of a complicated chart, visuals-plus-narration may not be the best way to convey the information to the chart’s respective audience. Narration that covers all the material in the chart might be too lengthy to be practical. It depends on the purpose of the chart for your given users. Your solution of regional labels is a superior solution if a user wants to go in-depth. If there are a dozen different target audiences, there are at least a dozen different ways for a user to explore (and interpret) the information in the chart.

    You asked about examples of the research: see Richard E. Mayer’s studies of short onscreen lessons on how lightning forms or how bicycle pumps function. Besides his academic papers, you can also find these studies in his 2001 book “Multimedia Learning.” Also, there are rubber-hits-the-road applied examples in the 2005 “E-learning and the science of instruction,” co-authored with Ruth Clark. These are digestible (as in: non-professorial language), and move along quickly.

    There are a LOT of parallels between instructional design and interaction design! And your observation reminds ME that many of the ID theorists can contribute to how “we deliver content that is meaningful and effective.” For example, the influence of a person’s immediate context on his or her interactions might benefit from a look at Huitt’s work; an interaction sequence that is progressively more complex can be informed by Reigeluth’s work. Merrill and Knowles address the just-in-time responsibilities that rest with the designer for effective onscreen interactions. Even the more systematic structure of Dick and Carey’s model lends itself to a healthy iterative design methodology. Stuff like that.

    Instructional design also has striking parallels with graphic design, architecture, industrial design, and most other forms of design. Form follows function, and if the function merits design, the core methodology is arguably the same. Like interaction design, Gagne starts with getting attention, then orienting the participant in some way, initiating/prompting an interaction, then returning a result. Airport terminals, promotional print collateral, coffee makers, onscreen ecosystems, classroom pedagogy, etc. All are designed for their respective audiences and users.

    It’s my opinion that the fundamentals of design are generally the same, only the lexicon between disciplines is different.

  4. I’ll have to look up Huitt. Not familiar with that work. Thanks for the lead.

    As for Merrill, one of my favorite quotes of all time is “information is not instruction.” Perfect.

  5. There’s a short piece entitled “Total Recall” in the 4/13/08 New York Times Magazine that serves as an interesting addendum to this article. It’s by Gary Marcus, a psychology professor at NYU. The article explores “cue-driven memory” enhanced by a short-term memory chip inserted into the human brain.
    Such a chip might someday make memories accessible but not necessarily more reliable, Marcus says. Reliability comes with a search-engine-like system that, similar to the mechanics of a read-write head, catalogs the location of information on your CPU.
    Though perhaps not as efficient as a chip, encoding (relative to each of us in terms of modality, schema and context) already does this to some extent. Location, location, location. And search engines are not discriminatory: they will return all hits on a given search, whether they are relevant or not. With constant response like this, in a matter of minutes the chip-carrying subject would go insane from overstimulation!
    Maybe we could control or filter the intensity with a small knob or remote control. Memories of a bad movie could be turned down to a 3, whereas your med school anatomy exam would benefit from a chip that goes to 11.

  6. “And search engines are not discriminatory: they will return all hits on a given search, whether they are relevant or not.” This is why some online businesses are emerging that apply the critical thinking of actual humans instead of using algorithms to power their content searches and taxonomies. The April issue of Wired has a short article on the subject:

  7. Thanks for the article. I’ll have to re-read it several times to transfer an appreciable amount of it to long-term memory, but that’s what makes it good in my opinion.
    I’m interested in the effects of primacy and recency that you mention near the end. Are you aware of any studies on these effects in the online environment? Your article reminds me of studying something similar to these effects in relation to childhood memory retention (an untold number of years ago in college). Now I’m wondering how this might be applied to online learning.
    As David pointed out, this is exactly the sort of thing we should be applying.

    Again – Thanks!

  8. Thanks for the compliment, Margot. Yeah, I probably should have divided the article into two installments for easier absorption. Ironic, huh?

    Anyway, I think you’re asking two questions. One deals with primacy and recency specifically in an onscreen environment. The other is related to that, but further addresses primacy and recency in online learning.

    Below is the URL to an exciting study (with lots of references for later scouting expeditions!) that at least reviews other related studies and paints them into an onscreen landscape. Perhaps the ideas you’re looking for are inside. It touches on position effects in memory, on attitudes, and on choice and preference. All of these are arguably elements of learning and pedagogy.

    In terms of designing the user experience and crafting the IA, it has a good number of ideas that can instantly be implemented. There’s some stuff on marketing and if you’re into eye tracking, there’s plenty of fodder for your next experiment.

  9. Personally I’m interested in the learning, but professionally the marketing info is always needed. I’d never really considered how the two are so strongly related until just now. Looks like I won’t be getting any more work done today… Thanks!

  10. Great article! Large amount of information to digest. Since I’m primarily a visual learner, I could have done with a little more “chunking” and possibly some illustrated examples. Thanks!

  11. Synchronicity: in Matthew C. Clarke’s “Wanted/Needed: UX Design for Collaboration 2.0” I included this in my comment: “when I’m confronted with a decision point (existentially) salience makes itself felt. I mean at a gut level. This has everything to do with cognitive schema (“pop-out” and “priming”, yes?). Data that isn’t salient is, if not noise, distraction.” I’ve always been convinced that folk who are not results-based (I mean in the moment) are easily distracted by fluff (cognitive cotton candy); absent existential concerns entertainment value sweeps everything else aside. But given a real concern (“democracy” isn’t about choice of dressing on one’s popcorn) a whole set of cognitive factors and processes are primed and enabled.
