Wizards and Guides

“…the seeming simplicity of these structures belies their versatility and broad applicability to a wide range of situations…”

Part one of this article discussed views, forms, and the manner in which they can be combined into a particular type of task structure known as a hub. The purpose of this installment is to expand on those themes by exploring two other types of task structures commonly employed in web applications. Known as wizards and guides, these additional structures are useful for presenting complex transactions and multipart processes in smaller and more manageable sequences of individual steps.

As shown in Figure 1, all three of these structures—hubs, wizards, and guides—are not only conceptually simple; their basic forms are also easily diagramed and understood. Like many primitive patterns, however, the seeming simplicity of these structures belies their versatility and broad applicability to a wide range of situations, from the simplest consumer applications to the most robust enterprise management tools.

As unlikely as it may seem, taken together, hubs, wizards, and guides comprise the full universe of task flows applicable to web applications and form the basis of all web-based processes and transactions. Although this closely parallels the world of information architecture and its similarly small universe of organizational structures—indexes, webs, and hierarchies—whether such a parallel results from an artificial coincidence, debatable definitions, or some sort of universal truth, is a subject I’ll leave to the comment boards. For now, however, let’s shift the focus to the subject of wizards: what they are and what they can do.

Problem: How to structure rigid procedures
Solution: Wizards

With a predilection for precision and a fondness for exacting input, computers are unavoidably creatures of habit. Try as we might to curb their natural tendencies, disguise their demands, or inject flexibility into their routines, there remain situations that require users to follow specific and prescribed paths in order to complete particular operations or procedures. Such situations call for the task flow known as a wizard.

A staple of desktop applications, wizards are also found in a variety of web-based applications. Essentially predetermined sequences of forms, wizards provide a mechanism for guiding users through complex operations that can be completed in one and only one sequence. One of the most common uses of a wizard, as shown in Figure 2, is found in applications used to reserve limited resources—a seat on an airplane, a ticket to a sporting event, or the increasingly-scarce conference room, for example.

Reservation wizards are a particularly useful case in point because they illustrate, in a concise manner, the defining characteristics of a wizard: a multi-step procedure where the interdependence between steps dictates a specific sequence. That last bit is the crucial part.

Wizards are not simply chains of dialog boxes strung together to make life easier for the user. Rather, they are a particular interface pattern for expressing a precise and rigid procedure that has to unfold in a known and specific sequence. Although wizards can contain any number of required or optional steps, that does not mean users can randomly navigate between those steps.
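
To make the idea of a rigid sequence concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the `Wizard` class, its method names, and the step names are hypothetical and do not come from any particular reservation system. The point is simply that the only step the wizard will accept is the current one.

```python
class Wizard:
    """Hypothetical sketch: a fixed sequence of forms completed strictly in order."""

    def __init__(self, steps):
        self.steps = list(steps)   # ordered step names (illustrative)
        self.answers = {}          # data collected so far, keyed by step name
        self.position = 0          # index of the step the user is currently on

    @property
    def current(self):
        # Name of the step the user must complete next, or None when done.
        if self.position < len(self.steps):
            return self.steps[self.position]
        return None

    @property
    def finished(self):
        return self.position == len(self.steps)

    def submit(self, step, data):
        # Only the current step is accepted; skipping ahead or jumping
        # back to an arbitrary step is refused.
        if step != self.current:
            raise ValueError(f"expected step {self.current!r}, got {step!r}")
        self.answers[step] = data
        self.position += 1


wizard = Wizard(["route", "dates", "seat", "summary"])
wizard.submit("route", {"from": "SFO", "to": "LHR"})
try:
    wizard.submit("seat", {"seat": "12A"})   # "dates" has not been completed yet
except ValueError as error:
    print(error)
```

A real wizard would also use the earlier answers to decide what the later steps offer, which is exactly the interdependence that forces the order in the first place.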

I know what you’re thinking: “But aren’t there any reservation systems where the user can browse availability without having to frame such a specific request?” Well, such systems do exist, and one example is at the site for British Airways. Shown in Figure 3, the BA design is an interesting and innovative approach because it exposes a significant portion of the fare structure in a clear and concise calendar format rather than leaving the user to fumble around, testing dates for pricing and availability.

There are two important things to note about this example. First, although this design doesn’t necessarily escape the tyranny of the wizard, it does enhance the interaction by recognizing that users will make more satisfying and efficient choices if they are given sufficient information. Second, because this wizard ultimately results in a transaction, it terminates with a page that summarizes the user’s choices and provides the relevant interface elements for either completing the transaction or discarding it and starting over.

Although not the case in this particular situation, wizards can also be used in the context of an operation where a terminating summary view isn’t necessarily relevant. Software installation and image uploading are two such examples of wizard-based operations that don’t terminate in an editable end state. In both cases, although the user specifies various options before initiating the operation, once the operation is underway it cannot be edited or abandoned in the same manner as a transaction. In these cases, rather than a transaction summary, the end state is either an alert confirming the success of the operation, or the return to an appropriate location in the application. A software installer typically terminates with an “All Done” message, for example, while an image upload operation naturally ends by displaying a page containing the uploaded images.

Although wizards are a common feature of the interface landscape, their rigidity clearly runs counter to one of the basic tenets of user-centered design: providing the user with appropriate control over the interaction. Therefore, like the pointy-hat mystics for whom they’re named, wizards should generally be treated with suspicion and skepticism, and ideally avoided whenever possible. Fortunately guides, a third type of task flow, can often be used in place of a wizard.

Problem: How to structure complex transactions and flexible procedures
Solution: Guides

Combining the structured sequence of a wizard with the navigational flexibility of a hub, guides are another type of task flow commonly used in web applications. Unlike wizards, guides do not assume any sort of strict interdependence between steps. As a result, guides can incorporate the navigational flexibility needed for users to access, process, and edit the forms in any order. And unlike hubs, guides also provide the requisite structure and sequencing needed to ensure users can successfully create and manage lengthy or complex transactions.

In the simplest, though admittedly abstract, terms, guides are effectively wizards that terminate in the view page of a hub. This view page allows the user to return to any part of the guide, modify data, and directly return to the terminating page without having to go through any of the intervening steps. Typically, the terminating page of the guide also includes a submit button, enabling the user to indicate that the transaction is complete and ready for processing.

One of the most frequent uses of a guide is the checkout process common to ecommerce sites. A case in point can be found at AllLearn.org, an online learning alliance. Like similar processes elsewhere, AllLearn collects billing, shipping, and payment information by logically breaking the task into multiple forms. Unlike the reservation wizard from British Airways, however, there is not a strict requirement affecting either the grouping of the information or the order in which the information is collected. For example, although AllLearn collects the user’s shipping address before their credit card information, that sequencing is by convention rather than necessity. As a result, because the steps are not dependent on one another, once the user has made their initial pass through the steps, they can randomly access and edit any of the previous steps without having to pass back through the entire sequence.

In fact, as shown in Figure 4, the final page of AllLearn’s guide effectively functions as the center of a hub task flow but with the addition of a master submit button that enables the user to finalize the transaction. (For more on the hub task flow, please see part 1 of this article.)
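
The same behavior can be sketched in a few lines of Python. This is a rough illustration rather than a description of how AllLearn or any other site is built: the `Guide` class, the `"summary"` page name, and the step names are all assumptions made for the example. Saving a form sends the user to the first step that is still incomplete, and once every step is filled in, any edit returns straight to the summary page.

```python
class Guide:
    """Hypothetical sketch: a suggested sequence that ends in a hub-like summary."""

    SUMMARY = "summary"   # illustrative name for the terminating view page

    def __init__(self, steps):
        self.steps = list(steps)   # suggested order of the forms
        self.answers = {}          # saved data, keyed by step name
        self.submitted = False

    @property
    def next_step(self):
        # The first step that has not been saved yet, or the summary page
        # once every step has been completed.
        for step in self.steps:
            if step not in self.answers:
                return step
        return self.SUMMARY

    def save(self, step, data):
        # Any step may be saved at any time; the return value is the page
        # the user should see next.
        if step not in self.steps:
            raise KeyError(step)
        self.answers[step] = data
        return self.next_step

    def submit(self):
        # The master submit button only works from a complete summary.
        if self.next_step != self.SUMMARY:
            raise ValueError("complete every step before submitting")
        self.submitted = True


checkout = Guide(["billing", "shipping", "payment"])
checkout.save("billing", {"name": "A. Customer"})
checkout.save("shipping", {"city": "Santa Clara"})
checkout.save("payment", {"card": "on file"})
# Editing an earlier step now returns directly to the summary page.
print(checkout.save("shipping", {"city": "San Jose"}))
```

Compared with a wizard, the important difference is that nothing here rejects an out-of-order edit; the order is a suggestion the guide makes, not a rule it enforces.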

Because so many web applications are designed to facilitate lengthy or complex transactions, the guide’s ability to combine the navigational flexibility of a hub with the simplicity of a wizard renders it an extremely useful and versatile taskflow.

Guidelines for Usage

For designers, proper usage of wizards and guides not only requires an understanding of their relative advantages, disadvantages, and utility, but also an understanding of various conventions and best practices. These include the following:

  • Communicate purpose- It’s an obvious point that bears repeating: it is important to not only be clear about how to use a guide or a wizard, but also to be clear on why you’re using it, and to communicate that purpose to the user.
  • Minimize the number of steps- One of the more challenging aspects of designing these taskflows is to strike the appropriate balance between too many steps and too many options within any given step. While there’s no golden rule about this balance, suffice it to say that the optimal solution requires a conscious consideration of the tradeoffs.
  • Provide an exit path- Guides and wizards are not inescapable hallways; they are rooms. And like all rooms, they should include an accessible door so users can easily get out of them. Of course, if there are consequences to an untimely exit, those consequences should be communicated.
  • Limit navigation options- While a wizard or a guide is analogous to a room, we should limit the analogy to a museum gallery, rather than a concert hall. While it’s important to have an exit, if you’re trying to corral users down a particular path, it’s also wise to not have too many exits. Therefore pages that are part of guides or wizards should have limited navigation options. (For a more detailed explanation, please see part 1 of this article.)
  • Inform users of their progress- Because these taskflows embody discrete procedures and operations, users understandably enter them with a sort of “are we there yet?” mentality. Therefore, it’s important to keep users informed of their progress by clearly displaying both the number of steps in the sequence and users’ progress through them.
  • Do your homework- This isn’t really a guideline so much as a reminder that wizards and guides are merely one more interface convention: another arrow for your quiver, another tool for your toolbox. As such, they are simply a reflection of the care with which you analyze, understand, and address the unique needs of both the task and the users expected to complete it.
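
As a small illustration of the progress guideline above, the following Python sketch builds the kind of label a wizard or guide might display on each page. The function name and the step names are hypothetical, chosen only for the example.

```python
def progress_label(steps, current):
    """Return a label showing the user's position and the total number of steps."""
    position = steps.index(current) + 1   # convert 0-based index to 1-based step number
    return f"Step {position} of {len(steps)}: {current}"


print(progress_label(["Route", "Dates", "Seat", "Summary"], "Dates"))
```

Showing both numbers matters: the position alone tells users where they are, but only the total tells them how much is left.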

In Conclusion

Although this discussion has covered a lot of ground, it would somehow fail to be complete without the obligatory collection of summarizing sound bites. To wit:

  • Embrace the medium- The natural behavior of a web application features two modes: viewing and editing. Design accordingly by removing extraneous navigation options from forms and inappropriate interface controls from views.
  • Hubs- You go, you come back. Hubs are ideal for situations that use multiple, discrete, single-page forms.
  • Wizards- Step one, two, three. Wizards are appropriate for multi-page procedures or operations that must be completed in a prescribed order.
  • Guides- This way please. Guides are useful for complex, multi-part sequences that seek to combine the navigational guidance of a wizard with the navigational flexibility of a hub.

So there you go: hubs, wizards, and guides in all their glorious detail. I look forward to reading your comments soon.

Next up, my model for dissecting and describing a user interface.

Bob Baxley is a practicing designer who specializes in interaction design for Web applications and desktop products. An independent consultant in Silicon Valley, he is also the author of “Making the Web Work: Designing Effective Web Applications.” Bob writes about design at www.baxleydesign.com and a host of other things at www.drowninginthecurrent.com.

2nd Annual O’Reilly Emerging Technology Conference

“I came not for the view but to fill my brain with the collective wisdom, curiosity, and enthusiasm.”

In the final week of April, I had the good fortune of spending four days at The Westin Hotel in Santa Clara, California. Now, for those of you who are familiar with Silicon Valley, this might not immediately sound like the sort of thing one would actually enjoy, given that The Westin sits in the middle of an industrial office park that is about as interesting as, well, an industrial office park. Living a scant ten miles north of the hotel, however, I came not for the view but to fill my brain with the collective wisdom, curiosity, and enthusiasm known as The O’Reilly Emerging Technology Conference.

The conference was organized by perhaps the most prolific publisher of technical books. So many books, in fact, they appear to be running out of animals for their covers. The company’s namesake, Tim O’Reilly, describes the purpose of the conference thusly, “The Emerging Technology Conference is a way for us to frame what the hackers and alpha geeks are showing us about new technologies into a coherent picture, think about the implications, and share it with interested—and interesting—parties.”

That last little phrase can’t be overstated. This was a conference in which the technorati turned out en masse. Attendees roaming the buffet line included the CEO of a well-known online bookstore, a smattering of recognizable venture capitalists, a fair number of authors, a host of engineering icons, and bloggers galore.

The four days, including one day of tutorials, were divided between conference-wide keynotes in the morning and smaller sessions in the afternoon. As if eight hours of cognitive input wasn’t quite enough, there were also a variety of birds-of-a-feather meetings and social gatherings to fill the evening hours. The full conference schedule is still available online if you’re interested in the gory details.

The conference focused on four themes: rich internet applications, social software, “untethered,” and nanotechnology and hardware. I was somewhat relieved by the surprisingly non-technical nature of each keynote and even most of the sessions, particularly since this was a conference largely attended by people who would unflinchingly refer to themselves as “geeks.” That’s not to say that there wasn’t some level of tech-talk; it’s hard to talk about “Identity, Security, and XML Web Services” without eventually getting down to brass tacks. However, the bulk of the sessions and all of the keynotes were decidedly non-technical. For example, Howard Rheingold talked about “Smart Mobs”, Alan Kay spoke on the future of computing, and Clay Shirky presented on group behavior and how it affects social software.

With over fifty sessions in a period of 96 hours, I obviously had to make some choices about where to spend my time. As such, my notes and observations are consistent with an interest in social software and rich internet applications, leaving the subjects of untethered and nanotechnology to the reader’s imagination.

Keynote — “Smart Mobs,” Howard Rheingold
The author of some ten books, Rheingold is currently interested, as described in his recent effort “Smart Mobs,” in the idea of collective action. Rheingold envisions a future where cell phones have evolved into powerful, mobile internet terminals, and the concept of connectedness is pervasive, if not ubiquitous. A world such as this challenges our ideas of trust and reputation, and leaves Rheingold wondering why we’re so quick to trust people we meet online when we have no real way of judging their intentions or trustworthiness.

One of the more interesting threads he spun surrounded the question of innovation and his ongoing concerns about the efforts of government and industry to limit innovation through political and regulatory enclosures. He talked about the unique nature of the Internet as media outlet, pointing out that it was the only medium ever devised where the consumers of the medium also functioned as contributors. Posing the question “Are we going to be consumers or users?,” Rheingold challenged us all to recognize and exercise our power as owners and stakeholders in the vast collective actions known as the Internet, the Web, and Open Source.

In a less political moment, he also uttered an observation worth noting by the readers of Boxes and Arrows: “Only geeks mess with defaults.” As designers, we often think that providing preferences and options is a way of empowering users. Rheingold, however, shuts down that avenue of escape and reminds us that our decisions have serious consequences for all users, even when there is a way for them to override our choices.

Panel — Digital Rights Management (DRM) in Practice: Rights, Restrictions, and Reality
Moderated by Dan Gillmor of the San Jose Mercury News, the most notable comments from this panel were made by Cory Doctorow of the Electronic Frontier Foundation (among other things) and Joe Kraus, founder of DigitalConsumer.org. Other panelists certainly contributed their fair share, but the eloquence of Cory and Joe was most memorable.

As someone who has spent way too much time in the halls of Silicon Valley, I was particularly intrigued by Joe Kraus’ comments. Joe has obviously paid his dues in our nation’s capital, and one of his most insightful observations was about the differences between Silicon Valley and Washington. As an engineering-based culture, Silicon Valley assumes fact-based decision-making. In other words, if you don’t agree with me it’s because you don’t have all the facts, so let’s go over this one more time. By comparison, Washington assumes politically-based decision making. In other words, you scratch my back and I’ll scratch Fred’s, who owes me a favor and will get three people to scratch Ted’s, who will in turn make sure that Matt gets back to you.

The message of all this being we can’t assume that decisions and legislation related to digital rights will be based on facts. We have to realize that they will be based on politics, and therefore our only way to influence the outcome is to participate in their process.

Cory Doctorow, perhaps best known for his work at BoingBoing.net, also waxed poetic about DRM and consumer/user complacency in the face of industry executives who are overly committed to the concept of intellectual property. Comparing Napster to the Library at Alexandria (a questionable comparison, but apropos for someone of Cory’s enthusiasm), he noted that Napster had more registered users (~57M) than George W. Bush received votes for President (~50M). Continuing the metaphor, Cory also noted that all 57,000,000 users quietly sat on the sidelines watching as the “library was burned to the ground.”

Cory is a passionate guy.

As the session closed I found myself thinking, “Dang! I didn’t even know DRM (Digital Rights Management) or DMCA (Digital Millennium Copyright Act) were acronyms, but now I’m mad as hell.”

Keynote – “A Group Is Its Own Worst Enemy: Social Structure in Social Software,” Clay Shirky
Right off the bat I have to admit that before this conference I had never heard the phrase “social software.” As I sat through my first session, I thought the din of keyboards was actually rain rather than a cadre of attendees “blogging in real time.” It turns out, however, that social software is all the rage amongst the emerging technologists, and few carry the flag higher than Clay Shirky.

During the hour in which he held court, Clay spoke less about technology than he did about psychology and group dynamics. Revealing a political bias tending towards Libertarianism, Clay described for an almost surprised audience why groups require structure, codes, and moderators to survive. Venturing into the “soft sciences” of psychology and sociology, he noted that social software is much closer to economics and politics than it is to traditional programming.

Echoing some of Rheingold’s comments, Clay noted that anyone creating social software has to accept three things: First, you cannot separate technology from social issues. Software may determine what people can do easily, but it doesn’t determine what they can do period. Second, members are different than users. Any group includes both active and passive participants (you lurkers know who you are). Third, the group itself has rights that trump individual rights in some situations. For example, an Open Source project has the right to survive even if it means alienating a few people here and there.

Clay was a great speaker and a very intelligent guy. If you’re interested in such things you should check out his site.

Takeaways and conclusions
As you undoubtedly noted from all those comments left between the lines, my biggest takeaway from the second annual O’Reilly Emerging Technology Conference was that emerging technology is not nearly as interesting as the emerging uses of technology. What I heard had a lot more to do with politics, power, individual choice, and collective action than it did with PHP, CSS, XML, or XHTML.

Most interesting for us as a community of designers and user advocates was the obvious subtext that technology has reached a point where users—actual real people—are the defining element of the human/computer relationship. It is an encouraging sign that a conference for “hackers and alpha geeks” was at least as concerned with the question of what as it was with the issues of how.

Bob Baxley is a practicing designer who specializes in interaction design for web applications and desktop products. His first book, “Making the Web Work: Designing Effective Web Applications” (ISBN: 0735711968) was published in 2002. His thoughts on design and technology are at Baxley Design while his observations on a host of other things can be found at Drowning in the Current.

Views and Forms: Principles of Task Flow for Web Applications Part 1

“Creating web applications that support the full and valid completion of specific tasks, operations, and database transactions requires some understanding of how to manipulate the medium to that purpose.”

One of the first challenges facing the designer of any application is answering the ubiquitous question, “How do I __________?” In the case of a web application examples might include, “How do I register for a class?”, “How do I pay my bills?”, or perhaps “How do I make sure that gets to my mom on the Friday before Mother’s Day?”

The hypertext environment of the Web presumes a style of unfettered browsing and exploration that is not particularly conducive to the full and valid completion of specific tasks, operations, or database transactions. Creating web applications that support the full and valid completion of specific tasks, operations, and database transactions, therefore requires some understanding of how to manipulate the medium to that purpose. To wit, the following few thousand words serve to describe both the fundamental building blocks of HTML-based web applications as well as the three ways in which those blocks can be arranged to provide various types of task flows.

As previously discussed, one of the defining elements of web applications is their support for the editing and manipulation of stored data. Unlike the typical conversation that goes on between a user and a content-centric Web site however, this additional capability requires a more robust dialog between user and application. For example, a visitor to CNN might see something of interest and click on a link, sending a request to the server to send over the indicated story. In such cases, the vocabulary of the user is reduced to something less than that of a nine-month-old baby. The experience provides the user with specificity in their selection of nouns—get me this versus get me that—but their choice of verbs is conspicuously limited to “fetch.”

By comparison the designer of a bill pay application has to support a broader array of both nouns and verbs so that users can more readily converse with the application, saying things such as, “Create a new vendor with this address and account number” or “Pay these four bills the day before they’re due.” Where the user of a content-centric site is mostly confined to the mode of listening, the user of a web application splits their time between listening and talking. And because the client-server architecture of a web application dictates a serial conversation, these types of operations not only require a broader vocabulary; they also require a separate mode.

Although rich-media application technologies strive to reduce the division between these modes, when it comes to HTML-based applications, the best strategy is to embrace the nature of the medium. This is accomplished by clearly isolating operations into two types of pages: “views” for viewing and navigating, and “forms” for editing and manipulating stored data.

An example of a relatively sophisticated view is shown in Figure 1. This page not only provides for the simple presentation of content and data, it also includes controls for navigating the data or the application—changing the sort order, specifying the type of information to be displayed, or paging through long lists of items. What distinguishes these operations, however, is that they do not result in permanent changes to stored information.

Conversely, the purpose of a form should be the completion of a specific task such as updating an address, recording a stock transaction, or submitting an application. In desktop applications, specific tasks such as these are typically handled through modal dialog boxes. Only by requiring the user to supply complete and valid information can the application complete the requested operation. Inherently, HTML does not provide any ready mechanism for enforcing modality. Although this might initially seem like more of a blessing than a curse, the lack of modality in a web application makes it very difficult to structure tasks in a way that ensures the system is able to do what the user asked.

To help mitigate this problem, it is important to respect the division between views and forms by eliminating virtually all of the navigation options from forms. Global and local navigation links not only serve to distract the user from what they’re doing, they also function as cancel buttons since such navigation links do not typically submit or validate the user’s edits.

It’s worth noting at this juncture that just because something is a <FORM> doesn’t necessarily mean it’s a form in this context. As described here, forms result in permanent changes to stored data, necessitating a variety of validation criteria as well as the attendant collection of status, confirmation, and progress alerts. By comparison, the forms associated with content-centric sites or web directories, the new Yahoo! Search for example, do not result in changes to stored data and therefore don’t have to validate input or report input errors.

The division of an application into explicit views and forms is what I call the “view/form” construct—the benefits of which include:

  1. A tidy separation of interactions into the two modes of viewing and editing
  2. Providing explicit control over when data is submitted and saved
  3. Reducing the chances of a user inadvertently abandoning edits by minimizing the navigation options out of forms
  4. Reducing the visual complexity of views by moving input controls to dedicated form pages
  5. Providing an explicit and predictable moment to validate user input
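
To show how these benefits fall out of the construct, here is a minimal Python sketch of a form that works on a private copy of a stored record. It is an illustration only: the `Form` class, its methods, and the in-memory `store` dictionary are assumptions made for the example, standing in for whatever database and page framework a real application would use. Stored data changes only at one explicit, predictable moment, a valid save.

```python
class Form:
    """Hypothetical sketch: edits a copy of a record; storage changes only on save."""

    def __init__(self, store, key, required):
        self.store = store                 # stand-in for the application's database
        self.key = key                     # which record this form edits
        self.required = list(required)     # fields that must not be empty
        self.draft = dict(store[key])      # working copy; the stored record is untouched

    def edit(self, field, value):
        # Changes go to the draft only.
        self.draft[field] = value

    def errors(self):
        # Names of required fields that are missing or empty in the draft.
        return [field for field in self.required if not self.draft.get(field)]

    def save(self):
        # Validate first; on failure the user stays on the form and the
        # stored record is left exactly as it was.
        problems = self.errors()
        if problems:
            return problems
        self.store[self.key] = dict(self.draft)
        return []

    def cancel(self):
        # Discard the draft by resetting it to the stored record.
        self.draft = dict(self.store[self.key])


store = {"address": {"street": "1 Main St", "city": "Santa Clara"}}
form = Form(store, "address", required=["street", "city"])
form.edit("city", "")
print(form.save())                  # validation fails, nothing is written
print(store["address"]["city"])     # the stored value is unchanged
```

Because a view never touches the draft or the store, leaving a form by any route other than `save` is, in effect, a cancel, which is why forms should offer so few other ways out.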

Of course the best way to understand the value and implementation of these concepts is to look at some real-life examples.

Problem: How Can I Ensure Users Explicitly Save or Cancel Their Changes?
MovableType’s Weblog Configuration Forms

One such example is found in the blogging application, MovableType. Used to specify a variety of parameters about an individual blog, MovableType’s Weblog Configuration area is grouped into four different forms, Core Setup, Preferences, Archiving, and IP Banning.

In Figure 2, it’s obvious that this is a well-crafted, intelligent design. There is, however, one confusing and ambiguous aspect of the interaction design that highlights the importance of clearly distinguishing between the modes of viewing and editing.

The issue is this: the only interface element on this page that actually saves the user’s data is the Save button located at the bottom of the form. If the user clicks any of the other links, buttons, or menus, their changes will be abandoned. This is a particular problem with regard to the links to the other configuration forms since a user could reasonably infer that the Save button would save their edits to all four forms at once.

In other words, if the user made changes to the Archives form and then immediately navigated to the Preferences form, any changes they made to the Archives form would be abandoned. In effect, the links to the other configuration options, as well as the navigation controls along the left and top of the page, all function as Cancel buttons and serve to distract the user from completing their task.

Problem: What Happens After a Successful Submit?
Example: TiVo’s Account Management forms

Figure 3 illustrates a second point of confusion that can arise when views and forms are merged into a single page. In this example from the TiVo account management application, the user has successfully submitted changes to their account and those changes have been saved. However, because the application doesn’t follow the view/form construct there is not a logical place to confirm the result of the user’s action. As a result, the confirmation message is reported at the top of the form that has just been saved.

To illustrate the issue we’ll walk through the entire task flow. First the user navigates to this page. Next they change the values for one or both of the available fields and then click either of the Save Preferences buttons. Their browser talks to the server and after some sort of delay, the page redraws with exactly the same page and form elements but with the somewhat subtle addition of the green text reading, “Success!” Unfortunately, it’s quite likely that most users won’t see this message and will be left wondering if anything happened or not.

In addition, the TiVo design itself acknowledges a problem with the approach by including instructional text next to each of the Save buttons. By contrast, if the design followed the view/form construct, the only way out of the form would be an explicit Save or Cancel button. This would eliminate the need for such text and reduce the number of messages and information vying for the user’s attention.

Fortunately, there is a simple solution to all of the problems described in these two examples: redesigning the task flow using the view/form construct and a hub.

Solution: The Hub Structure
Example: Mailblocks Preferences Forms

The structure of a hub is one of the most common task flows used in HTML-based Web applications. As its name implies, a hub is constructed from a single view page and a collection of forms. The view serves as a launching pad to each of the forms with a successful submit from any of the forms returning back to the view (Figure 4).

This example, from the Options area of the web-based mail service Mailblocks, features a view page as a launching pad to the various forms containing user-defined options for mail accounts and other preferences. Unlike the designs shown in the two preceding examples, Figure 5 solves the same problem using a hub.

The view page includes links to the various forms as well as information explaining the contents and purpose of those forms. In addition, the page contains a collection of vital information about the user’s different accounts as well as different commands for acting on those accounts.

One spoke of this hub, the Add/Edit External Account form, is shown in Figure 6. The important thing to notice about this form is the elimination of all extraneous navigation. As a result, the only way the user can readily exit the form is by explicitly clicking the Submit or Cancel buttons located at the bottom of the form. Although the utility navigation links still appear in the upper-right corner, only the Sign Out button would abandon the user’s changes since the Help link opens a secondary browser window.

Compared to the earlier examples, this approach has the following advantages:

  1. Creates an intuitive task flow by providing clear mechanisms for navigating to the appropriate form, explicitly submitting changes, and returning to the initial starting point.
  2. Minimizes the chances of a user inadvertently abandoning changed data by removing extraneous navigation elements.
  3. Minimizes the cognitive load and visual clutter of all pages by dividing the task into specific, focused operations.
  4. Removes the need for a confirmation alert.

Hubs are one of the most useful structures available in HTML-based applications. They can be found in a variety of applications including online calendars, Web-based email, bill pay applications, server configuration applications, and others.

Up Next
In addition to hubs, there are two other task flows commonly used in HTML-based web applications. Next time around we’ll focus on the other two: wizards and guides. In the meantime, I hope you’ll take the time to join the conversation on our newly created discussion list dedicated to interaction design.

(Editor’s note): I’m happy to announce the beginning of Boxes and Arrows’ first discussion list. Dedicated to the topic of interaction design, <“Interaction”> awaits your participation. Please come join the conversation.

Bob Baxley is a practicing designer who specializes in interaction design for Web applications and desktop products. He currently runs the Design and Usability teams at The Harris-myCFO and recently published his first book, “Making the Web Work: Designing Effective Web Applications.” He looks forward to hearing from you at .

What is a Web Application?

“The fundamental purpose of all web applications is to facilitate the completion of one or more tasks.”

To reiterate the themes of dance and conversation introduced in my last column, truly superior interaction design strikes a delicate balance between the needs and expectations of users and the capabilities and limitations of technology. The ability to consistently find solutions that achieve this balance in a manner appropriate to the medium is a hallmark of an experienced interaction designer.

The purpose of this article is to improve your ability to find that balance by adding to your understanding of web applications as an interactive medium. What distinguishes a web application from a traditional, content-based website and what are some of the unique design challenges associated with web applications? A reasonable launching point is the more fundamental question, “What is an application?”

What is an application?
The first step toward differentiating web applications from traditional content-centric websites is to focus on the “application” part of the equation. According to the American Heritage Dictionary, an application is (among other things), “…a computer program designed for a specific task or use.” That last phrase, “specific task,” is perhaps the most important.

The fundamental purpose of all web applications is to facilitate the completion of one or more tasks. Unlike visitors to traditional, content-centric websites, users of web applications invariably arrive with specific goals, tasks, and expectations in mind. Of course, that’s not to say that visitors to content-based websites don’t also arrive with certain goals and expectations, but rather that the motivations for using a web application are almost always explicit and precise.

One of the most important implications of this task-based orientation is the degree to which the application should call attention to itself. Compared to content-centric websites, video games, and various forms of entertainment media, application design succeeds not when it draws attention to itself but when it recedes into the background. This requires the designer to find solutions fundamentally natural to both the user and the medium, allowing the application itself to become transparent. The paradox of application design is that the perfect solution is invariably the one that goes unnoticed.

While this does not mean that an application’s design shouldn’t be enjoyable and aesthetically pleasing, it does mean that the design should play a subservient role to the user’s work.

A second implication of their task-based orientation is that web applications have to provide users with various milestones informing them when tasks are complete. In other words, web applications have to support an end-state in a way that content-based sites typically don’t.

In addition to the challenges resulting from their focus on task completion, the manner in which web applications function and connect with users highlights other issues affecting web application design.

When is a website a web application?
Without being overly concerned about semantics or classification (if that’s actually possible on a site like Boxes and Arrows), it is important to establish an objective means of differentiating between a web application and a traditional website. To wit, in contrast to content-based websites, a web application possesses both of the following observable properties:

  • One-to-one relationship – Web applications establish a unique session and relationship with each and every visitor. Although this behavior is fundamental to web applications, it is not present in either content-based websites or desktop applications. A web application such as Hotmail knows who you are in a way that Cnet or even Photoshop doesn’t.
  • Ability to permanently change data – Web applications allow users to create, manipulate, and permanently store data. Such data can take the form of completed sales transactions, human resources records, or email messages to name but a few. This contrasts with web services like Google that allow users to submit information but do not allow them to permanently store or alter information.

Although these two characteristics alone result in a fairly broad definition of web applications, websites that possess both of them necessarily contain a degree of application behavior, logic, and state lacking in traditional content-based sites. In addition, they require a significantly more sophisticated level of user interactivity and interaction design than what is associated with content sites.
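The two-part test can be written as a simple predicate to make clear that both properties must hold at once. This is a minimal sketch: the `WebProperty` type, its field names, and the sample entries are hypothetical and describe no real site.

```python
from dataclasses import dataclass


@dataclass
class WebProperty:
    name: str
    unique_session_per_user: bool  # the one-to-one relationship
    users_can_store_data: bool     # the ability to permanently change data


def is_web_application(site: WebProperty) -> bool:
    # Both properties are required; either one alone is not enough.
    return site.unique_session_per_user and site.users_can_store_data


# Hypothetical examples, not assessments of real sites.
webmail = WebProperty("webmail", True, True)
news_site = WebProperty("news site", False, False)
print(is_web_application(webmail), is_web_application(news_site))  # True False
```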

This distinction between websites and web applications is most obvious in situations where a given site is almost exclusively composed of either content OR functionality. Newsweek.com (a website) and Ofoto (a web application) are two such cases. However, even popular web destinations such as Amazon and myYahoo!, sites that combine both content AND functionality, should be considered web applications because they meet these two criteria and therefore exhibit the interactive complexities and behaviors associated with applications.

In the case of Amazon, this takes the obvious form of personalized content and complex transactions, as well as a variety of other functions including the creation and storage of , the uploading and ordering of digital photographs, the editing and tracking of orders, and many others. That’s not to say that all online stores qualify as web applications; in fact most don’t. But Amazon and other stores of similar sophistication have the same characteristics and design considerations as more traditional applications such as email and contact management.

Granted, consumer sites like Amazon and myYahoo! typically lack the level of complexity found in licensed enterprise applications such as Siebel, PeopleSoft, or Documentum, but as a tool for classification, complexity is both inadequate and subjective.

Whether any particular application has sufficient complexity to require a highly skilled interaction designer is a question that can only be answered on a case-by-case basis. The point remains, however, that if a web property establishes a one-to-one relationship with its users and allows those users to edit, manipulate, and permanently store data, then it possesses certain capabilities and complexities that distinguish it from traditional content-centric websites.

So what?
If the point of all this definition stuff was simply to provide a consistent method for classifying web properties, the whole exercise could be dismissed as little more than academic rhetoric. What’s useful about this definition, however, is not so much its utility as a classification scheme, but rather its ability to highlight some of the unique design challenges and functional benefits associated with web applications.

One of the most significant challenges and benefits results from the one-to-one relationship web applications form with their users.

Because a web application requires each user to uniquely identify themselves to the system, typically through a username and password pair, the application can be dynamically altered from one user to the next. This can take both the obvious form of personalized content and the more subtle and complex form of personalized functionality based on roles and privileges. This type of dynamic behavior allows a complex corporate accounting application, for example, to provide different functionality to account managers, regional directors, corporate executives, etc.

Although this type of capability has been a mainstay of enterprise applications for some time, many less sophisticated or expensive applications now employ this behavior. For example, consumer-based online services can add and remove features or advertising based on whether or not a particular user has paid a subscription fee.
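As a rough illustration of functionality that varies by role and by subscription status, the sketch below maps each role to the set of features the interface would expose. Every role name, feature name, and the `features_for` function itself are hypothetical, chosen only to mirror the accounting and subscription examples above.

```python
# Hypothetical role-to-feature mapping for an accounting application.
ROLE_FEATURES = {
    "account_manager": {"view_accounts", "edit_accounts"},
    "regional_director": {"view_accounts", "regional_reports"},
    "corporate_executive": {"view_accounts", "regional_reports", "company_reports"},
}


def features_for(role, paid_subscriber=False):
    """Return the set of features the interface should expose to this user."""
    # Copy the set so callers cannot alter the shared mapping.
    features = set(ROLE_FEATURES.get(role, set()))
    if paid_subscriber:
        # Assumed subscription perk, echoing the advertising example.
        features.add("ad_free")
    return features


print(sorted(features_for("regional_director")))
# ['regional_reports', 'view_accounts']
```

Even in this toy form, three roles and one subscription flag already yield six distinct interface configurations, which hints at how quickly the permutations multiply in a real application.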

More so than any other interactive medium, a web application has the ability to adapt itself to each user, providing them with a personalized and unique experience. Accommodating the full range of permutations afforded by this capability is a unique and significant design challenge. Because various functions, interface controls, and data can dynamically come and go from the interface, designers are forced to think in terms of modular components that are simultaneously harmonious and autonomous.

In the same way that it is practically impossible for a visual designer to fully anticipate how a given web page will look in every situation, the designers of large-scale applications also struggle to fully document and consider every possible permutation of functionality and data.

Another unique design challenge associated with web applications results from their ability to allow users to make permanent changes to stored data. Because web applications are fundamentally database applications (that is, they store and present information contained in a defined database structure), the user’s information almost always has to fit within a predetermined format. Certain fields are present; some fields are required, others are not; some require a specific type of value; and still others require a value within a precise range. The result of all this is a level of data validation and error recovery unseen in either content-based websites or most desktop applications.
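The kinds of constraints listed above (required fields, value types, and ranges) can be sketched as a small validation routine. This is an illustrative example only: `FieldRule`, `validate`, the field names, and the limits are all assumptions made for the sketch, not a description of any particular application or library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldRule:
    required: bool = False
    kind: type = str                 # expected Python type of the value
    minimum: Optional[float] = None  # inclusive lower bound, if any
    maximum: Optional[float] = None  # inclusive upper bound, if any


def validate(record, rules):
    """Return a dict mapping field names to error messages (empty if valid)."""
    errors = {}
    for name, rule in rules.items():
        value = record.get(name)
        if value is None or value == "":
            # Missing values are an error only for required fields.
            if rule.required:
                errors[name] = "required"
            continue
        if not isinstance(value, rule.kind):
            errors[name] = f"expected {rule.kind.__name__}"
            continue
        if rule.minimum is not None and value < rule.minimum:
            errors[name] = f"must be at least {rule.minimum}"
        elif rule.maximum is not None and value > rule.maximum:
            errors[name] = f"must be at most {rule.maximum}"
    return errors


# Hypothetical rules for a sign-up form.
RULES = {
    "email": FieldRule(required=True),
    "nickname": FieldRule(),
    "age": FieldRule(required=True, kind=int, minimum=13, maximum=120),
}

print(validate({"email": "", "age": 7}, RULES))
# {'email': 'required', 'age': 'must be at least 13'}
```

Returning every error at once, keyed by field, lets the form report all problems in a single pass instead of making the user fix them one submit at a time.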

Accommodating this behavior requires the designer to carefully consider the task flow from one operation to the next, the full scope of potential errors, the most expedient way to recover from errors, and of course the holy grail of it all, the ideal solution for avoiding errors altogether.

All this points to one critical conclusion: web applications are a new form of interactive media, distinct from both content-based websites and traditional desktop applications. Therefore, the creation of truly useful and usable web applications requires the designer to understand, appreciate, and exploit the unique capabilities, limitations, and conventions of this new medium rather than simply approaching the design problem from the perspective of more established interactive mediums.

Next Up
Next time around we’ll continue to explore web applications as an interactive medium by comparing their advantages, disadvantages, and uses to traditional desktop applications.


Introducing Interaction Design

Part 1 of a 12-part series about interaction design for web applications

“[Design] has never cohered into a unified profession, such as law, medicine, or architecture… Instead, design has splintered into ever-greater subdivisions of practice without any overarching concept of organization…”
—John Heskett in Toothpicks and Logos: Design in Everyday Life

When the founders of this site set out to create what we now know as Boxes and Arrows, they had four things on their minds: information architecture, interaction design, information design, and interface design. Now, only nine months later, Boxes and Arrows stands as one of the preeminent design communities on the Web and is well on its way to fulfilling its mission to be the “definitive source for the complex task of bringing architecture and design to the digital landscape.” With its first birthday a few short months away, it seems an appropriate time to expand the conversation and the community by examining issues outside the traditional scope of information architecture. To wit, I offer the first of a twelve-article, twelve-month series devoted to the field of interaction design.

The focus of this series is on the challenges inherent in the task of translating established product requirements into a browser-based interface. Along the way, we’ll discuss the activity of interaction design as it relates to the Web and the relative advantages and disadvantages of the Web as an interactive medium. In addition, we’ll examine a variety of solutions to common interaction design problems. Although the next eleven articles are already loosely mapped out, if there are particular topics you would like to have covered, please let me know and I’ll do my best to work them in.

And so we’re off. I hope the journey is a fun, useful, and educational one for us all.

We’re in this together
Ours is a world, an economy, and a profession that has embraced the idea of specialization of occupation on a scale heretofore unknown. Where six years ago someone who designed software could readily lay claim to the title “interface designer,” the explosion of the Web and other interactive mediums has split our profession into a variety of increasingly granular specialties.

Although this has sometimes been both necessary and useful, it has also resulted in a cacophony of competing titles, responsibilities, and consulting rates. As a result, even though we all approach our work from the same user-centric orientation, specialists working on one aspect of a design may be ignorant of the issues and compromises being made by other specialists working on other aspects of the design.

In particular, information architecture and interaction design have often been sequestered from one another. Where information architecture has tended to focus on content-centric sites, interaction design has tended to focus on functionality-centric sites. Now, however, the two often find themselves in meetings together thanks to the proliferation of websites featuring both large volumes of content and sophisticated functionality.

Therefore, seeing as how we’re going to be in meetings together, it seems only polite to introduce ourselves to one another.

Structure vs. behavior: teasing apart IA and ID
A good place to begin is the definition of information architecture offered up by two of Michigan’s better minds. In their recently published second edition of “Information Architecture for the World Wide Web,” Sirs Rosenfeld and Morville offer a lengthy definition of the field which focuses on four key themes:

  1. Information
  2. Structuring, organizing, and labeling
  3. Finding and managing
  4. Art and science

Other than the fact that “art and science” is endemic to all forms of design, their definition describes a fairly specific universe of design issues and challenges. In contrast, the field of interaction design concentrates on the following:

  1. Human/machine communication – At its most fundamental, interaction design serves to translate the conversation that goes on between the technology and the user. In the role of translator, interaction designers are required to understand the subtleties and colloquialisms of both parties, ensuring that they can readily and efficiently communicate with one another.
  2. Action/Reaction – Not surprisingly, the action/reaction dynamic of interactive media sits at the heart of interaction design. This requires the designer to understand and anticipate how interactions unfold over time, designing for the wide range of permutations that can occur.
  3. State – As part of its role as translator, interaction design is also concerned with ensuring the user understands the current state of the application. In the same way that humans use body language and social situation to govern and predict behavior, interactive systems communicate state so that users will understand what type of operations are possible or appropriate at any given time.
  4. Workflow – In addition to facilitating the completion of discrete tasks such as selecting a payment method, interaction design is also concerned with the completion of multi-task goals such as browsing, selecting, and purchasing an item. Like a film director connecting individual shots into scenes, and scenes into movies, the interaction designer uses individual screen elements to create pages, pages to create complex operations, and operations to create a complete application.
  5. Malfunction – As with all forms of communication, misunderstandings and mistakes occur. Therefore it is also part of the designer’s role to anticipate and mitigate those problems, ensuring that both the user and the system can easily recover.

Well-designed interactive products balance each of these concerns with the respective limitations and capabilities of both people and technology. Such products allow people and technology to carry on a complex and elegant dance relying on multiple, simultaneous forms of communication. The role of the interaction designer, therefore, is to choreograph and facilitate the dance in a manner that makes everyone feel like Fred Astaire.

Such choreography, of course, requires an understanding of both the stage and the dancers. As a result, the best interaction designers draw from a variety of disciplines ranging from perceptual psychology to computer science.

Looking ahead
With that brief introduction to the field of interaction design behind us, we can start to examine more specific and thorny topics. Next month’s installment will include a comparison of web applications to traditional content-based sites as well as a consideration of the relative advantages and disadvantages of the Web as an application platform.
