Lost in Translation: IA Challenges in Distributing Digital Audio

“The main challenge facing network audio devices is how to provide remote access to the music library… this looks like a job for an information architect!”

With each new advancement in digital media come new ways to consume and distribute it, and new and different challenges for information architecture. For example, several new devices on the market are designed to distribute digital audio from a computer to audio systems in other rooms of the house. These devices connect to your home network through a standard Ethernet cable or wi-fi, routing music from your computer to your stereo using standard audio connections.

The main challenge facing these devices is how to provide remote access to the music library. While sitting at a computer, you have the benefit of using a keyboard, mouse and screen to interact with software like iTunes or WinAmp. Since network audio devices need to sit on the shelf with your stereo, they do not have a full display, and the only means of interaction is a remote control. In other words, this looks like a job for an information architect!

This new paradigm for accessing music libraries presents at least two information architecture challenges:

  1. How do users find a song in their music library?
  2. How do users know what’s playing and what’s coming up?

The challenges are made even more difficult by several factors:

  1. Limited display size
  2. Limited availability of metadata
  3. Users’ expectations—people are used to browsing through a CD library

This article looks at how three devices on the market today address these IA challenges. Two of these devices, RokuLabs’ SoundBridge and Slim Devices’ Squeezebox, have a screen on the shelf unit. The display on each is limited to two lines of text, and the remote controls are configured for navigation. Sonos takes a different approach, putting the display in the remote control: the remote looks like a large iPod with a color display, while the device that networks the music has no display at all.

Design Philosophies

Sean Adams created the first generation Squeezebox in 2001 by hacking together some hardware and software. From that first foray into distributed digital music grew a large community grounded in the open source culture. Slim Devices made their server software open source, and there are now more than 50 developers working on it worldwide. This approach has led to constant gradual improvement.

Slim Devices' Squeezebox
Dean Blackketter, Slim Devices’ CTO, says that although the community is the key to adding new features, he monitors all changes to the software before they are officially released. This allows Slim Devices to ensure that any changes to the interface stick with their style guide. Blackketter appreciates the open source approach because it allows people to work on the interface quirks that bother them the most; he told a story about someone who found the timing of the scroll a little off, and wrote a new scrolling algorithm. Blackketter frequently uses the “friends and family” approach to test the usability of these upgrades.

Slim Devices uses no formal user-centered design methodology and maintains no tools beyond a style guide. Blackketter says that the company has internalized the personas of their customers. The management team came to an implicit agreement over the life of the device that their target audience consists of highly technical people—users who like playing with the device—and their spouses—people who just want to listen to music.

Like Slim Devices, RokuLabs’ design philosophy does not depend on formal user testing. Many of the team members at RokuLabs came from ReplayTV, the main competitor to TiVo, and the designers at RokuLabs draw on their previous experience with networked media devices for insight into usage. Mike Kobb, RokuLabs’ senior engineer, says their experience with ReplayTV provided many lessons for the user experience of the SoundBridge.

The user experience of iTunes also drove the design philosophy for SoundBridge, since the unit was meant to be an extension of that software; RokuLabs sought to make the interactions similar to those of iTunes or the iPod. One key difference is the interaction model of the remote control: while Squeezebox uses the “right arrow” button to make a selection, SoundBridge users must push a “select” button. RokuLabs’ design rejects the use of navigation as selection. In this way, it resembles the iPod, which uses a one-dimensional navigation device (the wheel) and forces users to physically make a selection (by pushing the center button).

RokuLabs also had the benefit of not being the first to market. They played with early versions of the Squeezebox and decided what they liked and didn’t like. One thing they noticed was that the experience seemed geared to tech-savvy users, and RokuLabs wanted a more mass market device.

The newest entrant is Sonos, whose unit shipped in January 2005. I spoke to Mieko Kusano, Sonos’ director of product management, who says that although the idea for Sonos came from its founder, the company spent a lot of time defining its target market, which led to creating personas. Sonos also employed a simple ground rule: designers were not allowed to talk about what “I” want. Instead, all design decisions had to be made within the context of the personas. Kusano says the personas made the process more concrete and gave the company a common platform. She advocates doing as many user studies as you can. “Every time we had something new to show,” said Kusano, “we brought users in.”

Initial user research drove a couple of key design decisions: putting the display on the remote and focusing on distributing music to many rooms of the house. Having decided early on, through user studies, to put the screen on the remote, they developed a method for prototyping new remote controls with a PDA: they could program the PDA to display different screens and then test them with users.

The second decision—focusing on multiroom audio distribution—motivated the design of the remote control itself. Sonos’ remote boasts the fewest buttons. Many functions are handled by “soft keys”—buttons whose function changes depending on state—while key functions are escalated to physical buttons. Besides volume, playback, and navigation, there are only two other buttons: Music and Zones. The Music button takes users to the menu where they select music; the Zones button takes them to the menu where they select which room to control. All other controls (shuffle, repeat, music queuing, and so on) are presented on the screen.

As Sonos neared their launch date, they did frequent in-home testing, taking beta units to customers’ houses and observing them. They watched users as they went through the out-of-box-experience, the set-up, and use of the unit. Sonos’ approach represents a departure from the other two philosophies, and I was eager to see how the structure of information would differ among them.

Browsing Music

Before digging into the navigation scheme, I want to set out the underlying conceptual structure for each system, which is the same across all three and resembles that of the iPod. (Squeezebox was around before iPod, and was the first unit to employ this structure.) Songs live in a music library. They are “moved” to a queue of songs to play. Users may move songs one at a time or implicitly by selecting a “natural grouping” of songs—an album or an artist, for example. Conceptually, a music player’s key interaction is moving songs from library to queue. At any given time, users need to know what song is currently playing and what songs will be coming up. They also need to navigate the library to facilitate moving songs to and from their queue.
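This shared model is simple enough to sketch in a few lines of code. The sketch below is a hypothetical illustration of the concept only—not any vendor’s actual implementation—and every class and method name is invented:

```python
# A minimal sketch of the shared conceptual model: songs live in a
# library and are "moved" into a play queue, either one at a time or
# by a natural grouping such as an album or artist. Illustrative only.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Song:
    title: str
    artist: str
    album: str


@dataclass
class Player:
    library: list = field(default_factory=list)
    queue: list = field(default_factory=list)

    def queue_song(self, song: Song) -> None:
        """Move a single song from the library to the queue."""
        self.queue.append(song)

    def queue_group(self, **criteria) -> None:
        """Move a natural grouping (e.g. artist="...") to the queue."""
        self.queue.extend(
            s for s in self.library
            if all(getattr(s, k) == v for k, v in criteria.items())
        )

    def now_playing(self):
        """The song currently coming out of the stereo."""
        return self.queue[0] if self.queue else None

    def up_next(self) -> list:
        """The rest of the queue, in play order."""
        return self.queue[1:]
```

The two IA challenges above map directly onto this model: navigating the library is about finding the right argument to `queue_group`, and knowing what’s playing is about exposing `now_playing` and `up_next` on a tiny display.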

I don’t know if this is the best structure, but it appears to be employed across the board. Even though the underlying structure is consistent, it’s possible for each system to present a different mechanism for navigating the library and moving songs from library to queue. Possible, but unfortunately not true: despite having differing design philosophies, all three devices use nearly identical information architectures, all of which resemble the iPod’s structure. The root menu of each system varies slightly, but one option takes users to a familiar menu:

  • Browse Albums
  • Browse Artists
  • Browse Composers
  • Browse Genres
  • Browse Songs

In Sonos’ system, this menu is called “Music Library”; SoundBridge calls it “Browse.” Selecting any of the options from this menu will take users to an alphabetical list of albums, artists, etc. Each entry represents a group of songs. Users can move the entire group to the play-queue, or can “open” the group to look at individual songs.

Looking at all the songs in a group, users can select a track and play it, add it to the queue, or get more information about it. Specifics vary from system to system. SoundBridge takes you to a list of options, the first of which is “play songs starting with this one,” allowing users to select the group of songs by selecting one song inside it.

When compared directly, the core information architectures of each are virtually indistinguishable. Each album, genre, artist, and composer is a separate category and each track fits into one of each. There are relationships between the categories:

  • Genre → Artist → Album
  • Bluegrass → Del McCoury Band → It’s Just the Night

The problem is that music is much more complicated than this architecture, even if it does account for some of the nuances of music libraries. For example, an artist or album can belong to multiple genres:

  • Folk → Eva Cassidy → Songbird
  • Popular → Eva Cassidy → Songbird

Another problem with the architecture is that artists’ names may be rendered differently, depending on what they’re working on:

  • Bela Fleck & the Flecktones → UFO TOFU
  • Bela Fleck and Edgar Meyer → Music for Two
  • Edgar Meyer/Bela Fleck/Mike Marshall → Uncommon Ritual

Each of these instances of Bela Fleck is rendered differently in the architecture, because the architecture is conceived as a straight hierarchy.

“All the problems with navigation can be traced back to a single central issue: lack of data. Creating more complex structures depends on having more comprehensive information about the music.”

All the problems with navigation can be traced back to a single central issue: lack of data. Creating more complex structures depends on having more comprehensive information about the music. Because the artist is rendered as a simple text field, the systems cannot match up “Bela Fleck & the Flecktones” with “Edgar Meyer/Bela Fleck/Mike Marshall.” Using the systems’ browse features alone, I would not be able to find every track in my library on which Bela Fleck performs. The systems’ search features afford some improvement, but they still depend on having good metadata.
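The Bela Fleck example is easy to make concrete. In the hypothetical sketch below (data structures are invented and band line-ups abbreviated), an exact-match browse over a flat artist string finds nothing, while a per-album set of contributors makes the same question a trivial membership test:

```python
# Sketch: a flat artist string vs. a set of contributors per album.
# Illustrative only; line-ups are deliberately abbreviated.

flat_library = {
    "UFO TOFU": "Bela Fleck & the Flecktones",
    "Music for Two": "Bela Fleck and Edgar Meyer",
    "Uncommon Ritual": "Edgar Meyer/Bela Fleck/Mike Marshall",
}

# Browsing by the exact artist string "Bela Fleck" finds nothing,
# because each collaboration renders his name differently.
by_string = [album for album, artist in flat_library.items()
             if artist == "Bela Fleck"]

# If each album instead carried a set of contributors, the same
# question becomes a simple membership test.
rich_library = {
    "UFO TOFU": {"Bela Fleck"},  # other band members omitted
    "Music for Two": {"Bela Fleck", "Edgar Meyer"},
    "Uncommon Ritual": {"Edgar Meyer", "Bela Fleck", "Mike Marshall"},
}
by_contributor = sorted(album for album, people in rich_library.items()
                        if "Bela Fleck" in people)
```

The richer model is a many-to-many relationship between people and recordings—exactly the data the flat text field throws away.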

Searching Music

The appalling state of music metadata is no secret. Other authors have already explored the limitations of the available metadata with respect to jazz, a genre that “goes beyond the ‘Great Man’ theory and recognizes the influence of side players…” Whether other genres of music have as rich a metadata landscape as jazz is immaterial. Liner notes from any album in any genre hold more information than currently captured in most digital audio systems. All three manufacturers highlighted in this article believe the lack of good metadata is a crisis facing the entire industry. However, they all feel that once the industry cracks the nut, their devices will be prepared to address it.

Search on the Squeezebox and SoundBridge operates as you would expect: select a search field from a menu, enter keywords video-game-hall-of-fame style with the arrow keys, and get a list of results. The extra step of selecting a field (e.g., Search Artists) seems pointless, but SoundBridge engineer Mike Kobb explains:

[I]f I want to find tracks by “Barenaked Ladies”, it’s only a few key presses to choose “Search Artists,” then enter “ba.” The same 2-letter search would find too many items if it were done as a keyword search. I believe making the initial selection and then entering a smaller term is generally quicker than entering enough letters in a keyword search to get a small result set.

This makes sense from a technical point of view: allow people to limit the scope of their search so they don’t need to enter as many letters with the arrow keys. This approach solves one issue with navigation. So long as “bela” appears in the artist field, I can do a search to find all Bela Fleck’s music in my library. On the other hand, entering “be” to see all Bela Fleck tracks seems like an enormous conceptual leap from browsing a library of CDs. In other words, if the task is “get a list of all Bela Fleck’s tracks,” my inclination is to browse by artist—kind of like what I would do in real life.
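Kobb’s point is easy to demonstrate: the same two-letter prefix returns a small result set when scoped to one field, but a larger one when matched against every field. A hypothetical sketch—the tiny library, field names, and functions are all invented for illustration:

```python
# Sketch of scoped prefix search vs. unscoped keyword search.
# The library and field names are illustrative only.

library = [
    {"artist": "Barenaked Ladies", "title": "One Week"},
    {"artist": "The Beatles", "title": "Let It Be"},
    {"artist": "Beck", "title": "Loser"},
    {"artist": "Johnny Cash", "title": "Ballad of a Teenage Queen"},
]


def search_field(tracks, field, prefix):
    """Scoped search: match the prefix against one chosen field."""
    p = prefix.lower()
    return [t for t in tracks if t[field].lower().startswith(p)]


def search_keyword(tracks, prefix):
    """Unscoped search: match the prefix against any word in any field."""
    p = prefix.lower()
    return [t for t in tracks
            if any(word.startswith(p)
                   for value in t.values()
                   for word in value.lower().split())]
```

Here “ba” scoped to artists finds only Barenaked Ladies, while the same two letters as an unscoped keyword also pull in “Ballad of a Teenage Queen”—and on a real library of thousands of tracks, the gap between the two result sets grows accordingly.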

The third device, Sonos, does not offer a search mechanism. They intend to offer it in the future, but provide no rationale as to why it wasn’t included in the initial release.

Knowing Where You Are

Digital music players give us two virtual spaces: the library and the queue. Knowing your “location” in the library is relatively easy because a mental image of the virtual space is readily available. When navigating the library, users are focusing on the task at hand. The use case for the queue is a different story; users put the queue together and leave it to do its thing. Only occasionally does the queue become the focus of attention after the initial set-up. All three units have a default view called “Now Playing,” in which the display shows information about the track that’s currently coming out of the stereo. Usually, that’s the name of the track and the amount of time left on the song.

With their shelf-bound displays, SoundBridge and Squeezebox both give you “one-click” access to the next song. On SoundBridge, simply push the down arrow on the remote and you’ll see what’s next in the queue; keep pushing, and you’ll scroll through the queue. Sonos offers a bit more information, but not much: the “Now Playing” display shows the title of the next song, and the entire queue is just a click away.

When looking at the queue on Sonos, the large, up-close display offers a broader view, providing more context. Think about using a CD: the liner notes show the complete track listing; you can see the whole thing at once and get information like song length. The shelf-bound devices’ displays offer only a limited window into the queue. Sonos’ display offers more because you can see more of the queue, but the experience still isn’t quite the same as looking at a set of liner notes, because it lacks all the other information.

Is it fair to compare the user experiences of digital and analog worlds? Until music players carve out a new set of user behaviors, their designers don’t have much choice. People are used to interacting with their personal music collections in a certain way, and deviating too far may slow the adoption of new technologies.

Supporting User Behaviors

With only a few nit-picky exceptions, the three devices generally do a good job of supporting three basic scenarios:

  • I want to play an album and I know which one.
  • I want to play an album by an artist whose name I know.
  • I want to play a specific song and I know its album/artist/genre.

For end users, these tasks are pretty easy once you get the hang of the IA and the interaction model of the remote control. If you want to create a mix on the fly, though, things can get a little clunky as you run through the last task several times over.

Moving back and forth between your music library and the current queue requires gestures that may be difficult for users to get used to. Also, the idea of a queue is unique to this interaction model. If you’re doing the DJ thing and playing random songs for your friends, you may stack up a bunch of CDs to go through, but the queue is in your head and easily modified.

Each of these scenarios depends on user knowledge. If you know the artist or album, you can easily narrow down the library. Things get difficult when you don’t know the name of the song, or when you know the name of the artist, but not which variation of their name is the correct one.

Browsing is another user behavior that’s been neglected. There’s an aspect to browsing a collection of CDs that’s lost when translated to an iTunes-like environment. People don’t keep their entire music library in their head, and the ability to browse is crucial. Because the browse features on these systems are pre-divided into Track, Artist, Album, and Genre, “browsing” is limited to only text-based information.

Browsing a long list of album names is not the same thing as browsing jewel case spines. Color, typography and organization of the jewel cases give more information than just the album name; I may know that the Yonder Mountain String Band song I want is on their latest album which has a brown spine with orange lettering. The black spine with white lettering is their earlier album. I may not know the names of these albums, just the look of their spines. This free-browsing of a physical CD library is a nut not yet cracked by the industry. To be fair, this is a serious challenge: how do you support existing behaviors when users are used to browsing by more than just the names of albums or artists?

On the other hand, a virtual environment enables behaviors unimaginable in the physical world. Wouldn’t it be great if I could play tracks:

  • Based on how much I listen (or don’t listen) to them
  • Based on how often I play them sequentially
  • That my wife has marked as a favorite
  • That my kids did NOT mark as a favorite
  • Featuring certain kinds of instruments or vocalists
  • That have a special place in music history (like the “definitive” newgrass song)
  • That have been tagged by other listeners with particular keywords
  • I usually play on this day of the week or year
  • That feature a specified combination of musicians

“Virtual spaces with robust metadata models enable the kind of serendipitous browsing you’d find on IMDB, or the “social networking” you’d find on Del.icio.us.”

As online services emerge that compile this and other information, network audio players will need to tap into that metadata to enrich the music playing experience. Virtual spaces with robust metadata models enable the kind of serendipitous browsing you’d find on IMDB, or the “social networking” you’d find on Del.icio.us. Music libraries are ripe for this kind of experience, and the proliferation of these players could be the catalyst to bring about the change.
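Most of the wish list above reduces to simple filters once each track carries richer metadata—play counts, per-listener favorites, free-form tags. A hypothetical sketch, in which every field and value is invented for illustration:

```python
# Sketch: richer per-track metadata turns the wish list into queries.
# All fields and data here are invented for illustration.

tracks = [
    {"title": "Song A", "plays": 42, "favorites": {"wife"},
     "tags": {"newgrass", "live"}},
    {"title": "Song B", "plays": 0, "favorites": set(),
     "tags": {"kids"}},
    {"title": "Song C", "plays": 7, "favorites": {"wife", "kids"},
     "tags": {"newgrass"}},
]

# "Tracks I rarely listen to"
rarely_played = [t["title"] for t in tracks if t["plays"] < 5]

# "Tracks my wife has marked as a favorite"
wife_favorites = [t["title"] for t in tracks if "wife" in t["favorites"]]

# "Tracks other listeners have tagged 'newgrass'"
newgrass = [t["title"] for t in tracks if "newgrass" in t["tags"]]
```

The hard part, as the manufacturers acknowledge, is not the querying—it’s getting this metadata captured and shared in the first place.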

Conclusion

There is something very cool about storing all your music on a single server and being able to play it in any room in the house. Homeowners have an option for whole-house audio that, while still bearing a hefty price tag, doesn’t come close to the cost of “old school” systems. (The cheapest network audio systems are only a few hundred dollars, but you need a unit AND a stereo for each room.) The wireless network is much more appealing than running miles of cable through your walls.

When these manufacturers set out to create a whole-house audio system, each started with a slightly different take on the user interface problem. For Slim Devices, the pioneer, it was whether it could be done at all. The others each chose a different aspect: the remote, multiple zones, the display. The purpose of this article is not to recommend one device over another (there are many more than these three). The point is that none of these three devices demonstrates any innovation in the underlying information architecture.

Network audio technology is faced with a chicken-and-egg situation. Innovative IA in audio devices like these will be limited by the available metadata. At the same time, industry fears of piracy will limit the amount of metadata supplied with the music. Until the adoption of audio devices reaches critical mass, the industry won’t face pressure from consumers to expand the quality of data, but audio device adoption may stall without more innovative navigation methods.


Dan Brown is not the Dan Brown who wrote The Da Vinci Code, but wishes he were.

4 comments

  1. I would agree with nearly all that Dan mentions here.

    However… I would add to the IA challenges:
    A1. How do users find…
    A2. How do users know…
    >> A3. How do users control…

    And to the other challenges:
    B1. Limited display size…
    B2. Limited metadata…
    B3. User’s expectations/experience…
    >> B4. Industry non-cooperation and competition

    ==========
    Several articles could be written by combining the A and B lists.

    Why do so many people listen in shuffle mode? Are they happy there?

    Why are so many iPod users settling for a text-based interface? What other great interfaces have been pushed aside by Apple’s marketing might?

    Why are good metadata libraries that are licensed to online stores and services (e.g. Muze, SavageBeast/Pandora, AMG) not available to individuals?

    Can user-defined and user-activity tagging be combined with a fixed metadata structure in a way that creates powerful and still easy-to-use navigation?

    This is a deep topic. More should be written about it in collaboration with Music Cognitive Psychology folk.

    – Mike

  2. A nit to pick, but I must:

    iPod/iTunes expressly does NOT use a queue for playback (setting aside playlists/on the go).

    In regular playback situations, the view is the queue – once you start playing a song, playback continues down whatever list of songs was in view when playback was started. That’s why iTunes has the little arrow in the information window to return you to the view that is feeding the current playback.

    The iTunes paradigm needs some support on the iPod, thus the addition of the “Now Playing” menu item. However, the Now Playing screen does not replace the arrow widget described above and does not allow the user to see the list of songs that are playing. To see that list they must follow the browse path that kicked off playback in the first place.

    This approach differs from many other applications and devices that use a queue as a place where songs must go to be played. After songs are placed in the queue they can be saved as a playlist etc.

    I’m still not sure which is the best approach – the time I’ve spent in the lab looking at how users approach these models yields little sense that one is better than the other. Newbies tend to do well with either while more habituated users prefer what they were exposed to first.

    In general, it seems to me that the queue system is more powerful but requires some discovery while the WYSIWYH (what you see is what you hear) model is more obvious, but dead ends the user in certain tasks.

    Mr. Constantiono is entirely accurate that there is much lurking below the surface that should be more deeply explored. I do have a couple thoughts on the popularity of shuffle mode though:

    1) It’s simple and requires very little thought. It also yields unexpected juxtapositions of songs that can be quite delightful.

    2) It’s a great way to explore all the songs you “borrowed” from your friend but have no idea what they are or whether you’ll like them.

    – Jon

  3. Another consideration on this topic is that different genres may imply different browsing strategies. The main organizing concept in classical music is the ‘work’ (i.e., the opus, the piece: a piano trio, a symphony, a concerto, etc.). And the work is subdivided into individual movements, which correspond to tracks. So in classical music there is an extra layer in the hierarchy that doesn’t typically exist in other music.

    When I’m browsing my classical music I may go COMPOSER (e.g., Beethoven) > WORK (e.g., “Symphony No. 9 in D minor, Op. 125”) > MOVEMENT (“2nd movement – Molto vivace”)

    When I’m browsing popular music I go ARTIST (e.g., “Moby”) > SONG (e.g., “South Side”)

    But in jazz I might focus on certain classic albums. e.g. ARTIST (“Miles Davis”) > ALBUM (e.g. “Sketches of Spain”) and play the whole thing.

    Any good IA for music has to account for different genres having different organizational and searching paradigms.

Comments are closed.