Building a Data-Backed Persona

Incorporating the voice of the user into user experience design by using personas in the design process is no longer a new practice. Everyone is doing it these days, and with good reason. Using personas helps focus the design team’s attention and efforts on the needs and challenges of realistic users, which in turn helps the team develop a more usable finished design. While completely imaginary personas will do, it seems only logical that personas based on real user data will do better. Web analytics can provide a helpful starting point for generating data-backed personas; this article presents an informal 5-step process for building a “persona of the people.”

In practice, outcomes suggest that designing with any personas is better than designing with none, even if the personas are entirely fictitious. Better still are personas based on real user data. Reports and case studies that support this approach typically draw that data from customer service call centers, user surveys, and interviews. It’s nice work if you can get it, but not all design projects have all (or even any!) of these rich and varied user data sources available.

However, more and more sites are now collecting web analytic data using vendor solutions or free options such as Google Analytics. Web analytics provides a rich source of user data, unique among the forms of user data that are used to evaluate websites, in that it represents the users in their native habitat of use. Despite some drawbacks to using web analytics that are inherent to the technology and data collection methods, the information it provides can be very useful for informing design.

Google Analytics is readily accessible and offers great service for the price, so for the sake of example, the methods described here will refer to specific reports in Google Analytics. Any web analytics solution will provide basic reporting similar to Google Analytics, give or take a few reports, so using a different tool will just require you to determine which reports will provide data equivalent to the reports mentioned here.

To illustrate the process, an example persona design scenario is included in the description for each of the five steps:

Kate is an independent web design contractor who is redesigning the website of a nonprofit professional theater company. She has hardly any budget, plenty of content, and many audiences to consider. The theater’s website serves numerous functions: it advertises the current and upcoming plays for patrons; provides patrons information about ticketing and the live theater experience; announces auditions; specifies playwright manuscript and design portfolio requirements for theater professionals; recruits theater intern staff; serves as the central repository of collected theater history in the form of past play archives and press releases; advertises classes and outreach activities; and attempts to develop a donor base as well. As she gathers requirements, Kate decides to use the theater’s new Google Analytics account as a data source for building personas.

Step One: Collect Data

After Google Analytics has been installed on a site, you must wait for data to accumulate. Sometimes you will have the good fortune to start a project that has already been collecting data for months or years, but when this isn’t the case, try to get as much data as you can before extracting the reports you will use to build personas. Ideally, you want to have enough data for reporting to have statistical power, but not all sites generate this level of traffic. As a rule of thumb, less than two weeks of data is not sufficient for any meaningful analysis. One to three months of the most recent data is much more appropriate.

If possible, set up two profiles: one filtered to include only new visitors and one filtered to include only returning visitors. While some Google Analytics reports do allow segmentation, profiles filtered on new versus returning visitor status give you the best access to the full array of reports for each visitor segment. If this setup can be arranged early in data collection, you can later draw on the new-visitor profile to determine the characteristics of your personas who are new visitors, and on the returning-visitor profile for those who are returning visitors.
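In Google Analytics itself, this segmentation is configured with profile filters in the admin screens, not with code. The sketch below only illustrates what two filtered profiles amount to, assuming you had raw visit records in hand; the record fields, the `"new"`/`"returning"` labels, and the sample data are all hypothetical, not a Google Analytics export format.

```python
from collections import defaultdict

# Hypothetical visit records; field names and values are illustrative assumptions.
visits = [
    {"visitor_type": "new", "city": "City A", "browser_os": "Firefox / Windows"},
    {"visitor_type": "returning", "city": "City B", "browser_os": "Internet Explorer / Windows"},
    {"visitor_type": "new", "city": "New York", "browser_os": "Safari / Macintosh"},
]

def segment_by_visitor_type(records):
    """Group visit records by visitor type, like one filtered profile per segment."""
    segments = defaultdict(list)
    for record in records:
        segments[record["visitor_type"]].append(record)
    return dict(segments)

segments = segment_by_visitor_type(visits)
for name, rows in segments.items():
    print(name, len(rows))  # visits per segment
```

Every report you later pull from a segment (browsers, cities, referrers) is then computed over that segment's visits only, which is why the article recommends checking the results against the unfiltered profile.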

Kate has been given administrator privileges in the theater’s Google Analytics account for the duration of her contract. The theater has just one profile that includes all site traffic, so she starts off by making two new profiles with filters to include new visitors in one profile and returning visitors in the other. Kate knows that she needs a decent sample of site data, so she monitors the profiles weekly to make sure that the data is accumulating. She starts designing her personas using the existing Google Analytics profile (all visitors), and checks back later on the custom profiles to see if the segmented data can provide any new insights to add to her personas.

Step Two: Determine How Many Personas to Use

Next, determine how many personas to use: generally no fewer than three and rarely more than seven or eight. This gives you the number of blank slates across which to proportionately distribute the user characteristics that you extract from Google Analytics reports. If there are four personas, each will be assigned the characteristics of 25% of the site audience in each report; if five personas, each represents 20% of the site audience. Although you’re working with statistics, you don’t have to be exacting in proportionately representing user segments; sometimes it is very important, for business reasons, to strongly represent a small user segment.
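Splitting a fixed number of personas in proportion to a report’s rows is a small apportionment problem. One way to do it, offered here as a sketch and not as part of the original method, is the largest-remainder (Hamilton) rule: give each row the whole number of personas its share earns, then hand out any leftover personas to the rows with the largest fractional remainders. The function name and inputs are hypothetical; the inputs are visit counts per report value.

```python
import math
from fractions import Fraction

def allocate_personas(visits_by_value, n_personas):
    """Split `n_personas` across report values in proportion to their visits,
    using the largest-remainder (Hamilton) rule so the counts sum exactly."""
    total = sum(visits_by_value.values())
    # Exact quotas, e.g. 600 of 1000 visits with 4 personas -> 12/5 (2.4).
    quotas = {value: Fraction(visits * n_personas, total)
              for value, visits in visits_by_value.items()}
    counts = {value: math.floor(q) for value, q in quotas.items()}
    leftover = n_personas - sum(counts.values())
    # Give the remaining personas to the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda v: quotas[v] - counts[v], reverse=True)
    for value in by_remainder[:leftover]:
        counts[value] += 1
    return counts

print(allocate_personas({"new": 75, "returning": 25}, 3))
# -> {'new': 2, 'returning': 1}
```

With 75% new visitors and three personas, this yields two new-visitor personas and one returning-visitor persona. As the paragraph above notes, the result is a starting point; you can still deliberately over-represent a small segment for business reasons.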

After thinking carefully about the many functions the site has to serve, Kate looks at the Top Content report in Google Analytics to see which pages get the most traffic. She notices that most of the top pages are related to current shows, tickets and directions, and decides that she will have at least one persona represent a first-time patron who plans to travel from out of town. The other popular pages include the “About Us,” “People,” and “Classes” pages; “Auditions” is a little further down the list, but well above “Support Us.” Kate determines that she will create another persona to represent people interested in joining the theater company. Kate knows that fund development is important to the theater, but it doesn’t appear to be all that important to the website audience, so she decides to create another patron persona who has attended several plays and is interested in becoming a donor. She feels that these three roles can represent the audience the theater is most interested in reaching, and starts creating a persona document for each of them. She names her personas: Regina is the first-time out-of-town patron, Monica is the would-be theater participant, and Rex is the returning patron.

Step Three: Gather Your Reports

After allowing some data to accumulate, the next step is to acquire the Google Analytics reports, whether you’re interacting directly with the application yourself or someone else is providing you with reports. If you are not the person extracting data, make sure that you receive the PDF exports of reports, as these contain summary data that is not present in some of the other export formats. Whether or not you have profiles that are filtered on new versus returning visitor segments, you will be interested in the same handful of reports:

  • Visitors Overview Report. In one convenient dashboard-style screen, you can get the percentage of “new visits,” or visits by new visitors, and a snapshot of other visitor characteristics.
  • Browsers and OS Report. While you can look at browsers and operating systems separately in other individual reports, it usually makes more sense to look at them in combination in the Browsers and OS Report. Typically only a handful of browser and operating system combinations are required to represent well over 90% of the site’s visitors.
  • Map Overlay Report. To use this report, which provides a great deal of detail on the geographic origins of site visits, you will need to do just a little bit of math. Divide the number of visits from the top country or region (whichever is of greater use to you) by the total number of visits to get the percentage of visits from that geographical area. This allows you to determine the proportions of domestic and international visits. For the visits from your country, you will want to drill down to the city level and select a few cities from the top ranks of the list, keeping in mind that big cities will statistically generate more traffic than small ones. For your international visitors, choose from the top cities in the countries that bring the most visits.
  • Keywords Report. This report shows the queries that bring users to your site. When you look at the search engine query terms, ask yourself, “What are our users looking for? What type of language do they use when searches bring them to our site?” This gives you a starting point to think about user motivations and goals.
  • Referring Sites Report. Like the Keywords Report, the Referring Sites Report gives you an opportunity to look for answers to questions like, “Where do our users come from? Are they reaching our site from search engines, other sites, or just appearing directly with no referrer, as returning visitors are more likely to do?”
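The two calculations from the Map Overlay and Browsers and OS reports can be sketched as follows: the percentage of visits from a geographic area, and the smallest set of top browser/OS combinations that together cover a target share of visits (90% here, echoing the rule of thumb above). The numbers are made up for illustration, and the function names are hypothetical; in practice you would copy the visit counts from the exported reports.

```python
def percent_of_visits(part, total):
    """Share of all visits, in percent (e.g. top country's visits / total visits)."""
    return 100.0 * part / total

def rows_covering(rows, threshold=0.90):
    """Return the top (name, visits) rows, by visits, needed to reach `threshold` of all visits."""
    total = sum(visits for _, visits in rows)
    chosen, running = [], 0
    for name, visits in sorted(rows, key=lambda row: row[1], reverse=True):
        chosen.append(name)
        running += visits
        if running >= threshold * total:
            break
    return chosen

# Made-up numbers for illustration only.
total_visits = 1000
domestic_visits = 820
print(f"Domestic share: {percent_of_visits(domestic_visits, total_visits):.1f}%")

browser_os = [
    ("Internet Explorer / Windows", 620),
    ("Firefox / Windows", 210),
    ("Safari / Macintosh", 90),
    ("Firefox / Macintosh", 50),
    ("Chrome / Windows", 30),
]
print(rows_covering(browser_os))
```

With these sample figures, domestic visits are 82.0% of the total, and three browser/OS combinations are enough to pass 90% of visits, so those three are the natural candidates to assign to personas.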

If you have the segmented profiles set up, extract the same reports from both of these profiles, and get the Visitors Overview report from an unfiltered profile.

Kate starts looking for report data to build her personas. She has already generated user goals for her three personas, but the goals are pretty general, so she hopes to find more specific characteristics that are based on the real user population. Kate consults the Visitors Overview report and finds that about 75% of the site’s visits in the last month were from new visitors; she decides that the Regina and Monica personas will be new visitors to the site and quickly brainstorms a few questions that she thinks they might have, based on their goals, that motivate their site visits. The last persona, Rex, will be a returning visitor.

Kate knows that the overwhelming majority of patrons are local because it is a regional theater company. She checks the Map Overlay report and sees that at the state level, about half of the visitors come from Michigan, where the theater is located. She decides that Monica comes from another state, and picks New York because it’s in second place behind Michigan, and because of the level of activity of the theater community in New York City. Kate drills down to view the traffic from Michigan, and chooses the top city for Rex’s home; the city is near the theater, so this makes intuitive sense. For Regina, who is planning to travel a little further, she selects the #4 city, which is about an hour away, and is a much bigger city. The visitors from that city have longer visits and a lower bounce rate, so she feels these characteristics would match well with Regina’s goal of planning an out-of-town visit to the theater. Coming from that city, she will also want to have dinner and stay the night at a local bed-and-breakfast, so Kate jots down these additional goals for Regina.

Since two of her personas are new visitors, Kate looks up the Traffic Sources Overview and then the Referring Sites and Keywords reports. There’s a lot of search engine referral traffic, and some strong referrers among regional event listings sites. She decides that Regina got to the site from an event listings site that refers a lot of traffic, and that Monica arrived from a Google search on the phrase, “auditions in Michigan.” Kate thinks that a logical reason Monica would be searching for auditions in Michigan is because she’s planning to move there from New York, so Kate adds this detail to Monica’s persona.

Step Four: Fill in the Blanks

The next step is to “fill in the blanks” from the report data. Make a template for each persona, and first fill in whether they are a new or returning visitor. If you have segmented profiles on new versus returning visitor status, draw the remaining characteristics of your new visitors evenly from the new visitors profile, and likewise for the returning visitors. When you have distributed the other statistics (browser, operating system, and geographical location) among your persona templates, review them against the unfiltered “all visitors” profile as a reality check, to make sure you have not unintentionally over-represented a user characteristic, which is one hazard of using segmented data. If you have no preconceptions about user goals, you can distribute the report characteristics randomly at this point, as there is not necessarily much meaningful interplay between the statistics for new/returning status, geographic location, and browser/OS. Alternatively, using a goal-oriented approach as in the example, you can select persona characteristics from the user data that make sense with the goals you have established.
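The reality check described above can be roughed out as a comparison between how often each value appears among your persona templates and that value’s share of visits in the unfiltered profile. This is a hypothetical sketch: the persona dictionaries, the shares, and the `tolerance` are illustrative assumptions. Note that with only three to eight personas, each persona accounts for a sizeable step (1/3 to 1/8) of the total, so the tolerance should not be set much tighter than that step.

```python
from collections import Counter

def representation_gaps(personas, attribute, overall_shares, tolerance=0.25):
    """Flag values of `attribute` whose share among personas differs from
    their share of visits in the unfiltered profile by more than `tolerance`."""
    counts = Counter(persona[attribute] for persona in personas)
    n = len(personas)
    gaps = {}
    for value in set(counts) | set(overall_shares):
        difference = counts[value] / n - overall_shares.get(value, 0.0)
        if abs(difference) > tolerance:
            gaps[value] = round(difference, 2)  # positive = over-represented
    return gaps

# Hypothetical persona templates and "all visitors" shares.
personas = [
    {"name": "Persona A", "visitor_type": "new", "browser_os": "Internet Explorer / Windows"},
    {"name": "Persona B", "visitor_type": "new", "browser_os": "Safari / Macintosh"},
    {"name": "Persona C", "visitor_type": "returning", "browser_os": "Safari / Macintosh"},
]
print(representation_gaps(personas, "visitor_type", {"new": 0.75, "returning": 0.25}))
print(representation_gaps(personas, "browser_os", {
    "Internet Explorer / Windows": 0.62, "Firefox / Windows": 0.21, "Safari / Macintosh": 0.09}))
```

In this made-up example, the new/returning split passes, but Safari on Macintosh turns up in two of three personas while accounting for only 9% of visits, so it is flagged as over-represented (and Internet Explorer on Windows as under-represented). A flag is a prompt to reconsider, not an error: it may be a deliberate business choice, as Step Two allows.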

Kate took a goal-oriented approach to building her personas, so she has already assigned the report data to the personas. She fills in her usual persona description template with the notes she made while looking at reports, and adds an operating system and browser to each persona based on the Browsers and OS report. Kate then starts drilling down into the Google Analytics reports’ segmentation to add more detail. She clicks on Rex’s city in the Map Overlay to check the average visit length, bounce rate, and number of pageviews per visit, which she uses to help her think about which pages Rex would be looking at, given his goals and those averages. Visits from Regina’s city are a little longer, so Kate considers what pages might show up, and checks the event listings site that referred Regina’s visit to find out what Regina might already know before visiting the theater’s site. Kate also checks the referrers and keywords for visits from NYC and verifies that they contain some phrases similar to the one she chose for Monica.

Step Five: Bring the Personas to Life

The fifth and final step is to breathe life into these rough skeletons of personas. This is the familiar practice of generating the rest of the fictitious biography of the user, the detailed picture of who that person is and what motivates her or him, and so on. Let your creativity take over and build off the initial characteristics from the web analytics data to create a coherent persona. For example, the assigned browsers and operating systems should guide the determination of the computer makes and models that your personas use. Use the new or returning visitor status to assign the personas a level of comfort with using your site and their motivations for the site visits. The geographic location determined from the user data can help generate appropriate user goals and challenges, as well as occupations and hobbies, which may differ for domestic and international users. The reports on Keywords and Referring Sites offer insight on visitors’ interests and motivations, albeit slightly abstracted, and are a good starter for writing usage scenarios.

Kate spends some more time fleshing out her personas, and eventually decides that she needs more information about Rex, the returning patron and would-be donor. She asks the theater for some information from their patron database about how often regular patrons from Rex’s city visit the theater. Kate also interviews the company’s Development Director to gain more perspective on the characteristics of the theater’s existing donors from the local area. After learning more about the types of donors that the theater attracts and the general giving patterns they have, Kate feels that Rex is a good representation of the kind of potential donor who would visit the theater’s website repeatedly, and adds in some additional details based on her interview with the Development Director.

If you have other sources of user data, this is a great time to work them in. Survey data can often provide useful demographics that web analytics cannot, such as users’ age, sex, and education level. Open-ended answers from surveys, interviews, and focus groups are great sources of inspiration for filling in the details that make personas come to life. The Google Analytics Keywords report can sometimes provide the very questions that bring users to your site, and where better to answer them than in the design process?

Even when there is relatively little user data available to aid in the process of persona development, leveraging the resources at hand creates a stronger design tool. The 5-step process presented here aims to provide a starting point for developing personas using web analytic user data, rather than relying solely on assumption or imagination. An evidence-based approach like this one can lend structure and credibility to using personas and scenarios in the design process. At the same time, user data and statistics must be creatively synthesized to produce a useful representation, and imagination is always required to transform a user profile into a persona.


  1. Interesting. This is the first time I’ve seen a description of personas being constructed from anonymous web analytics data. I’m a bit unsure how defensible this approach is though. For one thing, basing personas on this data implies there is nothing wrong with the design of the site as it is. For example, the data might show 60% of people going to the “about us” page, but that doesn’t tell you what their intention was, nor whether they got the information they wanted by going there. Would there be a chance that you would create a persona who goes to the “about us” page for a reason completely different from the reality?

    Nice touch in using a self-referential persona though (somebody using GA). It fried my brains a bit at first, but it’s actually quite clever.

  2. Interesting concept Andrea, whilst I think the sentiment is good, the application of this would only work in situations where there is no budget for user surveys to find out why people are visiting the “about us” page. Also nice use of integrating what little qualitative data there is at hand to manipulate the personas – I’d recommend this approach to someone who doesn’t have the research capabilities, time or budget to do the online surveys.

    Although it does bring me on nicely to an approach we adopted recently. Whilst our research team went away and conducted online surveys and interviews to work up the personas, we also went into overdrive and used some previous research to come up with some user behaviours. We then cross referenced the behaviours with the personas to show how these ‘people’ were interacting with the site and how they were getting there. This helped give a bit of context to the Business who couldn’t quite work out how we were going to link a persona to the nitty gritty of x million users having various different touch points with the site.

  3. Alec wrote: “I’d recommend this approach to someone who doesn’t have the research capabilities, time or budget to do the online surveys.”

    In other words, nearly everyone, including probably nearly all readers of this article. It’s interesting to read this along with the discussion going on at about personas the last couple days ( It’s so rare to work on a project where there’s time, budget, or willingness to do any qualitative user research.

    One of the key things about personas, I think, is that personas aren’t primarily a research technique, but a communication tool. So if you’re “only” able to synthesize assumptions, folklore, myths, and a little data about your users into a format that anyone can read and get something out of, you’re still making communication happen better.

  4. As a Web Analytics person, I’m all for using data as much as possible. I love the data.

    But not in personas. Generally, there are two groups of people who should not make guesses about user motivations and needs: Clients and Web professionals. Clients are too embedded in their business and all too easily absorb the culture and organizational opinions about what their users need. Generally, unless they’ve been doing a lot of research, these opinions are just wrong. Web professionals shouldn’t guess about what non-professionals think about a site. We are *far* too embedded in our own culture and milieu to be able to guess what someone who does not use the web regularly thinks or wants. And we also develop institutional knowledge and biases of our own.

    We always have to do research. And research does not have to be expensive. If you’re a little guy, find six people who you don’t know very well and ask them. Paper prototype, show them the site, go down to Starbucks and ask a stranger what they think. It won’t be statistically valid, but as anyone in the practice who has done research knows, you get good data from only a few people. Sampling more than ten is a waste of time.

    Our training as professionals makes us great at coming up with solutions, but it makes us less able to understand the customer. Personas are there to represent the customer: not a demographic, not a perfect model of a perfect customer. Web Analytics data answers “what,” not “why.” Personas are about the why.

  5. Thanks for the comments, folks!

    Jonathan, the technique doesn’t make assumptions about the rightness or wrongness of the site’s current instantiation. In fact, I would generally assume it’s not optimal! This technique is less about the behavioral end than most of us would like, and that’s largely because of the tool’s limited robustness. I would not recommend using Google Analytics to dig into user goals; you can really do some wonders toward that end with the more fully-featured vendor solutions, but for this article I was specifically constraining the technique to a widely accessible tool that pretty much everyone can use. I wouldn’t speculate on why people were viewing the “About Us” page because there are lots of possible reasons for that one, but with something like “Classes” or “Auditions,” you can make more reasonable assumptions about user interests, if not goals. The page popularity statistics are among the least solid to interpret for this purpose, which is why I emphasize taking advantage of more directly interpretable information like geographic locations, browser/OS, and new/returning visitor status. This is hard data about real users and I think there’s a lot of value to be had from taking advantage of every data source we can get.

    Alec, I love the approach you describe! I have made use of web analytics to extract user behavior “in the wild” for similar uses, and I think it’s a very powerful approach. The main difference is the source of the user behavior data, and every data source has its advantages and disadvantages. Again, for digging into the behavioral aspects, I don’t think Google Analytics is a strong enough tool. To use the more sophisticated web analytics applications for this purpose, it’s important to work with a web analytics professional for validity reasons – they know the strengths and weaknesses of the data collection methods and analysis tools, which is important for interpreting the data.

    And you’re right – this was specifically written for all the folks who don’t have access to the preferred methods of collecting user data. Even when you have all of those user research data streams available, I think there’s value in bringing in the web analytic data. User interviews and surveys provide great information but there are notorious problems with self-reporting, so it may be useful to verify self-reported behavior against actual usage. The research world is quite fond of triangulating data, but I haven’t seen much emphasis on this type of validation for UX design.

    Alec, I think you’ve hit the nail on the head by saying that personas are communication tools. What the persona communicates will depend in part on what you put into it; my opinion is that a greater variety of sources for user data will help you generate a stronger persona. This technique is simply a way to use what you have, even when you haven’t got much to use, which is often the case – as you point out, it can be really hard to get support for qualitative data collection.

  6. Mark, your comments are very astute. You said, “Web Analytics data answers ‘what,’ not ‘why.’ Personas are about the why.” I agree with this – and am suggesting using the “what” to contribute to thinking about “why.” I think the part where I described a way to think about user goals as a starting point may be confusing the point here. Is that the main point of contention? For the sake of example, I didn’t want to assume that everyone can start with research-based user goals. In practice, I think it’s better to have the user goals determined before you start looking at web analytic data.

  7. I think this article is absolutely brilliant.

    I’ve been struggling with personas for a long, long time. Not in terms of creating them or leveraging them in my work as an information architect, but with integrating them as a communication tool amongst teams.

    Like all things with political overtones and buy-in aspects, personas have problems. First, where they originate, and second, how they’re written. Creative departments I’ve worked with in the past would discount personas altogether because of who originally crafted them (this includes persona creation by committee, where egos tend to dominate the direction of the initiative). There are usually conflicts on how any qualitative information can be meaningful to the business, much less start a common dialog about user goals, when departmental goals around results take precedence.

    However, analytics and hard numbers have ties to revenue and profits, especially with companies that rely on ad-serving plays to illustrate market share (like my current employer). Analytics touches marketing, sales, execs, and project teams, and it’s usually indicated as a success metric in requirements documents (e.g. KPIs, % uplift from baseline, etc.). Thus, if personas can be derived from this data at its base via the suggestions in this article, I think you have a great blueprint to work from.

    I think with Step 5, I wanted a bit more detail on techniques or methods (even more examples) to tie in qualitative data (if available) into the persona mix. Specifically, trends between identified user types based on demographics that could tie into analytic data. If the parallels can be identified and matched, you have some good gateways to work on the ‘why’ as Andrew put it.

    Thank you very much for posting this article. I know I’m coming across as a fan boy, but I can’t stress how long it’s been since I’ve seen something compelling about persona creation enough to remark on it.

  8. Whoops. I posted my comment too soon.
    The second paragraph was missing the following sentence, after “… of the initiative).:
    “Likewise, the common language personas are meant to facilitate between departments as part of the project process is usually overlooked, especially with certain areas of the business in constant ‘firefighting’ mode with too little staff to cover the workload. (new paragraph)”

    Also, I meant ‘behavioral trends’ in that second to last paragraph.
    Hmm, the comments system here needs a 5-minute edit window after posting. 🙂

  9. What about personas from analytics reports based on time periods and considering other business data? I find that filtering, say, a keyword report by time periods and doing some cross referencing of other business trends data can be quite revealing about the “who” and beneficial for doing some analysis to hypothesize about the “why”—whether this data gets used for creating personas or not.

    If you’re lucky enough to be in a place that has the resources, the hardest part is breaking down that wall that exists between web analytics or UX design resources and market research resources.

  10. There have always been three types of personas: actual, factual and fictional. Which you use is tactically determined based on a number of situation-specific factors.

  11. “In practice, I think it’s better to have the user goals determined before you start looking at web analytic data.” I couldn’t agree with you more. As an Analyst, I’m buggered (and not in the good way) without user and business goals.

    Isn’t the real goal of personas to connect us folk with them folk, the users? And inspire (and I think that is the key word) visual designers, interactive designers and team members in general to do our very best job to meet the needs of real people. To that end, personas can’t be ‘numbery’; they have to drive empathy, sympathy and understanding. They can’t be segments or demographic profiles.

    That’s the main reason I think you never read personas that start off “john q smith works at x-company and is a real bastard.” No one would ever care about meeting his needs.

    Note: Robert Williams is right that the analytics data can be a very good source of hypotheses. It’s one of the best sources of hypotheses. But then you’ve got to go out and test them, which circularly takes you back to qualitative material.

  12. Thanks for sharing your story. Inserting visual examples of reports, maps, and documentation that you use would make this an even better article.

    I have seen many efforts with personas generate fantastic results. However, those that struggle often do so for one of two reasons:
    1) They didn’t have enough of the right information
    2) The documentation used to communicate them did not bring things to life at the right level for the audience

    I believe in personas, but would like to see more visual best practices of the documentation and sharing mechanisms used. I feel this would be a tremendous benefit for both the design and research communities at large.