How to Determine When Customer Feedback Is Actionable

One of the riskiest assumptions for any new product or feature is that customers actually want it.

Although product leaders can propose numerous ‘lean’ methodologies to experiment inexpensively with new concepts before fully engineering them, anything short of launching a product or feature and monitoring its performance over time in the market is, by definition, not 100% accurate. That leaves us with a dangerously wide spectrum of user research strategies, and an even wider range of opinions for determining when customer feedback is actionable.

To the dismay of product teams desiring to ‘move fast and break things,’ their counterparts in data science and research advocate a slower, more traditional approach. These proponents of caution often emphasize an evaluation of statistical signals before considering customer insights valid enough to act upon.

This dynamic has meaningful ramifications. For those who care about making data-driven business decisions, the challenge that presents itself is: How do we adhere to rigorous scientific standards in a world that demands adaptability and agility to survive? Having frequently witnessed the back-and-forth between product teams and research groups, it is clear that there is no shortage of misconceptions and miscommunication between the two. Only a thorough analysis of some critical nuances in statistics and product management can help us bridge the gap.

Quantify risk tolerance

You’ve probably been on one end of an argument that cited a “statistically significant” finding to support a course of action. The problem is that statistical significance is often equated to having relevant and substantive results, but neither is necessarily the case.

Simply put, statistical significance exclusively refers to the level of confidence (measured from 0 to 1, or 0% to 100%) you have that the results you obtained from a given experiment are not due to chance. Statistical significance alone tells you nothing about the appropriateness of the confidence level selected nor the importance of the results.

To begin, confidence levels should be context-dependent, and determining the appropriate confidence threshold is an oft-overlooked proposition that can have profound consequences. In statistics, confidence levels are closely linked to two concepts: type I and type II errors.

A type I error, or false-positive, refers to believing that a variable has an effect that it actually doesn’t.

Some industries, like pharmaceuticals and aeronautics, must be exceedingly cautious about false-positives. Medical researchers, for example, cannot afford to mistakenly think a drug has an intended benefit when in reality it does not. Side effects can be lethal, so the FDA's threshold for proof that a drug's health benefits outweigh its known risks is intentionally onerous.

A type II error, or false-negative, has to do with the flip side of the coin: concluding that a variable doesn’t have an effect when it actually does.

Historically, though, statistical significance has primarily focused on avoiding false-positives (even at the cost of missing some likely opportunities), with the default confidence level set at 95% for any finding to be considered actionable. That this value was arbitrarily chosen by scientists speaks more to their comfort with being wrong than to its appropriateness in any given context. Unfortunately, this particular confidence level is used today by the vast majority of research teams at large organizations and remains largely unchallenged in contexts far different from the ones for which it was formulated.

Matrix visualising Type I and Type II errors as described in text.


But confidence levels should be representative of the amount of risk that an organization is willing to take to realize a potential opportunity. There are many reasons for product teams in particular to be more concerned with avoiding false-negatives than false-positives. Mistakenly missing an opportunity due to caution can have a more negative impact than building something no one really wants. Digital product teams don’t share many of the concerns of an aerospace engineering team and therefore need to calculate and quantify their own tolerance for risk.
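To make this tradeoff concrete, here is a minimal simulation sketch (all numbers are hypothetical): it runs many simulated A/B tests in which a real 3-point lift exists, and counts how often that lift is detected at a strict 95% confidence threshold versus a looser 80% threshold. Loosening the threshold catches more real effects, at the cost of more false alarms when no effect exists.

```python
import random
from statistics import NormalDist

random.seed(7)

def two_sided_p(n, conv_a, conv_b):
    """Two-proportion z-test p-value (normal approximation)."""
    p_a, p_b = conv_a / n, conv_b / n
    pooled = (conv_a + conv_b) / (2 * n)
    se = (pooled * (1 - pooled) * 2 / n) ** 0.5
    if se == 0:
        return 1.0
    z = abs(p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(z))

def detection_rate(true_lift, alpha, n=1000, trials=500):
    """Fraction of simulated A/B tests that detect a real lift at the given alpha."""
    hits = 0
    for _ in range(trials):
        conv_a = sum(random.random() < 0.10 for _ in range(n))
        conv_b = sum(random.random() < 0.10 + true_lift for _ in range(n))
        if two_sided_p(n, conv_a, conv_b) < alpha:
            hits += 1
    return hits / trials

# A real 3-point lift exists in every trial; the looser threshold misses it less often.
strict = detection_rate(0.03, alpha=0.05)  # 95% confidence
loose = detection_rate(0.03, alpha=0.20)   # 80% confidence
```

Under these invented conditions, the strict threshold misses the real opportunity in a substantial share of tests that the looser threshold catches; the right choice depends on which error costs your organization more.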

To illustrate the ramifications that confidence levels can have on business decisions, consider this thought exercise. Imagine two companies, one with outrageously profitable 90% margins, and one with painfully narrow 5% margins. Suppose each of these businesses is considering a new line of business.

In the case of the high margin business, the amount of capital they have to risk to pursue the opportunity is dwarfed by the potential reward. If executives get even the weakest indication that the business might work, they should pursue the new business line aggressively. In fact, waiting for perfect information before acting might be the difference between capturing a market and allowing a competitor to get there first.

In the case of the narrow margin business, however, the buffer before going into the red is so small that going after the new business line wouldn’t make sense with anything except the most definitive signal.

Although these two examples are obviously allegorical, they demonstrate the principle at hand. To work together effectively, research analysts and their commercially-driven counterparts should have a conversation about their organization's particular comfort with risk and make statistical decisions accordingly.

Focus on impact

Confidence levels only tell half the story. They don’t address the magnitude to which the results of an experiment are meaningful to your business. Product teams need to combine the detection of an effect (i.e., the likelihood that there is an effect) with the size of that effect (i.e., the potential impact to the business), but this is often forgotten on the quest for the proverbial holy grail of statistical significance.
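A quick sketch with made-up numbers shows why the two must be considered together: at a large enough sample size, even a trivially small lift clears the bar of statistical significance.

```python
from statistics import NormalDist

def two_proportion_z(p1, p2, n):
    """Two-sided p-value for a difference in conversion rates (equal group sizes n)."""
    pooled = (p1 + p2) / 2
    se = (pooled * (1 - pooled) * 2 / n) ** 0.5
    z = abs(p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(z))

# With 500,000 users per arm, a 0.2-point lift (10.0% -> 10.2%) is
# comfortably "statistically significant"...
p_value = two_proportion_z(0.100, 0.102, 500_000)

# ...but the effect itself is tiny: a 2% relative lift that may not
# justify the engineering investment at all.
relative_lift = (0.102 - 0.100) / 0.100
```

The p-value here clears the conventional 0.05 bar, yet nothing about it says the lift is worth building for.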

Many teams mistakenly focus energy and resources on acting upon statistically significant but inconsequential findings. A meta-analysis of hundreds of consumer behavior experiments sought to quantify how seriously effect sizes are considered when evaluating research results. It found that an astonishing three-quarters of the studies didn't even report effect sizes "because of their small values" or because of "a general lack of interest in discovering the extent to which an effect is significant…"

This is troubling, because without considering effect size, there’s virtually no way to determine what opportunities are worth pursuing and in what order. Limited development resources prevent product teams from realistically tackling every single opportunity. Consider for example how the answer to this question, posed by a MECLABS data scientist, changes based on your perspective:

In terms of size, what does a 0.2% difference mean? For, that lift might mean an extra 2,000 sales and be worth a $100,000 investment…For a mom-and-pop Yahoo! store, that increase might just equate to an extra two sales and not be worth a $100 investment.

Unless you’re operating at a Google-esque scale for which an incremental lift in a conversion rate could result in literally millions of dollars in additional revenue, product teams should rely on statistics and research teams to help them prioritize the largest opportunities in front of them.

Sample size constraints

One of the most critical constraints on product teams that want to generate user insights is the ability to source users for experiments. With enough traffic, it’s certainly possible to generate a sample size large enough to pass traditional statistical requirements for a production split test. But it can be difficult to drive enough traffic to new product concepts, and it can also put a brand unnecessarily at risk, especially in heavily regulated industries. For product teams that can’t easily access or run tests in production environments, simulated environments offer a compelling alternative.

Yet simulated environments can leave product teams stuck between a rock and a hard place: they require standing user panels that can get expensive quickly, especially if research teams seek sample sizes in the hundreds or thousands. Strategies like these again overlook important nuances in statistics and place undue hardship on the user insight generation process.

A larger sample does not necessarily mean a better or more insightful sample. The objective of any sample is for it to be representative of the population of interest, so that conclusions about the sample can be extrapolated to the population. It’s assumed that the larger the sample, the more likely it is going to be representative of the population. But that’s not inherently true, especially if the sampling methodology is biased.

Years ago, a client fired an entire research team in the human resources department for making this assumption. The client sought to gather feedback about employee engagement and tasked this research team with distributing a survey to the entire company of more than 20,000 global employees. From a statistical significance standpoint, only 1,000 employees needed to take the survey for the research team to derive defensible insights.

Within hours after sending out the survey on a Tuesday morning, they had collected enough data and closed the survey. The problem was that only employees within a few timezones had completed the questionnaire with a solid third of the company being asleep, and therefore ignored, during collection.

Clearly, a large sample isn't inherently representative of the population. To obtain a representative sample, product teams first need to clearly identify a target persona. This may seem obvious, but it's often not explicitly done, creating quite a bit of miscommunication between researchers and other stakeholders. What one person means by a 'frequent customer' could mean something entirely different to another.

After a persona is clearly identified, there are a few sampling techniques that one can follow, including probability sampling and nonprobability sampling techniques. A carefully-selected sample size of 100 may be considerably more representative of a target population than a thrown-together sample of 2,000.
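A toy simulation (all figures invented, loosely modeled on the survey story above) makes the point: a sample of 2,000 drawn only from one segment of a population misses the true average by far more than a sample of 100 drawn at random from everyone.

```python
import random
from statistics import mean

random.seed(42)

# Hypothetical company of 20,000: two-thirds of employees were awake when the
# survey went out; the asleep third is systematically less engaged.
awake = [random.gauss(7.0, 1.0) for _ in range(13_400)]
asleep = [random.gauss(5.0, 1.0) for _ in range(6_600)]
population = awake + asleep
true_mean = mean(population)

# Large but biased: 2,000 responses, all from the awake segment.
biased_sample = random.sample(awake, 2_000)

# Small but representative: 100 responses drawn at random from everyone.
random_sample = random.sample(population, 100)

biased_error = abs(mean(biased_sample) - true_mean)
random_error = abs(mean(random_sample) - true_mean)
```

The twenty-times-larger sample produces the worse estimate, because no amount of volume fixes a skewed sampling frame.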

Research teams may counter with the need to meet the statistical assumptions required by popular tests such as the t-test or Analysis of Variance (ANOVA). These tests assume normally distributed data, an assumption that becomes easier to satisfy as sample size increases. But statistics has a solution for when this assumption is violated: alternatives such as non-parametric tests work well even for small sample sizes.
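As an illustration, here is a minimal rank-sum (Mann-Whitney) sketch using the normal approximation. It assumes no tied values and uses invented task-completion times; a real analysis would lean on a statistics library, but the point is that ten data points can be enough for a defensible comparison.

```python
from statistics import NormalDist

def rank_sum_p(x, y):
    """Two-sided Mann-Whitney U p-value via the normal approximation.
    Assumes no tied values; adequate for a small-sample sketch."""
    n1, n2 = len(x), len(y)
    ranks = {v: i + 1 for i, v in enumerate(sorted(x + y))}
    u1 = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical task-completion times (seconds) for two design variants.
variant_a = [12.1, 14.3, 15.8, 16.2, 18.9]
variant_b = [21.4, 23.0, 24.7, 26.1, 28.5]
p = rank_sum_p(variant_a, variant_b)
```

With only five participants per variant, the clear separation between the groups still yields a p-value well below 0.05, with no normality assumption required.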

In fact, the strongest argument left in favor of large sample sizes has already been discounted. Statisticians know that the larger the sample size, the easier it is to detect small effect sizes at a statistically significant level (digital product managers and marketers have become soberly aware that even a test comparing two identical versions can find a statistically significant difference between the two). But a focused product development process should be immune to this distraction because small effect sizes are of little concern. Not only that, but large effect sizes are almost as easily discovered in small samples as in large samples.

For example, suppose you want to test ideas to improve a form on your website that currently gets filled out by 10% of visitors. For simplicity’s sake, let’s use a confidence level of 95% to accept any changes. To identify just a 1% absolute increase to 11%, you’d need more than 12,000 users, according to Optimizely’s stats engine formula! If you were looking for a 5% absolute increase, you’d only need 223 users.
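The textbook two-proportion sample-size formula (assuming 80% power, a common default) reproduces the same order-of-magnitude gap, though it will not match Optimizely's sequential stats engine exactly:

```python
from math import ceil, sqrt
from statistics import NormalDist

def users_per_variant(p1, p2, alpha=0.05, power=0.80):
    """Classic two-proportion sample-size formula (users needed per variant)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Detecting 10% -> 11% requires tens of thousands of users per variant...
small_lift = users_per_variant(0.10, 0.11)
# ...while 10% -> 15% requires only a few hundred.
big_lift = users_per_variant(0.10, 0.15)
```

The required sample size scales with the inverse square of the effect size, which is why hunting small lifts is so expensive.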

But depending on what you’re looking for, even that many users may not be needed, especially if conducting qualitative research. When identifying usability problems across your site, leading UX researchers have concluded that “elaborate usability tests are a waste of resources” because the overwhelming majority of usability issues are discovered with just five testers.
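That conclusion comes from the classic Nielsen and Landauer model, in which the share of problems found by n testers is 1 - (1 - L)^n, with L (the chance a single tester uncovers a given problem) averaging about 0.31 across their studies. A few lines of code show how quickly the curve flattens:

```python
def share_of_problems_found(n_testers, l=0.31):
    """Nielsen & Landauer model: share of usability problems found by n testers,
    where l is the probability a single tester uncovers a given problem."""
    return 1 - (1 - l) ** n_testers

# Five testers already uncover the large majority of problems under this
# model; doubling to ten buys surprisingly little additional coverage.
five = share_of_problems_found(5)
ten = share_of_problems_found(10)
```

Under this model, five testers find roughly 85% of problems, and the next five testers add only a sliver more, which is why small, repeated tests beat one large one.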

An emphasis on large sample sizes can be a red herring for product stakeholders. Organizations should not be misled away from the real objective of any sample, which is an accurate representation of the identified, target population. Research teams can help product teams identify necessary sample sizes and appropriate statistical tests to ensure that findings are indeed meaningful and cost-effectively attained.

Expand capacity for learning

It might sound like semantics, but data should not drive decision-making. Insights should. And there can be quite a gap between the two, especially when it comes to user insights.

In a recent talk on the topic of big data, Malcolm Gladwell argued that “data can tell us about the immediate environment of consumer attitudes, but it can’t tell us much about the context in which those attitudes were formed.” Essentially, statistics can be a powerful tool for obtaining and processing data, but it doesn’t have a monopoly on research.

Product teams can become obsessed with their Omniture and Optimizely dashboards, but there’s a lot of rich information that can’t be captured with these tools alone. There is simply no replacement for sitting down and talking with a user or customer. Open-ended feedback in particular can lead to insights that simply cannot be discovered by other means. The focus shouldn’t be on interviewing every single user though, but rather on finding a pattern or theme from the interviews you do conduct.

One of the core principles of the scientific method is the concept of replicability—that the results of any single experiment can be reproduced by another experiment. In product management, the importance of this principle cannot be overstated. You’ll presumably need any data from your research to hold true once you engineer the product or feature and release it to a user base, so reproducibility is an inherent requirement when it comes to collecting and acting on user insights.

We’ve far too often seen a product team wielding a single data point to defend a dubious intuition or pet project. But there are a number of factors that could and almost always do bias the results of a test without any intentional wrongdoing. Mistakenly asking a leading question or sourcing a user panel that doesn’t exactly represent your target customer can skew individual test results.

Similarly, and in digital product management especially, customer perceptions and trends evolve rapidly, further complicating data. Look no further than the handful of mobile operating systems which undergo yearly redesigns and updates, leading to constantly elevated user expectations. It’s perilously easy to imitate Homer Simpson’s lapse in thinking, “This year, I invested in pumpkins. They’ve been going up the whole month of October and I got a feeling they’re going to peak right around January. Then, bang! That’s when I’ll cash in.”

So how can product and research teams safely transition from data to insights? Fortunately, we believe statistics offers insight into the answer.

The central limit theorem is one of the foundational concepts taught in every introductory statistics class. It states that the distribution of averages tends to be Normal even when the distribution of the population from which the samples were taken is decidedly not Normal.

Put as simply as possible, the theorem acknowledges that individual samples will almost invariably be skewed, but offers statisticians a way to combine them to collectively generate valid data. Regardless of how confusing or complex the underlying data may be, by performing relatively simple individual experiments, the culminating result can cut through the noise.
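A short simulation illustrates the theorem: draw from a heavily skewed (exponential) population, average each batch of 30 draws, and the resulting means cluster symmetrically around the true value.

```python
import random
from statistics import mean, stdev

random.seed(0)

# Heavily skewed population: exponential with true mean 1.0 (decidedly not Normal).
def skewed_draw():
    return random.expovariate(1.0)

# Each "experiment" averages 30 draws; run many such experiments.
sample_means = [mean(skewed_draw() for _ in range(30)) for _ in range(2_000)]

# The means cluster tightly around the true mean of 1.0, even though no
# individual draw comes from a Normal distribution.
grand_mean = mean(sample_means)
spread = stdev(sample_means)  # theory predicts about 1 / sqrt(30), i.e. ~0.183
```

Each individual experiment is noisy and skewed, but the collection of experiments converges on the truth, which is the statistical backbone of the iteration argument below.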

This theorem provides a useful analogy for product management. To derive value from individual experiments and customer data points, product teams need to practice substantiation through iteration. Even if the results of any given experiment are skewed or outdated, they can be offset by a robust user research process that incorporates both quantitative and qualitative techniques across a variety of environments. The safeguard against pursuing insignificant findings, if you will, is to be mindful not to consider data to be an insight until a pattern has been rigorously established.

Divide no more

The moral of the story is that the nuances in statistics actually do matter. Dogmatically adopting textbook statistics can stifle an organization’s ability to innovate and operate competitively, but ignoring the value and perspective provided by statistics altogether can be similarly catastrophic. By understanding and appropriately applying the core tenets of statistics, product and research teams can begin with a framework for productive dialog about the risks they’re willing to take, the research methodologies they can efficiently but rigorously conduct, and the customer insights they’ll act upon.

Your Guide to Online Research and Testing Tools

The success of every business depends on how well it meets its customers' needs. To do that, it is important to optimize your offer, your website, and your selling methods so that your customers are satisfied. The fields of online marketing, conversion rate optimization, and user experience design have a wide range of online tools that can guide you through this process smoothly. Many companies use only one or two tools that they are familiar with, but that might not be enough to gather the data necessary for improvement.

To help you better understand when each tool is valuable, I created a framework that can help in your assessment. Once you broaden your horizons, it will be easier to choose the set of tools aligned to your business's needs. The tools can be roughly divided into three basic categories:

  • User testing: Evaluate a product by testing it with users who take the study simultaneously, in their natural context, and on their own device.
  • Customer feedback: Capture customers' expectations, preferences, and aversions directly from a website.
  • Web analytics: Provide detailed statistics about a website’s traffic, traffic sources, and measurement of conversions and sales.

To better understand when to use which tool, it is helpful to use the following criteria:

  • What people say versus what people do… and what they say they do
  • Why versus how much
  • Existing classifications of online tools

Specific services are listed in the latter part of the article to help you get started.

What people say versus what people do… and what they say they do

What people say, what people do, and what they say they do are three entirely different things. People often lack the awareness or knowledge needed to provide accurate information, and anyone with experience in user research or conversion rate optimization has seen firsthand that, more often than not, user statements do not match the behavioral data. People are not always able to fully articulate why they did that thing they just did. That's why it's often valuable to compare information about opinions with information about behavior, as this mix can provide better insights.

You can learn what people do by studying your website from your users' perspective and drawing conclusions from observations of their behavior, such as click tracking or user session recording. However, this approach tests particular theories about people's behavior, and there is always a degree of uncertainty. To validate the data you've gathered, you will sometimes have to go one step further and simply ask your users, which lets you see the whole picture. In other words, you learn what people say by reaching out to your target group directly and asking them questions about your business.

Why versus how much

Some tools are better suited for answering questions about why or how to fix a problem, whereas tools like web analytics do a much better job at answering how many and how much types of questions. Google Analytics tells you the percentage of people who clicked through to a given page, but it doesn't tell you why they did or did not do so. Knowing these differences helps you prioritize certain sets of tools and use them to fix the issues with the biggest impact on your business.

The following chart illustrates how different dimensions affect the types of questions that can be asked:

chart illustrates how different dimensions affect the types of questions that can be asked.

Choosing the right tool—infographics

There are a lot of tools out there these days that do everything from testing information architecture to remote observation. With more coming out every day, it can be really hard to pick the one that will give you the best results for your specific purpose. To alleviate some of the confusion, many experts have tried to classify them according to different criteria. I've included some examples below for your convenience.

Which remote tool should I use? By Stuff On My Wall

A flow chart to evaluate remote tool choices.

Choosing a remote user experience research tool by Nate Bolt

Another chart showing evaluation criteria for remote research tools.

The five categories of remote UX tools by Nate Bolt

Five categories of user research tools.

Four quadrants of the usability testing tools matrix by Craig Tomlin

Usability testing tools arranged in a quadrant chart.

Tool examples

The tools I list below are best suited for web services. The world of mobile applications and services is too vast to be skimmed over and has enough material for a completely different article. The selection is narrowed down so as not to overwhelm you with choice, so worry not.

User testing

User testing and research is vital to creating successful websites, products, and services. Nowadays, using one of the many existing tools and services for user testing is a lot easier than it used to be. The important thing is to find a tool or service that works for your website and then use it to gather real-world data on what works and what does not.

Survey: The most basic form of what people say. Online surveys are often used by companies to gain a better understanding of their customers' motives and opinions. You can ask customers to respond in any way they choose or to select an answer from a limited number of predetermined responses. Feedback straight from your customers is perhaps best used for determining their pain points or anticipating their possible needs (or future trends). Remember, however, that people do not always communicate clearly what issue they are actually facing. Be like Henry Ford: do not give people faster horses when they ask for quicker transportation; invent the car.

SurveyGizmo

Card sorting: This method asks your users to categorize provided items in the way that is most logical to them, or to create their own categories for those items; the two variants are called closed and open card sorting, respectively. The knowledge it yields about your users' mental models will help you rework the information architecture of your site. If you aim to obtain information that sits between "what they do" and "what they say", sorting is your best bet. Be sure to conduct this study with a larger group: one person's mental model might make sense to them yet not be intuitive to others. Focus on the responses that align with each other, as they likely represent the most broadly shared set of categories.


Click testing/automated static/design surveys: This lets you test screenshots of your design, so you can obtain detailed information about your users' expectations of, and reactions to, a website at various stages of development. This enters the territory of gathering data about the actions of your users, so you obtain information about what they do. The study is usually conducted by asking a direct question, such as: "Click on the button which will lead to sign-ups". Remember, however, that click testing alone is not sufficient; you need other tools that cover the "why" in order to fully understand user behavior.

Verify App

5-Second testing/first impression test: Because your testers have only five seconds to view a presented image, they are put under time pressure and must answer questions relying only on the near-subconscious impressions they formed. This enables you to improve your landing pages and calls to action, as users mostly focus on the most eye-catching elements.

Optimal Workshop Chalkmark

Diary studies: An extensive record of the thoughts, feelings, and actions of users who belong to your target market, with all events recorded by the participants at the moment they occur. This provides firsthand insight into your customers' needs by asking them directly about their experiences. It operates in a similar fashion to surveys, though, so remember that your participants will not always clearly convey what they mean.

FocusVision Revelation

Moderated user studies/remote usability testing: The participants of this test are located in their natural environment, so their experiences are more genuine, and the tools and software remove any need for participants and facilitators to be in the same physical location. Putting the study into the context of a natural environment (of whatever group you are studying) gives you insight into unmodified behaviours. Such studies are also a lot cheaper than lab-based versions.


Self-moderated testing: The participants are expected to complete the tasks independently. Afterward, you obtain videos of their test sessions, along with a report describing what problems your users faced and what to do to fix them. The services offering this type of testing usually deliver responses quickly, so if you are in dire need of feedback, this is one possibility.


Automated live testing/remote scenario testing: Very similar to remote testing, yet the amount of information provided is much more extensive and organized. You get effectiveness ratios (success, error, abandonment and timeout), efficiency ratios (time on task and number of clicks), and survey comments as the results.

UX Suite

Tree testing/card-based classification: A technique which removes every distracting element of the website (ads, themes, etc.) and focuses only on a simplified text version. Through this you can evaluate the clarity of your navigation scheme and pinpoint the chokepoints that present problems to users. It is a good method for testing prototypes, or for when you suspect the basic framework of your website is at fault.

Optimal Workshop Treejack

Remote eye tracking/online eye tracking/visual attention tracking: Shows you where people focus their attention on your landing pages, layouts, and branding materials. This can tell you whether users are focused on the page, whether they are reading it or just scanning, how intense their attention is, and what their movement patterns are. However, it cannot tell you whether your users actually see something or not, or why exactly they look at a given part. This can be remedied, for example, with voiceovers, where the participants tell you right away what they feel.

a) Simulated: creates measurement reports that predict what a real person would most likely look at.


b) Behavioral: finds out whether people actually notice conversion-oriented elements of the page and how much attention they pay to them.

Attensee Eyetrack Shop

These are the individual features that are prominent in the listed services. Nowadays, however, there is a trend toward combining various tools so they can be offered by a single website. If you find more than one tool valuable for your business, you can use services such as UsabilityTools or UserZoom.

Customer feedback

Successful business owners know that it’s crucial to take some time to obtain customer feedback. Understanding what your customers think about your products and services will not only help you improve quality, but will also give you insights into what new products and services your customers want. Knowing what your customers think you’re doing right or wrong also lets you make smart decisions about where to focus your energies.

Live chats: An easy-to-understand way of communicating through the website interface in real time. Live chat enables you to provide all the answers your customers could want, and by analyzing their questions and frequently raised issues you can decide what needs improvement. Live chats usually focus on solving an immediate problem, so they tend to be used for smaller issues. The plus is that your client feels acknowledged right away.

Live Person

Insight surveys: These help you understand your customers through targeted website surveys. You can create targeted surveys and prompts by focusing them on variables such as time on page, number of visits, the referring search term, or your own internal data. You can even target custom variables such as the number of items in a shopping cart. They are very specific but operate on the same principle as general surveys, so remember the risk that participants won't always be able to provide you with satisfactory answers.


Feedback forms: A simple website application for receiving feedback from your website visitors. You can create a customized form, copy and paste code into your site's HTML, and start getting feedback. This is a basic tool for collecting feedback from your customers and for receiving and organizing the results. If you want to know the general opinion about your website and the experiences of your visitors (and you want participation to be completely voluntary), then forms are a great option.


Feedback forums: Users enter a forum where they can propose and vote on items which need change or need to be discussed. That information allows you to prioritize issues and decide what needs to be fixed as fast as possible. The forums can also be used for communicating with users; for example, you can inform them that you introduced improvements to the product. Remember, however, that even the most popular issues might actually be the least important for improving your service and/or website; it is up to you to judge.

Get Satisfaction

Online customer communities: Customers engage with you and with each other, peer-to-peer, to solve problems and offer feedback. These web-based gathering places for customers, experts, and partners enable participants to discuss problems, post reviews, brainstorm new product ideas, and engage with one another.


There are also platforms, such as UserEcho or Freshdesk, that merge several of these functionalities. They are an extremely popular solution for clients who prefer to focus on a single service with many features.

Website analytics

Just because analytics provide you with additional data about your site doesn't mean that data is actually valuable to your business. You want to find the errors and holes within your website and fill them with additional functionality for your users and customers. Using the information gathered, you can shape your future decisions in order to improve your service.

Web analytics: All movement of users is recorded and stored. Their privacy is safe, however, as the data gathered is used only for optimization and cannot be used to personally identify anyone. The data can later be used to evaluate and improve your service and website in order to achieve goals such as increasing the number of visitors or sales.

Google Analytics

In-page web analytics: These differ from traditional web analytics in that they focus on users’ movement within a page rather than between pages. These tools are generally used to understand behavior for the purposes of optimizing a website’s usability and conversion.

a) Click tracking: This technique determines and records what users click with their mouse while browsing the website. It draws a map of their movements, allowing you to follow the journey of your user step by step. If there is a problem with the website, this is one method for checking where that problem could have occurred.

Gemius Heatmap

b) Visitor recording/user session replays: Every action and event in a user’s session is recorded and can be replayed as a video.


c) Form testing: This allows you to evaluate the web form and identify areas that need improvement, for example which fields make your visitors leave the website before completing the form.

UsabilityTools Conversion Suite

In a similar fashion to the previous groups, there is also a considerable number of analytics Swiss army knives offering various tools in one place, such as ClickTale, UsabilityTools, or MouseStats.


This is it: the finish line of this guide to online research tools. These tools are extremely valuable assets that can provide important and surprising data. The number of tools available is indeed overwhelming, which is why you need to weigh the factors discussed earlier: what you want to test, why, and with whom. This way you will reach a conclusion about what exactly you need to test in order to improve your service or obtain the information you require. Knowing what you want to do will help you narrow your choices and, as a result, choose the right tool. Hopefully, what you’ve read will help you choose the best usability tools for testing, and you will end up an expert in your research sessions.

Online Surveys On a Shoestring: Tips and Tricks

Design research has always been about qualitative techniques. Increasingly, our clients ask us to add a “quant part” to projects, often without much or any additional budget. Luckily for us, there are plenty of tools available to conduct online surveys, from simple ones like Google Forms and SurveyMonkey to more elaborate ones like Qualtrics and Key Survey.

Whichever tool you choose, there are certain pitfalls in conducting quantitative research on a shoestring budget. Based on our own experience, we’ve compiled a set of tips and tricks to help avoid some common ones, as well as make your online survey more effective.

We’ve organized our thoughts around three survey phases: writing questions, finding respondents, and cleaning up data.

Writing questions

Writing a good questionnaire is both art and science, and we strongly encourage you to learn how to do it. Most of our tips here are relevant to all surveys, but particularly important for the low-budget ones. Having respondents who are compensated only a little, if at all, makes the need for good survey writing practices even more important.

Ask (dis)qualifying questions first

A sacred rule of surveys is to not waste people’s time. If there are terminating criteria, gather those up front and disqualify respondents as quickly as you can if they do not meet the profile. It is also more sensitive to terminate them with a message “Thank you for your time, but we already have enough respondents like you” rather than “Sorry, but you do not qualify for this survey.”

Keep it short

Little compensation means that respondents will drop out at higher rates. Only focus on what is truly important to your research questions. Ask yourself how exactly the information you collect will contribute to your research. If the answer is “not sure,” don’t ask.

For example, it’s common to ask about a level of education or income, but if comparing data across different levels of education or income is not essential to your analysis, don’t waste everyone’s time asking the questions. If your client insists on having “nice to know” answers, insist on allocating more budget to pay the respondents for extra work.

Keep it simple

Keep your target audience in mind and be a normal human being in framing your questions. Your client may insist on slipping in industry jargon and argue that “everyone knows what it is.” It is your job to make the survey speak the language of the respondents, not the client.

For example, in a survey about cameras, we changed the industry term “lifelogging” to a longer, but simpler phrase “capturing daily routines, such as commute, meals, household activities, and social interactions.”

Keep it engaging

People in real life don’t casually say, “I am somewhat satisfied” or “the idea is appealing to me.” To make your survey not only simple but also engaging, consider using more natural language for response choices.

For example, instead of using standard Likert-scale “strongly disagree” to “strongly agree” responses to the statement “This idea appeals to me” in a concept testing survey, we offered a scale “No, thanks” – “Meh” – “It’s okay” – “It’s pretty cool” – “It’s amazing.” We don’t know for sure if our respondents found this approach more engaging (we certainly hope so), but our client showed a deeper emotional response to the results.

Finding respondents

Online survey tools differ in how much help they provide with recruiting respondents, but most common tools will assist in finding the sample you need, if the profile is relatively generic or simple. For true “next to nothing” surveys, we’ve used Amazon Mechanical Turk (mTurk), SurveyMonkey Audience, and our own social networks for recruiting.

Be aware of quality

Cheap recruiting may easily result in low quality data. While low-budget surveys will always be vulnerable to quality concerns, there are mechanisms to ensure that you keep your quality bar high.

First of all, know what motivates your respondents. Amazon mTurk commonly pays $1 for the so-called “Human Intelligence Task” that may include taking an entire survey. In other words, someone is earning as little as $4 an hour if they complete four 15-minute surveys. As such, some mTurk Workers may try to cheat the system and complete multiple surveys for which they may not be qualified.

SurveyMonkey, on the other hand, claims that their Audience service delivers better quality, since the respondents are not motivated by money. Instead of compensating respondents, SurveyMonkey makes a small donation to the charity of their choice, thus lowering the risk of people being motivated to cheat for money.

Use social media

If you don’t need thousands of respondents and your sample is pretty generic, the best resource can be your social network. For surveys with fewer than 300 respondents, we’ve had great success with tapping into our collective social network of Artefact’s members, friends, and family. Write a request and ask your colleagues to post it on their networks. Of course, volunteers still need to match the profile. When we send an announcement, we include a very brief description of who we look for and send volunteers to a qualifying survey. This approach costs little but yields high-quality results.

We don’t pay our social connections for surveys, but many will be motivated to help a friend and will be very excited to hear about the outcomes. Share with them what you can as a “thank you” token.

For example, we used social network recruiting in early stages of Purple development. When we revealed the product months later, we posted a “thank you” link to the article to our social networks. Surprisingly even for us, many remembered the survey they took and were grateful to see the outcomes of their contribution.


Over-recruit

If you are trying to hit a certain sample size for “good” data, you need to over-recruit so you can remove the “bad” data. No survey is perfect and all can benefit from over-recruiting, but it’s almost a must for low-budget surveys. There are no hard rules, but we suggest over-recruiting by at least 20% to hit the sample size you need at the end. Since the whole survey costs you little, over-recruiting will cost little too.
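The arithmetic behind this rule of thumb can be sketched in a few lines of Python. This is purely illustrative; the function name and the 20% discard rate are our own assumptions, not part of any survey tool.

```python
# Rough sizing for over-recruiting: given a target number of clean
# responses and the share of responses you expect to discard during
# data cleanup, how many respondents should you recruit up front?
import math

def recruits_needed(target_clean: int, expected_discard_rate: float) -> int:
    """Number of respondents to recruit so that, after throwing out
    bad data, roughly `target_clean` responses remain."""
    return math.ceil(target_clean / (1 - expected_discard_rate))

# With a 20% discard rate, hitting 300 clean responses means
# recruiting 375 people up front.
print(recruits_needed(300, 0.20))  # -> 375
```

If your cleanup rules are aggressive (duplicate checks, timing checks, dummy questions), it is safer to assume a higher discard rate and recruit accordingly.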

Cleaning up data

Cleaning up your data is another essential step of any survey that is particularly important for the one on a tight budget. A few simple tricks can increase the quality of responses, particularly if you use public recruiting resources. When choosing a survey tool, check what mechanisms are available for you to clean up your data.

Throw out duplicates

As mentioned earlier, some people may be motivated to complete the same survey multiple times and even under multiple profiles. We’ve spotted this when working with mTurk respondents by checking their Worker IDs. We had multiple cases when the same IDs were used to complete a survey multiple times. We ended up throwing away all responses associated with the “faulty IDs” and gained more confidence in our data at the end.
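The duplicate check described above can be sketched as a small Python filter. The `worker_id` field name is an assumption about your export format; adjust it to match your actual data.

```python
# Flag every response whose Worker ID appears more than once, and drop
# them all (not just the extra copies), since any response tied to a
# "faulty ID" is suspect.
from collections import Counter

def drop_duplicate_workers(responses: list[dict]) -> list[dict]:
    counts = Counter(r["worker_id"] for r in responses)
    return [r for r in responses if counts[r["worker_id"]] == 1]

responses = [
    {"worker_id": "A1", "q1": "yes"},
    {"worker_id": "A2", "q1": "no"},
    {"worker_id": "A1", "q1": "no"},   # same Worker ID again
]
print(drop_duplicate_workers(responses))  # only A2's response survives
```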

Check response time

With SurveyMonkey, you can calculate the time spent on the survey from the StartTime and EndTime data. We benchmarked the average completion time by piloting the survey in the office. This benchmark serves as a simple but fairly robust screening mechanism.

If the benchmark time is eight minutes and you have surveys completed in three, you may question how carefully respondents were reading the questions. We flag such outliers as suspect and don’t include them in our analysis.
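A minimal version of this timing check might look like the following. The timestamp format and the 50% cutoff are assumptions for illustration; tune the threshold against your own pilot data.

```python
# Flag any completion faster than some fraction of the piloted
# benchmark time as suspect.
from datetime import datetime

BENCHMARK_MINUTES = 8.0
MIN_FRACTION = 0.5  # completions under half the benchmark are suspect

def is_suspect(start: str, end: str) -> bool:
    fmt = "%Y-%m-%d %H:%M:%S"
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return elapsed.total_seconds() / 60 < BENCHMARK_MINUTES * MIN_FRACTION

print(is_suspect("2015-03-01 10:00:00", "2015-03-01 10:03:00"))  # True: 3 min
print(is_suspect("2015-03-01 10:00:00", "2015-03-01 10:09:00"))  # False: 9 min
```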

Add a dummy question

Dummy questions help filter out respondents who answer survey questions at random. A dummy question requires the respondent to read carefully and then respond. People who click and type at random might answer it correctly, but it is unlikely. If the answer is incorrect, this is another flag we use to mark a respondent’s data as suspect.

Low-budget surveys are challenging, but not necessarily bad, and with a few tricks you can make them much more robust. If they are used as an indicative, rather than definitive, mechanism to supplement other design research activities, they can bring “good enough” insights to a project.

Educate your clients about the pros and cons of low-budget surveys and help them make a decision whether or not they want to invest more to get greater confidence in the quantitative results. Setting these expectations up front is critical for the client, but you never know, it could also be a good tool for negotiating a higher survey budget to begin with!

Creativity Must Guide the Data-Driven Design Process

Collecting data about design is easy in the digital world. We no longer have to conduct in-person experiments to track pedestrians’ behavior in an airport terminal or the movement of eyeballs across a page. New digital technologies allow us to easily measure almost anything, and apps, social media platforms, websites, and email programs come with built-in tools to track data.

And, as of late, data-driven design has become increasingly popular. As a designer, you no longer need to convince your clients of your design’s “elegance,” “simplicity,” or “beauty.” Instead of those subjective measures, you can give them data: click-through and abandonment rates, statistics on the number of installs, retention and referral counts, user paths, cohort analyses, A/B comparisons, and countless other analytical riches.

After you’ve mesmerized your clients with numbers, you can draw a few graphs on a whiteboard and begin claiming causalities. Those bad numbers? They’re showing up because of what you told the client was wrong with the old design. And the good numbers? They’re showing up because of the new and improved design.

But what if it’s not because of the design? What if it’s just a coincidence?

There are two problems with the present trend toward data-driven design: using the wrong data, and using data at the wrong time.

The problem with untested hypotheses

Let’s say you go through a major digital redesign. Shortly after you launch the new look, the number of users hitting the “share” button increases significantly. That’s great news, and you’re ready to celebrate the fact that your new design was such a success.

But what if the new design had nothing to do with it? You’re seeing a clear correlation—two seemingly related events that happened around the same time—but that does not prove that one caused the other.

Steven D. Levitt and Stephen J. Dubner, the authors of “Freakonomics,” have built a media empire on exposing the difference between correlation and causation. My favorite example is their analysis of the “broken windows” campaign carried out by New York City Mayor Rudy Giuliani and Police Commissioner William Bratton. The campaign coincided with a drop in the city’s crime rate. The officials naturally took credit for making the city safer, but Levitt and Dubner make a very strong case that the crime rate declined for reasons other than their campaign.

Raw data doesn’t offer up easy conclusions. Instead, look at your data as a generator of promising hypotheses that must be tested. Is your newly implemented user flow the cause of a spike in conversion rates? It might be, but the only way you’ll know is by conducting an A/B test that isolates that single variable. Otherwise, you’re really just guessing, and all that data you have showing the spike doesn’t change that.
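One way to move from "promising hypothesis" to "tested result" is a two-proportion z-test on the A/B conversion counts. This is a standard-library sketch, not the article's own method, and the conversion numbers below are made up for illustration.

```python
# Back-of-the-envelope significance check for an A/B test:
# a two-sided, two-proportion z-test on conversion counts.
from math import sqrt, erf

def ab_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion
    rates between variations A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)            # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))   # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical test: 4.0% vs. 5.0% conversion on 5,000 visitors each.
z, p = ab_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Note that this only tells you the observed difference is unlikely to be chance; it says nothing about whether the difference matters to the business, which is a separate judgment.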

Data can’t direct innovation

Unfortunately, many designers are relying on data instead of creativity. The problem with using numbers to guide innovation is that users typically don’t know what they want, and no amount of data will tell you what they want. Instead of relying on data from the outset, you have to create something and give it to users before they can discover that they want it.

Steve Jobs was a big advocate of this method. He didn’t design devices and operating systems by polling users or hosting focus groups. He innovated and created, and once users saw what he and his team had produced, they fell in love with a product they hadn’t even known they wanted.

Data won’t tell you what to do during the design process. Innovation and creativity have to happen before data collection, not after. Data is best used for testing and validation.

Product development and design is a cyclical process. During the innovation phase, creativity is often based on user experience and artistry — characteristics that aren’t meant to be quantified on a spreadsheet. Once a product is released, it’s time to start collecting data.

Perhaps the data will reveal a broken step in the user flow. That’s good information because it directs your attention to the problem. But the data won’t tell you how to fix the problem. You have to innovate again, then test to see if you’ve finally fixed what was broken.

Ultimately, data and analysis should be part of the design process. We can’t afford to rely on our instincts alone. And with the wealth of data available in the digital domain, we don’t have to. The unquantifiable riches of the creative process still have to lead design, but applying the right data at the right time is just as important to the future of design.

A Beginner’s Guide to Web Site Optimization—Part 3

Web site optimization has become an essential capability in today’s conversion-driven web teams. In Part 1 of this series, we introduced the topic as well as discussed key goals and philosophies. In Part 2, I presented a detailed and customizable process. In this final article, we’ll cover communication planning and how to select the appropriate team and tools to do the job.


Communication planning

For many organizations, communicating the status of optimization tests is an essential practice. Imagine if your team has just launched an A/B test on your company’s homepage, only to learn that another team had released new code the previous day that changed the homepage design entirely. Or imagine a customer support agent trying to help a customer through the website’s forgot-password flow, unaware that the customer was seeing a different version due to an A/B test that your team was running.

To avoid these types of problems, I recommend a three-step communication program:

  1. Pre-test notification

This is an email announcing that your team has selected a certain page/section of the site to target for its next optimization test and that if anyone has any concerns, they had better voice them immediately, before your team starts working on it. Give folks a day or two to respond. The email should include:

  • Name/brief description of the test
  • Goals
  • Affected pages
  • Expected launch date
  • Link to the task or project plan where others can track the status of the test.

Here’s a sample pre-test notification.

  2. Pre-launch notification

This email is sent out a day or two before a new experiment launches. It includes all of the information from the Pre-Test Notification email, plus:

  • Expected test duration
  • A link to the test’s real-time results dashboard, if your optimization tool creates one, so that interested parties can monitor results as they come in
  • Any other details that you care to mention, such as variations, traffic allocation, etc…

Here’s a sample pre-launch email.

  3. Test results

After the test has run its course and you’ve compiled the results into the Optimization Test Results document, send out a final email to communicate this. If you have a new winner, be sure to brag about it a little in the email. Other details may include:

  • Brief discussion
  • A few specifics, such as conversion rates, traffic and confidence intervals
  • Next steps

Here’s a sample test results email.

Team size and selection

As is true with many things, good people are the most important aspect of a successful optimization program. Find competent people with curious minds who take pride in their work – this will be far more valuable than investment in any optimization tool or adherence to specific processes.

The following are recommendations for organizations of varying team sizes.

One person

It is difficult for one person to perform optimization well unless they are dedicated full-time to the job. If your organization can only cough up one resource, I would select either a web analytics resource with an eye for design, or a data-centric UX designer. For the latter profile, I don’t mean the type of designer who studied fine art and is only comfortable using Photoshop, but rather the type who likes wireframes, has poked around an analytics tool on their own, and is good with numbers. This person will also have to be resourceful and persuasive, since they will almost certainly have to borrow time and collaborate with others to complete the necessary work.

Two to three people

With two or three people, you are starting to get into the comfort zone. To the UX designer and web/data analytics roles, I would add either a visual designer or a front-end developer. Ideally, some of the team members would have multiple or overlapping competencies. The team will probably still have to borrow time from other resources, such as back-end developers and QA.

Five people

A team that is lucky enough to have five dedicated optimization resources has the potential to be completely autonomous. If your organization places such a high value on optimization, it may have also invested accordingly in sophisticated products or strategies for the job, such as complex testing software, data warehouses, etc. If so, then you’ll need folks who are specifically adept at these tools, broadening your potential team to roles such as data engineers, back-end developers, content managers, project managers, or dedicated QA resources. A team of five would ideally have some overlap among the skill sets.

Tool selection

The optimization market is hot and tool selection may seem complicated at first. The good news is that broader interest and increased competition is fueling an all-out arms race towards simpler, more user-friendly interfaces designed for non-technical folks. Data analysis and segmentation features also seem to be evolving rapidly.

My main advice if you’re new to optimization is to start small. Spend a year honing your optimization program and after you’ve proven your value, you can easily graduate to the more sophisticated (and expensive) tools. Possibly by the time you’re ready, your existing tool will have advanced to keep up with your needs. Also realize that many of the cheaper tools can do the job perfectly well for most organizations, and that some organizations with the high-powered tools are not using them to their fullest capabilities.

A somewhat dated Forrester Research report from February 2013 assesses some of the big hitters, but notably absent are Visual Website Optimizer (VWO) and, for the very low end, Google’s free Content Experiments tool. Conversion Rate Experts keeps an up-to-date comparison table listing virtually all of today’s popular testing tools, but it only rates them along a few specific attributes.

I performed my own assessment earlier this year and here is a short list of my favorites:

Visual Website Optimizer (VWO)
Google Content Experiments
Adobe Test & Target

Here are a few factors to consider when deciding on products:

Basic features

Intuitive user interface

Luckily, most tools now have simple, WYSIWYG type of interfaces that allow you to directly manipulate your site content when creating test variations. You can edit text, change styles, move elements around, and save these changes into a new test variation. Some products have better implementations than others, so be sure to try out a few to find the best match for your team.


Targeting

Targeting allows you to specify which site visitors are allowed to see your tests. Almost all tools allow you to target site visitors based on basic attributes that can be inferred from their browser, IP address, or session. These attributes may include operating system, browser type/version, geographical location, day of week, time of day, traffic source (direct vs. organic vs. referral), and first time vs. returning visitor. More advanced tools also allow you to target individuals based on attributes (variables) that you define and programmatically place in your users’ browser sessions, cookies, or URLs. This allows you to start targeting traffic based on your organization’s own customer data. The most advanced tools allow you to import custom data directly into the tool’s database, giving you direct access to these attributes through their user interface, not only for targeting, but also for segmented analysis.

Analysis and reporting

Tools vary widely in their analysis and reporting capabilities, with the more powerful tools generally offering richer segmentation functionality. The simplest tools only allow you to view test results against a single dimension; for example, you can see how your test performed for visitors on mobile vs. desktop systems. The majority of tools now allow you to perform more complicated analyses along multiple dimensions and customized user segments. For example, you might be interested in seeing how your test performed with visitors on mobile platforms, segmented by organic vs. paid vs. direct traffic.

Keep in mind that as your user segments become more specific, your optimization tool must rely on fewer and fewer data points to generate the results for each segment, thereby decreasing your confidence levels.
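This shrinking-confidence effect is easy to see in the margin of error for a conversion rate, which grows as the segment's sample size falls. A quick illustration (95% confidence interval, normal approximation; the numbers are hypothetical):

```python
# Margin of error for an observed conversion rate at a given sample
# size, using the normal approximation for a 95% confidence interval.
from math import sqrt

def margin_of_error(rate: float, n: int, z: float = 1.96) -> float:
    return z * sqrt(rate * (1 - rate) / n)

# The same 5% conversion rate, measured on progressively finer segments:
for n in (10000, 1000, 100):
    moe = margin_of_error(0.05, n)
    print(f"n={n:>5}: 5.0% +/- {moe * 100:.1f} points")
```

At 100 visitors per segment, the uncertainty is nearly as large as the conversion rate itself, which is why highly specific segments need far more traffic before their results are trustworthy.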

Server response time

Optimization tools work by adding a small snippet of code to your pages. When a user visits that page, the code snippet calls a server somewhere that returns instructions on which test variation to display to the user. Long server response times can delay page loading and the display of your variations, thereby affecting your conversions and reporting.

When shopping around, be sure to inquire about how the tool will affect your site’s performance. The more advanced tools are deployed on multiple, load-balanced CDNs and may include contractual service level agreements that guarantee specific server response times.

Customer support

Most optimization vendors provide a combination of online and telephone support, with some of the more expensive solutions offering in-person set-up, onboarding, and training. Be sure to inquire about customer support when determining costs. A trick I’ve used in the past to test a vendor’s level of service is to call their customer support lines at different times of the day and see how quickly they pick up the phone.

Price and cost structure

Your budget may largely determine your optimization tool options as prices vary tremendously, from free (for some entry tools with limited features) to six-figure annual contracts that are negotiated based on website traffic and customer support levels (Maxymiser, Monetate and Test & Target fall into this latter category).

Tools also vary in their pricing model, with some basing costs on the amount of website traffic and others charging more for increased features. My preference is towards the latter model, since the former is sometimes difficult to predict and provides a disincentive to perform more testing.

Advanced features

Integration with CMS/analytics/marketing platforms

If you are married to a single content management system, analytics tool, or marketing platform, be sure to ask your vendor how their tool will integrate with it. Some vendors advertise multi-channel solutions (the ability to leverage your customer profile data to optimize across websites, email, and possibly other channels, such as social media or SMS). Enterprise-level tools seem to be trending towards all-in-one solutions that include components such as CMS, marketing, ecommerce, analytics, and optimization (e.g., Adobe’s Marketing Cloud or Oracle’s Commerce Experience Manager). But for smaller organizations, integration may simply mean the ability to manage the optimization tool’s JavaScript tags (used for implementation) across your site’s different pages. In these situations, basic tools such as Google Tag Manager or WordPress plugins may suffice.

Automated segmentation and targeting

Some of the advanced tools offer automated functionality that tries to analyze your site’s conversions and notify you of high-performing segments. These segments may be defined by any combination of recognizable attributes and thus, far more complicated than your team may be able to define on their own. For example, the tool might define one segment as female users on Windows platform, living in California, and who visited your site within the past 30 days. It might define a dozen or more of these complex micro-segments and even more impressively, allow you to automatically redirect all future traffic to the winning variations specific to each of these segments. If implemented well, this intelligent segmentation has tremendous potential for your overall site conversions. The largest downside is that it usually requires a lot of traffic to make accurate predictions.

Automated segmentation is often an added cost to the base price of the optimization tool. If so, consider asking for a free trial period to evaluate the utility/practicality of this functionality before making the additional investment.

Synchronous vs. asynchronous page loading

Most tools recommend that you implement their services in an asynchronous fashion. In other words, you allow the rest of your page’s HTML to load first before pinging their services and potentially loading one of the test variations that you created. The benefit of this approach is that your users won’t have to wait additional time before your control page starts to render in the browser. The drawback is that once the call to the optimization tool’s services returns, your users may see a page flicker as the control page is replaced by one of your test variations. This flickering effect, along with the additional time it takes to display the test variations, could potentially skew test results or cause surprise or confusion among your users.

In contrast, synchronous page loading, which is recommended by some of the more advanced tools, makes the call to the optimization tool before the rest of the page loads. This ensures that your control group and variations are all displayed in the same relative amount of time, which should allow for more accurate test results. It also eliminates the page flicker effect inherent in asynchronous deployments.


By far, the most difficult step in any web site optimization program is the first one – the simple act of starting. With this in mind, I’ve tried to present a complete and practical guide on how to get you from this first step through to a mature program. Please feel free to send me your comments as well as your own experiences. Happy optimizing.