Online Surveys On a Shoestring: Tips and Tricks

by:   |  Posted on

Design research has always been about qualitative techniques. Increasingly, our clients ask us to add a “quant part” to projects, often with little or no additional budget. Luckily for us, there are plenty of tools available for conducting online surveys, from simple ones like Google Forms and SurveyMonkey to more elaborate ones like Qualtrics and Key Survey.

Whichever tool you choose, conducting quantitative research on a shoestring budget has its pitfalls. Based on our own experience, we’ve compiled a set of tips and tricks to help you avoid some of the common ones and make your online survey more effective.

We’ve organized our thoughts around three survey phases: writing questions, finding respondents, and cleaning up data.

Writing questions

Writing a good questionnaire is both art and science, and we strongly encourage you to learn how to do it. Most of our tips here apply to all surveys, but they are particularly important for low-budget ones: when respondents are paid little or nothing, good survey-writing practices matter even more.

Ask (dis)qualifying questions first

A sacred rule of surveys is to not waste people’s time. If there are terminating criteria, ask about them up front and disqualify respondents who do not meet the profile as quickly as you can. It is also more considerate to end the survey with a message like “Thank you for your time, but we already have enough respondents like you” than with “Sorry, but you do not qualify for this survey.”

Keep it short

Little compensation means that respondents will drop out at higher rates. Focus only on what is truly important to your research questions. Ask yourself exactly how the information you collect will contribute to your research. If the answer is “not sure,” don’t ask.

For example, it’s common to ask about education level or income, but if comparing data across education or income levels is not essential to your analysis, don’t waste everyone’s time with those questions. If your client insists on having “nice to know” answers, insist on more budget to pay respondents for the extra work.

Keep it simple

Keep your target audience in mind and be a normal human being in framing your questions. Your client may insist on slipping in industry jargon and argue that “everyone knows what it is.” It is your job to make the survey speak the language of the respondents, not the client.

For example, in a survey about cameras, we replaced the industry term “lifelogging” with a longer but simpler phrase: “capturing daily routines, such as commute, meals, household activities, and social interactions.”

Keep it engaging

People in real life don’t casually say, “I am somewhat satisfied” or “the idea is appealing to me.” To make your survey not only simple but also engaging, consider using more natural language for response choices.

For example, instead of using standard Likert-scale “strongly disagree” to “strongly agree” responses to the statement “This idea appeals to me” in a concept testing survey, we offered a scale “No, thanks” – “Meh” – “It’s okay” – “It’s pretty cool” – “It’s amazing.” We don’t know for sure if our respondents found this approach more engaging (we certainly hope so), but our client showed a deeper emotional response to the results.

Finding respondents

Online survey tools differ in how much help they provide with recruiting respondents, but most common tools will assist in finding the sample you need, if the profile is relatively generic or simple. For true “next to nothing” surveys, we’ve used Amazon Mechanical Turk (mTurk), SurveyMonkey Audience, and our own social networks for recruiting.

Be aware of quality

Cheap recruiting may easily result in low quality data. While low-budget surveys will always be vulnerable to quality concerns, there are mechanisms to ensure that you keep your quality bar high.

First of all, know what motivates your respondents. Amazon mTurk commonly pays $1 for a so-called “Human Intelligence Task” (HIT), which may consist of taking an entire survey. In other words, someone completing four 15-minute surveys earns as little as $4 an hour. As a result, some mTurk Workers may try to game the system by completing multiple surveys for which they may not be qualified.

SurveyMonkey, on the other hand, claims that its Audience service delivers better quality because respondents are not motivated by money. Instead of paying respondents, SurveyMonkey makes a small donation to a charity of each respondent’s choice, lowering the risk that people will cheat for money.

Use social media

If you don’t need thousands of respondents and your sample is fairly generic, the best resource can be your social network. For surveys with fewer than 300 respondents, we’ve had great success tapping into the collective social network of Artefact’s members, friends, and family. Write a request and ask your colleagues to post it on their networks. Of course, volunteers still need to match the profile. When we send an announcement, we include a very brief description of whom we are looking for and direct volunteers to a qualifying survey. This approach costs little but yields high-quality results.

We don’t pay our social connections for surveys, but many will be motivated to help a friend and will be very excited to hear about the outcomes. Share with them what you can as a “thank you” token.

For example, we used social network recruiting in the early stages of Purple’s development. When we revealed the product months later, we posted a “thank you” link to the article on our social networks. To our surprise, many people remembered the survey they had taken and were grateful to see the outcome of their contribution.

Over-recruit

If you need a certain number of “good” responses, over-recruit so that you can remove the “bad” data and still hit that number. No survey is perfect and all can benefit from over-recruiting, but it’s almost a must for low-budget surveys. There are no hard rules, but we suggest over-recruiting by at least 20% to end up with the sample size you need. Since the whole survey costs you little, over-recruiting costs little as well.
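
The arithmetic is simple, but it is easy to undershoot. The Python sketch below (function names are hypothetical) computes a recruiting target two ways: with the flat “at least 20%” margin suggested above, and, as an assumed alternative, from the share of responses you expect to discard. The two are not the same: a 20% margin only covers a discard rate of about 17%, because 1.2 × (1 − 1/6) = 1.

```python
def recruits_with_margin(target: int, margin_pct: int = 20) -> int:
    """Rule of thumb: recruit the target plus margin_pct percent, rounded up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target * (100 + margin_pct) // 100)


def recruits_for_discard_rate(target: int, discard_pct: int) -> int:
    """Recruit enough that `target` responses survive if discard_pct percent are thrown out."""
    if not 0 <= discard_pct < 100:
        raise ValueError("discard_pct must be between 0 and 99")
    return -(-target * 100 // (100 - discard_pct))


print(recruits_with_margin(300))           # 360
print(recruits_for_discard_rate(300, 20))  # 375
```

If your pilot or past surveys suggest a discard rate, the second function is the safer basis; otherwise the flat margin is a reasonable floor.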

Cleaning up data

Cleaning up your data is an essential step of any survey, and it is particularly important for one on a tight budget. A few simple tricks can increase the quality of your data, particularly if you use public recruiting resources. When choosing a survey tool, check which mechanisms it offers for cleaning up your data.

Throw out duplicates

As mentioned earlier, some people may be motivated to complete the same survey multiple times, even under multiple profiles. We spotted this with mTurk respondents by checking their Worker IDs: in several cases, the same ID had been used to complete a survey more than once. We ended up throwing away all responses associated with these “faulty IDs” and came away more confident in our data.
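
If your survey export includes a worker or respondent ID column, the rule described above (discard every response tied to a repeated ID, not just the extra copies) takes only a few lines. A minimal Python sketch, assuming each response is a dict and that the ID lives in a field we have hypothetically named worker_id:

```python
from collections import Counter


def drop_duplicate_workers(responses, id_field="worker_id"):
    """Remove every response whose ID appears more than once.

    Rather than keeping the first submission, all responses tied to a
    repeated ("faulty") ID are discarded. Returns (kept, faulty_ids).
    """
    counts = Counter(r[id_field] for r in responses)
    faulty = {wid for wid, n in counts.items() if n > 1}
    kept = [r for r in responses if r[id_field] not in faulty]
    return kept, faulty


responses = [
    {"worker_id": "A1", "q1": 4},
    {"worker_id": "B2", "q1": 5},
    {"worker_id": "A1", "q1": 2},  # same ID submitted twice
    {"worker_id": "C3", "q1": 3},
]
kept, faulty = drop_duplicate_workers(responses)
print(faulty)  # {'A1'}
```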

Check response time

With SurveyMonkey, you can calculate the time each respondent spent on the survey from the StartTime and EndTime data. We benchmarked the average completion time by piloting the survey in the office. This makes for a fairly robust check.

If the benchmark time is eight minutes and you have surveys completed in three, you may question how carefully respondents were reading the questions. We flag such outliers as suspect and don’t include them in our analysis.
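
A minimal sketch of the response-time check, assuming timestamps are exported as text in a YYYY-MM-DD HH:MM:SS format (check your tool’s actual export format) and using a hypothetical cutoff of half the piloted benchmark. With an eight-minute benchmark, a three-minute completion falls below the four-minute cutoff and gets flagged.

```python
from datetime import datetime

# Assumption: change this to match the timestamp format in your tool's export.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def completion_minutes(start: str, end: str, fmt: str = TIME_FORMAT) -> float:
    """Minutes elapsed between a response's start and end timestamps."""
    elapsed = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return elapsed.total_seconds() / 60


def is_speeder(start: str, end: str, benchmark_minutes: float,
               min_fraction: float = 0.5) -> bool:
    """Flag responses finished in under min_fraction of the piloted benchmark.

    The 0.5 default is a hypothetical cutoff, not an established standard.
    """
    return completion_minutes(start, end) < benchmark_minutes * min_fraction


# Eight-minute benchmark from the office pilot; this response took three minutes.
print(is_speeder("2014-05-01 10:00:00", "2014-05-01 10:03:00", benchmark_minutes=8))
```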

Add a dummy question

Dummy questions help filter out respondents who rush through the survey answering at random. A dummy question can only be answered correctly by a respondent who reads it carefully. People who click and type at random might still answer it correctly, but it is unlikely. An incorrect answer is another flag we use to mark a respondent’s data as suspect.
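
Because each check only raises a flag, it can help to record why a response looks suspect before deciding what to exclude. A Python sketch, assuming a hypothetical dummy question with ID q_check whose correct answer is known in advance (for example, an item that reads “Select ‘Blue’ for this question”), and a too-fast flag computed separately:

```python
from typing import List


def suspect_reasons(response: dict, dummy_id: str, dummy_answer: str,
                    too_fast: bool) -> List[str]:
    """Return the reasons a response looks suspect (an empty list means none)."""
    reasons = []
    # Compare case- and whitespace-insensitively; a missing answer counts as wrong.
    given = str(response.get(dummy_id, "")).strip().lower()
    if given != dummy_answer.strip().lower():
        reasons.append("failed dummy question")
    if too_fast:
        reasons.append("completed too quickly")
    return reasons


print(suspect_reasons({"q_check": "Blue"}, "q_check", "Blue", too_fast=False))  # []
print(suspect_reasons({"q_check": "Red"}, "q_check", "Blue", too_fast=True))
```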

Low-budget surveys are challenging, but not necessarily bad, and with a few tricks you can make them much more robust. If they are used as an indicative, rather than definitive, mechanism to supplement other design research activities, they can bring “good enough” insights to a project.

Educate your clients about the pros and cons of low-budget surveys and help them decide whether they want to invest more to gain greater confidence in the quantitative results. Setting these expectations up front is critical for the client, and, you never know, it could also be a good tool for negotiating a higher survey budget to begin with!

Creativity Must Guide the Data-Driven Design Process

Collecting data about design is easy in the digital world. We no longer have to conduct in-person experiments to track pedestrians’ behavior in an airport terminal or the movement of eyeballs across a page. New digital technologies allow us to easily measure almost anything, and apps, social media platforms, websites, and email programs come with built-in tools to track data.

Lately, data-driven design has become increasingly popular. As a designer, you no longer need to convince your clients of your design’s “elegance,” “simplicity,” or “beauty.” Instead of those subjective measures, you can give them data: click-through and abandonment rates, statistics on the number of installs, retention and referral counts, user paths, cohort analyses, A/B comparisons, and countless other analytical riches.

After you’ve mesmerized your clients with numbers, you can draw a few graphs on a whiteboard and begin claiming causalities. Those bad numbers? They’re showing up because of what you told the client was wrong with the old design. And the good numbers? They’re showing up because of the new and improved design.

But what if it’s not because of the design? What if it’s just a coincidence?

There are two problems with the present trend toward data-driven design: using the wrong data, and using data at the wrong time.

The problem with untested hypotheses

Let’s say you go through a major digital redesign. Shortly after you launch the new look, the number of users hitting the “share” button increases significantly. That’s great news, and you’re ready to celebrate the fact that your new design was such a success.

But what if the new design had nothing to do with it? You’re seeing a clear correlation—two seemingly related events that happened around the same time—but that does not prove that one caused the other.

Steven D. Levitt and Stephen J. Dubner, the authors of “Freakonomics,” have built a media empire on exposing the difference between correlation and causation. My favorite example is their analysis of the “broken windows” campaign carried out by New York City Mayor Rudy Giuliani and Police Commissioner William Bratton. The campaign coincided with a drop in the city’s crime rate. The officials naturally took credit for making the city safer, but Levitt and Dubner make a very strong case that the crime rate declined for reasons other than their campaign.

Raw data doesn’t offer up easy conclusions. Instead, look at your data as a generator of promising hypotheses that must be tested. Is your newly implemented user flow the cause of a spike in conversion rates? It might be, but the only way you’ll know is by conducting an A/B test that isolates that single variable. Otherwise, you’re really just guessing, and all that data you have showing the spike doesn’t change that.
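
Once a single-variable A/B test has run, a standard way to judge whether a difference in conversion rates is more than noise is a two-proportion z-test. The sketch below is a plain-Python version of that textbook test; the figures in the usage line are made up for illustration. It only tells you something when visitors were randomly split between the versions over the same period, which is exactly what a before-and-after comparison of a redesign lacks.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test comparing the conversion rates of control (A) and variant (B).

    Uses the pooled conversion rate under the null hypothesis that both
    versions convert equally. Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


# Hypothetical: 200 of 5,000 control visitors converted vs. 250 of 5,000 in the variant.
z, p = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p:.3f}")
```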

Data can’t direct innovation

Unfortunately, many designers are relying on data instead of creativity. The problem with using numbers to guide innovation is that users typically don’t know what they want, and no amount of data will tell you what they want. Instead of relying on data from the outset, you have to create something and give it to users before they can discover that they want it.

Steve Jobs was a big advocate of this method. He didn’t design devices and operating systems by polling users or hosting focus groups. He innovated and created, and once users saw what he and his team had produced, they fell in love with a product they hadn’t even known they wanted.

Data won’t tell you what to do during the design process. Innovation and creativity have to happen before data collection, not after. Data is best used for testing and validation.

Product development and design form a cyclical process. During the innovation phase, creativity often draws on user experience and artistry, qualities that aren’t meant to be quantified on a spreadsheet. Once a product is released, it’s time to start collecting data.

Perhaps the data will reveal a broken step in the user flow. That’s good information because it directs your attention to the problem. But the data won’t tell you how to fix the problem. You have to innovate again, then test to see if you’ve finally fixed what was broken.

Ultimately, data and analysis should be part of the design process. We can’t afford to rely on our instincts alone. And with the wealth of data available in the digital domain, we don’t have to. The unquantifiable riches of the creative process still have to lead design, but applying the right data at the right time is just as important to the future of design.

A Beginner’s Guide to Web Site Optimization—Part 3

Web site optimization has become an essential capability in today’s conversion-driven web teams. In Part 1 of this series, we introduced the topic and discussed key goals and philosophies. In Part 2, I presented a detailed and customizable process. In this final article, we’ll cover communication planning and how to select the appropriate team and tools to do the job.

Communication planning

For many organizations, communicating the status of optimization tests is an essential practice. Imagine that your team has just launched an A/B test on your company’s homepage, only to learn that another team released new code the previous day that changed the homepage design entirely. Or imagine a customer support agent trying to help a user through the website’s forgot-password flow, unaware that the customer is seeing a different version because of an A/B test your team is running.

To avoid these types of problems, I recommend a three-step communication program:

  1. Pre-test notification

This is an email announcing that your team has selected a certain page or section of the site as the target of its next optimization test, and that anyone with concerns had better voice them immediately, before your team starts working on it. Give folks a day or two to respond. The email should include:

  • Name/brief description of the test
  • Goals
  • Affected pages
  • Expected launch date
  • Link to the task or project plan where others can track the status of the test.

Here’s a sample pre-test notification.

  2. Pre-launch notification

This email is sent out a day or two before a new experiment launches. It includes all of the information from the Pre-Test Notification email, plus:

  • Expected test duration
  • Link to the test’s results dashboard, if available: some optimization tools create a unique dashboard page where interested parties can monitor the results of the test in real time.
  • Any other details that you care to mention, such as variations, traffic allocation, etc…

Here’s a sample pre-launch email.

  3. Test results

After the test has run its course and you’ve compiled the results into the Optimization Test Results document, send out a final email to communicate this. If you have a new winner, be sure to brag about it a little in the email. Other details may include:

  • Brief discussion
  • A few specifics, such as conversion rates, traffic and confidence intervals
  • Next steps

Here’s a sample test results email.

Team size and selection

As is true with many things, good people are the most important aspect of a successful optimization program. Find competent people with curious minds who take pride in their work – this will be far more valuable than investment in any optimization tool or adherence to specific processes.

The following are recommendations for organizations of varying team sizes.

One person

It is difficult for one person to perform optimization well unless they are dedicated full-time to the job. If your organization can only cough up one resource, I would select either a web analytics resource with an eye for design or a data-centric UX designer. For the latter profile, I don’t mean the type of designer who studied fine art and is only comfortable using Photoshop, but rather the type who likes wireframes, has poked around an analytics tool on their own, and is good with numbers. This person will also have to be resourceful and persuasive, since they will almost certainly have to borrow time and collaborate with others to complete the necessary work.

Two to three people

With a team of two to three people, you start to get into the comfort zone. To the UX designer and web/data analytics roles, I would add either a visual designer or a front-end developer. Ideally, some team members would have multiple or overlapping competencies. The team will probably still have to borrow time from other resources, such as back-end developers and QA.

Five people

A team that is lucky enough to have five dedicated optimization resources has the potential to be completely autonomous. If your organization places such a high value on optimization, it may also have invested accordingly in sophisticated products or strategies for the job, such as complex testing software, data warehouses, etc… If so, you’ll need people who are adept at these specific tools, broadening your potential team to roles such as data engineers, back-end developers, content managers, project managers, or dedicated QA resources. Ideally, the skill sets on a team of five would overlap somewhat.

Tool selection

The optimization market is hot, and tool selection may seem complicated at first. The good news is that broader interest and increased competition are fueling an all-out arms race towards simpler, more user-friendly interfaces designed for non-technical folks. Data analysis and segmentation features also seem to be evolving rapidly.

My main advice if you’re new to optimization is to start small. Spend a year honing your optimization program, and after you’ve proven your value, you can easily graduate to the more sophisticated (and expensive) tools. By the time you’re ready, your existing tool may well have advanced to keep up with your needs. Also realize that many of the cheaper tools can do the job perfectly well for most organizations, and that some organizations with high-powered tools are not using them to their fullest capabilities.

A somewhat dated Forrester Research report from February 2013 assesses some of the big hitters, but notably absent are Visual Website Optimizer (VWO) and, at the very low end, Google’s free Content Experiments tool. Conversion Rate Experts keeps an up-to-date comparison table listing virtually all of today’s popular testing tools, but it only rates them along a few specific attributes.

I performed my own assessment earlier this year and here is a short list of my favorites:

  • Visual Website Optimizer (VWO)
  • Google Content Experiments
  • Adobe Test & Target

Here are a few factors to consider when deciding on products:

Basic features

Intuitive user interface

Luckily, most tools now have simple WYSIWYG interfaces that allow you to directly manipulate your site content when creating test variations. You can edit text, change styles, move elements around, and save these changes into a new test variation. Some products have better implementations than others, so be sure to try out a few to find the best match for your team.

Targeting

Targeting allows you to specify which site visitors are allowed to see your tests. Almost all tools allow you to target site visitors based on basic attributes that can be inferred from their browser, IP address, or session. These attributes may include operating system, browser type/version, geographical location, day of week, time of day, traffic source (direct vs. organic vs. referral), and first time vs. returning visitor. More advanced tools also allow you to target individuals based on attributes (variables) that you define and programmatically place in your users’ browser sessions, cookies, or URLs. This allows you to start targeting traffic based on your organization’s own customer data. The most advanced tools allow you to import custom data directly into the tool’s database, giving you direct access to these attributes through their user interface, not only for targeting, but also for segmented analysis.
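
Conceptually, a targeting rule is a filter over visitor attributes. The Python sketch below assumes the attributes have already been gathered into a dict (from the browser, session, cookies, or your own customer data); attribute names such as loyalty_tier are hypothetical examples, and real tools express these rules through their own interfaces rather than code like this.

```python
from typing import Any, Dict, Set


def matches_targeting(visitor: Dict[str, Any], rules: Dict[str, Set[Any]]) -> bool:
    """True if the visitor satisfies every rule.

    Each rule lists the allowed values for one attribute; a visitor who is
    missing an attribute fails that rule.
    """
    return all(
        attr in visitor and visitor[attr] in allowed
        for attr, allowed in rules.items()
    )


rules = {
    "device": {"desktop"},
    "traffic_source": {"organic", "referral"},
    "loyalty_tier": {"gold"},  # hypothetical custom attribute from your own data
}
visitor = {"device": "desktop", "traffic_source": "organic",
           "loyalty_tier": "gold", "returning": True}
print(matches_targeting(visitor, rules))  # True
```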

Analysis and reporting

Tools vary widely in their analysis and reporting capabilities, with the more powerful tools generally offering richer segmentation. The simplest tools only let you view test results against a single dimension; for example, you can see how your test performed with visitors on mobile vs. desktop systems. Most tools now let you perform more complicated analyses across multiple dimensions and customized user segments. For example, you might want to see how your test performed with visitors on mobile platforms, segmented by organic vs. paid vs. direct traffic.

Keep in mind that as your user segments become more specific, your optimization tool has fewer and fewer data points from which to generate the results for each segment, which lowers the statistical confidence you can place in those results.
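
To see how quickly precision erodes, consider the margin of error on a conversion rate. With the simple normal-approximation (Wald) interval, which is itself unreliable for very small segments, the margin shrinks with the square root of the sample size, so a segment with one-twentieth of the traffic has a margin roughly 4.5 times wider. A quick Python illustration with a hypothetical 5% conversion rate:

```python
import math


def margin_of_error(rate: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation (Wald) confidence interval for a
    conversion rate observed on n visitors; z = 1.96 corresponds to about 95%."""
    return z * math.sqrt(rate * (1 - rate) / n)


# The same 5% conversion rate, measured on progressively smaller segments.
for n in (10_000, 2_000, 500, 100):
    print(f"n = {n:>6}: 5.0% +/- {margin_of_error(0.05, n):.2%}")
```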

Server response time

Optimization tools work by adding a small snippet of code to your pages. When a user visits that page, the code snippet calls a server somewhere that returns instructions on which test variation to display to the user. Long server response times can delay page loading and the display of your variations, thereby affecting your conversions and reporting.
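
Vendors don’t necessarily document how they choose which variation a visitor sees, but one common technique (shown here as an assumption, not as any particular tool’s method) is to hash a stable visitor ID so that assignment is random-looking yet repeatable: the same visitor keeps seeing the same variation without the server having to remember anything. A Python sketch with hypothetical IDs:

```python
import hashlib
from typing import Sequence


def assign_variation(visitor_id: str, experiment_id: str,
                     variations: Sequence[str]) -> str:
    """Deterministically bucket a visitor into one variation.

    Hashing the experiment ID together with the visitor ID keeps a visitor's
    assignment stable within an experiment while letting different
    experiments bucket independently. Traffic is split evenly.
    """
    key = f"{experiment_id}:{visitor_id}".encode("utf-8")
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return variations[bucket % len(variations)]


print(assign_variation("visitor-123", "homepage-hero", ["control", "variation-b"]))
```

Uneven traffic allocation (say, 90/10) would need weighted buckets rather than a plain modulo.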

When shopping around, be sure to inquire about how the tool will affect your site’s performance. The more advanced tools are deployed on multiple, load-balanced CDNs and may include contractual service level agreements that guarantee specific server response times.

Customer support

Most optimization vendors provide a combination of online and telephone support, with some of the expensive solutions offering in-person set-up, onboarding and training. Be sure to inquire about customer support when determining costs. A trick I’ve used in the past to test a vendor’s level of service is to call the customer support lines at different times of the day and see how fast they pick up the phone.

Price and cost structure

Your budget may largely determine your optimization tool options as prices vary tremendously, from free (for some entry tools with limited features) to six-figure annual contracts that are negotiated based on website traffic and customer support levels (Maxymiser, Monetate and Test & Target fall into this latter category).

Tools also vary in their pricing model: some base costs on the amount of website traffic, while others charge more for additional features. I prefer the latter model, since traffic-based costs can be difficult to predict and discourage you from running more tests.

Advanced features

Integration with CMS/analytics/marketing platforms

If you are married to a single content management system, analytics tool, or marketing platform, be sure to ask your vendor how their tool will integrate with it. Some vendors advertise multi-channel solutions (the ability to leverage your customer profile data to optimize across websites, email, and possibly other channels, such as social media or SMS). Enterprise-level tools seem to be trending towards all-in-one solutions that include components such as CMS, marketing, ecommerce, analytics, and optimization (e.g., Adobe’s Marketing Cloud or Oracle’s Commerce Experience Manager). For smaller organizations, however, integration may simply mean the ability to manage the optimization tool’s JavaScript tags (used for implementation) across your site’s pages. In these situations, basic tools such as Google Tag Manager or WordPress plugins may suffice.

Automated segmentation and targeting

Some of the advanced tools offer automated functionality that analyzes your site’s conversions and notifies you of high-performing segments. These segments may be defined by any combination of recognizable attributes and can therefore be far more complex than anything your team could define on its own. For example, the tool might define one segment as female users on Windows who live in California and visited your site within the past 30 days. It might define a dozen or more of these complex micro-segments and, even more impressively, automatically direct all future traffic in each segment to the variation that won for that segment. If implemented well, this intelligent segmentation has tremendous potential for your overall site conversions. The main downside is that it usually requires a lot of traffic to make accurate predictions.

Automated segmentation is often an added cost to the base price of the optimization tool. If so, consider asking for a free trial period to evaluate the utility/practicality of this functionality before making the additional investment.

Synchronous vs. asynchronous page loading

Most tools recommend that you implement their services asynchronously; in other words, that you let the rest of your page’s HTML load before calling their services and potentially loading one of the test variations you created. The benefit of this approach is that your users won’t have to wait any extra time before your control page starts to render in the browser. The drawback is that once the optimization service responds, users may see the page flicker as the control page is replaced by one of your test variations. This flickering effect, along with the extra time it takes to display the test variations, could skew test results or surprise and confuse your users.

In contrast, synchronous page loading, which is recommended by some of the more advanced tools, makes the call to the optimization tool before the rest of the page loads. This ensures that your control group and variations are all displayed in the same relative amount of time, which should allow for more accurate test results. It also eliminates the page flicker effect inherent in asynchronous deployments.

Conclusion

By far, the most difficult step in any web site optimization program is the first one – the simple act of starting. With this in mind, I’ve tried to present a complete and practical guide on how to get you from this first step through to a mature program. Please feel free to send me your comments as well as your own experiences. Happy optimizing.

A Beginner’s Guide to Web Site Optimization—Part 2

In the previous article, we talked about why site optimization is important and presented a few important goals and philosophies to impart to your team. I’d like to switch gears now and talk about more tactical stuff, namely, process.

Optimization process

Establishing a well-formed, formal optimization process is beneficial for the following reasons.

  1. Organizes the workflow and sets clear expectations for completion.
  2. Establishes quality control standards to reduce bugs/errors.
  3. Adds legitimacy to the whole operation, so that if stakeholders question it, you can explain the logic behind the process.

At a high level, I suggest a weekly or bi-weekly optimization planning session to perform the following activities:

  1. Review ongoing tests to determine if they can be stopped or considered “complete” (see the boxed section below). For tests that have reached completion, the possibilities are:
    1. There is a decisive new winner. In this case, plan how to communicate and launch the change permanently to production.
    2. There is no decisive winner or the current version (control group) wins. In this case, determine if more study is required or if you should simply move on and drop the experiment.
  2. Review data sources and brainstorm new test ideas.
  3. Discuss and prioritize any externally submitted ideas.
How do I know when a test has reached completion?
Completion criteria are a somewhat tricky topic and, it seems, closely guarded industry secrets. They define the minimum requirements that must be met before a test can be declared “complete.” My personal sense from reading and conferences is that there are no widely accepted standards, and that completion criteria really depend on how comfortable your team is with the uncertainty inherent in experimentation. On my past team at DIRECTV Latin America, we created the following minimum completion criteria. Keep in mind that these were bare-bones minimums, and that most of our tests actually ran much longer.

  1. Temporal: Tests must run for a minimum of two weeks to account for variation between days of the week.
  2. Statistical confidence: We required a 90-95% confidence level for most tests.
  3. Stability over time: Variations must maintain their positions relative to each other for at least one week.
  4. Total conversions: Minimum of 200 total conversions.
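
Encoding the minimums as an explicit check keeps everyone honest about when a test may be called. This Python sketch uses the four thresholds listed above as defaults; how “stability over time” is measured (here, consecutive days the variations have kept their relative ranking) and whether conversions are counted across all variations are assumptions you should adapt to your own program.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExperimentSnapshot:
    days_running: int
    confidence: float          # e.g. 0.93 means 93% statistical confidence
    days_ranking_stable: int   # consecutive days the variations kept their relative order
    total_conversions: int     # summed across all variations (an assumption)


def unmet_criteria(s: ExperimentSnapshot, min_days: int = 14,
                   min_confidence: float = 0.90, min_stable_days: int = 7,
                   min_conversions: int = 200) -> List[str]:
    """Return the completion criteria not yet met; an empty list means complete."""
    unmet = []
    if s.days_running < min_days:
        unmet.append("temporal")
    if s.confidence < min_confidence:
        unmet.append("statistical confidence")
    if s.days_ranking_stable < min_stable_days:
        unmet.append("stability over time")
    if s.total_conversions < min_conversions:
        unmet.append("total conversions")
    return unmet


print(unmet_criteria(ExperimentSnapshot(21, 0.95, 9, 340)))  # []
print(unmet_criteria(ExperimentSnapshot(10, 0.92, 3, 150)))
```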

For further discussion of the rationale behind these completion criteria, please see Best Practices When Designing and Running Experiments later in this article.

The creation of a new optimization test may follow a process that is similar to your overall product development lifecycle. I suggest the following basic structure:


The following diagram shows a detailed process that I’ve used in the past.

A detailed process that the author has used in the past.

Step 1: Data analysis and deciding what to test

Step one in the optimization process is figuring out where to first focus your efforts. We used the following list as a loose prioritization guideline:

  1. Recent product releases, or pages that have not yet undergone optimization.
  2. High “value” pages:
    1. High revenue (e.g., shopping cart checkout pages, detail pages of your most expensive products, etc…).
    2. High traffic (e.g., homepage, login/logout).
    3. Highly “strategic” (this might include pages that are highly visible internally or that management considers important).
  3. Poorly performing pages.

Step 2: Brainstorm ideas for improvement

How to improve page performance is a topic as large as the field of user experience itself, and well beyond the scope of this article. One might consider improvements in copywriting, form design, media display, page rendering, visual design, accessibility, browser targeting… the list goes on.

My only suggestion for this process is to make it collaborative – harness the power of your team to come up with new ideas for improvement, not only including designers in the brainstorming sessions, but also developers, copywriters, business analysts, marketers, QA, etc… Good ideas can (and often do) come from anywhere.

Adaptive Path has a great technique of collaborative ideation that they call sketchboarding, which uses iterative rounds of group sketching.

Step 3: Write the testing plan

An Optimization Testing Plan acts as the backbone of every test. At a high level, it is used to plan, communicate, and document the history of the experiment, but more importantly, it fosters learning by forcing the team to clearly formulate goals and analyze results.

A good testing plan should include:

  1. Test name
  2. Description
  3. Goals
  4. Opportunities (what gains will come about if the test goes well)
  5. Methodology
    • Expected dates that the test will be running in production.
    • Resources (who will be working on the test).
    • Key metrics to be tracked through the duration of the experiment.
    • Completion criteria.
    • Variations (screenshots of the different designs that you will be showing your site visitors).
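Teams that keep plans alongside code or in a tracking system may prefer to capture the same sections as a structured record, so that incomplete plans are easy to spot. This is a minimal Python sketch; the class names, the `missing_sections` check, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Methodology:
    start: date                     # expected first day in production
    end: date                       # expected last day in production
    resources: list[str]            # who will be working on the test
    key_metrics: list[str]          # tracked for the whole experiment
    completion_criteria: str
    variations: list[str]           # e.g. file names of design screenshots


@dataclass
class OptimizationTestPlan:
    name: str
    description: str
    goals: list[str]
    opportunities: str              # gains expected if the test goes well
    methodology: Methodology

    def missing_sections(self) -> list[str]:
        """Names of required sections that are still empty."""
        m = self.methodology
        sections = {
            "name": self.name,
            "description": self.description,
            "goals": self.goals,
            "opportunities": self.opportunities,
            "resources": m.resources,
            "key_metrics": m.key_metrics,
            "completion_criteria": m.completion_criteria,
            "variations": m.variations,
        }
        return [section for section, value in sections.items() if not value]

    def duration_days(self) -> int:
        """Planned run time, counting both the start and end dates."""
        return (self.methodology.end - self.methodology.start).days + 1


# HYPOTHETICAL plan, used only to show the checks.
plan = OptimizationTestPlan(
    name="Checkout button copy",
    description="Compare two labels on the checkout button.",
    goals=[],  # not written yet
    opportunities="More completed checkouts",
    methodology=Methodology(
        start=date(2024, 3, 4), end=date(2024, 3, 31),
        resources=["designer", "developer", "analyst"],
        key_metrics=["completed checkouts", "average order value"],
        completion_criteria="95% confidence after at least two full weeks",
        variations=["control.png", "new-label.png"],
    ),
)
print(plan.missing_sections(), plan.duration_days())
```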

Here’s a sample optimization testing plan to get you started.

Step 4: Design and develop the test

Design and development will generally follow an abbreviated version of your organization’s product development lifecycle. Since test variations are generally simpler than full-blown product development projects, I try to use a lighter, more agile process.

If you do cut corners, skimp only on things like process artifacts or documentation, never on design quality. For example, perform some basic usability testing and user research on your variations. This small investment will produce stronger candidates that are more likely to boost conversions.

Step 5: Quality assurance

When performing QA on your variations, be as thorough as you would with any other code release to production. I recommend at least functional, visual, and analytics QA. Many tools let you manipulate your website’s UI on the fly and immediately display the results of your changes, but these tools are not perfect, and your changes might not render correctly in every browser.

Keep in mind that optimization tools provide you one additional luxury that is not usually possible with general website releases – that of targeting. You can decide to show your variations to only the target browsers, platforms, audiences, etc… for which you have performed QA. For example, let’s imagine that your team has only been able to QA a certain A/B test on desktop (but not mobile) browsers. When you actually configure this test in your optimization tool, you can decide to only display the test to visitors with those specific desktop browsers. If one of your variations has a visual bug when viewed on mobile phones, for example, that problem should not affect the accuracy of your test results.
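Targeting is normally configured in your optimization tool’s interface, but the underlying rule is simple. The sketch below assumes a hypothetical set of QA’d (device, browser) pairs; the function names are illustrative. The key design choice is that visitors outside that set are both shown the control and excluded from the recorded results.

```python
# HYPOTHETICAL record of the (device, browser) pairs your team has actually QA'd.
QA_PASSED: set[tuple[str, str]] = {
    ("desktop", "chrome"),
    ("desktop", "firefox"),
    ("desktop", "safari"),
}


def eligible_for_test(device: str, browser: str,
                      qa_passed: set[tuple[str, str]] = QA_PASSED) -> bool:
    """True only for visitors on a platform that passed QA (case-insensitive)."""
    return (device.lower(), browser.lower()) in qa_passed


def experience_for(device: str, browser: str, assigned_variation: str) -> tuple[str, bool]:
    """Return (design to show, whether to record this visit in the test results).

    Visitors on un-QA'd platforms get the control and are left out of the data,
    so a mobile-only rendering bug cannot skew the comparison.
    """
    if eligible_for_test(device, browser):
        return assigned_variation, True
    return "control", False


print(experience_for("Desktop", "Chrome", "B"))
print(experience_for("mobile", "safari", "B"))
```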

Step 6: Run the Test

After QA is complete and you’ve decided how to allocate traffic among the different designs, it’s time to run your test. Keep the following best practices in mind before pressing the “Go” button.

1.  Variations must be run concurrently

This first principle is so obvious that it almost goes without saying, but I’ve often heard the following story from teams that do not perform optimization: “After we launched our new design, we saw our [sales, conversions, etc…] increase by X%. So the new design must be better.”

The problem with this logic is that you don’t know what other factors were at play before and after the change launched. Perhaps traffic to that page increased in quantity or quality after the new design was released. Perhaps the conversion rate was rising anyway, due to better brand recognition, seasonal variation, or just random chance. For these and many other reasons, variations must be run concurrently, not sequentially. This is the only way to hold all other factors constant and level the playing field between your designs.
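A small deterministic model makes the before/after trap concrete. In this sketch (all numbers hypothetical), the site’s conversion rate drifts upward on its own and the new design has no real effect, yet a sequential comparison still reports a gain while a concurrent one does not.

```python
def baseline_rate(day: int) -> float:
    """HYPOTHETICAL underlying conversion rate: 3.0% on day 0, drifting up by
    0.01 percentage points per day (e.g. seasonality), regardless of design."""
    return 0.030 + 0.0001 * day


def mean_rate(days: range) -> float:
    return sum(baseline_rate(d) for d in days) / len(days)


# In this model the new design has NO real effect: both designs convert at
# baseline_rate(day) on any given day.

# Sequential: old design on days 0-13, then new design on days 14-27.
sequential_lift = mean_rate(range(14, 28)) / mean_rate(range(0, 14)) - 1

# Concurrent: both designs share days 0-27, so the drift affects them equally.
old_concurrent = mean_rate(range(0, 28))
new_concurrent = mean_rate(range(0, 28))
concurrent_lift = new_concurrent / old_concurrent - 1

print(f"sequential 'lift': {sequential_lift:.1%}")  # spurious, caused by the drift
print(f"concurrent lift:   {concurrent_lift:.1%}")
```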

2.  Always track multiple conversion metrics

One A/B test that we ran on the movie detail pages of the DIRECTV Latin American sites was the following: we increased the size and prominence of the “Ver adelanto” (View trailer) call to action, guessing that if people watched the movie trailer, it might excite them to buy more pay-per-view movies from the web site.

Our initial hunch was right: after a few weeks, pay-per-view purchases were 4.8% higher with this variation than with the control. That increase would have added about $18,000/year in pay-per-view revenue. Not bad for one simple test. Fortunately, though, since we were also tracking other site goals, we noticed that this variation also decreased purchases of our premium channel packages (e.g., HBO and Showtime packages) by a whopping 25%! That drop would have reduced total revenue by far more than the pay-per-view gain added, so we did not launch this variation to production.
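Tracking several metrics only helps if you roll them up into one decision. Here is a minimal sketch that sums annual revenue changes across metrics. The $18,000/year and +4.8% figures come from the test above; the premium-package baseline is a hypothetical placeholder, because the article does not report it.

```python
def net_annual_impact(metrics: dict[str, tuple[float, float]]) -> float:
    """Sum annual revenue changes across all tracked metrics.

    Each value is (baseline annual revenue, relative change), e.g. -0.25 for -25%.
    """
    return sum(baseline * change for baseline, change in metrics.values())


# From the test above: +4.8% on pay-per-view was worth about $18,000/year,
# implying roughly $375,000/year of baseline pay-per-view revenue.
ppv_baseline = 18_000 / 0.048

# A 25% drop in premium packages cancels that gain once premium revenue
# reaches 18,000 / 0.25 = $72,000/year.
break_even_premium = (ppv_baseline * 0.048) / 0.25

# HYPOTHETICAL premium-package baseline (not given in the article).
premium_baseline = 200_000

impact = net_annual_impact({
    "pay_per_view": (ppv_baseline, 0.048),
    "premium_packages": (premium_baseline, -0.25),
})
print(f"break-even premium revenue: ${break_even_premium:,.0f}/year")
print(f"net impact: ${impact:,.0f}/year")
```

The break-even figure is the useful part: whatever the real premium revenue was, it only had to exceed about $72,000/year for the 25% drop to outweigh the pay-per-view gain.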

It’s important to keep in mind that changes may affect your site in ways that you never would have expected. Always track multiple conversion metrics with every test.

3.  Tests should reach a comfortable level of statistical significance

I recently saw a presentation in which a consultant suggested that preliminary tests on email segmentation had yielded some very promising results.

Chart showing conversion rates per 1000 emails sent.

In the chart above, the last segment of users (those who had logged in more than four times in the past year) had a conversion rate of 0.0139% (0.139 upgrades per 1000 emails sent). Even though a conversion rate of 0.0139% is dismally low by any standard, the consultant presented it as an increase of 142% over the base segment of users, and thus a very promising result.

Aside from the obvious lack of actionable utility (does this study suggest that emails only be sent to users who have logged in more than four times?), the test contained another glaring problem. If you look at the “Upgrades” column at the top of the spreadsheet, you will see that the results were based on only five individuals purchasing an upgrade: five individuals out of almost eighty-four thousand emails sent! If, by pure chance, just one more person had purchased an upgrade in any of the segments, it could have completely changed the study’s implications.
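One way to see how little five conversions can tell you is to put a confidence interval around the overall rate. The sketch below uses the Wilson score interval (chosen here because the usual normal approximation breaks down with so few events) and treats “almost eighty-four thousand” as 84,000, an approximation.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives ~95% confidence)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    margin = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - margin, center + margin


# Five upgrades from roughly 84,000 emails, across all segments combined.
low, high = wilson_interval(5, 84_000)
print(f"{low * 1000:.3f} to {high * 1000:.3f} upgrades per 1000 emails")
```

The 95% interval spans roughly a five-fold range, and each individual segment, having even fewer conversions, is less certain still.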

While this example is not actually an optimization test but rather an email segmentation study, it conveys an important lesson: don’t declare a winner for a test until its results have reached a “comfortable” level of significance.

So what does “comfortable” mean? Scientific fields conventionally reserve the term “significant” for results at the 95% confidence level and “highly significant” for the 99% level when publishing. Even at these levels, there is still a 5% and 1% chance, respectively, of declaring a difference when none really exists. Also keep in mind that higher confidence levels require more data (i.e., more website traffic), which translates into longer test durations. Because of these factors, I would recommend less stringent standards for most optimization tests: somewhere around 90-95% confidence, depending on the gravity of the situation (higher confidence levels for tests with more serious consequences or implications).
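For a basic A/B test on a conversion rate, one common way to turn raw counts into a confidence level is a two-proportion z-test. This is a minimal sketch using only the Python standard library; the counts in the usage example are hypothetical, and commercial tools may use different methods (for example, Bayesian or sequential ones).

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided pooled z-test for a difference between two conversion rates.

    Returns (z statistic, p-value). Relies on the normal approximation, so it
    is only reasonable when each arm has a healthy number of conversions.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


def is_significant(p_value: float, confidence: float = 0.95) -> bool:
    """E.g. confidence=0.90 accepts a 10% risk of a false positive."""
    return p_value < 1 - confidence


control = (500, 10_000)    # HYPOTHETICAL: conversions, visitors
variation = (560, 10_000)
z, p = two_proportion_z_test(*control, *variation)
print(f"z = {z:.2f}, p = {p:.3f}")
print("significant at 90%:", is_significant(p, 0.90))
print("significant at 95%:", is_significant(p, 0.95))
```

With these made-up counts, the variation clears a 90% bar but not a 95% one, which is exactly the kind of judgment call described above.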

Ultimately, your team must choose confidence levels that balance test duration against certainty in the results, but I would propose that if you perform a lot of testing, the larger number of true winners will make up for the fewer (but inevitable) false positives.
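The trade-off between confidence and duration can be estimated before a test starts. The sketch below uses the standard normal-approximation sample-size formula for comparing two proportions; the 5% baseline, 10% relative lift, and 80% power are assumptions for illustration, not figures from this article.

```python
import math
from statistics import NormalDist


def visitors_per_variation(baseline: float, relative_lift: float,
                           confidence: float = 0.95, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH arm to detect `relative_lift` over
    `baseline` with a two-sided test (normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


for conf in (0.90, 0.95, 0.99):
    print(conf, visitors_per_variation(0.05, 0.10, confidence=conf))
```

Under these assumptions, moving from 90% to 99% confidence nearly doubles the traffic each variation needs, which is the cost the recommendation above is weighing.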

4.  The duration of your tests should account for any natural variations (such as between weekdays/weekends) and be stable over time

In a 2012 article, Jan Petrovic brings to light an important pitfall of ending your tests too early. He discusses an A/B test that he ran for a high-traffic site in which, after only one day, the testing tool reported that a winning variation had increased the primary conversion rate by an impressive 87%, with 100% confidence.

Jan writes, “If we stopped the test then and pat each other on the shoulder about how great we were, then we would probably make a very big mistake. The reason for that is simple: we didn’t test our variation on Friday or Monday traffic, or on weekend traffic. But, because we didn’t stop the test (because we knew it was too early), our actual result looked very different.”

Chart showing new design results over time.

After continuing the test for four weeks, Jan saw that the new design, although still better than the control, had leveled out to a more reasonable 10.49% improvement since it had now taken into account natural daily variation. He writes, “Let’s say you were running this test in checkout, and on the following day you say to your boss something like ‘hey boss, we just increased our site revenue by 87.25%’. If I was your boss, you would make me extremely happy and probably would increase your salary too. So we start celebrating…”

Jan’s fable continues with the boss checking the bank account at the end of the month and, upon seeing that sales had not actually increased by the 87% you had initially reported, reconsidering your salary increase.

The moral of the story: Consider temporal variations in the behavior of your site visitors, including differences between weekday and weekend or even seasonal traffic.
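One simple guard against weekday/weekend effects is to round a test’s planned duration up to whole weeks. This hypothetical helper assumes traffic is split evenly across variations and that `daily_visitors` counts only visitors enrolled in the test. It does not check that the result has stabilized; that still calls for watching the trend over time, as Jan did.

```python
import math


def required_duration_days(visitors_per_arm: int, arms: int, daily_visitors: int,
                           min_weeks: int = 1) -> int:
    """Days needed to reach the sample size, rounded UP to whole weeks so every
    day of the week is represented equally often."""
    total_needed = visitors_per_arm * arms
    raw_days = math.ceil(total_needed / daily_visitors)
    weeks = max(min_weeks, math.ceil(raw_days / 7))
    return weeks * 7


# Hypothetical: 31,000 visitors per arm, 2 arms, 5,000 enrolled visitors/day.
print(required_duration_days(31_000, 2, 5_000))
```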

Step 7: Analyze and Report on the Results

After your test has run its course and your team has decided to press the “stop” button, it’s time to compile the results into an Optimization Test Report. The Optimization Test Report can be a continuation of the Test Plan from Step 3, with the following additional sections:

  1. Results
  2. Discussion
  3. Next steps

It is helpful to include graphs and details in the Results section so that readers can see trends and analyze the data themselves. This adds credibility to your studies and will hopefully get people invested in the optimization program.

The Discussion section is useful for explaining details and speculating about the reasons for the observed results. This forces the team to think more deeply about user behavior and is an invaluable step towards designing future improvements.


In this article, I’ve presented a detailed and practical process that your team can customize to its own use. In the next and final article of this series, I’ll wrap things up with suggestions for communication planning, team composition, and tool selection.

A Beginner’s Guide to Web Site Optimization—Part 1

Web site optimization, commonly known as A/B testing, has become an expected competency among many web teams, yet there are few comprehensive and unbiased books, articles, or training opportunities aimed at individuals trying to create this capability within their organization.

In this series, I’ll present a detailed, practical guide on how to build, fine-tune, and evolve an optimization program. Part 1 will cover some basics: definitions, goals and philosophies. In Part 2, I’ll dive into a detailed process discussion covering topics such as deciding what to test, writing optimization plans, and best practices when running tests. Part 3 will finish up with communication planning, team composition, and tool selection. Let’s get started!

The basics: What is web site optimization?

Web site optimization is an experimental method for testing which designs work best for your site. The basic process is simple:

  1. Create a few different design options, or variations, of a page/section of your website.
  2. Split up your web site traffic so that each visitor to the page sees either your current version (the control group) or one of these new variations.
  3. Keep track of which version performs better based on specific performance metrics.
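Splitting traffic is usually handled by your optimization tool, but a common approach is deterministic bucketing: hash a stable visitor ID so that each visitor always lands in the same group. This is a minimal sketch under that assumption; the function name, the SHA-256 choice, and the ID format are illustrative. Weights allow uneven splits such as 90/10.

```python
import hashlib


def assign_variation(visitor_id: str, test_name: str, weights: dict[str, float]) -> str:
    """Deterministically assign a visitor to one variation of a test.

    Hashing test_name together with visitor_id gives every visitor a stable,
    roughly uniform position in [0, 1). The weights divide that interval into
    slices, so a returning visitor always sees the same design, and different
    tests bucket visitors independently of each other.
    """
    digest = hashlib.sha256(f"{test_name}:{visitor_id}".encode("utf-8")).hexdigest()
    position = int(digest[:15], 16) / 16**15  # first 60 bits scaled to [0, 1)
    total = sum(weights.values())
    cumulative = 0.0
    for variation, weight in weights.items():
        cumulative += weight / total
        if position < cumulative:
            return variation
    # Floating-point round-off can leave position just above the final sum.
    return list(weights)[-1]


weights = {"control": 0.10, "new_design": 0.90}
print(assign_variation("visitor-42", "homepage-redesign", weights))
```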

The performance metrics are chosen to directly reflect your site’s business goals and might include how many product purchases were made on your site (a sales goal), how many people signed up for the company newsletter (an engagement goal), or how many people watched a self-help video in your FAQ section (a customer service goal). Performance metrics are often expressed as conversion rates: the percentage of visitors to a page who performed the action being tested.
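The conversion rate itself is simple arithmetic, and so is the “lift” that results are commonly reported in. This sketch assumes lift means the relative change versus the control, which differs from the absolute difference in percentage points: moving from 5.0% to 5.5% is a 10% relative lift but only a 0.5-point absolute gain.

```python
def conversion_rate(conversions: int, visitors: int) -> float:
    """Share of visitors to the page who completed the tracked action."""
    if visitors == 0:
        raise ValueError("no visitors recorded")
    return conversions / visitors


def relative_lift(variation_rate: float, control_rate: float) -> float:
    """Relative change versus control: 0.10 means the variation converts 10% better."""
    return variation_rate / control_rate - 1


control = conversion_rate(500, 10_000)     # hypothetical counts
variation = conversion_rate(550, 10_000)
print(f"control {control:.1%}, variation {variation:.1%}, "
      f"lift {relative_lift(variation, control):.1%}")
```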

Optimization can be thought of as one component in the web site development ecosystem. Within optimization, the basic process is to analyze data, create and run tests, then implement the winners of those tests.

Visual of where optimization fits in site development

A/B vs. multivariate

There are two basic types of optimization tests: A/B tests (also known as A/B/N tests) and multivariate tests.

A/B tests

In an A/B test, you run two or more fixed design variations against each other. The variations might differ in only one individual element (such as the color of a button or swapping out an image for a video) or in many elements all at once (such as changing the entire page layout and design, changing a long form into a step-by-step wizard, etc…).

Three buttons for testing, each with different copy.
Example 1: A simple A/B/N test trying to determine which of three different button texts drives more clicks.




Visuals showing page content in different layouts.
Example 2: An A/B test showing large variations in both page layout and content.


In general, A/B tests are simpler to design and analyze and also return faster results since they usually contain fewer variations than multivariate tests. They seem to constitute the vast majority of manual testing that occurs these days.

Multivariate tests

Multivariate tests vary two or more attributes on a page and test which combination works best. The key difference between A/B and multivariate tests is that the latter are designed to tease apart how two or more dimensions of a design interact with each other and lead to that design’s success. In the example below, the team is trying to figure out what combination of button text and color will get the most clicks.

Buttons with both different copy and different colors
Example 1: A simple multivariate test with 2 dimensions (button color and button text) and 3 variations on each dimension.

The simplest form of multivariate testing is called the full-factorial method, which involves testing every combination of factors against each other, as in the example above. The biggest drawback of these tests is that they generally take longer to get statistically significant results since you are splitting the same amount of site traffic between more variations than A/B tests.
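A full-factorial design is simply the Cartesian product of every factor’s levels. This sketch enumerates the 3 × 3 button example with placeholder colors and copy (the example’s actual values are not given) and shows how thin the traffic gets per cell, assuming a hypothetical 9,000 daily visitors.

```python
from itertools import product

# HYPOTHETICAL levels for the two dimensions in the button example.
colors = ["green", "blue", "orange"]
texts = ["Buy now", "Add to cart", "Get started"]

# Full factorial: every color paired with every text.
cells = list(product(colors, texts))

daily_visitors = 9_000  # hypothetical traffic to the page
print(f"{len(cells)} combinations")
print(f"{daily_visitors // len(cells)} visitors per combination per day")
print(f"{daily_visitors // len(texts)} per variation in a 3-way A/B/N test of text alone")
```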

Fractional factorial methods, by contrast, use statistics to infer the results of certain combinations without testing them directly, thereby reducing the traffic needed. Many of today’s optimization tools allow you to experiment with these different multivariate methods; just keep in mind that fractional factorial methods are often complex, named after deceased Japanese mathematicians, and require a degree in statistics to fully comprehend. Use at your own risk.
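To make the idea of a fractional design concrete, here is one classic example (not necessarily what any given tool uses): the 2^(3-1) half fraction for three two-level factors. It tests 4 of the 8 combinations by setting the third factor from the first two, at the cost of confounding some effects.

```python
from itertools import product


def half_fraction_2_3() -> list[tuple[int, int, int]]:
    """2^(3-1) fractional factorial design in coded units (-1 = low, +1 = high).

    Factors A and B take every combination of levels; factor C is set by the
    generator C = A*B (defining relation I = ABC). Four runs replace the eight
    of a full factorial, but C's main effect becomes aliased with the A x B
    interaction, so the two cannot be told apart.
    """
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]


for run in half_fraction_2_3():
    print(run)
```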

Why do we test? Goals, benefits, and rationale

There are many benefits to moving your organization to a more data-driven culture. Optimization establishes a metrics-based system for determining design success vs. failure, allowing your team to learn with each test. No longer will people argue ad nauseam over design details. Cast away the chains of the HiPPO effect, in which the Highest Paid Person in the Office determines what goes on your site. Once you have established a clear set of goals and the appropriate metrics for measuring those goals, the data should be the deciding voice.

Optimization can also drastically improve your organization’s product innovation process by allowing you to test new product ideas at scale and quickly figure out which are good and which should be scrapped. In his article “How We Determine Product Success,” John Ciancutti of Netflix describes it this way:

“Innovation involves a lot of failure. If we’re never failing, we aren’t trying for something out on the edge from where we are today. In this regard, failure is perfectly acceptable at Netflix. This wouldn’t be the case if we were operating a nuclear power plant or manufacturing cars. The only real failure that’s unacceptable at Netflix is the failure to innovate.

So if you’re going to fail, fail cheaply. And know when you’ve failed vs. when you’ve gotten it right.”

Top three testing philosophies

1. Rigorously focus on metrics

I personally don’t subscribe to the philosophy that you should test every single change on your site. However, I do believe that every organization’s web strategy should be grounded in measurable goals that map directly to its business goals.

For example, if management tells you that the web site should “offer the best customer service,” your job is to then determine which metrics adequately represent that conceptual goal. Maybe it can be represented by the total number of help tickets or emails answered from your site combined with a web customer satisfaction rating or the average user rating of individual question/answer pairs in your FAQ section. As Galileo supposedly said, “Measure what is measurable, and make measurable what is not so.”

Additionally, your site’s foundational architecture should allow, to the fullest extent possible, the measurement of true conversions and not simply indicators (often referred to as macro vs. micro conversions). For example, if your ecommerce site is only capable of measuring order submissions (or worse yet, leads), make it your first order of business to track each order submission through to a true paid sale. Then make sure your team always keeps an eye on these true conversions in addition to any intermediate steps and secondary website goals. There are many benefits to measuring micro conversion rates, but the work must be done to map them to a tangible macro conversion, or you run the risk of optimizing for a false conversion goal.
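A small worked example shows how a micro conversion can point the wrong way. The counts below are entirely hypothetical: the variation gets more order submissions but fewer paid sales, so optimizing for submissions alone would pick the wrong winner.

```python
# HYPOTHETICAL funnel counts for each arm of a test.
arms = {
    "control":   {"visitors": 10_000, "submitted": 400, "paid": 300},
    "variation": {"visitors": 10_000, "submitted": 460, "paid": 276},
}


def funnel_rates(counts: dict[str, int]) -> tuple[float, float]:
    """Return (micro rate: order submissions, macro rate: paid sales) per visitor."""
    visitors = counts["visitors"]
    return counts["submitted"] / visitors, counts["paid"] / visitors


for name, counts in arms.items():
    micro, macro = funnel_rates(counts)
    print(f"{name:>9}: submitted {micro:.2%}, paid {macro:.2%}")
```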

2. Nobody really knows what will win

I firmly believe that even experts can’t consistently predict the outcome of optimization tests with anything close to 100% accuracy. This is, after all, the whole point of testing. Someone with good intuition and experience will probably have a higher win rate than others, but for any individual test, anyone can be right. With this in mind, don’t let certain members of the team bully others into design submission. When in doubt, test it out.

3. Favor a “small-but-frequent” release strategy

In other words, err on the side of changing only one thing at a time, but make changes frequently. This strategy will allow you to pinpoint exactly which changes are affecting your site’s conversion rates. Let’s revisit the earlier A/B test example to illustrate this point.

Visuals showing page content in different layouts.
An A/B test showing large variations in both page layout and content.

Let’s imagine that your new marketing director decides that your company should completely overhaul the homepage. After a few months of work, the team launches the new “3-column” design (above-right). Listening to the optimization voice inside your head, you decide to run an A/B test, continuing to show the old design to just 10% of the site visitors and the new design to the remaining 90%.

To your team’s dismay, the old design actually outperforms the new one. What should you do? It would be difficult to simply scrap the new design in its entirety, since it was a project that came directly from your boss and the entire team worked so hard on it. There are most likely a number of elements of the new design that actually perform better than the original, but because you launched so many changes all at once, it is difficult to separate the good from the bad.

A better strategy would have been to continuously optimize different aspects of the page in small but frequent tests, gradually evolving towards a new version. This process, in combination with other research methods, would provide your team with a better foundation for making site changes. As Jared Spool argued in his article “The Quiet Death of the Major Relaunch”: “the best sites have replaced this process of revolution with a new process of subtle evolution. Entire redesigns have quietly faded away with continuous improvements taking their place.”


By now you should have a strong understanding of optimization basics and may have started your own healthy internal dialogue related to philosophies and rationale. In the next article, we’ll talk about more tactical concerns, specifically, the optimization process.