A Beginner’s Guide to Web Site Optimization—Part 2

In the previous article we talked about why site optimization is important and presented a few important goals and philosophies to impart to your team. I’d like to switch gears now and talk about more tactical stuff, namely process.

Optimization process

Establishing a well-formed, formal optimization process is beneficial for the following reasons.

  1. It organizes the workflow and sets clear expectations for completion.
  2. It establishes quality control standards to reduce bugs and errors.
  3. It adds legitimacy to the whole operation, so that if stakeholders question it, you can explain the logic behind the process.

At a high level, I suggest a weekly or bi-weekly optimization planning session to perform the following activities:

  1. Review ongoing tests to determine if they can be stopped or considered “complete” (see the boxed section below). For tests that have reached completion, the possibilities are:
    1. There is a decisive new winner. In this case, plan how to communicate and launch the change permanently to production.
    2. There is no decisive winner or the current version (control group) wins. In this case, determine if more study is required or if you should simply move on and drop the experiment.
  2. Review data sources and brainstorm new test ideas.
  3. Discuss and prioritize any externally submitted ideas.

How do I know when a test has reached completion?
Completion criteria are a somewhat tricky topic and seem to be a closely guarded industry secret. They define the minimum requirements that must be met before a test can be declared “completed.” My personal sense from reading and conferences is that there are no widely accepted standards, and that completion criteria really depend on how comfortable your team is with the uncertainty that is inherent in experimentation. We created the following minimum completion criteria for my past team at DIRECTV Latin America. Keep in mind that these were bare-bones minimums, and that most of our tests actually ran much longer.

  1. Temporal: Tests must run for a minimum of two weeks to account for variation between days of the week.
  2. Statistical confidence: We used a 90-95% confidence level for most tests.
  3. Stability over time: Variations must maintain their positions relative to each other for at least one week.
  4. Total conversions: Minimum of 200 total conversions.

For further discussion of the rationale behind these completion criteria, see the best practices under Step 6: Run the Test later in this article.
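
As an illustration, here is a minimal sketch (in Python, with made-up numbers) of how these four criteria could be checked before calling a test complete. The thresholds mirror the list above; the input format and helper name are hypothetical, not part of any particular tool.

```python
from datetime import date

# Hypothetical summary of a running A/B test (all numbers are made up).
test = {
    "start_date": date(2024, 3, 1),
    "today": date(2024, 3, 18),
    "confidence": 0.93,          # significance level reported by your testing tool
    "days_leader_unchanged": 8,  # days the current leader has stayed on top
    "total_conversions": 260,    # conversions across all variations
}

def is_complete(t):
    """Check the four bare-minimum completion criteria from the list above."""
    ran_two_weeks = (t["today"] - t["start_date"]).days >= 14   # temporal
    confident = t["confidence"] >= 0.90                          # statistical confidence
    stable = t["days_leader_unchanged"] >= 7                     # stability over time
    enough_conversions = t["total_conversions"] >= 200           # total conversions
    return ran_two_weeks and confident and stable and enough_conversions

print(is_complete(test))  # True for the sample numbers above
```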

The creation of a new optimization test may follow a process that is similar to your overall product development lifecycle. I suggest the following basic structure:

Diagram showing an abbreviated version of the optimization process.

The following diagram shows a detailed process that I’ve used in the past.

A detailed process that the author has used in the past.

Step 1: Data analysis and deciding what to test

Step one in the optimization process is figuring out where to focus your efforts first. We used the following list as a loose prioritization guideline (a small scoring sketch follows the list):

  1. Recent product releases, or pages that have not yet undergone optimization.
  2. High “value” pages:
    • High revenue (e.g., shopping cart checkout pages, detail pages of your most expensive products).
    • High traffic (e.g., homepage, login/logout).
    • Highly “strategic” (pages that are highly visible internally or that management considers important).
  3. Poorly performing pages.
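
One lightweight way to turn this guideline into a ranked backlog is to score candidate pages against these criteria. The sketch below is purely illustrative; the page names, metrics, and weights are assumptions, not data from any real site.

```python
# Hypothetical candidate pages with rough, made-up metrics (0-10 scales,
# except recently_changed, which is a 0/1 flag for new or unoptimized pages).
pages = [
    {"name": "checkout",  "recently_changed": 1, "revenue": 9, "traffic": 6,  "strategic": 7, "performance_gap": 5},
    {"name": "homepage",  "recently_changed": 0, "revenue": 4, "traffic": 10, "strategic": 9, "performance_gap": 3},
    {"name": "pdp-movie", "recently_changed": 1, "revenue": 7, "traffic": 7,  "strategic": 5, "performance_gap": 8},
]

# Illustrative weights loosely following the guideline: new/unoptimized pages
# first, then value (revenue, traffic, strategy), then poor performance.
WEIGHTS = {"recently_changed": 3, "revenue": 2, "traffic": 2, "strategic": 1, "performance_gap": 2}

def score(page):
    return sum(page[key] * weight for key, weight in WEIGHTS.items())

for page in sorted(pages, key=score, reverse=True):
    print(f"{page['name']:>10}: {score(page)}")
```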

Step 2: Brainstorm ideas for improvement

How to improve page performance is a topic as large as the field of user experience itself, and certainly beyond the scope of this article. One might consider improvements in copywriting, form design, media display, page rendering, visual design, accessibility, browser targeting… the list goes on.

My only suggestion for this process is to make it collaborative – harness the power of your team to come up with new ideas for improvement by including not only designers in the brainstorming sessions, but also developers, copywriters, business analysts, marketers, QA, and so on. Good ideas can (and often do) come from anywhere.

Adaptive Path has a great technique of collaborative ideation that they call sketchboarding, which uses iterative rounds of group sketching.

Step 3: Write the testing plan

An Optimization Testing Plan acts as the backbone of every test. At a high level, it is used to plan, communicate, and document the history of the experiment, but more importantly, it fosters learning by forcing the team to clearly formulate goals and analyze results.

A good testing plan should include:

  1. Test name
  2. Description
  3. Goals
  4. Opportunities (what gains will come about if the test goes well)
  5. Methodology
    • Expected dates that the test will be running in production.
    • Resources (who will be working on the test).
    • Key metrics to be tracked through the duration of the experiment.
    • Completion criteria.
    • Variations (screenshots of the different designs that you will be showing your site visitors).

Here’s a sample optimization testing plan to get you started.

Step 4: Design and develop the test

Design and development will generally follow an abbreviated version of your organization’s product development lifecycle. Since test variations are generally simpler than full-blown product development projects, I try to use a lighter, more agile process.

If you do cut corners, skimp only on things like process artifacts or documentation, not on design quality. For example, be sure to perform some basic usability testing and user research on your variations. This small investment will create better candidates that are more likely to boost conversions.

Step 5: Quality assurance

When performing QA on your variations, be as thorough as you would be with any other code release to production. I recommend at least functional, visual, and analytics QA. Even though many tools allow you to manipulate your website’s UI on the fly using interfaces that immediately display the results of your changes, the tools are not perfect, and any changes that you make might not render correctly across all browsers.

Keep in mind that optimization tools provide you one additional luxury that is not usually possible with general website releases – that of targeting. You can decide to show your variations to only the target browsers, platforms, audiences, etc… for which you have performed QA. For example, let’s imagine that your team has only been able to QA a certain A/B test on desktop (but not mobile) browsers. When you actually configure this test in your optimization tool, you can decide to only display the test to visitors with those specific desktop browsers. If one of your variations has a visual bug when viewed on mobile phones, for example, that problem should not affect the accuracy of your test results.
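
For example, a server-side experiment framework might gate enrollment on the user agent so that only the QA’d desktop browsers ever see the variations. This is a hypothetical sketch, not the API of any particular optimization tool; most tools expose equivalent targeting rules in their own configuration UI.

```python
# Hypothetical enrollment gate: only desktop visitors enter the test,
# because the variations were only QA'd on desktop browsers.
MOBILE_MARKERS = ("Mobi", "Android", "iPhone", "iPad")

def should_enroll(user_agent: str) -> bool:
    """Return True if this visitor may be bucketed into the A/B test."""
    return not any(marker in user_agent for marker in MOBILE_MARKERS)

desktop_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0"
mobile_ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) Mobile/15E148"

print(should_enroll(desktop_ua))  # True  -> visitor can see a variation
print(should_enroll(mobile_ua))   # False -> visitor sees the unchanged control page
```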

Step 6: Run the Test

After QA has completed and you’ve decided how to allocate traffic to the different designs, it’s time to actually run your test. The following are a few best practices to keep in mind before pressing the “Go” button.

1.  Variations must be run concurrently

This first principle is almost so obvious that it goes without saying, but I’ve often heard the following story from teams that do not perform optimization: “After we launched our new design, we saw our [sales, conversions, etc…] increase by X%. So the new design must be better.”

The problem with this logic is that you don’t know what other factors might have been at play before and after the new change launched. Perhaps traffic to that page increased in quantity or quality after the new design was released. Perhaps the conversion rate was increasing anyway, due to better brand recognition, seasonal variation, or just random chance. For these and many other reasons, variations must be run concurrently, not sequentially. This is the only way to hold all other factors constant and level the playing field between your different designs.
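
Running variations concurrently usually means randomly splitting live traffic between them. One common approach (sketched below; the details vary by tool) is deterministic hash-based bucketing, so each visitor always sees the same variation while the overall split stays random.

```python
import hashlib

VARIATIONS = ["control", "variation_a", "variation_b"]

def assign_variation(visitor_id: str, test_name: str) -> str:
    """Deterministically bucket a visitor: the same visitor and test always
    get the same variation, while traffic is split evenly overall."""
    digest = hashlib.sha256(f"{test_name}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(VARIATIONS)
    return VARIATIONS[bucket]

# The same visitor is always assigned consistently:
print(assign_variation("visitor-123", "cta-size-test"))  # e.g. "variation_a"
print(assign_variation("visitor-123", "cta-size-test"))  # same result every time
```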

2.  Always track multiple conversion metrics

One A/B test that we ran on the movie detail pages of the DIRECTV Latin America sites was the following: we increased the size and prominence of the “Ver adelanto” (View trailer) call to action, guessing that if people watched the movie trailer, it might entice them to buy more pay-per-view movies from the web site.

Screenshot of the movie detail page with the enlarged “Ver adelanto” (View trailer) call to action.

Our initial hunch was right, and after a few weeks we saw that pay-per-view purchases were 4.8% higher with this variation than with the control. This increase would have translated into a revenue boost of about $18,000/year in pay-per-view purchases. Not bad for one simple test. Fortunately, though, since we were also tracking other site goals, we noticed that this variation also decreased purchases of our premium channel packages (i.e., HBO and Showtime packages) by a whopping 25%! This would have decreased total revenue by far more than the uptick in pay-per-views, and because of this, we did not launch this variation to production.

It’s important to keep in mind that changes may affect your site in ways that you never would have expected. Always track multiple conversion metrics with every test.
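
In practice, “tracking multiple conversion metrics” simply means computing the lift for every goal you care about, not just the one the test was designed to move. Here is a toy sketch of that bookkeeping, using invented counts that roughly mirror the anecdote above (they are not the actual DIRECTV figures):

```python
# Invented per-variation counts for several goals; "visitors" is the sample size.
control   = {"visitors": 50_000, "ppv_purchases": 1_000, "premium_signups": 200, "trailer_views": 4_000}
variation = {"visitors": 50_000, "ppv_purchases": 1_048, "premium_signups": 150, "trailer_views": 6_500}

def lift(metric):
    """Relative change (%) of the variation versus the control for one goal."""
    control_rate = control[metric] / control["visitors"]
    variation_rate = variation[metric] / variation["visitors"]
    return (variation_rate - control_rate) / control_rate * 100

for metric in ("ppv_purchases", "premium_signups", "trailer_views"):
    print(f"{metric}: {lift(metric):+.1f}%")

# A healthy lift in one metric can hide a damaging drop in another,
# which is exactly what the premium_signups line would reveal here.
```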

3.  Tests should reach a comfortable level of statistical significance

I recently saw a presentation in which a consultant suggested that preliminary tests on email segmentation had yielded some very promising results.

Chart showing conversion rates per 1000 emails sent.

In the chart above, the last segment of users (those who had logged in more than four times in the past year) had a conversion rate of 0.0139% (0.139 upgrades per 1,000 emails sent). Even though a conversion rate of 0.0139% is dismally low by any standard, according to the consultant it represented an increase of 142% over the base segment of users, and thus a very promising result.

Aside from the obvious lack of actionable utility (does this study suggest that emails be sent only to users who have logged in more than four times?), the test contained another glaring problem. If you look at the “Upgrades” column at the top of the spreadsheet, you will see that the results were based on only five individuals purchasing an upgrade. Five individuals out of almost eighty-four thousand emails sent! If, by pure chance, just one more person had purchased an upgrade in any of the segments, it could have completely changed the study’s implications.

While this example is not actually an optimization test but rather an email segmentation study, it conveys an important lesson: don’t declare a winner until your test has reached a “comfortable” level of significance.

So what does “comfortable” mean? In science, strict definitions govern the use of the terms “significant” (95% confidence level) and “highly significant” (99% confidence level) when publishing results. Even with these definitions, there is still a 5% and 1% chance, respectively, of your conclusions being wrong. Also keep in mind that higher confidence levels require more data (i.e., more website traffic), which translates into longer test durations. Because of these factors, I recommend less stringent standards for most optimization tests – somewhere around 90-95% confidence, depending on the gravity of the situation (higher confidence levels for tests with more serious consequences or implications).

Ultimately, your team must decide on confidence levels that reflect a compromise between test duration and certainty in the results, but I would propose that if you perform a lot of testing, the larger number of true winners will make up for the fewer (but inevitable) false positives.
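
If your tool does not report significance directly, a standard two-proportion z-test is one way to estimate it yourself. The sketch below uses only the Python standard library and made-up counts; statistics packages such as SciPy or statsmodels offer more complete implementations.

```python
from math import sqrt, erf

def significance(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test; returns the confidence level (1 - p)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # normal approximation
    return 1 - p_value

# Made-up example: 200 vs. 245 conversions out of 10,000 visitors each.
print(f"{significance(200, 10_000, 245, 10_000):.1%}")  # roughly 97% confidence
```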

4.  The duration of your tests should account for any natural variations (such as between weekdays/weekends) and be stable over time

In a 2012 article on AnalyticsInspector.com, Jan Petrovic brings to light an important pitfall of ending your tests too early. He discusses an A/B test that he ran for a high-traffic site in which, after only a day, the testing tool reported that a winning variation had increased the primary conversion rate by an impressive 87%, with 100% confidence.

Jan writes, “If we stopped the test then and pat each other on the shoulder about how great we were, then we would probably make a very big mistake. The reason for that is simple: we didn’t test our variation on Friday or Monday traffic, or on weekend traffic. But, because we didn’t stop the test (because we knew it was too early), our actual result looked very different.”

Chart showing new design results over time.

After continuing the test for four weeks, Jan saw that the new design, although still better than the control, had leveled out to a more reasonable 10.49% improvement now that the results accounted for natural daily variation. He writes, “Let’s say you were running this test in checkout, and on the following day you say to your boss something like ‘hey boss, we just increased our site revenue by 87.25%’. If I was your boss, you would make me extremely happy and probably would increase your salary too. So we start celebrating…”

Jan’s fable continues with the boss checking the bank account at the end of the month and, upon seeing that sales had not actually increased by the 87% you initially reported, reconsidering your salary increase.

The moral of the story: Consider temporal variations in the behavior of your site visitors, including differences between weekday and weekend or even seasonal traffic.
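
One simple guard against the early-stopping trap is to look at the cumulative lift day by day and only trust it once it has settled down for a full week or more. Here is a rough sketch of that check, with fabricated daily counts:

```python
# Fabricated daily (visitors, conversions) counts for control and variation.
control_days   = [(1000, 30), (1000, 28), (1200, 35), (900, 25), (1100, 33), (1500, 40), (1400, 38)]
variation_days = [(1000, 45), (1000, 31), (1200, 38), (900, 27), (1100, 36), (1500, 44), (1400, 41)]

def cumulative_lift(control, variation):
    """Yield the cumulative lift (%) after each day of the test."""
    c_visitors = c_conv = v_visitors = v_conv = 0
    for (cv, cc), (vv, vc) in zip(control, variation):
        c_visitors, c_conv = c_visitors + cv, c_conv + cc
        v_visitors, v_conv = v_visitors + vv, v_conv + vc
        yield (v_conv / v_visitors - c_conv / c_visitors) / (c_conv / c_visitors) * 100

for day, lift in enumerate(cumulative_lift(control_days, variation_days), start=1):
    print(f"day {day}: cumulative lift {lift:+.1f}%")

# A huge day-one lift typically shrinks toward a stable value as
# weekday and weekend traffic enter the sample.
```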

Step 7: Analyze and Report on the Results

After your test has run its course and your team has decided to press the “stop” button, it’s time to compile the results into an Optimization Test Report. The Optimization Test Report can be a continuation of the testing plan from Step 3, but with the following additional sections:

  1. Results
  2. Discussion
  3. Next steps

It is helpful to include graphs and details in the Results section so that readers can see trends and analyze the data themselves. This adds credibility to your studies and, hopefully, gets people invested in the optimization program.
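
If your testing tool’s built-in charts don’t export cleanly, a small script can generate the trend graphs for the Results section. This is only a sketch with made-up daily rates; in practice you would plot the numbers exported from your analytics or optimization tool.

```python
import matplotlib.pyplot as plt

# Made-up daily conversion rates per variation, standing in for a real export.
daily_rates = {
    "control":     [0.021, 0.019, 0.023, 0.020, 0.022, 0.024, 0.021],
    "variation_a": [0.024, 0.022, 0.025, 0.023, 0.024, 0.026, 0.025],
}
days = range(1, len(daily_rates["control"]) + 1)

for variation, rates in daily_rates.items():
    plt.plot(days, rates, marker="o", label=variation)

plt.xlabel("Day of test")
plt.ylabel("Conversion rate")
plt.title("Daily conversion rate by variation")
plt.legend()
plt.tight_layout()
plt.savefig("results_trend.png")  # drop the image into the Results section
```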

The Discussion section is useful for explaining details and hypothesizing about the reasons for the observed results. This forces the team to think more deeply about user behavior and is an invaluable step toward designing future improvements.

Conclusion

In this article, I’ve presented a detailed and practical process that your team can customize to its own use. In the next and final article of this series, I’ll wrap things up with suggestions for communication planning, team composition, and tool selection.
