Here at Apartment List, A/B testing plays a key role in ensuring the best features get rolled out to our users. During our summer internship, we were tasked with improving the Vanity library to make A/B testing easier and more streamlined than ever.
What is this A/B Testing you speak of?
Would a user’s call to action be more effective if the text was bolded or italicized? What’s the best way to implement Big Feature X to please your users? A/B testing is a way to answer these questions. Commonly used in experiment-driven development, A/B testing consists of rolling out different versions of a feature to different subsets of users, then comparing their performance to determine which version to ultimately use.
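To make the "different subsets of users" idea concrete, here is a minimal, self-contained sketch (not Vanity's internals) of how an A/B test can assign users to alternatives: hash a stable user id so each user consistently lands in the same bucket.

```ruby
require "digest"

# Hypothetical alternatives for the bold-vs-italic call-to-action example.
ALTERNATIVES = [:bold_cta, :italic_cta].freeze

def alternative_for(user_id)
  # Digest::MD5 gives a hash that is stable across processes
  # (unlike Object#hash, which Ruby randomizes per run).
  bucket = Digest::MD5.hexdigest(user_id.to_s).to_i(16) % ALTERNATIVES.size
  ALTERNATIVES[bucket]
end
```

Because the assignment is a pure function of the user id, the same visitor sees the same alternative on every page load, which keeps the comparison between groups clean.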
Okay… what is Vanity?
Vanity is an open-source A/B testing framework for Ruby on Rails. It’s also what we use at Apartment List to conduct our own A/B tests. Vanity lets you define A/B tests and track metrics to show which alternative performs best (in 5 easy steps).
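For flavor, a Vanity experiment is defined in a small Ruby DSL file. The experiment and metric names below are our own made-up examples, not anything shipped with the gem:

```ruby
# experiments/cta_style.rb -- hypothetical experiment definition
ab_test "CTA style" do
  description "Does a bold or italic call to action convert better?"
  alternatives :bold, :italic
  metrics :signup
end
```

In a view or controller, `ab_test(:cta_style)` returns the alternative chosen for the current visitor, and `track! :signup` records a conversion against it.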
Cool, so how exactly did you make this insanely useful sounding tool even better?
While Vanity made defining A/B tests a snap, managing the tests left much to be desired. The business team had to continuously coordinate with the programming team to control A/B tests, which often meant hours or even days of delay to implement changes. That’s where we came in. We implemented several new features in the Apartment List fork of Vanity to solve this problem:
Enable & disable experiments with the push of a button
Before: A/B tests went into effect as soon as the code was deployed. The only way to stop an A/B test was to remove the code and re-deploy, or to let the test finish automatically (which only happened if explicitly coded).

After: A/B tests only go into effect when they are launched from the Vanity dashboard by clicking ‘Enable’. Enabled A/B tests can be stopped at any time by clicking ‘Disable’. A disabled experiment will not track visitors and will always show the ‘default’ option (another added feature).

Why: Imagine if an experiment went haywire – say, because of a bug in one of the alternatives – and you couldn’t easily turn it off. That’s unnecessary downtime. You also don’t want an experiment to go live right as a new release is deployed, in case the business team isn’t ready for it.
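A toy model of the behavior described above (a sketch of the semantics, not Vanity's actual implementation): a disabled experiment always serves the default alternative and records no participants.

```ruby
# Sketch of the enable/disable switch: disabled experiments serve the
# default and do not track visitors. Names here are illustrative.
class Experiment
  attr_reader :participants

  def initialize(alternatives:, default:)
    @alternatives = alternatives
    @default = default
    @enabled = false          # experiments start disabled until launched
    @participants = 0
  end

  def enable!
    @enabled = true
  end

  def disable!
    @enabled = false
  end

  def choose(user_id)
    return @default unless @enabled   # disabled: default option, no tracking
    @participants += 1
    @alternatives[user_id.hash.abs % @alternatives.size]
  end
end
```

The key property is that toggling the switch is a data change, not a code deploy, which is what removes the hours-to-days turnaround.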
Control when and how experiments should complete
Before: Experiments could be configured to complete automatically, with the outcome chosen by a function written ahead of time. Otherwise, the A/B test ran as long as the code was there.

After: On the Vanity dashboard, clicking the ‘complete’ button next to any alternative completes the experiment with that alternative as the outcome. You no longer need to guess at a function that will pick an appropriate winner, or let an A/B test run longer than necessary while scrambling to edit the codebase to reflect its results.

Why: Many factors can go into deciding on the best implementation of a feature, and these factors can’t always be modeled by a function. You might also want to analyze data from other sources before making a decision. For example, even if conversions increase, the quality of conversions and the average time spent on site might drop. Letting tests run longer than necessary also means more users are seeing the worse version of a feature, which means you’re losing revenue you could be earning if the better alternative were in place.
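The effect of the ‘complete’ button can be sketched the same way (again, a model of the semantics rather than Vanity's code): once an outcome is chosen, every visitor gets the winning alternative.

```ruby
# Sketch: completing an experiment freezes its outcome for all users.
class CompletableExperiment
  def initialize(alternatives)
    @alternatives = alternatives
    @outcome = nil
  end

  def complete!(winner)
    raise ArgumentError, "unknown alternative" unless @alternatives.include?(winner)
    @outcome = winner
  end

  def choose(user_id)
    return @outcome if @outcome   # completed: everyone sees the winner
    @alternatives[user_id.hash.abs % @alternatives.size]
  end
end
```

This is why no code edit is needed when a winner is declared: the winning branch is simply served to everyone until the losing code paths are cleaned up at leisure.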
Reset experiments when something is off
Before: You had to go into the database or console and manually wipe experiment data.

After: On the Vanity dashboard, clicking the ‘reset’ button in the bottom right corner of an experiment removes its data. You can even reset completed experiments, in case you find major loopholes afterwards.

Why: Sometimes experiments show skewed results due to unforeseen circumstances (e.g. a bug in the implementation), and you want to restart from scratch. Manhandling the production database is never a good or safe idea!
Track multiple metrics, not just conversions
Before: Vanity calculated the conversion rate by dividing the number of converted participants by the total number of participants. This meant that each unique user could only be counted once per experiment.

After: Each metric is tracked individually for each alternative in an A/B test. Under the ‘# converted’ column, there is a breakdown of how many times each metric was tracked for that alternative (a participant can be counted more than once). The old conversion tracking was left in place; this is an added feature.

Why: Not all conversions are equal. For example, someone who spends $500 on an e-commerce site is more valuable than someone who spends $5. Imagine a metric like “amount of money spent”: with the old Vanity conversions model, these two users would contribute the same amount to the conversion rate, even though the alternative that produced the $500 purchase is clearly better. Also, if you wanted to track another metric, such as “number of items bought”, for the same A/B test, the old Vanity had no way to distinguish which of these metrics contributed to the conversion. By tracking individual metrics, you can extract much more information from an experiment and make more informed decisions about what works and what doesn’t.
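A self-contained sketch of what per-alternative, per-metric tracking buys you (class and method names are illustrative, not Vanity's API): each track records a value against one metric of one alternative, so a $500 purchase is distinguishable from a $5 one, and a participant can be counted repeatedly.

```ruby
# Sketch of per-alternative metric tracking with values.
class MetricTracker
  def initialize
    # alternative => { metric => [tracked values] }
    @data = Hash.new { |h, alt| h[alt] = Hash.new { |m, name| m[name] = [] } }
  end

  def track!(alternative, metric, value = 1)
    @data[alternative][metric] << value
  end

  def total(alternative, metric)
    @data[alternative][metric].sum
  end

  def count(alternative, metric)
    @data[alternative][metric].size
  end
end

tracker = MetricTracker.new
tracker.track!(:a, :money_spent, 500)
tracker.track!(:b, :money_spent, 5)
tracker.track!(:b, :items_bought)      # same participant, counted twice
tracker.track!(:b, :items_bought)
```

Under the old model both purchases would raise their alternative's conversion rate equally; here `total(:a, :money_spent)` and `total(:b, :money_spent)` make the difference visible.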
Wait, there’s more?
Tracking A/B experiments using Google Analytics
While metrics defined in Vanity are acceptable, we can do much better. Wouldn’t it be great to have the rich reporting power of Google Analytics available per experiment, to analyze the difference in user behavior between segment A and segment B? We embarked on a quest to do just that, and found that while there are multiple ways to accomplish this, none of them were completely satisfactory.
We evaluated three implementations:
- Append data to the pageview URL. The problem here is that information about alternatives is littered all throughout Google Analytics. Single pages are split up into multiple entries, making evaluation of the website as a whole more difficult.
- Track a fake Google Analytics Event on page load. This means that metrics associated with events are skewed.
- Set a Google Analytics Custom Variable. This is what most resources online recommend, and it won’t interfere with events or Pageviews. Unfortunately, Google Analytics only provides 5 custom variables; we’re already using 4. We could encode all of the experiment data into one variable, but that would lead to messed up histories and possibly incorrect tracking of users.
This was only half of the story – we still needed to tell Google Analytics to segment based on the chosen option. Thankfully, the Filters and Profiles features gave us enough control to do this. In particular, there is a filter that performs a regex-based search and replace on the pageview URL. This eliminated the biggest problem with option 1, so we jumped on it. Our current solution is to send A/B test information as part of the pageview URL. For each option, we create a profile that filters out pageviews of the other options and then removes the A/B testing choice from the URL. This leaves us with the same URLs as before, and lets us easily compare the reports generated by different profiles.
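A sketch of the two halves of that approach. The parameter name `_vanity` and the `experiment:alternative` encoding are our own assumptions for illustration; the stripping regex stands in for the kind of search-and-replace pattern the profile filter applies.

```ruby
# Tag a pageview URL with the visitor's experiment choice.
def tag_url(url, experiment, alternative)
  separator = url.include?("?") ? "&" : "?"
  "#{url}#{separator}_vanity=#{experiment}:#{alternative}"
end

# Strip the tag back out, as the profile's search-and-replace filter does,
# so tagged pageviews collapse back onto their original URLs.
def strip_vanity(url)
  url.sub(/\?_vanity=[^&]*&/, "?")   # tag was the first of several params
     .sub(/[?&]_vanity=[^&]*/, "")   # tag was the only or last param
end
```

For example, `tag_url("/listings", "cta", "bold")` yields `/listings?_vanity=cta:bold`, and each per-alternative profile keeps only its own tag before stripping it, leaving clean `/listings` entries in the reports.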
This is acceptable, but it still feels a bit hacky and clumsy. It requires a lot of manual work to set up a profile for every alternative of every experiment (and for every web property, in case the experiment affects multiple). Every web property also needs two profiles – one that holds all the raw data, and one that removes all the Vanity data without segmentation – which must be set up manually. Profiles are not retroactive, so you must create them before the experiment starts (which, thankfully, you can now control). If you have a better idea for how to do this, we would love to hear it!
This feature was not written as part of the Vanity gem, but coded directly into Apartment List. Would you like to see it as part of the Vanity gem? Let us know!
Wow I need a moment to take this all in, can I try it for myself?
Yes you can! Just follow the instructions on the GitHub page. It’s as easy as changing your Gemfile to point to our fork. We recommend specifying default values for your current experiments. All of your current experiments will be turned off upon upgrading (but they can be turned back on with the push of a button!).
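The Gemfile change looks something like the following (the repository URL is a placeholder, not the fork's actual address; check the GitHub page for the real one):

```ruby
# Gemfile -- point at the fork instead of the released gem.
# Replace the URL below with the Apartment List fork's actual URL.
gem "vanity", git: "https://github.com/your-org/vanity.git"
```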
We also plan on sending a pull request to the original repository. ApartmentList.com runs on tons of open-source code and we would like to give back to the open-source community.