AB Testing Mobile Apps - iOS

By David Silva
A passionate tech guy and Hip Hop culture enthusiast, with my Adidas and me close as can be.

At big companies like Farfetch, we work hard to keep delivering groundbreaking features to our customers.
All these possible changes, from a simple design update to a disruptive new feature, directly affect our users' experience and the way they will benefit from our product.
To reduce the risk of unintended effects from these changes, we often introduce them in the form of a digital randomized controlled trial, known as an AB test.
We can think of an AB test as an experiment with two different variants, commonly known as control and alternative. As the name suggests, two versions of that experiment are launched and compared.
Half of our user population will experience the original version without changes (control version). As for the other half, they will be able to experience the version that includes the new feature to be launched (alternative version).
The two versions are randomly assigned to our users so that in due course, using statistical analysis, we can assess which version performs better and therefore becomes the winning variant.
With that said, keep in mind that we are not limited to this type of AB test. We can test a feature with more than one alternative, turning that AB test into an ABC test, an ABCD test, and so on. Nor are we limited to an even split: we can have 80% of our users on the control version and 20% on the alternative version, for example. Each case is different and we adapt the test to our needs.
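
To make the idea of uneven splits and extra alternatives concrete, here is a minimal sketch in Swift. It assumes each user has already been given a bucket number from 0 to 99; the names WeightedVariant and variant(forBucket:in:) are hypothetical and are not part of any Farfetch API.

```swift
/// One variant and the share of users that should receive it, in whole percent.
struct WeightedVariant {
    let name: String
    let percentage: Int
}

/// Returns the name of the variant whose cumulative percentage range contains `bucket`.
/// Returns nil when the bucket is outside 0..<100 or the percentages do not add up to 100.
func variant(forBucket bucket: Int, in variants: [WeightedVariant]) -> String? {
    guard (0..<100).contains(bucket),
          variants.reduce(0, { $0 + $1.percentage }) == 100 else { return nil }
    var upperBound = 0
    for candidate in variants {
        upperBound += candidate.percentage
        if bucket < upperBound { return candidate.name }
    }
    return nil
}

// An 80/20 AB split and a three-way ABC split (illustrative numbers only).
let abSplit = [
    WeightedVariant(name: "control", percentage: 80),
    WeightedVariant(name: "alternative", percentage: 20),
]
let abcSplit = [
    WeightedVariant(name: "control", percentage: 34),
    WeightedVariant(name: "alternativeB", percentage: 33),
    WeightedVariant(name: "alternativeC", percentage: 33),
]
```

With the 80/20 split, buckets 0 to 79 map to the control and buckets 80 to 99 map to the alternative.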

Before development

Hypothesis

There’s a lot going on behind the scenes before our product team decides to present a new feature to the engineering team to be launched as an AB test. For instance, let us consider a recently launched AB test in the Farfetch iOS app, which allows the user to automatically sign in when the app is launched by using biometric authentication.
A well-constructed hypothesis is the foundation of a good experiment. 
Creating a hypothesis requires three different components: Insight, Product Change and Behavioural Change.

Insight

A list of qualitative and quantitative insights from research or from learnings in previous experiment iterations.
For our example, insights included a seemingly low percentage of active users that were signed in. We also noticed that rates of users logging in again were quite low after their session tokens expired.

Product Change

A short description of the product change you want to introduce, written as explicitly as possible.
In this case, users who have enabled biometric authentication will be prompted with the option to sign in automatically using Face ID or Touch ID when the application starts.

Behavioural Change

A short description of the change you expect to induce in customer behaviour.
For our example, we expect users to sign in when the application starts, therefore improving their experience.
With this we can now build our hypothesis: Based on the observed percentage of active users who are signed in and how often they sign in again, we believe that automatically signing in the user using Face ID or Touch ID will result in more signed-in users, as measured by sign-in rate.

Measurable impact

For a given AB test to be successful, it is crucial to have the primary goal, as well as the KPI to measure impact, clearly stated from the very beginning. This will allow us to monitor and make a win or lose decision for a given AB test.
Multiple guardrail metrics can be used to evaluate an experiment. Among these metrics you will typically find ones common to many of our experiments: conversion rate, add to bag percentage, visitor conversion rate, etc. However, the primary metric is the only one that determines the experiment outcome. In this case, we set the sign-in rate as our primary metric to evaluate the experiment.

Development

Farfetch AB Service

At Farfetch there is a core Experimentation Team, entirely dedicated to supporting our self-service experimentation needs across the company. This includes responsibility for developing and maintaining the Farfetch AB Service (FABS).
A great number of developers, each in different product or service teams and each using different technologies, use FABS to run their experiments. For the sake of this post, we will focus on the way we do it for our Farfetch iOS app.
For iOS, we use the public REST API provided by FABS for our AB testing.
By calling only two endpoints, predict and participate, we manage to launch a new AB test:

  • A given iOS app user may be eligible for one or more experiments running concurrently on the app. The predict method allows us to optionally anticipate which variant a user should receive based on their random allocation. By querying for these variations upfront, we can pre-identify them at a convenient time in the user flow. Later we can then reveal these same variants when the user encounters an experiment.

  • While the above predict method allows us to anticipate random experiment variations for the user ahead of time, we still won’t know if the user actually encountered a given experiment. The participate method signals to our back-end analytics that a user has, in fact, entered an experiment. This way we can attribute the user’s activity to their given variant as an active experiment participant.
  • The participate method can also be called without a prior predict, in which case it both randomly selects a variant and signals our back-end analytics.
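
The difference between the two calls can be sketched as follows. This is only an illustration of the semantics described above: the ABService protocol, the in-memory stand-in and the allocation closure are hypothetical, and the actual FABS REST paths and payloads are not shown in this post.

```swift
/// Hypothetical abstraction over the two FABS calls.
protocol ABService {
    /// Returns the variant the participant would receive for each experiment,
    /// without recording that the participant saw any of them.
    func predict(experiments: [String], participantID: String) -> [String: String]
    /// Records that the participant entered the experiment and returns their variant.
    func participate(experiment: String, participantID: String) -> String
}

/// In-memory stand-in, used here only to illustrate the behaviour of the two calls.
final class InMemoryABService: ABService {
    /// Experiments for which a participation was signalled, in call order.
    private(set) var recordedParticipations: [String] = []
    private let allocate: (String, String) -> String

    init(allocate: @escaping (String, String) -> String) {
        self.allocate = allocate
    }

    func predict(experiments: [String], participantID: String) -> [String: String] {
        var result: [String: String] = [:]
        for experiment in experiments {
            result[experiment] = allocate(experiment, participantID)
        }
        return result
    }

    func participate(experiment: String, participantID: String) -> String {
        recordedParticipations.append(experiment)
        return allocate(experiment, participantID)
    }
}

// Placeholder allocation rule (an assumption); the real allocation is decided by FABS.
let service = InMemoryABService { experiment, participantID in
    let total = experiment.utf8.count + participantID.utf8.count
    return total % 2 == 0 ? "control" : "alternative"
}
```

Calling predict leaves recordedParticipations empty, while participate both appends to it and returns the same variant that predict reported for that experiment and participant.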
Variant Distribution

FABS uses a hash function over the combined experiment name and participant ID to provide a repeatable random variant for the same client. The participant ID is a UTF-8 string that uniquely identifies the subject being tested. For iOS, we use the advertising identifier. If the user has the limit ad tracking option on, which makes the advertising identifier unavailable, we generate an ID for that particular user with [NSUUID UUID].UUIDString (UUID().uuidString in Swift) and store it using UserDefaults, an interface provided by Apple to the user’s defaults database. It allows us to store key-value pairs persistently across different launches of our application.

By doing this, we ensure that for the same experiment name and participant ID pair provided to FABS, the exact same alternative is returned for a consistent user experience.
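
A rough sketch of both ideas in Swift follows. The post does not say which hash function FABS uses, so FNV-1a stands in here purely as an assumption; the function names and the "ab.participantID" key are hypothetical as well. Reading the advertising identifier is left out, and only the generated fallback ID is shown.

```swift
import Foundation

/// FNV-1a (64-bit) over the UTF-8 bytes of a string.
/// A stand-in only: the hash function actually used by FABS is not stated.
func fnv1a64(_ text: String) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325
    for byte in text.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3   // wrapping multiply by the FNV prime
    }
    return hash
}

/// Deterministic bucket in 0..<100 for an experiment name and participant ID pair.
func bucket(experiment: String, participantID: String) -> Int {
    return Int(fnv1a64(experiment + ":" + participantID) % 100)
}

/// Returns the stored fallback participant ID, generating and persisting one on first use.
func fallbackParticipantID(defaults: UserDefaults = .standard,
                           key: String = "ab.participantID") -> String {
    if let stored = defaults.string(forKey: key) {
        return stored
    }
    let generated = UUID().uuidString
    defaults.set(generated, forKey: key)
    return generated
}
```

Because the bucket depends only on the two input strings, the same experiment name and participant ID always land in the same bucket, which is the property that keeps the user experience consistent.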

Swift API for AB test 

A great number of experiments are live on the iOS app at any time. For this reason, we are often developing new tests and removing old ones. To manage our experiments well and to speed up the creation of new tests, the Mobile Domain at Farfetch created a Swift API that abstracts all the logic associated with the way we deal with AB tests in iOS.

Protocols

Protocols are a common, fundamental feature of Swift. At its heart, Swift is protocol-oriented, as stated by Apple. A protocol allows the developer to specify an interface and then make types conform to it. A single type can conform to multiple protocols.
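
A protocol for something that can be AB tested might look like the sketch below. The names Experimentable, id, variants and isActive are assumptions chosen to match the description in this section, not the actual Farfetch code.

```swift
/// Hypothetical protocol for anything that can be AB tested.
protocol Experimentable {
    /// Unique ID that identifies the experiment.
    var id: String { get }
    /// Names of the variants associated with the experiment.
    var variants: [String] { get }
    /// Whether the experiment is currently active.
    var isActive: Bool { get }
}

/// A type can adopt several protocols at once, here Experimentable and Equatable.
struct SampleExperiment: Experimentable, Equatable {
    let id: String
    let variants: [String]
    let isActive: Bool
}

let sample = SampleExperiment(id: "sample_experiment",
                              variants: ["control", "alternative"],
                              isActive: true)
```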



Here you can see how we use protocols to define something that could be AB tested and therefore experimentable:
  • A unique ID that identifies the experiment
  • The variants associated with that experiment
  • A flag that tells if a given experiment is currently active
By conforming to the protocol mentioned above, we can create our experiment. Let’s focus again on the previously mentioned experiment to automatically sign in our users.
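
One way such an experiment could be declared is sketched below. The protocol is repeated so that the example is self-contained, and the case name, the "biometric_auto_sign_in" ID and the variant names are all hypothetical.

```swift
/// Hypothetical protocol, repeated here so the example compiles on its own.
protocol Experimentable {
    var id: String { get }
    var variants: [String] { get }
    var isActive: Bool { get }
}

/// All experiments known to the app. The raw String value doubles as the experiment ID.
enum Experiment: String, CaseIterable, Experimentable {
    case biometricAutoSignIn = "biometric_auto_sign_in"   // hypothetical ID

    var id: String { return rawValue }

    var variants: [String] {
        switch self {
        case .biometricAutoSignIn:
            return Variant.BiometricAutoSignIn.allCases.map { $0.rawValue }
        }
    }

    var isActive: Bool {
        switch self {
        case .biometricAutoSignIn:
            return true
        }
    }
}

extension Experiment {
    /// Namespace holding one variant enum per experiment.
    struct Variant {
        enum BiometricAutoSignIn: String, CaseIterable {
            case control
            case alternative
        }
    }
}
```

Adding a new experiment then means adding one case to Experiment and one nested enum to Variant, and the compiler flags every switch that still needs to handle the new case.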



  • We use an enum to represent all the experiments because it’s the best way to organize a group of values that are related. With this we create our own data type, Experiment, and therefore ensure that our code is type-safe and easy to use.
  • We also specify that, for this enum, every raw value will be a String. After that we can use that raw value as the experiment ID.
  • To represent the variants we follow the same pattern: an enum is created for each experiment.
  • All the variant types are held in a single struct Variant defined in the Experiment enum extension.
Public Functions

Predict


  • Predict all experiments currently active during the launch of our app.
Participate



  • Receives an experiment as a parameter and participates in that given experiment.
Alternative


  • Returns the current variant for a given experiment using generics.
Final Step


  • By making use of our public API in Swift, we can easily check the current variant for the experiment that we want. This is possible because we predicted all the variants during the launch of our app.
  • After that we just need to participate in the experiment that our user is currently testing.
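
The three public functions and the final step could fit together roughly as in the sketch below. ExperimentManager, its two injected closures and the experiment ID are hypothetical stand-ins; in the real app the closures would call the FABS predict and participate endpoints.

```swift
/// Hypothetical experiment list; the raw value is the experiment ID.
enum Experiment: String, CaseIterable {
    case biometricAutoSignIn = "biometric_auto_sign_in"
}

extension Experiment {
    /// Namespace holding one variant enum per experiment.
    struct Variant {
        enum BiometricAutoSignIn: String {
            case control
            case alternative
        }
    }
}

final class ExperimentManager {
    private let fetchPredictions: ([String]) -> [String: String]
    private let sendParticipation: (String) -> Void
    private var predictions: [String: String] = [:]

    init(fetchPredictions: @escaping ([String]) -> [String: String],
         sendParticipation: @escaping (String) -> Void) {
        self.fetchPredictions = fetchPredictions
        self.sendParticipation = sendParticipation
    }

    /// Predict: resolves the variant of every known experiment, meant to run at app launch.
    func predictAll() {
        predictions = fetchPredictions(Experiment.allCases.map { $0.rawValue })
    }

    /// Participate: signals that the user actually entered the experiment.
    func participate(in experiment: Experiment) {
        sendParticipation(experiment.rawValue)
    }

    /// Alternative: returns the predicted variant as the caller's variant type,
    /// or nil when no prediction exists or the name does not match that type.
    func alternative<V: RawRepresentable>(for experiment: Experiment) -> V?
        where V.RawValue == String {
        guard let name = predictions[experiment.rawValue] else { return nil }
        return V(rawValue: name)
    }
}

// Final step at the call site, with stubbed closures in place of the network calls.
var sentParticipations: [String] = []
let manager = ExperimentManager(
    fetchPredictions: { ids in
        var result: [String: String] = [:]
        for id in ids { result[id] = "alternative" }   // stub: everyone gets the alternative
        return result
    },
    sendParticipation: { id in sentParticipations.append(id) }
)
manager.predictAll()

if let variant: Experiment.Variant.BiometricAutoSignIn = manager.alternative(for: .biometricAutoSignIn) {
    manager.participate(in: .biometricAutoSignIn)
    switch variant {
    case .alternative: print("Prompt for Face ID or Touch ID")
    case .control: print("Keep the existing launch flow")
    }
}
```

The generic return type lets each call site ask for its own variant enum, so an unknown or mistyped variant name surfaces as nil instead of a stray string comparison.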

After development

Continuous monitoring

In order to evaluate an AB test and make a statistically significant win or lose decision as early as possible, we use continuous monitoring rather than the fixed horizon method. For that, the product manager responsible for the experiment must define a Minimum Detectable Effect for it beforehand.
With this method we can continuously look at new data and either end the experiment and make a decision or continue collecting data.
The experiment concludes when the alternative wins or loses against the control, or when the minimum detectable effect reaches the value defined beforehand.

Rollout

After the test is done and we have all the necessary data to conclude that the alternative is the winner, we can roll out the new feature to our users.
It is possible for our core Experimentation team to estimate the Gross Transaction Value (GTV) impact introduced by the new feature. This value is automatically calculated when an experiment is over. The calculation uses the user scope, platform, uplift on primary metric, and targets for the year as inputs.
Our GTV impact estimate for an experiment assumes that the new feature will have the effect measured on the user scope for a full year after the experiment is rolled out.

Conclusion

As you can see, there is a lot going on before launching a new feature to our Farfetch iOS app users. From the Experimentation team to product managers and engineers, to finally our app users, all the steps are meticulously calculated so that the final product and user experience are the best that we can deliver.

Lessons learned

  • Even if you are sure that a new feature is a winner, it’s always good to listen to what your users think.
  • A new feature is never too small to be AB tested. Even the slightest change may affect the way your user experiences the app.
  • Using the right KPIs, you can learn a lot from an AB test and gather new insights for the future.
  • Always consider that by doing an AB test you are adding extra code to your codebase. Once you have reached all the conclusions for your experiment, take the time to delete the code that is no longer needed. Otherwise, it could significantly slow your team’s development velocity and increase the validation time for your QA team.

With this post, I hope that you all embrace the AB testing mentality, whether working at big companies or on small personal products. 