Balancing Between Product Experimentation and Software Reliability

"Businesses need to be agile, test hypotheses cheaply, and respond quickly to new market insights by keeping development cycles short and data models flexible.” - Martin Kleppmann
Successful internet services require agile, rapid and iterative product evolution to learn what your customers love and how you can better deliver it to them. However, all that code churn can make your application unstable, failing at its mission to delight your customers. In this blog post, we discuss how we manage this delicate balance between rapid iteration and software reliability.
Who We Are: the Product Display Page Team
Whenever you visit farfetch.com and click on a product, you experience our application. We are the painters behind each product’s portrait, and every day we try to paint it better. Our team is called the Product Display Page (PDP), and our challenge is to continually delight our customers.
From a wide backlog of breakthrough ideas, we pick the ones that might have the greatest impact. However, our own professional guesses about our customers' desires are not sufficient. This is why we carefully experiment with these ideas to understand if, and how well, they work.
A customer, for us, doesn't simply mean an individual that buys goods from our site. Merely focusing on a quick sale does not build a long-term, sustainable business relationship. A customer is a human being who trusts us, and we must honour that by treating them as a close friend with a common interest. We are driven to ensure they have an unforgettably smooth experience, making them feel welcome in our house. This means focusing on every tiny detail that makes our friend feel special.
What We Do: Great Ideas + Test, Test, Test!
As PDP, we select our best product ideas and perform experiments to understand if, and how well, they service our customers' needs. For that, we use a practice called randomized controlled trials, or A/B testing. This allows us to compare two or more versions of a page and validate, through an analytical approach, how a single change might improve various behavioural metrics reflecting the quality of a customer's experience.
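To make this concrete, here is a minimal sketch of how deterministic variant assignment for such a trial might look. It is purely illustrative - the names and hashing scheme are hypothetical, not our actual experimentation platform - but it shows the key property: the same customer always lands in the same variant.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch: deterministically assign a user to an experiment
// variant by hashing (experimentName, userId). The same user always sees
// the same variant, keeping the trial's groups stable across visits.
public static class ExperimentBucketing
{
    public static string AssignVariant(string userId, string experimentName, string[] variants)
    {
        using var sha = SHA256.Create();
        byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes($"{experimentName}:{userId}"));

        // The first 4 bytes of the hash give a roughly uniform bucket.
        int bucket = Math.Abs(BitConverter.ToInt32(hash, 0) % variants.Length);
        return variants[bucket];
    }
}

// Usage: customers split evenly and consistently between control and treatment.
// var variant = ExperimentBucketing.AssignVariant(
//     "user-42", "pdp-gallery-v2", new[] { "control", "treatment" });
```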
Introducing the wrong product changes could not only cost us millions of dollars in lost sales, but could also hurt our ability to attract new customers and retain existing ones over the long term. Guessing whether a feature works is not good enough: we have to prove it.
Challenge We Have: Continuous Change vs. Reliability
Experimentation is practised throughout FARFETCH. What makes our team different is that we serve some of the highest traffic in requests per minute (RPM). PDP represents a large portion of FARFETCH's revenue, making our efforts an essential part of the business with a huge impact.
What makes our team's mission unique is contending with two opposing business forces - one constantly experiments with different product features, while the other guarantees the reliability of its service delivery. The former pushes for continuous improvement and deployment. The latter pulls back towards performance, consistency and reliability under significant load. It's an extremely tricky balancing act.
For us, creating reliable software means not only that our applications perform well under peak demand, but also that they are fault-tolerant in the face of adversity. Our reputation is at risk every single second.
People First
One of our values at FARFETCH is "Todos Juntos", which is Portuguese for "all together". Thus we treat success as a team sport.
Code Quality Guild
Some years ago, we realised that although many of our dev teams have distinct roles, all of them share common interests. We use the same tools, share the same practices and implement similar code. The problem is that knowledge and experience can become siloed within each team rather than shared, sometimes duplicating learnings, techniques, or even just effort. This is a common phenomenon in large companies.
So, last year we adopted the guild-community concept, inspired by Spotify. A guild consists of a group of developers from a wide range of teams, and everyone is free to join. What makes a guild special is that it explores a particular topic: the group regularly gets together to discuss and share knowledge and experience around it. We assign one or two people as coordinators to help make the monthly events happen, where we discuss research articles, tools and practices that may be useful.
Because our product feature research depends so heavily on the trial and error of experiments, our applications must accommodate rapid evolution. This churn makes code quality critically important. Hence we created the Code Quality Guild, which has explored tools that have produced great results once integrated into our workflow.
Inner Source Model
At FARFETCH, we use the inner source model. This means that we embrace all the best practices and culture of the open source movement applied to our internal code. This allows everyone at FARFETCH to contribute to any repository or simply request new features.
Inner Source Impacts on the PDP
Behind any merge request there's an investment of time and money that we, as maintainers, have to respect. It's not only our job to develop new application features; we must also maintain the contribution process, and we are ultimately responsible for guaranteeing the project's code consistency and quality. We can't simply refuse contributions or demand a different implementation approach after the fact - it's our job to anticipate conflicts and prevent them from reaching the point of a merge request.
As maintainers of a widely contributed project, each year we survey the teams that contribute to it, in order to understand their pain points and define a plan to eliminate them. We focus on evaluating five components of the project workflow: the code, the pipelines, testability, code review and deployment.
Here are some common survey questions:
- Is it easy to work with the code and add new features?
- Is it easy to work with our continuous integration process?
- Is it easy to test the application?
- Do you feel the code review process is working properly?
- Do you have any issues with the deployment process?
At FARFETCH, we are also trying to contribute more and more to open source. We already support some open source projects (e.g., KafkaFlow), and we have a mission to expand this and give more back to the developer community.
The Role of Code Review
Code review is a fundamental part of our workflow. Every feature, no matter how big or small, passes through a formal code walkthrough covering various quality points. One such point validates the functional requirements, making sure the feature works as expected. Another analyses the performance impact the feature may have. Other fundamental points include maintainability, testability and security. Not going to lie: code reviews require a lot of effort.
Nevertheless, what makes code reviews totally worth it, besides guaranteeing code quality, is how they facilitate knowledge sharing. Each participant has the chance to learn and to teach. Senior devs have the opportunity to coach junior devs on programming practices, tools and libraries, while learning more about the development practices and needs of their coworkers.
The broader team also benefits because each one becomes attuned to the best implementation approaches of the collective. Ideally, this leads to implementation consistency. In an inner source model, this consistency enables developers from other teams to easily understand and contribute to each other's projects.
Behavioural Code Analysis
Usually, developers use several tools to guide them through refactoring technical debt. However, these tools focus on code instead of people. Andy Hunt once said, “Code is write once, and read many. So it's usually worth some effort to help make the code more human-readable.”
Behavioural Code Analysis, as proposed by Adam Tornhill, is the Agile counterpart to static code analysis. Instead of focusing on the complexity of the code, it focuses on the people working on the code. It accomplishes this by mining version control metadata to measure code churn - the rate at which the code evolves. With each commit, version control records the lines added and deleted for each modified file, along with who made the change, when, and - through the commit message - why.
In this approach, we no longer care most about the most complex files. Instead, we care about the hotspots - the parts of the source code that change most frequently. Research has shown that frequency of change is a good indicator of code erosion: if a specific part of the code is changed frequently, it has most likely accumulated too many responsibilities. This is where investing in refactoring will be the most profitable.
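To make the idea concrete, here is a minimal, hypothetical sketch of hotspot detection. It reads the output of `git log --numstat --format=` from stdin and ranks files by how often and how heavily they change. Proper tools such as Adam Tornhill's code-maat implement this idea far more thoroughly; this only illustrates the mechanism.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical hotspot sketch: each input line is "added<TAB>deleted<TAB>path",
// as produced by: git log --numstat --format= | dotnet run
public static class Hotspots
{
    public static void Main()
    {
        var revisions = new Dictionary<string, int>(); // file -> commits touching it
        var churn = new Dictionary<string, int>();     // file -> lines added + deleted

        string line;
        while ((line = Console.ReadLine()) != null)
        {
            var parts = line.Split('\t');
            if (parts.Length != 3) continue;           // skip blank lines

            // Binary files report "-" for the counts; treat them as zero churn.
            int added = int.TryParse(parts[0], out var a) ? a : 0;
            int deleted = int.TryParse(parts[1], out var d) ? d : 0;
            string file = parts[2];

            revisions[file] = revisions.GetValueOrDefault(file) + 1;
            churn[file] = churn.GetValueOrDefault(file) + added + deleted;
        }

        // The files changed most often are the prime refactoring candidates.
        foreach (var (file, revs) in revisions.OrderByDescending(kv => kv.Value).Take(10))
            Console.WriteLine($"{revs,5} revisions  {churn[file],7} churned lines  {file}");
    }
}
```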
Another indicator of declining quality is change coupling: class A is implicitly coupled to class B - there is no explicit dependency, but whenever you change one you also have to change the other. Change coupling also applies to functions that change at the same time but are not explicitly linked.
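Change coupling can be mined from the same version control history. Below is a hypothetical sketch that counts how often each pair of files is modified in the same commit; pairs that co-change suspiciously often are candidates for hidden coupling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical change-coupling sketch. Input produced by:
//   git log --name-only --format=@commit | dotnet run
// Each commit starts with a literal "@commit" line, followed by its file paths.
public static class ChangeCoupling
{
    public static void Main()
    {
        var pairCounts = new Dictionary<(string, string), int>();
        var commitFiles = new List<string>();

        string line;
        while ((line = Console.ReadLine()) != null)
        {
            if (line == "@commit")                     // a new commit begins
            {
                CountPairs(commitFiles, pairCounts);
                commitFiles.Clear();
            }
            else if (line.Length > 0)
            {
                commitFiles.Add(line);
            }
        }
        CountPairs(commitFiles, pairCounts);           // flush the final commit

        foreach (var (pair, count) in pairCounts.OrderByDescending(kv => kv.Value).Take(10))
            Console.WriteLine($"{count,4} co-changes  {pair.Item1}  <->  {pair.Item2}");
    }

    // Count every pair of files touched by the same commit.
    static void CountPairs(List<string> files, Dictionary<(string, string), int> counts)
    {
        for (int i = 0; i < files.Count; i++)
            for (int j = i + 1; j < files.Count; j++)
            {
                var key = (files[i], files[j]);
                counts[key] = counts.GetValueOrDefault(key) + 1;
            }
    }
}
```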
The question now arises: how do we refactor this? There is no silver bullet, but there are multiple approaches to consider. One is divide and conquer: we split the entangled code into smaller, independent units. Another is the Principle of Proximity, where the developer keeps the two coupled functions visually close, so that the proximity more readily reminds whoever changes one to change the other.
Test Code Is More Important Than Production Code
"The moral of the story is simple: Test code is just as important as production code. It is not a second-class citizen. It requires thought, design, and care. It must be kept as clean as production code.” - Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship
According to Lehman's law of Continuing Change, a program must be continually adapted or it progressively suffers code rot. In order to survive in our business, we have to take risks and innovate, and under that pressure it's common in our industry to neglect tests because they seem to add friction without bringing value. Or so we think. In reality, they are crucial. Not only do they guarantee that the application logic works as expected, but they also serve as documentation and as an indicator of code quality: complicated test code reflects the complexity and quality of the production code.
Tests are what make us confident that we can deploy the code at any time without trouble. They also help us remember critical business rules that resulted in past bugs. If we don't have this safety net, or we don't have faith in it, we end up with longer release cycles and more team members needed for each release.
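As an illustration, here is what such a safety-net test might look like in xUnit. The business rule is hypothetical - it is not one of our actual PDP rules - but it shows how a past bug can be pinned down as an executable specification.

```csharp
using Xunit;

// Hypothetical regression test: the PDP must never display a negative
// discount, even when a promotion price ends up above the original price.
public class PriceTagTests
{
    [Theory]
    [InlineData(100, 80, 20)]   // regular promotion: 20% off
    [InlineData(100, 100, 0)]   // same price: no discount
    [InlineData(100, 120, 0)]   // promo above original: clamp to zero, never negative
    public void DiscountPercentage_IsNeverNegative(decimal original, decimal promo, decimal expected)
    {
        Assert.Equal(expected, PriceTag.DiscountPercentage(original, promo));
    }
}

// Minimal implementation under test, included to keep the sketch self-contained.
public static class PriceTag
{
    public static decimal DiscountPercentage(decimal original, decimal promo) =>
        original <= 0 ? 0 : System.Math.Max(0, (original - promo) / original * 100);
}
```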
Syntax: Lint and dotnet-format
Considering that code is write-once and read-many, indentation becomes a major quality factor, and striving for its consistency can be a big win. Studies have shown a correlation between code indentation and code complexity: conditions and loops show up in the source as leading tabs and whitespace, so deeper nesting means deeper indentation.
Basically, indentation is a reflection of the code's complexity ("Indentation as a Proxy for Complexity Metrics", The 16th IEEE International Conference on Program Comprehension, ICPC 2008 [HGH08]). It's not merely a question of aesthetics, so treat it with care and avoid a random and inconsistent code shape by enforcing a consistent coding style.
On the front-end of our applications we use linting for static code analysis, a practice widely adopted in the industry. The back-end, however, is still a work in progress: there isn't yet a widely adopted tool. One tool we've just started using in our dotnet applications is dotnet-format. It automatically formats the code based on the rules defined in a specific configuration file, the ".editorconfig".
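As an illustration, here are a few of the rules such an ".editorconfig" might contain - these particular values are hypothetical, not our production configuration:

```ini
# Hypothetical excerpt of an .editorconfig that dotnet-format can enforce.
root = true

[*.cs]
indent_style = space
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true

# C#-specific style rules that dotnet-format fixes automatically.
csharp_new_line_before_open_brace = all
dotnet_sort_system_directives_first = true
```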
We've integrated this tool into a git hook. Whenever a developer commits new changes, dotnet-format checks for styling issues and fixes them automatically, without the developer even noticing. This way we prevent unfortunate surprises when their code is executed within the build and deployment pipelines.
This also allows our code reviews to focus more on feature logic and not syntax. Focus on what matters most.

BenchmarkDotNet
Within the PDP team, we try to stay at the vanguard of technology by constantly researching tools that may improve the quality and efficiency of our work. One tool we've discovered is BenchmarkDotNet. Used by several important projects, such as Roslyn, it's very useful for producing microbenchmarks of the code. Each benchmark includes a variety of measurements, such as mean execution time and memory allocation, across different dotnet runtime frameworks. Through these measurements, one can accurately understand the impact of the code, track its performance and compare it against baseline metrics.
We have integrated it into our workflow to test our mission-critical code, which is the most executed block of program instructions across the entire application. Testing this subset is especially important because any degradation here significantly impacts the overall application's performance.
It was not enough for us to simply integrate this tool; we had to make it usable. By configuring it in a generic way, we enabled any contributor to easily pick it up and start using it in their tasks. Everyone is able to create new benchmarks with almost no configuration. Usability is key here because we want people to use it more and more.
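For illustration, a minimal benchmark looks something like the sketch below. The scenario compared here is hypothetical, not one of our actual benchmarks, but the attributes and runner are the real BenchmarkDotNet API.

```csharp
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Hypothetical benchmark comparing two ways of building a product title.
// [MemoryDiagnoser] adds allocation columns to the report, and
// Baseline = true reports the other method as a ratio of the first.
[MemoryDiagnoser]
public class TitleBenchmarks
{
    private readonly string[] _parts = { "Brand", "Product", "Colour", "Size" };

    [Benchmark(Baseline = true)]
    public string Concatenation()
    {
        var title = string.Empty;
        foreach (var part in _parts)
            title += part + " ";
        return title;
    }

    [Benchmark]
    public string WithStringBuilder()
    {
        var sb = new StringBuilder();
        foreach (var part in _parts)
            sb.Append(part).Append(' ');
        return sb.ToString();
    }
}

public static class Program
{
    // BenchmarkDotNet needs an optimised build: dotnet run -c Release
    public static void Main() => BenchmarkRunner.Run<TitleBenchmarks>();
}
```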
Though we believe that “premature optimisation is the root of all evil”, there are parts of the code - the mission-critical code - that must be carefully tracked. Of course, readability and maintainability are also important: code that is performant but difficult to maintain will inevitably rot and lose its performance edge over time.

Conclusion
One of the great challenges of maintaining the Product Display Page is balancing product experimentation with software reliability. They are conflicting forces: one pushes for constantly shipping new experimental features, while the other pulls towards keeping the application reliable and running on the latest technologies.
As maintainers of the PDP, we have been building a culture that strives for both goals by adopting certain practices and tools:
- Coding is a team sport, so the focus should be first on the people and not the code, as the code mirrors the interactions between them
- Test code is more important than production code, as it is the safety net that lets us bring value to our product faster
- Automate as much as possible, but don't forget about code readability and maintainability!