Protobuf Lab Session

At FARFETCH, we develop and run our own full-fledged, OpenID Connect-compliant identity server. You can read more about it in Authorization and Authentication @ FARFETCH. This allows us to have more control over our users’ data and to tailor every layer of this platform to our business needs, from the storage, where geolocation restrictions apply, all the way up to the design of the flows that impact the user experience.
The need for scalability and reusability across a growing number of teams led this platform’s architecture to evolve from the monolith it once was into a microservice-oriented one. Given our appetite for beautifully designed HTTP APIs and our API-first approach, it shouldn’t come as a surprise that every service of this platform interacts with the others in a REST fashion using JSON payloads.
As the software evolves and we try to keep pace, we naturally create abstractions to deal with the increased complexity these new architectures entail: client SDKs, gateways, parsers, serializers, and so on. While we do this, it’s easy to forget that a simple and innocent POST to the `/connect/token` endpoint, to acquire an access token with some scopes, will fetch numerous resources that are now spread across multiple services, for example users, clients, and identity and API resources.
To study scenarios like this, the FARFETCH ID cluster, a group of teams responsible for the FARFETCH ID platform, invented what is now known as LabSessions. The idea is simple: every month, a person from one of the teams is taken out of the sprint for an entire week to investigate and develop a topic of their own interest, as long as it can potentially bring value to the teams’ products and there is a presentation to the teams at the end of the week. How cool is that?! Because we always strive to squeeze the best performance out of our services, this time we decided to test how the `/connect/token` endpoint would perform if we replaced some of our internal REST endpoints with Protocol Buffers and gRPC.

Protocol Buffers
Protocol Buffers (Protobuf, for short) is a mechanism for serializing structured data, developed by Google, that is both language- and platform-neutral. It solves the same problem as JSON but puts more emphasis on the speed of serialization/deserialization than on the readability of the serialized data.
To better understand the difference between Protobuf and JSON (or XML, for that matter), take a look at the next diagram.

Before serializing, we need to write a schema for our messages in the Protocol Buffer language (normally in a file with a `.proto` extension). This file is then parsed by the Protocol Buffer compiler (`protoc`), which generates classes, in a target programming language, to be included in our application. These classes know how to encode and decode a message. By using different compilers for different languages, Protocol Buffers live up to their language- and platform-neutral claim.
The snippet below is an example of a `.proto` file that defines two messages. The field names are illustrative, borrowed from the canonical example in the official Protobuf documentation.
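
```proto
// search.proto - an illustrative schema; field names follow the canonical
// example from the official Protobuf documentation. Compiling it with, e.g.,
// `protoc --csharp_out=out/ search.proto` generates the corresponding classes.
syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
}

message SearchResponse {
  repeated string results = 1;
}
```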

The format is pretty much self-explanatory, but there is one key point: see those numbers to the right of each field? They are the field numbers. They identify the fields in the message binary format, and they must be unique and, once in use, must never change for the lifetime of the specification. If you follow this rule, messages serialized with old code can be parsed by new code, and messages serialized by new code can be parsed by old code. A special `reserved` statement exists if you want to delete a field and make sure its number doesn’t get reused by mistake later.
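For example, if we later dropped `page_number` from the message above, a cautious revision would reserve both its number and its name (a minimal sketch):

```proto
message SearchRequest {
  reserved 2;             // the former field number of page_number
  reserved "page_number"; // optionally reserve the name as well
  string query = 1;
  int32 results_per_page = 3;
}
```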
gRPC
Protocol Buffers can be plugged into any RPC system, but a natural choice is gRPC. The pairing works particularly well because gRPC is also developed by Google, and the relevant RPC code (server stub and client) can be generated directly from the `.proto` file. This makes it language- and platform-neutral as well.
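A minimal sketch of such a service definition, added to the same `.proto` file (the service name is illustrative):

```proto
service SearchService {
  rpc Search (SearchRequest) returns (SearchResponse);
}
```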

This is an example of a method that receives a `SearchRequest` and returns a `SearchResponse`, both defined above. Self-explanatory.
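From the consumer side, calling the generated client is equally straightforward. Below is a minimal sketch in Python, assuming the stubs were generated from the `search.proto` file above with `grpcio-tools` (module, service and host names all follow from that assumption):

```python
# Hypothetical client sketch. Assumes search_pb2 and search_pb2_grpc were
# generated with:
#   python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. search.proto
import grpc

import search_pb2
import search_pb2_grpc

# Plaintext channel, suitable for local experimentation only.
channel = grpc.insecure_channel("localhost:50051")
stub = search_pb2_grpc.SearchServiceStub(channel)

# The rpc declared in the .proto file surfaces as a plain method call.
response = stub.Search(search_pb2.SearchRequest(query="sneakers"))
print(response.results)
```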
Advantages and drawbacks
Here are some of the key advantages of using Protobuf/gRPC:
- Faster serialization/deserialization - particularly if the message is complex and has lots of arrays and dictionaries;
- Smaller message size - which means less bandwidth;
- RPC support - server RPC interfaces can be declared as part of protocol files;
- Structure validation - messages serialized with Protobuf can be automatically validated by the code responsible for exchanging them;
- A predefined and richer set of data types (messages, scalars, enums, lists, nested types, `oneof`, maps, services, and more; see the sketch after this list).
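As a taste of that last point, here is an illustrative sketch (all names are made up) combining a `oneof`, which guarantees at most one of a group of fields is set, with a `map` field:

```proto
message TokenRequest {
  // Setting one member of the oneof automatically clears the others.
  oneof grant {
    string authorization_code = 1;
    string refresh_token = 2;
  }
  // Free-form key/value pairs.
  map<string, string> parameters = 3;
}
```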
Some of the drawbacks:
- Lack of resources (documentation, blog posts);
- Smaller community;
- Low maturity - especially in some less mainstream languages;
- Non-human-readable - binary format;
- No more cURL, Postman, Swagger, etc. - because gRPC exchanges binary payloads over HTTP/2, the familiar HTTP/1.1 plus JSON tooling no longer applies;
- Load balancing becomes a problem - gRPC’s long-lived HTTP/2 connections defeat naive connection-level balancing; there are, however, solutions.
There are many advantages to using Protocol Buffers in messages between our services, but there are also some drawbacks. That’s why it is important to analyse the use case before deciding which serialization method suits it best.
The `/connect/token` use case
After this brief explanation of what Protocol Buffers and gRPC are and what they bring to the table, let’s dive right into our test case.

The token service, for a simple authorization code flow, consumes various other REST APIs. For this proof of concept, we implemented the `/api-resources` and the `/identity-resources` endpoints as two gRPC services carrying the exact same payload encoded in Protobuf; each of the chosen endpoints is called twice for the sample request we tested. For this comparison, we also implemented the repositories directly in the token service, to simulate the performance of the monolithic architecture (direct access to the database).
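For illustration only, the contract of one of these services could look roughly like the sketch below; the service and message names are hypothetical, not the actual FARFETCH contract:

```proto
// Hypothetical contract sketch; all names are illustrative.
service ApiResources {
  // Looks up the API resources matching the requested scope names.
  rpc FindByScopes (ScopeNames) returns (ApiResourceList);
}

message ScopeNames {
  repeated string names = 1;
}

message ApiResourceList {
  repeated ApiResource resources = 1;
}

message ApiResource {
  string name = 1;
  string display_name = 2;
  repeated string scopes = 3;
}
```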

The tool used to measure the performance was Apache JMeter, and we used a throughput shaping timer to set the desired RPS over 5 minutes, as shown in the chart below.

For a single run on a test server, we achieved an actual throughput very close to the expected one at 25, 100, and 200 RPS, but when it rose to 600 RPS we started to hit the service’s limit.

As you can see in the graphs, we got roughly the same throughput serializing and deserializing with Protocol Buffers as when accessing the database directly to fetch the resources. We can also observe that JSON serialization and HTTP transport decrease the throughput by more than 100 RPS. Let’s also analyse the response times of the service.

It’s clear that the response time (in ms) of the service begins to rise earlier in the test when using HTTP/JSON, as soon as the throughput reaches 200 RPS. Both the gRPC/Protocol Buffers and the shared repository implementations start struggling at the same point, which means it’s a limitation of the service itself. We can also see a small difference between the gRPC implementation and the shared repository, which shows the serialization layer is still there, but its impact on the overall service performance is greatly reduced.
Next steps
As a proof of concept, this test validates the claim that gRPC/Protobuf is indeed faster. We also learned that, up to a certain throughput, the choice of serialization and transport technology doesn’t really matter, because the service will perform the same.
Many factors weigh on the decision to adopt a new technology in our stack, and the limitations around support, readability and tooling must not be overlooked, even more so in the context of a large organization where a service is consumed by many teams and possibly by external entities.
When the time comes to redesign our architecture and tackle the bottlenecks in our most internal APIs, we will certainly consider this change.