
KEDA in Kubernetes for a Stream Processor Service

By Felipe Lino
I've been working in IT since 2006, following the evolution of distributed systems through EAI, SOA, REST microservices, and Kubernetes. I love working at FARFETCH, where I can find my Reebok sneakers.

FARFETCH works with top technologies such as Kubernetes, Kafka, Cassandra, and MongoDB, among others. This article is about KEDA (Kubernetes Event-Driven Autoscaling) and how we found a good way to use it to scale a stream processing application.

Motivation

KEDA with the Prometheus scaler available at FARFETCH opens up a world of possibilities, since we can use any Prometheus query to retrieve metrics for scaling our services.

One motivation to adopt KEDA is the possibility of scaling based on throughput, mainly for stream processing applications, since the HPA (Horizontal Pod Autoscaling) only allows scaling based on CPU and memory.

In this section we are going to cover some basics.

Kafka Metrics

At FARFETCH we collect some metrics regarding Kafka, such as:

  • Kafka lag: the number of messages not yet consumed by a given service at a point in time.

  • Kafka burndown: the expected time that a given message "waits" in a given topic until it is consumed by a service.

  • Producer rate: the number of messages the producer puts on a given topic per second.

  • Consumer rate: the number of messages the consumer processes from a given topic per second.


All of these metrics are available on Prometheus and can be used.

KEDA Algorithm

KEDA calculates the desired replicas for each scaler following the HPA (Horizontal Pod Autoscaling) algorithm, and then scales to the maximum desired replicas across all scalers.

Example:

  • Scaler for CPU - desired replicas: 3

  • Scaler for Memory - desired replicas: 2

  • Scaler for Prometheus - desired replicas: 2

It will scale to 3, based on CPU, since that is the maximum desired replica count.
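In Python, this selection can be sketched as follows (the metric readings here are made up for illustration; the formula is the standard HPA one):

```python
from math import ceil

def desired_replicas(current_replicas, current_metric, desired_metric):
    """Standard HPA formula:
    desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)"""
    return ceil(current_replicas * current_metric / desired_metric)

# Illustrative readings for the three scalers on a 2-replica deployment.
scalers = {
    "cpu":        desired_replicas(2, 90, 60),  # 3 desired replicas
    "memory":     desired_replicas(2, 70, 75),  # 2 desired replicas
    "prometheus": desired_replicas(2, 50, 55),  # 2 desired replicas
}

# KEDA scales to the maximum across all scalers.
print(max(scalers.values()))  # -> 3
```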

The HPA (Horizontal Pod Autoscaling) Algorithm

desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)]

Details are available in the official documentation.

I highlight this because it is important to understand how it works and how to choose a "good" metric on which to base our scaler.

The scale down only happens when the currentMetricValue is less than the desiredMetricValue.

Searching for a "good" metric

Now that we have the foundations (the available metrics and how KEDA and HPA work), we can move on.

The first thing is to choose the best metric to base our scaler.

The problem with very high metric values

Taking the example of Kafka lag: how much lag is acceptable?

Example: 1 thousand messages of lag? 5 thousand, 10k, 30k, 300k?

Let's pretend that we have the following conditions:

  • current instances: 50;

  • max acceptable lag: 5.000;

  • current metric: 30.000

Using the formula:

desiredReplicas = ceil[50 × (30.000 / 5.000)] = 300

We can conclude that very high metric values give us high numbers when scaling up. And scaling down takes too much time if the desiredMetricValue is low compared to the values that the metric can reach.

Also, the jumps are big when scaling up or down, because the distance between the lowest possible value and the highest possible value is huge.
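Plugging these numbers into the HPA formula makes the jumps visible (a quick Python sketch; the 1.000-lag scale-down value is illustrative):

```python
from math import ceil

def desired_replicas(current_replicas, current_metric, desired_metric):
    # HPA: desiredReplicas = ceil(currentReplicas * currentMetric / desiredMetric)
    return ceil(current_replicas * current_metric / desired_metric)

# Raw Kafka lag as the metric: 50 instances, target lag 5.000, current lag 30.000.
print(desired_replicas(50, 30_000, 5_000))  # -> 300 replicas requested at once

# The way back down is just as abrupt: with the lag at an illustrative 1.000,
# the formula collapses 300 replicas to 60 in a single step.
print(desired_replicas(300, 1_000, 5_000))  # -> 60
```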

Conclusions so far…

  • The desiredMetricValue should be as close as possible to the metric's maximum value.

    • For example, when we think of CPU, the desiredMetricValue should be close to 100%.

  • The metric used should fit in a limited interval.

    • Again using CPU as an example, the interval is 0 - 100% (sometimes a little more than 100%).

  • The metric should have "low" numbers.

Trying to limit the interval

Since the Kafka lag can grow to "infinity", and we want a metric that fits the previous conditions:

  1. An interval with limited scalar numbers

  2. An interval with "low" numbers

We decided to try a logarithmic function, so that even when the lag grows, the log of the metric value stays "limited".



Let's look at the graphs of the Kafka lag, its Log2, and its Log10:

  • Kafka Lag - 24 hours

  • Log2 - Kafka Lag - 24 hours

  • Log10 - Kafka Lag - 24 hours

So, even though the Kafka lag varies between 0 and 250K:

  • The Log2 varies between 0 and 18.

  • The Log10 varies between 0 and 6.
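These ranges are easy to verify (250K is the peak lag from the graphs; everything else is stdlib math):

```python
import math

max_lag = 250_000  # peak Kafka lag over the 24-hour window in the graphs

# The raw lag spans 0..250.000, while the log transforms compress it:
print(math.ceil(math.log2(max_lag)))   # -> 18
print(math.ceil(math.log10(max_lag)))  # -> 6
```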

Back to the math to check our desired replicas, if we accept a lag of 5.000 with the same 50 current instances.

Log2

  • desiredMetricValue = log2(5.000) = ~12;

  • currentMetricValue = log2(300.000) = ~18

Log10

  • desiredMetricValue = log10(5.000) = ~4;

  • currentMetricValue = log10(300.000) = ~6 (rounded up)


The maximum number of instances is "limited" by the log of the metric, so we don't care if the lag increases a lot; we are going to limit it in some way (of course, we should also set the max instances in the Kubernetes deployment file).

The Log2 and Log10 graphs have the same shape, but the interval for Log10 is too short: when the metric increases or decreases by 1, the effect is to add or remove many instances at once.
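A small sketch of that effect, using the HPA formula with 50 current instances (the metric values are illustrative points inside each interval):

```python
from math import ceil

def desired_replicas(current_replicas, current_metric, desired_metric):
    # HPA: desiredReplicas = ceil(currentReplicas * currentMetric / desiredMetric)
    return ceil(current_replicas * current_metric / desired_metric)

# Log10 interval (0..6): one unit above the target of 4 is already a big jump.
print(desired_replicas(50, 5, 4))  # -> 63
print(desired_replicas(50, 6, 4))  # -> 75  (+12 instances for a single unit)

# Log2 interval (0..18): a unit step around the target of 12 moves far fewer.
print(desired_replicas(50, 13, 12))  # -> 55
print(desired_replicas(50, 14, 12))  # -> 59  (+4 instances per unit)
```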

Conclusions so far…

  • Log can limit the interval as expected;
  • Intervals that are too short can add/remove many instances at once;
  • The metric should have a balanced interval and shouldn't be too short.

Trying to limit the interval … but not so short

As we saw, comparing Log10 and Log2, Log2 gives us a better interval (0 to 18 vs. 0 to 6). But not enough…

Let's use a multiplier… maybe 7 (the number of perfection):

metric = log2(kafka_lag) × 7

Now we have an interval from 0 to 125.

Again, back to the math to check our desired replicas, if we accept a lag of 5.000 with the same 50 current instances:

  • desiredMetricValue = log2(5.000) × 7 = ~84;

  • currentMetricValue = log2(300.000) × 7 = ~127



Now a difference of +1 or -1 will scale the number of instances smoothly, without big jumps.
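We can check that smoothness with the HPA formula (a Python sketch, using the target of 84 computed above):

```python
from math import ceil

def desired_replicas(current_replicas, current_metric, desired_metric):
    # HPA: desiredReplicas = ceil(currentReplicas * currentMetric / desiredMetric)
    return ceil(current_replicas * current_metric / desired_metric)

target = 84  # log2(5.000) x 7

# Each unit step of the scaled metric moves the replica count gently:
steps = {metric: desired_replicas(50, metric, target) for metric in (84, 85, 86, 87)}
print(steps)  # -> {84: 50, 85: 51, 86: 52, 87: 52}
```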

We found a "good" metric

Finally we have a metric that fits:

  1. Interval with scalar numbers limited

  2. Interval with "low” numbers

  3. Interval isn’t too short

  4. The desiredMetricValue is close to the highest possible value

  5. Expected to add/remove instances smoothly

Real World

The current scenario for our service is a minimum of 4 instances and a maximum of 80, scaling based only on CPU usage.

Adding KEDA and scaling by Kafka Lag + CPU

The accepted lag is 20.000, so let's calculate our desiredMetricValue:

  • desiredMetricValue= log2(20.000) x 7 = ~100

To prevent issues if a given metric isn't available on Prometheus for any reason, we adopted as a good practice to always scale based on CPU as well.

Our final configuration for a ScaledObject (simplified):
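The configuration image from the original post is not reproduced here; below is a minimal sketch of what such a ScaledObject could look like. The resource names, the Prometheus address, and the exporter metric kafka_consumergroup_lag are assumptions for illustration, not the actual FARFETCH setup:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: stream-processor-scaler          # hypothetical name
spec:
  scaleTargetRef:
    name: stream-processor               # hypothetical Deployment name
  minReplicaCount: 4
  maxReplicaCount: 80
  triggers:
    # CPU scaler kept as a safety net, in case the Prometheus metric is unavailable.
    - type: cpu
      metricType: Utilization
      metadata:
        value: "60"                      # illustrative target CPU percentage
    # Prometheus scaler on the log2(lag) x 7 metric; clamp_min avoids negative values.
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # placeholder address
        threshold: "100"                 # log2(20.000) x 7 = ~100
        query: |
          clamp_min(log2(sum(kafka_consumergroup_lag{consumergroup="stream-processor"})), 0) * 7
```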


The clamp_min function avoids negative values, which sometimes happen when we have a "glitch" in the tool that extracts the Kafka lag metric.

Results after some weeks running

The changes were deployed in the middle of June. The following graph shows the daily cost evolution of the service on Kubernetes.



The minimum values are on weekends, and the peak values are lower than in previous weeks. The cost reduction was on the order of 52%.

In the HPA graph for the month of June, we can see that the number of running instances is lower from the middle of the month until the end.

And the Kafka lag graph shows that, even though we are using fewer instances most of the time, we accumulate less lag than before by following the lag's curve.


Trying other metrics

Before reaching the solution using the Kafka lag metric, we tried the others; here is why we discarded them.

Consumer rate x Producer rate

The query is basically the difference between the producer rate and the consumer rate: when the consumer can't keep up with the producer, the service should scale up.

Why discard it? Because the producer rate can be much higher than the consumer rate at some point, so the lag increases a lot and the scale-up happens… But a few minutes later the two rates become very close and the scale-down happens, even though the lag hasn't decreased completely and can still be huge.

Kafka Lag Burndown Seconds

Instead of looking at the lag as a number of messages, we can see it as the time a message will wait in the queue before being processed.

We discarded it because the graph is very unstable and doesn't follow the lag, causing frequent scale-ups and scale-downs.

Final words

The final conclusion we took from this experiment is that your selected metric should fit some rules, and if the job is well done, you can have stable service behavior and save costs at the same time.

And remember that adjusting Kubernetes resources and your application's settings is always an ongoing process, since the behavior of clients and surrounding systems changes over time; and so does your service.

I hope that this article can help you in some way.



