WEBVTT

00:00.000 --> 00:05.000
Hello everyone.

00:05.000 --> 00:07.000
Thank you all for coming.

00:07.000 --> 00:08.000
My name is Augusto.

00:08.000 --> 00:10.000
I am a software engineer at DataDog.

00:10.000 --> 00:13.000
I work on a performance engineering team.

00:13.000 --> 00:17.000
And my work mostly focuses around benchmarking tooling.

00:17.000 --> 00:18.000
Yeah, hello.

00:18.000 --> 00:19.000
My name is Kemal.

00:19.000 --> 00:21.000
And I also work for DataDog.

00:21.000 --> 00:24.000
I am part of the Language Platform team,

00:24.000 --> 00:28.000
which maintains client libraries for instrumentation

00:28.000 --> 00:30.000
with all the languages.

00:30.000 --> 00:33.000
I am, like, accidentally here,

00:33.000 --> 00:37.000
because I was seeing all these PR comments on our PRs

00:37.000 --> 00:40.000
saying, oh, there's a performance regression or whatnot.

00:40.000 --> 00:42.000
And I, like, dived deep.

00:42.000 --> 00:45.000
And I thought it's, like, something cool.

00:45.000 --> 00:47.000
And I asked Augusto, like,

00:47.000 --> 00:49.000
Why aren't we giving a talk about this?

00:49.000 --> 00:52.000
now that there's this whole engineering room here.

00:52.000 --> 00:55.000
And he said, like, why don't you do it?

00:55.000 --> 00:57.000
So now, that's how we are here.

00:57.000 --> 01:00.000
So let's start with the basics.

01:00.000 --> 01:03.000
So performance matters.

01:03.000 --> 01:04.000
Period.

01:04.000 --> 01:08.000
So it matters because, for low latency systems,

01:08.000 --> 01:11.000
we will see some numbers on these,

01:11.000 --> 01:15.000
but if you have a more performant system,

01:15.000 --> 01:17.000
if you pull down your latency,

01:17.000 --> 01:19.000
it will affect the user experience.

01:19.000 --> 01:22.000
And then you will have high throughput on your online systems,

01:22.000 --> 01:24.000
or like the queueing systems.

01:24.000 --> 01:25.000
And in the end,

01:26.000 --> 01:28.000
it's a better user experience.

01:28.000 --> 01:31.000
This could be, like, a UI/UX app,

01:31.000 --> 01:33.000
or it could be any CLI,

01:33.000 --> 01:36.000
or, like, anything where you would feel this latency.

01:36.000 --> 01:40.000
And on that, we, like, have a couple of studies.

01:40.000 --> 01:45.000
You can see them in our references.

01:45.000 --> 01:49.000
So, according to a study from some point in time,

01:49.000 --> 01:52.000
Google measured that, like, a

01:52.000 --> 01:54.000
500 millisecond delay

01:54.000 --> 01:56.000
cost a 20% traffic drop.

01:56.000 --> 01:58.000
Same thing happened with Yahoo:

01:58.000 --> 02:02.000
if you just made things, like, 400 milliseconds faster,

02:02.000 --> 02:05.000
you got, like, 5 to 9% more traffic.

02:05.000 --> 02:07.000
And these numbers are old; probably

02:07.000 --> 02:09.000
people are, like, even more sensitive right now.

02:09.000 --> 02:11.000
And when it comes to the cloud cost,

02:11.000 --> 02:14.000
because of all these performance things happening,

02:14.000 --> 02:18.000
this is, like, 675 billion dollars.

02:18.000 --> 02:20.000
It's hard to imagine, but apparently

02:20.000 --> 02:21.000
there's a market for that.

02:21.000 --> 02:23.000
So if you're in this business, go for it.

02:23.000 --> 02:25.000
So, yeah,

02:25.000 --> 02:27.000
a talk without a quote

02:27.000 --> 02:28.000
wouldn't count.

02:28.000 --> 02:30.000
So this is my first quote.

02:30.000 --> 02:32.000
This is

02:32.000 --> 02:35.000
from the CEO of Shopify.

02:35.000 --> 02:38.000
Like, this is one thing that we need to understand, right?

02:38.000 --> 02:42.000
It's maybe not the first class feature of our software system,

02:42.000 --> 02:45.000
but performance always counts.

02:45.000 --> 02:48.000
And, like, all world-class software is fast.

02:48.000 --> 02:51.000
So when it comes to

02:51.000 --> 02:55.000
how the user perceives these differences and how,

02:55.000 --> 02:56.000
like, they are noticeable,

02:56.000 --> 02:59.000
if it's, like, on the order of hundreds of milliseconds,

02:59.000 --> 03:01.000
it's just minimally noticeable.

03:01.000 --> 03:03.000
But if you go up,

03:03.000 --> 03:05.000
then you can see that, like,

03:05.000 --> 03:07.000
users are actually switching away.

03:07.000 --> 03:08.000
I'm sure you feel this, right?

03:08.000 --> 03:09.000
If you're a developer,

03:09.000 --> 03:11.000
like, even I feel that every day,

03:11.000 --> 03:13.000
every day, because of the CI runs,

03:13.000 --> 03:14.000
I hate CI.

03:14.000 --> 03:16.000
And this is, like, my pain, right?

03:16.000 --> 03:18.000
It says 5 to 10 seconds.

03:18.000 --> 03:21.000
Yeah, some CI systems, 30 minutes.

03:21.000 --> 03:22.000
It's killing me.

03:22.000 --> 03:26.000
So, if you want to get just one thing from this talk,

03:26.000 --> 03:28.000
this is that, like,

03:28.000 --> 03:32.000
write benchmarks, but run them continuously.

03:32.000 --> 03:34.000
Probably,

03:34.000 --> 03:36.000
we're going to talk about that,

03:36.000 --> 03:39.000
but probably the benchmarks you have are

03:39.000 --> 03:40.000
also wrong.

03:40.000 --> 03:42.000
So we'll also talk about

03:42.000 --> 03:44.000
how to make them right.

03:44.000 --> 03:46.000
So, a quick poll,

03:46.000 --> 03:50.000
who here has, like, written a benchmark?

03:50.000 --> 03:51.000
All right?

03:51.000 --> 03:54.000
Yeah, this is what we were counting on.

03:54.000 --> 03:56.000
I should have known this was coming.

03:56.000 --> 04:00.000
So, who has been surprised by the result?

04:00.000 --> 04:02.000
All right.

04:02.000 --> 04:04.000
Okay, we have our audience.

04:04.000 --> 04:06.000
So, first, like,

04:06.000 --> 04:08.000
why is software slow?

04:08.000 --> 04:10.000
Why are these things happening?

04:10.000 --> 04:12.000
First of all, like, CPUs,

04:12.000 --> 04:14.000
don't recognize the better algorithms,

04:14.000 --> 04:17.000
if you are, like, if you have put something wrong in there,

04:17.000 --> 04:19.000
like, CPUs won't make it faster.

04:19.000 --> 04:21.000
Same goes for the compiler.

04:21.000 --> 04:23.000
You cannot, like, rely on your compilers.

04:23.000 --> 04:25.000
If you are using a compiler, that is.

04:25.000 --> 04:27.000
I hope you are not writing Python.

04:27.000 --> 04:28.000
Ruby?

04:28.000 --> 04:29.000
Yeah.

04:29.000 --> 04:30.000
Okay.

04:30.000 --> 04:34.000
So, compilers won't solve all your problems, right?

04:34.000 --> 04:38.000
And then, we always have this notion of, like,

04:38.000 --> 04:40.000
we have asymptotic notation,

04:40.000 --> 04:42.000
so we, like, write our algorithms,

04:42.000 --> 04:44.000
and it's, like, O(n) and whatnot,

04:44.000 --> 04:46.000
but then you don't think about, like,

04:46.000 --> 04:48.000
the cache misses, the branch mispredictions.

04:48.000 --> 04:50.000
So there's this thing called, like,

04:50.000 --> 04:52.000
mechanical sympathy, so you need to understand

04:52.000 --> 04:56.000
the underlying system and optimize your workload for that.

04:56.000 --> 05:00.000
So, according to another, like, recent study,

05:00.000 --> 05:04.000
by, like, giving attention to these details,

05:04.000 --> 05:06.000
they actually made things, like,

05:06.000 --> 05:10.000
60,000 times faster just by, like, tuning these things, right?

05:10.000 --> 05:14.000
These numbers, you know, you can check in the references.

05:14.000 --> 05:15.000
It's cool.

05:15.000 --> 05:17.000
So, how to design benchmarks?

05:17.000 --> 05:20.000
I think there's two major things that you need to focus on.

05:20.000 --> 05:22.000
First, like, whether your benchmark is

05:22.000 --> 05:25.000
representative of your workload,

05:25.000 --> 05:27.000
and then, yeah, your, like, results,

05:27.000 --> 05:29.000
whether your results are repeatable.

05:29.000 --> 05:32.000
And repeatability requires a lot of science

05:32.000 --> 05:35.000
that I'm having a hard time understanding.

05:35.000 --> 05:37.000
I will hand it over to Augusto for that,

05:37.000 --> 05:38.000
when the time comes.

05:39.000 --> 05:42.000
So, like, before we segue to that,

05:42.000 --> 05:45.000
we need to classify what, like,

05:45.000 --> 05:47.000
the benchmarks are, and for that, like,

05:47.000 --> 05:50.000
I call this part the art of programming benchmarks,

05:50.000 --> 05:52.000
because we need to decide whether you would like to have

05:52.000 --> 05:54.000
a micro benchmark that you want to write,

05:54.000 --> 05:56.000
or a macro one.

05:56.000 --> 05:59.000
So, when we say the micro benchmark,

05:59.000 --> 06:02.000
it is, like, isolated functions,

06:02.000 --> 06:05.000
operations, like, nanosecond precision,

06:06.000 --> 06:10.000
the thing that you put next to your, like, unit tests.

06:10.000 --> 06:13.000
If you are using Go, you have an advantage.

06:13.000 --> 06:17.000
There are, like, first-class facilities to do that,

06:17.000 --> 06:19.000
but they are, like, not representative.

06:19.000 --> 06:21.000
They are micro benchmarks.

06:21.000 --> 06:23.000
I should have put a quote here, like,

06:23.000 --> 06:26.000
premature optimization is the root of all evil.

06:26.000 --> 06:29.000
It was mandatory, but I forgot about it.

06:29.000 --> 06:33.000
So, this falls into the same category.

06:33.000 --> 06:35.000
Okay.

06:35.000 --> 06:38.000
Oh, did you miss all the things that I said?

06:38.000 --> 06:39.000
Sorry about that.

06:39.000 --> 06:40.000
Okay.

06:40.000 --> 06:42.000
And then the macro benchmarks:

06:42.000 --> 06:44.000
they are, like, end-to-end test flows,

06:44.000 --> 06:46.000
like, really realistic workloads.

06:46.000 --> 06:49.000
They have higher variance, because you are doing a lot of stuff.

06:49.000 --> 06:51.000
Maybe you are, like, talking to other systems,

06:51.000 --> 06:53.000
but they are actually closer to

06:53.000 --> 06:56.000
what you are doing in production.

06:56.000 --> 06:58.000
And, like, it's hard to isolate the causes.

06:58.000 --> 07:00.000
Like, you can maybe run a continuous profiler,

07:00.000 --> 07:02.000
or take a profiling snapshot

07:02.000 --> 07:04.000
while you are doing this, and try to optimize,

07:04.000 --> 07:06.000
but still, these are the benchmarks.

07:06.000 --> 07:08.000
Like, what you can additionally do is,

07:08.000 --> 07:11.000
like, use a production observability system,

07:11.000 --> 07:13.000
platform, shameless plug,

07:13.000 --> 07:16.000
and then you can, like, discover your production things,

07:16.000 --> 07:18.000
what's going on, and you,

07:18.000 --> 07:21.000
basically know more about these things.

07:21.000 --> 07:25.000
So, choose the right tool for the right job,

07:25.000 --> 07:27.000
but, in the end, it's a mashup,

07:27.000 --> 07:29.000
like, basically use both of those strategies,

07:29.000 --> 07:31.000
and use your judgment and find the bottlenecks,

07:31.000 --> 07:33.000
and optimize.

07:33.000 --> 07:36.000
So, when we talk about, like, representative workloads,

07:36.000 --> 07:37.000
what are they?

07:37.000 --> 07:39.000
They could be your applications,

07:39.000 --> 07:42.000
could be, like, CPU bound, maybe I/O bound,

07:42.000 --> 07:45.000
maybe, like, a mix of those,

07:45.000 --> 07:48.000
maybe you are saturating your memory bandwidth,

07:48.000 --> 07:50.000
like, you need to understand,

07:50.000 --> 07:54.000
and try to find the correct representative workload

07:54.000 --> 07:56.000
for your benchmarks, right?

07:56.000 --> 07:58.000
For that, like,

07:58.000 --> 08:00.000
what we tried to come up with

08:00.000 --> 08:03.000
is some sort of archetypes, right?

08:03.000 --> 08:05.000
While we are designing our macro benchmark tools,

08:05.000 --> 08:07.000
like, it could be an idle app,

08:07.000 --> 08:10.000
mostly sitting there, doing nothing.

08:10.000 --> 08:14.000
It could be, like, low latency or latency sensitive application,

08:14.000 --> 08:17.000
or maybe it's, like, a throughput sensitive application,

08:17.000 --> 08:19.000
or it's a mixed bag, right?

08:19.000 --> 08:23.000
We call it enterprise, like, you have a bit of everything.

08:23.000 --> 08:25.000
And then we run all these things,

08:25.000 --> 08:27.000
and try to get all the numbers,

08:27.000 --> 08:29.000
and try to react to that.

08:29.000 --> 08:31.000
Like, we are obsessed with this,

08:31.000 --> 08:33.000
because, like, all the libraries that we write,

08:33.000 --> 08:35.000
we put them in the production services

08:35.000 --> 08:37.000
of our, like, users,

08:37.000 --> 08:39.000
and then if something happens,

08:39.000 --> 08:42.000
like, they actually blame us.

08:42.000 --> 08:43.000
Like, I included that,

08:43.000 --> 08:45.000
and, okay, like, this broke my system,

08:45.000 --> 08:48.000
because, yeah, we just made this observable.

08:48.000 --> 08:51.000
So, yeah, we are trying to be super sensitive about it,

08:51.000 --> 08:54.000
and tune all our workloads for that.

08:54.000 --> 08:56.000
So, how to design benchmarks,

08:56.000 --> 08:58.000
this is where the science starts.

08:58.000 --> 09:00.000
So, I'm leaving that to Augusto.

09:00.000 --> 09:01.000
Okay.

09:01.000 --> 09:04.000
So, we have seen that benchmarks have to be repeatable

09:04.000 --> 09:06.000
and representative.

09:06.000 --> 09:08.000
We are going to focus on the repeatable part now.

09:08.000 --> 09:10.000
And the idea is to take a benchmark

09:10.000 --> 09:12.000
that wasn't repeatable in the beginning,

09:12.000 --> 09:15.000
and try to make it better with a few changes that we're going to walk through.

09:15.000 --> 09:17.000
And often, in my job,

09:17.000 --> 09:19.000
people come to me saying that, oh, I got this benchmark,

09:19.000 --> 09:21.000
and it's not repeatable.

09:21.000 --> 09:22.000
And what can I do?

09:22.000 --> 09:26.000
And this is usually my first reaction to this.

09:27.000 --> 09:30.000
And the benchmark itself has the goal of

09:30.000 --> 09:32.000
measuring dd-trace-java's,

09:32.000 --> 09:35.000
which is one of our libraries,

09:35.000 --> 09:38.000
instrumentation overhead on a Spring app.

09:38.000 --> 09:39.000
And it's a really simple app,

09:39.000 --> 09:41.000
just made for this benchmark.

09:41.000 --> 09:44.000
The system under test was the app

09:44.000 --> 09:46.000
instrumented or not with dd-trace-java,

09:46.000 --> 09:48.000
so we could measure the overhead.

09:48.000 --> 09:52.000
The workload was as many requests as possible

09:52.000 --> 09:54.000
by five concurrent users.

09:54.000 --> 09:57.000
We wanted to measure the system at a really high load.

09:57.000 --> 09:59.000
And we separated it into two stages,

09:59.000 --> 10:03.000
a 20-second warm-up and 15 seconds of actual measurements.

10:03.000 --> 10:05.000
Cool.

10:05.000 --> 10:07.000
We have these initial results,

10:07.000 --> 10:09.000
and they look right.

10:09.000 --> 10:10.000
We have the warm-up,

10:10.000 --> 10:13.000
and it seems like we have cut the data points

10:13.000 --> 10:16.000
at a place that makes sense.

10:16.000 --> 10:19.000
But actually, we had many false positives,

10:19.000 --> 10:22.000
so the benchmarks were telling us that we had some improvements,

10:23.000 --> 10:26.000
when in fact there wasn't really an improvement,

10:26.000 --> 10:29.000
and we had a really high coefficient of variation,

10:29.000 --> 10:33.000
and this is just a way to measure the variation of a sample.

10:33.000 --> 10:36.000
It's the standard deviation divided by the mean.

10:36.000 --> 10:41.000
And it was 11.80%, which was unacceptable for this use case.
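
NOTE
[Editor's sketch] The coefficient of variation just described, as a tiny
Python example; the sample values here are made up for illustration:
  import statistics
  samples = [21.3, 19.8, 22.1, 18.9, 20.4]  # e.g. latencies in milliseconds
  cv = statistics.stdev(samples) / statistics.mean(samples)  # stdev divided by the mean
  print(f"CV = {cv:.2%}")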

10:41.000 --> 10:44.000
So the first question that I asked myself was,

10:44.000 --> 10:46.000
okay, are we running the benchmark long enough?

10:46.000 --> 10:48.000
Is this really the warm-up?

10:48.000 --> 10:50.000
Turns out that no.

10:50.000 --> 10:55.000
It actually took way over 15 seconds.

10:55.000 --> 11:00.000
It took a little bit above 150 seconds to warm up.

11:00.000 --> 11:04.000
And so tip number one is to run your benchmarks for long enough,

11:04.000 --> 11:07.000
so that you can uncover perturbations,

11:07.000 --> 11:11.000
and one such perturbation has to do with warm-up, okay?

11:11.000 --> 11:13.000
Cool.

11:13.000 --> 11:17.000
Ideally, we would be able to run benchmarks for a really long time,

11:17.000 --> 11:21.000
and get consistent results, but we don't have infinite money to do that.

11:21.000 --> 11:23.000
So we have to choose when to stop,

11:23.000 --> 11:25.000
and for how long we should run the benchmarks.

11:25.000 --> 11:28.000
To know that, you have to experiment.

11:28.000 --> 11:31.000
And here's an example of experiments that we do.

11:31.000 --> 11:34.000
We take the measurements and cut them at different steps

11:34.000 --> 11:37.000
to understand how the variation behaves.

11:37.000 --> 11:42.000
So for example, at 30 measurements, we have a 7% coefficient of variation,

11:42.000 --> 11:49.000
and when we go to 90, we have a way lower coefficient of variation of 4.6%.

11:49.000 --> 11:50.000
Okay?
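
NOTE
[Editor's sketch] The cut-off experiment described above: compute the
coefficient of variation over growing prefixes of the measurement series
and watch where it stabilizes. load_measurements() is a hypothetical helper.
  import statistics
  def cv(xs):
      return statistics.stdev(xs) / statistics.mean(xs)
  measurements = load_measurements()  # hypothetical: returns a list of latency samples
  for cutoff in (30, 60, 90):
      print(cutoff, f"{cv(measurements[:cutoff]):.2%}")  # CV at each cut-off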

11:50.000 --> 11:53.000
But it all depends on the benchmark that you want to do,

11:53.000 --> 11:55.000
and what you want to measure.

11:55.000 --> 12:00.000
Tip number two is therefore to collect enough samples

12:00.000 --> 12:03.000
to reduce intra-run variation.

12:03.000 --> 12:08.000
And a good rule of thumb is to have at least 30 samples for that.

12:08.000 --> 12:11.000
Okay, but if you run this benchmark several times,

12:11.000 --> 12:14.000
you might even still get different results.

12:14.000 --> 12:17.000
There is something we call inter-run variation,

12:17.000 --> 12:19.000
when you run your benchmark multiple times,

12:19.000 --> 12:21.000
and you see different results like this.

12:21.000 --> 12:25.000
This is a screenshot from a paper that we reference internally,

12:25.000 --> 12:27.000
and it's measuring

12:27.000 --> 12:31.000
the impact of the initial state on FFT benchmarks,

12:31.000 --> 12:33.000
and you can see that on every run,

12:33.000 --> 12:37.000
separated by the vertical lines, we have a different latency measurement.

12:38.000 --> 12:40.000
We can do the same for our experiment,

12:40.000 --> 12:42.000
I ran it five times,

12:42.000 --> 12:45.000
and by plotting the graph in a similar way,

12:45.000 --> 12:47.000
we see that the same effect happens.

12:47.000 --> 12:51.000
On every run, we have different measurements,

12:51.000 --> 12:54.000
so we have to take that into account.

12:54.000 --> 12:55.000
When we see the results,

12:55.000 --> 12:58.000
and you don't really need to read all of the table,

12:58.000 --> 13:00.000
we just focus on the mean values,

13:00.000 --> 13:03.000
which concentrate around the same value of 20 milliseconds,

13:03.000 --> 13:06.000
so we know the benchmark is consistent,

13:06.000 --> 13:08.000
and if we aggregate everything,

13:08.000 --> 13:10.000
so in the last row,

13:10.000 --> 13:13.000
we see that the coefficient of variation is way lower,

13:13.000 --> 13:17.000
at 3%, which is already a good spot.

13:17.000 --> 13:20.000
Tip number three is therefore to rerun benchmarks,

13:20.000 --> 13:23.000
so that you reduce this inter-run variation.

13:23.000 --> 13:26.000
A good rule of thumb is to run it five times,

13:26.000 --> 13:28.000
but again, this is something that we use internally,

13:28.000 --> 13:31.000
depends on your problem.
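
NOTE
[Editor's sketch] Rerunning and aggregating, under the stated rule of thumb
of five runs; run_benchmark() is a hypothetical helper returning one run's
list of samples:
  import statistics
  runs = [run_benchmark() for _ in range(5)]   # five independent benchmark runs
  pooled = [x for run in runs for x in run]    # aggregate all samples, as in the last table row
  pooled_cv = statistics.stdev(pooled) / statistics.mean(pooled)
  print(f"aggregated CV = {pooled_cv:.2%}")    # captures inter-run variation too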

13:31.000 --> 13:33.000
By applying these three tips,

13:33.000 --> 13:36.000
the coefficient of variation went down from 12% to 3%,

13:36.000 --> 13:38.000
which is already great,

13:38.000 --> 13:42.000
but there are more tips that I would like to briefly talk about.

13:42.000 --> 13:44.000
I don't know if we will have enough time.

13:44.000 --> 13:47.000
The first one is to use deterministic inputs.

13:47.000 --> 13:49.000
If you have random inputs,

13:49.000 --> 13:51.000
you will have random outputs,

13:51.000 --> 13:54.000
so that is not good for repeatable benchmarks.
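
NOTE
[Editor's sketch] Deterministic inputs in practice: seed the random
generator so every run of the benchmark sees exactly the same payloads
(the size and count here are arbitrary):
  import random
  rng = random.Random(42)                               # fixed seed
  payloads = [rng.randbytes(1024) for _ in range(100)]  # identical on every run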

13:54.000 --> 13:56.000
And the next one is to use load generators

13:56.000 --> 13:58.000
that avoid coordinated omission.

13:58.000 --> 14:02.000
And coordinated omission is when your load generator synchronizes

14:02.000 --> 14:04.000
with the system under test,

14:04.000 --> 14:08.000
in a way that you get artificial latency results.

14:08.000 --> 14:10.000
So if you have a slow system,

14:10.000 --> 14:12.000
the load generator slows down,

14:12.000 --> 14:15.000
and then you have artificially better latencies.

14:15.000 --> 14:18.000
And if you want to measure your system at a high load,

14:18.000 --> 14:21.000
as it is in our case, this won't cut it.

14:21.000 --> 14:23.000
There is a really good talk by Gil Tene about

14:23.000 --> 14:24.000
how not to measure latency,

14:24.000 --> 14:26.000
where he goes really deep into the subject,

14:26.000 --> 14:29.000
so I highly recommend it.
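
NOTE
[Editor's sketch] An open-loop load generator that avoids coordinated
omission: requests are scheduled on a fixed timetable and latency is
measured from the intended start time, so a slow system under test cannot
slow the generator down and hide its own latency. send_request() is a
hypothetical stand-in for the system under test.
  import time
  interval = 0.010                         # target rate: one request every 10 ms
  start = time.perf_counter()
  latencies = []
  for i in range(1000):
      intended = start + i * interval      # scheduled send time
      while time.perf_counter() < intended:
          pass                             # wait for the schedule, never for the system
      send_request()                       # hypothetical call to the system under test
      latencies.append(time.perf_counter() - intended)  # includes queueing delay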

14:30.000 --> 14:33.000
All right, so we have a well-designed benchmark.

14:33.000 --> 14:34.000
You are getting results,

14:34.000 --> 14:37.000
but you cannot rely on simple aggregated data points

14:37.000 --> 14:39.000
to tell the whole story.

14:39.000 --> 14:41.000
You've got to look at the entire data,

14:41.000 --> 14:44.000
and aggregate values won't tell you that.

14:44.000 --> 14:46.000
To show you an example,

14:46.000 --> 14:48.000
suppose you're working on a project,

14:48.000 --> 14:50.000
and you make several commits,

14:50.000 --> 14:53.000
and at some point you decide to make an improvement,

14:53.000 --> 14:55.000
so you make an improvement,

14:55.000 --> 14:58.000
and you see some improvement on the throughput.

14:58.000 --> 15:01.000
But is it really an improvement?

15:01.000 --> 15:03.000
By looking at the mean throughput,

15:03.000 --> 15:04.000
you can't really know.

15:04.000 --> 15:07.000
Zooming in on these two last commits

15:07.000 --> 15:09.000
and their benchmarking results,

15:09.000 --> 15:12.000
we can see that there was maybe an improvement,

15:12.000 --> 15:17.000
but if we plot all of the data points on a histogram like this,

15:17.000 --> 15:20.000
there is quite a lot of overlap.

15:20.000 --> 15:23.000
So maybe if we had run this experiment another time,

15:23.000 --> 15:26.000
we wouldn't have reached the same conclusion.

15:27.000 --> 15:30.000
And we have to somehow evaluate

15:30.000 --> 15:33.000
how big the difference between the first and the second one is

15:33.000 --> 15:36.000
to make an adequate conclusion.

15:36.000 --> 15:38.000
So the question is,

15:38.000 --> 15:41.000
how can we tell if the difference is big enough?

15:41.000 --> 15:42.000
First of all,

15:42.000 --> 15:44.000
we see that there is some spread in the data,

15:44.000 --> 15:48.000
so we have to compare the difference with the noise.

15:48.000 --> 15:50.000
So you can set up,

15:50.000 --> 15:51.000
wait up.

15:51.000 --> 15:52.000
I forgot about this,

15:52.000 --> 15:53.000
but yeah,

15:53.000 --> 15:55.000
I won't get too much into the math.

15:55.000 --> 15:57.000
We can just focus on the intuition for now,

15:57.000 --> 15:58.000
because otherwise,

15:58.000 --> 16:00.000
we would be here for a long time.

16:00.000 --> 16:02.000
So we can set up a simple ratio like this

16:02.000 --> 16:04.000
to tell if the difference is big enough.

16:04.000 --> 16:09.000
So we compare the difference with the noise.

16:09.000 --> 16:11.000
We can call this guy T,

16:11.000 --> 16:13.000
just to simplify things.

16:13.000 --> 16:16.000
If T is bigger than some critical value,

16:16.000 --> 16:17.000
then we can say that yeah,

16:17.000 --> 16:19.000
there was an improvement.

16:19.000 --> 16:22.000
How to define this critical value?

16:23.000 --> 16:24.000
Mathematicians,

16:24.000 --> 16:25.000
have thought about this,

16:25.000 --> 16:28.000
and it turns out that you can plug in a false positive rate,

16:28.000 --> 16:30.000
which we call alpha into a formula,

16:30.000 --> 16:32.000
and you get a critical value.

16:32.000 --> 16:34.000
And this false positive rate is something

16:34.000 --> 16:36.000
that you can control

16:36.000 --> 16:37.000
to,

16:37.000 --> 16:38.000
as a tradeoff,

16:38.000 --> 16:39.000
to detect more things,

16:39.000 --> 16:41.000
or to detect less things.

16:41.000 --> 16:44.000
But you are also going to detect more false positives,

16:44.000 --> 16:45.000
or less.

16:45.000 --> 16:47.000
It depends on what you want to do.

16:47.000 --> 16:48.000
So for example,

16:48.000 --> 16:50.000
if we have a high false positive rate,

16:50.000 --> 16:52.000
we are going to detect more things,

16:52.000 --> 16:54.000
but also more false positives.

16:54.000 --> 16:56.000
Cool.

16:56.000 --> 16:58.000
And we can write the critical value

16:58.000 --> 16:59.000
as a function of alpha.

16:59.000 --> 17:00.000
And mathematicians,

17:00.000 --> 17:01.000
as I said,

17:01.000 --> 17:03.000
they worked really hard on this.

17:03.000 --> 17:05.000
There are formulas that you can look up,

17:05.000 --> 17:08.000
and the critical value is denoted like this.

17:08.000 --> 17:09.000
And in fact,

17:09.000 --> 17:11.000
this is a hypothesis test,

17:11.000 --> 17:13.000
and it's the t-test specifically,

17:13.000 --> 17:16.000
and this is why the name of that ratio is T.

17:16.000 --> 17:18.000
But that's all for this talk.

17:18.000 --> 17:21.000
I won't get too much deeper into it.
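
NOTE
[Editor's sketch] The difference-versus-noise ratio described above is the
two-sample t-test; a minimal version with SciPy, where the baseline and
candidate throughput samples are made up for illustration:
  from scipy import stats
  baseline = [101.2, 99.8, 100.5, 98.9, 100.1]     # samples before the change (made up)
  candidate = [103.9, 104.4, 102.8, 105.0, 103.1]  # samples after the change (made up)
  t, p = stats.ttest_ind(candidate, baseline, equal_var=False)  # Welch's t-test
  alpha = 0.05                                     # chosen false positive rate
  if p < alpha:
      print("statistically significant change, t =", t)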

17:21.000 --> 17:23.000
There are other approaches for that.

17:23.000 --> 17:25.000
One example is change point detection.

17:25.000 --> 17:26.000
And for that,

17:26.000 --> 17:28.000
you can check out Enrico's upcoming talk

17:28.000 --> 17:31.000
at 1:50 p.m. in this room.

17:31.000 --> 17:34.000
So be sure to check that out.

17:34.000 --> 17:38.000
Tip number six is therefore to use statistics.

17:38.000 --> 17:39.000
For example,

17:39.000 --> 17:41.000
hypothesis testing,

17:41.000 --> 17:43.000
to see if your performance improvements

17:43.000 --> 17:46.000
or regressions were statistically significant.

17:47.000 --> 17:49.000
Cool.

17:49.000 --> 17:51.000
But what if you run your experiments

17:51.000 --> 17:54.000
in the morning and then in the afternoon,

17:54.000 --> 17:56.000
especially if you're using cloud environments,

17:56.000 --> 17:59.000
you're likely going to get different results.

17:59.000 --> 18:01.000
The mitigation for that

18:01.000 --> 18:03.000
is to control your benchmarking environment.

18:03.000 --> 18:05.000
And this is, in fact,

18:05.000 --> 18:07.000
so important that it's actually tip number zero.

18:07.000 --> 18:10.000
Otherwise, all of the other ones are not going to work.

18:10.000 --> 18:14.000
And we are going to try to see how we can do that.

18:14.000 --> 18:16.000
Okay?

18:16.000 --> 18:18.000
Are you still with us?

18:18.000 --> 18:20.000
The numbers part is kind of over.

18:20.000 --> 18:21.000
Yeah.

18:21.000 --> 18:22.000
So you can relax.

18:22.000 --> 18:24.000
Now we have a story.

18:24.000 --> 18:26.000
So imagine you're a researcher,

18:26.000 --> 18:29.000
and you're trying to measure the speed of

18:29.000 --> 18:31.000
a subatomic particle called the neutrino.

18:31.000 --> 18:33.000
And the neutrino is a really small thing.

18:33.000 --> 18:36.000
You can't really measure it on your laptop on a desk.

18:36.000 --> 18:38.000
So you've got to build a huge tunnel.

18:38.000 --> 18:39.000
And we had people here,

18:39.000 --> 18:41.000
giving the previous talk,

18:41.000 --> 18:43.000
Max, which was an amazing talk

18:43.000 --> 18:44.000
that talked about CERN.

18:44.000 --> 18:47.000
So you build a tunnel from CERN to

18:47.000 --> 18:48.000
Gran Sasso.

18:48.000 --> 18:50.000
A 730 kilometer tunnel.

18:50.000 --> 18:53.000
So you can measure the speed of the neutrino.

18:53.000 --> 18:54.000
And it takes a long time.

18:54.000 --> 18:55.000
And a lot of money.

18:55.000 --> 18:58.000
But when you start getting results and

18:58.000 --> 18:59.000
people are publishing papers,

18:59.000 --> 19:00.000
everyone is happy.

19:00.000 --> 19:03.000
But at a certain point,

19:03.000 --> 19:06.000
you start breaking the laws of physics, right?

19:06.000 --> 19:08.000
And this shouldn't really happen.

19:08.000 --> 19:10.000
So you double check everything.

19:10.000 --> 19:12.000
You check the math, the sensors.

19:12.000 --> 19:14.000
You check and double check everything.

19:14.000 --> 19:16.000
Until you come across the root cause,

19:16.000 --> 19:18.000
which isn't revolutionary physics:

19:18.000 --> 19:20.000
it's a loose cable.

19:20.000 --> 19:24.000
So this goes to show that really small things

19:24.000 --> 19:28.000
can impact highly controlled environments in a big way.

19:28.000 --> 19:30.000
Okay?

19:30.000 --> 19:34.000
Most of us aren't building 730 kilometer tunnels.

19:34.000 --> 19:36.000
Some of us are.

19:36.000 --> 19:39.000
But we have to deal with these,

19:39.000 --> 19:41.000
so to speak, loose cables,

19:41.000 --> 19:45.000
every day when we are working with software performance.

19:45.000 --> 19:50.000
This is a non-exhaustive list of sources of noise and

19:50.000 --> 19:52.000
mitigations that we can do.

19:52.000 --> 19:54.000
We are not going to talk about all of them.

19:54.000 --> 19:55.000
Only these ones.

19:55.000 --> 19:56.000
Okay?

19:56.000 --> 19:57.000
This is already quite a lot.

19:57.000 --> 20:00.000
So I have to go a little bit fast.

20:00.000 --> 20:01.000
All right.

20:01.000 --> 20:02.000
First one, virtualization.

20:02.000 --> 20:05.000
The problem with virtualization on cloud environments

20:05.000 --> 20:08.000
has to do with the noisy neighbor problem,

20:08.000 --> 20:10.000
which means that on a single host,

20:10.000 --> 20:15.000
you will have multiple tenants competing for resources.

20:15.000 --> 20:20.000
And this will give you non-repeatable benchmarks.

20:20.000 --> 20:24.000
The mitigation for that is to use bare-metal cloud instances,

20:24.000 --> 20:27.000
and besides getting rid of the noisy neighbor problem,

20:27.000 --> 20:31.000
the kernel and CPU layer mitigations that we are going to see

20:31.000 --> 20:35.000
on the next slide require bare-metal access.

20:35.000 --> 20:39.000
This basically means you have access to the hardware of a dedicated machine.

20:39.000 --> 20:40.000
Okay?

20:40.000 --> 20:42.000
These are the kernel level mitigations.

20:42.000 --> 20:45.000
They have to do mainly with scheduling and caching.

20:45.000 --> 20:49.000
They are setting CPU affinity, process priority,

20:49.000 --> 20:51.000
and warming up or dropping caches.

20:51.000 --> 20:53.000
Here we are dropping caches.

20:53.000 --> 20:57.000
And I won't get into the commands themselves,

20:57.000 --> 21:00.000
but this is something that you can copy and paste,

21:00.000 --> 21:04.000
and immediately have a more repeatable setup.
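
NOTE
[Editor's sketch] Typical Linux commands for the kernel-level mitigations
named here (a sketch, not necessarily the speakers' exact setup; the core
id and priority are placeholders):
  taskset -c 2 ./benchmark        # pin the benchmark to one core (CPU affinity)
  sudo nice -n -20 ./benchmark    # raise the process priority
  sync && echo 3 | sudo tee /proc/sys/vm/drop_caches   # drop page cache, dentries, inodes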

21:05.000 --> 21:07.000
With regards to the CPU layer,

21:07.000 --> 21:11.000
we have simultaneous multi-threading contention,

21:11.000 --> 21:13.000
and dynamic frequency scaling.

21:13.000 --> 21:17.000
They are a little bit tricky, so we are going to see each one separately.

21:17.000 --> 21:19.000
So simultaneous multi-threading,

21:19.000 --> 21:22.000
SMT is when you have multiple hardware threads running on the same core.

21:22.000 --> 21:24.000
So you are going to have some resource contention

21:24.000 --> 21:27.000
if you have CPU bound processes.

21:29.000 --> 21:33.000
To disable SMT, you can simply run this command on Unix machines,

21:34.000 --> 21:37.000
and the impact of disabling it was measured by a really simple experiment.
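
NOTE
[Editor's note] On Linux, this is presumably the kind of command meant,
disabling SMT at runtime via sysfs:
  echo off | sudo tee /sys/devices/system/cpu/smt/control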

21:37.000 --> 21:39.000
So we had a bare metal instance,

21:39.000 --> 21:42.000
and with dynamic frequency scaling disabled,

21:42.000 --> 21:45.000
which is the other CPU layer mitigation.

21:45.000 --> 21:50.000
We ran two CPU bound tasks on the same core versus separate cores.

21:50.000 --> 21:54.000
And we can see clearly that when we are running things on the same core,

21:54.000 --> 21:57.000
there is contention, so the latency is higher.

21:57.000 --> 22:00.000
And if we look at the numbers,

22:01.000 --> 22:04.000
the coefficient of variation goes down significantly

22:04.000 --> 22:08.000
from 24% to 0.24%.

22:08.000 --> 22:11.000
So this is 100 times less variation,

22:11.000 --> 22:15.000
which is huge for a really simple trick that you can do.

22:15.000 --> 22:20.000
Dynamic frequency scaling is when your CPU works harder for harder tasks,

22:20.000 --> 22:23.000
and works less hard for easier ones,

22:23.000 --> 22:25.000
so that you save energy.

22:25.000 --> 22:28.000
But this is terrible for repeatability, right?

22:28.000 --> 22:30.000
So it matches the workload,

22:30.000 --> 22:33.000
and you don't really want that for repeatable experiments.

22:33.000 --> 22:36.000
To disable it, it is a little bit more complicated.

22:36.000 --> 22:40.000
You have to pin the clock rate to a frequency that you want,

22:40.000 --> 22:42.000
then you have to control the scaling governor,

22:42.000 --> 22:44.000
which is a kernel subsystem,

22:44.000 --> 22:48.000
to make it use this clock rate,

22:48.000 --> 22:52.000
and then you have to consider things like frequency boosting,

22:52.000 --> 22:55.000
which are features of modern CPUs as well,

22:55.000 --> 22:57.000
and you have to disable them.
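
NOTE
[Editor's sketch] The three DFS-related steps on Linux; the pinned frequency
is a machine-specific placeholder, and cpupower comes from the kernel tools
package:
  sudo cpupower frequency-set -g performance         # set the scaling governor
  sudo cpupower frequency-set -d 2.0GHz -u 2.0GHz    # pin the clock rate
  echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo   # disable frequency boosting (Intel)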

22:59.000 --> 23:03.000
The impact of disabling DFS was also measured by an experiment,

23:03.000 --> 23:09.000
and here we're measuring the latency on a varying number of CPU

23:09.000 --> 23:14.000
bound tasks on the same core with DFS on versus DFS off.

23:14.000 --> 23:18.000
And here, of course, when we have DFS,

23:18.000 --> 23:21.000
we have a smaller latency because the CPU works harder,

23:21.000 --> 23:26.000
but we can see that the variation is quite a bit bigger when we have DFS,

23:26.000 --> 23:28.000
which we don't want.

23:28.000 --> 23:31.000
Again, you don't really need to look at all of the table,

23:31.000 --> 23:35.000
but you can see that the coefficient of variation drops significantly,

23:35.000 --> 23:37.000
when we are looking at DFS for one task,

23:37.000 --> 23:40.000
in comparison with no DFS for one task.

23:40.000 --> 23:43.000
In this case, it's 10 times less variation.

23:43.000 --> 23:46.000
And of course, there is a caveat that these experiments were made,

23:46.000 --> 23:48.000
specifically to test this,

23:48.000 --> 23:53.000
but it was a way for us to test the boundaries of these mitigations.

23:53.000 --> 23:57.000
Those experiments were made by my manager, in fact,

23:57.000 --> 24:00.000
so kudos to him.

24:00.000 --> 24:03.000
And these CPU tweaks,

24:03.000 --> 24:13.000
the CPU level tweaks were mostly sourced from Denis Bakhvalov's book

24:13.000 --> 24:16.000
about performance analysis and tuning on modern CPUs,

24:16.000 --> 24:18.000
which is a really good reference on this.

24:18.000 --> 24:22.000
There is another source of noise,

24:22.000 --> 24:24.000
which has to do with vibration.

24:24.000 --> 24:28.000
And the mitigation for it is to not shout at the data center.

24:28.000 --> 24:30.000
And in fact, it actually happens.

24:30.000 --> 24:32.000
So this is Brendan Gregg, performance engineering legend,

24:32.000 --> 24:34.000
shouting at the data center,

24:34.000 --> 24:37.000
and seeing some disk latency spikes.

24:37.000 --> 24:40.000
So don't shout at the data center.

24:43.000 --> 24:47.000
So after all this cool stuff is built,

24:47.000 --> 24:49.000
and all the math is done, whatnot,

24:49.000 --> 24:51.000
as a developer, what do we see?

24:52.000 --> 24:55.000
So first of all, let's check out the architecture.

24:55.000 --> 24:57.000
This is a simple one.

24:57.000 --> 25:00.000
What we do is, like, GitLab is an implementation detail,

25:00.000 --> 25:03.000
but that's where our CI pipelines work.

25:03.000 --> 25:06.000
We run these on Kubernetes runners,

25:06.000 --> 25:10.000
and collect the data, store it in our internal artifact

25:10.000 --> 25:12.000
storage, S3, whatnot, for, like,

25:12.000 --> 25:14.000
later analysis.

25:14.000 --> 25:17.000
We have a specific, purpose-built UI

25:17.000 --> 25:18.000
where we analyze them.

25:18.000 --> 25:21.000
We also get their metrics out of them,

25:21.000 --> 25:24.000
and push that to our dashboards.

25:24.000 --> 25:26.000
And then the action part comes.

25:26.000 --> 25:30.000
They're like, all these statistical analysis tools run

25:30.000 --> 25:32.000
on the data that we aggregated,

25:32.000 --> 25:35.000
and they send comments to the GitHub,

25:35.000 --> 25:38.000
PRs, or they block our releases.

25:38.000 --> 25:41.000
So you can think of the whole system:

25:41.000 --> 25:44.000
the action part is, like, the alerting part

25:44.000 --> 25:46.000
of a monitoring system,

25:46.000 --> 25:49.000
and the rest of the things are just like collecting data,

25:49.000 --> 25:51.000
having dashboards whatnot.

25:51.000 --> 25:54.000
So that feedback loop, how does it work?

25:54.000 --> 25:56.000
As we talked about at the beginning,

25:56.000 --> 25:59.000
we segregate them into two classes:

25:59.000 --> 26:03.000
the microbenchmarks and the macrobenchmarks.

26:03.000 --> 26:07.000
We run microbenchmarks for each PR.

26:07.000 --> 26:10.000
These are benchmarks that are written by the person;

26:10.000 --> 26:13.000
think of people that actually write these client libraries,

26:13.000 --> 26:15.000
and they know what to measure,

26:15.000 --> 26:18.000
and they write these benchmarks for themselves,

26:18.000 --> 26:21.000
and it gets run.

26:21.000 --> 26:23.000
We do the statistical analysis.

26:23.000 --> 26:27.000
And if something is not looking right,

26:27.000 --> 26:30.000
we send these comment messages to the PRs.

26:30.000 --> 26:32.000
And these are actually like,

26:32.000 --> 26:34.000
basically merge gates,

26:34.000 --> 26:39.000
so you cannot merge the PR without like addressing these concerns.

26:39.000 --> 26:40.000
This could be anything,

26:40.000 --> 26:43.000
like maybe you're increasing the memory,

26:43.000 --> 26:48.000
you are like introducing higher CPU usage whatnot.

26:48.000 --> 26:52.000
In theory, yes, sometimes you are adding a feature,

26:52.000 --> 26:54.000
and then you need to say that, okay,

26:54.000 --> 26:57.000
like, this is something I was expecting,

26:57.000 --> 26:59.000
but most of the times,

26:59.000 --> 27:02.000
it's, like, what did I mess up,

27:02.000 --> 27:04.000
and you go back, run the benchmarks locally,

27:04.000 --> 27:06.000
collect some profiles,

27:06.000 --> 27:08.000
and try to optimize these things.

27:08.000 --> 27:10.000
And the second flow

27:10.000 --> 27:13.000
is about running the macro benchmarks.

27:13.000 --> 27:16.000
These are the representative loads that I mentioned previously,

27:16.000 --> 27:18.000
with the archetypes and everything,

27:18.000 --> 27:20.000
and then we run them.

27:20.000 --> 27:23.000
If this is a virtual machine environment,

27:23.000 --> 27:25.000
we even, like, warm up the machines,

27:25.000 --> 27:27.000
as, like, Augusto mentioned,

27:27.000 --> 27:30.000
we have different types of archetypes,

27:30.000 --> 27:32.000
plus, like, lower

27:32.000 --> 27:35.000
or higher throughput,

27:36.000 --> 27:38.000
like, we run a bunch of scenarios.

27:38.000 --> 27:40.000
And then depending on that,

27:40.000 --> 27:45.000
we also set SLOs in our benchmark environment,

27:45.000 --> 27:50.000
and if those variations are like not breaching our SLOs,

27:50.000 --> 27:53.000
then we allow releasing these libraries.

27:53.000 --> 27:54.000
But if they do,

27:54.000 --> 27:56.000
then we go back,

27:56.000 --> 27:59.000
and focus on how we optimize those things.

27:59.000 --> 28:01.000
This is,

28:01.000 --> 28:05.000
how it looks:

28:05.000 --> 28:07.000
like we get notified by blocking releases,

28:07.000 --> 28:09.000
just a Slack message.

28:09.000 --> 28:11.000
Everybody uses Slack nowadays,

28:11.000 --> 28:13.000
so.

28:13.000 --> 28:17.000
And then we have dashboards on the SLOs that we set,

28:17.000 --> 28:21.000
and we also can go and analyze what is going on,

28:21.000 --> 28:23.000
depending on the scenarios,

28:23.000 --> 28:26.000
and we have, like, different service levels,

28:26.000 --> 28:28.000
and then we actually,

28:28.000 --> 28:30.000
even though we don't breach anything,

28:30.000 --> 28:34.000
we also check out the trends and where they go,

28:34.000 --> 28:38.000
and we have this thing called operational excellence

28:38.000 --> 28:39.000
meetings,

28:39.000 --> 28:40.000
and we check these dashboards,

28:40.000 --> 28:42.000
and try to understand what is the trend,

28:42.000 --> 28:43.000
why this is happening,

28:43.000 --> 28:46.000
and we try to be proactive

28:46.000 --> 28:50.000
and fix these issues before they hit production.

28:50.000 --> 28:52.000
So.

28:52.000 --> 28:54.000
This is what we built.

28:54.000 --> 28:56.000
It's not open source,

28:56.000 --> 28:58.000
unfortunately, not yet.

28:59.000 --> 29:01.000
This is not something we're going to sell,

29:01.000 --> 29:04.000
so we plan to make it open source.

29:04.000 --> 29:05.000
But right now,

29:05.000 --> 29:07.000
these are the tools that you can actually

29:07.000 --> 29:08.000
use.

29:08.000 --> 29:10.000
Like, hyperfine is a CLI benchmarking tool.

29:10.000 --> 29:11.000
It already handles

29:11.000 --> 29:13.000
running your benchmark a couple of times,

29:13.000 --> 29:15.000
and collecting the statistics,

29:15.000 --> 29:18.000
and it does this better than most of the tools.
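
NOTE
[Editor's example] A typical hyperfine invocation; it takes care of warm-up
runs, repetitions, and the summary statistics (the benchmarked command is a
placeholder):
  hyperfine --warmup 3 --runs 10 './my-benchmark'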

29:18.000 --> 29:20.000
Or if you're working with Go,

29:20.000 --> 29:22.000
Go's benchmarking facilities already

29:22.000 --> 29:24.000
do this for you.

29:24.000 --> 29:26.000
And I think,

29:27.000 --> 29:30.000
like, Henry is going to mention the GitHub Action benchmark

29:30.000 --> 29:31.000
in more detail,

29:31.000 --> 29:33.000
but it's something that you can immediately use right now,

29:33.000 --> 29:36.000
if you're running your benchmark on GitHub Actions.

29:38.000 --> 29:39.000
And like,

29:39.000 --> 29:42.000
what are the key takeaways in the end,

29:42.000 --> 29:44.000
with all these things in mind?

29:44.000 --> 29:47.000
So control your benchmarking environment,

29:47.000 --> 29:49.000
so this is the most important part,

29:49.000 --> 29:53.000
I guess: like, try to run your benchmarks

29:53.000 --> 29:55.000
on a bare-metal machine, in isolation,

29:55.000 --> 29:57.000
disabling SMT,

29:57.000 --> 29:59.000
and disabling DFS.

29:59.000 --> 30:00.000
Unfortunately,

30:00.000 --> 30:03.000
you can't do that in a virtual machine environment,

30:03.000 --> 30:05.000
so like out of the box,

30:05.000 --> 30:07.000
GitHub action runners won't help you,

30:07.000 --> 30:09.000
so you need a solution for that.

30:09.000 --> 30:13.000
And design your benchmarks to be representative,

30:13.000 --> 30:14.000
and like repeatable,

30:14.000 --> 30:17.000
that's the other most important thing

30:17.000 --> 30:19.000
that Augusto already mentioned,

30:19.000 --> 30:21.000
like there are a lot of variations going on.

30:21.000 --> 30:24.000
And then, like, interpreting the benchmark results

30:24.000 --> 30:27.000
is the most important part in the end,

30:27.000 --> 30:29.000
maybe there could be false positives;

30:29.000 --> 30:31.000
it's like there's no way to automate these things,

30:31.000 --> 30:34.000
and you just need to use your best judgment,

30:34.000 --> 30:38.000
or maybe just re-run your benchmarks again.

30:38.000 --> 30:40.000
And then for us,

30:40.000 --> 30:43.000
like from a user's perspective,

30:43.000 --> 30:46.000
integrating benchmarks into your workflow,

30:46.000 --> 30:49.000
and continuously optimizing your workloads,

30:49.000 --> 30:52.000
and be aware of like basically having

30:52.000 --> 30:54.000
a performance engineering mindset,

30:54.000 --> 30:57.000
it pays dividends in the long run.

30:57.000 --> 30:59.000
But maybe the most important part,

30:59.000 --> 31:01.000
don't shout at your data centers.

31:01.000 --> 31:05.000
Maybe we don't use HDDs anymore,

31:05.000 --> 31:07.000
but now we have AI agents in there,

31:07.000 --> 31:10.000
so make sure you don't shout at them.

31:10.000 --> 31:12.000
So, with that, thanks a lot.

31:12.000 --> 31:26.000
I fully understand wanting to reduce the noise and variance,

31:26.000 --> 31:30.000
but if I'm targeting desktop and laptop machines for my software,

31:30.000 --> 31:31.000
I feel like for example,

31:31.000 --> 31:33.000
disabling multi-threading is kind of lying to myself,

31:33.000 --> 31:37.000
because I'm not representing the machine as it is.

31:37.000 --> 31:39.000
So, what do you do with that?

31:40.000 --> 31:42.000
I couldn't entirely hear the question.

31:42.000 --> 31:45.000
So, if I'm targeting desktop and laptop machines,

31:45.000 --> 31:48.000
I feel like if I disable multi-threading to reduce the noise,

31:48.000 --> 31:49.000
it does reduce the noise,

31:49.000 --> 31:52.000
but then I'm not being true to the actual machine that I would run.

31:52.000 --> 31:53.000
Yeah. In this case,

31:53.000 --> 31:55.000
there is a clear tradeoff.

31:55.000 --> 32:00.000
There is a clear tradeoff between representativeness and repeatability, right?

32:00.000 --> 32:04.000
So, in our specific case,

32:04.000 --> 32:07.000
where we need to have repeatability,

32:07.000 --> 32:11.000
as we are running benchmarks on every PR and every release,

32:11.000 --> 32:12.000
we needed to disable them.

32:12.000 --> 32:15.000
And if you need more representativeness,

32:15.000 --> 32:18.000
as we do for some tests, we keep them on,

32:18.000 --> 32:21.000
and we see what happens if we have those things on.

32:21.000 --> 32:26.000
So, I think the best answer is to run everything,

32:26.000 --> 32:28.000
but that is probably not possible.

32:28.000 --> 32:32.000
So, it depends on what you want to really focus on.

32:32.000 --> 32:35.000
One of the things that I do, like, locally, is

32:35.000 --> 32:39.000
use Docker Compose, and try to isolate these resources

32:39.000 --> 32:41.000
across the Docker containers,

32:41.000 --> 32:43.000
and, as much as possible, trying

32:43.000 --> 32:44.000
to get rid of that contention.

32:44.000 --> 32:45.000
Yeah.

32:45.000 --> 32:46.000
I touched this thing,

32:46.000 --> 32:47.000
and you said,

32:47.000 --> 32:50.000
constantly, that has another rise.

32:50.000 --> 32:51.000
There are,

32:51.000 --> 32:53.000
I think, running it now on the team

32:53.000 --> 32:55.000
and dealing with partners.

32:55.000 --> 32:57.000
Yeah, thank you for the question.

32:57.000 --> 32:59.000
This is a really important point.

32:59.000 --> 33:00.000
Sorry.

33:00.000 --> 33:02.000
I had one more question.

33:02.000 --> 33:03.000
So, during your talk,

33:03.000 --> 33:05.000
you mentioned that, like,

33:05.000 --> 33:08.000
so an important aspect of benchmarking is to test.

33:08.000 --> 33:10.000
So, we have to assess the environment,

33:10.000 --> 33:11.000
making sure it is well controlled.

33:11.000 --> 33:12.000
And, as you said,

33:12.000 --> 33:14.000
we don't know how, or for how long, we want to test,

33:14.000 --> 33:16.000
and it costs money to test.

33:16.000 --> 33:18.000
So, the question I had in mind was that

33:18.000 --> 33:20.000
was there any perspective of,

33:20.000 --> 33:22.000
instead of testing,

33:22.000 --> 33:23.000
predicting the results?

33:23.000 --> 33:25.000
So, for example, using machine learning models,

33:25.000 --> 33:27.000
or deploying models to predict the result,

33:27.000 --> 33:29.000
which would maybe cost less,

33:29.000 --> 33:30.000
and would be accurate.

33:30.000 --> 33:33.000
So, I want to ask if you have any information,

33:33.000 --> 33:35.000
or opinion about that.

33:35.000 --> 33:38.000
My team has already researched

33:38.000 --> 33:41.000
a little bit about performance models.

33:41.000 --> 33:44.000
We have not

33:44.000 --> 33:46.000
gotten too deep into it,

33:46.000 --> 33:48.000
because I think it was less costly

33:48.000 --> 33:50.000
to simply get the benchmarks running.

33:50.000 --> 33:52.000
It would take too much developer time

33:52.000 --> 33:54.000
for us to focus on this problem itself.

33:54.000 --> 33:56.000
So, we had this in mind.

33:56.000 --> 33:58.000
We're probably going to look into it,

33:58.000 --> 34:01.000
probably as we run more and more benchmarks.

34:01.000 --> 34:02.000
But for now,

34:02.000 --> 34:04.000
it is not something that we have looked into,

34:04.000 --> 34:07.000
but that is a really good thing to look into.

34:07.000 --> 34:09.000
If it can be done well,

34:09.000 --> 34:11.000
you can save a lot of money,

34:11.000 --> 34:14.000
and really predict,

34:14.000 --> 34:15.000
like,

34:15.000 --> 34:18.000
maybe not at a really accurate level,

34:18.000 --> 34:19.000
but predict, in a general sense,

34:19.000 --> 34:21.000
the performance of your system.

34:21.000 --> 34:23.000
Yeah.

34:23.000 --> 34:24.000
Thank you.

34:24.000 --> 34:26.000
Are there any other questions?

34:28.000 --> 34:29.000
Okay, then.

34:29.000 --> 34:30.000
Thank you for your presentation.

34:30.000 --> 34:31.000
Thank you.

34:31.000 --> 34:32.000
Thank you.

