WEBVTT

00:00.000 --> 00:14.000
Hi everyone, my name is Alex. Welcome, everyone who woke up so early to visit this room —

00:14.000 --> 00:19.000
this is the Software Performance devroom, and I'm the room organizer. It's not the best

00:19.000 --> 00:24.000
time slot, but fine — we didn't have enough of them for us.

00:24.000 --> 00:31.000
So, for today I wanted to talk about one interesting topic — interesting to me, at least —

00:31.000 --> 00:37.000
about software performance, but not about actual benchmarks or state-of-the-art performance

00:37.000 --> 00:39.000
in several domains, something like that.

00:39.000 --> 00:46.000
I wanted to talk about how accessible this software performance is for regular people.

00:46.000 --> 00:55.000
First, a few words about me: I was a regular C++ engineer, right now I'm a Rust engineer.

00:55.000 --> 01:02.000
I'm interested in different compiler stuff, especially LLVM; I prefer LLVM over GCC for several reasons.

01:02.000 --> 01:11.000
I spent several years in the C++ standardization committee, I'm the author of the awesome-pgo project, if you've heard about it,

01:11.000 --> 01:17.000
I'm the room organizer, and I actually like fast software.

01:17.000 --> 01:22.000
So, what do we usually hear about software performance?

01:22.000 --> 01:27.000
Usually, first, different benchmarks: good benchmark reports and not-so-good benchmark reports —

01:27.000 --> 01:33.000
please don't trust them blindly. Then more serious stuff: different engineering blogs,

01:33.000 --> 01:40.000
with posts like "we rewrote something, usually in Rust, and it became blazingly fast";

01:40.000 --> 01:49.000
hardcore low-level optimization stuff, possibly from the FFmpeg guys — rewriting something in assembly

01:49.000 --> 01:56.000
and achieving great performance for conversions; and academic papers like, for example, simdjson —

01:56.000 --> 02:02.000
a pretty novel technique that was invented just a few years ago.

02:02.000 --> 02:09.000
But one point is frequently missed: how easily all of this software performance equipment

02:09.000 --> 02:15.000
can be used in everyday life. For example, check this one.

02:15.000 --> 02:24.000
These are two titles from Phoronix, actually about performance improvements for the Linux kernel,

02:24.000 --> 02:35.000
and they are pretty significant. But can you say right now which performance improvement is ready to use today,

02:35.000 --> 02:44.000
and which one is really not achievable even after a year or two with current tooling?

02:44.000 --> 02:53.000
That's the whole idea of my talk. So, let's start with compilers, compilation models, and profile-driven optimizations.

02:53.000 --> 03:00.000
So, actually, there are two major compilation models — the ahead-of-time and the just-in-time compilation model —

03:01.000 --> 03:07.000
and there are quite a lot of differences between them.

03:07.000 --> 03:14.000
For example, the ahead-of-time compilation model is not as limited in time as just-in-time

03:14.000 --> 03:20.000
regarding how much time we can spend performing different optimizations.

03:20.000 --> 03:27.000
We can spend several hours, in extreme cases even days, optimizing one binary,

03:27.000 --> 03:32.000
if we can achieve, for example, one additional percent of performance.

03:32.000 --> 03:37.000
In the just-in-time model it's impossible, because it usually runs on the target machine;

03:37.000 --> 03:43.000
it has a limited time frame to perform optimizations, et cetera.

03:43.000 --> 03:48.000
However, the just-in-time model has one advantage.

03:48.000 --> 03:54.000
It can collect a running profile on the target machine:

03:54.000 --> 03:58.000
which paths of the code are executed, how frequently, et cetera.

03:58.000 --> 04:02.000
And this information can be used during the compiler's optimizations.

04:02.000 --> 04:08.000
For example, more precise inlining — inlining is one of the most important optimizations —

04:08.000 --> 04:15.000
hot-cold splitting for better utilization of the CPU instruction cache, et cetera.

04:15.000 --> 04:20.000
Unfortunately, in the ahead-of-time world, that's not available, right?

04:20.000 --> 04:26.000
It's not available because on the target machine we don't have a virtual machine

04:26.000 --> 04:30.000
which can collect these metrics; we just have our own binary.

04:30.000 --> 04:37.000
So, for the ahead-of-time world, a technology was implemented which is called profile-guided optimization (PGO).

04:37.000 --> 04:40.000
The idea is practically the same:

04:40.000 --> 04:46.000
we need to collect a profile on a target machine, pass it to the compiler, possibly convert it, and that's it.
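
As a rough sketch of that loop with Clang — a minimal example where the file names and the training workload are placeholders:

  # 1. Build with instrumentation; raw profiles land in ./profraw
  clang -O2 -fprofile-generate=./profraw app.c -o app
  # 2. Run a representative workload to produce .profraw files
  ./app --typical-workload
  # 3. Merge the raw profiles into one indexed profile
  llvm-profdata merge -output=app.profdata ./profraw/*.profraw
  # 4. Rebuild using the collected profile
  clang -O2 -fprofile-use=app.profdata app.c -o app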

04:47.000 --> 04:51.000
Is PGO that important? Do we need to care about it?

04:51.000 --> 04:57.000
So, I collected some benchmarks as part of the awesome-pgo project, and as you see,

04:57.000 --> 05:00.000
the performance improvements are pretty huge.

05:00.000 --> 05:07.000
Sometimes even 2x — that one, for example, is about MongoDB; not so good a database,

05:07.000 --> 05:14.000
I would say, at least from this perspective. For different libraries, compilers,

05:14.000 --> 05:20.000
code analyzers, databases, et cetera, the improvements are really huge.

05:20.000 --> 05:23.000
So, we need to care about it.

05:23.000 --> 05:36.000
However — so, in theory, enabling PGO is a simple process: just recompile your project

05:36.000 --> 05:44.000
with special compiler switches, run the target workload, collect profiles, and that's it.

05:44.000 --> 05:50.000
However, if you try to do it, you will get this.

05:50.000 --> 05:59.000
You will get a lot, a lot of additional problems, which almost don't exist in the just-in-time world.

06:00.000 --> 06:04.000
At first, you will have a double or triple compilation pipeline.

06:04.000 --> 06:09.000
In extreme cases, you will need to compile your ahead-of-time binary four times

06:09.000 --> 06:15.000
in some extreme PGO scenarios. It's a pretty huge overhead on CI.

06:15.000 --> 06:19.000
You need to think about profile skew between different workloads.

06:19.000 --> 06:26.000
You need to think about merging profiles for the same binary, but from different workloads.

06:26.000 --> 06:32.000
You need to think about profile storage, if you want to reproduce your binary, et cetera, et cetera.

06:32.000 --> 06:36.000
There are a lot of additional problems. You need to solve them.
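
For the merging problem specifically, llvm-profdata can at least combine runs; a sketch with hypothetical workload names and illustrative weights:

  # merge profiles of the same binary from two workloads,
  # weighting the server workload higher than the batch one
  llvm-profdata merge \
    --weighted-input=3,server.profraw \
    --weighted-input=1,batch.profraw \
    -o merged.profdata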

06:36.000 --> 06:39.000
There are some systems that can help you with that.

06:39.000 --> 06:46.000
For example, there is a dedicated way to do PGO which is called sampling PGO.

06:46.000 --> 06:50.000
It eliminates the instrumentation overhead of instrumentation PGO.

06:50.000 --> 06:54.000
Instrumentation PGO can have an overhead of several times

06:54.000 --> 06:59.000
at runtime; sampling PGO can have one percent, for example.

06:59.000 --> 07:06.000
Unfortunately, for using sampling PGO you need a bunch of additional tooling — one more thing

07:06.000 --> 07:10.000
that you need to install, update, et cetera.

07:10.000 --> 07:16.000
For example, if you want to use Parca, or Yandex Perforator —

07:16.000 --> 07:22.000
a system-wide profiler open-sourced by Yandex, which supports profiling

07:22.000 --> 07:29.000
the whole server fleet of binaries and using these profiles during the profile-guided optimization phase.

07:29.000 --> 07:32.000
It can be integrated on CI, et cetera, et cetera.
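
For a flavor of what a sampling-PGO round looks like with plain perf and LLVM — a sketch; exact tool names vary by setup, and -b needs LBR support on the CPU:

  perf record -b -- ./app --typical-workload
  llvm-profgen --binary=./app --perfdata=perf.data --output=app.prof
  clang -O2 -fprofile-sample-use=app.prof app.c -o app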

07:32.000 --> 07:39.000
By the way, Yandex open-sourced what Google simply didn't with their own system,

07:39.000 --> 07:45.000
Google-Wide Profiler — what many other big companies are actually doing internally.

07:45.000 --> 07:51.000
However, as you see, you need to maintain an additional amount of infrastructure.

07:51.000 --> 07:56.000
For example, a PostgreSQL cluster, a ClickHouse cluster, and

07:56.000 --> 07:58.000
some extra storage.

07:58.000 --> 08:01.000
It's not that friendly.

08:01.000 --> 08:09.000
So, you need to do a lot — really, a lot — of additional work

08:09.000 --> 08:20.000
just to mimic the optimizations of the just-in-time world.

08:20.000 --> 08:25.000
Usually we think that just-in-time optimizes worse than ahead-of-time.

08:25.000 --> 08:32.000
However, for workload-specific optimizations, just-in-time is just simpler to use;

08:32.000 --> 08:35.000
compared to PGO, just-in-time really just works.

08:35.000 --> 08:42.000
For example, it's in every browser with V8, the V8 JavaScript engine — that's it.

08:42.000 --> 08:51.000
So, we can try to eliminate at least part of this complexity by proper documentation.

08:51.000 --> 08:56.000
I thought so. However, there are a lot of traps on this path too.

08:56.000 --> 09:01.000
For example, performance documentation can take several forms.

09:01.000 --> 09:04.000
It can have the form of really good books,

09:04.000 --> 09:11.000
like "Systems Performance" by Brendan Gregg, the famous "Hacker's Delight", and Intel architecture-specific optimization manuals.

09:11.000 --> 09:15.000
Or any other form, like an official performance book,

09:15.000 --> 09:21.000
project-specific optimization guidelines, Reddit stuff, YouTube coding influencers,

09:21.000 --> 09:24.000
and a lot of different videos.

09:24.000 --> 09:27.000
Even this talk.

09:27.000 --> 09:31.000
But there is one issue with all of that.

09:31.000 --> 09:37.000
Unfortunately, people just don't read it, don't watch it, and don't listen to it.

09:37.000 --> 09:45.000
And here we go once again: we try to document something, like all of this PGO stuff

09:45.000 --> 09:51.000
and how to avoid all the problems, and people still will not do it.

09:51.000 --> 09:55.000
Just because we don't care, we don't want to read, we don't have time,

09:55.000 --> 09:59.000
we have work-life balance, etc.

09:59.000 --> 10:05.000
And just-in-time is simply better from this perspective.

10:05.000 --> 10:13.000
Still, I tried to mitigate this for several frameworks.

10:13.000 --> 10:19.000
I tried to push a bit more optimizations upstream,

10:19.000 --> 10:22.000
into the documentation of these libraries.

10:22.000 --> 10:26.000
Ratatui is the most popular terminal user interface library in Rust.

10:26.000 --> 10:30.000
They have a dedicated optimization guideline.

10:30.000 --> 10:36.000
Another application framework in Rust also has a dedicated optimization guideline —

10:36.000 --> 10:38.000
Tauri.

10:38.000 --> 10:40.000
Almost the same.

10:40.000 --> 10:46.000
However, I wanted to check how effective this solution is — just putting up

10:46.000 --> 10:50.000
documentation and crossing your fingers.

10:50.000 --> 10:56.000
So, I checked a lot of GitHub projects,

10:56.000 --> 11:02.000
and just found that the authors of all these projects,

11:02.000 --> 11:06.000
who have written their applications,

11:06.000 --> 11:10.000
didn't enable the optimizations from the documentation

11:10.000 --> 11:12.000
that existed at the moment.

11:12.000 --> 11:16.000
And when I created PRs enabling all of these optimizations,

11:16.000 --> 11:18.000
they happily accepted them.

11:18.000 --> 11:20.000
Almost all of them.

11:20.000 --> 11:24.000
I had a pretty high conversion rate — like 90%.

11:24.000 --> 11:28.000
That's for the Ratatui part —

11:28.000 --> 11:32.000
for the Ratatui documentation.

11:32.000 --> 11:40.000
And even when you have contributed

11:40.000 --> 11:42.000
some changes to the documentation,

11:42.000 --> 11:44.000
like I did, for example, for Ratatui —

11:44.000 --> 11:46.000
it was my contribution —

11:46.000 --> 11:48.000
there is one more problem.

11:48.000 --> 11:52.000
It will work — probably work —

11:52.000 --> 11:54.000
only for newly created applications.

11:54.000 --> 11:58.000
And all applications in the ecosystem which already exist

11:58.000 --> 12:00.000
highly likely will not be updated,

12:00.000 --> 12:04.000
simply because the developers of these applications

12:04.000 --> 12:08.000
will not notice an update in the documentation of their framework.

12:08.000 --> 12:10.000
They don't care.

12:10.000 --> 12:14.000
They just wrote their application once, and that's it.

12:14.000 --> 12:20.000
And even if they read the documentation carefully —

12:20.000 --> 12:22.000
the documentation doesn't cover,

12:22.000 --> 12:24.000
in many cases, really important details.

12:24.000 --> 12:28.000
For example, that's why the awesome-pgo project was created:

12:28.000 --> 12:34.000
because I tried to apply PGO to several databases.

12:34.000 --> 12:38.000
Let's say PostgreSQL, SQLite, etc.

12:38.000 --> 12:42.000
And unfortunately, I found so many issues in the current

12:42.000 --> 12:44.000
documentation in the PGO ecosystem.

12:44.000 --> 12:48.000
I have already spent three years discovering

12:48.000 --> 12:52.000
all the issues, hidden gems, and traps

12:52.000 --> 12:56.000
on the way, and I'm still not finished.

12:56.000 --> 13:00.000
In many cases, the documentation is outdated,

13:00.000 --> 13:04.000
and even worse, the documentation is outdated in such a way

13:04.000 --> 13:07.000
that you cannot understand that it's outdated.

13:07.000 --> 13:13.000
For example, there was a manual in Clang

13:13.000 --> 13:17.000
on how to use PGO — like a simple PGO manual.

13:17.000 --> 13:22.000
And this manual described two PGO modes,

13:22.000 --> 13:27.000
called front-end PGO (FE PGO) and IR PGO —

13:27.000 --> 13:31.000
intermediate representation PGO,

13:31.000 --> 13:33.000
as in LLVM IR.

13:33.000 --> 13:41.000
So, by default, Clang was recommending using front-end PGO.

13:41.000 --> 13:45.000
That was the guideline, the official guideline from the compiler,

13:45.000 --> 13:49.000
and only at the very end of the instructions

13:49.000 --> 13:53.000
was there a small note written, like: an alternative way

13:53.000 --> 13:57.000
to use PGO is IR PGO.
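
For reference, the two modes differ in a single compiler switch, which is exactly why the trap is so easy to fall into:

  # front-end PGO (what the old Clang docs recommended first):
  clang -O2 -fprofile-instr-generate app.c -o app
  # IR PGO (the mode that is actually maintained and recommended):
  clang -O2 -fprofile-generate app.c -o app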

13:57.000 --> 14:01.000
However, I randomly found an issue

14:01.000 --> 14:07.000
saying that, actually, front-end PGO is deprecated.

14:07.000 --> 14:12.000
And no one had even written a note anywhere,

14:12.000 --> 14:15.000
in the documentation or wherever.

14:15.000 --> 14:17.000
It was internal knowledge at Google,

14:17.000 --> 14:21.000
because Google was implementing most of the PGO stuff in LLVM.

14:21.000 --> 14:24.000
And they simply didn't put up a note, and that's it.

14:24.000 --> 14:27.000
And the issue was created in 2020,

14:27.000 --> 14:32.000
and only three years after that did I discover this change —

14:32.000 --> 14:35.000
only three years — and I needed to additionally ask

14:35.000 --> 14:38.000
the PGO developers from Google

14:38.000 --> 14:44.000
on the LLVM Discourse forum, and they answered my question.

14:44.000 --> 14:49.000
So, unfortunately, there are already open source projects,

14:49.000 --> 14:54.000
pretty huge open source projects, that have already integrated PGO.

14:54.000 --> 14:58.000
And they integrated it in the wrong way:

14:58.000 --> 15:02.000
they integrated front-end PGO, and they were using it

15:02.000 --> 15:04.000
for many, many years.

15:04.000 --> 15:06.000
Maybe it wasn't that critical,

15:06.000 --> 15:09.000
maybe they didn't need that performance?

15:09.000 --> 15:13.000
Yes, unfortunately, they did.

15:13.000 --> 15:17.000
When the Clang documentation was changed,

15:17.000 --> 15:21.000
it pushed them hard to change a lot of things

15:21.000 --> 15:24.000
in their PGO guidelines.

15:24.000 --> 15:29.000
After some time, the YugabyteDB developers

15:29.000 --> 15:34.000
somehow found it, probably because I reported an issue

15:34.000 --> 15:35.000
to the upstream.

15:35.000 --> 15:40.000
And they decided to implement PGO for their database,

15:40.000 --> 15:44.000
and they got an additional 10% improvement.

15:44.000 --> 15:50.000
So, for many years — I guess at least two or three years —

15:50.000 --> 15:54.000
YugabyteDB was missing an additional 10%

15:54.000 --> 15:58.000
of performance for themselves and their users,

15:58.000 --> 16:01.000
just because the documentation was —

16:01.000 --> 16:05.000
I wouldn't say lying — it was updated,

16:05.000 --> 16:10.000
but it was updated in a pretty dirty way.

16:10.000 --> 16:16.000
And sometimes the documentation is simply lying.

16:16.000 --> 16:19.000
I had such a case with SQLite.

16:19.000 --> 16:23.000
So, when I started investigating PGO for different databases,

16:23.000 --> 16:27.000
of course, I started from PostgreSQL,

16:27.000 --> 16:30.000
MySQL, and someday I came to SQLite.

16:30.000 --> 16:34.000
SQLite is a fast database, so I wanted to optimize it even more.

16:34.000 --> 16:39.000
SQLite's documentation had a dedicated note

16:39.000 --> 16:43.000
that PGO doesn't help to optimize SQLite.

16:44.000 --> 16:46.000
It was a dedicated note.

16:46.000 --> 16:50.000
Of course, I tested it on my own hardware,

16:50.000 --> 16:53.000
and I got a 10% improvement.

16:53.000 --> 16:58.000
Of course, I reported it to the upstream; they didn't believe me.

16:58.000 --> 17:02.000
I needed to perform a bunch of additional benchmarks

17:02.000 --> 17:04.000
on different compilers.

17:04.000 --> 17:07.000
At first, I started with Clang.

17:07.000 --> 17:09.000
They asked: please use GCC.

17:09.000 --> 17:11.000
Okay, I reproduced it on GCC.

17:11.000 --> 17:15.000
Later, they questioned my benchmark suite —

17:15.000 --> 17:19.000
I used ClickBench for OLAP workloads.

17:19.000 --> 17:21.000
They said: please use our own data,

17:21.000 --> 17:25.000
our own benchmark — we have speedtest1 or something.

17:25.000 --> 17:27.000
Okay, I implemented it once again.

17:27.000 --> 17:31.000
I reproduced the results, with the PGO improvement once again —

17:31.000 --> 17:36.000
around 10%. After that, they just deleted this note

17:36.000 --> 17:39.000
from the documentation, and they never added a note

17:39.000 --> 17:43.000
saying that PGO helps to improve SQLite performance.

17:43.000 --> 17:49.000
And this topic on the forum is still unanswered.

17:49.000 --> 17:53.000
So, if documentation doesn't work,

17:53.000 --> 17:55.000
maybe tooling can help us.

17:55.000 --> 17:57.000
Okay, let's try.

17:57.000 --> 18:01.000
How can tooling, in theory, help us?

18:01.000 --> 18:05.000
Okay, of course, we can try to automate some best practices

18:05.000 --> 18:10.000
from optimization guidelines, from frameworks, et cetera.

18:10.000 --> 18:14.000
Of course, we can make some kind of convenient benchmarking

18:14.000 --> 18:16.000
tooling, because measuring performance

18:16.000 --> 18:19.000
without a good benchmark is impossible.

18:19.000 --> 18:22.000
Actually, I've seen several projects

18:22.000 --> 18:27.000
where people are measuring the performance

18:27.000 --> 18:29.000
of improvements by eye.

18:29.000 --> 18:33.000
Actually, I'm not against GUI stuff.

18:34.000 --> 18:38.000
We can try to automate optimization routines —

18:38.000 --> 18:41.000
for example, at least the semi-automatic PGO stuff.

18:41.000 --> 18:45.000
And, of course, profilers with good visualization —

18:45.000 --> 18:51.000
for example, imagine the newbie perspective

18:51.000 --> 18:57.000
on Intel VTune versus Linux perf: which one is more friendly?

18:57.000 --> 19:00.000
From my perspective — and I was kind of a newbie

19:01.000 --> 19:06.000
in profiling myself — Intel VTune is much, much more friendly.

19:06.000 --> 19:12.000
So, let's start with an example: creating a new application

19:12.000 --> 19:14.000
from a template.

19:14.000 --> 19:17.000
We can try to integrate into some templates —

19:17.000 --> 19:20.000
ready-to-use templates — all our recommendations

19:20.000 --> 19:21.000
regarding link-time optimization,

19:21.000 --> 19:23.000
codegen-units, whatever you want —

19:23.000 --> 19:28.000
all our favorite compiler switches.
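
The release-profile defaults such a template can ship look roughly like this — an illustrative sketch, not the exact template contents:

  [profile.release]
  lto = "fat"          # cross-crate link-time optimization
  codegen-units = 1    # better optimization, slower builds
  strip = true         # drop symbols for smaller binaries
  panic = "abort"      # optional: smaller/faster, but changes unwinding semantics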

19:29.000 --> 19:32.000
And there are actually already

19:32.000 --> 19:34.000
a lot of template generators

19:34.000 --> 19:36.000
and ready-to-use templates.

19:36.000 --> 19:38.000
I used the same frameworks:

19:38.000 --> 19:40.000
Ratatui, Tauri, and Dioxus.

19:40.000 --> 19:44.000
And I integrated

19:44.000 --> 19:49.000
these recommendations into the Ratatui templates.

19:49.000 --> 19:52.000
They are already merged and ready to use.

19:52.000 --> 19:54.000
Unfortunately, for Tauri and Dioxus

19:54.000 --> 19:56.000
I created issues on GitHub,

19:56.000 --> 19:58.000
and, again, they are not that interested.

19:58.000 --> 20:01.000
They are still unanswered.

20:01.000 --> 20:06.000
So, if a Ratatui user right now

20:06.000 --> 20:10.000
tries to use the cargo-generate tool —

20:10.000 --> 20:13.000
it's kind of the standard tool in the Rust ecosystem —

20:13.000 --> 20:17.000
and tries to create a new application from the Ratatui templates,

20:17.000 --> 20:19.000
as recommended

20:19.000 --> 20:21.000
by the documentation,

20:21.000 --> 20:25.000
these optimizations will be enabled from day one.

20:25.000 --> 20:29.000
Not at some point when I create yet another PR

20:29.000 --> 20:32.000
on GitHub saying "please enable them" — from day one.
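
The whole flow for a new user is just two commands (the template path is as in the Ratatui docs, to the best of my knowledge):

  cargo install cargo-generate
  cargo generate ratatui/templates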

20:32.000 --> 20:35.000
However, this way still has issues.

20:35.000 --> 20:38.000
People don't use project generators,

20:38.000 --> 20:43.000
for various reasons: because they just copy-paste

20:43.000 --> 20:45.000
from their previous project,

20:45.000 --> 20:48.000
they don't know about generators,

20:48.000 --> 20:52.000
or they simply vibe-code this stuff —

20:52.000 --> 20:55.000
that's their template engine, let's say —

20:55.000 --> 20:59.000
and these vibe-coded templates

20:59.000 --> 21:01.000
are not poisoned enough

21:01.000 --> 21:04.000
with optimization guidelines, let's say.

21:04.000 --> 21:09.000
And already written applications will not be covered anyway,

21:09.000 --> 21:12.000
because —

21:12.000 --> 21:14.000
the same situation as with the documentation —

21:14.000 --> 21:17.000
when you change a template,

21:17.000 --> 21:21.000
it doesn't apply to the already created applications,

21:21.000 --> 21:26.000
because people use generators only at the start.

21:26.000 --> 21:30.000
Another example: let's compare benchmarking

21:30.000 --> 21:33.000
between Rust and C++ stuff,

21:33.000 --> 21:35.000
how the ecosystems are different,

21:35.000 --> 21:39.000
and which one is more accessible to the developer.

21:39.000 --> 21:41.000
Benchmarking, of course, is needed

21:41.000 --> 21:43.000
to measure performance, as I said.

21:43.000 --> 21:46.000
A good example is how it's done in Rust.

21:46.000 --> 21:49.000
Rust has a de-facto

21:50.000 --> 21:53.000
default tool in the ecosystem;

21:53.000 --> 21:55.000
it's called cargo bench.

21:55.000 --> 21:59.000
I don't think that you can find

21:59.000 --> 22:02.000
any benchmark in the Rust ecosystem

22:02.000 --> 22:05.000
that will not be run by the cargo bench command;

22:05.000 --> 22:06.000
that's the standard.

22:06.000 --> 22:11.000
I've done it hundreds of times for many, many projects;

22:11.000 --> 22:14.000
it's always done in the same way.
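
That uniformity is the whole point — for virtually any third-party Rust project, the flow is the same (repository URL is a placeholder):

  git clone https://github.com/example/project
  cd project
  cargo bench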

22:14.000 --> 22:17.000
Let's compare it to C++.

22:18.000 --> 22:20.000
We can start with a simple question:

22:20.000 --> 22:22.000
which benchmark library should we use?

22:22.000 --> 22:26.000
Most people will say Google Benchmark — okay, fine.

22:26.000 --> 22:31.000
How do you add Google Benchmark to your project?

22:31.000 --> 22:34.000
There are a lot of different ways.

22:34.000 --> 22:35.000
There are

22:35.000 --> 22:38.000
language-specific package managers like Conan

22:38.000 --> 22:41.000
and vcpkg; of course, system dependencies,

22:41.000 --> 22:44.000
like your favorite package manager on the system,

22:44.000 --> 22:46.000
maybe NixOS, maybe whatever.

22:46.000 --> 22:48.000
And the next question:

22:48.000 --> 22:50.000
how do you run the benchmark?

22:50.000 --> 22:52.000
In Rust, it's one standard command.

22:52.000 --> 22:54.000
In C++,

22:54.000 --> 22:57.000
there is no standard command to run a benchmark.

22:57.000 --> 23:00.000
In the case of Google Benchmark,

23:00.000 --> 23:02.000
you will need to build the binary

23:02.000 --> 23:04.000
and run the binary.

23:04.000 --> 23:06.000
And if it's your project,

23:06.000 --> 23:07.000
it's fine.

23:07.000 --> 23:10.000
But if you are running benchmarks for some

23:10.000 --> 23:13.000
third-party project, you will need to figure out how to run

23:13.000 --> 23:15.000
the benchmark: read the README,

23:15.000 --> 23:17.000
and usually you need to read

23:17.000 --> 23:20.000
CMake files, Makefiles, whatever.

23:20.000 --> 23:22.000
And of course, it's like a zoo

23:22.000 --> 23:24.000
in C++.
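
For comparison, even a minimal Google Benchmark setup is already manual — a sketch, assuming the library is installed system-wide:

  // bench.cpp
  #include <benchmark/benchmark.h>
  #include <string>

  static void BM_StringCopy(benchmark::State& state) {
    std::string src(64, 'x');
    for (auto _ : state) {
      std::string copy(src);           // the operation under measurement
      benchmark::DoNotOptimize(copy);  // keep the compiler from deleting it
    }
  }
  BENCHMARK(BM_StringCopy);
  BENCHMARK_MAIN();

  # and then, by hand, something like:
  # g++ -O2 bench.cpp -lbenchmark -lpthread -o bench && ./bench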

23:24.000 --> 23:26.000
And now, a good tool —

23:26.000 --> 23:29.000
a good example of such helpful tooling:

23:29.000 --> 23:32.000
the cargo-wizard tool.

23:32.000 --> 23:37.000
That's a tool that tries to automatically

23:37.000 --> 23:42.000
apply some helpful optimizations

23:42.000 --> 23:44.000
to your project.

23:44.000 --> 23:47.000
So, how is it helpful?

23:47.000 --> 23:49.000
When you have this tool,

23:49.000 --> 23:51.000
you don't need to learn the

23:51.000 --> 23:54.000
whole documentation of the rustc compiler.

23:54.000 --> 23:58.000
You don't need to be an expert in different compiler optimizations,

23:58.000 --> 24:00.000
what everything means,

24:00.000 --> 24:04.000
all the different modes and their differences.

24:04.000 --> 24:07.000
You can just run this wizard.

24:07.000 --> 24:12.000
The cargo-wizard tool will set pretty good defaults

24:12.000 --> 24:16.000
for you for several profiles, and that's it.
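
Usage is a couple of commands — subcommand and template names as I recall them from the tool's README:

  cargo install cargo-wizard
  cargo wizard                              # interactive mode
  cargo wizard apply fast-runtime release   # or apply a predefined template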

24:16.000 --> 24:20.000
Unfortunately, there is nothing comparable

24:20.000 --> 24:23.000
in C++ at all.

24:23.000 --> 24:27.000
Another example: post-link optimization (PLO).

24:27.000 --> 24:29.000
Post-link optimization —

24:29.000 --> 24:33.000
that's yet another step on top of profile-guided optimization.

24:33.000 --> 24:36.000
What this technique actually

24:36.000 --> 24:38.000
does —

24:38.000 --> 24:43.000
actually its most important optimization —

24:43.000 --> 24:47.000
is reordering functions in your binary

24:47.000 --> 24:50.000
to make your CPU instruction cache happy,

24:50.000 --> 24:54.000
because it reduces i-cache misses.

24:54.000 --> 24:57.000
And it improves performance a lot.

24:57.000 --> 25:00.000
There are three — well, two

25:00.000 --> 25:03.000
okay, two tools at the moment.

25:03.000 --> 25:05.000
The most popular is LLVM BOLT;

25:05.000 --> 25:08.000
it was developed by Facebook slash Meta.

25:08.000 --> 25:11.000
That's the default way to do PLO right now.

25:11.000 --> 25:13.000
And Google Propeller — that's similar,

25:13.000 --> 25:16.000
but from Google; it works a bit differently,

25:16.000 --> 25:20.000
because BOLT tries to disassemble your binary,

25:20.000 --> 25:23.000
reshuffle the functions,

25:23.000 --> 25:26.000
and assemble the binary again.

25:26.000 --> 25:28.000
But with Google Propeller,

25:28.000 --> 25:33.000
they are modifying the compiler itself,

25:33.000 --> 25:38.000
and they are reordering functions during the linking stage.

25:38.000 --> 25:43.000
So they don't need to disassemble the produced binary.

25:43.000 --> 25:46.000
There are pros and cons to each approach,

25:46.000 --> 25:49.000
and the BOLT developers and Google developers

25:49.000 --> 25:52.000
don't agree with each other.

25:52.000 --> 25:54.000
But that's what we have.

25:54.000 --> 25:56.000
Also, we had Intel TLO,

25:56.000 --> 25:57.000
the Thin Layout Optimizer,

25:57.000 --> 26:00.000
but after it was open-sourced,

26:00.000 --> 26:02.000
it had only several commits,

26:02.000 --> 26:04.000
and for now it's archived.

26:04.000 --> 26:06.000
And since we have a lot of layoffs at Intel,

26:06.000 --> 26:09.000
it's probably resting in peace.

26:09.000 --> 26:11.000
Forever.

26:11.000 --> 26:13.000
So you can just look into the code,

26:13.000 --> 26:15.000
but please don't use it.

26:15.000 --> 26:17.000
Should we care about PLO?

26:17.000 --> 26:19.000
Yes, we should,

26:19.000 --> 26:23.000
because it gives one more really great leap

26:23.000 --> 26:25.000
in performance, once again

26:25.000 --> 26:27.000
on top of PGO.

26:27.000 --> 26:30.000
By the way, there is a talk by Amir Ayupov.

26:30.000 --> 26:33.000
He's one of the developers of LLVM BOLT.

26:33.000 --> 26:39.000
They implemented this tool for internal use

26:39.000 --> 26:40.000
at Meta,

26:40.000 --> 26:42.000
and then just decided to open-source it.

26:42.000 --> 26:45.000
And as you can see,

26:45.000 --> 26:49.000
they like to show LLVM BOLT

26:49.000 --> 26:52.000
performance improvements on Clang.

26:52.000 --> 26:55.000
But I have more benchmarks —

26:55.000 --> 26:58.000
for example, on different databases,

26:58.000 --> 27:00.000
for example, on one particular database.

27:00.000 --> 27:03.000
And still, even on a database,

27:03.000 --> 27:05.000
on a real workload,

27:05.000 --> 27:09.000
I can get plus 5% of performance

27:09.000 --> 27:11.000
just by applying this tool.

27:11.000 --> 27:14.000
And mind that the performance improvements

27:14.000 --> 27:16.000
we are talking about are on top of already

27:16.000 --> 27:19.000
LTO-plus-PGO-optimized binaries.

27:19.000 --> 27:22.000
But how easy is it to use PLO in practice?

27:22.000 --> 27:24.000
Let's start with the good example.

27:24.000 --> 27:26.000
It was done in Rust.

27:26.000 --> 27:29.000
Rust has yet another

27:29.000 --> 27:33.000
plugin for cargo: cargo-pgo.

27:33.000 --> 27:36.000
cargo-pgo allows you to

27:36.000 --> 27:39.000
perform, in a semi-automatic way,

27:39.000 --> 27:42.000
the PGO optimization routines

27:42.000 --> 27:46.000
and additionally the BOLT optimization routines.

27:46.000 --> 27:51.000
And it's implemented in such a good way

27:51.000 --> 27:54.000
that you can actually implement,

27:54.000 --> 27:57.000
with exactly these commands,

27:57.000 --> 28:01.000
a PGO-plus-PLO optimization pipeline.
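
Roughly these commands, as described in cargo-pgo's README (binary paths abbreviated):

  cargo pgo build                # build with PGO instrumentation
  ./target/.../release/app ...   # run a representative workload
  cargo pgo optimize             # rebuild using the collected profiles
  # and with BOLT on top:
  cargo pgo bolt build --with-pgo
  ./target/.../release/app ...
  cargo pgo bolt optimize --with-pgo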

28:01.000 --> 28:06.000
That's a really, really powerful optimization pipeline.

28:06.000 --> 28:11.000
It's used by Clang —

28:11.000 --> 28:14.000
not in distributions

28:14.000 --> 28:17.000
by default, but via their build scripts.

28:17.000 --> 28:20.000
This pipeline is used by default for rustc

28:20.000 --> 28:21.000
right now.

28:21.000 --> 28:23.000
So if you're using Rust,

28:23.000 --> 28:25.000
and you're using the rustc compiler,

28:25.000 --> 28:26.000
it's optimized with this pipeline.

28:26.000 --> 28:29.000
Now let's compare to the C++ world.

28:29.000 --> 28:32.000
How can you try to mimic it?

28:32.000 --> 28:35.000
The C++ world doesn't have such a tool.

28:35.000 --> 28:38.000
You need to implement everything on your own.

28:38.000 --> 28:41.000
You need to start with the PGO stuff.

28:41.000 --> 28:45.000
You need to figure out the proper compiler switches.

28:45.000 --> 28:49.000
Of course, GCC and Clang

28:49.000 --> 28:51.000
have similar compiler switches

28:51.000 --> 28:54.000
for PGO, but they are not the same,

28:54.000 --> 28:57.000
so in extreme cases you need to support both.

28:57.000 --> 29:00.000
You need to build it, you need to run it.

29:00.000 --> 29:03.000
Manually, of course, you need to collect the PGO profiles.

29:03.000 --> 29:06.000
PGO profiles are collected differently

29:06.000 --> 29:09.000
in the GCC and Clang ecosystems.

29:09.000 --> 29:12.000
You need to convert these profiles to a proper format —

29:12.000 --> 29:15.000
a format which is recognizable by the compiler.

29:15.000 --> 29:18.000
It's done by different tools for GCC and Clang,

29:18.000 --> 29:21.000
and they are not compatible with each other.

29:21.000 --> 29:24.000
You need to recompile your application once again —

29:24.000 --> 29:27.000
manually, of course — passing these profiles

29:27.000 --> 29:30.000
with an additional bunch of compiler switches.
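
For instance, the GCC side of the same round — a sketch; note there is no separate merge step, unlike Clang's llvm-profdata flow shown earlier:

  g++ -O2 -fprofile-generate app.cpp -o app
  ./app --typical-workload      # writes .gcda profile files next to the objects
  g++ -O2 -fprofile-use app.cpp -o app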

29:30.000 --> 29:33.000
You need to figure out how to run LLVM BOLT.

29:33.000 --> 29:40.000
LLVM BOLT internally has 289 switches,

29:40.000 --> 29:46.000
and I guess around 150 of them

29:46.000 --> 29:49.000
are regularly available —

29:49.000 --> 29:51.000
you can see them with just --help.

29:51.000 --> 29:53.000
And for the rest of them,

29:53.000 --> 29:57.000
you need to pass something like --help-hidden,

29:58.000 --> 30:00.000
and they will be shown.

30:00.000 --> 30:03.000
And even if you look through all of them,

30:03.000 --> 30:07.000
you will not understand what they mean and what they do.

30:07.000 --> 30:11.000
So, suppose you figure out how to run BOLT.

30:11.000 --> 30:15.000
You need to instrument your binary once again —

30:15.000 --> 30:16.000
with BOLT this time.

30:16.000 --> 30:19.000
Somehow you need to run your binary once again.

30:19.000 --> 30:21.000
You need to collect the BOLT profiles —

30:21.000 --> 30:24.000
they are not the same as the PGO profiles.

30:24.000 --> 30:28.000
And finally, you need to optimize your binary with BOLT.
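
A typical manual BOLT round, sketched from BOLT's documentation — flags from its README example; the instrumentation profile lands in /tmp/prof.fdata by default, if I recall correctly:

  llvm-bolt ./app -instrument -o ./app.inst
  ./app.inst --typical-workload
  llvm-bolt ./app -o ./app.bolt -data=/tmp/prof.fdata \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort \
    -split-functions -split-all-cold -dyno-stats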

30:28.000 --> 30:31.000
All of these things are done only manually.

30:31.000 --> 30:35.000
Only by you; you need to write your own scripts. Compare it to this one:

30:35.000 --> 30:37.000
it's all already automated.

30:37.000 --> 30:43.000
It's all already automated by clever engineers from the Rust team.

30:43.000 --> 30:50.000
I would say Rust is much, much simpler and more accessible to a regular developer.

30:50.000 --> 30:53.000
I have done all this BOLT stuff myself.

30:53.000 --> 30:58.000
And if you remember the PGO results table,

30:58.000 --> 31:02.000
most of those applications, except for a few libraries,

31:02.000 --> 31:04.000
are written in Rust.

31:04.000 --> 31:06.000
Not because I'm a Rust zealot,

31:06.000 --> 31:10.000
or because we have a dedicated Rust track here,

31:10.000 --> 31:16.000
but simply because it can be done so much more easily in Rust,

31:16.000 --> 31:20.000
and I just don't want to waste my time with C++

31:20.000 --> 31:23.000
during the optimization routines. That's it.

31:23.000 --> 31:27.000
Not simply because I like Rust — though I do like Rust.

31:27.000 --> 31:30.000
Now, about the issues with extra tooling.

31:30.000 --> 31:34.000
Of course, people just don't know about the PGO tools.

31:34.000 --> 31:36.000
The same issue as with the documentation:

31:36.000 --> 31:40.000
we would need to learn from the documentation,

31:40.000 --> 31:44.000
official books or whatever — people would need to read it, and we don't have time.

31:45.000 --> 31:49.000
Tools are not easy to install, for various reasons.

31:49.000 --> 31:52.000
For example, there's no package in the repository.

31:52.000 --> 31:54.000
That was the case,

31:54.000 --> 31:58.000
one year ago or so, for LLVM BOLT.

31:58.000 --> 32:03.000
I was suggesting an idea to different distributions.

32:03.000 --> 32:05.000
A simple idea:

32:05.000 --> 32:08.000
please optimize Clang with LLVM BOLT,

32:08.000 --> 32:12.000
because it has already been shown at a lot of conferences

32:12.000 --> 32:14.000
that it really works.

32:14.000 --> 32:17.000
I got exactly the same answer everywhere, I remember:

32:17.000 --> 32:20.000
we don't have a package for LLVM BOLT,

32:20.000 --> 32:23.000
we don't care, and we would need to create a package.

32:23.000 --> 32:29.000
At that time, LLVM BOLT wasn't a part of LLVM.

32:29.000 --> 32:34.000
It was an external tool, so they would have needed to compile it separately,

32:34.000 --> 32:36.000
create a package, et cetera.

32:36.000 --> 32:37.000
It's maintainer overhead.

32:37.000 --> 32:40.000
Luckily, right now LLVM BOLT is a part of LLVM.

32:40.000 --> 32:42.000
It can be built as a part of LLVM,

32:42.000 --> 32:48.000
and many distributions have ready-to-use LLVM builds,

32:48.000 --> 32:52.000
so enabling it is just another switch.

32:52.000 --> 32:56.000
And there is no easy way to build such tools on your own.

32:56.000 --> 33:02.000
For example, it's a really big problem for Google AutoFDO

33:02.000 --> 33:07.000
and Google's tooling stuff, because they need a specific LLVM version.

33:07.000 --> 33:12.000
They're not upstreamed, and if you try to build them with another LLVM version,

33:12.000 --> 33:16.000
of course there will be a lot of compilation errors.

33:16.000 --> 33:17.000
You'll need to fix them.

33:17.000 --> 33:22.000
No one wants to waste time here.

33:22.000 --> 33:29.000
So, if even tools cannot help us in a good way,

33:29.000 --> 33:34.000
maybe we can try to change the defaults in our ecosystems.

33:34.000 --> 33:36.000
Okay, let's try.

33:36.000 --> 33:40.000
If we are able to change the defaults,

33:40.000 --> 33:47.000
there is no need to read the documentation,

33:47.000 --> 33:50.000
install extra tools, or whatever.

33:50.000 --> 33:54.000
All recommended optimizations will be done by default.

33:54.000 --> 33:57.000
For example, you just enable the release profile,

33:57.000 --> 34:01.000
and in addition to just opt-level 3,

34:01.000 --> 34:04.000
LTO would be enabled by default,

34:04.000 --> 34:07.000
and we would get a more optimized binary.

34:07.000 --> 34:08.000
It's great.

34:08.000 --> 34:11.000
However, there is Hyrum's Law.

34:11.000 --> 34:14.000
If you have a lot of users,

34:14.000 --> 34:16.000
whatever you try to change,

34:16.000 --> 34:20.000
it will break some workloads, some scenarios.

34:20.000 --> 34:24.000
And here we have exactly the same case.

34:24.000 --> 34:27.000
Once again, the Rust ecosystem is a good example here,

34:27.000 --> 34:30.000
because they are trying to change defaults in a good way.

34:30.000 --> 34:35.000
They are trying to push a faster linker

34:35.000 --> 34:40.000
to optimize developer workflows — there are some benchmarks here.

34:40.000 --> 34:46.000
And of course, there are even quicker linkers on the market.

34:46.000 --> 34:48.000
They are not ready yet, of course,

34:48.000 --> 34:51.000
but there are already discussions — the Rust

34:51.000 --> 34:55.000
team is waiting for stabilization of the Wild linker or the mold linker,

34:55.000 --> 34:58.000
and will try to switch from LLD to them,

34:58.000 --> 35:03.000
because it gives more speed.

35:03.000 --> 35:05.000
rustc and rust-analyzer

35:05.000 --> 35:09.000
and similar stuff are already optimized with PGO and

35:09.000 --> 35:11.000
use LLD by default.

35:11.000 --> 35:16.000
They use more optimized allocators for their tooling,

35:16.000 --> 35:20.000
and they are trying to change,

35:20.000 --> 35:23.000
very, very carefully, the default release profile.

35:23.000 --> 35:25.000
For example, stripping debug info,

35:25.000 --> 35:27.000
so binaries will be smaller,

35:27.000 --> 35:30.000
because Rust binaries are famous

35:30.000 --> 35:34.000
for being huge by default.

35:34.000 --> 35:37.000
However, the Rust ecosystem

35:37.000 --> 35:39.000
also has some interesting defaults,

35:39.000 --> 35:42.000
if not questionable ones.

35:42.000 --> 35:47.000
One of the ways regular Rust tools are installed

35:47.000 --> 35:49.000
is cargo install.

35:49.000 --> 35:51.000
cargo install uses just

35:51.000 --> 35:53.000
a Gentoo-style installation approach:

35:53.000 --> 35:57.000
it checks out the sources onto your machine

35:57.000 --> 36:00.000
and compiles them on your machine.

36:00.000 --> 36:01.000
That's it.

36:01.000 --> 36:03.000
No pre-built binary packages or

36:03.000 --> 36:05.000
whatever — just from sources.

36:05.000 --> 36:07.000
And of course,

36:07.000 --> 36:12.000
the thing with Gentoo-style installation:

36:12.000 --> 36:15.000
you can optimize for your own hardware,

36:15.000 --> 36:19.000
but no one actually does it with cargo install.

36:19.000 --> 36:22.000
And the cons: you have many, many more limitations

36:22.000 --> 36:25.000
on enabling expensive optimizations.

36:25.000 --> 36:28.000
And here there is a problem with LTO,

36:28.000 --> 36:30.000
because LTO

36:30.000 --> 36:31.000
in its most aggressive

36:31.000 --> 36:33.000
form — full LTO,

36:33.000 --> 36:36.000
fat LTO — is really expensive.

36:36.000 --> 36:39.000
It doubles your compilation time,

36:39.000 --> 36:40.000
at least doubles,

36:40.000 --> 36:43.000
and it requires much more memory.

36:43.000 --> 36:45.000
Okay, okay —

36:45.000 --> 36:49.000
1.5x to 2x,

36:49.000 --> 36:51.000
it depends on different flags.

36:52.000 --> 36:55.000
So, if you try to enable LTO by default

36:55.000 --> 36:57.000
for the release profile,

36:57.000 --> 37:01.000
all cargo-installed applications

37:01.000 --> 37:07.000
will take twice as long to install as before.

37:07.000 --> 37:08.000
And it's a problem.

37:08.000 --> 37:10.000
There are solutions to mitigate this,

37:10.000 --> 37:14.000
like using cargo-binstall with pre-built packages.

37:14.000 --> 37:15.000
But unfortunately,

37:15.000 --> 37:19.000
this solution is not that popular in the Rust ecosystem,

37:19.000 --> 37:21.000
and it's not considered as the default.

37:21.000 --> 37:23.000
There are

37:23.000 --> 37:25.000
reasons for that —

37:25.000 --> 37:27.000
for example, cargo-binstall is not maintained

37:27.000 --> 37:29.000
by the main Rust team,

37:29.000 --> 37:32.000
so credibility issues, et cetera —

37:32.000 --> 37:34.000
but it's a limitation.

37:34.000 --> 37:36.000
Another example about defaults:

37:36.000 --> 37:39.000
there is a dedicated tool for preparing a binary

37:39.000 --> 37:41.000
for distribution, cargo-dist —

37:41.000 --> 37:44.000
right now it's called simply dist.

37:44.000 --> 37:48.000
So, it's a tool that tries to pick

37:48.000 --> 37:52.000
good optimization settings —

37:52.000 --> 37:56.000
cargo switches for optimizing your binary.

37:56.000 --> 37:58.000
However,

37:58.000 --> 38:04.000
this tool enables not the most aggressive form of LTO,

38:04.000 --> 38:06.000
and I was wondering:

38:06.000 --> 38:07.000
what is the reason?

38:07.000 --> 38:08.000
Because thin LTO

38:08.000 --> 38:10.000
compiles much faster,

38:10.000 --> 38:13.000
but the trade-off here

38:13.000 --> 38:15.000
is that thin LTO cannot perform

38:15.000 --> 38:17.000
the most aggressive optimizations,

38:17.000 --> 38:22.000
and if we are preparing a binary for delivery

38:22.000 --> 38:23.000
to a target machine,

38:23.000 --> 38:26.000
usually we want to perform the aggressive stuff.

38:26.000 --> 38:29.000
And the actual reason was —

38:29.000 --> 38:32.000
simply that,

38:32.000 --> 38:34.000
from their point of view,

38:34.000 --> 38:36.000
a lot of built binaries —

38:36.000 --> 38:38.000
binaries built to be shippable,

38:38.000 --> 38:39.000
for distribution —

38:39.000 --> 38:41.000
will never actually be shipped,

38:41.000 --> 38:44.000
so they are not worth optimizing.

38:44.000 --> 38:45.000
Unfortunately,

38:45.000 --> 38:47.000
this detail is not written in the documentation.

38:47.000 --> 38:50.000
It's written only in a random comment

38:50.000 --> 38:52.000
in the GitHub tracker,

38:52.000 --> 38:53.000
and that's it.

38:53.000 --> 38:56.000
And regular users don't know about this detail.

38:56.000 --> 38:57.000
They simply

38:57.000 --> 39:00.000
and blindly apply the cargo-dist profile,

39:00.000 --> 39:02.000
and that's it.

39:02.000 --> 39:05.000
So, I just decided to test once again.

39:05.000 --> 39:07.000
Probably

39:07.000 --> 39:10.000
cargo-dist users would be okay with

39:11.000 --> 39:12.000
this change:

39:12.000 --> 39:14.000
thin LTO

39:14.000 --> 39:15.000
to fat LTO,

39:15.000 --> 39:16.000
or full LTO,

39:16.000 --> 39:18.000
in their profiles.
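
The change proposed in those PRs is essentially one line in the generated profile — sketched here; cargo-dist emits a dist profile that inherits from release:

  [profile.dist]
  inherits = "release"
  lto = "fat"    # was: lto = "thin"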

39:18.000 --> 39:20.000
And as you see,

39:20.000 --> 39:21.000
I have,

39:21.000 --> 39:22.000
once again,

39:22.000 --> 39:25.000
a pretty good conversion rate here.

39:25.000 --> 39:26.000
So,

39:26.000 --> 39:28.000
I guess,

39:28.000 --> 39:33.000
60 or 70 projects accepted

39:33.000 --> 39:35.000
my changes,

39:35.000 --> 39:37.000
and five

39:37.000 --> 39:38.000
— five

39:39.000 --> 39:42.000
or four of them just didn't answer.

39:42.000 --> 39:43.000
No,

39:43.000 --> 39:44.000
no rejections.

39:44.000 --> 39:45.000
No,

39:45.000 --> 39:46.000
none at all —

39:46.000 --> 39:47.000
that's all.

39:47.000 --> 39:48.000
So, probably

39:48.000 --> 39:51.000
these defaults are not very good for users.

39:51.000 --> 39:55.000
And people just don't know what they are enabling —

39:55.000 --> 39:56.000
let's say,

39:56.000 --> 39:58.000
they are just simply following.

39:58.000 --> 39:59.000
So,

39:59.000 --> 40:02.000
I tried to push an idea

40:02.000 --> 40:05.000
into the Rust community about enabling LTO

40:05.000 --> 40:06.000
in any form,

40:06.000 --> 40:07.000
actually,

40:07.000 --> 40:09.000
in the default release profile.

40:09.000 --> 40:11.000
Even knowing about the cargo install story,

40:11.000 --> 40:12.000
even knowing

40:12.000 --> 40:14.000
all the stuff about

40:14.000 --> 40:17.000
the doubled compilation time

40:17.000 --> 40:18.000
in that mode,

40:18.000 --> 40:22.000
the extra costs on CI,

40:22.000 --> 40:23.000
and whatever.

40:23.000 --> 40:24.000
Luckily,

40:24.000 --> 40:27.000
we can enable LTO in a much easier way

40:27.000 --> 40:29.000
in the Rust ecosystem,

40:29.000 --> 40:30.000
compared to C++.

40:30.000 --> 40:32.000
Because usually in C++,

40:32.000 --> 40:35.000
enabling LTO

40:35.000 --> 40:38.000
uncovers a lot of hidden undefined behavior.

40:38.000 --> 40:39.000
And sure enough,

40:39.000 --> 40:41.000
we meet a lot of interesting segfaults

40:41.000 --> 40:43.000
at runtime after enabling LTO,

40:43.000 --> 40:45.000
and we disable it right after that.

40:45.000 --> 40:46.000
In Rust,

40:46.000 --> 40:48.000
we don't have such a problem.

40:48.000 --> 40:52.000
You can enable LTO with one line in Cargo.toml,

40:52.000 --> 40:55.000
and this optimization is really safe in Rust,

40:55.000 --> 40:57.000
compared to C++.

40:57.000 --> 41:01.000
How dangerous is LTO in C++?

41:01.000 --> 41:04.000
There is a great repository from Gentoo

41:04.000 --> 41:05.000
about LTO,

41:05.000 --> 41:08.000
and you can just see how many issues

41:08.000 --> 41:10.000
there are in this repository,

41:10.000 --> 41:11.000
and believe me,

41:11.000 --> 41:12.000
all of them,

41:12.000 --> 41:14.000
almost all of them,

41:14.000 --> 41:17.000
are about blowing up some C or C++ code

41:17.000 --> 41:18.000
by enabling LTO.

41:18.000 --> 41:19.000
Yes,

41:19.000 --> 41:20.000
yes,

41:20.000 --> 41:22.000
they are C++ victims.

41:22.000 --> 41:23.000
In Rust,

41:23.000 --> 41:25.000
we don't have such a big issue.

41:25.000 --> 41:27.000
In C++ it's a huge issue.

41:27.000 --> 41:30.000
However,

41:30.000 --> 41:33.000
I met a bunch of additional issues.

41:34.000 --> 41:38.000
Some people use the cargo release profile

41:38.000 --> 41:41.000
during their development phase on their local machines.

41:41.000 --> 41:44.000
Even though the Cargo documentation says

41:44.000 --> 41:48.000
that the release profile is meant for use in production —

41:48.000 --> 41:50.000
they don't care.

41:50.000 --> 41:52.000
They just don't care.

41:52.000 --> 41:56.000
They use the release profile during the development phase,

41:56.000 --> 41:59.000
and when you try to point it out to them —

41:59.000 --> 42:01.000
"just please don't do it" —

42:01.000 --> 42:02.000
they don't care.

42:02.000 --> 42:03.000
They say:

42:03.000 --> 42:05.000
"that's okay for me."

42:05.000 --> 42:07.000
That's it.

42:07.000 --> 42:09.000
So,

42:09.000 --> 42:11.000
one week ago,

42:11.000 --> 42:12.000
there was a topic

42:12.000 --> 42:13.000
I read on some

42:13.000 --> 42:14.000
subreddit

42:14.000 --> 42:17.000
about some mysterious person — it's me.

42:17.000 --> 42:20.000
This person created almost

42:20.000 --> 42:23.000
five hundred issues manually

42:23.000 --> 42:25.000
on GitHub

42:25.000 --> 42:29.000
about enabling LTO in Rust projects.

42:29.000 --> 42:31.000
So, the Rust community was wondering:

42:31.000 --> 42:33.000
what is going on here?

42:33.000 --> 42:36.000
So, that was my attempt

42:36.000 --> 42:40.000
to convince the Rust ecosystem and the Rust dev team

42:40.000 --> 42:42.000
that their defaults

42:42.000 --> 42:45.000
are not that optimal,

42:45.000 --> 42:48.000
and that the Rust community is ready

42:48.000 --> 42:52.000
to enable LTO for the release profile.

42:52.000 --> 42:54.000
And actually,

42:54.000 --> 42:57.000
we had a really great conversation,

42:57.000 --> 42:59.000
but finally I was banned —

42:59.000 --> 43:01.000
not just on the Rust subreddit:

43:01.000 --> 43:03.000
I was banned on Reddit entirely.

43:03.000 --> 43:06.000
So I don't have an account on Reddit anymore,

43:06.000 --> 43:09.000
but we actually had a really good conversation.

43:09.000 --> 43:11.000
Some points were raised —

43:11.000 --> 43:13.000
good points — in this conversation.

43:13.000 --> 43:15.000
Unfortunately, I wasn't able to answer all of them,

43:15.000 --> 43:16.000
because I was banned.

43:16.000 --> 43:17.000
So, sorry —

43:17.000 --> 43:20.000
if someone didn't get an answer from me,

43:20.000 --> 43:22.000
it wasn't my wish.

43:22.000 --> 43:26.000
And so, I created more than 500 issues —

43:26.000 --> 43:28.000
that's only for Rust,

43:28.000 --> 43:30.000
trust me, only for Rust projects —

43:30.000 --> 43:32.000
more than 500 of them,

43:32.000 --> 43:37.000
and more than 350

43:37.000 --> 43:39.000
plus are closed.

43:39.000 --> 43:41.000
Closed — unfortunately, there is no filter

43:41.000 --> 43:44.000
to differentiate between "closed,

43:44.000 --> 43:47.000
accepted" and "closed, not planned".

43:47.000 --> 43:51.000
But these statistics are public, you can check.

43:51.000 --> 43:56.000
I would say 95% of the closed issues are accepted,

43:56.000 --> 44:00.000
with LTO added to the default release profile.

44:00.000 --> 44:04.000
So, a message to the Rust team:

44:04.000 --> 44:08.000
please reconsider your decision once again.

44:08.000 --> 44:12.000
Probably we need to spend some more time here,

44:12.000 --> 44:15.000
investigating and enabling LTO by default.

44:15.000 --> 44:16.000
And by the way,

44:16.000 --> 44:20.000
I proposed only full LTO.

44:20.000 --> 44:22.000
It's the most aggressive one,

44:22.000 --> 44:25.000
the most time-consuming and resource-consuming.

44:25.000 --> 44:27.000
I wanted to test —

44:27.000 --> 44:31.000
to test the border of acceptance from the project authors,

44:31.000 --> 44:33.000
and they accepted it pretty well,

44:33.000 --> 44:34.000
the trade-off,

44:34.000 --> 44:37.000
because the benefits from LTO are so great —

44:37.000 --> 44:39.000
performance, of course.

44:39.000 --> 44:42.000
But you cannot measure performance for all projects,

44:42.000 --> 44:44.000
because you would need to implement benchmarks for all projects;

44:44.000 --> 44:48.000
but you can easily measure the binary size improvement,

44:48.000 --> 44:53.000
and full LTO gives around 20% binary size improvement

44:53.000 --> 44:58.000
just from one simple line in the Cargo.toml file.

44:58.000 --> 45:03.000
Then there was some LLM stuff:

45:03.000 --> 45:06.000
when I tried to enable it there,

45:06.000 --> 45:10.000
I was harshly turned down.

45:10.000 --> 45:11.000
Okay.

45:11.000 --> 45:12.000
So,

45:13.000 --> 45:17.000
it's unfortunate — the project is not that heavy

45:17.000 --> 45:20.000
even with full LTO,

45:20.000 --> 45:23.000
but the Microsoft engineers said "go away",

45:23.000 --> 45:24.000
and that's it.

45:24.000 --> 45:27.000
And I actually have many more ideas

45:27.000 --> 45:29.000
to try to implement,

45:29.000 --> 45:31.000
like a semi-automatic way

45:31.000 --> 45:35.000
to propose more efficient tools like ripgrep

45:35.000 --> 45:36.000
or something,

45:36.000 --> 45:38.000
open tables of performance challenges

45:38.000 --> 45:40.000
for different projects,

45:40.000 --> 45:41.000
et cetera, et cetera —

45:41.000 --> 45:42.000
things like 1BRC

45:42.000 --> 45:43.000
or similar challenges —

45:43.000 --> 45:44.000
stuff

45:44.000 --> 45:46.000
that could actually be useful for anyone.

45:46.000 --> 45:48.000
And I would say

45:48.000 --> 45:50.000
I want to collaborate with other

45:50.000 --> 45:53.000
FOSDEM dev rooms in different directions —

45:53.000 --> 45:54.000
compilers,

45:54.000 --> 45:56.000
optimizations from different domains.

45:56.000 --> 45:58.000
I want to hear your opinion about

45:58.000 --> 45:59.000
software performance stuff,

45:59.000 --> 46:02.000
and we can collaborate here

46:02.000 --> 46:04.000
and exchange opinions,

46:04.000 --> 46:07.000
trying to optimize something in different ways.

46:07.000 --> 46:09.000
And actually,

46:09.000 --> 46:11.000
that's it —

46:11.000 --> 46:13.000
those are all the reasons

46:13.000 --> 46:14.000
why I started the

46:14.000 --> 46:16.000
awesome-pgo project

46:16.000 --> 46:17.000
and its companion list,

46:17.000 --> 46:19.000
a spin-off of awesome-pgo,

46:19.000 --> 46:22.000
and why I actually created

46:22.000 --> 46:24.000
the Software Performance dev room:

46:24.000 --> 46:27.000
just to try to push as many

46:27.000 --> 46:30.000
optimizations as possible

46:30.000 --> 46:33.000
into software,

46:33.000 --> 46:35.000
and make them more accessible

46:35.000 --> 46:37.000
to regular developers.

46:37.000 --> 46:39.000
Thank you.

46:53.000 --> 46:55.000
If you have any questions —

46:55.000 --> 46:58.000
we don't have a microphone,

46:58.000 --> 46:59.000
so you can just speak

46:59.000 --> 47:00.000
and I'll repeat.

47:00.000 --> 47:01.000
Anybody?

47:02.000 --> 47:05.000
It seems to me that one way

47:05.000 --> 47:07.000
to make...

47:09.000 --> 47:10.000
Hello.

47:10.000 --> 47:13.000
So it seems to me that one way to make

47:13.000 --> 47:16.000
these optimizations more accessible

47:16.000 --> 47:18.000
for C++ or C code

47:18.000 --> 47:20.000
would be to integrate the settings

47:20.000 --> 47:21.000
into the build system,

47:21.000 --> 47:22.000
something like CMake.

47:22.000 --> 47:24.000
So how difficult would that be,

47:24.000 --> 47:27.000
to add an option in CMake to make this possible?

47:27.000 --> 47:28.000
It would.

47:28.000 --> 47:30.000
It would help.

47:30.000 --> 47:31.000
Yes.

47:31.000 --> 47:32.000
Yes.

47:32.000 --> 47:34.000
I proposed all of these ideas

47:34.000 --> 47:35.000
to CMake.

47:35.000 --> 47:37.000
There are open issues about that,

47:37.000 --> 47:38.000
of course.

47:38.000 --> 47:41.000
None of them are implemented.

47:41.000 --> 47:43.000
Because it's difficult?

47:43.000 --> 47:44.000
No.

47:44.000 --> 47:45.000
Well,

47:45.000 --> 47:47.000
I would say it would be more difficult

47:47.000 --> 47:48.000
than for Rust,

47:48.000 --> 47:50.000
because of multiple build systems,

47:50.000 --> 47:52.000
multiple compilers,

47:52.000 --> 47:54.000
multiple dependency managers.

47:54.000 --> 47:56.000
If you want to optimize the whole dependency tree,

47:56.000 --> 47:57.000
for example,

47:57.000 --> 47:58.000
with PGO,

47:58.000 --> 47:59.000
you need to pass flags, etc.

47:59.000 --> 48:01.000
It would be more difficult compared to Rust,

48:01.000 --> 48:02.000
but it's achievable.

48:02.000 --> 48:03.000
It's achievable.
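
For context: the LTO half is already portable in CMake today; it's the PGO half that's missing. A minimal sketch:

  include(CheckIPOSupported)
  check_ipo_supported(RESULT ipo_supported OUTPUT ipo_error)
  if(ipo_supported)
    set(CMAKE_INTERPROCEDURAL_OPTIMIZATION TRUE)  # LTO/IPO for all targets
  endif()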

48:03.000 --> 48:04.000
And I know some people

48:04.000 --> 48:06.000
who at least tried to

48:06.000 --> 48:07.000
implement it for CMake.

48:07.000 --> 48:09.000
But we have CMake,

48:09.000 --> 48:10.000
we have Bazel,

48:10.000 --> 48:11.000
we have Meson,

48:11.000 --> 48:12.000
we have...

48:12.000 --> 48:16.000
We have a lot of stuff in C++.

48:16.000 --> 48:18.000
And unfortunately,

48:18.000 --> 48:21.000
there is no single major build system in C++.

48:21.000 --> 48:23.000
You can see the polls

48:23.000 --> 48:25.000
from the official C++ committee:

48:25.000 --> 48:27.000
CMake covers like

48:27.000 --> 48:29.000
one-third of the ecosystem,

48:29.000 --> 48:31.000
only one-third.

48:31.000 --> 48:33.000
Okay, thank you.

48:45.000 --> 48:47.000
How does this apply to —

48:47.000 --> 48:49.000
we've talked now only about

48:49.000 --> 48:51.000
really compiled languages, without

48:51.000 --> 48:53.000
any garbage collection or anything.

48:53.000 --> 48:55.000
Does this apply to the...

48:55.000 --> 48:57.000
I'm talking about PGO.

48:57.000 --> 48:59.000
Yeah — PGO, LTO.

48:59.000 --> 49:01.000
LTO actually —

49:01.000 --> 49:03.000
so, let's say,

49:03.000 --> 49:05.000
if you have an LLVM-based compiler,

49:05.000 --> 49:08.000
all of this stuff is already available

49:08.000 --> 49:11.000
or easily implementable, I would say.

49:11.000 --> 49:14.000
If you're talking about custom compilers,

49:14.000 --> 49:17.000
you need to check the implementation.

49:17.000 --> 49:19.000
Yeah, I'm talking about, let's say,

49:19.000 --> 49:20.000
Go or Java here.

49:20.000 --> 49:22.000
Go already supports PGO.

49:22.000 --> 49:23.000
Yeah,

49:23.000 --> 49:25.000
they support it in the sampling PGO mode,

49:25.000 --> 49:27.000
and they know what they're doing.
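
For reference, Go's flow is pleasantly minimal — Go 1.21+; file paths here are placeholders:

  # a pprof CPU profile named default.pgo next to the main package
  # is picked up automatically by the build
  cp cpu.pprof ./cmd/app/default.pgo
  go build ./cmd/app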

49:27.000 --> 49:29.000
So...

49:29.000 --> 49:31.000
The JVM world?

49:31.000 --> 49:32.000
Sorry?

49:32.000 --> 49:35.000
The JVM world has just-in-time compilers,

49:35.000 --> 49:37.000
that's the main model,

49:37.000 --> 49:39.000
but if you want an ahead-of-time model,

49:39.000 --> 49:41.000
there is GraalVM.

49:41.000 --> 49:43.000
GraalVM supports PGO,

49:43.000 --> 49:45.000
but unfortunately,

49:45.000 --> 49:47.000
it's insanely undocumented.

49:47.000 --> 49:49.000
And they stopped answering my questions

49:49.000 --> 49:51.000
upstream, after some point.

49:51.000 --> 49:53.000
Okay, thank you.

