WEBVTT

00:00.000 --> 00:14.560
So we now have Andrea Righi, who's going to give a talk on Rust-ifying the Linux kernel scheduler.

00:14.560 --> 00:23.880
So please give it up.

00:23.880 --> 00:38.680
Hello, you can hear me, I guess.

00:38.680 --> 00:44.160
So yeah, we're going to talk about scheduling, the kernel scheduler, but in Rust.

00:44.160 --> 00:49.600
How many kernel developers are here, or people that have played with the kernel?

00:49.600 --> 00:50.600
A few?

00:50.600 --> 00:51.600
A few?

00:51.600 --> 00:52.600
Okay, that's good.

00:52.840 --> 00:55.440
Okay, you don't need to understand the kernel.

00:55.440 --> 00:58.680
That's a good thing.

00:58.680 --> 01:04.160
Yeah, so here's the agenda.

01:04.160 --> 01:09.000
I want to go to the cool stuff, but before going to the cool stuff, I need to tell you something

01:09.000 --> 01:14.680
about scheduling in general, and then we see how we can use this technology called

01:14.680 --> 01:22.160
sched_ext to basically implement a kernel scheduler in Rust.

01:22.160 --> 01:26.560
But first of all, what is a scheduler?

01:26.560 --> 01:33.880
So a scheduler is a kernel component that determines where each task needs to run, when, and

01:33.880 --> 01:35.760
for how long.

01:35.760 --> 01:43.960
So it seems fairly easy, conceptually speaking, but in practice it can be really hard.

01:43.960 --> 01:49.160
Scheduling is a very non-trivial problem, particularly because there are different

01:49.160 --> 01:55.360
architectures, different workloads, and it's really difficult to model a scheduler that works

01:55.360 --> 02:00.120
for everyone and in every possible scenario.

02:00.120 --> 02:06.400
So one of the challenges is fairness, if you want to design a scheduler that is as generic as

02:06.960 --> 02:14.280
possible, because you need to give each task some chance to run after a while;

02:14.280 --> 02:18.600
you can't do much more than something so generic.

02:18.600 --> 02:24.560
Now, so what's the situation in Linux?

02:24.560 --> 02:31.960
The policy in Linux is to maintain just a single scheduler, one scheduler to rule them

02:32.920 --> 02:32.960
all.

02:32.960 --> 02:41.800
We used to have CFS before 6.6; now we have another scheduler called EEVDF.

02:41.800 --> 02:47.480
The first one: CFS stands for Completely Fair Scheduler, and it's based on the

02:47.480 --> 02:51.920
fairness from the previous slide.

02:51.920 --> 02:58.000
So it's basically doing a weighted bandwidth allocation to each task, that's

02:58.000 --> 02:59.000
it.

02:59.000 --> 03:06.920
Now we've moved to EEVDF. EEVDF is a deadline-based scheduler; deadlines allow it to perform

03:06.920 --> 03:13.240
better in latency-sensitive workloads, so it's better for latency.

03:13.240 --> 03:18.800
But I'm not going into the details of these schedulers; if you're interested in the details,

03:18.800 --> 03:23.840
there's another talk tomorrow in the kernel devroom where I'm talking more about these

03:23.840 --> 03:29.640
schedulers, and playing video games.

03:29.640 --> 03:36.320
But anyway, the fact is that there's just one scheduler, and therefore it's really difficult

03:36.320 --> 03:40.600
to conduct experiments, because if you want to do some tests or experiments with the

03:40.600 --> 03:47.840
Linux kernel, you need to patch the kernel, recompile, reboot the system.

03:47.840 --> 03:52.840
It's not always easy to do, especially in production.

03:52.840 --> 03:57.960
Even rebooting in production, if you think of large cloud environments: you reboot,

03:57.960 --> 04:04.480
you need to re-warm the caches, you need to have downtime, so it's really difficult to conduct

04:04.480 --> 04:10.240
experiments, and it's really difficult to upstream changes, because you may find something

04:10.240 --> 04:17.520
that works really well for you, but if it doesn't work for everyone, your change is not

04:17.520 --> 04:21.640
going to get merged, because you may introduce regressions.

04:21.640 --> 04:28.480
So the single Linux kernel scheduler is like, there are compromises, it's trying to do

04:28.480 --> 04:35.280
the best, and it's the best generic scheduler that we may possibly have, but sometimes

04:35.280 --> 04:39.560
you may want to relax some constraints and say: I'm willing to accept having, like,

04:39.560 --> 04:43.640
an unfair scheduler just to solve my problem.

04:43.640 --> 04:48.800
And right now, I mean, in this situation, it was impossible to do that unless you maintain

04:48.800 --> 04:49.800
your own scheduler.

04:49.800 --> 04:55.800
Only big companies can afford that, and even for big companies it's painful, because

04:55.800 --> 05:05.720
maintaining a patch to something as impactful as the scheduler is not a joke, it's quite time

05:05.720 --> 05:07.920
consuming.

05:07.920 --> 05:13.520
So here come sched_ext and BPF.

05:13.520 --> 05:20.360
sched_ext, for those that don't know, is a new technology in the Linux kernel that allows

05:20.360 --> 05:28.480
you to implement a scheduling policy as a BPF program, and it needs to be GPLv2, by the way;

05:28.480 --> 05:31.560
it's not just a soft requirement.

05:31.560 --> 05:39.880
Actually, the BPF verifier will not accept your scheduler if it's not licensed as GPLv2.

05:39.880 --> 05:44.600
So BPF, for those that, I think you're all familiar with BPF, I don't want to go,

05:44.600 --> 05:45.600
no, okay.

05:45.600 --> 05:50.040
BPF is like a JIT in the kernel.

05:50.040 --> 05:55.840
You can see BPF as kind of a virtual machine, and you can write programs

05:55.840 --> 05:56.960
in C, for example.

05:56.960 --> 06:03.720
You can hook into the kernel, you can test things, you can test changes to the kernel without

06:03.720 --> 06:06.920
crashing the kernel, so it's safe.

06:06.920 --> 06:12.520
Of course, you can't write arbitrary things in kernel memory, but you get read access

06:12.520 --> 06:15.160
to kernel memory.

06:15.160 --> 06:23.760
And sched_ext leverages BPF to give you connections to the scheduler callbacks.

06:23.760 --> 06:31.360
So the scheduler in the Linux kernel is implemented as a class, or as a struct of function

06:31.360 --> 06:38.840
pointers, and those function pointers are, via sched_ext, redirected to the BPF program.

06:38.840 --> 06:44.600
That's how you can write a BPF program that implements your own callbacks, which are

06:44.600 --> 06:49.920
called by the sched_ext technology, and that's how you can implement your scheduling

06:49.920 --> 06:51.920
policy in BPF.

06:51.920 --> 06:58.240
So of course the benefits of this technology are the fact that you can design custom schedulers.

06:58.240 --> 07:03.080
You can load them at runtime, you don't need to recompile the kernel or reboot, you

07:03.080 --> 07:05.560
just start the program.

07:05.560 --> 07:12.700
And of course, this leads to rapid experimentation; usually when you test a program in

07:12.700 --> 07:21.960
user space, you know, you edit, compile, run, and then edit, compile, run, so the turnaround

07:21.960 --> 07:28.080
of this edit-compile-run cycle is really fast in user space.

07:28.080 --> 07:33.200
It's really slow in kernel space because you need to reboot and whatnot.

07:33.200 --> 07:40.080
But with this technology, you have the same feeling of developing like a user space application,

07:40.080 --> 07:46.280
while instead you are actually changing kernel code, in particular the scheduler.

07:46.280 --> 07:51.480
So how does it work?

07:51.480 --> 07:59.320
So there's a kernel component that runs in the kernel, a kernel subsystem called sched_ext.

07:59.320 --> 08:03.720
And in BPF, you just need to implement a few callbacks.

08:03.720 --> 08:11.280
For example, there's a callback called enqueue that is invoked every time a task wants to run.

08:11.280 --> 08:18.400
There's a callback called dispatch that is invoked every time a CPU is ready to accept tasks

08:18.400 --> 08:19.680
and so on and so on.

08:19.680 --> 08:28.480
So implementing these callbacks allows you to write a program in BPF that implements a scheduler.
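
NOTE
A minimal sketch of the callback shape just described, rendered as a hypothetical Rust trait; the real interface is a C struct of BPF function pointers (sched_ext's ops table), so these names are illustrative only:
```rust
// Hypothetical Rust rendering of the two sched_ext callbacks named above.
trait Scheduler {
    // Invoked every time a task wants to run.
    fn enqueue(&mut self, pid: i32);
    // Invoked every time a CPU is ready to accept tasks.
    fn dispatch(&mut self, cpu: u32);
}
```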

08:28.480 --> 08:36.640
But like I mentioned, there are restrictions, because BPF, in order to provide this concept

08:36.640 --> 08:45.440
of safety, adds restrictions. Now, there's this project, Rust for Linux, that tries to bring Rust to the kernel.

08:45.440 --> 08:51.760
Here I'm trying to do the opposite: I'm trying to bring the kernel into Rust.

08:51.760 --> 08:57.760
And since Rust for Linux is controversial, this is even more controversial.

08:57.760 --> 09:00.000
So here's the idea.

09:00.000 --> 09:06.840
I had to give you a little introduction about sched_ext and BPF just to explain this slide.

09:06.840 --> 09:14.560
Basically, my idea was: what if we use sched_ext to implement a BPF scheduler

09:14.640 --> 09:21.600
that does nothing, except bounce all the scheduling events to a user space program?

09:21.600 --> 09:29.520
Because a cool feature of BPF is that the user space program that loads the BPF bytecode via

09:29.520 --> 09:38.400
the bpf() syscall shares the address space with BPF, and BPF provides some data structures,

09:38.480 --> 09:45.920
called maps, that can be used like a message passing interface between BPF and the user space program.
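
NOTE
A minimal sketch of this message-passing idea with libbpf-rs ring buffers; the map name (`queued`) and the event layout are assumptions for illustration, and the exact libbpf-rs API may differ between versions:
```rust
use std::time::Duration;
use libbpf_rs::RingBufferBuilder;
// Drain the "task wants to run" events that the BPF side pushes into a
// ring buffer map, and hand them to the user space scheduling logic.
fn consume_events(queued: &libbpf_rs::Map) -> libbpf_rs::Result<()> {
    let mut builder = RingBufferBuilder::new();
    builder.add(queued, |data: &[u8]| {
        // Decode the raw event bytes here (the layout is scheduler-specific).
        println!("queued event: {} bytes", data.len());
        0 // returning 0 keeps consuming
    })?;
    let ring = builder.build()?;
    loop {
        ring.poll(Duration::from_millis(100))?;
    }
}
```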

09:46.800 --> 09:54.400
So the idea is like let's write a minimal layer in BPF that just bounces the scheduling

09:54.400 --> 09:59.760
events to user space. And then the user space can take all the scheduling decisions.

09:59.760 --> 10:05.360
So you actually implement the scheduling in user space. You pass the results of the scheduling

10:05.440 --> 10:11.920
back to BPF, and BPF will do the actions that are decided by the user space program,

10:11.920 --> 10:19.280
passing everything back to the kernel. And in this way, we have offloaded complexity from BPF

10:19.280 --> 10:28.720
into a user space program. We can still load and unload the scheduler at runtime using the usual

10:28.800 --> 10:36.640
sched_ext way, but we have a user space program now. So there are some

10:38.320 --> 10:43.520
microkernel vibes here, because we're actually moving part of the kernel into user space.

10:44.720 --> 10:51.520
The good consequence of this is that now we have unlocked access to any kind

10:51.520 --> 10:58.640
of libraries and languages because if the kernel scheduler is a user space program,

10:59.360 --> 11:05.120
I can write this user space program in any language. I can write it in Python, in Java, in

11:05.120 --> 11:13.600
Rust, for instance. And I can use all the user space libraries; in Rust we use crates, I can use

11:13.600 --> 11:21.360
any kind of crates and whatnot. So yeah, we have seen like the previous slide

11:21.360 --> 11:29.440
where there was only the left part. Now we have also the right part that is basically the scheduler.

11:30.160 --> 11:36.160
And what I'm doing is, like, the BPF scheduler is only implementing this kind of

11:36.160 --> 11:43.360
message passing interface, and it uses libbpf, which is a C library, but there's also

11:43.440 --> 11:53.120
libbpf-rs, which is the Rust binding to libbpf. So the user space program, the user space scheduler, becomes

11:53.120 --> 12:04.400
the program down here. Where am I here? Down here. And yeah, and this is like a regular Rust program.

12:04.480 --> 12:14.160
I can literally cargo build this thing and run it, and it will replace the in-kernel scheduler

12:14.160 --> 12:23.520
with my program. And so as a proof of concept, I implemented a scheduler that is called Rustland.

12:25.360 --> 12:30.880
Because initially there was a scheduler called userland that was written in C.

12:31.520 --> 12:37.920
So I decided to do the same with Rust, and it's called rustland. The scheduler itself,

12:37.920 --> 12:44.880
okay, it's a deadline-based scheduler, so it's better for latency. It's not fair.

12:46.000 --> 12:53.920
As opposed to the in-kernel scheduler, it's very unfair. It prioritizes latency-

12:53.920 --> 13:00.480
sensitive tasks a lot. But the implementation of the scheduler itself is, like,

13:00.480 --> 13:06.400
it's a secondary goal. I just wanted to prove that it was possible to implement a scheduler,

13:06.400 --> 13:11.840
a kernel scheduler in user space using this technology. So this is more like a proof of concept.

13:11.840 --> 13:19.120
It's supposed to be a proof of concept. And I tested this with like a video game because

13:19.760 --> 13:25.840
I was like, well, video games are cool. Let's see how fast a video game can go if I replace the

13:25.840 --> 13:33.680
Linux scheduler with this complete mess that is moving everything to user space. And like I was expecting

13:33.680 --> 13:42.000
to do like, I don't know, 5 FPS or something like that. But then I actually posted this video.

13:42.080 --> 13:49.680
I was showing this unusual workload where I'm playing Terraria, it's a video game,

13:50.720 --> 13:56.560
while building the kernel, which is not something that you usually do because if you're a gamer,

13:56.560 --> 14:03.840
I mean, you just want to do gaming, unless you're also a kernel developer and you build the kernel in the background.

14:04.800 --> 14:10.960
But yeah, so the part on the left, EEVDF, is the in-kernel scheduler,

14:10.960 --> 14:18.640
the default Linux scheduler. And there's this video that's also available on the website.

14:18.640 --> 14:26.480
And it shows that, I mean, the game is very choppy and it's like 26 FPS, but it's not just choppy,

14:26.480 --> 14:33.360
it's also very inconsistent. So it goes from 10 FPS to 40 FPS, sometimes 30, 20.

14:34.240 --> 14:40.400
So it's a really bad gaming experience. And with the user space scheduler, with the

14:40.400 --> 14:49.040
Rust scheduler, it's doing 60 FPS, which is insane. But the thing is, it's just the scheduling algorithm

14:49.040 --> 14:55.920
that is different, and it's overly prioritizing latency-sensitive work. So it's just the scheduling,

14:55.920 --> 15:03.520
but that was to prove... not that this scheduler is better, because it's better only for this particular use case.

15:03.520 --> 15:12.880
It's worse for everything else. But the point was it's possible to implement user space schedulers

15:12.880 --> 15:19.840
that can outperform the in-kernel scheduler, which is an interesting thing. And with this one,

15:19.840 --> 15:25.200
you don't have to recompile the kernel, you just insert it, and it works. I was actually planning

15:25.200 --> 15:35.360
to show you a demo if I can. Let's see if I can. So we have this fish tank. This is the default scheduler.

15:37.120 --> 15:46.320
Let's see what happens if I start. Do you see it? Let me try to build the kernel.

15:46.400 --> 16:00.400
That's the same example, but shown live, it's more cool. You see the FPS: it goes to 9 FPS, it's really bad.

16:02.080 --> 16:12.000
Now, I should have it here. Sorry, I prepared everything in advance, but I had to reboot.

16:12.000 --> 16:33.200
So, okay, let's do... so rustland is the scheduler. Okay, now let's go back to... okay, I start the

16:33.280 --> 16:42.800
build. I'm doing 10 FPS. Now, I just start this program down here, which is the, okay,

16:42.800 --> 16:47.440
now the Rust scheduler is running, and you see, it's going at 40 FPS.

16:55.440 --> 17:02.800
You know, I don't care if the scheduler is not fair. If I stop the program,

17:04.080 --> 17:12.080
you see, it's going back to the bad performance. Start the program again, and it's going back to, so

17:13.120 --> 17:18.720
I can... if you imagine doing this while also changing the scheduler, this is really powerful,

17:18.720 --> 17:26.400
because you can literally do this in production. If I intentionally insert a bug in my scheduler,

17:27.360 --> 17:33.680
there's, so there are two things. There's the verifier, so that if there's a memory problem,

17:33.680 --> 17:38.880
it won't load my scheduler. So you can see, it's actually faster now, that's,

17:38.880 --> 17:45.840
it's switching live. Now, but the thing is: why is it better? What is happening? Is it

17:45.840 --> 17:53.840
because of Rust magic? No. Yeah, some of you will say yes, and I wish I could say yes,

17:53.840 --> 18:01.120
but unfortunately, the answer is: not really. So the trick is in the algorithm, right? It's

18:01.120 --> 18:06.160
not in the language. The algorithm is different, and you can see here, this is

18:06.160 --> 18:11.440
Perfetto, a big shout out to Google for making this tool, which is amazing. It helps you

18:12.000 --> 18:17.600
trace all of what happens. On the Y axis, you see all the CPUs, and on the X axis,

18:17.680 --> 18:26.080
you see the tasks over time. The yellowish one is Terraria, it's like the main thread, and you can see that

18:27.920 --> 18:33.680
at the top, that's the kernel scheduler, it's trying to be fair with all the tasks that are running,

18:33.680 --> 18:40.880
so all the clang processes that are compiling the kernel, and also the main Terraria task;

18:40.880 --> 18:47.840
you know, it's fair, it's a fair scheduler. So there is certain logic, like it's still

18:47.840 --> 18:55.680
deadline-based, so there is certain logic to prioritize latency for the Terraria task, and you can

18:55.680 --> 19:04.960
see, you know, periodically it gets some CPU time. But down below, rustland is just giving

19:05.040 --> 19:12.080
a lot of time to the main Terraria thread compared to the other tasks. It's also a lot more

19:12.080 --> 19:18.960
bouncy, like it's constantly bouncing the task here and there, which probably is not a good idea,

19:18.960 --> 19:25.200
because for cache locality this is bad, but, you know, it's trying to find all the possible

19:25.200 --> 19:30.960
free slots, and it's very work-conserving, so as soon as there's a CPU available,

19:31.840 --> 19:39.840
you can see that it's trying to schedule the main game thread into the CPU that is available,

19:41.040 --> 19:48.880
so that's why it's working better. But, like, what's the benefit of rustland, what's the

19:48.880 --> 19:55.920
powerful thing about rustland? Because I thought, okay, this is a cool idea, why don't I

19:55.920 --> 20:07.520
generalize this, and why don't I make like a crate that people can use to design, to write

20:07.520 --> 20:15.760
their own scheduler? So I generalized the backend of this rustland scheduler, and I implemented

20:16.000 --> 20:29.920
a Rust crate, scx_rustland_core, which is available on crates.io, and you can use this, like, you don't have

20:29.920 --> 20:39.040
to learn, like, all this sched_ext and BPF boilerplate; it provides an easier API, and the cool thing

20:39.040 --> 20:45.440
is that you can literally, like, cargo init your project, you use this crate, and I also wrote a

20:45.440 --> 20:54.320
template, so you can literally, it's at the top, you can git clone the template, and yeah,

20:54.320 --> 21:02.400
just cargo build that, and the template implements a FIFO scheduler using the

21:02.400 --> 21:08.800
scx_rustland_core crate, so it's a really easy to understand template, and with it you don't have to

21:08.800 --> 21:17.440
learn anything about the underlying layers of sched_ext and the BPF kernel code; it's a very abstracted interface,

21:18.480 --> 21:30.000
and I think I have... okay, that's the design, basically, in more detail: we have

21:30.320 --> 21:41.200
the callbacks here that are implemented, like, this is the BPF part, and the crate contains a

21:41.200 --> 21:47.600
backend, which implements the BPF part, and then it's using libbpf-rs to implement a frontend,

21:48.480 --> 22:03.440
and your program is here. Whoops, yeah, so this is cool, I wanted to show, like this is a

22:03.440 --> 22:11.840
working FIFO scheduler in SCX, implemented with scx_rustland_core, which fits in a slide, so I thought

22:11.920 --> 22:19.440
that was a cool achievement, this is a kernel scheduler, like we can run this, and it's like,

22:19.440 --> 22:36.320
yeah, I can show you this, like, it's the rustland scheduler,

22:36.320 --> 22:51.920
so this program that runs here, it's printing some statistics, but basically the code is this one,

22:52.800 --> 22:59.360
and right now we are doing this presentation running a FIFO scheduler implemented in Rust,

22:59.680 --> 23:06.880
and what it's doing is just, you know, the dequeue of tasks is just consuming tasks that want to run,

23:07.600 --> 23:14.880
select_cpu is telling the backend, like, give me a CPU that is available, that is idle, and I assign

23:15.440 --> 23:24.640
the CPU to the task; if select_cpu doesn't give me a CPU, I tell the backend,

23:24.720 --> 23:31.680
just dispatch this task on the first CPU available. Then I assign a time slice that is

23:31.680 --> 23:38.080
inversely proportional to the amount of tasks that are waiting to be scheduled,

23:38.080 --> 23:45.520
and then I dispatch the tasks in order, that's it. That's a kernel scheduler written in Rust

23:45.520 --> 23:55.680
that fits in a slide, I think it's pretty cool, and yeah, if you have questions,
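
NOTE
A sketch of the FIFO scheduler just walked through, in the spirit of the scx_rustland_core API; the names (BpfScheduler, dequeue_task, select_cpu, dispatch_task, RL_CPU_ANY) follow the talk, but exact paths and signatures may differ between crate versions:
```rust
use scx_rustland_core::bpf::{BpfScheduler, DispatchedTask, RL_CPU_ANY};
const SLICE_NS: u64 = 5_000_000; // 5 ms base time slice, an arbitrary choice
// One scheduling round: drain the tasks queued by the BPF backend, pick a
// CPU for each, give shorter slices when more tasks wait, dispatch in order.
fn schedule(bpf: &mut BpfScheduler) {
    let mut nr_waiting: u64 = 1;
    while let Ok(Some(task)) = bpf.dequeue_task() {
        let mut dispatched = DispatchedTask::new(&task);
        // Ask the backend for an idle CPU; fall back to "first CPU available".
        let cpu = bpf.select_cpu(task.pid, task.cpu, task.flags);
        dispatched.cpu = if cpu >= 0 { cpu } else { RL_CPU_ANY };
        // Time slice inversely proportional to the number of waiting tasks.
        dispatched.slice_ns = SLICE_NS / nr_waiting;
        nr_waiting += 1;
        bpf.dispatch_task(&dispatched).expect("dispatch failed");
    }
}
```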

23:59.280 --> 24:07.040
actually, no, these are the takeaways. I told you this already: the important part here is,

24:07.760 --> 24:12.480
rustland is not a better scheduler in general, it's just a proof of concept to show you the

24:12.560 --> 24:19.760
potential of this technology, which you can literally use in production to do scheduling testing,

24:20.800 --> 24:27.120
which never happened before, because the scheduler was a very monolithic part of the Linux kernel,

24:27.120 --> 24:34.400
difficult to change. Rust itself doesn't make the scheduler better, but having access to Rust,

24:34.400 --> 24:42.400
especially in user space, allows you to use the whole language without any restriction,

24:43.200 --> 24:49.680
it's not like Rust for Linux, which still has restrictions; it needs the proper abstractions to be implemented

24:49.680 --> 24:55.280
in the kernel as kernel code. This one is actually in user space, so you don't have the Rust for Linux

24:55.280 --> 25:02.080
restrictions. It's easy to experiment, because it's like writing, I mean, the FIFO

25:02.080 --> 25:06.960
scheduler fits in a slide, and you just need to compile and run it, and stop it whenever you want,

25:08.800 --> 25:13.920
and potentially you can import any other crate that you have access to with a regular user space

25:13.920 --> 25:21.760
program, you can import libraries, maybe if you're crazy enough you can plug an entire AI or a

25:21.760 --> 25:28.080
large language model in this program that decides your scheduling, it's going to be more

25:28.080 --> 25:34.800
CPU expensive than the tasks that you're trying to run, but I mean, that's like, yeah. In sched_ext,

25:34.800 --> 25:41.920
there's a logic that says: if nothing is running, dispatch the user space scheduler on the

25:41.920 --> 25:50.080
first CPU available, and you know, the user space scheduler will have, like, a queue internally,

25:50.080 --> 25:54.880
with all the tasks that are waiting to run. So it's scheduled by the BPF program,

25:54.960 --> 26:02.000
it runs, it tells the BPF program, like, okay, this guy goes here, this guy goes here, then it goes,

26:02.880 --> 26:10.240
and then BPF tells sched_ext the results, and the scheduling happens. But it's a good question: it's the

26:10.240 --> 26:14.560
BPF backend that decides to schedule the user space program.

26:18.240 --> 26:22.320
How can you tell that Terraria has tighter deadlines than everything else?

26:23.280 --> 26:30.320
Sorry, I should tell you that the microphone is for the video, but you still need to speak up, so, okay,

26:30.320 --> 26:37.280
so how can you tell that Terraria has tighter deadlines than compiling the kernel?

26:38.880 --> 26:47.120
Okay, the thing is, usually tasks that voluntarily release the CPU before their time

26:47.200 --> 26:53.360
slice expires are latency sensitive. So, let's say, video games are very cyclic and

26:55.360 --> 27:01.200
the workloads are very periodic; usually in this case, you know, you're supposed to render frames

27:01.200 --> 27:10.000
at 60 FPS, so every 1/60 of a second, which is about 16 milliseconds; every 16 milliseconds

27:10.000 --> 27:16.960
a task needs to run, usually the compositor or Xwayland needs to run, and it needs to draw a frame,

27:16.960 --> 27:24.640
send a frame to the video card, and then it's just sleeping. So latency-sensitive tasks are

27:24.640 --> 27:33.520
working in bursts of CPU and then sleep. So if you model a deadline that prioritizes

27:33.600 --> 27:40.960
the tasks that have this sleep behavior... and you can do that, for example, there are

27:40.960 --> 27:49.600
many ways; one is measuring the average runtime between a wakeup and a sleep. If this average

27:49.600 --> 27:57.200
time is very short, you're probably facing latency-sensitive tasks, and you can model the deadline

27:57.200 --> 28:04.640
as, like, you can use the vruntime, which is the proportional fair share, and you add, for example,

28:04.640 --> 28:11.040
this average runtime. So if a task is using a small amount of time, it would get

28:11.040 --> 28:17.600
a smaller deadline, and if a task is, like, a clang that is building the kernel, it would use

28:17.600 --> 28:26.000
its entire CPU, its entire time slice, so its deadline will be longer, and with this trick you

28:26.000 --> 28:32.640
can prioritize the latency-sensitive tasks. It doesn't work every time, but that is the best, I mean,

28:32.640 --> 28:38.240
you would need to predict the future to write the best deadline-based algorithm, but that's the best you can do.
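
NOTE
A tiny sketch of the deadline heuristic just described, with illustrative names: a task's deadline is its vruntime plus its average runtime between wakeup and sleep, so bursty, latency-sensitive tasks get earlier deadlines than CPU hogs:
```rust
// Smaller deadline = scheduled sooner. A clang worker burns its whole time
// slice, accumulates a large average runtime, and lands behind a game
// thread that runs in short bursts, even at an equal vruntime.
fn deadline(vruntime_ns: u64, avg_runtime_ns: u64) -> u64 {
    vruntime_ns + avg_runtime_ns
}
```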

28:40.960 --> 28:44.320
I imagine that you need some context switches between the BPF...

28:44.320 --> 28:48.240
Wait, I won't be able to understand this from here, okay, I'm coming over, it's easier.

28:49.280 --> 28:55.280
So I imagine that you need some context switches between the kernel space BPF and the user space

28:55.280 --> 28:58.640
scheduler, right? So how much of an overhead is it?

28:58.640 --> 29:07.520
Yeah, so, yeah, it's a good question. Of course, the user space scheduler is a task in its own right, so

29:07.520 --> 29:13.040
there are context switches when you need to do scheduling, when you need to make scheduling decisions.

29:14.400 --> 29:19.040
I mentioned, like, almost at the beginning that the scheduler itself is not a

29:19.040 --> 29:27.520
CPU-intensive application, it works almost like a latency-sensitive task, that means,

29:27.520 --> 29:32.480
when it needs to run, it needs to run, and then it just stops, it just, you know, the decisions

29:32.480 --> 29:38.320
that the scheduler makes are very small, you know, you need to say, okay, this guy goes here, this

29:38.320 --> 29:44.480
guy goes here, and then I go to sleep. So, the context switches at the end are not,

29:45.440 --> 29:53.840
they don't impact much on the scheduler itself, but they can impact on the user space applications

29:53.840 --> 30:02.960
that are running; specifically, the more interruptions you have... like, if you decide to use a fine-grained

30:02.960 --> 30:08.960
time slice, like a smaller time slice, you would get more context switches, but this is,

30:08.960 --> 30:14.240
it doesn't depend on the scheduler, it's based on how small you define your time slice, right?

30:14.800 --> 30:22.720
So, a shorter time slice would probably give you better performance in terms of responsiveness,

30:22.720 --> 30:28.560
but you also have more overhead, so the throughput would be worse, right?

30:30.400 --> 30:36.080
And that, you know, the scheduler here is just yet another task that is running in the system,

30:36.080 --> 30:39.200
but you have the same problem with the in-kernel scheduler, because

30:40.160 --> 30:45.440
at some point the in-kernel scheduler would also run with the same constraints.

30:52.800 --> 30:58.000
While running the scheduler, the same happens with the in-kernel scheduler,

30:58.960 --> 31:05.520
because, you know, you have... okay, you're saying: I'm not switching to another task...

31:06.320 --> 31:12.960
oh, yeah, I see what you mean, yeah, you're switching to a different task, yes, that's correct.

31:14.160 --> 31:20.880
So, potentially, yes, the context switch penalty is bigger with this approach, that's correct.

31:23.520 --> 31:30.960
So, I have a question with regards to, yeah, okay, okay, I can speak louder.

31:31.040 --> 31:39.360
How do you ship scx_rustland_core, for example? So, there are two questions behind this.

31:39.360 --> 31:44.400
Your BPF program, is it written in Rust or not, or is it in C?

31:44.400 --> 31:48.800
No, okay, that's another question, okay, yeah, that's a question, and second question,

31:49.920 --> 31:53.440
do you ship a blob in the crate, or is it built, how do you build it?

31:53.440 --> 32:03.040
Yeah, so the BPF part is currently written in C, and what I ship is the .c file,

32:04.080 --> 32:11.520
which is compiled; like, there's a build.rs that calls clang and compiles the BPF part,

32:12.160 --> 32:18.880
and then it becomes BPF bytecode on your machine when you build it, and that's how it works.
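
NOTE
A hypothetical build.rs sketch of that flow; the source path and clang flags are assumptions, not the crate's actual build logic:
```rust
use std::process::Command;
fn main() {
    // Recompile the BPF backend whenever its C source changes.
    println!("cargo:rerun-if-changed=src/bpf/main.bpf.c");
    let out = std::env::var("OUT_DIR").unwrap();
    // Call clang at build time to turn the .c into BPF bytecode.
    let status = Command::new("clang")
        .args(["-O2", "-g", "-target", "bpf", "-c", "src/bpf/main.bpf.c", "-o"])
        .arg(format!("{out}/main.bpf.o"))
        .status()
        .expect("failed to run clang");
    assert!(status.success(), "BPF compilation failed");
}
```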

32:19.360 --> 32:25.680
Now, I'm looking at... there's a library called Aya that generates BPF code, actually,

32:25.680 --> 32:31.200
so I was investigating if that would probably be better, because everything would be Rust,

32:31.200 --> 32:35.840
and then BFF, but that's like, that's the bike code, is the backhand.

32:37.760 --> 32:44.800
And, yeah, it's probably not, like, a super polished crate; I'm sure there's something that is not right,

32:44.880 --> 32:50.320
even if it passes, like, the cargo tests and everything. But I'm not sure if it's properly polished,

32:50.320 --> 32:58.160
because I'm shipping the .c as an artifact that is included in the crate, which is compiled

32:58.160 --> 33:07.280
when you build your scheduler. But, if for any reason the user space scheduler is blocked,

33:08.000 --> 33:13.200
the whole scheduling stops, the system won't progress, everything would be deadlocked and blocked.

33:13.840 --> 33:19.360
One thing is, like, page faults: let's say you do allocations in the user space

33:19.360 --> 33:25.360
scheduler, and you hit a page fault. In order to resolve the page fault, you need to schedule

33:25.360 --> 33:31.520
a kernel thread, but the scheduler is the guy that is blocked, so there's a deadlock. That's

33:31.520 --> 33:40.800
a problem, but that was solved. I solved this one by mlocking all the memory of the user space

33:40.800 --> 33:45.760
scheduler, so the backend is actually mlocking all the memory. You can do allocations, it's just that

33:45.760 --> 33:52.480
they will happen in an arena of pre-allocated memory, and I'm using the, how is it called,

33:52.480 --> 33:58.160
GlobalAlloc, which is an abstraction that allows you to redirect all the

33:58.160 --> 34:02.400
allocations. That was really cool, I didn't know that. I learned this in Rust, I was like,

34:02.400 --> 34:08.880
oh, Rust is the best, because I can redirect, like, all the allocations into whatever

34:09.200 --> 34:15.920
allocator I want to use to do allocations, and I did this trick to solve the page fault problem.
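
NOTE
A minimal sketch of that GlobalAlloc trick: a bump allocator serving every allocation from one pre-allocated arena (which the real scheduler additionally mlock()s), so the user space scheduler can never page-fault on allocation; the size and names are illustrative:
```rust
use std::alloc::{GlobalAlloc, Layout};
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};
const ARENA_SIZE: usize = 64 * 1024 * 1024; // 64 MiB, picked arbitrarily
struct Arena(UnsafeCell<[u8; ARENA_SIZE]>);
unsafe impl Sync for Arena {}
static ARENA: Arena = Arena(UnsafeCell::new([0; ARENA_SIZE]));
static NEXT: AtomicUsize = AtomicUsize::new(0);
struct ArenaAllocator;
unsafe impl GlobalAlloc for ArenaAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Bump-allocate: carve the next aligned chunk out of the arena.
        let mut offset = NEXT.load(Ordering::Relaxed);
        loop {
            let aligned = (offset + layout.align() - 1) & !(layout.align() - 1);
            let end = aligned + layout.size();
            if end > ARENA_SIZE {
                return std::ptr::null_mut(); // arena exhausted
            }
            match NEXT.compare_exchange(offset, end, Ordering::SeqCst, Ordering::Relaxed) {
                Ok(_) => return (*ARENA.0.get()).as_mut_ptr().add(aligned),
                Err(cur) => offset = cur, // raced with another thread; retry
            }
        }
    }
    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // Never freed: acceptable for a short-lived proof of concept.
    }
}
#[global_allocator]
static GLOBAL: ArenaAllocator = ArenaAllocator;
```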

34:19.040 --> 34:27.120
Okay, if you have further questions, please find Andrea in the hallway, or in

34:27.120 --> 34:30.240
the hallway track, I guess. But thank you very much.

