WEBVTT

00:00.000 --> 00:07.000
So, hey, I'm Daniel Almeida.

00:07.000 --> 00:08.000
Thanks for having me here.

00:08.000 --> 00:12.000
It's the first time I'm ever in the graphics devroom.

00:12.000 --> 00:14.000
I'm here today to talk about Tyr.

00:14.000 --> 00:18.000
It's a new Rust GPU driver that we have been writing.

00:18.000 --> 00:21.000
And when I say we, there's me, Daniel.

00:21.000 --> 00:26.000
There's two people from Google, there's one lady from Arm.

00:26.000 --> 00:30.000
And there's some volunteers as well, contributing code.

00:30.000 --> 00:33.000
So, this is a Rust GPU driver as I just said.

00:33.000 --> 00:42.000
So, what I want to start with is talking a bit about what is a GPU kernel driver,

00:42.000 --> 00:43.000
right?

00:43.000 --> 00:47.000
And versus what is a user mode GPU driver?

00:47.000 --> 00:51.000
The differences between the two, because usually people get it confused

00:51.000 --> 00:54.000
and they think we're working on the user mode driver

00:54.000 --> 00:56.000
when we are not.

00:56.000 --> 01:01.000
And also, I'd like to begin with a little bit of the story behind this project.

01:01.000 --> 01:10.000
So, basically Arm approached us to try and get a Rust driver done.

01:10.000 --> 01:17.000
At that point, they were experimenting with a driver that was basically half written in C

01:17.000 --> 01:19.000
and half written in Rust.

01:19.000 --> 01:21.000
And that did not go anywhere.

01:21.000 --> 01:27.000
So, we basically started over from a clean slate with a driver completely written in Rust.

01:27.000 --> 01:31.000
And after that, Google also joined the project.

01:31.000 --> 01:36.000
They're also contributing one or two engineers today.

01:36.000 --> 01:41.000
And the reason why we're using Rust is that it turns out that it's fairly easy

01:41.000 --> 01:43.000
to compromise a GPU driver.

01:43.000 --> 01:47.000
And then, by compromising the GPU driver, you take over the entire system.

01:47.000 --> 01:52.000
And then, basically, you can steal the user's personal data, et cetera, et cetera.

01:52.000 --> 01:56.000
So, it seems like the GPU driver is really inoffensive,

01:56.000 --> 02:00.000
but actually it's one of the most targeted attack vectors.

02:00.000 --> 02:06.000
So, during this talk, we have to briefly talk about how GPUs work.

02:06.000 --> 02:13.000
Then we will discuss a little bit how the kernel driver exposes the API to the user space.

02:13.000 --> 02:16.000
We'll talk about a bit about job submission.

02:16.000 --> 02:19.000
And that will conclude the first part of what is this?

02:19.000 --> 02:21.000
And how does this work?

02:21.000 --> 02:24.000
And then the second part is, what is ahead of us?

02:24.000 --> 02:34.000
Because my point here is to try to lay out some plan for the future.

02:34.000 --> 02:36.000
And this is like, what is ahead of us?

02:36.000 --> 02:38.000
What is the prototype looking like?

02:38.000 --> 02:42.000
What's the plan for this year and the next year?

02:43.000 --> 02:47.000
So, again, briefly, how do GPUs work?

02:47.000 --> 02:50.000
Again, I'm just simplifying a few things here.

02:50.000 --> 02:57.000
But you basically have the geometry, like this one,

02:57.000 --> 03:00.000
or you can have more complex shapes like this.

03:00.000 --> 03:04.000
And then textures, which will give a shape

03:04.000 --> 03:08.000
some appearance of a real-life object.

03:08.000 --> 03:12.000
And then shaders, for example, which are programs running on the GPU;

03:12.000 --> 03:17.000
you're going to compile that down to some machine code that can execute on the hardware.

03:17.000 --> 03:23.000
And then everything is going to be, you know, placed inside some buffers in GPU memory.

03:23.000 --> 03:32.000
Like the vertex data, the texture data; your command stream has to be placed inside of a buffer object as well,

03:32.000 --> 03:34.000
the shader machine code.

03:34.000 --> 03:42.000
And then, at some point, you basically record your commands into a software queue, like a VkQueue.

03:42.000 --> 03:45.000
And then that has to be backed by something.

03:45.000 --> 03:49.000
And that something, in this particular case, for Arm Mali, is Tyr.

03:49.000 --> 03:51.000
The kernel driver, right?

03:51.000 --> 03:55.000
So, for an application like vkcube, for example,

03:55.000 --> 04:01.000
vkcube will be, you know, taking the geometry for a cube, the shaders, everything.

04:01.000 --> 04:05.000
By feeding that to Vulkan, Vulkan will eventually, you know,

04:05.000 --> 04:09.000
build a command buffer, place that command buffer.

04:09.000 --> 04:12.000
I'm sorry, the application will place the command buffer in the VkQueue.

04:12.000 --> 04:17.000
And then the driver will take over, basically, after this point.

04:17.000 --> 04:24.000
So, in terms of, you know, how the code is structured, much of the code is actually implemented at the user mode level.

04:24.000 --> 04:30.000
So, when I say user mode level, I'm referring to Mesa, for those of you who were here

04:30.000 --> 04:32.000
for the talk two talks ago, basically.

04:32.000 --> 04:39.000
So, mostly Mesa, which is where the shader compiler lives, which is where the Vulkan driver lives,

04:39.000 --> 04:41.000
the GL driver lives.

04:41.000 --> 04:47.000
So, the majority of the actual stack is implemented at the user mode level.

04:47.000 --> 04:56.000
And then the kernel mode driver is just a small layer that will do things that user mode drivers cannot do.

04:56.000 --> 05:02.000
Basically, sharing the hardware between different people who are accessing the GPU, for example,

05:02.000 --> 05:09.000
ensuring isolation, bringing up power, for example, or clock, so things of these nature,

05:09.000 --> 05:15.000
allocating memory, things that the user mode driver cannot do on its own,

05:15.000 --> 05:17.000
because it requires more privilege.

05:17.000 --> 05:22.000
So, as I said, the user mode driver will be implementing the API, like Vulkan or GL,

05:22.000 --> 05:26.000
it will be also compiling the shaders, et cetera, et cetera.

05:26.000 --> 05:32.000
And then, eventually, it has to talk to the kernel driver, where the kernel driver is basically

05:32.000 --> 05:35.000
making sure that the user mode driver is implementable.

05:35.000 --> 05:42.000
How? By basically, again, allowing the user mode driver to allocate memory,

05:42.000 --> 05:46.000
providing a way for the user mode driver to access the GPU's ring buffer,

05:46.000 --> 05:51.000
which is where commands are going to be placed in, doing dependency management,

05:51.000 --> 05:58.000
for example. Most jobs will depend on a series of previous jobs,

05:58.000 --> 06:03.000
and then it's up to user space to build this dependency graph, and then tell the kernel about

06:03.000 --> 06:08.000
this dependency graph, such that when it comes at the time to execute a job,

06:08.000 --> 06:12.000
a job is not executed before its dependencies are done.

06:12.000 --> 06:17.000
So, basically, the kernel driver is the one keeping track of this dependency graph,

06:17.000 --> 06:20.000
and making sure that the dependency graph is followed.
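The dependency tracking just described can be sketched in plain Rust. This is purely illustrative: the types and names below are invented for the example, and the real kernel uses DRM scheduler entities and dma-fences, not these structures.

```rust
use std::collections::{HashMap, HashSet};

// Toy model of kernel-side dependency tracking: a job may only be handed
// to the hardware once every job it depends on has finished.
#[derive(Default)]
struct DepTracker {
    deps: HashMap<u32, HashSet<u32>>, // job id -> unfinished dependencies
    done: HashSet<u32>,               // jobs that have completed
}

impl DepTracker {
    // Returns true when the job is immediately runnable.
    fn submit(&mut self, job: u32, deps: &[u32]) -> bool {
        let pending: HashSet<u32> =
            deps.iter().copied().filter(|d| !self.done.contains(d)).collect();
        if pending.is_empty() {
            true
        } else {
            self.deps.insert(job, pending);
            false
        }
    }

    // Mark a job finished and return every job that just became runnable.
    fn complete(&mut self, job: u32) -> Vec<u32> {
        self.done.insert(job);
        let mut runnable = Vec::new();
        for (&j, pending) in self.deps.iter_mut() {
            pending.remove(&job);
            if pending.is_empty() {
                runnable.push(j);
            }
        }
        for j in &runnable {
            self.deps.remove(j);
        }
        runnable
    }
}

fn main() {
    let mut t = DepTracker::default();
    assert!(t.submit(1, &[]));      // job 1 has no dependencies
    assert!(!t.submit(2, &[1]));    // job 2 waits on job 1
    assert!(!t.submit(3, &[1, 2])); // job 3 waits on both
    assert_eq!(t.complete(1), vec![2]); // finishing 1 unblocks 2
    assert_eq!(t.complete(2), vec![3]); // finishing 2 unblocks 3
    println!("dependency order respected");
}
```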

06:20.000 --> 06:24.000
Power management, debugging facilities, because when things break,

06:24.000 --> 06:30.000
usually the kernel driver will be able to give you some information

06:30.000 --> 06:34.000
through devcoredump, because it has the privilege to do so.

06:34.000 --> 06:38.000
When the GPU breaks, I'm sorry, when the GPU goes down because, I don't know,

06:38.000 --> 06:42.000
it crashed, the kernel driver is the one that can restart it,

06:43.000 --> 06:47.000
and again, the kernel mode driver is just the bridge between,

06:47.000 --> 06:53.000
like a much larger user mode driver and the overall hardware.

06:53.000 --> 06:57.000
So, for example, just to go into a little bit more detail of what Tyr is doing.

06:57.000 --> 07:01.000
So, GPUs nowadays, they can provide their own version of virtual memory;

07:01.000 --> 07:08.000
they can isolate different contexts using virtual memory through the IOMMU,

07:08.000 --> 07:11.000
much like we do for CPUs, for example.

07:11.000 --> 07:14.000
And this is something that the kernel driver is actually managing.

07:14.000 --> 07:19.000
So, the kernel driver is the one programming the IOMMU in the system,

07:19.000 --> 07:24.000
and basically making sure that one application cannot access the memory

07:24.000 --> 07:27.000
from another application.

07:27.000 --> 07:32.000
Again, we give the user mode driver the ability to say what it wants,

07:32.000 --> 07:36.000
where it wants things to be mapped and unmapped,

07:36.000 --> 07:40.000
but the kernel mode driver is in charge of programming the actual hardware,

07:40.000 --> 07:42.000
and the page tables.
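The bookkeeping behind that split of responsibilities can be sketched like this: userspace asks for GPU virtual-address ranges to be mapped, and the kernel side validates them before touching any real page tables. All names and the overlap policy here are invented for illustration, not the real VM_BIND implementation.

```rust
// Half-open virtual-address range [start, end).
struct VaRange { start: u64, end: u64 }

#[derive(Default)]
struct GpuVm { maps: Vec<VaRange> }

impl GpuVm {
    // Refuse overlapping mappings, like the real page-table code must.
    fn bind(&mut self, start: u64, len: u64) -> Result<(), &'static str> {
        let end = start.checked_add(len).ok_or("range wraps")?;
        if self.maps.iter().any(|r| start < r.end && r.start < end) {
            return Err("overlaps an existing mapping");
        }
        self.maps.push(VaRange { start, end });
        Ok(())
    }

    // Returns true if a mapping starting at `start` existed and was removed.
    fn unbind(&mut self, start: u64) -> bool {
        let before = self.maps.len();
        self.maps.retain(|r| r.start != start);
        self.maps.len() != before
    }
}

fn main() {
    let mut vm = GpuVm::default();
    vm.bind(0x1000, 0x2000).unwrap();
    assert!(vm.bind(0x2000, 0x1000).is_err()); // overlap rejected
    assert!(vm.unbind(0x1000));
    vm.bind(0x2000, 0x1000).unwrap(); // fine once the old range is gone
    println!("vm bind/unbind bookkeeping ok");
}
```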

07:42.000 --> 07:47.000
Synchronization: so basically the kernel mode driver is in charge

07:47.000 --> 07:51.000
of letting user space know when jobs are done.

07:51.000 --> 07:56.000
So, again, so that the user mode driver doesn't really wait forever.

07:56.000 --> 08:03.000
Some resources are shared between multiple users like the GPU driver,

08:03.000 --> 08:09.000
the display driver, et cetera, et cetera.

08:09.000 --> 08:12.000
I've already talked about this.

08:12.000 --> 08:14.000
I've already talked about the dependency graph,

08:14.000 --> 08:17.000
where the user mode driver will be building those dependency graphs,

08:17.000 --> 08:23.000
and then Tyr will be making sure that these dependencies are actually respected.

08:23.000 --> 08:27.000
And error recovery, again, we've already spoken about this,

08:27.000 --> 08:28.000
power management.

08:28.000 --> 08:31.000
The kernel is the only thing that can like bring up regulators,

08:31.000 --> 08:35.000
and basically when the device goes idle,

08:35.000 --> 08:40.000
again, it's the job of the kernel mode driver to actually lower a little bit

08:40.000 --> 08:43.000
the clocks, such that you don't use as much power,

08:43.000 --> 08:46.000
and unfortunately, Tyr cannot do any of that at the moment.

08:46.000 --> 08:51.000
But it's planned. Also debug facilities, we've talked about this.

08:51.000 --> 08:56.000
So, the API for Tyr is much smaller, again,

08:56.000 --> 08:58.000
than the API for user space.

08:58.000 --> 09:04.000
Because, again, user space is actually covering a full API like Vulkan,

09:04.000 --> 09:07.000
or OpenGL, but Tyr is basically just this.

09:07.000 --> 09:09.000
These are the main APIs that we have.

09:09.000 --> 09:14.000
So, creating and allocating GPU memory is the top row.

09:14.000 --> 09:20.000
Enforcing isolation between different applications using virtual memory regions,

09:20.000 --> 09:23.000
which is what the VM_CREATE and VM_BIND ioctls do.

09:23.000 --> 09:29.000
And then giving user space access to the GPU's ring buffer

09:29.000 --> 09:32.000
by creating an execution context.

09:33.000 --> 09:36.000
So, basically allocating GPU memory,

09:36.000 --> 09:41.000
job submission, and enforcing isolation.
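Those three families of requests can be modelled as a tiny enum. The variant names below are invented for the example; the real uAPI is a set of DRM ioctls along the lines of BO create, VM create/bind, and group create.

```rust
// Toy model of the small uAPI surface described above:
// three families of requests a GPU kernel driver must serve.
#[derive(Debug, PartialEq)]
enum Request {
    BoCreate { size: u64 },        // allocate GPU memory
    VmBind { va: u64, size: u64 }, // isolation: map memory into a GPU VM
    GroupCreate { queues: u8 },    // job submission: a view of the ring buffer
}

// Which subsystem of the driver each request family exercises.
fn dispatch(req: &Request) -> &'static str {
    match req {
        Request::BoCreate { .. } => "memory management",
        Request::VmBind { .. } => "isolation / page tables",
        Request::GroupCreate { .. } => "job submission",
    }
}

fn main() {
    assert_eq!(dispatch(&Request::BoCreate { size: 4096 }), "memory management");
    assert_eq!(dispatch(&Request::VmBind { va: 0x1000, size: 4096 }),
               "isolation / page tables");
    assert_eq!(dispatch(&Request::GroupCreate { queues: 1 }), "job submission");
}
```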

09:41.000 --> 09:48.000
So, here is the overall picture for what a GPU kernel mode driver looks like.

09:48.000 --> 09:52.000
We have the UAPI which is communicating with user space.

09:52.000 --> 09:57.000
And then, again, as I said, scheduling jobs is one of the major responsibilities.

09:57.000 --> 10:01.000
So, we have a component, a shared component between multiple drivers

10:01.000 --> 10:06.000
in DRM called the DRM GPU scheduler, which is doing this dependency tracking thing.

10:06.000 --> 10:12.000
that we've been talking about. There's one shared component called DRM GPUVM,

10:12.000 --> 10:18.000
which is doing this isolation between multiple applications that we've been talking about.

10:18.000 --> 10:24.000
Allocating GPU memory is done through this GEM infrastructure within DRM.

10:24.000 --> 10:27.000
And then, below that, we have the actual hardware.

10:27.000 --> 10:31.000
We have the VRAM, we have the firmware scheduler,

10:31.000 --> 10:34.000
we're going to be talking a little bit more about the firmware scheduler,

10:34.000 --> 10:36.000
and then the actual cores.

10:36.000 --> 10:47.000
And on top of that, Arm Mali has its own firmware-assisted job scheduling system,

10:47.000 --> 10:49.000
basically inside of it.

10:49.000 --> 10:52.000
And that's basically implemented through a microcontroller unit.

10:52.000 --> 10:57.000
So, with this microcontroller, basically, in Tyr we never really interact with any of the cores.

10:57.000 --> 11:01.000
We talked to this microcontroller unit through a shared memory region,

11:01.000 --> 11:06.000
and then in this microcontroller unit, we can basically allocate a ring buffer

11:06.000 --> 11:12.000
where we can place command buffers in, and then we can tell the microcontroller unit to execute that.

11:12.000 --> 11:18.000
At which point it will take over, and then tell Tyr when the work is done.

11:18.000 --> 11:25.000
So, again, if we go back to Vulkan, for example, we have command buffers being recorded into queues.

11:25.000 --> 11:31.000
Then Tyr will back these queues with what we call CSF groups,

11:31.000 --> 11:35.000
which is just one view into the GPU's ring buffer, basically.
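To make "a view into the GPU's ring buffer" concrete, here is a minimal single-producer ring: the driver places command-buffer handles at the insert point, and the firmware retires them in order. This is purely illustrative; the real CSF ring entries are hardware-defined, not a Vec of addresses.

```rust
// Minimal ring buffer sketch: driver pushes, "firmware" pops.
struct Ring {
    slots: Vec<Option<u64>>, // pretend each entry is a command-buffer address
    head: usize,             // next slot the firmware consumes
    tail: usize,             // next slot the driver writes
}

impl Ring {
    fn new(n: usize) -> Self {
        Ring { slots: vec![None; n], head: 0, tail: 0 }
    }

    // Queue one command buffer; fails when the ring is full and the
    // caller must wait for completions first.
    fn push(&mut self, cmdbuf: u64) -> bool {
        let n = self.slots.len();
        if (self.tail + 1) % n == self.head {
            return false;
        }
        self.slots[self.tail] = Some(cmdbuf);
        self.tail = (self.tail + 1) % n;
        true
    }

    // What happens when the firmware retires one entry.
    fn pop(&mut self) -> Option<u64> {
        if self.head == self.tail {
            return None; // nothing queued
        }
        let n = self.slots.len();
        let v = self.slots[self.head].take();
        self.head = (self.head + 1) % n;
        v
    }
}

fn main() {
    let mut ring = Ring::new(4); // 3 usable slots with this full-check
    assert!(ring.push(0xA0));
    assert!(ring.push(0xB0));
    assert_eq!(ring.pop(), Some(0xA0)); // retired in submission order
    assert_eq!(ring.pop(), Some(0xB0));
    assert_eq!(ring.pop(), None);
}
```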

11:35.000 --> 11:40.000
And then eventually once you place your command buffers inside of this ring buffer,

11:40.000 --> 11:45.000
the microcontroller will take over, the work will eventually get scheduled inside of the ring buffer,

11:46.000 --> 11:51.000
and then the firmware will eventually raise an interrupt to tell you, hey, this job is done.

11:51.000 --> 11:59.000
At which point you can tell user space, because again, the kernel mode driver is the one that has to tell user space when work has completed.

11:59.000 --> 12:05.000
So, this is basically what a kernel driver does.

12:05.000 --> 12:12.000
And this is not specific to Rust, but general to all kernel drivers.

12:13.000 --> 12:17.000
So, where are we with the Rust driver in particular?

12:17.000 --> 12:26.000
And I don't know how many of you have seen my presentation at XDC, which I presented some two to three months ago,

12:26.000 --> 12:32.000
and basically back then we didn't have a functional prototype, but now we do,

12:32.000 --> 12:35.000
and this thing is actually running Tyr.

12:35.000 --> 12:41.000
So, this is being rendered using the Rust driver at this moment, so it's running, it hasn't crashed so far.

12:41.000 --> 12:49.000
We ran this thing for three days during Plumbers,

12:49.000 --> 12:54.000
the Linux Plumbers Conference, in December, and we had, so there's this board here.

12:54.000 --> 13:00.000
I don't know how many of you can see it. I'm not going to touch it, because the image might glitch.

13:00.000 --> 13:07.000
So, we had controllers actually plugged into the board, and then people were playing SuperTuxKart, you know, the kart game,

13:08.000 --> 13:11.000
and people were playing against each other for three days and it didn't crash.

13:11.000 --> 13:18.000
So, at this point, I somewhat believe that this thing isn't going to crash on us, and it hasn't so far.

13:18.000 --> 13:23.000
So, basically, we have a prototype, but yeah, what now, right?

13:23.000 --> 13:32.000
We have two different strategies at the moment. So we have this thing downstream, which we basically wanted to use to show the community

13:32.000 --> 13:39.000
that, hey, we can get a Rust driver done that can be as performant as a C driver,

13:39.000 --> 13:45.000
and we wanted to show it with, you know, we wanted to show people something real.

13:45.000 --> 13:51.000
So, we did this downstream driver, we, again, we demoed it at Plumbers, et cetera, et cetera.

13:51.000 --> 13:57.000
We didn't do enough benchmarking at this point, because there's a lot of moving targets on the table,

13:57.000 --> 14:05.000
but we ran a few games and the performance between Tyr and the C driver, called Panthor, is roughly similar.

14:05.000 --> 14:10.000
And by roughly similar, what I'm trying to say is you get roughly the same amount of frames,

14:10.000 --> 14:18.000
but we did not do actual benchmarking, which would be the right thing to do, but we're going to get there in the future.

14:18.000 --> 14:26.000
And then we have the upstream effort, which is basically the most important thing at this moment, right?

14:26.000 --> 14:30.000
Because it's no good to just have a downstream driver like this.

14:30.000 --> 14:35.000
What we want to do is to have this thing basically be a part of the Linux kernel.

14:35.000 --> 14:41.000
And for that, we will have to figure out how to upstream what we currently have into the kernel.

14:41.000 --> 14:51.000
And as I've been saying, I've explained this a couple of slides ago, writing a GPU kernel driver requires a lot of shared components.

14:51.000 --> 15:03.000
I showed you that when we were discussing GPUVM and the GPU scheduler, in the picture that had basically Tyr in a nutshell, something like that.

15:03.000 --> 15:09.000
And we need to have abstractions for the shared components for all drivers, not only Tyr.

15:10.000 --> 15:27.000
And they have to work, of course, for all drivers. So in particular, nowadays we have Nova, which is, you know, being written by the NVIDIA people, which is a massive driver for NVIDIA hardware, which is going to support Turing and up.

15:28.000 --> 15:37.000
So I guess Turing starts with the 20xx series of NVIDIA GPUs, if I'm not mistaken. So Nova is going to support that and up.

15:37.000 --> 15:48.000
So a lot of hardware, a very complex driver. There's also Asahi, which was initially written by Asahi Lina, right?

15:48.000 --> 15:53.000
And nowadays, the Asahi people have taken over the project because she left.

15:53.000 --> 16:05.000
And they're also trying to upstream the driver. So there's basically three of us, and RVKMS, which is a mode-setting driver, also written in Rust. So there's four drivers.

16:05.000 --> 16:10.000
And they're all going to use, you know, this shared infrastructure.

16:10.000 --> 16:18.000
So our plan initially was, let's get this infrastructure upstream, right? Because if we don't have the infrastructure that's going to be shared by everybody,

16:18.000 --> 16:31.000
we cannot have our driver. It's as simple as that. So we basically have most of the DRM stuff upstream, so you can have a DRM device, a render node, show up.

16:31.000 --> 16:39.000
We have clocks and regulators, so that you can bring the GPU hardware up, because you need power and clock signals for that.

16:39.000 --> 16:45.000
We need IRQs, as I told you before; IRQs are used to know when jobs are done, for example.

16:45.000 --> 16:52.000
And the main way through which we communicate with the microcontroller is through interrupts.

16:52.000 --> 16:58.000
And we also need workqueues, used for a lot of reasons.

16:58.000 --> 17:09.000
All of these things are downstream, unfortunately. And this is a problem, because for as long as these things are downstream, we cannot upstream our work, because our work depends on this.

17:09.000 --> 17:15.000
So allocating GPU memory nowadays is not possible in the upstream Linux kernel.

17:15.000 --> 17:25.000
The reason being that for GEM shmem, there's no way to communicate with that in Rust today in upstream code.

17:25.000 --> 17:32.000
If there's no way to communicate with the GEM framework, again, there's no way to allocate GPU memory.

17:32.000 --> 17:41.000
If there's no way to allocate GPU memory, and as I said, allocating GPU memory is basically the number one thing that a kernel driver does, you can't do anything.

17:41.000 --> 17:52.000
So to answer a question that somebody may or may not have, where are we in upstream: we can basically probe the device and bring the power up.

17:52.000 --> 17:54.000
And not much more than that.

17:54.000 --> 17:57.000
Reason being, again, we cannot allocate memory.

17:57.000 --> 18:02.000
We cannot talk to the IOMMUs, so we cannot provide the isolation thing

18:02.000 --> 18:08.000
I've been talking about. We cannot have fences.

18:08.000 --> 18:13.000
Again, fences is this synchronization mechanism where you can basically wait on a fence.

18:13.000 --> 18:16.000
There's only two operations, waiting and signaling.

18:16.000 --> 18:22.000
And it's how the kernel driver tells user space, for example, that a given job has finished executing.

18:22.000 --> 18:24.000
It's how you build these dependency graphs

18:24.000 --> 18:30.000
I've been talking about, where you say, hey, when this job completes, it's going to give you one fence.

18:30.000 --> 18:35.000
And then this other job will depend on this fence, so it's how you build these dependency graphs.
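A fence reduced to its two operations, wait and signal, can be mirrored in a userspace sketch. The kernel really uses dma_fence, and userspace uses sync objects; the type below only reproduces the semantics, with invented names, using standard-library primitives.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// A fence has exactly two operations: wait and signal.
#[derive(Clone, Default)]
struct Fence {
    inner: Arc<(Mutex<bool>, Condvar)>,
}

impl Fence {
    // Mark the fence as signalled and wake every waiter.
    fn signal(&self) {
        let (lock, cvar) = &*self.inner;
        *lock.lock().unwrap() = true;
        cvar.notify_all();
    }

    // Block until the fence has been signalled.
    fn wait(&self) {
        let (lock, cvar) = &*self.inner;
        let mut done = lock.lock().unwrap();
        while !*done {
            done = cvar.wait(done).unwrap();
        }
    }
}

fn main() {
    let fence = Fence::default();
    let dep = fence.clone();
    // "Job B" refuses to start before "job A" signals its fence.
    let b = thread::spawn(move || {
        dep.wait();
        "ran after A"
    });
    thread::sleep(Duration::from_millis(10)); // pretend A is executing
    fence.signal(); // A finished
    assert_eq!(b.join().unwrap(), "ran after A");
}
```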

18:35.000 --> 18:44.000
And we cannot allocate fences in Tyr because there's no way to talk to the current C code that does that from Rust.

18:44.000 --> 18:52.000
And this GPU scheduler, again: one of the major responsibilities of a kernel driver is scheduling jobs.

18:52.000 --> 19:01.000
Again, by noticing which job has its dependencies met, such that this job can execute at that moment.

19:01.000 --> 19:04.000
This is done by this GPU scheduler component.

19:04.000 --> 19:08.000
There's no way to talk to this thing upstream at the moment.

19:08.000 --> 19:13.000
So there's very little that can be done; however, there is work tackling all of these things.

19:13.000 --> 19:18.000
So, Lyude Paul from Red Hat is working on the first bullet point.

19:18.000 --> 19:22.000
And I'm pretty sure his series is going to be merged soon.

19:22.000 --> 19:24.000
So soon we're going to get the first one.

19:24.000 --> 19:28.000
Second one is probably going to be merged in the next cycle.

19:28.000 --> 19:35.000
And then again, there's a person, Philipp Stanner from Red Hat, who's working on the scheduler and the fence stuff.

19:35.000 --> 19:40.000
And just last week he put out an RFC.

19:40.000 --> 19:44.000
I'm not really sure how this RFC is going to perform.

19:44.000 --> 19:51.000
And it's not really the DRM GPU scheduler, but a new thing, but we're going to talk a little bit about this.

19:51.000 --> 19:57.000
But the good news is we have a prototype driver, so we can just take whatever code anybody puts out.

19:57.000 --> 20:03.000
And then backport that into the prototype and then test it out on a real device to see how it performs.

20:03.000 --> 20:08.000
Because drivers like Nova, for example, they're not at a point where they can test a scheduler,

20:08.000 --> 20:16.000
Because they cannot submit jobs yet, because they did not get there yet.

20:16.000 --> 20:25.000
And for the prototype, which is this thing, I mean, I used to have these pictures there, before I could present from the actual device.

20:25.000 --> 20:30.000
So I don't think pictures are really needed because we're seeing it being presented.

20:30.000 --> 20:36.000
We can basically watch YouTube, play games.

20:36.000 --> 20:40.000
So yeah, GNOME and Weston are working.

20:40.000 --> 21:00.000
And in fact, let's just see whether we can run something like SuperTuxKart.

21:00.000 --> 21:05.000
And hey, SuperTuxKart is working.

21:05.000 --> 21:14.000
This GPU thingy at zero means that we don't have support for performance counters; we did not get there yet.

21:14.000 --> 21:25.000
This thing, basically, we were actually only 50% sure that this thing would actually work until the end of the presentation, because there's still a lot of things to do.

21:25.000 --> 21:29.000
And you can play.

21:29.000 --> 21:42.000
Yeah, whatever.

21:42.000 --> 21:47.000
And yeah, there you go.

21:47.000 --> 21:50.000
You can basically play.

21:50.000 --> 21:55.000
And again, people played this for a few hours during plumbers.

21:55.000 --> 22:08.000
Um, how do I quit this?

22:08.000 --> 22:14.000
And then if we do, like, dmesg | grep tyr.

22:14.000 --> 22:16.000
Tyr is basically outputting a lot of things.

22:16.000 --> 22:20.000
Basically, all mappings have traces on them for debugging reasons.

22:20.000 --> 22:26.000
So every time you map and unmap memory, you print something to the console.

22:26.000 --> 22:32.000
Let's see if vkcube is actually working.

22:32.000 --> 22:35.000
So yes, it's running on Mali G610.

22:35.000 --> 22:45.000
And Vulkan is working, albeit with some glitches, because, as we will discuss, there's a lot of shortcuts there.

22:45.000 --> 22:51.000
So if you run vkcube on an actual, you know, how do I say this?

22:51.000 --> 22:55.000
On the driver that has actually been deployed, let's put it this way.

22:55.000 --> 22:57.000
It doesn't stutter like this.

22:57.000 --> 23:02.000
This is because the synchronization isn't 100% okay yet.

23:02.000 --> 23:08.000
We have taken some shortcuts to get this to work.

23:08.000 --> 23:09.000
But it works.

23:09.000 --> 23:13.000
I mean, I could use that as my daily driver for email.

23:13.000 --> 23:18.000
And, you know, as long as I saved all my work

23:18.000 --> 23:24.000
after, like, every three or five minutes. Let's, uh, yes.

23:24.000 --> 23:25.000
All right.

23:25.000 --> 23:26.000
So GNOME is working.

23:26.000 --> 23:27.000
vkcube is working.

23:27.000 --> 23:30.000
SuperTuxKart is working.

23:30.000 --> 23:33.000
Then, can we run Firefox?

23:33.000 --> 23:40.000
No?

23:40.000 --> 23:41.000
Yeah.

23:41.000 --> 23:47.000
So Firefox is working.

23:47.000 --> 23:48.000
All right.

23:48.000 --> 23:58.000
So let's get back to the presentation.

23:58.000 --> 24:03.000
So anyways, as I said, some parts will need more iterations before being ready.

24:03.000 --> 24:08.000
As you can obviously see, there's a few things that this thing does not have.

24:08.000 --> 24:11.000
For example, it doesn't have any power management code whatsoever.

24:11.000 --> 24:17.000
So the power management strategy for this thing is probe the driver,

24:17.000 --> 24:25.000
clock the GPU to the max clock frequency that you can, and then just leave it there.

24:25.000 --> 24:29.000
So what it was is not good enough, like, as I said, synchronization is not good enough.

24:29.000 --> 24:34.000
So I'm not going to be showing those, well, like, if you run super tics cart in windowed mode,

24:34.000 --> 24:39.000
it's actually glitchy and there's, uh, a few, um, how do I say that?

24:39.000 --> 24:40.000
Yeah, well, glitches.

24:40.000 --> 24:46.000
As you see, random different colors popping up every now and then, um,

24:46.000 --> 24:50.000
what else is just not here, um, oh, um, error recovery.

24:50.000 --> 24:54.000
So if the GPU crashes in this thing, you're, you're done.

24:54.000 --> 24:56.000
You have to reboot the computer.

24:56.000 --> 25:01.000
Usually, when the GPU crashes, the first thing you have to do is, hey, is this crash recoverable?

25:01.000 --> 25:03.000
Because some crashes are recoverable.

25:03.000 --> 25:08.000
And if they are, then you should rewind the state to, you know,

25:08.000 --> 25:14.000
you should basically recover and put the GPU in a state where the user can resume, uh,

25:14.000 --> 25:16.000
using the, the hardware.

25:16.000 --> 25:18.000
But this is not the case here.

25:18.000 --> 25:23.000
So if this thing has a crash, then you reboot the system, and bye-bye to all of your work.

25:24.000 --> 25:28.000
Let me see if I can remember something that is not implemented yet.

25:32.000 --> 25:37.000
Oh, um, you can only run eight programs at the same time.

25:41.000 --> 25:43.000
For reasons.

25:45.000 --> 25:51.000
No, really, um, I said that the microcontroller can schedule jobs automatically.

25:51.000 --> 25:55.000
But that's only if you give it up to eight jobs, right?

25:55.000 --> 25:58.000
If you give it eight jobs, like from zero to seven jobs,

25:58.000 --> 26:02.000
it can pick between these eight and automatically schedule them.

26:02.000 --> 26:05.000
But if you have more than eight, you have to have a software scheduler on top,

26:05.000 --> 26:10.000
which is more code that is not as nice and shiny to show people.

26:10.000 --> 26:12.000
So it's not there yet.
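The reason a software layer is still needed on top of CSF can be sketched as follows: the firmware exposes a fixed number of slots (eight here) that it can schedule on its own, so with more runnable groups than slots the driver must rotate groups in and out. The slot-assignment policy below is invented purely for illustration.

```rust
const FW_SLOTS: usize = 8;

// Given the ids of runnable groups, pick which ones currently occupy
// firmware slots: FW_SLOTS groups starting at a rotating offset.
fn assign_slots(runnable: &[u32], rotation: usize) -> Vec<u32> {
    runnable
        .iter()
        .cycle()
        .skip(rotation % runnable.len().max(1))
        .take(FW_SLOTS.min(runnable.len()))
        .copied()
        .collect()
}

fn main() {
    // Eight or fewer groups: the firmware can hold all of them at once,
    // so no software scheduling decision is needed.
    let few: Vec<u32> = (0..8).collect();
    assert_eq!(assign_slots(&few, 0).len(), 8);

    // More than eight: only eight fit, and rotating the offset swaps
    // different groups in. Deciding that rotation is the software
    // scheduler's job.
    let many: Vec<u32> = (0..12).collect();
    assert_eq!(assign_slots(&many, 0), (0..8).collect::<Vec<u32>>());
    assert_eq!(assign_slots(&many, 4), (4..12).collect::<Vec<u32>>());
}
```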

26:12.000 --> 26:15.000
Um, but what do we want this thing for?

26:15.000 --> 26:18.000
As we said, going forward now,

26:18.000 --> 26:23.000
we have a driver where we can test the code that other people from,

26:23.000 --> 26:27.000
from NVIDIA, from Red Hat, or any other contributor,

26:27.000 --> 26:30.000
we can test their code in a real device, right?

26:30.000 --> 26:36.000
Uh, we can basically backport their code on top of our prototype and then run it and then collect data:

26:36.000 --> 26:39.000
is it working, is it performant, or whatever.

26:39.000 --> 26:47.000
And if it works for Tyr, there's a good chance it's going to work for Asahi and for, um, NVIDIA as well.

26:47.000 --> 26:56.000
So again, how can we, you know, get this prototype and upstream it and, you know,

26:56.000 --> 26:59.000
get rid of the shortcuts, that's where we are at the moment.

26:59.000 --> 27:02.000
So we are focusing basically upstream now.

27:02.000 --> 27:05.000
We're going to get a lot of the code that we have downstream.

27:05.000 --> 27:11.000
And then we're going to see, hey, where did we take any shortcuts, or what can we improve?

27:11.000 --> 27:16.000
Or can we design the Rust API to actually be better or safer,

27:16.000 --> 27:19.000
or more performant, et cetera, et cetera.

27:19.000 --> 27:21.000
So this is where we are.

27:21.000 --> 27:25.000
There's actually a big discussion nowadays about how

27:25.000 --> 27:27.000
what job submission is going to look like.

27:27.000 --> 27:34.000
Again, job submission is, let's say, one of the major responsibilities of a kernel

27:34.000 --> 27:35.000
driver.

27:35.000 --> 27:41.000
And right now, this thing is using the GPU scheduler.

27:41.000 --> 27:44.000
Uh, the DRM GPU scheduler.

27:44.000 --> 27:49.000
That's written in C, however, as I said, this thing has a firmware.

27:49.000 --> 27:55.000
Uh, a microcontroller unit that can automatically schedule jobs, up to eight jobs.

27:55.000 --> 27:59.000
And then it doesn't really make sense for you to have this DRM

27:59.000 --> 28:01.000
GPU scheduler on top.

28:01.000 --> 28:06.000
And this is also the same thing for Asahi and the same thing for Nova.

28:06.000 --> 28:12.000
So it's looking like we're going to be writing a new scheduler directly in Rust

28:12.000 --> 28:16.000
for job submission, which is not going to schedule anything, because as we said,

28:16.000 --> 28:19.000
the GPUs nowadays, they can do that on their own.

28:19.000 --> 28:22.000
It's only going to do the dependency management part, right?

28:22.000 --> 28:26.000
So figuring out whether the dependencies are met before executing work.

28:26.000 --> 28:30.000
And Red Hat is basically working on, or spearheading, this effort.

28:30.000 --> 28:33.000
They're calling it the job queue.

28:33.000 --> 28:38.000
And we're working together with them to help them test it, give them input on the design.

28:38.000 --> 28:42.000
And so on and so forth.

28:42.000 --> 28:46.000
So as I said, we're going to start with a clean slate.

28:46.000 --> 28:49.000
Try to upstream whatever we have here.

28:49.000 --> 28:52.000
Try to upstream whatever dependencies are

28:52.000 --> 28:56.000
still not upstream, in order to get a driver upstream.

28:56.000 --> 29:01.000
And then we're going to go, uh, focus some effort on benchmarking.

29:01.000 --> 29:02.000
Basically.

29:02.000 --> 29:05.000
So how performant is this thing compared to the C driver?

29:05.000 --> 29:09.000
We ran a couple of games, but we need more data than that.

29:09.000 --> 29:15.000
We're going to run CTS and ensure that CTS is passing, you know, that sort of thing.

29:15.000 --> 29:21.000
On the upstream driver as we get there.

29:21.000 --> 29:25.000
So basically, this is what I had to show for today.

29:25.000 --> 29:29.000
Sorry I did not bring the controllers, otherwise you guys could play for 10 minutes.

29:29.000 --> 29:31.000
It'd be cool.

29:31.000 --> 29:33.000
Do you have any questions?

29:33.000 --> 29:36.000
Which SoC are you using in your demo?

29:36.000 --> 29:38.000
RK3588.

29:38.000 --> 29:42.000
The question was, which SoC we're using for the demo.

29:42.000 --> 29:45.000
And it's the RK3588.

29:45.000 --> 29:46.000
Hi.

29:46.000 --> 29:47.000
Thank you for your talk.

29:47.000 --> 29:52.000
And you said that you don't need scheduling for up to eight jobs,

29:52.000 --> 29:56.000
but then you need some kind of scheduling.

29:56.000 --> 29:58.000
And then you said, why?

29:58.000 --> 30:01.000
[inaudible]

30:01.000 --> 30:02.000
[inaudible]

30:02.000 --> 30:04.000
[inaudible]

30:04.000 --> 30:05.000
All right.

30:05.000 --> 30:08.000
So CSF will give you eight.

30:08.000 --> 30:10.000
So sorry.

30:10.000 --> 30:19.000
So the question was, the person was confused because I said that the GPU can schedule up to eight jobs.

30:19.000 --> 30:22.000
But then we need the software scheduler on top.

30:22.000 --> 30:26.000
And I said that the job queue is not going to be scheduling anything.

30:26.000 --> 30:31.000
So there was confusion among all of this different information.

30:31.000 --> 30:40.000
So to answer your question, CSF, which is this firmware assisted scheduling thing,

30:40.000 --> 30:43.000
it will give you up to eight slots where you can basically,

30:43.000 --> 30:48.000
you have eight ring buffers where you can place your command streams in there.

30:48.000 --> 30:53.000
And then it can automatically schedule among these eight ring buffers.

30:54.000 --> 31:00.000
If you have more than eight jobs trying to use the GPU simultaneously,

31:00.000 --> 31:01.000
you have two options.

31:01.000 --> 31:05.000
Either you say, hey, the GPU is busy, which is what this is doing.

31:05.000 --> 31:10.000
So just return EBUSY, or you boot someone out.

31:10.000 --> 31:15.000
You stop the world and have a look, hey, who's idle or if nobody's idle,

31:15.000 --> 31:22.000
who can I remove from the ring buffer at this moment to free up one slot to give it to another

31:22.000 --> 31:25.000
job basically.
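
To make those two options concrete, here is a toy Rust model of a CSF-style slot table. This is not Tyr's actual code; the `SlotTable` name, the enum shapes, and the fixed slot count are assumptions for illustration. The firmware schedules among at most `N` resident ring buffers, and everything past that is software policy: either refuse new work with something like EBUSY, or evict an idle group to free a slot.

```rust
/// Error mirroring the kernel's EBUSY return.
#[derive(Debug, PartialEq)]
enum SlotError {
    Busy,
}

/// State of one firmware slot (one resident ring buffer).
#[derive(Clone, Copy, PartialEq)]
enum Slot {
    Free,
    Running(u32), // group id with work in flight
    Idle(u32),    // group id resident but with nothing to do
}

/// Model of the CSF slot table: the firmware schedules among at most
/// `N` resident ring buffers; anything beyond that is software policy.
struct SlotTable<const N: usize> {
    slots: [Slot; N],
}

impl<const N: usize> SlotTable<N> {
    fn new() -> Self {
        SlotTable { slots: [Slot::Free; N] }
    }

    /// Policy 1 (what the talk says the prototype does): fail with
    /// EBUSY when every slot is occupied.
    fn try_assign(&mut self, group: u32) -> Result<usize, SlotError> {
        for (i, s) in self.slots.iter_mut().enumerate() {
            if *s == Slot::Free {
                *s = Slot::Running(group);
                return Ok(i);
            }
        }
        Err(SlotError::Busy)
    }

    /// Policy 2: stop the world, look for an idle group, and evict it
    /// to free a slot for the newcomer.
    fn assign_or_evict(&mut self, group: u32) -> Result<usize, SlotError> {
        if let Ok(i) = self.try_assign(group) {
            return Ok(i);
        }
        for (i, s) in self.slots.iter_mut().enumerate() {
            if matches!(*s, Slot::Idle(_)) {
                // The evicted group has to be made resident again later.
                *s = Slot::Running(group);
                return Ok(i);
            }
        }
        Err(SlotError::Busy) // nobody is idle either
    }
}
```

On a table of two slots, a third group gets `Busy` under the first policy, but takes over an idle slot under the second, which is the trade-off described above.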

31:25.000 --> 31:31.000
So this is what I'm referring to when I say software scheduler.

31:31.000 --> 31:33.000
It's this component.

31:33.000 --> 31:37.000
Now this component is really only for Tyr and Panthor,

31:37.000 --> 31:41.000
because for Asahi and for Nova,

31:41.000 --> 31:46.000
they basically can have as many ring buffers as they want.

31:46.000 --> 31:49.000
It depends on the amount of memory, but there's no restriction.

31:49.000 --> 31:52.000
No eight- or sixteen-slot limit.

31:52.000 --> 31:58.000
So when I say that this job queue thing is not going to be doing any scheduling,

31:58.000 --> 31:59.000
this is what I mean.

31:59.000 --> 32:01.000
If your driver, for example,

32:01.000 --> 32:06.000
needs to have any extra scheduling,

32:06.000 --> 32:08.000
because you only have a limited amount of ring buffers,

32:08.000 --> 32:10.000
then you have to do that on top.

32:10.000 --> 32:14.000
And so far, this is only the case for Tyr.

32:14.000 --> 32:17.000
Any other questions?

32:18.000 --> 32:20.000
I have a more general question.

32:20.000 --> 32:21.000
[inaudible]

32:21.000 --> 32:23.000
In GPU memory allocation, of course,

32:23.000 --> 32:24.000
[inaudible]

32:24.000 --> 32:26.000
But in a SoC, in a SoC,

32:26.000 --> 32:29.000
modern Arm chips, which have lots of CPUs,

32:29.000 --> 32:31.000
CPUs, GPUs,

32:31.000 --> 32:34.000
and CPU cores for the GPU and stuff like this.

32:34.000 --> 32:37.000
And surrounded by high-bandwidth memory, shared memory.

32:37.000 --> 32:41.000
Do we really need to allocate if this memory is easy to share

32:41.000 --> 32:43.000
with all parts of the chip?

32:43.000 --> 32:45.000
[inaudible]

32:45.000 --> 32:49.000
So the question was, and correct me if I'm wrong,

32:49.000 --> 32:52.000
given that modern SoCs have a lot of memory that's shared

32:52.000 --> 32:54.000
between a lot of components.

32:54.000 --> 32:56.000
Do we really need to allocate memory?

32:56.000 --> 32:58.000
The answer is yes.

32:58.000 --> 33:00.000
So basically,

33:00.000 --> 33:05.000
you need to allocate a portion of the system's memory

33:05.000 --> 33:07.000
to the GPU.

33:07.000 --> 33:09.000
And this is what GEM is doing.

33:09.000 --> 33:13.000
So GEM is going to use the

33:14.000 --> 33:17.000
shmem layer, anyway,

33:17.000 --> 33:19.000
to actually do the allocation for you.

33:19.000 --> 33:24.000
And it's going to carve out memory from the system

33:24.000 --> 33:25.000
overall memory.

33:25.000 --> 33:26.000
So yes.
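
As a rough illustration of what "carving out" means, here is a toy Rust model of buffer-object creation. It is not GEM's real API (the shmem-backed GEM helpers live in the kernel); the `GemPool` and `BufferObject` names are invented. The point is that even with unified memory shared across the SoC, the kernel still has to reserve and track which ranges of system memory a given GPU job is allowed to touch.

```rust
/// Toy model of GEM buffer-object allocation: a bump allocator carving
/// buffer objects out of one region of the system's overall memory.
struct GemPool {
    total: usize, // bytes set aside for GPU use
    used: usize,  // bytes already handed out
}

/// A carved-out range the GPU may touch, analogous to a GEM object.
struct BufferObject {
    offset: usize,
    size: usize,
}

impl GemPool {
    fn new(total: usize) -> Self {
        GemPool { total, used: 0 }
    }

    /// Allocate a buffer object, or fail (like ENOMEM) when the
    /// pool is exhausted.
    fn create_bo(&mut self, size: usize) -> Option<BufferObject> {
        if self.used + size > self.total {
            return None;
        }
        let bo = BufferObject { offset: self.used, size };
        self.used += size;
        Some(bo)
    }
}
```

A real implementation would also map these objects into the GPU's page tables and free them again, but the bookkeeping obligation is the same: shared memory does not mean unmanaged memory.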

33:26.000 --> 33:29.000
A question about choosing Rust.

33:29.000 --> 33:32.000
Did you do it only because of the memory safety,

33:32.000 --> 33:35.000
or are you looking to get some performance?

33:35.000 --> 33:38.000
[inaudible]

33:38.000 --> 33:41.000
Oh, sorry.

33:42.000 --> 33:44.000
So the question is, are we using Rust

33:44.000 --> 33:46.000
only because of the memory safety?

33:46.000 --> 33:50.000
Or are we also looking to get more performance with Rust?

33:50.000 --> 33:53.000
So performance is actually a non-goal.

33:53.000 --> 33:55.000
More performance is actually a non-goal.

33:55.000 --> 34:01.000
What we're trying to do is to get as much performance as the C driver.

34:01.000 --> 34:04.000
And the reason why is,

34:07.000 --> 34:10.000
it's basically not a goal of the language to be faster

34:10.000 --> 34:12.000
than C in general.

34:12.000 --> 34:14.000
And even if the language is faster,

34:14.000 --> 34:18.000
having a faster kernel driver wouldn't necessarily make it

34:18.000 --> 34:20.000
faster overall.

34:20.000 --> 34:22.000
Because the kernel driver is not really the bottleneck

34:22.000 --> 34:23.000
most of the time.

34:23.000 --> 34:26.000
So having the kernel driver execute faster wouldn't necessarily

34:26.000 --> 34:29.000
mean that games execute much faster.

34:29.000 --> 34:32.000
So no, we're only doing it for the safety.

34:36.000 --> 34:37.000
Yes.

34:37.000 --> 34:42.000
You said something about the Nova driver and the Asahi driver.

34:42.000 --> 34:47.000
Is there anything you've heard from people from AMD or Intel?

34:47.000 --> 34:50.000
I mean, their driver is probably the biggest part of the kernel.

34:50.000 --> 34:54.000
Did they show any interest in looking at Rust?

34:54.000 --> 34:58.000
Has AMD shown any interest in looking at Rust?

34:58.000 --> 35:00.000
Or Intel, in looking at Rust?

35:00.000 --> 35:02.000
Well, no, as far as I'm aware.

35:02.000 --> 35:04.000
I haven't heard anything.

35:04.000 --> 35:06.000
That's my answer.

35:06.000 --> 35:07.000
Thank you.

35:09.000 --> 35:10.000
Yes?

35:10.000 --> 35:11.000
The mentioned features,

35:11.000 --> 35:12.000
[inaudible]

35:12.000 --> 35:15.000
these are implemented in Panthor, I guess?

35:15.000 --> 35:16.000
Yes.

35:16.000 --> 35:19.000
Is there anything else that's implemented in Panthor?

35:19.000 --> 35:21.000
Which is a want-to-have feature,

35:21.000 --> 35:23.000
but not a must-have feature,

35:23.000 --> 35:24.000
and not implemented in here?

35:24.000 --> 35:25.000
Yes.

35:25.000 --> 35:27.000
The performance counters, I guess?

35:27.000 --> 35:28.000
Yeah, a lot of things.

35:28.000 --> 35:31.000
In Tyr, basically, in the downstream prototype,

35:31.000 --> 35:33.000
we have the bare minimum.

35:34.000 --> 35:39.000
The question is, are there more things implemented in Panthor

35:39.000 --> 35:42.000
that are not implemented in Tyr.

35:42.000 --> 35:45.000
So, a lot of things, basically.

35:45.000 --> 35:49.000
We only have the bare minimum here to submit jobs and, you know,

35:49.000 --> 35:51.000
boot the firmware and have jobs execute.

35:51.000 --> 35:54.000
So, performance counters, power management.

35:54.000 --> 36:00.000
I think the debug facilities, error recovery,

36:01.000 --> 36:03.000
support for more GPU models.

36:03.000 --> 36:06.000
So, this only supports the Mali G610.

36:06.000 --> 36:10.000
Whereas Panthor supports other GPUs and other architectures.

36:10.000 --> 36:15.000
So, there's a lot missing at this moment.

36:15.000 --> 36:17.000
More questions?

36:17.000 --> 36:18.000
Yes.

36:18.000 --> 36:20.000
[inaudible]

36:20.000 --> 36:23.000
So I could just pull your branch of the driver,

36:23.000 --> 36:26.000
build it and try it on my own

36:27.000 --> 36:30.000
RK3588.

36:30.000 --> 36:35.000
Can I, the question was, can I download the driver

36:35.000 --> 36:39.000
and test it on my own RK3588?

36:39.000 --> 36:42.000
Yes, but don't blame me.

36:42.000 --> 36:44.000
if it breaks.

36:44.000 --> 36:47.000
No, yeah, it's public.

36:47.000 --> 36:51.000
But don't do your most important work and then have a crash.

36:51.000 --> 36:54.000
Do you think there's any more questions?

36:54.000 --> 36:56.000
You mentioned the video playback though.

36:56.000 --> 36:58.000
It's a bit of a blind spot for me.

36:58.000 --> 37:03.000
Is that like an entirely separate set of APIs

37:03.000 --> 37:05.000
in the driver for video acceleration?

37:05.000 --> 37:06.000
Or is it a different driver?

37:06.000 --> 37:10.000
And does it use the same buffer objects as everything else?

37:10.000 --> 37:14.000
All right, so the question is,

37:14.000 --> 37:16.000
and correct me if I'm wrong,

37:16.000 --> 37:22.000
how is video encoding and decoding implemented in the driver?

37:22.000 --> 37:27.000
And basically, I don't think Arm has any encode or decode engines

37:27.000 --> 37:28.000
in the GPU.

37:28.000 --> 37:30.000
So it's totally separate from the driver.

37:30.000 --> 37:34.000
More questions?

37:34.000 --> 37:36.000
I have a question.

37:36.000 --> 37:37.000
[inaudible]

37:37.000 --> 37:39.000
Who's funding the work, and why?

37:39.000 --> 37:41.000
Because those are the things I'd like to know about.

37:41.000 --> 37:42.000
Oh, yes.

37:42.000 --> 37:47.000
So, who's funding the work and why?

37:47.000 --> 37:49.000
That was the question.

37:49.000 --> 37:53.000
So, basically, arm and Google at this point.

37:53.000 --> 37:58.000
And reason is, basically, as I said, security.

37:58.000 --> 38:02.000
So, making sure that people cannot hack devices

38:02.000 --> 38:06.000
they may otherwise sell, for example.

38:06.000 --> 38:11.000
And eventually, we plan on, you know,

38:11.000 --> 38:14.000
so there's a C driver, which is Panthor,

38:14.000 --> 38:16.000
and then there's Tyr.

38:16.000 --> 38:20.000
And eventually, in a few years, if everything works out,

38:20.000 --> 38:23.000
the overall plan might be to move the platform from,

38:23.000 --> 38:25.000
from Panthor to Tyr.

38:25.000 --> 38:28.000
And then, you know, to develop new features there.

38:28.000 --> 38:32.000
This is still too much in the future to actually discuss.

38:32.000 --> 38:36.000
But that's the general idea.

38:36.000 --> 38:41.000
More questions?

38:41.000 --> 38:42.000
Hi.

38:42.000 --> 38:44.000
You mentioned some stuff that's not ready yet,

38:44.000 --> 38:48.000
[inaudible]

38:48.000 --> 38:50.000
Is it, like, painful to implement?

38:50.000 --> 38:52.000
[inaudible]

38:52.000 --> 38:54.000
Is it a technical problem?

38:54.000 --> 38:57.000
Or is it a problem with Rust?

38:57.000 --> 38:58.000
Yes.

39:02.000 --> 39:06.000
The question was about the abstractions,

39:06.000 --> 39:09.000
and the fact that, as I said, there are things that are missing,

39:09.000 --> 39:13.000
and whether they're hard to implement, or what the problem actually is.

39:13.000 --> 39:15.000
Correct?

39:15.000 --> 39:21.000
They're not hard to implement, but we need community consensus, actually.

39:21.000 --> 39:23.000
So, that's the hard part.

39:23.000 --> 39:27.000
And this is the number one roadblock, because when I had this slide saying,

39:27.000 --> 39:31.000
what we have versus what we don't have in terms of the abstractions,

39:31.000 --> 39:34.000
I said that most of them are going to be upstream soon,

39:34.000 --> 39:37.000
except for the job submission logic.

39:37.000 --> 39:42.000
And this is where this job queue stuff comes in, replacing the GPU scheduler,

39:42.000 --> 39:46.000
and doing versus not doing scheduling, et cetera, et cetera.

39:46.000 --> 39:51.000
It's something that we have to talk a lot about,

39:51.000 --> 39:54.000
because it has to work for everybody for all drivers.

39:54.000 --> 39:57.000
And getting it to work, and getting it to work correctly,

39:57.000 --> 39:59.000
I mean, we already have this scheduler in C,

39:59.000 --> 40:03.000
and it's plagued with some issues.

40:03.000 --> 40:06.000
that nobody has managed to fix for seven years.

40:06.000 --> 40:10.000
And we don't want to repeat that just so that we have something that works.

40:10.000 --> 40:12.000
So that's, yeah.

40:12.000 --> 40:16.000
And fences are usually the same; it's like a house of cards.

40:16.000 --> 40:19.000
So if you have a bug somewhere, everything comes crashing down,

40:19.000 --> 40:22.000
so you have to be really careful there.

40:22.000 --> 40:25.000
More questions?

40:25.000 --> 40:28.000
Hi.

40:28.000 --> 40:32.000
Have you ever destroyed hardware with a bug?

40:33.000 --> 40:37.000
The question was, have you ever destroyed hardware while doing this?

40:37.000 --> 40:42.000
Thankfully not, not yet.

40:47.000 --> 40:49.000
Hopefully never, right?

40:52.000 --> 40:56.000
More questions?

40:56.000 --> 40:57.000
Yes.

40:57.000 --> 41:00.000
Mali is, like, widely used on Android.

41:00.000 --> 41:03.000
[inaudible]

41:03.000 --> 41:06.000
Are there plans to do work on this, or

41:06.000 --> 41:12.000
do we have to wait a few years before we can move on to it?

41:12.000 --> 41:17.000
The question is, what are the plans for Android?

41:17.000 --> 41:22.000
Basically, right?

41:22.000 --> 41:26.000
I'm not sure I'm allowed to discuss this.

41:26.000 --> 41:31.000
There may or may not be plans.

41:31.000 --> 41:34.000
Yeah.

41:34.000 --> 41:38.000
And it may or may not be soon, ish.

41:38.000 --> 41:40.000
All right.

41:40.000 --> 41:41.000
Yes.

41:41.000 --> 41:53.000
Any more questions?

41:53.000 --> 41:54.000
No?

41:54.000 --> 41:55.000
All right.

41:55.000 --> 41:56.000
Thanks for having me.

41:56.000 --> 41:59.000
Hopefully you guys enjoyed it.

