WEBVTT

00:00.000 --> 00:17.160
Welcome. I'll give this talk about static code analysis in the kernel. So last year, I did

00:17.160 --> 00:27.160
just talk about hardware uploading through network caps and using internal kernel APIs

00:27.160 --> 00:36.440
to push rules into hardware. I didn't talk much about last year, but this actually led

00:36.440 --> 00:45.800
to discovery of some performance issues in the kernel, which then fixed shortly after. So this

00:45.800 --> 00:56.080
was in 610 where we implemented a TC bypass feature, which is then refined further in the

00:56.080 --> 01:03.080
next release, which is essentially only enables software processing of TC rules, if all

01:03.080 --> 01:12.400
the rules don't require. It's only enabled if any rules require software processing. And

01:12.400 --> 01:18.760
then after I did this, I also worked a bit on trying to make sure that the validation of

01:18.760 --> 01:23.520
these rules were done properly, whether they could be hard to upload it or not. And this

01:23.520 --> 01:31.960
set me to poke around in five few different drivers. So one of the drivers that I poke

01:31.960 --> 01:41.960
around was the quid driver from Qilotik. And in that I found something like this, where

01:41.960 --> 01:49.600
this looks like a normal static helper function and a current current zero and a current

01:49.600 --> 01:58.520
turn two different error codes. I then had to propagate in any argument to this function.

01:58.520 --> 02:04.560
And that led me to edit all the call sites. And then I discovered the call site looked

02:04.560 --> 02:11.560
like this. So it just checked for none, several X code, and always returned the same error

02:11.560 --> 02:20.920
code. So the unabsuppe would not be possible to get exported from this function and

02:20.920 --> 02:30.680
out into use space. And all's used in this sense in turn on in the kernel. In some

02:30.680 --> 02:38.440
case where you don't specify if you use software hardware, then it will try to do hardware.

02:38.440 --> 02:43.440
If it can, it will say that it's hard to upload it. And otherwise it'll just influence

02:43.440 --> 02:55.600
the software. And it depends on getting the unabsuppe for that to work. So then I found

02:55.600 --> 03:00.880
three different bugs of this kind within this driver, where they had all been there from

03:00.880 --> 03:14.000
the initial commit off this functionality. And in some of the cases, it wasn't an issue yet

03:14.000 --> 03:19.360
because the helper function only did return one error code. And therefore it was fine that

03:19.360 --> 03:27.640
it was all written for the out. But in another case, it did cause an issue later on because

03:27.720 --> 03:35.400
the reviewers didn't really catch that adding a new return code in a function didn't

03:35.400 --> 03:44.160
necessarily propagate all the way through. So I don't blame the review for not catching this.

03:44.160 --> 03:52.360
And in general, if you intend on to use a function in this sense, it's better to have it

03:52.360 --> 03:57.720
be able to enroll you by it only has two extra codes when you just have it in a simple

03:57.720 --> 04:08.120
if statement. So they have been a bunch of other research on this topic. Unfortunately,

04:08.120 --> 04:15.120
the research team from Reconcentre University that have done this have focused only on

04:15.200 --> 04:23.200
the file system and the block layer. So they haven't looked into other subsystems. And

04:23.200 --> 04:32.240
they're not active anymore. And this book that I found was in Houston, Susan 18, but still

04:32.240 --> 04:37.120
been there for quite some time. So therefore I thought it was fun to see if I could do some

04:37.120 --> 04:44.720
static analyzes and detect these because they seem pretty simple to detect. So for doing that,

04:44.800 --> 04:56.000
I've looked into a tool called SPAS, SPAS. It's a tool written by Lune Souss, way back in order

04:56.000 --> 05:04.400
to help him vet patches for the Lune's kernel. And it is also standalone static analyzes tool

05:04.400 --> 05:12.240
that has some building checks. And more or less, the whole Lune's kernel today is clean in this

05:12.240 --> 05:19.840
tool. And by using this code base for doing this, I also know that this code base works well

05:19.840 --> 05:28.880
with the kernel. So I know that it works with the whole code base and all this special ways

05:28.880 --> 05:40.720
that the kernel abuses GCC. So it has a bunch of binaries in the code base, which are useful

05:40.720 --> 05:56.720
inspiration. And the initial commit was this from, yeah. It's not so simple anymore, but it's very

05:56.720 --> 06:05.840
easy to run with K-built. So you can just do a check equals SPAS. And then it will run SPAS for

06:05.840 --> 06:11.840
area of single source file. And this is frequently done in the kernel tree. But it's also very easy

06:11.840 --> 06:21.280
to replace SPAS here with your own SPAS base tool that will do some analyzes. So that's the way

06:21.280 --> 06:29.200
I hooked it in. Unfortunately, this runs one per source file and we'll come back to that data.

06:29.200 --> 06:38.240
But yeah, SPAS has a function looks like this within SPAS. If you dump the instructions,

06:39.280 --> 06:45.040
this is internally we represent a lot of structs and unions that are all in the find with a lot of

06:45.040 --> 06:52.240
pointers, which makes it a bit hard to dump. But more in the later. But this is basically a

06:52.320 --> 06:58.880
returned step in the bottom that returns a register. Then register is set to zero or to an error

06:58.880 --> 07:11.600
code. And up top is a code to a register, which a register is a time from a code for a static function.

07:11.600 --> 07:18.640
And then based on the return code, we choose which branch to take. So this is a simple case

07:18.640 --> 07:23.920
where we can analyze that we only have two possible return codes from this function. This is the

07:23.920 --> 07:29.520
only the brief part of the function that this mentioned on top to fit on this slide.

07:31.360 --> 07:39.280
This is a more complicated code where it's a function pointer that's called where you need to know

07:39.280 --> 07:46.960
which structs is reading in and of which offset you have the different functions defined.

07:47.200 --> 07:54.640
Which is a bit more interesting to, so I've left that as a later exercise.

07:58.640 --> 08:07.280
But basically in order to do an analysis of this, I haven't completely finished yet, unfortunately.

08:07.760 --> 08:14.720
The way I've gotten is that, you know, because this is called once the source file,

08:15.360 --> 08:22.400
then I need to dump all of the information that I need and then retrieve it again in a later processing

08:22.400 --> 08:29.200
stage. And most of it worrying, it's like 95% done. So probably in a week or two, I will publish

08:30.160 --> 08:39.440
how many other bugs I found, unfortunately, life happened. And there was enough time to finish it for this.

08:43.440 --> 08:49.920
But basically it's turned into a big civilisation, deceleration task in order to

08:50.880 --> 09:00.080
be able to do the analysis at a later stage, combining all of the knowledge that was found

09:00.080 --> 09:09.440
in the initial processing stage. But it does show promise and the way I've implemented those two

09:10.160 --> 09:17.600
focus on finding the stages where I know that a bug, like the ones that I have fixed already,

09:18.880 --> 09:27.520
and I will also welcome you to have this website here, where I'll post updates. And I will

09:27.520 --> 09:33.760
welcome you all to look at this a few weeks and see if there are some bugs that you can fix,

09:33.840 --> 09:41.760
because it would be awesome to get newcomers to help contribute some of those fixes back.

09:47.120 --> 09:49.440
So yeah, any questions?

09:49.920 --> 10:11.920
So I love what you're doing. This is fantastic work. Having newcomers do it may be problematic,

10:12.640 --> 10:21.040
because it does require a certain amount of big picture understanding that newcomers may not

10:21.040 --> 10:32.720
have. So the code that we have today in the tree has been tested and does work. And you may find

10:32.720 --> 10:40.560
that in some of these cases returning e-overflow or all over the other, I think that was actually

10:40.640 --> 10:47.920
great user space, because it's not expecting to get blacks. It's not been tested.

10:47.920 --> 10:58.240
We just terrified. So and we may well have situations where we are deliberately throwing away

10:58.240 --> 11:02.800
an error code and returning a different error code. We actually go to all the time in the file system

11:02.800 --> 11:11.040
world. We guess back some very specific errors from scuzzi devices. The scuzzi error return

11:11.040 --> 11:17.520
code, I think in return like 300 different errors. And 99% of the get mapped EIO. And that is

11:17.520 --> 11:23.120
entirely on purpose, because user space cannot cope with, oh well what's actually going on here is

11:23.120 --> 11:28.000
that there's a medium error and blah blah blah, you space can't cope. It just needs to know EIO

11:28.080 --> 11:34.640
this didn't work. Of course. And there are many cases like that. But part of the minimizing

11:34.640 --> 11:44.800
false positives is also to begin with just focusing on taking a looking at this function,

11:44.800 --> 11:49.040
this helper functions, how many places are called. It's all of the places that it's calling

11:49.040 --> 11:55.760
only treating it as a blur. Then it's worth looking into if that can be changed over the return

11:55.840 --> 12:04.000
isably in. If there's only one core site like the functions that I have here, then it should

12:06.320 --> 12:14.640
return by the whole spectrum off them, otherwise it's a bug. So I'm trying to minimize the

12:14.640 --> 12:21.200
amount of false positives by not targeting finding all the bugs, but targeting finding the

12:21.200 --> 12:32.160
bugs that I find are actually probably bugs. And this was in a driver. And when different drivers

12:32.160 --> 12:40.480
are not returning right error codes as the rest of drivers do, then use space which normal

12:40.480 --> 12:47.360
support multiple drivers will normally expect to do the bad thing. That actually sets my mind

12:47.520 --> 12:53.520
disease to a large extent. Thank you. The other thing about trying to get more people to submit

12:53.520 --> 12:58.960
the patches is that if I submit all the patches, then I get blamed by not doing enough reviews.

13:04.720 --> 13:05.520
Any other questions?

13:05.760 --> 13:18.880
It's basically continuous. Can you add a comment? So it stires that you can overwrite

13:18.880 --> 13:23.520
is that you say here, the other code is okay. Yeah. Okay, you have certain.

13:23.520 --> 13:30.960
I don't have that yet. It is a little bit problematic to do because what I can see in sparse

13:31.040 --> 13:40.240
is only the actual code after the preposterous have run. So I might be able to see attributes,

13:42.320 --> 13:47.200
but I haven't looked into that, but yeah, good system. Okay, save me as close in error.

13:47.200 --> 13:52.320
Yeah. It all means that it needs to be run on many different configurations.

13:56.560 --> 13:57.360
Anything else?

13:57.360 --> 14:01.440
All right, thank you.

