WEBVTT

00:00.000 --> 00:10.000
Okay. Thank you. Yes. I'll talk about the verification of Linux kernel code. So I'll start

00:10.000 --> 00:17.000
out with what is program verification. Program verification involves the source code and some

00:17.000 --> 00:23.000
specification of what the source code is supposed to do. And then our goal is to be sure that

00:23.000 --> 00:29.000
the specification describes what the source code actually does. So ideally this checking

00:29.000 --> 00:35.000
should be done automatically. So here's a simple example. This was presented at a Linux

00:35.000 --> 00:41.000
conference a couple of years ago. Here we have the swap function, an extremely simple function.

00:41.000 --> 00:45.000
So what would we like to know about this function? We might need to think about what the

00:45.000 --> 00:49.000
function does a little bit. It takes two arguments. Those two arguments are pointers. And the

00:49.000 --> 00:53.000
idea is, we want to swap the values that are referenced by those two pointers. So clearly we have to

00:53.000 --> 00:58.000
be able to dereference those pointers. And we have to be able to write new values back into

00:58.000 --> 01:04.000
them. So those are our requirements on the inputs. And the goal, what the function

01:04.000 --> 01:10.000
should actually do, is that the values at those two places should be interchanged. So then we

01:10.000 --> 01:14.000
want to consider that. So that's kind of a high-level English-language specification of what

01:14.000 --> 01:18.000
should happen. What we would like to have is some kind of formal representation that we can

01:18.000 --> 01:25.000
drop into some tool that will check it. So the first property is that the two pointers

01:25.000 --> 01:31.000
are readable and writable. So we use the notation of Frama-C, which is a tool for

01:31.000 --> 01:38.000
checking properties of C code. So basically we need to require that the two input parameters,

01:38.000 --> 01:43.000
the pointers, are valid, meaning that we can dereference them. And we can also write

01:43.000 --> 01:50.000
to the memory that is referenced by those pointers. And the next one is that we want that

01:50.000 --> 01:58.000
the final value that is pointed to by P should be the original value of Q. So we can express

01:58.000 --> 02:05.000
that by saying "ensures"; that expresses a requirement on the output: that the final dereference

02:05.000 --> 02:12.000
of the pointer P is equal to the original value that was referenced by Q. And

02:12.000 --> 02:19.000
similarly we want the final Q value to be the same as the P value. So the final value

02:19.000 --> 02:25.000
dereferenced by Q should be the original value dereferenced by P. So far, things

02:25.000 --> 02:32.000
seem fairly simple. Now we put all this into a comment that is before the source code. We get

02:32.000 --> 02:38.000
a single file, and now we can drop this into the Frama-C tool. And in a few seconds it tells us

02:38.000 --> 02:43.000
that all the goals were proved. And so now we have confidence that our swap function is doing

02:43.000 --> 02:48.000
what it's supposed to, or at least what the specifications that we wrote say it is supposed

02:48.000 --> 02:53.000
to do. So the question is: I mean, the swap function is very nice, but it's pretty simple; we probably

02:53.000 --> 02:58.000
have some confidence that it works. So the question we want to ask is: could we do

02:58.000 --> 03:05.000
this for the Linux kernel? So there's kind of an ongoing discussion about requirements

03:05.000 --> 03:12.000
and documentation and how can we improve somehow the correctness and understandability

03:12.000 --> 03:20.000
of Linux kernel source code. So this sits somewhere in that space. Probably the

03:20.000 --> 03:24.000
biggest question we ask is: do we want to do this for the entire Linux kernel, little

03:24.000 --> 03:29.000
parts of it, very important parts of it, parts that we don't understand? Those are all very

03:29.000 --> 03:36.000
open questions. Among the positive aspects of writing these specifications and checking them:

03:36.000 --> 03:41.000
just thinking about the specification can lead you to find bugs. So later on in this

03:41.000 --> 03:46.000
talk I will describe some experiments that we've done and in several of these experiments

03:46.000 --> 03:50.000
just in writing the specifications and checking them, we have actually found some bugs that

03:50.000 --> 03:56.000
have been submitted to the Linux kernel and accepted, or found some potential bugs that were

03:57.000 --> 04:03.000
in old code that doesn't exist anymore, and so on. So writing specifications means you

04:03.000 --> 04:08.000
have to somehow think thoroughly about what your code does. It's kind of a different

04:08.000 --> 04:12.000
point of view than when you're just trying to run it or do some test or something,

04:12.000 --> 04:18.000
and so it can highlight some issues. The specifications also provide an unambiguous notation

04:18.000 --> 04:22.000
for what the function is supposed to do. If you write out something in English you may have

04:23.000 --> 04:27.000
some ambiguity: the person who writes them has their intended meaning of the English words; the person who

04:27.000 --> 04:31.000
reads them might understand something different. There's lots of things you can write

04:31.000 --> 04:36.000
in English, in somewhat loose, friendly English language, that are kind of open to

04:36.000 --> 04:41.000
misinterpretation afterwards. So if you write out things in a more formal way there's perhaps

04:41.000 --> 04:46.000
less confusion and the specifications if they're written out in a formal way can be checked

04:46.000 --> 04:51.000
mechanically against the source code and that can happen also over time if you make

04:51.000 --> 04:56.000
some changes to your function, maybe you just make some refactoring to your function, you

04:56.000 --> 05:00.000
expect it doesn't change anything at all so then your specifications might still continue

05:00.000 --> 05:05.000
to work or you might make some small changes and then maybe your specifications will need

05:05.000 --> 05:10.000
adjustment, but you still have some kind of control, some kind of assurance that what the

05:10.000 --> 05:15.000
function was doing before is still what it's going to be doing. On the other hand, on the

05:15.000 --> 05:20.000
negative side, creating these specifications is hard. We're not going to create any specifications here,

05:20.000 --> 05:25.000
but you'll see them, and they're quite big and complex. And then the question is, I

05:25.000 --> 05:30.000
mean I just argued that maybe this whole specification idea is useful as the code changes

05:30.000 --> 05:35.000
but still, in general, can we really hope to maintain these things over time? Developers are

05:35.000 --> 05:39.000
already more interested in improving their code; they don't really want to maintain this

05:39.000 --> 05:44.000
kind of mysterious formal language. And of course there's no magic bullet: if your

05:44.000 --> 05:49.000
specifications say the wrong thing, then you'll be checking that your code does the wrong thing;

05:49.000 --> 05:55.000
you won't really have advanced things very much. In practice, I think, there is some kind of

05:55.000 --> 06:01.000
check on that, in that if you write bad specifications they won't match your code, and then you'll

06:01.000 --> 06:07.000
kind of be forced to iterate; when they match up you might be closer to reality, but there's

06:07.000 --> 06:12.000
always the risk that specifications are wrong and you ensure that your code does the wrong thing.
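
NOTE
As a concrete recap of the earlier swap example, here is a minimal sketch of the annotated function, assuming standard ACSL syntax as checked by Frama-C; this is our reconstruction, not a copy of the slide.

```c
/* Sketch of the talk's swap example with ACSL annotations of the kind
   Frama-C checks: \valid requires the pointers to be dereferenceable
   for reading and writing, and the ensures clauses state the exchange. */

/*@ requires \valid(p) && \valid(q);
  @ assigns *p, *q;
  @ ensures *p == \old(*q) && *q == \old(*p);
  @*/
void swap(int *p, int *q)
{
    int tmp = *p;  /* remember the value p points to */
    *p = *q;       /* copy q's value into p's location */
    *q = tmp;      /* finish the exchange */
}
```

The annotations live in a comment before the function, so the same file compiles as ordinary C and can also be fed to Frama-C.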

06:12.000 --> 06:17.000
So far we have done three experiments. One of them is the function should_we_balance;

06:17.000 --> 06:23.000
it's part of the Linux kernel scheduler's load balancer. select_idle_core is a function

06:23.000 --> 06:28.000
that is used when a task wakes up and you want to figure out where to place it on the machine.

06:28.000 --> 06:34.000
And ftrace_event_enable_disable is part of the event tracing infrastructure.

06:34.000 --> 06:39.000
So somebody might argue that maybe load balancing and task scheduling are really core

06:40.000 --> 06:47.000
aspects of an operating system kernel, and maybe the code for enabling tracing is not really all that

06:47.000 --> 06:54.000
important. But actually, we picked the third case because some human actually wrote specifications

06:54.000 --> 06:59.000
in natural language and so it was the opportunity to compare them.

06:59.000 --> 07:05.000
So we start with should_we_balance and load balancing. The idea of load balancing is that perhaps

07:05.000 --> 07:10.000
we have some CPU that would like to take on some more work, maybe because it's idle, maybe

07:10.000 --> 07:15.000
because it just doesn't have enough stuff to do, and what it does is search across the entire

07:15.000 --> 07:20.000
machine or in general across some set of CPUs to see if there's some other work that's available that

07:20.000 --> 07:27.000
it can steal. So this is beneficial because you'll get more work done faster, but it takes time,

07:28.000 --> 07:34.000
because we have to look at all those different CPUs, look at their data structures,

07:34.000 --> 07:42.000
maybe count some things and so on. And so what we want to do is just let the first CPU do the

07:42.000 --> 07:49.000
stealing, and not all of them at the same time. So should_we_balance is sort of a gatekeeper

07:49.000 --> 07:56.000
that each CPU asks, should I balance, and then sometimes it will return true,

07:56.000 --> 08:03.000
yes, you'll get to do load balancing, and sometimes it doesn't. Here is a recent version of the

08:03.000 --> 08:09.000
function. It's not a huge function, it's not a small function; it's just what it is.

08:09.000 --> 08:15.000
The most interesting part of it is this loop in the middle. This loop is quite interesting for

08:15.000 --> 08:21.000
two reasons. It has this for_each_cpu_and thing, so that's a special-purpose construct; it's like a

08:21.000 --> 08:26.000
for loop but actually it's iterating over a mask of CPU so you can think of it as sort

08:26.000 --> 08:31.000
of a set of CPUs but it's actually implemented as a bit mask and it's going to go through

08:31.000 --> 08:38.000
the mask and see which CPUs are under consideration and then do the loop body only for those

08:38.000 --> 08:44.000
CPUs. So that's fairly easy to understand then the more frightening thing about this loop is

08:44.000 --> 08:53.000
that here in the middle we have the cpumask_andnot, and this is a side-effecting

08:53.000 --> 09:00.000
operation. Basically what this is doing is removing some CPUs from the set that we're

09:00.000 --> 09:07.000
iterating over. So this makes it harder to reason about what the thing is doing, because it's not just going

09:07.000 --> 09:16.000
from zero to ten; it's going over a changing set of things. So here are the specifications;

09:16.000 --> 09:21.000
these are much more intimidating than the ones we had for the swap function.

09:21.000 --> 09:26.000
Basically we have a bunch of issues that we want to consider: for example, is there any

09:26.000 --> 09:33.000
idle CPU, are we the first idle CPU, are there no idle CPUs? And there are some other things, like

09:33.000 --> 09:37.000
maybe the CPU has actually been hot-unplugged, and so we obviously don't want to steal work from it.
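
NOTE
To make the kinds of cases the specification distinguishes concrete, here is a small userspace model. The helper names any_idle and is_first_idle are invented for illustration, and a uint64_t stands in for the kernel's struct cpumask.

```c
#include <stdint.h>

/* Bit i set in idle_mask means CPU i is idle. */

/* Case: is there any idle CPU in the set? */
int any_idle(uint64_t idle_mask)
{
    return idle_mask != 0;
}

/* Case: is "cpu" the first idle CPU in iteration order (lowest index)? */
int is_first_idle(uint64_t idle_mask, int cpu)
{
    if (!(idle_mask & (1ULL << cpu)))
        return 0;                              /* cpu itself is not idle */
    /* true only if no idle CPU exists at a lower index */
    return (idle_mask & ((1ULL << cpu) - 1)) == 0;
}
```

A precondition/postcondition pair is then written for each such case, which is why the specification grows so much larger than for swap.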

09:37.000 --> 09:43.000
So there's various issues that are taken into account and then we have the specifications that

09:43.000 --> 09:49.000
should be respected in each of those cases. So these are what are called the pre- and post-conditions:

09:49.000 --> 09:54.000
under what circumstances the function should be called, and what behavior should

09:54.000 --> 10:00.000
happen if we call it in those cases. Also, if you're familiar with verification at all, you

10:00.000 --> 10:06.000
know you have to write loop invariants. A loop invariant deals with the issue that

10:06.000 --> 10:11.000
this is all happening before run time, so we have no idea how many times the loop is going to iterate.

10:11.000 --> 10:19.000
So we need some human to provide information about some property that is going to hold no matter how many times the loop occurs.

10:19.000 --> 10:26.000
So this is a whole other pile of loop invariants that we need to take into account.
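
NOTE
As an illustration of what a loop invariant looks like, here is a toy ACSL example, unrelated to the kernel code; the invariant is a property that holds on every entry to the loop, however many iterations occur, and the variant proves termination.

```c
/* Toy example: sum of 0..n-1. The loop invariants say that i stays in
   range and that s always equals the sum of the values processed so far,
   which is exactly the information a prover needs without knowing how
   many times the loop will run. */

/*@ requires 0 <= n <= 10000;
  @ assigns \nothing;
  @ ensures \result == n * (n - 1) / 2;
  @*/
int sum_below(int n)
{
    int s = 0;
    /*@ loop invariant 0 <= i <= n;
      @ loop invariant s == i * (i - 1) / 2;
      @ loop assigns i, s;
      @ loop variant n - i;
      @*/
    for (int i = 0; i < n; i++)
        s += i;   /* maintains: s is the sum of 0..i */
    return s;
}
```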

10:27.000 --> 10:34.000
So what we were interested in, in this particular experiment, is how the code changes over time and what the impact is on the specifications.

10:34.000 --> 10:42.000
And actually, this function has changed basically ten times in the last ten years that it has existed.

10:42.000 --> 10:49.000
So the beginning part of this experiment was very positive.

10:50.000 --> 10:55.000
The changes were things like: the first one was to fix a bug by changing a not-equal to an equal.

10:55.000 --> 11:04.000
Then there were some refactorings, some name changes, some realization that this function is not necessary so we can use some other function instead.

11:04.000 --> 11:14.000
And all of those changes were very simple, and either they had no impact on the specifications, or they had only a very minor impact where a few lines had to change.

11:14.000 --> 11:21.000
And you can see that actually the verification time is also reasonably fast and doesn't change very much.

11:21.000 --> 11:31.000
But then here, at the end, in 2023, things kind of go more like this.

11:31.000 --> 11:41.000
This is where someone had the idea of adding this optimization of throwing out CPUs that we don't actually have to look at again.

11:41.000 --> 11:52.000
So then that makes the whole analysis of the iterations more complicated because we have to take into account that things have disappeared along the way.
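
NOTE
The effect of removing CPUs mid-iteration can be modeled in userspace with a sketch like this; a uint64_t stands in for struct cpumask, the helper name is invented, and the andnot step mimics cpumask_andnot dropping a whole two-thread SMT group once one of its threads has been examined.

```c
#include <stdint.h>

/* Visit the set bits of mask in increasing order; after visiting a CPU,
   remove its whole 2-thread SMT group {2k, 2k+1} from the mask, so the
   sibling is skipped later. Returns how many CPUs were actually visited.
   The point: the set being iterated shrinks during the iteration, which
   is what makes the loop invariants hard to state. */
int visit_shrinking_mask(uint64_t mask)
{
    int visited = 0;
    for (int cpu = 0; cpu < 64; cpu++) {
        if (!(mask & (1ULL << cpu)))
            continue;                             /* not (or no longer) in the set */
        visited++;
        uint64_t smt_group = 3ULL << (cpu & ~1);  /* bits {cpu, its sibling} */
        mask &= ~smt_group;                       /* the andnot: shrink the set */
    }
    return visited;
}
```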

11:52.000 --> 12:03.000
The first change, this one here at number eight, is about preferring fully idle cores; before, it was just: is the CPU itself idle?

12:03.000 --> 12:10.000
We don't actually change the set, but we have an extra loop in the middle which is going to check all the hyperthreads.

12:10.000 --> 12:19.000
So it's not very desirable: if you have a CPU but its hyperthread sibling is busy, that's not really the ideal place to put new work, because that core is actually doing work already.

12:19.000 --> 12:29.000
So we prefer fully idle cores. That adds an extra loop in the middle, and that adds about 10 or 12 seconds to the verification time.

12:29.000 --> 12:40.000
And then here at this point we add the optimization that if we have checked all the hyperthreads and we find out that one of them is not idle, we don't need to check them again.

12:40.000 --> 12:53.000
So this causes an increase in the verification time, and it causes a lot of added complexity in the specifications; it took us a long time to figure out what the specification should actually be.

12:53.000 --> 13:07.000
But you notice that these cases here are in red, and this case here is actually in a release, two releases, of the Linux kernel; what that means is that the person adding this optimization actually introduced a bug.

13:07.000 --> 13:21.000
So there was some benefit to sitting there and thinking about these specifications and about what the function is really doing, because we actually found a bug that was in a release.

13:21.000 --> 13:36.000
Another issue we have here: these are the proving times that we have now. When we originally did this work, we had proving times that were much, much higher, especially for the complicated cases.

13:36.000 --> 13:46.000
And then we invited someone who works on Frama-C to come and help us do it better, and we found that choosing the right representations of certain things is very important.

13:46.000 --> 13:58.000
It's kind of impractical if you have to actually bring in a person who maintains the tool to figure out how to do it; maybe there could be some kind of

13:58.000 --> 14:03.000
libraries of good representations and so on.

14:03.000 --> 14:13.000
So the next case I will go through much more quickly. This is select_idle_core, the function that places tasks on idle cores. Here we have again the same looping property.

14:13.000 --> 14:22.000
We want to iterate over a bunch of CPUs and we have again the property of updating the set of CPUs that we are working on at the same time.

14:22.000 --> 14:35.000
This iteration is more complex than the previous one: the previous one started at zero and went to the end; this one starts in the middle, goes to the end, wraps around, and works on the beginning.

14:35.000 --> 14:43.000
So you have a whole extra set of specifications saying we're in this region, we're in that region, or we've reached the end.

14:43.000 --> 14:46.000
So it just kind of adds more work.
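
NOTE
The wrap-around scan can be sketched like this; this is our own userspace model with a plain array, whereas the kernel's select_idle_core works on cpumasks and per-CPU state.

```c
/* Scan target, target+1, ..., n-1, then 0, ..., target-1, and return
   the first index whose entry is nonzero (idle); -1 if none is. */
int find_idle_from(const int *idle, int n, int target)
{
    for (int k = 0; k < n; k++) {
        int cpu = (target + k) % n;   /* wraps from n-1 back to 0 */
        if (idle[cpu])
            return cpu;
    }
    return -1;
}
```

For verification, it can be easier to write this as two loops, one from target to the end and one from zero up to target, which mirrors the region-by-region case split in the specifications.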

14:46.000 --> 14:52.000
Here, what we found very helpful was to introduce some macros that hide some complexity.

14:52.000 --> 15:02.000
For example, one of them is figuring out exactly which CPUs we can consider; another is just hiding all this forall, this quantifier stuff.

15:02.000 --> 15:16.000
You do this stuff over and over again; intellectually it doesn't contribute much, but it takes up a lot of space, and it makes it very hard to figure out what is going wrong when things do go wrong.

15:16.000 --> 15:38.000
So the last function we consider is this ftrace one. This code is much more complex, but you might think maybe it should be easy, because actually if you look at it carefully, what we see are lots of ifs, very different kinds of ifs, but there are actually no loops, so we have no loop invariants to figure out.

15:38.000 --> 16:02.000
So that should be good. On the other hand, these ifs all lead to assignments of different variables, fields of data structures, and so on. And then there's a very scary place here where we have an if that depends on variables that we have updated along the way, which makes specifying and proving everything much more complicated.

16:02.000 --> 16:25.000
But then the question is: this is a very complex function, so what kind of specifications do we want? The first attempt was just to thread through every single path through the function, figure out what things it tests and what things it updates. This is just a small extract of the specification, just one particular case, and we ended up with

16:25.000 --> 16:36.000
more than 350 lines of specification. This function does the enabling and disabling of tracing, and this is only for the disabling part; 350 lines is completely unenlightening.

16:36.000 --> 16:45.000
We also discovered that Frama-C is not very good at bit masks, which the original implementation used extensively, so this was somehow not very good.

16:45.000 --> 16:55.000
As I mentioned, I chose this because some human had written some specifications, and this is a little extract of what that specification looks like.

16:55.000 --> 17:00.000
If we have this variable we do this and this and this and so on.

17:00.000 --> 17:14.000
So obviously, these are perhaps understandable to humans, to other users, to the other developers, but they're not checkable, they're not complete, and they're actually not entirely correct either.

17:15.000 --> 17:26.000
So we took those specifications and wrote them out in the formal notation, and we ended up with something that's much simpler than what I showed previously.

17:26.000 --> 17:35.000
But on the other hand, this is only half of the disable case, and it's only 25 lines instead of 250 lines, so that's pretty good.

17:35.000 --> 17:50.000
On the other hand, it's still not complete; it's based on something that's not complete, so we're not going to end up with something complete. But it has been checked, so we figured out what was going wrong and added the extra cases, so actually, in a way, it has everything that's important.

17:51.000 --> 18:02.000
On the other hand, it's not very satisfying; it seems like we just have some random information about some random variables, and we don't have any kind of complete definition of what's going on.

18:02.000 --> 18:19.000
So in the third case, I thought: think about each variable, think about all of the things that affect that variable, and put them together in one particular place, so that if somebody wants to understand a particular variable, they have something specific they can use.

18:19.000 --> 18:29.000
And we can also use Frama-C to check that everything we describe is complete and that the cases are all disjoint from each other, so we have some confidence that we're doing something reasonable.
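
NOTE
The completeness and disjointness checks correspond to ACSL behaviors. Here is a toy sketch with invented names; the "complete behaviors" and "disjoint behaviors" clauses ask Frama-C to prove that the assumes clauses cover all inputs and never overlap.

```c
/* Toy sketch: one ACSL behavior per case. Frama-C checks that the two
   assumes clauses together cover every input (complete) and that no
   input satisfies both (disjoint). The function and field names are
   invented for illustration, not taken from the kernel. */

/*@ requires \valid(counter);
  @ assigns *counter;
  @ behavior enabling:
  @   assumes enable != 0;
  @   ensures *counter == \old(*counter) + 1;
  @ behavior disabling:
  @   assumes enable == 0;
  @   ensures *counter == \old(*counter) - 1;
  @ complete behaviors;
  @ disjoint behaviors;
  @*/
void update_ref(int *counter, int enable)
{
    if (enable)
        (*counter)++;   /* enable path: take a reference */
    else
        (*counter)--;   /* disable path: drop a reference */
}
```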

18:29.000 --> 18:48.000
And then the last thing I started thinking about is: okay, all of those things we've seen before describe what the function does, but they don't describe what the impact is for the users. We also want to have a bunch of function calls that should interact with each other and maintain some invariants over time.

18:49.000 --> 18:58.000
So in conclusion: what did we learn? From should_we_balance, that we can easily maintain things over time. From select_idle_core, we learned that abstraction is important.

18:58.000 --> 19:07.000
And from the last one, we learned that specifications really need to give insight to users about how they should use the functions, what the functions should do, and so on.

19:07.000 --> 19:18.000
In the future we would like to do more experiments, think about which parts of this are practical to automate (so far everything has been written by hand), and think about the trade-off between understandability, maintainability, and correctness.

19:19.000 --> 19:21.000
So thank you.

