WEBVTT

00:00.000 --> 00:08.560
All right, so next up is going to be a presentation about

00:08.560 --> 00:12.400
TuxTape by Grayson and Chris.

00:12.400 --> 00:14.560
Thank you, Stefan.

00:14.560 --> 00:16.560
Thank you, good afternoon.

00:16.560 --> 00:17.560
Thank you for joining us.

00:17.560 --> 00:20.000
We're going to talk about TuxTape, a do-it-yourself

00:20.000 --> 00:22.160
Linux kernel live patching tool.

00:22.160 --> 00:23.160
My name's Chris Townsend.

00:23.160 --> 00:25.560
This is Grayson Guarino.

00:25.560 --> 00:29.960
Today, we're going to introduce ourselves, who we are and

00:29.960 --> 00:34.720
why we're doing this, then I'm going to hand it over to Grayson for the cool stuff.

00:34.720 --> 00:35.720
What is TuxTape?

00:35.720 --> 00:42.800
He's going to give an architecture overview, do a demo, and then we'll open up for questions.

00:42.800 --> 00:44.120
So who are we?

00:44.120 --> 00:45.680
Again, my name's Chris Townsend.

00:45.680 --> 00:47.800
I'm a senior engineering manager.

00:47.800 --> 00:54.760
Grayson is a software engineer, the lead developer on TuxTape. We're both from GEICO.

00:54.760 --> 00:59.480
If you don't know who GEICO is, we're a large US insurance company, one of the

00:59.480 --> 01:05.840
largest in the US. We are from a team that's called Containers, OS, and Language

01:05.840 --> 01:08.320
Runtimes within GEICO.

01:08.320 --> 01:17.860
Our mission is to deliver secure, compliant, and efficient container, OS, and language

01:17.860 --> 01:20.240
runtime solutions.

01:20.240 --> 01:22.040
So why are we building this?

01:22.040 --> 01:26.240
So we obviously have a need for live patching the kernel.

01:26.240 --> 01:30.640
So obviously there are distro-specific solutions out there.

01:30.640 --> 01:35.600
There are paid solutions, but none of them really fit the need that we have internally.

01:35.600 --> 01:40.400
So then we went to see, are there any open source solutions out there?

01:40.400 --> 01:45.920
There is one from the Gentoo project called elivepatch, but it's been pretty much

01:45.920 --> 01:52.920
abandoned. The last commit was almost six years ago, so it's pretty much dead.

01:52.920 --> 01:54.440
So that's not really a good solution.

01:54.480 --> 01:57.560
So there's another one we found from the Debian project.

01:57.560 --> 02:07.680
I think it was at DebConf 2023, kind of trying to get interest from the community to do this,

02:07.680 --> 02:13.120
but they have just kind of a sample patch, not a whole lot going

02:13.120 --> 02:14.120
on there.

02:14.120 --> 02:16.440
So why not do it ourselves?

02:16.440 --> 02:24.400
We want to try to get a community involved, get the experts to help us out, and do

02:24.400 --> 02:25.400
this out there.

02:25.400 --> 02:28.120
There was not really a good open source solution.

02:28.120 --> 02:31.400
So with that, I'm going to hand it over to Grayson.

02:31.400 --> 02:36.760
All right, so in summary, what is TuxTape, right?

02:36.760 --> 02:42.160
So TuxTape is a toolchain for creating, building, and deploying Linux kernel live

02:42.160 --> 02:43.160
patches.

02:43.160 --> 02:49.320
And it's not just one program, it's a whole ecosystem made up of several components.

02:49.320 --> 02:54.920
These are the components that we've decided it will have once we hit the MVP phase,

02:54.920 --> 02:58.800
and once we actually release. I'm going to be demoing a proof of concept, so things are

02:58.800 --> 03:03.120
a little bit different than this slide here in its current state.

03:03.120 --> 03:07.280
But just a little bit on how Linux kernel live patching works: there's a tool within the

03:07.280 --> 03:09.520
kernel called kpatch.

03:09.520 --> 03:14.800
And what this does is it allows you to redirect vulnerable function calls over to patched ones.

03:14.800 --> 03:21.800
So you can fix a vulnerable function in the kernel, and then using ftrace in the kernel,

03:21.800 --> 03:27.240
you're able to then redirect that function call over to a compiled kernel module, and it

03:27.240 --> 03:32.080
will run that instead of the vulnerable function.

03:32.080 --> 03:36.880
But the thing with kpatch is you can't just take what they've got in the patch that

03:36.880 --> 03:39.000
they submitted to the upstream Linux kernel.

03:39.000 --> 03:42.360
The patches need to meet certain requirements for them to actually be compatible with

03:42.360 --> 03:43.360
kpatch.

03:43.360 --> 03:48.720
One of the good examples is you can't directly modify statically allocated data, like the

03:48.720 --> 03:54.480
stack that the function takes cannot change in size, so there are some workarounds available

03:54.480 --> 03:57.360
in order to get around that.

03:57.360 --> 04:00.280
So what does the proof of concept seek to explore?

04:00.280 --> 04:06.080
The automatic generation of raw patches was one of them, so we watch the Linux CNA, the

04:06.080 --> 04:10.760
CVE Numbering Authority, see the patches that they submit, and we try to get a git diff of the

04:10.760 --> 04:14.600
changes that they made to fix a particular CVE.

04:14.600 --> 04:19.600
We also wanted to determine which CVEs actually affect a mock server fleet, so just because

04:19.600 --> 04:24.360
a CVE affects the kernel version you're running doesn't necessarily mean that that vulnerability

04:24.360 --> 04:28.800
is in the kernel configuration that you're running on your server.

04:28.800 --> 04:34.920
Let's say a file that the CVE affects didn't get included in your particular kernel build,

04:34.920 --> 04:40.600
then you're not affected, so we also wanted to kind of explore what the maintenance work

04:40.600 --> 04:47.160
flow would be like for the developers who are creating these patches and reviewing the

04:47.160 --> 04:50.040
K patch patches before they get sent out.

04:50.040 --> 04:54.800
We wanted to build a text user interface to see kind of how that would all go down and

04:54.800 --> 04:57.440
what the process would be like.

04:57.440 --> 05:02.000
We wanted to see if Rust was a viable language for this, and if it was a preferable language for

05:02.000 --> 05:08.440
us to build this in, for the whole ecosystem, parts of it, or none of it, and we wanted

05:08.440 --> 05:13.840
to see if sufficient crates existed to do some of the heavy lifting that we needed to do.

05:13.840 --> 05:19.160
Some of the crates we've used in this proof of concept: we're using Tonic for gRPC, git2, which

05:19.160 --> 05:25.400
is like a libgit2 wrapper, Tokio, standard, and Ratatui, a really great TUI library.

05:25.400 --> 05:29.120
There was a wonderful talk on that in the kernel room, or in the Rust room yesterday, I highly

05:29.120 --> 05:31.240
recommend looking at that.

05:31.240 --> 05:36.960
ureq for a simple HTTP client, and we're currently using SQLite, we're thinking about

05:36.960 --> 05:43.880
moving to Postgres once we actually move into the next, the MVP phase of this, so all

05:43.880 --> 05:47.600
of this is subject to change, it's all kind of fluid.

05:47.600 --> 05:50.120
What does this proof of concept not explore?

05:50.120 --> 05:54.200
There's a lot of bits that we still need to take the time to continue fleshing out.

05:54.200 --> 06:00.400
One of those is the ability to do the conversion from the raw patch and convert it into something

06:00.400 --> 06:05.080
that's kpatch compatible, so any of the fudging that you need to do in order to get it

06:05.080 --> 06:10.640
actually to be runnable by kpatch, we want to get the ability to have that done automatically.

06:10.640 --> 06:11.640
We're not there.

06:11.640 --> 06:17.080
Similarly, since we're not actually building these kpatch modules yet, we're not compiling

06:17.080 --> 06:21.280
them, and since we're not compiling them, we're not deploying them, so there's no fleet

06:21.280 --> 06:27.160
client and patch deployment yet, this is all upcoming, and then we wanted the ability

06:27.160 --> 06:35.040
to submit the kpatch patches for review and approval, all via this text user interface dashboard,

06:35.040 --> 06:39.440
that's coming, there's a bunch more advanced features for patch creation that need to be

06:39.440 --> 06:45.400
done, and then additionally, so we really want to support as many non-mainline kernels as

06:45.400 --> 06:46.400
we can.

06:46.400 --> 06:51.960
We've got some ideas on how we want to do that, but that is still in development, and this includes

06:51.960 --> 06:55.240
Ubuntu kernels, which is the first thing we really want to target after we get

06:55.240 --> 06:56.800
mainline working.

06:56.800 --> 07:01.080
All of this is to say, some of the functionality in this proof of concept is not representative

07:01.080 --> 07:05.160
of the future behavior of how this is all going to work.

07:05.160 --> 07:09.120
So let's get into the architecture of the proof of concept.

07:09.120 --> 07:13.160
It looks a little something like this, and I'm going to break this down by components,

07:13.160 --> 07:18.760
but this is the overall scheme of how it works in its current state.

07:18.760 --> 07:23.960
So let's start from the top, so this is the TuxTape CVE parser.

07:23.960 --> 07:29.640
This is the thing that actually builds our database that we're able to store our patches

07:29.640 --> 07:32.520
in and access by the dashboard.

07:32.520 --> 07:38.880
So this will fetch the Linux CVE Numbering Authority repo that they publish on Git, and

07:38.880 --> 07:45.600
it's published by them, I mean, and they give us dyad files, which state, and I'll break

07:45.600 --> 07:52.040
this down later, but they give us information on what commit a patch was introduced in and

07:52.040 --> 07:56.080
what commit the vulnerability itself was introduced in.

07:56.120 --> 08:01.560
From this, we're able to generate Git patches from the stable kernel repo.

08:01.560 --> 08:05.920
So when I've been saying the term raw patch, that's what I mean, it's just a pure Git

08:05.920 --> 08:08.480
diff of the changes that were made.

08:08.480 --> 08:11.880
So this is what their dyad files look like.

08:11.880 --> 08:15.640
It's in the format of the introduced version, the introduced commit, so this is when

08:15.640 --> 08:21.720
the vulnerability itself was introduced, and then they state the version that the vulnerability

08:21.800 --> 08:26.840
was fixed in and the commit where they submitted that patch.

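NOTE
A minimal sketch of reading one dyad record as described above, assuming a colon-separated line in the order introduced version, introduced commit, fixed version, fixed commit. The layout and the names DyadRecord and parse_dyad_line are illustrative assumptions, not taken from the talk or from the CNA's tooling.
```python
from dataclasses import dataclass
@dataclass(frozen=True)
class DyadRecord:
    introduced_version: str
    introduced_commit: str
    fixed_version: str
    fixed_commit: str
def parse_dyad_line(line: str) -> DyadRecord:
    # Assumed layout: introduced_version:introduced_commit:fixed_version:fixed_commit
    fields = line.strip().split(":")
    if len(fields) != 4:
        raise ValueError(f"expected 4 colon-separated fields, got {len(fields)}")
    return DyadRecord(*fields)
```
The parser rejects any line that does not have exactly four fields, so a change in the upstream format would fail loudly rather than silently shift columns.
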
08:26.840 --> 08:37.840
So this is an example of a raw patch, so this is for CVE 2024, 4256 on the 6.10 train.

08:37.840 --> 08:44.480
So this was a 9.8 base score CVE, and this would be the lines of code that need to get

08:44.480 --> 08:49.040
changed in order to patch out that vulnerability.

08:49.040 --> 08:54.680
So after we've gotten the raw patch itself, we need to fetch all of the other sort of CVE

08:54.680 --> 09:03.040
metadata from the NVD API, the National Vulnerability Database, so we'll also contact their

09:03.040 --> 09:08.480
API to fetch things like the base score, the description, and all that stuff.

09:08.480 --> 09:14.480
So the CVE parser is responsible, on an initial run, for building our database, and then

09:14.480 --> 09:18.600
on future runs, it will update the database with any new CVEs or patches that have been

09:18.600 --> 09:23.120
discovered since the last run, and you could run this as a cron job and just have it

09:23.120 --> 09:27.120
go in the background once every day or so, however frequently you need.

09:27.120 --> 09:32.600
As a quick aside, we discovered a bug in the CNA's dyad generator, and they almost

09:32.600 --> 09:38.160
immediately had patches within hours, which is really cool, we're very grateful for that.

09:38.160 --> 09:40.760
So let's get into the TuxTape server.

09:40.760 --> 09:47.040
This is our API for interfacing with that database, we're doing this with gRPC.

09:47.120 --> 09:53.800
It's got TLS, it's got gRPC reflection, and with this we're able to fulfill requests

09:53.800 --> 09:58.360
from the dashboard. The dashboard is how you're going to manage everything, and what you're

09:58.360 --> 10:02.360
actually managing is all stored in the database.

10:02.360 --> 10:08.200
So right now in the proof of concept, the main service functions we have are:

10:08.200 --> 10:13.280
you're able to fetch all of the CVEs that affect your mock server fleet, and you have

10:13.280 --> 10:19.360
the ability to add a new kernel config that you then want to compile into a kernel, and

10:19.360 --> 10:24.040
we have a kernel builder which I'll talk about next, which will compile the kernel and

10:24.040 --> 10:29.000
use a thing called remake to profile the build and keep track of what files actually

10:29.000 --> 10:31.840
got included.

10:31.840 --> 10:34.200
So this is how the kernel builder works.

10:34.200 --> 10:40.000
Once it spins up, it'll register itself to the TuxTape server, and after it's registered,

10:40.000 --> 10:46.600
any time a build kernel request needs to be sent from the TuxTape server, it'll find

10:46.600 --> 10:51.160
an available kernel builder, and dispatch that build job.

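NOTE
A minimal sketch of the register-then-dispatch flow just described, under the assumption that the server only needs to know which builders are idle. BuilderRegistry and its method names are hypothetical and do not reflect the project's gRPC service definitions.
```python
from typing import Dict, Optional
class BuilderRegistry:
    """Tracks registered builders and the job each one is running, if any."""
    def __init__(self) -> None:
        # Maps builder ID to the kernel version it is building; None means idle.
        self._jobs: Dict[str, Optional[str]] = {}
    def register(self, builder_id: str) -> None:
        # A builder announces itself once it has started up.
        self._jobs[builder_id] = None
    def dispatch(self, kernel_version: str) -> Optional[str]:
        # Hand the job to the first idle builder; None means every builder is busy.
        for builder_id, job in self._jobs.items():
            if job is None:
                self._jobs[builder_id] = kernel_version
                return builder_id
        return None
    def finish(self, builder_id: str) -> None:
        # Called when the builder has returned its build response.
        self._jobs[builder_id] = None
```
With a single registered builder, a second dispatch returns None until finish is called for the first job.
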
10:51.160 --> 10:57.280
It'll profile it, remake gives us a nice JSON explaining what files got included, and then

10:57.280 --> 11:02.840
we're able to return a build kernel response which then gives us a big list of all of the

11:02.840 --> 11:08.200
file paths of the files that got included in this particular build.

11:08.200 --> 11:14.680
So we've built a dashboard as well, so this is the management interface for viewing and

11:14.680 --> 11:16.520
editing those raw patches.

11:16.520 --> 11:22.200
It will prioritize the CVEs by the base score and just straight up ignore any CVEs that

11:22.200 --> 11:29.480
don't actually affect our fleet, and then it also has a second pane that allows you to

11:29.480 --> 11:36.080
configure kernels via menuconfig, and then request that it get built and profiled.

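NOTE
A minimal sketch of the filtering and ordering described above, assuming each CVE carries a base score and the set of source files its fix touches, and that the build profile yields the set of files compiled into the kernel. The function name and data shapes are illustrative assumptions, not the project's schema.
```python
from typing import Dict, List, Set, Tuple
def affected_cves(cves: Dict[str, Tuple[float, Set[str]]], built_files: Set[str]) -> List[str]:
    # Keep a CVE only if its fix touches a file that was compiled into the build.
    hits = [(score, cve_id) for cve_id, (score, files) in cves.items() if files & built_files]
    # Highest base score first; ties are broken by CVE ID for a stable order.
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cve_id for _, cve_id in hits]
```
A CVE whose files never appear in the build profile is dropped entirely, which mirrors the file-level check described in the talk and says nothing about whether the affected code path is actually reached at run time.
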
11:36.080 --> 11:39.520
So let's get into the demos.

11:39.520 --> 11:45.360
So we've got here, we're going to spin up the CVE parser, it's going to pull down any changes

11:45.360 --> 11:53.280
that have been detected from the CNA's repo, and it'll generate a bunch of CVEs. After

11:53.280 --> 12:00.840
this, it will contact NIST to fetch all of that metadata that we need from them.

12:00.840 --> 12:05.960
And then we'll spin up TuxTape server, not much to see here, the TuxTape server is running,

12:05.960 --> 12:09.240
and then we'll set up one instance of the kernel builder.

12:09.240 --> 12:13.920
This will register itself over to the TuxTape server, and we can see that the server's got

12:13.920 --> 12:17.440
a connection back to the kernel builder.

12:17.440 --> 12:25.320
And now the fun part, so we'll launch the dashboard itself.

12:25.320 --> 12:31.360
So this is our TUI here, this lists all of the CVEs that affect our mock fleet, we've

12:31.360 --> 12:33.720
got a 6.6 and a 6.11 in there.

12:33.760 --> 12:37.840
You can see this is all the metadata that we get from NIST, the description, all that you

12:37.840 --> 12:39.960
can scroll through.

12:39.960 --> 12:43.720
And whenever you click on a CVE, we have different instances, depending on which train the

12:43.720 --> 12:47.560
patch came from, if it was from 6.10 or 6.11 in this case.

12:47.560 --> 12:49.880
This is our, I'm sorry, 6.6 or 6.11.

12:49.880 --> 12:57.120
This is our 6.6 patch for this vulnerability, I'll close it out, and open up the 6.11 one,

12:57.120 --> 13:00.680
looks pretty much exactly the same, very few differences here.

13:03.720 --> 13:07.920
And then let's get into the configs pane.

13:07.920 --> 13:14.240
So this is how you're able to configure a kernel and profile its build. This is very

13:14.240 --> 13:17.720
kind of primitive right now, but it gets the point across and has been helpful for us

13:17.720 --> 13:19.320
to test this.

13:19.320 --> 13:25.120
So we will fetch the 6.10.13 kernel, which is what I'm building, then we can start building

13:25.120 --> 13:31.040
it with defaults, and if we switch back to our kernel builder, we can see that it is now

13:31.040 --> 13:34.480
building and profiling that kernel as it goes.

13:34.480 --> 13:41.920
So we're keeping track of every file that actually gets included.

13:41.920 --> 13:47.080
So as a recap, for the current capabilities, we are able to generate these raw patches,

13:47.080 --> 13:52.440
and we can view and edit the raw patches, and because we're able to profile these kernel

13:52.440 --> 13:56.800
builds, we're able to determine which CVEs affect the fleet.

13:56.800 --> 14:03.080
Future capabilities include the ability to submit, review, and compile these manually formatted

14:03.080 --> 14:04.080
patches.

14:04.080 --> 14:10.240
The ability to auto-generate kpatch-compatible patches, to compile those, then dispatch

14:10.240 --> 14:16.960
them to the server fleet, and support for non-mainline kernels is also in the pipeline.

14:16.960 --> 14:24.080
On top of that: scaling, testing, all of the fun groundskeeping that needs to be done.

14:24.080 --> 14:25.080
So that's it for the demo.

14:25.080 --> 14:26.080
Thank you all.

14:26.080 --> 14:39.120
I'll just add that we don't have a public repo open just yet, but we're like weeks away

14:39.120 --> 14:40.120
from having that.

14:40.120 --> 14:44.600
So I'll make an announcement on LinkedIn, and you can find me, we'll try to message that

14:44.600 --> 14:51.240
very well, but look for that to be done very soon.

14:51.480 --> 14:53.480
Any questions?

14:53.480 --> 14:55.480
Questions?

14:55.480 --> 14:57.680
What is the always on the top?

15:08.680 --> 15:10.480
So thanks for the talk.

15:10.480 --> 15:14.280
I have a question regarding the generation of the live patches.

15:14.280 --> 15:21.240
So as you've said, you are trying to generate, right from the Git diff, a kpatch, and then

15:21.240 --> 15:22.240
apply it.

15:22.240 --> 15:30.560
But many of the kernel patches that fix bugs are not just changing code, they also change

15:30.560 --> 15:31.560
data structures.

15:31.560 --> 15:37.880
That's why all live patches from SUSE or Red Hat require a lot of manual intervention.

15:37.880 --> 15:40.120
And how are you going to solve this?

15:40.560 --> 15:44.560
Yes, so that's all stuff that we are discussing for the next phase.

15:44.560 --> 15:45.520
We've got some ideas.

15:45.520 --> 15:48.920
I don't want to get too definite because I don't want to lock us into an approach.

15:48.920 --> 15:50.520
But I don't want to say it.

15:50.520 --> 15:51.520
Sorry?

15:51.520 --> 15:54.880
No, no, not immediately.

15:54.880 --> 15:58.240
We haven't got any specific plans for AI in this.

16:05.880 --> 16:06.880
Hello.

16:07.800 --> 16:13.880
Can you elaborate a bit more about determining which specific CVEs are relevant to a server?

16:13.880 --> 16:19.640
So is it just some tracing, if the affected line is called, or in the modern world?

16:19.640 --> 16:25.720
Yes, so right now, all that we have in mind is just checking whether or not the file that

16:25.720 --> 16:31.960
was affected by the CVE is actually included in the kernel config that you have, or in the kernel

16:31.960 --> 16:33.840
that you have built.

16:33.840 --> 16:39.760
In the future, we've talked about kind of doing some heuristics to see whether or not functions

16:39.760 --> 16:45.120
are actually being hit, how frequently they are to determine something like that, but that's

16:45.120 --> 16:46.880
something we haven't got to explore yet.

16:46.880 --> 16:50.240
We will probably be approaching that at some point.

16:50.240 --> 16:53.240
Other questions?

16:54.160 --> 17:08.040
Yeah, I'm just curious, you've been framing this in terms of live patching, but it seems like

17:08.040 --> 17:13.760
a lot of this work here just in terms of determining the relevance of the CVEs and the

17:13.760 --> 17:14.760
whole config stuff.

17:14.760 --> 17:19.080
It is useful on its own without even thinking about getting to the live patching bit.

17:19.080 --> 17:22.920
So have you thought about just having that part usable separately from what you're planning

17:22.920 --> 17:24.720
to do with live patching?

17:24.720 --> 17:26.320
Yeah, absolutely.

17:26.320 --> 17:31.440
It definitely could be useful for the ability to, because the general plan that we want

17:31.440 --> 17:36.240
to do is the ability to then build a kernel that has the patches in it.

17:36.240 --> 17:39.520
A live patch should be kept up for as short a time as possible.

17:39.520 --> 17:43.520
You don't want to leave the kernel in an unpredictable state with a live patch.

17:43.520 --> 17:47.520
This is our, like, we've got to break the glass, we need to fix this vulnerability.

17:47.600 --> 17:51.120
Yeah, it is kind of the last resort to roll out a live patch.

17:51.120 --> 17:56.320
Once we've confirmed that a patch works, we can then rebuild a new kernel with it in there.

17:58.680 --> 17:59.720
Other questions?

18:02.720 --> 18:03.880
Thank you.

18:03.880 --> 18:10.080
How do you manage to stay on top of every CVE that is exposed in the kernel?

18:10.080 --> 18:12.000
Do you manage to stay up to date on it?

18:12.000 --> 18:13.040
I'm sorry, could you repeat the question?

18:13.040 --> 18:15.320
Do you manage to stay on top of every CVE?

18:15.320 --> 18:22.320
Check them, everything, and make sure that it is actually something that you need to patch?

18:22.320 --> 18:24.320
Do you want to take that?

18:24.320 --> 18:25.320
Yeah.

18:28.320 --> 18:35.320
Right now, every enterprise is going to have, like, their own ideas about what they need to do on this, right?

18:35.320 --> 18:40.320
We have our own ideas on this and what we feel like we need to patch within our fleet, right?

18:40.320 --> 18:46.320
But it's going to be, you know, a human manual process for us to do that, right?

18:46.320 --> 18:52.320
And we will have to determine what we feel is that we need to live patch or not, right?

18:52.320 --> 18:58.320
But I won't get into, like, what our criteria are and what determines what we do on that.

18:58.320 --> 19:06.320
Yeah, so, like, you know, every, I, that would be a difficult question to answer for everybody, right?

19:06.320 --> 19:12.320
Like, like, everybody's going to have, like, different need, they're going to have their own idea of what is critical.

19:12.320 --> 19:14.320
Well, critical, obviously, probably everybody's going to be the same.

19:14.320 --> 19:17.320
But if we talk about, like, medium CVEs or something like that, right?

19:17.320 --> 19:22.320
Like, we may feel like we have to do it, but others might not, you know, we just have to do that.

19:22.320 --> 19:28.320
So, we will provide the tooling to do this, and then you will have to decide what's important for you, right?

19:36.320 --> 19:37.320
All right.

19:37.320 --> 19:38.320
Okay?

19:38.320 --> 19:39.320
Thank you.

19:39.320 --> 19:40.320
Thank you.

19:40.320 --> 19:43.320
Thank you all.

