WEBVTT

00:00.000 --> 00:18.920
Hello everyone. My name is Annie, and I'm a software engineer at Red Hat. Today, I'll

00:18.920 --> 00:26.480
be talking about FUKI, which is UKI with a guest firmware image bundled. Hopefully, everyone

00:26.480 --> 00:35.040
is wide awake this early in the morning. If not, then hopefully I can get you all wide

00:35.040 --> 00:42.240
awake and alert with this. So, who's involved? Me; Vitali, who is here, the

00:42.240 --> 00:51.000
next speaker; Alex, who is right there, with the original idea; and then a bunch of us

00:51.000 --> 00:57.840
from Red Hat and other companies. I must also thank Lennart for actually

00:57.840 --> 01:03.540
getting up to speed with this project and helping us through all the systemd and EFI bits

01:03.540 --> 01:10.240
on the upstream side. So, thanks very much, Lennart, and also for referencing this talk yesterday

01:10.240 --> 01:16.360
in his presentation. So, how is this talk going to go? Initially, I'm going to give a

01:16.360 --> 01:21.560
brief background on the basics of confidential computing and our motivation: what problem we're trying

01:21.560 --> 01:26.400
to solve and how we're trying to solve it. And then I'll go into the new stuff, which

01:26.400 --> 01:31.440
is the UKI and FUKI bits. And then I'm going to describe some of the current status

01:31.440 --> 01:38.600
and how things are looking upstream across the different work streams.

01:38.600 --> 01:45.960
So, what is a confidential VM? On the left is a traditional virtual machine:

01:46.040 --> 01:54.960
the hypervisor, which is running on the host, is actually able to look into guest memory pages,

01:54.960 --> 02:00.960
able to read and also manipulate those pages if it wants to, and the guest has no control over

02:00.960 --> 02:09.960
that. Welcome to confidential VMs. Here, part of the virtual machine's memory is encrypted,

02:09.960 --> 02:17.960
so that the host and the hypervisor are no longer able to read or manipulate guest memory, at least

02:17.960 --> 02:24.960
the parts that are encrypted. So, the guest can keep its secrets, the things that

02:24.960 --> 02:33.960
it wants the host not to know or tamper with. Those are the fundamentals of confidential

02:33.960 --> 02:40.960
VMs. There are several hardware technologies, on both the Intel and AMD side, that guarantee

02:40.960 --> 02:46.960
some of this confidentiality. For example, from AMD we have SEV, SEV-ES,

02:46.960 --> 02:52.960
and SEV-SNP; the details of these are beyond the scope of this talk. On the Intel side, we have Intel-

02:52.960 --> 03:04.960
specific technology that guarantees encryption of guest memory. There is one caveat, though:

03:04.960 --> 03:09.960
that the host and the hypervisor can still disrupt the execution of the guest. So, they can

03:09.960 --> 03:17.960
deprive the guest of CPU cycles; that is still possible, but that's beyond the purview of confidential

03:17.960 --> 03:24.960
computing. And then there's disk encryption: secrets can still be in the guest disk images,

03:24.960 --> 03:29.960
and the guest is responsible for encrypting those as well. So, that is also outside the

03:29.960 --> 03:38.960
purview of this talk. So, what are we trying to do here? What we are trying to do is

03:38.960 --> 03:46.960
we are trying to make sure that the confidentiality with respect to guest firmware is achieved

03:46.960 --> 03:53.960
when the guest is deployed in an external cloud provider, for example, AWS or some other

03:53.960 --> 04:04.960
cloud. In theory, our idea can work in the non-confidential case as well, but it makes

04:04.960 --> 04:14.960
more sense in the confidential environment. So, why is it necessary? Because the guest firmware

04:14.960 --> 04:21.960
is actually running within the context of the guest. So, it is important to make sure that

04:21.960 --> 04:26.960
the guest is able to trust the firmware, but the problem is that in the traditional case,

04:26.960 --> 04:33.960
the firmware comes from the cloud provider, and one doesn't necessarily know how that firmware

04:33.960 --> 04:39.960
is built. And even though we can measure it and get hashes, the guest doesn't necessarily

04:39.960 --> 04:43.960
know what that hash really means. Say the cloud provider changes the

04:44.960 --> 04:49.960
firmware, and then we get a different hash. So, the guest can detect that a new

04:49.960 --> 04:55.960
firmware is present, but the guest doesn't necessarily know what that really means.

04:55.960 --> 05:02.960
The other thing is that the guest depends on things like the TPM and the secure UEFI variable

05:02.960 --> 05:09.960
store, etc. And for all of these things, without

05:10.960 --> 05:15.960
this idea, the guest has to depend on the cloud provider's firmware, which is not

05:15.960 --> 05:26.960
very convenient in the confidential setup. So, our idea is that the guest brings

05:26.960 --> 05:33.960
in its own firmware as a part of the UKI. And then we have a mechanism so that

05:33.960 --> 05:38.960
the guest loads it into memory, then there is a normal VM reset, and then the

05:38.960 --> 05:45.960
firmware is activated. The other interesting and useful thing is that we do not

05:45.960 --> 05:51.960
need to re-launch the guest; a simple, normal operating system reset is good enough to

05:51.960 --> 05:58.960
activate the firmware. From the cloud provider's perspective, this is also useful because

05:58.960 --> 06:05.960
the cloud provider can then, say, update its firmware, because they might have found

06:05.960 --> 06:09.960
some security issue or whatever. And then when that firmware changes, its

06:09.960 --> 06:13.960
measurement changes from within the guest, but the guest doesn't care, because

06:13.960 --> 06:17.960
it's not going to run that firmware anyway. It's going to use the firmware that

06:17.960 --> 06:23.960
it brings with it, which it knows and trusts. So, it really doesn't

06:23.960 --> 06:27.960
matter what the cloud provider is doing with its own firmware.

06:27.960 --> 06:37.960
So, the reason why we prefer that the guest brings the firmware

06:37.960 --> 06:43.960
in with the UKI is that, say, the guest provided the firmware in a

06:43.960 --> 06:48.960
different way; for example, maybe the cloud provider has a UI where the guest owner can

06:48.960 --> 06:54.960
upload the firmware. Then the problem is that it requires storage on the part of the cloud

06:54.960 --> 07:01.960
provider, which the cloud provider may not be willing to spend money on.

07:01.960 --> 07:06.960
The other thing is that if the firmware is actually kept as part of the

07:06.960 --> 07:12.960
guest storage, then the cloud provider may not actually be able to read that storage,

07:12.960 --> 07:18.960
because maybe it's encrypted storage, which the cloud provider

07:18.960 --> 07:27.960
doesn't have access to. Our idea solves all these problems when we use

07:27.960 --> 07:34.960
FUKI. Now, let's look at the broad idea of how this works. Say the virtual

07:34.960 --> 07:42.960
machine that the end user is running brings a firmware with it as part of the

07:42.960 --> 07:50.960
UKI. This UKI then loads the firmware and other boot components,

07:50.960 --> 07:57.960
like the kernel image and initrd, into memory. You can see that on the left

07:57.960 --> 08:03.960
and the right, in both cases, these are confidential VMs. On the left, it is running the cloud

08:03.960 --> 08:10.960
provider's firmware, which is not necessarily trusted. On the right, it is

08:10.960 --> 08:17.960
the same virtual machine in the same CoCo context, but it is running the trusted firmware, which

08:17.960 --> 08:25.960
the end user brought with the UKI. In both cases, we have an encrypted part of the memory

08:25.960 --> 08:32.960
and a non-encrypted, shared part of the memory. First, the UKI loads these components

08:32.960 --> 08:38.960
(firmware, kernel image and initrd) into the shared part of the memory. And then what it does

08:38.960 --> 08:45.960
is tell the hypervisor where it loaded these components. Now, the green dot

08:45.960 --> 08:50.960
is actually the hypervisor interface through which the UKI talks to the hypervisor.

08:50.960 --> 08:54.960
The hypervisor could be QEMU or something else. And the interface is something

08:54.960 --> 09:00.960
generic; we should design something that works across all different kinds of

09:00.960 --> 09:07.960
hypervisors. So, now the hypervisor knows where these components are loaded.

09:08.960 --> 09:14.960
Once the UKI has told the hypervisor where it loaded these

09:14.960 --> 09:21.960
boot components in memory, it initiates a normal guest reset. And then, when the hypervisor

09:21.960 --> 09:29.960
realizes that the guest is triggering a reset, it copies the firmware, which the UKI

09:29.960 --> 09:36.960
loaded into shared memory, to the standard location where you would normally find the firmware

09:36.960 --> 09:40.960
for that particular platform. So, these addresses are kind of standard addresses where

09:40.960 --> 09:49.960
firmware is loaded. And so, once the VM resets, the instruction pointer goes directly to the

09:49.960 --> 09:55.960
address where it will find the firmware. And it does find the firmware, but now it is the

09:55.960 --> 10:02.960
one that the guest trusts, not the one that came from the cloud provider. Now, once

10:02.960 --> 10:08.960
the firmware is loaded, and the guest is reset, it starts its normal execution. The firmware

10:08.960 --> 10:13.960
validates the kernel image and initrd, because the firmware knows which of these images

10:13.960 --> 10:19.960
can be trusted, and it knows the hashes of the kernel image and initrd that are

10:19.960 --> 10:25.960
trustable. And so, the firmware is now responsible for loading the rest of the boot components,

10:25.960 --> 10:30.960
which are the kernel image, initrd, etc. So, this is kind of what the guest memory looks like after the

10:30.960 --> 10:39.960
reset. Now, let's talk about why we are using the UKI to bring in the firmware. The UKI

10:39.960 --> 10:45.960
is a common, standard mechanism already used by most distros to deploy kernel images,

10:45.960 --> 10:52.960
initrds and kernel command lines. They can also be customized for different target

10:52.960 --> 10:59.960
environments. So, you could have UKI add-ons that are specific to the target platforms.

10:59.960 --> 11:06.960
UKIs can be signed, so you can verify from the signature that one comes from a trusted source,

11:06.960 --> 11:12.960
like, for example, Red Hat. They can be revoked using the SBAT mechanism. All these things

11:12.960 --> 11:20.960
come for free when we use the UKI; this technology is already there. And UKIs can be deployed with

11:20.960 --> 11:28.960
standard package management tools, like RPM or DNF or whatever. So, all of this comes for free

11:28.960 --> 11:34.960
when we bundle the firmware image with the UKI. It is a very convenient container for deploying

11:34.960 --> 11:41.960
these guest firmware images in the cloud context. So, let's see how we can actually build a

11:41.960 --> 11:48.960
FUKI. ukify is the standard Python tool that can be used to build any UKI. Today,

11:48.960 --> 11:56.960
you can bundle your own firmware, kernel image, initrd, kernel command line, etc.,

11:56.960 --> 12:02.960
using ukify. This same tool has been extended with a new command-line option for

12:02.960 --> 12:10.960
EFI firmware. This option takes the name of a directory which contains a firmware image, and you

12:10.960 --> 12:17.960
can have multiple of these passed on the command line. Each of these directories contains a different

12:17.960 --> 12:25.960
firmware image that you want to put into the UKI. They can be for different platforms,

12:25.960 --> 12:32.960
or they can be different kinds of builds; it could be a debug build or a release build or whatever.

12:32.960 --> 12:37.960
Now, to distinguish the different firmwares that are in the UKI, the name of the directory

12:38.960 --> 12:45.960
is used as a firmware ID. So, it takes the name of the directory as an ID for the firmware

12:45.960 --> 12:52.960
that it can match with a particular target platform. This was Lennart's idea; you can see the upstream

12:52.960 --> 13:00.960
discussion on that with Lennart. He thought that we should just keep it simple for now: it takes the

13:00.960 --> 13:08.960
directory name, uses that as the firmware ID, and then generates one of these sections called EFI

13:08.960 --> 13:13.960
firmware within the UKI. Now, I will tell you how this matching happens for a particular

13:13.960 --> 13:21.960
platform. Some of it was already described in Lennart's talk yesterday, so I am just going

13:21.960 --> 13:29.960
to give you a little more detail on this. What happens is that the UKI calculates different

13:30.960 --> 13:35.960
hardware IDs for the target platform, called Computed Hardware IDs (CHIDs). These are hashes based

13:35.960 --> 13:44.960
on looking at different things in the hardware. And then what it does is that it goes into the

13:44.960 --> 13:51.960
hardware ID section in the UKI and tries to match one of those CHID values calculated

13:51.960 --> 13:57.960
for the virtualized hardware with one of the entries in the hardware ID section. Now, the hardware ID

13:57.960 --> 14:04.960
section contains a mapping between a CHID value and a firmware ID string. This string can be

14:04.960 --> 14:13.960
anything, and in our case this string is actually the name of the directory, which is used

14:13.960 --> 14:21.960
as the firmware ID. So, now it finds an entry in the hardware ID section that matches the CHID value,

14:21.960 --> 14:29.960
and then it goes to the EFI firmware section of the UKI and tries to find a firmware that matches

14:29.960 --> 14:37.960
that firmware ID. Once it finds it, it loads the first matching entry. So, now we have

14:37.960 --> 14:46.960
a CHID value matching a specific firmware entry in this UKI. That is how

14:47.960 --> 14:53.960
the firmware that is loaded into memory matches the specific hardware. I highly recommend looking

14:53.960 --> 15:00.960
at the recording of Lennart's talk yesterday, because he briefly describes

15:00.960 --> 15:09.960
this also. And the thing is, once this firmware is loaded, it will load the other boot components,

15:09.960 --> 15:19.960
which could be the kernel, the initrd, the command line, or the kernel

15:19.960 --> 15:31.960
bzImage or whatever. So, the task of loading the firmware entry is done, and then that chain-loads the other stuff.

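The two-step lookup just described — compute CHIDs for the running hardware, match one against the hardware ID section to get a firmware ID, then find that ID in the EFI firmware section — can be sketched like this. The section contents here are plain dictionaries for illustration; the real logic lives in the boot stub.

```python
def select_firmware(computed_chids, hwids_section, efifw_section):
    """Sketch of the matching: hwids_section maps CHID -> firmware ID
    string, efifw_section maps firmware ID -> firmware blob.
    The first CHID that resolves all the way to a blob wins."""
    for chid in computed_chids:
        fw_id = hwids_section.get(chid)          # step 1: CHID -> firmware ID
        if fw_id is not None and fw_id in efifw_section:
            return fw_id, efifw_section[fw_id]   # step 2: ID -> firmware blob
    return None
```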
15:31.960 --> 15:44.960
Now, you remember this diagram from a few slides back. The interesting thing is this green line here,

15:44.960 --> 15:54.960
which is actually a well-defined interface that the UKI will use to talk to the hypervisor. This part

15:54.960 --> 16:02.960
we are already in the process of designing. One of the things we have to figure out is how we can design this in a hypervisor

16:02.960 --> 16:10.960
agnostic manner, so that it can work across all different hypervisors. So, you would have something for QEMU, a QEMU-specific driver,

16:10.960 --> 16:23.960
and maybe something for some other hypervisor, and then a hypervisor-agnostic layer that uses that driver interface to talk to the hypervisor

16:23.960 --> 16:31.960
in a hypervisor-agnostic way. Some of those things are still in the design phase, and we are still trying to figure out how to make that work.

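One way to picture the layering being discussed — per-hypervisor drivers under a generic front end — is the sketch below. All the class and method names are hypothetical, since this interface is still being designed.

```python
from abc import ABC, abstractmethod

class HypervisorDriver(ABC):
    """Per-hypervisor backend; a QEMU driver would be one implementation,
    other hypervisors would provide their own."""
    @abstractmethod
    def announce_component(self, name, guest_addr, size): ...
    @abstractmethod
    def request_reset(self): ...

class FirmwareHandoff:
    """Hypervisor-agnostic layer: the UKI-side code talks only to this,
    which forwards each call to whichever driver was detected."""
    def __init__(self, driver):
        self.driver = driver
    def hand_off(self, components):
        # components: {"firmware": (addr, size), "kernel": ..., ...}
        for name, (addr, size) in components.items():
            self.driver.announce_component(name, addr, size)
        self.driver.request_reset()  # triggers the normal VM reset
```

The point of the split is that the UKI-side code never needs to know which hypervisor it is running on; only the driver does.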
16:31.960 --> 16:45.960
We are requesting input from the community to help us in this effort. QEMU uses fw_cfg (firmware config) as a well-defined interface for guests to talk to the host.

16:45.960 --> 16:57.960
So, we also need to find a way to implement fw_cfg support in the UKI. And then there is the question of what happens when you have multiple kernels and

16:57.960 --> 17:07.960
initrds bundled into the same UKI: how do you make sure that the firmware loads the right kernel image, the right initrd image, etc.?

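fw_cfg is essentially a key/value channel: the host publishes named items and the guest reads them back by name. The toy model below is only meant to convey that shape — real access goes through I/O ports or MMIO plus a selector register, and the item name used here is made up.

```python
class FwCfgModel:
    """Toy in-memory stand-in for QEMU's fw_cfg key/value store."""
    def __init__(self):
        self._items = {}
    def host_publish(self, name, blob):
        # host side: register a named item for the guest to read
        self._items[name] = bytes(blob)
    def guest_read(self, name):
        # guest side: look the item up by name (None if absent)
        return self._items.get(name)
```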
17:07.960 --> 17:21.960
Some of these are open questions; we have not yet figured out how to do that. We are just kind of incrementally going through the process of merging small bits.

17:21.960 --> 17:45.960
And so, we very much appreciate your input and suggestions in this direction. Talking about current status: like I said, some parts of it are already merged, mostly in and around QEMU and systemd. I also posted a patch earlier this week

17:45.960 --> 18:07.960
that kind of proposes the QEMU hypervisor interface that we plan to implement. In the references section there will be links to that patch, so please feel free to take a look and provide any input if you think it will be useful.

18:07.960 --> 18:33.960
So, there is already an EFI firmware section in the UKI today; that part has been merged. But some parts are not in place yet. For example, this matching part, where a particular hardware ID entry maps to a particular EFI firmware entry, is still a work in progress, and we still have to merge those bits.

18:33.960 --> 18:48.960
So, yeah, these are the references; feel free to take a look at all the PRs, and if you have any suggestions, let us know.

18:48.960 --> 19:04.960
Our KVM Forum talk last September lays out most of the idea in more detail. Feel free to watch the recording and let us know what you think.

19:04.960 --> 19:20.960
That's it. Any questions? Yes.

19:20.960 --> 19:49.960
That's right. So, the question is: why are we proposing to attach different firmware images for different platforms in the same UKI?

19:49.960 --> 20:14.960
So, the idea is that the UKI can contain different firmware bundles for different platforms, so that we can deploy the same UKI to different target environments. Then, when you are actually running on a target environment, based on the hardware, the UKI selects the right firmware for that particular architecture.

20:14.960 --> 20:32.960
That's right. I mean, for kernel matching, I'm not sure if that goes through the hardware ID section that I described, but potentially today you can have multiple kernel images within the same UKI.

20:32.960 --> 20:39.960
I don't think this stuff can be multi-architecture; but yes, you could have different kernels.

20:39.960 --> 20:50.960
Yeah, the stub itself. Right. I mean, it's just that the example might rather be something like different clouds, for example.

20:50.960 --> 21:01.960
Yeah. So, the UKI stub can be built only for one architecture; so it may not be a completely different architecture, but you can have different kinds of builds or variants.

21:01.960 --> 21:10.960
For example, one build supporting secure boot and another that does not support secure boot; so you can have different builds for the same architecture.

21:10.960 --> 21:30.960
So, what Alex was saying makes my life easier: we can have one FUKI for different cloud deployments, like one for EC2, one for Azure, one for AWS, one for whatever else.

21:30.960 --> 21:37.960
Yes.

21:37.960 --> 21:46.960
Yes.

21:46.960 --> 21:56.960
Okay.

21:56.960 --> 22:01.960
Yes.

22:01.960 --> 22:27.960
So, the question is whether it's possible to have the same UKI, with the same firmware, that we built locally as well as

22:27.960 --> 22:39.960
in the cloud. Potentially, it works if the hardware IDs match; basically, you have to

22:39.960 --> 22:45.960
make sure that the hardware IDs against which it matches are the same for both

22:45.960 --> 22:47.960
environments.

22:47.960 --> 22:51.960
I think Vitali probably also wants to comment on this.

22:51.960 --> 22:56.960
You might want to try to get the exact same hash, for example, of the initial measurement,

22:56.960 --> 22:58.960
on your cloud and locally.

22:58.960 --> 23:03.960
You may or may not manage that, because these initial measurements, for example, cover the

23:03.960 --> 23:04.960
ACPI tables, right?

23:04.960 --> 23:08.960
And they describe the particular instance, right?

23:08.960 --> 23:10.960
How many drives you have?

23:10.960 --> 23:11.960
How many CPUs you have?

23:15.960 --> 23:21.960
So unless you can replicate exactly the same configuration locally and on the cloud, you will likely get

23:21.960 --> 23:22.960
different measurements.

23:22.960 --> 23:23.960
Right.

23:23.960 --> 23:26.960
But you can predict these measurements.

23:26.960 --> 23:27.960
Right.

23:27.960 --> 23:33.960
So even if you have different measurements, they are expected by you.

23:33.960 --> 23:39.960
So, what Vitali was saying is that basically for the local build, as well as for the cloud build,

23:39.960 --> 23:44.960
we could have different measurements because they are different environments.

23:44.960 --> 23:49.960
But the good thing is that you can predict what those measurement values would be in both

23:49.960 --> 23:50.960
environments.

23:50.960 --> 23:59.960
Is that right?

24:00.960 --> 24:02.960
Thank you.

24:02.960 --> 24:03.960
Right.

24:03.960 --> 24:09.960
Thanks.

