WEBVTT

00:00.000 --> 00:18.200
I guess that's okay for the microphone. Great. So, hello everybody. My name is Andrey Seminoff

00:18.200 --> 00:26.040
and I work at Vates, on the XCP-ng team. So, this talk is about implementing AMD

00:26.040 --> 00:36.200
SEV technology in the Xen hypervisor. Well, the Xen Project community is a little behind

00:36.200 --> 00:43.400
schedule compared to other hypervisors on that technology. So, our work is just trying to

00:43.400 --> 00:56.240
fill the gap on that. So, what's on the agenda? I will first talk briefly about AMD

00:56.240 --> 01:05.800
SEV technology and present it basically, so it will be clearer for later. After that,

01:05.800 --> 01:11.680
we will go through this on a deeper level. I will talk about the AMD

01:11.680 --> 01:19.560
SEV secure processor, as well as some work that needs to be done to be compliant with SEV technology.

01:19.560 --> 01:29.120
So, we'll talk about the platform finite-state automaton, the guest finite-state automaton, about

01:29.120 --> 01:40.120
Xen specifics, the way to address Xen hypercalls, and some challenges we need to address

01:40.120 --> 01:49.080
and deal with to emulate devices and some instructions. After that, I will

01:49.080 --> 01:53.960
talk a little bit about where the work is now. It's not finished yet, but we have advanced

01:53.960 --> 02:04.520
well, and what's next, and then that's all. Let's go ahead. So, what's AMD SEV technology?

02:04.520 --> 02:12.360
SEV is Secure Encrypted Virtualization. Basically, it's a confidential computing technology

02:12.360 --> 02:20.320
from AMD which specifically targets VM environments. Well, in a couple of words,

02:20.400 --> 02:30.160
it's about having, for a VM, an encrypted environment where it's up to the VM to decide whether

02:30.160 --> 02:37.560
it wants or not to share its data with other software running on the platform, such as other

02:37.560 --> 02:49.240
VMs and even the hypervisor. So, only the VM, only the SEV-enabled VM, can access and decrypt this data,

02:49.240 --> 02:59.640
and that's about it. So, it's an extension to AMD-V technology, which is the AMD virtualization

02:59.640 --> 03:09.920
technology. And it comes in building blocks. First of all, there's SME, Secure Memory Encryption,

03:09.920 --> 03:18.480
which is the basis for this technology. It's not particularly, it's not in the SEV specification,

03:18.480 --> 03:25.960
but SEV technology is based on it. After that, you have SEV, plain SEV; SEV-ES,

03:25.960 --> 03:34.360
Encrypted State; SEV-SNP; and TIO. I will just present them briefly in the next slides. So, SME,

03:34.400 --> 03:47.600
what's this about? AMD introduced, on the memory controller, an encryption

03:47.600 --> 03:56.240
unit. So, the memory can be encrypted. There's one AES encryption key generated at

03:56.240 --> 04:03.560
boot. It's not a persistent key. And whether the memory is encrypted or not, it's up

04:03.560 --> 04:14.960
to the system software to decide. The decision is made with the page tables and a special

04:14.960 --> 04:25.840
flag, the so-called C-bit, which you can set in the PTE. And the memory writes to this page

04:25.840 --> 04:35.680
will be encrypted by this encryption unit. What's worth mentioning, maybe, is that setting this C-bit

04:35.680 --> 04:43.440
will not encrypt the memory by some magic. You have to do that yourself. So, if you want to encrypt some

04:43.440 --> 04:50.640
memory, some data which is already in memory, you probably need to use some

04:50.640 --> 04:57.760
special procedure such as encrypt-in-place. So, you establish two mappings: through one

04:57.760 --> 05:04.680
you will read the data, and through the other one you will write it. So, the memory will be encrypted.

05:05.640 --> 05:12.440
So, next, you have SEV, plain SEV. So, it introduces the AMD Secure Processor. So, it

05:12.440 --> 05:20.120
is a piece of hardware which is at the heart of this technology and the one which enforces

05:20.120 --> 05:26.760
the whole technology. In fact, it allows you to generate a VM encryption key, which is bound to

05:26.760 --> 05:38.840
the ASID of the VM. For a VM which runs in 32-bit mode, basically all the memory is encrypted.

05:39.960 --> 05:50.840
For a VM which runs in 64-bit mode, the VM can choose

05:51.000 --> 05:59.400
which memory is encrypted. All instruction fetches done by the processor and hardware

05:59.400 --> 06:05.560
page table walks are also encrypted, regardless of the state of the C-bit. So, it's obviously

06:05.880 --> 06:21.320
a good thing to have to avoid some code injection, but we will see that basically

06:22.120 --> 06:31.000
we have to deal with this in some special way, because it needs to be specially addressed

06:31.160 --> 06:38.120
in the hypervisor. So, encryption is bound to the physical address, so you can't just

06:39.960 --> 06:48.840
copy a page from one place to another, because it does not work. You can't have DMA to

06:48.840 --> 06:56.920
these encrypted pages. Well, in a couple of words, what's next after SEV, plain SEV:

06:57.400 --> 07:08.440
SEV-ES. So, all the CPU and FPU context is encrypted. They enlarged the

07:09.720 --> 07:13.640
instruction set to introduce some special instructions to handle this,

07:14.520 --> 07:24.200
a new exception, which is the VMM Communication exception, and they introduced

07:24.280 --> 07:30.840
the GHCB protocol, so even if the VM is totally encrypted, the registers and the memory,

07:31.960 --> 07:42.200
the VM can communicate with the hypervisor. After that comes SEV-SNP, Secure Nested Paging.

07:42.840 --> 07:53.480
That feature ensures integrity control on the memory, so if the hypervisor or other software

07:54.360 --> 08:01.800
tries to tamper with memory, the VM will detect it. It's based on a reverse map table, stuff like that.

08:01.800 --> 08:09.720
So, we target this work for later, it's still not done. And more recently, you have TIO,

08:10.520 --> 08:18.200
SEV-TIO, so it allows you to establish a trusted channel to the device and to

08:18.840 --> 08:30.680
communicate with the device through encrypted memory. Okay, so the first thing we had to do in

08:30.680 --> 08:39.400
Xen is implement support for the AMD Secure Processor. We call it ASP, also known as PSP.

08:40.120 --> 08:46.520
Probably most people who know a little bit about this technology know it as PSP, for

08:46.600 --> 08:54.600
historical reasons. So basically it's a PCI device. The most annoying thing we had to address on that

08:54.600 --> 09:01.880
is that this device has multiple interfaces, but there's only one PCI BDF.

09:04.440 --> 09:13.400
All these interfaces are covered by one PCI BDF, so you also have a

09:13.400 --> 09:19.320
crypto coprocessor, which is behind this BDF, and also some trusted execution environment.

09:21.640 --> 09:28.840
And in KVM, they have all this stuff running in the host kernel.

09:29.480 --> 09:36.680
Obviously we can't run it in the host kernel, because the VMs are

09:36.680 --> 09:43.880
managed in Xen, at the hypervisor level, so we need to create some paravirtualized

09:43.880 --> 09:51.560
interfaces for the other functions, if some other piece of software wants to

09:51.560 --> 10:01.560
use these other interfaces, other than the SEV one. So, the ASP presents a mailbox protocol,

10:01.560 --> 10:08.680
which is basically about three registers to play with, and it offers an API which can be grouped into

10:08.680 --> 10:15.560
the following areas: platform management, guest management, remote attestation support,

10:15.560 --> 10:24.120
guest migration, and some other stuff such as copy and swapping. So, let's take a look. First, it'll be

10:24.120 --> 10:35.400
about the platform finite-state automaton. Nothing that big about that: the ASP handles the platform

10:35.400 --> 10:43.240
context and manages it. So basically what you certainly need to do is to init the platform,

10:44.440 --> 10:52.760
which is not a particularly complicated thing to do. You have a whole bunch of functions

10:52.760 --> 11:01.640
which allow you to create the platform endorsement key, the Diffie-Hellman key, and so forth,

11:01.640 --> 11:11.960
and so on. So, we didn't have any trouble going through that. It's a kind of three-state automaton,

11:11.960 --> 11:23.640
nothing special about that. So, for the guest, the guest life cycle is managed in Xen,

11:23.640 --> 11:33.240
so we need to call a bunch of functions which allow us to launch the guest. We have

11:33.240 --> 11:40.360
LAUNCH_START, which creates the guest context; ACTIVATE, which generates the VM encryption key

11:40.360 --> 11:50.680
and binds it to the guest ASID; LAUNCH_MEASURE, which takes the measurement of the guest in some

11:50.760 --> 12:01.400
TPM-like manner. So, it's not the whole picture of the guest state automaton,

12:01.400 --> 12:10.520
because there are also states related to migrating the VM from one host to another. We won't

12:10.600 --> 12:21.880
talk about that. Maybe the most important, almost, how could I say, not difficult,

12:21.880 --> 12:31.080
but something you need to pay attention to, is LAUNCH_UPDATE_DATA. As I said, for

12:31.720 --> 12:38.920
the guest instruction fetches, the hardware will consider that the guest is encrypted,

12:40.440 --> 12:48.280
so you need to encrypt the initial image of the guest, or the firmware you're willing to run before

12:48.360 --> 12:57.640
the guest starts, such as GRUB or OVMF, stuff like that. So we need to first encrypt the whole

12:57.640 --> 13:05.720
guest memory, because the hardware will consider it encrypted, so it needs to be encrypted.

13:08.440 --> 13:17.560
Well, nothing more to say on that. It's a pretty straightforward thing to do.

13:17.640 --> 13:25.800
We introduced all these hooks in Xen, to the ASP driver, to the ASP processor,

13:28.200 --> 13:40.760
in all the SVM operations which already exist in Xen, so we can hide most of it from the

13:40.760 --> 13:49.960
toolstack and stuff. So, pretty straightforward. Okay, so it starts to be a little bit trickier,

13:49.960 --> 13:57.800
for example, because Xen has a way to provide services to the VM through hypercalls.

13:58.440 --> 14:04.520
Unfortunately for us, these hypercalls use a virtual-memory

14:06.680 --> 14:16.360
ABI, so the guest which invokes a hypercall through the VMCALL or VMMCALL

14:18.360 --> 14:26.120
instruction will provide some virtual addresses. As I said, unfortunately, or happily,

14:26.280 --> 14:31.400
because it needs to be done like this, a linear address in the guest

14:33.880 --> 14:41.560
has no meaning for the hypervisor anymore, because all the page tables are encrypted.

14:41.560 --> 14:49.000
So basically, the software page walk that was done before, through the page tables of the guest,

14:49.960 --> 14:58.920
isn't possible with SEV technology. You just can't guess what's behind a linear address,

14:58.920 --> 15:05.000
so you can't use this ABI, which is in Xen, and basically we need to change it.

15:06.840 --> 15:14.920
There's also, in the Xen community, upstream, in the Xen Project, many talks about changing it,

15:14.920 --> 15:23.560
because it's an old subject to address in Xen. So we changed it on our level by modifying

15:25.960 --> 15:35.480
a couple of macros and stuff like that, so it's not very hard to do.

15:36.040 --> 15:44.680
Also, there are some challenges with the hypercall page, which was dynamically constructed.

15:45.400 --> 15:49.480
You can't do this anymore, and then we have some tricks and stuff to be

15:51.480 --> 15:58.760
compliant with the way the guest invokes these services.

15:59.720 --> 16:07.560
Well, where it becomes a little bit trickier is how you emulate the devices

16:07.560 --> 16:20.120
and how you emulate some instructions. So, there are just two ways to use a device

16:20.120 --> 16:30.040
on the x86 architecture: the I/O space, so IN/OUT instructions, and memory accesses for more

16:30.040 --> 16:41.960
complicated devices, so basically memory-mapped I/O. Okay, so in the I/O space, you have IN/OUT instructions,

16:42.040 --> 16:50.040
stuff like that. As I said, your instruction pointer is just an address, so you can't

16:51.880 --> 16:59.240
access the real instruction which is behind it. So basically, happily for us, you have a feature

16:59.240 --> 17:06.920
called decode assists in the AMD architecture, which provides you some information such as the I/O port,

17:07.000 --> 17:16.520
direction, size, and whether it's a string operation. And we can guess,

17:16.520 --> 17:26.600
when the guest exits on I/O, on a VMEXIT, we can guess what's happening, so we can do the

17:26.600 --> 17:33.320
emulation for this. Obviously, if it's a string operation, we can't do anything,

17:33.400 --> 17:44.840
so probably, in Linux drivers, we need to touch this. There's no way to guess what's

17:44.840 --> 17:54.600
happening if string I/O operations are used. Almost the same thing for memory-mapped I/O.

17:55.400 --> 18:01.080
When you exit on a nested page fault, which is how memory-mapped I/O is emulated,

18:01.960 --> 18:10.040
what you have is only an error code and the faulting guest physical address.

18:11.000 --> 18:17.640
So happily for us, there are decode assists, which provide the instruction length and instruction bytes in the

18:17.640 --> 18:27.240
VMCB. But that's in theory; unfortunately, there are some corner cases which are hard to handle,

18:28.600 --> 18:37.000
related to a processor erratum, related to SMAP technology,

18:37.960 --> 18:44.600
and some other cases, which we can't handle. So it's not very, very stable.

18:46.680 --> 18:54.360
Maybe you need to avoid these memory-mapped I/O accesses in your guest if you want to use

18:54.600 --> 19:02.440
SEV technology. Well, you can't do this for all memory references: if your

19:02.440 --> 19:13.560
driver uses some instructions, you can't guess what's happening in the guest. So basically,

19:14.520 --> 19:20.040
as far as I know, in Linux, you have some feature called unroll,

19:22.760 --> 19:31.480
unroll string I/O, so these instructions will not be used. But still,

19:33.000 --> 19:40.680
some stuff can be emulated. Okay, well, what about other exceptions?

19:40.680 --> 19:47.160
Basically, of the instructions emulated by Xen, all can be emulated, most of them.

19:48.120 --> 19:53.160
Obviously, for undefined instructions, you can do nothing about that,

19:55.400 --> 20:03.080
so that must be disabled for guests. For moves to the control registers,

20:03.080 --> 20:07.720
basically, decode assists will help: they provide the direction of the operation

20:07.720 --> 20:15.000
and also the general-purpose registers involved in that, and so on and so forth.

20:15.800 --> 20:25.160
Basically, all the instructions fit well in that. Some things you can't handle: the task switch, an

20:25.160 --> 20:32.440
x86 feature, on the x86 architecture, you can't handle. It's okay for event injection too.

20:32.600 --> 20:42.360
Yes, so I don't have that much time. What's next to be done? To run Linux in an

20:42.360 --> 20:52.440
SEV Encrypted State environment, we need to adapt the Xen-dependent parts in the Linux kernel,

20:53.480 --> 21:00.840
also adapt the ASP driver in Xen, with special guest management for that, implement

21:00.920 --> 21:08.440
the GHCB, Guest-Hypervisor Communication Block, protocol in Xen, adapt different Xen parts for emulation,

21:09.640 --> 21:19.000
the HVM function table, stuff like that. And that's basically what we did and what we're doing.

21:19.800 --> 21:22.520
So thank you, everybody. If you have questions...

21:22.600 --> 21:36.200
How do you do migration? For now, we haven't addressed that, but yeah, there's a whole bunch of

21:37.080 --> 21:47.160
ASP interfaces which allow you to migrate a guest from one host to another host. What goes on

21:48.120 --> 21:54.440
is that the two hosts first establish a trusted channel, attesting with the AMD processor: each is talking

21:54.440 --> 22:00.520
with the AMD processor, checking that all the processors are SEV-enabled processors, that these are SEV-enabled

22:00.520 --> 22:07.640
platforms, and they establish a trusted channel. After that, you can just

22:08.520 --> 22:18.760
encrypt the memory with some key common to both platforms, put it through the channel to the other

22:18.760 --> 22:26.520
host, and decrypt it and re-encrypt it with a new VM encryption key, so it runs. So that's basically

22:26.520 --> 22:48.920
the operation. Actually, we haven't done that yet. So, yeah... sorry, can you repeat the question? Sorry,

22:49.880 --> 22:57.480
can you repeat the question? Can you repeat his question?

22:57.480 --> 23:06.200
Ah, okay, okay, because of the audience, yeah. So okay, sorry. So, is there any workaround

23:06.200 --> 23:14.760
for memory-mapped I/O emulation? No. Okay, so AMD's position, from what I understand, is that

23:15.400 --> 23:23.640
you don't have to do this, because the next step is SEV-ES, Encrypted State. So basically what's

23:23.640 --> 23:30.760
happening? You're going out of the guest on a VMEXIT, stuff like that, and you can't

23:30.760 --> 23:36.600
see anything in the guest, because all the registers are encrypted, the whole memory is encrypted potentially,

23:37.480 --> 23:46.120
so they introduced, as I told you, the Guest-Hypervisor Communication Block, so some chunk of

23:46.120 --> 23:53.240
memory which is decrypted. So, well, I will not go deep into the technology, but basically

23:53.240 --> 24:02.600
the guest needs to provide special information to the hypervisor, saying: I want this, I want to

24:02.680 --> 24:10.440
access this memory, at this address. So there's a way to do this with the next step of the

24:12.360 --> 24:21.560
SEV technology. So basically, in fact, SEV, plain SEV, has not that much

24:21.560 --> 24:30.280
meaning, because to run a guest, pretend it's secure, and have all the registers accessible

24:30.360 --> 24:39.480
by the hypervisor, it's kind of a joke, because at some point the registers will be leaking

24:39.480 --> 24:47.160
some data you potentially want to protect. So it's just a step;

24:48.360 --> 24:56.120
SEV, plain SEV, is just a step to the next level, to SEV-ES, to run a fully

24:56.120 --> 25:05.240
encrypted VM which is really assured to have security properties, confidentiality. And SNP provides

25:05.240 --> 25:16.360
also integrity control on that memory. You mentioned that DMA is not available, so will this impact

25:16.920 --> 25:28.200
performance? So the question is: DMA is not available for SEV. In fact, that's not exactly

25:28.200 --> 25:38.120
it. In plain SEV, up to SEV-TIO, you can't communicate with the device

25:38.920 --> 25:45.240
through encrypted memory, because the device will not understand it. So probably you can, but

25:45.400 --> 25:54.120
it won't mean anything. So you have to declare this memory, through the C-bit, as shared

25:54.120 --> 26:00.840
memory. So all these DMA accesses need to be done through shared memory, and all the communication

26:00.840 --> 26:05.240
with the other parts, the other VMs, because you have some paravirtualized drivers, stuff like

26:05.240 --> 26:10.920
that, or the hypervisor: all this memory needs to be specially addressed to not be

26:11.000 --> 26:21.400
private. They call it private memory, so it has to be shared memory. Sorry guys, time is up, so

26:23.240 --> 26:32.120
if you want, you can chat later or send me an email.