WEBVTT

00:00.000 --> 00:12.000
Okay, hello and welcome to this talk on how secure commercial RISC-V CPUs are.

00:12.000 --> 00:17.840
So we will answer this question today in this talk, but let's first introduce ourselves.

00:17.840 --> 00:20.040
So this is Lucas, I'm Fabian.

00:20.040 --> 00:23.120
We are both PhD students from CISPA in Germany.

00:23.120 --> 00:25.280
We are basically a big research institution.

00:25.280 --> 00:30.760
We do cyber security research and publish that at academic conferences.

00:30.760 --> 00:36.760
And yeah, with that we can already hop into basically CPU vulnerabilities.

00:36.760 --> 00:40.960
So what do we have there in that space?

00:40.960 --> 00:47.080
And you might know these two, Meltdown and Spectre; they were discovered in 2017 and then disclosed

00:47.080 --> 00:49.600
in 2018 to the public.

00:49.600 --> 00:56.400
And after that, a lot of other CPU vulnerabilities came to light mostly on X86.

00:56.400 --> 01:01.080
But then at some point people also discovered vulnerabilities on ARM as well.

01:01.080 --> 01:07.120
And this is still a thriving field where we do our research, with vulnerabilities even

01:07.120 --> 01:08.120
like CacheWarp.

01:08.120 --> 01:12.960
We also published that one; a few of you might have seen that as well.

01:12.960 --> 01:18.320
Now, given that x86 and ARM CPUs have these vulnerabilities, the question

01:18.360 --> 01:24.720
is, with RISC-V, which is a new standard, with new CPUs, can we learn from the mistakes

01:24.720 --> 01:27.560
that were made on ARM and x86?

01:27.560 --> 01:33.160
So can we design a secure architecture from the start or will we repeat the same mistakes

01:33.160 --> 01:38.000
both in the architecture, so the design, and also in the microarchitecture.

01:38.000 --> 01:40.560
So what is actually implemented?

01:40.560 --> 01:45.240
And we want to structure that talk basically in three different areas.

01:45.240 --> 01:51.200
So we'll first talk about side channels on RISC-V CPUs, then we will talk about actual

01:51.200 --> 01:55.920
CPU bugs, and then in the end we will also conclude with transient-execution attacks.

01:55.920 --> 02:02.040
And with that I pass it over to Lucas, who will talk about side channels.

02:02.040 --> 02:07.640
Then let's start with a real side-channel classic, which is RSA square-and-multiply,

02:07.640 --> 02:09.880
to get to know what a side-channel is.

02:09.880 --> 02:13.760
Maybe you can raise your hand if you know what a side-channel is and how it works.

02:13.760 --> 02:18.400
Ah nice, that's already many, many; we're with the right people here.

02:18.400 --> 02:23.240
So square-and-multiply is a really classic example of a software side channel, where in RSA

02:23.240 --> 02:27.880
you need to do exponentiation, and because exponentiation is expensive, you do that bit by bit,

02:27.880 --> 02:33.320
with a nice modification that is called square-and-multiply, where your exponent

02:33.320 --> 02:37.200
basically gets shifted each time and you do a different operation depending on whether

02:37.200 --> 02:38.440
that's one or zero.

02:38.440 --> 02:42.880
And this is very bad because you do a branch in this operation and this branch can then

02:42.920 --> 02:44.840
be observed by an attacker.

02:44.840 --> 02:48.520
And by observing the branch, the attacker can directly determine whether your

02:48.520 --> 02:54.120
exponent is zero or one and if you know RSA, you know that is very bad because that allows

02:54.120 --> 02:58.320
the attacker to basically then decrypt arbitrary messages.
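The square-and-multiply leak just described can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk; all names are invented, and the "observation" is modeled as a recorded trace rather than a real timing measurement:

```python
# Hedged sketch: left-to-right square-and-multiply, instrumented to show
# why the secret-dependent branch leaks the exponent bit by bit.
# Illustrative only; not taken from any real crypto library.

def square_and_multiply(base: int, exponent: int, modulus: int, trace: list) -> int:
    """Compute base**exponent % modulus, recording each branch decision."""
    result = 1
    for bit in bin(exponent)[2:]:               # walk exponent bits MSB to LSB
        result = (result * result) % modulus    # always: square
        if bit == "1":                          # secret-dependent branch!
            result = (result * base) % modulus  # only for 1-bits: multiply
            trace.append("M")                   # attacker observes "multiply"
        else:
            trace.append("S")                   # attacker observes "square only"
    return result

trace = []
ct = square_and_multiply(5, 0b101101, 1000003, trace)
# The observed branch pattern directly spells out the exponent bits:
recovered = int("".join("1" if op == "M" else "0" for op in trace), 2)
```

The point is that `recovered` equals the secret exponent purely from the branch pattern, which is exactly what a cache side channel gives the attacker.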

02:58.320 --> 02:59.720
And how would you leak this?

02:59.720 --> 03:04.720
Another side-channel classic, Flush+Reload, which does this via shared memory.

03:04.720 --> 03:08.480
So assume you have this crypto library; it's somewhere in shared memory, right?

03:08.480 --> 03:13.040
So this branch that you attack lies somewhere in shared memory that both you and your

03:13.040 --> 03:18.960
victim can access and when the victim accesses this branch, it enters the cache, right?

03:18.960 --> 03:24.560
So they are both accessing shared libraries, the attacker and the victim and when they

03:24.560 --> 03:26.240
access it, it enters the cache.

03:26.240 --> 03:31.040
So what you basically do is you get rid of that code first by flushing it.

03:31.040 --> 03:36.280
There's usually instructions that do that, very convenient for side-channel attackers.

03:36.280 --> 03:41.120
And then the victim runs, and depending on what the secret is, it either caches the

03:41.120 --> 03:44.440
line that you want to watch, the code that you want to watch, or it doesn't cache

03:44.440 --> 03:46.280
the code that you want to watch.

03:46.280 --> 03:48.800
And now you have a controlled cache state.

03:48.800 --> 03:53.200
And afterwards, what you need to do is read out this cache state, and

03:53.200 --> 03:56.120
you would do that by accessing the shared library.

03:56.120 --> 03:58.520
And if it's fast, then you know it's cached.

03:58.520 --> 04:02.760
And if it's slow, then you know it's uncached and the victim didn't access it.
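The flush / victim-runs / reload-and-time loop can be simulated in pure Python. This is a hedged sketch of the decision logic only: the "latencies" below are made-up numbers standing in for measured cycles, and `SharedLibrary` is an invented stand-in for real shared memory:

```python
# Hedged sketch: the Flush+Reload decision logic, simulated in pure Python.
# Real attacks use a cache-flush instruction and a cycle counter.

CACHED_CYCLES, UNCACHED_CYCLES = 40, 300   # illustrative timings
THRESHOLD = 150                            # picked between the two modes

class SharedLibrary:
    def __init__(self):
        self.cache = set()                 # which code lines are cached
    def flush(self, line):                 # attacker: evict the target line
        self.cache.discard(line)
    def access(self, line):                # returns a simulated latency
        t = CACHED_CYCLES if line in self.cache else UNCACHED_CYCLES
        self.cache.add(line)               # any access caches the line
        return t

def probe(lib, line, victim_touches_line):
    lib.flush(line)                        # 1. flush the shared code line
    if victim_touches_line:                # 2. victim runs (secret-dependent)
        lib.access(line)
    return lib.access(line) < THRESHOLD    # 3. reload: fast => victim accessed

lib = SharedLibrary()
```

A fast reload means the victim pulled the line back into the cache, a slow one means it didn't; that single bit per round is the leaked secret bit.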

04:02.760 --> 04:07.920
And so let's basically think about what were the basic requirements for this attack.

04:07.920 --> 04:11.760
So you need a shared library, you obviously have that with your crypto library; you need

04:11.760 --> 04:12.760
to flush.

04:12.760 --> 04:16.640
So you need some way to get stuff out of the cache.

04:16.640 --> 04:21.600
And we will see whether that exists on RISC-V, and you need a timer.

04:21.600 --> 04:26.160
And these timers do exist on RISC-V, and there are even quite a lot of them.

04:26.160 --> 04:30.400
So the ISA mandates that there are like three timers available. There's

04:30.400 --> 04:35.520
rdtime, which is a little bit less interesting to us because it's really slow, and we want

04:35.520 --> 04:36.520
a fast timer.

04:36.520 --> 04:41.560
So there's rdcycle, which counts as fast as the hardware cycles go, which is nice.

04:41.560 --> 04:45.680
And then there's even something that counts retired instructions, which we will see is even

04:45.680 --> 04:50.360
nicer, because you can do even nicer attacks with that.

04:50.360 --> 04:54.000
And they are all high resolution and they all should be accessible from user space.

04:54.000 --> 04:57.600
So for an attacker, that's a really nice baseline, right?

04:57.600 --> 05:00.760
And in cache maintenance, it looks a little bit dire on RISC-V.

05:00.760 --> 05:06.800
So there's like no flush instruction in the base ISA, there's only fences.

05:06.800 --> 05:09.480
But there are two cool tricks on RISC-V.

05:09.480 --> 05:13.960
So first of all, there are these T-Head cores, and they thought like, hmm, so this is the

05:13.960 --> 05:18.080
core that you basically saw before in the Kubernetes talks, and they were like, ah, how

05:18.080 --> 05:21.480
about we implement this in our core, and how about we make it unprivileged.

05:21.480 --> 05:27.280
So you again have this flush, clean, full control over the cache, which is nice for an attacker.

05:27.280 --> 05:32.160
I don't know who has a use for that in user space, but well, it's there now.

05:32.160 --> 05:36.440
And then there's also the fence.i instruction, which doesn't say anything about the cache.

05:36.440 --> 05:38.880
It's there to synchronize the instruction stream.

05:38.880 --> 05:42.560
But a very easy way to synchronize the instruction stream is just to flush the cache.

05:42.560 --> 05:44.600
So all of the vendors do that.

05:44.600 --> 05:49.680
And yeah, with that you also gain a very nice cache flush, even though the ISA does not

05:49.680 --> 05:52.680
say anything about maintaining the cache.

05:52.680 --> 05:56.000
And with this, then you can build a lot of attacks.

05:56.000 --> 06:00.480
Another classic one is the AES T-table attack, and a nice feature about the T-table attack

06:00.480 --> 06:04.760
is that you see very easily if it works, the only thing that you need to know is if this

06:04.760 --> 06:08.480
diagonal is very clearly visible, then your attack worked.

06:08.480 --> 06:13.040
And you see here one example on x86, and then you see another example on RISC-V.

06:13.040 --> 06:17.520
And it almost looks even better, because the RISC-V cores are very noise-free and

06:17.520 --> 06:19.280
have fewer optimizations right now.

06:19.280 --> 06:23.480
So you don't get so much noise in between.

06:23.480 --> 06:26.960
And yeah, besides that, you can even have like new stuff there.

06:26.960 --> 06:32.000
I said before, there's this rdcycle and this retired-instructions counter.

06:32.000 --> 06:37.440
And what you can now look at is basically the delta between those two, because there are instructions

06:37.440 --> 06:39.760
that take more than one cycle, right?

06:39.760 --> 06:44.080
So for example, if you have a division, a division can have like variable latency, and a division

06:44.080 --> 06:45.960
also takes longer than one cycle.

06:45.960 --> 06:50.760
So you get more information than just looking at a cycle counter.
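The cycles-versus-retired-instructions idea can be sketched as follows. This is a hedged toy model: the per-instruction cycle costs are invented, and the two counters are simulated rather than read from rdcycle/rdinstret:

```python
# Hedged sketch: why the delta between a cycle counter (rdcycle) and a
# retired-instructions counter (rdinstret) says more than cycles alone.
# Per-instruction cycle costs below are invented for illustration.

COST = {"add": 1, "load": 3, "div": 34}    # made-up latencies

def run(instructions):
    """Return (cycles, retired) as the two hardware counters would."""
    cycles = sum(COST[i] for i in instructions)
    retired = len(instructions)
    return cycles, retired

def cycles_per_instruction(code):
    cycles, retired = run(code)
    return cycles / retired

light = ["add"] * 100                      # cheap code path
heavy = ["add"] * 90 + ["div"] * 10        # e.g. an interrupt handler
```

Two code paths with the same instruction count produce very different cycles-per-instruction ratios, so spikes in that ratio reveal where heavy instructions (like the ones in interrupt handlers) are running.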

06:50.760 --> 06:57.000
By doing this, you can basically identify, via the delta, parts in the code where heavy instructions

06:57.000 --> 07:01.680
are run, and it turned out that, for example, during interrupts, more heavy instructions

07:01.680 --> 07:02.680
are run.

07:02.680 --> 07:07.080
And so you see these peaks here where the delta changes.

07:07.080 --> 07:12.240
And so that allowed us, for example, to monitor interrupts also in the kernel.

07:12.240 --> 07:14.120
And the counter also works across all modes.

07:14.120 --> 07:16.040
You can monitor interrupts in the kernel.

07:16.040 --> 07:21.440
You can also monitor the instruction stream in machine mode, that's the nice thing.

07:21.440 --> 07:25.140
And this allows you to, for example, fingerprint interrupts, which is an interesting

07:25.140 --> 07:30.320
attack, because, for example, when you press a keyboard, this also sends an interrupt.

07:30.320 --> 07:34.600
And between your keystrokes, there's basically a different latency.

07:34.600 --> 07:40.360
And so an attacker can, from the keystroke timings, then reconstruct your key presses.
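The reconstruction step can be sketched in a couple of lines. This is a hedged illustration: the timestamps are invented, and a real attacker would obtain them from the interrupt side channel just described:

```python
# Hedged sketch: turning observed interrupt timestamps into inter-keystroke
# intervals, the raw material for keystroke-timing attacks.

def keystroke_intervals(interrupt_times_ms):
    """Inter-arrival times between consecutive keyboard interrupts."""
    return [b - a for a, b in zip(interrupt_times_ms, interrupt_times_ms[1:])]

# e.g. typing a word: short gaps within the word, a long pause afterwards
observed = [0, 110, 240, 330, 900]
gaps = keystroke_intervals(observed)
```

Those inter-arrival gaps are characteristic of which keys follow each other and of the typist, which is what the published keystroke-timing attacks exploit.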

07:40.360 --> 07:45.120
And there's a lot of attacks on this, like there's a lot of fancy interesting attacks.

07:45.120 --> 07:49.040
This is not new to RISC-V, but it's an interesting attack scenario.

07:49.040 --> 07:55.080
And so to summarize, on RISC-V, we have like accurate timers, we have even more than accurate

07:55.080 --> 07:56.080
timers.

07:56.080 --> 07:59.680
We also have this, like, retired-instructions counter.

07:59.680 --> 08:02.280
But luckily, Linux reacted a while back.

08:02.280 --> 08:06.000
And these timers are now kind of disabled in user space.

08:06.000 --> 08:07.840
You can still re-enable them.

08:07.840 --> 08:10.000
There are ways around it.

08:10.000 --> 08:15.600
But mostly they are limited to the perf interface, which in our opinion is where these timers

08:15.600 --> 08:18.160
should, like, be accessible.

08:18.160 --> 08:22.000
And cache maintenance is unprivileged on some processors.

08:22.000 --> 08:26.360
There's this fence.i instruction, which flushes the entire instruction cache, which is,

08:26.360 --> 08:30.160
like, an implementation detail, but it is there now.

08:30.160 --> 08:32.600
There's also these custom instructions.

08:32.600 --> 08:36.920
Yeah, so there's also a lot of cache maintenance stuff there.

08:36.920 --> 08:40.440
And even if you didn't have that, there are ways around all this stuff.

08:40.440 --> 08:45.080
So the baseline stuff is quite similar between x86 and RISC-V.

08:45.080 --> 08:51.280
So all the attacks you could mount before you can also mount now.

08:51.280 --> 08:52.840
Okay, yeah, thanks Lucas.

08:52.840 --> 08:55.600
So this was basically side channels.

08:55.600 --> 08:59.200
And if you ask me, side channels will always be in our systems.

08:59.200 --> 09:04.720
You can mitigate them to some degree by writing constant time crypto, for example.

09:04.720 --> 09:06.600
But they will always be there.

09:06.600 --> 09:13.600
So now let's talk about actual CPU bugs, where the CPU basically implements something

09:13.600 --> 09:19.280
differently than the ISA dictates, or has some other kind of bug.

09:19.280 --> 09:21.920
And until now this research was mostly manual.

09:21.920 --> 09:27.760
So you basically read the ISA, you read the hardware manual, you do some manual experiments.

09:27.760 --> 09:32.360
But we wanted to see if we can automatically find such CPU bugs.

09:32.360 --> 09:37.120
And there's a nice tool for that in the software world, and that is fuzzing.

09:37.120 --> 09:42.880
But turns out you can also apply that to CPUs, even to silicon CPUs.

09:42.880 --> 09:44.200
So how does that work?

09:44.200 --> 09:49.520
We basically buy RISC-V CPUs, so for example the T-Head cores, then also the SpacemiT

09:49.520 --> 09:51.840
cores or the SiFive cores.

09:51.840 --> 09:54.440
And we connect them here to the network.

09:54.440 --> 10:01.480
And then we basically ship programs to all of these CPUs and see how they react to that.

10:01.480 --> 10:04.160
So that means we send a program, an instruction sequence.

10:04.160 --> 10:07.200
We send register content and also memory content.

10:07.200 --> 10:10.440
And then we observe what happens when we execute this program.

10:10.440 --> 10:15.680
We collect that, we compare that; this is called differential fuzzing, or differential testing.

10:15.680 --> 10:19.360
And we do that with a lot of inputs like you would do in the software world.

10:19.360 --> 10:21.960
And then at some point you might spot a difference.

10:21.960 --> 10:26.920
For example, this variable I is now different, this could be, now a difference in a register

10:26.920 --> 10:29.520
or a difference in memory.
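The differential-testing loop just described can be sketched with toy interpreters standing in for real CPUs. This is a hedged illustration: both "CPUs" and the rd == rs1 bug in `buggy_cpu` are invented for demonstration, not models of any real chip:

```python
# Hedged sketch of differential testing: run the same program plus register
# state on several "CPUs" and flag any divergence in the final state.

def reference_cpu(program, regs):
    for op, dst, src in program:
        if op == "add":
            regs[dst] = (regs[dst] + regs[src]) & 0xFFFFFFFF
    return regs

def buggy_cpu(program, regs):
    for op, dst, src in program:
        if op == "add":
            if dst == src:                 # invented bug: rd == rs1 corner case
                continue                   # the update is silently dropped
            regs[dst] = (regs[dst] + regs[src]) & 0xFFFFFFFF
    return regs

def differ(program, regs):
    """Return names of registers where the two CPUs disagree."""
    a = reference_cpu(program, dict(regs))
    b = buggy_cpu(program, dict(regs))
    return sorted(r for r in a if a[r] != b[r])

prog = [("add", "x1", "x1")]               # the reserved-looking corner case
```

Feeding many random programs through `differ` and collecting the non-empty results is, in miniature, the fuzzing campaign: most inputs agree, and the rare disagreements point straight at the buggy corner cases.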

10:29.560 --> 10:34.680
So with that, we uncovered a lot of nice bugs on these RISC-V silicon chips.

10:34.680 --> 10:37.760
I first want to talk about denial of service.

10:37.760 --> 10:43.160
Here you might remember the Intel F00F bug, which is basically a single instruction on

10:43.160 --> 10:48.640
Intel CPUs that just locks the CPU, so that means you have to unplug the power cable,

10:48.640 --> 10:51.760
plug it back in, and the CPU comes online again.

10:51.760 --> 10:55.440
But obviously this is bad for a cloud setting where you have multiple tenants that could

10:55.440 --> 11:00.200
just crash the machine for the cloud provider, or at least for basically the other guests

11:00.200 --> 11:02.160
that are running on this system.

11:02.160 --> 11:07.160
And we found something very similar here on one of the T-Head cores, the C906.

11:07.160 --> 11:12.360
So you can basically run this instruction sequence with this weird th.lbib instruction

11:12.360 --> 11:14.920
on your C906, and it will just hang.

11:14.920 --> 11:20.520
So as I said, you have to unplug power to bring it back online.

11:20.520 --> 11:23.400
Now the question is, why does that lock up the CPU?

11:23.400 --> 11:30.880
We also asked ourselves this question, because it just came out of our fuzzing campaign basically.

11:30.880 --> 11:34.960
And at first we also didn't know what happened there, because we didn't write this assembly

11:34.960 --> 11:35.960
code, right?

11:35.960 --> 11:42.120
So we looked at this instruction here, this th.lbib, and it's not too special; basically,

11:42.120 --> 11:48.760
that's a normal load, an indexed load, and afterwards it also increments the address

11:48.760 --> 11:53.520
it loaded from, and there's one interesting detail here that you might have spotted

11:53.520 --> 12:01.560
at the bottom: it says the encoding of this instruction with equal rd and rs1 is reserved.

12:01.560 --> 12:06.440
And if you read something like that, a security researcher will definitely try this,

12:06.440 --> 12:12.840
because, I mean, you should not use it, they tell you that, but from a security standpoint, this

12:12.840 --> 12:16.520
is exciting stuff, you always want to try that.

12:16.520 --> 12:22.000
And yeah, turns out, if you do that, if you use one register, it can be any register, we just use

12:22.000 --> 12:26.800
t0 here in this case, if you use the same register there for rs1 and rd in this instruction

12:26.800 --> 12:33.600
encoding, and afterwards do one CSR read on this register, and one other instruction

12:33.600 --> 12:38.240
that modifies this register, then you basically lock up the CPU here.
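Spotting that reserved rd == rs1 form in an instruction word can be sketched like this. A hedged illustration: the field offsets follow the standard 32-bit RISC-V layout (rd at bits 7..11, rs1 at bits 15..19), but the full th.lbib opcode bits are deliberately not modeled here:

```python
# Hedged sketch: flagging the reserved encoding (rd == rs1) that the manual
# warns about, by extracting the standard RISC-V register fields.

def rd(insn: int) -> int:
    return (insn >> 7) & 0x1F              # rd lives in bits 7..11

def rs1(insn: int) -> int:
    return (insn >> 15) & 0x1F             # rs1 lives in bits 15..19

def is_reserved_form(insn: int) -> bool:
    """True for the case the manual marks reserved: rd equals rs1."""
    return rd(insn) == rs1(insn)

# build a word with rd = rs1 = 5 (t0 is x5), and one with rd = 5, rs1 = 6
same = (5 << 7) | (5 << 15)
diff = (5 << 7) | (6 << 15)
```

A fuzzer that knows this predicate can deliberately steer test programs into such "reserved, do not use" corners, which is exactly where this hang was found.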

12:38.240 --> 12:43.120
And now, this is nice on hardware, but we wanted to know, is this now a fuzzing artifact, or is

12:43.120 --> 12:48.840
this really a silicon bug, and for that we looked at the RTL source code.

12:48.840 --> 12:54.880
For the C906, we have the OpenC906 repository with the RTL code from T-Head, so we can

12:54.880 --> 13:01.040
actually look into the source code of the CPU, and we tried it there as well, and as you

13:01.040 --> 13:05.780
can see, basically, our simulator, which in this case was Verilator, tells us that

13:05.780 --> 13:11.160
the simulation failed to finish, because there were no retired instructions in a few seconds

13:11.160 --> 13:13.480
of simulation basically.

13:13.480 --> 13:18.160
Now the question is, okay, the C906 is buggy, what about the other cores?

13:18.160 --> 13:24.120
Well, on the C908, we also found an instruction that just hangs the CPU; it's even easier,

13:24.120 --> 13:29.840
it's just this encoding here, which is basically an illegally encoded vector instruction,

13:29.840 --> 13:36.080
we don't really know what it is, it just falls into the vector extension basically, and

13:36.160 --> 13:41.880
also on the SpacemiT X60, it's this instruction here, which also just hangs the CPU

13:41.880 --> 13:42.880
basically.

13:42.880 --> 13:49.000
Now, the question is always, in security research, what do we do now, what are our mitigation

13:49.000 --> 13:51.520
options, so how can we prevent this?

13:51.520 --> 13:57.560
And for the C908 and X60, the story is kind of nice, in that we can just disable the vector

13:57.560 --> 14:02.440
extension in the kernel, that prevents the denial of service, because then if someone just

14:02.440 --> 14:07.680
runs this instruction, it's a SIGILL, so it can't hang up the CPU anymore.

14:07.680 --> 14:12.960
Obviously, that gives a performance loss if you use the vector extension in your workload,

14:13.000 --> 14:16.080
and it completely breaks vector-dependent software.

14:16.080 --> 14:21.840
For example, this nice FFmpeg version with handwritten assembly code, basically,

14:21.840 --> 14:27.440
vector assembly code, just doesn't work, you have to recompile that and use another software

14:27.440 --> 14:28.440
there.

14:28.440 --> 14:34.840
On the C906, it's not as nice, because this T-Head vendor extension can just not be disabled.

14:34.840 --> 14:41.280
The documentation there again tells you this bit, which basically tells you the extension

14:41.280 --> 14:44.040
is there, it is enabled, should not be cleared,

14:44.040 --> 14:45.760
so should not be unset.

14:45.760 --> 14:48.960
And the behavior of clearing this bit is also undefined.

14:48.960 --> 14:50.480
Again, undefined is interesting,

14:50.480 --> 14:54.600
so we tried it, but if you try to unclear this bit,

14:54.600 --> 14:56.160
if you try to clear this bit,

14:56.160 --> 14:57.200
basically nothing happens.

14:57.200 --> 14:59.080
This extension is always on.

14:59.080 --> 15:01.240
So that means we have no mitigation

15:01.240 --> 15:03.920
for the C906.

15:03.920 --> 15:05.640
And yes, this is already nice.

15:05.640 --> 15:08.320
So we can basically crash machines here,

15:08.320 --> 15:09.840
but what about other bugs?

15:09.840 --> 15:13.120
Can we maybe modify memory to some degree?

15:13.120 --> 15:15.920
And that's where we found GhostWrite,

15:15.920 --> 15:18.480
which is also on the C910.

15:18.480 --> 15:20.800
which I showed before in the live demo,

15:20.800 --> 15:24.760
one of the most performant T-Head cores.

15:24.760 --> 15:27.760
And to understand what Gostrad is and what it does,

15:27.760 --> 15:30.720
we quickly again have to look at vector instructions.

15:30.720 --> 15:33.400
So I guess most of you have a basic at least

15:33.400 --> 15:35.440
understanding of what they do.

15:35.440 --> 15:37.640
Just a quick introduction, if you don't know,

15:37.640 --> 15:39.960
this is basically a vector store operation.

15:39.960 --> 15:43.040
And what it does, it just moves more data at a time,

15:43.040 --> 15:43.880
to memory.

15:43.880 --> 15:46.560
So instead of moving four or eight bytes, let's say,

15:46.560 --> 15:48.440
it moves 16 or 32.

15:48.440 --> 15:50.880
So this is what this instruction should do.

15:50.880 --> 15:54.360
So it should move data from the registers

15:54.360 --> 15:56.280
to the virtual memory.

15:57.280 --> 15:58.760
Now the question is, what happens

15:58.760 --> 16:02.040
if you execute this instruction on the C910?

16:02.040 --> 16:05.120
We did that, or rather our fuzzer did that basically.

16:05.120 --> 16:08.520
And what we found out is that it sometimes

16:08.520 --> 16:11.200
or somehow changes physical memory.

16:11.200 --> 16:14.560
And sometimes it also crashes the machine somehow.

16:14.560 --> 16:18.720
And we looked into this and apparently this instruction

16:18.720 --> 16:21.440
here, at least this illegally encoded vector instruction,

16:21.440 --> 16:23.880
basically, skips virtual memory and just

16:23.880 --> 16:26.560
writes to physical memory directly.

16:26.560 --> 16:29.320
But it only changes one byte in physical memory,

16:29.320 --> 16:31.280
even though it's a vector operation.

16:31.280 --> 16:34.360
And basically, what we think happens here

16:34.360 --> 16:37.640
is that basically the instruction moves

16:37.640 --> 16:39.800
values from registers into memory.

16:39.800 --> 16:42.160
But every time it only writes one byte.

16:42.160 --> 16:45.000
So if you see what this would look like here,

16:45.000 --> 16:47.960
it would basically just write the first byte of each of the vector

16:47.960 --> 16:51.520
registers and then only the last vector register

16:51.520 --> 16:53.480
would be in physical memory.

16:53.480 --> 16:56.240
So we can now write one byte at a time

16:56.240 --> 16:58.600
into physical memory.
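Composing a one-byte primitive into an arbitrary-length physical write can be sketched like this. A hedged illustration only: `ghostwrite_byte` here writes into a Python bytearray standing in for physical RAM; on the real chip, the primitive is the faulty vector store:

```python
# Hedged sketch: building an arbitrary-length physical-memory write out of
# a repeated one-byte write primitive.

physical_memory = bytearray(64)            # toy stand-in for physical RAM

def ghostwrite_byte(phys_addr: int, value: int):
    """The 1-byte primitive (on real hardware: the faulty vector store)."""
    physical_memory[phys_addr] = value

def ghostwrite(phys_addr: int, data: bytes):
    """Repeat the primitive once per byte to write a whole buffer."""
    for i, b in enumerate(data):
        ghostwrite_byte(phys_addr + i, b)

ghostwrite(8, b"root")                     # e.g. patch some kernel structure
```

Since each invocation lands exactly one byte at a chosen physical address, looping it gives the attacker a fully general physical-memory write.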

16:58.600 --> 17:01.800
And why is this a security issue?

17:01.800 --> 17:02.840
Well, look at this.

17:02.840 --> 17:06.320
This is a normal write, so a normal store instruction.

17:06.320 --> 17:09.080
You typically store to a virtual address.

17:09.080 --> 17:11.040
That virtual address is then translated

17:11.040 --> 17:13.160
by the hardware to a physical address.

17:13.160 --> 17:17.560
And in that flow, the permissions are checked.

17:17.560 --> 17:19.920
So it is checked that the application is not writing

17:19.920 --> 17:23.840
from process A to process B or maybe even from one

17:23.840 --> 17:25.880
VM to another.

17:25.880 --> 17:28.360
And only then the write goes to physical memory,

17:28.360 --> 17:31.200
or rather the hardware does that for you.

17:31.200 --> 17:33.600
With GhostWrite, this looks like this.

17:33.600 --> 17:36.520
So we just write from a user-mode process,

17:36.520 --> 17:41.480
from process A to physical memory of the kernel or process B.

17:41.480 --> 17:43.840
What can we now do with that?

17:43.840 --> 17:46.120
We have multiple exploits there in the paper.

17:46.120 --> 17:47.960
You can read up on the details there.

17:47.960 --> 17:49.520
But what we can do is basically

17:49.520 --> 17:52.040
we can change our own page tables.

17:52.040 --> 17:54.560
That means we can read arbitrary physical memory

17:54.560 --> 17:56.360
with this instruction.

17:56.360 --> 17:57.760
We can overwrite the kernel.

17:57.760 --> 18:01.040
So we can just change the way the kernel functions

18:01.040 --> 18:03.160
at runtime.

18:03.160 --> 18:06.160
We can even change M mode firmware here.

18:06.160 --> 18:09.240
So we're talking about something like OpenSBI, for example.

18:09.240 --> 18:11.440
So all these protections are also gone.

18:11.440 --> 18:13.120
And that also means we basically

18:13.120 --> 18:16.320
can't do trusted execution on this chip anymore, which

18:16.320 --> 18:18.520
is kind of sad.

18:18.520 --> 18:21.800
Now the question is again, why is this now a big security issue?

18:21.800 --> 18:26.600
Well, we first have the demo, sorry.

18:26.600 --> 18:30.000
So you can see we are now on the C910 here.

18:30.000 --> 18:31.680
And we are an unprivileged user.

18:31.680 --> 18:33.200
We run our exploit.

18:33.200 --> 18:34.680
And it's very quick.

18:34.680 --> 18:37.040
And afterwards, we are basically the root user

18:37.040 --> 18:39.240
and can run arbitrary commands here.

18:39.240 --> 18:41.400
So this is now just patching the kernel,

18:41.400 --> 18:44.120
to get root access in that case.

18:44.120 --> 18:47.280
But you can do arbitrary other attacks here.

18:47.280 --> 18:50.120
And yes, you can basically also do that in the cloud.

18:50.120 --> 18:51.520
So you've seen that before.

18:51.520 --> 18:55.240
That was basically a month or two before we

18:55.240 --> 18:56.960
discovered GhostWrite; they basically

18:56.960 --> 18:59.920
launched their bare-metal RISC-V instances.

18:59.920 --> 19:02.840
So you even have this CPU bug now in the cloud,

19:02.840 --> 19:04.840
which is, again, sad.

19:04.840 --> 19:09.240
But yeah, this is basically a demonstration of this: GhostWrite is easy

19:09.240 --> 19:10.400
to reproduce here.

19:10.400 --> 19:12.640
We also have a GitHub repo for that, which

19:12.640 --> 19:18.040
shows that it works here on the Scaleway cloud on this C910 chip.

19:18.040 --> 19:20.960
Now the question is again, and also as the cloud provider,

19:20.960 --> 19:22.240
what can we do about that?

19:22.240 --> 19:25.200
What are our mitigation options again?

19:25.200 --> 19:27.880
Yes, GhostWrite is again a vector instruction.

19:27.880 --> 19:30.840
We can just disable vector extension in the kernel.

19:30.840 --> 19:32.560
That blocks GhostWrite, because then it

19:32.560 --> 19:36.480
signals an illegal instruction.

19:36.480 --> 19:39.480
That obviously again gives a performance

19:39.480 --> 19:42.080
loss for vectorized tasks.

19:42.080 --> 19:44.280
And also, again, breaks software that

19:44.280 --> 19:46.440
relies on this vector extension.

19:46.440 --> 19:49.560
And in this case, you also require trust in the kernel.

19:49.560 --> 19:53.240
You as a VM, for example, on such a machine,

19:53.240 --> 19:55.520
you would require that the hypervisor makes sure

19:55.520 --> 19:59.560
that this vector extension is never enabled for any other person

19:59.560 --> 20:01.760
on the system because then the person

20:01.760 --> 20:03.880
could use GhostWrite again.

20:03.880 --> 20:06.080
And this mitigation is in fact also deployed

20:06.080 --> 20:07.240
in the Linux kernel.

20:07.240 --> 20:10.520
So if you have a kind of recent RISC-V kernel,

20:10.520 --> 20:13.560
but also an x86 kernel, you can just run lscpu

20:13.560 --> 20:17.120
on your laptop, and it will show up there as GhostWrite.

20:17.120 --> 20:19.160
In this case, this is from the C910.

20:19.160 --> 20:21.000
So it needs the mitigation.

20:21.000 --> 20:23.120
It is affected by GhostWrite.

20:23.120 --> 20:28.200
And that's why this XTheadVector extension is disabled.
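Checking the kernel's verdict programmatically can be sketched as a small parser over lscpu-style output. A hedged illustration: the sample text below is invented, but it mirrors the usual "Vulnerability <name>: <status>" lines that lscpu prints:

```python
# Hedged sketch: reading a vulnerability status out of lscpu-style output.
# SAMPLE is invented example text, not captured from a real machine.

SAMPLE = """\
Architecture:             riscv64
Vulnerability Ghostwrite: Mitigation; xtheadvector disabled
"""

def vulnerability_status(lscpu_output: str, name: str):
    """Return the status string for one vulnerability, or None if absent."""
    prefix = f"vulnerability {name.lower()}:"
    for line in lscpu_output.splitlines():
        if line.lower().startswith(prefix):
            return line.split(":", 1)[1].strip()
    return None
```

The same lines are also exposed one file per vulnerability under /sys/devices/system/cpu/vulnerabilities/, so a deployment script can verify the mitigation is actually active before scheduling untrusted workloads.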

20:28.200 --> 20:30.200
But the only real fix for this issue

20:30.200 --> 20:32.920
is actually replacing the CPUs.

20:32.920 --> 20:35.760
So we basically need to fix the bug in a new revision

20:35.760 --> 20:39.120
of the CPU that then fully fixes GhostWrite.

20:39.120 --> 20:42.400
But that is obviously expensive and slow to deploy.

20:42.400 --> 20:44.520
And it also requires a hardware replacement.

20:44.520 --> 20:47.200
So you also basically need to buy new chips.

20:47.200 --> 20:50.120
But it has the advantage that it's the only real fix.

20:50.120 --> 20:54.360
So it's always basically the struggle in security research

20:54.360 --> 20:57.320
that you have this trade-off between a short-term software

20:57.320 --> 21:01.720
mitigation, a kernel patch, versus the long-term fix

21:01.720 --> 21:05.240
where you would actually have to buy these new CPUs

21:05.240 --> 21:07.640
with the new CPU revision.

21:07.640 --> 21:10.080
And with that, we can wrap up the CPU bugs part

21:10.080 --> 21:14.200
by then going over what are the patterns from our findings.

21:14.200 --> 21:17.800
What can we improve upon in the RISC-V silicon space?

21:17.800 --> 21:19.640
So first of all, I would say,

21:19.640 --> 21:23.720
vendors should improve their design verification regarding security,

21:23.720 --> 21:25.880
because hardware is not software.

21:25.880 --> 21:30.120
You can't just deploy a software patch.

21:30.120 --> 21:33.320
When hardware is in the field, it can't be patched anymore,

21:33.320 --> 21:35.880
at least not in this case.

21:35.880 --> 21:39.400
And that's why verification before fabrication is critical.

21:39.400 --> 21:41.000
And we have shown in our research

21:41.000 --> 21:44.840
that differential testing across vendors can be a good tool there.

21:44.840 --> 21:48.840
So hardware vendors should try that and should put it

21:48.840 --> 21:51.720
into their testing technique, basically.

21:51.720 --> 21:55.000
Then we have seen that one problem with RISC-V silicon right now

21:55.000 --> 21:58.520
is that there's just high diversity.

21:58.520 --> 22:00.840
What we mean there, basically, is that vendors

22:00.840 --> 22:02.920
they either have their own vendor extensions,

22:02.920 --> 22:07.080
or they implement some old version of the vector extension.

22:07.080 --> 22:10.360
For example, T-Head has this XTheadVector extension,

22:10.360 --> 22:12.360
which is kind of vendor-specific.

22:12.360 --> 22:17.800
But it's then also kind of similar to the 0.7.1 vector extension.

22:17.800 --> 22:19.720
And then there are also forks around that.

22:19.720 --> 22:22.280
So we don't like that as security researchers.

22:22.280 --> 22:24.680
You should only implement the ratified extensions,

22:24.680 --> 22:27.480
because that also makes it easier to test those.

22:27.480 --> 22:30.280
And if you use vendor extensions, if you need that,

22:30.280 --> 22:31.640
you should use it carefully.

22:31.640 --> 22:33.560
So also in the design.

22:33.560 --> 22:36.920
What we have seen with this rs1 and rd,

22:36.920 --> 22:39.960
which should not be the same, with this

22:39.960 --> 22:41.800
XTheadMemIdx extension.

22:41.800 --> 22:45.160
And you should always mandate a specific behavior

22:45.160 --> 22:49.480
in the RISC-V ISA, so there shouldn't be basically instances

22:49.480 --> 22:51.800
where one chip does one thing and another one the other,

22:51.800 --> 22:54.760
because that also makes differential testing

22:54.760 --> 22:58.760
much harder, which we can also discuss after the talk.

22:58.760 --> 23:01.000
And then another pattern we have seen here

23:01.000 --> 23:04.200
is that a lot of this hardware is not configurable,

23:04.200 --> 23:06.040
and this shouldn't be the case.

23:06.440 --> 23:08.200
If you have features like this,

23:08.200 --> 23:10.200
this vendor extension from T-Head,

23:10.200 --> 23:11.800
these features should have kill switches.

23:11.800 --> 23:14.840
So if you find a bug in those, that you can just disable them,

23:14.840 --> 23:17.080
like we did for the vector extension.

23:17.080 --> 23:20.200
And we argue that there should be some instruction-hooking mechanism,

23:20.200 --> 23:23.240
and by that we basically mean that the instruction can be hooked.

23:23.240 --> 23:25.320
So if the instruction is executed,

23:25.320 --> 23:28.600
it either goes into the kernel, so the kernel can do something about it.

23:28.600 --> 23:32.120
For example, say, okay, this instruction is illegal,

23:32.280 --> 23:36.360
we should not execute it, or, and that is the second point,

23:36.360 --> 23:39.000
there should be some update path around it.

23:39.000 --> 23:43.320
So for example, Arm and X86, they both have microcode,

23:43.320 --> 23:45.960
which is basically firmware that runs in your CPU,

23:45.960 --> 23:48.760
and you can just deploy fixes to that.

23:48.760 --> 23:50.920
So for example, for the CacheWarp vulnerability,

23:50.920 --> 23:53.160
that we had on the first slide, AMD could just deploy

23:53.160 --> 23:57.400
a new firmware update, and by that fix the vulnerability.

23:57.400 --> 24:00.040
And we argue that microcode is a good thing here,

24:00.040 --> 24:03.720
as it increases security, even if it adds complexity to your core.

24:03.720 --> 24:08.200
So I know RISC-V should be simpler, and RISC,

24:08.200 --> 24:12.440
but still, I think a way we can basically push updates

24:12.440 --> 24:15.240
to the hardware would be a good thing.

24:15.240 --> 24:19.400
And with that, I give it back to Lucas.

24:19.400 --> 24:22.040
Yeah, thanks. So we talked about side channels,

24:22.040 --> 24:24.600
we talked about CPU bugs, and now we come basically

24:24.600 --> 24:27.720
to the last class of vulnerabilities we talk about,

24:27.720 --> 24:30.040
and this is going to be transient execution.

24:30.040 --> 24:33.240
And we're going to focus not on the bug class of transient execution,

24:33.240 --> 24:36.760
which is Meltdown, but more on the side-channel class

24:36.760 --> 24:38.600
of transient execution, which is Spectre,

24:38.600 --> 24:43.000
in a sense that it's like a feature of the CPU more than it is a bug.

24:43.000 --> 24:45.320
So if you have an out-of-order CPU, you probably

24:45.320 --> 24:47.560
are going to have Spectre.

24:47.560 --> 24:51.960
And so yeah, we also thought about what happens with these other CPUs,

24:51.960 --> 24:54.760
and the exploit is already running.

24:54.760 --> 24:56.520
I'm going to explain what you're going to see here,

24:56.520 --> 24:58.360
because it's over so fast.

24:58.360 --> 25:01.800
So we also did some research on Spectre,

25:01.800 --> 25:04.200
and we managed to build an exploit,

25:04.200 --> 25:07.080
and the gadget that Spectre basically gives you is,

25:07.080 --> 25:09.240
you get, in the classic sense,

25:09.240 --> 25:12.680
from user space, an arbitrary read in the kernel,

25:12.680 --> 25:15.800
and we managed to build that in a very stable way

25:15.800 --> 25:22.040
by using basically some, yeah, unfixed path in BPF,

25:22.040 --> 25:24.680
that only exists on RISC-V.

25:24.680 --> 25:28.600
So what you can basically do is inject arbitrary BPF code into the kernel,

25:28.600 --> 25:31.640
or, like, specially crafted BPF code into the kernel,

25:31.640 --> 25:35.560
that then reads where it shouldn't read speculatively,

25:35.560 --> 25:37.720
and encodes that into the cache.

25:37.720 --> 25:40.600
And if you combine that, then you get an exploit like this,

25:40.600 --> 25:42.600
which is running here.

25:42.600 --> 25:45.880
So this is the real-time recording.

25:45.880 --> 25:49.240
And in this time, it can basically leak your entire /etc/shadow file,

25:49.240 --> 25:53.160
which is not very nice, and can also leak arbitrary other files,

25:53.240 --> 25:56.360
which is also not very nice for security, right?

25:56.360 --> 25:58.840
So the insight is that RISC-V cores,

25:58.840 --> 26:00.600
because they're now out of order,

26:00.600 --> 26:02.680
are also vulnerable to Spectre, of course,

26:02.680 --> 26:04.680
and the kernel lacks a lot of the mitigations

26:04.680 --> 26:07.960
that you would have on x86 and ARM.

26:07.960 --> 26:11.160
And part of this is due to kernel oversight,

26:11.160 --> 26:15.720
but part of this is also because RISC-V does not have mitigations

26:15.720 --> 26:17.880
that other architectures have.

26:17.880 --> 26:20.600
So after specter was first discovered,

26:20.760 --> 26:25.320
x86 and ARM were kind of quick to standardize this speculation barrier,

26:25.320 --> 26:27.720
which is basically an instruction that tells your processor,

26:27.720 --> 26:31.560
hey, here you cannot, like, front-run these instructions anymore,

26:31.560 --> 26:33.560
you need to stop your speculation,

26:33.560 --> 26:36.440
and RISC-V has nothing there,

26:36.440 --> 26:39.080
which is kind of understandable,

26:39.080 --> 26:43.000
because they were, like, in order at first, of course, right?

26:43.000 --> 26:45.400
But it's kind of required now in this standard,

26:45.400 --> 26:48.040
and they are coming along, and there's the will, I think,

26:48.040 --> 26:50.040
to standardize this speculation barrier,

26:50.040 --> 26:53.400
but I think it's quite important that RISC-V pushes for this now,

26:53.400 --> 26:55.800
because the kernel is already relying on it.

26:55.800 --> 26:58.760
For example, the BPF JIT is relying on it,

26:58.760 --> 27:01.160
and there's really no way around it,

27:01.160 --> 27:04.280
like you lose out on security if you don't have this.

27:04.280 --> 27:07.160
And luckily, there's right now a workaround,

27:07.160 --> 27:09.400
because if you read a CSR on RISC-V,

27:09.400 --> 27:13.000
that's also serializing on all the cores we tested,

27:13.000 --> 27:16.440
so you can basically read this time register to, for example, x0,

27:16.440 --> 27:18.840
and then that just serializes the pipeline,

27:18.840 --> 27:21.240
but this is a really, really bad hack, right?

27:21.240 --> 27:23.720
Like, in the future, some CPU vendors could say,

27:23.720 --> 27:26.760
oh, well, if you read this register to the zero register,

27:26.760 --> 27:29.240
then you're obviously not intending on keeping the result,

27:29.240 --> 27:31.720
so that's okay, I can just drop the instruction

27:31.720 --> 27:33.640
and not perform the read at all, right?

27:33.640 --> 27:36.760
So you could come up with a ton of things

27:36.760 --> 27:41.480
where these, like, undocumented stop-gap solutions here don't work,

27:41.480 --> 27:43.560
so RISC-V needs a speculation fence,

27:43.560 --> 27:45.480
and RISC-V also needs kernel patches.

27:46.440 --> 27:48.600
And so there's a few paths in the kernel,

27:48.600 --> 27:53.320
where it just doesn't work now with respect to security.

27:53.320 --> 27:55.720
So, for example, when dispatching a syscall,

27:55.720 --> 27:58.760
an attacker can redirect the speculative control flow

27:58.760 --> 28:01.080
to some arbitrary point in the kernel,

28:01.080 --> 28:03.240
where they want to redirect the control flow,

28:03.240 --> 28:06.280
and that's not nice, because you can place some gadget there,

28:06.280 --> 28:08.280
and then basically exploit it ROP-style,

28:08.280 --> 28:12.360
but in speculation. Yeah, the user memory access

28:12.360 --> 28:14.360
has something that is called pointer masking,

28:15.320 --> 28:19.400
which basically ensures that non-user pointers fault,

28:19.400 --> 28:21.560
which doesn't exist on RISC-V,

28:21.560 --> 28:23.560
then the BPF thing that I talked about,

28:23.560 --> 28:25.320
and then there's also the futex operations,

28:25.320 --> 28:28.440
which also need some form of pointer masking and so on,

28:28.440 --> 28:31.480
but luckily we submitted patches for this,

28:31.480 --> 28:33.880
and they're in the process of getting upstreamed,

28:33.880 --> 28:36.600
but I also want to nail down the point

28:36.600 --> 28:39.640
that it's quite complicated to do on RISC-V,

28:39.640 --> 28:42.840
because in contrast to x86,

28:42.840 --> 28:46.120
where you have a few major manufacturers that implement the cores,

28:46.120 --> 28:50.120
you now have a much more diverse landscape of processors,

28:50.120 --> 28:52.840
and the Linux kernel is not super well equipped,

28:52.840 --> 28:55.320
at this current point, to basically handle this.

28:55.320 --> 28:58.760
So, you would need very, very fine-grained mitigation settings,

28:58.760 --> 29:00.920
where you say, ah, if this processor

29:00.920 --> 29:03.960
implements this specific predictor, I need to mitigate here,

29:03.960 --> 29:07.960
and then I can maybe save some mitigation overhead to gain some performance,

29:07.960 --> 29:10.840
which I think in the future is the way to go,

29:10.840 --> 29:13.400
but currently, this is just work in progress,

29:13.400 --> 29:15.720
and I think the community should really push for this.

29:16.680 --> 29:19.720
And so I think what we need to mitigate this is first of all,

29:19.720 --> 29:21.720
the standardized serializing fence,

29:21.720 --> 29:25.160
then we need these fine-grained mitigations that I talked about,

29:25.160 --> 29:27.400
and then there's also on the compiler side,

29:27.400 --> 29:29.640
a lot of mitigations that come from the compiler,

29:29.640 --> 29:31.880
or some mitigations that come from the compiler,

29:31.880 --> 29:34.280
and those also need some equivalents on RISC-V,

29:34.280 --> 29:38.520
which is actually really hard, because the calling conventions

29:38.520 --> 29:40.680
and the code layout just work differently,

29:40.680 --> 29:44.040
so these mitigations just don't port over very easily.

29:45.160 --> 29:48.440
And I think the last thing that should happen

29:48.440 --> 29:50.600
is that vendors would collaborate together

29:50.600 --> 29:53.800
with all the other parts of the software ecosystem,

29:53.800 --> 29:57.960
and with researchers so that we can get like a very precise idea

29:57.960 --> 30:00.200
of what each core actually does in speculation,

30:00.200 --> 30:03.320
and what each core actually implements and doesn't implement.

30:03.320 --> 30:06.920
This way, you can even like get more performance out of it,

30:06.920 --> 30:08.760
and everybody benefits, because in the end,

30:08.760 --> 30:11.960
you only do targeted mitigations where they're actually needed,

30:11.960 --> 30:13.640
instead of having to over-mitigate.

30:15.960 --> 30:19.560
Yes, and with that, we can already wrap it up for this talk.

30:19.560 --> 30:23.240
So I want to split this into three sections.

30:23.240 --> 30:24.600
So first, what goes wrong?

30:25.560 --> 30:29.240
So we've seen unprivileged timers and cache maintenance,

30:29.240 --> 30:30.600
so we have side channels.

30:30.600 --> 30:34.520
Then we have seen a lot of diversity with vendor extensions,

30:34.520 --> 30:37.560
and non-ratified extensions, which are implemented,

30:37.560 --> 30:39.720
and by that we saw CPU bugs.

30:39.720 --> 30:42.360
So GhostWrite, which is an arbitrary physical write,

30:42.360 --> 30:44.520
and also denial-of-service bugs.

30:45.560 --> 30:47.640
For these, we didn't have kill switches,

30:47.640 --> 30:51.800
and also no update paths, so we have these mitigations in the kernel,

30:51.800 --> 30:55.640
but we can't really mitigate GhostWrite, for example,

30:55.640 --> 30:59.000
and then in the end, Lucas told you about transient execution,

30:59.000 --> 31:01.720
which is also present on RISC-V.

31:01.720 --> 31:04.120
So I guess you can see we basically have all the issues

31:04.200 --> 31:07.000
we had on x86 and ARM now on RISC-V.

31:07.000 --> 31:09.320
So the question is, what can we do about that?

31:09.320 --> 31:12.120
What are the short-term things we should do?

31:12.120 --> 31:13.960
So first, as we argued before,

31:13.960 --> 31:15.960
we should have privileged cache maintenance,

31:15.960 --> 31:17.400
if we add such an extension,

31:18.520 --> 31:20.280
even if we do that as a vendor,

31:20.280 --> 31:23.480
then we should have coarse user-space timers.

31:23.480 --> 31:27.400
These issues won't go away, but it would be much harder to exploit them.

31:28.360 --> 31:31.320
Then you should only implement ratified extensions in your cores.

31:32.280 --> 31:35.800
If you use a vendor extension, do that very carefully,

31:35.800 --> 31:38.440
you should have kill switches for all the features,

31:38.440 --> 31:40.280
and there should be some instruction-hooking

31:40.280 --> 31:41.880
or update mechanism.

31:41.880 --> 31:45.320
So we can have firmware and we can ship updates to the CPUs,

31:45.320 --> 31:47.640
even if that complicates the design.

31:47.640 --> 31:49.880
And then in the end, for transient execution,

31:49.880 --> 31:53.080
we really need to standardize a serializing fence,

31:53.080 --> 31:55.240
so we can build good mitigations.

31:55.240 --> 31:58.040
And in the end, we basically have a wishlist

31:58.040 --> 31:59.800
for us as security researchers,

31:59.800 --> 32:01.880
so what would make our lives easier

32:01.880 --> 32:04.280
so that the entire world becomes a safer place?

32:05.320 --> 32:07.080
Documentation is always great.

32:07.080 --> 32:10.200
It's already kind of decent, but in a lot of places,

32:10.200 --> 32:12.600
it's lacking; with better documentation

32:12.600 --> 32:14.840
we can do our research much faster.

32:14.840 --> 32:16.200
Also, source code is really great.

32:16.200 --> 32:18.760
We like that their designs are open source,

32:19.720 --> 32:22.920
but pushing on that even more would be very cool.

32:22.920 --> 32:25.480
A unified perf interface has been worked on,

32:25.480 --> 32:27.080
but we would like that as well,

32:27.080 --> 32:29.480
so having access to, for example, the cycle counter

32:29.480 --> 32:32.600
via that, because then you can also make that privileged.

32:33.720 --> 32:35.960
Then one thing: reproducible builds,

32:35.960 --> 32:37.480
and also kernel headers.

32:37.480 --> 32:41.000
Compiling kernel headers on RISC-V,

32:41.000 --> 32:44.520
and patching the kernel, patching the bootloader.

32:44.520 --> 32:48.760
It's not hard, but it's just involved to do that.

32:48.760 --> 32:52.360
It would be nicer if we could have, basically,

32:52.360 --> 32:55.080
nicer ways to do that on the system directly,

32:55.080 --> 32:57.640
and not having to cross compile on another machine.

32:57.640 --> 32:59.880
And then one thing for the differential fuzzing,

32:59.880 --> 33:02.280
and also for basically kernel mitigations,

33:02.280 --> 33:05.080
it would be nice if there were some standardized database

33:05.080 --> 33:07.160
for mvendorid and marchid values,

33:07.160 --> 33:12.120
so you can nicely spot which CPU you are currently running on.

33:12.120 --> 33:15.400
And with that, you can check out these two repositories

33:15.400 --> 33:16.280
for our papers.

33:16.280 --> 33:19.080
There you will also find contact information,

33:19.080 --> 33:21.320
a website for GhostWrite, the fuzzer,

33:21.320 --> 33:24.440
and also all the materials for our side-channel research,

33:24.440 --> 33:28.040
and with that, we are open to questions and discussions.

33:28.040 --> 33:28.920
Thank you.

33:28.920 --> 33:32.200
APPLAUSE

33:35.320 --> 33:35.960
Yeah?

33:35.960 --> 33:40.760
So, I think Joey, this is a good question for you.

33:40.760 --> 33:43.320
I mean, you're always not responding.

33:43.320 --> 33:44.760
This is not the first question.

33:44.760 --> 33:45.560
Okay?

33:45.560 --> 33:48.200
The speculation barriers, we're discussing them.

33:48.200 --> 33:50.760
We have proposals for them.

33:50.760 --> 33:52.200
[inaudible]

33:52.200 --> 33:53.880
[inaudible]

33:53.880 --> 33:56.520
[inaudible]

33:56.520 --> 33:59.080
And you can make them privileged by disabling them,

33:59.080 --> 34:02.840
and, to also answer your microcode question:

34:02.840 --> 34:05.960
usually most implementations do have these issues.

34:05.960 --> 34:08.200
And what we do is we trap them.

34:08.200 --> 34:09.720
And if someone tries, basically,

34:09.720 --> 34:11.880
we trap into higher modes and we decide

34:11.880 --> 34:14.360
whether this will be allowed

34:14.360 --> 34:15.400
to be executed as well.

34:15.400 --> 34:16.600
[inaudible]

34:16.600 --> 34:18.520
[inaudible]

34:18.520 --> 34:22.040
[inaudible]

34:22.040 --> 34:25.400
And also, the thing is, for example,

34:25.400 --> 34:27.560
in hypervisors, you can just

34:27.560 --> 34:30.280
[inaudible] the timer [inaudible].

34:30.280 --> 34:34.680
And for the serialization thing,

34:34.680 --> 34:37.880
[inaudible]

34:37.880 --> 34:39.480
[inaudible]

34:39.480 --> 34:42.920
[inaudible]

34:42.920 --> 34:45.480
[inaudible]

34:46.120 --> 34:46.600
[inaudible]

34:46.600 --> 34:48.360
[inaudible]

34:48.360 --> 34:49.000
[inaudible]

34:49.000 --> 34:49.560
[inaudible]

34:49.560 --> 34:52.360
[inaudible]

34:52.360 --> 35:14.640
[inaudible audience question]

35:14.640 --> 35:18.720
The question was whether we are engaging with the RISC-V security topic.

35:18.720 --> 35:24.240
So with the speculation barriers, we were engaging; when we posted that, there was quite a lot of

35:24.960 --> 35:29.360
interest in that, and we're also willing to push that further forward.

35:30.240 --> 35:35.920
I think the fence.t thing, we know that it exists, but I think that aims to solve a very different

35:35.920 --> 35:42.000
problem than what we want to have. I think we just want a very simplistic solution that does

35:42.000 --> 35:47.840
one thing that is required. And of course this is only a patch for one thing, but it's already

35:47.840 --> 35:51.200
like a requirement for certain patches to implement them.

35:55.040 --> 36:00.560
Sorry, I'm very sorry to wrap up but we have to take all these discussions offline because

36:00.560 --> 36:04.560
all the time is up. Sorry for that, but still, thank you for your attention.

