WEBVTT

00:00.000 --> 00:10.640
All right. I will go ahead and get started. I'm going to try to do this in about 15 minutes

00:10.640 --> 00:15.280
so that we've got plenty of time for questions and movement. But today, I'm going to be talking

00:15.280 --> 00:22.400
about confidential computing: recent past, emerging present, and long-lasting future. I am

00:22.400 --> 00:27.440
Sal Kimmich. I am the Technical Community Architect at the Confidential Computing Consortium,

00:27.440 --> 00:31.680
a sub foundation of the Linux Foundation dedicated to this technology.

00:33.760 --> 00:37.200
So I want to start with the past, where we're going to talk about kernel to confidential

00:37.200 --> 00:42.400
compute; the present, the remote attestation agendas that we're dealing with right now,

00:42.400 --> 00:48.560
confidential containers and CVMs, and then realizing the promise of open security for sensitive

00:48.560 --> 00:55.600
compute, which is really the promise that confidential computing makes. So what is the history behind

00:55.600 --> 01:00.560
confidential computing? And I apologize for these slides being a little bit smaller in text,

01:00.560 --> 01:07.840
but there's an interesting history here. So the concepts behind privacy are very, very old.

01:07.840 --> 01:12.960
They are hundreds or thousands of years old, depending on what culture you are looking at.

01:13.520 --> 01:21.360
But when we look at the intersection of privacy and compute, the first time that we really see this

01:21.360 --> 01:31.760
conceptually realized is in 1978, which was this write-up here, on data banks and privacy

01:31.760 --> 01:38.400
homomorphisms. Now to be clear, this was the same year that the Bee Gees' "Night Fever" was the top

01:38.400 --> 01:43.440
song of the year. So if you remember that song, that might bring you into a place of remembering

01:43.440 --> 01:50.560
that we did not have a lot of compute technology at this time. It's very conceptual. Now in 2009,

01:50.560 --> 01:58.240
we got FHE, and that allowed us to really unlock confidential computing. In 2015, we got our first

01:58.240 --> 02:04.160
TEEs out of Intel. We then got the Confidential Computing Consortium that I'm representing today,

02:04.160 --> 02:08.880
and we have had many players come on board since then in the 2020s and beyond.

02:10.160 --> 02:16.400
But I'd like to step this back to a slightly broader part of history, because I think it's absolutely

02:16.480 --> 02:22.160
essential to understanding confidential computing, especially now. I would like to ask this question real

02:22.160 --> 02:29.040
quick. For everyone in the room, can you raise your hand quickly? If this is your first time in this room,

02:29.040 --> 02:35.760
if this is your first time with confidential computing, great. That's why I'm giving the first

02:35.760 --> 02:44.400
talk of the day and making it as on-the-floor as possible. So in 1961, we had the first ideas

02:44.400 --> 02:53.920
of a secure kernel, and this came from the Ferranti Atlas computer. In 1971, Intel released

02:53.920 --> 02:58.720
its first chip. This is also the first time that we see a reference to the kernel generally

02:58.720 --> 03:05.680
in documentation, even though it is not often formally defined before this point. In 1975, AMD; and indeed,

03:05.680 --> 03:11.680
we got a bit of a gap: in 1985, Arm; in 1995, Nvidia. I think this is really, really interesting

03:11.680 --> 03:16.880
to show that no matter at what stage these corporations get involved and invested in the

03:16.880 --> 03:24.640
production of compute, they are all converging on confidentiality and security at this time,

03:24.640 --> 03:35.360
and I believe that's probably why you are in this room. Now confidential computing runs over a

03:35.360 --> 03:43.200
secure kernel. I used to work in a space where we were working on distributed surveillance

03:43.200 --> 03:49.040
algorithms, looking at the way that groups of people behave on the internet, and you can often

03:49.040 --> 03:56.720
use Linux kernel development, historically, to look at these algorithms and see what the value

03:56.720 --> 04:02.320
and the rate of information transfer was in a group of people leading to the successful release of a

04:02.320 --> 04:09.440
kernel. The only kernel where it is not worth doing that analysis for is a secure kernel. Why?

04:09.440 --> 04:16.000
Because it's incredibly boring. Because there are no opinionated statements put into the secure

04:16.000 --> 04:21.840
kernel. The secure kernel is based on a predicate calculus, so you're working only on mathematical

04:21.840 --> 04:27.760
proofs. These mathematical proofs observe an object and subject model, and if you're really curious

04:27.840 --> 04:33.360
about understanding the ways that confidential computing gets upstreamed into the secure kernel

04:33.360 --> 04:40.000
over which we operate, then we do have a kernel sig. And I would really recommend maybe not

04:40.000 --> 04:45.360
joining on the live calls, but looking at the recordings and starting to get an idea of how people

04:45.360 --> 04:53.600
communicate in the space. But it gives you an absolute guarantee. It also was the first time

04:54.240 --> 05:01.680
in computational history where we have a uniquely non-classified kernel.

05:03.680 --> 05:08.640
And this is the history that we're bringing with us into the future. There is no way,

05:08.640 --> 05:13.840
especially if we get into a post-quantum reality. There's no way of doing security by

05:13.840 --> 05:21.200
obscurity. You can only do security by an absolute transparency such that if I leave the instruction

05:21.200 --> 05:27.280
set on the floor and an adversary picks it up, they still are not going to be able to get into

05:27.280 --> 05:30.560
the system. And again, that is the guarantee that we try to make with confidential compute.

05:35.040 --> 05:41.520
So for a long time, I have been trying to figure out how we secure systems with a human

05:41.520 --> 05:48.960
in the loop, either a malicious human in the loop or an adversarial actor. But increasingly,

05:48.960 --> 05:53.600
and the reason why I brought these three books for you to flip through today is that I am now

05:53.600 --> 05:58.960
almost exclusively interested in understanding how we secure a system across human and

05:58.960 --> 06:03.440
machine, but primarily workload identity, because that is the strongest threat vector that we

06:03.440 --> 06:08.480
seem to be seeing. And the three books that I brought that you should flip through today,

06:08.480 --> 06:12.240
because they're going to give you three very different perspectives on the present moment,

06:12.240 --> 06:17.120
and they're going to help you to understand it compositely. So first, this book on deception.

06:17.120 --> 06:24.480
I love this book; it teaches you how to do modern honeypots. This is the one from 2013. This one

06:24.480 --> 06:29.680
terrified me, and is the reason why I decided to join the sub-foundation, because in its final chapter,

06:29.680 --> 06:34.080
instead of giving me the best advice, as it usually does, it says: we have observed

06:34.080 --> 06:39.040
that we are seeing so many bot-on-bot adversarial attacks that our traditional techniques

06:39.600 --> 06:45.920
of trying to bait a human being are absolutely irrelevant. That moved me on to read these two books.

06:46.000 --> 06:52.640
What are our strongest security approaches to this? This one is generative AI security. It's a pretty

06:52.640 --> 06:57.040
good book, but the one that you really want to read, if you are just getting started with confidential

06:57.040 --> 07:02.000
computing, or with this idea of security, trust boundaries, and isolation: I'd really recommend

07:02.000 --> 07:06.080
this one. This is from Mike Bursell, the Executive Director of the Confidential Computing Consortium. So go

07:06.080 --> 07:10.880
talk to them if you want to shortcut on any of your questions for this, but this is a really,

07:10.880 --> 07:17.840
really good book to start with. I wouldn't consider it your on-the-floor technical book.

07:17.840 --> 07:23.920
That you're going to get from kernel documentation, etc. But this is a very, very, very good,

07:23.920 --> 07:31.280
epistemological, deductive strategy for looking at how to understand systems. This is absolutely

07:31.280 --> 07:35.600
worth your time, especially if you're just getting started in security. I'd really recommend this

07:35.600 --> 07:42.400
as your base. Now, another one of the reasons why I'm really interested in investing in this

07:42.400 --> 07:47.360
is because of this paper that came out while I was at National Institutes of Health.

07:49.040 --> 07:57.120
We were unable to release a couple of large data sets in the Washington DC area, which is where

07:57.120 --> 08:09.440
NIH is based, because there had been a breach of Netflix data, real raw customer data, in a Netflix

08:09.440 --> 08:17.200
data set that was local to the DC area, and it was sufficient to have just that Netflix-breached data

08:17.200 --> 08:23.360
and the data sets that we were trying to produce, which were scrubbed to the federal scientific standard

08:23.360 --> 08:30.400
of privacy. But if you, in any way, join those with another data set, then you can use sparse vector

08:30.400 --> 08:35.920
mathematics, and you can absolutely identify an individual or a group of individuals, and look at

08:35.920 --> 08:42.880
those lower bounds. This is what adversarials do. This is what they calculate before they decide

08:42.880 --> 08:48.080
to try to break into your system, to see if it's worth their time. And what that means to

08:48.080 --> 08:53.600
me is that confidential computing and privacy enhancing technologies are only going to become

08:53.600 --> 08:59.360
more important. They are going to become more important because right now, the legislative and

08:59.360 --> 09:08.880
regulatory idea of what is confidential, of what needs to be held private, is absolutely insufficient

09:08.880 --> 09:14.720
if you're looking at the modern landscape of threats. It is true that there are many data sets

09:14.720 --> 09:21.600
sitting out there. Right? Your Netflix watching patterns can make you re-identifiable. So if

09:21.600 --> 09:28.960
those get breached, you could be in a bad situation. So I do think we're going to see a massive

09:28.960 --> 09:34.320
expansion of privacy enhancing technologies, and we're also going to see a bunch of composite threats

09:34.320 --> 09:39.280
come through in the next five years. Hopefully, very, very likely if you're sitting in this room,

09:39.280 --> 09:42.640
you're not going to be dealing with the breach side of it. You're going to be dealing in the

09:42.640 --> 09:50.800
solution space. So, in order to understand how we protect data, now that data is so enriched:

09:52.240 --> 10:00.960
there are three states in which data can exist: in transit, at rest, and in use. Confidential computing

10:00.960 --> 10:09.120
aims to protect data while in use. There are three threat vectors that this

10:09.200 --> 10:14.880
strongly protects you against. And I use these because I'm often having to communicate to people

10:14.880 --> 10:19.760
that are just starting to understand confidential computing and whether or not it's worth the money.
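The three states of data named above can be sketched in a toy example. This is a minimal, illustrative sketch in standard-library Python only; the "cipher" is a deliberate toy, not real cryptography, and all values are hypothetical. The point it shows: conventional encryption covers data at rest and in transit, but computing on the data requires plaintext in memory, which is the "in use" gap that confidential computing closes.

```python
# Toy illustration (NOT real cryptography) of the three states of data.
import hashlib
import json

def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode. For illustration only.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor_crypt(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice round-trips the data.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

key = b"demo-key"                      # hypothetical key
salaries = [72000, 65000, 88000]       # hypothetical sensitive records
plaintext = json.dumps(salaries).encode()

at_rest = xor_crypt(key, plaintext)    # state 1: encrypted on disk
in_transit = at_rest                   # state 2: same ciphertext on the wire
assert at_rest != plaintext            # ciphertext does not expose the records

# State 3, "in use": to compute anything, we must decrypt into memory.
in_use = json.loads(xor_crypt(key, in_transit))
print(sum(in_use) / len(in_use))       # processing happens on plaintext in memory
```

Without a TEE, that final step exposes plaintext to the host; with confidential computing, the same decryption and computation happen inside a hardware-isolated, attested environment.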

10:19.760 --> 10:24.160
Every single one of these is way more expensive if breached than investing in a year of confidential

10:24.160 --> 10:30.080
compute. So: insider threats during data analysis. It is really important to understand that

10:30.080 --> 10:36.240
we're isolating all the way down to the workload identity, such that if I have a power user

10:36.240 --> 10:43.680
who supposedly has access to all the data at all times, you have still only cleared specific

10:44.560 --> 10:51.680
individuals and specific workloads that can execute into that space. So even if they wake up on

10:51.680 --> 10:57.680
a bad day, even if someone steals their login information, this can be a setting with a couple of

10:57.680 --> 11:02.240
additional controls where you've got a very, very high bar for security, particularly if you're

11:02.240 --> 11:07.680
dealing in the GPU massive, massive data space where you can have a massive dump. Number two,

11:07.680 --> 11:13.440
compromised applications handling sensitive data. This is what we typically think about.

11:13.440 --> 11:17.600
This is your financial and health care data sets. It just has to do with the sensitivity of the data

11:17.600 --> 11:23.440
within them. And then number three, what we're saying much more of now, is confidential, multi-party

11:23.440 --> 11:27.840
data collaboration. I'm going to go a little bit more into detail on how and why this is done.
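The multi-party collaboration just mentioned can be sketched very simply. In this toy, the two parties' datasets, the workload, and all values are hypothetical; a plain function stands in for the TEE. The property it illustrates: each party's raw records never leave their boundary, and only the agreed aggregate result exits the enclave.

```python
# Toy sketch of confidential multi-party data collaboration.
# A plain function stands in for the TEE; all data is illustrative.

hospital_a = {"alice": 4, "bob": 7}    # private: visits per patient
hospital_b = {"carol": 2, "dan": 9}    # private: visits per patient

def enclave_workload(*private_datasets):
    # Inside the (simulated) trust boundary: full access to every
    # dataset, but only a non-identifying aggregate is returned.
    total_visits = sum(v for d in private_datasets for v in d.values())
    total_patients = sum(len(d) for d in private_datasets)
    return total_visits / total_patients

# Only the cleared workload executes over the combined data;
# no raw rows are shared between the parties.
print(enclave_workload(hospital_a, hospital_b))  # prints 5.5
```

In a real deployment, the parties would first remotely attest the environment before releasing their data keys into it; here that step is elided.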

11:28.080 --> 11:34.160
Let me go to our use cases. So the threat model that we are dealing with in confidential compute

11:34.160 --> 11:41.200
is very big. Right now, there's a lot of attention around regulated industries, but it is for both regulated

11:41.200 --> 11:48.080
and non-regulated industries because they share a growing composite threat. I do want to mention

11:48.080 --> 11:54.000
that there's a really, really interesting one at the edge of understanding privacy and compute:

11:54.000 --> 11:59.040
humanity is going to have a really, really interesting talk later today on how they're trying to

11:59.040 --> 12:06.480
get this done. So for those of us, again, about half of you are new to this dev room,

12:06.480 --> 12:14.000
how does the use of secure primitives enable confidential computing? So there are four

12:14.000 --> 12:19.120
security primitives involved in confidential computing. Confidentiality, I think that's pretty

12:19.120 --> 12:24.560
straightforward: protecting sensitive data from unauthorized access, even and especially during

12:24.560 --> 12:30.880
processing. Integrity: making sure that nothing has been altered. Attestation, which

12:30.880 --> 12:35.040
verifies the trustworthiness of computing, and we're going to go a little bit into depth on that.

12:35.040 --> 12:41.360
And very importantly, in this case, a hardware root of trust: a foundational,

12:41.360 --> 12:47.600
immutable hardware component that anchors security operations like encryption, secure

12:47.600 --> 12:54.880
boot, and system trust verification. And to really, really make clear how much that is essential

12:54.880 --> 13:00.720
to this: before this dev room was called confidential computing, it was called hardware-aided

13:00.720 --> 13:08.400
trusted execution. That is what we are doing today. So I wanted to add a little bit into

13:08.400 --> 13:13.120
remote attestation in confidential computing, but I do assume that if you are interested in this,

13:13.120 --> 13:18.800
that you attended or will look at the recordings for the attestation workshop that was yesterday,

13:18.800 --> 13:23.200
where a lot of real and good work was done, and there's going to be a dev room tomorrow,

13:23.200 --> 13:26.880
which you've already seen. You should absolutely attend. It's going to have a lot of the same

13:26.880 --> 13:33.200
faces on this room because they are so intertwined. So remote attestation is a security mechanism

13:33.200 --> 13:38.240
that's used to verify the trustworthiness of a remote system's runtime state. Our focus here

13:38.240 --> 13:44.480
is one-time verification, evidence-based trust, dynamic security, and secure communication.

13:44.480 --> 13:52.080
If the word attestation is new to you: in security, we also have supply chain security,

13:52.080 --> 13:59.280
and they will often throw around the word attestation. 100% of the time, they mean an SBOM

13:59.280 --> 14:05.520
attestation. Can I attest to my bill of materials that these are true for a moment in time?

14:05.520 --> 14:11.360
That is not this. This is a very technical definition for remote attestation,

14:11.360 --> 14:15.440
so, if you're talking to someone outside of CC, just make sure that you

14:15.440 --> 14:20.080
get the attestation definitions right off the bat, or you may be talking past each other.
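The supply-chain sense of attestation described above can be sketched concretely: a signed claim over a bill of materials, true for a moment in time, which is quite different from remote attestation of a running system. This is an illustrative sketch using stdlib HMAC as a stand-in for a real signature scheme; package names, keys, and fields are all hypothetical.

```python
# Toy SBOM attestation: "can I attest that this bill of materials
# was true at a moment in time?" (HMAC stands in for a signature.)
import hashlib
import hmac
import json
import time

sbom = {"packages": [{"name": "openssl", "version": "3.0.13"},
                     {"name": "zlib", "version": "1.3.1"}]}

signing_key = b"vendor-signing-key"      # illustrative only
payload = json.dumps(sbom, sort_keys=True).encode()

attestation = {
    "digest": hashlib.sha256(payload).hexdigest(),   # what was claimed
    "issued_at": time.time(),                        # when it was claimed
    "signature": hmac.new(signing_key, payload, "sha256").hexdigest(),
}

# A verifier holding the key can check the SBOM was not altered since issuance:
ok = hmac.compare_digest(
    attestation["signature"],
    hmac.new(signing_key, payload, "sha256").hexdigest(),
)
print(ok)  # True
```

Note what this does not do: it says nothing about what is running right now, which is exactly why the remote-attestation definition in this room is stricter.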

14:22.480 --> 14:28.560
Remote attestation, right? So we have an understanding of this, but I just spent an entire year

14:29.440 --> 14:36.480
listening to every single confidential computing consortium discussion on this topic,

14:36.480 --> 14:42.080
and this is my favorite quote. This is from November 4th from an engineer in the attestation

14:42.080 --> 14:51.120
SIG. Attestation gives you an organizational endorsement of proper governance by showing measurements

14:51.120 --> 14:57.440
are expected by the infrastructure. This seems to be the language that communicates this very clearly

14:57.440 --> 15:04.720
to non-engineers, and it's the one that I really like for myself. And in this, if you were at the

15:04.720 --> 15:09.200
workshop yesterday, this is something that we really really got into. This is not my area of

15:09.200 --> 15:16.080
extreme expertise. We have those experts in the room, but we have authentication, and we have

15:16.080 --> 15:22.160
attestation. These are both different and combined ideas. You can have the attestation key,

15:22.160 --> 15:29.600
the AK, and we have the TLS identity keys, but I really want to move us on because we need to be

15:29.600 --> 15:36.080
really focusing on a protocol integration that ensures attestation is integrated, but does not

15:36.080 --> 15:45.520
become a replacement for our understanding of authentication. And they are different things.

15:46.480 --> 15:50.560
Authentication confirms who is making the request, and this is at the TLS level.

15:50.560 --> 15:55.200
Attestation confirms where and how the request is made.
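That distinction, who versus where and how, can be sketched in code. This is a minimal illustrative sketch, not any real protocol: the fingerprint, reference measurement, and evidence format are all hypothetical stand-ins for a TLS identity check and a TEE evidence appraisal.

```python
# Sketch: authentication answers WHO is asking; attestation answers
# WHERE and HOW the workload runs. All identifiers are hypothetical.
import hmac
import os

TRUSTED_CLIENTS = {"client-cert-fingerprint-abc123"}   # known identities
REFERENCE_MEASUREMENT = "sha256:expected-tcb-hash"     # expected environment

def authenticate(fingerprint: str) -> bool:
    # Authentication (TLS-level idea): is this identity one we trust?
    return fingerprint in TRUSTED_CLIENTS

def appraise_evidence(evidence: dict, nonce: bytes) -> bool:
    # Attestation appraisal: evidence is fresh (echoes our nonce) AND
    # the reported measurement matches the expected reference value.
    return (hmac.compare_digest(evidence["nonce"], nonce)
            and evidence["measurement"] == REFERENCE_MEASUREMENT)

nonce = os.urandom(16)   # verifier-chosen freshness challenge
evidence = {"nonce": nonce, "measurement": "sha256:expected-tcb-hash"}

print(authenticate("client-cert-fingerprint-abc123"))  # True: known identity
print(appraise_evidence(evidence, nonce))              # True: trusted environment
```

A stolen credential passes the first check but not the second; a trusted environment running on behalf of an unknown party passes the second but not the first, which is why the talk insists they integrate without replacing each other.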

15:59.200 --> 16:04.960
A couple of the resources that I find interesting and useful here: there is attestation at the

16:04.960 --> 16:10.240
TLS level, and I believe some of the authors of this are in this room right now.

16:10.240 --> 16:15.120
There's the device attestation model for CC from Intel, which is worth reading as an overview.

16:15.680 --> 16:19.600
And the attestation dev room tomorrow, which we have already discussed.

16:22.720 --> 16:27.600
So attestation is one of the most critical primitives in confidential computing because it ensures

16:27.600 --> 16:34.640
that we can run workloads securely. But what about runtime environments? That is a really

16:34.640 --> 16:38.880
fun and interesting question that we are going to get answered for ourselves today, in a talk

16:38.880 --> 16:49.600
coming up at 1:10. So secure primitives ensure that we have the building blocks,

16:49.600 --> 16:56.160
attestation, encryption, secure storage, and that those are in place to support trusted execution

16:56.160 --> 17:01.680
and secure data management. And another really good resource for this is one of our open

17:01.680 --> 17:09.600
source projects, Keystone, which does this over RISC-V. And I know that we had the definition

17:09.600 --> 17:15.760
for TEEs before, but I'm going to borrow again from Mike Bursell's slides because this is the best

17:15.760 --> 17:23.760
visual that we have for this. There are three types of isolation. One workload from workload isolation.

17:23.760 --> 17:29.440
Number two, host from workload isolation. This is typically what you're going to be seeing if you're

17:29.440 --> 17:35.680
just throwing something up into the cloud. And number three is workload from host isolation.

17:35.680 --> 17:40.240
Now, this is not something that VMs and containers can traditionally do.

17:42.080 --> 17:51.360
That is why, within the CCC's definition of this, we absolutely rely on hardware

17:51.360 --> 17:58.080
attestation because it allows us to handle sensitive data and sensitive applications with the complete

17:58.080 --> 18:03.040
isolation down to Type 3, which implicitly includes Type 1 and 2. It's the highest level

18:03.040 --> 18:10.480
of security that we can provide for compute and data. And that very fundamentally is what we mean

18:10.480 --> 18:16.000
when you read the words that confidential computing is the protection of data in use by performing

18:16.000 --> 18:24.800
computation in a hardware-based, attested trusted execution environment. So we've got confidential computing

18:24.800 --> 18:34.960
and CVMs. So with CVMs, the emphasis is on providing a secure and isolated virtual

18:34.960 --> 18:42.560
machine that can run applications. Now, we see the major shift from TEEs to full confidential

18:42.560 --> 18:49.840
virtual machines, leveraging AMD's and Intel's chip bases. The key challenges that we're trying

18:49.840 --> 18:55.760
to fix here are measured boot attestation and memory encryption. A really, really good session

18:55.760 --> 19:01.600
right after this is going to demystify that for us. Now, you have to understand confidential containers

19:01.600 --> 19:07.200
as well, which are extending confidential computing into containerized applications. And they secure

19:07.200 --> 19:11.920
containerized workloads so they run in a secure, isolated manner while data is in use.
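Measured boot, named just above as one of the key CVM challenges, can be sketched as a hash chain. This is an illustrative sketch of the TPM-PCR-style extend operation; the stage names and the zeroed initial register are hypothetical stand-ins, not any particular platform's layout.

```python
# Toy measured boot: each boot stage is folded into a register with
# extend(old, new) = H(old || H(new)), so the final value commits to
# the entire ordered chain of measurements. Stage names illustrative.
import hashlib

def extend(register: bytes, component: bytes) -> bytes:
    # PCR-style extend: order-sensitive and append-only.
    return hashlib.sha256(register + hashlib.sha256(component).digest()).digest()

ZERO = bytes(32)                                   # register starts zeroed
stages = [b"firmware", b"bootloader", b"kernel", b"initrd"]

pcr = ZERO
for stage in stages:
    pcr = extend(pcr, stage)                       # measure each stage in order

# A verifier replays the expected chain and compares final digests:
expected = ZERO
for stage in [b"firmware", b"bootloader", b"kernel", b"initrd"]:
    expected = extend(expected, stage)
print(pcr == expected)  # True: measurements match the reference chain

# Any tampered or substituted stage yields a different register value:
tampered = extend(extend(ZERO, b"firmware"), b"evil-bootloader")
print(tampered == extend(extend(ZERO, b"firmware"), b"bootloader"))  # False
```

In a confidential VM, a value like this ends up inside the signed attestation evidence, which is what lets a remote verifier tie "what booted" to the hardware root of trust.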

19:11.920 --> 19:15.760
There's going to be so many good talks on this. But if you are really interested in understanding

19:15.760 --> 19:20.640
the build-up to confidential computing beyond today's talks, I absolutely recommend confidential

19:20.640 --> 19:29.680
computing demystified as well. A big part of confidential computing adoption

19:29.680 --> 19:37.040
is to allow for cloud-native workloads. And at 12:45, we've got yet another talk, on trust-no-one

19:37.040 --> 19:43.360
secure storage with CoCo, very worth your time. Now, for the most compelling use cases:

19:43.360 --> 19:47.040
when you look at the technology itself, when we're looking at the open source projects,

19:47.040 --> 19:52.960
I tend to like to break them down by chip. You're looking at either cloud environments, mobile

19:52.960 --> 19:58.000
IoT, or we're looking at the frameworks and structures that allow us to do distributed systems,

19:58.000 --> 20:06.080
These are slides from Red Hat's Axel Simon. These are excellent. So without

20:06.080 --> 20:11.120
giving you a specific use case, you can now understand that it's partner interaction: two

20:11.120 --> 20:16.160
protected data sets trying to interact. Secure cloud bursting, right? This is particularly

20:16.160 --> 20:21.600
for medical use cases right now, where they need to burst out with data that has to stay confidential.

20:21.600 --> 20:27.520
IP protection, edge use cases, tenant isolation when you've got multi-tenants in the

20:27.520 --> 20:34.560
workload space, and digital sovereignty. This slide I love, though, because this shows you

20:34.640 --> 20:41.760
the likelihood and distribution between bare metal and public cloud, which really does make sense

20:41.760 --> 20:46.560
depending on the use cases. If you have experience in this space, you understand that these also

20:46.560 --> 20:53.280
align to some security standards. To the actual use cases that I find very compelling,

20:53.280 --> 20:57.520
I'll go through these quickly because you can get access to them. I found the use case

20:57.520 --> 21:04.880
of using confidential computing to store and compute or execute over sensitive databases for

21:04.880 --> 21:10.320
human trafficking, very interesting, because it's the first time legally that we have been able

21:10.320 --> 21:15.680
to do so. Confidential computing provides such a high bar of controlling sensitive data,

21:16.720 --> 21:24.320
securing sensitive data that we were able to begin working across different locations on the earth,

21:24.400 --> 21:30.720
not sharing the data, but simply allowing workload execution into the local space without breaking

21:30.720 --> 21:36.720
any of the regulations without reproducing data sets, which would be harmful to reproduce. This

21:36.720 --> 21:42.080
is getting above and beyond typical regulation, meaning we are working in a paradigm shifting space.

21:43.440 --> 21:47.360
Another topic that's absolutely worth your time, you want to look at sort of the biggest scale

21:47.360 --> 21:51.360
and the biggest efficacy of this. It's going to absolutely be sovereign cloud. And a great

21:51.360 --> 21:57.920
example of this is Italy, Switzerland, and France using this to combine their efforts. These are

21:57.920 --> 22:03.280
three countries with very different privacy expectations, so it's very interesting to look at this.

22:04.000 --> 22:12.000
So, regulation: DORA is the first example of a clear, written requirement in which confidential

22:12.000 --> 22:17.760
computing fits all of the use cases. So in Article 8 paragraph 2, we want to ensure confidentiality

22:18.000 --> 22:22.880
in use. Those are the words that we need: confidential computing secures data in use

22:22.880 --> 22:29.120
and allows for isolated, protected processing, and it is the unique solution for securing the

22:29.120 --> 22:34.400
data mandate in this case. Other things that you must be paying attention to are the CRA,

22:34.400 --> 22:39.680
the IETF draft, which many people in this room have been working towards, and the AI Controls

22:39.680 --> 22:45.840
Matrix, which is coming out in a couple of months from the Cloud Security Alliance, and many

22:45.840 --> 22:51.760
of the CCC members are actively engaged in making sure that CCC is represented in that. You'll get

22:51.760 --> 22:57.760
the slides. But this really requires a composite understanding: you cannot understand regulation

22:57.760 --> 23:03.600
without understanding all of the regulation at once, so make sure to look at these. So: confidential computing is

23:03.600 --> 23:09.920
exploding, regulators are watching, and zero trust is evolving. And for everyone in this room,

23:09.920 --> 23:14.480
whether you've been putting years of work towards this, or you are just considering that this

23:14.480 --> 23:18.480
might be a place where you want to put your engineering time, understand how cool this is.

23:18.480 --> 23:25.040
This is not just about protecting data. This is about building compute, where trust is built in,

23:25.040 --> 23:30.720
and not added on. A couple of the things I think are very cool that are coming into this list. The

23:30.720 --> 23:37.280
future of all of this is here. It's in these sessions. And I think we've covered it all. This has been

23:37.280 --> 23:43.680
a quick intro to the CCC. If you have not: join the Linux Foundation generally, as a

23:44.480 --> 23:48.960
corporate member or as an individual; anyone interested in the internet can join our

23:48.960 --> 23:54.480
SIGs, our special interest groups. Some of you might find really, really interesting our

23:54.480 --> 24:00.400
attestation SIG; or, if you are generally more interested in how we're getting this moved

24:00.400 --> 24:08.000
into regulation, join the Governance, Risk and Compliance SIG. Again, this is a fully open institution.

24:08.000 --> 24:13.040
You can join any of these calls, and you can always look at the recordings, come talk to me

24:13.040 --> 24:20.320
if you want to learn more. Thank you so much.

