WEBVTT

00:00.000 --> 00:17.240
All right, so, please settle down. While more people are trickling in, we have

00:17.240 --> 00:28.080
Max already starting with the kernel SBOM. Hey, I hope, I think it's on, yeah, I

00:28.080 --> 00:34.200
am, hey, I'm Max, I work at an IT consultancy in Munich, Germany, doing some

00:34.200 --> 00:40.920
consulting stuff, and one part was working on creating an SBOM for the Linux

00:40.920 --> 00:47.760
kernel, with the goal to have that tooling be part of the kernel source so that it can

00:47.760 --> 00:54.600
always be generated automatically. The tool, the development part of the tool, is currently

00:54.600 --> 01:02.080
in this repository, and to generate an SBOM, you just take the source tree, the

01:02.080 --> 01:07.880
object tree or the build output, and it will generate an SBOM for you, and the vision

01:07.880 --> 01:12.760
is, once it's merged, once it's contributed, you just call make sbom, and you

01:12.760 --> 01:21.200
have the SBOM of the current kernel. But let me explain how we get the metadata or the build

01:21.200 --> 01:30.940
graph for that SBOM. The kernel build, so that's a specific thing within the

01:30.940 --> 01:41.760
kernel build, is that it generates .cmd files. These .cmd files here, they contain

01:41.760 --> 01:47.800
metadata about the build and about what happened, and we parse them, and an example for

01:47.880 --> 01:53.320
that is, so these are two examples, they either just contain the make command that

01:53.320 --> 01:59.840
was used for that specific target, or they even contain additional source and dependency

01:59.840 --> 02:11.320
information. If we look into that, that's an example, it's the vmlinux file, and

02:11.320 --> 02:19.720
the corresponding .cmd file is shown here, and you see it calls ld, it has a lot of

02:19.720 --> 02:26.120
.o files as inputs, and it builds, as an output, the vmlinux file, and from that we see, we

02:26.120 --> 02:33.560
can start constructing the graph of dependencies or file-level dependencies within the Linux

02:33.560 --> 02:44.040
kernel. Another example is, here, the kernel info.o, and that's one of the examples where

02:44.040 --> 02:51.000
the dependencies are already mostly extracted by the Kbuild build system, so it says,

02:51.000 --> 02:56.640
that's my source file, it's the .S file, and those are my dependencies, some .h files, so

02:56.640 --> 03:09.360
it's easier to extract the graph from the build, and additionally to these two cases, where

03:09.360 --> 03:15.360
we have the metadata, oh, I start slowing down, I see I was a little bit stressed in the beginning,

03:15.360 --> 03:23.960
so I'm now getting relaxed. Right, in addition to the .cmd files that we need, or that we have,

03:23.960 --> 03:30.040
where the data is already present, there are two cases where we need to fill some gaps,

03:30.040 --> 03:41.240
there are .S files (uppercase S) that contain .incbin, I think that's include binary, statements,

03:41.240 --> 03:46.920
these we need to parse by hand, and add them to the build graph that we are building,

03:47.000 --> 03:53.000
and these files that are included, then again, have .cmd files and we can follow the graph

03:53.000 --> 04:00.840
even further, and there are some other cases where we currently still have to hard-code the

04:00.840 --> 04:09.400
dependencies, so there are some gaps that we still need to fill manually, but we hope that

04:09.480 --> 04:18.520
we can improve the Kbuild scripts and other tools to fill the gaps and get rid of this annoying,

04:18.520 --> 04:26.280
ugly hard-coding at some places, but I always link the issues in the slides, if you download

04:26.280 --> 04:33.480
them online afterwards, so that's basically the first part of the presentation already done,

04:33.480 --> 04:40.920
I think I am too fast, but yeah, more time for questions, so we build a graph from the

04:40.920 --> 04:49.320
kernel build by looking through the .cmd files, these two cases that I have shown, we have the

04:49.320 --> 04:56.360
include binary statements and some hard-coding, and that results in this build graph, and

04:57.240 --> 05:03.560
we have built tooling, just to understand what's happening, that can visualize that. If you

05:04.120 --> 05:12.440
run the scripts, it's a JavaScript, zoomable visualization of the kernel build, and you can find out

05:12.440 --> 05:19.800
how it all relates, and we also worked on validating if it is complete, since we might miss some

05:19.800 --> 05:30.120
stuff, so we compared our data against the output of an strace'd kernel build, and saw that we

05:30.120 --> 05:39.160
have a 99.6% overlap, so it's still not 100%, so there are still some gaps, but the strace'd build

05:39.160 --> 05:48.600
also overreports some parts, so we are still moving towards 100%. And the second thing that we did

05:48.680 --> 05:54.680
is we just removed all files that are not listed and the kernel still built, so that's also a good sign.
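
NOTE
Editor's sketch (not from the talk): a minimal Python illustration of reading one .cmd file and adding its file-level edges to a build graph, as described in part one.
The sample text is a simplified stand-in for real Kbuild output; the savedcmd_/source_/deps_ field layout shown and the helper names parse_cmd_file and add_to_graph are assumptions made for illustration, not the tool's actual code.
```python
import re
# Simplified, made-up .cmd content for one object file (assumption, not real build output).
SAMPLE_CMD = r"""savedcmd_init/main.o := gcc -c -o init/main.o init/main.c
source_init/main.o := init/main.c
deps_init/main.o := \
  include/linux/types.h \
  include/linux/init.h \
"""
def parse_cmd_file(text):
    """Return (command, source, deps) from simplified .cmd file contents."""
    joined = re.sub(r"\\\n\s*", " ", text)  # fold backslash continuations into one line
    fields = {}
    for line in joined.splitlines():
        match = re.match(r"(savedcmd|cmd|source|deps)_\S+ := ?(.*)", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    command = fields.get("savedcmd") or fields.get("cmd")
    deps = fields.get("deps", "").split()
    return command, fields.get("source"), deps
def add_to_graph(graph, target, source, deps):
    """Record file-level edges: the target depends on its source and its headers."""
    inputs = ([source] if source else []) + deps
    graph.setdefault(target, set()).update(inputs)
    return graph
command, source, deps = parse_cmd_file(SAMPLE_CMD)
graph = add_to_graph({}, "init/main.o", source, deps)
```
A .cmd file that only carries the command (the vmlinux case) would return no source and no deps here; its inputs would have to be recovered by parsing the command string itself.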

05:59.560 --> 06:07.960
Yeah, that was part one, that's the data and how we get that, how we build the graph, and

06:08.040 --> 06:18.840
part two is what's to be generated from that data. We are generating SPDX files as SBOMs,

06:21.400 --> 06:28.680
I think we have heard a lot about SPDX today already, and matching what we have heard today,

06:28.680 --> 06:36.120
we are generating three different SBOMs: the output SBOM, I think that's not one of the listed

06:36.200 --> 06:43.880
names of types, but it's this single SBOM that describes the package, that contains the

06:43.880 --> 06:50.280
metadata and contains just the high-level information, it is very small and can be shared easily,

06:50.280 --> 06:56.200
it has some of the essential hashes, it has some of the essential metadata, and it's just the

06:56.920 --> 07:02.280
small thing that can be shared. We have the source SBOM that contains the file-level information,

07:03.000 --> 07:09.400
and we have the build SBOM that represents the whole graph that links the sources to the final

07:09.400 --> 07:17.240
output, and yeah, there's an edge case: to distinguish between what is a source and what is

07:17.240 --> 07:24.040
an intermediate file, we need to have an out-of-tree build so that these two parts are

07:24.040 --> 07:31.720
then in two different directories, that's how we distinguish them. So yeah, and just to have also

07:31.720 --> 07:40.520
a slide with way too small a font, that's how the internal structure looks. It's a lot, but I have split it up

07:40.520 --> 07:46.680
into all the individual pieces, and that's the remainder of my presentation just to explain how

07:46.840 --> 07:54.200
that all looks in detail. So the source SBOM that contains the file information, it's basically

07:54.200 --> 07:59.960
filled with the hashes of the individual files, some of the statically extractable information like

07:59.960 --> 08:08.600
SPDX license identifiers that are in the file, and we do some basic heuristics to guess what type of

08:08.600 --> 08:19.240
file it is, if it is a source file, if it is an asset, all these things we try to guess, and that's

08:19.880 --> 08:26.040
a sample entry of one of these files, just what I've said, machine-readable as JSON-LD,

08:27.640 --> 08:33.640
and the source SBOM is basically a huge list of these, listing exactly the used files,

08:33.720 --> 08:41.880
and they are linked to the licenses with the hasDeclaredLicense relationship, since this is part

08:41.880 --> 08:53.000
of the graph structure within SPDX. And on the other hand, on the other side of the spectrum,

08:53.000 --> 09:01.320
there's this output SBOM that contains the software package for the Linux kernel and for all the

09:01.320 --> 09:12.200
modules that were contained in the build, and these packages are linked to the files that

09:12.200 --> 09:17.960
represent the packages with the hasDistributionArtifact relationship, so that's the first bridge from

09:17.960 --> 09:24.440
the metadata to files in the file system, and these distribution artifacts are also the tip of the

09:24.440 --> 09:35.320
build tree in the end, and that's the high level build element, since we also

09:35.960 --> 09:47.160
generate the build structure, that's the top of the build tree, it describes the, so it's basically

09:47.240 --> 09:56.360
the entry point of the top-level build, but to go deeper, there's the build SBOM in between

09:56.360 --> 10:04.120
that describes all the real details: what command was used, which files depend on which

10:04.120 --> 10:10.440
file, so this then encodes the whole tree, and this is the middle file,

10:10.840 --> 10:20.200
and all these small builds that are listed, or the build objects that are just encoding a single

10:20.200 --> 10:28.600
command that was run during the build, they are linked with the ancestorOf relationship to the high-level

10:28.680 --> 10:42.840
build, and that's an example of that data, it's a build step, it has a comment that describes

10:42.840 --> 10:50.440
what was the command that was used, and it has two relationships which link the inputs on the

10:50.440 --> 10:56.520
one hand, and which link to the outputs on the other hand, so in the end it's a build element that

10:56.600 --> 11:03.640
has these two relationships, and that links files in the file system together, and that builds the graph,
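
NOTE
Editor's sketch (not from the talk): one build step modelled as a build element plus two relationships, one to its inputs and one to its outputs, from which the file-to-file edges of the graph follow.
This uses plain Python dicts, not the tool's real SPDX JSON-LD output; the field names and the relationship type names hasInput and hasOutput are assumptions chosen for illustration.
```python
import json
def build_step(step_id, command, inputs, outputs):
    """Return one build element plus the two relationships that tie it to files."""
    build = {"type": "Build", "id": step_id, "comment": command}
    has_input = {"type": "Relationship", "relationshipType": "hasInput", "from": step_id, "to": list(inputs)}
    has_output = {"type": "Relationship", "relationshipType": "hasOutput", "from": step_id, "to": list(outputs)}
    return [build, has_input, has_output]
def file_edges(elements):
    """Derive (input, output) file pairs from the elements of one build step."""
    ins = [f for e in elements if e.get("relationshipType") == "hasInput" for f in e["to"]]
    outs = [f for e in elements if e.get("relationshipType") == "hasOutput" for f in e["to"]]
    return [(i, o) for o in outs for i in ins]
# Hypothetical link step with two object files as inputs.
step = build_step("build-vmlinux", "ld -o vmlinux init/main.o kernel/fork.o", ["init/main.o", "kernel/fork.o"], ["vmlinux"])
edges = file_edges(step)
print(json.dumps(step, indent=2))
```
Chaining such steps, where the output of one step is the input of the next, gives the path from source files to the final artifact.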

11:06.760 --> 11:17.240
so that's the file structure, and that contained a lot of details that I just went over pretty fast,

11:18.200 --> 11:25.880
maybe there are questions in the end, but yeah, what's next with that project? The big goal

11:25.960 --> 11:36.920
currently is to get it into the kernel source to get it into a sub directory in the kernel sources,

11:36.920 --> 11:49.560
and make it part of that, there's currently a contribution in process, and if that is

11:50.520 --> 11:59.080
merged in the end, you can just, after a successful kernel build, call make spdx, no,

11:59.080 --> 12:02.600
make sbom, that's a typo, it should be make sbom, that is the command in the end,

12:03.720 --> 12:09.080
I can't fix that right now, but that's the command in the end,

12:10.280 --> 12:14.680
and that is also described in the email conversation that is currently happening, that's the

12:15.400 --> 12:21.000
current contribution that is in progress where we are discussing with maintainers,

12:21.880 --> 12:27.560
how to adapt, what to fix and how to get it aligned with the expectations,

12:28.680 --> 12:37.880
and we hope that we soon get the green light and that it gets merged, not sure in which

12:38.040 --> 12:45.240
release it will arrive in the end, and further next steps, we are interested in feedback,

12:45.240 --> 12:53.000
if you want to look at the output files, the CI is generating SBOMs for two or three example builds,

12:53.000 --> 12:59.160
that are uploaded as assets to the CI, so you can look at examples, you can investigate examples,

13:00.040 --> 13:10.600
we are thinking about broadening the support for architectures, since we haven't for all

13:10.600 --> 13:14.520
architectures analyzed what the gaps are, what we need to get to 100%,

13:15.640 --> 13:21.800
so there's more work to do, and we want to understand how this could integrate with other

13:22.600 --> 13:29.720
build systems, for example, if Yocto builds something containing the kernel and itself builds

13:29.720 --> 13:35.400
SPDX, how to bridge the gap, how to interlink them, and how to have them pointing at each other,

13:35.400 --> 13:42.120
I see a thumbs up, yay, and that's it. On the right, that's the QR code to the repository,

13:43.000 --> 13:49.960
and on the left that are my coordinates, you can reach out to me, and now questions,

13:51.800 --> 14:09.160
So BSI is asking for SHA-512 hashes, is it possible to potentially parameterize it so people can

14:09.240 --> 14:20.920
generate the 512 hashes automatically? We are using, so, we're using 256, yeah,

14:23.640 --> 14:29.160
for sure it's possible, we are using the native Python support for generating hashes,

14:29.160 --> 14:33.160
the whole library is built without any external dependencies, that was a requirement,

14:33.160 --> 14:41.000
that's why we use no other open source SPDX tools, but yeah, I think the support is there,

14:41.000 --> 14:47.160
we just need to do a minor switch, or we can add parameters, yeah?
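
NOTE
Editor's sketch (not from the talk): how a file hash can be parameterized with Python's standard hashlib only, so that SHA-256 or SHA-512 is a matter of one argument.
The function name file_digest and its parameters are hypothetical and not taken from the tool.
```python
import hashlib
import os
import tempfile
def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file with any algorithm that hashlib supports, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
# Demo on a small temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
sha256_hex = file_digest(tmp.name)
sha512_hex = file_digest(tmp.name, "sha512")
os.unlink(tmp.name)
```
Passing the algorithm name through keeps the no-external-dependencies constraint mentioned in the answer, since hashlib ships with Python.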

15:03.160 --> 15:14.920
So the question was, what do I think about incompatibility, compatibility

15:14.920 --> 15:25.000
around the ecosystem, so for example, if we go to Yocto and this Linux kernel build,

15:25.000 --> 15:39.640
or what ecosystems do you want to compare? From SPDX to CDX, for example, the conversion between

15:41.080 --> 15:51.160
SBOM ecosystems. I think it's a hard problem. I think, for example, the build tree that is

15:51.160 --> 15:56.520
encoded here, that has an ancestorOf relationship that links the high-level build and the

15:56.520 --> 16:03.480
low-level builds, which is a concept, I don't know whether there's something that it can

16:03.480 --> 16:15.160
compare to. It's a dependency in CycloneDX, but at the same time, tools that want to

16:15.160 --> 16:26.280
understand dependencies... okay, Anthony, your name was Anthony, right? Anthony said that it would

16:26.280 --> 16:33.000
be a dependency in CDX and you would need to add a comment or metadata to encode that it's

16:33.960 --> 16:43.320
basically a dependency of builds and other dependency of dependencies, so it's, um, there's a lot to do, yeah?

16:43.320 --> 16:51.320
Yes, the Linux kernel has put the SPDX IDs in the source code, yeah, and partially in this

16:51.320 --> 17:00.520
deprecated syntax, GPL-2.0 instead of GPL-2.0-only, or -or-later, so I try to

17:00.520 --> 17:11.320
convince them to correct that, because we don't care. But even on the SPDX, the question was that

17:11.880 --> 17:19.480
the Linux kernel still uses the GPL-2.0 identifiers, which are deprecated. It's deprecated,

17:19.480 --> 17:27.960
that's right, but they are still correct, valid. Yeah, but the kernel people want it to be short.

17:28.760 --> 17:38.040
Yeah, I can repeat the answer: the FSF requires the -only, the kernel wants to keep it short,

17:38.040 --> 17:44.280
but I'm still thinking that GPL-2.0 is still a valid identifier, deprecated, but still

17:44.280 --> 17:53.080
valid SPDX. Yeah, any other questions, in the back there? So, the architecture differences,

17:53.080 --> 17:59.400
what are the main issues for making this work for, say, PowerPC or RISC-V?

18:00.440 --> 18:06.120
It's this completeness analysis. What are the issues with getting it to work for PowerPC

18:06.200 --> 18:13.720
and RISC-V? The issues are that some of the tools behave differently. We have,

18:13.720 --> 18:21.800
I mentioned that we are parsing the commands, so this example here, so there's a command

18:21.800 --> 18:26.600
string that we are parsing. If there are different tools, we need to write different

18:26.600 --> 18:31.560
parsers that support these tools, that is one problem, and the second problem is that

18:31.800 --> 18:37.800
um, there might be more things to be hard-coded, there might be more edge cases to be supported,

18:37.800 --> 18:44.680
so the whole completeness analysis needs to be done. It works, but there's less of a

18:44.680 --> 18:49.560
guarantee that it's complete, and it might run into problems if commands cannot be parsed.

18:50.360 --> 18:54.200
The include binary things, and your assembly, are different in the other assemblers.

18:55.400 --> 18:57.400
So, thank you very much, Max.

19:01.560 --> 19:03.560
Thank you for giving us your work going.

