WEBVTT

00:00.000 --> 00:08.320
All right everyone, Gianluca Guida here is going to be talking about

00:08.320 --> 00:13.560
porting GGML to the NUX kernel development framework.

00:13.560 --> 00:16.320
Are we ready?

00:16.320 --> 00:17.320
Nope.

00:17.320 --> 00:18.320
Just staring around.

00:18.320 --> 00:19.320
There we go.

00:19.320 --> 00:20.320
All right everyone.

00:20.320 --> 00:25.600
Let's give Gianluca Guida a round of applause.

00:25.600 --> 00:35.600
Okay everyone, my name is Gianluca Guida.

00:35.600 --> 00:39.680
The title will make sense as I go along.

00:39.680 --> 00:45.880
About me: well, the older I get, the harder it gets to say what my specialization is.

00:45.880 --> 00:52.520
I've been working across various skills, but whatever I do, I tend to naturally drift towards

00:52.520 --> 00:56.840
the border between software and hardware.

00:56.840 --> 01:03.680
I'm currently working for a RISC-V startup; but, employers aside, my

01:03.680 --> 01:09.640
personal way to express my madness during the lockdown was building analog synthesizers.

01:09.640 --> 01:15.200
Well, it is my personal project, so it has nothing to do with my past or current employers.

01:15.200 --> 01:17.840
And so about this talk.

01:17.840 --> 01:25.760
So this talk is mostly, actually, about how to run GGML in a constrained environment.

01:25.760 --> 01:31.360
And given the title, I guess your question is: what is NUX? And that's the first thing I

01:31.360 --> 01:34.160
will try to answer.

01:34.160 --> 01:39.640
And then, once we've got what NUX is, we'll see that it is a constrained environment to run

01:39.640 --> 01:42.000
on.

01:42.000 --> 01:46.560
The kernel, the NUX kernel, is the privileged component, so, you know, it handles interrupts,

01:46.560 --> 01:51.120
exceptions and requests from the user space; and there's a proper user space, where usually

01:51.120 --> 01:56.640
you, you know, essentially have what people call programs.

01:56.640 --> 02:02.640
And yes, the requests are syscalls. And the model that I have at the start, when NUX

02:02.640 --> 02:08.360
boots, is that you just have one user space running, but of course the kernel can support

02:08.360 --> 02:12.720
creating new software, new programs, new address spaces, and so on.

02:12.720 --> 02:13.720
Right.

02:13.720 --> 02:15.920
So, what does APXH, the bootloader, look like?

02:15.920 --> 02:23.120
APXH, again: I wanted to make it portable, so the way I did it, I made it an ELF

02:23.120 --> 02:28.480
loader; so ELF is the binary format, and there are two ways to see an ELF.

02:28.480 --> 02:33.440
One is the link view of an ELF, which is based on sections, but then there's the

02:33.440 --> 02:38.600
loader view of an ELF, which is based on program headers.

02:38.600 --> 02:45.000
So what I do is that, when you load the kernel, there are special program headers that

02:45.000 --> 02:50.200
tell where the kernel wants things like the frame buffer or the information page or

02:50.200 --> 02:55.440
the memory map and other data, so it's completely separated that way. But it is also

02:55.440 --> 03:02.400
very portable: so what I have is the APXH library, and then I have various machine-dependent,

03:02.400 --> 03:06.640
platform-dependent functions, and then you can just link the library plus one of these, you

03:06.640 --> 03:12.680
know, gray boxes, which can boot via EFI or boot on SBI, and right now I'm targeting mostly

03:12.680 --> 03:20.600
riscv64, and it has also been ported to amd64. So, the second part: after I load the

03:20.600 --> 03:24.960
bootloader, I do have a kernel. The kernel: what is a kernel?

03:24.960 --> 03:32.200
Well, in the most abstract way, a kernel is an executable that is loaded by the bootloader,

03:32.200 --> 03:36.760
and essentially reacts to various events that happen in the system.

03:36.760 --> 03:41.280
So, in order to achieve that, and to make it as simple as possible to build a new kernel, because

03:41.280 --> 03:46.640
that was my goal, I have, of course, two libraries that are dependent on the platform:

03:46.640 --> 03:52.160
one is the HAL, a hardware abstraction layer, but it's mostly meant to boot up the

03:52.160 --> 03:57.360
system, so the bootloader essentially jumps into the beginning of libhal, and then it

03:57.360 --> 04:02.800
essentially abstracts the CPU away. And then there's the platform: the platform is the part

04:02.800 --> 04:09.680
that controls, mostly, the interrupt controller, and provides a timer, because this is

04:09.680 --> 04:17.880
usually a fundamental part. Again, it supports i386, 32-bit, and amd64, 64-bit, with RISC-V in

04:17.880 --> 04:26.200
both flavours; and, for example, the ACPI version for RISC-V is coming, but

04:26.200 --> 04:27.200
it's not ready yet.

04:27.200 --> 04:31.240
And then, of course, I have the portable part, libnux, which is the interface that

04:31.240 --> 04:36.320
the custom kernel code that you write interfaces with, and, you know, it provides all the

04:36.320 --> 04:40.120
stuff which you need when you're programming a kernel, like, you know, memory allocators,

04:40.120 --> 04:45.080
mapping, you know, the stack frames used to switch threads, and, of course, the page table

04:45.080 --> 04:50.280
functions to switch page tables and the global memory view, and, you know, panic,

04:50.280 --> 04:56.880
which is what you mostly hit when you run your own kernel. Okay, so what is it like to write

04:56.880 --> 05:01.000
a kernel, for people interested in NUX?

05:01.000 --> 05:06.720
Well, the way I see it, as I said before: essentially, it is kernel code, C programming

05:06.720 --> 05:11.040
in this case, but you could be more adventurous, that essentially reacts to various events.

05:11.040 --> 05:16.480
One is the timer; the others are the interrupt controller, exceptions, and requests from user

05:16.480 --> 05:20.920
space, which are syscalls. And, essentially, in order to write a kernel under NUX, you just

05:20.920 --> 05:26.160
need to implement these functions, and there you go: then you can just boot and print

05:26.160 --> 05:27.160
things.

05:27.160 --> 05:33.320
Of course, a kernel without a user space, you could call it a unikernel, is of little

05:33.320 --> 05:35.960
use, although this demo is all about that.

05:35.960 --> 05:41.440
And so, I also provide a standard way to program, at least, the beginning

05:41.440 --> 05:46.000
of a user program; and so, essentially, there's a libnux-user that just wraps the

05:46.000 --> 05:53.000
syscall mechanism that the kernel uses, and, yeah, that's actually what it does. Yes,

05:53.000 --> 06:02.520
as I said earlier, there's only one user space program at the beginning, and, yeah, and so, yes,

06:02.520 --> 06:05.600
of course, as I said, if you want to run multiple programs, the kernel, of course,

06:05.600 --> 06:10.520
needs to have a fork and an exec, or whatever equivalent you need.

06:10.520 --> 06:15.320
Right, this is important, despite being a minor detail, because it will be important for the

06:15.480 --> 06:16.320
rest.

06:16.320 --> 06:22.040
Last thing: there's one more component, and it's a libc, I call it libec, that is a C

06:22.040 --> 06:27.040
library that I cobbled together, mostly taking bits of NetBSD, and rewriting the stuff in

06:27.040 --> 06:28.040
a simple way.

06:28.040 --> 06:32.760
It's simply a very small libc, because, in order to create a binary, you usually need

06:32.760 --> 06:38.600
the C runtime, which is crt0, mostly, and crtend at the end for the various

06:38.600 --> 06:43.760
constructors, a libc, and that's it. So, you know, you have a main function; the main function,

06:43.760 --> 06:48.280
actually, is called by crt0, or the CRT in general, and then you usually have

06:48.280 --> 06:52.000
the various functions, like printf and stuff, that are part of the libc.

06:52.000 --> 06:55.840
And so, this is the part that you need to create a binary. And if you go back, all the

06:55.840 --> 07:01.520
things that I've explained, from the user space, to the kernel, to the bootloader

07:01.520 --> 07:07.280
itself, are all whole ELF binaries. So, this is the core, this is really the core,

07:07.280 --> 07:11.040
actually, of what makes NUX versatile and portable.

07:11.040 --> 07:18.760
Right. So, now that we have this idea of what NUX is, how do I actually run GGML on it?

07:18.760 --> 07:25.040
So, let's start with a look at what GGML is, at least in my view, in the most simple way.

07:25.040 --> 07:32.080
Right. The way I think of GGML, personally, and I'm definitely not an expert, is that, in order

07:32.080 --> 07:38.160
to build the most minimal GGML, which I have built, essentially, you can look at

07:38.160 --> 07:42.800
these components; I don't know what to call them, but this is how I abstract them. For one,

07:42.800 --> 07:47.680
there's definitely an OS-abstraction part of my model, which is something like ggml-

07:47.680 --> 07:53.600
time: all of the functions that actually abstract the actual low-level calls.

07:53.600 --> 07:59.680
Then there are utility functions; you know, there are a lot of functions to do

07:59.760 --> 08:05.360
file I/O, also for GGUF, for the file formats, for loading models and things like this.

08:06.320 --> 08:11.120
And then there are the functions with which you actually build the model, so you build the

08:11.120 --> 08:15.520
graph; then there's a threadpool implementation that is the one that actually starts

08:16.160 --> 08:22.640
running the compute tasks; then there are the GGML compute functions themselves. And I'm not talking

08:22.640 --> 08:27.680
about back-ends here, I'm just talking about, you know, CPU execution, so I'm not talking about back-end

08:27.760 --> 08:32.240
stuff like this. But, in general, this is the model that I have for GGML and that I'm going to use.

08:32.240 --> 08:38.880
Right? So let's have a look at the software side of it: what are the dependencies?

08:38.880 --> 08:44.960
Well, GGML is written in C, at least this part of it is in C, and in a very minimal set of C

08:44.960 --> 08:50.880
plus plus, luckily. And so usually this is what you need when you try to compile GGML:

08:51.840 --> 08:57.440
there is some standard C; like, there's definitely a libc, that's of course unsurprising;

08:57.440 --> 09:02.880
mathematical operations, and usually the libc doesn't implement these floating-point

09:02.880 --> 09:08.640
operations, there's a separate standard library called libm that you need; for the C plus plus part,

09:08.640 --> 09:12.800
you usually need the standard C plus plus library and the C plus plus runtime;

09:14.160 --> 09:19.840
the threadpool uses pthreads; and actually, as I will say later, it's quite difficult

09:19.840 --> 09:28.640
to separate the two of them. And then ggml-time uses POSIX, so, you know, the more Unix-

09:28.640 --> 09:33.920
specific way of doing things, like CLOCK_MONOTONIC, which goes back to the POSIX standard.

09:34.800 --> 09:41.120
The good news that I had, that made this port possible, was the fact that the subset

09:41.120 --> 09:46.080
of the C plus plus standard library used is very small; it's mostly basic containers,

09:46.160 --> 09:49.600
you know, it doesn't get into the crazy stuff that C plus plus can give to programmers.

09:50.880 --> 09:57.760
Right, so what does it take to port GGML to NUX? Let's have a look again at, you know,

09:57.760 --> 10:03.040
how the software of NUX is architected: what do you get when you're actually running a system

10:03.040 --> 10:09.680
under NUX, after you've booted? Well, we support SMP, so we have multiple CPUs,

10:09.760 --> 10:17.280
and in modern architectures, after i386 let's say, the focus is mostly on two privilege levels.

10:17.280 --> 10:21.280
One is the kernel, and the other is the user. And, as I said,

10:23.200 --> 10:29.680
when we boot NUX, when the kernel comes up, NUX will create,

10:30.480 --> 10:35.840
well, we will have a kernel image, usually mapped on each CPU, each being the same.

10:36.800 --> 10:42.480
And, of course, you can run different code per CPU, you can specialise it, but at the beginning,

10:42.480 --> 10:47.040
and this is true of the whole setup, you can be sure that each CPU has the same address map everywhere.

10:48.880 --> 10:55.520
Right, so, I found myself in this situation: I had GGML and NUX, and said, how would I like to run this?

10:56.240 --> 11:02.720
And this layout is incredibly flexible, because you can do everything, and so, I wanted to create

11:02.720 --> 11:07.680
a compute platform, because I just wanted to say, okay, let me see what I can get, if I can run it today.

11:07.680 --> 11:14.960
So, I decided that I actually liked the idea of dedicating entire CPUs that run uninterrupted

11:14.960 --> 11:22.800
inside the system; so, instead of having threads that get created and scheduled, I can just simply

11:24.320 --> 11:29.440
dedicate entire CPUs as compute threads. And then, of course, it would not be very useful

11:29.440 --> 11:32.800
if you have this machine that boots and starts doing its calculations without the ability

11:32.800 --> 11:37.200
to communicate or to save things. So, I can use the bootstrap CPU, which

11:38.480 --> 11:43.600
most of the time is, you know, simply the CPU that starts first in the system, to have a

11:43.600 --> 11:50.320
system interface; it can be the most minimal one you can think of, or whatever you want to run.

11:51.360 --> 11:55.520
A unikernel would be a good fit here, because essentially you just run one thing.

11:56.240 --> 12:02.560
And so, this is the plan, this is the model that I want, for this approach, for porting GGML

12:02.560 --> 12:10.240
onto NUX. Right, and so, here's the thing. So, I'm going to look at

12:10.240 --> 12:16.480
a compute CPU. As I said, we have kernel and user space, and there are two ways to run things.

12:17.120 --> 12:23.520
One: I can just directly run GGML, and whatever does the actual compute work, in kernel mode,

12:23.520 --> 12:29.040
so in the highest privileged mode, which in the NUX model actually means that

12:29.040 --> 12:34.800
it will not be interrupted, ever, not even by interrupts. Or, I can do

12:34.800 --> 12:41.120
something which is, you know, more sane, usually, which is a kernel that is just compute

12:41.120 --> 12:47.760
support, and then the GGML compute runs in user space, so that it can request stuff, but it

12:47.760 --> 12:54.800
can be interrupted. So, usually, the reason to run things in user space is that

12:54.800 --> 12:59.760
if there's corruption in user space mode, usually the kernel is not affected, so the

12:59.760 --> 13:05.520
stability of the system is not affected. It's also easier to port libraries,

13:05.520 --> 13:10.800
because you can just compile way more stuff. But having the compute support implement the

13:10.880 --> 13:17.360
syscalls needed means there's a price to pay, paid in the interruptions and also the fact that,

13:17.360 --> 13:21.200
you know, for some privileged operations you need to do the syscall back and forth.

13:22.080 --> 13:28.480
And since this was a demo, I decided to go for the first option, and so to run everything in the

13:28.480 --> 13:35.920
kernel and see what happens. So, this is what, in the end, the software stack looks like:

13:36.880 --> 13:42.160
we've got GGML first; of course, we need to have libm because, as I said, libec doesn't cover it.

13:43.920 --> 13:49.120
Then libnux-compute is the first library that I wrote there; it is on top of libnux,

13:49.120 --> 13:54.720
and is essentially the part that allows me to allocate threads just by scheduling a CPU:

13:54.720 --> 13:59.360
the CPUs are waiting, and when I need a compute thread, essentially libnux-compute

13:59.360 --> 14:04.160
says to a CPU, okay, now start; and of course it has stop, and it implements the whole condition-

14:04.160 --> 14:09.360
waiting for the various CPUs, so essentially it kind of emulates a threadpool. And then

14:09.360 --> 14:15.760
I had to write something that is awful to look at, but it works, which is libggml-nux,

14:16.400 --> 14:21.360
which is where, well, the dirty work is done; everything that was actually needed and was missing

14:21.360 --> 14:26.720
is put there. One part is the C++ runtime, which is some selected C++ runtime functions that

14:26.800 --> 14:39.520
we need to implement for the C++ code to compile and run. Okay, yes, so you need that,

14:39.520 --> 14:45.440
and then you need additions, essentially, to libec, because libec is minimal; so, you know,

14:45.440 --> 14:50.480
sometimes you need, sometimes you need things like qsort and stuff like this in GGML,

14:50.480 --> 14:54.800
and so I said, okay, instead of adding to libec, I'm just going to add another library that just

14:54.800 --> 15:00.960
includes the missing pieces, and so I did that.

15:00.960 --> 15:06.160
And then, of course, I had to map pthreads onto libnux-compute, and I implemented ggml-time

15:06.160 --> 15:14.640
directly on top of the libnux timer. So, putting it all together, this is what I did;

15:14.640 --> 15:21.280
this is what the system looks like right now. So, right now, in order to test this,

15:21.280 --> 15:26.800
as you can see, I have all of GGML running in the kernel, on the compute CPUs,

15:26.800 --> 15:32.640
and I needed a quick demo, so I took the GPT-2 example; and, of course, the model is loaded in memory

15:32.640 --> 15:39.280
already, so I could just run it directly, booting from the loader. Yeah, you can find the code there.

15:39.280 --> 15:44.800
Originally, I was trying to port pthreads directly to NUX; then I said, okay, you know what,

15:44.800 --> 15:49.840
I can actually do it directly by implementing the threadpool, it's going to be simpler. And it was,

15:50.080 --> 15:56.240
and yeah, so it's a prototype; I just wanted to say, look, we can do it, which was my goal,

15:56.240 --> 16:02.960
and also to learn what it takes to run GGML in a very minimal environment, what you need

16:02.960 --> 16:08.960
to look at in the various dependencies in order to understand it. So yeah, the code is there

16:08.960 --> 16:13.840
for anyone who wants to read it. And the final consideration: well, honestly,

16:13.840 --> 16:19.920
porting it was much, much easier than expected. When you see that there is C++ code,

16:19.920 --> 16:24.480
usually you do not expect that, but actually, the code is very sane, and

16:25.360 --> 16:29.840
it's still very, very viable to port it to various different environments.

16:31.360 --> 16:35.840
The only things that I would ask for are simple modifications, but unfortunately

16:35.840 --> 16:40.800
they touch the core architecture: for example, ggml.c contains everything from

16:40.800 --> 16:45.600
file I/O to the various critical-section definitions, and separating those into different files would

16:45.600 --> 16:51.280
be very nice; it was very difficult, so that's why I had to learn the

16:51.280 --> 16:57.840
internals to do the various bits, to implement the threadpool. It would be very nice if the threadpool

16:57.840 --> 17:03.840
were just linked in from a separate file; you could just change it. And yeah, and again,

17:03.840 --> 17:10.560
the critical sections are defined in ggml.c, and I would need to do some ifdefs to define

17:10.640 --> 17:15.120
all those things, like the POSIX threads interface. And that's pretty much it; this is where you

17:15.120 --> 17:18.400
can find more information about it. That's really all, thank you!

