WEBVTT

00:00.000 --> 00:15.600
Okay, hi, hi everybody, we're approaching the end, and today I'm going to update the

00:15.600 --> 00:23.600
talk I did back at the Linux Plumbers Conference in Tokyo; I did a similar talk

00:23.600 --> 00:28.480
there. I think most of you here in Europe probably didn't attend that conference, so I'm going

00:28.480 --> 00:36.640
to repeat it, although it's online, and I'll also give some updates. That talk was two months

00:36.640 --> 00:42.240
ago, in December, and since then we've made a lot of great progress.

00:42.240 --> 00:49.360
So today's talk is all about how we get RISC-V upstreaming

00:49.360 --> 00:58.120
to be better than any other ISA, and trying to match x86 as much as we can.

00:58.600 --> 01:04.920
And I'm Yuning, I'm the founder of DeepComputing, but actually I'm a software guy; before,

01:05.800 --> 01:13.160
I made my money from software. I started as a VM guy and a compiler guy; I ported most of the VMs

01:13.960 --> 01:21.720
back in the old days, from Java to JavaScript to Go; whatever the VM, I ported most of the stuff,

01:21.720 --> 01:27.240
and I also started as a compiler guy; back in the old days I was familiar with the MIPS compiler,

01:27.480 --> 01:32.280
Open64. So I'm the founder of DeepComputing; I founded the company, a hardware company, in

01:32.280 --> 01:39.240
COVID times. And I'm a very serious software guy; I understand what Alan Kay says, that people who are really serious about software should make their own hardware. So I

01:39.240 --> 01:44.840
made my money from software, and I'm losing it all in hardware. It's very good.

01:47.240 --> 01:53.640
So I made laptops, RISC-V laptops. This is 2023, and we made the most expensive

01:53.640 --> 02:02.840
RISC-V laptop in the world; it's now in museums. It was $5,000, and no, we didn't make many,

02:03.880 --> 02:12.360
only 100 pieces. And then in 2024, I made a very affordable

02:13.160 --> 02:17.240
second-generation laptop with eight cores, and that sold out as well.

02:17.240 --> 02:23.560
And it was the first time ever that Ubuntu and Fedora officially supported it,

02:24.600 --> 02:29.800
and that's the beauty of RISC-V. After a while, I started thinking: I only care about

02:29.800 --> 02:33.480
RISC-V. I don't care about the battery, I don't care about the screen, I don't care about everything else,

02:33.480 --> 02:40.360
the keyboard, the rest of the hardware. So when I do laptops, I lose so much money on the

02:40.360 --> 02:45.800
rest of the things that have nothing to do with RISC-V. So I started working with Framework; they have

02:45.800 --> 02:50.200
a booth there, so you can see: they have a laptop that supports multiple architectures. I think you

02:50.200 --> 02:54.680
guys should know about it: if you want to do kernel work, you can port to all the different architectures, from

02:54.680 --> 03:00.120
x86 to ARM to RISC-V, in one machine. You can support all the architectures with different

03:00.120 --> 03:07.160
motherboards, so do take a look. And although it has nothing to do

03:07.160 --> 03:11.800
with the kernel, I'll mention it anyway: basically, it's very good for our European culture,

03:11.800 --> 03:18.440
so it's DIY, fix it yourself, it lasts forever, very environmentally friendly. And then I did the first

03:18.440 --> 03:26.360
RISC-V Framework motherboard for them; that one was around 150 dollars, with 8 gigs,

03:27.160 --> 03:35.240
but now it's probably about 200 dollars, because DDR prices went crazy.

03:35.240 --> 03:43.880
The board with 64 gigs of DDR has doubled in price now, so whoever bought the first batch

03:43.880 --> 03:52.280
already got rich. And now we're two months ahead of mass production,

03:52.280 --> 03:59.080
so now we have a 16-core chip running at over two gigahertz, and it's a pretty usable RISC-V now, so everyone

03:59.080 --> 04:03.800
who wants to work on the kernel should help on the RISC-V kernel, because I think

04:03.880 --> 04:09.480
the RISC-V kernel needs more help than any other architecture. For ARM,

04:09.480 --> 04:16.040
they've got a lot of money, ARM has a lot of money, they have a lot of people working on the kernel,

04:16.040 --> 04:23.560
but for RISC-V, we don't. So now, back to the slides: let's talk about why RISC-V

04:23.640 --> 04:31.000
needs upstream mainline, and the differences. For ARM, whatever ARM SoC comes out,

04:32.280 --> 04:38.200
they will sell millions, so they don't really care about mainline. Why should they care,

04:38.200 --> 04:47.320
right? Their SoC volume is there. But for RISC-V, it's really bad, because for now,

04:47.320 --> 04:52.440
on RISC-V the whole software stack, from the kernel to the toolchain to the upper stack,

04:52.440 --> 04:58.760
whatever, is not optimized, because the hardware is not there yet, so the software can't catch up.

04:58.760 --> 05:07.160
So for us, looking at hardware, I have a very funny experience: three years ago,

05:07.160 --> 05:13.560
my 4-core machine running Debian was very slow, but now, running Debian 13,

05:13.640 --> 05:21.240
it's much faster. It's bizarre, right? The reason why is because

05:22.520 --> 05:29.240
RISC-V software keeps getting better and better. So that means for RISC-V hardware,

05:30.680 --> 05:37.960
it has to have a very long silicon lifetime, so that demands that upstream mainline support

05:38.040 --> 05:45.160
lasts a long time. That's why I would say that, compared to ARM,

05:46.280 --> 05:51.000
RISC-V desperately needs upstream mainline, more than any other architecture.

05:52.680 --> 05:58.920
And the other thing is that all the SoC guys, including myself, since I'm making a motherboard,

05:58.920 --> 06:04.600
have to guarantee mass production, so I'm only allowed to work on LTS,

06:05.560 --> 06:13.000
because if I use the upstream kernel, mainline, I have no idea where the problem is; it's so freaky.

06:14.840 --> 06:20.760
If you give me 6.20, which is not LTS, I'm scared,

06:21.560 --> 06:29.640
because I have no idea about the hardware. So that's the hardware problem; I'm saying

06:29.640 --> 06:34.520
that we have to work on LTS, because at the same time, working on LTS guarantees mass-production quality,

06:35.160 --> 06:42.680
and then we're off track, we're off mainline. I remember I gave my second ROMA

06:42.680 --> 06:49.800
laptop to Greg KH and asked: can you help me upstream everything? He said,

06:49.800 --> 06:56.120
you have 100, or what, 60,000 lines of code changes; no chance.

06:57.400 --> 07:04.920
Even he can't do that. But anyway, that is the pain of it. So

07:04.920 --> 07:11.640
I think there are two methodologies, and we got the concept wrong, because for all of us SoC guys,

07:11.640 --> 07:17.000
the mass-production guys, we still inherit a lot of habits from ARM,

07:17.400 --> 07:26.200
where we work on LTS. But in fact, we should work the same way as x86: we work on

07:27.000 --> 07:34.360
mainline and backport to LTS. That is the wrong thing we have done. So, let me give you

07:34.360 --> 07:42.920
our status. First, the first-generation motherboard status, and it's very funny;

07:43.000 --> 07:52.360
I'm not sure you guys can see it on this old PowerPoint. In 2018, we got the IP,

07:52.760 --> 07:59.400
and then four years later we got it into the SoC, and then it took another two years

07:59.400 --> 08:06.360
to get into my motherboard, so that's six years. But take a look at

08:06.360 --> 08:16.200
the upstream progress: our SoC first went upstream back in 2023. It's painful, it's slow.

08:17.000 --> 08:23.000
And according to Greg KH, that's the wrong thing to do. This is how it should be done: upstream

08:23.000 --> 08:32.200
to mainline from the FPGA evaluation and verification stage. The main patches are still ongoing, including the

08:32.280 --> 08:39.080
GPU-related stuff. We know, right, Imagination: we know we can't solve the GPU saga; Imagination can,

08:39.080 --> 08:47.480
no one else can, and that's it. And another thing is that we

08:47.480 --> 08:54.520
lacked upstreaming experience, so it took 10 rounds of ping-pong for a single patch.

08:55.240 --> 09:04.680
Okay, so for this first motherboard, that's pretty much the upstreaming status. And then the second one

09:04.680 --> 09:13.800
is even more awful: the CPU-IP-related mainline support is not even done yet, even though the IP was

09:13.880 --> 09:22.440
introduced in 2020. So that's the wrong thing to do. Okay, so how are we going to do better? I

09:22.440 --> 09:31.640
forced all my SoC guys to do more upstreaming, so that RISC-V does better, and Greg says that as

09:31.640 --> 09:44.520
well. This is Greg's work from back in 2008, which says that you need to upstream

09:44.520 --> 09:53.800
stuff from the simulation stage, right? That's the x86 way; Intel did it. So it's actually 15

09:53.880 --> 10:04.280
or 20 years of experience. So how do we do it? Basically, I forced my

10:04.280 --> 10:11.080
motherboard SoC guys: I said you have to upstream before the chip comes back.

10:13.960 --> 10:21.480
And actually it's scary, because all our engineers, even the experienced ones, come from ARM culture;

10:22.280 --> 10:30.920
on RISC-V, they had never done it before. And there were a lot of other things as well: even the maintainers

10:30.920 --> 10:38.040
were concerned: you don't have hardware, so should I allow you to submit it? So that's a problem.

10:43.880 --> 10:49.800
As for the approach, we do the simple ones first, potentially the majority of the CPU-IP-related parts

10:49.800 --> 10:56.200
first, on FPGA before silicon, and then we go from easier to difficult. That's the way we did it on the K3.

10:57.640 --> 11:05.640
So how's the progress? We did meet a few surprises during the process, before

11:05.640 --> 11:10.360
and during the FPGA verification, and in the three months after tape-out before the chip comes back, there were

11:10.360 --> 11:18.600
a lot of surprises. First thing: we realized that once people look at the code in the

11:18.600 --> 11:23.720
kernel, they know straight away what the problem with the code is, and with how we designed it.

11:24.840 --> 11:32.040
We call it a little asymmetric; that means it has mixed clusters,

11:32.120 --> 11:44.040
different kinds of cores with different RVV, and AI custom instructions in one of them.

11:44.040 --> 11:48.760
So from the software point of view, I still remember the email I got saying it's all

11:48.760 --> 11:55.720
screwed up: you should have told me about this diagram nine months ago. Basically, it's very painful to get

11:55.720 --> 12:02.600
upstream and to get software support. And besides that, we have some DMA zone issues as well.

12:02.600 --> 12:09.560
And the other thing is that when we started the pre-silicon upstreaming, we did

12:09.560 --> 12:15.320
share a lot of code with all the community guys. Initially, I invited people to a private

12:15.320 --> 12:19.640
repository to look at our code; we were scared of disclosing the hardware information

12:19.640 --> 12:25.240
before the product announcement. But in the end, I managed to convince the SoC

12:25.240 --> 12:30.360
guys that we have nothing to lose, it's bad enough, let's open it up and let the world see it.

12:31.400 --> 12:40.120
That's why we open-sourced everything: the first ever to open-source everything before the chip came out.

12:41.160 --> 12:47.160
And then we did make a few wrong moves. With the FPGA target, we picked the wrong target; we split up the

12:47.320 --> 12:53.400
patches, and people rejected them; and then we didn't do most of the testing,

12:53.400 --> 13:01.480
we just did upstreaming without testing it. So it's very brave, and yes, our reputation was

13:01.480 --> 13:08.280
ruined, but according to Greg, there's nothing wrong with that; we just fix it if there's a bug, right? Nothing

13:08.280 --> 13:15.000
to be ashamed of. So that's where the progress is. So what can we improve? And

13:15.000 --> 13:22.760
I would say that we can still do better, especially on all the CPU features; we can do a lot

13:22.760 --> 13:27.960
better. I forgot to bring the board, so feel free to come down to our booth;

13:27.960 --> 13:34.360
we already have the latest chip working on the manufacturer's sample board, and we have still

13:34.360 --> 13:39.800
found some major CPU features that don't work, for example the hypervisor. Actually, we should have

13:39.800 --> 13:45.240
turned on all that software back in the FPGA stage; we hadn't done all the verification. Also,

13:45.240 --> 13:50.840
we have seen that some AI features still aren't working;

13:50.840 --> 13:57.000
we should have turned those on back in FPGA verification, and upstreamed them at that time;

13:57.000 --> 14:01.560
then we'd pretty much have everything ready. But we haven't, so there are still a lot

14:01.560 --> 14:11.480
of things to improve. And of course, now we've learned: for an SoC, you need consistency; reuse as much

14:11.480 --> 14:18.600
as you can. So for SpacemiT, the K3 and K1 share the majority, 60 or 70 percent,

14:18.600 --> 14:26.600
of the same IP, so for all the upstreaming effort on K3, we reuse most of the upstream

14:26.600 --> 14:36.440
effort from K1 anyway, so it's much faster. So that's all for today's talk.

14:36.440 --> 14:44.280
This is a first ever; I think we have done better than anyone so far, including Qualcomm, and we'll continue

14:44.280 --> 14:50.680
to do better for the next chip, the next server chip, because it's the same IP in both. And we

14:50.680 --> 14:56.600
look forward to your help; we especially need it in RISC-V, there are not enough resources.

14:58.840 --> 15:10.120
Thank you, thank you guys. Yeah, you can go to GitHub and take a look at our upstream

15:10.120 --> 15:15.720
patches. They weren't done by me, they were done by the SoC guys, and they did great.

15:16.040 --> 15:21.400
Any questions?

15:21.400 --> 15:38.600
Yeah, hi, thank you for your talk. Regarding upstreaming: the datasheets of all these processors

15:38.600 --> 15:43.720
that you're interested in getting upstream, are they easily available for download?

15:44.280 --> 15:51.720
Yes, we're aware that if we want to get upstream, we have to release all the datasheets, and that

15:51.720 --> 15:59.880
includes our schematics. All right, thank you. We have to do better than now: I have no

15:59.880 --> 16:02.920
CPC, we should have no CPC.

16:07.160 --> 16:07.960
Other questions?

16:13.400 --> 16:21.240
So, do go and look at the upstream kernel log; we did pretty well, actually. The SoC guys did about 10

16:21.640 --> 16:27.960
patches before the chip came back; they worked a lot.

16:30.120 --> 16:38.360
Not really a question, but more of a comment: we have the TH1520, we have it at scale at

16:38.360 --> 16:43.560
Scaleway, and we have to maintain an old fork of the kernel because of that, because of the lag in

16:43.640 --> 16:51.560
upstreaming. So I think upstreaming to mainline is something that will benefit all of us.

16:55.960 --> 17:00.440
Definitely, we will upstream as much as we can, except the GPU; the GPU, I can't deal with right now.

17:02.760 --> 17:10.040
I have no experience working on a CPU, but from working on a device, I've had quite a

17:11.000 --> 17:17.160
good experience implementing it in QEMU and doing a software simulation, and then we had the

17:18.360 --> 17:25.480
driver stuff and so on in pretty good shape when we got the hardware. I don't know how it works

17:25.480 --> 17:30.840
out for RISC-V; probably it's too complicated. I think it's very complicated. I think using a

17:30.840 --> 17:36.440
simulator is what I usually do, since I'm a VM guy: simulate first, before I touch the hardware,

17:36.520 --> 17:43.400
otherwise it's too painful. But we think that because there are so many peripherals, we'd have to code all of that,

17:43.400 --> 17:45.000
and that's too much.

17:45.000 --> 17:52.920
So the only ones we can emulate are the majority of the CPU features: the hypervisor,

17:52.920 --> 17:57.880
blah, blah, blah, and AI; those things should go to the simulator,

17:57.880 --> 18:03.160
and then we start upstreaming, and then on FPGA as well. So we should do better,

18:03.240 --> 18:07.960
seriously. But for the rest of the code, the I/O, we need the hardware, all right.

18:08.840 --> 18:14.120
CPU-wise, we can definitely do better. Thank you. All right, we're out of time, thank you.

18:14.120 --> 18:16.120
Thank you very much, thank you.

