WEBVTT

00:00.000 --> 00:13.000
Hello everyone, good evening. Our next speaker is Rex Ha. He is a research scientist at

00:13.000 --> 00:19.000
Homebrew, and he is going to let us know more about building local AI with a

00:19.000 --> 00:26.000
full-stack approach. Give him a warm round of applause.

00:26.000 --> 00:36.000
Okay, hello. First of all, I'm Rex, a researcher at Homebrew. We are based in Singapore.

00:36.000 --> 00:42.000
Today, I will take you on a journey through the anatomy of a thinking machine, exploring how

00:42.000 --> 00:49.000
local AI interacts with us at both the hardware and software levels.

00:49.000 --> 00:55.000
A little bit about us: we are building a complete environment for open-source, local AI.

00:55.000 --> 01:04.000
Our focus is on running AI locally, from serving LLMs on servers, to laptops, to even

01:04.000 --> 01:11.000
edge devices like mobile robots. We also train a few iterations of real-time

01:11.000 --> 01:18.000
multimodal models. We have launched a few products, like

01:18.000 --> 01:26.000
Jan, a ChatGPT alternative, the application that lets you run any open-source

01:26.000 --> 01:33.000
model on your machines. If you prefer to host it yourself, we have Cortex, a local

01:33.000 --> 01:43.000
AI platform. We also released new models, a family of speech-language models that

01:43.000 --> 01:51.000
can respond to you in real time. Additionally, we are experimenting with robotics simulations

01:51.000 --> 02:01.000
as well. We are building basically everything that AI enthusiasts, small businesses, and enterprises

02:01.000 --> 02:08.000
need to run a local LLM. This gives you full control to easily modify

02:08.000 --> 02:17.000
a model's knowledge to fit your needs, and eliminates concerns about data privacy.

02:17.000 --> 02:23.000
After focusing on local inference, we wanted to create our own model.

02:24.000 --> 02:30.000
When we looked at training a foundation model, it was too expensive. We cannot

02:30.000 --> 02:37.000
compete with giants like Meta, OpenAI, or Google, so we thought of a different way

02:37.000 --> 02:46.000
to utilize what we have. A quick poll: have you trained your own LLM? I see a few hands, that's good.

02:46.000 --> 02:53.000
How about a multimodal LLM? Oh, I see, no... oh, over there are some people.

02:53.000 --> 02:59.000
Well, you guys are really deep into the field. But now, on to the hardware.

02:59.000 --> 03:07.000
Have you built your own desktop with a GPU? Oh, a lot.

03:07.000 --> 03:12.000
And have you built a GPU cluster? Oh,

03:13.000 --> 03:18.000
actually, I haven't built one. My colleague built it for me to train

03:18.000 --> 03:26.000
models; I don't really know much about it. And last one: have you debugged an H100 cluster for inference

03:26.000 --> 03:34.000
and fine-tuning? Wow, you're rich.

03:34.000 --> 03:40.000
Well, I have. That one is like my pain every single night; I have to babysit

03:40.000 --> 03:47.000
it. Yeah. Okay, great. Some of you, I saw a lot of hands, have the

03:47.000 --> 03:53.000
technical knowledge, but some don't, so I want to help everyone catch up.

03:53.000 --> 03:59.000
So, I will do a quick recap on LLMs, with really high-level abstractions.

03:59.000 --> 04:08.000
So, yeah. Okay. So, what is an LLM? Attention, researchers out there: please cover your eyes,

04:08.000 --> 04:14.000
because I will seriously oversimplify this.

04:14.000 --> 04:20.000
Please cover your eyes, you already know this. Okay. First, we have neural network

04:20.000 --> 04:27.000
layers, transformer layers. You can see them as mimicking, in a digitalized way, a version

04:27.000 --> 04:34.000
of our brain. Are they perfect? Not really, they're not perfect

04:35.000 --> 04:41.000
mimics of our brain. But do they work? Yes, probably. And can they handle a lot

04:41.000 --> 04:48.000
of data? Oh, hell yeah. They have swallowed the whole internet. Right.

04:48.000 --> 04:54.000
Then there are the encoder layers, a component that helps

04:54.000 --> 05:02.000
the model translate human language into machine language. And then we have

05:02.000 --> 05:08.000
the decoder layers, which translate back after the calculations:

05:08.000 --> 05:14.000
they turn machine language back into human language. Yeah. But the main focus

05:14.000 --> 05:20.000
of this talk is how this AI can run on our machines. So the key thing you need to know is

05:20.000 --> 05:28.000
that these layers are just a stack of components that let the machine think. Okay.
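
To make that concrete, here is a minimal Python sketch (toy arithmetic standing in for real learned layers; none of this is an actual model) of the encode, layer stack, decode picture:

    # Toy sketch of "encode -> stack of layers -> decode".
    # Everything here is invented for illustration; real models use
    # learned embeddings and transformer layers, not this arithmetic.
    VOCAB = {"what": 0, "is": 1, "jan": 2, "a": 3, "app": 4}
    INV = {i: w for w, i in VOCAB.items()}

    def encode(text):
        # "Encoder": human language -> machine language (token ids)
        return [VOCAB[w] for w in text.lower().split()]

    def layer(hidden):
        # Stand-in for one transformer layer: just transforms numbers
        return [(h * 3 + 1) % len(VOCAB) for h in hidden]

    def decode(hidden):
        # "Decoder": machine language -> human language
        return " ".join(INV[h] for h in hidden)

    tokens = encode("what is jan")
    for _ in range(4):        # a "stack" of four layers
        tokens = layer(tokens)
    print(decode(tokens))     # some words come back out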

05:28.000 --> 05:35.000
Another piece of knowledge you may need: what exactly is inside a

05:35.000 --> 05:42.000
model? Let's say you are running Mistral or Llama. What

05:42.000 --> 05:48.000
does that really mean? There are different versions of Mistral. You can see this

05:48.000 --> 05:55.000
one is 4 gigabytes of storage, and that one is 24 gigabytes. So what is the

05:55.000 --> 06:02.000
real difference here? The answer lies in the model's parameters. More parameters

06:02.000 --> 06:09.000
means the model is larger and usually smarter. But it comes at a cost: we need a lot

06:09.000 --> 06:17.000
more hardware. Another factor is quantization. And model capability does not scale linearly with size.
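
To connect those gigabyte numbers to parameters and quantization, here is a back-of-envelope sketch (my own illustrative arithmetic, assuming a hypothetical 7B-parameter model):

    # Back-of-envelope: model file size ~= parameter count x bytes per weight.
    # Illustrative arithmetic only; real model files add a little overhead.
    params = 7e9  # a 7-billion-parameter model

    for precision, bytes_per_weight in [("fp16", 2), ("int8", 1), ("4-bit", 0.5)]:
        size_gb = params * bytes_per_weight / 1e9
        print(f"{precision}: ~{size_gb:.1f} GB")  # 14.0, 7.0, 3.5 GB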

06:17.000 --> 06:24.000
You can see it's like a logarithmic curve. So scaling up models is just not a sustainable

06:24.000 --> 06:33.000
approach; making smaller models smarter is the way. And believe me, this graph

06:33.000 --> 06:39.000
was true last year. But this year, smaller models, we saw that

06:39.000 --> 06:47.000
Qwen2 7B and Llama 3, they are all really good, and we can use them for real

06:48.000 --> 06:56.000
use cases. So, as expected from the previous point, bigger models require more memory.

06:56.000 --> 07:03.000
If we want to serve, let's say, eight users at once, the VRAM needed increases even more.

07:03.000 --> 07:09.000
And if we want to run it in float16 for better quality, the number of GPUs needed

07:09.000 --> 07:15.000
is going to be insane, right? Yeah. And this is why NVIDIA is making so much money

07:15.000 --> 07:21.000
this way. And that's also why multiple companies right now, like Amazon,

07:21.000 --> 07:28.000
Google, and even some startups, are trying to develop their own hardware, specifically

07:28.000 --> 07:37.000
for language model inference. So, to make it more practical: say you want the most intelligent

07:37.000 --> 07:44.000
AI to serve you at home. How many GPUs do you need to run a 70-billion-parameter

07:44.000 --> 07:54.000
model? On a consumer card like a 3060, it simply cannot fit. Even with

07:54.000 --> 08:02.000
V100s you need around 16, and with A100s you need around 4.
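
A rough sanity check of those counts (my own rounded arithmetic; it covers the weights alone, which is why real deployments, with KV cache and overhead, need even more cards):

    import math

    # Rough GPU-count estimate for a 70B model at float16 (2 bytes/weight).
    # Illustrative only; serving also needs VRAM for the KV cache,
    # activations, and framework overhead.
    params = 70e9
    weights_gb = params * 2 / 1e9  # ~140 GB of weights

    for card, vram_gb in [("RTX 3060", 12), ("V100", 16), ("A100 40GB", 40)]:
        needed = math.ceil(weights_gb / vram_gb)
        print(f"{card}: at least {needed} cards just for the weights")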

08:02.000 --> 08:08.000
But those are not simple machines. And on the right here is a picture of my colleague's house

08:08.000 --> 08:14.000
before we moved everything into the data center. When we were

08:14.000 --> 08:21.000
running a 70B model for our internal use, it could heat his house for a whole

08:21.000 --> 08:28.000
winter. Yeah. And if you are an Apple fan, you might think that you can

08:28.000 --> 08:39.000
start stacking Macs to run the models. Okay, you can stack them to run the model. Yeah.

08:39.000 --> 08:44.000
Actually, as you can see here, people are actually using this to run models. But

08:44.000 --> 08:49.000
please, no, don't run your house like that, please,

08:49.000 --> 08:56.000
don't do this at home. Okay. And as if the GPU rush was not enough, a company,

08:56.000 --> 09:02.000
as I said before, a startup called Groq, introduced a new thing, new

09:02.000 --> 09:07.000
hardware called the LPU, which is hardware specifically designed for large

09:07.000 --> 09:14.000
language model inference. This makes the prediction process insanely fast. All

09:14.000 --> 09:20.000
of this just to support beefy models that are now even more talkative. Recently, we have

09:21.000 --> 09:28.000
heard a lot about OpenAI o1, and just yesterday o3, and the tsunami of AI

09:28.000 --> 09:35.000
advancement in its wake. Basically, their success lies in

09:35.000 --> 09:41.000
a technique called test-time compute. Which essentially means: when

09:41.000 --> 09:47.000
you let the model think longer, or let it talk longer, hopefully it

09:47.000 --> 09:57.000
becomes smarter. Well, yeah, it's not always the case. Okay. But the

09:57.000 --> 10:03.000
main takeaway here is that these techniques will

10:03.000 --> 10:11.000
become more widespread, and they consume even more GPU time. So this brings us to another aspect.

10:12.000 --> 10:18.000
Okay. So GPUs now not only need more memory,

10:18.000 --> 10:24.000
they also need to calculate things faster. You have probably heard of

10:24.000 --> 10:29.000
teraflops in the news whenever a new GPU comes out, right?

10:29.000 --> 10:36.000
These measure how many floating-point operations a machine can perform.

10:36.000 --> 10:43.000
More importantly, when we were testing and developing our app,

10:43.000 --> 10:49.000
we found that memory bandwidth is crucial for faster inference. Inside

10:49.000 --> 10:55.000
the GPU, a lot of data exchange happens between SRAM and HBM, so increasing memory bandwidth boosts generation speed.
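
A back-of-envelope way to see why (my own estimate, assuming a hypothetical 7B model on a 3090-class card): each generated token must stream roughly the whole model through memory once, so bandwidth sets an upper bound on speed:

    # Why bandwidth matters: in decoding, each new token reads (roughly)
    # all the weights from HBM once, so bandwidth caps tokens per second.
    # Back-of-envelope only; ignores batching, caches, and compute limits.
    model_gb = 14          # e.g. a 7B model in float16
    bandwidth_gb_s = 936   # e.g. an RTX 3090's memory bandwidth

    max_tokens_per_s = bandwidth_gb_s / model_gb
    print(f"~{max_tokens_per_s:.0f} tokens/sec upper bound")  # ~67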

10:55.000 --> 11:03.000
And yeah, there is another option if you don't want to

11:03.000 --> 11:09.000
choose NVIDIA GPUs, if you don't want NVIDIA to get richer: you can switch to

11:09.000 --> 11:17.000
Snapdragon or AMD chips, which include NPUs. Okay. So, next up,

11:17.000 --> 11:25.000
people want to move to a desktop workstation. The core here is the

11:26.000 --> 11:32.000
motherboard, which communicates with the GPUs and all the other components. Here,

11:32.000 --> 11:37.000
you can put multiple components onto the motherboard and somehow it doesn't

11:37.000 --> 11:43.000
break. These components are typically larger and provide more computing

11:43.000 --> 11:50.000
power. At the next level, we scale up to servers. They are made of better

11:50.000 --> 11:57.000
components, so we can run them 24 hours per day. And each

11:57.000 --> 12:04.000
unit is designed to be modular, so we can have multiple CPUs and multiple GPUs

12:04.000 --> 12:11.000
connected together. Okay, so the right side

12:11.000 --> 12:17.000
gives you a rough idea of what is inside a server. This is a really old

12:17.000 --> 12:24.000
architecture at this point, but you can look at it as a reference.

12:24.000 --> 12:29.000
So, when you have a lot of money that you don't know what to do with,

12:29.000 --> 12:35.000
you can buy multiple servers and connect them into data center

12:35.000 --> 12:44.000
racks. These are connected through InfiniBand. Okay. So, that is all

12:44.000 --> 12:52.000
the intuition I want to give about hardware. So let's come to the main part of

12:52.000 --> 12:57.000
the talk. This talk is about the anatomy of a machine

12:57.000 --> 13:03.000
doing inference. Okay, I just want to share something we have learned

13:03.000 --> 13:08.000
over the past 20 months since starting the project. As of this week,

13:08.000 --> 13:15.000
our app has been downloaded over 2,000,000 times. And, well, we have gotten at least

13:15.000 --> 13:22.000
86,000 crash reports. And this only counts since we enabled crash

13:22.000 --> 13:27.000
reporting over the past three months. So over the other year and a half, we had

13:27.000 --> 13:33.000
a lot of users crashing like crazy out there. So yeah, you can see here,

13:33.000 --> 13:39.000
AI is still in its infancy. We are so early in the game. If you are here,

13:39.000 --> 13:45.000
listening to me talking about AI inference today, it's still really early.

13:45.000 --> 13:51.000
And I bet that 30 years from now, AI will be much more seamlessly integrated

13:51.000 --> 13:58.000
into our machines, and your laptop will feel more human-like than ever. So, yeah.

13:58.000 --> 14:05.000
So, let's talk about how a machine with a single GPU handles inference requests.

14:05.000 --> 14:10.000
When you ask a model to think, what happens at the hardware level?

14:10.000 --> 14:18.000
Okay, let's pretend that I'm a machine, and you ask me:

14:18.000 --> 14:24.000
what is Jan? Then, if I want to generate the tokens like "Jan

14:24.000 --> 14:30.000
is something something", first things first, I ask the CPU

14:30.000 --> 14:35.000
to make a function call to retrieve the model weights from disk,

14:35.000 --> 14:40.000
which is an HDD or SSD. I have to pull the whole model,

14:40.000 --> 14:44.000
and let's say it's 4 gigabytes, the storage size of the model.

14:44.000 --> 14:48.000
So I pull it into the CPU.

14:48.000 --> 14:54.000
Yeah. On the other side, you can see my simple setup diagram.

14:54.000 --> 14:59.000
I drew it to simplify everything, for people to understand

14:59.000 --> 15:03.000
what is inside of those wires and chips.

15:03.000 --> 15:09.000
Okay. Then, after all of the model weights load from disk,

15:09.000 --> 15:14.000
let's assume that we have enough VRAM in our GPU.

15:14.000 --> 15:20.000
So I load every single layer of the model onto the GPU.

15:20.000 --> 15:29.000
And now your prompt comes over.

15:29.000 --> 15:41.000
Okay. At first, the prompt will be tokenized, a simple way to translate it into machine language.

15:42.000 --> 15:46.000
Then it will execute through the transformer layers,

15:46.000 --> 15:50.000
usually in parallel on the GPU, and then

15:50.000 --> 15:55.000
generate possible predictions.

15:55.000 --> 16:01.000
And the final prediction, the next word, is made by the CPU,

16:01.000 --> 16:04.000
which uses a look-up table to find the best,

16:04.000 --> 16:09.000
the most suitable word for the given context.

16:09.000 --> 16:13.000
A response is then returned to the user.

16:13.000 --> 16:20.000
So, well, this is the journey of a simple question from a user to the machine:

16:20.000 --> 16:24.000
it goes through everything, the machine thinks, and it responds.

16:24.000 --> 16:29.000
For the next token, it will continue the process again and again,

16:29.000 --> 16:33.000
until it reaches the end of the answer.
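
Put into code, the loop just described looks roughly like this (a runnable toy sketch with a stub in place of the GPU forward pass; this is not Jan's or Cortex's actual code):

    import random

    # Minimal sketch of the autoregressive loop: predict one token,
    # append it, and feed everything back in, until end-of-answer.
    VOCAB = ["Jan", "is", "a", "local", "AI", "app", "<eos>"]

    def forward(tokens):
        # Stand-in for the transformer layers on the GPU: score the vocab.
        random.seed(len(tokens))          # deterministic toy "logits"
        return [random.random() for _ in VOCAB]

    def generate(prompt_tokens, max_tokens=10):
        tokens = list(prompt_tokens)
        for _ in range(max_tokens):
            logits = forward(tokens)                        # "think"
            next_id = max(range(len(VOCAB)), key=logits.__getitem__)
            if VOCAB[next_id] == "<eos>":                   # end of answer
                break
            tokens.append(next_id)                          # loop again
        return " ".join(VOCAB[t] for t in tokens)

    print(generate([0]))  # start from the token "Jan"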

16:34.000 --> 16:37.000
Well, that was an easy one, right?

16:37.000 --> 16:43.000
But what if you have multiple GPUs and want to use a bigger model,

16:43.000 --> 16:47.000
which doesn't fit into a single GPU, right?

16:47.000 --> 16:52.000
Same first step: we need to load the model

16:52.000 --> 16:56.000
from the hard disk into the CPU,

16:56.000 --> 17:00.000
and the CPU pushes the model weights into the GPUs.

17:00.000 --> 17:03.000
And here is the interesting part.

17:03.000 --> 17:08.000
The model will be split up, and yes,

17:08.000 --> 17:11.000
it will distribute the layers across the GPUs.

17:11.000 --> 17:16.000
So the CPU right now is saying: GPU 1,

17:16.000 --> 17:18.000
you get these layers.

17:18.000 --> 17:20.000
GPU 2,

17:20.000 --> 17:22.000
you get those layers.

17:22.000 --> 17:25.000
It depends on the algorithm you want to use. These

17:25.000 --> 17:28.000
are the two most common algorithms.

17:29.000 --> 17:36.000
We have pipeline parallelism, where we split the model horizontally

17:36.000 --> 17:38.000
between GPU 1 and GPU 2.

17:38.000 --> 17:41.000
So one GPU owns half, say 16 of the layers,

17:41.000 --> 17:43.000
and the other one has the other 16.

17:43.000 --> 17:47.000
And the other one is tensor parallelism,

17:47.000 --> 17:50.000
which splits the model vertically.

17:50.000 --> 17:55.000
So each layer will be distributed across GPU 1 and 2 accordingly.
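
A toy way to picture the two splits (plain Python lists standing in for weight matrices and GPUs; real frameworks do this with device placement and communication):

    # Toy contrast of the two splits for a "model" of four weight matrices.
    # Pipeline parallelism: whole layers go to different GPUs.
    # Tensor parallelism: every layer is sliced across all GPUs.
    layers = [[[1, 2], [3, 4]],   # layer 0 (a 2x2 weight matrix)
              [[5, 6], [7, 8]],   # layer 1
              [[9, 0], [1, 2]],   # layer 2
              [[3, 4], [5, 6]]]   # layer 3

    # Pipeline (horizontal): GPU 0 holds layers 0-1, GPU 1 holds layers 2-3.
    pipeline = {"gpu0": layers[:2], "gpu1": layers[2:]}

    # Tensor (vertical): each GPU holds one column slice of EVERY layer.
    tensor = {"gpu0": [[row[:1] for row in l] for l in layers],
              "gpu1": [[row[1:] for row in l] for l in layers]}

    print(len(pipeline["gpu0"]), "whole layers on gpu0 (pipeline)")
    print(len(tensor["gpu0"]), "partial layers on gpu0 (tensor)")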

17:56.000 --> 18:00.000
So, everything works nearly the same as in the single-GPU case.

18:00.000 --> 18:03.000
The process is nearly the same:

18:03.000 --> 18:08.000
the prompt is encoded into machine language, with some layers

18:08.000 --> 18:11.000
running through GPU 1 and some

18:11.000 --> 18:14.000
running through GPU 2.

18:14.000 --> 18:17.000
And after processing, the results are combined,

18:17.000 --> 18:20.000
and the final prediction is made.

18:20.000 --> 18:22.000
It's nearly the same,

18:23.000 --> 18:25.000
like 90% similar to what

18:25.000 --> 18:28.000
we had with one single GPU.

18:29.000 --> 18:33.000
And finally, when we stack GPU servers together,

18:33.000 --> 18:36.000
we get a multi-node setup.

18:40.000 --> 18:41.000
Okay.

18:41.000 --> 18:45.000
So here you have multi-node scaling in an HPC data center.

18:45.000 --> 18:46.000
So, okay.

18:46.000 --> 18:49.000
To be honest, as I said before,

18:50.000 --> 18:54.000
I do not have a full understanding of what this is,

18:54.000 --> 18:58.000
because my hardware colleague does all of this.

18:58.000 --> 19:04.000
But the main thing to note here is that all of the big companies

19:04.000 --> 19:07.000
out there, like OpenAI and Google,

19:07.000 --> 19:09.000
they host

19:09.000 --> 19:12.000
ChatGPT with audio and video

19:12.000 --> 19:17.000
for all of their users on this kind of setup.

19:18.000 --> 19:21.000
So, come with me on this.

19:21.000 --> 19:25.000
So let's start: as at the very beginning,

19:25.000 --> 19:27.000
we always have to load the model first.

19:27.000 --> 19:32.000
Now I will talk about more advanced techniques that

19:32.000 --> 19:35.000
the companies out there, right now,

19:35.000 --> 19:38.000
use to serve their models.

19:38.000 --> 19:41.000
First things first, we have the KV cache.

19:41.000 --> 19:45.000
Since decoding is usually autoregressive,

19:46.000 --> 19:49.000
which means each of the newly predicted tokens

19:49.000 --> 19:52.000
only depends on the previous tokens.

19:52.000 --> 19:54.000
So, at each generation step,

19:54.000 --> 19:59.000
we would be recalculating attention for all the previous tokens,

19:59.000 --> 20:03.000
which is not the most efficient way, right?

20:03.000 --> 20:05.000
If we calculate it once,

20:05.000 --> 20:07.000
we can cache all of those,

20:07.000 --> 20:11.000
and then just use those to calculate the next token.

20:12.000 --> 20:18.000
That's a really high-level view of how the KV cache works.
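
In toy code, the caching idea looks like this (hypothetical stand-ins; real attention caches per-layer, per-head key/value tensors):

    # Toy sketch of the KV-cache idea: compute each token's key/value
    # once, keep it, and only process the newest token at each step.
    def kv_for(token):
        # Stand-in for the expensive per-token key/value computation.
        return (hash(("k", token)) % 100, hash(("v", token)) % 100)

    cache = []  # grows by one entry per generated token

    def attend(new_token):
        cache.append(kv_for(new_token))      # only the NEW token is computed
        # attention then reads the cached K/V of all previous tokens
        return sum(k for k, _ in cache) % 100

    for tok in ["what", "is", "jan", "?"]:
        print(tok, "->", attend(tok))        # no recomputation of old tokens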

20:18.000 --> 20:23.000
This is a common technique that Anthropic, OpenAI,

20:23.000 --> 20:26.000
and also Google use to serve their users,

20:26.000 --> 20:30.000
because this technique can reduce, like,

20:30.000 --> 20:33.000
90% of the inference time,

20:33.000 --> 20:39.000
and it is quite an easy one to enable.

20:39.000 --> 20:42.000
Another technique is speculative decoding,

20:42.000 --> 20:46.000
where we use a large main model with,

20:46.000 --> 20:48.000
like, the best performance we can have,

20:48.000 --> 20:51.000
alongside a smaller model.

20:51.000 --> 20:54.000
The reason for the smaller model,

20:54.000 --> 20:56.000
the draft model right here,

20:56.000 --> 21:00.000
is that it can handle simple, repetitive tasks,

21:00.000 --> 21:03.000
and quickly generate draft tokens,

21:03.000 --> 21:08.000
and it could significantly reduce the inference time.
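
A toy sketch of that draft-and-verify loop (the acceptance rule here is exact matching for simplicity; real speculative decoding compares the two models' probabilities):

    # Toy speculative decoding: a small "draft" model proposes a few
    # tokens cheaply; the big model verifies them in one pass and keeps
    # the longest agreeing prefix.
    TARGET = "the quick brown fox jumps over the lazy dog".split()

    def draft_model(pos, n=4):
        # Cheap model: guesses the next n tokens (sometimes wrongly).
        guess = TARGET[pos:pos + n]
        if guess and pos % 3 == 0:
            guess[-1] = "???"                  # inject an occasional mistake
        return guess

    def big_model_verify(pos, proposed):
        # One expensive pass checks all proposed tokens at once.
        accepted = []
        for i, tok in enumerate(proposed):
            if pos + i < len(TARGET) and tok == TARGET[pos + i]:
                accepted.append(tok)
            else:
                break                          # reject from first mismatch
        return accepted

    out, pos = [], 0
    while pos < len(TARGET):
        accepted = big_model_verify(pos, draft_model(pos))
        if not accepted:                       # fall back: big model emits one
            accepted = [TARGET[pos]]
        out += accepted
        pos += len(accepted)
    print(" ".join(out))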

21:10.000 --> 21:14.000
Okay, so this is the comparison from OpenAI

21:14.000 --> 21:17.000
of speculative decoding.

21:17.000 --> 21:20.000
The right side, you can see, is really fast.

21:20.000 --> 21:22.000
But, like, the left side,

21:22.000 --> 21:25.000
if you only use the big model,

21:25.000 --> 21:31.000
it takes ages to just generate the answer.

21:31.000 --> 21:34.000
Okay, it's finished, thank you.

21:35.000 --> 21:38.000
Okay, so here you can see

21:38.000 --> 21:40.000
Andrej Karpathy,

21:40.000 --> 21:42.000
he's a top AI lead,

21:42.000 --> 21:46.000
ex-OpenAI.

21:46.000 --> 21:50.000
He talked about the beauty of this technique.

21:50.000 --> 21:53.000
Okay, so after

21:53.000 --> 21:56.000
walking through all of the steps

21:56.000 --> 21:59.000
of how normal text-based

21:59.000 --> 22:02.000
generative AI works within our machine:

22:02.000 --> 22:05.000
now, what if we want to add one more step,

22:05.000 --> 22:07.000
like in sci-fi movies,

22:07.000 --> 22:10.000
where we can command our machine,

22:10.000 --> 22:12.000
and it will understand our voice

22:12.000 --> 22:15.000
and it will respond back.

22:15.000 --> 22:17.000
What would it look like?

22:17.000 --> 22:21.000
So, with that in mind, we started project Ichigo.

22:25.000 --> 22:29.000
Oh, I don't think the audio is going through.

22:29.000 --> 22:39.000
Uh, can we somehow get the audio in?

22:39.000 --> 22:42.000
A little bit of technical help.

22:51.000 --> 22:53.000
Maybe I will just, like,

23:00.000 --> 23:04.000
Hey, this is Ichigo,

23:04.000 --> 23:06.000
the local real-time voice AI.

23:06.000 --> 23:09.000
It can understand human speech and talk back.

23:09.000 --> 23:11.000
And with the latest checkpoint,

23:11.000 --> 23:13.000
you can run this little guy on your device,

23:13.000 --> 23:15.000
and it's fully open source.

23:15.000 --> 23:17.000
What you're seeing here is the demo version of it,

23:17.000 --> 23:19.000
and it's running on just one 3090.

23:19.000 --> 23:22.000
So, let's give it a try.

23:22.000 --> 23:27.000
Hey, Ichigo, what's the meaning of strawberry in Japanese?

23:27.000 --> 23:30.000
Strawberry in Japanese is called ichigo.

23:30.000 --> 23:34.000
It's a sweet and juicy fruit that's very popular in Japan.

23:34.000 --> 23:36.000
Here we go.

23:36.000 --> 23:41.000
Okay, so all of this is running on a single 3090 machine,

23:41.000 --> 23:43.000
and it responds in real time.

23:43.000 --> 23:46.000
And this is only a project of three months or so,

23:46.000 --> 23:51.000
of us tinkering around and trying new stuff.

23:51.000 --> 23:55.000
Okay, I will quickly go through the intuition of how it works

23:55.000 --> 23:57.000
and how we speed up the token generation.

23:57.000 --> 24:02.000
That's why you can see the generation is really fast.

24:02.000 --> 24:06.000
And this is an interesting fact,

24:06.000 --> 24:10.000
because we trained our Ichigo entirely on text data,

24:10.000 --> 24:12.000
not a single audio file.

24:12.000 --> 24:14.000
Yes, you heard it right.

24:14.000 --> 24:17.000
We trained a speech understanding model

24:17.000 --> 24:20.000
without any audio files.

24:20.000 --> 24:22.000
How does that work?

24:23.000 --> 24:29.000
We actually did it with a simple model,

24:29.000 --> 24:32.000
a transformer model,

24:32.000 --> 24:36.000
which we fed with input text, say English,

24:36.000 --> 24:38.000
German, or French.

24:38.000 --> 24:42.000
We input all of them into the model

24:42.000 --> 24:45.000
and let it predict the speech tokens.

24:45.000 --> 24:46.000
Yeah.

24:46.000 --> 24:49.000
That's the beauty of the transformer model:

24:49.000 --> 24:53.000
it works as a translator between human language

24:53.000 --> 24:56.000
and speech language.
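
Conceptually, it is an ordinary sequence-to-sequence mapping whose outputs are discrete speech tokens instead of words. A toy sketch (a hard-coded lookup standing in for what Ichigo's transformer actually learns):

    # Toy sketch of the idea: a "translator" whose outputs are discrete
    # speech tokens (sound-unit ids) instead of words. The real system
    # learns this mapping with a transformer; this lookup only shows
    # the shape of the problem.
    text_to_sound_units = {      # pretend these mappings were learned
        "hello": [17, 42, 42, 3],
        "world": [8, 99, 5],
    }

    def text_to_speech_tokens(sentence):
        units = []
        for word in sentence.lower().split():
            units += text_to_sound_units.get(word, [0])  # 0 = unknown unit
        return units

    print(text_to_speech_tokens("Hello world"))  # [17, 42, 42, 3, 8, 99, 5]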

24:56.000 --> 24:59.000
So, with the spirit of open source,

24:59.000 --> 25:01.000
we made everything available:

25:01.000 --> 25:05.000
our training code, the dataset, the model weights,

25:05.000 --> 25:08.000
and even our discussions,

25:08.000 --> 25:11.000
even some failed attempts and papers.

25:11.000 --> 25:15.000
And so, if you want to see us fail in real time,

25:15.000 --> 25:18.000
you can look into our discussions

25:18.000 --> 25:20.000
and watch the livestream.

25:20.000 --> 25:22.000
I will be there, and

25:22.000 --> 25:26.000
sometimes I will say something bad over there.

25:26.000 --> 25:31.000
Okay, so the last thing I want to discuss today

25:31.000 --> 25:35.000
is how to bring the machine into the real world.

25:35.000 --> 25:37.000
We have talked about the life cycle of

25:37.000 --> 25:41.000
AI within a machine, but what if we could put the AI

25:41.000 --> 25:45.000
out to interact with a real environment?

25:45.000 --> 25:47.000
So, before it can do that,

25:47.000 --> 25:50.000
we need to train it in a simulated environment.

25:50.000 --> 25:53.000
So, unlike a human baby,

25:53.000 --> 25:57.000
which would take 10 or 12 months to start walking,

25:57.000 --> 25:59.000
we can simulate thousands,

25:59.000 --> 26:02.000
or even 100,000 robots in parallel,

26:02.000 --> 26:05.000
to help them learn tasks,

26:05.000 --> 26:10.000
like walking, in multiple environments.

26:10.000 --> 26:13.000
Okay, so here's a robot demo

26:13.000 --> 26:15.000
from NVIDIA,

26:15.000 --> 26:18.000
and they said that the simulation

26:18.000 --> 26:20.000
accelerates the robot's learning curve

26:20.000 --> 26:23.000
by 10,000 times compared to real life.

26:23.000 --> 26:26.000
Yeah, it's insane, right?

26:26.000 --> 26:30.000
So, this is, hopefully,

26:30.000 --> 26:32.000
you guys can hear it.

26:32.000 --> 26:35.000
This is a really early attempt

26:35.000 --> 26:40.000
to control a robot using the Ichigo model.

26:41.000 --> 26:43.000
So, there is a voice command,

26:43.000 --> 26:46.000
and we feed it into the model.

26:48.000 --> 26:51.000
The video is not playing.

26:51.000 --> 26:54.000
So, let me just play the video.

26:54.000 --> 26:58.000
So, right here, we used our voice

26:58.000 --> 27:01.000
to talk to the robot.

27:01.000 --> 27:05.000
We said: turn left 40 degrees,

27:06.000 --> 27:09.000
and then go straight ahead.

27:13.000 --> 27:15.000
I don't know if you guys can hear the voice,

27:15.000 --> 27:18.000
but it says: turn left 40 degrees,

27:18.000 --> 27:20.000
and then go straight ahead.

27:20.000 --> 27:22.000
So, from the model's output,

27:22.000 --> 27:26.000
the robot will execute the action.

27:26.000 --> 27:29.000
So, behind the scenes of all of this,

27:29.000 --> 27:32.000
Ichigo just

27:32.000 --> 27:35.000
takes the voice,

27:35.000 --> 27:38.000
converts it into the JSON format right here,

27:38.000 --> 27:41.000
which is a representation

27:41.000 --> 27:44.000
of the action.

27:44.000 --> 27:47.000
Everything is then streamed into

27:47.000 --> 27:49.000
the robot in the simulated environment,

27:49.000 --> 27:51.000
and the same process will apply

27:51.000 --> 27:55.000
when it moves to the real world later.
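
For example, the streamed action could look like this (a hypothetical JSON schema for illustration; the demo's exact format is not shown in the talk):

    import json

    # Hypothetical action message, as the talk describes: voice in,
    # structured JSON action out, streamed to the (simulated) robot.
    msg = '{"action": "turn", "direction": "left", "degrees": 40}'

    cmd = json.loads(msg)
    if cmd["action"] == "turn":
        # A real controller would send motor commands here.
        print(f"Turning {cmd['direction']} {cmd['degrees']} degrees")
    elif cmd["action"] == "move":
        print(f"Moving {cmd.get('distance', 1)} meters ahead")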

27:55.000 --> 27:57.000
So, what's next?

27:57.000 --> 28:01.000
After the simulation comes the physical world.

28:02.000 --> 28:06.000
Sim-to-real is the term used to

28:06.000 --> 28:09.000
transfer everything the robot has learned

28:09.000 --> 28:12.000
from the simulated virtual world to the real world.

28:12.000 --> 28:14.000
And you can see here,

28:14.000 --> 28:16.000
at Homebrew,

28:16.000 --> 28:19.000
we have just bought a real robot to test it out.

28:19.000 --> 28:21.000
So, everything is still ahead.

28:21.000 --> 28:26.000
And we update our progress

28:26.000 --> 28:28.000
publicly, even when we fail,

28:28.000 --> 28:32.000
so you guys can see it through our Discord.

28:32.000 --> 28:34.000
And yeah, that's it.

28:34.000 --> 28:35.000
That's the end of the talk.

28:35.000 --> 28:37.000
Hopefully, it gives you a glimpse of

28:37.000 --> 28:39.000
the current state of AI right now.

28:39.000 --> 28:40.000
Thank you for listening.

28:41.000 --> 28:42.000
Thank you.

28:42.000 --> 28:45.000
Thank you.

