WEBVTT

00:00.000 --> 00:08.000
I think we can start, and people are still coming in.

00:08.000 --> 00:14.280
So basically this talk is going to be about how can we, as an open source community, be able

00:14.280 --> 00:20.200
to build a humanoid, and especially a general purpose humanoid, right, that can do kind

00:20.200 --> 00:21.200
of anything.

00:21.200 --> 00:25.800
I'm going to highlight Dora, which is a project I'm building, and feel free to test

00:25.800 --> 00:26.800
it out.

00:26.800 --> 00:28.200
You don't even have to test it out.

00:28.200 --> 00:32.720
I hope that this talk inspires you enough to give it a try.

00:32.720 --> 00:37.720
So you may remember me from FOSDEM last year. I came with a small robot that was able

00:37.720 --> 00:40.080
to code itself, to do stuff.

00:40.080 --> 00:45.040
And the cool thing is that after this talk, there was the co-founder of Hugging

00:45.040 --> 00:49.680
Face that gave me a call, who was like, that's really impressive, that's really cool.

00:49.680 --> 00:54.480
You should join us to do humanoids, and so that's why I'm doing this talk right now.

00:54.480 --> 00:58.480
Though he was not really enthusiastic about the robot coding itself; it was kind

00:58.480 --> 01:01.720
of like, yeah, maybe we should just do end-to-end learning, so that's kind of what we're

01:01.720 --> 01:03.320
going to talk about today.

01:03.320 --> 01:06.000
And me, I'm just a full-time open source developer.

01:06.000 --> 01:11.040
I have my own company, and you can hire me if you want, but yeah.

01:11.040 --> 01:19.160
So maybe just to start: why do we need humanoids, why can't we have other things, like

01:19.160 --> 01:20.160
all the other types of robots?

01:20.160 --> 01:26.280
The thing is, AI has changed the world, as you probably realized with ChatGPT, and robotics

01:26.280 --> 01:29.320
has suffered for years from not being able to generalize, right?

01:29.320 --> 01:35.280
So you have very good robots in factories, very capable, but they will never be able

01:35.280 --> 01:40.680
to generalize. Now, if you want one to make pancakes, it'll never be able to.

01:40.680 --> 01:46.400
But the thing is that we kind of know how to generalize with artificial intelligence when

01:46.400 --> 01:48.560
it's about mimicking humans, right?

01:48.560 --> 01:52.080
So ChatGPT, you give it a lot of text, and then it starts speaking.

01:52.080 --> 01:55.040
We don't really know why, but it starts doing this.

01:55.040 --> 01:58.240
And so we kind of want to do the same thing with robots.

01:58.240 --> 02:02.640
And the way to do it with robots is we probably need to have a robot in a human shape, so

02:02.640 --> 02:06.160
that we can show it how to do stuff, and then it's going to be able to do it too.

02:06.160 --> 02:09.800
It's kind of our hope, and nobody knows if it will work, but the truth is, we don't actually know

02:09.800 --> 02:15.240
how AI works, and so this is kind of how it is.

02:15.240 --> 02:20.720
So let's maybe define what an open source general purpose humanoid is.

02:20.720 --> 02:24.120
There are kind of three big components that we need to pay attention to.

02:24.120 --> 02:25.640
First one is AI, right?

02:25.640 --> 02:28.880
We need AI to be general purpose.

02:28.880 --> 02:34.000
We need hardware, open source hardware hopefully, so that we can really tinker

02:34.000 --> 02:36.960
around with the hardware, and then we need software.

02:36.960 --> 02:41.040
And if we have those three components as open source, it means that we have full control

02:41.080 --> 02:45.400
of the humanoid, and we can repair it and maintain it, which is actually the hardest thing,

02:45.400 --> 02:46.400
right?

02:46.400 --> 02:50.640
It's really easy to buy, it's really easy to build, it's very hard to maintain, and it's

02:50.640 --> 02:54.680
true for AI, true for software, it's true for hardware.

02:54.680 --> 03:00.120
Like if you go on this adventure, you will break things, whether it's the AI or the hardware, and being

03:00.120 --> 03:04.720
able to just reprint a part is going to go a long way.

03:04.720 --> 03:09.320
And if in the future we want children living on Tatooine to be able to build

03:09.320 --> 03:17.880
their own C-3PO, this is the way to go; there's no way to do it with closed source hardware.

03:17.880 --> 03:19.280
But is it really achievable?

03:19.280 --> 03:23.640
Well, the thing is, there's a huge community of people doing open source humanoids now,

03:23.640 --> 03:27.200
it's kind of like a booming thing.

03:27.200 --> 03:33.400
I work with Hugging Face; they have all kinds of assembly kits from scratch, and they will

03:33.400 --> 03:38.840
also teach you how to 3D print the parts, build it, connect several motors, and how to connect

03:38.880 --> 03:41.480
it to computers to be able to control it.

03:41.480 --> 03:44.520
And in this way, you can also build your own humanoids, right?

03:44.520 --> 03:48.520
You don't even have to follow someone, you can just take maybe a template and modify it.

03:48.520 --> 03:56.040
So if you're into hardware, I deeply encourage you to try it out. I think

03:56.040 --> 04:01.160
it could be a very fun weekend project.

04:01.160 --> 04:06.960
And also, yeah, hardware is having a moment, and AI is also having a moment,

04:07.000 --> 04:11.280
especially open source AI. You've probably heard about DeepSeek,

04:11.280 --> 04:16.360
Mistral, Qwen VL, Hugging Face; there are a lot of open source models popping up,

04:16.360 --> 04:21.000
but also the training algorithms behind them, because at the end of the day,

04:21.000 --> 04:25.360
a huge chunk is going to be about training, and there are even more specialized ones,

04:25.360 --> 04:27.560
especially for robotics, popping up.

04:27.560 --> 04:32.160
So if you're into AI, I would highly recommend checking those things out as well.

04:32.160 --> 04:39.600
And so this seems to be kind of a solved issue, or at least there's a huge community around it already.

04:39.600 --> 04:45.200
And so the missing part is kind of the software, in my opinion, right?

04:45.200 --> 04:47.920
And this is what I'm trying to solve with Dora.

04:47.920 --> 04:51.440
The idea is that you have hardware on one side, you have AI on the other side,

04:51.440 --> 04:57.200
and you're really trying to make those two connect in a way that is very efficient and very fast.

04:57.200 --> 05:01.760
So if you only have AI, what you typically do is kind of a request-reply pattern,

05:01.760 --> 05:04.800
where you have an API, you send a request to whatever you want,

05:04.800 --> 05:08.400
and then you get a response. It doesn't really work for robotics.

05:08.400 --> 05:11.440
What you want is kind of a pub/sub type of system,

05:11.440 --> 05:16.160
where you would be able to publish data from, for example, a camera or microphone,

05:16.160 --> 05:18.880
and then the AI is going to be able to subscribe to them,

05:18.880 --> 05:23.440
and then get the latest data, and then be able to do inference or whatever,

05:23.440 --> 05:25.520
and reply back.

05:25.520 --> 05:27.440
And so that's what we're trying to do.

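NOTE
A minimal sketch of the pub/sub pattern just described, using the dora
Python API as I understand it (Node, send_output, Arrow-typed events);
run_inference is a hypothetical stand-in for the actual model call.
    import pyarrow as pa
    from dora import Node
    def run_inference(frame):
        # stub standing in for the real VLM inference
        return "a person waving at the camera"
    node = Node()
    for event in node:  # blocks until a publisher sends new data
        if event["type"] == "INPUT" and event["id"] == "image":
            frame = event["value"]  # latest camera frame, as an Arrow array
            node.send_output("text", pa.array([run_inference(frame)]))
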
05:27.760 --> 05:30.720
It's actually not very complicated to do pub/sub, if you're into software.

05:32.240 --> 05:35.920
What is actually very hard is how you distribute the code

05:37.760 --> 05:42.240
across different types of hardware, with different types of requirements on the software side.

05:42.240 --> 05:44.400
You might have code that is written in Python, if it's AI;

05:44.400 --> 05:47.280
you might have code written in Rust, if it's for robotics.

05:47.280 --> 05:52.400
And so how do you make it very easy and accessible for anyone to start over a weekend,

05:52.400 --> 05:57.200
and not have to install a million compilers and figure out

05:57.200 --> 06:01.440
whether it's Linux, whether it's NVIDIA, whether it's a different type of hardware acceleration.

06:02.000 --> 06:06.000
And so what we try to do is to push everything to PyPI,

06:06.000 --> 06:12.080
the Python package index, and then make it just pip installable.

06:12.080 --> 06:14.720
So everything from there should be pip installable,

06:14.720 --> 06:17.520
and this is kind of the way we think we can solve the problem:

06:17.520 --> 06:22.160
pip will be able to automatically detect which OS you're using,

06:22.160 --> 06:26.960
which architecture you're using, and hopefully we can even push it further with

06:27.040 --> 06:28.880
which acceleration you're using as well.

06:28.880 --> 06:34.480
And we can also package Rust or C/C++ code into Python as well,

06:34.480 --> 06:35.920
and then people can just use it from there.

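NOTE
To make the "pip detects your platform" point concrete: PEP 508
environment markers in a pyproject.toml let one package pull different
dependencies per OS. A hedged sketch; the package and dependency names
are illustrative, not from the talk.
    [project]
    name = "my-camera-node"
    version = "0.1.0"
    dependencies = [
        "numpy",
        # macOS-only dependency
        "pyobjc; sys_platform == 'darwin'",
        # Linux-only dependency, e.g. for NVIDIA acceleration
        "nvidia-ml-py; platform_system == 'Linux'",
    ]
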
06:37.120 --> 06:42.880
All right, so maybe I can just show you very quickly how Dora works.

06:42.880 --> 06:45.760
How Dora works is that you define a YAML graph.

06:45.760 --> 06:48.000
So you have kind of a YAML configuration,

06:48.000 --> 06:50.400
where I say: okay, I want a camera, I want a model,

06:50.400 --> 06:51.600
and I want a visualization.

06:52.240 --> 06:54.400
This is the input, this is the output,

06:54.720 --> 06:59.680
so the camera is going to send a flow of images,

06:59.680 --> 07:02.800
and then we will have a model doing the vision,

07:02.800 --> 07:06.480
so it's just going to subscribe to it with an input keyword, right?

07:06.480 --> 07:09.920
And then you can have environment variables to specify it.

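NOTE
A sketch of the YAML graph being described: three nodes (camera, model,
visualization) wired together through inputs and outputs, with env
variables on a node. Node names and keys follow the dora dataflow format
as I recall it; treat them as illustrative and check the dora docs.
    nodes:
      - id: camera
        path: opencv-video-capture
        outputs: [image]
        env:
          CAPTURE_PATH: "0"            # which webcam to open
      - id: vlm
        path: dora-qwenvl
        inputs:
          image: camera/image          # subscribe to the camera stream
          tick: dora/timer/millis/500  # the "ticker" mentioned below
        outputs: [text]
      - id: plot
        path: dora-rerun
        inputs:
          image: camera/image
          text: vlm/text
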
07:09.920 --> 07:12.320
And so I can show you quickly what it looks like,

07:12.320 --> 07:14.720
if you're using your own computer,

07:14.720 --> 07:19.200
so this is the repository on GitHub,

07:19.200 --> 07:22.240
and in the Dora repository, you have an examples folder,

07:22.240 --> 07:24.960
where you will have all the examples that we've built,

07:24.960 --> 07:27.440
so the one we're going to look at today is the VLM one.

07:27.440 --> 07:29.440
So VLM stands for vision language model,

07:29.440 --> 07:32.960
one that's able to react to text and images at the same time.

07:32.960 --> 07:38.080
And so I'm going to show you one that is very easy,

07:38.080 --> 07:41.440
that I'll just show you, and it's just three nodes,

07:41.440 --> 07:44.640
so three separate processes, right?

07:44.640 --> 07:47.840
And each node is a separate process.

07:47.840 --> 07:49.360
I don't know if I mentioned this.

07:49.360 --> 07:51.600
And so I say, oh, here are three nodes.

07:51.600 --> 07:54.800
I have one output from the camera, which is an image.

07:54.800 --> 07:59.520
I have a ticker because I want the Qwen VL model,

07:59.520 --> 08:04.560
Qwen VL is an AI model, to do inference on a timer,

08:04.560 --> 08:06.720
and then we have a clock, right?

08:06.720 --> 08:10.960
So if I do this, I have to first install everything, right?

08:10.960 --> 08:14.000
My computer maybe doesn't have to install everything;

08:14.000 --> 08:16.320
I already installed it, but it's very easy.

08:16.320 --> 08:18.880
And when I have to install the nodes, I just say

08:18.880 --> 08:24.240
dora build, and then it's going to figure it out. Maybe too fast.

08:24.240 --> 08:26.720
So let's say I want to start from scratch.

08:26.720 --> 08:28.160
I want to create a new environment.

08:28.160 --> 08:32.080
Let's say I build a new environment, randomly, Python 3.11.

08:32.080 --> 08:36.080
What I can do is use uv, which is a Python package manager,

08:36.080 --> 08:38.560
also built in Rust, so it's safe.

08:38.560 --> 08:41.120
And then I say, OK, I want to use this one.

08:41.120 --> 08:42.960
And now I want to install everything from this.

08:42.960 --> 08:47.840
And then with uv, it's going to pull everything it has to pull,

08:47.840 --> 08:50.560
and then it's going to install everything it has to install,

08:50.560 --> 08:52.240
and then it's just going to work.

08:52.240 --> 08:55.200
And hopefully, fingers crossed, it just works from scratch.

08:55.200 --> 08:58.640
But the idea is that everything there is confirmed by the developers,

08:58.640 --> 09:01.600
so you kind of put all the specification within

09:01.600 --> 09:06.480
the pyproject.toml, on PyPI, in a way that it should work for you on your computer.

09:06.480 --> 09:09.040
When I want to run it, I just have to say dora run.

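NOTE
Roughly the commands used in this demo, as described: uv for the
environment, dora for build and run. The dataflow path is illustrative,
and the exact flags and package names should be checked against current
uv and dora releases.
    uv venv --python 3.11                  # fresh environment, no compilers
    uv pip install dora-rs-cli             # the dora CLI, straight from PyPI
    dora build examples/vlm/dataflow.yml   # fetch each node's dependencies
    dora run examples/vlm/dataflow.yml     # launch nodes as separate processes
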
09:11.840 --> 09:14.720
OK, I think I missed the slide, but it's OK.

09:14.720 --> 09:17.360
And then it's going to be able to pop up.

09:17.360 --> 09:19.440
And now it's going to take a bit of time,

09:19.440 --> 09:23.280
because we have to download the model from the internet.

09:23.280 --> 09:24.720
We can also have local models.

09:24.720 --> 09:26.320
You can have adapters.

09:26.320 --> 09:29.120
That will be shown later.

09:29.120 --> 09:32.320
OK, let me see, my phone is fine.

09:32.320 --> 09:37.680
And then, maybe I'll just show you what I showed you before,

09:37.680 --> 09:38.720
which is this.

09:38.720 --> 09:40.880
So the pip install is just a pip install.

09:40.880 --> 09:41.840
There's no compiler.

09:41.840 --> 09:43.040
It's compiler-free.

09:43.040 --> 09:46.000
The idea is that for a lot of people, especially if you're starting out building things,

09:46.000 --> 09:49.600
there are so many things to learn that you might not want to have

09:49.600 --> 09:52.000
to install compilers and stuff.

09:52.000 --> 09:56.560
And when you run it, it's separate processes, linked with shared memory.

09:56.560 --> 10:01.040
That's a very big thing for having a performant system,

10:01.040 --> 10:02.560
especially for robotics.

10:02.560 --> 10:04.560
And so now everything is running.

10:04.560 --> 10:09.920
And so now I will have the VLM. Maybe it's too small.

10:10.880 --> 10:12.640
Can I zoom in?

10:12.640 --> 10:13.440
No.

10:13.440 --> 10:15.040
All right, I'm sorry.

10:15.040 --> 10:20.080
Just zoom, zoom, maybe?

10:20.080 --> 10:21.840
OK, zoom in, perfect.

10:21.840 --> 10:26.320
So you can see that the model is just describing.

10:26.320 --> 10:28.720
But it's able to... is it too small, still?

10:28.720 --> 10:30.000
Yeah, maybe.

10:30.000 --> 10:31.120
OK, sorry.

10:31.120 --> 10:33.040
Sorry, sorry.

10:33.040 --> 10:35.360
Zoom in.

10:35.360 --> 10:37.360
Zoom in.

10:37.360 --> 10:40.480
OK, is it better?

10:40.480 --> 10:42.240
Is it OK?

10:42.240 --> 10:43.280
Oh, yeah.

10:43.280 --> 10:45.920
So now it's just describing what it sees.

10:45.920 --> 10:49.840
And if it sees me, it's going to say, hey, there's a guy looking at it.

10:49.840 --> 10:52.640
So I try to do a peace sign.

10:52.640 --> 10:54.480
It's going to say, oh, you're making a peace sign.

10:54.480 --> 10:57.760
And so it's very descriptive, right?

10:57.760 --> 11:01.120
All right, it's still going, OK, that's fine.

11:01.120 --> 11:03.440
And so it's very capable, right?

11:03.440 --> 11:06.720
Everything is running locally on my MacBook laptop.

11:06.800 --> 11:10.720
And now, like, OK, it's really cool, it's really nice.

11:10.720 --> 11:14.000
But it's actually very able to see stuff, right?

11:14.000 --> 11:16.800
That's what it was built for; it was built to see stuff.

11:16.800 --> 11:20.720
And so yeah, so that's super easy stuff.

11:20.720 --> 11:22.800
And so you can do cool demos.

11:22.800 --> 11:28.320
And so that's kind of like a very quick getting started with Dora.

11:28.320 --> 11:34.000
And yeah, so I just said speed is important.

11:34.000 --> 11:36.480
And we timed it, basically.

11:36.480 --> 11:39.360
And we saw that, with shared memory, it's very fast

11:39.360 --> 11:44.320
compared to TCP or HTTP, if you do API calls, especially when you

11:44.320 --> 11:45.680
have more and more data, right?

11:45.680 --> 11:50.720
So if you use HTTP and you have multiple cameras on the Reachy,

11:50.720 --> 11:53.680
we're going to do a demo with this one, which has three,

11:53.680 --> 11:55.360
it can take quite some time.

11:55.360 --> 11:57.840
And if you really want to have quick feedback,

11:57.840 --> 12:00.240
it's going to be very important.

12:00.240 --> 12:04.400
And also, we realized that not only is CPU memory

12:04.400 --> 12:06.480
sharing very important, you also have the VRAM,

12:06.480 --> 12:11.920
and you might want to do shared VRAM between nodes, right?

12:11.920 --> 12:14.800
And this is also possible with CUDA; hopefully in the future,

12:14.800 --> 12:16.240
we can do it with more.

12:16.240 --> 12:18.400
And it also saves a lot of time because you don't have to

12:18.400 --> 12:21.440
copy from CPU to CPU, or CPU to GPU.

12:21.440 --> 12:22.720
So yeah, all right.

12:22.720 --> 12:26.000
And this is all possible because we're using Apache Arrow,

12:26.000 --> 12:29.040
which is a unified memory format.

12:29.040 --> 12:31.600
And basically they say, OK, an array, it's like this.

12:31.600 --> 12:33.040
A float16 array is like this.

12:33.040 --> 12:35.680
And then you can use it from Python, C/C++, Rust.

12:35.680 --> 12:39.600
And there are already a lot of libraries that can use

12:39.600 --> 12:43.840
this unified memory format, from NumPy and Pandas to PyTorch.

12:43.840 --> 12:48.480
And the good thing is that you can use it not only on CPU, but also on GPU.

12:48.480 --> 12:53.120
So the latest benchmark on GPU is actually using Arrow GPU.

12:53.120 --> 12:57.200
And it's not just for CUDA; it will be able to

12:57.200 --> 13:02.000
scale to, for example, macOS or other types of platforms.

13:02.000 --> 13:05.200
And you will be able to share GPU memory from Python to Rust,

13:05.200 --> 13:05.760
hopefully.

13:05.760 --> 13:08.640
So that's kind of the idea. I haven't, like,

13:08.640 --> 13:12.000
developed this here, but there's a talk at this FOSDEM from Arrow,

13:12.000 --> 13:14.400
if you want, on Sunday, I think.

13:14.400 --> 13:15.600
Yeah.

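NOTE
A small, concrete version of the Arrow point: one buffer, viewed from
several libraries without copying. These are real pyarrow/NumPy/PyTorch
calls; the float32 data is just an example.
    import numpy as np
    import pyarrow as pa
    import torch
    arr = pa.array(np.arange(4, dtype=np.float32))  # Arrow float32 array
    np_view = arr.to_numpy(zero_copy_only=True)     # NumPy view, no copy
    tensor = torch.from_numpy(np_view)              # Torch tensor, same buffer
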
13:15.600 --> 13:18.400
And so if you want, OK, so that's very cool.

13:18.400 --> 13:23.040
If you want, we can do a last demo, which is with the Reachy.

13:23.040 --> 13:26.080
And so what we need to do, if you want to make it even more capable,

13:26.160 --> 13:30.000
it's just to connect the output of the VLM into a robot,

13:30.000 --> 13:32.320
and then we have a moving robot, right?

13:32.320 --> 13:37.440
We just saw that it was very able to describe stuff in the world.

13:37.440 --> 13:41.520
And so yeah.

13:41.520 --> 13:48.640
And yeah, yeah, okay.

13:48.640 --> 13:52.000
It's such an amazing part, yeah, okay.

13:52.000 --> 13:54.880
So yeah, so what you do is you take the VLM output,

13:54.880 --> 13:59.200
and then whatever it says, you're going to map it to an action.

13:59.200 --> 14:03.680
And so for example, if I try to say: hello Reachy,

14:03.680 --> 14:05.600
can you do a hand wave?

14:05.600 --> 14:07.440
It's going to be able to wave at me.

14:07.440 --> 14:11.040
And so basically, it's able to use this VLM description, right?

14:11.040 --> 14:12.480
And to do an action.

14:12.480 --> 14:14.720
And if I give it stuff, it's able to grab stuff;

14:14.720 --> 14:16.640
it's able to go left or right,

14:16.640 --> 14:20.400
depending on whether we're standing on the left or the right side of the camera.

14:20.400 --> 14:26.640
And also what we can do is have a speech-to-text model.

14:26.640 --> 14:31.120
And so what that enables is that now we have a fixed prompt,

14:31.120 --> 14:35.120
but we can also have a modifiable prompt, okay?

14:35.120 --> 14:39.920
So yeah, this is good.

14:39.920 --> 14:44.480
And so for example, it can get a napkin.

14:46.480 --> 14:47.680
Yeah, okay, that's fine.

14:48.640 --> 14:51.280
And so for example, with the speech-to-text,

14:51.280 --> 14:55.360
we just add a node; a separate process is going to run a Whisper model.

14:55.360 --> 15:00.240
And then it's going to be able to transcribe whatever we say to text,

15:00.240 --> 15:02.800
and then we're going to put the text into the prompt,

15:02.800 --> 15:05.760
and then we're going to be able to modify the VLM prompt, right?

15:05.760 --> 15:08.800
Instead of just having a fixed one, it's going to be able to,

15:08.800 --> 15:11.200
I think it's kind of stuck because of the mic, okay?

15:11.200 --> 15:13.200
And then it's going to, yeah.

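NOTE
A sketch of the speech-to-text node just described: a local Whisper model
turns recorded audio into text, which then replaces the fixed prompt. The
transcribe call is the real openai-whisper API; the audio file name and
prompt wiring are illustrative.
    import whisper
    model = whisper.load_model("base")         # small local Whisper model
    result = model.transcribe("question.wav")  # hypothetical recorded audio
    prompt = result["text"]                    # becomes the new VLM prompt
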
15:17.680 --> 15:19.680
Yeah, yeah.

15:23.360 --> 15:24.640
Yeah, that's kind of the idea.

15:24.640 --> 15:27.440
It's very, very simple stuff, nothing too complicated,

15:27.440 --> 15:31.600
but you might be like, wait, wait, it used to describe stuff.

15:31.600 --> 15:36.560
Now it's moving, that's kind of weird. The thing is, we trained it, basically.

15:36.560 --> 15:39.600
And how we trained it is that we collected a lot of data,

15:39.600 --> 15:42.400
and then we say, okay, so when I do this,

15:42.400 --> 15:44.640
you're going to wait, and you're going to say 'hand wave'.

15:44.640 --> 15:47.840
And when I do this, you're going to take a napkin,

15:47.840 --> 15:51.600
and then we're going to have a training pipeline

15:51.600 --> 15:56.240
that's going to be able to map the VLM output to those very keywords, right?

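NOTE
A sketch of the keyword mapping described here: scan the VLM's text
output for trained keywords and trigger a predefined move. The Robot
class and move names are hypothetical stand-ins for the real robot API.
    class Robot:
        def play_move(self, name: str) -> None:
            print(f"playing move: {name}")  # stand-in for motion playback
    ACTIONS = {"hand wave": "wave", "napkin": "grab_napkin"}
    def dispatch(robot: Robot, vlm_text: str) -> None:
        for keyword, move in ACTIONS.items():
            if keyword in vlm_text.lower():
                robot.play_move(move)
                break
    dispatch(Robot(), "The person is waving, so I should hand wave.")
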
15:57.200 --> 16:00.000
But this is just one way to do training,

16:00.000 --> 16:02.080
and hopefully there are multiple ways.

16:02.080 --> 16:04.640
We don't want to be tied to just one AI model, right?

16:04.640 --> 16:06.640
We don't want to be tied to any one type of AI model;

16:06.640 --> 16:09.920
you might have different types of stuff you want to do, right?

16:10.480 --> 16:14.320
And so yeah, this is all possible with Dora;

16:15.360 --> 16:20.080
we have a couple of training pipelines, and that's kind of the idea.

16:23.360 --> 16:24.480
Okay, what's next?

16:24.480 --> 16:28.400
Then basically what we're going to do is add even more models.

16:28.400 --> 16:33.440
So as you can see, right now the robot is only doing predefined stuff.

16:33.440 --> 16:39.040
And the thing is, we currently don't know how to do AI that can infer an action

16:39.120 --> 16:41.120
in an environment that it has never seen.

16:41.120 --> 16:42.480
It's kind of like extremely hard.

16:42.480 --> 16:45.600
If we're in the lab, and I know exactly where stuff is,

16:45.600 --> 16:48.480
and the light is always constant, I will be able to have

16:48.480 --> 16:51.040
predefined movements, and it's going to be able to grab stuff

16:51.040 --> 16:54.320
in a way that's kind of generalistic, but we haven't figured this out

16:54.320 --> 16:56.800
in a completely unseen environment,

16:56.800 --> 16:58.560
so that's why I can't really show it.

16:58.560 --> 17:00.640
But we're working really hard, okay?

17:00.640 --> 17:03.600
And we're getting there, and hopefully next year,

17:03.600 --> 17:07.200
we could have something like this, and we also want more robots.

17:07.360 --> 17:09.600
This one is a Reachy 2.

17:09.600 --> 17:11.280
Some parts of it are open source, and some parts

17:11.280 --> 17:13.680
are not open source, and I know that we're at FOSDEM,

17:13.680 --> 17:17.440
so hopefully we can have even more open source stuff.

17:17.440 --> 17:22.480
And we want it so that, directly from the pip installation,

17:22.480 --> 17:26.480
it just works on your computer as well, and to add even more demos

17:26.480 --> 17:28.240
from speech-to-text to text-to-speech.

17:28.240 --> 17:29.440
I said text-to-speech.

17:29.440 --> 17:32.080
I can maybe show you a text-to-speech demo later.

17:32.080 --> 17:35.840
And we also want, yeah, latency optimization,

17:35.840 --> 17:38.560
and to make Dora more mature.

17:38.560 --> 17:40.000
So thanks for listening.

17:40.000 --> 17:42.160
I think I may be a bit early, yeah, okay.

17:42.160 --> 17:46.320
And so yeah, this is FOSDEM 2024.

17:46.320 --> 17:48.880
It really helped us a lot.

17:48.880 --> 17:52.240
So we're super happy to be at FOSDEM 2025.

17:52.240 --> 17:54.560
And if you want, I can do just one last demo.

17:57.840 --> 18:04.160
And I can show you kind of a speech-to-speech demo, hopefully.

18:04.160 --> 18:08.400
And so, yeah, my screen is sharing, okay.

18:08.400 --> 18:16.720
And so the thing is, we have kind of

18:16.720 --> 18:20.800
dumbed down the robot to be able to move, so it's not able

18:20.800 --> 18:24.160
to talk to us anymore, but we're also trying to fix this.

18:24.160 --> 18:25.680
Basically, if you talk to the robot,

18:25.680 --> 18:28.000
it's just going to say 'wait' or, like, 'hand wave'.

18:28.000 --> 18:30.560
And so that's not really what we want.

18:30.560 --> 18:36.240
But I can show you with Qwen VL, if I say: hello, how are you?

18:36.240 --> 18:51.600
And so basically... it's working.

18:51.600 --> 18:52.320
What do you see?

18:52.320 --> 18:58.640
Stands in front of a chalkboard with two written on it.

18:58.800 --> 19:02.720
A person stands in front of a chalkboard with two hundred and two written on it.

19:06.720 --> 19:10.720
Yeah, so I'm kind of, I wasn't really too prepared.

19:10.720 --> 19:13.040
But basically, the idea is that you can ask it stuff.

19:13.040 --> 19:14.880
And I think there's a lot of noise.

19:14.880 --> 19:17.840
And then it's going to be able to reply to you in a way that's,

19:17.840 --> 19:18.720
like, comprehensible.

19:18.720 --> 19:22.720
And you can put in a text-to-speech node at the very end,

19:22.720 --> 19:24.320
so that it makes noises.

19:24.320 --> 19:27.120
But, like, it's kind of looping on itself now.

19:27.120 --> 19:29.440
And kind of not really working.

19:33.840 --> 19:35.120
OK.

19:35.120 --> 19:36.960
What is the man doing with his hands?

19:45.120 --> 19:47.440
The man is holding up two hearts with his fingers.

19:47.440 --> 19:50.720
So yes, that's kind of the idea.

19:50.720 --> 19:52.880
All right, that's all I have for today.

19:52.880 --> 19:54.320
Do you guys have any questions?

19:54.320 --> 19:56.960
Yes, we're standing in front of a chalkboard with the number two

19:57.920 --> 19:59.920
right now.

20:07.920 --> 20:11.920
So what is the hardware board that's processing this?

20:11.920 --> 20:13.600
Is it onboard?

20:13.600 --> 20:17.840
It's running on this computer.

20:17.840 --> 20:21.120
Also, you have a small computer on the robot,

20:21.120 --> 20:23.040
but that's just from the manufacturer itself.

20:23.040 --> 20:26.800
But the idea is that the software and the AI can run on any computer.

20:26.880 --> 20:29.360
Hopefully, if the computer is able to run it.

20:29.360 --> 20:33.440
So whether it's Linux, Mac, or whatever,

20:33.440 --> 20:36.560
we're trying to make it run anywhere.

20:36.560 --> 20:38.640
Yeah, yeah, for the software side.

20:38.640 --> 20:39.840
Yeah.

20:39.840 --> 20:41.440
Yeah.

20:41.440 --> 20:43.200
(Question from the audience, partly inaudible.)

20:43.200 --> 20:46.800
Your project is trying to solve a similar problem

20:46.800 --> 20:49.600
to what Android solved for phones?

20:49.600 --> 20:53.520
Yes, so your

20:53.520 --> 21:00.080
question is about whether I'm trying to do the same thing as Android did for phones,

21:00.080 --> 21:04.880
on robots, kind of, right? I wouldn't quite say that because, you know,

21:04.880 --> 21:09.040
Android is pretty big, but the idea is, at the very least,

21:09.040 --> 21:13.680
yeah, we really want to make it so that it's very easy for anyone

21:13.680 --> 21:19.760
to pipeline stuff and make it run and be able to pip install it.

21:19.760 --> 21:22.800
Hopefully, everything is open source.

21:23.360 --> 21:25.760
If people want to have, like, private closed-source things, feel free,

21:25.760 --> 21:29.120
that's totally fine, but yeah, that's kind of the idea.

21:29.680 --> 21:32.800
But the big thing is, like, you have to have good communication;

21:32.800 --> 21:39.780
everything is communicating with each other, yeah.

21:40.640 --> 21:41.860
Should I?

21:41.880 --> 21:45.040
Do you have a list of open source humanoid projects?

21:45.040 --> 21:49.040
I don't, actually. I could... yeah, but I don't.

21:49.040 --> 21:52.640
There's probably, like, a page for this somewhere, yeah.

21:52.640 --> 22:09.640
How did you come up with the performance differences with ROS, and especially, how do you feel this project is going to close the gap with something like ROS?

22:09.640 --> 22:17.640
Because I found ROS to be a very frustrating system to use, but I think a lot of people use it just because it has all these simulation tools and visualization tools.

22:18.640 --> 22:24.640
Completely agree. So actually I put ROS 2 here; it's kind of like, if you use ROS 2...

22:24.640 --> 22:30.640
ROS 2 has really been built for C++, and if you use it in Python it's very slow.

22:30.640 --> 22:37.640
So this is kind of a graph of what the difference is in Python, with no middleware, with ROS 2, and with Dora.

22:37.640 --> 22:44.640
And so normally Dora should be very fast compared to ROS 2 in Python. In C/C++, they also use shared memory,

22:44.640 --> 22:46.640
so the performance should be similar.

22:46.640 --> 22:53.640
And they have some form of GPU stuff, but apparently it's very hard as well.

22:53.640 --> 22:59.640
I think ROS 2 was built in a way that's okay: robotics for robotics people.

22:59.640 --> 23:02.640
And what I'm trying to do is robotics for AI people.

23:02.640 --> 23:11.640
And yeah, I think they have very good stuff as well on many different fronts that I may never be able to bridge.

23:11.640 --> 23:17.640
But the idea is, if we can just do very good work on AI, I think that's good enough for me.

23:25.640 --> 23:29.640
So I was curious

23:30.640 --> 23:37.640
how you found your interactions with the rest of Hugging Face and the LeRobot project.

23:37.640 --> 23:38.640
Do they, like, pull in people's work?

23:38.640 --> 23:44.640
We work closely. The thing is, basically, there's another guy working with me called Philip, but he's been away.

23:44.640 --> 23:49.640
He has a family thing, a good family thing, but he's not working at the moment.

23:49.640 --> 23:56.640
And so I'm just trying to make Dora progress, but hopefully in the future I can work more and more with Hugging Face and have more and more integration.

23:56.640 --> 24:06.640
There are a lot of things in common. I know there are also a lot of things that we can do with Dora that you can't do with the Hugging Face ecosystem at the moment.

24:06.640 --> 24:14.640
For me, for example, as we move forward, what we want to do is Rust underneath Python, right?

24:14.640 --> 24:20.640
So for example, the visualization that you saw, it's the Rerun visualization.

24:20.640 --> 24:25.640
It's fully Rust based; it's just packaged in Python, right?

24:25.640 --> 24:31.640
And what we want with Dora is to be able to bring all the Rust ecosystem to AI people, right?

24:31.640 --> 24:34.640
In a way that's very efficient as well.

24:34.640 --> 24:39.640
And so that's why we really want to integrate both, because Hugging Face has all the AI models and things.

24:39.640 --> 24:42.640
So how do we make those two combine, you know?

24:42.640 --> 24:46.640
That's kind of what we want to do, and we work really closely, and hopefully we can have more.

24:46.640 --> 24:51.640
But at the moment, I kind of don't have enough time to do it all, so that's kind of how it is.

24:51.640 --> 24:54.640
Yeah, we're going to wrap up. Okay.

24:54.640 --> 24:56.640
All right, so thank you very much for listening.

24:56.640 --> 24:59.640
Very happy to talk to you all.

