WEBVTT

00:00.000 --> 00:14.000
Hi everyone, my name is Son, and today I'm going to talk about a fun project that I do, like

00:14.000 --> 00:18.000
a weekend project that I started.

00:18.000 --> 00:24.000
It brings llama.cpp to the web using WebAssembly.

00:24.000 --> 00:30.000
So my talk is divided into these points: first I introduce myself,

00:30.000 --> 00:36.000
then why I do this, and then I show you how it works, some challenges

00:36.000 --> 00:40.000
that I faced on the way, and my plan for the future.

00:40.000 --> 00:46.000
So my name is Son. I'm a software engineer at Hugging Face. I joined Hugging Face, where I

00:46.000 --> 00:48.000
find myself very new.

00:48.000 --> 00:54.000
And I'm one of the llama.cpp active maintainers.

00:54.000 --> 01:02.000
My GitHub handle is ngxson, and my slogan is: doing AI for fun, not for profit.

01:02.000 --> 01:06.000
I actually had this way before I joined Hugging Face.

01:06.000 --> 01:15.000
Okay, so some of my work on llama.cpp: the first big thing that I did for the

01:15.000 --> 01:20.000
project was the chat template support, and then I did a refactoring of the

01:20.000 --> 01:24.000
support for low-rank adaptation.

01:24.000 --> 01:28.000
I'm also one of the core maintainers of the llama.cpp server.

01:28.000 --> 01:36.000
That's how we are now able to bring it into something called Hugging Face Inference Endpoints.

01:36.000 --> 01:46.000
One of the very, very big tasks that I'm actively doing is to refactor the

01:46.000 --> 01:54.000
multimodal support, especially the vision part, and it's still going nowhere, but it's a big thing.

01:54.000 --> 02:02.000
And one of the things I really want to do is to add a WebGPU backend into

02:02.000 --> 02:04.000
ggml.

02:04.000 --> 02:08.000
And if you go to GitHub, here's me, by the way.

02:08.000 --> 02:13.000
Okay, so: "But Son, this is serious stuff.

02:13.000 --> 02:17.000
Why don't we do something fun?"

02:17.000 --> 02:21.000
So let me show you what I mean by fun.

02:21.000 --> 02:26.000
Yeah, so you all know this guy from like 20 years ago.

02:26.000 --> 02:33.000
And now, what if I make it just a little bit smarter?

02:33.000 --> 02:35.000
Yeah.

02:35.000 --> 02:38.000
So hopefully this demo works.

02:38.000 --> 02:40.000
So it's Clippy.

02:40.000 --> 02:43.000
Yeah, nice.

02:43.000 --> 02:47.000
So the fun thing is that this runs directly in the browser.

02:47.000 --> 02:51.000
The model is already downloaded into local storage.

02:51.000 --> 02:57.000
It's not really local storage, but yeah, it's being cached in the browser.

02:57.000 --> 03:01.000
And the inference is done using WebAssembly.

03:01.000 --> 03:05.000
Yeah, so that's the fun thing.

03:05.000 --> 03:11.000
And behind this demo is a project that I made in my free time.

03:11.000 --> 03:13.000
It's called wllama.

03:13.000 --> 03:14.000
Yeah.

03:14.000 --> 03:17.000
So this machine has to get started.

03:17.000 --> 03:20.000
It might be interesting.

03:20.000 --> 03:27.000
Yeah, so why did I create it in the first place? Long story: a long time ago,

03:27.000 --> 03:30.000
when I hadn't joined Hugging Face yet,

03:30.000 --> 03:34.000
I was GPU poor, like very poor, not just poor.

03:34.000 --> 03:40.000
And I also wanted to push llama.cpp to the limit.

03:40.000 --> 03:45.000
Actually, to the limit of the hardware that I had at the time.

03:45.000 --> 03:49.000
And I was very inspired by whisper.cpp,

03:49.000 --> 03:53.000
the WebAssembly version that you can run directly in the browser.

03:53.000 --> 03:55.000
And also it's just for fun.

03:55.000 --> 03:56.000
Just for fun.

03:56.000 --> 04:01.000
To make my own, not to compete with production-ready frameworks out there,

04:01.000 --> 04:05.000
like ONNX Runtime or WebLLM.

04:05.000 --> 04:06.000
Okay.

04:06.000 --> 04:08.000
So what is the goal?

04:08.000 --> 04:11.000
The first goal is to create a wrapper,

04:11.000 --> 04:12.000
a JS and

04:12.000 --> 04:13.000
TypeScript library,

04:13.000 --> 04:16.000
so that you, as a web developer,

04:16.000 --> 04:18.000
can use it in your project

04:18.000 --> 04:21.000
by running just one command, npm install.

04:21.000 --> 04:23.000
It's strongly typed,

04:23.000 --> 04:24.000
in TypeScript,

04:24.000 --> 04:25.000
with zero dependencies.

04:25.000 --> 04:26.000
It's amazing.

04:26.000 --> 04:31.000
So let me show you what that means.

04:31.000 --> 04:34.000
So this little demo that I showed here,

04:34.000 --> 04:37.000
that you see here:

04:37.000 --> 04:39.000
how did I make it?

04:39.000 --> 04:43.000
Under the hood, I import this.

04:43.000 --> 04:46.000
By the way, this is a React.js project.

04:46.000 --> 04:48.000
So I import it.

04:48.000 --> 04:49.000
It's a library.

04:49.000 --> 04:51.000
And then I download this model,

04:51.000 --> 04:52.000
a GGUF here.

04:52.000 --> 04:54.000
I just pull the model from Hugging Face.

04:54.000 --> 04:57.000
Then I create a chat completion.

04:57.000 --> 05:00.000
I add a system message that says, hey, you know,

05:00.000 --> 05:03.000
you are Clippy.

05:03.000 --> 05:05.000
Okay.

05:05.000 --> 05:06.000
So that's it.

05:06.000 --> 05:09.000
I also have another demo that you can see

05:09.000 --> 05:14.000
right on the GitHub page here.

05:14.000 --> 05:16.000
And the demo is more functional.

05:16.000 --> 05:19.000
You have like a list of models that you can try.

05:19.000 --> 05:21.000
And then you can chat with it.

05:21.000 --> 05:22.000
Yeah.

05:22.000 --> 05:23.000
This is a demo.

05:23.000 --> 05:25.000
So yeah.

05:26.000 --> 05:29.000
So now the complicated part.

05:29.000 --> 05:33.000
It's the technical, like, deeply technical part.

05:33.000 --> 05:38.000
It might not be very interesting, but yeah, very unique.

05:38.000 --> 05:39.000
Okay.

05:39.000 --> 05:43.000
So there's one thing called Emscripten.

05:43.000 --> 05:50.000
And it's the thing that allows users to compile C

05:50.000 --> 05:55.000
or any C++ project into WebAssembly.

05:55.000 --> 05:57.000
So that should be simple.

05:57.000 --> 06:00.000
I just take Emscripten and then compile.

06:00.000 --> 06:01.000
Right?

06:01.000 --> 06:04.000
It turns out it's not that straightforward.

06:04.000 --> 06:09.000
I had many challenges on the way.

06:09.000 --> 06:13.000
But there are four main challenges

06:13.000 --> 06:14.000
that I found

06:14.000 --> 06:16.000
and that I wanted to share with you today.

06:16.000 --> 06:17.000
Yeah.

06:17.000 --> 06:21.000
So how it works is:

06:21.000 --> 06:27.000
Firstly, from the perspective of WebAssembly, everything is bytes,

06:27.000 --> 06:31.000
not a string or, like, a number.

06:31.000 --> 06:35.000
So what I ended up doing in the first place

06:35.000 --> 06:40.000
is that I added a small wrapper, a thin wrapper,

06:40.000 --> 06:46.000
that accepts JSON from the JS world.

06:46.000 --> 06:50.000
And then it translates it into native API calls,

06:50.000 --> 06:55.000
C++ API calls to the llama.cpp library.

06:55.000 --> 06:57.000
It works pretty well.

06:57.000 --> 07:02.000
But then I realized that it's a lot of work because I have to,

07:02.000 --> 07:05.000
like, parse the JSON in the JS world

07:05.000 --> 07:08.000
and in the C++ world.

07:08.000 --> 07:10.000
And it can be extremely slow.

07:10.000 --> 07:15.000
For example, when I call the tokenizer's tokenize function,

07:15.000 --> 07:19.000
which returns a bunch of tokens, like thousands of tokens,

07:19.000 --> 07:20.000
it starts to slow,

07:20.000 --> 07:21.000
slow down.

07:21.000 --> 07:24.000
So I am now moving away from that

07:24.000 --> 07:26.000
in favor of a binary protocol,

07:26.000 --> 07:29.000
which is inspired by protobuf.

07:29.000 --> 07:30.000
But I invented it.

07:30.000 --> 07:32.000
I'm not copying it.

07:32.000 --> 07:34.000
I am inventing a new one here.
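

NOTE
A minimal sketch of why a binary layout can be cheaper than JSON for large token arrays. The layout below (a uint32 count followed by int32 token ids, little-endian) and the names encodeTokens and decodeTokens are assumptions for illustration, not the protocol described in the talk.
```typescript
// Hypothetical wire layout: [uint32 count][int32 token] x count, little-endian.
function encodeTokens(tokens: number[]): Uint8Array {
  const buf = new ArrayBuffer(4 + 4 * tokens.length);
  const view = new DataView(buf);
  view.setUint32(0, tokens.length, true); // header: number of tokens
  tokens.forEach((t, i) => view.setInt32(4 + 4 * i, t, true));
  return new Uint8Array(buf);
}
// Reads the same layout back without any text parsing.
function decodeTokens(bytes: Uint8Array): number[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const count = view.getUint32(0, true);
  const out: number[] = [];
  for (let i = 0; i < count; i++) out.push(view.getInt32(4 + 4 * i, true));
  return out;
}
```
With this layout, N tokens always take 4 + 4 * N bytes, and neither side has to serialize or parse text.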

07:34.000 --> 07:35.000
Yeah.

07:35.000 --> 07:40.000
The next thing on my list is the file system.

07:40.000 --> 07:41.000
Yeah.

07:41.000 --> 07:44.000
We all know the GGUF format that we have.

07:44.000 --> 07:46.000
We know it and we use it.

07:46.000 --> 07:50.000
The problem is that in the early days,

07:50.000 --> 07:51.000
OK.

07:51.000 --> 07:56.000
I just loaded this using the default file system of

07:56.000 --> 07:58.000
Emscripten.

07:58.000 --> 07:59.000
It works,

07:59.000 --> 08:02.000
but it's using too much memory.

08:02.000 --> 08:03.000
Why?

08:03.000 --> 08:05.000
I started to dig deeper into the code.

08:05.000 --> 08:09.000
And it turns out to be the mmap function.

08:09.000 --> 08:13.000
llama.cpp uses this function for a reason:

08:13.000 --> 08:19.000
because we want to minimize copies of the file.

08:19.000 --> 08:20.000
OK.

08:20.000 --> 08:24.000
So when I looked into the source code of Emscripten,

08:24.000 --> 08:25.000
what does it do?

08:25.000 --> 08:28.000
It's exactly the opposite of that.

08:28.000 --> 08:30.000
It first allocates memory.

08:30.000 --> 08:32.000
That is why.

08:32.000 --> 08:33.000
So what does it do here?

08:33.000 --> 08:35.000
It's basically a copy.

08:35.000 --> 08:38.000
It copies chunks of the file.

08:38.000 --> 08:40.000
So why is that not nice?

08:40.000 --> 08:45.000
Because you end up using double the memory when you load a file.

08:45.000 --> 08:49.000
If you have 200 megabytes, you end up using 400.

08:49.000 --> 08:50.000
Yeah.

08:50.000 --> 08:51.000
So yeah.

08:51.000 --> 08:52.000
This is what it looks like.

08:52.000 --> 08:53.000
I drew it.

08:53.000 --> 08:54.000
Let me explain it.

08:54.000 --> 08:57.000
First, the file is fetched from the network.

08:57.000 --> 09:00.000
So it copies the file into the worker,

09:00.000 --> 09:04.000
where it's stored temporarily in a buffer in JavaScript.

09:04.000 --> 09:08.000
And each time I call mmap, it copies the buffer again into the

09:08.000 --> 09:10.000
heap memory of the WebAssembly runtime.

09:10.000 --> 09:11.000
OK.

09:11.000 --> 09:13.000
So what is the solution?

09:13.000 --> 09:16.000
I invented my own thing called heapfs,

09:16.000 --> 09:21.000
which uses the Streams API of the browser.

09:21.000 --> 09:27.000
And instead of having to temporarily write the file into a buffer

09:27.000 --> 09:29.000
inside the worker, inside JavaScript,

09:29.000 --> 09:37.000
I stream each chunk directly to the heap memory of the WebAssembly runtime.

09:37.000 --> 09:39.000
So what now?

09:39.000 --> 09:41.000
What about the mmap function?

09:41.000 --> 09:49.000
So I ended up patching the function to return a pointer directly to the location

09:49.000 --> 09:51.000
of the file in the heap memory.

09:51.000 --> 09:55.000
So this is how the function looks.

09:55.000 --> 09:58.000
You don't need to care about the first part.

09:58.000 --> 10:02.000
You just need to care about the last two lines,

10:02.000 --> 10:05.000
where the pointer I return is the

10:05.000 --> 10:09.000
heap pointer to the file plus the position in the file

10:09.000 --> 10:11.000
that I want to map.

10:11.000 --> 10:13.000
So it just returns a pointer,

10:13.000 --> 10:15.000
not a copy.
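

NOTE
A simplified model of the two mmap behaviours described above, assuming the file's bytes already sit in one contiguous region of the heap. The names heap, HeapFile, storeFile, mmapCopy and mmapZeroCopy are hypothetical and are not Emscripten or wllama APIs. There are no bounds checks, to keep the sketch short.
```typescript
const heap = new Uint8Array(1024); // stands in for the WebAssembly heap
interface HeapFile { ptr: number; size: number } // where the file's bytes live in the heap
let nextFree = 0; // trivial bump allocator
// Write a file's bytes into the heap once and remember where they are.
function storeFile(data: Uint8Array): HeapFile {
  const file: HeapFile = { ptr: nextFree, size: data.length };
  heap.set(data, file.ptr);
  nextFree += data.length;
  return file;
}
// Copying mmap: allocates a second region and duplicates the requested bytes.
function mmapCopy(file: HeapFile, position: number, length: number): number {
  const ptr = nextFree;
  heap.copyWithin(ptr, file.ptr + position, file.ptr + position + length);
  nextFree += length;
  return ptr;
}
// Zero-copy mmap: the heap pointer to the file plus the position in the file.
function mmapZeroCopy(file: HeapFile, position: number): number {
  return file.ptr + position;
}
```
In this model the copying variant grows memory use by the mapped length on every call, while the zero-copy variant allocates nothing.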

10:15.000 --> 10:16.000
OK.

10:16.000 --> 10:19.000
The next thing is offline storage.

10:19.000 --> 10:20.000
OK.

10:20.000 --> 10:24.000
So we never want that, each time you run the model,

10:24.000 --> 10:25.000
each time you run it,

10:25.000 --> 10:30.000
you need to re-download a one-gigabyte model or something.

10:30.000 --> 10:35.000
So in the early days, I used something called Cache Storage,

10:35.000 --> 10:38.000
which is a very nice thing.

10:38.000 --> 10:40.000
It's easy to use.

10:40.000 --> 10:43.000
But the storage is very limited.

10:43.000 --> 10:45.000
And also it does not support streams.

10:45.000 --> 10:47.000
Each time I read something,

10:47.000 --> 10:48.000
it makes a copy:

10:48.000 --> 10:49.000
first it loads

10:49.000 --> 10:52.000
the file into browser memory.

10:52.000 --> 10:53.000
OK.

10:53.000 --> 10:56.000
So I turned my attention to something called IndexedDB.

10:56.000 --> 10:58.000
Yes, it's better.

10:58.000 --> 11:02.000
But the problem is that, at least on Firefox,

11:02.000 --> 11:05.000
when it's stored to disk,

11:05.000 --> 11:08.000
it's stored inside an SQLite database.

11:08.000 --> 11:11.000
It also does not support streams.

11:11.000 --> 11:15.000
Maybe it does, but it requires hacking.

11:15.000 --> 11:18.000
The nice thing is that it does not have, like, a hard cap,

11:18.000 --> 11:21.000
a maximum capacity.

11:21.000 --> 11:22.000
Yeah.

11:22.000 --> 11:24.000
But still, it doesn't have streams,

11:24.000 --> 11:27.000
so it does not benefit me.

11:27.000 --> 11:32.000
So in the end, I turned my attention to something called OPFS,

11:32.000 --> 11:34.000
or origin private file system.

11:34.000 --> 11:37.000
So what it does is that, when it's used,

11:37.000 --> 11:41.000
a fun thing is that when you store a file into OPFS,

11:41.000 --> 11:45.000
it actually stores the file as a real file on the file system,

11:45.000 --> 11:46.000
on your disk.

11:46.000 --> 11:50.000
And I reverse engineered it to know that; I can show it to you.

11:50.000 --> 11:51.000
Yeah.

11:51.000 --> 11:53.000
It also supports streams.

11:53.000 --> 11:55.000
Yes, and I will show you.

11:55.000 --> 11:56.000
It's very cool.

11:56.000 --> 11:57.000
Okay.

11:57.000 --> 12:01.000
And it also does not have a maximum capacity.

12:01.000 --> 12:04.000
Most browsers just calculate this maximum capacity

12:04.000 --> 12:08.000
based on the free space on your disk.

12:08.000 --> 12:09.000
Okay.

12:09.000 --> 12:14.000
So how it looks now is that when I open the file,

12:14.000 --> 12:17.000
the browser gives me a File object.

12:17.000 --> 12:21.000
Think of it a bit like a file descriptor, not a copy of the file.

12:21.000 --> 12:23.000
It's like a pointer to the file.

12:23.000 --> 12:24.000
Yeah.

12:24.000 --> 12:28.000
And then I can stream it back to the browser,

12:28.000 --> 12:32.000
sorry, to the worker, to the WebAssembly heap.

12:32.000 --> 12:34.000
So there's no extra copy.

12:34.000 --> 12:35.000
Yeah.

12:35.000 --> 12:36.000
Yeah.

12:36.000 --> 12:40.000
So how that looks, maybe in a little bit more detail:

12:40.000 --> 12:42.000
just look at the function signature.

12:43.000 --> 12:48.000
It returns a promise of the File, or null in the case where,

12:48.000 --> 12:50.000
for example, the file does not exist.

12:50.000 --> 12:54.000
But the most important thing is that it returns this File.

12:54.000 --> 12:55.000
Okay.

12:55.000 --> 12:58.000
So what's very cool about

12:58.000 --> 13:00.000
this is that now

13:00.000 --> 13:04.000
I can not only use

13:04.000 --> 13:07.000
the file provided by OPFS,

13:07.000 --> 13:10.000
but I can also load my own file.

13:14.000 --> 13:18.000
Each browser has this file button, for when you want to upload some file to the

13:18.000 --> 13:19.000
Internet.

13:19.000 --> 13:23.000
In that case, the browser returns a File object with a stream.

13:23.000 --> 13:26.000
Yeah.

13:26.000 --> 13:27.000
Let's go.

13:27.000 --> 13:30.000
But then it's too slow.

13:30.000 --> 13:32.000
Even with

13:32.000 --> 13:34.000
these flags, these two flags.

13:34.000 --> 13:36.000
The first flag is to enable support for SIMD,

13:36.000 --> 13:43.000
single instruction, multiple data, and the second flag is to activate the

13:43.000 --> 13:52.000
level 3, the level 3 optimization of the compiler, which should allow

13:52.000 --> 13:56.000
something called vectorization. So what does it mean,

13:56.000 --> 14:01.000
vectorized? It picks up on a loop and tries to use

14:01.000 --> 14:08.000
SIMD on that. But the second flag failed to do that.

14:08.000 --> 14:12.000
The solution is obviously to rewrite these functions in

14:12.000 --> 14:16.000
SIMD. Some of you might already know where I'm going.

14:16.000 --> 14:21.000
Yeah, especially. But then I only have one

14:21.000 --> 14:24.000
Sunday, and my Sunday is not just for the computer.

14:24.000 --> 14:27.000
I also have a wife. Please.

14:27.000 --> 14:40.000
Okay, so. Yeah. I just asked it. And that works, just

14:40.000 --> 14:48.000
kind of, maybe. But at least 2x speed, double the speed on

14:48.000 --> 14:53.000
most of the quantizations. And almost triple the speed on

14:53.000 --> 14:56.000
the quantization to do it. Which is this?

14:56.000 --> 15:00.000
It's a bit quantization. Okay.

15:00.000 --> 15:03.000
Something I plan to do in the future: firstly is

15:03.000 --> 15:08.000
WebGPU, which is still in very early development.

15:08.000 --> 15:12.000
I still have a lot of trouble with this. But I have a very simple

15:12.000 --> 15:16.000
POC pull request up for now. I have to rewrite the kernels

15:16.000 --> 15:19.000
in something called WGSL, the WebGPU Shading Language.

15:19.000 --> 15:24.000
I have to fix some issues on mobile. For example, the most important

15:24.000 --> 15:29.000
is about iOS devices. Then I also want it to be

15:29.000 --> 15:32.000
compatible with something called WASI, which is the WebAssembly

15:32.000 --> 15:36.000
System Interface. It should allow it to, just

15:36.000 --> 15:39.000
for example, read the file directly from the file

15:39.000 --> 15:42.000
system. Here I mean integrating it to run

15:42.000 --> 15:46.000
without a browser. Yeah. I also want to

15:46.000 --> 15:49.000
support LoRA adapters, which should be

15:49.000 --> 15:51.000
something very simple to do. I just don't have

15:51.000 --> 15:54.000
the time. Yeah. It's a weekend project, by the

15:54.000 --> 15:57.000
way. It's still here. Okay. And the last thing I

15:57.000 --> 16:01.000
want, it's not the last thing, it's just the

16:01.000 --> 16:04.000
last thing here on the list, is the support for

16:04.000 --> 16:08.000
multimodal. It's still waiting for my

16:08.000 --> 16:13.000
refactoring on llama.cpp. But yeah, I want to do it

16:13.000 --> 16:17.000
very seriously. So I'm taking care, I'm taking my

16:17.000 --> 16:20.000
time on this. So, that's it. Thank you for your

16:20.000 --> 16:28.000
attention. Thank you. Thank you very much. We have

16:28.000 --> 16:32.000
time for one question. If somebody has a

16:32.000 --> 16:36.000
question and enough energy to ask it. Okay. Thank

16:36.000 --> 16:39.000
you very much. Thank you. Thank you. Thank you.

16:43.000 --> 16:48.000
Thank you. Thank you. Thank you.

