WEBVTT

00:00.000 --> 00:10.160
Hey everybody, thanks a lot for coming, much appreciated.

00:10.160 --> 00:18.020
So I am deadprogram, some people call me Ron, a couple of humans call me Dad, that's not

00:18.020 --> 00:24.180
really too important, I am deadprogram, I'm not dead though, not yet, I'm a technologist

00:24.180 --> 00:30.380
for hire, that means that I will, in fact, in exchange for fiat currency, create code of

00:30.380 --> 00:35.700
indeterminate quality, naturally, and I have a small boutique consultancy called The Hybrid

00:35.700 --> 00:41.620
Group, where we create the software that makes your hardware work, and some of us work

00:41.620 --> 00:46.500
on a few open source projects, one of which is TinyGo, a very cool project,

00:46.500 --> 00:52.340
Gram talked about it a bit, and you're going to hear more about using TinyGo, that's a temporary

00:52.340 --> 01:01.060
condition, thank you by the way, also GoCV, which are the Go wrappers for OpenCV,

01:01.060 --> 01:09.060
so, computer vision using WebAssembly, first question: why? I mean, why would you do that?

01:09.060 --> 01:14.660
Well, I'm glad you asked that question, so what we're really talking about is industrial

01:14.660 --> 01:20.300
computer vision systems, right, like the serious stuff, generally running on an embedded

01:20.300 --> 01:24.020
Linux device, so we're not talking about the browser at all, forget the browser, I mean

01:24.020 --> 01:29.020
don't forget the browser, but we're not talking about web assembly and browsers, usually

01:29.020 --> 01:32.660
edge computing devices of some kind, so they're very close to the problem they're trying

01:32.660 --> 01:38.740
to solve, and often also cloud computing devices running in giant server farms in

01:38.740 --> 01:44.980
mysterious locations, so wasmVision applications are in areas like agriculture, transportation,

01:44.980 --> 01:49.540
energy, security, and of course manufacturing, which is what I've been doing the last

01:49.540 --> 01:54.420
few years, building batteries using some of this stuff, so a lot of computer

01:54.420 --> 02:00.140
vision applications have the same basic structure, really the same basic design patterns,

02:00.140 --> 02:05.500
but there are lots of different parts to integrate, and it's naturally hard to run on different

02:05.500 --> 02:09.780
kinds of hardware, if you have a heterogeneous environment in your factory or something

02:09.780 --> 02:14.060
like that, you know generally it's been quite difficult to get it to work, also very hard

02:14.060 --> 02:21.820
to update, I mean who, who among us has not bricked a really important device, no, no blame,

02:21.820 --> 02:27.100
no harm, no judgment, all right, and there's also some need for customization, it's not

02:27.100 --> 02:31.380
just a matter of tweaking a few parameters, there is some intelligence behind how people

02:31.380 --> 02:35.140
need to configure these systems, in order to get them to do whatever the job is, they're

02:35.140 --> 02:40.300
supposed to be getting done, so that's the reason why we created wasmVision, which

02:40.300 --> 02:44.500
is basically intended to help you get going with computer vision, handling the basic design

02:44.500 --> 02:50.020
patterns that you would normally need, which is capturing video, processing this video in some

02:50.020 --> 02:56.460
sort of way, and then saving it or streaming it or making some type of REST or gRPC or

02:56.460 --> 03:00.900
other type of call to get something else to happen as a result of whatever it is, ingest

03:00.900 --> 03:10.340
some data, so wasmVision consists of a command line interface, the wasmVision engine,

03:10.340 --> 03:17.380
and then wasmVision processors, so the engine is written in Go, Go is a great language

03:17.380 --> 03:24.220
for concurrency and you can do a lot of things at one time, it then combines OpenCV, anyone

03:24.220 --> 03:32.380
here use OpenCV or know OpenCV? OpenCV is really the gold standard of real-world computer

03:32.380 --> 03:37.420
vision, it's been around for, I don't know, 15 or 20 years, it's got hundreds if not thousands

03:37.420 --> 03:42.540
of algorithms, an untold number of students and postdoctoral candidates have worked on Open

03:42.540 --> 03:50.900
CV, also FFmpeg, which is really an amazing open source tool for processing video, and

03:50.900 --> 03:58.260
GStreamer, using GoCV to control all of this, and of course wazero, we've got one

03:58.260 --> 04:09.020
of the maintainers of wazero here, give them a big hand, so all, all of this is combined

04:09.020 --> 04:15.220
into a single statically linked binary with absolutely no external dependencies, yes, this

04:15.220 --> 04:20.780
all combines together, so here's the architecture, oh wait, sorry, different architecture,

04:20.780 --> 04:26.020
you can see this, so here's the actual architecture of wasmVision, so the wasmVision

04:26.020 --> 04:32.020
engine, we capture some video, then we process that using the wasm runtime, in this case

04:32.020 --> 04:38.740
wazero, each one of these frames is then passed to an individual processor, and we can

04:38.740 --> 04:42.980
do different things by either capturing or outputting the different streams using those

04:42.980 --> 04:48.820
capabilities I mentioned earlier, so wasmVision processors, they can be written using

04:48.820 --> 04:54.540
Go, specifically using TinyGo, obviously we like TinyGo, they can also be written

04:54.540 --> 05:00.820
using Rust, very cool language, and C, yes, no one ever got fired for choosing C for your

05:00.820 --> 05:05.380
industrial system, right, I know everyone here is like, yeah, but do I have to, I don't

05:05.380 --> 05:09.820
know, you're going to have to talk to someone else if you have to, but you can, so this

05:09.820 --> 05:14.820
is all possible because of something called wasmCV, so wasmCV is a series of interfaces

05:14.820 --> 05:19.380
for computer vision, specifically for wasm, it's one of the projects that we released

05:19.380 --> 05:24.260
at the end of last year, and it's basically these interfaces, specifically for computer vision,

05:24.260 --> 05:31.380
based around the patterns used in OpenCV, so it's defined using WIT, the WebAssembly

05:31.380 --> 05:36.860
Interface Type definition, it's like an IDL, an interface definition language,

05:36.860 --> 05:43.740
and it's part of the wasm component model, however wasmVision and wasmCV are not actually

05:43.740 --> 05:50.740
using the wasm component model, reasons for that, we just don't actually need it, we're

05:50.740 --> 05:55.340
just using WIT to generate the wrappers that we use in these different programming languages

05:55.340 --> 06:01.260
and also the documentation, also important note, on the host side we're using something called

06:01.260 --> 06:06.100
Wypes, that was actually created by Orsinium, by Gram, who gave a great talk earlier,

06:06.100 --> 06:09.740
on the host, that's how we're doing the integration, so on the client, we're using

06:09.740 --> 06:15.260
these wrappers that are generated using WIT with wasmCV, and then on the host side we're

06:15.260 --> 06:22.060
using Wypes, so quick review on how computer vision works, the basic fundamental unit is

06:22.060 --> 06:27.100
the mat, or the matrix, so I actually explain things with LEGOs, that way I understand them,

06:27.100 --> 06:33.780
as best I can, so then the mat can also have red, green, and blue colors, actually generally

06:33.780 --> 06:38.900
it's blue, green, and red, but that's not important right now, look, they're stacked

06:38.900 --> 06:43.060
together in this kind of order, and this is just the way that this information is represented

06:43.060 --> 06:50.980
inside of OpenCV, then with OpenCV, the capture process takes a mat, passes it to some type

06:50.980 --> 06:56.980
of image processing, like perhaps a Gaussian blur, and then passes it to some other type

06:56.980 --> 07:01.100
of processing, for example, a deep neural network to do some type of other processing or

07:01.100 --> 07:07.820
recognition, all right, so how WIT works, very quickly, it's got interfaces where we define a

07:08.140 --> 07:15.420
collection of functions, it's got basic types, like floats, ints, and such that we can

07:15.420 --> 07:19.740
define, it's got something called records, which are basically like structs, where the information

07:19.740 --> 07:25.820
is actually passed between the host and the guest in WebAssembly, and then resources,

07:25.820 --> 07:30.140
the resources are very interesting and important as you'll see, it's something that's actually

07:30.140 --> 07:35.900
entirely owned by the host, but that the guest can call, so the information is not actually

07:35.980 --> 07:40.540
passed over entirely into WebAssembly, some people would like to do everything in WebAssembly,

07:40.540 --> 07:45.740
I'm not actually a fan of that, and you'll see why in a few minutes. So let's take a very,

07:45.740 --> 07:51.980
very quick look at the WIT file for the world used by wasmCV, so we see we have a world,

07:51.980 --> 07:58.140
it's got basic types, including the mat, and if we look at the mat, we see that we've got a definition

07:58.140 --> 08:04.940
using a resource, we can construct a mat, clone it, close it, see how many columns and rows are in

08:04.940 --> 08:12.060
that image and so on, and then we've got different basic core functions, for example drawing circles,

08:12.060 --> 08:19.260
rectangles, and in this case performing a blur function, all right, so this is the definition

08:19.260 --> 08:25.180
in the WIT format that we're then going to use to generate our wrappers, so everything is

08:25.180 --> 08:31.020
generated from these WIT files, we don't do anything manually, so the wasmCV docs themselves

08:31.020 --> 08:36.060
are also generated, so here we can take a look at the automatically generated docs created

08:36.060 --> 08:41.740
for the mat that we just saw before in the IDL, so let's take a quick look at some wasm

08:41.740 --> 08:49.980
Vision processors in Go, who here has used Go? Excellent, so you're familiar with packages,

08:49.980 --> 08:56.300
so wasmCV is just a Go package, so we just say package wasmcv, and everything is generated

08:56.300 --> 09:01.420
from those WIT files using wit-bindgen-go, which is a set of tooling maintained by the

09:01.420 --> 09:06.380
Bytecode Alliance, which helps generate very nice idiomatic wrappers for Go from those WIT files,

09:07.420 --> 09:12.780
and you build it just using TinyGo, you just say tinygo build -o blur.wasm,

09:13.340 --> 09:19.500
our target is wasm-unknown, we generally don't use WASI P1 or P2, it's just raw,

09:20.380 --> 09:27.180
hardcore minimalist WebAssembly, so let's take a look at the code for the blur processor,

09:28.140 --> 09:33.660
very quickly, so here is our TinyGo program, package main, like any other Go program,

09:33.660 --> 09:39.500
we import wasmCV, the mat, and types, and then we have a process function, this is really

09:39.500 --> 09:44.700
the entry point for all of the wasmVision processors, and it always is the same, it passes

09:44.700 --> 09:49.100
in a single parameter, which is the image that we'd like to process in the form of a mat,

09:49.260 --> 09:54.780
and then it returns a mat, that way you could do whatever processing, and then return the image of

09:54.780 --> 10:03.340
whatever it is you ended up with, and so in this case we're going to say cv.Blur with a kernel of 25

10:03.340 --> 10:08.860
by 25, so a 25 by 25 pixel blur area, we're going to log some info and then just return

10:08.860 --> 10:17.340
that blur image, so now let us see if the demo gods will curse me or what they will do, so let's see,

10:18.300 --> 10:26.460
make blur, all right, we're running, so you're wondering where do I see it, but of course in

10:26.460 --> 10:32.140
wasmVision we stream everything, it's totally headless, right, so here I am, I'm very blurry,

10:32.140 --> 10:43.900
nice to meet you, all right, thank you, so

10:44.540 --> 10:51.740
Rust, wasmVision processors in Rust, it's a crate, of course it is, who here is using Rust?

10:53.340 --> 10:59.020
Cool, very cool language for doing things in WebAssembly, so we use wit-bindgen, which is one of the tools

10:59.020 --> 11:05.260
used for doing this wrapper generation, again idiomatic Rust is being created, and how we build it,

11:05.260 --> 11:10.940
cargo build, again target wasm32-unknown-unknown, because we're going for the most minimal possible

11:10.940 --> 11:17.180
module size, and then let's take a look at the Rust code, so the Rust code is a little bit longer,

11:17.180 --> 11:23.420
but not very much, no_std because it runs raw, bare, no operating system, we've got our

11:23.420 --> 11:28.780
external crate that we're bringing in for wasmCV, and then here's our external function,

11:28.780 --> 11:34.220
again it's a process function, what does it take as its parameter, a mat, what does it return,

11:34.300 --> 11:41.500
a mat, and of course what does it do, it calls cv::blur, this is starting to seem familiar, right,

11:41.500 --> 11:47.100
passing in the kernel of 25x25, it's doing the exact same thing except in Rust, and then we return

11:47.100 --> 11:55.900
that processed image, so now let's see about the demo gods once again, so if we run,

11:55.900 --> 12:06.300
all right, so that seems to work, so far, so good, and then still blurry, it's the exact same

12:06.300 --> 12:14.940
blur, it works in Rust, amazing, and of course the programming

12:14.940 --> 12:23.660
language of our forefathers, C, it literally is my grandfather's language now, so we include

12:23.660 --> 12:31.260
header files in C, who here has used C, who here uses C frequently today, not too many,

12:31.820 --> 12:38.780
it's not a dying language, it's just sort of on the long kiss goodnight, so we use

12:38.780 --> 12:44.380
wit-bindgen again to generate the things, and naturally it's very easy to compile your C code using

12:44.380 --> 12:52.060
clang, now to be fair, both the TinyGo compiler and the Rust compiler are doing all of

12:52.060 --> 12:56.380
these same things, it's just they're hiding it away from you, now hopefully that's all right,

12:56.380 --> 13:00.060
as long as you know what it's doing, if you ever do run into a problem where you don't know

13:00.060 --> 13:04.460
however, it might be a little tricky, so I guess maybe that's one benefit, let's take a quick look

13:04.460 --> 13:10.060
at the C processor code, so here we've got, again, the process function, we pass it the mat,

13:10.780 --> 13:16.620
we do our blur function, passing that size of 25x25, and then we return it,

13:18.300 --> 13:23.900
do you see a pattern forming here, by any chance, and so if I say make blur,

13:25.900 --> 13:33.260
oh, make says it doesn't know what that is, so if we say make blur-c, once again, we're running the same thing

13:34.220 --> 13:43.660
and I'm still blurry, amazing, three languages, same blur, all right, let's take a very brief look

13:43.660 --> 13:49.260
at a few of the processors, how are we doing on time, we're running out, so little to do and lots

13:49.260 --> 13:55.260
of time, no wait, strike that, reverse it, so basically we provide processors with

13:55.260 --> 13:59.900
basic building blocks built in, but they can be chained together so you can take the

13:59.900 --> 14:04.540
output of one into the next one, you know, because again, it's a pipeline, so the pattern

14:04.540 --> 14:08.700
is, you know, you can create your own processors, so a quick tour of some of the included ones,

14:08.700 --> 14:13.740
asciify, this one is a particular favorite of mine, so asciify is written in Go,

14:13.740 --> 14:19.420
we process our mat, we take it and resize it, we then convert it to ASCII art,

14:19.420 --> 14:23.580
because we're actually living in the matrix, I mean, I thought that was what was happening,

14:23.580 --> 14:26.540
and then we return that, so let's see if it works,
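The heart of an asciify step is just a brightness-to-character lookup: darker pixels map to sparser glyphs. A minimal standalone sketch in Go, where the ramp string and helper name are hypothetical, not the actual wasmVision processor code:

```go
package main

import "fmt"

// ramp orders characters from darkest to brightest.
const ramp = " .:-=+*#%@"

// asciify maps one row of 0-255 grayscale values to ASCII characters.
func asciify(row []uint8) string {
	out := make([]byte, len(row))
	for i, v := range row {
		// Scale 0..255 onto the ramp indices 0..len(ramp)-1.
		out[i] = ramp[int(v)*(len(ramp)-1)/255]
	}
	return string(out)
}

func main() {
	fmt.Println(asciify([]uint8{0, 64, 128, 192, 255}))
}
```

Run that over every row of a resized grayscale mat and you have the matrix-movie effect.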

14:30.220 --> 14:41.820
I'm actually, it should open now, let's go for it, here I am, hello, I'm alive, wait,

14:41.820 --> 14:47.340
I'm in the box, I thought I was supposed to think outside the box, now you might be wondering,

14:47.340 --> 14:56.300
so what about the streaming, like it's still streaming, I look pretty good in the form of bits,

14:56.300 --> 15:07.020
I have to say, all right, let's keep going. The mosaic processor, so we could basically call

15:08.220 --> 15:13.740
any of the capabilities that OpenCV provides, one of them is loading deep neural networks,

15:13.740 --> 15:20.620
so I really like the capability of taking a fast neural style transfer network and putting

15:20.620 --> 15:25.980
myself through it just to see what happens. So again, process function, now one thing worth

15:25.980 --> 15:31.100
noting here is that when we're calling this, we're loading this model, so

15:31.100 --> 15:35.340
wasmVision can do the same sort of thing as a lot of other cool hipster software nowadays,

15:35.340 --> 15:39.500
which is, if you don't have the needed assets or models or files, it will automatically

15:39.500 --> 15:45.260
download them for you, so this will download the mosaic ONNX file from one of the servers and

15:45.260 --> 15:50.700
put it on your machine if you don't already have it. So we take it, we convert our image to this

15:50.700 --> 15:55.500
blob, which is the format we need to pass into a deep neural network. We set our network's

15:55.500 --> 16:01.020
input, we pass it through the network to do the processing, and then based on those results,

16:01.020 --> 16:05.660
we're going to take them and then display them back into this new image that we return.
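Mechanically, "converting the image to a blob" means rearranging interleaved height x width x 3 BGR bytes into a planar 1 x 3 x H x W float32 tensor, the NCHW layout most networks expect. A standalone sketch with a hypothetical helper; the real GoCV BlobFromImage also handles scaling and mean subtraction:

```go
package main

import "fmt"

// toBlob rearranges interleaved BGR pixel data (H x W x 3 bytes) into a planar
// float32 tensor in NCHW order: all blue values, then all green, then all red.
func toBlob(pixels []uint8, h, w int) []float32 {
	blob := make([]float32, 3*h*w)
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			for c := 0; c < 3; c++ {
				// Source index: row-major, channels interleaved.
				src := (y*w+x)*3 + c
				// Destination index: one full plane per channel.
				dst := c*h*w + y*w + x
				blob[dst] = float32(pixels[src])
			}
		}
	}
	return blob
}

func main() {
	// One 1x2 image: pixel0 = BGR(1,2,3), pixel1 = BGR(4,5,6).
	fmt.Println(toBlob([]uint8{1, 2, 3, 4, 5, 6}, 1, 2))
}
```

The style transfer output comes back in the same planar layout, which is why the results have to be reassembled into a mat before being returned.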

16:06.860 --> 16:14.060
All right, so a lot of stuff, but quickly, so let's see what happens in the land of the mosaic.

16:14.540 --> 16:27.980
So far, so good. Yes, and here I am in the form of an artistic mosaic. This is my best form.

16:30.620 --> 16:36.460
Now, this is all processing entirely on the CPU in my machine. I'm not using any GPUs whatsoever,

16:36.460 --> 16:42.300
and it's got actually pretty good performance, so don't sell your Nvidia yet, but

16:44.300 --> 16:51.740
so let's stop that real quick. All right, one of the processors we have uses Ollama,

16:51.740 --> 16:57.420
and here it is, the royal llama. So one of the capabilities of Ollama is

16:57.420 --> 17:02.300
what are known as vision models. Vision models are generally used for one of two things. One is to create

17:02.300 --> 17:07.580
images from text, the other is to create text from images. That's what I'm going to do. So here's a

17:07.580 --> 17:11.420
quick look at the architecture. We've got the wasmVision engine that's doing our capturing.

17:11.420 --> 17:16.540
Then using the Ollama processor, it's passing each frame to the LLaVA model in Ollama.

17:17.580 --> 17:25.260
Let's take a very brief look at the code. So same pattern, once again, we process. We load the

17:25.260 --> 17:31.900
configuration. We create some JSON that we're going to pass to Ollama. We pass that JSON along with

17:31.900 --> 17:37.340
the image that we've just captured, and then let its results come back, and then display them.

17:38.300 --> 17:47.260
So, let us see what happens. Actually, first, let's make sure I'm actually running it.

17:49.260 --> 17:54.780
And it doesn't look like it. So we say docker start ollama.

18:00.540 --> 18:05.660
All right. So it's running. So if I say

18:07.420 --> 18:13.260
docker exec ollama, and run the nano LLaVA model, which is a very small LLaVA model.

18:14.940 --> 18:20.860
All right. So let's see if it's actually working. This one does use the GPU.

18:25.900 --> 18:33.820
So if I run nvidia-smi, I'm not seeing Ollama showing up in there, am I?

18:37.340 --> 18:41.740
It should be. I guess we'll find out very quickly if it's actually doing it or not, won't we?

18:42.620 --> 18:50.140
All right. So if I say make ollama, that should now run our processor. So how do we know if

18:50.140 --> 18:58.460
it's working? Well, it should actually display some text. So it should be looking at me passing it

18:58.460 --> 19:04.540
through the deep neural network using this LLaVA model. And then, if all works well,

19:04.620 --> 19:08.140
the fan kicked in. So this is the CPU. I guess I didn't get the GPU to work.

19:09.020 --> 19:15.020
But it'll still happen. It'll just be very slow. So in a little bit, it will come up with a

19:15.020 --> 19:21.180
description of what it thinks it's seeing, which is hopefully me. Now generally, it's not very

19:21.180 --> 19:26.540
complimentary. It actually makes kind of rude comments about my physical appearance, to be

19:26.620 --> 19:34.860
honest. I hear it churning away. It's thinking very, very hard. We're getting so close.

19:37.900 --> 19:42.780
Well, now the fan is working. So it's doing something. Oh, wait, here's something. The fan just

19:42.780 --> 19:48.380
kicked in doubly. Now it's really trying hard. We'll give it one more minute and we'll go to the next

19:48.380 --> 19:53.740
thing because I mean, who has time for this? Plus, it was a little gratuitous. I'll be honest.

19:54.700 --> 20:05.420
All right, three, two, one. Sorry. No llama today. All right. But you got the idea.

20:05.420 --> 20:12.860
It actually does work if I get the GPU on. All right, video drone. So this is a flight control

20:12.860 --> 20:18.540
station using face tracking with a machine learning model. For this, I'm going to use my DJI

20:18.540 --> 20:24.300
Tello drone that I brought with me and plug in my DualShock joystick that I stole from one of my

20:24.300 --> 20:31.100
kids. Let's just say I've spent a lot of time with this particular control surface.

20:31.980 --> 20:39.580
One of my favorite things, playing with joysticks. Here is the actual drone. Let's see if we can

20:39.580 --> 20:52.060
take a look real quick if I say make show video. Then I should have my camera. Yes, here we go.

20:52.060 --> 20:59.420
So we can take a look, here's the drone. The Tello drone, I plug it in. I must turn it on.

20:59.740 --> 21:08.300
It's breathing. We know that because there's a light. Wait, where's the light?

21:12.300 --> 21:19.340
Maybe I turned it off. Oh, there we go. Okay. Excellent. I've got the joystick. You see that there.

21:20.460 --> 21:27.100
I'm tangled in these cables. But that'll be fun. For this, I'm going to use something called

21:27.100 --> 21:31.980
Gobot, which is written in Go, and I'm going to use a YuNet face detection model. Quick review,

21:31.980 --> 21:37.660
computer vision models take the images, pass them through different layers of neural networks,

21:37.660 --> 21:42.860
in order to basically come up with a prediction about whether or not they see a face. With video

21:42.860 --> 21:46.780
drone, we're going to fly the drone using that joystick. We're going to use wasmVision to

21:46.780 --> 21:51.340
process the images, and then based on that, we're going to display whether or not we actually see

21:51.340 --> 22:01.260
a human. A quick look at the code. We start the joystick. We start the drone. That's basically it.
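Underneath, the flight loop is largely a mapping from joystick axis readings to drone velocity commands. A pure-function sketch of that mapping, where the axis range, dead zone, and scale are assumptions for illustration, not the Gobot API:

```go
package main

import "fmt"

// stickToSpeed maps a raw joystick axis reading (assumed -32768..32767,
// as a DualShock-style controller reports) to a drone speed command in
// -100..100, with a small dead zone so a centered stick means hover.
func stickToSpeed(axis int) int {
	const deadZone = 2000
	if axis > -deadZone && axis < deadZone {
		return 0
	}
	return axis * 100 / 32768
}

func main() {
	fmt.Println(stickToSpeed(0))      // centered: hover
	fmt.Println(stickToSpeed(32767))  // stick fully forward
	fmt.Println(stickToSpeed(-16384)) // stick halfway back
}
```

The dead zone matters in practice: without it, sensor noise on a centered stick keeps nudging the drone.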

22:02.860 --> 22:07.420
The one thing that's interesting is we're going to proxy the video from the drone to wasm

22:07.420 --> 22:13.660
Vision, because remember it's got capabilities for streaming, and we can do that. And now let us see

22:14.620 --> 22:25.660
what will happen. First we must connect to it. We'll select the drone. Not that one.

22:26.940 --> 22:36.220
I was a little too quick on the click. There we go. So we should be connected to the drone.

22:37.180 --> 22:51.180
I am plugged in. We are running, I don't want to run that. I want to say run drone stream.

22:56.380 --> 23:02.700
Switch over to my other one. I have to run multiple versions of Go at the same time just because

23:02.700 --> 23:18.220
I'm like that. Which one was it? That's right. I forgot. If we make drone, no, not

23:18.220 --> 23:25.180
drone stream. I guess it would be video drone. So theoretically we're connected to the drone.

23:25.900 --> 23:28.700
Let's actually flip over to the video so we can see what it's seeing.

23:33.420 --> 23:44.060
Not seeing anything. Should be connected. Oh, there we go. That's better.

23:45.500 --> 23:53.020
All right, so that's the drone. And let's see, trying to remember the controls. I forgot to

23:53.020 --> 23:59.820
look at the controls. That's always fun. Let's see. As I recall, this makes it fly.

24:03.340 --> 24:07.820
Yeah, that was it. All right, so let's see if it's actually, oh my god, it sees humans.

24:09.420 --> 24:14.460
Now, one thing that's really good to know about drones in general is that battery life is very limited.

24:15.260 --> 24:19.660
So you don't have to be faster than the drone, just faster than your friend. I think that's how that

24:19.740 --> 24:23.740
one goes. It's about an eight minute battery life. I probably should have warned you in the front

24:23.740 --> 24:31.580
row before I threw the drone in the air. Yeah, all right, so it takes a look at the pilot.

24:32.860 --> 24:36.780
Takes a look at the other humans. Let's get some altitude.

24:39.660 --> 24:43.660
Yeah, there are quite a few of you in here, aren't there. See, oh, this is it.

24:43.660 --> 24:47.020
wasmVision with live drone flight. It actually works.

24:54.700 --> 25:00.540
Well, I'm probably out of time and I've already done enough of this flying. So, come home

25:00.540 --> 25:05.580
little bird. And that's it.

25:14.060 --> 25:21.740
There are my landing messages: land, land, land, land, land, land, really, please land. All right, so

25:22.860 --> 25:28.700
the end is here. So was that fun? Did you have fun with that? Yeah, yeah.

25:28.700 --> 25:34.620
For industrial stuff, that's pretty fun, right? So version 0.2 was just released right before

25:34.620 --> 25:38.540
this conference. Conference-driven development, it is a thing. Check it out,

25:38.540 --> 25:47.180
wasmvision.com. Follow us on the Fediverse or on Bluesky. And thank you very much.

