WEBVTT

00:00.000 --> 00:20.800
Yes, I started this library that's called libobscura to support cameras on Linux

00:20.800 --> 00:26.400
because cameras are difficult and they should not be difficult, they should be rather easy.

00:27.040 --> 00:33.040
And yeah, why am I here? Because I want to continue talking about this.

00:34.640 --> 00:39.840
Well, the first presentation I gave was at 38C3, at the Congress. How many of you have been to the Congress?

00:39.840 --> 00:45.360
Hands up, hands up? Well, there are some people. Yeah, so I'm going to go a bit deeper than that.

00:45.920 --> 00:51.040
But if you haven't been, it doesn't really matter. The story started with the Librem 5.

00:51.120 --> 00:57.520
Oh, I forgot mine, but how many of you have a Librem 5? Oh, I'm so proud.

00:57.520 --> 01:00.800
When I was working on the Librem 5, we had to support the cameras at some point.

01:00.800 --> 01:04.800
And this is where the story started. I had to write some kind of a library for the cameras.

01:04.800 --> 01:10.720
And it turns out that it's kind of difficult, because it's not using USB.

01:10.720 --> 01:13.280
Like, you have a webcam, you have a webcam in your laptop.

01:13.280 --> 01:17.760
You have a different kind of camera on your mobile phone, and like,

01:17.760 --> 01:22.960
webcams run over USB. Typically, most cameras in most laptops run over USB.

01:22.960 --> 01:29.920
And USB is quite amazing because it does everything for you on the camera. But with cameras inside the phone,

01:29.920 --> 01:34.080
you want to have more control, because for the person who is using the camera on the phone,

01:34.080 --> 01:37.280
like, it's probably the primary camera that they are using.

01:37.280 --> 01:41.840
And they want really good quality. They want everything to be done properly and powerfully.

01:41.840 --> 01:44.880
And like, not just built into the tiny controller in the camera,

01:44.880 --> 01:50.480
but actually running on the main CPU with, like, all the computational photography stuff and so on.

01:50.480 --> 01:56.880
And this has become a problem in 2025 for Linux because Linux phones are becoming popular.

01:56.880 --> 02:00.560
So it's a problem whose existence is a sign of actually making it.

02:00.560 --> 02:05.280
And yeah, USB cameras are, to be fair, quite nice.

02:05.280 --> 02:09.200
Everything is inside. You see all those, all those elements.

02:09.280 --> 02:12.800
There's like a sensor, scalers, statistics, white balance, a debayerer, and so on.

02:12.800 --> 02:18.000
Those are not optional, except for maybe the JPEG encoding.

02:18.000 --> 02:21.520
So they're really nice. You just ask for a picture and you get a picture.

02:21.520 --> 02:25.280
You even get a native JPEG, for free. This is amazing.

02:25.280 --> 02:28.880
Meanwhile, on the Librem 5, you don't have that.

02:28.880 --> 02:32.480
You don't have the components which are required to build a camera.

02:32.480 --> 02:36.400
Because they are not part of the standard — which isn't really a standard anyway.

02:36.400 --> 02:39.360
Like, there isn't actually that much of a standard.

02:39.360 --> 02:43.920
There is like a statistics unit, which is supported in hardware,

02:43.920 --> 02:46.640
but it is not supported in the driver. I was able to make it work.

02:46.640 --> 02:50.480
Kind of. And, yeah.

02:50.480 --> 02:53.760
So, is USB really that different? Like,

02:53.760 --> 02:58.000
why are people not using USB on the phones instead?

02:58.000 --> 03:03.040
Maybe, like, there is enough support to provide raw pictures, not just JPEGs.

03:03.120 --> 03:07.760
There is enough support to create your own noise reduction algorithm.

03:07.760 --> 03:11.280
And maybe there is some place for computational photography.

03:11.280 --> 03:13.600
But why are they not using it?

03:13.600 --> 03:18.240
I think it might have something to do with complexity of the USB protocol itself.

03:18.240 --> 03:24.000
Or maybe — the USB protocol that controls the camera is called UVC —

03:24.000 --> 03:26.640
maybe it's just not powerful enough, regardless.

03:26.640 --> 03:28.800
It can do everything, but maybe it's not powerful enough.

03:28.800 --> 03:33.920
As you can see, this is like the device diagram of a USB device.

03:33.920 --> 03:38.240
And the frames come from the top from the sensor all the way down to the bottom

03:38.240 --> 03:39.600
through all the processing nodes.

03:40.720 --> 03:43.680
But this is quite simple. It just goes, step, step, step, step, done.

03:43.680 --> 03:48.800
And the image comes out at the end. But on an Android phone, the diagram can look like this.

03:50.160 --> 03:56.640
In USB, I have not found a way to actually do some extra routing to the side and back.

03:56.640 --> 04:04.480
And so this might be one of the reasons why USB didn't really catch on in more advanced photography.

04:06.560 --> 04:10.880
There are also problems where you produce several pictures at the same time.

04:10.880 --> 04:15.040
Basically, this is one of the few laptops that doesn't use USB.

04:15.040 --> 04:20.320
And this is a complicated pipeline that you have to support in any library.

04:20.320 --> 04:26.640
On the Librem 5, it looks kind of like this again. The problem is, for all those different diagrams,

04:26.640 --> 04:32.400
if you bring a camera library that just does the right thing, that makes it simple.

04:32.400 --> 04:35.440
Or rather, if you want to make it simple, it has to do the right thing, basically.

04:36.160 --> 04:43.680
For this set of different devices, you have like one device here, another device here, one more device here.

04:43.680 --> 04:49.920
You have to configure them all in such a way that the frame from the top is in a format compatible

04:49.920 --> 04:54.880
with the input on the next level.

04:54.880 --> 05:01.920
And then it gets transformed. And then the output on the bottom has to be the same as the input from the top of the next device.
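
The matching rule above can be sketched in a few lines. In this hypothetical Rust snippet (all names invented for illustration; this is not libobscura's real API), a pipeline is a chain of devices, and it is consistent when each device's output format equals the next device's input format:

```rust
// The matching rule: in a linear pipeline, every device's output
// format must equal the next device's declared input format.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Format {
    pub width: u32,
    pub height: u32,
    pub fourcc: [u8; 4], // pixel format code, e.g. *b"BA10" for raw Bayer
}

pub struct Device {
    pub input: Option<Format>, // None for the sensor, which has no input
    pub output: Format,
}

// Check every adjacent pair of devices in the chain.
pub fn pipeline_is_consistent(chain: &[Device]) -> bool {
    chain
        .windows(2)
        .all(|pair| pair[1].input == Some(pair[0].output))
}
```

For example, a sensor outputting raw frames feeding a converter that declares the same raw input passes the check; a mismatched size or pixel format fails it.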

05:03.280 --> 05:07.200
So, yeah, this is the pipeline. Now let's make it a bit complicated.

05:07.200 --> 05:11.360
This pipeline creates an image in the raw format on the top.

05:11.360 --> 05:14.880
Then there's some kind of a processing device. And this processing device,

05:15.840 --> 05:22.000
combines the image maybe, and it creates two images. One image for saving as a photo,

05:22.000 --> 05:28.080
when you click the shutter, but another image so you can preview the photo before actually committing to it.

05:29.440 --> 05:33.840
So, let's imagine what that looks like.

05:34.720 --> 05:39.280
So, for example, for the sensor, it's quite easy. You only have output.

05:39.280 --> 05:50.240
Well, it can probably include some scaler inside, whatever. It can have two outputs: one 1000 pixels by 500 pixels, another 2000 by 1000 pixels.

05:51.280 --> 05:56.880
And then there's the processing node, which takes an image of any size in a particular format,

05:56.880 --> 06:00.560
and outputs two different images in two different formats. So, okay.

06:01.760 --> 06:09.200
You have two different notations here. This is a notation with no variables.

06:09.200 --> 06:12.880
And there is like a variable here. Look at that. There's an A. There's a B.

06:12.880 --> 06:20.240
And there's half an A, and there's half a B. So, quick exercise: on output two over here,

06:21.600 --> 06:27.360
which possible sizes could we get? Come on. Come on. There are multiple options.

06:27.360 --> 06:34.960
There are multiple answers. Okay, I'm going to solve this for you. If you have an input

06:35.920 --> 06:43.280
of, like, A by B, the output is A/2 by B/2. So, you can get from this option —

06:43.280 --> 06:50.160
from this output from the sensor, you get either — yeah, you get a 1000... wait, no,

06:50.720 --> 07:03.920
500 by 250. Or, from this option, you get 1000 by 500. Right? So, this is like a kind of a pipeline.
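
The exercise just worked through can be brute-forced mechanically, which is exactly what a solver does. Here is a toy Rust sketch of it (all names invented for illustration):

```rust
// The sensor from the exercise offers two modes.
fn sensor_modes() -> Vec<(u32, u32)> {
    vec![(1000, 500), (2000, 1000)]
}

// The processor's constraint: an A x B input becomes an A/2 x B/2 output.
fn process(input: (u32, u32)) -> (u32, u32) {
    (input.0 / 2, input.1 / 2)
}

// Every output the pipeline can produce: (500, 250) and (1000, 500).
fn possible_outputs() -> Vec<(u32, u32)> {
    sensor_modes().into_iter().map(process).collect()
}

// Answer a query like "which sensor mode gives a 1000-pixel-wide output?"
// by trying every mode, the way a constraint solver enumerates candidates.
fn modes_for_output_width(width: u32) -> Vec<(u32, u32)> {
    sensor_modes()
        .into_iter()
        .filter(|&m| process(m).0 == width)
        .collect()
}
```

Asking `modes_for_output_width(1000)` yields only the 2000 by 1000 sensor mode, matching the answer above.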

07:03.920 --> 07:10.400
And it has a sensor, and inputs, and certain outputs. And basically, if you want to

07:11.440 --> 07:16.320
use a camera that kind of passes the frames from top to bottom, your library has to

07:16.320 --> 07:26.000
compute everything. So, that is matching. Yeah. So, oh, welcome to the Sudoku conference.

07:26.720 --> 07:33.120
Have you had previous experience with Sudoku? Hands up, hands up? Yes. Do you enjoy Sudoku?

07:34.000 --> 07:41.280
Oh, there are people who enjoy Sudoku, even. What is Sudoku? Sudoku is a kind of a game where

07:41.280 --> 07:47.440
you fill in numbers based on other numbers. And those are the three principles of Sudoku.

07:47.440 --> 07:54.320
The first principle is that every row has to have distinct numbers from one to nine.

07:55.200 --> 08:03.520
As you can see, all numbers are distinct. And also, the same applies to the columns.

08:04.160 --> 08:11.200
In every column — in column A and column B and column C — every number is distinct.

08:12.480 --> 08:20.000
And the final property is that within every three-by-three square, the numbers have to be distinct.
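
The three rules can be written down directly as code. This Rust sketch expresses them as a validity check over a 9x9 grid, where 0 marks an empty cell; a full solver would additionally search over the empty cells, but the constraints themselves are just this:

```rust
// Helper: true if the non-empty values in the iterator are all distinct.
fn all_distinct(values: impl Iterator<Item = u8>) -> bool {
    let mut seen = [false; 10];
    for v in values.filter(|&v| v != 0) {
        if seen[v as usize] {
            return false;
        }
        seen[v as usize] = true;
    }
    true
}

fn is_valid(grid: &[[u8; 9]; 9]) -> bool {
    // Rule 1: every row holds distinct digits.
    (0..9).all(|r| all_distinct(grid[r].iter().copied()))
    // Rule 2: every column holds distinct digits.
    && (0..9).all(|c| all_distinct((0..9).map(|r| grid[r][c])))
    // Rule 3: every 3x3 box holds distinct digits.
    && (0..9).all(|b| {
        let (r0, c0) = (b / 3 * 3, b % 3 * 3);
        all_distinct((0..9).map(|i| grid[r0 + i / 3][c0 + i % 3]))
    })
}
```

An empty grid is trivially valid; put the same digit twice in one row, column, or box, and the check fails.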

08:21.200 --> 08:26.720
Okay, that's the rough idea. Now, does it look like something? I think it looks like something.

08:27.680 --> 08:32.080
It looks like source code. It looks like a solver. It looks like

08:33.040 --> 08:40.080
Prolog. Actually, we just now created a solver. Okay, well,

08:40.080 --> 08:45.200
I'm simplifying it a little bit. There needs to be some extra supporting code, but it doesn't do anything

08:45.200 --> 08:54.400
magical. We created a solver, a Sudoku solver. There we go. And if we try to use that solver,

08:54.960 --> 09:00.320
it's actually really simple. You can try this procedure yourself. Just scan the QR code,

09:00.480 --> 09:07.760
go to the URL. You give it a table, and it solves the constraints that we presented previously.

09:07.760 --> 09:16.480
Those constraints are going to get solved if you just give a table to this kind of

09:16.480 --> 09:23.760
code. It's going to basically fill in the spaces, which are signified here by underscores.

09:24.400 --> 09:30.160
And this is just a fully declarative way of doing things. All we did was describe

09:30.160 --> 09:33.760
the constraints that are present, and everything else is solved by the computer.

09:37.600 --> 09:43.440
Does this look familiar? We have a kind of a constraint here. The constraint on the sensor

09:43.440 --> 09:49.200
is that it outputs certain resolutions. The constraint on processing units is that it expects certain

09:49.280 --> 09:56.960
resolution, and it produces a certain different resolution. So maybe you can write it like this.

09:58.080 --> 10:05.680
A sensor defines a valid relationship of width, height, and format. And that's

10:05.680 --> 10:12.960
basically the same thing as we already looked at. And the processor creates a different relationship.

10:13.600 --> 10:20.720
It's a relationship between the input frame and output frames. And the relationship is that

10:20.720 --> 10:27.600
one of the output frames, the big one, is the exact same resolution with just a different format.

10:28.320 --> 10:36.720
And for the preview, it's half the resolution and, again, a different format. So if you create a little

10:36.720 --> 10:48.400
bit of a solver with those constraints, let me explain that. The solver gets a query which defines

10:48.400 --> 10:59.040
for a sensor that behaves like this. It takes some kind of input values. And then we have a processor

10:59.040 --> 11:05.360
that takes the same input values and returns a frame that is 1,000 pixels wide.

11:06.320 --> 11:11.440
What kind of values are there that we expect from the sensor? Basically, kind of a query. You could

11:11.440 --> 11:20.240
do the same thing with SQL, but it wouldn't be so nice. Now, a production system wouldn't look

11:20.240 --> 11:25.920
like that. There's actually, if you remember, there was an entire tree of different nodes in a

11:25.920 --> 11:31.360
real device. So this is a very simplified version of that. But if you traverse the entire tree,

11:31.440 --> 11:36.880
this is probably not something that you can do in SQL very nicely. So if you have a solver like that,

11:37.760 --> 11:42.000
you can support any arbitrary device with any arbitrary configuration.

11:43.680 --> 11:52.800
And that's why the kernel should expose this information easily to the user space, so that the libraries

11:52.800 --> 12:02.800
can actually do this. But it doesn't. So this was news to me — I mean, I kind of knew that,

12:02.800 --> 12:07.200
but I realized that last week again, when I was trying to add support to this laptop so that I could

12:07.200 --> 12:15.440
make a presentation demo. So there's one problem. There are calls where, whenever you set

12:15.440 --> 12:22.480
an output or an input on a device — please use this resolution — the device would tell you

12:22.480 --> 12:26.800
which resolution it actually uses. But no: everything depends on everything. If you set a resolution here,

12:26.800 --> 12:32.000
the possible resolutions somewhere else change. The format changes. You can't make sense

12:32.000 --> 12:39.120
out of it. And not even that — I won't go into detail on this. There is not even a consensus

12:39.120 --> 12:45.600
in kernel space on whether devices should tell you

12:45.600 --> 12:52.400
which resolutions they support. And this has strong consequences. You can't write a driver

12:52.400 --> 12:58.640
in user space without knowing everything about your devices. So the driver might work because

12:58.640 --> 13:05.520
like, yeah, the devices have some commonalities, but the driver will not use the devices fully.

13:06.240 --> 13:13.040
And that also means that if you have a new design, a new set of devices, you must upgrade the user

13:13.040 --> 13:20.160
space software. It will otherwise not work perfectly. It might — I'm

13:20.160 --> 13:27.040
not going to get into details — kind of, sort of work. But in the end, you basically

13:27.040 --> 13:32.240
have to write the kinds of constraints that we already wrote before. You have to write those constraints

13:32.240 --> 13:37.920
for every device that you want to support in order to support it properly. So every library has to

13:37.920 --> 13:43.680
have something like that. But one of the principles that I want to follow with

13:43.680 --> 13:49.840
libobscura is to make things simple. And one of the things that are not simple is finding memory

13:49.840 --> 13:55.760
problems. Debugging them is especially not simple. So I am using Rust for this library.

13:56.480 --> 14:01.920
And Rust has this nice property that's called borrowing. This is like the prototypical interface

14:02.000 --> 14:08.560
for taking frames. Let's have like a stream of frames coming from the camera and ask the

14:08.560 --> 14:13.680
stream for a next frame. This is like the basic loop for doing anything you want. And then you process

14:13.680 --> 14:20.560
the buffer, put it back, take the next frame, and loop. So let's try to break this.
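
Here is a minimal sketch of what such a borrowing interface could look like; the type and method names are invented, not libobscura's actual API. The trick is that the returned frame borrows the stream mutably, which is exactly what makes the "break it" attempt below fail to compile:

```rust
struct Stream {
    counter: u8,
    buf: [u8; 4], // stand-in for a real frame buffer
}

struct Frame<'a> {
    data: &'a [u8],
}

impl Stream {
    // Borrows `self` mutably for as long as the returned Frame lives,
    // so only one frame can be held at a time.
    fn next_frame(&mut self) -> Frame<'_> {
        self.counter = self.counter.wrapping_add(1);
        self.buf = [self.counter; 4];
        Frame { data: &self.buf }
    }
}

fn demo() -> ([u8; 4], [u8; 4]) {
    let mut stream = Stream { counter: 0, buf: [0; 4] };

    let frame = stream.next_frame();
    let mut first = [0u8; 4];
    first.copy_from_slice(frame.data);
    // `frame` is no longer used past this point, so its borrow ends
    // and taking the next frame is allowed.
    let frame = stream.next_frame();
    let mut second = [0u8; 4];
    second.copy_from_slice(frame.data);

    // This, however, would NOT compile -- the first borrow stays live:
    // let a = stream.next_frame();
    // let b = stream.next_frame(); // error[E0499]: cannot borrow `stream`
    // println!("{:?} {:?}", a.data, b.data);

    (first, second)
}
```

The commented-out lines are the "clever" move the talk tries next: keeping the old frame while asking for a new one, which the borrow checker rejects.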

14:20.560 --> 14:28.480
Let's try to take a frame out of this without putting it back. And then try to take the next frame.

14:29.280 --> 14:34.480
This is actually clever. This is the reason that I'm using Rust. If you try

14:34.480 --> 14:40.480
to take a new frame without putting the old frame back, you're going to get an

14:40.480 --> 14:48.560
error. Look at that. This example basically shows you, you take one frame over here.

14:49.120 --> 14:53.520
And then you're trying to get clever. You're trying to lose this memory here. You're trying to lose

14:53.600 --> 14:59.600
this buffer. What happened here? You're trying to lose this buffer and process it somewhere else.

14:59.600 --> 15:04.400
Like maybe you save this memory and you're trying to be sneaky, send it to another process or something.

15:05.440 --> 15:11.680
And then you're trying to take the next frame. But, but, oh gosh, what happened here?

15:15.520 --> 15:18.400
No! Come on.

15:23.680 --> 15:36.960
But, well — I swear, computers... Computers are a mistake. I sincerely believe computers are a mistake.

15:38.320 --> 15:42.800
So when you try to be clever and lose the first frame in order to get the next frame,

15:42.800 --> 15:49.360
Rust is going to tell you: no, you are not going to do that, because you still haven't returned

15:49.360 --> 15:54.240
the first frame. And, yeah, that's one of the reasons that I chose Rust. But there's another

15:54.240 --> 16:00.480
interface that is even better, if you really want to go crazy. So, that is the

16:00.480 --> 16:10.000
past of libobscura. In the future, I'm going to try to support more devices and write better

16:10.000 --> 16:13.920
documentation because the documentation is also kind of part of simplicity and user friendliness.

16:14.560 --> 16:19.520
And, yeah, the scripting engine that I showed you before, the one in Prolog — I want to continue working

16:19.520 --> 16:27.200
on it. Also, I want to give thanks to the Prototype Fund, which is funding my plan to take

16:27.280 --> 16:31.200
over the world. So, thanks, questions.

16:31.200 --> 16:49.040
Do you feel like, from what you're learning by building libobscura, it's actually

16:49.840 --> 16:58.240
a little bit better? Is it becoming a better library than libcamera? I think it's like,

16:58.240 --> 17:03.920
by the fact that it's not supporting as much hardware, it's definitely better. I think

17:03.920 --> 17:10.320
I am trying to make it an experiment and figure out how things could be done to make it perfect,

17:11.040 --> 17:17.680
or, like — basically, an experiment means I can fail without bothering other people.

17:17.680 --> 17:22.000
So, I'm just trying to go crazy and see maybe some of this craziness is actually really good.

17:34.640 --> 17:40.720
So, was it tested on the Librem 5 or another phone? Actually, this library wasn't tested there.

17:40.720 --> 17:45.440
I started working on something else before libobscura, and that works on the Librem 5 —

17:45.440 --> 17:51.840
that's what we shipped. But this library is not quite a continuation. This one is

17:51.840 --> 17:55.840
a new project compared to that. So — I've been trying to make it work on the Librem 5

17:55.840 --> 18:01.120
for the past two weeks or something, and I haven't done it yet. Not finished.

18:01.200 --> 18:15.840
Okay, now the questions — let's move on to the ones from the internet. Will libobscura take on any ISP colours?

18:16.480 --> 18:24.400
ISP colours? ISP, as in image signal... Will libobscura take on any ISP colour stuff? I'm not quite sure

18:24.480 --> 18:32.640
what is meant by that. But there is, like, a library — ISP is image signal processor — so, I created a

18:32.640 --> 18:39.440
library that is called crispy, and it does GPU image processing. So, it kind of does some of the

18:39.440 --> 18:43.680
colour stuff, and it does some of the format conversion stuff. So, maybe that answers the question.

18:45.200 --> 18:51.360
GPU work is hard. Actually, on the Librem 5 there's a specific problem where it only supports

18:51.360 --> 18:58.480
OpenGL ES 2.0. And I think the last time this was mainstream was like iPhone 4 or something.

18:58.480 --> 19:01.920
It's really difficult to find any resources to work with it.


