WEBVTT

00:00.000 --> 00:10.000
Okay, so welcome to my presentation. Why am I here?

00:10.000 --> 00:13.000
I'm here because I like to analyze massive data sets.

00:13.000 --> 00:15.000
I like to play with data.

00:15.000 --> 00:17.000
I like to experiment.

00:17.000 --> 00:21.000
So today I will show something from my experience.

00:21.000 --> 00:22.000
And what will we need?

00:22.000 --> 00:24.000
We will need a nice data set,

00:24.000 --> 00:27.000
some idea and the way to present it.

00:28.000 --> 00:30.000
There are plenty of data sets,

00:30.000 --> 00:32.000
plenty of open data sets,

00:32.000 --> 00:35.000
like you can download Wikipedia and analyze it.

00:35.000 --> 00:39.000
But we will be interested in just special data sets.

00:39.000 --> 00:41.000
And there are a few also open,

00:41.000 --> 00:44.000
like you can download OpenStreetMap,

00:44.000 --> 00:47.000
pre-process it and visualize something like

00:47.000 --> 00:51.000
the number of traffic lights,

00:51.000 --> 00:54.000
in different locations in India,

00:54.000 --> 00:57.000
or in the Netherlands,

00:57.000 --> 00:59.000
and see where there are more traffic lights,

00:59.000 --> 01:03.000
or you can take data from crowdsourced

01:03.000 --> 01:06.000
sensors about temperature, air quality.

01:06.000 --> 01:10.000
There are plenty of data sets, like Sensor.Community.

01:10.000 --> 01:16.000
But today we will first look at ADS-B.

01:16.000 --> 01:18.000
Let me do a quick check.

01:18.000 --> 01:21.000
Does everyone know what ADS-B is?

01:21.000 --> 01:23.000
Please raise your hand.

01:24.000 --> 01:28.000
I expected something closer to 100%.

01:28.000 --> 01:32.000
But anyway, ADS-B.

01:32.000 --> 01:36.000
Automatic Dependent Surveillance-Broadcast.

01:36.000 --> 01:40.000
Actually, this does not make it clearer.

01:40.000 --> 01:43.000
But actually, it is the data that is broadcast

01:43.000 --> 01:47.000
by transponders inside every airplane.

01:47.000 --> 01:49.000
And not only airplanes,

01:49.000 --> 01:53.000
also inside helicopters, inside many drones.

01:53.000 --> 01:57.000
Many drones are also required to have transponders.

01:57.000 --> 02:00.000
And inside ground vehicles,

02:00.000 --> 02:03.000
like snow blowers,

02:03.000 --> 02:07.000
working on airstrips.

02:07.000 --> 02:12.000
Inside even ground stations.

02:12.000 --> 02:16.000
And this data is available to everyone.

02:16.000 --> 02:17.000
Unencrypted.

02:17.000 --> 02:20.000
And if you buy a cheap radio receiver,

02:20.000 --> 02:25.000
you can get this data and analyze it.

02:25.000 --> 02:27.000
But actually, yeah,

02:27.000 --> 02:29.000
if you look at this data,

02:29.000 --> 02:31.000
it will look like Flightradar24,

02:31.000 --> 02:37.000
or many similar websites that present it in real time.

02:37.000 --> 02:41.000
And if you buy a radio station and collect it,

02:41.000 --> 02:45.000
you will get the data from like maybe 100 kilometers around you.

02:45.000 --> 02:49.000
But there are plenty of services that are named exchanges.

02:49.000 --> 02:51.000
ADS-B data exchanges.

02:51.000 --> 02:53.000
Like adsb.fi,

02:53.000 --> 02:55.000
adsb.one,

02:55.000 --> 02:56.000
adsb.lol.

02:56.000 --> 02:58.000
I don't know why it is named

02:58.000 --> 03:00.000
this way. Nothing funny there.

03:00.000 --> 03:03.000
It's a great service with daily data

03:03.000 --> 03:05.000
sets provided in the public domain.

03:05.000 --> 03:07.000
So you can download all of them.

03:07.000 --> 03:09.000
And no one will ask you,

03:09.000 --> 03:13.000
what you are going to do with that airplane data.

03:13.000 --> 03:16.000
They provide a real-time data feed,

03:16.000 --> 03:18.000
not to the public,

03:18.000 --> 03:22.000
but they gave me special permission to work with this data.

03:22.000 --> 03:24.000
Or ADS-B Exchange,

03:24.000 --> 03:26.000
that is a commercial service,

03:26.000 --> 03:30.000
but they provide samples that you can analyze as well.

03:30.000 --> 03:33.000
Some of these exchanges will give you a real time feed

03:33.000 --> 03:38.000
if you provide your data from your radio station.

03:38.000 --> 03:41.000
Okay, so let's take a look at adsb.lol.

03:41.000 --> 03:44.000
They publish all these data sets on GitHub,

03:44.000 --> 03:45.000
and every day,

03:45.000 --> 03:49.000
it's about 1.5 gigabytes of data.

03:49.000 --> 03:51.000
It is available for several years.

03:51.000 --> 03:53.000
So a few terabytes, maybe like 5,

03:53.000 --> 03:55.000
10 terabytes,

03:55.000 --> 03:56.000
it's sizable,

03:56.000 --> 03:59.000
but it's available for a single machine,

03:59.000 --> 04:01.000
for a simple enthusiast to take

04:01.000 --> 04:04.000
and analyze all of this data.

04:05.000 --> 04:07.000
I'll do the easiest part.

04:07.000 --> 04:10.000
It's probably the most boring part of my presentation,

04:10.000 --> 04:15.000
but the easiest part is I will load all of this into ClickHouse.

04:15.000 --> 04:17.000
ClickHouse, what is this?

04:17.000 --> 04:20.000
It's my favorite database.

04:20.000 --> 04:23.000
But just in case, if you don't know,

04:23.000 --> 04:25.000
it's an open source,

04:25.000 --> 04:27.000
analytics database management system,

04:27.000 --> 04:30.000
available under the Apache 2.0 license.

04:30.000 --> 04:32.000
It is used by thousands of companies,

04:32.000 --> 04:33.000
including the largest ones,

04:33.000 --> 04:35.000
like every API company,

04:35.000 --> 04:37.000
oh, sorry, every AI company,

04:37.000 --> 04:38.000
every AI company,

04:38.000 --> 04:40.000
every AI company you hear about

04:40.000 --> 04:41.000
every day,

04:41.000 --> 04:43.000
every company you do not hear about

04:43.000 --> 04:46.000
often. It has been available since 2016,

04:46.000 --> 04:49.000
and developed since 2009.

04:49.000 --> 04:53.000
You can download it as a single binary distribution.

04:53.000 --> 04:55.000
You can use it on your laptop,

04:55.000 --> 04:57.000
whatever laptop.

04:57.000 --> 04:59.000
Linux, Mac, FreeBSD,

04:59.000 --> 05:02.000
ARM machines, x86,

05:02.000 --> 05:05.000
even RISC-V boards,

05:05.000 --> 05:08.000
ClickHouse will work there.

05:08.000 --> 05:13.000
ClickHouse is just a popular database,

05:13.000 --> 05:14.000
like Postgres,

05:14.000 --> 05:17.000
but for analytics.

05:17.000 --> 05:20.000
So let's take a look at the data.

05:20.000 --> 05:22.000
It is the result of

05:22.000 --> 05:24.000
processing this binary

05:24.000 --> 05:28.000
code of a radio signal

05:28.000 --> 05:29.000
with the tool,

05:29.000 --> 05:31.000
named readsb.

05:31.000 --> 05:33.000
And it generates JSON,

05:33.000 --> 05:35.000
and JSON looks like this.

05:35.000 --> 05:38.000
It contains metadata about a particular

05:38.000 --> 05:39.000
airplane,

05:39.000 --> 05:40.000
like this,

05:40.000 --> 05:41.000
the type of airplane

05:41.000 --> 05:43.000
description,

05:43.000 --> 05:44.000
timestamp,

05:44.000 --> 05:47.000
and a set of traces.

05:47.000 --> 05:49.000
Where this thing

05:49.000 --> 05:52.000
was at a particular moment in time?

05:52.000 --> 05:57.000
What was the height, velocity,

05:57.000 --> 05:59.000
pressure?

05:59.000 --> 06:01.000
And for height,

06:01.000 --> 06:03.000
there are different,

06:03.000 --> 06:04.000
and for velocity,

06:04.000 --> 06:06.000
there are different kinds of velocity,

06:06.000 --> 06:07.000
like true airspeed,

06:07.000 --> 06:09.000
or speed,

06:09.000 --> 06:11.000
relative to the ground,

06:11.000 --> 06:14.000
so plenty of data.

06:14.000 --> 06:16.000
And we'll do,

06:16.000 --> 06:17.000
like this.

06:17.000 --> 06:20.000
We'll just split this into data files,

06:20.000 --> 06:23.000
then we will join these files using

06:23.000 --> 06:26.000
the clickhouse-local tool.

06:26.000 --> 06:28.000
What is clickhouse-local?

06:28.000 --> 06:30.000
It is a command-line tool

06:30.000 --> 06:32.000
that lets you run SQL queries.

06:32.000 --> 06:34.000
On top of local files,

06:34.000 --> 06:36.000
or remote files,

06:36.000 --> 06:38.000
or external databases,

06:38.000 --> 06:41.000
or any kind of data sets.

06:41.000 --> 06:43.000
It's like a database,

06:43.000 --> 06:45.000
but inside the command line.

06:45.000 --> 06:47.000
So let's join it to denormalize

06:47.000 --> 06:50.000
and let's create a table.

06:50.000 --> 06:53.000
Here is the table schema,

06:53.000 --> 06:57.000
and there are a lot of,

06:57.000 --> 06:59.000
like usual stuff.

06:59.000 --> 07:00.000
Usual columns,

07:00.000 --> 07:02.000
like time, date,

07:02.000 --> 07:04.000
metadata,

07:04.000 --> 07:06.000
ground speed, and so on.

07:06.000 --> 07:11.000
And there is just a bit of additional stuff.

07:11.000 --> 07:14.000
Additional stuff looks like this.

07:15.000 --> 07:17.000
This column, Mercator X,

07:17.000 --> 07:20.000
it is a generated column.

07:20.000 --> 07:22.000
You can see it is materialized,

07:22.000 --> 07:26.000
using some expression.

07:26.000 --> 07:29.000
And the expression involves some

07:29.000 --> 07:31.000
logarithm,

07:31.000 --> 07:32.000
tangent,

07:32.000 --> 07:33.000
pi,

07:33.000 --> 07:35.000
some arithmetic.

07:35.000 --> 07:37.000
And the question for you,

07:37.000 --> 07:40.000
what is it?

07:40.000 --> 07:41.000
What is this magic?

07:41.000 --> 07:43.000
What does it do?

07:44.000 --> 07:46.000
What?

07:46.000 --> 07:50.000
It's a conversion of latitude and longitude

07:50.000 --> 07:54.000
into the Web Mercator projection,

07:54.000 --> 07:59.000
mapped into the range of the UInt32 data type.

07:59.000 --> 08:04.000
So we will have numbers from zero to four billion something.

08:04.000 --> 08:06.000
And if you take a square of these numbers,

08:06.000 --> 08:09.000
it will represent Web Mercator,

08:09.000 --> 08:12.000
so you can just visualize it.
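
NOTE
Editor's sketch, not the expression from the slide: one plausible way to map
longitude and latitude to Web Mercator coordinates scaled into the UInt32
range, using the same ingredients the speaker lists (logarithm, tangent, pi).
The function names, the clamping latitude, and the choice of y = 0 at the
northern edge are assumptions made for illustration.
```python
import math
UINT32_RANGE = 2 ** 32  # number of distinct UInt32 values
MAX_LAT = 85.05112878   # latitude at which the Web Mercator square is cut off
def mercator_x(lon_deg: float) -> int:
    """Map longitude in [-180, 180] to an integer in [0, 2**32)."""
    x = (lon_deg + 180.0) / 360.0
    return min(UINT32_RANGE - 1, max(0, int(x * UINT32_RANGE)))
def mercator_y(lat_deg: float) -> int:
    """Map latitude to an integer in [0, 2**32); 0 is the northern edge (assumed)."""
    # Clamp first so that tan() and log() stay finite near the poles.
    lat = math.radians(max(-MAX_LAT, min(MAX_LAT, lat_deg)))
    y = 0.5 - math.log(math.tan(math.pi / 4 + lat / 2)) / (2 * math.pi)
    return min(UINT32_RANGE - 1, max(0, int(y * UINT32_RANGE)))
# Example: Brussels, roughly 50.85 N, 4.35 E.
print(mercator_x(4.35), mercator_y(50.85))
```
Integer coordinates like these can be compared and bit-interleaved cheaply,
which is what the sorting key discussed next relies on.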

08:12.000 --> 08:14.000
And we created two indexes,

08:14.000 --> 08:16.000
minimum and maximum of these Web

08:16.000 --> 08:20.000
Mercator coordinates.

08:20.000 --> 08:21.000
And at the end,

08:21.000 --> 08:25.000
you can see some additional magic.

08:25.000 --> 08:28.000
And here is this additional magic.

08:28.000 --> 08:30.000
Order by,

08:30.000 --> 08:35.000
mortonEncode of Mercator X and Mercator Y.

08:35.000 --> 08:36.000
So we take two,

08:36.000 --> 08:39.000
UInt32 numbers,

08:39.000 --> 08:42.000
and put them into a function,

08:42.000 --> 08:44.000
named mortonEncode.

08:44.000 --> 08:46.000
And another simple question,

08:46.000 --> 08:51.000
there should be at least one person in the audience,

08:51.000 --> 08:54.000
aware of what I'm going to do with this.

08:54.000 --> 08:57.000
Yeah, exactly.

08:57.000 --> 09:02.000
Concatenate two numbers into one number.

09:02.000 --> 09:05.000
Using Morton encoding,

09:05.000 --> 09:06.000
Morton curve.

09:06.000 --> 09:08.000
What is Morton?

09:08.000 --> 09:17.000
Let's imagine you have a square with two coordinates.

09:17.000 --> 09:23.000
And you create a single number by mixing these coordinates.

09:23.000 --> 09:27.000
So you split this plane into four quadrants.

09:27.000 --> 09:33.000
And number these quadrants, like one, two, three, four, like a zigzag.

09:33.000 --> 09:36.000
And inside each quadrant,

09:36.000 --> 09:39.000
you do the same split.

09:39.000 --> 09:41.000
And if you look at the bits of this number,

09:41.000 --> 09:45.000
it will be identical to taking the first bit of the first number,

09:45.000 --> 09:48.000
the first bit of the second number, the highest bit.

09:48.000 --> 09:51.000
Then the second bit of the first number,

09:51.000 --> 09:53.000
the second bit of the second number,

09:53.000 --> 09:56.000
and you like mix these bits together.

09:56.000 --> 09:59.000
And you get this space-filling curve.
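
NOTE
Editor's sketch of the bit interleaving described above, in plain Python.
It follows the order the speaker gives (highest bit of the first number, then
highest bit of the second number, and so on). The function name is
hypothetical, and the bit order used by ClickHouse's own function may differ.
```python
def morton_encode(x: int, y: int, bits: int = 32) -> int:
    """Interleave the bits of x and y, highest bit first, into one integer."""
    code = 0
    for i in range(bits - 1, -1, -1):
        code = (code << 1) | ((x >> i) & 1)  # next bit of the first number
        code = (code << 1) | ((y >> i) & 1)  # next bit of the second number
    return code
# With one bit per coordinate the four quadrants are numbered in a zigzag.
for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x, y), morton_encode(x, y, bits=1))
```
Points that share their high bits in both coordinates share a prefix of the
code, so sorting by the code tends to keep nearby points close together.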

09:59.000 --> 10:02.000
It's sad that I don't have a picture for this,

10:02.000 --> 10:05.000
but I hope you imagine this picture right now.

10:05.000 --> 10:10.000
Or you're thinking, I'm just talking some gibberish.

10:10.000 --> 10:12.000
But okay.

10:12.000 --> 10:16.000
No, it's worse.

10:16.000 --> 10:20.000
And it's worse than a Hilbert curve.

10:20.000 --> 10:24.000
It is like conceptually simpler.

10:24.000 --> 10:26.000
And when I created this service,

10:26.000 --> 10:28.000
we only had Morton curve support.

10:28.000 --> 10:32.000
Now we have both Morton and Hilbert, but it's a really good question.

10:33.000 --> 10:37.000
The Hilbert curve provides continuity,

10:37.000 --> 10:40.000
and a little bit more locality.

10:40.000 --> 10:42.000
Okay, now I have a table,

10:42.000 --> 10:44.000
and I created this script.

10:44.000 --> 10:49.000
And after a day, it has loaded 95 billion records,

10:49.000 --> 10:53.000
around 3.1 terabytes of compressed data.

10:53.000 --> 10:55.000
And the question is,

10:55.000 --> 10:59.000
what to do with all this data?

10:59.000 --> 11:02.000
We need to do something beautiful, right?

11:02.000 --> 11:05.000
And I want to visualize it in the browser

11:05.000 --> 11:08.000
to aggregate and generate reports.

11:08.000 --> 11:12.000
And let's take a look at the result first,

11:12.000 --> 11:15.000
and we will explain it later.

11:15.000 --> 11:18.000
So I created the website.

11:18.000 --> 11:19.000
Here is the website.

11:19.000 --> 11:21.000
I hope my mobile connection will work,

11:21.000 --> 11:22.000
yes it works.

11:22.000 --> 11:27.000
And it represents all the tracks of airplanes.

11:27.000 --> 11:30.000
The density of these tracks is visualized,

11:30.000 --> 11:33.000
and I can do any sort of stuff.

11:33.000 --> 11:39.000
I can like visualize them by airline.

11:39.000 --> 11:46.000
Interesting if my mobile internet will work with all these pictures.

11:46.000 --> 11:48.000
And at the top,

11:48.000 --> 11:51.000
you can see that we have different data sets,

11:51.000 --> 11:55.000
like planes, places, birds, photos, and you.

11:55.000 --> 11:58.000
And let me go.

11:58.000 --> 11:59.000
What?

11:59.000 --> 12:01.000
Yeah, there is Elon Musk.

12:01.000 --> 12:04.000
Let's load Elon Musk.

12:04.000 --> 12:06.000
Interesting, why do we have a timeout?

12:06.000 --> 12:14.000
But I hope with Elon Musk, we will get this data.

12:14.000 --> 12:17.000
Where is Elon Musk?

12:17.000 --> 12:20.000
Where is his airplane?

12:20.000 --> 12:25.000
He's trying different things.

12:25.000 --> 12:30.000
And actually, I do have mobile connection.

12:30.000 --> 12:33.000
And there are some tracks from Elon Musk.

12:33.000 --> 12:49.000
And he did some stuff in Italy.

12:49.000 --> 12:52.000
But much more in London.

12:52.000 --> 12:56.000
And interesting, which airports he used in London.

12:56.000 --> 12:58.000
I don't know.

12:58.000 --> 13:02.000
But also, yeah, I explored this data set,

13:02.000 --> 13:06.000
so I know where it is.

13:06.000 --> 13:09.000
Elon Musk.

13:09.000 --> 13:15.000
And about Wi-Fi, the interesting thing,

13:15.000 --> 13:18.000
if we have time for changing Wi-Fi.

13:18.000 --> 13:24.000
But let me say that Elon Musk also visited Florida quite frequently.

13:24.000 --> 13:30.000
Palm Beach airport, and Texas for some reason.

13:30.000 --> 13:31.000
OK.

13:37.000 --> 13:39.000
About Wi-Fi.

13:39.000 --> 13:44.000
Let's try to keep it like this.

13:45.000 --> 13:50.000
And also, in addition to airplanes, we have a data set of birds.

13:50.000 --> 13:53.000
And if we look at just the density of birds,

13:53.000 --> 13:54.000
it's uninteresting.

13:54.000 --> 13:56.000
There are some birds, bird observations.

13:56.000 --> 14:00.000
By the way, the data is from the eBird project,

14:00.000 --> 14:04.000
which is a crowdsourced project for bird watchers.

14:04.000 --> 14:05.000
I'm not a bird watcher.

14:05.000 --> 14:10.000
I almost became a bird watcher while doing this project.

14:10.000 --> 14:14.000
But I'm not.

14:14.000 --> 14:20.000
And we can visualize, by, for example, the family,

14:20.000 --> 14:23.000
the family of birds.

14:23.000 --> 14:27.000
And if we use this rectangular selection tool,

14:27.000 --> 14:30.000
we can do something, something really interesting.

14:30.000 --> 14:37.000
Let me go to places like closer to Antarctica.

14:37.000 --> 14:42.000
And on Antarctica, closer to Antarctica, we have some tracks.

14:42.000 --> 14:45.000
That look like airplane tracks.

14:45.000 --> 14:49.000
But no airplanes fly there.

14:49.000 --> 14:56.000
These are birds flying between different interesting islands.

14:56.000 --> 15:05.000
And if we use this rectangular selection, we can check particular types of birds.

15:05.000 --> 15:09.000
Like this penguin, is it a penguin?

15:09.000 --> 15:11.000
No.

15:11.000 --> 15:14.000
The pictures are from Wikipedia.

15:14.000 --> 15:19.000
So some pictures are automatically selected.

15:19.000 --> 15:22.000
Giant petrel.

15:22.000 --> 15:28.000
But also interesting, if we click on some of these,

15:28.000 --> 15:31.000
we will have the same map.

15:31.000 --> 15:35.000
But filtered by this particular bird.

15:35.000 --> 15:40.000
And we can look at the distribution across all over the world.

15:40.000 --> 15:44.000
Like in New Zealand, there is also such a bird.

15:44.000 --> 15:47.000
And moreover, we can click here.

15:47.000 --> 15:50.000
And here we will see a SQL query.

15:50.000 --> 15:57.000
And this SQL query is running basically in real time across all of this data set.

15:57.000 --> 16:01.000
And you can change it, you can do your own visualizations.

16:01.000 --> 16:05.000
But also, interesting, we have a data set of photos.

16:05.000 --> 16:13.000
From Flickr. Flickr provides many photos with permissive licenses.

16:13.000 --> 16:16.000
At least some of them.

16:16.000 --> 16:22.000
And let's take a look at a few photos from Belgium.

16:22.000 --> 16:27.000
Where is Belgium, like here is Brussels?

16:27.000 --> 16:33.000
Yeah, here it is.

16:33.000 --> 16:36.000
And again, the density is not so interesting.

16:36.000 --> 16:42.000
We can look at the coloring based on the hash of text.

16:42.000 --> 16:49.000
But it is also not so interesting as if we click on this button.

16:50.000 --> 16:55.000
And this button will do something unusual.

16:55.000 --> 16:57.000
It will take every tile.

16:57.000 --> 17:01.000
And for every tile, it will do a SQL query in ClickHouse.

17:01.000 --> 17:07.000
And the SQL query will select the best photo from this particular location.

17:07.000 --> 17:10.000
That means what is the best photo?

17:10.000 --> 17:15.000
There are about 20 of interesting stuff.

17:15.000 --> 17:21.000
I don't know, it's public data based on Flickr.

17:21.000 --> 17:29.000
I don't know, maybe we should just do something like this.

17:29.000 --> 17:38.000
And this selection tool should provide us a report about these particular photos.

17:38.000 --> 17:43.000
Okay, and there is also one interesting data set named You.

17:43.000 --> 17:46.000
I will not show it to you.

17:46.000 --> 17:51.000
It will be at the end of the presentation.

17:51.000 --> 17:53.000
Not now, let's go back.

17:53.000 --> 18:00.000
And let me explain how all of this is implemented.

18:00.000 --> 18:04.000
I also have another data set, data with ships.

18:04.000 --> 18:10.000
For ships, there is a similar radio protocol named AIS.

18:10.000 --> 18:13.000
Automatic Identification System.

18:13.000 --> 18:21.000
It is used for ships, so they have less chance to collide with each other and so on.

18:21.000 --> 18:23.000
Easier to navigate.

18:23.000 --> 18:26.000
And again, you can buy a radio station.

18:26.000 --> 18:29.000
There are data exchanges.

18:29.000 --> 18:31.000
And we did exactly this.

18:31.000 --> 18:34.000
We bought a radio station and installed it.

18:34.000 --> 18:37.000
It's done not by me, but my colleague.

18:37.000 --> 18:41.000
And we feed this data into AISHub.

18:41.000 --> 18:48.000
And they provide us all the real time data from all over the world.

18:48.000 --> 18:52.000
We did not have an open source tool to process this data.

18:52.000 --> 18:57.000
So we ended up writing a small script in Rust.

18:57.000 --> 18:59.000
Why in Rust?

18:59.000 --> 19:05.000
Because if you know some language where you are professional, you use this language for real stuff.

19:05.000 --> 19:12.000
If you want to write some throwaway script or just have fun, you write in Rust.

19:12.000 --> 19:19.000
No, it basically just uses a single library that is already available as a crate.

19:19.000 --> 19:22.000
And the result looks like this.

19:22.000 --> 19:24.000
It's pretty cool.

19:24.000 --> 19:30.000
So this is near Rotterdam, Amsterdam.

19:30.000 --> 19:34.000
Antwerp.

19:34.000 --> 19:38.000
And interesting, what is this?

19:38.000 --> 19:44.000
Maybe you can guess, maybe you can just read the map.

19:44.000 --> 19:46.000
What?

19:46.000 --> 19:48.000
No, it's not Rotterdam.

19:48.000 --> 19:52.000
Rotterdam was on the previous slide.

19:52.000 --> 19:57.000
This is something like, yes, really cool.

19:57.000 --> 20:03.000
Unfortunately, I like these small circles.

20:03.000 --> 20:05.000
And these small circles.

20:05.000 --> 20:07.000
And this is some port.

20:07.000 --> 20:09.000
And it is a pretty big port.

20:09.000 --> 20:11.000
Maybe one of the biggest.

20:11.000 --> 20:13.000
Maybe anyone can guess?

20:13.000 --> 20:17.000
Yeah, it's Singapore.

20:17.000 --> 20:18.000
Okay.

20:18.000 --> 20:21.000
So what is the implementation of all of this?

20:21.000 --> 20:23.000
And what makes it possible?

20:23.000 --> 20:26.000
Let me just quickly introduce it.

20:26.000 --> 20:30.000
So the first is the built-in REST API.

20:30.000 --> 20:32.000
ClickHouse provides an HTTP API.

20:32.000 --> 20:34.000
It is included.

20:34.000 --> 20:36.000
You can just do a fetch from JavaScript.

20:36.000 --> 20:38.000
And that's it.

20:38.000 --> 20:40.000
It provides parameterized queries.

20:40.000 --> 20:45.000
And not like in other databases when you just write a question mark.

20:45.000 --> 20:47.000
It does it slightly differently.

20:47.000 --> 20:51.000
And the difference is all of this is type-safe.

20:51.000 --> 20:54.000
This is a type-safe parameter.
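
NOTE
Editor's sketch of how a typed query parameter can be passed over the HTTP
interface: the query text contains a placeholder with a name and a type, such
as {min_x:UInt32}, and the value is sent separately as a URL parameter whose
name is prefixed with param_. The table name, column name, endpoint, and the
helper function are hypothetical. The code only builds the URL and sends nothing.
```python
from urllib.parse import urlencode
def build_query_url(endpoint: str, sql: str, params: dict) -> str:
    """Build a GET URL that passes each parameter as param_<name>=<value>."""
    query = {"query": sql}
    for name, value in params.items():
        query["param_" + name] = str(value)
    return endpoint.rstrip("/") + "/?" + urlencode(query)
# Hypothetical table and column names, used only for illustration.
sql = "SELECT count() FROM planes WHERE mercator_x BETWEEN {min_x:UInt32} AND {max_x:UInt32}"
url = build_query_url("http://localhost:8123", sql, {"min_x": 0, "max_x": 4294967295})
print(url)
```
Because the type is part of the placeholder, the server parses the value as
that type instead of splicing text into the query.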

20:55.000 --> 21:01.000
And you can create parameterized views, as in this example.

21:01.000 --> 21:03.000
It also provides materialized views.

21:03.000 --> 21:08.000
So you don't have to always query all the data.

21:08.000 --> 21:17.000
You can create tables that will represent just aggregations or samples from the original dataset.

21:17.000 --> 21:20.000
And I use these samples for my service.

21:20.000 --> 21:21.000
I created materialized views.

21:21.000 --> 21:28.000
That will sample 10% and sample 1% of the data.

21:28.000 --> 21:31.000
It provides access control restrictions.

21:31.000 --> 21:37.000
Including role-based access control, row-level security, and so on.

21:37.000 --> 21:38.000
You create users.

21:38.000 --> 21:40.000
You restrict them.

21:40.000 --> 21:46.000
Nothing particularly different, but it is just a must for a database.

21:46.000 --> 21:50.000
You can configure all of this stuff.

21:50.000 --> 21:52.000
You can also configure quotas.

21:52.000 --> 21:58.000
Like how many requests from a single IP address can be done in a period of time.

21:58.000 --> 22:01.000
From a single IP network.

22:01.000 --> 22:05.000
So the service will at least not be easily overloaded.

22:05.000 --> 22:10.000
It will be protected from simple DoS attacks.

22:10.000 --> 22:15.000
Not necessarily from DDoS, but anyway.

22:16.000 --> 22:19.000
Functional indexes.

22:19.000 --> 22:22.000
Including sorting keys by function.

22:22.000 --> 22:25.000
Materialized columns.

22:25.000 --> 22:30.000
Friendly SQL language which allows you to do a lot of interesting stuff.

22:30.000 --> 22:38.000
Like unrestricted use of aliases, parameterized table identifiers.

22:38.000 --> 22:44.000
Or stuff like ORDER BY WITH FILL.

22:44.000 --> 22:49.000
Which lets you not just order by some condition.

22:49.000 --> 22:55.000
But also make it continuous.
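
NOTE
Editor's sketch of the idea behind ordering with gap filling, in plain Python
rather than SQL: the result is sorted by key, and keys missing from the data
are emitted with a default value so that the sequence has no holes. The
function name, the half-open range, and the sample data are assumptions made
for illustration.
```python
def order_by_with_fill(rows: dict, start: int, stop: int, step: int = 1, default: int = 0) -> list:
    """Return (key, value) pairs for every key in range(start, stop, step), filling gaps."""
    return [(key, rows.get(key, default)) for key in range(start, stop, step)]
# Hypothetical hourly counts with no observations for hours 2, 3 and 5.
counts_by_hour = {0: 5, 1: 7, 4: 2}
filled = order_by_with_fill(counts_by_hour, 0, 6)
print(filled)
```
A chart drawn from the filled rows shows empty hours as zeros instead of
silently skipping them.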

22:55.000 --> 22:59.000
It has support for all data formats.

22:59.000 --> 23:00.000
You can imagine like this.

23:00.000 --> 23:04.000
CSV, TSV, text-based formats, binary formats.

23:04.000 --> 23:07.000
Protobuf, Apache Parquet.

23:07.000 --> 23:11.000
Arrow, Native.

23:11.000 --> 23:16.000
Semi-structured data like JSON.

23:16.000 --> 23:20.000
Not just JSON, like MessagePack as well.

23:20.000 --> 23:23.000
So a lot of stuff built in.

23:23.000 --> 23:28.000
Compression out of the box, and not only the internal compression inside the database.

23:28.000 --> 23:32.000
But also the ability to process compressed data sets.

23:32.000 --> 23:34.000
On the fly.

23:34.000 --> 23:39.000
Without any additional tools.

23:39.000 --> 23:42.000
It supports object storage.

23:42.000 --> 23:45.000
If you use it with the cloud, like AWS S3.

23:45.000 --> 23:46.000
GCS.

23:46.000 --> 23:50.000
Azure. Hybrid storage is also possible.

23:50.000 --> 23:53.000
You can use external tables with data lakes.

23:53.000 --> 23:54.000
Iceberg.

23:54.000 --> 23:57.000
Delta Lake and so on.

23:57.000 --> 24:01.000
Queries are parallelized across CPU cores.

24:01.000 --> 24:05.000
And distributed across many machines.

24:05.000 --> 24:09.000
You can create a cluster with multiple shards and

24:09.000 --> 24:11.000
replicas.

24:11.000 --> 24:17.000
It has a query result cache, also quite useful for this project.

24:17.000 --> 24:20.000
And there are many modes.

24:20.000 --> 24:22.000
Many modes of operation.

24:22.000 --> 24:24.000
You download a single binary.

24:24.000 --> 24:26.000
And you can run it as a server.

24:26.000 --> 24:29.000
As a command-line tool.

24:29.000 --> 24:36.000
And there is also a Python module for ClickHouse inside your scripts.

24:36.000 --> 24:44.000
It can even replace pandas with pandas compatible API.

24:44.000 --> 24:47.000
And this is especially useful for data pre-processing.

24:47.000 --> 24:51.000
You just install the clickhouse-local CLI.

24:51.000 --> 24:54.000
And you pipe something like JSON in.

24:54.000 --> 25:01.000
CSV out or Parquet out, with any types of queries.

25:01.000 --> 25:05.000
So in my opinion, my honest opinion.

25:05.000 --> 25:11.000
ClickHouse is the best database for analytic applications.

25:11.000 --> 25:14.000
It is fast, scalable, resource efficient.

25:14.000 --> 25:15.000
Easy to use.

25:15.000 --> 25:18.000
It is pleasant to work with.

25:18.000 --> 25:20.000
Robust, reliable.

25:20.000 --> 25:27.000
And let me say that friends and you are my friends.

25:27.000 --> 25:33.000
Friends tell friends to use ClickHouse.

25:33.000 --> 25:34.000
OK.

25:34.000 --> 25:38.000
And before we continue with that data set,

25:38.000 --> 25:44.000
I would also like to invite you to the dinner that we organized today

25:44.000 --> 25:50.000
in just like two hours.

25:50.000 --> 25:52.000
So you are invited.

25:52.000 --> 26:03.000
Now let's go to that service and try to find out something interesting.

26:03.000 --> 26:07.000
So do we continue to look at the photos?

26:07.000 --> 26:10.000
No, no.

26:10.000 --> 26:18.000
We have one interesting data set, named You.

26:18.000 --> 26:25.000
And the question is, what does it show?

26:25.000 --> 26:29.000
But how do I know your location?

26:29.000 --> 26:36.000
How can I obtain your location legally?

26:36.000 --> 26:39.000
Not a data from what?

26:39.000 --> 26:44.000
No, photos is another data set.

26:44.000 --> 26:48.000
And yes, this data set is about you.

26:48.000 --> 26:50.000
But how exactly?

26:50.000 --> 26:54.000
What does it show?

26:54.000 --> 27:05.000
It looks quite similar to many previous data sets, but it is kind of special.

27:05.000 --> 27:08.000
Sorry?

27:08.000 --> 27:11.000
How do you translate this data?

27:11.000 --> 27:13.000
It's my question.

27:13.000 --> 27:15.000
Sorry, I would like you to guess.

27:15.000 --> 27:17.000
Please.

27:17.000 --> 27:23.000
Not quite.

27:23.000 --> 27:28.000
At least there are not so many antennas.

27:28.000 --> 27:34.000
Even on the most popular ADS-B exchanges, there are just thousands of antennas.

27:34.000 --> 27:38.000
Here we have many more dots.

27:38.000 --> 27:42.000
Any more hypotheses?

27:42.000 --> 27:44.000
Yep, please.

27:44.000 --> 27:50.000
Copy and solve the by IP location for post project.

27:50.000 --> 27:52.000
Not quite.

27:52.000 --> 27:54.000
But it's a good hypothesis.

27:54.000 --> 28:00.000
If I collect the IP and IP geolocation, probably I should not store it for a long time.

28:00.000 --> 28:03.000
Yep, it's more hypothesis.

28:03.000 --> 28:09.000
What if you do not actually store location, but it's just light pollution?

28:09.000 --> 28:13.000
The cities are here, so people are there.

28:13.000 --> 28:15.000
Yeah, that's interesting.

28:15.000 --> 28:17.000
But it is not light pollution.

28:17.000 --> 28:19.000
And let me explain why.

28:19.000 --> 28:24.000
Why doesn't it look like light pollution?

28:24.000 --> 28:29.000
Let me zoom out.

28:29.000 --> 28:33.000
For example, what is?

28:33.000 --> 28:39.000
What is here?

28:39.000 --> 28:49.000
Any more hypotheses?

28:49.000 --> 28:51.000
What?

28:51.000 --> 28:53.000
GPS art?

28:53.000 --> 28:55.000
GPS art?

28:55.000 --> 28:56.000
Not quite.

28:56.000 --> 29:03.000
Actually, it is not related to GPS art at all.

29:03.000 --> 29:14.000
Let me zoom in to some locations where there is almost nothing.

29:14.000 --> 29:20.000
Should I change my Wi-Fi?

29:20.000 --> 29:23.000
Because I don't like it doesn't look good.

29:23.000 --> 29:27.000
Because it's a good file for you.

29:27.000 --> 29:31.000
Does it require any else?

29:31.000 --> 29:33.000
Does it require any else?

29:33.000 --> 29:39.000
Okay, I hope it will work.

29:39.000 --> 29:47.000
I hope.

29:47.000 --> 29:55.000
I feel it was the Wi-Fi works, no?

29:55.000 --> 29:58.000
Was them dwells?

29:58.000 --> 30:08.000
Actually, I trust my little bit more.

30:08.000 --> 30:16.000
So let's zoom in to a certain location where there is almost nothing.

30:16.000 --> 30:23.000
Well, you still have a few minutes to guess.

30:23.000 --> 30:26.000
At least try.

30:26.000 --> 30:35.000
Some weird hypothesis.

30:35.000 --> 30:38.000
Yeah, yes.

30:38.000 --> 30:44.000
Well, let's make people guess first.

30:44.000 --> 30:57.000
Because I really want to make sure people have a chance to guess.

30:57.000 --> 31:00.000
Can you guess?

31:00.000 --> 31:02.000
It's people looking at this website, right?

31:02.000 --> 31:03.000
We've said it's not.

31:03.000 --> 31:05.000
Exactly.

31:05.000 --> 31:12.000
This website also records where exactly people look at the map.

31:13.000 --> 31:19.000
Yeah, and displays it as a density of points.

31:19.000 --> 31:21.000
This is basically the whole idea.

31:21.000 --> 31:24.000
And now let's go to your questions.

31:24.000 --> 31:26.000
Yeah, questions.

31:36.000 --> 31:39.000
Questions about this website?

31:39.000 --> 31:43.000
It's not mental at all when a particle or a location or something.

31:43.000 --> 31:48.000
I think I'm happy to.

31:48.000 --> 31:51.000
Thanks, this is really cool.

31:51.000 --> 31:57.000
So for you, it looks like it's more like an experiment, right?

31:57.000 --> 32:00.000
It's not your main focus of work.

32:00.000 --> 32:02.000
It still looks very impressive.

32:03.000 --> 32:14.000
Would you be interested in building some kind of tool, which would kind of make it easier for people to have some data visualized using this approach,

32:14.000 --> 32:17.000
but like, supplying their own data?

32:17.000 --> 32:19.000
Yes, a really good question.

32:19.000 --> 32:21.000
Yeah, this is basically a hobby project.

32:21.000 --> 32:23.000
It is also open source.

32:23.000 --> 32:27.000
So we can press on this link about.

32:27.000 --> 32:30.000
And we will see a GitHub repository.

32:30.000 --> 32:34.000
I don't have any particular expectations about this project.

32:34.000 --> 32:38.000
But it is fairly generic.

32:38.000 --> 32:40.000
It is also quite simple.

32:40.000 --> 32:56.000
So all the source is basically a single HTML page and a few JavaScript files.

32:56.000 --> 33:00.000
A few and one source scripts.

33:00.000 --> 33:02.000
Basically three files.

33:02.000 --> 33:06.000
Let me show the configuration.

33:06.000 --> 33:09.000
There is config.js.

33:09.000 --> 33:16.000
And it basically describes the data set and everything we sort of visualize.

33:16.000 --> 33:19.000
It looks like this.

33:19.000 --> 33:21.000
We have a license.

33:21.000 --> 33:23.000
A set of you make it bigger.

33:23.000 --> 33:24.000
Sorry?

33:24.000 --> 33:25.000
You make it bigger.

33:25.000 --> 33:31.000
So this is a description of the planes data set.

33:31.000 --> 33:39.000
The license, a set of endpoints to the database, HTTP endpoints.

33:39.000 --> 33:41.000
Data levels of detail.

33:41.000 --> 33:43.000
Here are three levels of detail.

33:43.000 --> 33:45.000
But it could be just one.

33:45.000 --> 33:49.000
And the template query.

33:49.000 --> 33:53.000
Here it is.

33:53.000 --> 33:56.000
It's a template query for the report.

33:56.000 --> 34:00.000
And the description of a separate set of SQL queries.

34:00.000 --> 34:07.000
And if we look at other data sets, like the data set named You.

34:07.000 --> 34:09.000
Here it is.

34:09.000 --> 34:12.000
Instead of the license, we just use this website.

34:12.000 --> 34:14.000
A single data source.

34:14.000 --> 34:16.000
A single level of detail.

34:16.000 --> 34:20.000
A parameterized query for the report.

34:21.000 --> 34:26.000
And the parameterized query for the tiles.

34:26.000 --> 34:34.000
The query will return RGB, RGBA binary data.

34:34.000 --> 34:38.000
Which is put on a kind of HTML canvas.
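
NOTE
Editor's sketch of what a tile query conceptually computes, in plain Python
rather than the project's SQL: points inside one map tile are counted per
pixel and the counts are turned into RGBA bytes, four bytes per pixel in
row-major order, which is the layout a browser canvas expects for pixel data.
The function name, the tile addressing by zoom level, and the grayscale
colouring are assumptions made for illustration.
```python
def render_tile(points, tile_x: int, tile_y: int, zoom: int, size: int = 4) -> bytes:
    """Count points per pixel of one tile and return size*size RGBA pixels as bytes."""
    tile_span = 2 ** 32 >> zoom               # tile width in UInt32 coordinate units
    pixel_span = max(1, tile_span // size)    # pixel width in the same units
    x0, y0 = tile_x * tile_span, tile_y * tile_span
    counts = [0] * (size * size)
    for x, y in points:
        if x0 <= x < x0 + tile_span and y0 <= y < y0 + tile_span:
            px = min(size - 1, (x - x0) // pixel_span)
            py = min(size - 1, (y - y0) // pixel_span)
            counts[py * size + px] += 1
    peak = max(counts) or 1                   # avoid dividing by zero on empty tiles
    out = bytearray()
    for count in counts:
        level = 255 * count // peak           # brighter pixel means more points
        out += bytes((level, level, level, 255 if count else 0))
    return bytes(out)
# Three points on the whole-world tile at zoom 0, rendered as 2x2 pixels.
tile = render_tile([(0, 0), (0, 0), (2 ** 32 - 1, 2 ** 32 - 1)], 0, 0, 0, size=2)
print(len(tile), list(tile))
```
Empty pixels get alpha 0, so the base map underneath stays visible wherever
there is no data.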

34:38.000 --> 34:45.000
And this is basically just at the order data set set.

34:45.000 --> 34:48.000
It's done.

34:48.000 --> 34:50.000
I think we have time for questions.

34:50.000 --> 34:52.000
Thank you.

34:52.000 --> 34:53.000
Thank you.

