WEBVTT

00:00.000 --> 00:12.640
Okay everyone, sit down, we're about to start the next session, so I'm delighted to

00:12.640 --> 00:18.160
have you here, Jeremy, who actually kicked this off, because this room exists because of

00:18.160 --> 00:21.640
the great work that this guy did, so I'm really glad to have you here.

00:21.640 --> 00:32.960
We have got "Accelerating QuestDB: lessons from a 6x query performance boost", so I'd

00:32.960 --> 00:33.960
better hand it over to you guys.

00:33.960 --> 00:34.960
Who?

00:34.960 --> 00:40.800
Yeah, thank you so much, actually they are organizing the room, not me, but yeah, it was my idea,

00:40.800 --> 00:42.840
so yeah, the credit is theirs, not mine.

00:42.840 --> 00:46.600
Okay, you know, in every team there is one doing the work and the rest; I'm the rest.

00:47.480 --> 00:53.440
Next year we'll see. Anyway, so we are both part of the QuestDB project, I

00:53.440 --> 00:57.840
work on advocacy, so I don't really contribute much to the code, so in this talk I

00:57.840 --> 01:02.440
want to do an intro for the first seven minutes, I'm going to be speaking about, you know,

01:02.440 --> 01:07.800
the performance, what we did, and then Jeremy, who is actually an engineer in the company,

01:07.800 --> 01:13.920
he will show you what we did, the changes in code and so on, to get the optimizations.

01:14.000 --> 01:19.680
By the way, we are an open source database, and when I say we are six times faster

01:19.680 --> 01:26.400
today than we were at last FOSDEM, six times is an average, some queries today are

01:26.400 --> 01:31.920
40 or 50 times faster than before, some are pretty much the same, but yeah, on average we see

01:31.920 --> 01:34.480
many queries are about six times faster.

01:34.480 --> 01:38.800
So first of all, we are an open source database, of course, we are here at FOSDEM, with an

01:38.800 --> 01:44.800
Apache 2.0 license, we have about 150 contributors, if you are into databases and you

01:44.800 --> 01:51.600
want to contribute, we will really welcome your help in the project. We are all about performance,

01:51.600 --> 01:58.240
because it is used for time series, that means it's often used for real time, for example,

01:58.240 --> 02:05.680
tracking market data and powering dashboards, or doing payments and analytics, or when you are

02:05.760 --> 02:12.240
working with sensor tracking and energy data, use cases where you need to have data pretty fast,

02:12.240 --> 02:17.440
we optimize very much for ingestion, on a single instance you can ingest four or five million

02:17.440 --> 02:24.080
records per second, so we try to be quite fast for ingestion, of course we use columnar storage

02:24.080 --> 02:29.680
and we partition everything by time and so on and so forth. The code is mostly written in Java

02:30.640 --> 02:36.400
with some C++ and lately also some Rust, and before you ask me, yes, we had to do a

02:36.400 --> 02:41.200
lot of things to make Java that performant, and one of the things we had to do was to write

02:41.200 --> 02:46.560
our own version of the standard library, so we can have much more control over memory management,

02:46.560 --> 02:52.880
we don't allocate or use the garbage collector on the hot path, we try to parallelise and

02:52.880 --> 02:59.680
vectorise every operation, we wrote our own just-in-time compiler for SQL queries, and when it

02:59.680 --> 03:04.880
comes to SQL, which is how you query QuestDB, since we are a time series database, we have some

03:04.880 --> 03:12.320
extensions in SQL to make working with time a bit nicer. You might not be able to see much on this

03:12.320 --> 03:16.160
slide, but that's fine, these are some sample queries, I want to give you a taste of

03:16.160 --> 03:22.560
the specific SQL we have. For example here we are doing a count, but we are sampling, and we

03:23.520 --> 03:29.200
can sample by any interval you want, so it's like a GROUP BY, just sampling by time,

03:29.200 --> 03:34.880
or here is another query where we are getting a sum of some value for each different value of the

03:34.880 --> 03:41.040
symbol and the side columns, and we are sampling in one-minute intervals, but if at any particular

03:41.040 --> 03:46.400
minute we don't have any value, we are going to interpolate instead of getting rows with

03:46.400 --> 03:50.400
null values; it could be with the previous value, it could be with linear interpolation, different

03:50.480 --> 03:56.000
options for that. We also have ASOF JOIN, which is a specific type of join, where we have

03:56.000 --> 04:01.280
two tables, and for each record of the left table, we are going to find the closest earlier timestamp

04:01.280 --> 04:06.960
from the right table. So maybe one table is getting data every few seconds, and the other is getting

04:06.960 --> 04:12.560
two records per hour; ASOF JOIN will match the closest in time from one and the other, so that's
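
The matching rule just described can be sketched in plain Java. This is a toy illustration, not QuestDB's implementation; the class and method names are made up for this sketch:

```java
import java.util.Arrays;

// Toy sketch of the ASOF JOIN matching rule: for each left-table timestamp,
// find the index of the latest right-table row at or before it.
public class AsofJoinDemo {
    // rightTimes must be sorted ascending; returns, per left row, the index
    // of the matched right row, or -1 when nothing at-or-earlier exists
    static int[] asofMatch(long[] leftTimes, long[] rightTimes) {
        int[] match = new int[leftTimes.length];
        for (int i = 0; i < leftTimes.length; i++) {
            int pos = Arrays.binarySearch(rightTimes, leftTimes[i]);
            // binarySearch returns -(insertionPoint + 1) on a miss;
            // the row just before the insertion point is the "as of" match
            match[i] = pos >= 0 ? pos : -pos - 2;
        }
        return match;
    }
}
```

So a left row at 3700 with right rows at 0, 3600 and 7200 matches the 3600 row, the closest one that is not in the future.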

04:12.560 --> 04:18.240
kind of like the extensions we have in QuestDB, and we have tens of thousands of users which are

04:18.240 --> 04:23.680
very happy with the performance for this type of queries; when you do queries which are about time,

04:23.680 --> 04:32.240
we are very performant, but we notice that users very often, even if the main use case is about

04:32.240 --> 04:38.080
time series, they often want to do queries in which there is not a time component, they don't want

04:38.080 --> 04:43.600
to downsample, they don't want to query a chunk of time, they just want to do a plain GROUP BY

04:43.760 --> 04:49.680
over everything, group by whatever, without any time filters, without any time aggregations, and for those queries,

04:50.080 --> 04:54.560
we were not very fast. We were not bad, I mean we were still faster than a conventional database,

04:54.880 --> 04:59.920
but we were not fast for the queries that were not time series, and we wanted to be fast also for those

04:59.920 --> 05:05.440
queries, because even if our focus is on time series and we optimize everything for that use case,

05:05.440 --> 05:11.120
we also want to allow people to be fast in other use cases. So we decided to start making improvements

05:11.200 --> 05:16.800
to the code to make it perform better also for other queries, and before we

05:16.800 --> 05:21.520
made any changes, we wanted to have a baseline, because how do you know if you are getting faster

05:21.520 --> 05:26.720
if you don't know how fast you are? So basically we took a look at some benchmarks in the industry,

05:26.720 --> 05:32.800
like, oh, we want to see benchmarks not about time series, but about other types of analytics,

05:32.800 --> 05:37.600
and in the end we decided to use ClickBench. ClickBench is a benchmark by ClickHouse;

05:38.240 --> 05:41.760
ClickHouse is a different database, they actually have a session here in this room today,

05:41.760 --> 05:47.360
and they are very much about general analytics, it's an awesome database for general analytics,

05:47.360 --> 05:53.120
and they have this benchmark, the ClickBench benchmark, which was released three and a half years ago,

05:53.920 --> 05:58.000
and it is very nice for us, because most of the queries on the benchmark

05:58.000 --> 06:03.440
have nothing to do with time, they are just GROUP BYs and ORDER BYs and a lot of aggregations,

06:03.440 --> 06:07.440
a lot of conditions, but nothing about time, which is pretty good for us, because that is actually

06:07.680 --> 06:14.640
what we wanted to improve. The dataset on ClickBench is about 80 gigabytes of

06:14.640 --> 06:21.280
uncompressed data, it has almost 100 million rows of 100 columns each, so it's not a super big

06:21.280 --> 06:26.000
data set, but it's good enough for testing and for benchmarking these types of queries, and the

06:26.000 --> 06:30.880
types of queries you have on that benchmark, there are 43 different queries, some of them

06:30.880 --> 06:37.760
very simple, like select whatever from the table, and then they start adding filters

06:37.760 --> 06:43.600
on equality, equality on strings, then we start getting more interesting things like

06:43.600 --> 06:50.800
ORDER BY descending, which is challenging for a database, or here we have a CASE combined

06:50.800 --> 06:56.080
with GROUP BYs and with conditions, and with LIMITs, and with ORDER BYs, and then you have

06:56.080 --> 07:00.320
some crazy queries with a lot of aggregations, so this was pretty good for us, because it's the

07:00.320 --> 07:05.280
kind of thing we wanted to improve. The first time we ran the benchmark,

07:05.280 --> 07:09.680
three and a half years ago, we hoped to do well, and then, not really, but,

07:09.680 --> 07:17.360
because we were down here somewhere, we were about 33 times as slow as the top performers,

07:18.800 --> 07:25.520
I mean, it hurts, but you know, we wanted to improve; on time series, we are very fast,

07:25.600 --> 07:32.960
but not in other queries. A year later, were we already at the top? No, we were not at the top,

07:33.680 --> 07:41.840
but we are here, we are here, oh, 15, only 15 times as slow as the top performers.

07:41.840 --> 07:48.400
I mean, 15 times as slow is bad, okay, but we were 33 times as slow a year before, so it's like,

07:48.400 --> 07:53.440
this is getting somewhere, the things we are doing are making the analytical queries faster,

07:53.520 --> 07:58.960
so you know, we said, we are happy, it's pretty cool. One year later,

07:58.960 --> 08:05.360
September, but here, hey, I don't need to scroll, we are right here, we are only

08:07.360 --> 08:14.800
3.1 times as slow as the top guys, actually ClickHouse, which is here.

08:14.800 --> 08:17.680
Everything on this benchmark, of course, is optimized for ClickHouse, but even they don't

08:18.640 --> 08:23.360
have a score of one; actually the top one is not at one, it is one point five. Why?

08:23.360 --> 08:29.200
because in this benchmark, what they do is they take the best run of each query, if you have

08:29.200 --> 08:34.720
like the ideal database that performs fastest in all the queries, you would have a one,

08:34.720 --> 08:39.920
but the database at the top here, it performed better than the others, only in 15 queries,

08:39.920 --> 08:44.640
in the other queries, there are others that are faster. So the ideal database would be one,

08:44.720 --> 08:50.000
but the first one in the ranking is actually one point five, and we are here at 3.1,

08:50.000 --> 08:53.920
it's not too bad. And actually last week we did a release, it had nothing to do with

08:53.920 --> 09:00.800
performance, but still, we got a bit faster than before, and now we are 3.0, and if you notice,

09:00.800 --> 09:06.160
I'm finishing already, if you notice here, we have like mixed hardware, metal machines,

09:06.160 --> 09:12.480
4xlarge machines, blah blah blah; if you only compare with the same instance, we are actually

09:12.560 --> 09:16.160
still behind, other people are faster than us for these queries, because we are about time

09:16.160 --> 09:24.400
series, but, here we are, we are already only 2.6 times as slow in this benchmark,

09:24.400 --> 09:29.680
the top one is 1.3, so you know, it was a big improvement from 3 years ago, that's kind of

09:29.680 --> 09:35.280
the idea. So how we got here, that's the talk, how we made this happen. Well, it took

09:35.280 --> 09:39.920
over 3 years of time, of course we've been doing other things, we haven't been just

09:40.080 --> 09:44.560
focused, obsessing about benchmarking; we've been developing things like replication,

09:44.560 --> 09:50.560
the WAL, Parquet and JSON integration, a lot of different things, adding window functions

09:50.560 --> 09:54.960
we didn't have 3 years ago, so we've been doing other things. Many of the contributions have

09:54.960 --> 09:59.760
come from the community, most from the core team; we tried many things, and we failed at some of

09:59.760 --> 10:04.400
these things, and now I'm going to pass it over to, sorry, you took the mic, I have this one,

10:04.400 --> 10:09.760
so this one will be enough. So, hello everyone, we have about 20 minutes to cover these 80 patches

10:09.760 --> 10:16.480
that we have been talking about, 2 years of progress, so let's go. I cherry-picked just a few

10:16.480 --> 10:23.360
of them and I will start with something simple: we were missing some functions sometimes,

10:23.360 --> 10:30.880
so the example of a simple optimization with a big impact is just implementing a new function

10:30.880 --> 10:37.680
of count distinct for integers, because two years ago we were lacking this, so to be able

10:37.680 --> 10:45.280
to even execute the test queries, we had to cast the integer to a varchar and then aggregate that,

10:45.280 --> 10:52.080
well, that's not going to be fast, so that was an example of one of these small
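
The cost of that old workaround can be sketched in plain Java. This is a toy illustration, not QuestDB code; both helper methods are invented for this sketch:

```java
import java.util.HashSet;

// Toy sketch: counting distinct int keys directly vs. the old workaround
// of casting every value to a string first.
public class CountDistinctDemo {
    // Direct path: hash the values themselves.
    static int countDistinctInts(int[] values) {
        HashSet<Integer> seen = new HashSet<>();
        for (int v : values) seen.add(v);
        return seen.size();
    }

    // Workaround path: format a String per row, then hash the strings.
    // Same answer, but every row pays for formatting plus extra allocations.
    static int countDistinctViaStrings(int[] values) {
        HashSet<String> seen = new HashSet<>();
        for (int v : values) seen.add(Integer.toString(v));
        return seen.size();
    }
}
```

Both return the same count; the string path just does strictly more work per row, which is why a native integer count distinct was an easy win.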

10:52.080 --> 10:58.880
effort, big impact optimizations. But let's do something more interesting: we have a just-

10:58.880 --> 11:05.280
in-time compiler for queries. So, QuestDB, a little bit of an intro: it's a project mostly in Java,

11:05.280 --> 11:11.120
but it has some C++, C and Rust parts, and this is one of them, so basically when you have an

11:11.120 --> 11:18.480
SQL query with a filter predicate, then we do our best, if we can, to take that predicate,

11:18.480 --> 11:26.480
turn it into machine code for your architecture; if you are on Intel and the query is a good

11:26.480 --> 11:33.360
fit, then we use SIMD instructions for vectorization to execute it as fast as the CPU can, and

11:33.360 --> 11:45.760
really our goal is to not get in the CPU's way. How it works: we receive your query, we parse it,

11:45.760 --> 11:52.160
this produces an AST, then from this AST we produce some kind of intermediate representation,

11:52.160 --> 11:57.840
you can think of, I don't know, maybe LLVM bitcode if you know it, or something similar,

11:57.840 --> 12:04.640
and then we store this bitcode in some off-heap slash native memory, and then we call our C++

12:04.640 --> 12:11.200
backend, which will read that intermediate representation and will spit out the machine code

12:11.200 --> 12:16.800
for the target architecture. Right now it's Intel only, but we are looking at whether to

12:16.880 --> 12:24.240
implement this for ARM as well or not, and yeah, the output of this is basically a function,

12:24.240 --> 12:33.920
which we then call from Java via JNI, and on the input, you can see here, is the address of our data,

12:33.920 --> 12:38.960
and things like where the output should be located.
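
Conceptually, the intermediate-representation stage can be sketched as a tiny stack-based IR. This toy Java interpreter is only an illustration (all opcodes are invented here; the real backend emits native code instead of interpreting), showing how a predicate like `col0 > 100` can travel as flat opcodes:

```java
// Toy sketch of the filter pipeline: the predicate is lowered to a tiny
// stack-based intermediate representation; here we just interpret it per row.
public class FilterIr {
    // opcodes of our made-up IR
    static final int LOAD_COL = 0, PUSH_CONST = 1, GT = 2;

    // evaluate the IR program against one row of columnar data
    static boolean eval(int[] program, long[][] columns, int row) {
        long[] stack = new long[8];
        int sp = 0, pc = 0;
        while (pc < program.length) {
            switch (program[pc++]) {
                case LOAD_COL   -> stack[sp++] = columns[program[pc++]][row];
                case PUSH_CONST -> stack[sp++] = program[pc++];
                case GT         -> { sp--; stack[sp - 1] = stack[sp - 1] > stack[sp] ? 1 : 0; }
            }
        }
        return stack[0] != 0; // top of stack is the predicate's truth value
    }
}
```

A real JIT backend walks the same flat program but emits the matching machine instructions once, so the per-row loop runs native code with no interpretation overhead.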

12:39.920 --> 12:47.360
A little detour: I believe Postgres can use LLVM for JIT; we don't use LLVM,

12:47.360 --> 12:55.680
we use a really low-level library called asmjit, and sometimes, QuestDB, the Java part,

12:56.640 --> 13:02.240
people say that it's Java in name only, because it looks more like C. So when

13:02.240 --> 13:07.840
a QuestDB developer writes Java, it looks like C, and this is how it looks

13:07.840 --> 13:14.560
when a QuestDB developer writes C++: it looks like assembler. That's the API of the asmjit library

13:14.560 --> 13:19.360
if you use it, and that's also the reason why we don't support ARM right now, because we basically need

13:19.360 --> 13:27.120
to write the same for the ARM instruction set, NEON and things like that, yeah,

13:27.120 --> 13:38.560
and by Intel, I mean x86-64, with the AVX2 instruction set. So we implemented

13:38.560 --> 13:44.560
the first version of the JIT compiler for some filters; it cannot compile

13:44.560 --> 13:50.880
all filters, and then we did a bunch of improvements; for example, the URL column in ClickBench is a

13:50.880 --> 13:57.200
varchar, and originally, if the filter included a varchar, it would not be

13:57.200 --> 14:03.280
eligible for the JIT. So we did some improvements like: okay, if the filter includes a varchar

14:03.840 --> 14:08.480
and it's only checking if it's null or not, well, that's still easy, so now it's eligible,

14:08.480 --> 14:13.520
so minor improvements there as well. So, is that working or not?

14:13.600 --> 14:18.800
Well, I thought that you would show an example, yeah, right, so,

14:18.800 --> 14:23.680
for the example, what I did is, I started multiple instances of

14:23.680 --> 14:29.680
QuestDB here, so this is QuestDB on the latest version, which we will release

14:30.400 --> 14:35.840
on Wednesday, and this query takes, well, there's a difference between the cold path and the

14:35.840 --> 14:40.800
hot path, but still, it was like three seconds to execute this query. So if I go to

14:40.800 --> 14:47.120
the QuestDB of a few years ago, and the same query, which will be this one, let's see how long it takes,

14:48.800 --> 14:54.960
oh, sorry, let me run it again, no, it happens, it happens, demos do

14:54.960 --> 15:02.240
that sometimes. Okay, let's see, it's the same query, so three seconds, well, that was interesting,

15:02.960 --> 15:07.760
and there's not, no, that's cool, that's cool, I mean, it's all good, oh, no, I know what,

15:08.320 --> 15:13.840
this is the same, sorry, I switched versions here, so in version six, actually,

15:13.840 --> 15:19.840
let me just see what's happening here, this is the same query, because in version six,

15:19.840 --> 15:26.880
actually, I see this one is taking half a second, let me see, this is the one, yeah,

15:27.680 --> 15:34.960
anyway, it looks the same, that's interesting, okay, one second now, so as you see it's

15:34.960 --> 15:46.000
slower than before. I'm going to try the version of a year ago, okay, so yeah, that's,

15:46.000 --> 15:51.600
I know what's happening here, we had the cold run, okay, so now this makes sense,

15:51.600 --> 15:58.640
this is running now in 1.8 seconds; the version of a year ago is taking about three seconds,

15:58.800 --> 16:06.880
oh yeah, microphone, I have the microphone over there, sorry for the, I forgot the people on the

16:06.880 --> 16:14.080
stream, sorry, it was just me doing a failed demo. And then on the version of three years ago,

16:16.560 --> 16:22.000
two years ago, okay, so actually this is faster, oh yeah, and we knew this, and I'll tell you

16:22.000 --> 16:26.880
what happened, this is kind of funny, I'll tell you what happened. So first I was checking the

16:26.880 --> 16:30.880
cold run, and the cold run means we have to go to the disk; on the second run we are

16:30.880 --> 16:36.960
already in memory. So comparing the hot runs, the older one was faster,

16:36.960 --> 16:41.360
but why was the one of two and a half years ago actually faster in this particular test?

16:41.360 --> 16:46.720
We talked about that the other day: it's because we've been adding new data types, so in version

16:46.720 --> 16:53.760
six, the table was using only, uh, integers, and in version seven, we are actually using shorts

16:54.320 --> 16:59.680
for some columns, and for short, we don't have the optimizations, the ones that Jeremy

16:59.680 --> 17:03.600
was describing; so the optimizations are there for some types, but we still don't have them

17:03.600 --> 17:09.920
for others, as in this particular query. So actually the new version is slower because

17:09.920 --> 17:15.440
we changed the type of the column, and for that particular column, we don't have the optimizations,

17:15.440 --> 17:21.200
so yeah, that happens, there's a reason for it, so at least we can explain it, but that

17:21.200 --> 17:25.360
was happening. We have another one later that's clearly faster than all the older versions, but this was

17:25.360 --> 17:35.040
a good one, yeah, wasn't that a good query? Okay, so next optimization: two years ago,

17:35.040 --> 17:42.960
we would store string types as UTF-16 encoded, that was one of our legacies from Java,

17:42.960 --> 17:48.160
which is not great on its own, but it gets worse. The way it was stored and

17:48.160 --> 17:55.440
organized on disk is that we had two files; the first one is similar to how we store an

17:55.440 --> 18:01.040
integer, so the physical representation for integers is really simple, it's basically you have one

18:01.040 --> 18:06.240
flat file, you store a row, you write the number for that particular column, you store another

18:06.240 --> 18:11.360
row, you write the integer, and it's easy because all integers have a fixed size. So if you

18:11.360 --> 18:17.120
want to read row n, and, is the microphone working? If you read,

18:17.200 --> 18:23.840
if you want to read row n, you just multiply n by the size, and you know the offset in the column

18:23.840 --> 18:29.760
file, that's easy, but it doesn't work for things like varchar, right, because string can be

18:29.760 --> 18:35.760
short or long, so they don't have a fixed size. But you know what they say about computer science:

18:35.760 --> 18:40.800
everything can be solved with another layer of indirection, so that's exactly what's going on here,

18:41.120 --> 18:49.520
so we have two files for varchar: one is just fixed-size offsets into another file where the actual

18:49.520 --> 18:55.120
data is stored, prefixed with the size. This is just conceptual, because if you think about it,

18:55.120 --> 19:02.320
it's UTF-16, so each char here would be 16 bits and whatnot, but that's the concept, it's easy

19:02.320 --> 19:08.640
to implement, it's easy to grasp, but it's not very fast. One reason is, well, it's UTF-16,

19:08.640 --> 19:15.120
so it's also not very compact on disk and in memory, but the reason why it's slow is, if you are

19:15.120 --> 19:25.040
doing full-table scans, then for every row, you need to jump into another file and even, you know,

19:25.040 --> 19:31.840
to check the size, because the size of each string is stored in the second file, the data file,

19:31.840 --> 19:37.680
so it's really slow, because there is a lot of pointer chasing and, you know, not good,

19:38.960 --> 19:43.680
So about a year ago, I think, was it a year ago? We introduced

19:43.680 --> 19:51.440
this new implementation of varchar, I don't know, I lost track, anyway,

19:53.760 --> 19:59.040
and my primary motivation was to replace UTF-16 with UTF-8 for obvious reasons,

19:59.840 --> 20:04.720
and while we were doing this, we were like, well, maybe we can do even better, so we changed

20:05.600 --> 20:10.880
how things are represented, so the magic is in the first file, which is fixed size,

20:11.440 --> 20:17.520
and it now includes not just offsets, but also some kind of flags and the size,

20:18.160 --> 20:26.400
and a prefix, the first few bytes of the actual string, right? So now when we want to check

20:26.880 --> 20:35.920
if a particular string starts with something, or has some size, or is even equal to

20:36.640 --> 20:43.120
something, you know, whatever our predicate says, we can consult just the prefix in the

20:43.120 --> 20:48.720
first file, the fixed-width one, and if it doesn't match, then we don't need to go

20:49.440 --> 20:56.000
to the other file at all, and that saves a lot of time and pointer chasing.
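
The idea can be sketched in plain Java. This is a toy in-memory model, not QuestDB's on-disk format; the class, the 6-byte prefix length and the `dataFileReads` counter are all invented for this sketch:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Toy sketch of the two-file varchar layout: a fixed-width entry holds the
// offset, the size and the first few bytes of each string, so many predicates
// can be answered without touching the variable-size data file at all.
public class VarcharColumn {
    static final int PREFIX_LEN = 6;   // bytes of the string inlined in the fixed file
    private final long[] offsets;      // fixed-width part: offset into the data file,
    private final int[] sizes;         // size of the string,
    private final byte[][] prefixes;   // and its first PREFIX_LEN bytes
    private final byte[] dataFile;     // variable-size part: all UTF-8 bytes back to back
    int dataFileReads;                 // instrumentation: how often we chased the pointer

    VarcharColumn(String[] values) {
        offsets = new long[values.length];
        sizes = new int[values.length];
        prefixes = new byte[values.length][PREFIX_LEN];
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        for (int i = 0; i < values.length; i++) {
            byte[] b = values[i].getBytes(StandardCharsets.UTF_8);
            offsets[i] = data.size();
            sizes[i] = b.length;
            System.arraycopy(b, 0, prefixes[i], 0, Math.min(b.length, PREFIX_LEN));
            data.write(b, 0, b.length);
        }
        dataFile = data.toByteArray();
    }

    boolean startsWith(int row, String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        if (p.length > sizes[row]) return false;            // size check: fixed file only
        for (int i = 0; i < Math.min(p.length, PREFIX_LEN); i++)
            if (prefixes[row][i] != p[i]) return false;     // prefix mismatch: no pointer chase
        if (p.length <= PREFIX_LEN) return true;            // the inlined prefix decided it
        dataFileReads++;                                    // only now read the data file
        for (int i = 0; i < p.length; i++)
            if (dataFile[(int) offsets[row] + i] != p[i]) return false;
        return true;
    }
}
```

On a full-table scan with a selective prefix predicate, most rows are rejected from the fixed-width file alone, which is exactly the pointer chasing the new layout avoids.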

20:57.760 --> 21:04.320
Now, the elephant in the room: even two years ago, we could run

21:05.440 --> 21:12.480
some kinds of aggregation queries on multiple threads, but just a tiny, tiny subset;

21:12.480 --> 21:19.200
we had a specialized path using SIMD instructions for simple aggregations like

21:19.200 --> 21:26.000
minimum, maximum, average, but that was really simplistic, we wanted to have something more general.

21:27.040 --> 21:34.480
And, you know, this is the naive, or single-threaded, implementation of GROUP BY

21:34.480 --> 21:40.800
aggregation for this query: we have a table with user IDs, and we want a count

21:42.800 --> 21:47.280
per user ID, we want to group by the user ID, so the implementation

21:47.280 --> 21:52.240
is pretty simple, right? This is our input table, we read it row by row,

21:52.240 --> 22:00.160
okay, user ID 142, we put it into a hash map, if it's not there, we set the count to one, if it's already

22:00.160 --> 22:04.800
there, we increase the counter. It's an easy, single-threaded implementation, nothing to write home about.
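
That naive plan can be sketched in a few lines of Java (a toy illustration of the algorithm described, not QuestDB's actual operator):

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the single-threaded plan for
//   SELECT user_id, count(*) FROM tab GROUP BY user_id;
// one hash map, one pass over the column.
public class GroupByDemo {
    static Map<Integer, Long> count(int[] userIds) {
        Map<Integer, Long> counts = new HashMap<>();
        for (int id : userIds)
            counts.merge(id, 1L, Long::sum); // insert 1, or bump the existing counter
        return counts;
    }
}
```
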

22:05.840 --> 22:10.080
The disadvantage is clear: it's single-threaded, right? If I have a computer

22:10.080 --> 22:18.880
with 128 cores, only one will be used, not good. So one way to parallelize this is

22:18.880 --> 22:26.480
simple, right? You just split the data set into, let's say, I have only two cores, so let's say,

22:26.480 --> 22:33.520
in this case two chunks, then I spawn tasks, and each worker thread will get, you know,

22:33.600 --> 22:43.920
half of the data set, so they will produce partial tables with partial results, and once they are

22:43.920 --> 22:50.960
done, I have these two tables, and I merge them, and this is my result. So I was able to distribute

22:50.960 --> 22:58.320
the work and utilize more cores, somehow, but the problem is, there is still a bottleneck,

22:59.200 --> 23:09.040
and the bottleneck is in this merging part, right? Because think of it, if I have many, many users,

23:09.040 --> 23:16.400
the individual tables will still be huge, and the merging is single-threaded

23:16.400 --> 23:21.200
in the naive case, right? Because, you know, the workers are done, I have a bunch of

23:21.760 --> 23:27.920
intermediate tables with partial results, and I need to merge them into the final result,

23:27.920 --> 23:36.640
and that is single-threaded; this will be my scalability bottleneck. So we can do better,

23:38.480 --> 23:44.560
and what we can do, the improved strategy we employ right now, in the current version of

23:44.720 --> 23:54.880
QuestDB, is that each worker will split its table further into shards, let's say, and then,

23:55.520 --> 24:02.560
for each key, this is my user ID, it uses the hash of that key, because it needs to know the

24:02.560 --> 24:08.080
hash anyway, because the results are stored in hash tables, so it takes some bits

24:08.160 --> 24:21.040
from the key, and based on these bits, it selects a sub-table in its own partial results table,

24:21.040 --> 24:30.560
and stores it there. It means that all workers will select the same sub-table for

24:30.720 --> 24:39.360
the same key, it means that if one worker calculated that the key 142 belongs to the partition

24:39.360 --> 24:46.960
174, and then there is another worker, which is processing the same key, it will also calculate

24:46.960 --> 24:54.880
that it belongs to this partition. So by the end of this process, each worker will have

24:54.880 --> 25:08.240
a bunch of these partial results, and once they are done, I am here: we filtered and aggregated,

25:08.240 --> 25:14.000
now we have the partial results, then we need to merge the partial results, but now

25:14.000 --> 25:22.880
this can be done in parallel: I only need to submit a task which merges the partial

25:23.840 --> 25:30.320
results of partition 0 from all workers, then another task, which will, in parallel, merge the

25:30.320 --> 25:35.680
results of partition 1 from all workers, and at the end of the day, this can all be done

25:35.680 --> 25:43.440
in parallel, so, you know, the problem with my high cardinality user ID is gone, because this

25:43.440 --> 25:55.200
merging is no longer a bottleneck. Okay, another small optimization we did: I keep talking about
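
The two-phase strategy just described can be sketched in plain Java. This is a toy model of the algorithm, not QuestDB's implementation; the class name, the modulo-based partition choice and the thread-pool plumbing are all simplifications made for this sketch:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy sketch of the two-phase parallel GROUP BY: in phase one, every worker
// routes each key into one of `partitions` sub-maps chosen from the key's
// hash, so phase two can merge partition p across all workers as independent,
// parallel tasks.
public class PartitionedGroupBy {
    static Map<Integer, Long> count(int[] ids, int workers, int partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // phase 1: each worker aggregates its slice into partitioned sub-maps
            List<Future<List<Map<Integer, Long>>>> phase1 = new ArrayList<>();
            int chunk = (ids.length + workers - 1) / workers;
            for (int w = 0; w < workers; w++) {
                int from = w * chunk, to = Math.min(ids.length, from + chunk);
                phase1.add(pool.submit(() -> {
                    List<Map<Integer, Long>> subs = new ArrayList<>();
                    for (int p = 0; p < partitions; p++) subs.add(new HashMap<>());
                    for (int i = from; i < to; i++) {
                        // same key => same hash => same partition on every worker
                        int part = Math.floorMod(Integer.hashCode(ids[i]), partitions);
                        subs.get(part).merge(ids[i], 1L, Long::sum);
                    }
                    return subs;
                }));
            }
            List<List<Map<Integer, Long>>> partials = new ArrayList<>();
            for (Future<List<Map<Integer, Long>>> f : phase1) partials.add(f.get());

            // phase 2: one merge task per partition, all independent, so the
            // final merge is no longer a single-threaded bottleneck
            List<Future<Map<Integer, Long>>> phase2 = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                int part = p;
                phase2.add(pool.submit(() -> {
                    Map<Integer, Long> merged = new HashMap<>();
                    for (List<Map<Integer, Long>> worker : partials)
                        worker.get(part).forEach((k, v) -> merged.merge(k, v, Long::sum));
                    return merged;
                }));
            }
            Map<Integer, Long> result = new HashMap<>();
            for (Future<Map<Integer, Long>> f : phase2) result.putAll(f.get());
            return result;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every worker routes a given key to the same partition index, the phase-two merges never touch the same key from two different partitions, which is what makes them safe to run concurrently.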

25:55.200 --> 26:02.000
hash tables, and the example I gave you was fairly simplistic, right, it was just the user ID,

26:02.000 --> 26:08.560
it was the only key, but in real queries the key can be more complicated, right,

26:08.560 --> 26:14.640
it can be multiple columns; some columns can be fixed size,

26:14.640 --> 26:21.920
some can be varchars and others, and we realized that we can get a big

26:21.920 --> 26:30.400
performance boost by using specialized data structures. So if we are running an aggregation

26:30.400 --> 26:40.240
like this one, where the key is only one integer, then we have a specialized data structure for

26:40.240 --> 26:46.560
that; if we have a key with two integers, then we have a specialized data structure for that,

26:46.560 --> 26:53.760
and, you know, we have data structures for these common cases, and then if your

26:53.760 --> 27:03.200
query doesn't fit, then it uses something more generic. Yeah, so what have we learned?

27:03.200 --> 27:09.680
We want to thank ClickHouse for ClickBench, because it was really

27:09.680 --> 27:15.040
instrumental in this journey, because, you know, we are still a rather small team,

27:15.040 --> 27:22.880
and coming up with a good benchmark is hard, it's really hard, and ClickHouse did a lot of work

27:22.960 --> 27:26.400
for us here, and actually it's pretty cool, because in

27:26.400 --> 27:32.800
ClickBench there are a lot of other databases now, so

27:32.800 --> 27:37.280
you can actually compare with the evolution of other databases, which is

27:37.280 --> 27:41.840
super interesting, and it's very easy to run, another nice thing compared with other

27:41.840 --> 27:48.720
benchmarks. So yeah, the first thing we learned is that if you want to be

27:49.680 --> 27:54.080
a good time series database, you also need to be a good analytical database, so we had to

27:54.080 --> 28:01.520
keep applying the same things, doing improvements; it's not like we do one thing and we're done,

28:01.520 --> 28:06.160
we keep adding more things. You saw the queries we ran today, we have

28:06.160 --> 28:10.160
improved a lot of things, but there are some data types for which we don't have optimizations

28:10.160 --> 28:15.120
yet; most of our users don't use the short data type, it's on the benchmark, so you know,

28:15.120 --> 28:19.760
it's in that query, but it's not one of the ones we see in production. So we optimize when users

28:19.760 --> 28:25.200
complain, mostly; we see a query that is slow for users, we investigate,

28:25.200 --> 28:31.200
we add things, but yeah, it requires a lot of work to do small things, and eventually

28:31.200 --> 28:35.840
we get to something. And when you improve some things, you actually improve

28:35.840 --> 28:40.800
others, so when we improved GROUP BY, we had this SAMPLE BY, which is specific to us,

28:40.800 --> 28:45.360
sampling by time, and that also got faster, because when you are improving one thing, as a side

28:45.360 --> 28:49.440
effect, you are improving another thing, and by improving analytical queries,

28:49.440 --> 28:54.160
we're improving time series queries, which is pretty interesting, so you know, it's good to see.

28:54.720 --> 28:58.720
and the other thing is like now we know many things that we need to improve, for example,

28:58.720 --> 29:02.800
optimizing more data types, or adding more functions that parallelize, which is something we

29:02.800 --> 29:09.440
still don't have, and we have a lot of ideas for those kinds of things. And I don't want to finish

29:09.520 --> 29:15.840
without showing you a query which is faster, so actually this GROUP BY that Jeremy was describing

29:15.840 --> 29:22.400
is one of the things we improved lately, so that one, for sure, is faster, without any

29:24.320 --> 29:31.280
shadow of a doubt. So this will be the latest QuestDB; in the hot run, which is

29:31.280 --> 29:38.400
the second one, it is taking now 2.6 seconds. In the version of a year ago,

29:39.440 --> 29:45.680
that's the same query, on the second run not the first one, because the first one might be

29:45.680 --> 29:52.560
going to disk, I know, it's just how it is, so yeah, well, as you can

29:52.560 --> 30:00.800
see, the older version is slower, for sure, it's still there, okay, so now the second,

30:00.800 --> 30:05.440
the second run of this query, on this version, it should still be around 10 seconds,

30:05.440 --> 30:15.120
so yeah, there it is, it's 10 seconds on the version of a year ago, 3 seconds now,

30:15.120 --> 30:22.400
and if you go to the version of two years ago, this query will be even slower than both

30:22.400 --> 30:26.560
the one from a year ago and the current one, and in this case, there is no,

30:26.560 --> 30:31.520
there is nothing odd there. So I wanted to show you that, for real, some queries do run faster

30:31.600 --> 30:35.840
in this version than they used to in the past. That's all we have, we don't have

30:35.840 --> 30:41.520
more time, so basically, I just want to thank you again, and if you want to

30:41.520 --> 30:47.120
contribute to the project, yeah, we have 150 contributors, you could be number 151,

30:47.120 --> 30:52.960
actually it's 153 or something, but whatever, so yeah, just go to GitHub and contribute, thank you very

30:52.960 --> 30:59.520
much, thank you.


