WEBVTT

00:00.000 --> 00:11.520
Well, that's what we also realized, and we provide detailed specifications, papers, generators,

00:11.520 --> 00:18.160
data sets, and all sorts of frameworks to make our benchmarks easy to implement.

00:18.160 --> 00:21.320
The benchmarks are easy to implement, but they are not easy to get audited.

00:21.320 --> 00:27.240
We have TPC-grade benchmark audits, which means that we have certified auditors, and if

00:27.240 --> 00:29.520
you're a vendor, you have to talk to an auditor.

00:29.520 --> 00:33.160
It's usually something like 30,000 to 50,000 euros.

00:33.160 --> 00:37.600
It's a multi-week process: the auditor will talk to the vendor.

00:37.600 --> 00:44.680
They will compile the specification sheets, and then they will, of course, rerun the benchmark

00:44.680 --> 00:47.240
and write a full disclosure report out of it.

00:47.240 --> 00:51.680
This is kind of a painful process, but vendors go through it, because this is kind of the

00:51.680 --> 00:56.320
gold standard in the graph database space, and we have been doing audits for five years,

00:56.320 --> 01:00.520
probably 50 results, and never had to retract any of the results.

01:00.520 --> 01:06.280
The worst issues I could find were a few typos in old full disclosure reports.

01:06.280 --> 01:10.080
And as you can see, we have grown quite well in the last five years, so we have more

01:10.080 --> 01:15.920
and more members, lots of user community meetings, benchmarks, results, and so on.

01:15.920 --> 01:18.800
So what are the challenges in the graph database space?

01:18.800 --> 01:20.080
Well, there's a big decline.

01:20.080 --> 01:22.520
I don't think that needs to be denied.

01:22.520 --> 01:27.080
The hype cycle has moved over to AI technologies, and I think there's a huge amount

01:27.080 --> 01:28.080
of confusion.

01:28.080 --> 01:32.880
I'm trying to reduce the confusion here a bit by setting up these categories, but when vendors

01:32.880 --> 01:37.200
go out saying, we are a graph database, you don't need joins, and we will replace all

01:37.200 --> 01:38.200
relational systems,

01:38.200 --> 01:40.840
I don't think that does anyone any good.

01:40.840 --> 01:44.480
I don't think anyone is seriously saying anymore that graph databases will

01:44.480 --> 01:46.600
replace relational systems.

01:46.600 --> 01:50.760
There are many niche use cases, and there are vendors who are basically tied to one

01:50.760 --> 01:55.280
or two of their customers, they overoptimize for that, and that ends up in this very

01:55.280 --> 01:58.840
fragmented space with dozens of query languages.

01:58.840 --> 02:05.840
You can see the effect of that: between 2013 and 2022, there was a 13-fold increase in

02:05.840 --> 02:10.800
the DB-Engines ranking for graph databases, but since the summer of 2022, it has been

02:10.800 --> 02:11.800
in decline.

02:11.800 --> 02:16.880
So we are now back to basically two thirds of the peak, and the decline doesn't look

02:16.880 --> 02:17.880
very good.

02:18.280 --> 02:22.080
If you have heard about GraphRAG, the vendors have heard about it too; this is the

02:22.080 --> 02:24.840
lifeline that they are trying to cling to.

02:24.840 --> 02:29.720
This is something that the vendors think is going to help them, and indeed it produces

02:29.720 --> 02:32.560
a lot of attention, and it's a nice graph use case.

02:32.560 --> 02:38.560
Okay, to sum up, graph databases have syntactic sugar and optimizations for joins.

02:38.560 --> 02:42.680
If you have 10 joins in your query, and for some reason it doesn't work well in a relational

02:42.680 --> 02:47.280
system, you should try a graph database, but then based on the problem you may want the

02:47.280 --> 02:53.640
MongoDB, the Postgres, or the Teradata of graph databases. And licensing is difficult: check

02:53.640 --> 02:57.920
the licenses before using the software, and check the performance results.

02:57.920 --> 02:58.920
Thanks a lot.

02:59.920 --> 03:17.920
Thank you very much.

03:17.920 --> 03:18.920
Thanks.

03:19.920 --> 03:20.920
Hello.

03:20.920 --> 03:25.920
Thank you.

03:25.920 --> 03:28.920
Hey.

03:28.920 --> 03:32.920
Yes, I'm going to hand over to Danny Allen, then I agree with you.

03:32.920 --> 03:36.920
If you can do all questions outside or somewhere different, that's a nice question.

03:36.920 --> 03:37.920
All right.

03:37.920 --> 03:47.920
I do not know what's the next talk though, because Danny I was my student, so...

03:47.920 --> 03:57.920
Thank you very much.

03:57.920 --> 04:02.920
I don't think you should see the line.

04:02.920 --> 04:03.920
It's too close, aren't you?

04:03.920 --> 04:04.920
Yes, but it doesn't seem about doing anything.

04:04.920 --> 04:08.920
We take all the questions outside, and he's going to let's be grappling.

04:08.920 --> 04:11.920
Do you want to know who's there?

04:11.920 --> 04:12.920
Yeah, okay.

04:12.920 --> 04:13.920
So you've got a map.

04:13.920 --> 04:15.920
Yeah, so we've got...

04:15.920 --> 04:16.920
Okay.

04:16.920 --> 04:18.920
Normally what we've found is the issue I didn't work.

04:18.920 --> 04:21.920
This one does, so don't try that.

04:21.920 --> 04:23.920
All right.

04:23.920 --> 04:24.920
See that.

04:24.920 --> 04:25.920
How's it going?

04:25.920 --> 04:26.920
It's going to be fine.

04:26.920 --> 04:27.920
Yeah.

04:27.920 --> 04:30.920
If we can move, yeah, it gets to go and move outside for questions.

04:30.920 --> 04:31.920
Please, that'd be great.

04:31.920 --> 04:32.920
Here's your...

04:36.920 --> 04:37.920
Sorry.

04:37.920 --> 04:38.920
No.

04:38.920 --> 04:39.920
At least that is him off.

04:39.920 --> 04:40.920
Horrible.

04:41.920 --> 04:42.920
What's your other name?

04:44.920 --> 04:46.920
So, what's your shop out of here?

04:46.920 --> 04:48.920
Yeah, it might be easier, might it.

04:48.920 --> 04:49.920
Hold that.

04:49.920 --> 04:50.920
So, right.

04:50.920 --> 04:52.920
Let's see if that works.

04:54.920 --> 04:55.920
Okay.

04:55.920 --> 04:57.920
That actually meant a bit closer.

04:58.920 --> 04:59.920
Sorry.

04:59.920 --> 05:01.920
It's really hard, actually.

05:01.920 --> 05:02.920
Yeah.

05:02.920 --> 05:03.920
You know what?

05:03.920 --> 05:04.920
I'll show you something.

05:04.920 --> 05:05.920
I think.

05:08.920 --> 05:10.920
Then quite nice happened there.

05:15.920 --> 05:18.920
No idea what's up what this is going.

05:18.920 --> 05:19.920
Okay.

05:19.920 --> 05:21.920
Sure, what's going on?

05:26.920 --> 05:28.920
I think it's what I think what it is.

05:28.920 --> 05:30.920
It's that goes down like this.

05:35.920 --> 05:36.920
Okay.

05:36.920 --> 05:37.920
That goes in like that.

05:37.920 --> 05:38.920
Exactly.

05:38.920 --> 05:40.920
So then, that goes in like that.

05:43.920 --> 05:44.920
Like that.

05:44.920 --> 05:46.920
And then I can do it like this.

05:46.920 --> 05:48.920
It's just right up here.

05:48.920 --> 05:49.920
Oh, that's lovely.

05:49.920 --> 05:50.920
Yeah.

05:50.920 --> 05:51.920
It should be really lovely.

05:51.920 --> 05:54.920
So, unfortunately, the audio is not great for him.

05:54.920 --> 05:55.920
Yeah.

05:58.920 --> 06:00.920
I've got, I've got a bad job here.

06:01.920 --> 06:03.920
I think I'll miss him up here.

06:05.920 --> 06:07.920
Okay, let's try doing it in here.

06:08.920 --> 06:11.920
Okay, let's try doing it in here.

06:11.920 --> 06:12.920
Okay.

06:12.920 --> 06:14.920
That should hopefully be okay.

06:14.920 --> 06:15.920
Okay.

06:15.920 --> 06:16.920
So yeah.

06:16.920 --> 06:17.920
So just project.

06:17.920 --> 06:18.920
Yeah.

06:18.920 --> 06:19.920
Yeah, I'm sorry.

06:19.920 --> 06:20.920
Actually, you know what?

06:20.920 --> 06:22.920
I think it's not in the work.

06:25.920 --> 06:26.920
Sorry.

06:26.920 --> 06:27.920
I'll fix this.

06:27.920 --> 06:28.920
So, yes.

06:28.920 --> 06:30.920
You were on the previous door.

06:30.920 --> 06:32.920
So you know what the situation is with the audio.

06:32.920 --> 06:33.920
I'm really sorry about that.

06:34.920 --> 06:35.920
Yeah.

06:35.920 --> 06:36.920
Yeah. I have some overview of it.

06:36.920 --> 06:38.920
No, not overview of it.

06:38.920 --> 06:40.920
I basically took it over.

06:40.920 --> 06:43.920
Overlapping with a guy around me.

06:43.920 --> 06:44.920
I hope it should be there.

06:44.920 --> 06:45.920
No.

06:45.920 --> 06:52.920
You know, it was interesting to see that the interest in graph actually is really growing.

06:52.920 --> 06:57.920
You know, I'm not sure how much of that is because of graph rag and, you know, the kind of thing.

06:57.920 --> 06:59.920
Maybe it's something there.

06:59.920 --> 07:00.920
I don't know.

07:00.920 --> 07:02.920
I just do not know how this works.

07:04.920 --> 07:06.920
On that side.

07:15.920 --> 07:17.920
Let's try it one more time.

07:17.920 --> 07:18.920
Let's see.

07:18.920 --> 07:19.920
Hold this.

07:22.920 --> 07:23.920
Okay. Hopefully.

07:23.920 --> 07:24.920
Okay.

07:24.920 --> 07:25.920
Can you put on your pocket maybe?

07:25.920 --> 07:26.920
Yep.

07:31.920 --> 07:32.920
I'll add a timer.

07:32.920 --> 07:33.920
Okay.

07:33.920 --> 07:35.920
Next session is going to start in five minutes.

07:46.920 --> 07:48.920
What do you mean? This is my uniform.

07:48.920 --> 07:49.920
This is really good.

07:51.920 --> 07:53.920
This is my uniform.

07:53.920 --> 07:55.920
Oh, I have. Thank you.

07:55.920 --> 07:56.920
I have.

08:02.920 --> 08:04.920
Thank you.

08:33.920 --> 08:34.920
Thank you.

08:34.920 --> 08:40.920
Oh, this is a gig.

09:05.920 --> 09:06.920
Thank you very much.

09:06.920 --> 09:09.920
Thank you so much for agreeing to do this.

09:09.920 --> 09:10.920
Like not.

09:14.920 --> 09:15.920
Yeah.

09:15.920 --> 09:16.920
Yeah.

09:16.920 --> 09:18.920
I have some overlapping slides with Gábor.

09:18.920 --> 09:19.920
Like overlapping topics.

09:19.920 --> 09:20.920
I hope it's good.

09:20.920 --> 09:21.920
I don't know for sure.

09:21.920 --> 09:23.920
At the beginning, we were able to extend because you know with the dark cone.

09:23.920 --> 09:25.920
We didn't want to fit like that.

09:25.920 --> 09:26.920
Yeah.

09:26.920 --> 09:28.920
So it's like they were like really good talks.

09:28.920 --> 09:30.920
I said they were more talk like not from that cone.

09:30.920 --> 09:32.920
In the proposal, but we were like, I don't know.

09:32.920 --> 09:34.920
I've been in both events because it's like, yeah.

09:34.920 --> 09:35.920
Yeah.

09:35.920 --> 09:37.920
If it's kind of like a advertise for them freeing event,

09:37.920 --> 09:40.920
I would find it is different cities so much.

09:40.920 --> 09:41.920
Yeah.

09:41.920 --> 09:42.920
For them freeing.

09:42.920 --> 09:44.920
It was like, it can be, but it's not.

09:44.920 --> 09:45.920
It's a bit too big.

09:45.920 --> 09:47.920
So they were pretty pretty cool actually.

09:47.920 --> 09:48.920
Yeah.

09:48.920 --> 09:49.920
But like, I don't know man.

09:49.920 --> 09:50.920
So yeah, about about that.

09:50.920 --> 09:51.920
Yeah.

09:51.920 --> 09:52.920
It was like very high rated.

09:52.920 --> 09:53.920
I got it.

09:53.920 --> 09:54.920
Thank you for making this.

09:54.920 --> 09:55.920
I'm happy to.

09:55.920 --> 09:57.920
I think it's really nice to share the,

09:57.920 --> 09:58.920
but it's really nice to share the.

09:58.920 --> 09:59.920
Yeah.

09:59.920 --> 10:00.920
It is.

10:00.920 --> 10:01.920
Yeah.

10:01.920 --> 10:02.920
Yeah.

10:02.920 --> 10:03.920
Yeah.

10:03.920 --> 10:04.920
Yeah.

10:04.920 --> 10:06.920
Thank you.

10:06.920 --> 10:07.920
Happy to.

11:30.420 --> 11:34.020
So we are going to start in one minute, approximately.

11:34.020 --> 11:37.340
We are delighted to have Daniël ten Wolde.

11:37.340 --> 11:38.340
Ten Wolde?

11:38.340 --> 11:39.340
Wolde.

11:39.340 --> 11:40.340
OK.

11:40.340 --> 11:43.980
I'm getting a good exercise in pronouncing the name today.

11:43.980 --> 11:50.620
who is going to be talking about empowering data analytics: high-performance graph queries

11:50.620 --> 11:53.860
in DuckDB with DuckPGQ.

11:53.860 --> 11:56.860
So who doesn't need more ducks in their life?

11:56.860 --> 11:57.860
Right?

11:58.300 --> 12:02.740
I know there are also creatures starting up ahead of you.

12:02.740 --> 12:07.620
Starting in one minute, so please, again, move in because we have people who are coming

12:07.620 --> 12:10.460
and we don't want them to disrupt yourselves.

12:10.460 --> 12:16.180
If you can squish in and leave places for people who will be joining, that's awesome.

12:16.180 --> 12:19.100
So thank you very much, and over to you.

12:19.100 --> 12:20.060
Thank you very much.

12:20.060 --> 12:23.540
All right, can everybody hear me if I speak like this?

12:23.540 --> 12:24.660
A bit louder?

12:24.660 --> 12:25.660
All right.

12:25.780 --> 12:27.300
Well, my name is Daniël ten Wolde.

12:27.300 --> 12:31.100
I'm a PhD student at the CWI Database Architectures group.

12:31.100 --> 12:38.060
And today, I'll talk to you about using DuckDB and DuckPGQ to do your graph analytics.

12:38.060 --> 12:42.620
OK, so first, my topic is very similar to the one that

12:42.620 --> 12:45.740
Gábor described in the previous talk: graph data management.

12:45.740 --> 12:48.340
So there is some overlap, I'll admit.

12:48.340 --> 12:53.180
But with graph data management, we often refer to highly connected data.

12:53.260 --> 12:56.380
So many-to-many relationships are very common.

12:56.380 --> 13:00.460
And very often, tables already represent graphs.

13:00.460 --> 13:06.300
So in this case, we have special vertex tables and special edge tables.

13:06.300 --> 13:08.980
And then people want to do graph exploration over this.

13:08.980 --> 13:14.340
And what that means is that there's pattern matching, where we want to find a subgraph.

13:14.340 --> 13:18.980
We want to do path finding where we want to find the shortest or cheapest path between

13:18.980 --> 13:21.580
any two sets of nodes.

13:21.620 --> 13:27.620
But importantly, people still want their filters and their aggregations.

13:27.620 --> 13:31.220
So these three building blocks are important.

13:31.220 --> 13:34.900
So, storing a graph in SQL. On the right here, we see a property graph.

13:34.900 --> 13:39.340
And the property graph data model extends the normal graph with labels.

13:39.340 --> 13:43.140
So we have a person label, follows, lives in and city.

13:43.140 --> 13:48.540
And every node and edge can have properties in the form of key-value pairs.

13:48.580 --> 13:52.580
So storing this in a relational system is really not the issue.

13:52.580 --> 13:56.060
For every label, we create a different table.

13:56.060 --> 13:59.780
And the properties are the columns within these tables.

13:59.780 --> 14:03.020
All right, nothing special going on here.

14:03.020 --> 14:07.100
But now we're tasked with the following prompt of counting the number of people

14:07.100 --> 14:11.460
Bob indirectly follows who live in the city of Utrecht.

14:11.460 --> 14:13.660
And Gábor gave it away a bit in his last slide.

14:13.660 --> 14:17.700
But this is what you would have to write in the SQL of today,

14:17.700 --> 14:22.380
SQL:1999, where the recursive operator was introduced.

14:22.380 --> 14:28.660
And you have to recursively explore your paths and keep track of them yourself using an array.

14:28.660 --> 14:32.740
So from a user perspective, this is not nice to write, not nice to read.

14:32.740 --> 14:37.860
I would argue, although there may be some of you out there who like to read this.

14:37.860 --> 14:43.340
And from a system perspective, importantly, it's really difficult to optimize these styles of queries.

14:43.340 --> 14:45.780
So what did people do?

14:45.780 --> 14:52.140
They moved towards graph database systems, which had an arguably nicer query language.

14:52.140 --> 14:57.900
So the most popular out there is Neo4j, which introduced the visual graph syntax that

14:57.900 --> 14:59.700
Cypher has.

14:59.700 --> 15:03.580
But you also have Amazon Neptune that supports both SPARQL and Gremlin.

15:03.580 --> 15:04.980
You have JanusGraph with Gremlin.

15:04.980 --> 15:12.100
You have Oracle Labs PGX, which supports PGQL, TigerGraph with GSQL, or NebulaGraph with nGQL.

15:12.100 --> 15:16.420
And Gábor mentioned a couple more that I couldn't list.

15:16.420 --> 15:20.620
So the landscape is very scattered.

15:20.620 --> 15:27.140
Now, as part of the SQL:2023 standard (yes, there is a standard; you can buy it for 221 Swiss

15:27.140 --> 15:28.140
francs,

15:28.140 --> 15:36.100
if you really want to, but I don't recommend it because it's not really a light read).

15:36.100 --> 15:40.420
You can buy it, but yeah, SQL Property Graph Queries was introduced:

15:40.420 --> 15:42.380
SQL/PGQ.

15:42.380 --> 15:49.420
And this enables you to create a property graph layer over your already existing tables.

15:49.420 --> 15:54.700
And then you can use the visual graph syntax to do your pattern matching and path finding

15:54.700 --> 15:56.780
more easily.

15:56.780 --> 15:58.980
So SQL/PGQ is for read-only queries.

15:58.980 --> 16:03.220
You cannot do updates or deletes with this syntax.

16:03.220 --> 16:08.740
That's part of GQL, the Graph Query Language.

16:08.740 --> 16:13.900
And before someone refers me to this XKCD: yes, there are 14 competing standards,

16:13.900 --> 16:17.180
let's create another one, and now there are 15.

16:17.180 --> 16:21.900
Yes, that can be the case, but I would argue that this is part of SQL.

16:21.900 --> 16:24.660
So if you don't like it, you don't have to use it.

16:24.660 --> 16:29.580
You don't have to switch all your data to a new system.

16:29.580 --> 16:34.140
So I think this is more integrated.

16:34.140 --> 16:38.340
This is actually being already implemented in a couple of relational systems.

16:38.340 --> 16:41.980
Oracle is really going in on this.

16:41.980 --> 16:44.700
DuckDB with DuckPGQ.

16:44.700 --> 16:48.220
There's Google, which supports it in Spanner.

16:48.220 --> 16:50.660
And actually they added it to GoogleSQL.

16:50.660 --> 16:55.380
So also other systems such as BigQuery could support it in the future.

16:55.380 --> 17:01.180
And I know that Postgres has ongoing work with Peter Eisentraut.

17:01.180 --> 17:04.540
So it is being implemented.

17:04.540 --> 17:07.780
Now, how would you create a property graph in SQL/PGQ?

17:07.780 --> 17:10.620
This is the first step you do.

17:10.620 --> 17:12.580
We create a property graph called social network.

17:12.580 --> 17:17.460
And here we explicitly define what are our vertex tables, in this case,

17:17.460 --> 17:21.420
person and city, and what are the edge tables.

17:21.420 --> 17:24.980
So for the edge table of follows, we go from a source person

17:24.980 --> 17:27.220
to a destination person.

17:27.220 --> 17:30.820
And here we specify the keys that we're going to use to join on.

17:30.820 --> 17:38.260
The joins are very important in our use case, in the analytical space.

17:38.260 --> 17:43.220
And if you defined your primary key-foreign key relationships already when you created the table,

17:43.220 --> 17:50.260
you don't necessarily need to repeat them here, because they can be deduced in some cases.

17:50.260 --> 17:55.140
Now we go back to the same query: count the number of people Bob indirectly follows,

17:55.140 --> 17:57.780
living in the city of Utrecht.

17:57.780 --> 18:01.540
And so maybe you remember that 3 times 6 plus 3 times 8

18:01.540 --> 18:04.660
you can also represent as 3 times (6 plus 8).

18:04.660 --> 18:10.260
And this way, you have one fewer repeated value in this case.

18:12.260 --> 18:17.380
So we need to use less memory, and operating on this compressed value is actually very efficient,

18:17.380 --> 18:21.700
because we only have one value instead of many duplicates.
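
NOTE
Editor's sketch of the arithmetic analogy above, using hypothetical toy data: a flat join result repeats the key 3 for every match, while a factorized result stores the key once next to its matching values. The function names are illustrative only and are not DuckDB internals.
```python
# Hypothetical toy join result: key 3 matches the values 6 and 8.
flat = [(3, 6), (3, 8)]   # flat form repeats the key for each match
factorized = {3: [6, 8]}  # factorized form stores the key once
def expand(fact):
    """Enumerate the flat tuples that a factorized result stands for."""
    return [(key, value) for key, values in fact.items() for value in values]
def sum_of_products_flat(rows):
    """Compute 3*6 + 3*8: one multiplication per flat tuple."""
    return sum(key * value for key, value in rows)
def sum_of_products_factorized(fact):
    """Compute 3*(6+8): one multiplication per distinct key."""
    return sum(key * sum(values) for key, values in fact.items())
print(expand(factorized) == flat)
print(sum_of_products_flat(flat), sum_of_products_factorized(factorized))
```
Both forms describe the same tuples; the factorized one simply defers the expansion, which is why operating on it can touch fewer values.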

18:21.780 --> 18:29.460
So DuckDB mostly uses hash joins, which first build a hash table on the destination key

18:29.460 --> 18:35.540
of the follows relationship in this case, and then it probes it with the source key of the intermediate result.
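
NOTE
Editor's sketch of the build and probe phases described above, in plain Python rather than DuckDB's vectorized engine. The rows are hypothetical; following the transcript, the hash table is built on the destination key of follows and probed with the source key of an intermediate result.
```python
from collections import defaultdict
def hash_join(build_rows, build_key, probe_rows, probe_key):
    """Return (probe_row, build_row) pairs whose join keys are equal."""
    table = defaultdict(list)
    for row in build_rows:  # build phase: one pass over the build side
        table[row[build_key]].append(row)
    result = []
    for row in probe_rows:  # probe phase: one lookup per probe row
        for match in table.get(row[probe_key], []):
            result.append((row, match))
    return result
# Hypothetical edges: 1 follows 2, 4 follows 2, 2 follows 3.
follows = [{"src": 1, "dst": 2}, {"src": 4, "dst": 2}, {"src": 2, "dst": 3}]
intermediate = [{"src": 2, "dst": 3}]  # partial result to be extended by one hop
pairs = hash_join(follows, "dst", intermediate, "src")
two_hop_paths = [(b["src"], p["src"], p["dst"]) for p, b in pairs]
print(two_hop_paths)
```
Note that the single probe row appears once per match in the output, which is the redundancy that factorization aims to avoid.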

18:37.860 --> 18:43.540
So the implementation of the join in DuckDB was slightly changed with this master's project,

18:45.060 --> 18:48.740
which is actually very impressive, because the join operator is not easy to work on.

18:49.300 --> 18:51.700
It's a core component of DuckDB.

18:53.700 --> 19:02.180
So before when you wanted to insert a value that had a collision, in this case K2 will have a collision

19:02.180 --> 19:06.980
with K1, then DuckDB would still insert it into this chain.

19:06.980 --> 19:11.380
So you would kind of get collisions, well, you would get non-duplicate values,

19:11.380 --> 19:14.020
or duplicate values in this chain.

19:14.980 --> 19:20.580
But now with this new change, we actually insert it into the next slot.

19:20.580 --> 19:29.940
So you get unique keys per chain, and this is nice, because, oh wait, yeah, this is nice,

19:29.940 --> 19:36.260
because now we have pure chains, and we don't need to go over all the keys in this chain to check

19:36.260 --> 19:42.340
if there's a matching key. We know that if the first key doesn't match, we know that all

19:42.340 --> 19:49.700
other keys in that chain don't match. And then there's a little optimization, which is the salt,

19:49.700 --> 19:53.780
which is the upper 16 or so bits, because we don't actually need them for the pointer,

19:54.500 --> 20:03.380
that can be used to do a fast filter on these keys. So we benchmarked this on TPC-H,

20:03.380 --> 20:12.260
which is the industry standard for analytical systems. And we actually saw a big improvement on

20:12.500 --> 20:17.460
especially high scale factors, where there's longer chains, so you don't need to traverse these

20:17.460 --> 20:23.460
long chains, and the filtering on the joins with the salt optimization. And this was actually

20:23.460 --> 20:31.700
merged as part of DuckDB 1.1.0, and that improved the join performance quite nicely.
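
NOTE
Editor's sketch of the two ideas described above: on a collision with a different key the entry moves to the next slot, so every chain holds a single key, and the upper 16 bits of the hash act as a salt that is compared before the full key. This is a simplified illustration under stated assumptions, not DuckDB's implementation: the hash function, the class name, and the fixed-capacity table without resizing are all hypothetical.
```python
MASK64 = (1 << 64) - 1
def hash64(key: int) -> int:
    """Toy 64-bit multiplicative hash (an assumption, not DuckDB's hash function)."""
    return (key * 0x9E3779B97F4A7C15) & MASK64
class UniqueKeyChainTable:
    """Open-addressing table where each occupied slot owns a chain of rows for one key."""
    def __init__(self, capacity: int = 8):
        # capacity must be a power of two and larger than the number of distinct keys
        self.mask = capacity - 1
        self.slots = [None] * capacity  # each slot: (salt, key, [rows with that key])
    def _find(self, key):
        h = hash64(key)
        salt, i = h >> 48, h & self.mask
        while True:
            slot = self.slots[i]
            # the salt comparison is a cheap pre-filter before the full key comparison
            if slot is None or (slot[0] == salt and slot[1] == key):
                return i, salt
            i = (i + 1) & self.mask  # collision with a different key: try the next slot
    def insert(self, key, row):
        i, salt = self._find(key)
        if self.slots[i] is None:
            self.slots[i] = (salt, key, [])
        self.slots[i][2].append(row)
    def probe(self, key):
        i, _ = self._find(key)
        slot = self.slots[i]
        return [] if slot is None else slot[2]
table = UniqueKeyChainTable(capacity=4)
for key, row in [(1, "a"), (5, "b"), (1, "c")]:  # keys 1 and 5 hash to the same slot here
    table.insert(key, row)
print(table.probe(1), table.probe(5), table.probe(9))
```
Because a chain never mixes keys, a probe that reaches the right slot can return the whole chain without comparing the key of every entry in it.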

20:33.540 --> 20:38.740
But how does this relate to factorization? Well, these pure chains, you can actually see them as

20:39.300 --> 20:46.500
a factorized result, because now we have the pointer in the hash table; we can store that,

20:47.940 --> 20:54.980
and then for every value of E, we can point to this pointer, and then we don't need to repeat

20:54.980 --> 21:00.420
the actual values that are in the chain. So this removes any redundancy from the probe side,

21:01.060 --> 21:08.980
and it removes redundancy from the build side. And this is actually something we learned from the

21:08.980 --> 21:14.740
3D Hash Join paper, which is something very similar, but not quite like this, but this gave

21:14.740 --> 21:22.980
us the idea for this factorization. And additionally, if you have an aggregation on the same key,

21:23.540 --> 21:27.380
arguably maybe a bit specific, but if you have an aggregation on the same key,

21:28.340 --> 21:33.540
then you can actually use this factorization to do the aggregation very efficiently.

21:34.420 --> 21:40.820
So we have an additional data structure where we can store the result of the aggregation for every

21:40.820 --> 21:45.700
key in the chain. So for instance, if we want to get the length of the chain, in this case,

21:45.700 --> 21:51.220
we only have to compute it for the first time we come across this pointer, which is the length

21:51.220 --> 21:56.580
is 2, and then the next time we come across this pointer, we have already computed the value. So we don't

21:56.580 --> 22:03.460
need to traverse the chain again. We only have to look up the value. And this gives a really

22:03.460 --> 22:10.980
nice performance improvement. So the baseline, or the factorization with the pure chains,

22:10.980 --> 22:19.540
already gave a bit of a performance improvement, about 1.25x. But with the caching, it's actually

22:19.540 --> 22:26.660
17 times faster. And this is maybe a very specific case, but it does show a very nice optimization.
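
NOTE
Editor's sketch of the aggregate caching described above, assuming a hash table that maps each key to one chain. A plain dict stands in for the hash table and Python's id() stands in for the chain pointer; both are simplifications, and the function name is hypothetical.
```python
def chain_lengths(probe_keys, table):
    """Return the chain length for each probe key, walking every chain at most once."""
    cache = {}      # chain pointer (here: id of the list) mapped to its cached length
    traversals = 0  # how many times a chain was actually walked
    lengths = []
    for key in probe_keys:
        chain = table.get(key)
        if chain is None:  # no match on the build side
            lengths.append(0)
            continue
        pointer = id(chain)
        if pointer not in cache:
            cache[pointer] = sum(1 for _ in chain)  # first visit: walk the chain
            traversals += 1
        lengths.append(cache[pointer])  # later visits: a single lookup
    return lengths, traversals
build_side = {1: ["a", "c"], 5: ["b"]}  # hypothetical key-to-chain table
print(chain_lengths([1, 5, 1, 1, 7], build_side))
```
In this toy run five probes need only two chain walks; the benefit grows with the number of probe rows that hit the same chain, which matches the speaker's caveat that the case is fairly specific.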

22:28.260 --> 22:34.740
So there's more work that we did that is published as part of the CIDR 2025

22:35.460 --> 22:42.500
conference; that was work with Paul and me and Peter Boncz. So we also looked at how you can

22:42.500 --> 22:48.500
adaptively use this factorization technique because you don't always want to trigger this factorization

22:48.580 --> 22:52.180
because it does give a lot of overhead. But there are more details in the paper.

22:53.860 --> 22:59.700
Okay, I see I'm quite quick with the presentation, but if you want to try out DuckPGQ,

23:00.420 --> 23:05.860
it's really easy to get started: INSTALL duckpgq FROM community, LOAD duckpgq, and you're good to go.

23:06.660 --> 23:12.900
There's a QR code that leads to the website that has more examples of SQL/PGQ queries and more

23:13.060 --> 23:20.740
documentation. And we're also on GitHub; I would appreciate a star. And yeah, to conclude,

23:20.740 --> 23:27.060
I introduced DuckPGQ, which enables SQL/PGQ queries within DuckDB. We outperform Neo4j and Kùzu

23:27.060 --> 23:32.260
for analytical pattern matching. And I showed efficient handling of path finding and ongoing

23:32.260 --> 23:34.260
research work. Thank you very much.

