WEBVTT

00:00.000 --> 00:07.000
Okay, welcome everybody.

00:07.000 --> 00:09.000
Thank you so much for coming.

00:09.000 --> 00:13.000
We're going to talk about observability databases.

00:13.000 --> 00:17.000
All right, so if you caught Alexander's talk,

00:17.000 --> 00:20.000
he covered some stuff that's very relevant.

00:20.000 --> 00:22.000
What I'm going to be talking about builds on it. It was a great talk

00:22.000 --> 00:25.000
comparing B-tree indexes and LSM indexes.

00:25.000 --> 00:26.000
If you didn't catch that talk,

00:26.000 --> 00:28.000
I highly recommend that you go check out the recording

00:28.000 --> 00:30.000
after mine because it'll help fill in some of the

00:30.000 --> 00:32.000
more technical details of what I'm talking about.

00:32.000 --> 00:34.000
We're going to kind of step back, look at these things

00:34.000 --> 00:36.000
from a little bit of a high level.

00:36.000 --> 00:38.000
Talk about some of the different trade-offs in the architectures

00:38.000 --> 00:42.000
and when you might want to use some of these things.

00:42.000 --> 00:44.000
Okay, so first, I often give this talk

00:44.000 --> 00:46.000
at more developer-focused conferences,

00:46.000 --> 00:48.000
and as Patrick McFadin says,

00:48.000 --> 00:50.000
a developer will never ask you,

00:50.000 --> 00:51.000
hey, what file system is that?

00:51.000 --> 00:52.000
But you're in this room.

00:52.000 --> 00:53.000
So you're special.

00:53.000 --> 00:55.000
You do care about these nitty-gritty details,

00:55.000 --> 00:56.000
and that's awesome.

00:57.000 --> 00:59.000
Quick note about myself, my name is Josh.

00:59.000 --> 01:01.000
I am an open source advocate at Altinity.

01:01.000 --> 01:03.000
We do hosting and support for ClickHouse,

01:03.000 --> 01:05.000
but our lawyers wanted me to tell you that we're not affiliated

01:05.000 --> 01:06.000
with ClickHouse Incorporated.

01:06.000 --> 01:08.000
We're just humble open source contributors.

01:08.000 --> 01:10.000
I know that in this room,

01:10.000 --> 01:11.000
sometimes it's kind of hard to see the slides.

01:11.000 --> 01:13.000
So if you want to scan this QR code,

01:13.000 --> 01:14.000
you can just download the slides right now

01:14.000 --> 01:16.000
and have them on your own device,

01:16.000 --> 01:17.000
or you can have them for later.

01:17.000 --> 01:19.000
That will be on every slide,

01:19.000 --> 01:21.000
so it'll be available.

01:21.000 --> 01:23.000
Okay, and with that out of the way,

01:23.000 --> 01:24.000
let's dive in.

01:25.000 --> 01:28.000
So first I'm going to talk about what observability is.

01:28.000 --> 01:29.000
Really, really quickly, right?

01:29.000 --> 01:30.000
It's about visibility.

01:30.000 --> 01:32.000
Can you see everything that's happening in your systems?

01:32.000 --> 01:34.000
And understanding,

01:34.000 --> 01:38.000
do you have enough context to know what it is that you're looking at?

01:38.000 --> 01:40.000
And that's a simple definition,

01:40.000 --> 01:44.000
but it requires a lot of effort and a lot of data.

01:44.000 --> 01:45.000
In my experience,

01:45.000 --> 01:47.000
you're going to have probably about 50 times,

01:47.000 --> 01:51.000
maybe even more, as much data for observability

01:51.000 --> 01:55.000
as you will have for the actual system that you're observing.

01:55.000 --> 01:57.000
This is going to vary depending on the shape of your system.

01:57.000 --> 01:59.000
It's just a rough rule of thumb.

01:59.000 --> 02:00.000
But the point is,

02:00.000 --> 02:02.000
we're dealing with a ton of data, right?

02:02.000 --> 02:04.000
Alexander made that point when he was talking about

02:04.000 --> 02:06.000
the trillions of rows and petabytes of data

02:06.000 --> 02:08.000
that we have in time series databases.

02:08.000 --> 02:10.000
Wrong direction.

02:10.000 --> 02:13.000
Okay, so what is actually in all of that data?

02:13.000 --> 02:15.000
Well, the bulk of it is the telemetry itself, right?

02:15.000 --> 02:17.000
So if we're talking about open telemetry,

02:17.000 --> 02:20.000
that means metrics, traces, logs, profiles,

02:20.000 --> 02:24.000
and, generally, just all of these things can kind of be thought of as events.

02:24.000 --> 02:26.000
We need metadata that allows us to identify

02:26.000 --> 02:28.000
where those events came from, right?

02:28.000 --> 02:30.000
So that can take the form of labels or tags in

02:30.000 --> 02:32.000
Prometheus; in OpenTelemetry,

02:32.000 --> 02:34.000
it's called resource metadata, right?

02:34.000 --> 02:36.000
But basically it describes the things that we're talking about

02:36.000 --> 02:39.000
whereas the signals themselves are the actual samples

02:39.000 --> 02:40.000
of actual values, right?

02:40.000 --> 02:42.000
With that resource metadata,

02:42.000 --> 02:44.000
then we can start to do things like

02:44.000 --> 02:47.000
graphs and topologies of how those resources relate to each other.

02:47.000 --> 02:50.000
We might want to have snapshots or point-in-time,

02:50.000 --> 02:53.000
you know, views of how things looked at that point in time.

02:53.000 --> 02:55.000
We might want to store deltas because that's a nice cheap way

02:55.000 --> 02:58.000
to store the changes from those snapshots.

02:58.000 --> 03:00.000
And then finally, we're going to have some really simple stuff

03:00.000 --> 03:02.000
right, that you can throw into, like, a SQL database

03:02.000 --> 03:04.000
if you wanted to. That's your configuration:

03:04.000 --> 03:08.000
Your alert rules, your users, and your dashboard configurations.

03:08.000 --> 03:12.000
So what do we actually need our database to do in order to do this?

03:12.000 --> 03:16.000
Well, most importantly, we need really, really fast ingestion.

03:16.000 --> 03:20.000
In the worst case scenario, if your ingestion pauses or falls over,

03:20.000 --> 03:23.000
it can actually cause the application sending the telemetry to fall over.

03:23.000 --> 03:25.000
If you're in that scenario, you have done something wrong,

03:25.000 --> 03:27.000
but I have seen it happen.

03:27.000 --> 03:28.000
So that's bad, right?

03:28.000 --> 03:29.000
We don't want that.

03:29.000 --> 03:31.000
We need to be able to ingest the data really, really fast.

03:31.000 --> 03:34.000
We also don't want to lose observability data even if the system doesn't go down.

03:34.000 --> 03:37.000
We have so much data that having efficient compression and storage

03:37.000 --> 03:40.000
is very important for cost, right?

03:40.000 --> 03:43.000
Typically observability data is oriented around time,

03:43.000 --> 03:47.000
so we want an easy way to manage time oriented data.

03:47.000 --> 03:51.000
And we need real-time. Real-time means different things to different people,

03:51.000 --> 03:55.000
but we need something approximating real-time analytics.

03:55.000 --> 03:58.000
Peter Marshall likes to say anything you can do with a GROUP BY,

03:58.000 --> 04:00.000
that's what analytics is.

04:00.000 --> 04:02.000
So what we're saying here, right, is that it's about aggregating

04:02.000 --> 04:04.000
over billions and billions of rows.

04:04.000 --> 04:06.000
It's about being able to do operations like sums,

04:06.000 --> 04:12.000
or finding specific percentiles or means over tons of data

04:12.000 --> 04:14.000
really, really fast.

04:14.000 --> 04:17.000
That's really what we mean when we say real-time analytics.

04:17.000 --> 04:20.000
So we need fast multi-row analytics.

04:20.000 --> 04:22.000
If we don't know exactly what we're looking for in our logs,

04:22.000 --> 04:24.000
we might need full-text search.

04:24.000 --> 04:26.000
If we have really well-organized metadata,

04:26.000 --> 04:29.000
like because we're using OpenTelemetry correctly,

04:29.000 --> 04:31.000
then we need good tag label search, right,

04:31.000 --> 04:33.000
which is easier to index than full text,

04:33.000 --> 04:35.000
but still requires an index.

04:35.000 --> 04:37.000
If we're doing alerting, we definitely need to be able to read

04:37.000 --> 04:41.000
the last sample across a huge range of time series really, really fast.

04:41.000 --> 04:45.000
And we probably don't really need to be able to update the data, right?

04:45.000 --> 04:48.000
We can kind of treat all of this as an append-only situation.

04:48.000 --> 04:51.000
There are some use cases where you might need to update your observability

04:51.000 --> 04:54.000
data; we're not really going to talk about that today.

04:54.000 --> 04:57.000
Okay, so databases come in a bunch of flavors, right?

04:57.000 --> 05:00.000
We have OLTP, that's Postgres, that we're all familiar with.

05:00.000 --> 05:04.000
We have OLAP, these are analytics databases that specialize

05:04.000 --> 05:07.000
in kind of what we're talking about here with the real-time analytics.

05:07.000 --> 05:09.000
We have time series databases.

05:09.000 --> 05:12.000
Alexander gave us a great overview of how those work.

05:12.000 --> 05:16.000
And then we have like combination search analytics databases.

05:16.000 --> 05:20.000
This is not like a scientific taxonomy; it's just sort of how I break things down.

05:20.000 --> 05:24.000
And in reality, oh, right, so we have a couple of examples of each of these,

05:24.000 --> 05:25.000
right, so Postgres is our OLTP.

05:25.000 --> 05:28.000
Cassandra is another OLTP database that we'll talk about a little bit,

05:28.000 --> 05:31.000
although in this room I'm probably not that qualified to talk about Cassandra.

05:31.000 --> 05:34.000
And then we'll talk about elastic and open search,

05:34.000 --> 05:37.000
and Prometheus, and finally, ClickHouse.

05:38.000 --> 05:40.000
But taxonomies are hard.

05:40.000 --> 05:43.000
A lot of these databases, kind of borrow tricks from each other,

05:43.000 --> 05:46.000
and can kind of do all of the things.

05:46.000 --> 05:49.000
So if you need to shoehorn a use case into something that was meant for a different use case,

05:49.000 --> 05:50.000
you probably can.

05:50.000 --> 05:53.000
But if we're talking about, like, archetypes in this space,

05:53.000 --> 05:55.000
this is kind of where I put things.

05:55.000 --> 05:59.000
And I'm a little less sure about Solr and Cassandra, so those are kind of gray.

05:59.000 --> 06:01.000
But I'm pretty sure on these other things.

06:01.000 --> 06:03.000
This is where they fit in.

06:04.000 --> 06:08.000
All right, the storage on disk is like the most important part of the database architecture.

06:08.000 --> 06:12.000
We heard a lot about this from Alexander, but this is always going to be the bottleneck.

06:12.000 --> 06:14.000
This is always going to be the slowest part of your database.

06:14.000 --> 06:17.000
So really all of the architectural tricks that we're talking about are like,

06:17.000 --> 06:21.000
how do we avoid reading more data than we need from the disk?

06:21.000 --> 06:26.000
How do we lay out the data so that it can be read really, really quickly and sequentially?

06:26.000 --> 06:31.000
If we think about Postgres, it does not optimize for this, right?

06:31.000 --> 06:34.000
It uses heap pages and a commit log.

06:34.000 --> 06:36.000
We have time series blocks; Alexander's talk

06:36.000 --> 06:39.000
goes into great detail kind of on how that works.

06:39.000 --> 06:41.000
And then we have immutable parts or segments.

06:41.000 --> 06:46.000
These are just like the different ways that we can sort of store our data on the disk in files.

06:46.000 --> 06:50.000
So if we wanted to make a Postgres, we'd use a write-ahead log, right, a WAL.

06:50.000 --> 06:53.000
We have heap pages and multi-version concurrency control.

06:53.000 --> 06:56.000
That's how Postgres achieves concurrency, which is awesome.

06:56.000 --> 07:00.000
And we have B-tree indexes, which Alexander talked about how slow those are for this,

07:00.000 --> 07:01.000
right?

07:01.000 --> 07:03.000
We need really, really fast ingest.

07:03.000 --> 07:08.000
That's going to knock a B-tree index over because it's way too many write operations.

07:08.000 --> 07:11.000
This is what Postgres and MySQL do though, right?

07:11.000 --> 07:15.000
So they're optimized for this update and upsert use case with strong ACID guarantees.

07:15.000 --> 07:18.000
And I'm sure you've run into this, but like scaling horizontally,

07:18.000 --> 07:19.000
is really, really hard.

07:19.000 --> 07:22.000
And so then you end up with these mission-critical, massively vertically scaled,

07:22.000 --> 07:28.000
transactional databases, which is fine, but it's not great for analytics use cases.

07:28.000 --> 07:30.000
So let's talk about how we actually do analytics.

07:30.000 --> 07:34.000
And we also got a little bit of a preview of this from Alexander, if you were here.

07:34.000 --> 07:36.000
Log structured merge tree, right?

07:36.000 --> 07:39.000
So this kind of emerged in like the early 2010s.

07:39.000 --> 07:42.000
It is the underlying storage for a bunch of databases.

07:42.000 --> 07:46.000
It's also one of the ways that Kafka can persist data, right?

07:46.000 --> 07:49.000
Basically we have key value pairs that are buffered in memory,

07:49.000 --> 07:52.000
and then sorted before they're flush to the disk,

07:52.000 --> 07:56.000
which allows us to have more efficient file structures on the disk

07:56.000 --> 08:01.000
that are already optimized for the reads and the queries that we're going to do.

08:01.000 --> 08:02.000
It's pretty nice.

08:02.000 --> 08:04.000
This diagram comes from a blog from Ben Software.

08:04.000 --> 08:07.000
It's a great blog post if you want to read more about like the details,

08:07.000 --> 08:10.000
the nitty-gritty of how log structured merge tree works.
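
NOTE
A minimal Python sketch of the log structured merge idea described above: writes are buffered in memory, sorted into an immutable run on flush, and runs are compacted in the background. The class and method names are made up for illustration; this is not any real database's implementation.
  import bisect
  class TinyLSM:
      def __init__(self, flush_threshold=4):
          self.memtable = {}        # in-memory buffer of key -> value
          self.sstables = []        # immutable sorted runs, newest last
          self.flush_threshold = flush_threshold
      def put(self, key, value):
          self.memtable[key] = value
          if len(self.memtable) >= self.flush_threshold:
              self.flush()
      def flush(self):
          self.sstables.append(sorted(self.memtable.items()))  # sort once, then never modify
          self.memtable = {}
      def get(self, key):
          if key in self.memtable:
              return self.memtable[key]
          for run in reversed(self.sstables):                  # newest run wins
              keys = [k for k, _ in run]
              i = bisect.bisect_left(keys, key)
              if i < len(keys) and keys[i] == key:
                  return run[i][1]
          return None
      def compact(self):
          merged = {}
          for run in self.sstables:                            # older first, newer overwrites
              merged.update(dict(run))
          self.sstables = [sorted(merged.items())]             # one big, query-friendly run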

08:10.000 --> 08:15.000
So one implementation of log structured merge tree is Apache Lucene,

08:15.000 --> 08:19.000
which is an engine that is used to power a number of databases.

08:19.000 --> 08:21.000
You probably used it even if you don't know that you've used it,

08:21.000 --> 08:25.000
because it's like built into a bunch of like commercial observability platforms,

08:25.000 --> 08:28.000
as well as these open source databases,

08:28.000 --> 08:31.000
Cassandra, Elastic, OpenSearch, and Solr.

08:31.000 --> 08:33.000
Among others.

08:33.000 --> 08:37.000
It's based on that third storage style that we talked about a couple of slides ago.

08:37.000 --> 08:39.000
It writes immutable parts, and as we've seen they're called segments,

08:39.000 --> 08:42.000
not parts, but it's essentially the same concept.

08:42.000 --> 08:44.000
And then we have this background compaction process

08:44.000 --> 08:46.000
so that once that stuff is getting written to disk,

08:46.000 --> 08:49.000
it's further optimized into these larger and larger files

08:49.000 --> 08:51.000
that are easier to query.

08:51.000 --> 08:53.000
So it's like this, right?

08:53.000 --> 08:56.000
You have a batch insert that creates a new part or a new segment,

08:56.000 --> 09:00.000
and those parts get merged in the background into larger parts.

09:00.000 --> 09:05.000
Cassandra is kind of a great example of this architecture.

09:05.000 --> 09:08.000
It's really, really good at storing wide events, right?

09:08.000 --> 09:09.000
It doesn't, it's schemaless.

09:09.000 --> 09:12.000
It stores key value pairs, like a document database.

09:12.000 --> 09:15.000
So we can kind of just throw whatever wide events we want at it,

09:15.000 --> 09:20.000
and we have some semblance of OLTP-like transactional guarantees,

09:20.000 --> 09:23.000
which is cool, but it's still not fast enough for most

09:23.000 --> 09:25.000
of our real use cases.

09:25.000 --> 09:28.000
Vector engines and search kind of take that

09:28.000 --> 09:30.000
Lucene engine, right?

09:30.000 --> 09:33.000
And those are immutable segments.

09:33.000 --> 09:36.000
And we add some tricks so that we can find the data really, really quickly.

09:36.000 --> 09:38.000
We already have the way to ingest it really quickly.

09:38.000 --> 09:40.000
Now we need to find it really, really quickly.

09:40.000 --> 09:43.000
So those tricks: we'll dive into each one of these in order.

09:43.000 --> 09:44.000
Inverted indexes.

09:44.000 --> 09:46.000
I think there's a talk after this one that's going to go in

09:46.000 --> 09:49.000
to like more depth on how inverted indexes work.

09:49.000 --> 09:52.000
Or it might have been before this one, but check that one out.

09:52.000 --> 09:57.000
I will talk about Bloom filters, which is a way to eliminate data that you might not need to read,

09:57.000 --> 10:00.000
and approximate nearest neighbor, which is for vectors.

10:00.000 --> 10:03.000
So Inverted indexes really quickly, right?

10:03.000 --> 10:05.000
Basically, you just flip the script.

10:05.000 --> 10:08.000
So instead of your keys pointing to your data,

10:08.000 --> 10:11.000
your values point to your keys, so you can say,

10:11.000 --> 10:15.000
if I'm looking for cat, which documents contain the word cat,

10:15.000 --> 10:18.000
pretty straightforward.
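
NOTE
A quick Python sketch of that "flip the script" idea: instead of mapping document id to text, build a map from word to document ids. The documents here are made up.
  from collections import defaultdict
  docs = {1: "the cat sat", 2: "a dog barked", 3: "cat and dog"}
  inverted = defaultdict(set)
  for doc_id, text in docs.items():
      for word in text.split():
          inverted[word].add(doc_id)       # the value points back to the keys that contain it
  print(sorted(inverted["cat"]))           # [1, 3] -> which documents contain "cat"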

10:18.000 --> 10:21.000
Bloom filters are a little more complicated.

10:21.000 --> 10:27.000
Basically what we do with the Bloom filter is rather than having all of the different values in our text

10:27.000 --> 10:32.000
that we might want to index pointing to the keys that contain those things,

10:32.000 --> 10:34.000
which, right, like, that's a pretty dense index.

10:34.000 --> 10:37.000
That's going to be a huge thing. If you've used Elastic or OpenSearch,

10:37.000 --> 10:38.000
to build these indexes

10:38.000 --> 10:41.000
it takes a lot of time, and it's a very big thing that you have to store.

10:41.000 --> 10:43.000
A Bloom filter is much more compact.

10:44.000 --> 10:46.000
And basically what we do is we take a string,

10:46.000 --> 10:51.000
and we run some arithmetic on it to condense it down into many fewer bytes or bits,

10:51.000 --> 10:53.000
and it's lossy, right?

10:53.000 --> 10:58.000
So if we use a Bloom filter on a string,

10:58.000 --> 11:01.000
we don't know necessarily that that string does appear, right?

11:01.000 --> 11:05.000
Like if we're running a query for otel-collector-prod-01,

11:05.000 --> 11:08.000
then we can look for any row whose filter contains 31,

11:08.000 --> 11:12.000
and know that it might contain otel-collector-prod-01,

11:12.000 --> 11:16.000
but it might also contain otel-collector-prod-10.

11:16.000 --> 11:20.000
So we then have to go to all of those rows that had 31 in the index

11:20.000 --> 11:22.000
and check them, right?

11:22.000 --> 11:26.000
You have to use specific settings for the Bloom filter to get this exact result,

11:26.000 --> 11:28.000
but yeah, that's how Bloom filters work.

11:28.000 --> 11:30.000
They are much sparser than an inverted index,

11:30.000 --> 11:33.000
which has the benefit that you can maybe fit the entire thing in memory,

11:33.000 --> 11:36.000
which is going to make it a lot faster because then you can go to the memory

11:36.000 --> 11:39.000
and figure out which parts of the disk you actually might need to read.

11:40.000 --> 11:44.000
But you will, of course, have some mild over-scanning because

11:44.000 --> 11:47.000
you're not sure about your positives.
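
NOTE
A minimal Python sketch of the Bloom filter idea: hash each value into a few bit positions, set those bits, and later answer only "definitely not here" or "maybe here". The filter size, hash count, and host names are arbitrary choices for illustration.
  import hashlib
  class BloomFilter:
      def __init__(self, size_bits=64, num_hashes=3):
          self.size, self.num_hashes, self.bits = size_bits, num_hashes, 0
      def _positions(self, value):
          for i in range(self.num_hashes):
              digest = hashlib.sha256(f"{i}:{value}".encode()).hexdigest()
              yield int(digest, 16) % self.size
      def add(self, value):
          for pos in self._positions(value):
              self.bits |= 1 << pos
      def might_contain(self, value):
          # True means "maybe" (false positives possible); False means "definitely not".
          return all(self.bits & (1 << pos) for pos in self._positions(value))
  bf = BloomFilter()
  bf.add("otel-collector-prod-01")
  print(bf.might_contain("otel-collector-prod-01"))  # True, a real hit
  print(bf.might_contain("otel-collector-prod-10"))  # usually False; a True here is a false positive we then re-check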

11:47.000 --> 11:49.000
Then if we're talking about vectors, right?

11:49.000 --> 11:51.000
Like dog is an easy thing to search for,

11:51.000 --> 11:53.000
but if we want to search for the concept of a dog,

11:53.000 --> 11:56.000
then we're starting to talk about vectors and embeddings,

11:56.000 --> 11:59.000
vectors of course exist in this multi-dimensional space,

11:59.000 --> 12:01.000
and so in that multi-dimensional space,

12:01.000 --> 12:03.000
we can calculate nearness,

12:03.000 --> 12:06.000
and we can sort of define which vectors are nearest to each other.

12:06.000 --> 12:08.000
From that we can create neighborhoods,

12:08.000 --> 12:11.000
and then we can create layers on top of that that the query engine can walk

12:11.000 --> 12:16.000
in order to find vectors closest to a specific point in that vector space.

12:16.000 --> 12:20.000
So that's how approximate nearest neighbor search works.
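
NOTE
Real approximate nearest neighbor indexes (HNSW and friends) build those neighborhood graphs and layers so they never compare against every vector. As a sketch of the underlying question only, here is the brute-force version in Python, with made-up three-dimensional "embeddings":
  import math
  def cosine(a, b):
      dot = sum(x * y for x, y in zip(a, b))
      return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
  embeddings = {
      "dog":    [0.90, 0.10, 0.00],
      "puppy":  [0.85, 0.20, 0.05],
      "server": [0.00, 0.10, 0.95],
  }
  query = [0.88, 0.15, 0.02]                       # "the concept of a dog"
  best = max(embeddings, key=lambda k: cosine(query, embeddings[k]))
  print(best)                                      # "dog", the nearest neighbor in this toy space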

12:20.000 --> 12:25.000
Prometheus, and other databases based on the Prometheus architecture,

12:25.000 --> 12:29.000
which includes Loki, sort of follow a different architecture.

12:29.000 --> 12:31.000
So we have this like time-series database architecture,

12:31.000 --> 12:33.000
which is append-only, right?

12:33.000 --> 12:36.000
We have these TSDB blocks, and they're append-only.

12:36.000 --> 12:40.000
So we just always take the last sample and pop it on the end,

12:40.000 --> 12:43.000
and our samples are all of the same type,

12:43.000 --> 12:46.000
and we have one file per unique time series,

12:46.000 --> 12:48.000
so we don't really have to store any extra metadata,

12:48.000 --> 12:50.000
right? We just put the sample into the correct file,

12:50.000 --> 12:52.000
and it is where we need it to be.

12:52.000 --> 12:54.000
This seems really simple and awesome,

12:54.000 --> 12:56.000
but there's a problem with it.

12:56.000 --> 13:01.000
Your number of time series is going to follow this formula,

13:01.000 --> 13:03.000
which can explode really, really fast,

13:03.000 --> 13:07.000
unless you really, really carefully guard your cardinality.
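
NOTE
The usual way to think about that formula: the number of time series is the product of the cardinalities of every label. A made-up Python example of how quickly that multiplies out for a single metric name:
  label_cardinalities = {        # hypothetical labels on one metric
      "service":     200,
      "endpoint":     50,
      "status_code":  10,
      "pod":         500,
  }
  series = 1
  for count in label_cardinalities.values():
      series *= count
  print(f"{series:,}")           # 50,000,000 series, each one its own file in this model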

13:07.000 --> 13:10.000
That requires everybody sending you data

13:10.000 --> 13:13.000
to be on sort of the same page about what they're going to call things,

13:13.000 --> 13:15.000
and how much cardinality they're going to use up, right?

13:15.000 --> 13:19.000
We don't really necessarily have an easy way to control this

13:19.000 --> 13:22.000
in large organizations, and so then you end up with time series databases

13:22.000 --> 13:26.000
that are very difficult to scale while maintaining performance.

13:26.000 --> 13:30.000
But there is a nice thing about the way that data is laid out on disk:

13:30.000 --> 13:33.000
if we want to read, like if we're trying to draw a graph in Grafana,

13:33.000 --> 13:35.000
think about it, we just read that one file,

13:35.000 --> 13:36.000
and it has all of the samples in it.

13:36.000 --> 13:38.000
That's a really, really simple read operation.

13:38.000 --> 13:39.000
It's really fast.

13:39.000 --> 13:44.000
And also another benefit of having the same type all stored next to each other

13:44.000 --> 13:46.000
is we can compress it really, really easily,

13:46.000 --> 13:47.000
because it's the same type.

13:47.000 --> 13:50.000
And so compression algorithms work great on column-oriented data,

13:50.000 --> 13:53.000
which is sort of what you can think of a time series database

13:53.000 --> 13:57.000
block as being: it's like a single column in a column-oriented database.

13:58.000 --> 14:01.000
ClickHouse combines that column-oriented nature,

14:01.000 --> 14:04.000
and the log structured merge tree, so instead of storing key value pairs,

14:04.000 --> 14:06.000
like in traditional log structured merge tree,

14:06.000 --> 14:09.000
we are now storing entire columns.

14:09.000 --> 14:11.000
That's the main difference, right?

14:11.000 --> 14:14.000
So we still have the background compaction.

14:14.000 --> 14:19.000
We still have temporary buffers, but in ClickHouse they're actually on disk,

14:19.000 --> 14:22.000
so the data is immediately readable and flushed to disk.

14:22.000 --> 14:26.000
There's no memory buffer, but you do get all of the benefits

14:26.000 --> 14:29.000
of the background compaction and the merge tree engine.

14:29.000 --> 14:33.000
What this means is that say we have a table that has

14:33.000 --> 14:36.000
1009 columns, and it's 59 gigabytes of data.

14:36.000 --> 14:40.000
In a row-oriented database, we would need to scan all of the data

14:40.000 --> 14:42.000
to answer a question about every row.

14:42.000 --> 14:44.000
Even if we only care about one column,

14:44.000 --> 14:48.000
if we care about every row, we have to scan the entire table.

14:48.000 --> 14:52.000
That's a lot of data that we have to read from disk and transfer, right?

14:52.000 --> 14:55.000
So immediately going to a column-oriented database reduces that

14:55.000 --> 14:59.000
from 59 gigabytes to 1.7 gigabytes, so a dramatic reduction,

14:59.000 --> 15:01.000
just from going column-oriented.

15:01.000 --> 15:03.000
But then because that data compresses even better,

15:03.000 --> 15:06.000
because it's column-oriented, we get down to 21 megabytes,

15:06.000 --> 15:10.000
and then because we can process this with multiple threads,

15:10.000 --> 15:14.000
each thread only has to read 2.6 megabytes instead of 59 gigabytes.

15:14.000 --> 15:18.000
So we're at a fraction of a hundredth of a percent of the amount of data

15:18.000 --> 15:20.000
that we actually need to read, which is how we take queries

15:20.000 --> 15:24.000
that take thousands of seconds and run them in under a second.
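
NOTE
The same arithmetic as the numbers quoted above, assuming an eight-thread read for the last step (the thread count is my assumption, not from the talk):
  full_table = 59 * 1024**3          # scan everything in a row-oriented layout
  one_column = 1.7 * 1024**3         # read only the column we care about
  compressed = 21 * 1024**2          # that column after columnar compression
  per_thread = compressed / 8        # split the read across eight parallel threads
  print(per_thread / 1024**2)                 # ~2.6 MB per thread
  print(100 * per_thread / full_table)        # ~0.004% of the original scan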

15:24.000 --> 15:28.000
Here's kind of a view of how this looks in ClickHouse, right?

15:28.000 --> 15:31.000
I'm kind of trying to visualize that these are entire columns,

15:31.000 --> 15:33.000
instead of key value pairs.

15:33.000 --> 15:35.000
The data is immediately available,

15:35.000 --> 15:37.000
and eventually we get to this fully merged part

15:37.000 --> 15:38.000
that's going to be really, really efficient,

15:38.000 --> 15:42.000
and we might ship that off to Iceberg or something.

15:42.000 --> 15:46.000
ClickHouse uses those other indexes, but it uses them as skip filters,

15:46.000 --> 15:49.000
so it first needs to find the right sections of the data,

15:49.000 --> 15:52.000
and it makes heavy use of sparse indexes.

15:52.000 --> 15:54.000
These basically just point to a large block of data

15:54.000 --> 15:56.000
that's between two points in an ordered key,

15:56.000 --> 15:58.000
and usually when we're talking about observability,

15:58.000 --> 16:00.000
that ordered key is going to be time.

16:00.000 --> 16:03.000
So this allows us to really quickly find, you know,

16:03.000 --> 16:04.000
sections of data that we care about,

16:04.000 --> 16:06.000
and then we can apply those other tricks that I talked about earlier

16:06.000 --> 16:09.000
from OpenSearch as filters on this

16:09.000 --> 16:13.000
to decide which rows we actually need to look at further.
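
NOTE
A minimal Python sketch of the sparse-index idea: keep one mark per block of rows in a time-ordered column, binary-search the marks, and read only the blocks that can contain the requested time range. The granule size and layout here are illustrative, not ClickHouse's actual on-disk format.
  import bisect
  GRANULE = 4                               # rows per index mark (illustrative)
  timestamps = list(range(0, 100, 3))       # a time-ordered column: 0, 3, 6, ..., 99
  marks = timestamps[::GRANULE]             # the sparse index: first timestamp of each granule
  def granules_for(t_start, t_end):
      first = max(bisect.bisect_right(marks, t_start) - 1, 0)
      last = max(bisect.bisect_right(marks, t_end) - 1, 0)
      return first, last
  first, last = granules_for(30, 45)
  rows = timestamps[first * GRANULE:(last + 1) * GRANULE]
  print(rows)                               # only these granules get read, not the whole column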

16:14.000 --> 16:16.000
All right, so I think I'm getting close to time,

16:16.000 --> 16:18.000
so which ones should you choose?

16:18.000 --> 16:21.000
Well, when I was first working on this talk,

16:21.000 --> 16:22.000
this is what I was running in my home lab,

16:22.000 --> 16:24.000
so those are pretty good choices,

16:24.000 --> 16:27.000
and they'll also scale, right?

16:27.000 --> 16:31.000
Although, for Prometheus, there are some alternatives

16:31.000 --> 16:34.000
that kind of scale better, and maintain the same API,

16:34.000 --> 16:35.000
and I'll talk about those.

16:35.000 --> 16:38.000
But yeah, this is a great starting point.

16:38.000 --> 16:40.000
In our own cloud at Altinity,

16:40.000 --> 16:42.000
we use VictoriaMetrics instead of Prometheus;

16:42.000 --> 16:43.000
it's much more performant.

16:43.000 --> 16:44.000
I'm also running this in my home lab now.

16:44.000 --> 16:45.000
I've updated.

16:45.000 --> 16:47.000
It kind of combines some of those tricks,

16:47.000 --> 16:48.000
like the architecture of Prometheus,

16:48.000 --> 16:50.000
with some of those merge tree tricks

16:50.000 --> 16:52.000
that Alexander told us all about; that was awesome.

16:52.000 --> 16:56.000
Loki, very, very similar to Prometheus.

16:56.000 --> 16:58.000
It's just an append-only thing.

16:58.000 --> 17:01.000
It does not have full text indexes,

17:01.000 --> 17:03.000
like inverted indexes for full text search.

17:03.000 --> 17:04.000
It does support full text search,

17:04.000 --> 17:06.000
but then that requires scanning the whole table,

17:06.000 --> 17:07.000
which is slow,

17:07.000 --> 17:09.000
but if you're just searching on labels,

17:09.000 --> 17:11.000
Loki's really, really fast, really, really simple.

17:11.000 --> 17:14.000
We also use this in our cloud.

17:14.000 --> 17:15.000
Honorable mentions,

17:15.000 --> 17:16.000
these are things I didn't have time to talk about,

17:16.000 --> 17:19.000
but Cortex from the CNCF,

17:19.000 --> 17:21.000
Thanos, also from the CNCF,

17:21.000 --> 17:22.000
Mimir, which is Grafana's

17:22.000 --> 17:24.000
sort of wrapper of Cortex.

17:24.000 --> 17:27.000
TimescaleDB is a plugin for Postgres.

17:27.000 --> 17:28.000
Solr, Druid, right?

17:28.000 --> 17:30.000
These are all viable options.

17:30.000 --> 17:32.000
They follow some of these tricks;

17:32.000 --> 17:34.000
I just didn't have time to talk about them.

17:34.000 --> 17:35.000
What else should you use?

17:35.000 --> 17:37.000
Well, that first example that I gave, right?

17:37.000 --> 17:40.000
It's more nuanced than just one recommendation.

17:40.000 --> 17:43.000
I'm a huge fan of using what you have until it breaks,

17:43.000 --> 17:45.000
so if you're just trying to keep some small data around,

17:45.000 --> 17:48.000
and not, like, do full observability,

17:48.000 --> 17:50.000
well, then yeah, Postgres can work for that.

17:50.000 --> 17:52.000
If you really need full text search,

17:52.000 --> 17:53.000
and that's your main use case,

17:53.000 --> 17:57.000
there's really not a better option than Elastic or OpenSearch.

17:57.000 --> 18:01.000
If you want to use one database for all of your use cases,

18:01.000 --> 18:03.000
traces, metrics, logs, right?

18:03.000 --> 18:05.000
ClickHouse is a pretty good choice.

18:05.000 --> 18:09.000
It can kind of suit all of these use cases pretty well.

18:09.000 --> 18:13.000
And if you need real time analytics and wide event analytics,

18:13.000 --> 18:16.000
ClickHouse is also a really good choice for that.

18:16.000 --> 18:19.000
If you need lots of last sample reads,

18:19.000 --> 18:21.000
e.g. you're doing alerting,

18:21.000 --> 18:23.000
then a time series database, like Prometheus

18:23.000 --> 18:25.000
or VictoriaMetrics, is going to work really, really well for you,

18:25.000 --> 18:28.000
because it optimizes for that last sample read use case.

18:28.000 --> 18:33.000
And if you want to kind of overview of everything that we've just talked about,

18:33.000 --> 18:34.000
here it is.

18:34.000 --> 18:38.000
This is my last slide with this QR code, if you want to scan it.

18:38.000 --> 18:52.000
And we are hosting a meetup on Monday night at the Commons.

18:52.000 --> 18:56.000
So if you like this talk and you like talking about this stuff in much more depth,

18:56.000 --> 18:57.000
then come join us.

18:57.000 --> 19:00.000
I also have a bunch of stickers on the table over there.

19:00.000 --> 19:01.000
And thank you.

19:01.000 --> 19:03.000
And happy querying.

19:03.000 --> 19:05.000
Thank you.

