WEBVTT

00:00.000 --> 00:07.840
I am a developer advocate at Altinity.

00:07.840 --> 00:10.600
We do hosting services and support for Clickhouse.

00:10.600 --> 00:14.960
So if what I'm talking about to you is interesting, come talk to me, follow me on LinkedIn

00:14.960 --> 00:20.440
or come talk to us at our booth in building UA across the courtyard.

00:20.440 --> 00:21.440
Let's dive in.

00:21.440 --> 00:27.360
So today we're going to talk about observability in one, or "o11y in one" as I like to say.

00:27.360 --> 00:32.840
I'm using Clickhouse as a unified database for all of our telemetry signals.

00:32.840 --> 00:40.240
OK, so I spent most of 2022 and 2023 really talking about open telemetry both publicly and

00:40.240 --> 00:44.160
internally at my employer.

00:44.160 --> 00:46.760
And usually I start with a diagram that looks something like this.

00:46.760 --> 00:47.760
It's very familiar.

00:47.760 --> 00:51.800
I'm sure to all of you from the open telemetry documentation or just any blog post that you

00:51.800 --> 00:53.520
write about open telemetry.

00:53.520 --> 00:57.160
The simplicity of this diagram is really useful for explaining the architecture

00:57.160 --> 01:00.360
of open telemetry to someone who's never encountered it before.

01:00.360 --> 01:05.240
But it is very misleading, and it misrepresents the complexity of how we're actually using

01:05.240 --> 01:10.920
open telemetry in our real world applications.

01:10.920 --> 01:16.080
So one of the things that's sort of proudly placed on the open telemetry website, so

01:16.080 --> 01:21.960
as to not, I think, compete with the vendors that are supporting open telemetry, is that the open

01:21.960 --> 01:26.160
telemetry project does not include any kind of database or back end UI.

01:26.160 --> 01:30.800
That's sort of just up to you, right?

01:30.800 --> 01:34.640
And then we also have this piece of information from the Grafana State of Observability

01:34.640 --> 01:37.360
Report, thank you Grafana folks.

01:37.360 --> 01:42.120
Six is the typical number of observability tools that an organization will have to

01:42.200 --> 01:43.800
deploy in production, right?

01:43.800 --> 01:47.240
And I don't mean like Fortune 100 organizations here.

01:47.240 --> 01:49.240
I mean any organization.

01:52.840 --> 01:58.320
So what I think we actually need, if we kind of take that first graph that we were looking at

01:58.320 --> 02:03.280
and actually just reverse it, we get something that looks more like how we actually

02:03.280 --> 02:05.760
would want to think about observability in the real world.

02:05.760 --> 02:10.040
All the way over there, on my right, on your left,

02:10.080 --> 02:12.600
we have some of my favorite sources of telemetry data.

02:12.600 --> 02:17.240
You're probably using some combination of all of those today in your organization.

02:17.240 --> 02:18.920
We then need to consolidate, right?

02:18.920 --> 02:25.240
We want to take that, and in order to process that information and correlate it correctly

02:25.240 --> 02:28.800
in the way that we want, build the topologies that we want, build the dependency graphs

02:28.800 --> 02:29.800
that we want.

02:29.800 --> 02:32.280
We need some way to combine all of that information.

02:32.280 --> 02:35.800
And then ultimately we need to store it so that we can use it for things like alerting

02:35.800 --> 02:38.920
and dashboards and analysis.

02:38.920 --> 02:45.920
So I think this diagram is more realistic than the one that we just had.

02:45.920 --> 02:47.000
And there are more challenges, right?

02:47.000 --> 02:49.000
If we're splitting up our telemetry data.

02:49.000 --> 02:54.160
Now, technically, you could build a unified observability solution with a separate

02:54.160 --> 02:59.000
database for each signal, but you might not want to, right?

02:59.000 --> 03:04.200
But without that unified interface, right, without one way

03:04.200 --> 03:08.600
to view all of your data when you have to go to different tools for your different

03:08.600 --> 03:15.000
types of data, different signals from different sources, you run into problems, right?

03:15.000 --> 03:20.600
We're all very, very smart and good at our jobs, but the more times you have to, like,

03:20.600 --> 03:25.160
correlate an ID from one tool to an ID from another tool, and basically do all of the

03:25.160 --> 03:29.880
joins in your head, just every single one of those little points of friction is a chance

03:29.880 --> 03:34.200
to make a mistake when you're in a really stressful scenario trying to respond to a

03:34.200 --> 03:35.200
production outage, right?

03:35.200 --> 03:38.080
And it's like, we can't afford mistakes.

03:38.080 --> 03:41.200
So ideally, we should be reducing all of the opportunities for mistakes.

03:41.200 --> 03:44.920
And instead, we don't, right? Like, we have this setup where we need eight different

03:44.920 --> 03:49.360
monitors to respond to a production incident, it's kind of crazy.

03:49.360 --> 03:52.040
We got a lot of haystacks to look through, right?

03:52.040 --> 03:56.680
But when we look at what actually goes into those haystacks, right, like, what are we actually

03:56.680 --> 03:59.080
storing in our telemetry databases?

03:59.080 --> 04:00.800
Of course, it's the signals, right?

04:00.800 --> 04:03.920
It's the metrics, traces, and logs, soon profiles and events, right?

04:03.920 --> 04:08.120
OpenTelemetry is adding more signal types, it's not just three pillars, and that's awesome.

04:08.120 --> 04:12.800
We need all of these different data formats to structure the data in the way that's useful

04:12.800 --> 04:16.560
and compressible for the questions you're trying to answer.

04:16.560 --> 04:19.920
So that's going to be the bulk of your data, and that's going to be the bulk of your queries.

04:19.920 --> 04:21.800
But there's more stuff than just that, right?

04:21.800 --> 04:25.840
We also need the resource metadata. Resources, for those not familiar with

04:25.840 --> 04:28.240
the open telemetry terminology.

04:28.240 --> 04:32.240
It's really just the metadata that describes the entities, right?

04:32.240 --> 04:38.880
So the node that the process is running on, the process, things like that.

04:38.880 --> 04:41.320
Maybe the EC2 region that the node is in, right?

04:41.320 --> 04:45.920
Things like that would all go in your resource metadata, and all of that information is useful

04:45.920 --> 04:52.160
because it allows us to tie together, right, those telemetry signals, and contextualize

04:52.160 --> 04:55.880
them and make sense of them and what they actually relate to.

04:55.880 --> 05:00.400
Then from all of those relationships, we can actually create graphs and topologies,

05:00.400 --> 05:01.400
right.

05:01.400 --> 05:05.160
We can create dependency topologies, we can create topologies like James showed two presentations

05:05.160 --> 05:10.120
ago that show sort of the entire architecture of our application.

05:10.120 --> 05:14.120
These are really, really useful.

05:14.120 --> 05:18.960
At scale, when we talk about really, really large organizations, one of the hardest challenges

05:18.960 --> 05:24.640
is just zeroing in on the data that you care about, and it turns out that if you have a topology

05:24.640 --> 05:29.080
as sort of the front of your filter, that's a really good way to filter down to the specific

05:29.080 --> 05:30.920
data that you care about.

05:30.920 --> 05:36.280
Of course, if we have graphs and topologies, those are kind of expensive to calculate.

05:36.280 --> 05:38.960
So we're going to want to store them and maybe hang on to them for a certain period of

05:38.960 --> 05:39.960
time.

05:39.960 --> 05:40.960
So we're going to have some kind of snapshots.

05:40.960 --> 05:45.360
If we have snapshots, it makes sense that we might have some concept of deltas, in

05:45.360 --> 05:49.440
order to efficiently store changes in those snapshots, and then finally, right, of course,

05:49.440 --> 05:50.440
we need configuration.

05:50.440 --> 05:54.160
We need users, we need the dashboards, all the stuff that actually is probably fine

05:54.160 --> 06:02.120
in a traditional OLTP database, whereas if you're trying to use an OLTP database for

06:02.120 --> 06:04.760
these other things, you're going to run into some scaling problems.

06:04.760 --> 06:09.120
You're going to run into some performance problems.
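
The snapshot-plus-delta idea above can be sketched in a few lines (a toy model with made-up names, not any particular tool's storage format): keep an occasional full topology snapshot, and store only the edge changes between them.

```python
# Toy sketch: a topology is a set of directed edges (caller, callee).
# Store full snapshots occasionally, and compact deltas in between.

def diff(old: set, new: set) -> dict:
    """Delta between two snapshots: edges added and edges removed."""
    return {"added": new - old, "removed": old - new}

def apply_delta(snapshot: set, delta: dict) -> set:
    """Reconstruct the next snapshot from a base snapshot and a delta."""
    return (snapshot | delta["added"]) - delta["removed"]

t0 = {("frontend", "cart"), ("cart", "redis")}
t1 = {("frontend", "cart"), ("cart", "postgres")}

d = diff(t0, t1)
assert d["added"] == {("cart", "postgres")}
assert d["removed"] == {("cart", "redis")}
assert apply_delta(t0, d) == t1  # the delta round-trips exactly
```

Storing the small `d` instead of a second full snapshot is the whole trick: most topology changes touch a handful of edges, not the whole graph.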

06:09.120 --> 06:13.320
Okay, so we can't use OLTP, is there a silver bullet?

06:13.320 --> 06:14.320
Is there something we can use?

06:14.320 --> 06:15.320
Well, no.

06:15.320 --> 06:19.360
If there were a silver bullet, Honeycomb wouldn't have had to build the custom storage engine

06:19.360 --> 06:23.440
that they have for their really cool high-cardinality stuff.

06:23.440 --> 06:24.440
We need all of these things.

06:24.440 --> 06:28.120
Right, we need the ability to do full-text search, that's really important for logs.

06:28.120 --> 06:31.680
We need really efficient compression, that's important for everything, but especially for

06:31.680 --> 06:33.840
logs and metrics, right?

06:33.840 --> 06:37.760
We need the ability to query in real time, more or less, and get real-time

06:37.760 --> 06:41.800
analytics on the data, as it's coming in, we need to be able to alert quickly, like, on

06:41.800 --> 06:46.680
the order of seconds, not minutes or hours, on any variations in that data.

06:46.680 --> 06:51.640
We need to be able to relate the data to other parts of the data, and at any kind of scale,

06:51.640 --> 06:57.240
we're talking about a metric ton of data, right?

06:57.240 --> 07:00.320
So we need this solution to be petabyte scale.

07:00.320 --> 07:05.240
So while there is no silver bullet, Clickhouse comes pretty darn close to checking all

07:05.240 --> 07:07.880
of those boxes.

07:07.880 --> 07:11.440
So I think it's a pretty strong place to start.

07:11.440 --> 07:13.080
So let's talk about Clickhouse.

07:13.080 --> 07:17.040
Raise your hand if you're already familiar with ClickHouse or using it.

07:17.040 --> 07:18.320
Wow, awesome.

07:18.320 --> 07:19.320
Okay.

07:19.320 --> 07:21.600
That's not the response I'm used to, but I love it.

07:21.840 --> 07:23.320
So, ClickHouse really quickly, then.

07:23.320 --> 07:25.280
ClickHouse is SQL compatible.

07:25.280 --> 07:30.240
It is extremely massively scalable, both horizontally and vertically.

07:30.240 --> 07:33.160
And it is really, really, really, really fast, like,

07:33.160 --> 07:38.560
sub-second queries on petabytes of data, and ingesting millions of rows per second

07:38.560 --> 07:39.720
is absolutely no problem.

07:42.800 --> 07:47.440
And it's really, really powerful for observability use cases specifically.

07:47.440 --> 07:50.240
And I'm going to talk about why that is and how the architecture of Clickhouse

07:50.240 --> 07:54.720
lends itself to observability use cases.

07:54.720 --> 07:57.040
So all of that data that we were looking at a couple of slides

07:57.040 --> 08:00.760
ago, right, the metrics traces, logs, profiles, events, all that fun stuff.

08:00.760 --> 08:02.360
It follows a WORM pattern.

08:02.360 --> 08:04.640
Right, write once, read many, it doesn't change.

08:04.640 --> 08:08.360
Right, a log line is a log line, a trace is a trace.

08:08.360 --> 08:09.600
We never need to update it.

08:09.600 --> 08:13.360
So we need to write it, store it, hold on to it for a certain amount of time,

08:13.360 --> 08:16.200
whatever retention period policies are, right?

08:16.200 --> 08:21.000
And probably query it, read it a ton of times during that lifetime.

08:21.000 --> 08:24.040
Hopefully, if you're not reading it, then, like, why did you collect it?

08:24.040 --> 08:26.360
But that's a different question, a problem.

08:26.360 --> 08:30.760
And eventually it'll get deleted as part of a batch.

08:30.760 --> 08:34.600
But we don't ever need to update it along its lifetime.

08:34.600 --> 08:37.040
When we think about this kind of data,

08:37.040 --> 08:38.600
Maybe we think about balanced trees, B-trees.

08:38.600 --> 08:41.640
Balanced trees are really great for really, really quickly reading all

08:41.640 --> 08:43.320
of that structured data.

08:43.320 --> 08:46.600
But they're really expensive when it comes to inserting the data,

08:46.600 --> 08:49.320
because you have to reorder and reorganize everything

08:49.320 --> 08:52.440
and rebuild your entire tree every time you do an insert.

08:52.440 --> 08:56.840
Not great for tracing data where you're going to be getting 60

08:56.840 --> 09:01.440
spans for one WordPress query.

09:01.440 --> 09:03.400
So we have a solution for this.

09:03.400 --> 09:05.240
And one solution, this is what Elastic uses,

09:05.240 --> 09:07.160
is log-structured merge trees.

09:07.160 --> 09:08.960
So this is optimized for ingestion.

09:08.960 --> 09:11.560
And how we do this is we store key value pairs,

09:11.560 --> 09:16.280
in memory, and we keep these really short little bits of data,

09:16.280 --> 09:17.720
and then we can order them.

09:17.720 --> 09:19.880
And then once they're sorted and ordered,

09:19.880 --> 09:21.840
we can flush those to the disk.

09:21.840 --> 09:24.240
We can immediately query them and use them as part of our read

09:24.240 --> 09:25.960
operations.

09:25.960 --> 09:28.960
And in the background, we can compact them.

09:28.960 --> 09:30.880
So we can make them bigger and bigger, keeping,

09:30.880 --> 09:32.720
maintaining the order so that the read operations

09:32.720 --> 09:34.960
are still really fast and efficient, but also

09:34.960 --> 09:37.880
making better use of storage and better use of our nodes

09:37.880 --> 09:40.360
and speeding up those reads even further.
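
That flush-then-compact flow can be sketched as a toy model (illustrative only, not Elasticsearch's or ClickHouse's actual implementation): writes append cheaply to an in-memory buffer, sorted runs get flushed, and background compaction merges runs while keeping everything ordered.

```python
import heapq

class ToyLSM:
    """Toy log-structured merge store: cheap appends, sorted runs, compaction."""
    def __init__(self, flush_at=4):
        self.memtable = []      # unsorted in-memory writes (cheap inserts)
        self.runs = []          # "on-disk" sorted runs, immutable once flushed
        self.flush_at = flush_at

    def put(self, key, value):
        self.memtable.append((key, value))
        if len(self.memtable) >= self.flush_at:
            self.flush()

    def flush(self):
        # Sort the small buffer and write it out as a new run.
        if self.memtable:
            self.runs.append(sorted(self.memtable))
            self.memtable = []

    def compact(self):
        # Background merge: many small sorted runs become one big sorted run.
        self.runs = [list(heapq.merge(*self.runs))]

    def get(self, key):
        # Check the memtable first, then each flushed run.
        for k, v in reversed(self.memtable):
            if k == key:
                return v
        for run in reversed(self.runs):
            for k, v in run:
                if k == key:
                    return v
        return None

db = ToyLSM()
for i in range(10):
    db.put(f"span:{i}", {"duration_ms": i})
db.flush()
db.compact()
assert db.get("span:7") == {"duration_ms": 7}
assert len(db.runs) == 1  # compaction merged everything into one sorted run
```

Note the asymmetry: inserts never reorder existing data (unlike a B-tree), and reads stay fast because compaction keeps the runs sorted.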

09:40.360 --> 09:42.080
This is really, really cool.

09:42.080 --> 09:44.120
This is a great blog post from Ben Stopford.

09:44.120 --> 09:46.360
It goes into more detail on how this works in Elastic

09:46.360 --> 09:47.680
specifically.

09:47.680 --> 09:49.760
It's a little bit different in Clickhouse.

09:49.760 --> 09:51.960
So in Clickhouse, instead of storing key value pairs

09:51.960 --> 09:56.000
and compacting them, we're storing entire columns,

09:56.000 --> 09:58.800
which, if you think about time series data,

09:58.800 --> 10:02.880
or basically any type of observability data,

10:02.880 --> 10:04.760
this is going to allow us to do the types of reads

10:04.760 --> 10:08.520
that we need to do as quickly as possible.
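
A toy illustration of why column orientation helps these reads: to aggregate one field, you touch one contiguous array instead of every row.

```python
# Row-oriented: every row is a dict; aggregating one field walks all of them.
rows = [{"ts": i, "service": "cart", "duration_ms": i * 2} for i in range(5)]
row_total = sum(r["duration_ms"] for r in rows)

# Column-oriented: each field is its own array, stored (and compressed) together.
columns = {
    "ts": [r["ts"] for r in rows],
    "service": [r["service"] for r in rows],
    "duration_ms": [r["duration_ms"] for r in rows],
}
col_total = sum(columns["duration_ms"])  # reads exactly one column

assert row_total == col_total == 20
```

For a query like "p99 duration over the last hour", the columnar layout means the storage engine only scans the `duration_ms` and `ts` columns and never deserializes the rest of each row.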

10:08.520 --> 10:11.880
There is a way to update and delete in Clickhouse.

10:11.880 --> 10:13.320
I'm not going to talk too much about that today,

10:13.320 --> 10:16.440
but it's possible if you do need to do that.

10:16.440 --> 10:18.720
And then we have the same thing that we saw in Elastic, right?

10:18.720 --> 10:21.120
We have these unmerged parts getting

10:21.120 --> 10:24.320
compacted into larger and larger parts that can be

10:24.320 --> 10:26.040
queried more and more efficiently.

10:26.040 --> 10:29.200
This is happening over the course of minutes

10:29.200 --> 10:32.840
in the background, as the data is inserted.

10:32.840 --> 10:34.000
So how does this help us?

10:34.000 --> 10:37.760
How does this data structure help us

10:37.760 --> 10:39.720
do observability better?

10:39.720 --> 10:41.920
Well, one thing that it gives us, right, we mentioned,

10:41.920 --> 10:43.800
it makes it very efficient.

10:43.800 --> 10:46.920
We get very fast writes, because we can just dump

10:46.920 --> 10:48.360
whatever data we have straight into memory

10:48.360 --> 10:50.720
and then let the nodes sort of figure it out.

10:50.720 --> 10:53.480
It's very time friendly, right, because we're ordering the data

10:53.480 --> 10:56.560
usually on the time stamp, right,

10:56.560 --> 11:00.000
that the telemetry signal came from.

11:00.000 --> 11:03.520
This format lends itself to that ordering very well.

11:03.520 --> 11:05.560
We get very easy cleanup through TTL policies.

11:05.560 --> 11:07.120
Again, because everything is ordered by time,

11:07.120 --> 11:09.960
we can just drop an entire one of those partitions

11:09.960 --> 11:12.440
and handle our TTL.
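
The partition-drop style of TTL can be sketched like this (a toy model with hypothetical names, not ClickHouse's actual TTL machinery): rows are partitioned by day, and expiry just drops whole partitions instead of deleting individual rows.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical sketch: rows partitioned by day; TTL enforced by dropping
# whole partitions instead of deleting individual rows.
partitions = defaultdict(list)

today = date(2024, 1, 10)
for offset in range(5):                      # five days of log lines
    partitions[today - timedelta(days=offset)].append({"msg": "log line"})

def enforce_ttl(now: date, ttl_days: int):
    cutoff = now - timedelta(days=ttl_days)
    for day in [d for d in partitions if d < cutoff]:
        del partitions[day]                  # cheap: drops a whole partition

enforce_ttl(today, ttl_days=3)
assert sorted(partitions) == [date(2024, 1, d) for d in (7, 8, 9, 10)]
```

Because the data is already ordered and partitioned by time, expiry is a metadata operation, not a scan-and-delete over billions of rows.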

11:12.440 --> 11:14.760
And since that data is all ordered,

11:14.760 --> 11:16.680
it's very compressible, right?

11:16.680 --> 11:20.440
And that makes it very cost-effective for us.
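
One reason time-ordered data compresses so well: adjacent values are close together, so delta encoding (one of the building blocks real columnar codecs use) turns a steadily increasing timestamp column into tiny, highly repetitive values.

```python
# Timestamps arriving once a second: large absolute values, tiny deltas.
timestamps = [1700000000 + i for i in range(8)]

# Delta encode: keep the first value, then only store differences.
deltas = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
assert deltas[1:] == [1] * 7  # after the first value, everything is just "1"

# Decoding reverses it exactly.
decoded = [deltas[0]]
for d in deltas[1:]:
    decoded.append(decoded[-1] + d)
assert decoded == timestamps
```

A column of identical small deltas then compresses to almost nothing with any general-purpose compressor, which is where those dramatic compression ratios on ordered telemetry come from.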

11:22.720 --> 11:24.240
Clickhouse also has some more features

11:24.240 --> 11:26.360
that lend itself to observability use cases.

11:26.360 --> 11:28.560
I like to say that you can almost build an entire application

11:28.560 --> 11:30.400
inside of Clickhouse using all of the features

11:30.400 --> 11:32.520
that are available.

11:32.520 --> 11:34.240
So we have materialized views.

11:34.240 --> 11:36.920
These allow us to transform data from one format into another.

11:36.920 --> 11:39.720
You might use these, for example, to take open telemetry spans,

11:39.720 --> 11:41.560
or, I don't know, to take zip-conspans

11:41.560 --> 11:44.320
and cast them to open telemetry spans.

11:44.320 --> 11:47.080
It's the same data, right, just slightly different keys.

11:47.080 --> 11:48.520
You can do much more powerful things with that,

11:48.520 --> 11:51.720
but that's a pretty common observability case.

11:51.720 --> 11:55.000
I mentioned we have time to live, so we can build

11:55.000 --> 11:58.160
in the expiration of data, build in policies.

11:58.160 --> 12:00.160
We can also migrate data from hot storage

12:00.160 --> 12:02.360
to cold storage using things like S3

12:02.360 --> 12:05.600
or other object storage, so we can use that to control our costs,

12:05.600 --> 12:07.120
as well.

12:07.120 --> 12:09.520
We can use this on premises, so we can keep data

12:09.520 --> 12:10.720
I was talking to somebody yesterday

12:10.720 --> 12:14.600
who keeps 10 years of their log data in some cases.

12:14.600 --> 12:17.760
I'm sorry, but you definitely don't want to do that with data dog.

12:21.440 --> 12:24.640
And, right, yeah, so we have tiered storage.

12:24.640 --> 12:27.280
OK, and then integrations, right?

12:27.280 --> 12:29.840
This is SQL compatible, so that just basically

12:29.840 --> 12:33.480
opens up the door to integrate with pretty much anything.

12:33.480 --> 12:36.840
There are actually two Grafana data source plugins.

12:36.840 --> 12:38.840
One is maintained by Altinity.

12:38.840 --> 12:41.880
By us, it has some nice little macro helper functions

12:41.880 --> 12:44.040
in it to help you create time series data

12:44.040 --> 12:46.480
out of stuff that wasn't originally time series data

12:46.480 --> 12:50.640
and make it look really nice in your Grafana dashboards.

12:50.640 --> 12:52.520
Jaeger, which is a popular

12:52.520 --> 12:55.720
trace visualization tool, supports ClickHouse

12:55.720 --> 12:57.480
as a back end, out of the box.

12:57.480 --> 12:59.400
You just have to say, yep, I want to use Clickhouse

12:59.400 --> 13:01.280
and give it an instance.

13:01.280 --> 13:04.040
There's cLoki, which has actually evolved into qryn,

13:04.040 --> 13:07.040
which I'll talk about in a little bit.

13:07.040 --> 13:09.320
And there is a Kafka table engine for Clickhouse.

13:09.320 --> 13:10.840
So if you're one of those organizations,

13:10.840 --> 13:13.080
that already has all of your observability data

13:13.080 --> 13:16.560
on a Kafka stream, you just point that Kafka stream

13:16.560 --> 13:18.680
at your ClickHouse database and boom,

13:18.680 --> 13:22.600
all of your observability data is persisted.

13:22.600 --> 13:25.560
So we end up with something that looks kind of like this.

13:25.560 --> 13:26.840
Oh, right.

13:26.840 --> 13:29.800
There's an OpenTelemetry exporter for ClickHouse.

13:29.800 --> 13:32.360
So this goes in the collector or in your open telemetry

13:32.360 --> 13:33.480
SDKs.

13:33.480 --> 13:36.560
It currently supports traces, metrics, and logs.

13:36.560 --> 13:40.040
Support for the other signals will be coming soon.

13:40.040 --> 13:42.400
And so through that, then we get access to not just

13:42.400 --> 13:45.040
like those six, like my favorite things over there.

13:45.040 --> 13:47.400
But all of the 90 plus receivers

13:47.400 --> 13:50.400
from the open telemetry ecosystem are now available to us

13:50.400 --> 13:55.200
as data sources that we can pipe directly into Clickhouse.

13:55.200 --> 13:57.480
Cool.

13:57.480 --> 13:58.160
OK.

13:58.160 --> 13:59.640
So more benefits.

13:59.640 --> 14:03.360
We get excellent compression, even with variable schemas.

14:03.360 --> 14:06.440
So we're taking all of these disparate sources.

14:06.440 --> 14:08.640
We don't have to know ahead of time what schemas we're

14:08.640 --> 14:09.080
dealing with.

14:09.080 --> 14:10.480
We don't have to build that into Clickhouse.

14:10.480 --> 14:15.440
We can use sort of arbitrary maps and JSON columns.

14:15.440 --> 14:17.560
And then we can still take advantage of some of the excellent

14:17.560 --> 14:19.680
compression and really fast reads that Clickhouse is going

14:19.680 --> 14:20.440
to give us.

14:20.440 --> 14:23.360
It'll even inside a JSON map column

14:23.360 --> 14:28.240
turn those into columnar storage parts.

14:28.240 --> 14:31.280
We get practically unlimited cardinality.

14:31.280 --> 14:34.480
So Cloudflare gave a great talk at Monitorama last year

14:34.480 --> 14:36.880
where they talked about moving from

14:36.880 --> 14:39.360
Prometheus to ClickHouse for time series data.

14:39.360 --> 14:42.640
Anyone here run into cardinality issues in Prometheus?

14:42.640 --> 14:43.400
Yeah.

14:43.400 --> 14:43.880
OK.

14:43.880 --> 14:45.080
No one even wants to raise their hand for that.

14:45.080 --> 14:48.320
But I know you all have.

14:48.320 --> 14:50.040
Low cardinality in Prometheus, right?

14:50.040 --> 14:52.800
Low cardinality, everything's relative.

14:52.800 --> 14:57.360
In Prometheus, it means like dozens, maybe scores.

14:57.360 --> 15:03.080
OK, in Clickhouse, low cardinality means millions.

15:03.080 --> 15:06.600
So we're just talking about completely unrelated orders

15:06.600 --> 15:08.840
of magnitude when it comes to cardinality.
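
Cardinality here means the number of distinct label combinations, and it multiplies; a quick sketch (with made-up label names) of why it is so hard to control:

```python
import math

# Each label looks harmless on its own... (hypothetical counts)
labels = {
    "service":   50,    # distinct values per label
    "endpoint":  40,
    "status":     3,
    "customer": 1000,   # e.g. a customer-ID label someone added "for context"
}

# ...but a metric carrying all four labels is one time series per combination.
series = math.prod(labels.values())
assert series == 6_000_000  # distinct series from four innocent-looking labels
```

Any one team adding one high-cardinality label multiplies the series count for everyone downstream, which is exactly why the SRE team on the receiving end can't really control it at the source.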

15:08.840 --> 15:12.000
That's really, really useful, too, for this sort of arbitrary

15:12.000 --> 15:14.440
observability data where we don't necessarily

15:14.440 --> 15:16.240
have control over all that on the front end, right?

15:16.240 --> 15:19.200
We're part of an SRE team or some team that's managing all

15:19.200 --> 15:21.280
of this data, but there's a ton of other teams generating

15:21.280 --> 15:24.120
the data, and we can't control what they do.

15:24.120 --> 15:27.680
So cardinality is one of those things that's actually really

15:27.680 --> 15:28.520
hard to control.

15:28.520 --> 15:30.680
It's going to drive your costs up with some of those vendor

15:30.680 --> 15:31.680
tools.

15:31.680 --> 15:34.200
But with Clickhouse, you can kind of worry about it less.

15:34.200 --> 15:38.360
No, you still do have to worry about it, just a lot less.

15:38.360 --> 15:40.520
And like I mentioned at the beginning, right?

15:40.520 --> 15:44.040
It's extremely horizontally scalable, both for ingestion

15:44.040 --> 15:45.320
and for reads.

15:45.320 --> 15:46.840
So this is great for observability.

15:46.840 --> 15:50.080
Because we can scale up to meet our own needs, whatever those may

15:50.080 --> 15:52.320
be.

15:52.320 --> 15:53.640
There are some challenges, though.

15:53.640 --> 15:54.680
It's not perfect.

15:54.680 --> 15:55.560
It's not.

15:55.560 --> 15:57.080
It doesn't solve all of our problems, right?

15:57.080 --> 15:58.520
Nothing does.

15:58.520 --> 16:00.840
One of the biggest challenges that people run into using

16:00.840 --> 16:04.200
ClickHouse for observability is that SQL is not

16:04.200 --> 16:05.040
PromQL.

16:05.040 --> 16:07.200
And people who spend most of their time dealing with

16:07.200 --> 16:09.960
observability data tend to really like PromQL, right?

16:09.960 --> 16:12.200
Like it's a very productive, very terse language

16:12.200 --> 16:16.760
for the types of queries that it handles.

16:16.760 --> 16:20.400
To be fair, because both at Altinity and at ClickHouse

16:20.400 --> 16:23.280
Inc., there is work underway currently

16:23.280 --> 16:26.120
to bring PromQL support to Clickhouse.

16:26.120 --> 16:28.760
So you can just write PromQL queries.

16:28.760 --> 16:29.760
Yeah.

16:29.760 --> 16:30.640
Wouldn't that be awesome.

16:30.640 --> 16:32.760
And then that opens up even more integrations, right?

16:32.760 --> 16:35.720
Like think of all the tools that integrate with PromQL,

16:35.720 --> 16:37.480
then you can just use those with Clickhouse

16:37.480 --> 16:39.040
It makes your Grafana dashboards easier.

16:39.040 --> 16:41.640
It makes all sorts of integrations possible.

16:41.640 --> 16:43.080
So this is really cool.

16:43.080 --> 16:43.920
OK.

16:43.920 --> 16:47.160
Another challenge with Clickhouse is that some people say

16:47.160 --> 16:51.880
that it can be overly complex for small data volumes.

16:51.880 --> 16:55.560
This data point, this bullet point comes more or less directly

16:55.560 --> 16:58.920
from the ClickHouse Inc. blog, on observability with ClickHouse.

16:58.920 --> 17:00.600
I kind of disagree.

17:00.600 --> 17:05.440
I've spoken to, we have some customers who are dealing with tens

17:05.440 --> 17:08.040
of gigabytes of data on Clickhouse.

17:08.040 --> 17:10.440
And because of all of those features that we were talking

17:10.440 --> 17:12.360
about, right, they'd find that it's worth it,

17:12.360 --> 17:14.120
even though they're not dealing with a scale

17:14.120 --> 17:16.440
that Clickhouse was sort of originally built for.

17:16.440 --> 17:17.840
So you can use this at any scale.

17:17.840 --> 17:21.720
And another thing I would say is, I've got

17:21.720 --> 17:24.840
150 gigabytes of observability data out of a demo

17:24.840 --> 17:26.920
running on the Raspberry Pi for a few days.

17:26.920 --> 17:28.920
So what is small, right?

17:28.920 --> 17:30.680
Everything's relative.

17:30.680 --> 17:33.480
And then one final challenge with Clickhouse,

17:33.480 --> 17:35.160
it's just a database, right?

17:35.160 --> 17:37.160
It's not a complete observability solution.

17:37.160 --> 17:38.360
It doesn't have any alerting.

17:38.360 --> 17:40.160
It doesn't have any visualization.

17:40.160 --> 17:41.720
It has integrations with all those things,

17:41.720 --> 17:42.240
and that's great.

17:42.240 --> 17:45.360
But by itself, it is by no means a turnkey solution.

17:45.360 --> 17:46.400
It's just a storage engine.

17:49.240 --> 17:50.400
We do need that, though, right?

17:50.400 --> 17:51.760
We need that for observability.

17:51.760 --> 17:58.000
We need both the database and a back end UI of some kind.

17:58.000 --> 18:00.640
We need a complete observability solution, right?

18:00.640 --> 18:03.040
So here's kind of a simplified diagram

18:03.040 --> 18:06.200
of what a complete observability solution might look like.

18:06.200 --> 18:07.840
You've got your application SDKs, right?

18:07.840 --> 18:10.280
Open telemetry is awesome for that.

18:10.280 --> 18:12.440
You've got your host and node agents, open telemetry

18:12.440 --> 18:13.600
is also awesome for that.

18:13.600 --> 18:17.640
Beyla, which we just heard about, is awesome for that.

18:17.640 --> 18:19.320
Then you need some kind of collection sampling

18:19.320 --> 18:21.080
and processing, open telemetry collector,

18:21.080 --> 18:24.640
sort of the default choice for that, just kind of makes sense.

18:24.640 --> 18:26.880
From there, you're going to go maybe directly

18:26.880 --> 18:28.520
to your storage, or maybe you're going to use some kind

18:28.520 --> 18:31.640
of reverse proxy or gateway sitting in front of your storage

18:31.640 --> 18:36.000
that can help process the data, batch the data, before it finally

18:36.000 --> 18:38.040
gets persisted.

18:38.040 --> 18:40.840
And then out of the storage right, we need to have an analysis

18:40.840 --> 18:43.480
UI, so our engineers can actually make use of the data.

18:43.480 --> 18:46.560
And then, separately from that,

18:46.560 --> 18:49.800
We need something that's going to query the data on an ongoing

18:49.800 --> 18:52.480
basis, polling basis, and alert whenever there's

18:52.480 --> 18:56.200
something that violates one of our alert rules.

18:56.200 --> 18:58.720
So this is a more complete solution.

18:58.720 --> 19:01.240
How do we get to this with Clickhouse?

19:01.240 --> 19:03.440
Well, there are some open source projects

19:03.440 --> 19:05.240
making this possible already, so that you

19:05.240 --> 19:09.760
can have a turnkey solution with ClickHouse under the hood.

19:09.760 --> 19:12.680
These are all really great projects, or you can build your own.

19:12.680 --> 19:14.840
We're working with some customers who are doing that.

19:14.840 --> 19:16.840
If you wanted to do that, we're here to help.

19:16.840 --> 19:19.280
But if you just want something you can turn on and run,

19:19.280 --> 19:23.240
these are all pretty great choices.

19:23.240 --> 19:26.840
Two that I think maybe are less well-known are Coroot

19:26.840 --> 19:28.440
and qryn, so I'm going to talk about those

19:28.440 --> 19:30.440
in a little bit more detail now, with my last 10 minutes,

19:30.440 --> 19:33.240
and that'll leave some time for questions.

19:33.240 --> 19:36.680
So Coroot, and some of the creators of Coroot are here,

19:36.680 --> 19:38.800
if you want to track them down and talk to them,

19:38.800 --> 19:42.360
it's a batteries-included, no-code observability solution.

19:45.040 --> 19:49.200
So here's a screenshot of the dashboard,

19:49.200 --> 19:53.440
and we can see some of those joins in your head

19:53.440 --> 19:55.480
that I was talking about earlier.

19:55.480 --> 19:59.400
We can kind of already see them coming up in this.

19:59.400 --> 20:01.360
We're using all of those different signals

20:01.360 --> 20:04.200
to understand the health of these services,

20:04.200 --> 20:05.800
whether we're getting that information

20:05.800 --> 20:08.680
from metrics, like the RED metrics,

20:08.680 --> 20:10.320
whether we're getting it from tracing

20:10.320 --> 20:12.480
and upstream and downstream relationships,

20:12.480 --> 20:15.280
or whether we're getting it from explicit error messages

20:15.280 --> 20:16.280
in logs.

20:18.000 --> 20:21.640
Here we can see the topology that is generated by Coroot.

20:21.640 --> 20:24.800
It's kind of hard to see on the screen, I'm sorry.

20:24.800 --> 20:27.040
But what's actually here

20:27.040 --> 20:28.640
is the OpenTelemetry demo project,

20:28.640 --> 20:31.040
running in Docker on my laptop.

20:31.040 --> 20:33.960
I did not do anything, right?

20:33.960 --> 20:38.680
I cloned the repo and ran docker compose,

20:38.680 --> 20:43.600
and I ran the single docker compose up command for Coroot.

20:43.600 --> 20:46.200
I didn't tell the open telemetry collector

20:46.200 --> 20:48.280
built into the OTel demo project

20:48.280 --> 20:50.200
to export anything to Coroot, right?

20:50.200 --> 20:54.000
Coroot used eBPF, and it got all of these relationships

20:54.000 --> 20:55.960
using eBPF tracing.

20:55.960 --> 20:57.040
You can enrich the data, right?

20:57.040 --> 20:58.920
Coroot will allow you to point your collector

20:58.920 --> 21:00.240
at it and send your OpenTelemetry traces,

21:00.240 --> 21:02.040
and then you get all the additional enrichment

21:02.040 --> 21:04.400
of the application context, which is awesome,

21:04.400 --> 21:06.480
but just out of the box, you get this.

21:06.480 --> 21:08.720
And you can even see traffic going into something

21:08.720 --> 21:12.080
like an OpenTelemetry Collector, or a node agent.

21:12.080 --> 21:14.640
So this is really, really useful as a first step

21:14.640 --> 21:16.560
when you're setting up the rest of your observability pipeline.

21:16.560 --> 21:18.800
You can see everything that you expect to happen

21:18.800 --> 21:20.520
and everything that actually is happening

21:20.520 --> 21:24.800
with pure automatic instrumentation using eBPF.

21:24.800 --> 21:26.000
This is so awesome.

21:27.320 --> 21:29.080
And then we can use that information, right?

21:29.080 --> 21:31.960
I was talking about how we're combining all three signals.

21:31.960 --> 21:33.600
On the screen, we can kind of see all of it.

21:33.600 --> 21:36.040
You can't quite see the logs that got cut off at the bottom, right?

21:36.040 --> 21:38.360
But we can see that we're inferring the health

21:38.360 --> 21:41.840
both from the relationships that come from the traces,

21:41.840 --> 21:44.200
that topology that we generated from the traces,

21:44.200 --> 21:47.120
metrics which could have been exported as aggregated metrics

21:47.120 --> 21:51.040
or could have been derived from traces or logs, right?

21:51.040 --> 21:52.560
And then the logs themselves.

21:52.560 --> 21:55.920
So it kind of brings it all together into one tool,

21:55.920 --> 21:59.840
all correlated around the entities, the resources,

21:59.840 --> 22:01.480
which are the things that I, as an engineer,

22:01.480 --> 22:02.920
can actually reason about, thinking, like,

22:02.920 --> 22:04.280
I understand what a node is.

22:04.280 --> 22:08.160
I understand my process running in the node.

22:08.160 --> 22:09.920
And that's usually the thing that I care about,

22:09.920 --> 22:13.320
not like a specific CPU core, right?

22:13.320 --> 22:17.200
So this allows me to contextualize all of the logs

22:17.200 --> 22:19.760
and all of the metrics and all of the traces

22:19.760 --> 22:22.120
for the one thing that I really care about.

22:22.120 --> 22:24.760
And then navigate to the other things that I care about

22:24.760 --> 22:27.280
using the topology that we derive from the traces.

22:27.280 --> 22:31.920
OK, the other tool that I think is pretty cool,

22:31.920 --> 22:38.400
built on top of Clickhouse, is qryn.

22:38.400 --> 22:42.880
And this is basically a translation layer that translates

22:42.880 --> 22:46.440
between these various formats.

22:46.440 --> 22:48.440
So you can use LogQL, you can use PromQL,

22:48.440 --> 22:52.360
TempoQL, right, all for OpenTelemetry data sources.

22:52.360 --> 22:57.240
And it creates APIs compatible with each one of these things.

22:58.200 --> 23:00.520
Yeah, here's an example of some of the log ingestion

23:00.520 --> 23:03.840
formats that it supports.

23:03.840 --> 23:07.480
OK, so kind of a high level comparison between the two.

23:07.480 --> 23:09.720
Right, Coroot is more of a, like, batteries-included,

23:09.720 --> 23:10.800
ready-to-go solution.

23:10.800 --> 23:15.240
It's got its own eBPF-based node agent that you install

23:15.240 --> 23:18.320
on your nodes to get all of that awesome tracing data.

23:18.320 --> 23:21.440
It does allow for OTLP ingestion.

23:21.440 --> 23:24.680
It places its own OpenTelemetry Collector inside itself

23:24.680 --> 23:27.960
that you can send your OpenTelemetry data to.

23:27.960 --> 23:29.960
And the schema that it uses inside Clickhouse

23:29.960 --> 23:33.120
is mostly the standard schema that is also used

23:33.120 --> 23:36.080
by the OpenTelemetry Collector exporter for Clickhouse.

23:36.080 --> 23:39.080
So they're kind of compatible with each other.

23:39.080 --> 23:40.640
Coroot does have some optimizations

23:40.640 --> 23:45.080
to improve performance in some areas.

23:45.080 --> 23:49.400
qryn uses its own custom fork of the OpenTelemetry Collector

23:49.400 --> 23:54.000
and uses its own Clickhouse schema that is distinct

23:54.000 --> 23:58.240
from the Coroot and the collector exporter schemas.

23:58.240 --> 24:02.120
But it uses Clickhouse materialized views

24:02.120 --> 24:04.520
to create these projections so that we can translate

24:04.520 --> 24:08.600
between all these different APIs and use, right, like,

24:08.600 --> 24:11.280
PromQL for OpenTelemetry data or whatever we need to do

24:11.280 --> 24:11.880
with all this.

24:11.880 --> 24:15.240
Oh, no, am I going to make it through this presentation?

24:15.240 --> 24:16.240
Whoops.

24:24.320 --> 24:24.920
No problem.

24:28.320 --> 24:30.600
OK, problem solved.

24:30.600 --> 24:34.880
So let's talk about the schemas really quick and then questions.

24:34.880 --> 24:36.520
All of these schemas are doing some things

24:36.520 --> 24:39.000
that are just sort of the same; all of these schemas

24:39.000 --> 24:40.040
have these similarities, right?

24:40.040 --> 24:41.440
Because these are things that just kind of make sense

24:41.440 --> 24:43.560
when we're dealing with this kind of data.

24:43.560 --> 24:45.440
We're using the standard compression.

24:45.440 --> 24:48.200
We're using delta encoding, so that we only need

24:48.200 --> 24:50.280
to store changes that's going to dramatically reduce

24:50.280 --> 24:53.400
the amount of storage that we need for all of our data.

24:53.400 --> 24:55.000
We're using, I said at the beginning, right,

24:55.000 --> 24:55.880
we need full text search.

24:55.880 --> 24:59.280
How do we get that in Clickhouse? With Bloom filter indexes

24:59.280 --> 25:04.920
for maps, resources, and the full text of logs, right?
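
As a sketch, here's the kind of DDL that gives you that full-text-ish search: `tokenbf_v1` and `bloom_filter` skip indexes on the log body and attribute maps. The table and index names are illustrative, not the exact exporter schema; it's assembled as a Python string so the snippet stays self-contained.

```python
# Hypothetical logs table showing Bloom-filter-based skip indexes.
logs_ddl = """
CREATE TABLE otel_logs
(
    Timestamp     DateTime64(9),
    Body          String,
    LogAttributes Map(LowCardinality(String), String),
    -- tokenizes the log body so hasToken()/LIKE searches can skip granules
    INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1,
    -- lets WHERE LogAttributes['k'] = 'v' filters skip granules too
    INDEX idx_attr_keys   mapKeys(LogAttributes)   TYPE bloom_filter(0.01) GRANULARITY 1,
    INDEX idx_attr_values mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY Timestamp
"""
print(logs_ddl)
```

Bloom filters can give false positives but never false negatives, so Clickhouse only skips granules that definitely don't contain the search token.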

25:04.920 --> 25:08.160
All of these tools are using the merge tree engine.

25:08.160 --> 25:10.560
Clickhouse has some other table engines,

25:10.560 --> 25:13.480
but the merge tree engine is by far the most common.

25:13.480 --> 25:15.880
And definitely the best one for observability data.

25:15.880 --> 25:17.880
And of course, you want to partition your data

25:17.880 --> 25:21.120
on the time dimension.

25:21.120 --> 25:24.440
And very common default is a seven-day time to live

25:24.440 --> 25:26.160
for observability data.

25:26.160 --> 25:27.720
That's typically a good amount of time

25:27.720 --> 25:30.200
to make sure that you can respond to or diagnose

25:30.200 --> 25:31.840
your last production incident.

25:31.840 --> 25:33.840
If you don't have audit requirements that are longer

25:33.840 --> 25:36.160
than that, that's a perfectly fine place to leave it at.

25:36.160 --> 25:37.560
But it's also very easy to configure this

25:37.560 --> 25:40.880
if you need to store something for months or years?
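
Putting those last few points together, here's a sketch of the typical table shape: MergeTree engine, daily partitions on the time dimension, and a seven-day TTL you can stretch to months or years. Names are illustrative, not a real exporter schema.

```python
# Illustrative traces table: MergeTree + daily partitioning + 7-day TTL.
traces_ddl = """
CREATE TABLE otel_traces
(
    Timestamp DateTime64(9),
    TraceId   String,
    SpanName  LowCardinality(String),
    Duration  UInt64
)
ENGINE = MergeTree
PARTITION BY toDate(Timestamp)                 -- one partition per day
ORDER BY (SpanName, toUnixTimestamp(Timestamp))
TTL toDateTime(Timestamp) + INTERVAL 7 DAY     -- expire rows after a week
"""
print(traces_ddl)
```

Changing the retention is just a matter of editing the `INTERVAL` in the TTL expression.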

25:45.000 --> 25:47.120
The OpenTelemetry Collector exporter for Clickhouse,

25:47.120 --> 25:49.160
these are sort of the unique things about the schema

25:49.160 --> 25:50.840
that it uses.

25:50.840 --> 25:52.920
So it's got maps for the metadata.

25:52.920 --> 25:54.320
It's using those Bloom filter indexes

25:54.320 --> 25:56.360
for really efficient full-text search.

25:56.360 --> 25:58.240
And it uses a materialized view to calculate

25:58.240 --> 26:00.920
the span duration from the start time and the end time,

26:00.920 --> 26:02.960
a pretty basic use of a materialized view

26:02.960 --> 26:07.600
that you may also want to adopt in your use case.
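
A minimal sketch of that kind of materialized view, with hypothetical table names: the duration is computed once at insert time, so queries never have to.

```python
# Hypothetical materialized view deriving span duration on ingest.
duration_mv = """
CREATE MATERIALIZED VIEW spans_mv TO spans AS
SELECT
    TraceId,
    SpanName,
    StartTime,
    EndTime,
    -- computed once on insert instead of in every query
    toUnixTimestamp64Nano(EndTime) - toUnixTimestamp64Nano(StartTime) AS DurationNs
FROM spans_raw
"""
print(duration_mv)
```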

26:07.600 --> 26:11.640
qryn does some really cool stuff inside its Clickhouse schema

26:11.640 --> 26:13.400
that makes it more optimized, especially

26:13.400 --> 26:14.880
for time series data since it's working

26:14.880 --> 26:16.840
with so much Prometheus and time series data.

26:16.840 --> 26:19.520
So it's using fingerprints for unique time series,

26:19.520 --> 26:22.720
and then index labels with materialized views.
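
The fingerprinting idea, sketched in Python (qryn's actual hash function may differ): a label set always hashes to the same series id regardless of label order, so samples can be stored against one compact UInt64 key instead of the full label set.

```python
import hashlib
import json

def fingerprint(labels):
    """Stable 64-bit fingerprint of a label set; the same labels in any
    order always map to the same time series id."""
    canonical = json.dumps(sorted(labels.items()))
    return int.from_bytes(hashlib.sha256(canonical.encode()).digest()[:8], "big")

a = fingerprint({"__name__": "http_requests_total", "job": "api", "le": "0.5"})
b = fingerprint({"le": "0.5", "job": "api", "__name__": "http_requests_total"})
assert a == b  # order-independent: one id per unique series
```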

26:22.720 --> 26:25.080
This allows for really efficient lookups.

26:25.080 --> 26:26.760
And actually, like at the beginning, I said

26:26.760 --> 26:28.080
you can't do updates, but qryn actually

26:28.080 --> 26:30.960
does allow for updates, and uses a variation of

26:30.960 --> 26:33.320
MergeTree called ReplacingMergeTree.

26:33.320 --> 26:37.520
And null engine is basically like an in-memory,

26:37.520 --> 26:40.080
non-persistent table, that qryn uses

26:40.080 --> 26:42.000
as its sort-of in-memory buffer.

26:42.000 --> 26:44.200
And then it uses materialized views to project out

26:44.200 --> 26:47.920
from that null engine table to all of its other formatted tables.
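
A sketch of that ingest pattern with illustrative names: inserts land in a Null-engine table, which stores nothing itself but feeds every attached materialized view, and a ReplacingMergeTree target gives the update-like behavior for series metadata.

```python
# Hypothetical qryn-style pipeline: Null buffer -> materialized views.
pipeline_ddl = """
-- Rows inserted here are handed to attached materialized views, then dropped.
CREATE TABLE samples_ingest
(
    fingerprint UInt64,
    ts          DateTime64(3),
    value       Float64
)
ENGINE = Null;

CREATE TABLE samples
(
    fingerprint UInt64,
    ts          DateTime64(3),
    value       Float64
)
ENGINE = MergeTree
ORDER BY (fingerprint, ts);

-- Duplicate fingerprints collapse to the newest row at merge time,
-- giving update-like semantics without real UPDATEs.
CREATE TABLE time_series
(
    fingerprint UInt64,
    labels      String,
    updated_at  DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY fingerprint;

-- Project every insert from the Null buffer into the persistent table.
CREATE MATERIALIZED VIEW samples_mv TO samples AS
SELECT fingerprint, ts, value FROM samples_ingest;
"""
print(pipeline_ddl)
```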

26:47.920 --> 26:51.600
So that's kind of a really cool way to deal with time series data

26:51.600 --> 26:55.480
using some of the power built into Clickhouse.

26:55.480 --> 26:58.320
So I'm going to quickly talk about how you might architect

26:58.320 --> 27:00.160
this, how you might use this in production.

27:00.160 --> 27:02.640
And I think we'll have time for a couple of questions.

27:02.640 --> 27:06.440
So you're probably going to have an OpenTelemetry Collector

27:06.440 --> 27:09.760
or Beyla or something deployed as a DaemonSet,

27:09.760 --> 27:12.120
if you're running in Kubernetes or deployed on all of your nodes.

27:12.120 --> 27:15.440
If you're not, then, like, remember a few slides

27:15.440 --> 27:19.680
ago, that, like, storage gateway, little yellow box on the slide.

27:19.680 --> 27:21.720
You might use an OpenTelemetry Collector deployment

27:21.720 --> 27:22.760
for that.

27:22.760 --> 27:24.720
One thing about Clickhouse is that it really doesn't

27:24.720 --> 27:26.600
like a lot of connections.

27:26.600 --> 27:28.920
It can handle a lot of data with each insert,

27:28.920 --> 27:31.680
but you should try to reduce the total number of inserts.

27:31.680 --> 27:33.920
So you don't have partition explosions.

27:33.920 --> 27:35.960
So you can use the OpenTelemetry Collector

27:35.960 --> 27:38.520
as a deployment as a way to sort of batch those inserts

27:38.520 --> 27:42.640
and have fewer larger connections to your Clickhouse cluster.
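
That batching pattern can be sketched client-side like this (a hypothetical `BatchingWriter`; in practice the collector's batch processor usually does this job for you):

```python
# Sketch of client-side batching: buffer rows and flush one large insert
# instead of many small ones, since Clickhouse prefers few, big inserts.
class BatchingWriter:
    def __init__(self, flush_fn, batch_size=10_000):
        self.flush_fn = flush_fn      # e.g. a function doing one INSERT
        self.batch_size = batch_size
        self.buffer = []

    def write(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []

# Demo: seven rows with a batch size of three become three inserts.
batches = []
writer = BatchingWriter(batches.append, batch_size=3)
for i in range(7):
    writer.write({"ts": i})
writer.flush()  # drain whatever is left at shutdown
```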

27:42.640 --> 27:46.960
Of course, in production, we're going to want to use replicas.

27:46.960 --> 27:51.040
Clickhouse uses either ZooKeeper or Clickhouse Keeper

27:51.040 --> 27:54.000
for coordination between the nodes in the cluster.

27:54.000 --> 27:54.840
So you're going to want that.

27:54.840 --> 27:57.360
And you're going to want probably exactly three

27:57.360 --> 27:59.040
keeper nodes.

27:59.040 --> 28:00.400
The number of Clickhouse nodes is going

28:00.400 --> 28:02.160
to depend on your data volume.

28:02.160 --> 28:06.280
And then the Altinity operator is a fully open source operator

28:06.280 --> 28:09.760
for Kubernetes that makes managing clickhouse inside Kubernetes

28:09.760 --> 28:10.760
a lot easier.

28:10.760 --> 28:12.920
So you don't have to deal with things like upgrades

28:12.920 --> 28:14.880
and persistent volume claims on your own.

28:14.880 --> 28:18.240
You just kind of tell it, hey, this is the layout of my cluster

28:18.240 --> 28:22.240
and the Altinity operator will figure out the rest for you.

28:22.240 --> 28:24.320
Right, as I just mentioned: PVC management,

28:24.320 --> 28:26.560
rolling upgrades, and it comes with some built-in monitoring.

28:26.560 --> 28:27.840
There are some Grafana dashboards

28:27.840 --> 28:30.920
and some Prometheus exporters built into the Altinity operator

28:30.920 --> 28:32.520
to get you some more information on that layer,

28:32.520 --> 28:35.080
sort of in between the Clickhouse binary itself

28:35.080 --> 28:38.200
and the Kubernetes.

28:38.200 --> 28:39.960
This is a topic for another day.

28:39.960 --> 28:41.800
We didn't get into all of this, right?

28:41.800 --> 28:43.160
So there's alerting.

28:43.160 --> 28:46.680
There's a lot more stuff that you need to do than I mentioned

28:46.680 --> 28:48.680
for a complete observability solution.

28:48.680 --> 28:50.040
But hopefully I've given you some ideas

28:50.040 --> 28:54.440
on how you might be able to build that around Clickhouse.

28:54.440 --> 28:56.640
So why do I think this is really cool?

28:56.640 --> 28:59.160
Why do I think you should actually do this?

28:59.160 --> 29:01.160
Well, the first three points are kind of just

29:01.160 --> 29:03.960
about having one tool instead of three tools, right?

29:03.960 --> 29:06.320
When I'm going in and using the OpenTelemetry SDKs,

29:06.320 --> 29:07.920
I love that it's one tool.

29:07.920 --> 29:09.760
instead of three different SDKs with three

29:09.760 --> 29:11.360
different sets of documentation that I have to learn

29:11.360 --> 29:14.480
just because I want some metrics, some traces, and some logs.

29:14.480 --> 29:18.320
So if we can have fewer tools doing the same amount of work,

29:18.320 --> 29:20.480
that's usually better, as long as we're not giving up

29:20.480 --> 29:23.880
anything else important to get there.

29:23.880 --> 29:27.760
When we have all of our telemetry signals in one database

29:27.760 --> 29:31.240
with the resource metadata that's attached to all of those signals,

29:31.240 --> 29:33.720
it's going to be so much easier for us to standardize that

29:33.720 --> 29:38.000
and maintain consistency within that resource metadata.

29:38.000 --> 29:40.840
We can, by having everything in one place,

29:40.840 --> 29:42.240
we can have some background processes

29:42.240 --> 29:43.720
doing post-hoc dependency mapping, right?

29:43.720 --> 29:47.720
Like we can pre-generate these topologies on the data.

29:47.720 --> 29:51.600
without knowing ahead of time what those topologies look like,

29:51.600 --> 29:54.320
and we can query across signals.

29:54.320 --> 29:56.120
Thank you so much.

29:56.360 --> 29:57.360
Thank you.

30:05.000 --> 30:06.000
Any other questions?
