WEBVTT

00:00.000 --> 00:12.120
Let's start. The last talk for the day is Goutham, who is a long-time Prometheus team member and

00:12.120 --> 00:17.360
a lot of other things and who learned yesterday evening that he's going to fill in with

00:17.360 --> 00:21.000
his talk. So thanks for doing this, and welcome, Goutham.

00:31.000 --> 00:37.120
Hello everyone. I'm Goutham. I'm a Prometheus maintainer. I've been a Prometheus maintainer

00:37.120 --> 00:42.000
for like seven years now, and then I joined Grafana as an engineer and did all of the

00:42.000 --> 00:48.000
Prometheus engineering stuff for five years. Then I got bored, had a little bit of a crisis,

00:48.000 --> 00:54.440
so I became a product manager. I did that for two years and realized I still don't know

00:54.440 --> 01:03.440
what product managers do. So yeah now I'm going back to engineering and this talk is

01:03.440 --> 01:10.440
me kind of struggling with the whole "I'm not writing code, I'm not doing anything useful" thing.

01:10.440 --> 01:16.040
So I decided to tinker with my home lab. Yeah. And yes, I did find out that

01:16.040 --> 01:20.640
I was doing this talk yesterday. So I wrote the talk today. So I have no idea how long

01:20.840 --> 01:27.840
this is going to take. But I hope it takes long enough. Okay. Let's go. So home lab monitoring

01:27.840 --> 01:40.040
with eBPF. But what do I mean by home lab? You know, I googled it: can eBPF keep your home lab

01:40.040 --> 01:46.960
from blowing up? I don't think so. I hope not. But then I asked Gemini and it generated this

01:46.960 --> 01:52.960
image, which is kind of close. And then I decided to use good old Google, which showed

01:52.960 --> 02:00.960
me this image, which looks really, really professional. Turns out my home lab is

02:00.960 --> 02:05.960
nothing like this. Unfortunately, I don't have pictures of my home lab given I only

02:05.960 --> 02:12.960
found out yesterday. But it looks something like this, on a random bookshelf with a lot

02:12.960 --> 02:20.560
more wires attached. Sometimes, as in right now, it also looks like this, where I just

02:20.560 --> 02:27.360
have random boards in random places, just opened up. And I'm just too lazy to put it all

02:27.360 --> 02:32.960
back together into something neat. Yeah. So I run a lot of different boards, and I run k3s on

02:32.960 --> 02:36.640
these boards. This talk is not about the home lab. This is about monitoring the home lab.

02:36.640 --> 02:42.880
But what am I actually monitoring? So I am self hosting a few applications. How many

02:42.880 --> 02:47.920
of you here are self-hosting applications? Wow. Holy shit. You are going to love this

02:47.920 --> 02:55.120
talk. Yeah. So self-hosting is basically: you run the services on your own servers, on

02:55.120 --> 02:59.520
your own infrastructure, and you are not going to pay like big tech or anyone else to

02:59.520 --> 03:06.960
kind of provide you a service. You also own all your data. Yeah. As I said, I wanted

03:06.960 --> 03:11.560
something technical to do, because I was kind of having a crisis. So I decided to self

03:11.640 --> 03:16.360
host a lot of the applications. And what are these applications that I am self hosting?

03:16.360 --> 03:22.680
They are all kinds of applications. There is a nice website called awesome-selfhosted.net,

03:22.680 --> 03:30.120
with this whole list of applications. Yes, I totally did not read it three times... but I did. Yeah.

03:31.160 --> 03:37.160
Yeah. So I am using something called Beaver Habits to track habits. I am too ashamed

03:37.160 --> 03:41.240
of how bad my habit tracking is. So I'm like, okay, I'm not going to show mine, but here's the demo.

03:42.120 --> 03:49.080
This is a service written in Python, Django, and something else. And then, this is a screen-

03:49.080 --> 03:54.760
shot of the bookmarking service that I use. The cool thing about this is it not only bookmarks

03:54.760 --> 04:01.160
the page, it also downloads it onto the server so that if the page goes away, I still have a copy

04:01.160 --> 04:06.840
of all the cool articles that I read and I can search through them. And I can use this to circumvent

04:07.400 --> 04:14.120
paywalls and share articles that I have access to with my friends. So basically, I copy the article

04:14.120 --> 04:18.680
into Readeck and then share it with my friends. That way they don't need to pay for all the

04:19.960 --> 04:25.800
services that I am paying for. And I also share super, super legal files with all my friends.

04:27.480 --> 04:35.480
Yeah. And the best is this thing called Memos. So I used to have Twitter and I used to use it a lot,

04:35.560 --> 04:41.560
and then one day I deleted all my social media. And then one day I had a really amazing idea that I

04:41.560 --> 04:46.040
wanted to tweet out and I was like, okay, what if I had like my own private Twitter that nobody

04:46.040 --> 04:52.360
reads, but where I can save all the interesting ideas that I have. And this is where I write them.

04:54.520 --> 04:59.960
But yeah. So I run all of these services. They're mostly in Go, but there's some in Python,

04:59.960 --> 05:06.680
some in Java, a varied set of services. They run quite well. And one of the things I want to

05:06.680 --> 05:12.280
do is give a shout out to Tailscale. Even though all of these services are running in my office

05:12.280 --> 05:17.720
or like in a room somewhere, I can still access them on my phone and on my computer and I can

05:17.720 --> 05:24.600
like act as if they're all on the internet. So that's really cool. And you can also use Tailscale

05:24.600 --> 05:32.440
to expose your office lab, home lab services to the internet. For example, this QR code will give

05:32.440 --> 05:39.160
you the link to Readeck, the bookmarking service that I'm self-hosting, with the instructions

05:39.160 --> 05:44.600
to run Tailscale on Kubernetes, for example. It's going to be super slow because you're not

05:44.600 --> 05:49.720
apparently supposed to use it for anything useful, but it works. My friends don't need to pay for all the stupid shit I do.

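NOTE
[Editor's sketch] For context: exposing a home lab service the way described here is
typically done with the Tailscale Kubernetes operator. A minimal sketch, assuming the
operator is installed and a "readeck" Service exists (names are illustrative); the funnel
annotation is what makes it public rather than tailnet-only:
  apiVersion: networking.k8s.io/v1
  kind: Ingress
  metadata:
    name: readeck
    annotations:
      tailscale.com/funnel: "true"   # expose publicly via Tailscale Funnel
  spec:
    ingressClassName: tailscale      # handled by the Tailscale operator
    defaultBackend:
      service:
        name: readeck
        port:
          number: 80
    tls:
      - hosts:
          - readeck                  # becomes the MagicDNS hostname
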
05:49.720 --> 05:57.320
Cool. So I'm running all of these services, and I'm a PM.

05:57.320 --> 06:02.040
So I was like, I want to monitor all of these services and I want

06:02.040 --> 06:09.320
to have good alerting on these services so that when they're down, I get paged. I want to be on

06:09.320 --> 06:16.440
call, you see. So what monitoring am I particularly talking about? Is it monitoring

06:16.440 --> 06:21.320
the infrastructure, like: is there enough CPU? Is there enough disk? Is there enough RAM? That kind of thing?

06:21.320 --> 06:27.000
Actually, not really. Infrastructure monitoring with

06:27.000 --> 06:31.480
Prometheus has been done to death; there are like a thousand tutorials that you can use.

06:31.480 --> 06:36.840
You basically follow one of those tutorials, and now I'm monitoring infrastructure. That's fine.

06:36.840 --> 06:42.680
This is not the cool part. What I really think is cool is I want to do application

06:42.760 --> 06:47.960
observability, which is basically all these different applications that I'm running that I have

06:47.960 --> 06:53.720
no control over, like, the source code. I mean, I have full access to the source code, but I can't

06:53.720 --> 06:58.520
go and tell the authors, hey, you know, you need to be on call for this service, so have good metrics and

06:58.520 --> 07:07.560
alerts. So yeah, I want to be able to understand when my journal that holds all my stupid stuff

07:07.640 --> 07:13.160
goes down. Like I want to make sure that this journal is responding quickly. All the services

07:13.160 --> 07:18.840
that I and my friends depend on are working. Yeah, and I want to be alerted when they're not working.

07:20.360 --> 07:25.560
Yeah. And one of the reasons I wanted to do this: I wrote a blog post on what I don't

07:25.560 --> 07:30.840
like about being a PM. One of the things I don't like about being a PM is I don't get to use my own

07:30.840 --> 07:36.120
tools anymore. It's kind of funny. I was re-reading this and I was like, I'm an idiot. Because

07:36.120 --> 07:48.120
I wasn't on call anymore and I sorely missed it. Yeah. So I wanted to use all these

07:48.120 --> 07:54.600
tools that I'm working on as a product manager and kind of be on call for this home lab service

07:54.600 --> 07:59.960
that I'm running. Yeah. I want to have fancy dashboards, like request rates,

07:59.960 --> 08:04.280
durations, how long things are taking. I want to be able to do that

08:05.800 --> 08:13.320
for all the home lab services, like Beaver Habits here. All right. So I then looked at all the

08:13.320 --> 08:18.840
services that I'm self-hosting, and I found that 90% of the services don't give a shit about

08:18.840 --> 08:23.640
metrics. Like, they don't care. There are no metrics coming out of these systems. These are

08:23.640 --> 08:29.560
people who love building tools, and they've built tools for themselves, and I don't know how they

08:29.560 --> 08:34.840
monitor them, but they don't expose any metrics. The few tools that do expose metrics

08:35.480 --> 08:39.160
have all kinds of different formats, different metric names, different units;

08:40.280 --> 08:45.240
somebody was exposing durations in nanoseconds and I was like, who the hell cares about

08:45.240 --> 08:50.280
nanoseconds? But yes, they're completely different. So if I were to rely on those metrics,

08:50.280 --> 08:56.120
it meant building a dashboard per app, like a dashboard for Beaver Habits, a dashboard for Gokapi.

08:56.120 --> 09:01.560
It meant that each application had to have its own dashboards and its own alerts.

09:02.920 --> 09:09.960
It was getting complicated. And finally, I prefer running Golang services backed

09:09.960 --> 09:17.400
by SQLite, and there is no Java agent I could just drop in to automatically get good metrics.

09:18.600 --> 09:23.320
But yeah, this is a hole I dug for myself. I could have picked all the self-hosted Java applications.

09:24.200 --> 09:30.280
So yeah, I mean, this was a good enough challenge, and I had a few requirements for the solution

09:30.280 --> 09:37.000
I was going to pick. It needs to be very, very easy to instrument, and I need to have one set of

09:37.000 --> 09:42.760
alerts and dashboards for all the services that I'm running, like it's just one alert that I write

09:42.760 --> 09:48.920
and when the alert fires, it tells me which application is down. It needs to be Kubernetes

09:49.000 --> 09:54.840
native because that's kind of what I'm running. And the most important thing is I wanted to give

09:54.840 --> 10:00.440
conference talks, so I wanted to do cool stuff that I can blog and talk about. I said this is

10:00.440 --> 10:05.880
the most important thing because it turns out I did not choose the easy-to-instrument option. I chose

10:05.880 --> 10:14.360
the cooler option. Yeah, anyways, so for me to have one set of alerts and dashboards, all the applications

10:14.440 --> 10:20.920
need to emit one set of metrics. So basically, I settled on the OpenTelemetry semantic

10:20.920 --> 10:27.480
conventions. There's a link to it here, and I'll also post the slides. So basically, the OTel

10:27.480 --> 10:33.720
semantic conventions tell you, for databases, for HTTP, for gRPC: this is the metric that you need

10:33.720 --> 10:39.320
to emit, this is the unit, and these are the labels. So every application, if it follows the

10:39.320 --> 10:45.480
OTel semantic conventions, will emit the same metrics, and that way I can have one alert and

10:45.480 --> 10:52.360
basically one dashboard and all of that.

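NOTE
[Editor's sketch] To make "same metrics, same labels" concrete: the semantic conventions
define, for example, http.server.request.duration as a histogram in seconds. Exported to
Prometheus, a series looks roughly like this (the exact label set depends on the exporter):
  http_server_request_duration_seconds_bucket{http_request_method="GET",http_route="/api/bookmarks",http_response_status_code="200",le="0.25"} 42
Every conforming app emits the same series shape, which is what makes one shared dashboard
and one shared alert possible.
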
10:52.360 --> 10:58.200
So it came down to two solutions for how you can instrument your services for OpenTelemetry. One of them is the OpenTelemetry Operator, which is a

10:58.200 --> 11:05.960
Kubernetes operator that you run. What it does is you say, hey, this is a Python service in an

11:05.960 --> 11:10.440
annotation, and it injects the Python auto-instrumentation agent.

11:10.440 --> 11:15.880
You say this is a Java service, and it injects the Java instrumentation. So you get in-depth instrumentation

11:17.160 --> 11:21.720
without you actually doing the work. You just need to tell it, hey, this is a Java application.

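NOTE
[Editor's sketch] The "hey, this is a Python service" part is a pod annotation that the
OpenTelemetry Operator watches. A minimal sketch, assuming the operator and an
Instrumentation resource are already installed in the namespace (pod and image names are
illustrative):
  apiVersion: v1
  kind: Pod
  metadata:
    name: beaver-habits
    annotations:
      # tells the operator to inject the Python auto-instrumentation agent
      instrumentation.opentelemetry.io/inject-python: "true"
  spec:
    containers:
      - name: app
        image: beaver-habits:latest
The pod must be recreated for the injection to take effect, which is the restart drawback
mentioned below.
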
11:22.520 --> 11:28.040
One of the drawbacks is that it has, like, wonky language support. I say wonky because

11:28.040 --> 11:32.760
compiled languages can't do auto-instrumentation. So it doesn't support them, which means Go

11:32.760 --> 11:39.560
is out. Also, for the languages that it supports, OpenTelemetry is so early that Java

11:39.560 --> 11:47.640
is great and everything else is weird. Like, some languages do not conform to the OTel semantic

11:47.640 --> 11:51.240
conventions yet, so they still do milliseconds and seconds and things like that.

11:53.480 --> 11:58.920
Having said that, if you use the OTel Operator, you can add custom instrumentation inside the app,

11:58.920 --> 12:04.200
and that kind of merges with the auto-instrumented data, so it's easier to extend the

12:04.200 --> 12:11.480
telemetry that's being generated. Yeah, one more drawback is you need to say, hey,

12:11.480 --> 12:16.120
this is a Java service, this is a GoLang service, this is a Python service and you need to restart

12:16.120 --> 12:23.880
the pod for the instrumentation to happen. The alternative is Beyla, which is basically an

12:23.880 --> 12:28.840
eBPF agent that runs in the kernel. It looks at all the network calls, like HTTP calls,

12:28.840 --> 12:35.240
parses them, and is like, okay, this application is making 500 HTTP requests and 200 of them are

12:35.240 --> 12:39.960
succeeding, and stuff like that. So it runs in the kernel, there's a Helm chart, it runs as a

12:39.960 --> 12:46.840
DaemonSet. It doesn't do in-depth instrumentation; it only does request-level stuff, like HTTP and

12:46.840 --> 12:51.800
gRPC requests, network-level instrumentation, like which service is talking to which other

12:52.680 --> 12:57.560
service, and it does CPU, memory, RAM. It doesn't actually do garbage collection

12:57.560 --> 13:04.840
and stuff like that. But the good thing is it works for any language, including Rust and all of those.

13:05.880 --> 13:10.760
But because it's running separately from the process, you can't actually extend the telemetry

13:10.760 --> 13:15.960
being emitted from the process itself; you know, these are two separate things. But you

13:15.960 --> 13:20.760
don't need to restart or change anything. You run Beyla, it hooks into the process, it hooks into

13:20.760 --> 13:25.960
the kernel, and it starts emitting metrics, logs, and traces in the OTel-compliant format.

13:27.080 --> 13:34.120
So yeah, there are two commands that you need to run to deploy Beyla, and I was like, cool, I'll run this,

13:34.120 --> 13:38.200
and it exposes Prometheus metrics, and I was like, I'm going to scrape this and I'm going to

13:38.200 --> 13:44.040
send it to Prometheus.

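NOTE
[Editor's sketch] Roughly what this setup looks like, combining a Beyla configuration
fragment with the Prometheus side; the port, namespace, and labels are illustrative, so
check the Beyla Helm chart docs for the exact values:
  # Beyla config: discover workloads and expose Prometheus metrics
  discovery:
    services:
      - k8s_namespace: default   # instrument everything in this namespace
  prometheus_export:
    port: 8999
    path: /metrics
  ---
  # Prometheus scrape config pointing at the Beyla DaemonSet pods
  scrape_configs:
    - job_name: beyla
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
          regex: beyla
          action: keep
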
13:44.040 --> 13:48.840
Okay, and then I looked up this dashboard made by my colleague, the OpenTelemetry service dashboard, which uses the OTel semantic conventions to build a nice

13:48.920 --> 13:51.640
dashboard, which you can't really see here, but trust me, it's really cool.

13:54.040 --> 13:57.800
Yeah, I installed the dashboard. You still can't see anything, because there was no data.

13:59.480 --> 14:05.960
I was like, holy shit, what's happening? And the reason there was no data is

14:06.600 --> 14:12.760
because Beyla lied to me. They said it is compatible with any Linux environment with kernel

14:12.840 --> 14:20.520
greater than 5.8. And they said, oh yeah, you need BTF, and BTF is enabled on most kernels, whatever.

14:20.920 --> 14:29.320
In the Raspberry Pi kernel, there is no BTF. So I had to recompile the kernel with this particular option, and it took me forever to figure out.

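NOTE
[Editor's note] The kernel option in question is almost certainly CONFIG_DEBUG_INFO_BTF=y,
which embeds BTF (BPF Type Format) type information in the kernel image so that eBPF
programs like Beyla's can attach portably. Stock Raspberry Pi kernels have historically
shipped without it, hence the recompile.
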
14:29.320 --> 14:35.400
So I said, like, you know, it makes for a cool

14:35.400 --> 14:42.120
talk, of course. So I blogged about it. This is definitely not the easy way. So yeah, I recompiled my

14:42.120 --> 14:50.680
kernel, and I blogged about it. And now I went back to this dashboard, and it works. So, live demo:

14:50.680 --> 14:56.360
you can kind of see, you know, I used the service. For some requests the P99 was like 700

14:56.440 --> 15:06.840
milliseconds. You can see which endpoints are the slowest. You can see that get-link-metadata

15:06.840 --> 15:13.160
was super slow, but there are no errors. You can see the logs, the traces; all of this

15:13.160 --> 15:19.400
is kind of working because of this OpenTelemetry service dashboard. So yeah, it works.

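NOTE
[Editor's sketch] P99 panels on a dashboard like this are typically built from the semconv
histogram with a query along these lines (metric and label names as in the earlier sketch):
  histogram_quantile(
    0.99,
    sum by (http_route, le) (
      rate(http_server_request_duration_seconds_bucket[5m])
    )
  )
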
15:20.360 --> 15:28.520
So I decided to do the next best thing, which is alerting, so that when the service goes

15:28.520 --> 15:32.920
down, I get paged. I get woken up at 2 a.m. because my journal is down.

15:37.720 --> 15:44.440
Yeah, I'm going back to engineering. It's okay. And then I was like, no, I'm not going to do just any

15:44.440 --> 15:51.880
kind of alerting, I'm going to do SLOs. Yes, I'm going to do SLOs,

15:51.880 --> 16:00.920
because SLOs are better. So I was like, I want 99% of the requests to succeed for all my applications.

16:00.920 --> 16:06.200
I was like, this is the SLO. And if that doesn't happen, page me. Yes, this was what I did.

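NOTE
[Editor's sketch] The naive "99% of requests succeed" rule looks roughly like this in
Prometheus (names as in the earlier sketches). This is the version that fails silently with
no traffic, as explained next:
  groups:
    - name: slo
      rules:
        - alert: ErrorBudgetBurn
          expr: |
            sum(rate(http_server_request_duration_seconds_count{http_response_status_code=~"5.."}[1h]))
              /
            sum(rate(http_server_request_duration_seconds_count[1h])) > 0.01
          labels:
            severity: page
With zero requests, both sides of the ratio return no data, so the expression is empty and
the alert can never fire.
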
16:06.600 --> 16:19.560
And I slept extremely peacefully, because when I was sleeping, I was not using anything. So

16:19.560 --> 16:23.400
when I'm not using anything, even if the service is dead, there are no metrics, which means

16:23.400 --> 16:31.080
it's 100% up, all the requests succeed. Okay? So this is not good. But I was like, okay,

16:31.400 --> 16:36.680
I mean, again, this is not that useful anyway because, you know, when you try to use it, it'll throw a 500

16:36.680 --> 16:41.000
and you get notified. I mean, you can see it's down. You don't need to get notified. Like, you know,

16:41.000 --> 16:48.600
okay, fine. But there was a more interesting failure case. So while I was writing my blog post

16:48.600 --> 16:54.760
on recompiling the kernel, I broke the kernel. So the service was not running on the system anymore.

16:54.760 --> 17:00.520
There's no system to run on. And then there is no pod running, there are no metrics. And when

17:00.520 --> 17:05.640
there are no metrics, it's 100% up. So I got no alert. I was waiting, like, yeah,

17:05.640 --> 17:15.480
I broke my Raspberry Pi, I'm going to get an SLO page. I got no page. And I was like, fuck. So yeah,

17:15.480 --> 17:21.720
I realized the best form of alerting is the Blackbox exporter. This is a very simple

17:21.720 --> 17:27.560
exporter. Basically, you say, hey Prometheus, just check with the Blackbox exporter

17:27.560 --> 17:34.200
whether the endpoint is up. The Blackbox exporter calls the endpoint; HTTP 200 means it's up.

17:34.200 --> 17:38.440
So basically, because Prometheus checks every minute, there's a request every minute to the service;

17:38.440 --> 17:43.880
I think I do it every 15 seconds or something. And basically, if the service is down, the Blackbox exporter

17:43.880 --> 17:49.320
is like, okay, I can't actually open this website, I can't reach it, it's down. And I get an alert.

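NOTE
[Editor's sketch] The moving parts, roughly; the target URL and exporter address are
illustrative. The Blackbox exporter defines an HTTP probe module, and Prometheus drives it
with the standard probe relabeling:
  # blackbox.yml
  modules:
    http_2xx:
      prober: http
  ---
  # prometheus.yml
  scrape_configs:
    - job_name: blackbox
      metrics_path: /probe
      params:
        module: [http_2xx]
      static_configs:
        - targets: ["https://readeck.example.internal"]
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: blackbox-exporter:9115   # where the exporter itself runs
This yields a probe_success series per target: 1 when the endpoint answers, 0 when it does not.
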
17:49.320 --> 17:56.120
It actually works. Yeah, I also use this thing called Awesome Prometheus alerts. It's an amazing

17:56.280 --> 18:03.560
repository of alerts that you can download. There's a nice Blackbox section. I just copy-paste those alerts

18:04.440 --> 18:10.120
and they work really, really well. And you can see it actually paged me with these BlackboxProbeHttpFailure and BlackboxProbeFailed alerts and stuff.

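NOTE
[Editor's sketch] The copied rules look like this; these two are lightly adapted from the
Awesome Prometheus alerts collection:
  - alert: BlackboxProbeFailed
    expr: probe_success == 0
    for: 0m
    labels:
      severity: critical
  - alert: BlackboxProbeHttpFailure
    expr: probe_http_status_code <= 199 or probe_http_status_code >= 400
    for: 0m
    labels:
      severity: critical
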
18:10.120 --> 18:16.120
But I want to tell you something else that I do with my home lab.

18:16.120 --> 18:21.480
So I use this thing called AirGradient, which is an open-source air quality monitor. You plug it in

18:21.560 --> 18:27.480
and then it exposes Prometheus metrics. And every time I fart, I get an air quality alert.

18:29.320 --> 18:37.160
A high VOC alert. Yes, it was a small office. And it was a sensitive alert. It's okay.

18:38.360 --> 18:47.320
But yeah, the TL;DR is: if you're using SLOs for home lab monitoring, I don't think it's very useful

18:47.400 --> 18:54.280
unless you have a backup synthetic monitoring alert. I sent this to my SLO team as a meme. Yes. Okay.

18:55.320 --> 19:00.280
And I wrote an internal article about all the things I learned about SLOs as I was trying

19:00.280 --> 19:09.560
to create SLOs. And again, I kind of poked a lot of holes in some of the SLOs. Like, I realized

19:09.560 --> 19:14.680
I created bad SLOs, and then I realized half the teams inside Grafana created bad SLOs too, or, well, a few of them.

19:15.640 --> 19:21.560
But yeah, SLOs do need a backup alert. For example, if there's no pod,

19:21.560 --> 19:26.120
you need an alert; if the pod is crash-looping, you need an alert; if you can't scrape something,

19:26.120 --> 19:30.360
you need an alert. Because for an SLO to fire, there need to be metrics, and you need to make sure

19:30.360 --> 19:36.520
there are metrics being emitted. Yeah.

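NOTE
[Editor's sketch] A "backup alert" of the kind just described is usually an absence check;
the job and metric names here are illustrative:
  - alert: ServiceMetricsMissing
    expr: absent_over_time(http_server_request_duration_seconds_count{job="readeck"}[15m])
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "No metrics from readeck for 15m: dead pod, broken scrape, or dead node"
The point is that "no data at all" must itself page, because the SLO alert cannot.
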
19:36.520 --> 19:42.760
And now I'm going to rant about push versus pull, because OpenTelemetry is push and Prometheus is pull. But I think it's relevant. I

19:42.840 --> 19:49.560
realized this as I was writing the internal rant on SLOs. We have critical alerts on Prometheus

19:49.560 --> 19:55.160
scrapes failing, meaning I'm not able to scrape something; pods crash-looping; and Google load

19:55.160 --> 20:00.440
balancer errors. For me, the most interesting one is the Prometheus scrape failed alert. So if you're using

20:00.440 --> 20:05.960
Prometheus and you're running three replicas of a service, Prometheus knows that three replicas

20:05.960 --> 20:11.000
are running and it tries to get metrics from all three. And if it fails to get metrics from one of

20:11.080 --> 20:16.520
those, because it's dead or not responding, then up equals 0 and we have a Prometheus scrape failed alert.

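NOTE
[Editor's sketch] The pull-side safety net being described is the standard rule on
Prometheus's built-in up metric:
  - alert: PrometheusTargetMissing
    expr: up == 0
    for: 5m
    labels:
      severity: critical
Because Prometheus knows every target it is supposed to scrape, a dead replica shows up as
up == 0; a push pipeline has no built-in equivalent, as explained next.
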
20:16.520 --> 20:23.720
But if you're using OpenTelemetry and push, there is no equivalent

20:23.720 --> 20:29.400
of this, because there can be 10 replicas running, two of them are dying and throwing errors

20:29.400 --> 20:34.840
to the users, but if for some reason they fail to send any metrics, everything still looks up.

20:35.640 --> 20:40.280
So I don't know how to solve this. But this is also the big advantage of pull. So use

20:40.280 --> 20:47.240
Prometheus. Yeah. So that's basically it: now I have alerts that work, which is the Blackbox

20:47.240 --> 20:56.680
exporter. SLOs are shit; they don't really work for my use case. Yeah. Cool. The caveat

20:58.040 --> 21:05.160
is that Beyla is still a little rough. As I tried to do this, I found a lot of minor issues.

21:05.320 --> 21:10.520
But I also found a couple of interesting issues where, like, you know, Beyla was magically

21:10.520 --> 21:16.440
filtering out a Python process because of some edge case. But the really good thing is

21:16.440 --> 21:22.360
the team is super responsive. I mean, because I'm DMing all the members saying, my home lab

21:22.360 --> 21:29.640
monitoring is not working. And they fixed a lot of these. But yeah, it's still super early.

21:29.640 --> 21:33.560
But I think what they're building is really cool: you run one command and you have consistent

21:33.560 --> 21:41.800
metrics everywhere. Yeah. The next experiment in my home lab, which runs in my office,

21:41.800 --> 21:49.880
so it's an office lab, is going to be this cool thing where Alibaba, Datadog, and Quesma did this

21:49.880 --> 21:56.120
cool thing where, if you're using Go, you don't need to instrument anything in the code base,

21:56.120 --> 22:03.000
and you can kind of build it with one command. It walks the whole code base, injects instrumentation

22:03.000 --> 22:09.240
wherever required, and then compiles it. So you get the same benefits as, like, the Java agent

22:09.240 --> 22:16.440
auto-instrumentation, even with Go. Now, this is new, and this is not the easy path, mainly

22:16.440 --> 22:22.040
because every time there's a new release, I will need to manually compile each application

22:22.040 --> 22:27.000
with this special command and then deploy it in my own Kubernetes cluster. I will need to write

22:27.000 --> 22:31.000
all this automation, but that will be a future blog post and talk. So yeah, that's the next experiment.

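NOTE
[Editor's sketch] These compile-time tools wrap the normal Go build and rewrite the source
tree before compiling. The exact CLI differs per tool and version, so treat this as a
hypothetical illustration of the workflow rather than the real commands:
  # normally: go build ./...
  otel go build ./...   # wrapper injects instrumentation, then compiles
The automation burden mentioned is re-running this wrapper, and rebuilding the image, for
every upstream release of every self-hosted app.
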
22:31.960 --> 22:37.960
Cool. Thank you.

22:43.960 --> 22:45.960
Yes, questions.

22:49.960 --> 22:55.240
Thank you for the talk. I wanted to ask you to elaborate more about push versus pull

22:55.320 --> 23:00.840
semantics for metrics, because I used to run Prometheus and just used pull.

23:00.840 --> 23:04.680
Now I'm learning more about OpenTelemetry and investing more time into it, and that was the biggest

23:04.680 --> 23:08.360
red flag initially, where I was like, this doesn't make sense, this is going to cause issues.

23:09.160 --> 23:12.280
Yes, it is going to cause issues, so don't; just use scraping.

23:13.880 --> 23:19.240
I think it's like a fundamental trade-off, it's much easier to implement push, because you can push

23:19.240 --> 23:23.560
anywhere, but with Prometheus, you need Prometheus to go and scrape all these applications.

23:24.440 --> 23:36.200
So it needs to run in the same cluster and stuff, so like, yes. So again, in Prometheus,

23:36.200 --> 23:41.720
we're working on implementing the best of both worlds. So the big advantage of pull,

23:41.720 --> 23:46.760
for me, is that it knows how many replicas exist and that they should be sending metrics.

23:47.400 --> 23:51.960
Now, you can use OpenTelemetry to push, and if you know that there are 10, but you're only

23:51.960 --> 23:55.320
getting metrics from 8, you can say those two are down, and that's something that we're

23:55.320 --> 23:59.720
implementing in Prometheus, like kind of combining the best of both worlds. But today, you need to be

23:59.720 --> 24:04.440
kind of careful and have a lot more alerts to make sure every single application that's supposed to

24:04.440 --> 24:10.280
be running is sending you metrics. So yeah, we're going to fix it the easy way and implement

24:10.280 --> 24:14.280
the equivalent of up equals 0, but until then, just be careful.

24:14.360 --> 24:16.360
Yeah.

24:17.400 --> 24:24.120
More questions? How many of you are going to use Beyla to instrument your home lab now?

24:24.120 --> 24:28.520
Okay, cool. Actually, the funny thing is, you don't need Beyla. You just need the Blackbox

24:28.520 --> 24:36.360
exporter, but you can use Beyla too; it's cooler. You can do another blog post or

24:36.360 --> 24:46.200
a conference talk about it. All right, then, thanks a lot.

