WEBVTT

00:00.000 --> 00:10.880
My name is Roman. I'm a software engineer. I like Go, playing video games, and listening to

00:10.880 --> 00:18.080
podcasts. I've lived in Ukraine, in the United Kingdom, in Austria, and now I'm living in Poland.

00:18.080 --> 00:23.720
And in open source, I've created things like a ClickHouse data source, the first ClickHouse

00:23.720 --> 00:30.760
data source for Grafana, and a load-balancing proxy for ClickHouse as well. And for the

00:30.760 --> 00:37.160
last few years, I've been working on VictoriaMetrics. So, VictoriaMetrics is an open-source

00:37.160 --> 00:44.360
solution for monitoring. It collects metrics, processes metrics, and provides an interface

00:44.360 --> 00:48.920
in Grafana for querying these metrics. So, if you didn't know about such a project,

00:48.920 --> 00:55.000
visit it on GitHub; it's open source under the Apache 2.0 license. Yeah, please take a look.

00:56.120 --> 01:03.000
And what is this talk about? It's mostly inspired by the experience of maintaining a project,

01:04.200 --> 01:10.280
especially an open-source project. And it's about the complexity of distributed systems.

01:10.280 --> 01:17.880
It will be useful for people who like, or at least have tried, to make dashboards in Grafana

01:17.960 --> 01:21.880
and who are familiar with Grafana and Prometheus. Okay, so let's go.

01:23.240 --> 01:28.920
By the way, a disclaimer: all of the content in these slides was generated by a human, not by AI.

01:32.120 --> 01:38.440
A roadmap. It provides transparency for all the people who want to use this project.

01:38.440 --> 01:43.800
For example, Prometheus is a very good example of a good open-source project. It has all of this.

01:43.800 --> 01:50.040
And if you haven't put a star on that GitHub repo, I strongly encourage you to go and star

01:50.040 --> 01:57.880
Prometheus. But this is how it usually looks in the real world. There could be brilliant software

01:57.880 --> 01:59.480
solving many, many problems.

02:03.160 --> 02:08.920
Its code is there, maybe with a readme file of 10 lines, like how to start it, and that's all.

02:09.800 --> 02:16.600
And maybe this project is indeed cool and people will start using it. But if the project itself

02:16.600 --> 02:22.280
doesn't have this transparency, doesn't have the documentation, users will have more questions to

02:22.280 --> 02:27.880
generate. And if there is no documentation, the maintainer becomes the documentation of the project.

02:29.480 --> 02:32.520
And this is kind of a bad situation because everyone gets upset.

02:33.400 --> 02:39.400
The developer doesn't have much time to develop the project itself. He has to answer questions.

02:40.120 --> 02:45.960
Users are upset because they don't get answers quickly enough, so they're likely to leave the project.

02:45.960 --> 02:53.640
So what we need to do is convert the black box into something transparent by adding all this stuff inside the project.

02:54.600 --> 03:00.040
So users can answer their own questions with the help of documentation,

03:00.040 --> 03:05.400
or with the help of other community members. And then the developer and maintainer can do what

03:05.400 --> 03:10.920
they want to do, like improve the code, add more features, fix bugs, etc.

03:13.160 --> 03:19.720
Okay, so why is this picture here? Well, first of all, I find its title very interesting, because

03:19.720 --> 03:27.160
it has the word "monitor" in it. And it reads like: a shepherd and her dog monitor

03:27.160 --> 03:32.680
the sheep. I really like this title, and it's pretty evocative, because you can think of the

03:32.680 --> 03:40.840
shepherd as a DevOps engineer, the sheep as infrastructure or applications or services, and the dog as

03:40.840 --> 03:47.960
a tool that the shepherd uses to monitor all this stuff. And it's pretty cool, but now imagine

03:47.960 --> 03:55.720
that you have not 10 or 20 sheep, but 1000 sheep. Will it be enough to have one dog

03:55.800 --> 04:03.000
to monitor 1000 sheep? Maybe the shepherd needs to run a distributed system of dogs to monitor

04:03.000 --> 04:10.760
all these sheep. And now imagine: we need to monitor 1000 sheep, and now we also need to

04:10.760 --> 04:17.160
monitor a distributed system of dogs. And this becomes complex very fast, and when I need to

04:17.160 --> 04:21.400
demonstrate the complexity of a distributed system, I usually show this picture.

04:21.800 --> 04:28.760
So, this is a picture of Cortex; it's an open-source monitoring solution based on Prometheus,

04:28.760 --> 04:34.680
and it has many, many moving parts. Me personally, at least, I find it complex.

04:35.400 --> 04:41.080
If I had to run it, I would have to understand all this stuff. And if something goes wrong,

04:41.080 --> 04:47.720
I need a way to understand where it went wrong and fix it. And maybe I won't have much time

04:47.800 --> 04:55.000
to do this. So, what can we do to help people who use such a project, to make it better?

04:55.960 --> 05:00.360
Well, from my experience of maintaining VictoriaMetrics, we can do the following things.

05:00.360 --> 05:06.280
We need to write good documentation. We need to instrument it with helpful logs and

05:06.280 --> 05:12.440
meaningful metrics. And we need to provide alerting rules and dashboards. And this is exactly

05:12.440 --> 05:17.400
what we will be talking about right now. So, VictoriaMetrics is also a distributed system.

05:17.400 --> 05:22.120
It can also be complex. There is no silver bullet: if you need to monitor hundreds of millions

05:22.120 --> 05:27.320
of active time series, it will be complex. And there will be questions on GitHub asking why

05:27.320 --> 05:32.840
something doesn't work well. And here's how we deal with that. Well, one of the tools is the

05:32.840 --> 05:38.840
cluster dashboard that we provide. So, this is how this dashboard looks.

05:40.440 --> 05:45.400
It's shipped with every distribution of VictoriaMetrics, so every user can install it

05:45.400 --> 05:49.560
and get all these numbers here. But does it actually explain to the user how

05:49.560 --> 05:54.120
VictoriaMetrics works? Well, if you log in to this dashboard for the first time,

05:54.120 --> 06:01.880
the only thing you can say is: all green, all good, right? But okay, so what can we do with this?

06:02.920 --> 06:09.240
If you try to Google how to build good observability, how to provide

06:09.240 --> 06:14.120
an understanding of the system for users, you will find some recommendations, like

06:14.120 --> 06:17.880
the RED method by Weaveworks. They recommend you to

06:19.640 --> 06:25.480
have at least three signals describing your system. And it's very simple. These signals are rate,

06:25.480 --> 06:30.600
errors, and duration. For example, say you have to monitor a web server; the

06:30.600 --> 06:37.000
web server accepts requests; these requests represent the rate. These requests could fail,

06:37.000 --> 06:43.240
so we have errors. And these requests have latency; this is the duration. So, if we put all three

06:43.240 --> 06:47.000
signals on the dashboard, we at least have some characteristics of the system

06:47.000 --> 06:54.120
with which we can say whether it's okay or not. The second technique, by Google, is called the Four Golden Signals,

06:54.920 --> 07:01.320
and it is the same as RED, plus one more signal, saturation, which usually stands for measuring

07:01.320 --> 07:06.840
finite resources: CPU saturation, network saturation, disk saturation, etc.

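To make the RED/Golden Signals idea concrete, here is a minimal Python sketch that folds a window of request records into rate, errors, and an approximate p99 duration. This is illustrative only, not how VictoriaMetrics or Grafana compute these; the `Request` record and all the numbers are made up:

```python
from dataclasses import dataclass
from statistics import quantiles

@dataclass
class Request:
    ok: bool        # did the request succeed?
    seconds: float  # observed latency

def red_signals(requests, window_seconds):
    """Summarize one observation window into the three RED signals."""
    rate = len(requests) / window_seconds                       # Rate: requests per second
    errors = sum(1 for r in requests if not r.ok)               # Errors: failed requests
    p99 = quantiles([r.seconds for r in requests], n=100)[98]   # Duration: ~p99 latency
    return rate, errors, p99

# Every 10th request fails; latency grows linearly from 50 ms.
window = [Request(ok=i % 10 != 0, seconds=0.05 + 0.001 * i) for i in range(100)]
rate, errors, p99 = red_signals(window, window_seconds=60)
```

In a real system these three numbers would come from counters and a histogram scraped over time, not from a list of records, but the panel-level meaning is the same.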
07:07.560 --> 07:15.080
So, of course, we have these signals on the dashboard. VictoriaMetrics can accept

07:15.080 --> 07:20.840
writes, so it has a request rate. It can produce errors, of course, and there is latency for

07:20.840 --> 07:25.000
read requests, for example. So, we have all four golden signals on the dashboard,

07:25.560 --> 07:31.080
but the question is: are they actually helpful? Do they explain the system?

07:32.040 --> 07:37.800
The problem with these signals is that they don't answer the question "why?". Why am I having errors?

07:37.800 --> 07:44.520
Why is my latency so high? For the user, the system still remains a black box. It doesn't

07:44.520 --> 07:51.320
explain anything. It only shows the problem, indicates it. So, what can we do? Well,

07:52.040 --> 07:58.920
Grafana isn't a troubleshooting system; it doesn't provide you all these tools.

07:59.080 --> 08:04.600
So, we are limited by the technology of our time, but let's try to do something,

08:04.600 --> 08:11.320
and we will try to utilize the info buttons on Grafana panels. So, in the VictoriaMetrics

08:11.320 --> 08:17.800
cluster dashboard, we heavily use these help tooltips, and they're next to almost every panel,

08:19.160 --> 08:22.760
and here's how they look. So, when the user clicks on the info button,

08:23.560 --> 08:28.760
there is a description of what this panel means and how to interpret it.

08:29.000 --> 08:33.400
For example, "the lower the better", and you can immediately say whether it is good or not,

08:33.400 --> 08:39.560
and then there is an extra description, like what you can do if it reaches the threshold,

08:39.560 --> 08:47.400
whatever. And there is an explanation that goes deeper into the real mechanism of how this thing works inside.

08:47.400 --> 08:52.360
You can go to the GitHub discussions, you can go to the documentation, et cetera, and understand

08:52.360 --> 08:58.760
something about how it works. So, the troubleshooting guide that we built using the dashboard

08:58.760 --> 09:04.520
looks like this. For example, the user opens the dashboard and sees the first of the signals,

09:04.520 --> 09:10.760
and there is an anomaly with that signal. So, in this example, we have the datapoint

09:10.760 --> 09:17.080
ingestion rate, and it shows a dip anomaly. What should the user do? Click on the info button,

09:17.160 --> 09:23.240
and there will be an explanation of what this panel means, and it says: hey, if you see problems with this

09:23.240 --> 09:28.600
graph, please go check the vminsert metrics. And the vminsert metrics are just another section on the

09:28.600 --> 09:33.800
dashboard. So, the user can go to the vminsert section, open the panels, and there will be

09:33.800 --> 09:39.560
probably another anomaly. In this case, there is a storage connection saturation panel

09:39.560 --> 09:45.240
that shows the anomaly. So, the user can click on the info button again, and there will be an explanation

09:45.320 --> 09:52.360
of what this panel means and what needs to be checked. So, it suggests that when vminsert talks

09:52.360 --> 09:58.760
to some vmstorage node (a stateful storage shard in VictoriaMetrics), you probably

09:58.760 --> 10:05.320
need to check the resource saturation of either vminsert or vmstorage. So, we follow the advice,

10:06.040 --> 10:12.440
and we check the resource saturation of vminsert. And we see that actually there is no

10:12.440 --> 10:18.760
increase in saturation; it's the opposite, vminsert resource usage goes down. So, probably it's not

10:18.760 --> 10:26.520
vminsert's fault. So, let's continue following the advice and check the vmstorage metrics. And the next

10:26.520 --> 10:31.640
anomaly is on vmstorage, explaining that, hey, it's probably caused by a background merge;

10:31.640 --> 10:38.680
you need to check the saturation of CPU and IO. So, following these four steps already gives some

10:38.680 --> 10:43.880
understanding to the user that it's not just writes going into a black box, but that inside the

10:43.880 --> 10:49.960
system there is vminsert, and vminsert talks to the storage shards, and one of the connections to a storage

10:49.960 --> 10:56.040
shard was saturated. Something is wrong with this one single connection, or with one single storage shard.

10:56.040 --> 11:01.000
So, this kind of gives context, a direction where the user can take a look.

11:01.640 --> 11:08.600
Yeah, so that's the idea: to hide the complexity of the distributed system, hide the complexity

11:08.600 --> 11:13.160
of the code and the metrics themselves. You don't need to run arbitrary expressions; you're just looking

11:13.160 --> 11:18.760
at the graphs and following the advice, and you can identify the misbehaving component behind this.

11:21.400 --> 11:26.360
Okay, so we have maintained these dashboards for a couple of years, and I would like to share the top

11:26.360 --> 11:32.600
cool features that we have started to use over this time; maybe you will find them useful from a

11:32.600 --> 11:38.440
practical perspective. So, first, of course, are these help tooltips. I encourage everyone to put

11:38.440 --> 11:45.080
them on your panels and explain promptly what they mean. For example, we have a

11:45.080 --> 11:50.120
notion called slow queries; you may have no idea what it is, but the graph itself will

11:50.120 --> 11:56.360
tell you that it probably depends on this command-line flag setting. If you want to change it

11:56.360 --> 12:05.560
somehow, it is very easy to find. What else? We do not show all the resources, all the components

12:05.560 --> 12:11.240
that we monitor; we show only outliers. So, the VictoriaMetrics cluster is a distributed system.

12:11.240 --> 12:17.480
It can be small, like three instances, or it can be very big, like 100 instances. If I wanted

12:17.480 --> 12:24.600
to show the memory usage of all the components, if I put 100 of them on this graph,

12:24.600 --> 12:32.040
it would be a mess. I can't read it; I'm just a human. So, what we do here is two things.

12:32.040 --> 12:37.240
Well, first, we show relative memory usage, as a percentage, because different

12:37.240 --> 12:43.400
components can have different limits on memory or CPU or whatever. And also, we do not show

12:43.480 --> 12:49.480
everything; we show only the outlier components. So, VictoriaMetrics consists of three

12:49.480 --> 12:55.720
component types, and we show only those instances that consume

12:55.720 --> 13:03.960
the most memory. And looking at this panel helps me understand pretty quickly whether I'm

13:03.960 --> 13:10.520
okay; if something is at 90%, I'm probably not okay. And you don't need to check 100 lines:

13:10.520 --> 13:16.840
the top instances tell you whether you are okay. What can you do next, when you have found the outlier?

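The outlier selection described here can be sketched in a few lines of Python. The instance names follow the vmstorage/vminsert/vmselect components mentioned in the talk, but the byte counts and the helper itself are hypothetical, not how the dashboard actually computes top-N:

```python
def top_outliers(usage_bytes, limit_bytes, k):
    """Return the k instances with the highest memory usage relative to
    their own limit. Percent-of-limit makes instances with different
    limits comparable on a single panel."""
    relative = {name: usage_bytes[name] / limit_bytes[name] for name in usage_bytes}
    return sorted(relative.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical byte counts for a small cluster.
usage = {"vmstorage-1": 950, "vmstorage-2": 300, "vminsert-1": 450, "vmselect-1": 100}
limit = {"vmstorage-1": 1000, "vmstorage-2": 1000, "vminsert-1": 500, "vmselect-1": 400}
worst = top_outliers(usage, limit, k=2)  # the only two lines worth drawing
```

The design point is the same one made in the talk: normalize first, then rank, so a panel with 100 instances collapses to the handful that actually matter.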
13:16.840 --> 13:23.480
Well, there is a very cool feature in Grafana. You can click on a line on this panel, and

13:25.480 --> 13:32.200
it will take you to another panel that represents this metric from a different perspective.

13:32.200 --> 13:37.640
So, in this case, we go from the relative representation of outliers to the absolute

13:37.720 --> 13:43.880
representation of all the components. This feature is super cool, but you know what is

13:43.880 --> 13:49.960
bad about this feature? Does anyone know? Did you know that this feature exists?

13:52.280 --> 13:57.240
Yeah, sorry, this feature does exist, but you don't know about

13:57.240 --> 14:03.240
it, and Grafana doesn't show on the panel that you can actually click on the line.

14:04.200 --> 14:09.320
So, I thought maybe we can enhance this, and I created a feature request for Grafana,

14:10.040 --> 14:16.520
which is about creating alternative views for a panel. So, for example, I want to show CPU usage,

14:17.160 --> 14:24.440
and it would be nice if I could switch between two modes, a relative mode

14:24.440 --> 14:30.840
and an absolute mode, and this would be displayed in the title of the panel. So, if you like this proposal,

14:30.920 --> 14:33.560
this is ticket 99861.

14:40.440 --> 14:47.000
Okay, let's go on. The VictoriaMetrics components are mostly configured with command-line flags,

14:47.800 --> 14:56.360
and we try very hard so users don't need to configure them. We have meaningful defaults

14:56.440 --> 15:02.760
for optimal operation of the components, but there are always some corner cases when something needs to be

15:02.760 --> 15:08.760
tuned. And usually these corner cases are the main source of misconfiguration, of course.

15:09.480 --> 15:16.520
So, in order to spot instantly whether the reason is configuration, we expose every command-line

15:16.520 --> 15:22.360
flag in the form of a metric. This metric is called flag, and it shows you the value of the flag,

15:22.360 --> 15:29.240
the name of the flag, and whether it was overridden by the user. So, we have this panel with non-default

15:29.240 --> 15:36.680
flags, and I can instantly see that someone said they want to select 250 million unique

15:36.680 --> 15:42.120
time series per request, which probably could lead to problems with memory usage when you do this.

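A minimal Python sketch of the flag-as-a-metric idea, rendering Prometheus-style text exposition. The label names (`name`, `value`, `is_set`) and the default values below are illustrative assumptions, not VictoriaMetrics' exact output:

```python
def flag_metrics(defaults, effective):
    """Render one `flag` sample per command-line flag, recording the
    flag's name, its effective value, and whether the user overrode
    the built-in default."""
    lines = []
    for name in sorted(effective):
        value = effective[name]
        is_set = "true" if value != defaults.get(name) else "false"
        lines.append(f'flag{{name="{name}",value="{value}",is_set="{is_set}"}} 1')
    return "\n".join(lines)

# Hypothetical defaults vs. what the user actually ran with.
defaults = {"search.maxUniqueTimeseries": 300_000, "retentionPeriod": "1"}
effective = {"search.maxUniqueTimeseries": 250_000_000, "retentionPeriod": "1"}
exposition = flag_metrics(defaults, effective)
```

A dashboard panel can then filter on `is_set="true"` to show only the flags a user changed, which is exactly the "non-default flags" view described above.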
15:44.440 --> 15:51.000
What next? We have version annotations: we expose the version of each component in the form of a

15:51.000 --> 15:57.240
metric, so we can then build an annotation query, which will show on the graph when the version of any

15:57.240 --> 16:02.440
component has changed. And as an operator, if you see the annotation on the graph,

16:02.440 --> 16:07.560
and something went bad after that, some performance degradation, you don't need to think, you just

16:07.560 --> 16:15.240
roll back, and then investigate what actually caused it. We also use annotations

16:15.240 --> 16:23.400
for restarts. A restart could happen for many reasons, but it could also be an out-of-memory crash;

16:23.400 --> 16:29.000
if something crashes because of lack of memory, you have to take a look at that. Or if you change

16:29.000 --> 16:33.880
a command-line flag, you also have to restart, so probably the user changed the configuration when

16:33.880 --> 16:41.880
they did this. What else? Of course, the VictoriaMetrics components produce logs: warnings, errors,

16:41.960 --> 16:48.200
etc., and we expose those in the form of metrics as well. And even if I don't have access

16:48.200 --> 16:53.960
to the logs of the components, I can still check them on a panel in Grafana, and they have

16:53.960 --> 17:00.040
a label pointing to the exact line of code which produced the error. And since

17:00.040 --> 17:04.520
the source code is open, I can just check which line of code produced it, and I can guess

17:04.520 --> 17:08.520
what the error is without having to look at the logs themselves.

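The logs-to-metrics idea can be sketched like this in Python; the severity levels and the `file.go:line` locations are invented for illustration, not real VictoriaMetrics source locations:

```python
from collections import Counter

log_total = Counter()  # (severity, code location) -> count

def log_event(level, location, message):
    """Emit a log line and count it by severity and by the code location
    that produced it. The counter can be exposed as a metric, so an error
    can be traced to a source line even without access to the logs."""
    log_total[(level, location)] += 1
    return f"{level}: {location}: {message}"

log_event("warn", "storage/merge.go:217", "slow merge")
log_event("error", "netstorage/insert.go:88", "connection reset")
log_event("error", "netstorage/insert.go:88", "connection reset")
```

Because the set of logging call sites in the source code is finite, the label set stays small; this is the same reason the cardinality question at the end of the talk gets a "no" answer.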
17:12.360 --> 17:18.120
Yeah, so this dashboard ships out of the box, and we recommend in our best practices

17:18.120 --> 17:23.400
to use the dashboard, and we encourage everyone to use it. We also ship it with the

17:23.400 --> 17:28.200
alerting rules, which we think are very helpful for understanding whether the system is okay.

17:29.000 --> 17:34.680
This is how an alerting rule looks. It contains similar context: a summary and a description

17:34.680 --> 17:40.840
pointing out what the problem is and how to solve it, and it also contains a link to the

17:40.840 --> 17:45.960
dashboard, the same dashboard that I showed you. So if you use our alerting rules and our

17:45.960 --> 17:51.880
dashboard, and you receive an alert firing, you can just click on the link, and it will take

17:51.880 --> 17:57.240
you to the panel, which explains what's happening, and you will get the context of what happened

17:57.240 --> 18:03.160
before. You will have the info tooltip with recommendations, and yeah, I think this

18:03.160 --> 18:07.560
interconnection between alerting rules and the dashboard is pretty helpful; at least, it helps me

18:07.560 --> 18:19.400
troubleshoot much faster than before. All this also depends heavily on the documentation.

18:19.400 --> 18:25.240
I ran this command two days ago, and it says that we have 90,000 lines of documentation in

18:25.240 --> 18:31.240
the project itself. It doesn't mean that we add documentation every day. We constantly refine it,

18:31.240 --> 18:36.760
trying to make it clearer, but yeah, this is what it takes to describe a distributed system.

18:37.960 --> 18:44.360
And when we accept pull requests, we require from the user not only a cool change

18:44.360 --> 18:49.880
and tests. We also require a documentation change, because if you introduce a new flag

18:49.880 --> 18:54.200
or change the behavior of an existing one, this should be reflected in the documentation.

18:54.200 --> 19:01.640
Because otherwise, it will go stale pretty fast. Another feature that we use in the

19:01.640 --> 19:07.960
documentation, which we started using not so long ago, is the version label. So when we

19:07.960 --> 19:13.800
introduce a new feature and merge it to upstream, our documentation renders it instantly.

19:13.800 --> 19:17.960
And some users, when reading through the documentation, can find a feature that doesn't

19:17.960 --> 19:24.840
exist yet in their version, at least. So we have this macro in markdown

19:24.840 --> 19:29.880
called "available from", which points to the exact version when the feature was introduced, and the

19:29.960 --> 19:35.800
user can learn from the documentation when this flag was added. Of course, there is automation:

19:35.800 --> 19:39.960
you don't need to put the exact version; you can put a placeholder, and then on release

19:39.960 --> 19:45.320
it will be automatically replaced with the actual version for you and published with the latest tag.

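A toy Python version of that release-time substitution; the placeholder string and the flag name below are hypothetical, not the actual macro used in the VictoriaMetrics docs:

```python
PLACEHOLDER = "{{AVAILABLE_FROM_NEXT_RELEASE}}"  # hypothetical marker written by doc authors

def render_on_release(markdown, version):
    """On release, replace every placeholder left by documentation
    authors with the actual version tag, so readers can see exactly
    when a feature became available."""
    return markdown.replace(PLACEHOLDER, f"available from v{version}")

doc = "The -foo.bar flag ({{AVAILABLE_FROM_NEXT_RELEASE}}) limits the number of results."
rendered = render_on_release(doc, "1.102.0")
```

The point of the placeholder is that doc authors never need to know the release number in advance; the release pipeline fills it in once, consistently, for every pending mention.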
19:47.720 --> 19:53.640
Okay, so as I said, we cross-reference our documentation to keep it fresh. We use

19:53.640 --> 20:01.480
documentation links in our dashboards, in our alerts, in our code, on every public platform

20:01.480 --> 20:06.520
where we help users to answer their questions. And this is what helps us to keep it up to date.

20:06.520 --> 20:11.960
And we also really care about broken links. If something was answered on GitHub

20:11.960 --> 20:18.040
like three years ago with a documentation link, it should still work, even right now. So if

20:18.040 --> 20:22.680
you find any broken links in our repo, just let us know. We really try to keep them

20:23.240 --> 20:30.280
alive. And yeah, we do care about all of this that I showed. All of it is available in open source,

20:30.280 --> 20:35.640
and we really want you to use it. And we really do care about this, because we use it every day.

20:35.640 --> 20:41.560
We run the same dashboards, alerts, and on-call internally. So VictoriaMetrics provides

20:42.280 --> 20:47.960
enterprise support to customers, and those customers can send telemetry to our

20:47.960 --> 20:53.160
cloud. So we basically receive the same metrics from the VictoriaMetrics components in our cloud,

20:53.160 --> 20:57.960
and then we can reuse the same dashboards that I showed you, the same alerts, and refer to the same

20:57.960 --> 21:04.840
docs. And this is how it usually works. We have support engineers who receive the triggered

21:04.840 --> 21:10.680
alerts. These are software engineers, the maintainers of VictoriaMetrics itself. So when

21:10.680 --> 21:15.640
they receive an alert, they can immediately see if this is a meaningful alert, if it's a false positive

21:15.640 --> 21:19.320
or not. And if it is a false positive, maybe it needs to be changed upstream.

21:20.040 --> 21:24.840
Then they go to the dashboard and check the same panel, and they can also apply any

21:24.840 --> 21:30.840
modifications to it, maybe make it clearer, change the description, etc. Then we apply changes

21:30.840 --> 21:37.720
upstream in the first place, and then from upstream it goes to the internal system.

21:38.280 --> 21:43.320
And yeah, then the system is back again in the green state. So this is why it's called

21:43.960 --> 21:51.240
monitoring of monitoring, and this is how we use our own dashboards and alerting rules.

21:52.040 --> 21:57.720
Yeah, so you can check our dashboards in our public playground. All this again is public. You can

21:57.720 --> 22:03.160
try it. You can see all the descriptions. You can play with expressions. You can check our alerting

22:03.240 --> 22:07.240
rules and documentation. And that's it.

22:17.720 --> 22:21.320
So thanks a lot. Are there any questions?

22:21.320 --> 22:25.240
Yes, there's a question.

22:33.800 --> 22:39.400
Thank you for the brilliant talk. I would like to ask you about

22:39.400 --> 22:44.280
monitoring of monitoring, actually: is there some kind of heartbeat or something that

22:44.840 --> 22:47.960
can show you that monitoring is actually broken?

22:47.960 --> 22:55.400
For example, do you have automation that shows that monitoring is broken?

22:55.400 --> 23:01.560
Yeah, well, of course. If you check our best practices, the recommendation is that

23:01.560 --> 23:05.880
every monitoring setup should have monitoring of monitoring. So basically, we run two

23:05.880 --> 23:10.920
VictoriaMetrics systems that monitor each other, like cross-monitoring. If something dies,

23:11.000 --> 23:17.240
the other system will let you know. Is there one more question here?

23:20.680 --> 23:26.360
And do you then also have different infrastructure for these two instances? Because we have

23:26.360 --> 23:32.600
this problem at the moment: we would like, for example, Loki with an object store backend,

23:32.600 --> 23:39.720
but Loki is also collecting the logs of our object store solution. So we don't really want to

23:39.800 --> 23:43.000
make all of Loki depend on the thing that Loki is monitoring.

23:45.000 --> 23:50.600
Yes, so the question is: do you need different infrastructure for cross-monitoring?

23:50.600 --> 23:56.360
Yes, that would be the best way to do this. They need to be independent, for sure,

23:56.360 --> 24:01.160
if you can afford that, if you can do that financially. Yes, I recommend doing this.

24:01.160 --> 24:07.480
But you can also use something like a dead man's switch mechanism, if I'm not mistaken;

24:09.720 --> 24:13.880
yeah, you're right, basically. But there are cheaper ways to do this, to at least notify you

24:13.880 --> 24:19.880
that your monitoring is broken. More questions? Yeah.

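The dead man's switch idea mentioned here can be sketched in Python. The timeout and timestamps are arbitrary; a real setup would wire `beat()` to heartbeats from the primary monitoring system and run the check from an independent host:

```python
class DeadMansSwitch:
    """Fire when heartbeats from the primary monitoring system stop.

    A second, independent system calls is_broken() periodically; if the
    primary stops sending heartbeats for longer than the timeout, the
    watchdog reports the monitoring itself as broken."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_beat = 0.0

    def beat(self, now):
        self.last_beat = now  # primary monitoring says "I'm alive"

    def is_broken(self, now):
        return now - self.last_beat > self.timeout

watchdog = DeadMansSwitch(timeout_seconds=60)
watchdog.beat(now=0.0)
ok_at_30 = watchdog.is_broken(now=30.0)        # heartbeat still fresh
broken_at_120 = watchdog.is_broken(now=120.0)  # heartbeats stopped
```

A common variant of the same idea is an always-firing alert: if the "I'm alive" alert ever stops arriving at an external receiver, the monitoring pipeline itself is down.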
24:24.760 --> 24:30.040
When you had the list of errors, the number of errors or warnings in the logs,

24:31.000 --> 24:33.320
could that cause an issue with cardinality?

24:34.200 --> 24:40.360
So the question is: if I have many logs with errors, does it cause issues with cardinality?

24:40.360 --> 24:47.960
No, because the number of code lines with errors is limited; it's finite. So if we count the number

24:47.960 --> 24:54.040
of unique log messages, it could probably be 100 at max, but I believe it's much lower, so it doesn't

24:54.040 --> 25:02.520
grow. It just looks dangerous; it is not in reality. Anyone else? Oh, up there?

25:06.440 --> 25:11.480
Hi, thanks for the talk. I really liked the section with the tooltips on the dashboards,

25:12.360 --> 25:18.200
but to be honest, I would never have thought of this myself, and I guess my question is:

25:18.200 --> 25:21.800
once you first started getting users onboarded on this dashboard, and you were

25:21.800 --> 25:27.240
building it in Grafana, how did you make sure that these features and this documentation are

25:27.240 --> 25:31.080
discoverable enough for people to actually read them? Thanks.

25:33.640 --> 25:38.440
The question was: how do I know that people actually use the info tooltips,

25:38.440 --> 25:46.680
and whether they are helpful? Yeah, so we don't have any way to know that, but we get a lot of

25:46.680 --> 25:53.400
questions on GitHub and public platforms. And here's how we make our support better.

25:53.400 --> 25:59.880
So the first thing we ask is: can you give a screenshot of your Grafana? And we also have

25:59.880 --> 26:04.920
troubleshooting checklists that we link to, and we share them with the user, and there are

26:04.920 --> 26:09.400
steps for what you need to do: check this panel, check that panel, read this, and

26:10.280 --> 26:16.360
yeah, there is no clear way, but this is how we usually help people: we let them know that

26:16.360 --> 26:21.080
this is the list, and then the community is supposed to take it from there.

