WEBVTT

02:30.000 --> 02:33.000
What we can learn from Formula 1 about incident management.

02:33.000 --> 02:35.000
Very excited about this topic.

02:41.000 --> 02:44.000
Can I get started in a minute or so?

02:44.000 --> 02:47.000
OK, let's wait a minute or so.

02:47.000 --> 02:49.000
So, first of all, let's start by.

02:49.000 --> 02:51.000
Thank you, everyone, for being here.

02:51.000 --> 02:52.000
Seeing so many people.

02:52.000 --> 02:55.000
The talk is about incident management and Formula 1.

02:55.000 --> 02:59.000
So, I'm assuming you guys are very interested in incidents, not Formula 1, right?

02:59.000 --> 03:01.000
That's my assumption.

03:01.000 --> 03:02.000
Okay.

03:02.000 --> 03:03.000
Let's give it a minute.

03:12.000 --> 03:14.000
And other colors like the N1.

03:23.000 --> 03:24.000
Like this?

03:24.000 --> 03:25.000
It's better?

03:25.000 --> 03:26.000
Yeah.

03:26.000 --> 03:27.000
I can speak louder.

03:27.000 --> 03:29.000
So, let's give it a go.

03:29.000 --> 03:31.000
So, thank you, everyone.

03:31.000 --> 03:33.000
My name is Ricardo.

03:33.000 --> 03:36.000
And this talk is in the monitoring and observability room,

03:36.000 --> 03:39.000
but it is slightly different from what you've been having up until now.

03:39.000 --> 03:41.000
So, I work for a product company.

03:41.000 --> 03:44.000
So, incidents are something that we focus on a lot,

03:44.000 --> 03:46.000
and they're very important for us.

03:46.000 --> 03:50.000
So, as with most things in software engineering,

03:50.000 --> 03:54.000
which is a very recent engineering practice,

03:54.000 --> 03:56.000
when we compare it to other types of engineering,

03:56.000 --> 03:59.000
there's a lot that we can learn from other areas, right?

03:59.000 --> 04:02.000
So, for example, if we think about patterns like bulkheads,

04:02.000 --> 04:04.000
it comes from the shipping industry.

04:04.000 --> 04:06.000
When we talk about canary deployments,

04:06.000 --> 04:09.000
it comes from the mining industry.

04:09.000 --> 04:11.000
So, if we start looking at other industries,

04:11.000 --> 04:13.000
what can we learn from that?

04:13.000 --> 04:17.000
So, today we're going to start analyzing

04:17.000 --> 04:21.000
a specific incident during an F1 race,

04:21.000 --> 04:24.000
and we're going to see the things that they put in practice,

04:24.000 --> 04:27.000
that they do, and what we can learn from that.

04:27.000 --> 04:29.000
So, whether we like it or not,

04:29.000 --> 04:33.000
F1 is one of the most demanding engineering practices in the world, right?

04:33.000 --> 04:38.000
So, they can win or lose a championship by a thousandth of a second.

04:38.000 --> 04:40.000
So, that's how critical that is.

04:40.000 --> 04:42.000
So, a thousandth of a second is the difference between

04:42.000 --> 04:44.000
becoming first or second.

04:44.000 --> 04:46.000
So, they're really precise on things.

04:46.000 --> 04:50.000
And they're basically strapping a human being to an engine, right?

04:50.000 --> 04:51.000
That's all they do.

04:51.000 --> 04:53.000
And all the aerodynamics is to make it go fast.

04:53.000 --> 04:54.000
That's basically it.

04:54.000 --> 04:57.000
So, when they have an incident, what do they do?

04:57.000 --> 05:01.000
So, let's pick a specific incident that happened in 2020,

05:01.000 --> 05:04.000
and let's see the bunch of things that they did to actually

05:04.000 --> 05:05.000
make it successful.

05:05.000 --> 05:09.000
And at the end, I promise, I'll show you what the success was.

05:09.000 --> 05:12.000
So, I'm going to show you like 15 to 20 second videos.

05:12.000 --> 05:14.000
I don't know if we have audio or not,

05:14.000 --> 05:17.000
but all those videos have subtitles so it should be fine.

05:17.000 --> 05:19.000
So, let's start by setting the stage and understanding

05:19.000 --> 05:20.000
what the incident was.

05:20.000 --> 05:26.000
So, we don't have sound, but I'll narrate it.

05:26.000 --> 05:30.000
So, basically, this is a race in Hungary in 2020.

05:30.000 --> 05:31.000
This is Max Verstappen.

05:31.000 --> 05:33.000
For those who don't know who Max Verstappen is.

05:33.000 --> 05:35.000
He's currently a four-time world champion.

05:35.000 --> 05:40.000
At this point in his career, he didn't have any world championships yet.

05:40.000 --> 05:43.000
But it was probably the first year where Red Bull was actually

05:43.000 --> 05:46.000
a challenger to Mercedes, and there was some kind of pressure

05:46.000 --> 05:47.000
to actually perform.

05:47.000 --> 05:49.000
This is the third race in the championship.

05:49.000 --> 05:50.000
So, basically, what happened?

05:50.000 --> 05:54.000
So, Max is challenging Mercedes, and Lewis Hamilton,

05:54.000 --> 05:57.000
who probably most of you know, for the world title.

05:57.000 --> 06:01.000
He didn't have his best qualifying, he qualified seventh.

06:01.000 --> 06:03.000
So, if he wants to continue to challenge,

06:03.000 --> 06:04.000
he has to do a very good race.

06:04.000 --> 06:08.000
He needs to get himself somewhere on the podium.

06:08.000 --> 06:11.000
So, he has to gain at least four places during the race.

06:11.000 --> 06:15.000
On Sunday, during the formation lap, which is the lap

06:15.000 --> 06:18.000
that the drivers use to get their tires warm.

06:18.000 --> 06:21.000
So, tire temperature is a very critical thing in Formula 1.

06:21.000 --> 06:24.000
So, if they're too cold, they can't drive fast.

06:24.000 --> 06:27.000
If they're too hot, they're burning rubber too fast,

06:27.000 --> 06:32.000
which means they need to go in a lot more times to change the tires.

06:32.000 --> 06:33.000
So, what happens?

06:33.000 --> 06:36.000
During the formation lap, cold weather.

06:36.000 --> 06:39.000
So, Max is trying to get his tires as hot as possible.

06:39.000 --> 06:42.000
But there's also a lot of rain on the track.

06:42.000 --> 06:45.000
Max goes too deep into a corner, and what happens?

06:45.000 --> 06:49.000
Brakes lock, and he goes into a barrier.

06:49.000 --> 06:52.000
So, if you notice, and I'll just put it again, just for context.

06:52.000 --> 06:57.000
If you notice, the moment he brakes, or the moment he crashes,

06:57.000 --> 06:59.000
there's a timer counting on top.

06:59.000 --> 07:01.000
And I'll explain what that timer is in a second.

07:01.000 --> 07:03.000
So, he's braking, the tires lock.

07:03.000 --> 07:05.000
He goes into the barrier. 22 minutes.

07:05.000 --> 07:07.000
I'll explain in a second what that means, right?

07:07.000 --> 07:14.000
So, the first thing that we need in incidents:

07:15.000 --> 07:19.000
We need to make fast decisions with limited information, right?

07:19.000 --> 07:22.000
So, in the middle of a SEV1, there's an outage.

07:22.000 --> 07:25.000
You don't have time to get all the data, right?

07:25.000 --> 07:28.000
You'll have partial information, and you need to make fast decisions.

07:28.000 --> 07:29.000
Let's see what they did.

07:29.000 --> 07:32.000
Let's see what was one of their problems.

07:32.000 --> 07:35.000
So, they're trying to make a decision here.

07:37.000 --> 07:38.000
I'll let the video play.

07:38.000 --> 07:40.000
It's like 15 seconds.

07:45.000 --> 07:48.000
So, setting the stage again.

07:48.000 --> 07:51.000
They have 22 minutes to get the car ready to race, right?

07:51.000 --> 07:53.000
And they have to make the decision.

07:53.000 --> 07:56.000
Either they do a pit lane start, which means that they put the car in the pit lane.

07:56.000 --> 07:58.000
They have all the tools there.

07:58.000 --> 07:59.000
They have all the mechanics.

07:59.000 --> 08:01.000
Everything is there, but they will start last.

08:01.000 --> 08:05.000
So, all the cars will go out, and Max will start last.

08:05.000 --> 08:09.000
So, it means that he will probably have to overtake over 20 cars,

08:09.000 --> 08:10.000
just to make it onto the podium.

08:10.000 --> 08:14.000
Or, they put it on the track, but they have to fix it there.

08:14.000 --> 08:17.000
So, they have to take all their tooling there.

08:17.000 --> 08:21.000
Everything there, mechanics, tools, front wing.

08:21.000 --> 08:23.000
But they lose four minutes.

08:23.000 --> 08:24.000
Why is that?

08:24.000 --> 08:28.000
Because four minutes before the race, all four tires need to be on the ground.

08:28.000 --> 08:31.000
So, it means that they just lose four minutes,

08:31.000 --> 08:33.000
and they have to take the car off the grid.

08:33.000 --> 08:34.000
Right?

08:34.000 --> 08:35.000
Here's a problem.

08:35.000 --> 08:37.000
And they need to make a decision.

08:37.000 --> 08:39.000
They only have telemetry data.

08:39.000 --> 08:42.000
So, a Formula 1 car probably has thousands of sensors.

08:42.000 --> 08:45.000
They have a lot of telemetry there, but no one looked at the car.

08:45.000 --> 08:47.000
At this point, no one.

08:47.000 --> 08:48.000
No one, no one.

08:48.000 --> 08:51.000
They assume there is a broken suspension and a front wing to change,

08:51.000 --> 08:53.000
and they need to make a decision.

08:53.000 --> 08:54.000
And they make a decision.

08:54.000 --> 08:58.000
So, we're going to the grid, and we're going to do everything there.

08:58.000 --> 09:00.000
So, when we're talking about fast decisions,

09:00.000 --> 09:01.000
with limited information,

09:01.000 --> 09:05.000
we need to make decisions decisively,

09:05.000 --> 09:07.000
even if we don't have the full picture.

09:07.000 --> 09:08.000
So, what do we want to do?

09:08.000 --> 09:10.000
So, we want to minimize the impact.

09:10.000 --> 09:14.000
So, we need to act fast to stop small problems from getting too big.

09:14.000 --> 09:18.000
So, this is probably not the best example,

09:18.000 --> 09:22.000
but if you have a SEV1, most of the time you need to make a fast decision

09:22.000 --> 09:25.000
before things get too big, and you're in over your head.

09:25.000 --> 09:27.000
You also need to protect your customers, right?

09:27.000 --> 09:30.000
So, maybe you're operating in the finance industry,

09:30.000 --> 09:33.000
and if you don't do something, your customers are losing money, right?

09:33.000 --> 09:35.000
So, that's something that you need to do quickly.

09:35.000 --> 09:38.000
Maybe you need to take your whole system down so that they stop losing money.

09:38.000 --> 09:40.000
So, you have to preserve revenue.

09:40.000 --> 09:42.000
So, at the end of the day, most companies live on revenue.

09:42.000 --> 09:46.000
So, you'll have to preserve the revenue the company is making.

09:46.000 --> 09:49.000
And, of course, you want to maintain customers' trust.

09:49.000 --> 09:51.000
You want to show your customers that you're in control.

09:51.000 --> 09:55.000
You know that you have an incident and you're doing everything possible

09:55.000 --> 09:57.000
to resume service.

09:57.000 --> 10:01.000
So, as engineers, we need to become comfortable with this discomfort, right?

10:01.000 --> 10:02.000
Incidents will happen.

10:02.000 --> 10:05.000
It's not a matter of if they will happen.

10:05.000 --> 10:06.000
It's when.

10:06.000 --> 10:07.000
They will always happen.

10:07.000 --> 10:11.000
So, we need to become comfortable with the fact that we will need to make decisions fast

10:11.000 --> 10:13.000
and we don't have the full picture.

10:13.000 --> 10:18.000
And we need to keep in mind that many of those decisions will be sub-optimal

10:18.000 --> 10:21.000
because we don't have the full picture, and sometimes we'll be completely wrong.

10:21.000 --> 10:22.000
So, we miss the mark.

10:22.000 --> 10:23.000
And that's okay.

10:23.000 --> 10:26.000
It happens.

10:26.000 --> 10:30.000
Another critical point in an incident is clear communication.

10:30.000 --> 10:35.000
So, let's see an example.

10:35.000 --> 10:55.000
Yeah, the sound makes it better, but I'll explain it.

10:55.000 --> 10:56.000
So, I'll explain it a little bit.

10:56.000 --> 10:58.000
So, basically, they're taking the car to the grid.

10:58.000 --> 11:02.000
They don't have anything there, so they need to get everything in there.

11:02.000 --> 11:06.000
So, parts, mechanics, tools, everything in there.

11:06.000 --> 11:10.000
So, with the sound, it's more clear, but basically, they are communicating with each other.

11:10.000 --> 11:14.000
What they need, where they need to be, where the car needs to be, which engineers need to stay in the garage

11:14.000 --> 11:15.000
and which need to be there.

11:15.000 --> 11:18.000
And again, they have less than 22 minutes to make this happen.

11:18.000 --> 11:21.000
So, it's super critical that everyone communicates super clearly.

11:21.000 --> 11:25.000
What they need, what they need to do, what tools do they need, what are the spare parts.

11:25.000 --> 11:30.000
And someone needs to look at the car and understand what broke, what they need to change and whatnot.

11:30.000 --> 11:31.000
So, clear communication.

11:31.000 --> 11:37.000
It's all about sharing the right information with the right people at the right time, right?

11:37.000 --> 11:42.000
So, if we have clear communication, we are ensuring faster resolution, right?

11:42.000 --> 11:46.000
We need to communicate clearly so that we are all on the same page.

11:46.000 --> 11:49.000
Who here has been in a situation where you're trying to fix an incident,

11:49.000 --> 11:52.000
and there's two people looking at the same thing at the same time,

11:52.000 --> 11:54.000
all messing with the same services.

11:54.000 --> 11:57.000
I'm changing this, and then suddenly someone changes it as well.

11:57.000 --> 11:58.000
Yeah, that happens, right?

11:58.000 --> 12:03.000
If we do this in under 22 minutes, trying to change a suspension, it's going to be a mess, right?

12:03.000 --> 12:04.000
So, we've all been there.

12:04.000 --> 12:05.000
So, this is super important.

12:05.000 --> 12:08.000
Also, we want to reduce the errors, right?

12:08.000 --> 12:12.000
So, if I send the wrong message to my colleague, he does something that he's not supposed to do.

12:12.000 --> 12:14.000
We make things even worse.

12:14.000 --> 12:16.000
We want to improve coordination, right?

12:16.000 --> 12:18.000
So, say we're two engineers.

12:18.000 --> 12:21.000
We are working, I need you to do something, you need me to do something.

12:21.000 --> 12:24.000
We need to be super clear on what's needed, what needs to be addressed.

12:24.000 --> 12:26.000
And of course, increase transparency.

12:26.000 --> 12:29.000
If communication is clear, everyone understands what's happening, right?

12:29.000 --> 12:34.000
So, good communication helps everyone understand what's going on,

12:34.000 --> 12:37.000
and how can we make this resolution faster?

12:37.000 --> 12:41.000
And of course, we want to keep everyone in the loop and maintain trust during incidents.

12:41.000 --> 12:44.000
I'm asking you to do something, I trust you.

12:44.000 --> 12:45.000
This is the clear direction.

12:45.000 --> 12:49.000
And of course, for non-technical stakeholders, it's super important that they understand

12:49.000 --> 12:54.000
what the hell's going on, what we're doing, and that we are working to make things faster.

12:54.000 --> 12:57.000
Next up, clear processes.

12:57.000 --> 12:59.000
Let's see it again.

12:59.000 --> 13:04.000
So, at this point, they need spare parts, they need tools.

13:04.000 --> 13:06.000
They don't have it there, right?

13:06.000 --> 13:10.000
It's not their garage, so they need to bring a lot of stuff on track.

13:10.000 --> 13:15.000
So, they are basically communicating what they need, right?

13:15.000 --> 13:21.000
And, at the end, there's the director of the garage saying, guys, we have clear processes

13:21.000 --> 13:23.000
for this, like, please follow them.

13:23.000 --> 13:24.000
It's super critical, right?

13:24.000 --> 13:28.000
They're probably going to have dozens of engineers trying to get a suspension off and suspension

13:28.000 --> 13:29.000
on.

13:29.000 --> 13:34.000
If they don't follow those procedures, they have no chance of getting this to work.

13:34.000 --> 13:40.000
So, clear processes are defined steps that will guide us through incidents.

13:40.000 --> 13:43.000
So, they will allow us to have rapid response.

13:43.000 --> 13:47.000
If you have a clear process on how you manage incidents, it will be a lot easier.

13:47.000 --> 13:52.000
So, if you have people that have defined tasks during an incident, it will be a lot easier.

13:52.000 --> 13:55.000
It allows you to reduce chaos, right?

13:55.000 --> 13:58.000
Again, the same example as before, if we have two people working on the same thing,

13:58.000 --> 14:01.000
looking at the same dashboard, messing with the same services, it will be a mess, right?

14:01.000 --> 14:04.000
So, you want to reduce chaos as much as possible.

14:04.000 --> 14:06.000
You want consistent execution, right?

14:06.000 --> 14:09.000
So, you don't want to go from one incident to the other,

14:09.000 --> 14:12.000
and then everything is completely different, right?

14:12.000 --> 14:17.000
You want some semblance of consistency from one incident to another.

14:17.000 --> 14:19.000
And of course, you will improve collaboration, right?

14:19.000 --> 14:23.000
Everyone understands exactly what they need to do to address this incident.

14:23.000 --> 14:27.000
So, you have to plan before incidents happen.

14:27.000 --> 14:28.000
So, how do you do this?

14:28.000 --> 14:32.000
You actually have a plan in place, and you train, right?

14:32.000 --> 14:37.000
So, if you, I don't know who here is an F1 fan, but if you see before the race,

14:37.000 --> 14:41.000
there are people just testing out how to change a tire, right?

14:41.000 --> 14:44.000
They're just there, taking tires off, putting tires on.

14:44.000 --> 14:46.000
Taking tires off, putting tires on, because it's super critical,

14:46.000 --> 14:48.000
like milliseconds are important here.

14:48.000 --> 14:52.000
And of course, clear steps make for a faster response.

14:52.000 --> 14:55.000
So, if we all understand during an incident, what our role is,

14:55.000 --> 15:00.000
what we need to do, it will make the incident resolution a lot faster.

15:00.000 --> 15:03.000
Right? This should be an obvious one, right?

15:03.000 --> 15:06.000
But during an incident, teamwork is critical.

15:06.000 --> 15:11.000
So, this is here just an example of a small snippet of all of the engineers

15:11.000 --> 15:13.000
just working together, right?

15:13.000 --> 15:16.000
They're basically taking suspension off, putting on,

15:16.000 --> 15:18.000
and Max is just looking at the car.

15:18.000 --> 15:23.000
They're bringing a front nose. Actually, a front nose is fairly straightforward to change.

15:23.000 --> 15:26.000
During a race, it's tricky because it takes a few seconds,

15:26.000 --> 15:27.000
and every second counts.

15:27.000 --> 15:30.000
But for here, it's basically just a 50-game mistake, right?

15:30.000 --> 15:31.000
So, it's not that bad.

15:31.000 --> 15:34.000
A suspension is a whole different thing.

15:35.000 --> 15:36.000
So, teamwork.

15:36.000 --> 15:39.000
It's all about collaboration and shared responsibility, right?

15:39.000 --> 15:41.000
It's not a matter of whether I fix it

15:41.000 --> 15:44.000
or you fix it; it's a matter of getting things over the line.

15:44.000 --> 15:47.000
So, you need diverse skills for faster fixes.

15:47.000 --> 15:49.000
You need people to, who will be like,

15:49.000 --> 15:51.000
who will be the engineers fixing things.

15:51.000 --> 15:53.000
You need someone who will handle comms.

15:53.000 --> 15:57.000
So, you need a diverse skill set to actually get this over the line.

15:57.000 --> 15:59.000
Like teamwork makes the dream work, right?

15:59.000 --> 16:01.000
So, this collaboration will actually be what,

16:01.000 --> 16:04.000
will allow you to have a smoother transition from

16:04.000 --> 16:06.000
'I have a problem' to 'it is fixed.'

16:06.000 --> 16:08.000
Everyone plays a part, like I said before.

16:08.000 --> 16:11.000
It's not all just about fixing this thing.

16:11.000 --> 16:13.000
It's all about communication, making stakeholders accountable,

16:13.000 --> 16:15.000
making customers understand what's going on.

16:15.000 --> 16:18.000
And of course, it has the side effect of,

16:18.000 --> 16:20.000
we all learn together, right?

16:20.000 --> 16:23.000
So, if we're working together, we understand what's happening.

16:23.000 --> 16:27.000
So, who here has read the Phoenix Project?

16:27.000 --> 16:28.000
Right?

16:28.000 --> 16:29.000
So, you know, Brent?

16:29.000 --> 16:31.000
So, Brent is the antithesis of teamwork, right?

16:31.000 --> 16:33.000
Brent is the guy who fixes everything.

16:33.000 --> 16:35.000
It's the guy who knows the whole system,

16:35.000 --> 16:36.000
who does everything.

16:36.000 --> 16:38.000
There's no teamwork there, right?

16:38.000 --> 16:39.000
So, and it's this idea.

16:39.000 --> 16:42.000
So, this idea is that we don't want Brents.

16:42.000 --> 16:44.000
We want people like Brent who have some knowledge,

16:44.000 --> 16:47.000
but it's about a whole team fixing a problem, right?

16:47.000 --> 16:49.000
So, teams that work together,

16:49.000 --> 16:52.000
solve problems faster than even the most experienced

16:52.000 --> 16:55.000
and most knowledgeable engineers in an organization.

16:55.000 --> 16:57.000
When the system gets big enough,

16:57.000 --> 16:59.000
one person will take longer to fix it,

16:59.000 --> 17:01.000
than if we do it together.

17:01.000 --> 17:05.000
Of course, teamwork builds a more resilient and effective response

17:05.000 --> 17:06.000
to incidents.

17:06.000 --> 17:08.000
Brent will have to go on holiday sometime, right?

17:08.000 --> 17:09.000
Brent gets sick.

17:09.000 --> 17:11.000
So, if we don't have all of these things in practice,

17:11.000 --> 17:14.000
we will have serious problems.

17:14.000 --> 17:17.000
Something that is easier said than done:

17:17.000 --> 17:21.000
in incidents, we need to keep calm, right?

17:21.000 --> 17:24.000
So, this is probably the video where the audio

17:24.000 --> 17:26.000
would make the most sense,

17:26.000 --> 17:30.000
because you would hear how the engineers communicate with each other,

17:30.000 --> 17:34.000
and they are as calm as possible, right?

17:34.000 --> 17:37.000
So, at this point, they have five minutes to put the car on the ground, right?

17:37.000 --> 17:39.000
So, they're finishing up the suspension,

17:39.000 --> 17:42.000
they're putting the nose on, they have to put the tire on,

17:42.000 --> 17:46.000
they have to bolt it, it's not only about having the suspension on.

17:46.000 --> 17:49.000
Someone will be driving this car for 60, 70 laps,

17:49.000 --> 17:51.000
at over 300 kilometers per hour.

17:51.000 --> 17:54.000
So, if they mess up, the guy will crash at 300 kilometers an hour

17:55.000 --> 17:56.000
against the wall.

17:56.000 --> 17:59.000
So, they not only need to make it happen,

17:59.000 --> 18:01.000
they need to make it happen with extreme precision,

18:01.000 --> 18:03.000
so that they don't kill anyone, right?

18:03.000 --> 18:06.000
And you will see that, in some of these communications,

18:06.000 --> 18:08.000
there's someone who's communicating with the FIA,

18:08.000 --> 18:10.000
which is the regulatory body,

18:10.000 --> 18:12.000
and he's constantly asking for feedback.

18:12.000 --> 18:14.000
And it's super calm, guys.

18:14.000 --> 18:16.000
We have five minutes.

18:16.000 --> 18:17.000
Are we on track?

18:17.000 --> 18:18.000
Are we not on track?

18:18.000 --> 18:20.000
Super calm, not like pressure,

18:20.000 --> 18:23.000
shouting, and all this bugging.

18:23.000 --> 18:27.000
So, keeping calm during incidents is all about

18:27.000 --> 18:31.000
maintaining composure and performing under pressure, right?

18:31.000 --> 18:34.000
If we're calm, we can think better, right?

18:34.000 --> 18:36.000
I know it's easier said than done,

18:36.000 --> 18:39.000
but it will allow us to make better decisions

18:39.000 --> 18:40.000
than just being stressed, right?

18:40.000 --> 18:43.000
Calm teams work better. Like, in a calm atmosphere,

18:43.000 --> 18:45.000
if I'm not pressuring my colleagues, I'm asking, of course,

18:45.000 --> 18:47.000
we're up late, I need to understand what's going on,

18:47.000 --> 18:49.000
but I'm not shouting, I'm not rushing, I'm not bugging him,

18:49.000 --> 18:51.000
every five seconds, like, is it fixed now?

18:51.000 --> 18:52.000
Is it fixed now?

18:52.000 --> 18:53.000
Is it fixed now?

18:53.000 --> 18:54.000
That's what we want.

18:54.000 --> 18:55.000
Right?

18:55.000 --> 18:57.000
We have to stay tough, stay cool, right?

18:57.000 --> 19:02.000
So, we need to build this mental strength to allow us to,

19:02.000 --> 19:06.000
as much as possible, keep our cool, right?

19:06.000 --> 19:09.000
And although we need to make quick decisions,

19:09.000 --> 19:12.000
we also need to, we need to strike a balance between

19:12.000 --> 19:17.000
quick and too quick, like, try to not rush a lot, right?

19:17.000 --> 19:20.000
So, a calm mind allows us to make better decisions during incidents,

19:21.000 --> 19:23.000
and we have to train to handle high-stress incidents.

19:23.000 --> 19:27.000
So, the best way to stay tough

19:27.000 --> 19:31.000
in incidents is, of course, to train for them beforehand, right?

19:31.000 --> 19:32.000
Stuff like, chaos engineering, like,

19:32.000 --> 19:35.000
or game days, where you actually break stuff on purpose

19:35.000 --> 19:38.000
and try to test your processes or, like,

19:38.000 --> 19:40.000
building that muscle where, when you have an incident,

19:40.000 --> 19:43.000
I've seen this before, I know what to do, right?

19:43.000 --> 19:50.000
Technical proficiency is also super important for resolving an incident.

19:50.000 --> 19:55.000
So, here's an example of some, so this is at the start,

19:55.000 --> 19:58.000
so they still have 20 minutes, and they're basically just discussing.

19:58.000 --> 20:00.000
Can we even do this, right?

20:00.000 --> 20:05.000
Can we do, can we change this suspension in under 22 minutes?

20:05.000 --> 20:08.000
And they basically say, we have to do it faster than we ever did before.

20:08.000 --> 20:09.000
We never did it this fast, right?

20:09.000 --> 20:11.000
And they, again, they train those things, like,

20:11.000 --> 20:13.000
how fast can I change this suspension, right?

20:13.000 --> 20:15.000
And they've never done it before.

20:15.000 --> 20:17.000
So, it's all about technical proficiency.

20:17.000 --> 20:20.000
Do the people that go into an incident, actually,

20:20.000 --> 20:22.000
know the frameworks that they're using?

20:22.000 --> 20:24.000
Do they know the service, know the whole context?

20:24.000 --> 20:27.000
Do they know what the trade-offs are, right?

20:27.000 --> 20:29.000
So, this is super important, right?

20:29.000 --> 20:32.000
So, strong skills allow for efficient diagnosis

20:32.000 --> 20:34.000
and resolution of a problem, right?

20:34.000 --> 20:36.000
So, skills actually matter.

20:36.000 --> 20:39.000
So, people need strong skills, not only in the programming language,

20:39.000 --> 20:42.000
the frameworks that we're using, but the whole context of the company, right?

20:42.000 --> 20:44.000
Why is this service configured this way, right?

20:44.000 --> 20:45.000
So, there is some context.

20:45.000 --> 20:48.000
So, over time, we need to get that context.

20:48.000 --> 20:50.000
Knowledge is power, right?

20:50.000 --> 20:56.000
So, the more the team knows, the more the team will be capable of understanding what the problem is and how to fix it.

20:56.000 --> 20:58.000
And of course, we need to stay sharp, right?

20:58.000 --> 21:01.000
So, we need to keep up to date with both the industry,

21:01.000 --> 21:04.000
but also with how things are in our company.

21:04.000 --> 21:08.000
So, stronger skills mean that we can be stronger at incident response.

21:08.000 --> 21:12.000
And of course, the right tools will empower our teams to resolve incidents effectively.

21:12.000 --> 21:17.000
And last but not least, post-mortems.

21:17.000 --> 21:20.000
So, this is the only video that is slightly different from the others,

21:20.000 --> 21:23.000
because they don't show the post-incident review,

21:23.000 --> 21:25.000
post-mortem, whatever you call it in your organization.

21:25.000 --> 21:29.000
This is just an example of Mercedes actually going through

21:29.000 --> 21:31.000
how they make some decisions after a race.

21:31.000 --> 21:33.000
So, I'll leave a link at the end.

21:33.000 --> 21:36.000
So, Mercedes has some nice videos where they analyze the race

21:36.000 --> 21:39.000
and they share with the whole public saying,

21:39.000 --> 21:42.000
we made this decision, and it didn't pan out.

21:42.000 --> 21:44.000
I won't spend too much time on this,

21:44.000 --> 21:46.000
because we don't have audio here.

21:46.000 --> 21:48.000
But a post-mortem, more than anything,

21:48.000 --> 21:52.000
it's all about learning from incidents to prevent future problems.

21:52.000 --> 21:54.000
And that's the critical part, right?

21:54.000 --> 21:57.000
Post-mortems are about learning.

21:57.000 --> 21:59.000
Nothing else, right?

21:59.000 --> 22:01.000
So, you need to use them for learning.

22:01.000 --> 22:03.000
So, how do you learn?

22:03.000 --> 22:04.000
You try to find the root cause, right?

22:04.000 --> 22:07.000
So, I had an incident, I mitigated or fixed it.

22:07.000 --> 22:09.000
What caused this?

22:09.000 --> 22:12.000
What led to us having this incident?

22:12.000 --> 22:14.000
The idea is to learn from what happened.

22:14.000 --> 22:15.000
Something didn't go well.

22:15.000 --> 22:17.000
Let's go through the problem.

22:17.000 --> 22:18.000
Let's see what happened.

22:18.000 --> 22:20.000
And what did go well, right?

22:20.000 --> 22:22.000
So, in the case of a crash like this one with Max,

22:22.000 --> 22:24.000
was this driver error?

22:24.000 --> 22:25.000
Did he brake too late?

22:25.000 --> 22:27.000
Was there some problem with the car?

22:27.000 --> 22:30.000
So, trying to understand so that it doesn't happen again.

22:30.000 --> 22:32.000
Super, super critical.

22:32.000 --> 22:33.000
No blame.

22:33.000 --> 22:34.000
This is critical.

22:34.000 --> 22:36.000
If you assign blame during a post-mortem,

22:36.000 --> 22:38.000
people will shut up the next time.

22:38.000 --> 22:39.000
They won't speak, right?

22:39.000 --> 22:41.000
It's super important that you're learning.

22:41.000 --> 22:42.000
This is a learning process.

22:42.000 --> 22:46.000
It's not about the race driver that braked too late or

22:46.000 --> 22:48.000
there was a problem because someone didn't fit it.

22:48.000 --> 22:49.000
And it's okay.

22:49.000 --> 22:53.000
Yes, maybe it doesn't matter, right?

22:53.000 --> 22:55.000
Post-mortems, post-incident reviews.

22:55.000 --> 22:57.000
Whatever you call it in your organization,

22:57.000 --> 23:00.000
will help teams learn from incidents and get better every time.

23:00.000 --> 23:02.000
So, the next time that it happens,

23:02.000 --> 23:04.000
we know, so we will not make the same mistake again.

23:04.000 --> 23:07.000
And of course, we need to focus on fixing problems,

23:07.000 --> 23:08.000
not finding fault.

23:08.000 --> 23:10.000
I can't stress this enough, right?

23:10.000 --> 23:14.000
Please don't point fingers.

23:14.000 --> 23:17.000
So, why is all of this important?

23:17.000 --> 23:18.000
Right?

23:18.000 --> 23:19.000
I think that today, what do we want?

23:19.000 --> 23:21.000
We want to prevent incidents.

23:21.000 --> 23:23.000
So, we need to learn from incidents,

23:23.000 --> 23:26.000
but if possible, what we don't want are incidents, right?

23:26.000 --> 23:28.000
So, we don't want them at all.

23:28.000 --> 23:31.000
When we do have them, we want to mitigate them

23:31.000 --> 23:35.000
and fix them as fast as possible, as soon as possible, right?

23:35.000 --> 23:39.000
And ideally, what we want to do is to keep them from happening again.

23:39.000 --> 23:41.000
So, having the incident over and over again,

23:41.000 --> 23:43.000
it's costly and causes a lot of problems.

23:43.000 --> 23:47.000
Unfortunately, many organizations don't have many of these practices in mind.

23:47.000 --> 23:49.000
So, they don't practice these things.

23:49.000 --> 23:52.000
And the idea for this talk is to show like,

23:52.000 --> 23:54.000
one of the top engineering practices in the world,

23:54.000 --> 23:56.000
they do it, and why do they do it?

23:56.000 --> 23:58.000
Because it happens, because it helps, right?

23:58.000 --> 24:00.000
It actually works, right?

24:00.000 --> 24:04.000
So, and then you might ask, okay, where should I start?

24:04.000 --> 24:07.000
And my answer is always post-mortems, right?

24:07.000 --> 24:10.000
It's one of those things where, if done correctly,

24:10.000 --> 24:12.000
done with the learning mindset,

24:12.000 --> 24:14.000
it will feed into all the others.

24:14.000 --> 24:17.000
Because during a post-mortem, you will understand, okay,

24:17.000 --> 24:20.000
the incident took longer, because we didn't have clear communication.

24:20.000 --> 24:22.000
The incident took longer because we had engineers on call

24:22.000 --> 24:26.000
that actually weren't knowledgeable about that service.

24:26.000 --> 24:28.000
Maybe we need some kind of process, right?

24:28.000 --> 24:31.000
Because there was a lot of confusion.

24:31.000 --> 24:33.000
Some people didn't know what they needed to do.

24:33.000 --> 24:35.000
We had two engineers working on the same thing at the same time.

24:35.000 --> 24:38.000
So, post-mortems will actually inform all the others.

24:38.000 --> 24:40.000
So, if you have to start somewhere,

24:40.000 --> 24:43.000
you probably should start with post-mortems, right?

24:43.000 --> 24:47.000
And probably the one that's easiest to adopt

24:47.000 --> 24:51.000
and that will get you the biggest bang for the buck is clear processes.

24:51.000 --> 24:53.000
There are a lot of ITIL frameworks out there.

24:53.000 --> 24:55.000
You don't need to reinvent your own.

24:55.000 --> 24:57.000
Just onboard one, try it.

24:57.000 --> 24:59.000
Make the adjustments that you need, right?

24:59.000 --> 25:01.000
So, there's a lot of industry literature,

25:01.000 --> 25:05.000
where you can just read and say, okay, here is a framework

25:05.000 --> 25:08.000
that I can just follow that will help us a lot.

25:08.000 --> 25:11.000
So, you might be asking, all of this for what?

25:11.000 --> 25:15.000
So, what did they achieve in 22 minutes?

25:15.000 --> 25:19.000
Again, the video has a little bit of dramatic audio,

25:19.000 --> 25:21.000
but...

25:21.000 --> 25:23.000
So, you've done it.

25:23.000 --> 25:27.000
So, they went in under 22 minutes from a broken car

25:27.000 --> 25:31.000
that wasn't able to race to actually fighting for a podium

25:31.000 --> 25:33.000
and finishing P2, right?

25:33.000 --> 25:36.000
So, not only did they manage to get the car on track,

25:36.000 --> 25:40.000
they managed to give him a car that could overtake five cars.

25:40.000 --> 25:44.000
At this point, Max rose to third in the championship

25:44.000 --> 25:47.000
and Red Bull rose to second in the championship.

25:47.000 --> 25:50.000
So, if they hadn't done any of this,

25:50.000 --> 25:52.000
if all of this hadn't been successful,

25:52.000 --> 25:55.000
he probably would have dropped in the drivers' championship

25:55.000 --> 25:58.000
and Red Bull would have dropped in the...

25:58.000 --> 26:00.000
constructors' championship.

26:00.000 --> 26:02.000
So, in 22 minutes, what people...

26:02.000 --> 26:05.000
Miracle, yeah, maybe an exaggeration,

26:05.000 --> 26:07.000
but again, they had never done it before,

26:07.000 --> 26:10.000
and this allowed Red Bull to continue in the race

26:10.000 --> 26:12.000
and actually fight for the championship.

26:12.000 --> 26:14.000
They didn't win it, right?

26:14.000 --> 26:16.000
Max actually finished third in this championship,

26:16.000 --> 26:18.000
but he won the next year.

26:18.000 --> 26:21.000
But this was the road that they needed to go down to actually...

26:21.000 --> 26:23.000
to actually get there.

26:23.000 --> 26:25.000
So, just a few links.

26:25.000 --> 26:28.000
So, the first one is for this actual incident.

26:28.000 --> 26:31.000
It's roughly eight minutes, so it's a fast watch.

26:31.000 --> 26:33.000
So, if you guys want to follow the whole process

26:33.000 --> 26:34.000
and what was happening.

26:34.000 --> 26:36.000
So, if you want more information on this...

26:36.000 --> 26:39.000
Grand Prix itself, it's the second link.

26:39.000 --> 26:43.000
And the last one is the link for the Mercedes-AMG

26:43.000 --> 26:44.000
YouTube channel.

26:44.000 --> 26:46.000
They have a lot of post-incident reviews

26:46.000 --> 26:48.000
where they go through things like,

26:48.000 --> 26:50.000
We made this decision, but we messed up.

26:50.000 --> 26:54.000
And if you search the internet a little bit, you will see some...

26:54.000 --> 26:57.000
So, they don't show the incident review from each team,

26:57.000 --> 26:59.000
because they will be discussing confidential things,

26:59.000 --> 27:01.000
but there are some videos that are zoomed out,

27:01.000 --> 27:03.000
and you can only see, and it's intense.

27:03.000 --> 27:06.000
Like, they go through huge scrutiny

27:06.000 --> 27:10.000
to actually understand why did we mess up on this race.

27:10.000 --> 27:11.000
Was it a problem with the drivers?

27:11.000 --> 27:13.000
Did we mess up the strategy?

27:13.000 --> 27:16.000
Was it a problem with some part of the car?

27:16.000 --> 27:18.000
They go into extreme detail.

27:18.000 --> 27:21.000
Mercedes has this habit of actually explaining some of those decisions.

27:21.000 --> 27:23.000
Like, we messed up on the strategy, right?

27:23.000 --> 27:27.000
We made a pit stop too early or too late, and they go through it, okay?

27:27.000 --> 27:30.000
This is why we did that, and that actually didn't work.

27:30.000 --> 27:32.000
So, it's actually a good learning experience

27:32.000 --> 27:35.000
where they not only do it internally, but they actually do it

27:35.000 --> 27:37.000
with the whole public, as I was saying.

27:37.000 --> 27:39.000
Guys, we messed up.

27:39.000 --> 27:40.000
This was our reasoning.

27:40.000 --> 27:41.000
This didn't work.

27:41.000 --> 27:43.000
This is how things work.

27:43.000 --> 27:45.000
And this is all from my part.

27:45.000 --> 27:46.000
Thank you very much for being here.

27:46.000 --> 27:48.000
And if you have any questions,

27:48.000 --> 27:49.000
hello, hello.

27:49.000 --> 27:51.000
Is it open?

27:51.000 --> 27:52.000
Yeah.

27:52.000 --> 27:55.000
Can everyone hear me in the back?

27:55.000 --> 27:57.000
It's not all over the country.

27:57.000 --> 27:58.000
Yeah, yeah, yeah.

27:58.000 --> 28:00.000
It's not all over the country.

28:12.000 --> 28:15.000
So, in terms of communication,

28:15.000 --> 28:18.000
how do you, like, when a SEV1 or a SEV0

28:18.000 --> 28:20.000
incident happens?

28:20.000 --> 28:22.000
How do you, you know, notify people?

28:22.000 --> 28:23.000
Like, is it an alert?

28:23.000 --> 28:24.000
Is it you?

28:24.000 --> 28:26.000
Notifying people in the channel?

28:26.000 --> 28:27.000
Yeah.

28:27.000 --> 28:28.000
Yeah.

28:28.000 --> 28:29.000
Yeah.

28:29.000 --> 28:30.000
Oh, a SEV1.

28:30.000 --> 28:32.000
So, the question is, if you have a SEV1

28:32.000 --> 28:34.000
incident, how do you alert people?

28:34.000 --> 28:35.000
Yeah.

28:35.000 --> 28:36.000
Yeah.

28:36.000 --> 28:39.000
So, you want it as automated as possible, right?

28:39.000 --> 28:41.000
But it won't happen every time.

28:41.000 --> 28:44.000
So, what you want to do,

28:44.000 --> 28:46.000
is, over time, start with a problem.

28:46.000 --> 28:48.000
Maybe you have an incident, and you didn't have any alert, right?

28:48.000 --> 28:50.000
Maybe a customer told you, maybe you,

28:50.000 --> 28:52.000
some VP who was actually browsing your site

28:52.000 --> 28:53.000
found it out, right?

28:53.000 --> 28:55.000
And that's the worst, right?

28:55.000 --> 28:57.000
So, what you want to do over time, you want to make that,

28:57.000 --> 28:59.000
please leave quietly.

28:59.000 --> 29:01.000
What you want to do is to make that as automated as possible.

29:01.000 --> 29:02.000
Start from a cohort, right?

29:02.000 --> 29:05.000
So, maybe you found out that a certain cohort of customers

29:05.000 --> 29:07.000
that do some predefined action in some time,

29:07.000 --> 29:08.000
that fails, okay?

29:08.000 --> 29:09.000
Can I actually replicate it?

29:09.000 --> 29:12.000
So, you add alerting on top of the things that you already have,

29:12.000 --> 29:14.000
and something that me and my team

29:14.000 --> 29:17.000
have been working on a lot is synthetic monitoring, right?

29:17.000 --> 29:19.000
So, trying to simulate users all the time, right?

29:19.000 --> 29:24.000
Making that explicitly clear and trying to pick up things as soon as possible.

29:24.000 --> 29:25.000
All right.

29:25.000 --> 29:26.000
Thank you.

29:26.000 --> 29:29.000
Any more questions?

29:29.000 --> 29:31.000
What?

29:31.000 --> 29:35.000
Please, again, leave quietly,

29:35.000 --> 29:40.000
if you have to leave in the library.

29:40.000 --> 29:41.000
I don't know.

29:41.000 --> 29:42.000
I'll let it out, okay?

29:42.000 --> 29:43.000
All right.

29:43.000 --> 29:46.000
So, in the stay calm phase,

29:46.000 --> 29:49.000
you said that there should always be procedures

29:49.000 --> 29:52.000
for, I would say, most emergencies,

29:52.000 --> 29:54.000
but for example, in our environments,

29:54.000 --> 29:58.000
it could be that there can be different things

29:58.000 --> 30:00.000
that we haven't even thought about.

30:00.000 --> 30:03.000
How should we be able to stay calm,

30:03.000 --> 30:05.000
if we don't have procedures on that?

30:05.000 --> 30:06.000
Yeah.

30:06.000 --> 30:09.000
And how to prepare for things like that?

30:09.000 --> 30:10.000
Yeah.

30:10.000 --> 30:11.000
Yeah.

30:11.000 --> 30:15.000
So, the question was,

30:15.000 --> 30:20.000
if you have to leave, please do so quietly.

30:20.000 --> 30:25.000
So, the question was about processes and having processes

30:25.000 --> 30:27.000
and on the stay calm phase,

30:27.000 --> 30:30.000
what if something happens that I've never seen before?

30:30.000 --> 30:32.000
Like, I've never prepared before, right?

30:32.000 --> 30:34.000
So, how do you deal with that, right?

30:34.000 --> 30:37.000
Again, what our teams usually do,

30:38.000 --> 30:41.000
they do a lot of game days and chaos testing, right?

30:41.000 --> 30:43.000
So, we're basically putting our teams

30:43.000 --> 30:46.000
not under stress, but regularly saying,

30:46.000 --> 30:48.000
here's a new thing, right?

30:48.000 --> 30:52.000
So, here's something that they're used to handling.

30:52.000 --> 30:57.000
They're not used to handling on a Tuesday at 5 PM, right?

30:57.000 --> 30:59.000
So, we're basically keeping that,

30:59.000 --> 31:01.000
so it's all about one,

31:01.000 --> 31:04.000
introducing things that will likely happen,

31:04.000 --> 31:06.000
so they are getting accustomed,

31:06.000 --> 31:08.000
so it's about just introducing things,

31:08.000 --> 31:10.000
just introducing stuff, right?

31:10.000 --> 31:12.000
So, when something new happens,

31:12.000 --> 31:14.000
that they've never seen before, they're like,

31:14.000 --> 31:17.000
okay, it's just another day at the office, right?

31:17.000 --> 31:18.000
So, then we'll see.

31:18.000 --> 31:19.000
And only it's like,

31:19.000 --> 31:21.000
then using the post-mortem to say, okay,

31:21.000 --> 31:23.000
we've never seen this before,

31:23.000 --> 31:25.000
what can we do to not have it,

31:25.000 --> 31:26.000
or to make it easier?

31:26.000 --> 31:28.000
Do we need better observability?

31:28.000 --> 31:30.000
Do we actually need, maybe it's the process,

31:30.000 --> 31:31.000
we need to refine the process,

31:31.000 --> 31:32.000
maybe it's communication,

31:32.000 --> 31:33.000
maybe we need to train for this,

31:33.000 --> 31:35.000
but even if we have all the automation,

31:35.000 --> 31:36.000
all the scripts, all the buttons,

31:36.000 --> 31:38.000
whatever, it's still hard,

31:38.000 --> 31:39.000
but if we train this,

31:39.000 --> 31:41.000
like, let's on a Tuesday,

31:41.000 --> 31:43.000
let's break the system and try to fix this

31:43.000 --> 31:45.000
on the fly, on purpose, right?

31:45.000 --> 31:48.000
So, it's all about that thing where it will happen,

31:48.000 --> 31:49.000
right?

31:49.000 --> 31:51.000
It will happen, something that you never prepared for,

31:51.000 --> 31:52.000
it's the nature of business,

31:52.000 --> 31:56.000
it's keeping that muscle alive of continuously

31:56.000 --> 31:58.000
breaking stuff and fixing it,

31:58.000 --> 31:59.000
and making it like,

31:59.000 --> 32:01.000
it's just another day at the office.

32:02.000 --> 32:04.000
Any more questions?

32:04.000 --> 32:05.000
Yep.

32:13.000 --> 32:16.000
Have you had challenges at your current employer

32:16.000 --> 32:18.000
of convincing management to,

32:18.000 --> 32:20.000
carve out specific time to

32:20.000 --> 32:23.000
practice these chaos engineering type things?

32:23.000 --> 32:24.000
Absolutely.

32:24.000 --> 32:26.000
How did you navigate them?

32:26.000 --> 32:27.000
Yeah.

32:27.000 --> 32:29.000
So, one of the challenges that you have specifically

32:29.000 --> 32:31.000
when you're talking about product companies

32:31.000 --> 32:33.000
that work with end users, right?

32:33.000 --> 32:35.000
So, the first time that you talk to someone

32:35.000 --> 32:36.000
higher up saying,

32:36.000 --> 32:38.000
I'm going to break my system on purpose

32:38.000 --> 32:39.000
and they're like,

32:39.000 --> 32:40.000
no, you're not.

32:40.000 --> 32:42.000
So, you are not doing this.

32:42.000 --> 32:44.000
So, the, again,

32:44.000 --> 32:45.000
being completely transparent,

32:45.000 --> 32:47.000
the best strategy that we found

32:47.000 --> 32:50.000
is to actually somehow correlate this

32:50.000 --> 32:52.000
to a number.

32:52.000 --> 32:53.000
It could be a metric,

32:53.000 --> 32:55.000
it could be a dollar value, right?

32:55.000 --> 32:57.000
So, one of the practices that we use

32:58.000 --> 33:01.000
to make it crystal clear that chaos engineering is important,

33:01.000 --> 33:02.000
is saying, okay,

33:02.000 --> 33:05.000
let's average out all the SEV1s that we had

33:05.000 --> 33:07.000
and let's make a wild-guess estimate:

33:07.000 --> 33:09.000
how much did this cost, right?

33:09.000 --> 33:10.000
And it cost us,

33:10.000 --> 33:12.000
again, in our context,

33:12.000 --> 33:15.000
a minute of downtime depending on the incident

33:15.000 --> 33:17.000
and on the time of day,

33:17.000 --> 33:19.000
could be costing the company millions, right?

33:19.000 --> 33:21.000
So, millions of dollars.

33:21.000 --> 33:23.000
So, if we somehow say, okay,

33:23.000 --> 33:24.000
we can train the team better

33:24.000 --> 33:26.000
or we can invest in this tool, right?

33:26.000 --> 33:29.000
And instead of taking us two hours,

33:29.000 --> 33:30.000
it will take us 10 minutes.

33:30.000 --> 33:32.000
It makes the conversation much easier.

33:32.000 --> 33:34.000
There's a lot of upfront work

33:34.000 --> 33:36.000
that we'll have to do to actually

33:36.000 --> 33:38.000
analyze data,

33:38.000 --> 33:40.000
giving those higher number values.

33:40.000 --> 33:42.000
But it's making them understand the value,

33:42.000 --> 33:46.000
especially for product and customer-facing teams.

33:46.000 --> 33:48.000
And the best thing that we've used

33:48.000 --> 33:50.000
with success with executives

33:50.000 --> 33:51.000
is putting a dollar amount.

33:51.000 --> 33:52.000
It's money.

33:52.000 --> 33:54.000
At the end of the day, for a product company,

33:54.000 --> 33:55.000
it will be money.

33:55.000 --> 33:57.000
And if you can do that correlation,

33:57.000 --> 33:59.000
people say, okay, I got this, right?

33:59.000 --> 34:01.000
I have no idea what you're talking about,

34:01.000 --> 34:02.000
OpenTelemetry,

34:02.000 --> 34:04.000
but if you're telling me that instead of taking two hours,

34:04.000 --> 34:06.000
you can do it in six to ten minutes,

34:06.000 --> 34:07.000
go for it.

34:07.000 --> 34:08.000
And it will cost you somehow,

34:08.000 --> 34:09.000
I don't know, two million dollars to do.

34:09.000 --> 34:10.000
Yes.

34:10.000 --> 34:11.000
If I have a 10-minute outage,

34:11.000 --> 34:13.000
it's costing me 20 million,

34:13.000 --> 34:14.000
no brainer, right?

34:14.000 --> 34:16.000
So, it's somehow getting to the point

34:16.000 --> 34:18.000
where you're talking a language

34:18.000 --> 34:19.000
that an executive

34:19.000 --> 34:21.000
understands, and it's usually about money.

34:21.000 --> 34:22.000
At the end of the day.

34:23.000 --> 34:24.000
Anyways.

34:24.000 --> 34:25.000
I think we have one question.

34:25.000 --> 34:27.000
No, no, no, no, no, no, no, no, no, no, no.

