WEBVTT

00:00.000 --> 00:14.040
So, for the next talk, let me introduce Victor Luboslowski with a talk about an endpoint telemetry

00:14.040 --> 00:16.000
blueprint for security teams.

00:16.000 --> 00:18.000
All right.

00:18.000 --> 00:23.000
Thank you.

00:23.000 --> 00:27.000
Welcome to my talk on endpoint telemetry.

00:27.000 --> 00:29.000
Like I mentioned, my name is Victor Luboslowski.

00:29.000 --> 00:32.800
I've been in tech for over 25 years and I'm currently a principal software engineer at

00:32.800 --> 00:33.800
Fleet Device Management.

00:33.800 --> 00:38.600
I'm also the tech lead for our security and compliance product team.

00:38.600 --> 00:42.600
Fleet makes endpoint telemetry for corporate security teams and device management

00:42.600 --> 00:45.000
for corporate IT teams.

00:45.000 --> 00:49.400
So, I need to start with a confession: until recently, I wasn't a security person.

00:49.400 --> 00:55.000
I'm a software engineer, I like building things and for a long time security felt intimidating.

00:55.000 --> 01:01.600
It felt complex, hard to test, and constantly in the way of getting real work done.

01:01.600 --> 01:06.000
And every time I tried to learn it, someone tried to sell me something.

01:06.000 --> 01:08.000
I just want life to be simple.

01:08.000 --> 01:13.000
I want to write code, I want to ship features, I want to understand the systems I'm responsible

01:13.000 --> 01:19.000
for without dragging around a pile of security baggage I can't fully grasp.

01:19.000 --> 01:24.000
So when I became the tech lead for a security product team, I realized something revealing.

01:24.000 --> 01:28.000
If security feels like this to me, it probably feels like this to a lot of engineers.

01:28.000 --> 01:31.000
So the question for me wasn't how to add more security.

01:31.000 --> 01:37.000
It was how to make security something engineers can actually inspect and understand.

01:37.000 --> 01:43.000
One of the core problems in endpoint security today is a mismatch between responsibility and ownership.

01:43.000 --> 01:47.000
Engineering and security teams are responsible for incidents.

01:47.000 --> 01:52.000
They're the ones on call, they're the ones doing post-mortems, but they don't own the telemetry.

01:52.000 --> 01:59.000
They don't control what's collected, they don't control the schema, they don't control retention, replay, or how detections are expressed.

01:59.000 --> 02:05.000
They're accountable for outcomes, but often dependent on black boxes for evidence.

02:05.000 --> 02:13.000
Now, in open source, we're used to owning our systems end to end. If something breaks, we can inspect it.

02:13.000 --> 02:18.000
If we don't like the defaults, we can change them.

02:18.000 --> 02:24.000
If we want to experiment, we can, but in security, we've somehow accepted the opposite.

02:24.000 --> 02:30.000
So when we talk about endpoint telemetry, we're talking about what machines already know about themselves.

02:30.000 --> 02:36.000
Machines such as laptops, servers, point-of-sale terminals in a store, and many others.

02:36.000 --> 02:44.000
Endpoints generate some of the richest signals we have: processes, files, network connections, startup behavior, et cetera.

02:45.000 --> 02:57.000
And yet teams often only see a filtered summary of the activity.

02:57.000 --> 03:04.000
The raw data exists, the endpoints know it, but the engineers responsible for security don't control that data.

03:04.000 --> 03:08.000
What's collected, how it's represented, or how long it's kept.

03:09.000 --> 03:16.000
That mismatch changes when endpoints are treated as systems we can interrogate, not opaque components controlled by a vendor.

03:16.000 --> 03:22.000
And once you do that, you need a way to manage those questions at scale: safely, consistently, and transparently.

03:22.000 --> 03:31.000
So let's take a look at what endpoint telemetry looks like when ownership is put back in the hands of the security team that's responsible for it.

03:31.000 --> 03:36.000
If you're responsible for security telemetry, you're really responsible for a system.

03:36.000 --> 03:41.000
And just like any system, it's easier to reason about when responsibilities are clearly separated.

03:41.000 --> 03:48.000
The goal of this blueprint is simple: make each concern independent enough that you can change it without breaking everything else.

03:48.000 --> 03:52.000
Each layer should be swappable without collapsing the system.

03:52.000 --> 03:56.000
So everything in the system starts at the endpoints.

03:56.000 --> 04:00.000
They're not a layer on their own; they're the execution surface for everything that follows.

04:00.000 --> 04:03.000
From there, endpoint telemetry breaks into four concerns.

04:04.000 --> 04:07.000
The first layer is control; this is about intent.

04:07.000 --> 04:12.000
What questions are we asking endpoints and how do we change those questions safely and consistently?

04:12.000 --> 04:15.000
The second layer is ingestion streaming and storage.

04:15.000 --> 04:21.000
Once data leaves the endpoint, this layer is responsible for moving and retaining it reliably.

04:21.000 --> 04:27.000
The third layer is detection; this is where we decide what matters, turning raw events into signals.

04:27.000 --> 04:32.000
And the final layer is correlation, intelligence, and response.

04:32.000 --> 04:36.000
This is where humans and automation investigate, visualize, and act.

04:36.000 --> 04:42.000
Each layer has a responsibility and the goal is that you can change one layer without breaking the rest.

04:46.000 --> 04:51.000
The real value here isn't the layers themselves, it's the boundaries between them.

04:51.000 --> 04:59.000
Each layer has a clear contract: it defines what it consumes, what it produces, and, just as importantly, what it does not care about.

04:59.000 --> 05:08.000
For example, the ingestion layer doesn't need to know which detections exist; its job is to move data reliably, and it shouldn't change when detection logic changes.

05:08.000 --> 05:16.000
And the detection layer doesn't care how data was transported; it shouldn't need to know whether the events came from a buffer, a stream, or a file.

05:16.000 --> 05:20.000
When those boundaries are respected, each layer can evolve independently.

05:20.000 --> 05:24.000
That's what gives you the flexibility without chaos.

05:24.000 --> 05:26.000
This is a very open source way of building systems.

05:26.000 --> 05:31.000
Clear interfaces, explicit responsibilities, no hidden coupling.

05:31.000 --> 05:33.000
This is how we restore ownership.

05:33.000 --> 05:42.000
When responsibilities are clear and boundaries are explicit, teams can own their telemetry end to end without needing to own every implementation detail.

05:42.000 --> 05:47.000
Now let's walk through these layers, one by one, starting with the endpoint agent.

05:47.000 --> 05:55.000
For the endpoint agent, we're using osquery. osquery is an open source project that's been around for over a decade and is part of the Linux Foundation.

05:55.000 --> 06:01.000
It runs a lightweight agent on laptops and servers across macOS, Linux and Windows platforms.

06:01.000 --> 06:07.000
At a high level, osquery lets you interrogate endpoints using structured questions.

06:07.000 --> 06:12.000
osquery makes endpoint data accessible without forcing you to write custom programs.

06:12.000 --> 06:18.000
You don't need different commands, formats, or OS-specific knowledge to answer basic questions.

06:18.000 --> 06:26.000
System state is expressed through a single SQL interface, using concepts engineers already know, like tables and schemas.

06:26.000 --> 06:34.000
If you can write a basic query, you can start asking meaningful questions about endpoints. That lowers the barrier without dumbing anything down.
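
For example, a basic question any engineer can ask with osquery's `users` table:

```sql
-- Which local user accounts exist, and what shell do they use?
SELECT username, uid, shell FROM users;
```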

06:34.000 --> 06:38.000
What makes this model work is that everything shows up in the same shape.

06:38.000 --> 06:47.000
Data from files, operating system APIs, system services, and event sources is normalized. Because the data is structured and consistent, you can combine it.

06:47.000 --> 06:54.000
That's what turns endpoint telemetry from scattered signals into something you can actually explain to other people.

06:54.000 --> 06:57.000
So this is a real osquery query.

06:57.000 --> 07:00.000
At a high level, this is asking a simple question.

07:00.000 --> 07:04.000
Show me SSH processes that are listening on a non-standard port.
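
The exact query from the slide isn't captured in the transcript, but a query in that spirit, using osquery's `processes` and `listening_ports` tables, might look like:

```sql
-- Join process info with listening sockets and filter for sshd
-- bound to something other than port 22.
SELECT p.pid, p.name, p.path, lp.address, lp.port
FROM processes p
JOIN listening_ports lp ON p.pid = lp.pid
WHERE p.name = 'sshd' AND lp.port != 22;
```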

07:04.000 --> 07:09.000
The key point here isn't the syntax, it's what it represents.

07:09.000 --> 07:15.000
Process information and network information are already normalized, so they can be combined directly.

07:16.000 --> 07:20.000
osquery has around 300 tables in the core project alone.

07:20.000 --> 07:27.000
They cover things like processes, users, startup behavior, files, network connections, packages, hardware, and system configuration.

07:27.000 --> 07:35.000
And the model is extensible: people can create their own tables to expose application-specific or custom system data.

07:35.000 --> 07:41.000
That's what makes osquery a foundation you can build on, not a fixed set of signals.

07:41.000 --> 07:46.000
So in this blueprint, osquery is an agent that sends its data to the control layer.

07:46.000 --> 07:48.000
It focuses on observation.

07:48.000 --> 07:55.000
osquery doesn't dictate how data is moved, stored, or analyzed; it just makes high-fidelity endpoint data available.

07:55.000 --> 07:59.000
We chose osquery because it aligns with open source values.

07:59.000 --> 08:03.000
It's transparent, it's inspectable, its schemas are open.

08:03.000 --> 08:08.000
If something doesn't behave the way you expect, you can see why and change how you use it.

08:08.000 --> 08:12.000
That gives teams real ownership over their telemetry.

08:12.000 --> 08:17.000
Now there are other ways to collect data from endpoints without a unified agent model.

08:17.000 --> 08:21.000
Some teams rely on scripts and scheduled commands.

08:21.000 --> 08:25.000
Others use log shippers like Fluent Bit or Filebeat.

08:25.000 --> 08:30.000
And at the lowest level, platforms expose system-specific APIs.

08:30.000 --> 08:33.000
Now all of these approaches can work.

08:33.000 --> 08:36.000
They just shift the complexity onto the team.

08:36.000 --> 08:38.000
osquery sits above all of this.

08:38.000 --> 08:48.000
It integrates these data sources and exposes a consistent, queryable view of system state across platforms using one interface.
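
For example, one query can combine two different sources, file metadata from the filesystem and hashes computed on demand, using osquery's `file` and `hash` tables:

```sql
-- One interface over multiple sources: filesystem metadata joined
-- with on-demand hashing in a single query.
SELECT f.path, f.size, h.sha256
FROM file f
JOIN hash h USING (path)
WHERE f.path LIKE '/etc/%';
```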

08:48.000 --> 08:51.000
osquery answers one important question.

08:51.000 --> 08:55.000
How do we observe endpoints in a way engineers can understand and trust?

08:55.000 --> 08:57.000
The next question is just as important.

08:57.000 --> 09:01.000
How do we manage that safely and consistently at scale?

09:01.000 --> 09:04.000
That's where a control layer comes in.

09:04.000 --> 09:06.000
The control layer is about intent.

09:06.000 --> 09:10.000
Once endpoints are observable, the next problem is coordination.

09:10.000 --> 09:18.000
What questions should endpoints answer, when should they answer them, and how do we change that safely across thousands of machines?

09:18.000 --> 09:26.000
The control layer turns endpoints into high-fidelity, queryable sensors without treating each endpoint as a snowflake.

09:26.000 --> 09:29.000
The control layer is responsible for three things.

09:29.000 --> 09:31.000
First, central configuration.

09:32.000 --> 09:36.000
Query schedules and settings are defined once and applied consistently.

09:36.000 --> 09:38.000
Second, live interaction.

09:38.000 --> 09:42.000
During an incident, you can ask unique questions and get answers immediately.

09:42.000 --> 09:44.000
And third, consistency.

09:44.000 --> 09:48.000
The same questions, the same schemas across all operating systems.

09:48.000 --> 09:54.000
This is what turns endpoint telemetry from an ad hoc collection into a system you can rely on.
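
For instance, this runs unchanged on macOS, Linux, and Windows, using osquery's `os_version` table:

```sql
-- Same question, same schema, on every operating system.
SELECT name, version, platform FROM os_version;
```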

09:54.000 --> 09:58.000
So in this blueprint, Fleet plays the role of the control plane.

09:58.000 --> 10:01.000
Fleet is primarily MIT-licensed open source.

10:01.000 --> 10:07.000
The repo also maintains a small set of advanced features in a clearly separated directory under a different license.

10:07.000 --> 10:11.000
But everything we cover here is free and MIT-licensed.

10:11.000 --> 10:14.000
Fleet was built to manage osquery at scale.

10:14.000 --> 10:17.000
It works with standard, vanilla osquery.

10:17.000 --> 10:20.000
Fleet also provides its own agent that wraps osquery.

10:20.000 --> 10:27.000
That agent handles lifecycle management and adds capabilities like additional tables, software installs, and running scripts

10:27.000 --> 10:29.000
on endpoints.

10:29.000 --> 10:35.000
Fleet manages configuration and scheduled queries, and enables safe live queries at scale.

10:35.000 --> 10:40.000
All while keeping control separate from ingestion, detection, and response.

10:40.000 --> 10:44.000
This is a screenshot of the top-level dashboard of the Fleet UI.

10:44.000 --> 10:47.000
Without a control layer like this, endpoint telemetry tends to drift.

10:47.000 --> 10:51.000
Different machines run different queries, configurations diverge.

10:51.000 --> 10:53.000
Incident response becomes guesswork.

10:53.000 --> 10:56.000
With the control layer intent is explicit.

10:56.000 --> 10:59.000
You can see what's being asked, on which endpoints, why it's being asked,

10:59.000 --> 11:04.000
change it deliberately, and see an activity audit of who changed what.

11:04.000 --> 11:09.000
And all of this can be managed via GitOps, with configuration stored in version control.

11:09.000 --> 11:14.000
That's critical when security teams are accountable for outcomes.

11:14.000 --> 11:17.000
There are other ways to manage endpoint agents.

11:17.000 --> 11:24.000
Some teams use platforms like Wazuh, which combine control, collection, and detection into a single system.

11:24.000 --> 11:28.000
That can work well for small fleets, but it tightly couples concerns.

11:28.000 --> 11:32.000
In this blueprint, we're intentionally keeping control focused on coordination,

11:32.000 --> 11:37.000
so the rest of the system stays swappable.

11:37.000 --> 11:42.000
Evented osquery tables give you near real-time signals, but only if you don't drop them.

11:42.000 --> 11:47.000
That means the control layer has to work hand in hand with reliable ingestion and buffering.

11:47.000 --> 11:51.000
Control defines intent, the next layer makes sure nothing gets lost.

11:51.000 --> 11:56.000
So once endpoints answer questions, that data has to leave the endpoint safely.

11:56.000 --> 12:02.000
In this blueprint, endpoints send telemetry to Fleet, our control layer, first; from there, data is forwarded onward.

12:02.000 --> 12:09.000
The job of the ingestion layer is to take that data that's already been collected and move it reliably to downstream systems.

12:09.000 --> 12:14.000
This is where buffering, back pressure and durability are primarily handled.

12:14.000 --> 12:18.000
Not on endpoints, not in detection logic.

12:18.000 --> 12:21.000
In this blueprint, ingestion is handled by Vector.

12:21.000 --> 12:27.000
The control layer, Fleet, forwards telemetry using webhooks or by writing logs to files.

12:27.000 --> 12:31.000
Vector then picks that data up and takes responsibility for moving it downstream.

12:31.000 --> 12:34.000
Vector is designed to sit at this boundary.

12:34.000 --> 12:38.000
It handles back pressure explicitly, so slow consumers don't cause data loss.

12:38.000 --> 12:43.000
It supports structured transforms, which lets you normalize or enrich data before analysis.

12:43.000 --> 12:47.000
And its configuration model is friendly to security teams.

12:47.000 --> 12:51.000
Auditable, versionable, and predictable under change.

12:51.000 --> 12:56.000
There are other open-source options for this layer.

12:56.000 --> 13:04.000
Some teams use the previously mentioned Fluent Bit, which is lightweight and widely deployed.

13:04.000 --> 13:08.000
Others use Logstash, which is powerful but heavier to operate.

13:08.000 --> 13:12.000
These tools can work well; what matters isn't the specific choice.

13:12.000 --> 13:18.000
It's keeping ingestion focused on reliability and structure, not detection and response.

13:18.000 --> 13:26.000
So once ingestion is in place, we've solved the first hard problem: getting endpoint data off the endpoint reliably.

13:26.000 --> 13:29.000
At that point, streaming becomes an option.

13:30.000 --> 13:38.000
Streaming lets you decouple producers from consumers, replay historical data, and rerun analysis as detection logic evolves.

13:38.000 --> 13:42.000
Not every system needs streaming, so this is an optional step.

13:42.000 --> 13:46.000
But when you do, it should come after ingestion, not instead of it.

13:46.000 --> 13:49.000
This is where streaming fits into the blueprint.

13:49.000 --> 13:52.000
So in this blueprint, streaming is handled by Apache Kafka.

13:52.000 --> 13:54.000
Kafka sits after ingestion.

13:54.000 --> 13:58.000
It provides durable streams that multiple consumers can read independently.

13:58.000 --> 14:04.000
Detections, analytics, and experimentation can evolve without changing how data is collected.

14:04.000 --> 14:10.000
Like I mentioned, not every telemetry pipeline needs streaming. Streaming makes sense when detections change frequently,

14:10.000 --> 14:16.000
when you want to replay historical data, when multiple consumers need the same events,

14:16.000 --> 14:19.000
or experimentation is expected, not exceptional.

14:19.000 --> 14:24.000
If none of these apply, you may not need this layer yet, and that's fine.

14:24.000 --> 14:28.000
There's one anti-pattern I want to mention here, and that's skipping the ingestion layer.

14:28.000 --> 14:30.000
This is where things often go wrong.

14:30.000 --> 14:33.000
Sending endpoint data directly into Kafka is leaky.

14:33.000 --> 14:38.000
You push back pressure onto the control layer, which propagates it to the endpoint.

14:38.000 --> 14:43.000
You lose control over buffering; schema changes ripple outward and break consumers.

14:43.000 --> 14:47.000
Kafka is a great streaming system, but it should not be your ingestion layer.

14:47.000 --> 14:52.000
In this blueprint, ingestion absorbs the mess and streaming preserves flexibility.

14:52.000 --> 15:06.000
Once data is reliably ingested and streamed, it has to land somewhere.

15:06.000 --> 15:11.000
The storage layer is where telemetry becomes durable, queryable, and useful over time.

15:11.000 --> 15:17.000
This is where you support investigations after the fact, historical analysis, and new questions applied to old data.

15:17.000 --> 15:25.000
Storage isn't just about keeping data, it's about being able to ask hard questions later and get answers quickly.

15:25.000 --> 15:31.000
In this blueprint, the storage layer is built on ClickHouse. ClickHouse is designed for high-volume analytical workloads.

15:31.000 --> 15:39.000
It's fast for the kinds of questions security teams usually ask: filtering, aggregating, and slicing large amounts of event data.

15:39.000 --> 15:45.000
Because it's column oriented, you can query large data sets efficiently without scanning everything.

15:46.000 --> 15:49.000
Most importantly, ClickHouse gives you direct access to your data.

15:49.000 --> 15:53.000
No proprietary schemas, no hidden query engines.
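
As a hedged illustration of that kind of question, assuming an illustrative `process_events` table (not a fixed schema from the talk):

```sql
-- Hunting across history: which binaries ran on the most hosts
-- in the last 30 days?
SELECT name, count() AS executions, uniqExact(hostname) AS hosts
FROM process_events
WHERE time >= now() - INTERVAL 30 DAY
GROUP BY name
ORDER BY hosts DESC
LIMIT 20;
```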

15:53.000 --> 15:56.000
There are other open source ways to store telemetry.

15:56.000 --> 16:00.000
Some teams use search-based systems like OpenSearch or Elasticsearch.

16:00.000 --> 16:04.000
These are great for text search, but can get expensive or slow at scale.

16:04.000 --> 16:10.000
Others store data in object storage and query it using table formats like Apache Iceberg.

16:10.000 --> 16:14.000
That works well for batch analytics, but is slower for interactive investigation.

16:14.000 --> 16:18.000
These approaches can all work, they just optimize for different access patterns.

16:18.000 --> 16:23.000
The key is choosing storage that matches how you actually investigate incidents.

16:23.000 --> 16:29.000
So at this point, we've collected the data, moved it reliably, and stored it in a way that's fast and accessible.

16:29.000 --> 16:37.000
The remaining question is no longer about infrastructure, it's about meaning.

16:37.000 --> 16:42.000
How do we turn all this data into signals that actually matter?

16:42.000 --> 16:47.000
How do we express detections in a way that's understandable, testable and reusable?

16:47.000 --> 16:50.000
That's where detection comes in.

16:50.000 --> 16:53.000
For detection, this blueprint uses Sigma.

16:53.000 --> 16:56.000
Sigma is a rule format, not a detection engine.

16:56.000 --> 17:02.000
It lets you describe suspicious behavior in a portable, readable way, independent of storage, transport,

17:02.000 --> 17:03.000
or vendor.

17:03.000 --> 17:08.000
That portability is important: it means your detection logic isn't locked to one backend

17:08.000 --> 17:11.000
and it can evolve as the rest of the system changes.

17:12.000 --> 17:15.000
So in this blueprint, Sigma is used to author detection logic.

17:15.000 --> 17:21.000
Those rules are then translated into native SQL and executed directly against ClickHouse.

17:21.000 --> 17:24.000
But there's currently a gap in the ecosystem.

17:24.000 --> 17:28.000
There's no official backend doing that translation to ClickHouse for you.

17:28.000 --> 17:32.000
Although there is a SQLite backend that can be used as a starting point.

17:32.000 --> 17:37.000
So in practice, teams usually build their own tooling here as they grow and mature.

17:37.000 --> 17:42.000
Sigma defines what we're looking for, SQL defines how we execute it.

17:42.000 --> 17:45.000
This is a concrete example of a translation.

17:45.000 --> 17:49.000
On the left is the Sigma rule expressing suspicious behavior.

17:49.000 --> 17:54.000
And on the right is the equivalent ClickHouse query that executes it as an alert.
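
For a flavor of what that pairing can look like (a sketch only; the rule and schema here are illustrative, not the ones on the slide):

```sql
-- Illustrative Sigma rule (YAML), reproduced as comments:
--   title: SSH daemon listening on a non-standard port
--   logsource:
--     product: osquery
--   detection:
--     selection:
--       name: sshd
--     filter:
--       port: 22
--     condition: selection and not filter
--
-- A hand-translated ClickHouse equivalent, assuming an illustrative
-- `process_events` table:
SELECT hostname, name, port, time
FROM process_events
WHERE name = 'sshd' AND port != 22;
```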

17:54.000 --> 17:58.000
This is where intent becomes execution, explicitly and visibly.

17:58.000 --> 18:02.000
You can see exactly how detection is defined and then how it runs.

18:02.000 --> 18:07.000
And because detections run as scheduled SQL queries, they're also deterministic.

18:07.000 --> 18:09.000
The logic is inspectable.

18:09.000 --> 18:13.000
There's no scoring or heuristics or vendor magic.

18:13.000 --> 18:15.000
You can test detections.

18:15.000 --> 18:16.000
You can version them.

18:16.000 --> 18:18.000
You can rerun them against historical data.

18:18.000 --> 18:22.000
That's what makes detection something engineers can reason about.
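
To make the replay point concrete, here's a hedged sketch: the same detection SQL, pointed at a historical window (again assuming the illustrative `process_events` table):

```sql
-- Re-running unchanged detection logic over the last 90 days.
SELECT hostname, name, port, time
FROM process_events
WHERE name = 'sshd' AND port != 22
  AND time >= now() - INTERVAL 90 DAY;
```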

18:26.000 --> 18:29.000
There are other ways to express detections.

18:29.000 --> 18:31.000
You can write raw SQL directly.

18:31.000 --> 18:34.000
That's powerful, but often less portable and harder to standardize.

18:34.000 --> 18:39.000
You can use stream processing systems like Apache Flink or ksqlDB.

18:39.000 --> 18:43.000
Those can work well for real-time use cases, but are more complex to operate.

18:43.000 --> 18:48.000
You can rely on osquery-only detections, but you lose hindsight and replay.

18:48.000 --> 18:52.000
Or you can use SIEM-style systems like OpenSearch with built-in analytics.

18:52.000 --> 18:56.000
Those can work, but often hide logic behind abstractions.

18:56.000 --> 18:58.000
Each option has its trade-offs.

18:58.000 --> 19:01.000
The goal here isn't to pick the best detection system.

19:01.000 --> 19:08.000
It's to make detection logic portable, understandable, testable, and owned by the team that's responsible for outcomes.

19:08.000 --> 19:12.000
Sigma gives you a common language. ClickHouse gives you scale and replay.

19:12.000 --> 19:17.000
Together, they let detections evolve without rewriting the rest of the pipeline.

19:17.000 --> 19:20.000
Detection alone isn't intelligence.

19:20.000 --> 19:22.000
A detection tells you something happened.

19:22.000 --> 19:25.000
Intelligence tells you whether it matters.

19:25.000 --> 19:29.000
Intelligence requires context, correlation, and humans in the loop.

19:29.000 --> 19:33.000
This layer is where raw signals become something a team can actually act on.

19:33.000 --> 19:38.000
In this blueprint, correlation and visualization are handled by Grafana.

19:38.000 --> 19:40.000
Grafana works well here because it's SQL native.

19:40.000 --> 19:43.000
It can build dashboards directly on top of ClickHouse.

19:43.000 --> 19:46.000
It fits naturally into GitOps workflows.

19:46.000 --> 19:49.000
Dashboards, alerts, and queries live in version control.

19:49.000 --> 19:54.000
And it's already familiar to most infrastructure teams, which lowers the barrier to adoption.

19:55.000 --> 19:57.000
This is a real Grafana dashboard.

19:57.000 --> 20:00.000
It shows processes running on a single device.

20:00.000 --> 20:05.000
But this data can be sliced and aggregated as needed across the whole fleet of devices.

20:05.000 --> 20:08.000
The important part here is not only the dashboard itself.

20:08.000 --> 20:14.000
It's that detections, investigations, and context all live on top of the same data.

20:14.000 --> 20:20.000
You can pivot from detection to historical data to fleet-wide patterns without changing tools.

20:20.000 --> 20:23.000
That's what correlation looks like in practice.
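
A hedged sketch of the kind of ClickHouse SQL such a Grafana panel might run (table, columns, and hostname are illustrative):

```sql
-- Events per hour for one device; in Grafana the hostname would
-- typically come from a dashboard variable.
SELECT toStartOfHour(time) AS hour, count() AS events
FROM process_events
WHERE hostname = 'laptop-01'
GROUP BY hour
ORDER BY hour;
```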

20:23.000 --> 20:27.000
Once something is detected and understood, it needs to go somewhere.

20:27.000 --> 20:30.000
Grafana has alerting built in.

20:30.000 --> 20:36.000
Alerts can be routed to Slack, ticketing systems, or security orchestration, automation, and response tools.

20:36.000 --> 20:39.000
But response isn't just automation.

20:39.000 --> 20:45.000
Sometimes the right response is to pause, investigate, or ask better questions.

20:45.000 --> 20:48.000
This layer connects detection to real-world action,

20:48.000 --> 20:52.000
without hiding the logic or locking you into a workflow.

20:52.000 --> 20:54.000
Grafana isn't the only option for this layer.

20:54.000 --> 20:59.000
Some teams use tools like Kibana when they're already invested in the Elastic ecosystem.

20:59.000 --> 21:05.000
Others use SQL-focused analytics tools like Apache Superset or Metabase for ad hoc investigation.

21:05.000 --> 21:07.000
These can all work.

21:07.000 --> 21:12.000
The thing to focus on is that visualization, correlation, and alerting sit on top of your data,

21:12.000 --> 21:15.000
not inside a closed detection engine.

21:15.000 --> 21:21.000
As long as this layer stays focused on helping humans understand and act, the system remains flexible.

21:21.000 --> 21:26.000
Now that we've walked through each layer, let's put it all together.

21:26.000 --> 21:30.000
This is one concrete way to assemble the blueprint using real open source tools.

21:30.000 --> 21:33.000
Start on the left and follow the data.

21:33.000 --> 21:37.000
osquery runs on the endpoint and exposes system state.

21:37.000 --> 21:41.000
Fleet sits above it and handles intent, configuration, and live interaction.

21:41.000 --> 21:46.000
From there, telemetry flows through ingestion and buffering into streaming and storage.

21:46.000 --> 21:48.000
Detection logic is defined separately,

21:48.000 --> 21:52.000
translated into SQL, and executed directly against the data,

21:52.000 --> 21:57.000
and then correlation, intelligence, and response sit at the point where data and detections come together.

21:57.000 --> 22:00.000
This isn't the only stack that works.

22:00.000 --> 22:05.000
What matters is that every boundary you've seen is real, enforceable, and swappable.

22:05.000 --> 22:10.000
This diagram exists to show that that separation holds all the way down.

22:10.000 --> 22:14.000
Now, whenever we talk about open endpoint telemetry and detection pipelines,

22:14.000 --> 22:16.000
the same question comes up.

22:16.000 --> 22:18.000
Is this meant to replace EDR?

22:18.000 --> 22:20.000
The short answer is no.

22:20.000 --> 22:22.000
And it's important to be explicit about that.

22:22.000 --> 22:25.000
This blueprint does not replace EDR.

22:25.000 --> 22:29.000
EDR tools are valuable, and many teams rely on them every day.

22:29.000 --> 22:32.000
But attackers play a constant cat and mouse game with vendors.

22:32.000 --> 22:36.000
Signatures change, behaviors shift, evasions get discovered and patched.

22:36.000 --> 22:39.000
Even top tier products like CrowdStrike can be evaded.

22:39.000 --> 22:42.000
Not because they're bad, but because the game never stops.

22:42.000 --> 22:45.000
That's just the reality of endpoint security.

22:45.000 --> 22:49.000
This is why visibility diversity matters.

22:49.000 --> 22:51.000
Open endpoint telemetry isn't about beating EDR.

22:51.000 --> 22:56.000
It's about verifying and complementing what your other tools claim to see.

22:56.000 --> 23:01.000
When you own the telemetry, you can cross check detections, investigate independently,

23:01.000 --> 23:04.000
and ask questions your other tools didn't anticipate.

23:04.000 --> 23:07.000
Open endpoint telemetry gives you a second set of eyes.

23:07.000 --> 23:09.000
One you control.

23:09.000 --> 23:13.000
That's what defense in depth looks like in practice.

23:13.000 --> 23:16.000
Now, this talk isn't about a specific stack.

23:16.000 --> 23:18.000
It's about a way of thinking.

23:18.000 --> 23:21.000
A way to break endpoint security into clear concerns.

23:21.000 --> 23:23.000
A way to draw boundaries you can reason about.

23:23.000 --> 23:26.000
A way to build systems that evolve without collapsing.

23:26.000 --> 23:29.000
If you take one thing away, let it be this.

23:29.000 --> 23:30.000
You don't need a perfect tool.

23:30.000 --> 23:32.000
You need a system

23:32.000 --> 23:33.000
you can understand.

23:33.000 --> 23:35.000
You should leave with three things.

23:35.000 --> 23:38.000
A mental model, you can adapt to your environment.

23:38.000 --> 23:41.000
A stack you can run locally, inspect, and change,

23:41.000 --> 23:44.000
and the confidence to own your telemetry end to end.

23:44.000 --> 23:48.000
Security improves when telemetry is transparent, replayable,

23:48.000 --> 23:51.000
and owned by the team that defends the system.

23:51.000 --> 23:54.000
Now, before I wrap up, I want to pause on this.

23:54.000 --> 23:57.000
These are some of the teams using Fleet in production today.

23:57.000 --> 24:00.000
They're very different organizations with very different constraints.

24:00.000 --> 24:03.000
We've learned a lot from working with these customers

24:04.000 --> 24:08.000
and many of the lessons I talked about come from operating at this kind of scale

24:08.000 --> 24:10.000
under real conditions.

24:10.000 --> 24:12.000
That's it for this talk.

24:12.000 --> 24:16.000
If this blueprint aligns with how you think about endpoint telemetry,

24:16.000 --> 24:19.000
I'd love to hear how you're approaching it in your own systems.

24:19.000 --> 24:22.000
The details vary, but the trade-offs are usually similar.

24:22.000 --> 24:24.000
Here are a few links about Fleet.

24:24.000 --> 24:25.000
We are hiring.

24:25.000 --> 24:29.000
Here's where you can find me and some of my other content.

24:29.000 --> 24:32.000
This QR code is for the session feedback

24:32.000 --> 24:34.000
for FOSDEM specifically,

24:34.000 --> 24:37.000
if you guys have time to fill it out.

24:37.000 --> 24:39.000
Thank you, everyone.

24:45.000 --> 24:46.000
All right.

24:46.000 --> 24:48.000
I'll leave this slide up if you guys want to talk.

24:48.000 --> 24:50.000
We have time for one little question.

24:50.000 --> 24:52.000
No pressure.

24:52.000 --> 24:54.000
No question.

24:54.000 --> 24:55.000
Okay, thank you again.

24:59.000 --> 25:01.000
Thank you.

