WEBVTT

00:00.000 --> 00:10.000
I think we are good to go.

00:10.000 --> 00:11.000
Okay.

00:11.000 --> 00:15.000
So, could you close the doors?

00:15.000 --> 00:21.000
Please close the doors.

00:21.000 --> 00:23.000
So, are we on time?

00:23.000 --> 00:24.000
Yeah.

00:24.000 --> 00:27.000
Please welcome BuffaLogs.

00:27.000 --> 00:28.000
Thanks.

00:30.000 --> 00:31.000
Hi, everyone.

00:31.000 --> 00:33.000
I'm Federico Foschini.

00:33.000 --> 00:36.000
I'm the threat detection team leader at Certego.

00:36.000 --> 00:42.000
In addition to my role there, I'm also a Google Summer of Code mentor for the

00:42.000 --> 00:44.000
Honeynet organization.

00:44.000 --> 00:47.000
I will tell you more about that later.

00:47.000 --> 00:52.000
I'm also a lecturer in the Master's in Cybersecurity at the University of

00:52.000 --> 00:54.000
Bologna.

00:55.000 --> 00:58.000
I'm Lorena, a threat detection engineer at Certego.

00:58.000 --> 01:01.000
I'm completing a master's degree in computer science.

01:01.000 --> 01:05.000
I'm also the developer behind Federico's idea.

01:05.000 --> 01:08.000
And the principal maintainer of BuffaLogs.

01:08.000 --> 01:11.000
An open source project we'll talk about later.

01:11.000 --> 01:16.000
And now we'll start explaining the origin behind its birth.

01:16.000 --> 01:27.000
So, at Certego, we have a main product.

01:27.000 --> 01:31.000
It's a managed security service portal.

01:31.000 --> 01:41.000
All the detections and services we implement are either made in house by ourselves or based on open source software.

01:41.000 --> 01:45.000
So, we are deeply committed to contributing to the open source community.

01:45.000 --> 01:52.000
And we wanted to develop a new threat detection tool, but we didn't know the scope yet.

01:52.000 --> 01:56.000
So, we set ourselves specific criteria.

01:56.000 --> 02:04.000
We wanted to build something that solves a common pain point for security teams, not just a

02:04.000 --> 02:05.000
niche problem.

02:05.000 --> 02:10.000
It has to fill a genuine gap in the security landscape.

02:10.000 --> 02:16.000
And also, it had to be something manageable by a small team.

02:16.000 --> 02:22.000
In fact, most of the development has been done by Lorena single-handedly.

02:22.000 --> 02:28.000
So, how did we identify the right focus for our tool?

02:28.000 --> 02:31.000
We took two approaches here.

02:31.000 --> 02:36.000
The first one is reviewing and evaluating existing commercial products.

02:36.000 --> 02:43.000
To see if they offer innovative solutions lacking in open source alternatives.

02:43.000 --> 02:48.000
And the second was reviewing the security incidents we manage,

02:48.000 --> 02:57.000
to see if we can identify common patterns that could be better detected.

02:57.000 --> 03:00.000
So, let's talk about project evaluation.

03:00.000 --> 03:07.000
These are some of the commercial products we wanted to evaluate.

03:07.000 --> 03:15.000
We could do the evaluation manually by installing the products ourselves and running all the tests.

03:15.000 --> 03:21.000
That would be extremely time consuming and also very expensive.

03:21.000 --> 03:30.000
So, we looked at reports made by organizations that specialize in product evaluation and classification,

03:30.000 --> 03:35.000
like MITRE with the ATT&CK Evaluations, Forrester, IDC, and Gartner.

03:35.000 --> 03:37.000
And what did we find?

03:37.000 --> 03:48.000
Interestingly, we found out that every vendor claims to be a leader and also claims a perfect detection rate.

03:49.000 --> 03:57.000
It looked to me like they were writing their own rules and awarding themselves the prizes.

03:57.000 --> 04:01.000
We know that doesn't reflect the reality of cybersecurity.

04:01.000 --> 04:08.000
As Wendy Nather pointed out in an amazing presentation she did a couple of weeks ago,

04:09.000 --> 04:18.000
the same problems we faced 20 years ago persist, and are also amplified by increased software complexity,

04:18.000 --> 04:21.000
the ripple effects of breaches, and the monetization of data.

04:21.000 --> 04:28.000
So, we knew there could be room for improvement in detection.

04:28.000 --> 04:37.000
Since our first approach didn't yield any results, we started analyzing the kill chains of the attacks we managed, to see if we could identify common attack patterns.

04:37.000 --> 04:51.000
In the last year, we managed 10 major compromises, and the findings were striking: 7 out of 10 of these breaches

04:51.000 --> 04:56.000
involved compromised credentials obtained through data leaks.

04:56.000 --> 05:09.000
Two were through phishing campaigns, and only one was an actual attack, where a CVE was used to reduce the key space to conduct a brute-force attack.

05:09.000 --> 05:21.000
Post-access, over the last year we saw some key patterns: VPNs and remote access tools for persistence, lateral movement, data encryption, and cryptocurrency mining.

05:21.000 --> 05:25.000
What did we learn from this? Basically, two things.

05:25.000 --> 05:31.000
The first thing is that our customers spend a lot of money on very advanced security products,

05:31.000 --> 05:38.000
but still get breached through a legitimate login with stolen credentials.

05:38.000 --> 05:52.000
This gave us insight: we learned that we need to focus on initial access, which was the most common attack vector.

05:52.000 --> 05:56.000
And that's basically how BuffaLogs was born.

05:56.000 --> 06:06.000
BuffaLogs is an open source solution for authentication protection, designed to detect and alert on the most common tactics used by attackers.

06:07.000 --> 06:14.000
I'll let Lorena continue, and she will delve deeper into BuffaLogs' internals.

06:14.000 --> 06:23.000
First of all, it's important to underline that the BuffaLogs project focuses on the detection of anomalous logins.

06:23.000 --> 06:30.000
Indeed, the first phase is based on login data collection.

06:31.000 --> 06:45.000
It's an ingestion phase where we use Filebeat and Elasticsearch, but they are not required.

06:45.000 --> 06:51.000
So let's recap the whole process.

06:51.000 --> 06:57.000
In the first phase, the ingestion phase, in addition to the collection of data,

06:57.000 --> 07:04.000
geolocation information is enriched and field normalization occurs.

07:04.000 --> 07:11.000
After that, the login data are processed and analyzed by the detection logic,

07:11.000 --> 07:18.000
and just the interesting logins are saved into the BuffaLogs database.

07:18.000 --> 07:22.000
Of course, together with the related users and alerts.

07:22.000 --> 07:37.000
The data in the database are then queried for the creation of the user inventory, to build the web interface, and to send alerts to the alerting module.

07:38.000 --> 07:47.000
The installation process is very easy, since we're dealing with a containerized Docker application: you just need to clone the repository,

07:47.000 --> 07:58.000
run the application container, and, if you'd like to interact with the Django admin panel, create a superuser.

07:58.000 --> 08:02.000
The first version of the web interface is minimal.

08:02.000 --> 08:06.000
I don't like front end, I really don't like front end.

08:06.000 --> 08:12.000
But there are some interesting graphs. For example,

08:12.000 --> 08:21.000
we have a graph of the users that triggered the alerts, divided by risk score.

08:21.000 --> 08:39.000
Then we have the alerts split up by trigger datetime, a table with the alert summaries, and a map to emphasize the location information of the logins that triggered the alerts.

08:39.000 --> 08:44.000
But now let's dive into the real core of BuffaLogs.

08:44.000 --> 09:08.000
More in detail, the anomalous logins reported by BuffaLogs include different types of anomalies, such as those related to the devices used, some user information, or unfamiliar properties of the country where the login occurred.

09:08.000 --> 09:19.000
Some of them are labeled as work in progress, but they are already running internally and just have to be merged into the public repo.

09:19.000 --> 09:24.000
Let's clarify some of the detections with an example.

09:24.000 --> 09:49.000
So if we have a first login from Milan at 3:00, and then one an hour later from Rome, this will be considered impossible travel, because the space-time relationship exceeds the threshold metrics that are considered feasible for this distance, of course.
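The impossible-travel check described here can be sketched roughly like this: compute the great-circle distance between the two login locations and check whether the implied speed is feasible. This is a minimal illustration, not BuffaLogs' actual code, and the 300 km/h threshold is an assumed value:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_FEASIBLE_SPEED_KMH = 300.0  # assumed threshold, not BuffaLogs' real default

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def is_impossible_travel(lat1, lon1, lat2, lon2, hours_between):
    """True if covering the distance would require an unfeasible speed."""
    distance = haversine_km(lat1, lon1, lat2, lon2)
    speed = distance / hours_between
    return speed > MAX_FEASIBLE_SPEED_KMH

# Milan (45.46, 9.19) -> Rome (41.90, 12.50) is roughly 477 km,
# so one hour between the two logins implies ~477 km/h: an alert.
```

With these assumed coordinates, the Milan-to-Rome pair an hour apart trips the check, while the same pair five hours apart does not.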

09:49.000 --> 10:04.000
Supposing the following login is done at 12:50 from Vienna: this is not impossible travel, but it is considered a new country alert.

10:04.000 --> 10:18.000
And then, supposing a new login from Austria happens 30 days later, this will trigger the atypical country alert.

10:18.000 --> 10:33.000
If the logins use different user agents, and some IPs are considered anonymous, the related new device and anonymous IP login alerts are triggered.
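These per-user novelty checks boil down to set-membership tests against the user's login history. The sketch below is illustrative only: the field names and the `known_anonymous_ips` feed are assumptions, not BuffaLogs' actual implementation:

```python
def novelty_alerts(login, history, known_anonymous_ips):
    """Return the alert names triggered by one login.

    `login` is a dict with 'country', 'user_agent' and 'ip' keys;
    `history` holds the sets of values already seen for this user.
    """
    alerts = []
    if login["country"] not in history["countries"]:
        alerts.append("new_country")
    if login["user_agent"] not in history["user_agents"]:
        alerts.append("new_device")
    if login["ip"] in known_anonymous_ips:
        alerts.append("anonymous_ip_login")
    # remember the new values so future logins are compared against them
    history["countries"].add(login["country"])
    history["user_agents"].add(login["user_agent"])
    return alerts
```

For example, a user previously seen only from Italy on Firefox would raise `new_country` on a first Austrian login, and `new_device` plus `anonymous_ip_login` on a later login from Chrome through a known anonymizing IP.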

10:33.000 --> 11:00.000
There is also a detection to highlight the so-called stale accounts: an integration of the user inventory that consists in querying the APIs of the internal authentication service used by a company, in order to compare its login information with the logins captured by BuffaLogs and surface possible vulnerabilities.

11:00.000 --> 11:29.000
For example, if in the left table we have a user, John Doe, who is active and has logged in according to the internal authentication service used in the company, but according to BuffaLogs has never logged in, this could be a possible vulnerability for the system.

11:29.000 --> 11:40.000
The same goes for the Mark Turner account, which has not logged in for a while.
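The stale-account comparison amounts to a set difference between the identity provider's active users and the users the detector has actually observed logging in. A minimal sketch, with hypothetical function and field names (the 30-day staleness window is an assumption):

```python
from datetime import datetime, timedelta

def find_suspicious_accounts(idp_active_users, observed_last_login,
                             now, stale_after_days=30):
    """Compare IdP-active accounts against observed logins.

    Returns (accounts never seen logging in, accounts whose last
    observed login is older than `stale_after_days`).
    """
    never_logged_in = sorted(set(idp_active_users) - set(observed_last_login))
    threshold = now - timedelta(days=stale_after_days)
    stale = sorted(u for u, last in observed_last_login.items()
                   if u in idp_active_users and last < threshold)
    return never_logged_in, stale
```

On the talk's example, John Doe (active in the IdP, never seen by the detector) lands in the first list, and Mark Turner (last seen months ago) lands in the second.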

11:40.000 --> 11:47.000
Having explained the detection types, let's take an overview of the detection logic itself.

11:47.000 --> 12:01.000
Since different input sources are accepted, we have to normalize the data and map them into a predefined schema.
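The normalization step can be pictured as mapping each source's field names onto one common schema. The source names and field names below are purely illustrative, not BuffaLogs' exact schema:

```python
# Per-source mapping from raw field names to a common schema (illustrative).
FIELD_MAPS = {
    "azure": {"userPrincipalName": "username", "ipAddress": "ip",
              "userAgent": "user_agent", "createdDateTime": "timestamp"},
    "webapp": {"user": "username", "client_ip": "ip",
               "agent": "user_agent", "ts": "timestamp"},
}

def normalize(raw_event, source):
    """Map a raw login event into the common schema, dropping unknown fields."""
    mapping = FIELD_MAPS[source]
    return {common: raw_event[raw]
            for raw, common in mapping.items() if raw in raw_event}
```

After this step, every downstream detection sees the same `username`/`ip`/`user_agent`/`timestamp` shape regardless of where the login came from.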

12:01.000 --> 12:23.000
The initial ingestion process is recapped here: all events are queried every 30 minutes, and just the login events are saved and processed by the detection logic.

12:23.000 --> 12:31.000
These logs are normalized into the schema shown before and grouped by user.

12:31.000 --> 12:42.000
All these logins are then processed by the detection engine, and just the anomalous ones are sent as alerts.

12:43.000 --> 12:52.000
This ingestion phase is crucial because we deal with a really huge amount of data.

12:52.000 --> 13:07.000
Indeed, the power of this project is that since the beginning it has been developed using real logs, gathered from both web applications and cloud services.

13:07.000 --> 13:16.000
Testing the detection on about 500,000 real log entries allowed us to improve the alerting logic step by step.

13:16.000 --> 13:34.000
In addition, the final version of BuffaLogs is running on the Certego detection platform, analyzing the data of more than 30 customers, for a total of about 300,000 logins per hour from different sources.

13:35.000 --> 13:43.000
On the other hand, this high amount of real data has brought out some critical aspects.

13:44.000 --> 13:59.000
First of all, we saw that a user could use different types of devices, for example logging in from a PC and from a smartphone.

14:00.000 --> 14:17.000
This rapidly changes the user agents, of course; but consider also that a smartphone could log in using mobile data, so the IP changes frequently too.

14:17.000 --> 14:38.000
If a user also accesses the company applications from different networks, for example through a VPN, from a proxy network, and from a non-proxy network, this changes the IPs used.

14:38.000 --> 14:48.000
So what's the problem? Of course, too many alerts occur, and user profile creation became practically impossible.

14:48.000 --> 15:05.000
But we found a solution: we developed a personalizable filter page, a config panel where we can create custom filters about the users,

15:05.000 --> 15:10.000
for example their locations, devices, and the alerts themselves.

15:10.000 --> 15:22.000
Moreover, we added the possibility to change the detection metrics as well, and this gives a sort of bring-your-own-detection characteristic to BuffaLogs.
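The bring-your-own-detection idea amounts to making both the per-user filters and the detection thresholds plain configuration. The config shape below is hypothetical, not BuffaLogs' actual one:

```python
# Hypothetical config: per-user ignore rules plus tunable detection metrics.
CONFIG = {
    "filters": {
        "jdoe": {"ignored_countries": {"IT", "AT"},
                 "ignored_alerts": {"new_device"}},
    },
    "metrics": {"max_feasible_speed_kmh": 300, "atypical_country_days": 30},
}

def apply_filters(username, alerts, config):
    """Drop the alerts that the user's custom filters ask to ignore."""
    rules = config["filters"].get(username, {})
    ignored = rules.get("ignored_alerts", set())
    return [a for a in alerts if a["name"] not in ignored]
```

A user who logs in from many devices could then silence their own `new_device` noise without touching anyone else's detections, while the thresholds under `metrics` remain tunable per deployment.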

15:22.000 --> 15:33.000
And this is a nice feature, because most detection services don't allow changing these parameters.

15:33.000 --> 15:44.000
Another problem we faced is that different geolocation providers can give us different locations for the same IP.

15:44.000 --> 15:50.000
So this, of course, could give us wrong login information.

15:50.000 --> 16:00.000
And we identified MaxMind as the most accurate provider.

16:01.000 --> 16:05.000
Thanks. Speaking about the future.

16:05.000 --> 16:15.000
Basically, we want to add a lot of additional sources, maybe reading logs from a PostgreSQL database,

16:15.000 --> 16:22.000
MySQL, or Graylog, and adding new alert notification and sink features.

16:22.000 --> 16:32.000
But I think our primary focus will be implementing automatic user blocking, something that identity providers already support.

16:32.000 --> 16:45.000
When triggered, this functionality will automatically suspend user accounts when suspicious activity is detected,

16:45.000 --> 16:58.000
while also allowing users themselves to unlock their accounts, maybe through a password reset or a multi-factor authentication confirmation.

16:58.000 --> 17:05.000
This will greatly reduce alert fatigue and also streamline security remediation.

17:05.000 --> 17:13.000
The last point, actually, is there because I'm told it's mandatory, I think, to have a little AI mention during presentations.

17:13.000 --> 17:21.000
I really don't know what to do with these technologies, but maybe you do. Speaking of that,

17:21.000 --> 17:36.000
we are participating in the Google Summer of Code. For those unfamiliar, this is a program where students can contribute to open source projects during their summer break and get paid by Google.

17:36.000 --> 17:42.000
So I think it's a great occasion to contribute to BuffaLogs.

17:42.000 --> 17:50.000
If you are interested, check out our repository, because as soon as Google confirms our participation,

17:50.000 --> 17:56.000
We will be adding all the details there.

17:56.000 --> 18:00.000
And that's all. Thanks everybody.

18:00.000 --> 18:10.000
Thank you.

18:11.000 --> 18:24.000
Yeah.

18:24.000 --> 18:29.000
Sorry.

18:29.000 --> 18:38.000
Yeah. The question is if it's possible to ingest your own logs into the system.

18:38.000 --> 18:44.000
And yes, but you need to use this data schema.

18:44.000 --> 18:53.000
If your logs contain these fields, BuffaLogs will automatically read them.

18:53.000 --> 19:06.000
Yeah.

19:06.000 --> 19:19.000
Okay.

19:19.000 --> 19:35.000
Okay. The question was if we can also ingest custom detection.

19:35.000 --> 19:42.000
Okay. So like additional information about the logins.

19:42.000 --> 19:54.000
No, actually, it is not supported. I think that's something that fits better in a SIEM or something else.

19:54.000 --> 20:01.000
Because with BuffaLogs, we really want to implement only the detection part,

20:01.000 --> 20:07.000
and not the parsing of events from other sources,

20:07.000 --> 20:13.000
apart from the information we need to do our detection.

20:13.000 --> 20:17.000
Yes.

20:18.000 --> 20:39.000
Yeah.

20:39.000 --> 20:49.000
The question is why we created BuffaLogs when other products essentially support the same features.

20:49.000 --> 20:56.000
Basically, two things. The first one is that most of these products are closed source.

20:56.000 --> 21:03.000
And the second one, the most important one, is that with BuffaLogs

21:03.000 --> 21:09.000
we have control of the detection ourselves: we made the algorithm ourselves.

21:09.000 --> 21:19.000
So we can change it more easily and also add new stuff.

21:19.000 --> 21:36.000
Because what was limiting us before BuffaLogs was that, when we received alerts from an identity provider like Azure,

21:36.000 --> 21:40.000
And we actually didn't know how the detection was implemented.

21:40.000 --> 21:50.000
So we would read an alert like impossible travel, but we didn't know how it was calculated,

21:50.000 --> 21:58.000
And on what events and so on.

21:58.000 --> 22:03.000
Okay.

22:04.000 --> 22:16.000
No, it's all in BuffaLogs, because we wanted to split the logic from the ingestion.

22:16.000 --> 22:22.000
So, in this example, Elasticsearch was used only for the events.

22:22.000 --> 22:30.000
All the logic and detection are inside Django, in Python, basically.

22:30.000 --> 22:37.000
Okay.

22:37.000 --> 22:44.000
Okay.

22:44.000 --> 22:52.000
Yeah.

22:53.000 --> 23:10.000
Yeah.

23:10.000 --> 23:15.000
Yeah.

23:16.000 --> 23:23.000
It's alert fatigue, because with most of this type of detection we cannot determine

23:23.000 --> 23:34.000
if an alert is a true positive or not without asking the users, because we have a lot of false positives when users travel abroad.

23:34.000 --> 23:45.000
And basically what we are doing right now is sending the alert directly to the users themselves,

23:45.000 --> 24:01.000
so they can confirm it or not. And that's why I really want to implement the automatic user blocking feature, to reduce the amount of alerts

24:01.000 --> 24:15.000
We are receiving now basically.

24:15.000 --> 24:20.000
Not really because.

24:20.000 --> 24:27.000
No, because we don't save all the customers' log data into the BuffaLogs database.

24:27.000 --> 24:33.000
There we save just the latest logins and update them step by step.
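Keeping only the latest logins rather than the full log stream can be sketched as an upsert keyed by user and source. This is an illustrative sketch only, with assumed field names:

```python
def upsert_latest(store, login):
    """Keep only the most recent login per (user, source) pair."""
    key = (login["username"], login["source"])
    current = store.get(key)
    if current is None or login["timestamp"] > current["timestamp"]:
        store[key] = login
    return store
```

However many raw events arrive, the store grows with the number of distinct user/source pairs rather than with the event volume, which is what keeps the saved data small.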

24:33.000 --> 24:42.000
So we greatly reduce the amount of data saved.

24:43.000 --> 24:48.000
Yes.

24:48.000 --> 24:51.000
Yeah.

24:51.000 --> 25:00.000
Yeah.

25:00.000 --> 25:04.000
Yeah.

25:04.000 --> 25:05.000
Yeah.

25:05.000 --> 25:11.000
The question was if we have planned to add real-time detection.

25:11.000 --> 25:12.000
Yeah.

25:12.000 --> 25:20.000
Actually, that's hard to do with our architecture right now.

25:20.000 --> 25:28.000
But also, you have to keep in mind that some providers, like Azure,

25:28.000 --> 25:37.000
generate logs with a big delay, up to 24 hours.

25:37.000 --> 25:47.000
So you do not have the guarantee, even if you are analyzing the events you are receiving

25:47.000 --> 25:53.000
in real time, that your detection is really real time.

25:54.000 --> 25:56.000
Yeah.

25:56.000 --> 25:57.000
Yeah.

25:57.000 --> 25:59.000
Yeah.

25:59.000 --> 26:01.000
I think we are out of time.

26:01.000 --> 26:03.000
Yeah.

26:03.000 --> 26:04.000
Thanks.

26:04.000 --> 26:05.000
Yeah.

26:05.000 --> 26:19.000
Thank you.

