WEBVTT

00:00.000 --> 00:21.000
Okay, yes, thank you for the introduction. We just noticed, when we looked at our talk title again, that we're talking about advanced security and our experience with hash sharing.

00:21.000 --> 00:26.000
We didn't even mention that we're doing it with Rspamd.

00:26.000 --> 00:33.000
So, yes, we are doing this with Rspamd, and yeah, let's start.

00:33.000 --> 00:46.000
So, hi again. We are from Heinlein, Heinlein Consulting, and we're doing the mail projects there together with our colleagues.

00:46.000 --> 00:50.000
I want to take a little step back.

00:50.000 --> 00:55.000
So, first of all, we usually show this slide with our infrastructure.

00:55.000 --> 00:58.000
Most of it is not important.

00:58.000 --> 01:07.000
What I want to show here is that wherever there's an MTA, we also have an Rspamd proxy installed.

01:07.000 --> 01:15.000
So, this is why at every hop we call Rspamd; maybe it has some tasks to do.

01:15.000 --> 01:20.000
If it does not have any tasks, we just say, okay, pass.

01:20.000 --> 01:29.000
We are doing this because we have moved many, many of our policies into Rspamd.

01:29.000 --> 01:34.000
So, the old practice was: you installed an MTA.

01:34.000 --> 01:36.000
You had your rules.

01:36.000 --> 01:40.000
Okay, this should not be an unknown domain.

01:41.000 --> 01:43.000
So, the domain should be set up.

01:43.000 --> 01:46.000
It should not be listed on an RBL.

01:46.000 --> 01:53.000
So, we did all this in Postfix, and over the years we moved all these policy things,

01:53.000 --> 02:00.000
everything that has something to do with spam, reputation and so on, into Rspamd.

02:00.000 --> 02:05.000
So, our Postfix today is just a mail router.

02:06.000 --> 02:10.000
So, what do we do in Rspamd?

02:10.000 --> 02:15.000
As Vsevolod said, Bayes is not perfect.

02:15.000 --> 02:17.000
Also, GPT is not perfect.

02:17.000 --> 02:21.000
I can say there are many, many other rules that are not perfect.

02:21.000 --> 02:26.000
So, our idea is to get many, many indicators.

02:26.000 --> 02:30.000
An indicator means, okay, we have done an RBL check.

02:31.000 --> 02:33.000
This IP is listed.

02:33.000 --> 02:38.000
We have run some regular expressions and something hit.

02:38.000 --> 02:39.000
We have done Bayes.

02:39.000 --> 02:42.000
We have done GPT.

02:42.000 --> 02:46.000
Maybe there is also the neural network inside Rspamd itself.

02:46.000 --> 02:55.000
And what we like to do is to find many, many indicators for spamminess or hamminess.

02:55.000 --> 03:02.000
Because then we can sum them up in the end and make a very good decision whether we want to

03:02.000 --> 03:10.000
maybe drop the email because it's spam, or let it through, or take any other action.

03:10.000 --> 03:20.000
This is why we concentrate on Rspamd here.
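
NOTE
A minimal sketch of the "many indicators" idea, assuming made-up symbol names, weights and a plain sum; it is not how Rspamd computes its score internally. The reject level of 15 is the default mentioned later in the talk.
```python
# Hypothetical indicators: positive values point to spam, negative to ham.
REJECT_SCORE = 15.0
#
def decide(indicators: dict[str, float], reject_score: float = REJECT_SCORE) -> tuple[float, str]:
    """Sum all indicator scores and pick an action for the email."""
    total = sum(indicators.values())
    action = "reject" if total >= reject_score else "accept"
    return total, action
#
hits = {"RBL_LISTED": 4.0, "REGEX_HIT": 2.5, "BAYES_SPAM": 5.0, "FUZZY_SPAM": 6.0, "KNOWN_SENDER": -2.0}
print(decide(hits))
```
No single indicator decides on its own here; only the combined score crosses the reject level.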

03:20.000 --> 03:29.000
So, what we noticed is that generally, when we have all of these indicators, like data from RBLs

03:29.000 --> 03:43.000
which has been collected worldwide through spam traps or some old domains or the like, this is what we would consider

03:43.000 --> 03:46.000
the global spam.

03:46.000 --> 03:52.000
And your particular spam is only a small part of that.

03:52.000 --> 04:04.000
And so the intersection with this global spam can vary between 10 and 90%, according to our experience.

04:05.000 --> 04:14.000
What we're looking at are groups of similar customers, like different universities or different

04:14.000 --> 04:25.000
government authorities, and their personal spam actually did not coincide.

04:25.000 --> 04:34.000
So what we then noticed is that customers within those groups, like different universities, have much more

04:34.000 --> 04:42.000
in common, and different customers within the government authorities also have much more in common.

04:42.000 --> 04:50.000
And this led us to think that it would be great to share this experience of what has already been detected as personal spam

04:50.000 --> 04:56.000
with peers that are similar, where you are in a similar target group.

04:56.000 --> 05:09.000
And so that was the idea, how and why we came up with this sharing within those groups.

05:09.000 --> 05:17.000
Yeah, what we share for now is fuzzy.

05:17.000 --> 05:25.000
As Vsevolod said, Bayes is more personal, so you cannot share it so well.

05:25.000 --> 05:40.000
Bayes is a statistic over words that appear together in one email, like "buy Viagra".

05:40.000 --> 05:43.000
Fuzzy is a different algorithm here.

05:43.000 --> 05:54.000
It's based on shingles, so after running this algorithm you have the intersection of two texts.

05:54.000 --> 06:02.000
So fuzzy can say those two emails match to 99%.

06:02.000 --> 06:05.000
So maybe one word has changed.

06:05.000 --> 06:12.000
So it's perfect for matching a very specific email.
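
NOTE
A minimal sketch of shingle-based matching, assuming word shingles of size 3 and a Jaccard overlap; the function names are hypothetical and this is a simplification for illustration, not Rspamd's actual fuzzy hashing.
```python
def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word windows ("shingles") of a text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}
#
def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
#
print(similarity("buy cheap pills now today", "buy cheap pills now tomorrow"))
```
The longer two texts are, the less a single changed word affects the overlap, which is why near-identical emails still match almost completely.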

06:12.000 --> 06:21.000
Also, the fuzzy implementation in Rspamd was designed for spam traps or honeypots.

06:21.000 --> 06:25.000
This is why it has a weight in it.

06:25.000 --> 06:30.000
So when you learn a text maybe once,

06:30.000 --> 06:38.000
it has a very, very low weight, and it has to cross a specific threshold.

06:38.000 --> 06:42.000
You can define it; let's say our threshold is 30.

06:42.000 --> 06:49.000
Then this email has maybe been seen 30 times, and then this hash activates.

06:49.000 --> 06:57.000
When it activates, the score grows up to a maximum amount.

06:57.000 --> 07:04.000
For Rspamd that is typically 12 points out of 15, where the default reject level is.
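
NOTE
A minimal sketch of the activation threshold, assuming the values from the talk (threshold 30, maximum 12 points) and a tanh curve chosen purely for illustration; the function name is hypothetical and Rspamd's real normalisation may differ.
```python
import math
#
def fuzzy_score(weight: float, threshold: float = 30.0, max_score: float = 12.0) -> float:
    """Return 0 below the threshold, then grow smoothly toward max_score."""
    if weight < threshold:
        return 0.0
    # Saturating curve: assumed for illustration only.
    return max_score * math.tanh((weight - threshold + 1.0) / threshold)
#
for weight in (1, 29, 30, 60, 300):
    print(weight, round(fuzzy_score(weight), 2))
```
A hash that was learned only once contributes nothing; only a hash whose weight has crossed the threshold starts to add points.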

07:05.000 --> 07:12.000
Also Fuzzy has a feature to have different categories.

07:12.000 --> 07:16.000
So in Bayes you only have: it's on the ham side or it's on the spam side.

07:16.000 --> 07:18.000
There's nothing in between.

07:18.000 --> 07:22.000
For fuzzy you can define your own.

07:22.000 --> 07:32.000
Typically, in the default installation you have spam, ham and, I think, something gray.

07:32.000 --> 07:36.000
But you can also define your own as you like.

07:36.000 --> 07:40.000
Like having another one for bulk emails or newsletters.

07:40.000 --> 07:43.000
Also you can learn positive things.

07:43.000 --> 07:51.000
Like emails you get again and again, like typical invoices you receive or something.

07:51.000 --> 07:56.000
You can build up those databases here.

07:56.000 --> 08:09.000
So the Rspamd fuzzy worker, this is the one that manages all these hashes.

08:09.000 --> 08:23.000
And this is a network service. Normally you would spin this up locally, and you can put in hashes and query it to get the actual results back.

08:24.000 --> 08:31.000
The fuzzy worker also sums up and calculates those thresholds and those weights as they come in.

08:31.000 --> 08:37.000
And it saves this to the typical Rspamd Redis database.

08:37.000 --> 08:50.000
When you want to run a honeypot you would use this worker. When you have a local installation I would definitely,

08:50.000 --> 09:02.000
I would recommend you have a look at it.

09:02.000 --> 09:16.000
Yeah, so this fuzzy check plugin is able to communicate not only with one of those fuzzy workers but actually with many.

09:16.000 --> 09:21.000
And this gives us the ability that we were talking about.

09:21.000 --> 09:30.000
Vsevolod and Rspamd also offer a service where you can have a subscription to this fuzzy plus or something like that.

09:30.000 --> 09:34.000
You can have your own local fuzzy.

09:34.000 --> 09:42.000
Yeah, exactly, it's enabled by default, but if you hit the limits then you can also have a subscription, and then you can have your own.

09:42.000 --> 09:47.000
And then we also thought of the idea of having this as another option.

09:47.000 --> 09:50.000
The shared Fuzzy.

09:50.000 --> 09:54.000
Yeah, so.

09:54.000 --> 10:05.000
What is important, though, is that you follow the same concepts and configuration as the people you share your fuzzy worker with.

10:05.000 --> 10:10.000
So for us it is often easy: we recommend it to our customers who are in the same field.

10:11.000 --> 10:17.000
We have somebody else who has a very similar installation, and there it would work well.

10:17.000 --> 10:22.000
And that is because it is very important that you avoid learning false positives.

10:22.000 --> 10:34.000
Yeah, one way to learn false positives in terms of fuzzy is that you use high scoring instead of reject

10:34.000 --> 10:42.000
in Rspamd to stop an email. So say I get an EXE file and say, oh, it's not allowed, and I give it 99 points.

10:42.000 --> 10:52.000
Then I will learn the content of the email and not just reject it, and therefore I have all of the text,

10:52.000 --> 10:57.000
and that is then learned, and this could then be seen as a false positive.

10:58.000 --> 11:00.000
Also.

11:00.000 --> 11:09.000
Back here: so it's perfect when you maybe want to use our Ansible roles, because there we have prepared all those,

11:09.000 --> 11:14.000
all those policies and everything.

11:14.000 --> 11:26.000
We have put all this in: like an EXE file, when your policy is that you don't accept EXE files as attachments, then there's a rule, a force rule, which rejects it.

11:27.000 --> 11:38.000
If it looks spammy, it gets a high score; that's the principle of it. And also we have a little custom script

11:38.000 --> 11:49.000
which is doing some more checks on which email can be learned and which should maybe not be learned because it looks like it could be a false positive.

11:50.000 --> 12:07.000
And on our customer side, when we have like two universities that have rolled out with our Ansible scripts, they have nearly the same policies, and in the end they have at least the same style of counting spam.

12:07.000 --> 12:20.000
And then it's very easy to let those also post into the same database of fuzzy hashes or, as we'll see later, maybe also other ideas.

12:21.000 --> 12:44.000
Yeah, when you have multiple customers or users posting into one database, those thresholds should be higher. When we now say, okay, for a personal one you maybe say it's 30, for

12:44.000 --> 13:00.000
a shared one you go with 100 or something like that. So this kind of email should be seen very often at one party, or should be seen multiple times at different parties.

13:01.000 --> 13:12.000
Yeah, one thing we very much like to do is to cap the maximum score of different groups. This is very easy in Rspamd; you can say, okay,

13:12.000 --> 13:25.000
the group RBL: when you have a very bad IP, it's listed in Spamhaus, Abusix, SpamCop and so on, so those scores would normally add up.

13:25.000 --> 13:47.000
So you would learn this email only because it's listed in every RBL. When you have a false positive in those RBLs, then you learn it just because of that.

13:47.000 --> 14:03.000
So this is why we cap the maximum score in those groups. Like here for fuzzy, we have implemented something: the default fuzzy score is 12, the default reject score is 15.

14:03.000 --> 14:22.000
So we have now made a group of all fuzzy databases together, so all those symbols can have a maximum score of 17. So it's still over the reject level, but there are also

14:22.000 --> 14:27.000
other symbols which may lower the score again.
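
NOTE
A minimal sketch of capping a group's score, assuming the numbers from the talk (12 points per fuzzy hit, a group maximum of 17, reject at 15); the function name and the negative ham score are hypothetical.
```python
def capped_group_score(symbol_scores: list[float], max_score: float) -> float:
    """Sum the scores of all symbols in a group, but never exceed max_score."""
    return min(sum(symbol_scores), max_score)
#
# Assumed case: the local and the shared fuzzy database both hit with 12 points.
fuzzy_total = capped_group_score([12.0, 12.0], max_score=17.0)
# A hypothetical ham indicator from another group can still pull the mail below 15.
final_score = fuzzy_total + (-3.0)
print(fuzzy_total, final_score)
```
Without the cap the two fuzzy hits alone would give 24 points, and no other symbol could realistically bring the email back under the reject level.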

14:28.000 --> 14:45.000
What we have also done: the local fuzzy, which you still learn next to your shared fuzzy, also has a max score, as does the shared fuzzy, of maybe about,

14:45.000 --> 14:49.000
I think we have now.

14:49.000 --> 14:55.000
Another idea would be to share something like IP reputation data.

14:55.000 --> 15:16.000
That would be very much data if we shared it for every email, so our idea is to maybe use an HTTP API and not post every spam email we get, but all its metadata, like IP addresses, domains and what we see.

15:16.000 --> 15:34.000
Rspamd has a very nice plugin, a very generic and very flexible plugin, called reputation, and in the end it's nothing more than calculating the median score for

15:34.000 --> 15:53.000
all emails it has seen with this value, and this could be a very good point to also share this data, send a report, and calculate new reputations from all of that.
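
NOTE
A minimal sketch of the median idea described here, assuming an in-memory store keyed by an arbitrary value such as an IP address; the class and method names are hypothetical and this is not the API of the Rspamd reputation plugin.
```python
from collections import defaultdict
from statistics import median
from typing import Optional
#
class ReputationStore:
    """Keep every reported score per key and answer with the median."""
    def __init__(self) -> None:
        self._scores: dict[str, list[float]] = defaultdict(list)
    def report(self, key: str, score: float) -> None:
        self._scores[key].append(score)
    def reputation(self, key: str) -> Optional[float]:
        scores = self._scores.get(key)
        return median(scores) if scores else None
#
store = ReputationStore()
for score in (2.0, 14.0, 16.0):
    store.report("192.0.2.1", score)
print(store.reputation("192.0.2.1"), store.reputation("198.51.100.7"))
```
If several peers report into the same store, each of them benefits from scores the others have already seen for that value.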

15:54.000 --> 16:11.000
Yeah, what we have also seen in the previous talks is that stuff sometimes takes a while, especially with the LLMs, and users think that mail is real-time communication.

16:11.000 --> 16:39.000
I don't know why. We have this need for deep analysis of potential threats, and the analysis tools are slow or really slow. So for ClamAV it can maybe take more than 10 seconds for a very complex PDF, or we also have this Office macro stuff, or maybe we even have a sandbox.

16:39.000 --> 16:54.000
I mean, then we're talking about minutes, and we maybe want to decrypt and so on. And despite that, there are these specialized open source tools for all of those

16:54.000 --> 17:02.000
extra things, that are available for, I mean, sandboxing, Office macro and file analysis.

17:02.000 --> 17:16.000
We actually wanted to bring in this concept of a second stage, for when we can't initially decide whether an email is good or not. So we're talking about,

17:17.000 --> 17:30.000
The question is why this is important. A normal company goes to the market: okay, I have checklists to do, and

17:30.000 --> 17:40.000
I need some kind of compliance thingy, and they go to some of the commercial

17:41.000 --> 17:45.000
vendors for sandboxes, FireEye, all the antivirus

17:48.000 --> 17:59.000
developers and so on. They all have boxes you can buy. They are pricey, but you can put one in your network, push your data, HTTP data, in, and they do it for you.

17:59.000 --> 18:09.000
What we wanted to do is to analyze emails pre-queue, before accepting the email into my infrastructure.

18:09.000 --> 18:19.000
Because as long as I reject it before it reaches my network, it's not my mail, I'm not

18:19.000 --> 18:29.000
responsible, yeah. And in the end it's easy to do, because we have been doing it for years.

18:30.000 --> 18:55.000
We are soft rejecting emails just because we didn't know their IP address from our database. And the thing is, we are doing the same here: we are doing like a pre-flight check in Rspamd; that could be, if it's fast enough, antivirus, or a small local check.

18:56.000 --> 19:10.000
And if we decide in the end, with all those indicators again, that we want to have a second look, a deep analysis, we post it to the second stage scanner.

19:11.000 --> 19:33.000
In the second stage scanner we can now have those sandboxes and file analyzers, which can also run for minutes, or you could have Office macro analysis, long-running GPT queries, all that other stuff.

19:33.000 --> 19:45.000
Yeah, so the problem of time and cost was already explained, and we can actually circumvent it by using this greylisting, and

19:46.000 --> 20:03.000
I mean, Rspamd thinks in milliseconds, users in seconds, and so we can already see here that oletools is already flagged as slow, with a slow timer, because it's more than 300 milliseconds.

20:04.000 --> 20:13.000
And so with this idea we have the possibility of actually doing this thorough scan when needed.

20:13.000 --> 20:42.000
So the typical greylisting, the typical soft reject, gives us four or five more minutes normally, and in this time we can perfectly fit in sandboxing. You can do sandboxing in one minute, this is what all the commercial vendors are saying, but there are all those tricks against sandboxing, like

20:43.000 --> 20:53.000
one of those trojans doing nothing for the first two minutes or so, so you may want to invest some more seconds in that.

20:56.000 --> 20:59.000
Yeah, that's already been said, I think.

20:59.000 --> 21:08.000
So when the email comes back the second time, we query the second stage scanner again and get a full report.

21:09.000 --> 21:13.000
Based on this report we can then decide what to do with the email.

21:14.000 --> 21:17.000
We can still reject it, or we can accept it,

21:19.000 --> 21:21.000
however we like.
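
NOTE
A minimal sketch of the two-stage flow described above, assuming a toy in-memory scanner keyed by a message digest; all names and verdict strings are hypothetical, and a real setup would talk to an external scanner service.
```python
from dataclasses import dataclass, field
#
@dataclass
class SecondStage:
    """Toy stand-in for a second stage scanner."""
    reports: dict[str, str] = field(default_factory=dict)
    pending: set[str] = field(default_factory=set)
    def submit(self, digest: str) -> None:
        self.pending.add(digest)
    def finish(self, digest: str, verdict: str) -> None:
        self.pending.discard(digest)
        self.reports[digest] = verdict
#
def handle_delivery(digest: str, suspicious: bool, stage: SecondStage) -> str:
    """Decide per delivery attempt: accept, reject, or soft reject (come back later)."""
    if digest in stage.reports:
        return "reject" if stage.reports[digest] == "malicious" else "accept"
    if digest in stage.pending:
        return "soft reject"
    if suspicious:
        stage.submit(digest)
        return "soft reject"
    return "accept"
#
stage = SecondStage()
print(handle_delivery("d1", True, stage))
stage.finish("d1", "malicious")
print(handle_delivery("d1", True, stage))
```
The first attempt only buys time for the deep scan; the final decision is made when the sending server retries and the report is available.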

21:21.000 --> 21:34.000
Yeah, also extracted data could be pushed back from the second stage scanner into Rspamd, like IP addresses, domains, addresses and everything, and could then be queried again.

21:34.000 --> 21:36.000
That's what we're talking about.

21:38.000 --> 21:40.000
Yeah, last slide, it seems.

21:42.000 --> 21:47.000
What's the status of this project? We have implemented it; it is running in production.

21:48.000 --> 21:53.000
The second stage scanner is, let's say, a productive POC.

21:54.000 --> 21:56.000
It's running nicely.

21:57.000 --> 22:11.000
But let's say sandboxing was more effective in the time of Emotet. Currently we don't find so many threats;

22:12.000 --> 22:18.000
we have not found many threats which are not also covered by antivirus.

22:19.000 --> 22:22.000
But let's see what happens next time.

22:23.000 --> 22:25.000
Thank you.

22:28.000 --> 22:29.000
Okay.

22:29.000 --> 22:33.000
Any questions here? One, two, three, four, okay, five.

22:35.000 --> 22:37.000
I'm waiting for.

22:38.000 --> 22:39.000
Okay.

22:39.000 --> 22:40.000
You will.

22:40.000 --> 22:41.000
Yes.

22:41.000 --> 22:55.000
So the question is whether we defer mail and keep it while we scan for a longer time,

22:56.000 --> 23:02.000
and the question whether that is legal comes up. So basically we haven't decided

23:03.000 --> 23:06.000
yet if this actually is a proper email.

23:06.000 --> 23:09.000
Yeah so during the scan process.

23:10.000 --> 23:21.000
During the email process, we're waiting until the DATA phase, and then we can see if we have the early indicators, and if we haven't decided yet, then we say: we don't know, come back later.

23:22.000 --> 23:26.000
And if they're not coming back later, we're not doing anything with that email; it will never be delivered.

23:27.000 --> 23:31.000
Yeah, so we will not deliver an email that we deferred.

23:32.000 --> 23:34.000
Yeah so in that sense.

23:35.000 --> 23:36.000
Yes.

23:45.000 --> 23:50.000
Maybe the question was whether, if we defer mail, we keep the data in our memory.

23:51.000 --> 23:52.000
Yeah.

23:52.000 --> 23:53.000
Yeah.

23:53.000 --> 23:54.000
So we have.

23:55.000 --> 24:02.000
a hard disk which is running full, and it's a temp file that is staying there.

24:03.000 --> 24:06.000
It's a state so when you when you really want to go into it.

24:07.000 --> 24:09.000
It's it's a.

24:10.000 --> 24:11.000
I'll say.

24:12.000 --> 24:13.000
gray area, maybe.

24:14.000 --> 24:15.000
Yeah.

24:16.000 --> 24:17.000
So.

24:18.000 --> 24:20.000
Yeah, but this is just about the security.

24:23.000 --> 24:35.000
What fraction of emails goes through the second stage? And then the second question: can you say a few words about privacy

24:35.000 --> 24:38.000
concerns as far as sharing goes?

24:39.000 --> 24:48.000
Okay, so the first question was what fraction of emails goes to the second stage scanner, and the second was

24:48.000 --> 24:54.000
whether we can share a few words regarding privacy with fuzzy hashes.

24:54.000 --> 24:57.000
So, this is configurable.

24:57.000 --> 25:00.000
We can say what kind of indicators we want.

25:00.000 --> 25:02.000
So, we definitely do not want everything.

25:02.000 --> 25:06.000
An email that looks like perfect ham should not go.

25:06.000 --> 25:08.000
And these thresholds are configurable.

25:08.000 --> 25:12.000
And it is up to us to decide what needs another look.

25:12.000 --> 25:16.000
The question is, what's the typical order of magnitude for this?

25:16.000 --> 25:20.000
How often is the second stage going to be invoked?

25:20.000 --> 25:28.000
So, what we have done: we never send attachments which are on the banned list.

25:28.000 --> 25:35.000
Also, there are many whitelisted ones, like signatures and stuff.

25:35.000 --> 25:43.000
And we send over what could possibly be dangerous:

25:43.000 --> 25:49.000
Office files, PDFs and all this stuff.

25:49.000 --> 25:52.000
We do use that in production.

25:52.000 --> 25:56.000
So, there is some data on all of this stuff.

26:00.000 --> 26:07.000
What percentage of incoming mail is being sent there, to the second stage?

26:07.000 --> 26:08.000
I don't know.

26:08.000 --> 26:12.000
We can get back to you with the statistics there.

26:12.000 --> 26:16.000
And the second question, and maybe also the last because we're running out of time,

26:16.000 --> 26:24.000
was about a few words on privacy with these fuzzy hashes.

26:24.000 --> 26:32.000
I think you cannot calculate back what the text was.

26:32.000 --> 26:34.000
But privacy is always an issue.

26:34.000 --> 26:40.000
You have to think about it if you query data against

26:40.000 --> 26:42.000
external databases.

26:42.000 --> 26:50.000
So, when you query hashes, attachment hashes, against Spamhaus,

26:50.000 --> 26:52.000
it's the same.

26:52.000 --> 27:01.000
So, in the end, Spamhaus could have a view of where this mail comes from.

27:01.000 --> 27:09.000
If you, for example, send it to me, and we both query the same database, then,

27:09.000 --> 27:16.000
So, the database then knows that you have sent me something.

27:16.000 --> 27:17.000
Okay.

27:17.000 --> 27:19.000
Two very quick questions for you.

27:19.000 --> 27:27.000
For the second stage, do you also look at QR code phishing?

27:27.000 --> 27:28.000
Yes.

27:28.000 --> 27:29.000
Yes.

27:29.000 --> 27:30.000
Yes.

27:30.000 --> 27:34.000
The question was whether we have, for the second stage,

27:34.000 --> 27:37.000
something for QR code phishing.

27:37.000 --> 27:38.000
Yes.

27:38.000 --> 27:39.000
Yes.

27:39.000 --> 27:44.000
There is all this, QR codes, OCR, and so on.

27:44.000 --> 27:49.000
This one is called Cortex, and so on.

27:49.000 --> 27:52.000
There are those multi-file analyzers, say;

27:52.000 --> 27:55.000
they have plugins for everything like that.

27:55.000 --> 27:56.000
So, yeah.

27:57.000 --> 28:04.000
Some senders generate the body at send time.

28:04.000 --> 28:09.000
So, if you give them a four-minute deferral,

28:09.000 --> 28:13.000
they would come back half an hour later with a different body.

28:13.000 --> 28:18.000
Just that, though, off your.

28:18.000 --> 28:23.000
The question was what happens when the email is generated on the fly,

28:23.000 --> 28:28.000
and the body is maybe different on the next try

28:28.000 --> 28:31.000
after we soft reject the email.

28:31.000 --> 28:36.000
The second stage scanner is mostly based on attachments.

28:36.000 --> 28:37.000
Okay.

28:37.000 --> 28:38.000
Okay.

28:38.000 --> 28:39.000
Thank you very much.

28:39.000 --> 28:40.000
Yes.

28:40.000 --> 28:41.000
Thank you.

