WEBVTT

00:00.000 --> 00:18.560
Hello everyone. So this is the third year we have this wonderful room at FOSDEM, and this is

00:18.560 --> 00:28.320
the second year I'm going to talk about LLMs in Rspamd. So since last year's talk, when I was

00:28.400 --> 00:37.760
talking about using LLMs for Rspamd, a year has passed, and I would say that I'm still a bit

00:37.760 --> 00:45.920
skeptical about a full replacement of Rspamd, because LLMs are still a bit unstable and still quite

00:45.920 --> 00:52.080
costly; you can ask multiple times and get different, opposite results. And what's more important,

00:52.720 --> 00:59.760
there are some privacy concerns that are not solved, and for sure there are problems with explaining

00:59.760 --> 01:06.160
why this message got rejected or not, because you can't simply get into an LLM deeply enough

01:06.160 --> 01:13.680
to understand why this decision has been made. But for sure we have added more support for LLMs in

01:13.680 --> 01:19.600
Rspamd: in particular, we added LLM context, we added search context, so it's

01:19.600 --> 01:26.960
in general possible to present to the LLM not only the current message but also the history of previous

01:26.960 --> 01:34.480
emails, and that was quite useful. But that's still one thing; what has changed in the past year

01:34.480 --> 01:40.880
is that development with LLMs has changed dramatically, so it's now a completely different thing;

01:41.680 --> 01:49.120
some people are thinking about it like the move from assembly to compiled languages, and where

01:49.120 --> 01:55.920
we previously wrote code manually, now you can just write a prompt and get the code from

01:56.320 --> 02:05.280
it. So basically, in 2024, what did we have? We had code completion, we had Copilot,

02:05.280 --> 02:14.240
and that was quite good: basically you write small things and get help from a system

02:14.240 --> 02:20.560
that can check your code and suggest something, good or not. But it worked in this way: it was

02:20.640 --> 02:32.000
mostly assisted coding, autocompletion; for sure it was mostly for one-file edits, and also

02:32.000 --> 02:37.120
it could do some documentation look up, so you start to type and you get documentation about

02:37.120 --> 02:44.400
this and that function. In 2025 it's a completely different story: basically you have

02:44.640 --> 02:56.720
your LLM as a full assistant that writes code, that can design architecture, that can do multi-file

02:56.720 --> 03:05.840
refactoring and things like that: writing tests, testing stuff, debugging issues,

03:05.840 --> 03:13.680
fixing them, doing reasoning, making commits, doing a proper workflow. So finally I get

03:13.680 --> 03:21.120
proper commit messages, for example; that's a big, big thing. And for sure it can do lots of

03:21.120 --> 03:28.560
projects. So literally my current workflow is the following: I have two console tools, Claude

03:28.560 --> 03:37.040
Code and Factory Droid, and I'm using them with different models. So I'm using git work

03:37.120 --> 03:46.800
trees for doing stuff independently from the main branch, and different models can work in different

03:47.840 --> 03:54.400
worktrees and do their own stuff there, isolated from the main environment. It's very useful

03:54.400 --> 04:02.000
because in the end you get results, you review them carefully and you select what to keep. And also

04:02.000 --> 04:09.680
it's good to run different subagents, because it's a big saving of tokens and of context, and

04:09.680 --> 04:16.320
if you work on some huge project, it's good to split it into small parts and refactor them independently

04:16.320 --> 04:23.760
with different models in different worktrees. But I guess many of you have been developing with LLMs after some

04:24.240 --> 04:33.440
experience and have ended up with something similar. So the main gain, and I think the main advantage,

04:33.440 --> 04:39.440
of all this assisted coding is that you can prototype fast. I have lots of ideas, and one of the

04:39.440 --> 04:45.920
biggest issues is that some ideas are too large to implement; I'm going to talk a bit later

04:45.920 --> 04:53.600
about some examples of these large ideas. But now you can just do this, this and this, and if you don't

04:53.680 --> 04:59.120
like it, you can just throw out the code and have no regrets about it. That's a big thing,

05:02.640 --> 05:08.400
and you can actually explore different things because if something doesn't work you can always go back

05:08.400 --> 05:12.800
and build something else. And again, you don't have any regrets about missed

05:12.800 --> 05:19.360
opportunities, about lost time, about wasted movements of your fingers to write this code, and that's

05:19.600 --> 05:25.680
very good. Another big advantage is really better documentation. English is not my first language

05:25.680 --> 05:32.960
and I'm not very good at describing things, and that's, I think, a problem for any author of software:

05:32.960 --> 05:39.600
you know your software better and you imply a lot of things that are not apparent to other

05:39.600 --> 05:44.480
people, and you get angry when you get stupid questions. But they're not stupid, because basically it's

05:44.560 --> 05:50.000
your knowledge: if you apply your knowledge to the question it looks stupid, but it's not stupid,

05:50.000 --> 06:00.480
because you never explicitly wrote it down. And one of the biggest helps was automating boilerplate stuff,

06:00.480 --> 06:07.920
for example the website deployment. Nobody really wants

06:08.080 --> 06:14.000
to program in YAML: you need to do deployments and test deployments, and this whole pipeline is

06:14.000 --> 06:20.640
now absolutely fixed by LLMs. So if my server gets crashed or revoked or hacked, I can easily

06:20.640 --> 06:28.080
rebuild it now, because previously it was like secret knowledge that there was nowhere to

06:28.080 --> 06:32.640
look up, and you have no documentation, nothing like that. Because, as I said, Rspamd is

06:32.640 --> 06:38.800
a hobby project, and growing it into something bigger, into a

06:38.800 --> 06:45.520
fully open-source project, was a bit painful, mainly around side projects such as documentation,

06:45.520 --> 06:56.560
analytics and stuff like that. A real disadvantage is that you are a bit out of context: you

06:56.560 --> 07:01.840
have multiple models that are doing their stuff and sometimes you don't even understand what you

07:01.840 --> 07:13.520
are doing right now. So yeah, sometimes some conversations can be very difficult. I remember one issue:

07:13.520 --> 07:20.720
I got a reply from Claude where it told me, like, "you know it better, please advise me where

07:20.720 --> 07:28.720
to go". So that was a very open conversation, and this was a very, very long thing. And sometimes it

07:28.720 --> 07:36.640
starts to produce very simplistic solutions. Quite recently I found some bug and tried to debug it,

07:36.640 --> 07:43.040
and Claude decided: let's compare the pointer to the function, and if this pointer has a bad pattern,

07:43.040 --> 07:48.560
it's a bad pointer. So of course it's a bad pointer, but the question is not whether it's bad or not;

07:48.560 --> 07:55.040
the question is why it's bad. You need to be very clear about that. And for

07:55.120 --> 08:04.560
sure, it's impossible to fully build large projects without human attention, because AI attention,

08:05.760 --> 08:11.440
it's a mathematical thing, and it's quite limited, I would say. So the LLM is not an expert in

08:11.440 --> 08:19.680
Rspamd; the LLM is an expert, probably, in writing code, but not in yours specifically. And for sure,

08:20.240 --> 08:30.720
you have the domain expertise. And one thing that is quite important for open source

08:30.720 --> 08:40.960
as a whole is that actually your own value is a bit lower now, because many people can actually

08:40.960 --> 08:47.760
check the same things. So if you have a question about Rspamd... I think yesterday we asked

08:47.840 --> 08:54.320
one question to Claude Code and it found a wrong solution, but when we fixed the prompt with something like

08:54.320 --> 09:00.240
"check the documentation and do it properly", it created a proper solution with, like, a multimap with

09:00.240 --> 09:06.960
these two conditions: get this From address and get this recipient, and match them. Very simple things,

09:06.960 --> 09:12.960
like a simple rule that many people write in their day-to-day job to filter some specific

09:12.960 --> 09:18.720
spam or apply some specific policies. But now it's possible to generate this code, though unfortunately not

09:18.720 --> 09:24.960
directly: if you ask it, like, "write me a rule to do this and this and this",

09:25.680 --> 09:37.440
usually it's wrong, it's just wrong. So actually, what are not problems, and what is usually

09:37.760 --> 09:42.800
addressed well in development, is that code bloat is not a problem, because if you write the prompt carefully,

09:42.800 --> 09:49.120
if you review carefully, if you ask it to refactor bad places, it's okay. Same for LLM slop:

09:49.120 --> 09:54.800
you can actually ask it before commit to fix and clean everything up, it's okay. Hallucinations are

09:54.800 --> 10:01.200
again a problem, but you know that it's crap and you can fix it; again, you can review, and actually

10:01.200 --> 10:05.920
you can ask another model to check the documentation, check the code, do a review, find

10:05.920 --> 10:12.720
these hallucinations and fix them. So it's not a real problem. Here is a short list of different

10:12.720 --> 10:19.760
prototypes that I've done over the year, and not many of them are parts of Rspamd itself, because

10:19.760 --> 10:25.760
there are lots of things around it: how to manage documentation, how to build a new site, how to build

10:25.760 --> 10:32.960
analytics, how to write client integrations in different languages. Which is also quite interesting, because,

10:33.040 --> 10:40.000
for example, I don't know Go, but many people ask me about a native client for Go, and it was

10:40.000 --> 10:44.320
very simple to convert the Rust client to a Go client, with all the HTTPCrypt stuff and so on,

10:45.040 --> 10:49.920
so it was quite good. For Lua, for example, we finally have interfaces for

10:49.920 --> 10:59.840
our internal structures, and that was quite useful. So, okay, I'm done with the first part, and now I'm

10:59.840 --> 11:10.160
going to talk in detail about some of the significant projects that were done over the year. One thing that

11:10.160 --> 11:20.640
I had been thinking about for maybe two years was adding fuzzy detection for HTML structure.

11:21.840 --> 11:28.640
The problem is that a lot of spam is built from templates, and these templates have different texts,

11:28.960 --> 11:37.600
different addressing, like "Dear somebody", "Dear someone else", different text and stuff like that;

11:37.600 --> 11:43.280
sometimes they have different personalized links with the same domain but different

11:43.280 --> 11:51.520
hashes, different queries. And the idea was to build a fuzzy hash on top of the HTML structure only: get all

11:51.600 --> 11:58.880
the tags, extract features from them, like what tags, what attributes, what styles are used. And what's more

11:58.880 --> 12:08.000
important, we also include so-called call-to-action domains. Again, a call-to-action thing is actually

12:08.000 --> 12:15.040
what would be clicked by the user: like, what domain would get clicked from this email, with some

12:15.120 --> 12:20.640
button like Subscribe, Go, whatever. Because there are lots of links that are not

12:20.640 --> 12:25.440
relevant: for example, you have a footer with something like Facebook and policy links and stuff like that, but

12:25.440 --> 12:33.360
these call-to-action links are the relevant ones, and we only check those. Then we use the same algorithm for

12:33.360 --> 12:40.080
fuzzy hashes that we already use: shingles. It's like a min-hash trick, so it's quite simple and proven,

12:40.720 --> 12:48.080
and then you find similarities between different shingle sets and compare how close they are,

12:48.080 --> 12:54.400
doing some matching: you have a hash with shingles and you match them either directly, if the email

12:54.400 --> 13:01.920
had exactly the same structure, or a bit indirectly, if we have some common parts but not all of them.
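The shingle matching just described can be sketched roughly like this; it is a toy min-hash over tag n-grams, not Rspamd's actual implementation, and all names and parameters here are invented for illustration:

```python
import hashlib

def shingles(tokens, n=3, buckets=32):
    """Min-hash style fingerprint: for each of `buckets` hash functions
    (simulated by salting one hash), keep the minimum hash value over
    all n-gram shingles of the token stream."""
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    fp = []
    for salt in range(buckets):
        fp.append(min(
            int.from_bytes(hashlib.blake2b(
                f"{salt}:{g}".encode(), digest_size=8).digest(), "big")
            for g in grams))
    return fp

def similarity(a, b):
    # Fraction of matching buckets approximates the Jaccard
    # similarity of the two shingle sets.
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Two template emails with a tiny structural difference share most shingles;
# a structurally different email shares almost none.
t1 = "html body div p a img div p a table tr td".split()
t2 = "html body div p a img div p span table tr td".split()
t3 = "html head title body h1 ul li li li".split()
assert similarity(shingles(t1), shingles(t2)) > similarity(shingles(t1), shingles(t3))
```

Direct matches correspond to identical fingerprints; partial bucket overlap gives the "a bit indirect" match described above.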

13:01.920 --> 13:07.440
But what's important is that we include the call-to-action in the hash over everything, so if the call-to-action changes,

13:07.600 --> 13:13.680
it's a different hash. The reason for that was to distinguish phish from normal email, because sometimes

13:13.680 --> 13:20.240
spammers just take legit emails and just swap the domain, and that's it; but we need to distinguish them

13:20.240 --> 13:27.760
from the normal ones. So why is it important? Because we can do a proper whitelisting thing with fuzzy

13:27.760 --> 13:36.960
hashes: we can catch brand emails, shipping and order confirmations, one-time codes, alerts. And that's

13:37.040 --> 13:43.440
the whitelisting part, and it's important to recognize legit emails that come from known senders.

13:46.000 --> 13:51.280
For example, let's imagine a social network like Facebook or LinkedIn: they have different

13:51.280 --> 13:57.280
content, but the emails are templated and always the same, and it's good to find stuff like that for

13:57.280 --> 14:02.240
whitelisting, because for a good spam filtering system whitelisting is even more important than

14:02.240 --> 14:08.720
blacklisting, to be honest; but for sure it's okay for blacklisting as well. So we did some real

14:08.720 --> 14:17.280
life testing, and it really works well with templated emails, and they still exist and are still

14:17.280 --> 14:25.120
widespread. Some examples of that: lottery scams, phishing, template-based fraud, stuff like that.

14:25.440 --> 14:36.880
So in general that was a very good project, and I would say that if I had written the full code on my own,

14:36.880 --> 14:43.760
it would have taken maybe two or three times longer to implement. Now it's up and running, and it's okay.

14:46.240 --> 14:53.520
Yes, and the CTA part was just an extra benefit, because now we can get the real links and distinguish them from

14:53.520 --> 15:02.000
the insignificant links. "How do you get those?" Oh sorry, I don't know how to go back.

15:03.440 --> 15:10.160
So, significant links are usually detected by belonging to specific classes, like button,

15:11.280 --> 15:21.440
or they are highly visible by CSS, and they are not in the footer. So we are trying to detect

15:21.600 --> 15:27.040
these by heuristics; but again, for sure, it's a set of heuristics and it's possible to cheat

15:27.040 --> 15:32.880
them somehow, but for now spammers are not trying to break it, probably because it's just too new.
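A rough sketch of such a heuristic, with invented class names and rules (the real detector also weighs CSS visibility and much more):

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class CtaExtractor(HTMLParser):
    """Collect links that look 'clickable': they carry a button-like
    class and are not inside a <footer>. Purely illustrative."""
    BUTTONISH = ("btn", "button", "cta")

    def __init__(self):
        super().__init__()
        self.in_footer = 0
        self.cta_domains = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "footer":
            self.in_footer += 1
        elif tag == "a" and not self.in_footer:
            cls = (a.get("class") or "").lower()
            if any(b in cls for b in self.BUTTONISH):
                host = urlparse(a.get("href") or "").hostname
                if host:
                    self.cta_domains.append(host)

    def handle_endtag(self, tag):
        if tag == "footer":
            self.in_footer -= 1

p = CtaExtractor()
p.feed("""<html><body>
<a class="btn primary" href="https://pay.example.com/go?id=1">Subscribe</a>
<footer><a href="https://facebook.com/brand">Facebook</a></footer>
</body></html>""")
# Only the call-to-action domain survives; footer links are ignored.
print(p.cta_domains)  # ['pay.example.com']
```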

15:35.280 --> 15:42.480
Another project was, surprisingly, a failed Google Summer of Code project, and to be honest

15:42.480 --> 15:47.760
I'm not quite sure how Google Summer of Code is planning to proceed this year, because of

15:47.840 --> 15:57.840
all of that; I really don't understand. So historically we had only spam/ham classification,

15:57.840 --> 16:04.640
and the idea was basically to extend this to multiclass. And for sure, the mathematics behind

16:04.640 --> 16:11.600
Bayes' theorem is not strictly tied to binary classification: you can do one-versus-all,

16:12.480 --> 16:21.760
which is actually what is used, or something else. And it's good to check not only spam/ham but also

16:21.760 --> 16:27.200
to mark different other labels, such as transactional mail, social emails, personal emails,

16:27.200 --> 16:37.680
and so on and so forth. And yes, now we have multiclass, but one approach is actually wrong: I personally

16:37.680 --> 16:41.840
don't suggest mixing spam/ham with the other classes; I personally suggest having

16:41.840 --> 16:47.920
different classifiers: one binary for spam/ham, and one for the other labels.

16:49.920 --> 16:55.920
Same thing otherwise: basically you get corpora and train them via the command line. So this is the whole

16:55.920 --> 17:01.840
architecture, basically: you get a message, you tokenize and split it into bigrams, add some

17:01.840 --> 17:09.360
meta-tokens, and then you have cache lookups on the Redis side; it's all implemented as

17:09.360 --> 17:16.880
Redis Lua scripts, and on Redis you just query all the classes there and you get multiple results afterwards.

17:16.880 --> 17:24.640
So basically, again, two calls become one call: you call it once with the whole pipeline of tokens, and you

17:24.960 --> 17:31.280
run the Lua scripts to get back the weights, and then you take these weights and basically get

17:31.920 --> 17:42.240
the result, for example "it's spam". It's okay. Yeah, so you can for sure do this; there is
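In spirit, the flow just described looks like this sketch, where a plain dict stands in for the Redis-side store and the token weights are made up:

```python
# Toy stand-in for the Redis class/token store. In production this is
# one Lua script call that takes the whole pipeline of tokens at once.
STORE = {
    ("free", "money"): {"spam": 5.0, "transactional": 0.1},
    ("your", "order"): {"spam": 0.2, "transactional": 4.0},
}

def bigrams(words):
    # Tokenize into overlapping word pairs.
    return list(zip(words, words[1:]))

def classify(text):
    """One 'call' with all tokens, multiple per-class scores back,
    i.e. one-versus-all over independent token weights."""
    scores = {}
    for tok in bigrams(text.lower().split()):
        for cls, w in STORE.get(tok, {}).items():
            scores[cls] = scores.get(cls, 0.0) + w
    return max(scores, key=scores.get) if scores else "ham"

print(classify("claim your FREE money now"))  # -> spam
print(classify("your order has shipped"))     # -> transactional
```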

17:42.240 --> 17:47.920
all the tooling, but for sure it's not, like, out of the box; it's a brand-new feature and it takes

17:47.920 --> 17:53.760
some time to get into, let's say, mailcow. If it's in mailcow, it gets extensively tested by

17:53.760 --> 18:04.000
many users. So it's a good thing. Yeah, okay, I don't have much time to talk about it, but one thing that was

18:04.000 --> 18:12.960
a long-standing project was to convert the fuzzy protocol from UDP to TCP, basically to dodge

18:12.960 --> 18:21.760
broken port-scan detection logic at hosting providers; and in October there was a big incident that affected thousands

18:21.760 --> 18:27.280
of users around the world, and I even had to write a mail like "you broke the internet for

18:27.280 --> 18:35.680
half of the world". Yeah, and the idea was basically to switch it to TCP, and it was also quite

18:35.680 --> 18:43.120
difficult. On one side it sounds simple, like, let's convert some protocol from UDP to TCP,

18:43.120 --> 18:48.800
but in general it was very difficult, because you need to think about lots of things: how to switch,

18:48.800 --> 18:55.040
how to fall back, what to do if your TCP connection is dropped, how to frame things inside a

18:55.040 --> 19:01.840
TCP stream, how to multiplex between different connections. So it was quite a big project, but I hope
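Framing inside a TCP stream is usually the classic length-prefix trick; here is a minimal sketch of the idea, not the actual fuzzy wire format:

```python
import struct

def frame(payload: bytes) -> bytes:
    """Prefix each message with a 4-byte big-endian length so that
    message boundaries survive inside a TCP byte stream."""
    return struct.pack(">I", len(payload)) + payload

def unframe(stream: bytes):
    """Split a concatenated byte stream back into complete messages."""
    msgs, off = [], 0
    while off + 4 <= len(stream):
        (n,) = struct.unpack_from(">I", stream, off)
        if off + 4 + n > len(stream):
            break  # incomplete frame: wait for more bytes
        msgs.append(stream[off + 4: off + 4 + n])
        off += 4 + n
    return msgs

wire = frame(b"fuzzy check A") + frame(b"fuzzy check B")
print(unframe(wire))  # -> [b'fuzzy check A', b'fuzzy check B']
```

UDP gives you this for free (one datagram, one message), which is exactly why the conversion was more work than it sounds.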

19:01.840 --> 19:08.880
in the future it will be much, much better. I had started this as my own project, but I never finished

19:08.880 --> 19:15.520
the client part. Why? Because it's just a mess: you start to implement it, and at some point

19:15.600 --> 19:23.680
you just lose all hope of making it anyhow usable for production. I think it's a common thing

19:23.680 --> 19:30.400
when you're trying to refactor a big chunk of code and at some point you just give up. Yes

19:33.520 --> 19:40.640
And the final project; hopefully I have five minutes left. It's one of the most interesting,

19:40.640 --> 19:48.720
not only because I implemented it most recently, but also because there was a talk in

19:48.720 --> 19:55.920
the Matrix room where my previous experiments with using embeddings for classification were not

19:55.920 --> 20:04.000
so successful. But I built more prototypes around it, and what's more important, I've reworked

20:04.080 --> 20:14.080
this part: previously it was a two-layer neural network without normalization layers and,

20:14.080 --> 20:21.440
I think, a softmax at the end; but now I have a dropout layer and a normalization layer, and I switched the activation from

20:21.440 --> 20:30.320
ReLU to GELU and changed the normalization; so, a lot of tuning around my own classification

20:30.320 --> 20:39.600
engine, and it worked very well. So basically what we're doing now: we get the displayed text part; like,

20:39.600 --> 20:45.920
if it's HTML in multipart/alternative, we select the HTML part; if it has some large

20:46.000 --> 20:50.400
HTML attachment and nothing else, we also select the attachment; if we have only plain text,

20:50.400 --> 20:57.840
we select the text part. Then we do so-called smart trimming of the text: for

20:57.840 --> 21:02.320
example, if you have a long chain of replies, and I've seen messages with megabytes of replies,

21:02.320 --> 21:09.760
we trim all quotations, we trim all replies, we trim all signatures like "with best regards, name" and

21:09.760 --> 21:18.640
stuff like that, so we basically get the meaningful part of the text. Then we get all the other possible features,
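Such trimming can be approximated with a few regexes; this is a toy version, and the real heuristics handle far more variants ("On ... wrote:" headers, localized signatures, and so on):

```python
import re

QUOTE = re.compile(r"^\s*>")
SIG = re.compile(r"^\s*(--\s*$|best regards|kind regards|sent from my)", re.I)

def trim_body(text: str) -> str:
    """Keep only the author's own words: drop quoted ('>') lines
    and stop at the first signature-like marker."""
    kept = []
    for line in text.splitlines():
        if SIG.match(line):
            break
        if not QUOTE.match(line):
            kept.append(line)
    return "\n".join(kept).strip()

msg = """Thanks, that fixed it!
> On Monday you wrote:
> please try the patch
Best regards,
Alice"""
print(trim_body(msg))  # -> Thanks, that fixed it!
```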

21:18.640 --> 21:27.840
for example symbols and meta-tokens, and for the text we generate embeddings with part of an LLM:

21:27.840 --> 21:35.760
it's just the compressor part, and instead of text it outputs embeddings. That's the most expensive part.

21:36.160 --> 21:42.640
Then we get all the features from the emails, and we can use these features as normalized

21:42.640 --> 21:50.960
vectors: we fuse all these features together, normalize and weight them, and feed them to a local,

21:51.680 --> 21:59.680
automatically or manually trained multi-layer network with this whole architecture. And the results
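The fusion step might look roughly like this pure-Python sketch; the weights here are random, the dimensions are tiny, and the activation is the common tanh approximation of GELU, so everything is illustrative only:

```python
import math, random

def l2_normalize(v):
    # Scale a vector to unit length (guarding against all-zero input).
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def fuse(embedding, meta_features, meta_weight=0.5):
    """Normalize each feature group separately, scale the hand-made
    meta features, and concatenate with the text embedding."""
    return l2_normalize(embedding) + [meta_weight * x
                                      for x in l2_normalize(meta_features)]

def mlp(x, w1, w2):
    """Minimal two-layer network with a GELU-like activation."""
    h = [sum(wi * xi for wi, xi in zip(row, x)) for row in w1]
    h = [0.5 * v * (1 + math.tanh(0.7978845608 * (v + 0.044715 * v ** 3)))
         for v in h]
    return [sum(wi * hi for wi, hi in zip(row, h)) for row in w2]

random.seed(0)
x = fuse([0.1] * 8, [3.0, 0.5])   # 8-dim fake embedding + 2 meta features
w1 = [[random.uniform(-1, 1) for _ in x] for _ in range(4)]
w2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
print(mlp(x, w1, w2))  # two logits, e.g. spam vs ham
```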

21:59.680 --> 22:07.360
were very, very good, actually. The only thing is that LLM embeddings are quite expensive, so that's

22:07.360 --> 22:15.040
the only drawback; but it's worth considering them not only for classification but also for storing

22:15.040 --> 22:26.000
these vectors for things like full-text search, semantic search, whatever. So these tokens are important,

22:26.080 --> 22:31.920
and what's more important is that this information is not personal data, so you can store the vectors

22:31.920 --> 22:38.480
and train on them and do everything afterwards without the content of the emails, and that's

22:38.480 --> 22:46.160
quite good for privacy protection; at least I believe so, as you can't actually recover the message from the tokens.

22:46.160 --> 22:57.040
Yes, so there are some results with it, tested independently with some of our

22:57.040 --> 23:06.480
Rspamd users: I've tested it on my own corpus, and another person has tested it on his own corpus.

23:06.480 --> 23:13.920
I would say that the numbers are better than Bayes: Bayes is a couple of percent worse. But what's

23:14.000 --> 23:21.520
more important is that this architecture, I would say, is more scalable and more

23:21.520 --> 23:29.360
interesting for future use. And considering that embedding models are usually multilingual,

23:30.320 --> 23:37.600
in the multi-language case it will definitely outperform Bayes, because Bayes can't distinguish between

23:37.840 --> 23:49.040
languages in general. Yes, I have also tested this architecture on CPU, and yes, GPU is

23:49.040 --> 23:53.760
tens to hundreds of times faster, especially if you do some sort of batching, especially if you

23:54.800 --> 24:01.680
optimize the models; and a GPU can handle multi-language models. And what's more important, you can actually

24:02.560 --> 24:08.640
speed it up even more by splitting by language: for example, if you have mail in one

24:08.640 --> 24:14.240
specific language, you can feed it to the GPU and leave the rest to whatever you want, or just skip it,

24:14.960 --> 24:21.760
and this can be quite efficient. Yes, but for a personal server, I think, when this feature comes

24:21.760 --> 24:28.800
to, again, mailcow or stuff like that, you'll be able to test it more extensively. On CPU it's still

24:28.800 --> 24:35.680
viable, but when I tried it on my MacBook I decided to rent a GPU, just to spare my laptop.

24:39.280 --> 24:46.320
Yeah, and one important thing about embeddings is that they are stable: when you get the same

24:46.320 --> 24:55.200
message, you get the same embeddings. And yes, it's much easier to deploy just the embedding part of the
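That stability makes memoizing the expensive embedding call trivial; a sketch, where `fake_embed` stands in for the real embedding model:

```python
import hashlib

_cache = {}

def fake_embed(text: str):
    # Deterministic stand-in for the expensive LLM embedding call.
    h = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in h[:8]]

def embed_cached(text: str):
    """Same message -> same vector, so the expensive call can be
    memoized by a hash of the (trimmed) text."""
    key = hashlib.sha256(text.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = fake_embed(text)
    return _cache[key]

v1 = embed_cached("same message")
v2 = embed_cached("same message")
assert v1 is v2  # the second call hits the cache
```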

24:55.200 --> 25:04.640
network than a full LLM. Oh, and I think that's it.

25:13.840 --> 25:19.840
So, if you have any questions...

25:19.840 --> 25:39.680
What do you mean? Yeah, maybe next year I won't be giving the presentation myself. For example,

25:39.680 --> 25:47.120
this one, the slides were created, again, by Claude: it ran AppleScript to build the slides in

25:47.120 --> 25:53.200
Keynote, because that's the slide creation tool that I'm using. And all the diagrams were also kind of

25:53.200 --> 26:01.840
created by Claude, like, typed into the diagrams that are presented here. Maybe next year I won't need to talk

26:01.840 --> 26:12.400
at all; so maybe we have, like, the last year of human-to-human communication, I don't know.

26:13.360 --> 26:37.200
Any other questions? So, in Bayes, it's the same kind of mathematical

26:37.200 --> 26:44.240
operation: when you relearn, you reduce the weights of tokens in all the other classes and increase the weights

26:44.240 --> 26:50.720
of all the tokens in one class. So basically that's relearning, and then, as you adjust the probabilities

26:50.720 --> 26:59.440
and run the full pipeline of the Bayes classifier, it will move to another class. And then you

26:59.440 --> 27:06.640
just rely on smart expiration to remove old tokens and insignificant tokens and stuff like that.
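The relearning operation described here, sketched over a toy token-count store (the numbers are invented):

```python
counts = {
    ("free", "money"): {"spam": 10, "ham": 1},
    ("meeting", "notes"): {"spam": 1, "ham": 8},
}

def relearn(tokens, from_cls, to_cls):
    """Move a message between classes: decrease every token's count
    in the old class and increase it in the new one, the same
    mathematical operation learning already performs."""
    for tok in tokens:
        c = counts.setdefault(tok, {})
        c[from_cls] = max(0, c.get(from_cls, 0) - 1)
        c[to_cls] = c.get(to_cls, 0) + 1

relearn([("free", "money")], "spam", "ham")
print(counts[("free", "money")])  # -> {'spam': 9, 'ham': 2}
```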

27:07.200 --> 27:11.680
But between different classifiers, if that was the question, between spam and transactional for

27:11.680 --> 27:18.480
example, we can't move a message, because for sure it would be a no-op: classifiers are isolated

27:18.480 --> 27:26.080
from each other. A classifier produces its results, and if you have two classifiers, you have two results.

