WEBVTT

00:00.000 --> 00:10.000
So, thanks everyone for coming. I hope you can hear me.

00:10.000 --> 00:20.000
Okay, I'll try to. My name is Vsevolod and I'm the author of Rspamd, and I've spent quite

00:20.000 --> 00:27.000
a significant part of my professional career working with email spam.

00:27.000 --> 00:36.000
In this presentation, I will talk mostly about using LLMs for spam detection, as was announced.

00:36.000 --> 00:43.000
So, it's mostly a kind of experimental feature. I'm not quite sure that it will be suitable for everyone,

00:43.000 --> 00:49.000
but it might be interesting as an experiment and interesting in terms of possibilities.

00:49.000 --> 00:54.000
But I would like to start my presentation with this slide.

00:54.000 --> 01:02.000
Because when I walked around central London, there is a very ancient medieval cemetery there.

01:02.000 --> 01:07.000
And I walked through it quite a lot of times, and I've seen this tomb.

01:07.000 --> 01:10.000
And this is the tomb of Thomas Bayes.

01:10.000 --> 01:19.000
And I'd like to say that without Bayes, we would have no proper content-based filtering for spam.

01:19.000 --> 01:26.000
And that gave us the first ability to filter emails based on their content.

01:26.000 --> 01:39.000
And that's why I think this mathematician was very important for the whole development of machine learning, of statistics

01:39.000 --> 01:51.000
and stuff like that. And in particular, in email, Bayes filtering was the first example of content filtering.

01:51.000 --> 01:59.000
So, many of you may have heard about Paul Graham's article, "A Plan for Spam".

01:59.000 --> 02:04.000
And he described a Bayes engine there.

02:05.000 --> 02:11.000
So, a Bayes engine is actually one of the examples of supervised learning.

02:11.000 --> 02:17.000
So, you need to train it, and without training it's useless.

02:17.000 --> 02:26.000
But the main advantage of Bayes, and why I like Bayes for content filtering, is that it can easily adapt to new content.

02:26.000 --> 02:35.000
So, the idea is quite simple, and I will talk about it soon, because I want to explain how it works in a nutshell,

02:35.000 --> 02:41.000
because it might be useful to understand why I got interested in LLMs.

02:41.000 --> 02:51.000
Bayes is really very fast. I would say that it's one of the fastest methods for doing classification.

02:51.000 --> 03:04.000
And it's very cost-effective. You can just store your database and query it; there's almost no computational work to do.

03:04.000 --> 03:09.000
So, let me explain very quickly about how it works.

03:09.000 --> 03:16.000
Let's start with an example. Let's imagine that we have, say, three samples.

03:16.000 --> 03:26.000
And in each sample there may be other tokens too, but we have something like "best viagra" and "best prices", "best efforts" and "best prices", and again "best prices",

03:26.000 --> 03:30.000
because "best prices" is one of the phrases commonly seen in email.

03:30.000 --> 03:40.000
Then we train on that, and each token is a combination of words; here, two words make one token.

03:40.000 --> 03:50.000
In other Bayes engines, a token is sometimes one word, sometimes more than one word, but basically there is no significant difference.

03:51.000 --> 03:59.000
And then what you do is actually count the number of times these tokens were seen in spam and in ham.

03:59.000 --> 04:08.000
That's it. So, your whole Bayes database is actually just a representation of tokens and their counts, nothing else.

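To make that concrete, here is a minimal sketch of such a training step in Python (the two-word tokenizer and the sample phrases are illustrative only, not Rspamd's actual tokenizer):

```python
from collections import Counter

def tokenize(text):
    # Two consecutive words form one token, as in the example above.
    words = text.lower().split()
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]

def train(samples):
    # samples: list of (text, label) pairs, label is "spam" or "ham".
    # The whole Bayes "database" is just token -> count per class.
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in samples:
        counts[label].update(tokenize(text))
    return counts

db = train([
    ("best viagra best prices", "spam"),
    ("best efforts best prices", "ham"),
    ("great meeting best prices", "ham"),
])
```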
04:08.000 --> 04:11.000
So, as I've said, it's very simple.

04:11.000 --> 04:18.000
And then suppose we want to classify another, fourth email with "best prices".

04:18.000 --> 04:22.000
So, how can we detect which class it belongs to?

04:22.000 --> 04:30.000
The formula is quite simple: we basically just sum the weights and divide by the number of tokens,

04:30.000 --> 04:35.000
because the more tokens we have, the less relative weight each one has.

04:35.000 --> 04:45.000
So, for "best prices" we again get a normalised weight of one and one for ham and spam.

04:45.000 --> 04:49.000
So, basically what does it mean? It means that this token is neutral.

04:49.000 --> 04:55.000
So, we can't say if it's ham or spam, but the combination of tokens is very important.

04:55.000 --> 05:01.000
So, in the end, we get some number for spam and some number for ham.

05:01.000 --> 05:09.000
And then you can apply something like Bayesian probability, or a chi-square distribution.

05:09.000 --> 05:19.000
So, there are lots of statistical methods for getting from counts and frequencies to probabilities.

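As a toy illustration of the scoring step, here is a plain average of per-token spam shares (real engines such as Rspamd combine the per-token values with an inverse chi-square test rather than a simple average):

```python
def spam_score(db, tokens):
    # db: {"spam": {token: count}, "ham": {token: count}}.
    # Per-token weight: spam_count / (spam_count + ham_count).
    # A token never seen in training is neutral (0.5).
    weights = []
    for tok in tokens:
        s = db["spam"].get(tok, 0)
        h = db["ham"].get(tok, 0)
        weights.append(0.5 if s + h == 0 else s / (s + h))
    return sum(weights) / len(weights)

db = {"spam": {"best viagra": 3, "best prices": 1},
      "ham": {"meeting notes": 4, "best prices": 1}}
```

With equal spam and ham counts, "best prices" scores exactly 0.5, i.e. neutral, matching the example above.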
05:19.000 --> 05:22.000
And that's it. So, the method is very simple.

05:22.000 --> 05:30.000
And the database is very simple as well. It just stores a number of occurrences and nothing else.

05:31.000 --> 05:40.000
But as you see, the Bayes classifier doesn't actually know anything about the context.

05:40.000 --> 05:45.000
So, for example, it just operates with tokens, nothing else.

05:45.000 --> 05:54.000
And that's a bit problematic. So, I'll explain all the problems that I personally had with Bayes.

05:55.000 --> 06:00.000
So, the first point is that Bayes is very good for your personal email.

06:00.000 --> 06:10.000
Why? Because, for example, you have ham that you receive from your trusted friends, relatives, colleagues, mailing lists,

06:10.000 --> 06:14.000
maybe some social networks on the internet.

06:14.000 --> 06:20.000
But the variety of ham is actually quite low for each individual person.

06:20.000 --> 06:26.000
And that's actually very good for Bayes, because basically you have your own topics.

06:26.000 --> 06:33.000
So, for example, I have my own interests, and my own database is quite different from other people's databases.

06:33.000 --> 06:40.000
But for me, it's good. So, personal Bayes is a really very effective and straightforward approach.

06:40.000 --> 06:48.000
But it doesn't understand emails. For example, how can it distinguish a phishing email pretending to be from PayPal from the real PayPal?

06:49.000 --> 07:00.000
Because their content will be very similar, but with some distinctive features that won't be visible to Bayes in the vast majority of cases.

07:00.000 --> 07:11.000
We try to mitigate this issue by adding not only content tokens, as I have shown in the examples with words and text,

07:11.000 --> 07:20.000
but also some meta tokens, like the number of attachments, the order of headers, the number of Received headers, and so on and so forth.

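A rough sketch of extracting two such meta tokens with Python's standard email module (the feature set here is illustrative; Rspamd's real meta features are much richer):

```python
import email

def meta_tokens(raw_message):
    # Features beyond the text itself: how many Received headers
    # the message carries, and how many parts are attachments.
    msg = email.message_from_string(raw_message)
    received = len(msg.get_all("Received") or [])
    attachments = sum(
        1 for part in msg.walk()
        if part.get_content_disposition() == "attachment")
    return {"received_count": received, "attachment_count": attachments}

sample = ("Received: from a.example\n"
          "Received: from b.example\n"
          "Subject: hi\n\nhello\n")
features = meta_tokens(sample)
```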
07:20.000 --> 07:27.000
And these are all features that might be useful to distinguish spam and not spam.

07:27.000 --> 07:30.000
But as I've said, for phishing it's not so good.

07:30.000 --> 07:37.000
Another problem of the Bayes engine: I've seen loads of complaints about Bayes in many scenarios.

07:37.000 --> 07:40.000
But the problem is that Bayes requires learning.

07:40.000 --> 07:46.000
And if you stop learning, then it stops being current.

07:46.000 --> 07:58.000
And you also need to expire old tokens, because how useful is it to store tokens from spam messages from the previous year?

07:58.000 --> 08:01.000
It doesn't make any sense.

08:01.000 --> 08:13.000
And another big problem, for example for large installations, is that the Bayes databases for different people, like from the HR department

08:13.000 --> 08:19.000
and, let's say, engineering, should be completely different.

08:19.000 --> 08:27.000
What is spam for engineers will be very useful email for, let's say, the HR or marketing department.

08:27.000 --> 08:35.000
And this is one of the biggest issues with using Bayes in many scenarios.

08:35.000 --> 08:38.000
Another problem is languages.

08:38.000 --> 08:54.000
Multilingual emails are a big problem, because you can have loads of emails in, let's say, English, since your company works in English.

08:54.000 --> 08:58.000
And all Chinese emails are usually spam.

08:58.000 --> 09:02.000
But then you hire a person whose mother tongue is Chinese.

09:02.000 --> 09:13.000
And his messages and his correspondence will be automatically detected as spam, just because the Bayes engine is skewed towards spam for this language.

09:13.000 --> 09:16.000
And this is also a problem.

09:16.000 --> 09:21.000
Okay, so this is what I was trying to solve with large language models.

09:21.000 --> 09:28.000
So when they were announced, I first tried to test ChatGPT on detecting spam emails.

09:28.000 --> 09:33.000
And I was quite impressed with its quality at first.

09:33.000 --> 09:38.000
And that's why I decided to do some sort of experiments and they are quite simple.

09:38.000 --> 09:45.000
One big advantage of large language models is that they are kind of unsupervised.

09:45.000 --> 09:47.000
So you don't need to train it.

09:47.000 --> 09:51.000
You just need to get a model that already has all the knowledge.

09:51.000 --> 09:54.000
So many models are now open source with all weights.

09:54.000 --> 09:59.000
You can run them on your own hardware, even a laptop (not this one, I think).

09:59.000 --> 10:05.000
And you don't need to train them explicitly for your content.

10:05.000 --> 10:14.000
Another advantage I found is that large language models, especially those with many parameters, are very good at working with multiple languages.

10:14.000 --> 10:22.000
And that was quite impressive, because I don't understand what some message means, but a large model knows.

10:22.000 --> 10:26.000
And surprisingly, they were quite effective for phishing.

10:26.000 --> 10:35.000
At first I was very skeptical about their ability to find phishing, but they were definitely better than Bayes.

10:35.000 --> 10:50.000
And another thing: these models can really be useful not only for binary classification, but also for providing other services, which I will explain later.

10:50.000 --> 11:04.000
I will explain a little about how we send data to the large language model, because you can't actually send the email as is; that would just be inefficient.

11:04.000 --> 11:08.000
So first of all we get some meaningful part.

11:08.000 --> 11:13.000
If it's multipart, we take the HTML part; if it's just plain text, we extract the text.

11:13.000 --> 11:21.000
If we have some large HTML attachment and relatively small text parts, we use that attachment.

11:21.000 --> 11:30.000
So there are some heuristics about how to find the content in a message, because that's not very obvious in some cases.

11:30.000 --> 11:37.000
We also add some additional data, like the From name, the subject, and URL and email domains.

11:37.000 --> 11:48.000
And I'm thinking about adding more metadata, such as HTML structure, because it might be very useful for the model to get more input.

11:48.000 --> 11:55.000
Because when I added the subject, the quality of filtering improved by about 10%, as far as I remember.

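A simplified sketch of that preprocessing (the part-selection heuristic and the character budget are my guesses at the idea described, not Rspamd's actual code):

```python
import email
from email import policy

def llm_payload(raw_message, max_chars=4000):
    # Pick the most meaningful text part instead of sending the raw
    # email, then attach a little metadata such as From and Subject.
    msg = email.message_from_string(raw_message, policy=policy.default)
    body = msg.get_body(preferencelist=("html", "plain"))
    text = body.get_content() if body is not None else ""
    return {
        "from": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
        "text": text[:max_chars],
    }

payload = llm_payload("From: Alice <alice@example.com>\n"
                      "Subject: hi\n"
                      "Content-Type: text/plain\n\nhello world\n")
```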
11:55.000 --> 12:00.000
And that was quite impressive. So adding more data is useful there.

12:00.000 --> 12:08.000
And then I've written a simple utility that takes two folders, with ham and spam.

12:08.000 --> 12:15.000
It shuffles and mixes them; one part is used for training,

12:15.000 --> 12:19.000
and the other part for testing, or cross-validation.

12:19.000 --> 12:23.000
But in terms of unsupervised learning, we don't need training at all.

12:23.000 --> 12:29.000
So we just use the cross-validation or test set and skip the training step completely.

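The utility described above boils down to something like this (the 80/20 split ratio is an assumption):

```python
import random

def split_corpus(ham, spam, train_frac=0.8, seed=42):
    # Mix the two folders' messages, shuffle, and cut into a training
    # part and a held-out part for testing / cross-validation.
    # For the unsupervised LLM runs the training part is simply unused.
    labelled = [(m, "ham") for m in ham] + [(m, "spam") for m in spam]
    random.Random(seed).shuffle(labelled)
    cut = int(len(labelled) * train_frac)
    return labelled[:cut], labelled[cut:]

train_set, test_set = split_corpus(
    ["h1", "h2", "h3", "h4"], ["s1", "s2", "s3", "s4", "s5", "s6"])
```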
12:29.000 --> 12:32.000
So, for example, these were the results for Bayes.

12:32.000 --> 12:36.000
This corpus is published on the internet, so you can check it yourself.

12:36.000 --> 12:39.000
And actually the results are quite good.

12:40.000 --> 12:52.000
But this corpus is very simple, because this experiment used a corpus where spam messages were quite similar.

12:52.000 --> 12:56.000
There were quite a lot of duplicates for both spam and ham.

12:56.000 --> 13:02.000
So I've used one collection for spam and one for ham, actually, for this.

13:02.000 --> 13:08.000
So this corpus is quite good example of what you can see in a typical company.

13:08.000 --> 13:14.000
So it's not like a personal email, but some large company with lots of emails, lots of spam.

13:14.000 --> 13:17.000
But the spam is quite similar, that's important.

13:17.000 --> 13:21.000
So as you can see, the results are quite decent.

13:21.000 --> 13:28.000
But as I've said, the result for this corpus might be different for another one.

13:28.000 --> 13:37.000
This was, at that moment, the most advanced model, GPT-4o.

13:37.000 --> 13:46.000
It was quite expensive. And the results are a bit worse than Bayes, but you don't need training.

13:46.000 --> 13:55.000
And there was some strange thing, and actually it was quite common for all other experiments.

13:55.000 --> 14:00.000
There are more false positives than false negatives.

14:00.000 --> 14:07.000
So for some good messages, the LLM decides that they are likely spam.

14:07.000 --> 14:15.000
I think that's quite bad. For comparison, Bayes has fewer false positives than false negatives.

14:15.000 --> 14:20.000
And that's quite interesting thing.

14:20.000 --> 14:25.000
But the time was completely awful compared to Bayes.

14:25.000 --> 14:29.000
It's like 12 seconds for the whole corpus with Bayes,

14:29.000 --> 14:38.000
and like 300 seconds for the LLM, with only 90 percent of the corpus classified.

14:38.000 --> 14:42.000
Okay, so I've got quite a lot of results.

14:42.000 --> 14:48.000
So you can probably check slides after this talk. I'm not going to concentrate on this.

14:48.000 --> 14:54.000
But what you can see is that the elapsed time fluctuates.

14:54.000 --> 15:02.000
I'll come back to it a bit later, but actually the problem is that it's slightly unpredictable.

15:02.000 --> 15:04.000
But overall numbers are quite good.

15:04.000 --> 15:11.000
Again, false positives prevail over false negatives in the vast majority of cases.

15:11.000 --> 15:17.000
And actually there is no significant difference between cheap models and expensive models.

15:17.000 --> 15:21.000
I've also tried it on my own laptop, a Mac.

15:21.000 --> 15:30.000
That was a small Llama with 3 billion parameters running on the Mac, without any acceleration, without anything.

15:30.000 --> 15:35.000
And the results were complete rubbish compared to those.

15:35.000 --> 15:44.000
And I was actually quite surprised that in many cases it tried to do completely different things:

15:44.000 --> 15:49.000
I asked it to classify spam or not, and it tried to translate the message instead, for example.

15:49.000 --> 15:51.000
So I don't know.

15:51.000 --> 15:57.000
But again, that might be fixed and I will discuss it a little later.

15:57.000 --> 16:03.000
Small models are just not very good at spam classification.

16:03.000 --> 16:08.000
They are quite attractive, because you can run them on commodity hardware; you don't need anything else.

16:08.000 --> 16:12.000
But their results are not too impressive.

16:12.000 --> 16:14.000
Right.

16:14.000 --> 16:22.000
So in any case, Llama is much, much slower than Bayes.

16:22.000 --> 16:33.000
It's not something you can use for, like, a thousand messages per second, unless you are very rich and have a cluster of GPUs, but probably it's not worth it.

16:33.000 --> 16:37.000
Also, it's not a silver bullet, so it can make mistakes.

16:37.000 --> 16:47.000
In my personal opinion, these mistakes are quite common, and they are maybe human-like.

16:47.000 --> 16:56.000
So if I asked somebody from this room to do this classification manually, I think nobody could do it with a hundred percent accuracy.

16:56.000 --> 17:05.000
Because sometimes it's very difficult to distinguish spam and ham emails, especially if you have no previous context.

17:05.000 --> 17:21.000
For example, if I could provide the full thread of messages to the LLM, and probably I can to some extent, it would be very easy to reduce false positives for replies and stuff like that.

17:21.000 --> 17:32.000
But on the other hand, if you know that a message is a reply from a mailing list, you don't really need to filter it, because you already know it was replied to.

17:32.000 --> 17:47.000
The majority of errors were, surprisingly, on exotic languages, such as Arabic. I don't know what that spam was about, because that message was completely unknown to me.

17:47.000 --> 17:57.000
Some messages were in Chinese, and I really don't understand why they were not classified as spam, because I can't legitimately receive messages in Chinese; I don't know the language.

17:58.000 --> 18:04.000
And again, this is because models don't know the context.

18:04.000 --> 18:15.000
Also, surprisingly, lots of my technical discussions about spam filtering were classified as spam, but that's probably expected.

18:15.000 --> 18:24.000
And one of the main problems is that LLMs really cost money, and they are very, very greedy in terms of resources.

18:24.000 --> 18:32.000
I used to think that mining cryptocurrency is bad for the environment, a waste of energy.

18:32.000 --> 18:38.000
But I think that spam filtering using LLM is even worse.

18:38.000 --> 18:44.000
So they are really quite greedy.

18:44.000 --> 18:49.000
And one of the biggest issues with LLM is really uncertainty.

18:49.000 --> 19:01.000
They provide unstable results. Especially models with fewer parameters are very bad in terms of stability.

19:01.000 --> 19:07.000
So from one invocation to another invocation, the result might be completely different.

19:07.000 --> 19:15.000
Also, if you use some cloud service, it might have unstable latency.

19:15.000 --> 19:22.000
For example, when there was big hype around the DeepSeek model, I couldn't use DeepSeek itself.

19:22.000 --> 19:26.000
I had to use a distilled model trained on DeepSeek.

19:26.000 --> 19:35.000
But the latency can vary. For example, at night your latency can be one second; at peak time it can be ten seconds.

19:35.000 --> 19:39.000
And it's completely unusable in many cases.

19:39.000 --> 19:51.000
It's also quite interesting that many models, especially again those with fewer parameters, can output some garbage instead of JSON,

19:51.000 --> 19:59.000
despite the fact that you explicitly ask for JSON with specific attributes.

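One defensive pattern for that problem is to pull the first JSON object out of the reply and abstain when it still does not parse; a minimal sketch (the "verdict" field name is an assumption, not a real Rspamd contract):

```python
import json
import re

def parse_verdict(reply):
    # Small models often wrap or garble the JSON they were asked for.
    # Extract the first {...} span; return None (abstain) on failure.
    match = re.search(r"\{.*?\}", reply, re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return data.get("verdict")
```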
19:59.000 --> 20:07.000
Then another problem is privacy. Of course, your messages are scanned by some black-box engine.

20:07.000 --> 20:11.000
And even worse, your messages can be used to train these black boxes.

20:11.000 --> 20:14.000
And apparently they can leak sensitive data.

20:15.000 --> 20:21.000
Right after that experiment, I created a sort of anonymizer that tries to remove personal stuff,

20:21.000 --> 20:28.000
just to be able to run this experiment at all.

20:28.000 --> 20:32.000
In the end, I even added a sort of GPT-based anonymizer.

20:32.000 --> 20:37.000
So I ask a GPT model, run locally, to anonymize the message first.

20:37.000 --> 20:43.000
So yeah, it's crazy, I know.

20:43.000 --> 20:47.000
Also, I tried to do a sort of consensus of models.

20:47.000 --> 20:56.000
So instead of taking one expensive model with many parameters, I tried to use multiple models.

20:56.000 --> 21:01.000
I think that was four models.

21:01.000 --> 21:04.000
GPT-4o mini, Haiku...

21:04.000 --> 21:05.000
Oh, sorry.

21:05.000 --> 21:10.000
And DeepSeek.

21:11.000 --> 21:17.000
Anyway, these are the results from a consensus of three models.

21:17.000 --> 21:24.000
And in case some model is bad in terms of hallucinations, or broken JSON replies, or whatever,

21:24.000 --> 21:27.000
the consensus is usually good.

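The consensus itself can be as simple as a majority vote in which a failed model abstains; a sketch:

```python
from collections import Counter

def consensus(verdicts):
    # verdicts: one "spam"/"ham" verdict per model, or None when a
    # model hallucinated or broke the output contract (it abstains).
    votes = Counter(v for v in verdicts if v is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```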
21:27.000 --> 21:32.000
So yes, the percentage of classified messages is much higher than for each individual model.

21:32.000 --> 21:38.000
And the other results are again slightly better, but nothing, again,

21:38.000 --> 21:39.000
very good.

21:39.000 --> 21:43.000
One interesting thing is actually training Bayes.

21:43.000 --> 21:46.000
And this was another corpus, actually.

21:46.000 --> 21:54.000
And in this case, as you see, the number of correct classifications of spam is very bad.

21:54.000 --> 21:57.000
Because the spam has really high variety.

21:57.000 --> 22:00.000
And Bayes is not so good at this.

22:00.000 --> 22:04.000
This was a corpus of my personal emails over some years.

22:04.000 --> 22:07.000
And spam is really completely different.

22:07.000 --> 22:09.000
There are no duplicates, nothing.

22:09.000 --> 22:12.000
And this corpus is quite good in terms of being representative.

22:12.000 --> 22:18.000
But it definitely shows the problem of Bayes: that it's not good with high

22:18.000 --> 22:21.000
cardinality, high-variety input.

22:21.000 --> 22:23.000
Yes.

22:23.000 --> 22:26.000
But again, it's very fast.

22:26.000 --> 22:31.000
And it's quite good at learning from users' spam reports.

22:31.000 --> 22:39.000
And actually, if you train Bayes by LLMs, you can set up a separate instance

22:39.000 --> 22:44.000
where you mirror some portion of traffic, classify it with the LLM, and train Bayes,

22:44.000 --> 22:48.000
and you get the best of both worlds.

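That mirrored-traffic setup could look roughly like this (bayes_train and llm_classify are stand-in callables for the idea, not real Rspamd APIs; the confidence threshold is an assumption):

```python
def autolearn(bayes_train, llm_classify, mirrored_messages, threshold=0.9):
    # The LLM labels each mirrored message; only confident verdicts
    # are fed into the fast Bayes classifier as training data.
    learned = 0
    for msg in mirrored_messages:
        label, confidence = llm_classify(msg)
        if confidence >= threshold:
            bayes_train(msg, label)
            learned += 1
    return learned

# Toy demonstration with fake components:
learned_db = []
def fake_llm(msg):
    return ("spam", 0.95) if "viagra" in msg else ("ham", 0.6)

n = autolearn(lambda m, l: learned_db.append((m, l)), fake_llm,
              ["best viagra now", "lunch at noon?"])
```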
22:48.000 --> 22:54.000
Also, I have quite little time left.

22:54.000 --> 22:57.000
This one was also quite interesting.

22:57.000 --> 23:04.000
In many cases, it's very difficult to explain to a user why a message was classified as spam or ham.

23:04.000 --> 23:10.000
But with an LLM, you can actually add some sort of reasoning.

23:10.000 --> 23:12.000
And this reasoning is really very good.

23:12.000 --> 23:15.000
And it's very good in terms of human readability.

23:15.000 --> 23:19.000
So you can actually tell the user what this message is,

23:19.000 --> 23:22.000
And why it was classified as spam.

23:23.000 --> 23:27.000
For example, this is a recent phishing email I got yesterday.

23:27.000 --> 23:30.000
I received this message on the train.

23:30.000 --> 23:35.000
And I immediately asked GPT why it is bad.

23:35.000 --> 23:41.000
Again, that's a quite decent explanation of the phishing.

23:41.000 --> 23:44.000
So quite a good tool.

23:44.000 --> 23:45.000
Okay.

23:45.000 --> 23:47.000
I don't have any more time.

23:47.000 --> 23:49.000
So let's go to questions.

23:50.000 --> 23:55.000
Thank you.

23:55.000 --> 23:57.000
So, very quick answers.

23:57.000 --> 24:00.000
And please try to repeat the question.

24:00.000 --> 24:05.000
[inaudible]

24:05.000 --> 24:06.000
[inaudible]

24:06.000 --> 24:08.000
What do you think the future of this is:

24:08.000 --> 24:14.000
is AI, are LLMs, actually going to be the future of anti-spam, or

24:14.000 --> 24:18.000
are they going to be the future of spam creation?

24:19.000 --> 24:21.000
Well, that's a very good question.

24:21.000 --> 24:24.000
So, is it good for spam creation?

24:24.000 --> 24:27.000
I think it's very good for spam creation.

24:27.000 --> 24:31.000
Especially, they are good in terms of creative spam.

24:31.000 --> 24:35.000
So that's quite a worrying thing.

24:35.000 --> 24:40.000
And I think I've even heard some talks about that

24:40.000 --> 24:43.000
at some other conferences.

24:44.000 --> 24:50.000
Yes, I think, as usual, this tool can be used for both things.

24:50.000 --> 24:53.000
It can be used for good and for bad.

24:53.000 --> 24:56.000
Can you see how anti-spam, the future of anti-spam,

24:56.000 --> 25:00.000
can defend itself from the coming wave of AI spam?

25:00.000 --> 25:03.000
As usual, there are multiple criteria.

25:03.000 --> 25:08.000
If you decide spam or ham based on content only,

25:08.000 --> 25:10.000
I think that's not enough.

25:10.000 --> 25:14.000
But if you add things like IP addresses,

25:14.000 --> 25:18.000
like reputation, like DMARC,

25:18.000 --> 25:21.000
DKIM, and all the good stuff that we used in the past.

25:21.000 --> 25:24.000
And if you combine that with this kind of analysis,

25:24.000 --> 25:28.000
and if you combine that with some meta data like,

25:28.000 --> 25:31.000
you know, message structure,

25:31.000 --> 25:35.000
because actually LLMs are very good at finding content

25:35.000 --> 25:37.000
that is generated by LLMs.

25:37.000 --> 25:40.000
So that's also quite a good use case.

25:40.000 --> 25:44.000
So if you use some sort of traditional methods and

25:44.000 --> 25:47.000
LLM-based content analysis,

25:47.000 --> 25:51.000
you can actually try to do something better.

25:51.000 --> 25:54.000
I think, yes.

25:54.000 --> 25:57.000
Yeah, go ahead.

25:57.000 --> 26:03.000
[Question about using supervised learning

26:03.000 --> 26:15.000
to fine-tune or distill LLMs, partially inaudible.]

26:15.000 --> 26:21.000
So I've planned that as future work and stuff like that.

26:21.000 --> 26:28.000
But the idea is really to train a low-parameter LLM

26:28.000 --> 26:31.000
with a high-parameter LLM.

26:31.000 --> 26:33.000
That's also possible.

26:33.000 --> 26:37.000
So you can train not only Bayes, but another model.

26:37.000 --> 26:41.000
You can also improve the prompt as you've suggested.

26:41.000 --> 26:43.000
And actually I've tried to do it.

26:43.000 --> 26:47.000
But I've not found any significant differences in that.

26:47.000 --> 26:52.000
Because the prompt is not so important in terms of,

26:52.000 --> 26:54.000
you know, detection because the prompt is simple.

26:54.000 --> 26:57.000
You have a message and you need to detect spam or ham.

26:57.000 --> 27:02.000
And the only thing that was useful was actually adding reasoning.

27:02.000 --> 27:08.000
So in theory, adding reasoning should not affect the probabilities and the accuracy

27:08.000 --> 27:09.000
and stuff like that.

27:09.000 --> 27:17.000
But surprisingly, adding just the reasoning was beneficial for the raw results.

27:17.000 --> 27:21.000
But it wasn't suggested by the LLM.

27:21.000 --> 27:24.000
It was like my idea.

27:24.000 --> 27:28.000
So, one final question.

27:28.000 --> 27:31.000
In my work, I ran a small experiment such as this.

27:31.000 --> 27:34.000
The only thing I did with the LLM:

27:34.000 --> 27:39.000
I asked it to rate the following email on a scale

27:39.000 --> 27:42.000
of how spammy it is.

27:42.000 --> 27:45.000
And the results were quite spectacular.

27:45.000 --> 27:48.000
And it already included the reasoning without me even asking.

27:48.000 --> 27:51.000
So that was 3 billion, right?

27:52.000 --> 27:56.000
Yes, so that was a 3-billion-parameter model.

27:56.000 --> 28:01.000
And my counter-question is: what language did you use?

28:01.000 --> 28:03.000
The email?

28:03.000 --> 28:04.000
English.

28:04.000 --> 28:08.000
So that's the biggest difference because I've tried multiple languages.

28:08.000 --> 28:11.000
In English, the results were quite decent.

28:11.000 --> 28:14.000
In other languages, nothing good.

28:14.000 --> 28:18.000
And I think that's the main disadvantage of low-parameter models:

28:18.000 --> 28:23.000
they don't actually support multiple languages

28:23.000 --> 28:26.000
at the same level as larger models.

28:26.000 --> 28:31.000
So yes, I understand, and actually I've tried local models

28:31.000 --> 28:36.000
and I've tried to use them, but as I've said,

28:36.000 --> 28:41.000
I've decided not to continue, just because of the multilingual support.

28:42.000 --> 28:43.000
All right.

28:43.000 --> 28:45.000
Thank you very much.

