WEBVTT

00:00.000 --> 00:10.000
So, Sergey, now we'll talk about LlamaGator.

00:10.000 --> 00:11.000
Yeah, hi, guys.

00:11.000 --> 00:16.640
I don't know how many of you have heard about LlamaGator so far, but please don't mix it

00:16.640 --> 00:22.000
up: there's llamafile, which Mozilla is developing, and then LlamaGator, that's what we are

00:22.000 --> 00:23.000
building.

00:23.000 --> 00:26.000
So, we kind of started a little bit earlier, but there is still this funny thing with

00:26.000 --> 00:28.000
two different project names that are really close.

00:28.000 --> 00:32.000
Maybe we'll join forces one day, but not at this moment.

00:32.000 --> 00:35.000
So, I will give a few more details about that.

00:35.000 --> 00:36.000
My name is Sergey.

00:36.000 --> 00:38.000
I work for a company called CyberGizer.

00:38.000 --> 00:46.000
We are building software and now doing a lot of fun stuff with AI in open source.

00:46.000 --> 00:52.000
And I will show a couple of tools that we have recently been developing together with the AI

00:52.000 --> 00:53.000
Foundry.

00:53.000 --> 00:56.000
So, if you haven't joined yet, check out AIFoundry.org.

00:56.000 --> 01:03.000
We have a Discord, we have a lot of good reading stuff, so yeah, that's something that

01:03.000 --> 01:06.000
you can find useful.

01:06.000 --> 01:14.000
So, the first thing is the LLM-as-a-judge concept. I don't know how many of you have heard

01:14.000 --> 01:17.000
about LLM as a judge.

01:17.000 --> 01:21.000
Okay, some of you, let me give you a little introduction.

01:21.000 --> 01:29.000
So, LLM as a judge is when you use a model to assess data or your source code

01:29.000 --> 01:30.000
or whatever.

01:30.000 --> 01:33.000
In this case, the target could be another model.

01:33.000 --> 01:40.000
So, you can use one model to assess the prompts and output quality of another model.

01:40.000 --> 01:47.000
For example, using OpenAI to assess local Llamas running with llama.cpp.
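
The judge flow described here can be sketched roughly as follows. This is a minimal, hypothetical sketch: both models are stand-in Python callables rather than real API calls, so only the shape of the loop is shown: the candidate model answers, and a second model is prompted to validate that answer.

```python
# Minimal sketch of the LLM-as-a-judge pattern. Both "models" below are
# stand-in functions (assumptions, not real APIs) so the flow is runnable
# without keys: a real setup would make two chat-completion calls instead.

def candidate_model(prompt: str) -> str:
    # Stand-in for a local model served via llama.cpp.
    return "The average distance to the Moon is about 384,400 km."

def judge_model(judge_prompt: str) -> str:
    # Stand-in for a stronger remote model (e.g. OpenAI) acting as the judge.
    return "true" if "384,400" in judge_prompt else "false"

def llm_as_a_judge(question: str) -> bool:
    answer = candidate_model(question)
    judge_prompt = (
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply 'true' if the answer is approximately correct, else 'false'."
    )
    # The judge's verdict replaces brittle regex/string parsing of the answer.
    return judge_model(judge_prompt).strip().lower() == "true"

print(llm_as_a_judge("What is the distance to the Moon?"))  # True
```

In a real setup, `judge_model` would be a chat-completion call to the stronger model, with the question and the candidate's answer embedded in the judge prompt as above.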

01:47.000 --> 01:52.000
So, two building blocks that you might need for this one.

01:52.000 --> 01:55.000
So, the first project is called Neko API.

01:55.000 --> 01:58.000
This is an open source API that is compatible with OpenAI's.

01:58.000 --> 02:03.000
But at the same time, it exposes an interface to run local models.

02:03.000 --> 02:05.000
You do not need to change your production code.

02:05.000 --> 02:10.000
So, if you have OpenAI, for example, then you can easily switch your development

02:10.000 --> 02:12.000
setup to, let's say, llama.cpp.

02:12.000 --> 02:15.000
And in this case, the interface of the application is going to stay the same.
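
The "same interface, different backend" idea can be sketched like this. The local URL below is a common llama.cpp server default and is an assumption, not a documented Neko API endpoint; the point is only that an OpenAI-compatible server lets the application keep one client and swap only the base URL.

```python
import os

# Sketch: pick the backend via configuration, so application code that builds
# an OpenAI-style client from this dict never changes. The localhost URL is
# an assumption (a typical llama.cpp server default), not Neko API's endpoint.

def client_config() -> dict:
    if os.environ.get("USE_LOCAL_LLM") == "1":
        return {
            "base_url": "http://localhost:8080/v1",  # local OpenAI-compatible server
            "api_key": "not-needed-locally",
        }
    return {
        "base_url": "https://api.openai.com/v1",
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
    }

os.environ["USE_LOCAL_LLM"] = "1"
print(client_config()["base_url"])  # http://localhost:8080/v1
```

With the official Python SDK this would look like `OpenAI(**client_config())`; the rest of the application stays identical whether it is talking to OpenAI or to a local model.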

02:15.000 --> 02:21.000
There are a few more use cases, but for building LLM as a judge, we're going to use this one.

02:21.000 --> 02:27.000
And the second one, which is LlamaGator, that's the tool that you can use to store all your prompts.

02:27.000 --> 02:32.000
And use it kind of as an arena for your LLMs.

02:32.000 --> 02:35.000
So, what does this thing do?

02:35.000 --> 02:41.000
So, in LlamaGator, you can add all the models that you have, local and remote.

02:41.000 --> 02:47.000
For example, for this demo, I have an OpenAI model, which is GPT-3.5.

02:47.000 --> 02:53.000
And using Neko API, I have a local model running inside of Docker.

02:53.000 --> 02:56.000
And that is a small one, which is small.

02:56.000 --> 03:01.000
So, I have a prompt that asks to calculate the distance to the moon.

03:01.000 --> 03:05.000
And I've created an assertion to use LLM as a judge.

03:05.000 --> 03:07.000
So, what does it do?

03:07.000 --> 03:11.000
It asks the local, small LLM what's the distance to the moon.

03:11.000 --> 03:15.000
And instead of me writing regular expressions or parsing the result, I say,

03:15.000 --> 03:18.000
OK, now go and ask OpenAI to validate that.

03:18.000 --> 03:22.000
So, in this case, I ask OpenAI to return true

03:22.000 --> 03:26.000
if the answer is close, which is, again, close, not exact.

03:26.000 --> 03:28.000
So, it might be a little bit different.

03:28.000 --> 03:30.000
So, as long as it is close to the right answer.

03:30.000 --> 03:33.000
And I can give it, like, a 10% proximity tolerance.
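
The 10% proximity criterion the judge is asked to apply boils down to a simple numeric check. The function name is hypothetical; 384,400 km is the commonly cited average Earth-Moon distance.

```python
# Numeric version of the "within 10%" criterion the judge applies:
# accept any answer within tol * reference of the reference value.
# 384,400 km is the commonly cited average Earth-Moon distance.

def within_tolerance(answer_km: float,
                     reference_km: float = 384_400,
                     tol: float = 0.10) -> bool:
    return abs(answer_km - reference_km) <= tol * reference_km

print(within_tolerance(380_000))  # True: within 10% of 384,400
print(within_tolerance(500_000))  # False: off by more than 10%
```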

03:33.000 --> 03:35.000
So, let's see how it goes.

03:35.000 --> 03:39.000
So, in this case, I have this prompt and I have that assertion.

03:39.000 --> 03:41.000
And I can create a test run.

03:41.000 --> 03:45.000
In this test run, I choose the assertion, which is LLM as a judge.

03:45.000 --> 03:49.000
And I choose the model that I would like to test, which is the small one,

03:49.000 --> 03:51.000
running via Neko API.

03:51.000 --> 03:56.000
So, creating the test run, giving it a moment, and here we go.

03:56.000 --> 04:00.000
So, we have it passed, and the small model gave me the exact number.

04:00.000 --> 04:02.000
And it got validated by the run.

04:02.000 --> 04:04.000
And that's it. Thank you very much.

04:05.000 --> 04:10.000
Then I'm in Flastock. That's impossible.

