WEBVTT

00:35.720 --> 00:45.560
Hi everyone, thanks for staying this late. I know, it's like half past three, so it's already quite

00:45.560 --> 00:50.400
late. And yeah, thanks for being here. So my name is Alex. I mean, you can probably try

00:50.400 --> 00:55.160
to spell my full name and surname, but it doesn't matter. Just call me Alex. That's fine.

00:55.160 --> 01:00.240
I'm from OpenNebula Systems, and today we're going to talk about building AI factories

01:00.240 --> 01:08.040
with open source tools, right? So AI, AI, AI, AI. That was a little bit of a different story,

01:08.040 --> 01:17.600
but yeah, AI. So I'd like to say thank you to the IPCEI-CIS project, which helps us to develop

01:17.600 --> 01:25.040
cool things, bring the cool things and yeah, we're glad to be part of it. So thank you.

01:25.200 --> 01:31.200
What exactly are we going to talk about today? First, the problems and concerns, so

01:31.200 --> 01:36.960
why I'm even here telling you something, right? What's the problem? How can we solve

01:36.960 --> 01:42.640
these problems, right? And we'll talk about what the Ray appliance is, and then I'll try

01:42.640 --> 01:51.640
to show a demo, you know, and, well, things happen, but trust me, I'll try to make it as

01:51.720 --> 01:58.040
smooth as possible. All right. So what are the key problems here? Now, if you'd

01:58.040 --> 02:04.280
like, you know, LLMs, AI, and all that stuff, it's cool, it's fine, but it actually takes

02:04.280 --> 02:09.160
a little bit of time and effort from you to spin it up somewhere, you know, especially if we

02:09.160 --> 02:14.280
go beyond the localhost, right, the "it runs on my laptop" stage. So if we talk about something bigger,

02:14.280 --> 02:19.080
then that's quite an effort to build, right? And then you can fall back to something like

02:19.080 --> 02:27.960
managed AI, you know, AI-as-a-service or LLM-as-a-service, which is fun, you know, sunshine and rainbows,

02:27.960 --> 02:34.440
until you start to pay for that. So then it becomes, well, quite expensive. Now, another thing

02:34.440 --> 02:40.280
is the configuration, right? So every public cloud vendor that supplies you with this kind of

02:40.280 --> 02:46.120
stuff has its own way of thinking about how you should configure it, right? So if you migrate from

02:46.120 --> 02:53.480
one to another, then, well, learn it from scratch again, a pain in the ass, excuse me. Right? So if you'd

02:53.480 --> 03:00.440
like to go a little bit further and you'd like to have something unified, that would be the problem.

03:01.000 --> 03:08.360
So how can we solve it with the help of OpenNebula, right? First of all, we've

03:08.440 --> 03:16.280
brought the idea of AI-as-a-service to your data center. And by data center, I don't mean

03:17.240 --> 03:22.680
only the data center that's on-prem, but anything, right? So it could be a public cloud that you're

03:22.680 --> 03:27.880
using and managing with the help of OpenNebula. Yeah, why not? You can run it there as well. So you

03:27.880 --> 03:34.680
get this flexibility of moving that stuff from on-prem to public cloud, from one public cloud,

03:34.680 --> 03:40.520
to another public cloud, and do it the way you wish, maintaining the same configuration approach,

03:40.520 --> 03:46.360
right? So again, it doesn't matter where you start it, it's going to work. Now, another thing is that,

03:46.360 --> 03:54.040
well, yeah, you would like to use GPUs, right? And in OpenNebula, we developed it in such a way that you can

03:54.040 --> 04:00.280
actually enable GPU passthrough and allow the virtual machine that's going to run a specific

04:00.360 --> 04:09.160
appliance to benefit from that GPU passthrough. So again, sunshine and rainbows, right? Another thing is

04:09.160 --> 04:14.040
that, well, you would like to deploy it, and to deploy it, you would like it to be fast, you would

04:14.040 --> 04:19.400
like it to be simple because, well, I'm not the AI expert, right? And if I go deep into that,

04:19.400 --> 04:26.200
then probably spinning up the real thing on a real server would take me a couple of days at first,

04:26.840 --> 04:33.480
maybe weeks, depending on how deep I would like to go. So if you're not the super expert,

04:33.480 --> 04:38.760
but you still want to build something that is not, you know, publicly available, or you don't want to

04:38.760 --> 04:47.160
use ChatGPT by OpenAI or anything, you know, even more public, then it's quite painful. So again,

04:47.160 --> 04:54.280
with the help of OpenNebula, you can deploy it in three to four clicks. Of course,

04:54.280 --> 04:58.600
supplying the specific things that you need, like, you know, password, API key and stuff.

04:59.240 --> 05:06.120
And, well, where do you get those models? From Hugging Face, right? So another open source tool

05:06.120 --> 05:16.120
kicks in, and you just easily deploy that. And aside from what I've just talked about, what we also

05:16.120 --> 05:23.320
offer is working with the appliance afterwards, right? So you deploy that LLM, you start to use it,

05:23.400 --> 05:30.840
you start to develop something that works with that LLM, but then it's not enough to just deploy it.

05:30.840 --> 05:35.640
You need to move it, you need to administer it, you need to make backups, you know, all the

05:35.640 --> 05:42.760
drill. So for that, again, OpenNebula provides a full set of tools that you can use, you can leverage

05:42.760 --> 05:50.280
to make sure that your operations run continuously; you can migrate it closer to the consumer,

05:50.360 --> 05:57.960
you may, you know, run multiple instances of the same appliance. And what's my favorite

05:57.960 --> 06:04.040
part, because I love to care about the customers, about the end users: the whole

06:04.040 --> 06:14.360
thing makes the end-user experience smooth as butter, right? So this is the ready-to-use

06:14.360 --> 06:22.840
appliance called Ray. So we're using Ray for that. But what do I mean by ready to use?

06:22.840 --> 06:30.760
I mean really ready to use, right? So you import it into your system, do some small fine-tuning

06:30.760 --> 06:36.760
to make sure that it suits your needs, and that's it. So, like everything in our marketplace,

06:37.960 --> 06:44.280
it is ready to use. So again, another cool thing. It has a couple of variables that you need

06:44.280 --> 06:54.920
to configure in order to make sure that it works as expected. So the first one would be the API

06:54.920 --> 06:59.640
port, right, if you'd like it to listen on a different port; or you can go with the default.

07:00.680 --> 07:08.280
You can tune the temperature, to make the model more consistent or

07:08.280 --> 07:17.000
more creative. Then you need your Hugging Face API key, and also the model ID. So you need to

07:17.000 --> 07:22.280
select the model that you would like to use. Not every model is supported; there is a list of

07:22.280 --> 07:27.720
the ones that are supported, but these are the most used ones, and you will find the one that

07:27.720 --> 07:35.640
you actually like. So there are some requirements, like at least eight gigs of memory is required.
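
NOTE
Editor's note: a minimal sketch of how these variables might look in the VM template's context section. The variable names follow the one-apps ONEAPP_* convention and are assumptions (only the model prompt field is named later in the talk); check the appliance documentation for the authoritative list.
CONTEXT = [
  ONEAPP_RAY_API_PORT = "8000",
  ONEAPP_RAY_MODEL_TEMPERATURE = "0.2",
  ONEAPP_RAY_MODEL_TOKEN = "<your Hugging Face API key>",
  ONEAPP_RAY_MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct" ]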

07:35.640 --> 07:40.520
So again, the thing can run on your localhost. So if you have a powerful

07:40.520 --> 07:47.560
machine with you, you can start it on your Linux laptop without any issues. But at the very least,

07:47.560 --> 07:55.640
you must dedicate eight gigs of memory. I would suggest giving it a little bit more; like,

07:55.640 --> 08:03.640
I start with 16 to be on the safe side. Then, what you also can do is upload your own

08:03.720 --> 08:10.840
Python script, and this Python script will run instead of our script. Our script is

08:10.840 --> 08:16.760
kind of there to showcase the capabilities; basically, it's a basic chatbot, but it's very generic,

08:16.760 --> 08:22.680
so if you'd like it to be more fine-tuned, you can check what we did: take our code,

08:23.640 --> 08:35.480
revamp it a little bit, and adapt it to your appliance.
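
NOTE
Editor's note: a minimal sketch of what a replacement script could look like, assuming the appliance serves it with Ray Serve (the talk later uses the "serve status" CLI, which is Ray Serve's); the class name, payload shape, and route are hypothetical.
from ray import serve
from starlette.requests import Request
@serve.deployment
class Chatbot:
    async def __call__(self, request: Request) -> dict:
        body = await request.json()
        # Swap this echo for your own model call and fine-tuning logic.
        return {"reply": "You said: " + str(body.get("message", ""))}
# The appliance's bootstrap would normally deploy this; for local testing:
# serve.run(Chatbot.bind(), route_prefix="/chat")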

08:35.480 --> 08:42.120
And, a side note: if you start with this, the dedicated disk size is eight gigs, and you would

08:42.120 --> 08:47.640
like to increase it, because what we wanted is for the image to be very tiny, so you can

08:47.720 --> 08:52.840
quickly download and import it into your OpenNebula environment. But then you need to

08:52.840 --> 09:00.040
increase it to make sure that it's around 80 to 100 gigs to meet your needs.
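
NOTE
Editor's note: the disk can be grown from Sunstone or with the OpenNebula CLI; a sketch with placeholder IDs (size is in MB here; check "onevm disk-resize --help" for the units your version accepts):
onevm disk-resize <VM_ID> 0 102400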

09:01.400 --> 09:09.000
Now, as I mentioned before, aside from just giving you all those capabilities, we are simplifying

09:09.000 --> 09:15.640
the LLM deployment with this. Well, more or less, reducing the operational costs, right? Because

09:15.720 --> 09:22.600
instead of paying a cloud provider to basically use the same thing, you can run it yourself.

09:24.040 --> 09:31.800
We provide native support for GPUs, which again is a win-win. It can run on CPU-only as

09:31.800 --> 09:37.720
well, but you know, with a GPU you get much more performance. Now, robust multi-tenancy:

09:37.720 --> 09:45.320
you can actually set different quotas and make sure that some of the users have

09:45.400 --> 09:50.680
different levels of access, and make sure they're separated and don't interfere with each other.
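
NOTE
Editor's note: a sketch of what per-user quotas look like in OpenNebula (edited with "oneuser quota <user>"); the numbers are purely illustrative.
VM = [
  VMS    = "5",
  CPU    = "16",
  MEMORY = "32768" ]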

09:51.720 --> 09:58.360
And as I mentioned before, it's the hybrid cloud approach, so it means that you can deploy this LLM

09:59.800 --> 10:07.480
where you want. So you're not tied to a specific vendor or a specific public cloud provider;

10:07.480 --> 10:13.320
it could be your local host, it could be on-prem, it could be in a public cloud. So again,

10:13.320 --> 10:19.240
you don't need to obey the rules, laws, and pricing of that public cloud provider.

10:19.880 --> 10:28.120
If you don't like it, you say, okay, I'm moving to another, cooler cloud provider.

10:29.480 --> 10:35.560
Right, so what's going to be next? In the upcoming releases,

10:35.560 --> 10:44.520
we are going to support vLLM, and we will work on OpenAI API integration,

10:44.520 --> 10:52.600
and we will also extend the list of supported LLMs from Hugging Face, plus

10:52.600 --> 10:59.000
some recommendations on sizes, fine-tuning, and so on and so forth. So right now, it is ready to use,

10:59.000 --> 11:05.400
but of course that's the first release, like the first public release, so comments

11:05.400 --> 11:09.320
are welcome. Now, for a small demonstration,

11:11.800 --> 11:19.880
right, let's have a look at where to find that awesome appliance. You go to the apps section,

11:19.880 --> 11:27.800
which is basically a list of all the services that you can install, all those dear services:

11:27.800 --> 11:34.920
MinIO, WordPress, and so on, so forth. But you would like to search for Ray, right,

11:34.920 --> 11:43.400
and you would like to, obviously, export that to your OpenNebula environment.
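
NOTE
Editor's note: the same export can be done from the CLI; a sketch, assuming the app is listed as "Ray" in the marketplace and "default" is your image datastore.
onemarketapp export "Ray" Ray --datastore default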

11:43.400 --> 11:47.480
Now, that's going to appear as a VM template. Here's what I'd like to explain:

11:48.280 --> 11:54.200
we have images, and we have templates. An image is a snapshot of a disk.

11:54.360 --> 12:01.400
A template dictates what you'll get in the end: how much RAM, how much disk space,

12:01.400 --> 12:06.520
which virtual network, how to even name the appliance, or name the virtual machine.

12:06.520 --> 12:12.440
So that's why we are always working with virtual machine templates. And virtual machine templates are

12:12.440 --> 12:22.120
customizable: from the memory that you are going to provide to this awesome virtual machine,

12:22.760 --> 12:33.240
down to the CPUs, down to costs. So with it, it's AI-as-a-service, which may be AI-as-a-service

12:33.240 --> 12:39.240
for yourself, to make your life easier, or it might go a little bit further, for your customers,

12:39.240 --> 12:43.800
your subscribers, or even within your bigger organization; you know, organizations sometimes charge

12:43.800 --> 12:49.720
each other, one team charges another to make sure that the money moves. So you can actually

12:49.800 --> 12:58.040
calculate how much a specific user or a group of users will owe you after they deploy

12:58.040 --> 13:07.000
that virtual machine and use it for a while. So all that is customizable. So then, that's

13:07.000 --> 13:15.240
where what I mentioned before comes in: it ships as an 8-gig disk, good to download quickly, but you must extend it.

13:15.320 --> 13:24.040
So extend it to at least 80 gigs. Give it some kind of network, because it must download the

13:24.840 --> 13:31.320
model from Hugging Face. And basically, that's it. You don't need to edit anything else.

13:31.320 --> 13:40.200
Remember, give it a little bit of power, extend the disk, and give it a network. That's it.
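
NOTE
Editor's note: a sketch of the template attributes being discussed; the values are illustrative, not the appliance's defaults.
MEMORY = "16384"
VCPU   = "4"
DISK   = [ IMAGE = "Ray", SIZE = "102400" ]
NIC    = [ NETWORK = "public" ]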

13:41.160 --> 13:49.400
So then you obviously would like to run it. Now, yeah, one thing I forgot to mention, excuse me,

13:49.400 --> 13:57.800
is that, for now, this is going to be launched as CPU-only. I do have an instance with GPUs, though.

13:57.800 --> 14:05.880
I will showcase the GPU. But yeah, if you need to attach a PCI device, aka a GPU in this case,

14:05.880 --> 14:14.520
then you also need to edit that at the template level. So some of these things will be available

14:14.520 --> 14:21.800
when you try to deploy the virtual machines, and some of them are not going to be available. So you would

14:21.800 --> 14:27.160
like to make your life as easy as possible and pre-bake a lot of things at the template level.
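
NOTE
Editor's note: GPU passthrough is pre-baked at the template level with a PCI section; a sketch using NVIDIA's vendor ID (filter by VENDOR, DEVICE, or CLASS to match your host's hardware).
PCI = [
  VENDOR = "10de",
  CLASS  = "0302" ]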

14:27.720 --> 14:36.600
Now, go to the template and click instantiate. So again, verify that all things look as you wish.

14:38.280 --> 14:45.320
Next. Now, here's where the fun begins. You need to supply some of the variables,

14:46.200 --> 14:51.720
and some of these are just the endpoint for your API. So basically for your chatbot,

14:52.680 --> 14:58.600
the API port, again, this should be in sync with the Python script that you are going to provide

14:58.600 --> 15:06.360
if you go with a custom thing, or leave it as is if you just go with whatever we supply as the

15:06.360 --> 15:12.280
example. Now, fine-tune the model temperature: give it a little bit more

15:12.360 --> 15:21.320
creativity, or keep it as straight as possible. Then select the LLM model that you would like

15:21.320 --> 15:29.720
to run. I'll go with Meta Llama, but yeah, you can use others. And again, more to come; it's

15:29.720 --> 15:37.480
just the first iteration of what we supply. Don't forget the API token. That is very important.

15:37.480 --> 15:44.520
And before you start to blame us, check your Hugging Face account: are you actually able to

15:44.520 --> 15:50.440
download this LLM? Or maybe you need to fill in a form and wait for like five to ten minutes, so you

15:50.440 --> 16:01.880
will be approved to use that model. So that is just a hint from yours truly. And I need to copy

16:01.880 --> 16:10.120
and paste that. Yeah, please prepare your cameras, so you can reuse my Hugging Face API key.

16:10.120 --> 16:19.240
Oh, no, sorry. I'll just get it. Gotcha. So, then the OneApp Ray model prompt: something

16:19.240 --> 16:27.480
that we are going to prepend to your chat. So what will be there? Well, it's up to you.

16:27.480 --> 16:33.160
It depends on how you'd like to fine-tune it. So one of my examples is that I'm

16:33.960 --> 16:38.840
thinking about learning Spanish, but I'm very bad at it. So I need a sparring partner. And I don't

16:38.840 --> 16:47.160
want to abuse my friends with my very bad Spanish. So I can ask the ChatGPT, sorry, not ChatGPT,

16:47.160 --> 16:54.920
the LLM: hey, you are a Spanish instructor, so please assist me in learning Spanish.

16:55.000 --> 17:01.480
Something like that. Or you may go with a little bit of a different use case and say: you are,

17:01.480 --> 17:08.440
you know, a Python professional, and you would like to assist me with the code.

17:09.640 --> 17:17.240
So again, this could be the prompt that we are going to place before the actual chat.
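
NOTE
Editor's note: conceptually, the configured prompt is prepended as a system message ahead of each user turn; a hypothetical Python sketch of that shape (the appliance's real wiring may differ).
system_prompt = "You are a Spanish instructor. Please assist me in learning Spanish."
def build_messages(user_message: str) -> list:
    # The configured prompt goes in front of the actual chat turn.
    return [{"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}]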

17:18.200 --> 17:26.600
Next, verify the network and storage again. This is just because you may want

17:26.600 --> 17:33.080
to attach a different network. Whatever the case may be. And that's it. So the appliance is

17:33.080 --> 17:39.240
going to be deployed in a while. Again, if you see the green light here that doesn't mean that

17:39.240 --> 17:43.240
the appliance is actually up and running. It means that the virtual machine is up and running, but there's

17:43.400 --> 17:48.280
an internal process happening: we need to download the model itself, and we need to

17:48.280 --> 17:59.640
configure Ray to actually serve it. And to check on that, well, you can get

17:59.640 --> 18:07.560
command-line access to it and run some nice commands to make sure that your appliance

18:07.560 --> 18:19.880
is working as expected. So, for example, "serve status" is going to show you what

18:19.880 --> 18:25.480
the current state of the appliance is. So it might be deploying. If there is any error,

18:25.480 --> 18:31.560
you are going to see it here. So if it fails for whatever reason, for example, DNS, or maybe

18:31.560 --> 18:38.280
you have very, very bad connectivity, which also can happen. But yeah, in general, all your

18:39.000 --> 18:46.760
issues, all your bells and whistles, are going to be there. Now, what you would like to get is

18:46.760 --> 18:51.720
this: you see it say "completed", which basically means that it's up and running and healthy.
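
NOTE
Editor's note: "serve status" is Ray Serve's own CLI, run inside the appliance VM (for example over SSH, assuming root access); poll it until the application reports a healthy, running state instead of a deploying one.
ssh root@<appliance-ip> serve status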

18:51.720 --> 19:02.360
So now you can start to talk to your chatbot. Now, we supply client.py; it's a simple

19:02.360 --> 19:09.960
Python application that just works as the chat interface. Well, it's up to your imagination; you can

19:10.040 --> 19:24.760
use whatever you wish. So what you need to supply here is the IP address, the port and the API endpoint

19:24.760 --> 19:32.440
aka /chat. In this case, well, in my case, it's 127.0.0.1, because I've connected to the

19:32.440 --> 19:38.040
Ray appliance directly. But again, if your routing allows, then you can do it from your

19:38.120 --> 19:45.960
computer; you can download this Python thingy from the one-apps repository. It's available

19:45.960 --> 19:55.000
as open source. Basically: hi there. So that is your helpful assistant, and you can ask for

19:55.000 --> 20:06.920
something like a basic Python Hello World. It is going to take some time to come up with the response.
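
NOTE
Editor's note: if you'd rather not use client.py, the same call can be made directly; a sketch assuming the demo's defaults (the port and JSON payload shape are assumptions; check client.py in the one-apps repository for the real schema).
import requests
resp = requests.post("http://127.0.0.1:8000/chat",
                     json={"message": "Write a basic Python Hello World"})
print(resp.json())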

20:07.000 --> 20:16.120
So that really depends on the amount of resources that you supply to this LLM and, well,

20:17.400 --> 20:28.520
to a certain extent, it may also crash if it's up for too long and you're not interacting with

20:28.520 --> 20:35.000
it. We're kind of looking into why that happens. But yeah, sometimes it unfortunately does.

20:35.880 --> 20:44.840
Now, let's give it a while. I will come back to this to show you, so we can see the actual

20:44.840 --> 20:53.320
basic Python Hello World. But yeah, before that, I would like to answer your awesome questions

20:53.960 --> 21:10.040
that you most likely have. So, come on, where's the mouse? I've lost the pointer. Yeah, here

21:10.040 --> 21:18.600
it is. So, meanwhile, questions? I'm ready to answer them while this thing responds to me

21:18.600 --> 21:37.480
with a response. Do we have any? Well, Ray itself... so, the question is: do we have any log messages?

21:38.440 --> 21:44.920
Yes, we do, but not us as OpenNebula; the Ray appliance itself writes a ton of logs.

21:44.920 --> 21:52.520
You can find them in /etc/ray, under the current session's logs, and there will be plenty.

21:52.520 --> 21:59.800
So you can enjoy. Yeah, but again, it's not a troubleshooting session. So, I don't want to

22:00.600 --> 22:07.000
go into the depths of troubleshooting the appliance, as that's going to take

22:07.000 --> 22:12.440
too much time. Yes, please. Any other questions? Thanks for the question, by the way.

22:15.560 --> 22:23.640
Yes. So, this service is more intended for serving models, as opposed to something

22:23.640 --> 22:29.800
like the Hugging Face platform, which is intended for sharing AI models and stuff. Are there any developments,

22:29.800 --> 22:35.480
are there any open source solutions, for something like a self-hosted hub?

22:35.880 --> 22:47.880
Well, is there any solution for the Hugging Face hub? Sorry, I'm not in a position to answer that.

22:47.880 --> 22:54.600
So in this case, yes, we kind of give you the platform to use it, but yeah, you'd probably need to call

22:54.680 --> 23:02.680
Hugging Face, and I'll pass on this question. But thank you, noted.

23:06.120 --> 23:13.320
Excuse me, but my connection just hung, because it's a remote data center.

23:15.480 --> 23:21.080
But if you are really interested in that, I'm available here, so you can just come by,

23:21.080 --> 23:24.760
and I'll show it on my computer. Thank you, any other questions?

23:27.800 --> 23:29.080
Thank you.

23:29.080 --> 23:29.720
Thanks for the attention.

