WEBVTT

00:00.000 --> 00:13.720
Hi everyone, so I am here to present my talk on high-performance Jupyter Notebooks with Zasper.

00:13.720 --> 00:19.200
About me: I am an open source contributor, I have been contributing since 2016, and this is my

00:19.200 --> 00:27.080
second FOSDEM talk; the last one was on JRuby. And these are my GitHub and X URLs.

00:27.080 --> 00:33.080
So, as you all might have used Jupyter in your work, you might have found certain

00:33.080 --> 00:38.880
problems related to performance: a lot of times you might see the sessions crashing,

00:38.880 --> 00:43.480
the notebook becomes unresponsive, especially if you are using it on a JupyterHub instance

00:43.480 --> 00:49.680
and you are assigned very little RAM or CPU, and the kernel dies, and sometimes

00:49.680 --> 00:55.080
you will lose your work. So yeah, this was me; I was facing such issues, so I thought:

00:55.080 --> 01:02.640
can I make it better? And so the project Zasper was born. It also handles other

01:02.640 --> 01:08.360
problems like scalability issues, and it has a better user experience. In terms of

01:08.360 --> 01:14.680
packaging, Zasper is a single binary, just 8 MB in size, so it

01:14.680 --> 01:22.080
ships small, and it also has much better performance compared to JupyterLab. I will explain

01:22.160 --> 01:30.240
this in more detail. So yeah, in terms of performance, Zasper uses around 5x less CPU than JupyterLab

01:30.240 --> 01:37.440
and around 40x less RAM. Plus it is much faster than JupyterLab; the responses are

01:37.440 --> 01:43.920
instantaneous, it has ultra-low latency and the kernel start-up time is also very fast and it

01:43.920 --> 01:50.400
has higher throughput, and you can handle many more kernels than JupyterLab. So if you are

01:50.480 --> 01:57.040
running an HPC cluster and you are using a kernel gateway, Zasper can easily handle around

01:57.040 --> 02:05.360
100 kernels, whereas Jupyter handles just around 64 kernels. So yeah, there are proper benchmarks

02:05.360 --> 02:12.640
that I will show you and that you can run on your machine and you can verify this, so this is

02:12.640 --> 02:18.320
what the JupyterLab architecture looks like. We have the Jupyter server in the center,

02:18.400 --> 02:25.680
and on the rightmost part we have the Python kernels, or Jupyter kernels. These kernels can

02:25.680 --> 02:32.640
be IPython, Ruby, Julia, or R kernels. The center part is the Jupyter server, like

02:32.640 --> 02:39.360
JupyterLab, and the backend is built on Tornado; Tornado is something that

02:39.360 --> 02:46.640
helps you run asynchronous coroutines. Plus we have the frontend, which is built on

02:46.640 --> 02:53.920
Lumino and React. So yeah, as for Zasper, there are not many changes: we wrote

02:53.920 --> 03:02.240
the backend in Go, a Go backend, and we have a custom React frontend, very lightweight. So yeah,

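As context for the architecture just described, the server and each kernel talk over a handful of channels defined by the Jupyter messaging protocol. A quick sketch (descriptions paraphrased from the protocol spec, not Zasper's actual code):

```python
# The Jupyter messaging protocol runs each kernel over several channels,
# each carried on its own ZeroMQ socket pair between server and kernel.
KERNEL_CHANNELS = {
    "shell":     "execute_request / execute_reply round trips",
    "iopub":     "broadcast of outputs and kernel status to all clients",
    "stdin":     "kernel asking the frontend for input()",
    "control":   "interrupt and shutdown messages",
    "heartbeat": "liveness pings",
}

for channel, role in KERNEL_CHANNELS.items():
    print(f"{channel}: {role}")
```

Any server that speaks this protocol, in any language, can drive standard Jupyter kernels, which is what makes a Go backend possible at all.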
03:03.040 --> 03:09.680
and the rest of these things, like the Jupyter kernels, are the same as what JupyterLab uses,

03:09.920 --> 03:17.840
and we have ZeroMQ. ZeroMQ is a communication library: if you want to communicate over

03:17.840 --> 03:23.760
two sockets, you use ZeroMQ. And so that messages are not lost, ZeroMQ provides something called a

03:23.760 --> 03:30.320
message queue: if a client or a server goes down, the messages are not lost. That makes ZeroMQ

03:30.400 --> 03:40.880
really cool. And while JupyterLab uses pyzmq, a Python-based ZeroMQ binding, we have a Go-native ZeroMQ,

03:40.880 --> 03:47.280
so it's not connected to the C bindings; it's directly written in Go, whereas pyzmq uses the C-based

03:47.280 --> 03:56.160
ZeroMQ (libzmq) within it. So let's go into the benchmarks. So if you look at the RAM usage: how I've done the

03:56.240 --> 04:02.080
benchmarks is, let's say, whenever you open a JupyterLab screen and you launch an

04:02.080 --> 04:08.400
IPython notebook, at that point a notebook opens up, but in the backend the Jupyter

04:08.400 --> 04:14.240
server launches a Jupyter kernel. So this is a different process, and these two processes,

04:14.240 --> 04:22.240
the server and the Jupyter kernel, communicate with each other via ZeroMQ sockets. So to benchmark

04:22.400 --> 04:29.360
this under stress, what I do is try to create a lot of notebooks; for example, there are

04:29.360 --> 04:38.000
20, 40, 60, 80, 100 kernels on the x-axis, and we try to bombard them with messages. So if you have, let's say, 50

04:38.000 --> 04:44.880
notebooks open, there would be 50 small IPython processes running; now you have to communicate with them,

04:44.960 --> 04:53.520
so what I do is send at max 10 messages per second per kernel. So that means

04:53.520 --> 05:01.120
if you have 50 kernels open, I am bombarding the server with 500 messages per second, and we see how many

05:01.120 --> 05:07.520
kernels die and whether the Jupyter server will crash or not.

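The load model just described is simple arithmetic; a tiny sketch (hypothetical helper name, not the actual benchmark code) makes the numbers concrete:

```python
# Aggregate load on the server: each open kernel receives a fixed
# number of messages per second, so total traffic scales linearly.
def total_messages_per_second(kernels: int, rps_per_kernel: int) -> int:
    return kernels * rps_per_kernel

# 50 kernels at 10 messages/sec each -> 500 messages/sec hitting the server.
print(total_messages_per_second(50, 10))    # → 500
# At the top end of the benchmark: 100 kernels at 100 messages/sec each.
print(total_messages_per_second(100, 100))  # → 10000
```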
05:07.600 --> 05:17.360
As you can see, in terms of CPU usage, Zasper is at 0.8%, whereas

05:17.360 --> 05:26.160
Jupyter is at 2.7%. So as we increase the number of kernels, for a higher

05:26.160 --> 05:34.880
number of kernels you can see that Zasper's line goes up only to around 12.5%, but in terms of

05:34.880 --> 05:41.280
Jupyter it goes up, and then you can see that after 64 kernels it goes down. This is basically because

05:41.280 --> 05:48.160
the Jupyter server was crashing: a lot of kernels went down, and that's why you see the

05:48.160 --> 05:56.720
apparent drop in usage; otherwise its usage would have kept going up. Zasper, on the other hand,

05:56.720 --> 06:05.360
lost no kernels. So yeah, in terms of memory usage, you can see that for

06:05.360 --> 06:13.680
Zasper there is a flat line, always near 20.2 MB, but for Jupyter you can see it keeps

06:13.680 --> 06:19.120
growing; at the last point you can see that it even crossed 1000 MB of RAM, and at that point

06:19.200 --> 06:28.800
even some of the kernels crashed. So how do we do this? Let's say, in Python we have

06:28.800 --> 06:36.320
something called cooperative single-threaded coroutines. So, let's say we have

06:36.320 --> 06:42.560
an IO-blocking operation. In the Jupyter server, most of the work is

06:42.560 --> 06:47.920
communicating with the kernel; the kernel does the CPU-bound tasks,

06:48.080 --> 06:53.360
but the Jupyter server is mostly responsible for receiving the message from the client,

06:53.360 --> 06:58.960
getting the message to the kernel, and replying back. So in that case, you try to run most of

06:58.960 --> 07:07.680
these message handlers as asynchronous coroutines. These coroutines are

07:07.680 --> 07:14.240
run in the event loop: you submit them to the event loop. But this event loop,

07:14.240 --> 07:20.640
even if you have created a new thread, will run on the same core,

07:20.640 --> 07:28.640
because the GIL doesn't allow Python to run them in parallel on separate threads. So that's why you start

07:28.640 --> 07:35.280
getting these performance drops. In Go, on the other hand, we are using preemptive lightweight green threads (goroutines),

07:35.280 --> 07:42.000
so you can use all your CPU cores. So this is the major reason why we are

07:42.000 --> 07:49.760
getting such an amazing performance boost. In Go, scheduling is done by the Go runtime scheduler,

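The single-threaded, cooperative nature of asyncio just described can be demonstrated directly in Python; a minimal sketch with a toy handler (hypothetical names, not Jupyter's actual code):

```python
import asyncio
import threading

thread_ids = []

async def handle_message(kernel: str) -> str:
    # Each "message handler" records which OS thread it ran on, then
    # yields control back to the event loop cooperatively.
    thread_ids.append(threading.get_ident())
    await asyncio.sleep(0)  # cooperative yield; a blocking call here would stall everyone
    return f"reply from {kernel}"

async def main():
    # Fan out handlers for many kernels, like the server routing messages.
    return await asyncio.gather(*(handle_message(f"kernel-{i}") for i in range(10)))

replies = asyncio.run(main())
# Every coroutine ran on the same OS thread: concurrency, not parallelism.
print(len(replies), len(set(thread_ids)))  # → 10 1
```

Goroutines, by contrast, are scheduled preemptively by the Go runtime across all available cores, so the same fan-out can actually run in parallel.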
07:49.760 --> 07:56.720
while in Python's asyncio we are using the event loop. Now, we can have blocking behavior in

07:56.720 --> 08:03.280
goroutines, but in asynchronous IO the coroutines must not block; they need to be cooperative,

08:03.280 --> 08:09.120
so if you have something blocking, you must use await or futures, and then we wait

08:09.120 --> 08:14.880
for these messages. So yeah, this is how we are getting the performance boost. Now, what are the

08:14.880 --> 08:24.080
benefits? You get much improved responsiveness in Zasper; it's very lightweight, an 8 MB binary

08:24.080 --> 08:30.080
package that you can just run, and for enterprises it has much better cost efficiency and better

08:30.160 --> 08:41.440
scalability. Before this, I would like to show you the benchmark report; it's not "trust

08:41.440 --> 08:49.520
me bro", you can just run it on your machines, and the entire code for benchmarking is here.

08:49.520 --> 08:59.840
I did it on my Mac mini M4 with 16 GB of RAM. So these are different plots for different

09:00.240 --> 09:05.840
RPS values: 10 messages per second per kernel, with two kernels, four kernels, and so on;

09:05.840 --> 09:15.440
then I do it for 100 kernels at 10 RPS; then I go for 100 RPS, meaning 100 messages being sent to

09:15.440 --> 09:24.560
each kernel per second, again with two kernels, four kernels, and so on. There is also analysis of why sometimes Zasper crashes

09:24.640 --> 09:34.480
and why Jupyter crashes. Apart from that, this is the demo; this is what it looks like when you

09:34.480 --> 09:40.320
load a Zasper instance. I create a Python 3 notebook, I run ls,

09:42.880 --> 09:49.200
You can run all kinds of Jupyter kernels: C-based, C++-based, Julia-based;

09:50.080 --> 09:55.600
just follow the docs on our website. And let's say I had to show the faster kernel startup time:

09:55.600 --> 10:00.400
I just click on this, all the cells run, and you can see this is much faster than Jupyter.

10:01.280 --> 10:09.840
And yeah, that's all. Acknowledgments to FOSS United and Zerodha for supporting this project.

10:09.840 --> 10:15.920
I worked on this for one year full time, and they were also very helpful in funding this project.

10:16.560 --> 10:29.600
And yeah, thanks. Questions? Right, time for questions; we're doing

10:29.600 --> 10:42.080
well on time. So, in terms of authentication: you want me to tell you how we do authentication

10:42.080 --> 10:49.840
in Jupyter servers. So, the Jupyter kernel is a process; you don't need authentication there.

10:49.840 --> 10:55.760
If you have to add authentication, you have to put it on top of the Zasper server, and if you

10:55.760 --> 11:02.320
run it via the Zasper CLI with something like --protected=true, then you will have a login

11:02.320 --> 11:23.280
screen there; you log in and then you can go ahead. So right now we have something

11:23.280 --> 11:30.640
like an access token that is set up randomly; you just compare it, and then you

11:30.640 --> 11:38.080
can share it, like how Jupyter does it. For more custom ways, you have to modify

11:38.080 --> 11:44.640
the code and extend it. I have a quick question as well: how is this received in the Jupyter

11:44.640 --> 11:52.560
community? Any feedback from them? There are like two Jupyter contributors here, and yeah,

11:52.560 --> 11:59.840
the community was very supportive, and I also acknowledge them, because it's not like

12:00.160 --> 12:07.600
this work is something totally new; it is based on top of the Jupyter wire protocol. I didn't invent something new,

12:07.600 --> 12:14.400
I just implemented it in a new way, and I received some good feedback from the

12:14.400 --> 12:19.840
maintainers, and they are really helpful. Okay, excellent, thanks a lot.

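As a footnote to the authentication answer above: the random-token scheme described can be sketched like this (hypothetical function names; the real Zasper code may differ). Using a constant-time comparison avoids leaking the token through response timing:

```python
import secrets

# The server generates a random access token at startup (as Jupyter does);
# clients must present it with each request.
SERVER_TOKEN = secrets.token_urlsafe(32)

def is_authorized(presented: str) -> bool:
    # Constant-time comparison to avoid timing side channels.
    return secrets.compare_digest(presented, SERVER_TOKEN)

print(is_authorized(SERVER_TOKEN))   # → True
print(is_authorized("wrong-token"))  # → False
```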
