WEBVTT

00:00.000 --> 00:17.600
Hello and welcome. We had just some technical issues presenting with the laptop, but now

00:17.600 --> 00:29.000
it's working, with a picture, right? Okay, so let's welcome Nils Goroll, welcome at

00:29.000 --> 00:40.600
FOSDEM, and we'll hear his talk: a memory allocator with only 0.006 percent, 0.1

00:40.600 --> 00:46.040
percent fixed overhead, written from scratch, something where you have big storage and want to

00:46.040 --> 00:51.040
allocate small things. Thank you, let's go.

00:51.040 --> 01:03.040
Hello everyone, so before we start for real, I need to fix something, because we need

01:03.040 --> 01:09.040
to do something about this clickbaity title, or in your case, your appearance-baity title.

01:09.040 --> 01:16.040
So what are the actual numbers that I was referring to there?

01:16.040 --> 01:25.040
It's that, for a given page size, the memory allocator accounts, at the minimum, two bits.

01:25.040 --> 01:37.040
So if you go through the numbers, that gives you, for 4096-byte pages, these 0.006 percent

01:37.040 --> 01:44.040
from the first title. So just to illustrate and give you a bit of a feeling, if you want to manage

01:44.040 --> 01:55.040
one petabyte at 4-kilobyte pages, that's 64 gigabytes, just to get a feeling for that.

01:55.040 --> 02:04.040
And obviously if you choose smaller pages, this overhead will increase.

02:04.040 --> 02:17.040
So at 256-byte pages, the overhead is one order of magnitude higher, so it goes from

02:17.040 --> 02:23.040
mega to giga, from giga to tera, that is the price.

02:23.040 --> 02:31.040
So welcome to a memory allocator with a fixed overhead of 2 to the power of s minus p minus 2

02:31.040 --> 02:44.040
for an arena size of 2 to the power of s and the minimal page size of 2 to the power of p written from scratch.
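The overhead formula from the title can be checked with a few lines of C. This is an illustrative sketch; `overhead_bytes` is a made-up helper name, not part of the project:

```c
#include <assert.h>
#include <stdint.h>

/* Fixed metadata overhead in bytes for an arena of 2^s bytes with a
 * minimal page size of 2^p bytes: 2^(s-p-2) bytes. That is two bits
 * per minimal page: one bit per page at every level, and the
 * geometric sum over all levels comes to just under two bits per
 * minimal page. Illustrative helper, not the project's API. */
static uint64_t
overhead_bytes(unsigned s, unsigned p)
{
    assert(s >= p + 2);
    return (UINT64_C(1) << (s - p - 2));
}
```

For a one-petabyte arena (s = 50) with 4-kilobyte pages (p = 12), this gives 2^36 bytes, the 64 gigabytes mentioned above.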

02:44.040 --> 02:51.040
So my name is Nils. I'm an old guy. I've known Linux since I was a teen. I have a degree.

02:51.040 --> 02:57.040
I ran an ISP, and then another one, and another one.

02:57.040 --> 03:05.040
I learned everything from FOSS. So to all the people who came before me: I owe you a lot.

03:05.040 --> 03:12.040
I feel like I learned nothing at university and everything from staring at other people's code.

03:12.040 --> 03:21.040
For quite some time now, I've been an independent consultant and developer who runs a small company.

03:21.040 --> 03:32.040
Also, for a couple of years now, I'm one of three maintainers of a beautiful piece of software called Varnish Cache, which this project is also about.

03:32.040 --> 03:36.040
It's called SLASH/, written with the slash.

03:36.040 --> 03:49.040
It's licensed LGPL, comparably small, so wc tells me it's 36,000 lines of code including comments,

03:49.040 --> 03:53.040
so something in that ballpark.

03:53.040 --> 04:02.040
What it implements are two storage engines for Varnish Cache. So Varnish Cache has a module concept to extend the built-in language, which is called VCL,

04:02.040 --> 04:12.040
but also, actually since this project, to extend fundamental core functionality.

04:12.040 --> 04:25.040
And one of the relevant components in Varnish Cache is the storage engine, which is responsible for actually storing the objects somewhere.

04:25.040 --> 04:36.040
So if Varnish Cache later asks for it, it gets it back, a bit like a file system minus the metadata.

04:36.040 --> 04:46.040
So this project comprises two storage engines. One is called buddy, and we will see in just a moment why.

04:46.040 --> 04:55.040
It's for in-memory cache only, so a cache that you lose when the process restarts or the machine restarts.

04:55.040 --> 05:04.040
And one is called fellow, which implements disk plus memory storage.

05:04.040 --> 05:22.040
This storage is eventually persistent, which means depending on which compromise you're willing to take, you might lose a couple of seconds or more before you can reload your objects.

05:22.040 --> 05:31.040
It's always consistent. So whatever you do, as long as you don't deliberately program your hardware to misbehave,

05:31.040 --> 05:38.040
it should always come up consistent, with something usable.

05:38.040 --> 05:46.040
It implements async I/O, or rather uses async I/O as implemented in io_uring, but we also have two other implementations.

05:46.040 --> 06:00.040
And for the even broader context: the first sponsor who made this all possible, they run a video CDN, so that's probably illustrative of where the numbers are coming from.

06:00.040 --> 06:13.040
So when I agreed with the sponsor that I would actually be doing this,

06:13.040 --> 06:26.040
I went through some kind of requirements and design phase, and besides the huge storage that it obviously needs to support to be useful,

06:26.040 --> 06:38.040
I also wanted to solve some other issues along the way, so I had my own little requirements and design list on top of that.

06:38.040 --> 06:46.040
First, there is an LRU fairness problem with the built-in storage implementations.

06:46.040 --> 07:00.040
So for this implementation, I really wanted to make sure that a request which has already been waiting, and is now at the front of the queue, gets served first, which is actually how the implementation turned out.

07:00.040 --> 07:04.040
So, anyway, first come first served for allocation requests.

07:04.040 --> 07:10.040
I absolutely wanted to have a fixed size, both in memory and on disk.

07:10.040 --> 07:20.040
With Varnish Cache and the built-in storage engines, it's quite an issue that they will take more memory than you configure.

07:20.040 --> 07:26.040
I mean, that's obviously documented, but you never quite know how much more.

07:26.040 --> 07:41.040
So for this implementation, I absolutely wanted a fixed size. And sure, I wanted efficiency: in Varnish Cache, we have this efficiency fetish, nothing beats it.

07:41.040 --> 07:49.040
So, in a little more detail, what came out of these requirements for me was:

07:49.040 --> 07:57.040
I absolutely did not want to manage free maps, so meta information about free space on disk.

07:57.040 --> 08:05.040
I did not want to do any disk I/O just to manage space.

08:05.040 --> 08:17.040
And on top of that, this question: could it really be doable to support multiple petabytes of storage with reasonable amounts of RAM?

08:17.040 --> 08:25.040
I mean, it doesn't need to support a petabyte in my laptop, but maybe in a data center server.

08:25.040 --> 08:38.040
Then there was a kind of coincidence: I carried in my thoughts this idea of the buddy allocator, which I had at one point read about.

08:38.040 --> 08:43.040
And really, yeah, I thought it was really a cool idea.

08:43.040 --> 08:55.040
Invented, as per Wikipedia, in 1963 by Harry Markowitz, it is used in kernels, and I guess in many other places.

08:55.040 --> 09:03.040
And the basic idea is that the allocator manages a hierarchy of power-of-two page sizes.

09:03.040 --> 09:07.040
We'll see that illustrated in a moment.

09:07.040 --> 09:19.040
And within this hierarchy of pages, adjacent pages can merge into the next power of two, or be split into two pages of the next lower power of two.

09:19.040 --> 09:22.040
So let's look at an example.

09:22.040 --> 09:25.040
This is super simplified.

09:25.040 --> 09:40.040
In this case, the example only has three different page sizes, but really, for this idea, it doesn't matter at all if you draw eight or twenty or three.

09:40.040 --> 09:50.040
And if we start with our empty arena, all our free memory will be in just one page.

09:50.040 --> 10:06.040
Actually, in practice, if I run a one-terabyte arena, because it's a power of two, initially I will have one free page of size one terabyte.

10:06.040 --> 10:13.040
Now in this example, let's suppose we want to allocate one of these middle-sized pages.

10:13.040 --> 10:20.040
So we split. First of all, we take the top page and split it.

10:20.040 --> 10:26.040
Then we end up with one allocated page, shown in blue, and one free page, shown in white.

10:26.040 --> 10:41.040
Now, if we want to allocate another page this time, a small size, we do that split again, have one page which we can use and another page which is free.

10:41.040 --> 10:49.040
Now, let's do this stuff in reverse, but in a different order. So we free the bigger page first.

10:49.040 --> 10:55.040
That's then super simple. We do nothing but mark it free.

10:55.040 --> 11:05.040
But now, if we also free the little blue page, all these merge into just one page again.

11:05.040 --> 11:19.040
So this auto-coalescing of free space is really what I think is wonderful about the buddy allocator idea.

11:19.040 --> 11:29.040
So what is the memory allocator about on a very fundamental level?

11:29.040 --> 11:44.040
Maybe it's there for other things, like detecting out-of-bounds accesses, and depending on the API, it might also need to track allocated space.

11:44.040 --> 11:54.040
But fundamentally, it's really not concerned with the space you have already allocated; it's concerned with the space that is free.

11:54.040 --> 12:01.040
And this way of thinking about it helped me get into the topic.

12:01.040 --> 12:15.040
If free space is actual memory: so if the allocator we are talking about was actually only for memory, like RAM,

12:15.040 --> 12:23.040
it would be easy to manage free space, because we have plenty of free space as long as we have free space, right?

12:23.040 --> 12:28.040
Because that's RAM. We can put metadata in there.

12:28.040 --> 12:34.040
But if you want to manage free space on disk, there is no RAM that comes with it.

12:34.040 --> 12:44.040
And if we don't want to do I/O to manage free space, well, it would be nice if the metadata fit into RAM.

12:45.040 --> 12:54.040
With traditional approaches, the management of the free space also scales with fragmentation.

12:54.040 --> 12:59.040
So the more little blocks you have that you need to track, obviously the more metadata you need.

12:59.040 --> 13:07.040
Which is bad for this predictability thing that I talked about a moment ago.

13:08.040 --> 13:17.040
So what was the fundamental idea? Take a buddy allocator, and for all these boxes, just use single bits.

13:17.040 --> 13:26.040
Many, many allocators I've looked at use linked lists, trees, whatever.

13:26.040 --> 13:32.040
But for this implementation, I thought: okay, they could just be bits, right?

13:32.040 --> 13:40.040
So what we end up with in this implementation is just one huge bit field for each of these levels.

13:40.040 --> 13:51.040
So in this case, the top level would be one bit field, and the middle and the low level would each be one bit field.

13:51.040 --> 13:55.040
Excuse me.

13:55.040 --> 14:02.040
And yeah, I just chose to set bits to mark free pages.

14:02.040 --> 14:10.040
It could probably be the other way around, with zeros marking the free pages; it probably doesn't matter.

14:10.040 --> 14:19.040
So, to jump ahead a little bit, I would like to show you one nice thing which we can do

14:19.040 --> 14:35.040
if we use these bit fields, because we can map these bit fields into another process.

14:35.040 --> 14:44.040
Okay, my font is small, excuse me, the font is too big.

14:44.040 --> 14:49.040
Right.

14:49.040 --> 14:55.040
So we should see at the top, that there are 13 terabytes of memory managed here.

14:55.040 --> 14:59.040
This is my burn-in machine.

14:59.040 --> 15:06.040
So at the top right you see 1429 yada yada bytes total, that's 13 terabytes.

15:06.040 --> 15:10.040
And we are looking at the memory management live.

15:10.040 --> 15:15.040
This is not a mockup or anything you can download the software anyway.

15:15.040 --> 15:24.040
So each character on the screen, at this zoom level, represents 8 gigabytes.

15:24.040 --> 15:28.040
And we see that stuff happening there at the left.

15:28.040 --> 15:35.040
Changing between fully taken, which is the star and partly taken, which is the slash.

15:35.040 --> 15:43.040
And because we can zoom in, we can scroll down, we can scroll up.

15:43.040 --> 15:52.040
So yeah, it's a nice gimmick, but trust me, this has helped me find many nasty bugs.

15:52.040 --> 16:04.040
Being able to visualize something is super important, and what really amazes me is that it shows you exactly what the memory

16:04.040 --> 16:17.040
allocator is doing, in a safe manner, because the memory-mapped file, which is used to keep the bit fields,

16:17.040 --> 16:24.040
is mapped read-only into another process, while the main process does its actual thing.

16:24.040 --> 16:33.040
And the other process, the one showing the map, cannot do any harm, because it only sees read-only mappings.

16:33.040 --> 16:41.040
Back to the slides, okay.

16:41.040 --> 16:57.040
So now I want to dive a little deeper with you into the implementation and some of the things I came across.

16:57.040 --> 17:05.040
What do we need to implement to get a memory allocator?

17:05.040 --> 17:14.040
First of all, a bit of a weird naming: you all know malloc and free.

17:14.040 --> 17:20.040
What you know as free, I call return, because the memory is returned to the allocator.

17:20.040 --> 17:26.040
I don't know what struck me; for some reason, I thought free was confusing, so it's called return.

17:26.040 --> 17:34.040
So for the allocation and the return, what do we need?

17:34.040 --> 17:41.040
We need to be able to free a single page and to take a single page.

17:41.040 --> 17:47.040
We need to be able to split pages and to merge pages.

17:47.040 --> 17:51.040
Also, we need to find free space.

17:51.040 --> 18:00.040
And the following examples contain pseudo-C, just for readability.

18:00.040 --> 18:04.040
But it's close to what I actually have.

18:04.040 --> 18:11.040
So let's think back to the buddy allocator and using bit fields.

18:11.040 --> 18:18.040
So if we want to free a page, I said I'm using the set bits to mark the free pages.

18:18.040 --> 18:23.040
So freeing a page is just setting a bit, super simple.

18:23.040 --> 18:32.040
The only thing is that the bit fields might be gigabytes in size, but it's just finding the right word and setting the right bit.

18:32.040 --> 18:40.040
Conversely, the take is just clearing a bit: if you know that a page is free and you want to take it, clear the bit.
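In pseudo-C close to what was just described, free and take are single bit operations on the level's bit field. This is a sketch with illustrative names, not the project's actual API:

```c
#include <stdint.h>

/* One huge bit field per level; a set bit marks a free page.
 * "bf" is the level's bit field as an array of 64-bit words,
 * "page" the page number within that level. */
static void
page_free(uint64_t *bf, uint64_t page)
{
    bf[page >> 6] |= UINT64_C(1) << (page & 63);    /* set bit: free */
}

static void
page_take(uint64_t *bf, uint64_t page)
{
    bf[page >> 6] &= ~(UINT64_C(1) << (page & 63)); /* clear bit: taken */
}

static int
page_is_free(const uint64_t *bf, uint64_t page)
{
    return ((bf[page >> 6] >> (page & 63)) & 1);
}
```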

18:41.040 --> 18:46.040
For the split, it's also super, super simple.

18:46.040 --> 18:55.040
First of all, we take one page which we know is free.

18:55.040 --> 19:04.040
Then we go one level down and mark the right page as free, because we're not going to need it.

19:04.040 --> 19:07.040
Go one level down.

19:07.040 --> 19:14.040
By the way, the right page is always the odd one.

19:14.040 --> 19:19.040
So the left-hand page is always even, the right-hand page is always odd,

19:19.040 --> 19:25.040
if you number the pages starting at zero.

19:25.040 --> 19:36.040
Then we do that again, and again, and then we have the small allocation that we want out of the split.
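The split just described can be sketched in a few lines, assuming level 0 holds the smallest pages and page n splits into children 2n and 2n+1 one level below. Names are illustrative, not the project's API:

```c
#include <stdint.h>

#define BIT_SET(bf, n) ((bf)[(n) >> 6] |=  (UINT64_C(1) << ((n) & 63)))
#define BIT_CLR(bf, n) ((bf)[(n) >> 6] &= ~(UINT64_C(1) << ((n) & 63)))

/* Split a free page at level "from" down to level "to": take the page,
 * then descend, at each level marking the right-hand (odd) child free
 * and continuing into the left-hand (even) child. Returns the page
 * number handed out at level "to". */
static uint64_t
split(uint64_t *bf[], unsigned from, unsigned to, uint64_t page)
{
    BIT_CLR(bf[from], page);            /* take the page we split      */
    while (from > to) {
        from--;
        page <<= 1;                     /* left child, always even     */
        BIT_SET(bf[from], page + 1);    /* right child stays free, odd */
    }
    return (page);
}
```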

19:36.040 --> 19:40.040
For the merge, we need to reverse that.

19:40.040 --> 19:47.040
I don't have a full animation for the merge, but it's pretty clear if you think about it.

19:47.040 --> 20:02.040
So we know that the buddy page, the one we potentially merge with, has the same page number as our page, with just the last bit flipped.

20:02.040 --> 20:05.040
This is always our buddy.

20:05.040 --> 20:11.040
So we look at this buddy and check, okay, while it's in bounds,

20:11.040 --> 20:18.040
So while it's not beyond the end of the bit field.

20:18.040 --> 20:22.040
And it's free so we can take it.

20:22.040 --> 20:24.040
We do that.

20:24.040 --> 20:30.040
And just continue at the next level.
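The merge loop just described might look like this in sketch form, with `npages` bounding each level to match the in-bounds check. Illustrative names, not the project's API:

```c
#include <stdint.h>

#define BIT_SET(bf, n) ((bf)[(n) >> 6] |=  (UINT64_C(1) << ((n) & 63)))
#define BIT_CLR(bf, n) ((bf)[(n) >> 6] &= ~(UINT64_C(1) << ((n) & 63)))
#define BIT_TST(bf, n) (((bf)[(n) >> 6] >> ((n) & 63)) & 1)

/* Return a page at "level" and coalesce upwards: the buddy of page n
 * is n ^ 1; while the buddy is in bounds and free, take it and
 * continue one level up with page n >> 1. */
static void
merge_free(uint64_t *bf[], const uint64_t npages[], unsigned top,
    unsigned level, uint64_t page)
{
    while (level < top &&
        (page ^ 1) < npages[level] && BIT_TST(bf[level], page ^ 1)) {
        BIT_CLR(bf[level], page ^ 1);   /* take the free buddy        */
        page >>= 1;                     /* continue at the next level */
        level++;
    }
    BIT_SET(bf[level], page);           /* mark the merged page free  */
}
```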

20:30.040 --> 20:36.040
I think this is just, yeah, I mean, I didn't come up with the buddy idea.

20:36.040 --> 20:48.040
I just found it super nice that I can have the two main functions of the allocator in something like 10 lines.

20:48.040 --> 20:52.040
Finding space is finding set bits.

20:52.040 --> 21:00.040
So for 64-bit words, we have a compiler built-in, which we can use.

21:00.040 --> 21:07.040
But now the problem is if we have more than 64 bits, if we have more than 64 pages, what do we do?

21:07.040 --> 21:11.040
Yeah, okay, what do you always do?

21:11.040 --> 21:16.040
You turn a linear search into a logarithmic time search.

21:16.040 --> 21:24.040
And in this case, use an index, which is also a bit field, using the same code.

21:24.040 --> 21:26.040
So here's an example.

21:26.040 --> 21:31.040
64-bit words are not a good idea to fit onto a slide.

21:31.040 --> 21:37.040
So this example is just with four bit words, but it's the same principle.

21:38.040 --> 21:45.040
So at the bottom is our actual bit field, the one that manages our free pages.

21:45.040 --> 21:54.040
And now what we can very simply do is just maintain another bit field layer, which has a one bit

21:54.040 --> 22:03.040
whenever a whole word below is non-zero, not just an individual bit, and then do that again, and again, and again.

22:03.040 --> 22:08.040
Yeah, we end up with an index.

22:08.040 --> 22:18.040
Then finding space is just consulting the index, following the ones, as long as there is one.

22:18.040 --> 22:23.040
And if the index already says, okay, I don't have any space.

22:23.040 --> 22:30.040
then we know we're done; otherwise, we have found a hit and unroll the recursion.

22:30.040 --> 22:41.040
It could probably be redone in a closed form, but I didn't bother so far.
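The index search just described can be sketched with 64-bit words and the GCC/Clang builtin mentioned above. Illustrative layout, not the project's API: idx[0] is the actual bit field, and each higher level has one bit per word of the level below:

```c
#include <stdint.h>

/* Find the lowest set bit in a large bit field via an index hierarchy.
 * idx[0] is the bit field itself; bit w of idx[k] is set iff word w of
 * idx[k-1] is non-zero; the topmost level fits into a single word.
 * Returns the bit number, or UINT64_MAX if the field is empty. */
static uint64_t
find_free(uint64_t *idx[], unsigned levels)
{
    uint64_t word = 0;
    unsigned lvl = levels;

    if (idx[levels - 1][0] == 0)
        return (UINT64_MAX);            /* index says: no free page */
    while (lvl-- > 0)
        word = word * 64 + (unsigned)__builtin_ctzll(idx[lvl][word]);
    return (word);
}
```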

22:41.040 --> 22:52.040
So what I've just been talking about is finding a page of a certain size, where you know that there should be one in the bit field: just find a bit in the bit field.

22:52.040 --> 22:59.040
But we also need to find the right page size to use for an allocation request.

22:59.040 --> 23:11.040
So if the right allocation size is not available, what the buddy allocator normally does is look upwards.

23:11.040 --> 23:19.040
It looks for larger free pages and splits them, as explained before.

23:19.040 --> 23:31.040
But in this case, for my application, there's also another thing we can do. Because if you watch a video on your smartphone,

23:31.040 --> 23:43.040
and Varnish Cache needs to fetch that video from some backend somewhere, it's totally happy with storing that in this

23:43.040 --> 23:52.040
chunk size or another chunk size; it matters a bit for performance, but it doesn't really matter that much.

23:52.040 --> 24:01.040
So we can make the deliberate decision to fetch the data in smaller sizes than we would ideally want.

24:01.040 --> 24:15.040
And that's the cram parameter which I introduced to the allocator. There is a negative and a positive form, which are slightly different, but in principle the absolute value matters, which just tells the allocator:

24:15.040 --> 24:31.040
okay, I'm also fine with an allocation which is that many powers of two smaller.

24:31.040 --> 24:51.040
If you read about buddy allocators, or at least when I did, I always found this statement: the buddy allocator will always return whatever size you requested, rounded up to the next power of two. And sure,

24:51.040 --> 25:00.040
the first thing that it does is just that: it knows only powers of two, so in the first place, it gives you that.

25:00.040 --> 25:14.040
But we can do something: we can trim the allocation to the size that the request wanted, and then return the leftover to the allocator.

25:14.040 --> 25:27.040
So that way, okay, you get all these little trimmings as free space, but maybe your application just happens to make use of them.

25:27.040 --> 25:39.040
And the cram stuff that I just talked about is one way to make it possible to use these trimmings.
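The trimming can be sketched as decomposing the leftover into power-of-two pages, one per set bit of the leftover size. Illustrative sketch with sizes counted in minimal pages, not the project's API:

```c
#include <stdint.h>

/* Trim a power-of-two allocation of "pow2" minimal pages down to
 * "need" pages: every set bit of the leftover (pow2 - need) becomes
 * one free buddy page of that size, returned to the allocator.
 * Here we only record the returned page sizes. */
static unsigned
trim(uint64_t pow2, uint64_t need, uint64_t sizes[])
{
    uint64_t left = pow2 - need;
    unsigned n = 0;

    while (left != 0) {
        uint64_t bit = left & -left;    /* lowest set bit             */
        sizes[n++] = bit;               /* one returned page this big */
        left -= bit;
    }
    return (n);
}
```

For example, trimming a 16-page allocation to 11 pages leaves 5 pages, returned as one 1-page and one 4-page buddy.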

25:39.040 --> 25:45.040
Now, something about fragmentation.

25:45.040 --> 26:05.040
All memory allocators fragment. I mean, if you think about it: you're a memory allocator, and people can come to you and allocate all kinds of different sizes, return them, and then allocate something else.

26:05.040 --> 26:18.040
There will always be fragmentation, there will always be small parts, which cannot be merged again into bigger parts.

26:18.040 --> 26:33.040
So, can we have the Windows 95 defragmentation tool? For this application, as a web cache, you actually can, because when the cache is full, that's when it's fragmented.

26:33.040 --> 26:43.040
It's usually not fragmented when the cache is only partially used.

26:43.040 --> 26:51.040
So we need to have some cache eviction algorithm anyway.

26:51.040 --> 27:08.040
And in Varnish Cache, this happens to be LRU, and also in my implementation, this happens to be LRU. You could use other eviction algorithms; I'm always talking about LRU on these slides, but that doesn't mean I wasn't aware that there are alternatives.

27:08.040 --> 27:27.040
So how can we combat, or counteract, fragmentation issues? Well, we can just have a thread making room and, at the same time, trying to allocate larger pages.

27:27.040 --> 27:41.040
That way, we will remove objects until they eventually happen to coalesce into the larger size that we want to see.

27:41.040 --> 28:05.040
Then we can put this large page onto a reserve, and whenever some other thread comes along and needs memory, we can return what we have in the reserve and make the other thread happy with a nice big chunk of memory.

28:05.040 --> 28:23.040
So, for the last part, I want to show you a bit of the API side. Yes, I'm one of these C guys, and I've heard about memory safety.

28:23.040 --> 28:49.040
So there were a couple of things which I thought of, or actually found out during the process: first of all, some separate data types, then batching, waiting allocations, and also asynchronous allocations, allocation requests I should say.

28:49.040 --> 29:12.040
Coming back to the libc malloc API, which is quite common: you all know that you basically get a single pointer back when you make an allocation. So you say, okay, I want 1540 bytes, and you get back a single pointer, and the implicit contract is basically that this will be enough to store 1540 bytes.

29:12.040 --> 29:36.040
And when you free the chunk of memory and return it to the allocator, the allocator now must somehow know that those were 1540 bytes, or at least, whatever it does, it needs to internally make it so that the right amount of memory becomes available again.

29:36.040 --> 29:56.040
It could be a slab allocator, yada yada, okay. But here, in this allocator, we actually need to know the original size. If it was a power of 2, we need to know in which bit field we need to clear a bit, excuse me, set a bit.

29:56.040 --> 30:23.040
If it was a different size, which is not a power of 2, we also need to know the exact size, because we then have all these little, or increasingly big, partial allocations that we also need to set free and potentially merge.

30:24.040 --> 30:38.040
So yeah, and then I told you, okay, one of the main motivations was to have an allocator which doesn't work on memory, but works on disk offsets. So now we have four different things.

30:38.040 --> 30:56.040
We have these arbitrarily sized allocations, which I call extents, as pointers or as offsets, and also the power-of-2 allocations, as pointers and as offsets.

30:56.040 --> 31:05.040
The reason the latter two exist is just that they are internally a bit more efficient, so I thought, okay, better expose the efficiency.

31:05.040 --> 31:21.040
And yeah, then for each of these four, we have an allocation function and a return function, which ultimately all map onto the same code, but anyway.

31:21.040 --> 31:37.040
One other thing which I believe is relevant for performance is that entering the allocator is expensive, and especially in fellow, the disk storage,

31:37.040 --> 31:53.040
sometimes there are like thousands of segments that need to be returned or allocated at the same time. So why would we want to enter the allocator for one thing, then return, and enter it again, and return?

31:53.040 --> 32:12.040
Also, for merges to happen, it's really advantageous if we have all the returns at once, before we make any new allocations, because this way there is a chance that returns coalesce

32:12.040 --> 32:24.040
before we hand out, before we potentially split, something which could more usefully be used as a larger page.

32:24.040 --> 32:37.040
So this interface uses a structure on the stack. This buddy-requests structure is basically

32:37.040 --> 32:56.040
a macro creating a compound literal; we then fill this compound literal with helper functions and make the actual allocation.

32:56.040 --> 33:09.040
And then we get multiple things back, n things back, and we can take each of these n things out of the compound literal structure and use them individually.

33:09.040 --> 33:17.040
This was for requests; the same idea applies to returns.
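The batching idea might look roughly like this: a stack structure collects n requests, the allocator is entered once, and the results are picked out individually. All names here are hypothetical illustrations, not the project's actual API, and the bump arena only stands in for the real allocator to keep the sketch runnable:

```c
#include <assert.h>
#include <stddef.h>

#define MAXREQS 8

struct alloc_req  { size_t size; void *ptr; };
struct alloc_reqs { unsigned n; struct alloc_req req[MAXREQS]; };

/* Stack-allocated request batch, compound-literal style. */
#define ALLOC_REQS(name) struct alloc_reqs name = { .n = 0 }

static void
reqs_add(struct alloc_reqs *r, size_t size)
{
    assert(r->n < MAXREQS);
    r->req[r->n++] = (struct alloc_req){ .size = size, .ptr = NULL };
}

/* Stand-in for entering the real allocator once for the whole batch:
 * a trivial bump arena, just to make the sketch self-contained. */
static void
reqs_alloc(struct alloc_reqs *r, char *arena)
{
    size_t off = 0;

    for (unsigned i = 0; i < r->n; i++) {
        r->req[i].ptr = arena + off;
        off += r->req[i].size;
    }
}
```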

33:17.040 --> 33:29.040
So, what do we do as a web cache if we can't get the memory immediately? Yeah, we run LRU, and I've already talked about this.

33:29.040 --> 33:41.040
So there are two ends to this problem: one end is requests which need memory, and the other end is LRU, which makes room.

33:41.040 --> 34:02.040
And I actually pondered quite a bit how to tackle this best. Is it really a good idea to do this above the memory allocator, in the code calling it, or should it be something the allocator itself supports?

34:02.040 --> 34:08.040
And I came to the conclusion that really the best way is that the allocator itself supports it.

34:08.040 --> 34:16.040
Which is why we have waiting requests with priority queues. So there's a fixed set of priorities.

34:16.040 --> 34:23.040
Each priority is basically a queue.

34:23.040 --> 34:29.040
And as long as there are waiting requests, everyone gets onto the queues.

34:29.040 --> 34:35.040
And then something somewhere will make room.

34:35.040 --> 34:42.040
And when it's done making room, it will check: okay, are there any waiting allocations, and will try to run them.

34:42.040 --> 34:50.040
So that way those should eventually return.
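The fixed set of priorities, each basically a queue, can be sketched as an array of FIFO lists served highest priority first. Illustrative names, not the project's API:

```c
#include <stddef.h>

#define NPRIO 4   /* fixed set of priorities, 0 = highest */

struct wreq {                 /* a waiting allocation request */
    struct wreq *next;
    size_t       size;
};

struct waitq {
    struct wreq *head, *tail; /* one FIFO per priority */
};

/* Enqueue a waiting request at its priority: first come, first
 * served within a priority. */
static void
wq_put(struct waitq q[NPRIO], unsigned prio, struct wreq *r)
{
    r->next = NULL;
    if (q[prio].tail != NULL)
        q[prio].tail->next = r;
    else
        q[prio].head = r;
    q[prio].tail = r;
}

/* After making room, pick the next request: highest priority first. */
static struct wreq *
wq_get(struct waitq q[NPRIO])
{
    for (unsigned p = 0; p < NPRIO; p++) {
        struct wreq *r = q[p].head;

        if (r != NULL) {
            q[p].head = r->next;
            if (q[p].head == NULL)
                q[p].tail = NULL;
            return (r);
        }
    }
    return (NULL);
}
```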

34:50.040 --> 35:00.040
Now this opened an interesting opportunity, I think, which is that we do not need to synchronously wait for allocations.

35:00.040 --> 35:09.040
At some point you make a request: oh, I know that quite soon I'm going to need some memory, but I have stuff to do in the meantime.

35:09.040 --> 35:20.040
So submit the allocation request. Go away, do something, or, I mean, possibly react to it immediately if it comes back, but if it doesn't,

35:20.040 --> 35:26.040
Go away do something else. And then.

35:26.040 --> 35:34.040
Yeah, you can still change your mind and synchronously wait, or you're lucky, it comes back, and you work with it.
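The submit-now, wait-later pattern with condition variables, as mentioned, might be sketched like this. Names are illustrative, not the project's actual API:

```c
#include <pthread.h>
#include <stdbool.h>

/* An asynchronous allocation request: submitted early, completed by
 * whoever makes room, and only waited on when the memory is actually
 * needed. */
struct async_req {
    pthread_mutex_t mtx;
    pthread_cond_t  cond;
    bool            done;
    void           *ptr;
};

#define ASYNC_REQ_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, NULL }

static void
req_complete(struct async_req *r, void *ptr)    /* allocator side */
{
    pthread_mutex_lock(&r->mtx);
    r->ptr = ptr;
    r->done = true;
    pthread_cond_signal(&r->cond);
    pthread_mutex_unlock(&r->mtx);
}

static void *
req_wait(struct async_req *r)   /* caller changed its mind: wait now */
{
    pthread_mutex_lock(&r->mtx);
    while (!r->done)
        pthread_cond_wait(&r->cond, &r->mtx);
    pthread_mutex_unlock(&r->mtx);
    return (r->ptr);
}
```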

35:34.040 --> 35:44.040
At the moment, this uses the code that's out there; it uses condition variables.

35:44.040 --> 35:59.040
So it's really based on this model that you either react asynchronously or you synchronously wait, but that's not enough for another project.

35:59.040 --> 36:07.040
I need some kind of callback, which is more like the AIO design pattern,

36:07.040 --> 36:13.040
more following this AIO design pattern.

36:13.040 --> 36:24.040
So regarding performance, I mentioned this initially, so at the end I also wanted to talk about it at least briefly.

36:24.040 --> 36:35.040
The actual project delivers the performance that you expect from it, which is just the bare-metal performance. So if you have 400-gigabit NICs, you want to saturate them.

36:35.040 --> 36:41.040
There's also TLS, which is unrelated to this, but the allocator should not be the bottleneck.

36:41.040 --> 36:47.040
Excuse me, the storage engine should not be the bottleneck and as far as I know it is not.

36:47.040 --> 37:03.040
And then, taking it one level down: in my own work on this, I've never seen the allocator be the bottleneck, but I must say, I had this one purpose, this one application for it.

37:03.040 --> 37:12.040
So I never actually benchmarked the whole thing like you would benchmark jemalloc or whatever.

37:12.040 --> 37:23.040
What I can say is that it can easily scale, because you can easily have multiple of these buddies and we actually do.

37:23.040 --> 37:29.040
So why am I here today? And now what?

37:29.040 --> 37:35.040
Yeah, first of all, I wanted to share these ideas in case they might help anyone.

37:35.040 --> 37:46.040
I'm always very fond of seeing people just doing their projects, big and small, starting from scratch.

37:46.040 --> 37:58.040
And I just think that the buddy allocator is an idea which is so simple that you could basically do it over a weekend.

37:59.040 --> 38:02.040
But I have a big question.

38:02.040 --> 38:10.040
I can hardly believe that no one else before me came up with this bit-field idea.

38:10.040 --> 38:18.040
So if anyone is out there who's done it before, please get in touch.

38:18.040 --> 38:22.040
And let me know, I haven't found you so far if you exist.

38:22.040 --> 38:33.040
But please let me know if you do, at least so I can give you credit for having had the idea before me.

38:33.040 --> 38:36.040
And then, yeah, OK, obviously this is FOSDEM.

38:36.040 --> 38:47.040
So another question I have is: is this of general interest? My own use case that I wrote this for is, I guess, special.

38:47.040 --> 38:56.040
A web cache is definitely not the same as a desktop application from the perspective of a memory allocator.

38:56.040 --> 39:02.040
And a normal desktop application cannot just LRU stuff away.

39:02.040 --> 39:08.040
So yeah, I know, but still, I have this question.

39:08.040 --> 39:20.040
if this should forever remain part of this SLASH/ project, or if anyone sees any potential in taking it elsewhere.

39:20.040 --> 39:23.040
Yeah, and that's it. Thank you.

39:23.040 --> 39:38.040
We have some minutes for questions and maybe answers.

39:38.040 --> 39:44.040
Anyone here, please wave a bit.

39:44.040 --> 39:46.040
What?

39:46.040 --> 39:51.040
Yeah, sorry.

39:51.040 --> 40:01.040
So I was wondering if you took the bits you need for the least-recently-used policy into account in the memory overhead as well,

40:01.040 --> 40:09.040
and if you do something smart like pseudo-LRU, or if you do a full least-recently-used policy.

40:09.040 --> 40:15.040
That's a bit of a question where I didn't understand something, about the bits and LRU?

40:15.040 --> 40:30.040
So you can make a true least-recently-used, where you keep track of which is the actual least recently used thing, but for least recently used you can also do your whole buddy-system idea of making

40:30.040 --> 40:35.040
a hierarchy of trees saying which of the two was the least recently used.

40:35.040 --> 40:36.040
Ah, okay.

40:36.040 --> 40:38.040
And that's called pseudo-LRU.

40:38.040 --> 40:40.040
Yeah, so I wanted this.

40:40.040 --> 40:50.040
I mean, if I understand you correctly, your question is about whether we can somehow track where things are, such that we can merge better, right?

40:50.040 --> 40:53.040
Yeah, that just takes a lot of memory if you want to do that.

40:53.040 --> 41:02.040
And that comes back to this: okay, we only use these two bits at the moment for one page, which is great if you manage a petabyte.

41:02.040 --> 41:18.040
And as soon as we even start to think about having a single pointer, eight bytes is already 64 bits versus our two. I can't do the math in my head right now.

41:18.040 --> 41:23.040
So that's already way, way out of our budget.
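[The arithmetic the speaker is reaching for can be spelled out. Two bits per page is the stated metadata budget; one eight-byte pointer per page is 64 bits, i.e. 32 times that budget. A quick check, using the 1 PiB / 4 KiB example from the talk:]

```python
bits_per_page = 2          # per-page metadata budget from the talk
pointer_bits = 8 * 8       # one 8-byte pointer per page, in bits

ratio = pointer_bits // bits_per_page
assert ratio == 32         # a single pointer is 32x the budget

# Example scale from the talk: 1 PiB managed as 4 KiB pages.
pages = 2**50 // 2**12     # 2**38 pages
two_bit_bytes = pages * bits_per_page // 8
pointer_bytes = pages * pointer_bits // 8
assert two_bit_bytes == 64 * 2**30   # 64 GiB, matching the ~65 GB quoted earlier
assert pointer_bytes == 2 * 2**40    # 2 TiB if each page carried a pointer
```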

41:23.040 --> 41:30.040
And regarding the LRU strictness, we make multiple compromises. First of all,

41:30.040 --> 41:39.040
we have a cutoff time in one implementation, which doesn't always reinsert.

41:39.040 --> 41:42.040
It just waits a bit until some time has passed.

41:42.040 --> 41:54.040
And the other compromise is just multiple LRUs, which is also a big compromise, because then the least recently used is the least recently used on that particular list.

41:54.040 --> 41:55.040
But not globally.

41:55.040 --> 42:02.040
But that has its uses, because then you can have other metrics and such per list when you have multiple of them.
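[The two compromises described, a reinsertion cutoff time and sharded per-list LRUs, might be sketched like this. The names and interface are hypothetical, not the actual implementation:]

```python
import time
from collections import OrderedDict

class LazyLRU:
    """LRU list that skips reinsertion if the entry was moved recently.

    Hypothetical sketch of the compromises from the talk: a cutoff time
    avoids constant reordering on every access, and callers may shard
    keys over several instances, making "least recent" per-list only.
    """

    def __init__(self, cutoff_seconds):
        self.cutoff = cutoff_seconds
        self.order = OrderedDict()  # key -> time of last reinsertion

    def touch(self, key, now=None):
        now = time.monotonic() if now is None else now
        last = self.order.get(key)
        if last is not None and now - last < self.cutoff:
            return  # moved recently: don't reinsert yet
        self.order[key] = now
        self.order.move_to_end(key)  # most recently used at the end

    def evict(self):
        key, _ = self.order.popitem(last=False)  # oldest entry first
        return key
```

[The cutoff trades LRU strictness for fewer list operations: a hot entry is only moved to the head once per cutoff window instead of on every hit.]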

42:02.040 --> 42:03.040
Okay.

42:03.040 --> 42:08.040
And any other question?

42:08.040 --> 42:15.040
Yes, hi, thank you for the talk. So, I don't know if I understood something correctly.

42:15.040 --> 42:19.040
It's like you cannot prevent fragmentation when searching.

42:19.040 --> 42:31.040
But I think I understood that through a preferred eviction size, you do make an effort to find that kind of size the next time you search.

42:31.040 --> 42:40.040
What I said about fragmentation is that, because we are in a cache, we can evict stuff.

42:40.040 --> 42:47.040
And we can evict stuff until we have a nice big page.

42:47.040 --> 42:57.040
So, that's the whole idea I was talking about. Does that make sense?

42:57.040 --> 43:07.040
So, you don't have a target for that, like a preferred target for that big space? You just try the maximum?

43:07.040 --> 43:10.040
You just try to evict as much as possible every time.

43:10.040 --> 43:14.040
So, as long as you have free pages, you make this merge.

43:14.040 --> 43:22.040
And you can have this parameter to say, okay, I'm also okay with a smaller allocation.

43:22.040 --> 43:27.040
But eventually, you will need bigger allocations, right, for some things.

43:27.040 --> 43:33.040
And the point is, we do two things. First of all, we do them in advance.

43:33.040 --> 43:38.040
So, before the fact, before you need to wait for them, we already have them.

43:38.040 --> 43:48.040
That's one thing. And the other thing is, we don't just allocate some size.

43:48.040 --> 44:02.040
We allocate a reasonably large size, such that we basically evict objects until they happen to coalesce into this larger size.

44:02.040 --> 44:13.040
And again, that's something you can do in a cache, which you cannot do in, like, Firefox: you cannot just close tabs until you're happy with the fragmentation, right?
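[The coalescing the evict-until-merged idea relies on can be shown with a minimal buddy free-list sketch. This is a hypothetical illustration, not the talk's implementation: freeing a block merges it with its buddy whenever the buddy is also free, recursively, producing larger free blocks; a cache simply keeps evicting (freeing) until a block of the wanted order appears.]

```python
class Buddy:
    """Minimal buddy free-list sketch over 2**max_order unit pages."""

    def __init__(self, max_order):
        self.max_order = max_order
        self.free = {o: set() for o in range(max_order + 1)}
        self.free[max_order].add(0)  # one maximal free block at offset 0

    def alloc(self, order):
        """Take a free block of `order`, splitting larger blocks as needed."""
        for o in range(order, self.max_order + 1):
            if self.free[o]:
                off = self.free[o].pop()
                while o > order:          # split, keeping upper halves free
                    o -= 1
                    self.free[o].add(off + (1 << o))
                return off
        raise MemoryError("no block of requested order")

    def release(self, off, order):
        """Free a block and coalesce with its buddy while possible."""
        while order < self.max_order:
            buddy = off ^ (1 << order)    # the buddy differs in one bit
            if buddy not in self.free[order]:
                break
            self.free[order].remove(buddy)
            off = min(off, buddy)
            order += 1
        self.free[order].add(off)

    def largest_free_order(self):
        return max((o for o in self.free if self.free[o]), default=-1)
```

[A cache making room would loop: while `largest_free_order()` is below the wanted order, evict the LRU object and `release()` its pages, so freed neighbours coalesce into the big block, something a desktop application cannot do with live allocations.]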

44:13.040 --> 44:16.040
Okay, thank you.

44:19.040 --> 44:22.040
Thank you.

44:26.040 --> 44:29.040
But, please.

44:29.040 --> 44:30.040
Camera.

44:30.040 --> 44:32.040
The camera.

44:36.040 --> 44:43.040
Thanks. When I hear LRU, I think ring buffers, essentially, especially for logging purposes.

44:43.040 --> 44:50.040
Can I basically, with low overhead, abuse this to just allocate a page to put my log message in?

44:50.040 --> 44:54.040
And then let the LRU take care of keeping this as a ring buffer.

44:54.040 --> 45:02.040
And somehow still be able to recover the order of things that I inserted into it.

45:02.040 --> 45:06.040
Yeah, I guess.

45:06.040 --> 45:11.040
I mean, it's something you do on top of the allocator.

45:11.040 --> 45:20.040
So, what this buddy implementation gives you is just this whole concept of one side making room and the other side

45:20.040 --> 45:26.040
doing allocations, with the waits which connect the two.

45:26.040 --> 45:35.040
The LRU, yeah, okay, I was talking about the LRU again and again, and maybe that was not the best idea in the context of the buddy allocator.

45:35.040 --> 45:39.040
The buddy allocator itself has nothing to do with it, right?

45:39.040 --> 45:49.040
It's something in my implementation which uses it, but it's not like you have to have an LRU to use the buddy allocator, not even this implementation.

45:49.040 --> 46:04.040
So, yeah, you would probably have some list of your stuff where you know the tail and the head; at one end you insert, and at the other end you just take the stuff out and free it.

46:04.040 --> 46:09.040
And yeah, that's essentially what's happening here.

46:09.040 --> 46:18.040
Just in your case, it would not be HTTP body segments, but your log entries or whatever.
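[The FIFO log the questioner describes, inserting at the head and freeing from the tail, can be sketched independently of the allocator. A hypothetical sketch, with a fixed page budget standing in for cache pressure:]

```python
from collections import deque

class PageLog:
    """FIFO log: each message notionally occupies its own page; when the
    page budget is exhausted, the oldest messages are freed first.

    Hypothetical sketch of the ring-buffer-on-allocator idea: the list
    itself preserves insertion order, so replay needs no extra metadata.
    """

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.entries = deque()  # oldest at the left, like the LRU tail

    def append(self, message):
        if len(self.entries) == self.max_pages:
            self.entries.popleft()  # "evict": free the oldest page
        self.entries.append(message)

    def replay(self):
        """Return surviving messages in original insertion order."""
        return list(self.entries)
```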

46:18.040 --> 46:19.040
Yeah.

46:19.040 --> 46:20.040
Thanks.

46:20.040 --> 46:34.040
Okay, I have to say that was the last question, so we stay on time here.

46:34.040 --> 46:36.040
Thank you for your talk.

46:36.040 --> 46:39.040
Thank you for attending FOSDEM.

46:39.040 --> 46:41.040
Yeah, maybe a warmer applause.

46:41.040 --> 46:45.040
Always we're here.

46:45.040 --> 46:52.040
And a big thank you from my side to you as a volunteer, and all the other volunteers, and everyone making FOSDEM possible.

46:52.040 --> 46:54.040
It's absolutely amazing.

46:54.040 --> 46:57.040
Thank you so much for your great work and keep it up.

