WEBVTT

00:00.000 --> 00:18.000
We're going to start again, just please get seated, or if you're chatting, maybe go outside.

00:18.000 --> 00:25.000
Yeah, we're going to be talking about data access monitoring and access-aware system operations now.

00:25.000 --> 00:27.000
Can you hear me?

00:27.000 --> 00:28.000
Okay.

00:28.000 --> 00:31.000
Yeah, thank you for joining this talk.

00:31.000 --> 00:32.000
So, my name is SeongJae.

00:32.000 --> 00:36.000
You can just call me SJ, and I'm currently working at Meta.

00:36.000 --> 00:45.000
So, today I want to introduce a kernel subsystem called DAMON, which is for data access monitoring and access-aware system operations.

00:45.000 --> 00:48.000
Let's talk about the data access monitoring first.

00:48.000 --> 00:51.000
First of all, how can we define data accesses?

00:51.000 --> 01:05.000
We can define data accesses as events on the space and time of memory: fetching or writing data at some specific place, at some specific time.

01:05.000 --> 01:13.000
And we want to do data access monitoring in a very precise, complete, and lightweight manner.

01:13.000 --> 01:17.000
That is, we want as fine a time granularity as possible.

01:17.000 --> 01:21.000
Hopefully, CPU cycles divided by the number of CPUs.

01:21.000 --> 01:24.000
And as fine a space granularity as possible: each of the bits, or even each electron.

01:24.000 --> 01:31.000
And a complete history, starting from the Big Bang, or at least from the booting of the system.

01:31.000 --> 01:35.000
And we want it to be lightweight enough to run on production systems.

01:35.000 --> 01:49.000
But the reality is that it is very expensive, and also you aren't gonna need it: the time overhead is on the order of the memory size,

01:49.000 --> 01:56.000
and the space overhead is that multiplied by the total monitoring time, if you want to record all the events.

01:56.000 --> 02:03.000
And maybe you wouldn't really care whether a bit was accessed five years ago.

02:03.000 --> 02:08.000
And therefore, we introduced DAMON, which makes a tradeoff between accuracy and overhead.

02:08.000 --> 02:20.000
It aims to be scalable in terms of the size of the system, to provide best-effort accuracy, and to be lightweight enough to be used in real-world systems.

02:20.000 --> 02:26.000
And we start from region-based sampling, for reducing the overhead of the space management.

02:26.000 --> 02:36.000
We first define a region as an access monitoring unit, which is a sub-area of the memory, in terms of space and time.

02:36.000 --> 02:44.000
And it is a collection of adjacent elements that have a similar access pattern, or access frequency.

02:44.000 --> 02:50.000
And by that definition, we don't need to check the access to each of the elements of the region.

02:50.000 --> 02:58.000
We only need to check the access to one random element of the region.

02:58.000 --> 03:12.000
And therefore, for example, we can say whether the page was accessed within the last one second or not, regardless of how many of the cache lines inside the page have been accessed.
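
The one-random-element-per-region idea above can be sketched as a toy model. This is a hypothetical illustration, not DAMON's actual kernel code; all names here are made up for the example:

```python
import random

def region_accessed(region_elements, was_accessed):
    """Probe one randomly chosen element and treat the result as
    representative of the whole region (the region-based sampling idea)."""
    probe = random.choice(region_elements)
    return was_accessed(probe)

# Eight pages forming one region; every page was touched, so any probe
# reports the region as accessed.
pages = list(range(0, 4096 * 8, 4096))
print(region_accessed(pages, lambda addr: True))  # True
```

The point of the sketch is the cost model: one check per region, regardless of how many elements the region contains.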

03:12.000 --> 03:17.000
So we can start from fixed granularities of space and time: page-size regions, checked at a fixed interval.

03:17.000 --> 03:25.000
That is, this is a sort of periodic fixed-granularity monitoring, like what we can do with idle page tracking.

03:25.000 --> 03:30.000
And in this scheme, the time overhead becomes the memory size divided by the space granularity.

03:30.000 --> 03:37.000
And the space overhead becomes the time overhead multiplied by the monitoring time, divided by the time granularity.
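
To make those formulas concrete, here is a back-of-the-envelope calculation. The numbers are assumed purely for illustration:

```python
# Assumed example numbers, just to make the overhead formulas concrete.
memory_bytes = 64 * 2**30     # 64 GiB of monitored memory
space_granularity = 4096      # page-size regions
time_granularity = 0.005      # one access check every 5 ms
monitoring_time = 60.0        # monitor for one minute

# Time overhead per pass: memory size divided by the space granularity.
checks_per_pass = memory_bytes // space_granularity

# Space overhead: the per-pass work, multiplied by the monitoring time
# and divided by the time granularity (one record per check).
records = checks_per_pass * monitoring_time / time_granularity

print(checks_per_pass)  # 16777216
```

Even at page granularity, both numbers still grow with the memory size and the total monitoring time, which is the scalability problem the talk points out next.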

03:37.000 --> 03:41.000
That is, it is reduced by the granularity.

03:41.000 --> 03:47.000
But it is still ruled by the memory size and the total monitoring time, and therefore not that scalable.

03:47.000 --> 03:54.000
And therefore, we can think about aggregating the findings from the access sampling.

03:54.000 --> 04:06.000
That is, to get fine-grained access frequency, we define a sampling interval and an aggregation interval, and then check the access to one sample of each region for every sampling interval.

04:06.000 --> 04:12.000
And then we aggregate the findings for a user-given time interval, called the aggregation interval.

04:12.000 --> 04:16.000
And then we reset the counters for the next aggregation interval, and so on.
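
The sampling/aggregation loop just described can be sketched like this. It is a conceptual model only; intervals are loop counts rather than real time, and the names are illustrative:

```python
def monitor(regions, probe, sampling_rounds, rounds_per_aggregation):
    """Once per sampling round, probe each region; once per aggregation
    interval, snapshot the per-region access counts and reset them."""
    counts = {r: 0 for r in regions}
    snapshots = []
    for round_no in range(1, sampling_rounds + 1):
        for r in regions:
            if probe(r, round_no):
                counts[r] += 1          # nr_accesses-style counter
        if round_no % rounds_per_aggregation == 0:
            snapshots.append(dict(counts))
            counts = {r: 0 for r in regions}  # reset for the next interval
    return snapshots

# The "hot" region is accessed every round; the "cold" one never is.
snaps = monitor(["hot", "cold"], lambda r, t: r == "hot", 4, 2)
print(snaps)  # [{'hot': 2, 'cold': 0}, {'hot': 2, 'cold': 0}]
```

Each snapshot carries an access frequency per region rather than per event, which is where the space savings come from.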

04:16.000 --> 04:23.000
In this way, we can let users know a more fine-grained access frequency with lower overhead.

04:23.000 --> 04:29.000
That is, the space overhead is now reduced by the aggregation interval.

04:29.000 --> 04:37.000
Though it is still ruled by the memory size, multiplied by the total monitoring time.

04:37.000 --> 04:42.000
And from this figure, we can see that there is some inefficiency.

04:42.000 --> 04:48.000
That is, there are some adjacent regions of similar hotness. By the definition of a region,

04:48.000 --> 04:54.000
a region means that all the elements inside it have similar access patterns.

04:54.000 --> 05:05.000
And therefore, in this figure, the two regions of access frequency 1 and the two regions of access frequency 2 are just an inefficiency.

05:05.000 --> 05:10.000
And therefore, we introduce a dynamic space granularity mechanism, which

05:10.000 --> 05:18.000
repeatedly merges the adjacent regions having similar access frequency into one region.

05:18.000 --> 05:25.000
And also, together with that, we randomly split each of the regions into two sub-regions.

05:25.000 --> 05:33.000
In this way, the number of regions converges to the number of distinct access patterns of the system, or the workload.

05:33.000 --> 05:39.000
And further, we let users set the minimum and maximum number of regions.
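
The merge half of this adaptive-regions idea can be sketched as follows (the split half would randomly cut each region in two). This is an illustrative model, not DAMON's actual merge logic; the threshold and tuple layout are assumptions:

```python
def merge_similar(regions, threshold):
    """regions: list of (size, nr_accesses) tuples for adjacent regions.
    Collapse neighbors whose access frequencies differ by at most
    `threshold` into a single larger region."""
    merged = [regions[0]]
    for size, freq in regions[1:]:
        last_size, last_freq = merged[-1]
        if abs(freq - last_freq) <= threshold:
            merged[-1] = (last_size + size, max(last_freq, freq))
        else:
            merged.append((size, freq))
    return merged

# Two cold neighbors and two hot neighbors collapse into two regions.
print(merge_similar([(10, 1), (20, 1), (5, 9), (5, 8)], threshold=1))
# [(30, 1), (10, 9)]
```

The random split is what lets a merged region be re-examined later, so the region boundaries can track access-pattern changes over time.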

05:39.000 --> 05:44.000
So two regions could become five regions, or five regions could become two regions, right?

05:44.000 --> 05:54.000
And therefore, we simply count how long the given hotness has been kept for each of the regions, using another counter, called the age.

05:54.000 --> 06:00.000
And then the snapshot contains a history of not full length, but useful length.

06:00.000 --> 06:11.000
For example, at the beginning each of the regions has age 1, and then, if the access frequency has not changed, the age counter is increased.

06:11.000 --> 06:23.000
And from this point we can just discard the earlier snapshots, and we can get a relatively long history using only the final snapshot.
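
The age counter described above can be sketched in a few lines. Again a hypothetical illustration; the tolerance parameter is an assumption:

```python
def update_age(age, old_freq, new_freq, tolerance=0):
    """Grow the age while the access frequency stays (roughly) the same;
    reset it to 1 when the frequency changes."""
    return age + 1 if abs(new_freq - old_freq) <= tolerance else 1

age, freq = 1, 0
for new_freq in [0, 0, 0, 5]:  # idle for three intervals, then turns hot
    age = update_age(age, freq, new_freq)
    freq = new_freq
print(age)  # 1 -- the hotness just changed
```

Because the age travels with the region, one snapshot can say "this region has been this cold for this long" without keeping all the older snapshots.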

06:23.000 --> 06:33.000
In this way, DAMON provides data access pattern snapshots like this. Let me show you.

06:33.000 --> 06:45.000
So we can start DAMON with... oh, okay.

06:45.000 --> 07:00.000
Actually, DAMON is running now, and therefore we can just ask DAMON to show the access

07:04.000 --> 07:14.000
pattern like this. So it shows, at which addresses, how big the regions are, how frequently they are accessed, and for how long.

07:14.000 --> 07:28.000
For example, in this case, the 409-megabyte region at the end of the address space has not been accessed for more than 20 hours, because it seems to be cold.

07:28.000 --> 07:42.000
And let's see the access pattern again.

07:42.000 --> 07:53.000
Oops. It's already a bit.

07:53.000 --> 08:09.000
And then it shows some more colorful output, and also you can sort the regions by access temperature, to show cold regions first and hot regions at the bottom.

08:09.000 --> 08:25.000
So you can draw a kind of histogram showing how much of the regions are at which specific access temperature, in a more intuitive way.

08:25.000 --> 08:45.000
So that's what DAMON is providing. Its time and space overheads are proportional to the number of regions, and both the time and space overheads are not ruled by the memory size or the monitoring time, which you just cannot control in many cases.

08:45.000 --> 08:55.000
And how accurate and how heavyweight it is, is up to your settings; that is, it is under your control.

08:55.000 --> 09:11.000
DAMON provides best-effort results, and it has been successfully used by some real-world production products for years; usually three to four percent single-CPU usage has been reported so far, while running schemes.

09:11.000 --> 09:27.000
A scheme describes what memory operation action the user wants to be applied to regions of a specific access pattern. For example: find regions that have not been accessed for five hours, and then reclaim those, even if there is no memory pressure on the system.
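
A scheme like "reclaim what hasn't been accessed for five hours" can be sketched as a simple predicate over the monitoring results. This is a hedged illustration of the idea, not the kernel's DAMOS implementation; the region fields are made up:

```python
FIVE_HOURS = 5 * 3600

def pick_reclaim_targets(regions):
    """regions: dicts with illustrative 'addr', 'nr_accesses', and
    'age_seconds' fields. Select regions that have stayed unaccessed
    for at least five hours."""
    return [r["addr"] for r in regions
            if r["nr_accesses"] == 0 and r["age_seconds"] >= FIVE_HOURS]

regions = [
    {"addr": 0x1000, "nr_accesses": 0, "age_seconds": 6 * 3600},  # cold
    {"addr": 0x2000, "nr_accesses": 7, "age_seconds": 6 * 3600},  # hot
]
print(pick_reclaim_targets(regions))  # [4096]
```

The access frequency and the age together express the "access pattern" a scheme targets; the action (reclaim, migrate, etc.) is then applied to the matching regions.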

09:27.000 --> 09:43.000
Or, find hot memory in a very far NUMA node, and migrate it to a closer NUMA node.

09:43.000 --> 09:56.000
And then DAMOS finds the regions of the given access pattern, and applies the memory operation action, for each user-given time interval.

09:56.000 --> 10:01.000
And also, there are some additional features for production-level control.

10:01.000 --> 10:08.000
As I just mentioned, DAMON allows you to operate systems based on access patterns.

10:08.000 --> 10:16.000
But the access pattern is not all the information that you need. You need some more information, and some more ways to control it.

10:16.000 --> 10:25.000
For that, we have introduced a feature called DAMOS filters, which define the target memory of the schemes with non-access-pattern-based information.

10:25.000 --> 10:34.000
For example, you can ask DAMOS to page out pages of a specific NUMA node that are also associated with cgroup A, and things like that.

10:34.000 --> 10:53.000
This is important because you might know something more than the kernel. And also, sometimes DAMOS can be more aggressive or less aggressive than ideal, and users can set a resource quota for DAMOS using a feature called the DAMOS quota.

10:54.000 --> 11:17.000
For example, you can ask DAMOS to page out at most 100 megabytes per second, using only 2% of CPU time. And from this point we also introduced an auto-tuning feature for the quota. The quota feature is important because it can control the aggressiveness of DAMOS, and the aggressiveness is what many people really need.
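
A size quota like the "100 megabytes per second" example can be sketched as a per-interval budget that stops the action once it is spent. An illustrative model only; numbers and names are assumptions:

```python
def apply_with_quota(region_sizes, quota_bytes):
    """Apply an action (here, 'page out') region by region until the
    per-interval byte budget is exhausted."""
    paged_out, spent = [], 0
    for i, size in enumerate(region_sizes):
        if spent + size > quota_bytes:
            break                      # budget for this interval is spent
        spent += size
        paged_out.append(i)
    return paged_out, spent

out, spent = apply_with_quota([40, 40, 40], quota_bytes=100)
print(out, spent)  # [0, 1] 80
```

The quota caps how much damage an over-aggressive scheme can do in one interval, which is what makes it safe to run on production systems.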

11:17.000 --> 11:31.000
They want to page out a specific amount of cold pages; that is what the quota was for in the first place. But it is hard to know the ideal aggressiveness.

11:31.000 --> 11:42.000
The quota auto-tuning feature allows you to set only the target system metrics, and then DAMON does the auto-tuning of the aggressiveness to achieve the target.
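
Quota auto-tuning can be sketched as a simple feedback loop: compare a measured system metric against the user-given target and nudge the quota accordingly. The proportional update rule below is an assumption for illustration, not DAMON's actual controller:

```python
def tune_quota(quota, measured, target, step=0.5):
    """Raise the quota when the measured metric is below the target,
    lower it when above; never drop below a minimal quota."""
    error = (target - measured) / target
    return max(1.0, quota * (1 + step * error))

quota = 100.0
# Metric below target -> be more aggressive (larger quota).
quota = tune_quota(quota, measured=50.0, target=100.0)
print(quota)  # 125.0
```

The user only picks the target metric; the controller searches for the aggressiveness that achieves it.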

11:42.000 --> 12:01.000
And DAMON started as a way to operate systems, but we also realized that DAMON is useful for monitoring; that is, DAMOS can be used for efficient and fine-grained data access monitoring. For this, we have introduced a special DAMOS action called DAMOS stat.

12:01.000 --> 12:21.000
This is not a usual action for making a real system operational change, but a special action that makes no system change at all, and just exposes the scheme information, like which address regions were found as the DAMOS scheme target, based on the filters and quotas.

12:21.000 --> 12:29.000
In this way, for example, we can do page-level-properties-based monitoring.

12:29.000 --> 12:46.000
So let's run it again. We have shown the access pattern in this way before, right?

12:46.000 --> 13:03.000
By adding a filter, say, rejecting anonymous pages, we can show how much of each of the regions is backed by non-anonymous, that is, file-backed pages.
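
The filter-plus-stat combination just shown can be sketched as a predicate over per-page properties. A hypothetical model; the page fields are illustrative, not the kernel's page flags:

```python
def filtered_bytes(pages, reject_anon=True):
    """Sum the sizes of pages that pass the filter (here: reject
    anonymous pages), so a stat action can report how much of a region
    is file-backed."""
    return sum(p["size"] for p in pages
               if not (reject_anon and p["anon"]))

pages = [{"size": 4096, "anon": True},    # filtered out
         {"size": 4096, "anon": False},   # file-backed, kept
         {"size": 4096, "anon": False}]   # file-backed, kept
print(filtered_bytes(pages))  # 8192
```

Because the action is stat, nothing is actually reclaimed or migrated; the filter only decides which bytes get counted.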

13:03.000 --> 13:10.000
For example, by just including the damon.h header file, you can use DAMON functions and structures.

13:10.000 --> 13:31.000
And also, DAMON provides a user-space API based on sysfs; it is recommended for user-space program developers. And we also provide a DAMON user-space tool named damo, which is built on top of the DAMON user-space API; it is the most recommended for you to use, if you are a human.

13:31.000 --> 13:41.000
And finally, there are some DAMON modules written with the DAMON kernel API for specific purposes, like proactive reclamation.

13:41.000 --> 13:51.000
These modules provide a more simplified interface, and those are also recommended if you want to use DAMON for those specific usages.

13:52.000 --> 14:02.000
If you want to contribute, then you can also start from running the DAMON test suite, which is continuously being run on a very humble CI system for each change.

14:03.000 --> 14:20.000
And also, DAMON is committed to live as an open project, of course, because it is a part of the Linux kernel. For any questions, help, or patch reviews, please use the community: we have a public mailing list, and we have a bi-weekly virtual meetup series, which

14:20.000 --> 14:35.000
we hold on Google Meet. And also we have a project website collecting every resource and news. And the future of DAMON is always open and up to you, because it is led by the community.

14:35.000 --> 14:51.000
I believe in evolution over intelligent design. I also didn't know what DAMON would be like when I started it, and I have no idea how it will look a few years later; so if you have anything you want, please make the change.

14:51.000 --> 15:03.000
So in summary, that's DAMON. DAMON is a kernel subsystem for data access monitoring and access-aware system operations, and the future is up to you; please participate.

15:03.000 --> 15:06.000
That's all. Thank you. Any questions?

15:06.000 --> 15:19.000
All right, questions?

15:19.000 --> 15:41.000
That looks great. My question: in /proc, for each process, you've got files that tell you the kind of memory ranges that each process has mapped, and the output that you showed there does a kind of

15:41.000 --> 15:51.000
coarse-grained compression, boxing of your memory ranges over time and stuff. Is there anything that exists, either written by you or somebody else, that reconciles

15:51.000 --> 16:02.000
the access patterns with the pages that individual processes have mapped, so that you can do reconciliation between process-level

16:02.000 --> 16:06.000
memory access and the time-series stuff?

16:06.000 --> 16:21.000
So the question is whether there is some user-space ABI-like interface that shows that. DAMON was initially developed for DRAM-level memory access.

16:21.000 --> 16:34.000
And in that case, it might need some kind of cache-level monitoring, I believe, with some other information; and for cache-level monitoring, DAMON might be too coarse.

16:34.000 --> 16:48.000
But even in that case, modern systems have a relatively large amount of cache; that is, it is not very rare to see an LLC of about 100 megabytes in size, right?

16:48.000 --> 17:00.000
In that case, even page-granularity monitoring results could also show some pattern on the cache address space, and so I believe that there could be some chance to use DAMON for the scheduler.

17:00.000 --> 17:08.000
And also, currently it is using only Accessed-bit tracking; however, we have some plans to take it further.

17:08.000 --> 17:31.000
We intend to use some fine-grained information, like a feature such as AMD IBS (instruction-based sampling), or page-fault-based information like AutoNUMA, which can let us know from which CPU the access was made; and based on that, I believe that the scheduler could utilize some of the information, hopefully.

17:31.000 --> 17:35.000
Thanks.

17:35.000 --> 17:54.000
Yes, I want to ask: I think the swap system also keeps track of the least recently used pages to decide what to swap out. To what extent

17:54.000 --> 17:59.000
is that completely separate from DAMON now, and can you maybe merge those two somehow?

17:59.000 --> 18:16.000
Currently it is completely separate; that is, DAMON doesn't want to interfere with the swap mechanism, and therefore DAMON was developed to never interfere with swap.

18:16.000 --> 18:34.000
Nevertheless, the swap system is using the LRU mechanism to find which page to evict, and DAMON is not touching the LRU lists at all; therefore, they are separate things, and they don't interfere with each other. And also, there are other page flags that DAMON is using,

18:34.000 --> 19:02.000
besides what the reclaim logic is using. However, one thing that we can further think about is using DAMON to proactively reclaim cold pages; in this case the reclaim logic is not directly affected, but could get some benefit from the change. And also, we developed a DAMOS scheme that can find hot pages and then move those to, say,

19:02.000 --> 19:17.000
the head of the LRU list, and find cold pages and move those to the tail of the LRU list, so that the reclaim logic can later get some benefit, finding better candidates for swap.

19:17.000 --> 19:23.000
Yes, very cool.

19:23.000 --> 19:29.000
All right, let me finish.

19:29.000 --> 19:46.000
...based on the access frequency, and it has shown some benefit in the experiments. And also there is the THP shrinker, which splits a THP based on whether it is assumed to be really used or not, based on the content of the THP.

19:46.000 --> 20:04.000
That is, if, say, 90% of a THP is filled with zero bytes, then it means that the user is not really writing to the page, and the region may not be used; and therefore, we can split those into base pages to reduce the internal fragmentation and memory overhead.

20:04.000 --> 20:19.000
I believe that we can use DAMON together with this, for finding a nice threshold for splitting, say, the ratio of zero-filled base pages in such a case; and also there could be some other approaches, I believe.
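
The zero-filled-ratio check discussed above can be sketched in a few lines. The 90% threshold comes from the example in the talk; the page handling is otherwise an illustration, not the kernel's THP shrinker code:

```python
def zero_ratio(buf):
    """Fraction of zero bytes in a page-sized buffer."""
    return buf.count(0) / len(buf)

def should_split(buf, threshold=0.9):
    """Treat a mostly-zero huge page as a split candidate."""
    return zero_ratio(buf) >= threshold

thp = bytes(2 * 2**20)      # a fully zero 2 MiB huge page
print(should_split(thp))  # True
```

Picking that threshold well is exactly where access monitoring could help: a mostly-zero but hot THP may still be worth keeping whole.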

20:19.000 --> 20:23.000
That's it. Anything else?

20:23.000 --> 20:29.000
All right, thank you.

