WEBVTT

00:00.000 --> 00:09.600
All right, so let's get started. So next up, we've got Mathieu Desnoyers, who's going to be talking

00:09.600 --> 00:13.600
to us about waste-free per-CPU user-space memory allocation.

00:13.600 --> 00:15.600
All right, thank you, Stefan.

00:15.600 --> 00:21.320
So just as a segue from the previous talk, so the common topic there is trying not to waste

00:21.320 --> 00:26.160
memory, right, but here my focus is more on user space.

00:26.160 --> 00:31.760
So I'm Mathieu Desnoyers, I'm the CEO at EfficiOS, and I'm maintaining the restartable sequences

00:31.760 --> 00:38.920
and membarrier system calls, as well as the LTTng tracer, and in this context, librseq.

00:38.920 --> 00:44.040
There's no release yet of that project, it's a master branch, but it's a project

00:44.040 --> 00:51.160
that aims at improving per-CPU data structure access in user space.

00:51.160 --> 00:57.720
So, and for that, I need to allocate per-CPU data structures, and we'll see how we can do things

00:57.720 --> 01:01.040
better with the librseq project.

01:01.040 --> 01:06.880
All right, so the goal of this presentation, first it's to discuss scaling of data structures

01:06.880 --> 01:12.000
by partitioning, just give a bit of context, discuss the challenges associated with the

01:12.000 --> 01:17.000
use of per-CPU data in user space (memory use, false sharing, cache line waste), present

01:17.000 --> 01:23.840
the librseq mempool per-CPU allocator, and discuss the current mmap, madvise, and memfd_create limitations

01:23.840 --> 01:28.880
with respect to shared mappings meant to be local to a process, because when something is shared,

01:28.880 --> 01:35.320
it's shared with the children after a fork or clone.

01:35.320 --> 01:39.320
So scaling data structures, I'm going to go quickly there because I think there are quite

01:39.320 --> 01:43.600
a few kernel developers in the room, so nobody's going to learn

01:43.600 --> 01:44.600
anything new.

01:44.600 --> 01:49.720
So, scope and partitioning of data structures: for scope, you have stack data, heap,

01:49.720 --> 01:55.000
you have global variables, thread-local storage; in the kernel, that would be adding stuff

01:55.000 --> 02:13.160
into the task struct, and then there's per-CPU data, yes.

02:13.160 --> 02:18.720
So per-CPU data is actually very good, each CPU accesses its own data; however, on larger

02:18.720 --> 02:23.400
machines today, that poses some challenges for user space, as we will see.

02:23.400 --> 02:30.040
So thread-local storage, TLS: when the workload has many more threads than the system

02:30.040 --> 02:35.840
has CPUs, then it leads to inefficient use of the CPU cache, because things

02:35.840 --> 02:40.240
that could be shared between threads end up having per-thread copies.

02:40.240 --> 02:45.640
The accesses for TLS are really, really fast, but if you're bouncing between threads,

02:45.640 --> 02:49.640
then you're invalidating the cache from the other threads.

02:49.640 --> 02:53.160
And it's static definition only.

02:53.160 --> 02:58.600
So initialization of large TLS areas slows down thread creation; again, I'm talking about

02:58.600 --> 03:00.240
user space here, right?

03:00.240 --> 03:04.760
So we're in the kernel track, but I'm talking about how the kernel can help improve user

03:04.760 --> 03:08.600
space, and we'll get to that.

03:08.600 --> 03:14.440
So the global-dynamic TLS model for shared objects is slower than initial-exec, and it has additional

03:14.440 --> 03:15.840
side effects.

03:15.840 --> 03:21.080
So per-CPU data is an alternative to TLS, it's a partitioning strategy that is widely

03:21.080 --> 03:28.000
used within the Linux kernel, but it's not used so much in user space as far as I know.

03:28.000 --> 03:33.040
And when it's used, there's actually this anti-pattern that we observe.

03:33.040 --> 03:35.760
People just declare arrays of per-CPU items.

03:35.760 --> 03:40.400
They basically calculate, what's my worst possible case for the number of CPUs?

03:40.400 --> 03:45.360
I just create an array of per-CPU items, and I index it by the current CPU number.

03:45.360 --> 03:46.360
What can go wrong there?

03:46.360 --> 03:49.480
Well, we'll see.

03:49.480 --> 03:56.040
So you index either with the result of sched_getcpu(), or the rseq cpu_id field.

03:56.040 --> 04:04.920
Actually, since Linux 6.3, rseq added a new concurrency ID field, which basically gives

04:04.920 --> 04:12.960
you an index into a per-CPU data structure, but which is indexed by a concurrency ID

04:12.960 --> 04:19.080
value that you have within the process, that is as close to zero as possible, allocated

04:19.080 --> 04:23.000
by the scheduler based on the number of concurrently running threads.

04:23.000 --> 04:26.680
So here, we have, back to this anti-pattern.

04:26.680 --> 04:30.720
So on the left side, we can see false sharing.

04:30.720 --> 04:35.160
So if you don't align the elements, and they are smaller than a cache line, then you

04:35.160 --> 04:40.800
end up having cache line bouncing across your CPUs that access different indexes

04:40.800 --> 04:42.760
in your data structure.

04:42.760 --> 04:47.400
So one solution for that, you'll say, well, okay, let's make each of those data structures

04:47.400 --> 04:49.160
cache-line aligned.

04:49.160 --> 04:54.440
Well, the problem there is, if you do that for a lot of data in your program, you're

04:54.440 --> 04:58.160
actually wasting hot cache lines with padding.

04:58.160 --> 05:04.040
So the functional density of your cache lines becomes really bad.

05:04.040 --> 05:09.640
So yeah, just as I said: elements not cache-line aligned hurt performance due to false

05:09.640 --> 05:14.160
sharing; if they are cache-line aligned with padding, you waste cache lines.

05:14.160 --> 05:19.480
So if we look at how the Linux kernel gets away with that, it's by implementing its own

05:19.720 --> 05:21.720
per-CPU allocator.

05:21.720 --> 05:28.480
It basically maps a memory range for each CPU, and the memory allocator allocates ranges

05:28.480 --> 05:32.320
at the same offset on each CPU.

05:32.320 --> 05:39.200
So when you get some per-CPU data pointer that you've allocated, you then use it by indexing

05:39.200 --> 05:40.200
it with a stride.

05:40.200 --> 05:45.000
That will give you the, well, not necessarily a stride, they can do tricks with segment

05:45.000 --> 05:50.880
selectors on some architectures, but the idea is, you actually offset into your own

05:50.880 --> 05:53.080
per-CPU data.

05:53.080 --> 05:58.320
So I've implemented a similar trick for user space in the librseq mempool

05:58.320 --> 05:59.960
per-CPU allocator.

05:59.960 --> 06:06.320
So it's a port of those per-CPU kernel allocator concepts to user space.

06:06.320 --> 06:12.760
It's implemented as a user space API within librseq, and basically, I have an API

06:12.800 --> 06:16.000
to allow user space to create memory pools.

06:16.000 --> 06:22.720
Pools, sorry. Each pool maps a memory range, and it's actually an array of per-CPU areas.

06:22.720 --> 06:29.280
So the user can define the size of those areas, for instance, it can be 64 kilobytes per

06:29.280 --> 06:35.600
CPU, and then an allocation against the pool reserves memory at the same offset for each

06:35.600 --> 06:38.440
CPU when it's reserved.

06:38.440 --> 06:41.200
So here's the layout of the mempool range.

06:41.200 --> 06:49.560
So we've allocated memory for each of those CPUs, each has its own per-CPU area,

06:49.560 --> 06:52.400
and it's all contiguous.

06:52.400 --> 06:58.760
Then we have the allocation for our stride A, I've reserved the first slot, let's say,

06:58.760 --> 07:04.440
and stride B, the second slot, and then the rest is unallocated.

07:04.440 --> 07:09.880
So the memory access pattern: we replace an array of per-CPU variables, so that was

07:09.880 --> 07:15.560
the anti-pattern that I told you about earlier, where you have the base pointer,

07:15.560 --> 07:21.960
and then you offset by multiplying the CPU number times the size of your item; that would be the anti-pattern.

07:21.960 --> 07:27.560
We replace that by just flipping over the calculation.

07:27.560 --> 07:34.480
So rather than adding an item offset to the array base pointer, we have a pointer, and I'm going back

07:34.480 --> 07:39.560
one slide, so you'd have a pointer to stride B in CPU zero.

07:39.560 --> 07:43.000
Then we add the CPU number times the pool stride.

07:43.000 --> 07:49.040
So if the pool stride is 64K by default, that gets you to stride B on the right CPU on which

07:49.040 --> 07:50.040
you are.

07:50.040 --> 07:52.800
So that's how you get to your per-CPU data.

07:52.800 --> 08:00.280
So as you see, it's a base plus a multiplication, so it's the same thing as it was before,

08:00.280 --> 08:02.680
but we're gaining something.

08:02.680 --> 08:09.120
What we're gaining is we're placing all the memory for a given CPU together, so we don't

08:09.120 --> 08:17.600
have to add this padding between the members, because all the items for

08:17.600 --> 08:18.600
a CPU are placed together.

08:18.600 --> 08:19.600
Okay.

08:19.600 --> 08:25.440
So allocating from a pool: it returns a pointer to the area of CPU zero, which combines

08:25.440 --> 08:31.320
information about the base of the pool range and the offset of the item.

08:31.320 --> 08:38.720
So that's the mempool range layout with metadata; we can see the same per-CPU memory

08:38.720 --> 08:41.040
array that I showed you before.

08:41.040 --> 08:44.120
Now we can see additional things added there.

08:44.120 --> 08:45.680
So I have a header page.

08:45.680 --> 08:51.360
I have a canary page, initial values, and if configured in robust mode, there's a

08:51.360 --> 08:54.200
free list as well there.

08:54.200 --> 08:59.720
So the header page, I'm going to come back to this, but this is to handle freeing memory.

08:59.720 --> 09:04.320
The canary page is to handle fork, and I'm going to tell you why later.

09:04.320 --> 09:08.600
The init values, I'm coming back to this, so I'll come back a lot to that slide.

09:08.600 --> 09:16.600
So the init values area is meant to populate the initialization values of newly

09:16.600 --> 09:23.520
allocated memory areas without requiring the system to allocate physical memory for each

09:23.520 --> 09:25.360
possible CPU.

09:25.360 --> 09:30.720
And that's a big part of the challenge of doing a per-CPU allocator in user space.

09:30.720 --> 09:36.840
So you end up in container scenarios where the cpuset restricts you to a subset of CPUs,

09:36.840 --> 09:43.800
but then as soon as you start walking over all your CPU indexes to initialize data,

09:43.800 --> 09:50.640
you've actually done copy-on-write or required allocation of actual memory, and then, well,

09:50.640 --> 09:53.760
that hurts memory consumption.

09:53.760 --> 09:56.240
So, freeing items from a pool.

09:56.240 --> 10:01.280
So I wanted to support multiple pools to provide isolation between users, so it's not

10:01.280 --> 10:03.400
a single pool per process.

10:03.400 --> 10:05.120
It's really multiple pools.

10:05.120 --> 10:10.920
But I wanted to do so without requiring the free API to take an extra argument besides

10:10.920 --> 10:12.960
the pointer to free.

10:12.960 --> 10:14.680
So here's how I did it.

10:14.680 --> 10:19.200
So I need to reach the pool free list from a pointer to be freed.

10:19.200 --> 10:26.000
So how I do it is by aligning the pool range at specific addresses and applying a mask on

10:26.000 --> 10:29.320
the pointer to figure out where that base starts.

10:29.320 --> 10:34.920
So here, I have a pointer that is in the range of one of the CPUs; I apply a mask, I find the header

10:34.920 --> 10:38.040
page, and the header page has everything I need.

10:38.040 --> 10:41.040
That's basically how I do it.

10:41.040 --> 10:47.200
So it's aligned; there's no aligned mmap exposed by the kernel, so the trick I do is

10:47.200 --> 10:52.600
I map larger than I need, and then I cut away the pieces I don't need.

10:52.600 --> 10:54.480
Memory initialization.

10:54.480 --> 10:58.680
Yeah, so touching every CPU on larger systems is an issue.

10:58.680 --> 11:04.680
If you think of a big system with 512 or more hardware threads, and then you have a

11:04.680 --> 11:14.400
container that restricts your cpusets or sched affinity, you don't want to touch the

11:14.400 --> 11:18.600
per-CPU areas that are not used in that container.

11:18.600 --> 11:25.000
You may want, so in my case, I reserve virtual memory, but I don't want to reserve physical

11:25.000 --> 11:27.480
memory for this.

11:27.480 --> 11:32.920
So that's why I introduced rseq_mempool_percpu_malloc_init().

11:32.920 --> 11:39.400
So in order to allow the application to stop using this pattern of let's allocate memory

11:39.400 --> 11:44.840
and touch everything for every possible CPU, I include the initialization in the allocator.

11:44.840 --> 11:49.640
So you pass the initialization pointer and length to the allocator saying, well, that's

11:49.640 --> 11:54.600
what I want that memory to be initialized to.

11:54.600 --> 12:00.400
Then I allocate an additional init-values mapping at the end of those mappings, the

12:00.400 --> 12:03.120
init values after the last CPU.

12:03.120 --> 12:12.560
This additional range, so it's a memfd, which has a copy-on-write mapping for each CPU.

12:12.560 --> 12:19.520
So initially, we create a memfd, and we have this init-values area, which

12:19.520 --> 12:28.400
is a shared mapping of that memfd, and it has a private mapping of that same range, of

12:28.400 --> 12:32.640
that same backing file area, into each of the CPU areas.

12:32.640 --> 12:34.240
Those are private though.

12:34.240 --> 12:39.680
So they observe the changes to the init values, but as soon as you write into one of those

12:39.680 --> 12:47.600
CPU areas, they trigger a copy-on-write, and then they get their own copy of the page.

12:47.600 --> 12:52.720
So it's not for the entire thing; it's page by page that this is done, using a copy-on-write mechanism.

12:53.280 --> 12:57.040
All right, this is, yeah, on-store.

13:00.000 --> 13:04.960
So the idea is we write the initial content of the newly allocated area, so let's say you

13:04.960 --> 13:06.240
malloc-init something.

13:06.240 --> 13:10.960
We write the initial content in the init range for that small block of data.

13:10.960 --> 13:16.000
We iterate over all possible CPUs, and read the content visible from each CPU mapping,

13:16.000 --> 13:18.080
compare it with the init range.

13:18.080 --> 13:21.600
If it matches, there's no need to store; it means there was no copy-on-write.

13:21.600 --> 13:24.240
We still have this shared backing page.

13:24.240 --> 13:29.840
If it does not match, it means a copy-on-write has happened on the page due to a store from that

13:29.840 --> 13:30.640
CPU.

13:30.640 --> 13:34.880
So then we need to store the initial contents on that CPU mapping as well.

13:34.880 --> 13:43.520
So it ensures that memory is only reserved when it is actively used, stored to by active CPUs.

13:44.480 --> 13:48.720
So the handling of fork and clone is tricky as well.

13:48.720 --> 13:56.880
So the problem is, with memfd_create, the init ranges that we create are shared,

13:57.600 --> 14:00.080
and there's no kind of MFD_PRIVATE flag.

14:00.080 --> 14:05.920
So the one thing I would like to see happening eventually, and I'd like your feedback on that,

14:05.920 --> 14:13.280
is to add a new flag to memfd_create, saying, well, this backing file is private

14:13.280 --> 14:14.800
to the process.

14:14.800 --> 14:20.080
So there could be multiple shared mappings of that backing file within the process.

14:20.080 --> 14:23.600
And that's all fine, and then you can do MAP_PRIVATE and everything.

14:23.600 --> 14:32.880
But for the backing file, if you clone or fork, in the child process it would not be the same backing file.

14:32.880 --> 14:42.880
It would be a copy of that backing file with the exact same VMA layout and everything, on that new backing file.

14:42.960 --> 14:47.680
So that's something that would be interesting, and I'd like to have your feedback on that.

14:49.040 --> 14:53.920
So currently, I need to use a workaround using madvise.

14:54.800 --> 15:00.160
So I have this canary page where I do a MADV_DONTFORK madvise, basically,

15:01.040 --> 15:02.480
and I use MADV_WIPEONFORK.

15:02.480 --> 15:07.440
So I DONTFORK the rest of the mappings, and this canary page has WIPEONFORK.

15:08.160 --> 15:13.520
That clears a bit in that page to allow me to detect that I'm actually in a child after a fork.

15:14.800 --> 15:18.720
So I don't allow users to use those memory areas across fork.

15:18.720 --> 15:20.640
So that's documented.

15:20.640 --> 15:24.720
But with MFD_PRIVATE or an equivalent, that would solve all this.

15:26.240 --> 15:29.200
So there's a bunch of additional features as well.

15:29.920 --> 15:35.360
So the pools auto-expand to add additional ranges when a range is fully allocated.

15:35.840 --> 15:42.800
The mempool can be configured to either do the copy-on-write from the init range that I showed, or from a zero page.

15:44.560 --> 15:47.760
There are corruption checks as well.

15:48.640 --> 15:50.320
There's a notion of a mempool set.

15:51.040 --> 15:54.080
That's a collection of pools of power-of-two allocation sizes.

15:55.280 --> 16:01.920
So, future work: adding support for allocation of variable-sized elements within a pool,

16:02.000 --> 16:03.440
that would be interesting.

16:03.440 --> 16:08.960
Add a guard page between per-CPU data to eliminate cache line bouncing caused by hardware

16:08.960 --> 16:11.280
prefetching in sequential access patterns.

16:11.280 --> 16:14.000
So I actually noticed that when benchmarking.

16:14.640 --> 16:20.720
So when you have an access pattern that really reads sequentially up to the end of the page,

16:20.720 --> 16:25.680
if you don't put a guard page in there, it's actually going to bleed, on Intel.

16:25.680 --> 16:29.280
It's going to bleed the read onto the next per-CPU data.

16:29.280 --> 16:32.800
And you're actually causing false cache line bouncing because of that.

16:34.400 --> 16:37.440
So, other in-depth future work.

16:39.120 --> 16:46.000
So as I said, figuring out a way to have a memfd file that is private to a process, that would help a lot.

16:46.000 --> 16:51.440
Meanwhile, I could also work around that for the single-threaded case with a copy done

16:51.440 --> 16:53.760
at fork; that workaround is not done yet.

16:54.960 --> 17:00.960
And then there's other work I intend to do on the cgroup CPU controller to allow

17:00.960 --> 17:03.760
expressing concurrency limits without cpusets.

17:03.760 --> 17:06.320
I presented that yesterday in the containers track.

17:10.000 --> 17:14.480
So, and that's a small slide on memfd_create MFD_PRIVATE.

17:14.880 --> 17:21.040
So fork and clone can be handled more robustly by adding an MFD_PRIVATE flag to

17:21.040 --> 17:21.760
memfd_create.

17:24.320 --> 17:26.400
I explained that earlier, the use case.

17:26.400 --> 17:29.040
So it's not just the mempool per-CPU allocator.

17:29.040 --> 17:32.160
There's the Mesh allocator that exists that would require this as well.

17:32.160 --> 17:37.280
It maps the same physical page at different addresses to reduce internal allocator

17:37.280 --> 17:38.480
fragmentation.

17:38.480 --> 17:43.120
And there are also some Google dynamic analysis tools that require

17:43.120 --> 17:46.560
MAP_SHARED mappings of a given page within a process.

17:46.560 --> 17:50.000
And they would like to be able to make those private on fork.

17:51.040 --> 17:52.800
So that's all I have.

17:52.800 --> 17:54.240
And we have time for questions.

18:04.240 --> 18:07.200
So this is per-CPU variables for user space.

18:07.200 --> 18:07.600
Right?

18:07.600 --> 18:08.080
Yes.

18:08.080 --> 18:09.200
So, conceptually:

18:09.200 --> 18:13.920
What's the difference between TLS, like per-thread, and per-CPU?

18:13.920 --> 18:17.760
I mean, from a user space point of view, they should be interchangeable.

18:18.720 --> 18:26.000
So for TLS, you define a global variable or static variable.

18:26.000 --> 18:28.640
For per-CPU, you can allocate dynamically instead.

18:29.600 --> 18:34.720
So the key thing here is really dynamic allocation of memory, which you don't have with TLS.

18:34.720 --> 18:38.320
And with TLS, you have one copy of the variable per thread.

18:39.040 --> 18:44.800
Per CPU, you have one copy of the variable per physical hardware thread.

18:45.600 --> 18:48.800
So if you have a system where you have tons and tons of

18:48.800 --> 18:54.240
worker threads, then there are many more than the number of physical hardware

18:54.240 --> 18:58.880
resources, then you misuse your memory because you're allocating

18:58.880 --> 19:04.720
too much, and you actually can start having, well, you don't use your CPU

19:04.720 --> 19:07.440
cache as well in those cases.

19:09.440 --> 19:11.760
But those are for different use cases, I would say.

19:11.760 --> 19:16.400
Not all algorithms that can be done in TLS are suitable for

19:16.400 --> 19:17.200
per CPU.

19:17.200 --> 19:18.800
You really need to think about it.

19:19.760 --> 19:20.160
Thank you.

19:22.000 --> 19:22.800
Other questions?

19:22.800 --> 19:24.400
Yeah, exactly.

19:25.440 --> 19:27.280
You said it's for different use cases.

19:27.280 --> 19:28.640
I totally get that.

19:28.640 --> 19:33.280
But what I still fail to see is a use case that really makes

19:33.280 --> 19:38.640
per-CPU data interesting in user space, because how do I,

19:39.360 --> 19:41.280
how do I deal with atomicity?

19:41.280 --> 19:45.600
So my thread can be scheduled to a different core, and a different

19:45.600 --> 19:47.840
thread can be scheduled to the same CPU.

19:47.840 --> 19:53.520
So how do I deal with concurrency, with consistency?

19:53.520 --> 19:54.320
Consistency.

19:54.320 --> 19:55.840
Yes, I was looking forward to that.

19:55.840 --> 19:57.440
Yeah, I'm glad you asked that question.

19:57.440 --> 20:01.200
So I maintain the restartable sequences system call, rseq.

20:01.200 --> 20:03.760
And it's meant to handle just that problem.

20:03.760 --> 20:07.680
So you can create small critical sections in assembly.

20:07.680 --> 20:11.120
And you let the kernel know about them, and if it preempts you...

20:11.120 --> 20:14.800
So preemption, signal delivery, and migration,

20:14.800 --> 20:18.320
it's going to move your instruction pointer to an abort handler.

20:18.320 --> 20:21.840
So that allows, if you complete that critical section,

20:21.840 --> 20:24.640
it means it all ran on the same CPU.

20:24.640 --> 20:28.080
So you are guaranteed that nobody else has touched that

20:28.080 --> 20:29.600
data structure while you were running.

20:32.080 --> 20:32.880
Yeah.

20:32.880 --> 20:36.960
Thank you very much for a very interesting presentation.

20:36.960 --> 20:40.320
Could you please shortly describe the main difference

20:40.320 --> 20:45.680
between the tcmalloc per-CPU caches and your approach?

20:45.680 --> 20:51.120
So, the tcmalloc per-CPU caches.

20:51.120 --> 20:57.360
So as I understand tcmalloc, those are caches for allocation

20:57.360 --> 21:02.560
of memory that is meant to be used across the entire system.

21:02.560 --> 21:06.480
In my case, this is per-CPU memory areas.

21:06.480 --> 21:10.880
So the difference is, the tcmalloc per-CPU caches,

21:10.880 --> 21:14.160
they want to quickly be able to reuse a memory area

21:14.160 --> 21:17.200
that they've allocated, and then it's reclaimed,

21:17.200 --> 21:18.800
and then they want to reuse it quickly.

21:18.800 --> 21:23.200
So it's a fast path that bypasses the slower allocator.

21:23.200 --> 21:26.120
So in my case, I'm not allocating memory

21:26.120 --> 21:28.000
meant to be handed over.

21:28.000 --> 21:30.520
So it's not a single area of memory.

21:30.520 --> 21:34.120
What I'm allocating is, let's say I have 512 CPUs.

21:34.120 --> 21:37.800
So basically, you do one allocation, you get 512

21:37.800 --> 21:41.640
smaller areas of memory, and depending on which CPU you are on,

21:41.640 --> 21:45.480
you're using the area that belongs to your CPU.

21:45.480 --> 21:49.080
So it's an allocator of per-CPU data,

21:49.080 --> 21:53.240
and not a per-CPU cache for a generic allocator.

21:53.240 --> 21:55.800
It's a specialized allocator that I do.

21:55.800 --> 21:57.640
May I ask one more short question?

21:57.640 --> 22:02.360
So as I understand, it could be used by tcmalloc

22:02.360 --> 22:05.080
to implement their per-CPU caches.

22:05.080 --> 22:06.440
You are correct.

22:06.440 --> 22:07.320
Thank you so much.

22:07.320 --> 22:09.160
You're welcome.

22:09.160 --> 22:09.720
Any questions?

22:15.000 --> 22:16.520
If not, thank you very much.

