WEBVTT

00:00.000 --> 00:09.560
All right, so next, we're going to be talking about syscall tracing and

00:09.560 --> 00:10.560
virtualization support.

00:10.560 --> 00:11.560
Good afternoon.

00:11.560 --> 00:14.760
Okay, good afternoon to everybody.

00:14.760 --> 00:21.760
The talk is about system call tracing for

00:21.760 --> 00:23.680
virtualization support.

00:23.680 --> 00:26.760
It's co-authored by me, from the University of Bologna

00:26.760 --> 00:34.360
and Virtual Square, and Davide Berardi, from Virtual Square, too.

00:34.360 --> 00:43.000
Okay, why have we decided to present this paper, this presentation?

00:43.000 --> 00:49.200
Because tracing is very useful, and it's used for many, many

00:49.200 --> 00:50.200
purposes.

00:50.200 --> 00:57.040
The main one... I think everybody uses strace on a daily basis, for example,

00:57.040 --> 01:05.600
which is based on tracing methods, or GDB; but tracing can be used also for

01:05.600 --> 01:06.600
virtualization.

01:06.600 --> 01:13.760
Let us remember User Mode Linux, for example: User Mode Linux is able to

01:13.760 --> 01:26.720
run a Linux kernel as a process, and you can use it as if it were a virtual machine.

01:26.720 --> 01:30.320
We have a project along these lines.

01:30.320 --> 01:37.160
If you want to visit the website of this project, you can scan the QR code on

01:37.160 --> 01:46.840
my tie, or visit wiki.virtualsquare.org, where there are three flagship projects.

01:46.840 --> 01:52.600
One is VDE, Virtual Distributed Ethernet, a kind of virtual networking, and then there is

01:52.600 --> 02:01.600
VUOS: the idea is to create an operating system entirely in user space, able to implement

02:01.600 --> 02:09.120
what we name a multi-layer operating system, which is not a layered operating system.

02:09.120 --> 02:19.720
Think of the classical layered design: the hardware, the kernel, the libraries, the applications, layer by layer.

02:19.720 --> 02:26.240
It means that each layer is using a different API, a different language.

02:26.480 --> 02:34.880
Instead, the idea is to create a multi-layer operating system, with modules that are using

02:34.880 --> 02:42.880
the same language, the language of system calls, but layer upon layer,

02:42.880 --> 02:49.360
upon layer. And what is an operating system? There are many definitions;

02:49.360 --> 02:52.600
you can see it from many perspectives.

02:52.840 --> 03:02.200
For example, you can say that it's the entity able to give an answer to, to implement, the

03:02.200 --> 03:14.200
system calls. In a modern operating system, a process can do by itself just numerical

03:14.280 --> 03:25.640
operations, and it can access the memory given to it by the kernel; for everything else,

03:28.440 --> 03:35.480
it needs a system call, it needs to send a request to the kernel.

03:36.120 --> 03:54.120
System calls can be hijacked: we can act as the kernel, speaking its language.

03:54.120 --> 04:03.160
The picture shows this pattern. You can have your process running on this layer, and you

04:03.160 --> 04:14.440
can see what the kernel is giving. You can add a layer, and your process can speak the same language,

04:14.440 --> 04:21.160
but can see different things. For example, you can mount a filesystem at user level. You can use a

04:21.240 --> 04:32.440
different networking stack at user level. This is the role of VUOS.

04:32.520 --> 04:50.680
Okay, this picture shows us how VUOS works. If a process tries to issue a system call to the

04:50.680 --> 04:59.560
kernel, the system call goes to the kernel, but there is some way to trace the system call —

04:59.560 --> 05:09.640
actually we use ptrace — and it routes the request to the hypervisor. Actually, from a performance

05:09.640 --> 05:20.360
point of view, we don't have just one thread in the hypervisor: each thread in the user environment

05:20.360 --> 05:29.240
has a thread of the hypervisor that is virtualizing that thread. We name those guardian angels,

05:29.320 --> 05:39.560
the guardian angels of processes. Then it asks the module implementing the virtualization, which

05:39.560 --> 05:47.320
can be a FUSE module. Using VUOS, you can mount... you can use FUSE executables,

05:47.320 --> 05:53.640
a FUSE filesystem in user space, exactly what you do with the FUSE implementation

05:53.640 --> 06:03.480
in the kernel, but handled at user level. So you could delete, you could exclude the FUSE

06:03.480 --> 06:11.800
support from the kernel, and it would continue to work. Then the module replies, and the

06:11.800 --> 06:22.520
answer is sent back to the thread. Actually, we have an implementation of VUOS named umvu, based on

06:22.520 --> 06:30.520
ptrace, but we are still deciding which is the best implementation — we are not sure —

06:30.520 --> 06:43.400
and so we have started a review, a survey, of all the support in the kernel for tracing

06:43.400 --> 06:51.880
syscalls, and now I'm going to show you all these ways. It means that

06:51.960 --> 07:00.520
syscall tracing is very useful, given that many different implementations have been added. At the same

07:00.520 --> 07:12.600
time, it seems that there is no silver bullet, so there is no complete solution.

07:13.560 --> 07:27.320
Actually, we have moved along the axis of features: VUOS gives you the power to do several

07:28.040 --> 07:39.640
things — to mount filesystems and so on — but we must also take into consideration security and

07:39.640 --> 07:44.680
performance. Let me talk about security; I'd like to split the word security in two parts,

07:45.480 --> 07:52.120
security and safety. Security is when there is a bad guy who is trying to do something bad,

07:52.840 --> 08:04.280
but there is also safety. Safety means that we can make mistakes. For example, any time I try to

08:05.240 --> 08:15.000
configure a disk image file for a virtual machine, I'm writing into a file, which is mine,

08:16.360 --> 08:26.360
but in order to have a very simple setup, I become root

08:26.360 --> 08:37.080
to modify the system; and all the time I'm afraid that if I type sda instead of sdb, I can destroy

08:37.080 --> 08:46.600
the entire system. So this is safety: if you do the same things at the user level,

08:46.600 --> 08:54.600
you have no risk of this kind. And performance is the third axis.

08:57.320 --> 09:07.720
We are not the only ones seeking syscall tracing: fakeroot is based on this

09:08.040 --> 09:26.920
trick, and Wine, gVisor, and so on. Okay, I think I have already presented this slide. Do you want to

09:27.320 --> 09:37.800
go on? — I think you can hear me. Okay, so let's talk about security. We can try to

09:39.160 --> 09:43.960
describe a program whose system calls get hijacked — so a program that

09:43.960 --> 09:51.240
opens the /etc/passwd file — so, for instance, it prints the list of users — and then gets redirected to /etc/hostname.

09:51.720 --> 10:01.240
So this program is secure when it is executed inside some kind of wrapper that hijacks the

10:01.240 --> 10:09.880
system call, hijacks the request, and moves it to /etc/hostname instead. There are some cases,

10:09.880 --> 10:17.000
which we will see, like purelibc — which is based on, which can be based on, LD_PRELOAD — in which

10:17.000 --> 10:25.960
if you do something like directly calling the system call, for instance in assembly — in this case

10:25.960 --> 10:34.200
you can circumvent the sandbox: you can escape the sandbox and, for example, read the real

10:34.200 --> 10:39.400
/etc/passwd file. So this is the problem of security: for instance, it's difficult to

10:39.400 --> 10:49.400
ensure that a program stays inside the sandbox design we have done.

10:49.400 --> 10:59.880
Yeah, exactly. So you're trying to open /etc/passwd, and you said that the virtualization

10:59.880 --> 11:14.200
code opens /etc/hostname instead: this is the example that we have used, and at the

11:14.200 --> 11:23.560
very last line of the slides, there is a GitHub repo in which you can see this functionality

11:23.880 --> 11:31.720
implemented using all the system call tracing methods we are introducing now.

11:33.720 --> 11:39.720
So even during the presentation, if you want to download it and try it, please do it,

11:40.680 --> 11:48.120
because trying to read pages of code on the slides is useless.

11:49.000 --> 11:54.440
You can try it by yourself if you want to see the methods.

11:59.240 --> 12:05.640
So there is another problem, which is speed — the three axes are features, speed,

12:05.640 --> 12:13.400
and security. The problem with speed, in this case of hypervisor-based virtualization, is that

12:13.400 --> 12:20.680
it's slow. Like Renzo was saying, a program that wants to open /etc/passwd, for instance,

12:20.680 --> 12:28.520
and get hijacked to another file, gets to the kernel, and then another process gets woken up,

12:29.320 --> 12:37.160
and this process comes back to the kernel, and then control returns to the main application.

12:38.040 --> 12:43.800
So it's slow, because for one single system call, you get into the kernel basically two times

12:43.800 --> 12:51.160
plus the return; so it's really slow because you increase the number of

12:51.160 --> 13:01.400
context switches and this kind of changes. So we basically made a benchmark for every method;

13:01.480 --> 13:06.760
in this case, the baseline is like one second and something for

13:06.760 --> 13:16.840
cat /etc/hostname, so you just read the file. These are the cases, and we have five techniques

13:16.840 --> 13:25.960
here. The first one is ptrace. Ptrace is the thing that your debugger does. So when you are just

13:26.040 --> 13:31.240
looking at the system calls. For instance, in this case, we virtualize cat /etc/... with ptrace,

13:31.240 --> 13:39.000
and we hijack it to a hypervisor; in this case, you see that the time is the real concern:

13:39.000 --> 13:45.320
you see that it's like seven seconds and so on, so it's really slow, and if you're

13:45.320 --> 13:51.160
virtualizing a real application — I don't know, Chrome or something like this — it gets really slow, in

13:51.160 --> 13:59.400
this case because you're passing through the kernel an additional time. But it comes with some

13:59.400 --> 14:07.000
pros: as I said, it's extremely well tested, it's easy to implement; so it comes with

14:07.000 --> 14:16.840
pros and cons. It has some difficulty with multithreading also, because if you do some ptrace tracing,

14:16.840 --> 14:27.160
you block... you block the application for some time on the hijacking of some

14:27.160 --> 14:32.680
system calls — that's for instance the case for select and poll — there are some cases in which you

14:32.680 --> 14:41.400
need to have multithreading, and it's difficult to implement. There are also some problems with

14:41.800 --> 14:48.680
getting the information of the process: when you're tracing the system call, you want to get the

14:48.680 --> 14:53.720
information on the system call, like which are the six registers

14:53.720 --> 15:01.240
I'm using; so you need to get the registers and implement a dispatcher for your system calls,

15:01.240 --> 15:09.560
and choose where you're going with your request. To do so, there are some ways to do that,

15:09.560 --> 15:16.600
like PTRACE_PEEKUSER, with which you can get all the registers, more or less one at a time,

15:16.600 --> 15:23.080
and then it's really slow and really architecture dependent; and to overcome this problem,

15:23.080 --> 15:29.800
there are some requests you can ask of ptrace, like PTRACE_GET_SYSCALL_INFO,

15:29.800 --> 15:35.480
with which you get the system call information, like the registers and so on, in an architecture

15:35.480 --> 15:43.080
independent way, something like this. But there is no PTRACE_SET_SYSCALL_INFO, so if you want

15:43.080 --> 15:49.560
to hijack the system call, you can basically directly change the registers, change the values you

15:49.560 --> 15:54.520
have in there; so for instance, you have the open and you want to alter the file you open with it:

15:55.800 --> 16:01.320
you need to have something to write the system call values, so you have this slow interface,

16:01.400 --> 16:08.840
which is PTRACE_POKEUSER in the terminology of ptrace. But if you had PTRACE_SET_SYSCALL_INFO,

16:08.840 --> 16:14.440
it would be a lot faster. Fortunately, this is currently being discussed on the kernel

16:14.440 --> 16:24.440
mailing list, and that's basically what we want. Ptrace is a very ancient support: it appeared

16:24.520 --> 16:35.960
in the Sixth Edition of Unix, so around the middle of the seventies, and during the time,

16:35.960 --> 16:46.520
different supports have been added — PEEKUSER, PEEKDATA, and then GETREGS — and now there is

16:46.520 --> 16:54.520
PTRACE_GET_SYSCALL_INFO, and that is the main point: it is architecture independent. If you use

16:54.520 --> 17:00.840
PTRACE_GET_SYSCALL_INFO, you don't have to remember which register — in which register you have the number

17:00.840 --> 17:11.720
of the system call, and which are the arguments. And now there is a discussion

17:11.720 --> 17:30.040
about adding PTRACE_SET_SYSCALL_INFO. Then the other point about ptrace is that for one system call,

17:30.760 --> 17:40.040
you don't have just two context switches, but four, because ptrace

17:43.320 --> 17:55.480
creates an event at the moment of the system call, and when the kernel responds. So you have two

17:55.480 --> 18:04.920
events — think of strace: the first stop is to read the arguments, and then the second

18:04.920 --> 18:15.880
stop is to read the return value or the error number. Okay. Another technique is based on

18:15.960 --> 18:25.400
seccomp. So this gains a little bit of speed under this scenario, because you avoid

18:25.400 --> 18:31.240
the second stop. So you have that, in this case: you have seccomp to process it, and yeah,

18:31.240 --> 18:38.120
it gains a little bit of speed. So in this case, it's like 5 seconds, 6 seconds. But it has the same

18:38.120 --> 18:44.600
problems as ptrace: it's architecture dependent, difficult to multithread, and so on.

18:45.560 --> 18:50.760
So there's another technique, maybe the newest one, which is seccomp user notify.

18:53.560 --> 19:01.640
Basically, it's different because in ptrace, you modify the traced process,

19:02.280 --> 19:06.840
and the traced process executes a new system call, a different system call, and so on.

19:07.800 --> 19:13.880
In this case, with seccomp user notify, you, as the hypervisor, are executing the

19:13.880 --> 19:21.400
system call and returning the result. So it's a different view on how you alter the system call

19:21.400 --> 19:30.120
flow. So for instance, in this case, you have the hypervisor and the traced process here,

19:30.200 --> 19:37.080
and the hypervisor doesn't change the traced process's system call. So the traced process

19:37.080 --> 19:46.120
executes open on /etc/passwd, and then the hypervisor opens the real file, executes the

19:46.120 --> 19:54.200
system call, and sends the result back. So the hypervisor does that directly.

19:55.160 --> 20:02.760
Quickly: it's different in that with ptrace there are signals involved, because it waits for

20:05.400 --> 20:10.600
the state change of the process. Here, instead, there is a file descriptor, so you can use

20:10.600 --> 20:15.640
poll, select, and other stuff to perceive that there is an event.

20:15.960 --> 20:27.800
There is also a problem — not a problem, a speed-up which we could get: seccomp just uses classic BPF,

20:27.800 --> 20:36.200
not eBPF, in this case. And so, in this case, we need to trace, for instance, all the file

20:36.200 --> 20:42.600
descriptor based system calls. So for instance, every read, every write. If, in that case,

20:42.600 --> 20:50.600
we had eBPF, we could create maps with the fds: we could basically create a map of what we want

20:50.600 --> 20:55.400
to trace in terms of file descriptors. So there are some cases in which you mix

20:55.400 --> 21:00.200
virtual file descriptors and real file descriptors, because one open maybe gets traced, and another

21:00.200 --> 21:06.040
open doesn't get traced, and passes directly to the kernel. But in this case, we need to trace

21:06.120 --> 21:16.360
all the read requests. Otherwise, if we could have eBPF maps, we could basically decide which read

21:16.360 --> 21:22.520
goes into the kernel and which read goes into the hypervisor. This is pure overhead, obviously —

21:22.520 --> 21:30.280
the two in red: the system wakes up the hypervisor for the real file descriptors, not only the

21:30.280 --> 21:39.400
virtual ones. And also, there are some other solutions which are not based on a hypervisor,

21:39.400 --> 21:44.840
so you don't need to have another process. This is called self-virtualization.

21:45.640 --> 21:55.800
So for instance, one of these is prctl — syscall user dispatch. This is a tricky way to do hijacking of

21:55.880 --> 22:04.920
system calls, because it's mainly designed for emulation — like Wine, like Limbo — so you use

22:04.920 --> 22:10.840
it to emulate system calls which are not the ones of the operating system. And it's extremely

22:10.840 --> 22:16.760
hard to use, and it's difficult, really difficult — if you read the README of the repo,

22:17.400 --> 22:22.600
there is a comment explaining that it's really difficult to dispatch the system call with that.

22:22.920 --> 22:29.960
The last one I want to present to you is maybe the simplest one, because maybe you have used it

22:29.960 --> 22:37.960
in another form, because it's based on LD_PRELOAD. The libc,

22:37.960 --> 22:45.080
basically, is an interface library that does the system calls and provides some wrappers for the

22:45.080 --> 22:49.960
system calls: the open is not a real open, it's like a wrapper for the open, which then gets,

22:50.040 --> 22:56.520
let's say, to the real operating system. And with that, you can basically

22:56.520 --> 23:03.400
hijack at the level of the libraries, where you request the open to the library.

23:03.400 --> 23:08.440
With that, you can basically get a native syscall interface in which you can do the

23:08.440 --> 23:13.240
system call; and the proof of that is that, obviously, a library function doesn't pass through the

23:13.240 --> 23:18.840
kernel two times. So you have a speed which is basically the same as plain cat — in this case,

23:18.920 --> 23:25.320
you don't have the switch. So we use that: we are loading several layers,

23:27.000 --> 23:34.760
and we load them using vu_insmod, so it's like you are inserting some module into the kernel.

23:35.320 --> 23:44.840
These modules are trusted. So, instead of having kernel tracing at each layer,

23:44.840 --> 23:52.600
here, inside the virtualization program, the hypervisor, we use self-virtualization in order to

23:52.600 --> 24:01.560
gain speed. So, in conclusion: there is no silver bullet — we saw that there is no silver bullet —

24:01.560 --> 24:05.720
and there are different techniques to hijack system calls, which are currently implemented

24:05.720 --> 24:10.680
in the Linux system, in the kernel and its user interface. There are some features which are not

24:11.640 --> 24:17.560
really adopted at this time. And we can create a layer to virtualize system calls,

24:18.840 --> 24:24.920
which is flexible — it's more flexible than a container, because a container just emulates some parts of

24:24.920 --> 24:32.760
the system; in this case, we emulate all the interface facing the system. I can do a demo,

24:32.760 --> 24:43.000
if you want. Yeah, I can show you, if it works, the repository. So for instance, let's do the

24:43.000 --> 24:51.480
ptrace hijacking. I compiled the software. So for instance, we can run the ptrace hijacker with

24:52.280 --> 25:01.000
cat /etc/os-release — and obviously os-release is not hijacked, it is the same as before.

25:01.000 --> 25:08.360
It hijacks only the /etc/passwd request. But if I say cat /etc/passwd, in this case,

25:08.360 --> 25:13.320
it gets redirected. And it gets redirected to hostname. So you can see that it's not

25:13.320 --> 25:21.880
printing the real /etc/passwd. I can show you that there are no tricks — there is the real

25:21.880 --> 25:28.920
passwd there — but if you pass through the hypervisor, you get your system call hijacked:

25:29.000 --> 25:36.600
the request for /etc/passwd gets hijacked to a request for /etc/hostname. So it's always the same;

25:36.600 --> 25:45.080
the only one I want to show you which is different is the VUOS method. So you can just start

25:45.080 --> 25:51.240
your umvu bash. As you can see from the prompt — I have the big prompt — I am inside

25:51.320 --> 26:03.640
a virtual machine based on ptrace, in this case. And so I can say vu_insmod in this case, and cat

26:03.640 --> 26:11.080
/etc/passwd. If there is no demo effect, I am just hijacking the cat process that

26:11.080 --> 26:16.840
is executed inside the virtual machine. So it looks like a container in this case, but it's

26:16.920 --> 26:20.920
not: it's just for a single system call — in this case, a single request of a system call.

26:22.920 --> 26:27.320
So thank you for your attention, and I don't know if Renzo wants to add something.

26:28.360 --> 26:37.400
If you use system call tracing in your projects and you're not happy with the current support, please contact us:

26:37.480 --> 26:39.480
we will find a way together.

26:50.520 --> 26:56.200
Any questions? All right, thanks.

