WEBVTT

00:00.000 --> 00:13.680
Am I unmuted? Well, hi everyone, thanks for being here and thanks for making the room full.

00:13.680 --> 00:21.600
So who am I? My name is Viktor, I work at Red Hat in kernel engineering, and one of

00:21.600 --> 00:27.920
the things I do there is that I am a co-maintainer of this tool, bpftrace, which I would like to

00:28.480 --> 00:34.560
introduce to you today. So what is the purpose of this talk? First of all I would like to

00:34.560 --> 00:40.320
introduce bpftrace, which is a tracing tool and a tracing language based on eBPF.

00:41.200 --> 00:45.760
And I would like to show you what bpftrace can do for you with its current capabilities.

00:45.760 --> 00:51.280
I will go slightly into some details on how we are doing things, but I won't go too deep.

00:51.280 --> 00:55.920
And I would also like to show you what bpftrace could do for you in the future.

00:55.920 --> 01:02.480
So, what we are going to be working on so that we make bpftrace the ultimate Linux tracing tool.

01:03.520 --> 01:10.800
So let's introduce bpftrace first. Quick question: who here has never written any bpftrace program?

01:12.320 --> 01:21.120
Okay, good to see that many newcomers too. So, as I said, bpftrace is a... sorry.

01:22.080 --> 01:28.880
It's a tracing tool for Linux based on eBPF, which comes with a domain-specific language which

01:28.880 --> 01:35.280
we call BpfScript. And one of the nice things about bpftrace is that you don't really need to

01:35.280 --> 01:41.280
know the internals of eBPF too much to be able to use it, because the language is really simple.

01:43.520 --> 01:48.960
How does it work? This is a very basic workflow of how bpftrace works. At the beginning we have

01:48.960 --> 01:55.040
this BpfScript program, which can be as simple as this, and internally bpftrace translates it

01:55.040 --> 02:01.920
into BPF bytecode, which is then loaded into the kernel, and then it's attached to some

02:01.920 --> 02:08.080
event. For instance, in this case we are tracing a kernel function called vfs_read, so we attach it

02:08.080 --> 02:13.680
to the kernel function, and whenever that function is called in the kernel, your bpftrace program

02:13.680 --> 02:18.480
will execute. It can collect some data, it will typically communicate with the bpftrace runtime,

02:18.560 --> 02:23.760
and then you get some results. That's the very, very basic concept of how it works.
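
NOTE
Editor's sketch of the workflow described above. The slide is not part of the transcript, so this is an assumed minimal program, not necessarily the one shown. Saved under a hypothetical name such as vfs_read.bt, it could be run as root with `bpftrace vfs_read.bt`.
```bpftrace
// Attach point: the kernel function vfs_read, traced through a kprobe.
kprobe:vfs_read
{
  // Action block: runs every time vfs_read is called in the kernel.
  // comm is a builtin holding the name of the current task.
  printf("vfs_read called by %s\n", comm);
}
```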

02:25.840 --> 02:32.800
As for the BpfScript language, the main building blocks of the program are so-called probes,

02:32.800 --> 02:40.000
where each probe has two parts. The first one is an attach point, which says what event you are

02:40.000 --> 02:45.760
attaching your bpftrace program to. For instance, in this case we are attaching to a kernel function,

02:45.760 --> 02:50.320
but there are other events provided by the kernel, like static tracepoints; you can

02:50.320 --> 02:56.480
attach to user-space functions, user-space tracepoints, and many, many more. And then there's the

02:56.480 --> 03:02.800
action block, which is the actual BPF program or bpftrace program which you want to execute whenever

03:02.800 --> 03:11.200
that event happens. So what can you write in this action block? It's a programming language like

03:11.280 --> 03:18.080
any other, so you get all the familiar stuff like variables, operators like arithmetic,

03:18.640 --> 03:23.040
logic, etc. You have control flow statements, so you can do if conditions, you can

03:23.040 --> 03:31.280
do loops, of course with the limitations that BPF has. What is different from other languages is that

03:31.280 --> 03:38.480
you have some built-in stuff, which will mainly help you with the tracing capabilities of

03:38.480 --> 03:46.240
the tool. So, two things are interesting in the language. First are our variables: we have two

03:46.240 --> 03:51.360
kinds of variables. The first ones are called scratch variables, which you can think of as normal

03:51.360 --> 03:55.920
local variables which you know from other programming languages. They are block-scoped, which means they are

03:55.920 --> 04:01.120
only valid in the current block. We have an example here: we are creating a variable x; they are

04:01.200 --> 04:08.320
always prefixed with a dollar sign, and you assign it the current number of the CPU, or the current

04:08.320 --> 04:13.120
CPU where your event is running. cpu is one of the builtins, which will just give you

04:13.120 --> 04:19.600
the CPU number. More interesting are the so-called maps, which are another type of variables, which are

04:19.600 --> 04:25.120
basically key-value pairs which are globally scoped, and you can use them for a number of

04:25.280 --> 04:32.080
purposes. One is communication between the probes, so you can just have one map and you can

04:32.080 --> 04:37.600
store some data and pick it up from another probe to do something useful. And you can also use it to

04:37.600 --> 04:43.200
communicate between the BPF programs and the user-space part of bpftrace.
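
NOTE
Editor's sketch of the two kinds of variables described above. The map name @reads_per_cpu and the 5-second interval are made up for illustration and are not taken from the slides.
```bpftrace
kprobe:vfs_read
{
  // Scratch variable: prefixed with $, only valid inside this block.
  $x = cpu;
  // Map: prefixed with @, globally scoped, keyed here by the CPU number.
  @reads_per_cpu[$x] = count();
}
// A second probe uses the same map: print it every 5 seconds, then reset it.
interval:s:5
{
  print(@reads_per_cpu);
  clear(@reads_per_cpu);
}
```
The map is how the two probes share data. The user-space part of bpftrace reads the same map to print it.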

04:44.160 --> 04:50.000
In the implementation, they are BPF hash maps, which is why they are key-value pairs. And as an example here,

04:50.000 --> 04:59.440
we have a map called start, the key is the current PID, and we're essentially storing a timestamp,

04:59.440 --> 05:05.600
the number of nanoseconds, so you're essentially storing a timestamp for each PID in this

05:05.600 --> 05:11.520
map. The second interesting part of the language are the builtins, which are, as I said, special

05:11.520 --> 05:16.400
variables or functions which are built into the language and which can help you with some

05:16.560 --> 05:24.960
tracing tasks like accessing kernel data, type conversions, printing, string manipulation, and many,

05:24.960 --> 05:30.400
many, many more. There's a man page somewhere where you can find a detailed list of all of them.
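
NOTE
Editor's sketch of a timestamp map combined with a few builtins, as a latency measurement for vfs_read. It is an assumed example, not the slide. The map is keyed by thread ID (tid) rather than the PID mentioned in the talk, so that threads of one process do not overwrite each other's timestamps.
```bpftrace
kprobe:vfs_read
{
  // nsecs is a builtin variable: a timestamp in nanoseconds.
  @start[tid] = nsecs;
}
kretprobe:vfs_read
/@start[tid]/
{
  // Time spent in vfs_read, converted to microseconds.
  $delta_us = (nsecs - @start[tid]) / 1000;
  // hist() is a builtin function that builds a power-of-2 histogram.
  @latency_us[comm] = hist($delta_us);
  // Remove the entry; newer releases also accept delete(@start, tid).
  delete(@start[tid]);
}
```
bpftrace prints the remaining maps, here the histograms, when the script exits.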

05:32.160 --> 05:39.520
Okay, one thing I want to talk about is why I believe bpftrace is a really great tool.

05:39.840 --> 05:46.320
So there are a couple of reasons. These are the two most important ones from my point of view.

05:46.880 --> 05:52.160
The first one is one-liners. The language is designed in a way that it's very

05:52.960 --> 05:59.120
terse, if you will, and it allows you to write quite powerful scripts in quite a short

05:59.120 --> 06:05.520
manner, which means that it's ideal for writing one-liners directly in your terminal, for

06:05.520 --> 06:10.400
instance when debugging production systems, because eBPF is supposed to be safe, so you can

06:10.400 --> 06:15.360
run it on your production machines and you can quite quickly write bpftrace scripts which will

06:15.360 --> 06:22.720
help you find regressions or whatever problem you are trying to debug. For instance, an example

06:22.720 --> 06:30.400
here: this is a bpftrace program which will basically attach to the open syscall, and anytime an open

06:30.400 --> 06:36.960
syscall happens, it will print comm, which is the name of the current thread, and it will

06:36.960 --> 06:42.960
print the file name that is being opened. So basically, with this one-liner command you can

06:42.960 --> 06:49.440
sniff on all the open calls on your entire system, and you can see for each open call which file

06:49.440 --> 06:54.720
is being opened and what's the name of the process which is opening it, which is, I think, quite impressive.
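
NOTE
Editor's sketch of the one-liner described above. The exact probe on the slide is not in the transcript; this version assumes the openat tracepoint, which is what most programs hit on current systems. Older bpftrace releases spell the argument access as args->filename.
```bpftrace
// Fires on every openat() system call on the whole system.
tracepoint:syscalls:sys_enter_openat
{
  // comm is the name of the current thread; str() reads the file name string.
  printf("%s %s\n", comm, str(args.filename));
}
```
As a one-liner, the same program can be passed to `bpftrace -e '...'` as a single quoted string.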

06:55.440 --> 07:01.520
The other nice thing about bpftrace, and I've mentioned it already, is that you don't really

07:01.520 --> 07:08.560
need to know BPF internals in order to use bpftrace, because BPF, as was mentioned in the first

07:08.560 --> 07:13.520
talk, has a lot of powerful features, but sometimes they require quite significant knowledge

07:13.520 --> 07:20.560
of BPF internals to be used. And the nice thing about bpftrace is that it sort of abstracts

07:20.640 --> 07:26.240
away the implementation details from you by providing a simple scripting language, which makes it

07:26.240 --> 07:32.640
a great choice as the entry point to the BPF world, because you don't really need to go deep into the

07:32.640 --> 07:39.200
BPF internals in order to use BPF with bpftrace. I have an example here. For instance,

07:40.240 --> 07:44.480
many of you will know that there's this thing called the BPF stack, which is quite a limited space where

07:44.480 --> 07:49.840
you can store some local variables or some temporary data. It has only 512 bytes,

07:50.640 --> 07:56.320
so often, if you want to store some large piece of data somewhere, you can't just create a

07:56.320 --> 08:00.960
local variable, because it doesn't fit on the stack, so you would have to offload it to a map

08:00.960 --> 08:06.720
or to a global variable. Now, if you want to create a map in C, in the way standard BPF programs

08:06.720 --> 08:11.840
are written, it's not entirely easy. I'm not saying it's super difficult, but it's already

08:11.840 --> 08:17.760
something where you need to know how to write the declaration. This is an example of a declaration of

08:17.760 --> 08:23.520
a map. In bpftrace, you don't have to care: if you create a local variable, bpftrace

08:23.520 --> 08:27.920
will automatically check if the size fits on the stack, and if it doesn't, it's automatically

08:27.920 --> 08:34.560
offloaded, in this case to a global variable, and you don't need to care about these

08:34.560 --> 08:42.480
implementation details. Of course, there are strengths, but there are also weaknesses, which is

08:44.400 --> 08:50.640
what I want to talk about a bit, because we are actively working on eliminating them. So

08:50.640 --> 08:56.240
there are a couple of weaknesses. The first one is feature completeness. BPF, as I said, provides

08:56.240 --> 09:02.000
a lot of powerful features and we don't support all of them. We are missing

09:02.000 --> 09:08.400
support for some helpers, we pretty much don't have any kfuncs there, we only support one

09:08.400 --> 09:12.800
map type, which is hash maps, or a couple of those, but there are a lot of map types which

09:12.800 --> 09:17.200
we're missing. We don't have CO-RE in its entirety either, so there are features

09:17.200 --> 09:23.760
which BPF offers which we don't have. Another weakness is that bpftrace

09:23.760 --> 09:30.240
has traditionally been targeting one-liner scripts, so these powerful one-liners

09:30.240 --> 09:36.640
which you can use on the fly to debug your system. However, it turns out that

09:36.640 --> 09:41.440
people want to write larger bpftrace programs which they want to maintain for

09:41.440 --> 09:47.840
a long period and use repeatedly, etc., and bpftrace has been lacking support for

09:47.840 --> 09:54.880
that. For instance, you can't have CLI options for your scripts, or you can't divide your

09:54.960 --> 09:59.920
probes into subprograms or functions, as you know from other languages, because we are

09:59.920 --> 10:05.920
missing support for it. So these are some of the things we're actually working on. And the last thing

10:05.920 --> 10:10.560
I mention here, and this is not specific to bpftrace, is that there are some events and there are

10:10.560 --> 10:15.920
some environments which are notoriously hard to trace. For instance, tracing inlined functions

10:15.920 --> 10:22.000
is kind of tricky, because you need to know all the places where the function was inlined so

10:22.000 --> 10:27.120
that you can actually attach there, which is kind of difficult. Or, for instance, running

10:27.120 --> 10:34.320
bpftrace in a container or in a namespace in general can be quite difficult. So what do we do

10:34.320 --> 10:42.880
to eliminate these? There's a lot of ongoing work on bpftrace these days, and I'm just going to

10:42.880 --> 10:47.920
go through some of the most important things. For instance, for tracing inlined functions

10:47.920 --> 10:53.840
we already have a solution: if your traced binary, be it vmlinux or a user-space binary,

10:53.840 --> 11:00.960
has debug info, we use LLDB, which is a tool from the LLVM ecosystem

11:00.960 --> 11:07.600
which helps us to attach your probe to all the inlined instances of the function, which I think

11:07.600 --> 11:13.120
is very nice. It also resolves some other problems: it allows you to place the probe

11:13.200 --> 11:18.640
after the function prologue, which means that you're no longer losing some stack entries.

11:18.640 --> 11:24.240
Specifically, if you've ever traced a user-space binary, you could have noticed that sometimes

11:24.240 --> 11:29.840
the stacks are missing the first entry, and this can actually help with that.

11:32.480 --> 11:36.160
Another thing I mentioned was running bpftrace inside a container.

11:36.160 --> 11:40.240
Until recently, we didn't have support for this, so if you were running it in a container

11:40.240 --> 11:46.160
with PID namespacing, some of the helpers, like getting the PID, that is, the process ID,

11:46.160 --> 11:51.680
thread ID, or the user-space stack, didn't work correctly. The reason is that you have to switch

11:51.680 --> 11:57.840
between two different helpers depending on whether bpftrace and the traced binary are running

11:57.840 --> 12:04.240
in the container or not. Now we have a solution. It works for most of the cases, unless

12:04.240 --> 12:08.960
bpftrace is in a child namespace and the target is in the root namespace; then

12:08.960 --> 12:13.440
we can't support it yet. It's kind of a tricky situation to work with.

12:15.680 --> 12:21.040
The rest of the features here are related to the support of those complex programs which

12:21.040 --> 12:27.920
I've mentioned, which is one of our goals: to be able to support larger bpftrace scripts or

12:27.920 --> 12:32.880
tools based on bpftrace which you can maintain for a long time. One of those is

12:32.880 --> 12:38.720
variable and map declarations, because it's nice that bpftrace has automatic type

12:38.720 --> 12:44.960
inference, but sometimes you want to specify the type, especially if the inference doesn't work

12:44.960 --> 12:51.600
correctly, or you just want it to be more verbose and more readable. If you get back to

12:51.600 --> 12:58.000
your script after some time, you want to know what the type of a variable is, etc. So this has two parts:

12:58.480 --> 13:03.840
declarations of scratch variables. We already have support for that; it looks something like this.
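
NOTE
Editor's sketch of an explicitly typed scratch variable. It assumes the `let` declaration syntax of recent bpftrace releases; the slide itself is not in the transcript, and the variable name $cpu_id is made up for illustration.
```bpftrace
BEGIN
{
  // Declare the type explicitly instead of relying on type inference.
  let $cpu_id: uint32 = 0;
  printf("cpu_id = %d\n", $cpu_id);
  exit();
}
```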

13:04.160 --> 13:09.920
And then the second part is declarations of maps, which we don't have yet, and it's one of the

13:09.920 --> 13:14.400
features which are being worked on. So, for instance, it would allow you to say that I want

13:14.400 --> 13:22.080
a map called hash, it will be of a hash type, the key type is an unsigned 32-bit integer, and the

13:22.160 --> 13:33.680
values are signed 64-bit integers. Another thing are functions. When your probe gets really long,

13:33.680 --> 13:37.520
you really want to split it into multiple functions. You also want to reuse code, obviously;

13:37.520 --> 13:44.800
you all know why functions are useful. So we've recently migrated to libbpf, which enabled us to

13:44.800 --> 13:51.520
use these subprograms, and in the future, hopefully the near future, we would like to support two kinds of functions.

13:51.520 --> 13:56.480
One are normal functions defined directly in BpfScript, which will allow you to split

13:56.480 --> 14:02.480
your code into multiple parts. But another interesting part is that we would like to support

14:02.480 --> 14:07.680
importing functions from external BPF programs, so you can write a BPF program in any

14:07.680 --> 14:12.000
tool you want, for instance in C, and then you can just import those functions from

14:12.000 --> 14:15.200
BpfScript and call them from BpfScript, which can be useful, for instance, because

14:15.920 --> 14:22.560
people have written, for instance, stack walkers for Python in BPF, so this way we could

14:22.560 --> 14:30.080
call them from inside bpftrace programs and, for instance, create user-space stacks for Python

14:30.080 --> 14:37.280
binaries. Eventually, these new functions should also help us create something like a bpftrace

14:37.600 --> 14:45.120
standard library, which would be very nice. The last thing I mention here are command-line options.

14:45.120 --> 14:50.720
Again, it's something that is not supported yet, but we are working on it. If you have a large

14:50.720 --> 14:57.760
tool in bpftrace, you want to allow the user to somehow modify the

14:57.760 --> 15:03.440
behavior; that's what command-line arguments are for. So we are currently thinking of something like

15:03.440 --> 15:07.920
this: you would define a simple helper, or rather a simple builtin, where you would say, for instance,

15:07.920 --> 15:13.680
I want this script to have one option, which would be of integer type, and it would be an interval,

15:13.680 --> 15:20.240
and then you can just use it, for instance, in this probe, which will periodically print some state

15:20.240 --> 15:25.360
of something, or the data you have collected, and you would allow the user to specify the

15:25.360 --> 15:33.440
interval, the number of seconds after which you want to print some information. And that's

15:33.440 --> 15:40.880
it. So, just to sum it up: bpftrace is quite a powerful tool for tracing Linux which allows you to

15:40.880 --> 15:47.600
use BPF without really knowing BPF internals, which is quite nice. There's a lot of

15:47.600 --> 15:54.960
active development to overcome existing limitations, mainly to support creating and maintaining

15:56.000 --> 16:04.480
more complex tools. And that's it. If you want to know more, visit our new website.

16:04.480 --> 16:09.600
We have a new logo, and we actually have a new website, so visit it; it's getting new content pretty

16:09.600 --> 16:17.200
much every day these days. There's a bunch of tutorials, documentation, everything. If you have any questions

16:17.200 --> 16:23.040
or ideas for new features for bpftrace, or if you found any bugs, feel free to interact with us via

16:23.040 --> 16:28.480
our GitHub repository. We'll be happy to answer your questions, or of course you can

16:28.480 --> 16:34.160
talk to me today. And that's it, thank you, and I'm happy to take your questions.

16:34.880 --> 16:40.240
this one

16:49.520 --> 16:55.920
It was not started by me; it was started by Alastair Robertson some five years ago,

16:55.920 --> 17:00.560
and I think at the beginning it was sort of a hobby project

17:00.560 --> 17:08.480
of his. Oh no, this is not created by Red Hat, it's just that I'm one of the

17:08.480 --> 17:14.160
co-maintainers, but Alastair is at Meta and the rest of the maintainers are also from Meta,

17:14.160 --> 17:17.120
but it's not bound to a company.

17:31.440 --> 17:36.160
So obviously, if you want to use bpftrace for writing your BPF programs, you have to work with

17:36.160 --> 17:42.400
what the language allows you to do. So, as I said, for instance, if you want some more

17:42.400 --> 17:47.840
powerful or modern features of BPF, you have to fall back to plain C or maybe

17:47.840 --> 17:54.240
Rust, which do support all kinds of BPF features. We only support a subset; we're trying to

17:54.240 --> 18:05.600
make it as big as possible, but it's still just a subset. Oh, back there.

18:07.600 --> 18:12.960
Oh, sorry, I forgot to repeat the question. So the question is on

18:12.960 --> 18:19.600
which distros this is supported. As far as I know, all of the large distros support bpftrace. I know

18:19.600 --> 18:25.360
some people are even building it for Android, so pretty much all the Linux distros

18:25.360 --> 18:30.640
should support bpftrace. There's no technical issue with running it pretty much everywhere

18:30.640 --> 18:39.440
where BPF can run and where you have LLVM. All right, I think that's it, one minute left, so

18:39.440 --> 18:42.240
we're good. Thank you.

