WEBVTT

00:00.000 --> 00:13.960
Hello everyone, I think I'm the final presentation in this room for today. Forever? No.

00:13.960 --> 00:19.240
Nice to see a lot of you around. Forensics is quite a niche, so it's kind of fun to see

00:19.240 --> 00:23.760
a lot of people, and also a lot of familiar faces.

00:23.760 --> 00:27.000
This presentation is called "Your function signature here, please".

00:27.000 --> 00:33.840
But first, about me. I'm Jeffrey Rung, lead scientist of the Exploitation

00:33.840 --> 00:38.880
team at the NFI, which is a team that gains initial access to devices for forensic research

00:38.880 --> 00:41.640
or other types of investigations.

00:41.640 --> 00:49.040
I've worked there for more than 10 years, 12 now I think, and my main interest is the low-level

00:49.040 --> 00:54.240
intersection between hardware and software, like writing raw registers

00:54.240 --> 00:57.000
and trying to dump images. That's what I really like.

00:57.000 --> 01:04.880
I was also here two years ago, but back then this t-shirt fit me a lot better. A bit of a shame.

01:04.880 --> 01:10.800
So the problem we have is we do a lot of reverse engineering, software reverse engineering,

01:10.800 --> 01:14.520
and software reverse engineering, in its most basic form, comes down to two things.

01:14.520 --> 01:20.040
You annotate functions, variables, and structures, and you form hypotheses about the workings

01:20.040 --> 01:22.880
of these functions based on those annotations.

01:22.880 --> 01:27.440
You're essentially labeling stuff and then thinking about what it might do, and as you

01:27.440 --> 01:31.560
label more, you understand more.

01:31.560 --> 01:37.560
Most of the knowledge you put in there as a researcher lives in those annotations.

01:37.560 --> 01:43.880
So there are some common problems when reversing: simple, but very frequently occurring

01:43.880 --> 01:44.880
functions.

01:44.880 --> 01:48.200
I don't think I've ever seen an implementation of anything that didn't have memset

01:48.200 --> 01:55.880
and memcpy, so you will see them a lot, but they come in a lot of different forms.

01:55.880 --> 02:01.440
Then you have complex functions. This doesn't look very fun to reverse, and

02:01.440 --> 02:11.120
it actually isn't, but what you will see is that the core logic often looks like this,

02:11.120 --> 02:16.240
especially in firmware or on embedded devices: you have these huge switch statements where you're

02:16.240 --> 02:20.960
trying to figure out a protocol that you don't really know based on the code, and then

02:20.960 --> 02:25.320
when you start to gain some insight into it, it goes faster.

02:25.320 --> 02:29.120
But you don't want to do this too often, and if you did this once, you don't want to do

02:29.120 --> 02:32.840
it again the next week.

02:32.840 --> 02:37.440
There are some common problems across implementations of functions. Like I said, memcpy

02:37.440 --> 02:41.920
and memset, you will see them everywhere, but the assembly generated by your

02:42.000 --> 02:47.120
compiler can, of course, differ for the same source files, depending on

02:47.120 --> 02:51.200
the compiler options, but also on the way the compiler is implemented.

02:51.200 --> 02:56.480
And there are slightly different implementations of the same algorithms for all kinds

02:56.480 --> 03:02.440
of things and data structures, and there's no really easy way, at least not what I'd call

03:02.440 --> 03:08.840
an easy way, to store and share this reversing knowledge and work. Because again, looking

03:08.920 --> 03:13.440
at this, I don't want to do it twice in a month if I can help it.

03:13.440 --> 03:16.400
So there are some possible solutions.

03:16.400 --> 03:24.640
For example, we use Ghidra a lot, and you have BSim, which can make a sort of signature

03:24.640 --> 03:28.040
of functions that you can then compare and ask, oh, does this look like that?

03:28.040 --> 03:30.640
But the problem is that it depends on the Ghidra decompiler.

03:30.640 --> 03:34.840
I don't know if any of you have looked at the Ghidra decompiler source, but it's not very

03:34.840 --> 03:38.160
pretty, and it's subject to change all the time.

03:38.160 --> 03:43.160
So if you start to build databases on these kinds of signatures, they sort

03:43.160 --> 03:47.680
of stop working over time, because the decompiler evolves.

03:47.680 --> 03:52.680
Then there's fuzzy hashing, and there are some Trend Micro papers on this.

03:52.680 --> 03:58.520
It works, but you don't really use the context of the function, so for more complicated

03:58.520 --> 04:01.120
stuff, it gets less and less effective.

04:01.120 --> 04:06.720
And of course, you have Lumina, but that's IDA-dependent. That's not a problem

04:06.720 --> 04:12.800
in itself, but I'd prefer something that works across tools.

04:12.800 --> 04:20.800
So we started looking around. We have a department that does big data and neural networks.

04:20.800 --> 04:23.160
And we found some papers that looked very interesting.

04:23.160 --> 04:31.440
One of them is jTrans, a way of training a model on disassembly that makes

04:31.440 --> 04:36.400
it easy to match functions, and basic blocks within functions.

04:36.400 --> 04:41.800
So I don't understand a lot of this, as I'm not a neural network and data science

04:41.800 --> 04:48.080
kind of guy, but the idea is that a function always consists of multiple basic blocks.

04:48.080 --> 04:53.360
So for example, you run some code, then based on a condition it jumps to a different

04:53.360 --> 04:58.560
piece of the function or not. When you snip the function apart at those jumps,

04:58.560 --> 05:00.000
the pieces are called basic blocks.

05:00.000 --> 05:04.040
And these basic blocks have pointers and code paths to each other.
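
NOTE
A minimal sketch of cutting a function into basic blocks and recovering the edges between them. It assumes a simplified, hypothetical instruction format of (address, mnemonic, branch target) tuples and a small, incomplete set of ARM64 branch mnemonics; Ghidra's own basic-block model is richer than this. A block starts at the first instruction, at every jump target, and right after every branch.
```python
from typing import Dict, List, Optional, Tuple
# (address, mnemonic, branch target or None): a simplified, hypothetical format.
Insn = Tuple[int, str, Optional[int]]
# Hypothetical and incomplete set of ARM64 control-flow mnemonics.
BRANCHES = {"b", "b.eq", "b.ne", "cbz", "cbnz", "ret"}
UNCONDITIONAL = {"b", "ret"}  # these never fall through to the next instruction
def split_basic_blocks(insns: List[Insn]) -> Dict[int, List[Insn]]:
    """Cut a function into basic blocks, keyed by the address of their first instruction."""
    leaders = {insns[0][0]}
    for i, (_addr, mnem, target) in enumerate(insns):
        if mnem in BRANCHES:
            if target is not None:
                leaders.add(target)  # a jump target starts a new block
            if i + 1 < len(insns):
                leaders.add(insns[i + 1][0])  # so does the instruction after a branch
    blocks: Dict[int, List[Insn]] = {}
    current = insns[0][0]
    for insn in insns:
        if insn[0] in leaders:
            current = insn[0]
            blocks[current] = []
        blocks[current].append(insn)
    return blocks
def block_edges(blocks: Dict[int, List[Insn]]) -> List[Tuple[int, int]]:
    """Return (from_block, to_block) edges: taken jumps plus fall-through paths."""
    edges = []
    starts = sorted(blocks)
    for idx, start in enumerate(starts):
        _addr, mnem, target = blocks[start][-1]
        if target is not None:
            edges.append((start, target))
        if mnem not in UNCONDITIONAL and idx + 1 < len(starts):
            edges.append((start, starts[idx + 1]))
    return edges
# Example: cmp; b.eq 0x10; mov; b 0x14; mov; ret
EXAMPLE = [(0x0, "cmp", None), (0x4, "b.eq", 0x10), (0x8, "mov", None),
           (0xC, "b", 0x14), (0x10, "mov", None), (0x14, "ret", None)]
```
For EXAMPLE this gives blocks at 0x0, 0x8, 0x10 and 0x14, with edges (0x0, 0x10), (0x0, 0x8), (0x8, 0x14) and (0x10, 0x14).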

05:04.760 --> 05:09.480
The token embedding stuff, you will have to read the paper, because I cannot tell you how

05:09.480 --> 05:13.280
exactly they encoded this into the tokenizer for the model.

05:13.280 --> 05:21.000
But it is quite nicely done: they take a basic block, strip away all the constants,

05:21.000 --> 05:24.400
and keep the assembly and the jumps between basic blocks.
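
NOTE
A minimal sketch of that normalization idea (drop constants, keep the jumps), assuming hypothetical (address, mnemonic, operand string, branch target) tuples. The token names CONST and JUMP_ADDR_n are made up for illustration and the real jTrans tokenizer differs in detail, so see the paper for the actual scheme. Jump targets are rewritten as the token position of the target instruction, so the sequence still encodes where control flow goes without depending on absolute addresses.
```python
import re
from typing import Dict, List, Optional, Tuple
# (address, mnemonic, operand string, branch target or None): hypothetical format.
Insn = Tuple[int, str, str, Optional[int]]
IMM = re.compile(r"#-?(0x[0-9a-fA-F]+|\d+)")  # ARM64-style immediates such as #0x10 or #8
def tokenize_function(insns: List[Insn]) -> List[str]:
    """Normalize a function into tokens: constants become CONST, jump targets become positions."""
    # Pass 1: the token index at which each instruction starts
    # (one token for the mnemonic, plus one for the operands if there are any).
    start_of: Dict[int, int] = {}
    pos = 0
    for addr, _mnem, ops, _target in insns:
        start_of[addr] = pos
        pos += 2 if ops else 1
    # Pass 2: emit the tokens.
    tokens: List[str] = []
    for _addr, mnem, ops, target in insns:
        tokens.append(mnem)
        if not ops:
            continue
        if target is not None:
            # Keep the jump, but refer to its target by token position, not by address.
            tokens.append(f"JUMP_ADDR_{start_of[target]}")
        else:
            tokens.append(IMM.sub("CONST", ops))
    return tokens
EXAMPLE = [(0x0, "mov", "x0, #0x10", None), (0x4, "b.eq", "0xc", 0xC),
           (0x8, "add", "x0, x0, #1", None), (0xC, "ret", "", None)]
```
For EXAMPLE the result is mov, "x0, CONST", b.eq, JUMP_ADDR_6, add, "x0, x0, CONST", ret, where position 6 is the ret token.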

05:24.400 --> 05:30.840
So the model is aware of what a function looks like based on how it runs.

05:31.080 --> 05:36.920
That is very handy, because if you throw enough functions at it to train the model,

05:36.920 --> 05:41.240
you're actually training it to look at the structure of the functions, which makes it quite

05:41.240 --> 05:43.720
powerful, so we wanted to try this.

05:43.720 --> 05:51.960
But there is one little problem: jTrans is trained for x86, and we don't do a lot of x86.

05:51.960 --> 05:56.280
We do some, but not a lot; most of our stuff is ARM.

05:56.360 --> 06:03.240
So we thought, oh, maybe I can go to these neural network guys at our lab, and just say,

06:03.240 --> 06:05.880
oh, can we retrain this for ARM64?

06:05.880 --> 06:13.560
And they said, yeah, sure. Turns out it's not that easy: there was code for pre-training

06:13.560 --> 06:18.200
and everything, but the code quality is not that great, so they redid a lot of stuff, and in

06:18.200 --> 06:20.280
the end they managed to do it.

06:20.280 --> 06:25.800
So what you then need, they told me, if you want to do this, is a lot of functions

06:25.800 --> 06:32.360
in binary form with the source available. And to train a model to say, oh, this function

06:32.360 --> 06:37.720
is that function, even though they're not exactly the same, you also need a way to vary a function,

06:37.720 --> 06:41.160
and that's quite easy, because you just compile it with different optimization levels, which

06:41.160 --> 06:45.480
was in the original paper as well. Then you extract those functions using Ghidra,

06:45.480 --> 06:48.040
pair them up with each other, and train on that.
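
NOTE
A minimal sketch of how training pairs could be built from such a dataset, assuming each extracted function is identified by (binary, function name, optimization level). Two builds of the same function at different optimization levels form a positive pair; a random variant of some other function forms a negative pair. This only prepares the data; the training objective of the actual model is not shown.
```python
import itertools
import random
from collections import defaultdict
from typing import Dict, List, Tuple
def make_pairs(functions: List[Tuple[str, str, str]], seed: int = 0) -> List[Tuple[str, str, int]]:
    """functions: (binary, function name, optimization level) triples.
    Returns (a, b, label): 1 = same source function built at two optimization
    levels, 0 = a randomly chosen variant of a different function."""
    rng = random.Random(seed)
    variants: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for binary, name, opt in functions:
        variants[(binary, name)].append(f"{binary}:{name}@{opt}")
    keys = sorted(variants)
    pairs: List[Tuple[str, str, int]] = []
    for key in keys:
        others = [k for k in keys if k != key]
        for a, b in itertools.combinations(sorted(variants[key]), 2):
            pairs.append((a, b, 1))  # positive pair
            if others:
                pairs.append((a, rng.choice(variants[rng.choice(others)]), 0))  # negative pair
    return pairs
EXAMPLE = [("ls", "main", "O0"), ("ls", "main", "O2"), ("ls", "usage", "O0"), ("ls", "usage", "O2")]
```
With more optimization levels per function (O0, O1, O2, O3, Os) the number of positive pairs grows quadratically, which is one reason a few thousand binaries already yield a large training set.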

06:48.040 --> 06:53.880
To do that, I took the Arch User Repository (AUR) and just compiled everything I could get my hands on,

06:54.040 --> 06:58.760
and it took a long time, it took a lot of CPU power as well, but I ended up with about six

06:58.760 --> 07:08.040
thousand binaries and about 2.1 million functions to use for training, and I don't know if we're

07:08.040 --> 07:15.400
going to hear it, but I went on vacation, and the people in my team did not, and I thought,

07:15.400 --> 07:20.280
oh, I'll just let it run while I'm away for three weeks. And I don't know if I have sound,

07:20.520 --> 07:35.000
I don't think so. No. Maybe? No. Just imagine the sound: it sounds like a damn jet engine.

07:37.000 --> 07:40.920
We have these computers, and normally they're pretty quiet, but when they get too warm,

07:40.920 --> 07:46.680
they go into full-on panic mode. And I got a text that said: if you don't turn this shit off right now,

07:46.760 --> 07:53.320
we're going to burn your computer. So in the end, I ended up training it on my cluster.

07:58.760 --> 08:09.480
This is my beautiful setup that doesn't work, and I was running this, like, 24/7, so

08:09.480 --> 08:18.840
they were not very happy about this, I cannot imagine why. So in the end, I ended up

08:18.840 --> 08:22.600
making a dataset, and they ended up training the model. But then I thought, okay, how are we going to

08:22.600 --> 08:27.320
actually use this? You want it to be usable, so it cannot just run on my machine;

08:27.320 --> 08:32.040
it has to be somewhat usable. I thought I could make a lot of slides about this, but it's

08:32.120 --> 08:41.880
better to do a live demo. Hopefully that will go better than the video. So I think it's easiest

08:41.880 --> 08:52.280
to do it like this. Yeah, I can clip it on. This clip is too advanced for me, I don't understand any of this.

08:52.280 --> 09:06.200
No, but it's magnetic, and that works. So, I'm going to get this here.

09:10.440 --> 09:21.320
So here we have Ghidra. Bear with me while I try to change some weird-ass display settings.

09:22.280 --> 09:43.160
Yes, that works. Okay, so here we have Ghidra. Well, you don't have Ghidra, apparently.

09:53.080 --> 10:01.240
yes I'm gonna take it, I think it's not, but so,

10:07.880 --> 10:13.480
there is more effect and full swing,

10:22.280 --> 10:37.280
Yeah, it looks good.

10:37.280 --> 10:47.280
The way I'm looking at it, it looks good.

10:47.280 --> 10:48.280
Perfect.

10:48.280 --> 10:50.280
There we go.

10:50.280 --> 10:52.280
I'm going to be there for a bit.

10:52.280 --> 10:54.280
We're going to do it like this.

10:54.280 --> 10:58.280
So I wanted to show you the server, but it's not so interesting.

10:58.280 --> 11:02.280
It's just a Hypercorn server running FastAPI.

11:02.280 --> 11:10.280
For people who think they're better than everyone.

11:10.280 --> 11:14.280
Let's see.

11:20.280 --> 11:24.280
I broke it.

11:24.280 --> 11:28.280
Not again.

11:28.280 --> 11:30.280
Yeah, it works.

11:30.280 --> 11:32.280
I will find a thingy.

11:32.280 --> 11:34.280
So I run a server in the background.

11:34.280 --> 11:38.280
Just a FastAPI thing that now has a SQLite database.

11:38.280 --> 11:40.280
It used to be...

11:40.280 --> 11:44.280
It used to be Postgres.

11:44.280 --> 11:48.280
And it's going to be Postgres again.

11:48.280 --> 11:51.280
It's quite a lot faster, I have to say.

11:51.280 --> 11:56.280
And that just runs the model with an HTTP front end.

11:56.280 --> 12:00.280
So what I've done, maybe I can show you that.

12:00.280 --> 12:04.280
What I've done is compile Trusted Firmware, which is,

12:04.280 --> 12:09.280
I would say, the reference implementation of the secure monitor on a lot of Arm devices.

12:09.280 --> 12:15.280
So the thing that handles switching into TrustZone and back

12:15.280 --> 12:20.280
on your device, which is a very interesting surface to reverse engineer.

12:20.280 --> 12:25.280
And I've compiled everything in multiple optimization levels.

12:25.280 --> 12:30.280
So, for example, if we look at O2,

12:30.280 --> 12:36.280
the UI scale is not normally like this, sorry.

12:36.280 --> 12:42.280
We click this away.

12:42.280 --> 12:47.280
And you should be able to see this.

12:47.280 --> 12:50.280
The UI scale is a bit...

12:50.280 --> 12:53.280
But we'll get through it together, I swear.

12:53.280 --> 13:04.280
Now, if I take the function window...

13:04.280 --> 13:06.280
We just select a function.

13:06.280 --> 13:09.280
And what I usually do when reversing, if I don't know anything yet,

13:09.280 --> 13:14.280
I just sort by reference count and start reversing the most-referenced functions,

13:14.280 --> 13:18.280
because that usually gets you your log functions, your memcpy, and your memcmp.

13:18.280 --> 13:21.280
So, we take the log function.

13:21.280 --> 13:24.280
And you can see, this is the plugin for the model.

13:24.280 --> 13:31.280
It matches one-to-one, which makes sense, because I've made the signatures from the O2 version.

13:31.280 --> 13:34.280
We'll do it once more just to show that it works.

13:34.280 --> 13:36.280
We'll take a bit of a bigger function.

13:36.280 --> 13:38.280
And again, the score is 1.

13:38.280 --> 13:42.280
So, this is the database that this runs against.

13:42.280 --> 13:48.280
So, when you click on a function in Ghidra, it just sends the whole basic block structure as JSON to the server.

13:48.280 --> 13:51.280
The server turns it into an embedding, compares it to the database, and sends back

13:51.280 --> 13:53.280
the functions that look the most like it.
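
NOTE
A minimal sketch of the lookup step on the server side, assuming the model has already turned the submitted basic blocks into an embedding vector (that part is not shown). It uses a SQLite table and a brute-force cosine-similarity scan; the table layout and function names are hypothetical, and a large database would likely want a proper vector index instead. The default k of 25 mirrors the remark in the Q&A that the server always returns the top 25.
```python
import json
import math
import sqlite3
from typing import List, Tuple
def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 1.0 means the two embeddings point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS functions "
               "(id INTEGER PRIMARY KEY, name TEXT, embedding TEXT)")
    return db
def add_function(db: sqlite3.Connection, name: str, embedding: List[float]) -> None:
    db.execute("INSERT INTO functions (name, embedding) VALUES (?, ?)",
               (name, json.dumps(embedding)))
def top_k(db: sqlite3.Connection, query: List[float], k: int = 25) -> List[Tuple[str, float]]:
    """Brute-force scan: score every stored function and return the k best, best first.
    This always returns up to k results, even when none of them is a good match."""
    rows = db.execute("SELECT name, embedding FROM functions").fetchall()
    scored = [(name, cosine(query, json.loads(emb))) for name, emb in rows]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```
With tf_log stored as [1.0, 0.0] and console_flush as [0.0, 1.0], querying [0.9, 0.1] ranks tf_log first.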

13:53.280 --> 13:58.280
Now, O2 is not that interesting, of course, because it will match perfectly.

13:58.280 --> 14:01.280
Let's take O0, for example, no optimization.

14:01.280 --> 14:04.280
Then we do the same thing.

14:04.280 --> 14:07.280
Take the function window.

14:07.280 --> 14:11.280
No, no, dang it.

14:11.280 --> 14:16.280
Love me some Python too.

14:16.280 --> 14:23.280
And we do the same thing with tf_log.

14:23.280 --> 14:25.280
Now, this is not the same function, you can imagine.

14:25.280 --> 14:27.280
Because O2 optimizes a lot.

14:27.280 --> 14:36.280
So, the assembly will look completely different, which we will never be able to see at this resolution.

14:36.280 --> 14:39.280
But that's fine.

14:39.280 --> 14:45.280
And if we go to the functions of O2, we pick tf_log.

14:45.280 --> 14:50.280
Oh, shit, sorry, here.

14:50.280 --> 14:57.280
So, the O0 version will match by quite a good margin.

14:57.280 --> 15:02.280
So, the problem with this is that the absolute numbers don't really mean a lot.

15:02.280 --> 15:08.280
What you want is for the first entry here to discriminate very well against the next entry.

15:08.280 --> 15:13.280
Then you know, oh, it's quite certain that this at least looks like tf_log.

15:13.280 --> 15:16.280
Or it has tf_log inlined, something like that.
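
NOTE
Because the absolute scores mean little, what matters is how far the best candidate stands out from the runner-up. A minimal sketch of that check, assuming results arrive as (name, score) pairs sorted best-first; the 0.05 threshold is an arbitrary, hypothetical value, not one taken from the tool.
```python
from typing import List, Tuple
def match_margin(results: List[Tuple[str, float]]) -> Tuple[str, float]:
    """results: (name, score) pairs sorted best-first.
    Returns the best name and how far its score is ahead of the runner-up."""
    if len(results) < 2:
        raise ValueError("need at least two candidates to measure separation")
    (best, top), (_, runner_up) = results[0], results[1]
    return best, top - runner_up
def is_confident(results: List[Tuple[str, float]], min_margin: float = 0.05) -> bool:
    # min_margin is an arbitrary, hypothetical threshold; tune it on your own data.
    return match_margin(results)[1] >= min_margin
CLEAR = [("tf_log", 0.91), ("console_flush", 0.62), ("memcpy", 0.60)]
FLAT = [("a", 0.80), ("b", 0.79), ("c", 0.79)]
```
CLEAR has a margin of about 0.29 and passes; FLAT has a margin of about 0.01 and does not, which is the everything-looks-the-same case.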

15:16.280 --> 15:19.280
And we can do that for a couple of functions.

15:19.280 --> 15:23.280
So, if we take another one: console_flush, again.

15:23.280 --> 15:27.280
Now, we see this also goes very well.

15:28.280 --> 15:32.280
This is quite easy, because here we have console_flush.

15:32.280 --> 15:35.280
So, we already have the symbols for this thing.

15:35.280 --> 15:37.280
But imagine you're reversing,

15:37.280 --> 15:42.280
and this thing gives you some decompilation, some disassembly,

15:42.280 --> 15:45.280
and you don't know what you're looking at.

15:45.280 --> 15:48.280
Then it would be quite nice to have that function name there.

15:48.280 --> 15:53.280
Now, by the power of having already done this.

15:53.280 --> 15:56.280
I took a stripped one, and this is Os.

15:56.280 --> 15:59.280
So, it optimizes, but in a very different way.

15:59.280 --> 16:02.280
It will not unroll loops or anything like O3 will.

16:02.280 --> 16:08.280
And here, if we do the same thing, you can see these functions don't have any names.

16:08.280 --> 16:12.280
So now I don't know what they are either.

16:12.280 --> 16:16.280
But there you can see, ah, this is cm_set_context.

16:16.280 --> 16:19.280
Where is the Os one?

16:20.280 --> 16:23.280
Yeah, yeah. This is cm_set_context, for example.

16:23.280 --> 16:29.280
Let's take some different ones.

16:29.280 --> 16:30.280
Now, you will also see it:

16:30.280 --> 16:33.280
they're just named FUN_ plus an address.

16:33.280 --> 16:35.280
Let's take a different, big function.

16:35.280 --> 16:38.280
So, it's a really ergonomic setup, this one.

16:38.280 --> 16:41.280
Let's see.

16:42.280 --> 16:45.280
Woo.

16:45.280 --> 16:52.280
And still, it will say, ah, that is probably console_flush.

16:52.280 --> 16:57.280
So, you can see that if you're reversing, this would be kind of nice to have.

16:57.280 --> 17:00.280
And the nice thing is, it works the other way around as well,

17:00.280 --> 17:01.280
when you name a function.

17:01.280 --> 17:05.280
So, for example, I rename FUN_ blah blah to console_flush.

17:05.280 --> 17:10.280
Or I double-click on this one, which auto-names it.

17:11.280 --> 17:13.280
Then it will also send it to the server.

17:13.280 --> 17:15.280
And say, ah, okay, now you've named that.

17:15.280 --> 17:19.280
So if a bunch of us are reversing, and there are five of us in one room,

17:19.280 --> 17:22.280
and I name something, then immediately for the next person,

17:22.280 --> 17:26.280
if they click that function, even if it's in a different bootloader or

17:26.280 --> 17:29.280
different secure monitor or anything, it will say, ah, it's probably that.

17:29.280 --> 17:31.280
It's exactly what I was looking for.

17:31.280 --> 17:34.280
Because this model will not change for now.

17:34.280 --> 17:37.280
And even if it changes, you can

17:37.280 --> 17:42.280
just remake the signatures, because it also sends the basic blocks to the server.

17:42.280 --> 17:46.280
And you can keep on using this.

17:46.280 --> 17:48.280
The assembly will not change, so you build a database

17:48.280 --> 17:50.280
that gets bigger and bigger and bigger over time.
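
NOTE
A minimal sketch of why storing the basic blocks pays off, assuming a hypothetical SQLite schema and a stand-in embed function for the model. Each label keeps the raw blocks next to the embedding and the model version that produced it, so when the model changes, the stored labels can be re-embedded instead of being thrown away.
```python
import json
import sqlite3
from typing import Callable, List
# Hypothetical stand-in for the model: basic blocks in, embedding out.
Embedder = Callable[[list], List[float]]
def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS labels (id INTEGER PRIMARY KEY, "
               "name TEXT, blocks TEXT, model TEXT, embedding TEXT)")
    return db
def label_function(db: sqlite3.Connection, name: str, blocks: list,
                   model_version: str, embed: Embedder) -> None:
    """Called when someone names a function: store the raw blocks, not only the vector."""
    db.execute("INSERT INTO labels (name, blocks, model, embedding) VALUES (?, ?, ?, ?)",
               (name, json.dumps(blocks), model_version, json.dumps(embed(blocks))))
def reembed_all(db: sqlite3.Connection, model_version: str, embed: Embedder) -> int:
    """After a model update, recompute stale embeddings from the saved blocks."""
    rows = db.execute("SELECT id, blocks FROM labels WHERE model != ?",
                      (model_version,)).fetchall()
    for row_id, blocks in rows:
        db.execute("UPDATE labels SET model = ?, embedding = ? WHERE id = ?",
                   (model_version, json.dumps(embed(json.loads(blocks))), row_id))
    return len(rows)
```
Storing the blocks costs more space than storing vectors alone, but it keeps everyone's names usable across model versions.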

17:50.280 --> 17:55.280
And in the end, you're never going to manually name memcpy or memcmp again.

17:55.280 --> 17:57.280
That SCSI switch statement you saw?

17:57.280 --> 18:00.280
That, well, you will also never have to do that by hand again,

18:00.280 --> 18:02.280
because I've already done it once.

18:02.280 --> 18:04.280
So, why do it again?

18:04.280 --> 18:07.280
So, that's what I wanted to show you.

18:07.280 --> 18:13.280
And all of this, like,

18:13.280 --> 18:20.280
Got to fix the world's jankiest setup.

18:34.280 --> 18:37.280
All of this work is open source.

18:37.280 --> 18:38.280
No.

18:38.280 --> 18:39.280
No.

18:39.280 --> 18:40.280
Let's see.

18:40.280 --> 18:41.280
One sec.

18:47.280 --> 18:49.280
All of this work is open source.

18:49.280 --> 18:52.280
You can find it on our GitHub, and the model is on Hugging Face.

18:52.280 --> 18:55.280
So, you can use it yourself if you want to as well.

18:55.280 --> 18:59.280
And you can set up your own server, the documentation is all there.

18:59.280 --> 19:02.280
So, I have a QR code somewhere.

19:03.280 --> 19:05.280
But I don't know where anymore.

19:11.280 --> 19:13.280
Ah, one sec.

19:33.280 --> 19:38.280
So, it consists of three parts.

19:38.280 --> 19:40.280
You have asmtransformers, which is the model.

19:40.280 --> 19:41.280
It's on Hugging Face.

19:41.280 --> 19:45.280
I will also share this presentation with you all so you can have the links.

19:45.280 --> 19:46.280
This is the model.

19:46.280 --> 19:49.280
The training set, unfortunately... I wanted to share it.

19:49.280 --> 19:51.280
But all these things are open source.

19:51.280 --> 19:53.280
If you compile them, you're distributing binaries.

19:53.280 --> 19:57.280
And it's quite hard to get through all the licenses and be sure that that's allowed,

19:57.280 --> 19:59.280
or what you have to deliver with it.

19:59.280 --> 20:02.280
So, our legal department doesn't want it.

20:02.280 --> 20:07.280
It might slip onto the internet somewhere.

20:07.280 --> 20:08.280
So, this is just the model.

20:08.280 --> 20:10.280
You can run it standalone.

20:10.280 --> 20:13.280
The Python wrappers and everything are there.

20:13.280 --> 20:15.280
This is the server that we're running against.

20:15.280 --> 20:17.280
It's just a FastAPI implementation.

20:17.280 --> 20:18.280
It's actively developed.

20:18.280 --> 20:20.280
It's been a little bit quiet on the repo lately.

20:20.280 --> 20:23.280
But in May, we are continuing this work again.

20:23.280 --> 20:26.280
With a new cross-instruction set model.

20:26.280 --> 20:29.280
Because first we did a lot of ARM64.

20:29.280 --> 20:30.280
Now we're seeing some RISC-V.

20:30.280 --> 20:32.280
I want to do VEX IR as well.

20:32.280 --> 20:34.280
Like an intermediate representation.

20:34.280 --> 20:36.280
You can run this server on your own.

20:36.280 --> 20:38.280
And in the setup that I just showed you.

20:38.280 --> 20:40.280
And you have Centencia, which for now

20:40.280 --> 20:41.280
is just a Ghidra plugin,

20:41.280 --> 20:44.280
the one I just showed you, which you point at the server.

20:44.280 --> 20:48.280
And if you click on a function, it will ask the server which known functions match it.

20:48.280 --> 20:51.280
And if you label a function, it will put it in the database.

20:51.280 --> 20:55.280
I am very much planning on also making an IDA plugin.

20:55.280 --> 20:57.280
Because some of our people also use IDA.

20:57.280 --> 21:00.280
So we also want to be able to use it.

21:00.280 --> 21:03.280
And that would mean you have a sort of Lumina-like thing

21:03.280 --> 21:06.280
across Ghidra, IDA, and everything.

21:06.280 --> 21:09.280
So please use it.

21:09.280 --> 21:13.280
And please contribute.

21:13.280 --> 21:15.280
That was it.

21:16.280 --> 21:22.280
Sorry for the janky demo.

21:22.280 --> 21:25.280
Are there any questions?

21:25.280 --> 21:27.280
Excuse me.

21:33.280 --> 21:38.280
In essence, how well does this method generalize?

21:38.280 --> 21:42.280
If the source code is different, for example, Rust?

21:42.280 --> 21:44.280
That's a bit of a problem.

21:44.280 --> 21:47.280
If you have a project that's compiled from Rust,

21:47.280 --> 21:51.280
and it differs from the same project as the model originally saw it,

21:51.280 --> 21:54.280
you will not see any generalization between those two things.

21:54.280 --> 21:59.280
If you reversed a first version of a Rust project and you're reversing the next version of that Rust project,

21:59.280 --> 22:00.280
it will still work fine.

22:00.280 --> 22:03.280
Because it's so low level that it only looks at the basic blocks of a function.

22:03.280 --> 22:07.280
And it doesn't really matter if there are vtables or anything.

22:07.280 --> 22:10.280
Because they're in the older version as well.

22:11.280 --> 22:13.280
But can you still relate to assembly?

22:13.280 --> 22:17.280
Because whether you use Rust or C,

22:17.280 --> 22:20.280
the assembly instructions are at least the same, right?

22:20.280 --> 22:21.280
Yes.

22:29.280 --> 22:32.280
Yes, so the question is: even if you use Rust or C,

22:32.280 --> 22:35.280
can you still use the assembly with the model?

22:35.280 --> 22:37.280
Yes, yes.

22:37.280 --> 22:39.280
Rust just compiles to assembly.

22:39.280 --> 22:40.280
And at the assembly level,

22:40.280 --> 22:41.280
it's not that different.

22:41.280 --> 22:44.280
It is very different in calling conventions and vtables,

22:44.280 --> 22:49.280
and the debug stuff and runtime stuff that it has.

22:49.280 --> 22:51.280
But the things that the model sees

22:51.280 --> 22:56.280
are just jumps between basic blocks and certain mnemonics.

22:56.280 --> 23:03.280
And it's almost the same question about C++.

23:03.280 --> 23:09.280
I mean, with Rust, and C++ even more, reversing from the assembly

23:09.280 --> 23:14.280
is pretty hard in this regard because of the name mangling

23:14.280 --> 23:16.280
and the overloading, et cetera.

23:16.280 --> 23:19.280
So what does it learn?

23:19.280 --> 23:21.280
Yeah, it is.

23:21.280 --> 23:23.280
It is also hard to reverse.

23:23.280 --> 23:24.280
Sorry.

23:24.280 --> 23:29.280
If I understand correctly: C++ is quite hard to reverse.

23:29.280 --> 23:32.280
Will the model still work correctly on the implementation?

23:32.280 --> 23:33.280
Yeah.

23:33.280 --> 23:36.280
In essence, the reversing is a lot harder.

23:36.280 --> 23:38.280
And that's going to stay that way.

23:38.280 --> 23:39.280
But the model will not see that.

23:39.280 --> 23:42.280
The model will just see the basic blocks, even if it's C++.

23:42.280 --> 23:44.280
And you will see it calling vtables.

23:44.280 --> 23:49.280
But to the model it's just a reference to a vtable and an offset inside it.

23:49.280 --> 24:03.280
So true, but at the assembly level, that doesn't really matter to the model.

24:03.280 --> 24:06.280
So you will have to know it as the reverser.

24:06.280 --> 24:12.280
But if you say this thing is the template for that thing, then to the model

24:12.280 --> 24:15.280
it's just the template for this function.

24:15.280 --> 24:19.280
One of the things I also use this for, maybe I forgot to mention it,

24:19.280 --> 24:23.280
is to just compile a project from source at different optimization levels.

24:23.280 --> 24:26.280
Throw them in the database and then you start reversing something.

24:26.280 --> 24:28.280
And then it says, oh, I already know these things.

24:28.280 --> 24:31.280
You can do the same thing with C++ or Rust.

24:31.280 --> 24:32.280
Just compile it.

24:32.280 --> 24:33.280
Throw it in Ghidra.

24:33.280 --> 24:34.280
Throw it against the database.

24:34.280 --> 24:40.280
And then when you start reversing, you see what the symbols in the assembly would have said

24:40.280 --> 24:42.280
if it had debug symbols.

24:43.280 --> 24:45.280
So reversing is still quite hard.

24:45.280 --> 24:46.280
For C++.

24:57.280 --> 25:01.280
Not yet. Not yet. What I really want is for now.

25:01.280 --> 25:07.280
Sorry, how do you deal with things like v tables and structured definitions?

25:07.280 --> 25:09.280
At this point, we don't.

25:09.280 --> 25:11.280
So now it's only function names.

25:11.280 --> 25:13.280
And what I really want is function parameters.

25:13.280 --> 25:14.280
Structs.

25:14.280 --> 25:17.280
And that's going to be a lot harder, because if you do an IDA plugin,

25:17.280 --> 25:19.280
and it should still be cross-platform,

25:19.280 --> 25:22.280
you have to think about how you're going to make that portable,

25:22.280 --> 25:25.280
which of course could be header files or anything.

25:27.280 --> 25:28.280
Yeah, yeah.

25:28.280 --> 25:32.280
As for the general problem: for example, Ghidra has a collaborative server,

25:32.280 --> 25:35.280
and there you can also collaborate on symbols.

25:35.280 --> 25:37.280
So you can say, oh, I have the symbols for this binary.

25:37.280 --> 25:40.280
I made some structs. You can also add those.

25:40.280 --> 25:43.280
It's essentially something like Git.

25:43.280 --> 25:46.280
But I've not done anything to implement that here.

25:46.280 --> 25:49.280
That would definitely be on the wish list.

25:56.280 --> 25:59.280
The nice thing is, if you compile the Arch User Repository,

25:59.280 --> 26:03.280
everybody gets to make their own package files,

26:03.280 --> 26:05.280
so you get a lot of different components.

26:05.280 --> 26:08.280
You also get Rust in there, and some different languages.

26:08.280 --> 26:12.280
So it's like the whole landscape comes back

26:12.280 --> 26:14.280
in what you've put into the model.

26:18.280 --> 26:19.280
Sorry.

26:19.280 --> 26:22.280
Have you also looked at different compilers?

26:22.280 --> 26:24.280
Yes, we have.

26:24.280 --> 26:26.280
But that's because of the way the dataset is built.

26:29.280 --> 26:30.280
Yes.

26:30.280 --> 26:35.280
Can you use your model on applications as well?

26:35.280 --> 26:36.280
Yes.

26:36.280 --> 26:41.280
Would it work to reverse some of the Android packages?

26:41.280 --> 26:42.280
Yes.

26:42.280 --> 26:46.280
Native libraries, or do you mean the application?

26:46.280 --> 26:50.280
No, no, because Ghidra can do some of that stuff,

26:50.280 --> 26:53.280
but then you're looking at Java bytecode.

26:53.280 --> 26:56.280
And at this point, it only supports ARM64.

26:56.280 --> 26:59.280
So you can do native libraries for Android,

26:59.280 --> 27:02.280
but not... I will repeat the question.

27:02.280 --> 27:09.280
The question was if we've also looked at Android applications

27:09.280 --> 27:11.280
and stuff like that.

27:11.280 --> 27:14.280
But at this point, it doesn't support it yet.

27:15.280 --> 27:17.280
I have another question.

27:17.280 --> 27:20.280
Because when you showed it in the demo,

27:20.280 --> 27:22.280
looking at the function names,

27:22.280 --> 27:23.280
you only need the first one.

27:23.280 --> 27:25.280
The green one was like a good match,

27:25.280 --> 27:26.280
but the other ones,

27:26.280 --> 27:27.280
are those different functions?

27:27.280 --> 27:28.280
Yes.

27:28.280 --> 27:29.280
So you only look at the green one?

27:29.280 --> 27:31.280
Well, you need it.

27:31.280 --> 27:32.280
Sorry.

27:32.280 --> 27:35.280
So only the top one is green,

27:35.280 --> 27:39.280
and it says, oh, this first function is the perfect one.

27:39.280 --> 27:42.280
And that's the case for most of the functions.

27:42.280 --> 27:45.280
Some still don't discriminate that well.

27:45.280 --> 27:50.280
But that's also because this dataset is very synthetic.

27:50.280 --> 27:53.280
But in essence, it doesn't really matter if the function is green.

27:53.280 --> 27:56.280
It mostly matters how much it differs from the other ones.

27:56.280 --> 27:59.280
If the model says they all look about the same to me,

27:59.280 --> 28:03.280
that's usually an indication that there's nothing good in there.

28:03.280 --> 28:07.280
So do you see any specific sort of false positives?

28:07.280 --> 28:08.280
Yes.

28:08.280 --> 28:10.280
Well, surprisingly, we don't see a lot.

28:10.280 --> 28:13.280
Only, of course, the model will always give you 25 results.

28:13.280 --> 28:14.280
Or rather, the server.

28:14.280 --> 28:16.280
So even if it says, oh, it doesn't look like anything to me,

28:16.280 --> 28:18.280
you will still get the top 25.

28:18.280 --> 28:20.280
So if you now try a different binary,

28:20.280 --> 28:22.280
it's going to say, all these results are shit.

28:22.280 --> 28:23.280
But.

28:23.280 --> 28:25.280
All right.

28:25.280 --> 28:26.280
Thank you.

28:26.280 --> 28:29.280
Yes.

28:29.280 --> 28:30.280
Yes?

28:30.280 --> 28:31.280
All right.

28:31.280 --> 28:32.280
Thank you.

28:38.280 --> 28:40.280
Thank you.

