WEBVTT

00:00.000 --> 00:07.000
Can I start?

00:07.000 --> 00:08.000
Yes.

00:08.000 --> 00:09.000
Okay.

00:09.000 --> 00:11.000
So, hello everyone.

00:11.000 --> 00:18.000
My name is Alex and I'll be talking today about the chosen horrors of NES Dynamics Recompletion.

00:18.000 --> 00:26.000
I'm basically going to be talking about a project that I did a while ago when I was trying to learn about dynamic

00:27.000 --> 00:30.000
simulation and I wanted to just.

00:30.000 --> 00:32.000
Just a word.

00:32.000 --> 00:36.000
If people still staying here, can you just defrag?

00:36.000 --> 00:37.000
Oh yeah.

00:37.000 --> 00:39.000
So people can take a sit.

00:39.000 --> 00:40.000
Thank you.

00:40.000 --> 00:58.000
Okay, so like I was saying, basically this is a project that I did.

00:58.000 --> 01:00.000
I'll wait.

01:00.000 --> 01:01.000
I'll wait a bit.

01:01.000 --> 01:02.000
I'll wait a bit.

01:10.000 --> 01:38.000
Okay, well, like I was saying, basically this is a project that I did while I was trying to learn how

01:38.000 --> 01:42.000
emulators and dynamic reconpiler work.

01:42.000 --> 01:45.000
And I'll be talking a bit about it.

01:45.000 --> 01:48.000
Okay, so first of all, what is an emulator?

01:48.000 --> 01:52.000
I mean, different people have different descriptions about what an emulator is.

01:52.000 --> 01:57.000
But the way that I like to think about it, it's basically you're writing a program that's

01:57.000 --> 02:00.000
faking being another system.

02:00.000 --> 02:04.000
It's tricking the original software into thinking that it's running on the hardware.

02:04.000 --> 02:06.000
It was supposed to run.

02:06.000 --> 02:18.000
So like some popular emulators that are out there are QMU, which allows you basically to run

02:18.000 --> 02:23.000
an version of an operating system on any machine.

02:23.000 --> 02:29.000
You have console emulators like RPCS3, which is a PS3 emulator.

02:29.000 --> 02:34.000
You have V8, which wouldn't you wouldn't normally think of as an emulator.

02:34.000 --> 02:41.000
But it actually compiles your code into an intermediate representation and then compiles

02:41.000 --> 02:43.000
it back to machine code.

02:43.000 --> 02:45.000
So it is actually like an emulator.

02:45.000 --> 02:46.000
And why?

02:46.000 --> 02:51.000
Which unlike what they would like you to think is an emulator.

02:51.000 --> 02:56.000
Of when those furlinics.

02:56.000 --> 03:03.000
Okay, so before you consider writing a reconpiler, you probably should consider writing

03:03.000 --> 03:09.000
an interpreter because an interpreter is actually a lot easier to implement.

03:09.000 --> 03:16.000
You just need to go instruction by instruction of the original software and execute whatever

03:16.000 --> 03:17.000
operations it did.

03:17.000 --> 03:19.000
You don't have to do anything fancy about it.

03:19.000 --> 03:21.000
It's very flexible.

03:21.000 --> 03:32.000
It allows you to avoid a lot of pitfalls with memory and caches and everything.

03:33.000 --> 03:36.000
And it's also very easy to debug.

03:36.000 --> 03:43.000
The con of an interpreter and the basically the only reason you would write anything other than an

03:43.000 --> 03:48.000
interpreter is because interpreting is actually quite slow.

03:48.000 --> 03:53.000
Interpreting a single instruction of the original machine could take like hundreds or

03:53.000 --> 03:56.000
thousands of the host machine.

03:57.000 --> 04:00.000
For the NES, that isn't actually a problem.

04:00.000 --> 04:06.000
The NES has a spew that for today's standards is so slow that it doesn't really matter.

04:06.000 --> 04:11.000
You wouldn't notice, but you've got to learn, you've got to start from somewhere.

04:11.000 --> 04:13.000
So yeah.

04:13.000 --> 04:17.000
And when you move on to write a reconpiler, you basically have two options.

04:17.000 --> 04:25.000
You have a just in time or dynamic reconpiler, which essentially, when you want to execute an instruction,

04:25.000 --> 04:29.000
that is at some address on the original software.

04:29.000 --> 04:35.000
What you do is recompile a bunch of instructions.

04:35.000 --> 04:40.000
You recompile the instruction you want to execute and the one below and the one below that.

04:40.000 --> 04:45.000
And you go on and on until you have to stop for reasons we'll discuss later.

04:45.000 --> 04:51.000
So the main benefits of this is that it's usually quite a lot faster than interpreters.

04:52.000 --> 04:59.000
Because once you have recompiled a code, a block of code, you don't have to recompile it again.

04:59.000 --> 05:02.000
You can use the result from the other one.

05:02.000 --> 05:09.000
So basically, the only time a just in time reconpiler is actually slow is when it encounters a lot of new code.

05:09.000 --> 05:14.000
For example, if you go to a new level on a game or you just start running the game,

05:14.000 --> 05:19.000
that's the only part where it's just time reconpiler would be going a bit slow.

05:19.000 --> 05:24.000
And it also has the benefit that it can perform hot path optimizations.

05:24.000 --> 05:31.000
You can analyze which parts of the code you are running a lot and which ones you're running, you're not running that much.

05:31.000 --> 05:36.000
And say, okay, since this part of the code is actually been run a lot,

05:36.000 --> 05:42.000
I'm going to take a bit of time to optimize it so the next time I run it, it's faster.

05:42.000 --> 05:48.000
But if I have a part of the code that it's only executed once, then there's no need to optimize it.

05:48.000 --> 05:52.000
I could just run it and forget about it.

05:52.000 --> 05:53.000
Yeah.

05:53.000 --> 05:59.000
The major cause of the just in time reconpiler is that it's very hard to debug in real time,

05:59.000 --> 06:04.000
because the program is essentially generating new code to execute for himself.

06:04.000 --> 06:10.000
So you can't really break point on it and see what's going wrong.

06:10.000 --> 06:16.000
You just need to assume what's going wrong based on what's happening.

06:16.000 --> 06:24.000
And the step of reconpiling new code is like an overhead, it takes time.

06:24.000 --> 06:38.000
So if you are trying to emulate a program that it's moving around a lot and it constantly finds new code to execute that could be a problem.

06:38.000 --> 06:43.000
Now the other way to do reconpilation is what's called a head of time or static reconpilation,

06:43.000 --> 06:48.000
which is basically everything you would aspire for a reconpiler.

06:48.000 --> 06:57.000
It just reads the whole original code and then translate it into code for your machine all at once and you're done.

06:57.000 --> 07:07.000
This is obviously the fastest way to emulate a program because there's no need to in the middle of execution.

07:07.000 --> 07:10.000
Of execution do anything different.

07:10.000 --> 07:16.000
You just run a program like it was a regular program for your computer.

07:16.000 --> 07:26.000
And another benefit is that you can reconpile once and then execute the same reconpiled program any time.

07:26.000 --> 07:29.000
You don't have to do reconpilation over and over again.

07:29.000 --> 07:36.000
And it's actually easier to debug than a just in time reconpiler because you're not generating code on the fly.

07:36.000 --> 07:44.000
You're generating code once and so if you want to put a breakpoint at some place it's actually easy to do.

07:44.000 --> 07:49.000
The bad part of the head of time reconpilers is that they're a pain in the ass.

07:49.000 --> 08:04.000
You need to analyze the whole original code find which parts of the code jump to other parts of the code and that's not always obvious.

08:04.000 --> 08:12.000
For example, when you want to call a function pointer, how would you know where that function pointer is in memory.

08:12.000 --> 08:18.000
If I only have a function that it's called by a pointer, how do I find where it is?

08:18.000 --> 08:20.000
If no one's direct recalling it.

08:20.000 --> 08:27.000
So you have to do a lot of analyzing the code and finding where everything is to do that.

08:27.000 --> 08:32.000
And it also cannot do how path of optimizations because you reconpile once.

08:32.000 --> 08:38.000
You don't have the chance to analyze which part of the code is being run more than others.

08:38.000 --> 08:44.000
So this is the basic just about emulation reconpilers.

08:44.000 --> 08:49.000
And my project specifically is an NES dynamic reconpiler.

08:50.000 --> 08:53.000
So I'm no need to analyze the whole program.

08:53.000 --> 08:55.000
It's written in C++20.

08:55.000 --> 09:02.000
It uses L of EM to perform the compilation back to host machine code.

09:02.000 --> 09:14.000
Basically, I recompile NES instructions into L of EM and then L of EM compiles that code into X86 or whatever it is you're running.

09:14.000 --> 09:20.000
And for Windows and event handling, I just use this deal basically.

09:20.000 --> 09:21.000
There we go.

09:21.000 --> 09:27.000
So because this was a small project that I wanted to write,

09:27.000 --> 09:35.000
I said myself the clear mindset that I didn't want to do like the holy grail of emulators.

09:35.000 --> 09:40.000
I wanted to do something that worked and to see.

09:40.000 --> 09:44.000
And so that I could understand the process of how it works.

09:44.000 --> 09:50.000
So some of the things that my emulator cannot do, for example, is it cannot handle self-modifying code.

09:50.000 --> 10:01.000
Like for example, Super Mario Brothers 1 has parts of the code that writes to memory and then execute that memory that it has just written.

10:01.000 --> 10:04.000
That wouldn't be handled on my emulator.

10:04.000 --> 10:06.000
It doesn't have support for all these functions.

10:06.000 --> 10:11.000
I basically just support for now instructions for the games I tested.

10:11.000 --> 10:16.000
So that way it could iterate faster and see where the bugs were and all of that.

10:16.000 --> 10:25.000
The PPU is actually the picture processing unit, which is the one in charge of the graphics.

10:25.000 --> 10:32.000
It's an interpreted that I took from somewhere else because recompiling the PPU code would be,

10:32.000 --> 10:37.000
I mean, not impossible, but it would be headscratcher.

10:37.000 --> 10:45.000
And I also have no support for audio for the time because my focus was on the PPU, on the PPU recompilation.

10:45.000 --> 10:49.000
But it can run backmen.

10:49.000 --> 10:55.000
And if I have time, if I have a bit of time later, I'll show it live, but I put there a gift.

10:55.000 --> 10:58.000
So in case I don't have time, you can see that it actually works.

10:58.000 --> 11:03.000
Thank you.

11:03.000 --> 11:09.000
Okay, so now a bit more into how the emulator works.

11:09.000 --> 11:17.000
So basically, whenever you need to run an instruction, you haven't seen before, a code block that you haven't seen before.

11:17.000 --> 11:21.000
You create on the host machine a function.

11:22.000 --> 11:26.000
And for that function, you basically go iterating, like I said, instruction by instruction.

11:26.000 --> 11:34.000
You go fetching instructions until you reach a point where you stop recompiling instructions.

11:34.000 --> 11:44.000
In this case, one of the ways you can stop a code block from recompiling is saying,

11:44.000 --> 11:49.000
that, for example, you want code blocks to have a maximum of 10,

11:49.000 --> 11:51.000
10 original instructions.

11:51.000 --> 11:56.000
You could, for example, want to limit the size of the original of the code blocks.

11:56.000 --> 12:04.000
So they don't get too big and you're missing parts of the code in the middle that you also want to recompile.

12:04.000 --> 12:12.000
Okay, so when, now we know that code blocks are implemented as host as the host machines functions.

12:12.000 --> 12:18.000
And so whenever we jump from one code block to another, we need to know the state in which the CPU was.

12:18.000 --> 12:22.000
When we executed the last one, now we're entering this one.

12:22.000 --> 12:31.000
So the way we're doing it here is basically, we write the registers and the flag values into memory.

12:31.000 --> 12:39.000
And basically, to each function that acts like a recompile code block, we pass the state of the CPU as a pointer.

12:39.000 --> 12:47.000
So that way we can access all the registers, all the flags, and also the integrated RAM is included in there.

12:47.000 --> 12:52.000
And not technically the CPU, but it makes it easier.

12:52.000 --> 13:04.000
And for regular instructions, like addition, shifting, and all of that, we essentially just take the operations that that instruction did.

13:04.000 --> 13:14.000
And we will translate them. So for in this example, we have the rotate right instruction, which takes a value from memory and then shifts it to the right.

13:14.000 --> 13:21.000
And the right most bid goes to the carrier flag. So we do that. We update some flags.

13:21.000 --> 13:26.000
For conditional branching instructions, for example, for an if.

13:26.000 --> 13:34.000
We tag if the condition we're testing is true. If the condition we're testing is true, we return from the function.

13:34.000 --> 13:38.000
And if it's not true, we keep recompiling the instructions below.

13:38.000 --> 13:46.000
So in the case that we do not take that jump, we don't have to go to a new recompile code block.

13:46.000 --> 13:52.000
And for jams and calls, that mean that we'll always be jumping at that point.

13:52.000 --> 13:57.000
And we will not be executing the instruction below, which has returned from the function.

13:57.000 --> 14:06.000
And from the function, we return the address that we'll want to execute the next.

14:06.000 --> 14:12.000
Okay, I don't have time to explain all the horrors of recompiling the NES.

14:12.000 --> 14:20.000
I'll just leave one here. So in the NES, things like the PPU and the APU perform actions.

14:20.000 --> 14:24.000
Once every certain amount of cycles.

14:24.000 --> 14:32.000
And so what this means for my recompiler is that every time I recompile an instruction, I need to.

14:32.000 --> 14:36.000
Check how many cycles have passed on the CPU.

14:36.000 --> 14:44.000
And then after a certain amount of cycles have passed, I need to jump to an interrupt handler.

14:44.000 --> 14:52.000
Basically, I need to jump to completely different part of the code and execute that instead of the next instruction.

14:52.000 --> 15:00.000
So you have to do that every time you recompile an instruction, which makes it a lot harder for LVM to optimize the recompile code.

15:00.000 --> 15:06.000
And that is it.

15:06.000 --> 15:12.000
Okay.

15:12.000 --> 15:16.000
Well, questions, what do I have to do?

15:16.000 --> 15:20.000
Yeah, I can do the demo.

15:20.000 --> 15:23.000
If anyone has any questions, keep an answer.

15:23.000 --> 15:24.000
Any questions?

15:24.000 --> 15:26.000
We have two minutes.

15:26.000 --> 15:28.000
Okay.

15:28.000 --> 15:29.000
Yeah.

15:29.000 --> 15:32.000
First of all, then I come back.

15:32.000 --> 15:34.000
You've time to lose.

15:34.000 --> 15:40.000
First, let me say, I think it's a great work and a great presentation.

15:40.000 --> 15:43.000
Just let me add one bit.

15:43.000 --> 15:52.000
You wrote that fully static ahead of time compilation has no way to do

15:53.000 --> 15:55.000
Part-path optimization.

15:55.000 --> 16:00.000
Actually, it is possible if you use profile guided optimization.

16:00.000 --> 16:01.000
Yeah.

16:01.000 --> 16:09.000
I mean, it is possible if, but it would be extremely, yeah, it is possible.

16:09.000 --> 16:13.000
But it's not, or this takes a lot.

16:13.000 --> 16:14.000
Yeah.

16:14.000 --> 16:15.000
Yes.

16:15.000 --> 16:16.000
Exactly.

16:16.000 --> 16:18.000
Complicated, but possible.

16:18.000 --> 16:21.000
There was a question over here.

16:21.000 --> 16:22.000
That is a problem.

16:22.000 --> 16:24.000
I don't know if it's.

16:24.000 --> 16:28.000
Did you try any other just-in-time libraries?

16:28.000 --> 16:32.000
Do you have a particular preference for LVM simplmentation?

16:32.000 --> 16:33.000
No.

16:33.000 --> 16:35.000
I don't know if.

16:35.000 --> 16:39.000
I basically did LVM because it's the one I'm most used to.

16:39.000 --> 16:41.000
I haven't checked anyone.

16:41.000 --> 16:42.000
Any other one out?

16:42.000 --> 16:45.000
I basically just used to, uh, LVM for that.

16:45.000 --> 16:46.000
Yeah, yeah, go ahead.

16:46.000 --> 16:50.000
My cable doesn't feed here, so I cannot plug the controller.

16:51.000 --> 16:52.000
Yeah.

16:52.000 --> 16:54.000
So, I'd actually still have one question here.

16:54.000 --> 16:56.000
Would you have a couple of minutes, huh?

16:56.000 --> 17:01.000
I just wanted to know, uh, you said you've tested it with a handful of games.

17:01.000 --> 17:02.000
Uh, what's the list?

17:02.000 --> 17:04.000
Just not a curiosity.

17:04.000 --> 17:08.000
Well, I, first I wanted to try outside bike, but outside bike.

17:08.000 --> 17:15.000
It run, but it had, it had some bug that I wasn't able to figure out what was going wrong.

17:15.000 --> 17:19.000
It's like the bike was going faster than the screen was rolling.

17:19.000 --> 17:21.000
So, it was weird.

17:21.000 --> 17:25.000
I tried Pac-Man because it's one of the simple ones and it did work.

17:25.000 --> 17:29.000
I also tried Donkey Kong, and Donkey Kong also worked.

17:29.000 --> 17:32.000
So yeah, those are the main ones I tested.

17:32.000 --> 17:35.000
Very last, hopefully quick question here.

17:35.000 --> 17:37.000
Hi.

17:37.000 --> 17:42.000
Um, if I end up to it correctly, you said that it was difficult to handle

17:42.000 --> 17:48.000
Seth modifying programs, but I was wondering why because if you're just compiling them,

17:48.000 --> 17:50.000
they should be fine at runtime.

17:50.000 --> 17:56.000
They should be able to modify their stuff, like, like the memory rights.

17:56.000 --> 17:57.000
Yeah.

17:57.000 --> 18:00.000
So the problem with self-modifying code is that whenever you are

18:00.000 --> 18:05.000
Recompiling, when you, when you recompile a code block, you start that, uh,

18:05.000 --> 18:11.000
recompiled, uh, actual machine code for the host, for the host machine, you save it.

18:11.000 --> 18:16.000
But for self-modifying code, every time you write to memory, you need to check

18:16.000 --> 18:21.000
all the code you have already recompiled and say, okay, have I written to a memory

18:21.000 --> 18:23.000
address that I had cashed?

18:23.000 --> 18:28.000
And you have to check that all the time, every time you do, uh, a memory

18:28.000 --> 18:29.000
right.

18:29.000 --> 18:35.000
So it is possible, but it, it slows down the program a lot, and you have to do it right

18:35.000 --> 18:39.000
to know when you invalidate memory addresses one not.

18:39.000 --> 18:42.000
Yeah, that is it.

18:42.000 --> 18:43.000
Many thanks, Alex.

18:43.000 --> 18:45.000
Now it's time for Michelle.

18:45.000 --> 18:47.000
Thank you.

