WEBVTT

00:00.000 --> 00:16.360
So welcome to our 2025 FOSDEM Open Hardware and CAD/CAM devroom.

00:16.360 --> 00:20.600
I am very happy to welcome you.

00:20.600 --> 00:27.800
You got in early, which means you got a seat and we're going to be starting with a very

00:27.800 --> 00:33.520
interesting talk on an 8-bit architecture from SDCC.

00:33.520 --> 00:42.720
This is Philipp, of course, who will be giving us an overview, and I hope that this is as interesting

00:42.720 --> 00:45.480
to you as it is to me.

00:45.480 --> 00:47.480
So, go ahead, please.

00:47.480 --> 00:48.480
Thanks.

00:48.480 --> 00:55.480
Thank you very much for your time.

00:55.480 --> 01:00.600
Let's start with the background, a quick introduction. 8-bit architectures, I mean,

01:00.600 --> 01:06.080
I hope they're still relevant, because they sit in between: on the lower side there are

01:06.080 --> 01:11.520
four-bit microcontrollers, and on the other side of the spectrum we have ARM and RISC-V.

01:11.520 --> 01:15.480
Actually, a while ago the cyclocomputer from my bike broke, and I took it

01:15.480 --> 01:18.640
apart, and it had a four-bit microcontroller in it.

01:18.640 --> 01:20.640
So these things still exist.

01:20.640 --> 01:25.840
Typically, sometimes I would say if you're lucky, they're programmed in C, but

01:25.840 --> 01:29.680
there's still plenty of stuff on the market where there's no compiler at all, and you have

01:29.680 --> 01:31.680
to program them in assembler.

01:31.680 --> 01:39.920
They tend to be not that expensive, typically about a cent to a euro; there are also sometimes

01:39.920 --> 01:43.120
cheaper ones.

01:43.120 --> 01:50.800
The data memory is in the range of about 30 bytes to about 8 kilobytes, typically, in

01:50.800 --> 01:51.800
these systems.

01:51.800 --> 02:00.000
You have a few kilobytes of program memory, either in the form of flash or fuse PROM or

02:00.000 --> 02:03.600
similar thing like that.

02:03.600 --> 02:06.800
There's a lot of these devices being made, yeah.

02:06.800 --> 02:12.920
They're selling a few billion of them; and if one costs just a cent, you can guess the

02:12.920 --> 02:20.600
numbers. The architectures, however, are often not so nice: either they are ancient,

02:20.600 --> 02:27.080
like the 8051; that's not really a good target for a high-level language, frankly. I mean, I'm talking

02:27.080 --> 02:31.040
about C, but the same would hold for any other high-level language that we'd try to use

02:31.040 --> 02:32.040
it with.

02:32.040 --> 02:37.600
And then there's a few popular proprietary architectures, that all have their strong and

02:37.840 --> 02:44.000
weak points, but only one manufacturer; and when that one goes out of business, or just decides

02:44.000 --> 02:48.760
they want to migrate their customers to their new ARM line, well, the architecture

02:48.760 --> 02:51.720
dies with it.

02:51.720 --> 02:56.440
Okay, that much about 8-bit; now let's get to the Small Device C Compiler.

02:56.440 --> 03:00.840
Basically, the Small Device C Compiler is the free C compiler for 8-bit architectures.

03:00.840 --> 03:06.800
Yes, there are a few architectures supported by GCC or LLVM, but most 8-bit architectures

03:06.800 --> 03:12.080
either don't have a free compiler, or it's SDCC.

03:12.080 --> 03:21.000
It supports current standards; I forgot to update the slide here, because the ISO C23

03:21.000 --> 03:26.200
standard has been officially released recently, and it is already supported, just

03:26.200 --> 03:30.320
like the earlier ones. Typical use is as a freestanding implementation, but sometimes also as

03:30.320 --> 03:36.720
a hosted one. The supporting tools are there as well: assembler, linker, simulators for

03:36.720 --> 03:41.480
the architectures we support. It works on the most common host operating systems,

03:41.480 --> 03:48.680
such as the Hurd. And it targets, of course, a lot of 8-bit architectures: the MCS-51, that's

03:48.680 --> 03:54.520
probably the most important one, because the 8051, despite not having been made by Intel for

03:54.600 --> 04:01.520
decades, is everywhere: probably nearly every one of you in this room has at least

04:01.520 --> 04:07.520
one with you right now. For example, if you have a Realtek network or Wi-Fi card, then you've

04:07.520 --> 04:14.360
got, somewhere, a small 8051 core in it. So if those Realtek network cards ever

04:14.360 --> 04:19.520
get free firmware, SDCC will be the compiler that it will be compiled with.

04:20.400 --> 04:29.520
Okay, so there's variants of the 51, there's the Z80 and a lot of Z80 derivatives, then we have

04:29.520 --> 04:37.760
things like the S08, an NXP microcontroller mostly used in automotive applications, the STM8,

04:37.760 --> 04:45.320
a few other architectures, and the PDKs, Padauk's very low-cost devices; you can often get them for

04:45.360 --> 04:49.640
less than a cent, even if you buy just a single unit. There's some interesting things

04:49.640 --> 04:56.560
there, such as what is something like an 8-core microcontroller, with 60 bytes of RAM shared by

04:56.560 --> 05:02.680
all 8 cores, and stuff like that. And yeah, the compiler has some unusual optimizations that

05:02.680 --> 05:09.120
make sense for these targets, but wouldn't make much sense for bigger architectures, in particular

05:09.120 --> 05:14.280
in register allocation, because these are typically highly irregular architectures. For the targets

05:14.280 --> 05:19.800
that you have in GCC or LLVM, graph-coloring register allocation is perfect, cause

05:19.800 --> 05:25.320
registers are much faster than memory, and registers are kind of equal; as long as you

05:25.320 --> 05:32.160
allocate into a register, that's what matters. On the other hand, here you have few registers,

05:32.160 --> 05:38.960
and, very often, no two registers are interchangeable, so it really matters which register

05:38.960 --> 05:46.040
a variable will go into, if it goes into a register at all. There's two big user groups of SDCC, one

05:46.040 --> 05:52.280
is embedded developers, firmware developers, and so on; and the other is retro

05:52.280 --> 06:00.920
computing, the retro gaming people; well, there's 8-bit architectures there, too. Okay, so let's

06:00.920 --> 06:05.680
get to the lessons learned. Cause, I mean, as an SDCC developer, that's my background;

06:05.680 --> 06:12.000
I have a computer science education, and I had some ideas of computer architecture, but the

06:12.000 --> 06:19.040
work on SDCC really gives me an idea of what you need in an architecture, what

06:19.040 --> 06:25.040
you want in an architecture, so that it makes a good target for a high-level language. Because,

06:25.040 --> 06:29.200
well, you see, we support lots of them, but none of them is kind of perfect, even

06:29.200 --> 06:36.440
though a lot of them have very great features and a really good architecture. So I guess the very

06:36.440 --> 06:40.440
first point is probably the most important: you need efficient stack-pointer-relative

06:40.440 --> 06:45.520
addressing, yeah. If you want to implement C the straightforward way, you have local variables

06:45.520 --> 06:51.680
on the stack, and you need to be able to access them and generate efficient code

06:45.520 --> 06:51.680
for accessing the stack; and unfortunately a lot of these architectures don't have that. If

06:51.680 --> 06:56.680
you have a Z80, you can use an index register to set it up as a frame pointer; that's kind

06:56.680 --> 07:00.800
of okay. On the STM8, you have the stack-pointer-relative addressing modes

07:00.800 --> 07:05.080
that you want. But for most of them, you don't. Instead, you have to take the stack pointer,

07:05.080 --> 07:10.040
add an offset, put that into an index register; every time you access a variable, it's

07:14.400 --> 07:21.760
inefficient. A unified address space makes things much more efficient. I mean, you're used

07:21.760 --> 07:25.360
to that, maybe, from the big architectures, but in the 8-bit world, well, there's more out

07:25.360 --> 07:32.160
there. If you have an 8051, and you have a pointer in SDCC, and you just dereference

07:32.160 --> 07:37.440
the pointer, what we do is we look at some bits and then call a helper function,

07:37.440 --> 07:44.000
just for the read from memory via a pointer, because there's five different address spaces

07:44.000 --> 07:48.520
in the hardware, and depending on which one the pointer points into, you need to do a

07:48.520 --> 07:53.400
different instruction to read from memory. And that, of course, is a huge overhead for

07:53.400 --> 07:58.520
any pointer access. Named address spaces do help, cause an access to a known space does not go through the

07:58.520 --> 08:05.160
memory-access helper. And then, hardware multithreading can be great for replacing peripherals.

08:05.160 --> 08:12.720
I mean, if you look at an STM8, it's a very powerful device; but of all the peripherals, 90% of them are not used

08:12.720 --> 08:17.080
in any given application; that's just dead silicon, but it still adds to the

08:17.080 --> 08:21.600
cost. Even if you have a time-critical peripheral, you can still emulate it in software, by

08:21.600 --> 08:26.600
running it on an extra core. But if you want to do multithreading, you need support in the architecture

08:26.600 --> 08:30.720
for that: you want to communicate between the threads, so you want something that

08:30.720 --> 08:37.240
allows you to, let's say, implement C atomics at all; and many of them don't have it. Then we get

08:37.240 --> 08:41.080
to the irregular architectures. As I mentioned before: no two registers are the same, there

08:41.080 --> 08:45.240
are few registers, and so on; so it doesn't work well with Chaitin's graph-coloring

08:45.240 --> 08:49.720
allocation. But the tree-decomposition-based register allocators, unlike the graph-coloring

08:49.720 --> 08:55.920
approaches, well, okay, they can handle it, especially for a small number of registers, so it's

08:55.920 --> 09:01.360
possible to generate efficient code for these. We want fast 8- and 16-bit

09:01.360 --> 09:07.760
operations: 8-bit is what you use to conserve memory, 16-bit is what you use because,

09:07.760 --> 09:12.960
well, int and pointers are 16-bit; so you need to be able to support that well too. And

09:12.960 --> 09:19.520
here the pointers should ideally be 16-bit, because if you have a wider address space, then

09:19.520 --> 09:23.840
you get into the realm where you have ARM and RISC-V and those other things; no point

09:23.840 --> 09:29.240
trying to compete with them anymore these days. If you have something that's smaller,

09:29.240 --> 09:34.400
you can just address everything in a uniform way, especially your program memory, where you

09:34.400 --> 09:39.400
want to be able to keep constants; you don't want to have to copy constants to data memory

09:39.400 --> 09:46.760
first at startup, because that will cost data memory, and that's valuable. Okay, let's get to

09:46.760 --> 09:52.280
some details from here. A lot of these 8-bit architectures have something like a

09:52.280 --> 10:01.000
zero page, where you have like 256 or 128 bytes of a kind of scratchpad memory,

10:01.000 --> 10:07.000
as I think it's usually called in computer architecture. And the point is, it's actually

10:07.000 --> 10:11.320
not that useful if you have at least stack-pointer-relative addressing, cause then you can do

10:11.320 --> 10:18.520
everything with your local variables on the stack efficiently. And index-pointer-relative read instructions

10:18.520 --> 10:25.240
are important, because that allows you to go through chains of structs, or access struct members,

10:25.240 --> 10:32.360
efficiently. And the classic way of CISC architectures, namely having prefix bytes, is an

10:32.440 --> 10:39.880
efficient, memory-efficient encoding for your instructions. I mean, these CISC architectures

10:39.880 --> 10:46.600
often are more efficient in terms of code size than, let's say, RISC-V's compressed instructions,

10:46.600 --> 10:52.280
or things like that, because many instructions are actually just 8 bits, and then there's

10:52.280 --> 10:57.720
variants with a prefix byte, that are then 16 bits, to do something else. Hardware multiplication is useful:

10:57.800 --> 11:02.200
it's of course expensive to have a multiplier, but 8-times-8-to-16 is not that expensive,

11:02.200 --> 11:08.920
and it's very useful, not just for multiplications that you see in code, but also for things like

11:08.920 --> 11:14.760
array accesses: when you have an array of structs of non-power-of-two size, and you don't want to

11:14.760 --> 11:19.400
add padding to round it up to a power of two, because you don't have a lot of memory; so you need multiplications for array accesses.

11:19.400 --> 11:26.840
Division, on the other hand, is a relatively expensive instruction in hardware;

11:27.480 --> 11:33.160
you can leave it out. Multiply-accumulate instructions help you build wider multiplications from the smaller

11:33.160 --> 11:39.320
ones. And, interestingly, the ancient binary-coded-decimal support in the hardware

11:40.360 --> 11:47.800
can actually be useful for converting from binary to ASCII, and therefore for printing numbers,

11:47.800 --> 11:52.280
if you don't have hardware division. So if you don't have hardware division, and you do have this

11:52.280 --> 11:57.000
BCD support, it's something that's cheap and actually still useful.

11:57.000 --> 12:02.280
And shift and rotate instructions, of course, are also useful, for example for implementing bitfields. I mean, bitfields often get a bad

12:02.280 --> 12:07.400
name, cause the compiler could add padding, and people just think of using them to make a fixed

12:07.400 --> 12:12.440
layout; but on these architectures, using bitfields just to conserve memory is also a major application,

12:12.440 --> 12:17.240
and then it's fine to let the compiler handle the details. You just want to make sure you don't

12:17.240 --> 12:22.680
waste any memory: if you have only 60 bytes of RAM in your 8-core device, you don't want to

12:22.680 --> 12:31.160
waste a byte or two in your struct.

12:31.160 --> 12:39.400
Okay, so basically, that's all about the introduction and the lessons learned. And where did I actually end up? Yeah: an 8/16-bit

12:39.560 --> 12:47.880
architecture, an irregular CISC architecture. The core will be a bit bigger than a typical RISC,

12:47.880 --> 12:55.320
but it's worth it, because such an encoding is more efficient. To illustrate that:

12:57.480 --> 13:02.440
this is a dual-core 8-bit microcontroller, one of those things that cost only one cent, from

13:02.520 --> 13:09.560
Padauk, that I decapped and put under a microscope. Its die is not much more than a square

13:09.560 --> 13:17.800
millimeter, and it's one of the biggest devices Padauk makes. And we can see the regular

13:17.800 --> 13:22.840
structure on the lower right, plus the surrounding stuff: that's the program memory with its access

13:22.840 --> 13:29.000
decoders. So that takes up a substantial part of the whole die. So if you have an efficient encoding

13:29.080 --> 13:35.480
for your instructions, you can go with a smaller program memory; even if the core on the upper left

13:35.480 --> 13:48.280
gets a bit bigger, it's worth it. Okay; on the other hand, if you go to very small devices,

13:48.280 --> 13:53.080
where you have only one, or maybe only half a kilobyte, of program memory anyway, you don't want

13:53.080 --> 14:00.280
a big core, cause it would be as big as the program memory itself, or more. So I have a sub-variant of the

14:00.280 --> 14:11.160
F8 that is a bit simplified, to make the core a bit smaller; it leaves out a few instructions.

14:11.160 --> 14:18.040
Sure, code size goes a bit up compared to the regular F8, but it's an option that you want to have.

14:18.040 --> 14:30.040
So, yeah, I started work on this a bit, and I got some support from NLnet and the NGI

14:30.040 --> 14:36.200
Zero Core programme, so that allowed me to put quite some time into it. And, well, we have a port in

14:36.200 --> 14:43.560
the compiler: as of the current SDCC release from a week ago, there's an F8 port. Now, the current

14:43.640 --> 14:51.240
F8 is not fully fixed, there might be some small changes in the instruction set, but current SDCC

14:51.240 --> 14:59.800
supports the currently existing soft-core implementations of the F8. There are implementations of F8 and F8L;

14:59.800 --> 15:06.120
there's a GitHub repository with a little bit of documentation

15:06.120 --> 15:13.240
of the architecture, and tutorials to put the soft core onto FPGA boards. And, for example,

15:13.240 --> 15:21.880
this thing here, that counts up seconds from zero to 31, that's running on an F8 soft core.

15:21.880 --> 15:27.480
The program is written in C; it uses interrupts and timers to do the exact second counting.

15:28.280 --> 15:33.320
It's running on a Lattice FPGA, which is supported by free synthesis tools,

15:34.280 --> 15:39.960
and more complex programs also work: there are benchmarks

15:39.960 --> 15:47.720
running their self-tests, and so on. The F8 core also works on

15:47.720 --> 15:55.320
the CologneChip FPGA, and I've started porting it to Gowin; there's still some issue with the interrupts,

15:55.320 --> 16:01.560
might be because the nextpnr support for those parts is still experimental. But yeah, we have a working soft core,

16:02.520 --> 16:09.400
we have the port in the compiler; we don't have any ASIC or anything yet, and the F8L sub-variant

16:09.400 --> 16:14.360
is not yet supported in SDCC. But that's the current state of the F8.

16:15.320 --> 16:25.320
Thank you, everybody, questions?

16:27.880 --> 16:29.880
What about these two names, are they existing names?

16:29.880 --> 16:36.760
They look cool. Yeah, F8 is an old Fairchild design; it's the first microcontroller, I think.

16:37.560 --> 16:42.440
There was a Fairchild device; one can call it a microcontroller, but I

16:42.520 --> 16:49.880
don't think it was actually a single chip. And the F of it was the Fairchild F. But actually,

16:49.880 --> 16:55.720
there's so many 8-bit architectures, it's not easy to choose any two- or three-character

16:56.600 --> 17:01.240
name that hasn't been in use for some 8-bit thing at some time already.

17:02.200 --> 17:15.640
Yeah, there's lots of computer science courses that make up some sort of architecture for

17:15.640 --> 17:21.400
educational purposes, for teaching assembly. Do you think this would be a suitable replacement for those,

17:22.200 --> 17:24.200
such that it actually also has real use?

17:25.160 --> 17:32.040
I mean, for teaching, it depends. I mean, for teaching, very often you do a very simple architecture,

17:32.040 --> 17:40.360
yeah, you leave out details that you don't want. This is a CISC architecture, and that's of course

17:40.360 --> 17:46.600
a lot more complicated than a simple RISC thing; of course, if you want to introduce CISC,

17:46.680 --> 17:53.720
it might work. But in the end, well, for a real-world, efficient 8-bit architecture, you need some things

17:53.720 --> 18:00.200
that are fine if you want to go in depth into this topic; but for a quick introduction

18:00.200 --> 18:04.920
to computer architectures, you probably just want to leave them out and just get the basics there.

18:05.000 --> 18:09.640
Okay, thank you very much.

18:13.880 --> 18:18.840
Oh, one more question, do we have time for it?

18:20.840 --> 18:27.320
I intend to do that, but I don't have any concrete, like, timeline. The next steps will be

18:27.320 --> 18:34.440
to have the F8L supported in the compiler, and to work on the simulator: because of course you can

18:34.520 --> 18:40.280
use Verilator to simulate, but that's slow; so, to do more simulation and compiler testing, we also have

18:40.280 --> 18:46.200
a different architectural simulator that currently only supports the regular F8. But for the F8L

18:46.200 --> 18:51.480
subset: of course, an F8L program is also a valid F8 program, so it would just run; but you can't

18:51.480 --> 18:57.640
count on it to detect invalid instructions in the simulator. So the next few months will definitely

18:57.640 --> 19:06.120
still be non-ASIC work; but after that, I intend to also look into the possibility of ASIC implementations

19:06.120 --> 19:10.600
again. But of course you have the long turnaround times there, yeah: you submit your stuff to the

19:10.600 --> 19:15.400
hub, and then it takes like half a year, or even three quarters of a year, to get things back, until you can

19:15.400 --> 19:21.240
start your debugging. That's one point. And then there's the thing that some of these MPW shuttles

19:22.200 --> 19:29.480
have relatively small space, and you're sharing with others, so we have to see how this affects

19:29.480 --> 19:36.760
the possibilities. But in general, yes, I intend to try the F8 on ASIC. Okay, thank you very much.

