WEBVTT

00:00.000 --> 00:15.440
Hello and welcome to this session on keeping up with AOSP. You already know Amit, he presented

00:15.440 --> 00:21.380
the writing talk earlier. My name is Sumit Semwal and I am the senior technical lead for Android

00:21.380 --> 00:28.160
at Linaro. Our team at Linaro works on all things Android. So, before we begin, just a trademarks

00:28.160 --> 00:36.280
disclaimer. And so, what's the agenda? Well, it's simple. I will talk about some of the

00:36.280 --> 00:40.640
many new features in AOSP in the recent past and then Amit will talk about the enablement

00:40.640 --> 00:50.120
of these along with some gotchas and pain points that were there. A quick recap on Linaro

00:50.120 --> 00:56.120
and development boards and AOSP. So, I hope some of you may know about Linaro and the work

00:56.120 --> 01:01.160
that we do in the open source ecosystem. We have been a very long-time advocate of AOSP and

01:01.160 --> 01:08.080
development boards in it. We support AOSP on a variety of member devboards. We do extensive testing

01:08.080 --> 01:12.280
including CTS and VTS combinations, and with these we report and fix AOSP and upstream

01:12.280 --> 01:18.200
regressions. So, that's why these devboards are relevant and important for most of us and

01:18.200 --> 01:23.880
the work that we do. For years we have been talking about the importance of devboards in AOSP at

01:23.880 --> 01:32.600
different forums like Plumbers, ELC, Linaro Connect, etc. And at EOSS in mid-2023 I

01:32.600 --> 01:37.960
gave a talk about the pain points and rewards of maintaining devboards in AOSP. So, we will look

01:37.960 --> 01:44.760
at some of the new features that have landed in AOSP since then. Let's start with the boot

01:44.760 --> 01:50.640
process. So, the Android boot process, like many other things, is complicated and requires vendors

01:50.720 --> 01:57.040
to implement almost annually changing boot requirements. The developers recognize that and as

01:57.040 --> 02:02.480
a result, they have abstracted the Android specifics of the boot process into an Android boot

02:02.480 --> 02:08.400
UEFI application which they call the Generic Bootloader, or GBL. More details on the link there.

02:08.400 --> 02:15.040
But the idea is that any bootloader that supports running an EFI app and provides some

02:15.040 --> 02:22.800
basic requirements should be able to have an Android enabled bootloader part. There are various EFI

02:22.800 --> 02:28.400
protocols that have been implemented and proposed. So, if you are interested, go and have a

02:28.400 --> 02:36.000
look at those links there. Also related to the boot process: previously, on Android systems,

02:36.880 --> 02:43.200
the specification of boot devices was via a platform specific string. This proved to be very

02:43.280 --> 02:50.160
non-stable because things like sysfs paths can change and suddenly your platform-specific

02:50.160 --> 02:56.480
string doesn't work, say, across kernel upgrades. To help with that, support was added recently

02:56.480 --> 03:03.040
to system/core to allow searching for boot devices via UUIDs, and this makes booting

03:03.040 --> 03:12.160
from external devices much easier. For me, this has been an exciting change in AOSP in

03:12.240 --> 03:18.480
recent times: the support of 16k page sizes in Android, starting from Android 15.

03:19.760 --> 03:25.520
This results in a 5 to 10% performance boost and about 9% cost on the space side.

03:26.480 --> 03:32.480
How it is achieved is with a page-agnostic OS view, which includes doing runtime

03:32.480 --> 03:39.360
page size calculation, OS binaries being aligned to 16k page sizes, etc. And then with that,

03:39.360 --> 03:45.920
once an application is updated to be page-agnostic, then it can run without change on a 4k or a 16k

03:45.920 --> 03:51.840
page size supporting device. Now, the most optimal way to work with file systems is if the

03:51.840 --> 03:57.200
file system block size was also the same as the page size. So, file systems like EROFS and F2FS

03:57.200 --> 04:05.680
have been made 16k block size compatible. The next relevant feature is something that is more

04:05.680 --> 04:11.200
relevant for folks that keep their device drivers while they are working on upstreaming them

04:11.200 --> 04:17.680
out of tree. So, this is called the Driver Development Kit and it helps vendors to develop

04:17.680 --> 04:24.080
GKI modules against the GKI builds using the right tools, like hermetic builds, and

04:24.080 --> 04:28.960
correct resources like headers and makefiles. This helps them keep the vendor modules

04:28.960 --> 04:38.000
easy to manage with GKI while they are working on upstreaming them. When we work with GKI,

04:39.600 --> 04:45.280
there are kernel modules that are already upstream but need to be built as modules.

04:45.280 --> 04:52.720
So, those are signed and made immutable during the freeze of the KMI and then they can't be overridden

04:52.800 --> 05:01.360
by vendors. So, basically, if you take, say, Android 6.12 kernels, then the upstream kernel

05:01.360 --> 05:07.760
modules are not mutable by the vendors after they have been upstreamed. These are called protected

05:07.760 --> 05:13.680
modules and sometimes what might happen is vendors may have pushed out a basic support for a

05:13.680 --> 05:19.280
kernel module and later on added new features. But if they are within the KMI freeze time,

05:19.360 --> 05:24.960
then as protected modules, they cannot be overridden. To allow these new features to be available

05:24.960 --> 05:30.640
in existing products, for example, during an update, GKI provides the option of unprotected GKI

05:30.640 --> 05:37.120
modules. So, they can backport these newer features that are upstream and still use them

05:37.120 --> 05:43.040
in an update for a device that comes later on. These are called unprotected GKI modules,

05:43.040 --> 05:48.320
but the catch is that they are treated as vendor or out-of-tree modules and have the same restrictions

05:48.400 --> 05:58.320
for KMI symbols. Talking about these, how do we make sure that this atomic update

05:58.320 --> 06:04.000
happens? So, there is a thing called DLKM, or dynamically loadable kernel

06:04.000 --> 06:10.720
module partitions in the Android system, which allows updating GKI and GKI modules independently

06:10.720 --> 06:16.320
of the rest of the partitions. And the protected modules then go in something called system_dlkm

06:16.320 --> 06:24.000
while the vendor and unprotected modules are put in vendor_dlkm partitions. A related thing

06:24.000 --> 06:28.640
around modules and module loading is parallel kernel module loading, which, as the name suggests,

06:28.640 --> 06:34.160
allows multiple independent kernel modules to be loaded in parallel. And a driver can

06:34.160 --> 06:42.000
advertise its availability for parallel loading by setting the probe type of its

06:42.080 --> 06:47.440
platform driver to prefer asynchronous. More details on the link here.

06:50.240 --> 06:56.000
You may be aware of the Android shared memory, or ashmem, which was an Android-specific

06:56.000 --> 07:01.120
file-descriptor-based shared memory. Now, upstream has something called memfd, which provides pretty much

07:01.120 --> 07:08.000
the same functionality, and to allow using that with AOSP, an ashmem compatibility

07:08.080 --> 07:15.200
layer for memfd was added, which allows your older apps to still be able to use an upstream

07:15.200 --> 07:22.960
memfd FD. As new GKI kernel branches are tagged, there may be some issues that are found

07:22.960 --> 07:29.600
during these merges to Android common kernel branches. And so the developers

07:29.600 --> 07:35.680
at Google have started keeping these errata public. And this is an example of the latest

07:35.680 --> 07:39.200
android16-6.12 kernel codebase and the issues they found while merging.

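NOTE
For reference, a small Python sketch of the upstream memfd interface mentioned a moment ago: an anonymous, file-descriptor-based memory region. It only shows memfd itself, not the ashmem compatibility layer, it runs on Linux only, and share_bytes and the region name are hypothetical names chosen for illustration.
```python
import mmap
import os
# Hypothetical helper: write payload through one mapping of a memfd and
# read it back through a second mapping of the same descriptor.
# payload must be non-empty, since an empty file cannot be mapped.
def share_bytes(payload: bytes) -> bytes:
    fd = os.memfd_create("demo_region")  # Linux only, Python 3.8+
    try:
        os.ftruncate(fd, len(payload))
        with mmap.mmap(fd, len(payload)) as writer:
            writer[:] = payload
        with mmap.mmap(fd, len(payload), access=mmap.ACCESS_READ) as reader:
            return reader[:]
    finally:
        os.close(fd)
if __name__ == "__main__":
    print(share_bytes(b"hello"))
```
In real use the descriptor would be handed to another process, which maps the same region.
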
07:42.480 --> 07:48.880
This is also something new and it's been talked about at various conferences. So, virtualization

07:48.880 --> 07:56.000
in Android. So, AVF and pKVM, there is an overview link there. I can just talk briefly about it.

07:56.000 --> 08:01.120
So, Android virtualization framework, it provides a secure and private execution environment for

08:02.080 --> 08:08.320
executing code in Android. pKVM is the KVM hypervisor, but further hardened to disallow the host

08:08.320 --> 08:15.200
from accessing guest memory by default. And Google's implementation of AVF uses pKVM as the hypervisor

08:16.640 --> 08:22.240
and crosvm as the VMM building block. And then, various services are implemented on top of this.

08:22.240 --> 08:27.680
So, this allows you to run, for example, you can run Debian in a VM on an Android device today.

08:28.640 --> 08:37.120
Recently, a new hypervisor driver is being upstreamed to support SMMUv3 in pKVM. And there are also

08:37.120 --> 08:44.240
various how-tos, in case you are interested in experimenting, on how to use custom virtual machines,

08:44.240 --> 08:49.840
updatable virtual machines and device assignment in AVF. You could go to the link and have a

08:49.840 --> 08:55.520
look at that. There is a full presentation next by Serban on Trunk Stable. So, I will just

08:55.600 --> 09:00.480
give a one-line introduction. There is a new development process for Android, which is called Trunk Stable.

09:02.320 --> 09:11.440
And with that, I hand over to Amit, who is going to take you through the rest of the presentation.

09:12.400 --> 09:26.400
Hello. So, in this part of the session, we will look into the

09:27.680 --> 09:33.760
technicalities and the realities of AOSP development from a device maintainer and developer point of view.

09:34.560 --> 09:40.400
The device may be part of AOSP or hosted on external repositories.

09:40.480 --> 09:46.160
We will cover both the cases. I will share the common pitfalls that we encounter while working on

09:46.160 --> 09:53.360
AOSP, the struggles, the failures, the success stories. I will talk about the set of AOSP

09:53.360 --> 09:58.720
breakages, the set of upstream breakages and our experiments with some of the new features that

09:58.720 --> 10:06.480
Sumit has just talked about. So, this is what the AOSP reference board page looks like as of today.

10:07.360 --> 10:14.240
These reference boards are the hardware platforms that are used as reference for developing and testing

10:14.240 --> 10:22.320
the Android Open Source Project. These reference boards provide a starting point for the device

10:22.320 --> 10:34.480
manufacturers to build their own devices. So, one of the most important aspects of supporting a

10:34.480 --> 10:41.360
device in a big project like AOSP is to keep up with the always

10:41.360 --> 10:47.120
churning changes. In this slide, I will share the common pitfalls that we encounter while

10:47.120 --> 10:53.920
working on AOSP. I will start with the most common one, which is related to the AOSP repo sync.

10:55.120 --> 11:01.200
At times we run into random runtime failures which are, or which could be, totally

11:01.200 --> 11:06.720
unrelated to the project that we have been working on. So, this is most likely related to the

11:06.720 --> 11:13.360
AOSP repo sync timing. So, it is advisable to re-sync the sources after some time or wait for

11:13.360 --> 11:20.480
the internal pre-submits to catch and fix the regressions internally. If the problem persists,

11:20.480 --> 11:26.560
then you know that it may be one or more device configuration that you may need to change based

11:26.560 --> 11:32.400
on the new changes which you have just downloaded. And then there are some core frameworks

11:32.400 --> 11:40.160
specific changes that need corresponding device configuration changes. The examples which I

11:40.160 --> 11:46.640
can think of is the transition from HIDL to AIDL. Another example which we run into

11:47.520 --> 11:53.840
some time back is in the libhwui Vulkan manager. There are a few changes which assume

11:53.840 --> 12:01.840
that the underlying Vulkan backend supports two or more queues for the full Vulkan framework to

12:01.840 --> 12:08.960
work. But that is not true for all the Vulkan drivers, at least for some drivers which we have played with.

12:09.600 --> 12:16.000
So, we could not switch to the full Vulkan support and we fell back to using the OpenGL backend for that.

12:17.760 --> 12:23.360
Keeping up with the ever-changing build configuration is something which we need to look out for.

12:23.600 --> 12:28.960
If you maintain external projects then make sure that the Soong properties, or the Soong,

12:31.360 --> 12:39.200
just look out for the Soong changes. I know because I ran into a build failure and it took me a while

12:39.200 --> 12:43.200
to figure out that it is not the codebase which I am using it is something which has changed

12:43.200 --> 12:50.720
suddenly in the core framework that I had to take care of. The Bazel migration, so, it was a

12:50.800 --> 12:57.120
big hype like two or three years ago that everything will move to the Bazel build system because

12:57.120 --> 13:04.800
they wanted to move to a more hermetic style of building. But from whatever I have

13:04.800 --> 13:15.520
seen so far, the move to Bazel is being limited to just GKI and GBL. Next up is tracking upcoming

13:15.520 --> 13:23.120
AOSP features. As new features keep getting added in AOSP, the device maintainers have to take

13:23.120 --> 13:28.960
care of that. They do not necessarily have to implement everything but it is good to have the

13:28.960 --> 13:38.400
core features or the important bits implemented if not all. I will start with GBL so we talked about

13:38.400 --> 13:45.280
the GBL. We were actually looking into GBL since its inception; we provided feedback, we were

13:45.280 --> 13:53.600
testing the half-baked things and everything, because we have been bitten by legacy bootloaders a lot.

13:53.600 --> 14:02.080
If you are working on an older device you may be aware maybe you are lucky but at least in our

14:02.080 --> 14:09.280
cases our bootloaders did not get updates as much as we would love them to, right? And then there were the

14:09.280 --> 14:14.240
ever-changing bootloader requirements year after year, starting from boot image header version

14:14.240 --> 14:21.040
two, three, four, then adding init_boot. It seemed like for every problem the solution

14:21.040 --> 14:29.280
was to add one more partition. So again, as Sumit mentioned, GBL is the generic bootloader,

14:29.760 --> 14:36.160
a UEFI app which decouples Android-specific components from the vendor implementation.

14:37.120 --> 14:43.520
It is very promising; it is going to help a lot of devices and small OEMs to move away from

14:43.520 --> 14:52.640
vendor-locked bootloader implementations to the latest Android bootloader features.

14:53.920 --> 14:59.120
Once we found out that GBL is in a relatively stable state, we enabled it for a few

14:59.120 --> 15:05.040
of our devices in AOSP, right? So the link is there; if you just click on that you will figure

15:05.040 --> 15:09.920
out how to enable the configurations which you need, and I will not go into much detail on that.

15:11.120 --> 15:17.120
Next up, 16k page size builds: we also added 16k page size build support for all of our

15:17.120 --> 15:27.120
devices. For those who are not aware of this, 16k page sizes do not support 32-bit apps,

15:27.120 --> 15:32.560
at least in the source code which is in AOSP. I know that vendors will find a way to put some kind of

15:33.360 --> 15:39.120
layers in between so that you can run 32-bit apps, but if your product is based on AOSP,

15:39.120 --> 15:44.960
if you are playing with AOSP, then 16k is going to support only 64-bit apps.

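NOTE
A minimal Python sketch of the runtime page size calculation mentioned earlier in the talk, not AOSP code. The helper names page_size and align_up are hypothetical; the point is that sizes are derived from the page size the system reports rather than from a hard-coded 4096.
```python
import mmap
# Hypothetical helper: ask the running system for its page size.
def page_size() -> int:
    return mmap.PAGESIZE
# Hypothetical helper: round a length up to a whole number of pages.
# Assumes page is a power of two, which holds for 4k and 16k pages.
def align_up(length: int, page: int) -> int:
    return (length + page - 1) & ~(page - 1)
if __name__ == "__main__":
    page = page_size()
    print(page, align_up(5000, page))
```
The same align_up call gives 8192 for a 5000-byte request with 4k pages and 16384 with 16k pages, so nothing in the caller needs to change.
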
15:46.720 --> 15:53.760
Okay, so we also ran into a bootloader crash while trying to enable 16k page sizes. It turned out that

15:53.760 --> 16:01.440
our bootloader was only 4k capable, and luckily we were not the only ones.

16:01.520 --> 16:08.080
And as we were discussing earlier, with a quick grep in the build source code we could figure

16:08.080 --> 16:13.520
out that, okay, there is a flag; if you set it, you can just boot. If you set it to

16:13.520 --> 16:19.440
4096, it will take care of the image loading part and your bootloader does not need to worry about

16:19.440 --> 16:25.200
the user space build, whether it is 4k, 16k or something else.

16:26.000 --> 16:33.360
Right, so recently there were a few more system/core changes which caught our attention.

16:34.400 --> 16:38.800
These changes simplify booting and running from external storage devices.

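NOTE
A rough Python sketch of the idea behind the UUID-based boot device search mentioned earlier in the talk, under stated assumptions; it is not the system/core implementation. The function names, the sample device table and the UUID values are made up for illustration, and the PARTUUID key is assumed to be present in each partition's uevent data.
```python
from typing import Dict, Optional, Tuple
# Hypothetical helper: turn KEY=VALUE uevent text into a dict.
def parse_uevent(text: str) -> Dict[str, str]:
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)
# Hypothetical helper: return the disk that owns the partition with this UUID.
# partitions maps a partition name to (parent disk name, uevent text).
def find_boot_disk(partitions: Dict[str, Tuple[str, str]], part_uuid: str) -> Optional[str]:
    wanted = part_uuid.lower()
    for disk, uevent in partitions.values():
        if parse_uevent(uevent).get("PARTUUID", "").lower() == wanted:
            return disk
    return None
# Invented sample: internal storage and a USB disk both carry a "super" partition.
SAMPLE = {
    "mmcblk0p1": ("mmcblk0", "PARTNAME=super\nPARTUUID=aaaa0000-01"),
    "sda1": ("sda", "PARTNAME=super\nPARTUUID=bbbb0000-01"),
}
if __name__ == "__main__":
    print(find_boot_disk(SAMPLE, "BBBB0000-01"))
```
Because the lookup keys on the UUID rather than on a sysfs path or a partition name, two disks that both carry a super partition are told apart.
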
16:38.800 --> 16:45.840
If you are booting from a USB disk, I mean, if you have tried it earlier, it was a pain. I mean, you

16:45.840 --> 16:53.280
cannot do it without changing the system/core framework. Maybe you can, but at least I could not do that.

16:54.080 --> 17:00.880
So, okay, the existing solution is that there is a kernel command line option,

17:00.880 --> 17:07.200
you must be aware of it, androidboot.boot_devices, where you specify the sysfs entry. It will look for that path,

17:07.200 --> 17:13.680
it will again backtrace and it will find the block device which is used to boot and

17:13.760 --> 17:24.560
run from. It works well, but what if you have internal UFS storage and internal

17:24.560 --> 17:29.680
eMMC storage but you are trying to boot from external devices? What happens is that the platform

17:29.680 --> 17:35.280
devices get enumerated first, so Android first-stage init will find the device, it will say okay, this

17:35.280 --> 17:42.880
is the boot device, but because the sysfs entry is different, because you set it for USB, not

17:42.960 --> 17:49.200
for the platform device, it will not boot; it will just get stuck and it will time out. Now this new

17:49.200 --> 18:01.440
boot hack, what it does is, you can specify a partition UUID as an argument to this

18:01.440 --> 18:06.880
androidboot.boot_part_uuid. It need not be any boot partition, it could be just any partition

18:07.040 --> 18:13.680
from that boot device. The logic is that it will trace back the boot device out of that and it

18:13.680 --> 18:21.280
will make sure that it can boot from that external storage. Another thing here, I don't know if you have

18:21.280 --> 18:29.200
encountered this or not: you cannot boot with multiple block devices which have this Android

18:29.200 --> 18:35.120
partition layout. For example, I have an Android device, the internal storage has an Android partition

18:35.200 --> 18:41.600
layout, I have plugged in a USB disk or, let's say, an SD card, and it again has an Android partition layout.

18:42.880 --> 18:49.520
GBL will trip on that, I forgot to mention that. Not just GBL will trip on that, even Android

18:50.320 --> 18:54.960
will go crazy that okay what is happening I have no idea why there are multiple duplicate

18:54.960 --> 19:02.240
partitions of super or userdata. So now this approach fixes that. So I think this is one of

19:02.240 --> 19:11.360
the main highlights, the important point of using this partition UUID: if

19:11.360 --> 19:19.520
we do this, then booting from external storage is going to be a little simpler. Sorry, I exaggerated a lot

19:19.600 --> 19:28.480
on this because I have run into a lot of issues just because of this thing alone and I'm

19:28.480 --> 19:38.080
really glad that the premium guys fixed it in AOSP. The next one is ashmem. Recently I saw

19:38.080 --> 19:46.000
a renewed effort from the AOSP guys, and they're moving away, I think that they're now finally

19:46.000 --> 19:51.840
moving away from ashmem. Ashmem is the shared memory driver; it has been an out-of-tree driver since

19:51.840 --> 19:59.440
5.18, I think, but it's still maintained in the Android common kernel because there are some

19:59.440 --> 20:07.360
legacy applications which depend on ashmem ioctls. So these compat patches which I'm talking

20:07.360 --> 20:13.920
about, they make sure that you can switch to the memfd interface, which is already upstream, and

20:14.720 --> 20:21.680
those compat ioctls will take care of running those legacy applications which depend on ashmem

20:21.680 --> 20:34.720
ioctls. Right, so tracking relevant upstream projects: maintaining AOSP devboards is

20:34.720 --> 20:40.720
like maintaining a device on a regular Linux distro, so we have to make sure that the downstream

20:40.720 --> 20:47.200
modifications, the AOSP-specific modifications or new additions which we do, should

20:47.200 --> 20:53.360
align, or I hope that they align, with the upstream codebase. For this session I have listed down

20:53.920 --> 21:00.080
the usual projects like the Linux kernel, the U-Boot bootloader, the Mesa Freedreno project, but

21:00.080 --> 21:05.040
the list can go on and on depending on the kind of features that are being supported on your device

21:05.360 --> 21:13.680
I'll start with the frequent ones. I'll list down a few breakages or a few pain points which we encounter:

21:14.240 --> 21:19.600
the occasional upstream Mesa build breakages. It used to be a lot; thankfully we don't catch

21:20.640 --> 21:24.560
new ones, or there are no new regressions, maybe.

21:25.200 --> 21:35.920
I'm talking about building the upstream Mesa project. And then, right, gfxstream. So this is again

21:35.920 --> 21:42.720
a pain point. Graphics stream is the, I don't know if it's a graphics pipeline, it's a graphics

21:42.720 --> 21:50.480
subsystem. These are the changes which help in your virtualized displays, right? So the graphics

21:50.640 --> 21:55.600
stream changes whatever they have done in AOSP these are cross platform changes right

21:55.600 --> 22:01.120
which are very bad you cannot have a dependency of one project on a different project especially

22:01.120 --> 22:07.680
when the second project is an external project like Mesa, right? So we ran into a lot of

22:07.680 --> 22:15.760
build issues because of that, so we ended up having hacks in between, and I hope that no one else

22:15.840 --> 22:23.520
has to deal with this, sorry. Then there are Linux breakages. There are a few Linux breakages which

22:23.520 --> 22:31.120
are specific to the Android common kernel. We track core Android common patches like ashmem,

22:32.480 --> 22:40.960
overlayfs for adb remount, and dm-default-key for metadata encryption. We make sure that

22:40.960 --> 22:55.200
these patches don't break on almost every merge window. Sysfs nodes: as everyone knows, I mean I'm

22:55.200 --> 23:01.920
hoping so, sysfs entries, sysfs nodes, are not stable interfaces, right? So kernel

23:01.920 --> 23:06.640
developers have made it clear from day one that, okay, you can use sysfs but don't rely on it

23:06.640 --> 23:13.920
because it's not a stable API. Android, however, uses sysfs a lot, right? SELinux access control is

23:13.920 --> 23:19.120
based on your sysfs path, so if you change anything, if you move from one kernel version to another,

23:19.120 --> 23:30.080
there's a chance that sysfs may not, sorry, that SELinux may not like it. Another thing is that if

23:30.080 --> 23:37.360
you cannot reproduce an AOSP bug on a regular Linux distro then it is very hard to sell, right? So

23:37.360 --> 23:43.920
there are times they can be very fixed on that: okay, I cannot, I don't see this issue on

23:46.640 --> 23:54.880
my system or on my Linux distro, so is it even a valid bug? Next up are firmware bugs.

23:54.880 --> 24:01.600
We sometimes run into issues because of firmware. If it's OEM-signed firmware then it's worse

24:01.600 --> 24:07.120
because then you know that you're completely locked on that. Recently there has been quite a

24:07.120 --> 24:16.160
surge in the LTS breakages due to auto-selected cherry-picks. But in the end we get to see the

24:16.160 --> 24:22.800
benefits of all the hassle and bug fixes and things when the Android release boots on your device

24:22.880 --> 24:30.480
with no additional changes, well, mostly, because we have been making sure that our devices stay

24:30.480 --> 24:38.960
in sync with the AOSP codebase throughout the development cycle. Next up I'm going to talk about

24:38.960 --> 24:44.720
maintaining, so I've talked about the devices which we maintain in AOSP, the features we enable, the

24:44.720 --> 24:50.560
gotchas with these. How about the devices which are not part of AOSP, right? So I'm going to

24:50.560 --> 24:56.320
talk about maintaining or running AOSP on the development devices which are on external repositories,

24:56.320 --> 25:01.920
on your personal repository, on your local workstations. And by the way, if you are an

25:01.920 --> 25:07.680
AOSP developer, which I'm assuming that most of us here are, and if you have not already joined

25:07.680 --> 25:15.280
this new AOSP systems developer community yet, then please do. We believe that this open collaborative

25:15.440 --> 25:21.200
initiative will be instrumental in information sharing and co-developing generic upstream solutions

25:22.160 --> 25:28.000
and it would be beneficial for the entire AOSP ecosystem. This is a snapshot from the website and it

25:28.000 --> 25:36.960
lists a handful of devices and the projects that run AOSP reliably. Now coming back to the main topic of

25:36.960 --> 25:42.960
running AOSP on external devices: all the pain points that we discussed earlier still hold for

25:42.960 --> 25:48.400
these as well but the process I think is a bit relaxed you don't need to stick to the GKI if you

25:48.400 --> 25:55.280
are okay with having a standard defconfig-based build, you can just stay with that.

25:56.800 --> 26:03.520
Again, some perks of running AOSP on an external device: there is no obligation to run or support the

26:03.520 --> 26:10.720
AOSP main development branch; you can run a more stable GSI release branch and go on with the project.

26:11.600 --> 26:17.360
You can set custom rules in your local manifest and that will save a lot of download time

26:17.360 --> 26:24.800
as well as data usage. You don't need to worry about breaking other AOSP targets or breaking internal

26:24.800 --> 26:35.360
targets, it could be anything. You don't need to worry about namespace conflicts

26:36.000 --> 26:42.240
because you don't care about other projects like in AOSP.

26:43.280 --> 26:49.680
You can skip this part. Use cases: so we use these devices which we don't host in AOSP

26:51.280 --> 26:58.240
as an Android playground or for our proofs of concept. Speaking of using the devboards for

26:58.240 --> 27:03.920
AOSP playground and proofs of concept, I'm plugging in the devboards for Android project

27:03.920 --> 27:11.440
which we maintain. We maintain a few devices, a few build targets outside of AOSP. We use it for our

27:11.440 --> 27:17.200
little experiments like running with the upstream kernel, no Android changes, see if you can reproduce

27:17.200 --> 27:25.840
the bug there. You can do all sorts of things, you can play with all the things which

27:25.840 --> 27:29.040
we talked about earlier, right, the pKVM thing, the software rendering bits,

27:30.000 --> 27:35.680
we can just experiment on all these things you can try the unified boot images and all those

27:35.680 --> 27:41.680
experiments we use this workspace as a staging area for devices or features before pushing them

27:41.680 --> 27:48.720
upstream. That brings us to the end of the session. The summary or the key takeaway from this

27:48.720 --> 27:54.800
session today is that keeping up with AOSP is hard. Let's work together to engage on common

27:54.800 --> 28:01.520
upstream solutions, and join the AOSP developer community and help us improve the AOSP ecosystem.

