WEBVTT

00:00.000 --> 00:08.000
OK, so welcome to our talk.

00:08.000 --> 00:10.000
I'm Ivan, and this is my colleague Michele,

00:10.000 --> 00:13.000
and we are from the Leibniz Supercomputing Centre.

00:13.000 --> 00:17.000
So in this talk, we hope to tell you

00:17.000 --> 00:20.000
about a very special software refactoring tool,

00:20.000 --> 00:23.000
which we applied in a rather innovative way

00:23.000 --> 00:27.000
to wrap a C library.

00:27.000 --> 00:31.000
So as you all probably know, in scientific software,

00:31.000 --> 00:34.000
we always have this need to interoperate

00:34.000 --> 00:38.000
with components written in different programming languages.

00:38.000 --> 00:42.000
So I've experienced this most personally when writing PDE solvers

00:42.000 --> 00:45.000
in Fortran, where I wanted to use a k-d tree written in C,

00:45.000 --> 00:48.000
and some system library calls,

00:48.000 --> 00:52.000
and I've been writing these wrappers manually all the time

00:52.000 --> 00:54.000
in sleepless nights.

00:54.000 --> 01:00.000
So, as an HPC application support person,

01:00.000 --> 01:03.000
I see the same problem in the projects I support,

01:03.000 --> 01:06.000
where people may have a finite element framework

01:06.000 --> 01:11.000
in C++, but they want to call special software in Fortran,

01:11.000 --> 01:13.000
and then one of the students, of course,

01:13.000 --> 01:16.000
wants to do scripting and analysis from Python,

01:16.000 --> 01:19.000
and it always creates this need to interoperate.

01:20.000 --> 01:23.000
So for the sake of this talk, let's pretend

01:23.000 --> 01:27.000
that we want to wrap the function bli_dgemm

01:27.000 --> 01:32.000
from the BLIS library, which is a linear algebra library

01:32.000 --> 01:35.000
developed at the University of Texas at Austin,

01:35.000 --> 01:38.000
and this is the C prototype for this function.

01:38.000 --> 01:42.000
And for me to be able to call this function as a Fortran programmer,

01:42.000 --> 01:44.000
I have to write this interface block.

01:45.000 --> 01:48.000
And this is really, let's say,

01:48.000 --> 01:51.000
a one-to-one mapping into a different grammar,

01:51.000 --> 01:55.000
but it's still tedious to write these procedures with many parameters,

01:55.000 --> 02:02.000
and libraries like BLIS, which implements the full BLAS interface,

02:02.000 --> 02:04.000
have many symbols.

02:04.000 --> 02:08.000
And in fact, it introduces even more symbols.

02:08.000 --> 02:12.000
And of course, all these symbols are provided in four different numerical types,

02:13.000 --> 02:16.000
plus basic and expert versions, plus utility functions.

02:16.000 --> 02:19.000
So this quickly becomes overwhelming,

02:19.000 --> 02:24.000
and one ends up with libraries of over 600 procedures.
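
NOTE
A minimal Python sketch of why the symbol count multiplies: each BLIS typed operation comes in four variants, selected by a one-letter prefix (s, d, c, z for float, double, single-precision complex, double-precision complex). The operation list is a hypothetical subset chosen for illustration, and the Fortran column uses the standard ISO_C_BINDING kinds.
```python
# Hypothetical subset of BLIS typed operations, for illustration only.
OPERATIONS = ["gemm", "gemv", "axpyv", "dotv"]
# One-letter type prefix: (C element type, matching Fortran ISO_C_BINDING type).
PREFIXES = {
    "s": ("float", "real(c_float)"),
    "d": ("double", "real(c_double)"),
    "c": ("scomplex", "complex(c_float_complex)"),
    "z": ("dcomplex", "complex(c_double_complex)"),
}
def typed_symbols(operations):
    """Expand each operation name into its four typed BLIS symbol names."""
    return [f"bli_{p}{op}" for op in operations for p in PREFIXES]
symbols = typed_symbols(OPERATIONS)
print(len(symbols))  # 16: 4 operations x 4 types
print(symbols[:4])   # ['bli_sgemm', 'bli_dgemm', 'bli_cgemm', 'bli_zgemm']
```
Add the basic/expert variants and utility functions the speaker mentions, and a few dozen operations quickly turn into hundreds of procedures to wrap.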

02:24.000 --> 02:27.000
So the source of truth is obviously the header,

02:27.000 --> 02:30.000
which is pretty bulky with 30,000 lines,

02:30.000 --> 02:32.000
and makes heavy use of macros,

02:32.000 --> 02:35.000
and the code is really impenetrable.

02:35.000 --> 02:40.000
And one could try to use some ad hoc scripts with regex,

02:40.000 --> 02:44.000
or awk, but it quickly goes overboard.

02:44.000 --> 02:48.000
So I was telling Michaela about the problems I have.

02:48.000 --> 02:51.000
And luckily, he had the perfect tool for me to use.

02:51.000 --> 02:53.000
So, Michele.

03:01.000 --> 03:06.000
So Coccinelle is the name of the software we are using for this,

03:06.000 --> 03:12.000
which is a semantic matching and patching engine.

03:12.000 --> 03:15.000
Coccinelle is French for ladybug,

03:15.000 --> 03:19.000
and ladybugs are notably used to attack certain other bugs,

03:19.000 --> 03:21.000
which they don't like.

03:21.000 --> 03:25.000
And yeah, it's used in the Linux kernel for finding

03:25.000 --> 03:30.000
bad programming practices, let's say,

03:30.000 --> 03:34.000
and for migrating

03:34.000 --> 03:37.000
driver modules to new kernel versions.

03:37.000 --> 03:41.000
So, in other words, for large-scale transformations of C code.

03:41.000 --> 03:46.000
In our case, the large amounts of C code are in headers,

03:46.000 --> 03:49.000
or, in today's talk, in this one big header,

03:49.000 --> 03:51.000
which we are interested in processing.

03:51.000 --> 03:54.000
One characteristic of Coccinelle is that

03:54.000 --> 03:58.000
it's pretty agnostic to whitespace, newlines, and so on.

03:58.000 --> 04:01.000
So all of that doesn't matter really,

04:01.000 --> 04:05.000
and what happens is that an abstract syntax tree is created.

04:05.000 --> 04:10.000
So a fully fledged parser goes through the code

04:10.000 --> 04:15.000
and creates a representation of the C code,

04:15.000 --> 04:18.000
decomposing it into all of the elements

04:18.000 --> 04:21.000
which occur in our program.

04:21.000 --> 04:25.000
In the case of a sequence of declarations,

04:25.000 --> 04:30.000
identifiers are recognized,

04:30.000 --> 04:35.000
as are parameter lists, in the context of, for instance,

04:35.000 --> 04:39.000
void functions, as in this rule here.

04:39.000 --> 04:43.000
So here we have a rule which matches void functions,

04:43.000 --> 04:48.000
with any number of parameters, as long as the identifier

04:48.000 --> 04:50.000
begins with bli_.
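
NOTE
The rule itself is written in Coccinelle's semantic patch language; as a rough Python analogue only, the filter below keeps void functions with any number of parameters whose names begin with bli_. It assumes the declarations have already been parsed into (return type, name, parameter list) tuples, a hypothetical representation that is not Coccinelle's own, and the declarations are made up or abridged.
```python
import re
# Made-up, abridged declarations: (return type, name, [(C type, parameter name), ...]).
DECLS = [
    ("void", "bli_dgemm", [("trans_t", "transa"), ("trans_t", "transb"), ("dim_t", "m")]),
    ("void", "bli_example_void", []),
    ("dim_t", "bli_example_query", []),
    ("void", "helper_fn", [("int", "x")]),
]
# Mirrors the rule's constraint that the identifier starts with "bli_".
NAME_RE = re.compile(r"^bli_")
def match_void_bli(decls):
    """Keep void functions named bli_*, whatever the number of parameters."""
    return [d for d in decls if d[0] == "void" and NAME_RE.match(d[1])]
print([name for _, name, _ in match_void_bli(DECLS)])  # ['bli_dgemm', 'bli_example_void']
```
The difference in Coccinelle is that the match runs on the parsed C syntax tree, so formatting, line breaks, and macros in the real header do not defeat it the way they would defeat a string-based filter.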

04:50.000 --> 04:54.000
So this applies a sort of filtering rule,

04:54.000 --> 04:58.000
and building on this short filtering rule,

04:58.000 --> 05:01.000
we write something a bit longer,

05:01.000 --> 05:04.000
which is a bit more ad hoc, you could say,

05:04.000 --> 05:10.000
but it can be pretty elegant as long as you have a clear idea

05:10.000 --> 05:12.000
of what you want to achieve.

05:12.000 --> 05:16.000
So in this case here we have a Coccinelle rule

05:16.000 --> 05:19.000
which contains a Python script,

05:19.000 --> 05:24.000
which receives everything that was matched by the previous rule

05:24.000 --> 05:26.000
and reuses it here.

05:26.000 --> 05:29.000
The earlier rule was named match;

05:29.000 --> 05:32.000
we took entities from there, we inherited them,

05:32.000 --> 05:34.000
and we used them here in Python.

05:34.000 --> 05:37.000
So what we got from the abstract syntax tree,

05:37.000 --> 05:39.000
we reused in Python.

05:39.000 --> 05:43.000
Here we omit a dozen,

05:43.000 --> 05:45.000
maybe a hundred lines. Why a hundred?

05:45.000 --> 05:47.000
Because there are a lot of types

05:47.000 --> 05:50.000
which we want to translate into Fortran types,

05:50.000 --> 05:52.000
and that is outside the scope of Coccinelle,

05:52.000 --> 05:55.000
but we solve it with a dictionary in Python.

05:55.000 --> 05:57.000
So yeah, that's the approach.

05:57.000 --> 06:01.000
In the end we print a bunch of variable,

06:01.000 --> 06:05.000
subroutine, and interface declarations,

06:05.000 --> 06:10.000
which is what Fortran needs to get this right.
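
NOTE
A minimal sketch of the Python side, assuming the matched function has already been reduced to a name and a list of (C type, parameter name) pairs; this format is hypothetical, since the real script receives Coccinelle metavariables. The type table is an illustrative fragment, not the authors' dictionary: it assumes dim_t and inc_t are 64-bit integers (the usual BLIS default, but configurable) and treats the trans_t enum as a C int. The function name bli_dexample is made up.
```python
# Illustrative fragment of the C-to-Fortran type table (sizes are assumptions, see above).
C_TO_FORTRAN = {
    "double": "real(c_double)",
    "dim_t": "integer(c_int64_t)",
    "inc_t": "integer(c_int64_t)",
    "trans_t": "integer(c_int)",
}
def dummy_decl(ctype, name):
    """Translate one C parameter into a Fortran dummy-argument declaration."""
    if ctype.endswith("*"):
        # Pointer: Fortran passes by reference by default. Declared assumed-size here,
        # a simplification (a pointer to a single scalar would better be a plain scalar).
        base = C_TO_FORTRAN[ctype.rstrip("* ")]
        return f"{base} :: {name}(*)"
    # A scalar passed by value in C needs the VALUE attribute in Fortran.
    return f"{C_TO_FORTRAN[ctype]}, value :: {name}"
def interface_block(cname, params):
    """Emit a Fortran interface block for a C function returning void."""
    args = ", ".join(name for _, name in params)
    lines = [
        "interface",
        f'  subroutine {cname}({args}) bind(C, name="{cname}")',
        "    use, intrinsic :: iso_c_binding, only: c_double, c_int, c_int64_t",
        "    implicit none",
    ]
    lines += [f"    {dummy_decl(t, n)}" for t, n in params]
    lines += [f"  end subroutine {cname}", "end interface"]
    return "\n".join(lines)
# Hypothetical, abridged signature used only to exercise the generator.
print(interface_block("bli_dexample", [("dim_t", "n"), ("double*", "x"), ("inc_t", "incx")]))
```
Because void C functions map to Fortran subroutines and bind(C, name=...) pins the linker symbol, each generated block stays a one-to-one image of the C prototype; the bulk of the work is in growing the type table.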

06:10.000 --> 06:13.000
This is another example, where the difference is that,

06:13.000 --> 06:15.000
yeah, we are looking for

06:15.000 --> 06:18.000
a different kind of signature.

06:18.000 --> 06:21.000
We could be much more specific about the signature here,

06:21.000 --> 06:24.000
but as long as we are general here,

06:25.000 --> 06:27.000
and our script is general,

06:27.000 --> 06:30.000
we can have any kind of translation,

06:30.000 --> 06:34.000
as long as we know how to deal with the limitations.

06:34.000 --> 06:37.000
So we are quite selective

06:37.000 --> 06:39.000
in what we translate:

06:39.000 --> 06:41.000
we don't care about unions,

06:41.000 --> 06:44.000
because there is no union in Fortran,

06:44.000 --> 06:47.000
and we also ignore callback functions from C,

06:47.000 --> 06:51.000
since it is not so clear how to deal with them in Fortran.

06:51.000 --> 06:52.000
In other words,

06:52.000 --> 06:56.000
if we are content with a correspondence

06:56.000 --> 07:00.000
between C and Fortran that is not complete,

07:00.000 --> 07:02.000
it's fine.
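
NOTE
A minimal sketch of that selectivity, assuming parameter types arrive as plain C type strings and that the names of typedef'd union types are known in advance (both hypothetical simplifications, and union_example_t is a made-up name). A declaration is skipped, with a reason, if any parameter is a union or a function pointer, which is the callback case.
```python
import re
# Hypothetical set of typedef'd union names collected elsewhere from the header.
UNION_TYPES = {"union_example_t"}
# A C function-pointer type contains "(*", e.g. "void (*)(int)".
FUNPTR_RE = re.compile(r"\(\s*\*")
def skip_reason(params):
    """Return why a declaration cannot be translated, or None if it can."""
    for ctype, name in params:
        if ctype.startswith("union ") or ctype in UNION_TYPES:
            return f"{name}: union (no standard Fortran equivalent)"
        if FUNPTR_RE.search(ctype):
            return f"{name}: callback (function pointer)"
    return None
print(skip_reason([("dim_t", "n"), ("double*", "x")]))  # None
print(skip_reason([("void (*)(int)", "cb")]))            # cb: callback (function pointer)
print(skip_reason([("union_example_t", "u")]))           # u: union (no standard Fortran equivalent)
```
Reporting a reason instead of silently dropping a declaration makes it easy to review afterwards which parts of the header were left without a Fortran interface.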

07:02.000 --> 07:05.000
The point is that we use Coccinelle,

07:05.000 --> 07:09.000
and we want to use it for large-scale code transformations.

07:09.000 --> 07:12.000
That is, transformations where we would be overwhelmed

07:12.000 --> 07:14.000
doing this by hand,

07:14.000 --> 07:17.000
and where it would be too error-prone

07:17.000 --> 07:21.000
to do this with a set of ad hoc scripts.

07:21.000 --> 07:26.000
So here we reuse the abstract syntax tree manipulation

07:26.000 --> 07:28.000
capabilities of Coccinelle,

07:28.000 --> 07:31.000
and we have put the code here,

07:31.000 --> 07:35.000
the full code behind this presentation.

07:35.000 --> 07:38.000
It is behind this QR code,

07:38.000 --> 07:40.000
which should point to that URL.

07:40.000 --> 07:43.000
We have been sponsored by this project,

07:43.000 --> 07:48.000
and last week I gave our new training

07:48.000 --> 07:51.000
introducing Coccinelle in one day.

07:51.000 --> 07:54.000
We will be repeating this,

07:54.000 --> 07:56.000
offering such a training at the

07:56.000 --> 07:59.000
Leibniz-Rechenzentrum in Munich this year.

07:59.000 --> 08:01.000
We don't know when yet, but we will.

08:01.000 --> 08:06.000
This cheat sheet here on the right is completely new,

08:06.000 --> 08:08.000
so yeah,

08:08.000 --> 08:09.000
it didn't exist before.

08:09.000 --> 08:12.000
I was not aware of any cheat sheets for Coccinelle earlier.

08:12.000 --> 08:15.000
You can use it once you have learned Coccinelle

08:15.000 --> 08:17.000
to, well,

08:17.000 --> 08:19.000
to recall important facts.

08:19.000 --> 08:21.000
This is the Coccinelle website,

08:21.000 --> 08:22.000
and this is the cheat sheet,

08:22.000 --> 08:24.000
and this is the training site.

08:24.000 --> 08:26.000
And I think this is it.

