WEBVTT

00:00.000 --> 00:14.080
We start with the next presentation, about the Flatland framework, so it's still about

00:14.080 --> 00:22.280
the railways, but this will also present a really interesting approach to open innovation

00:22.280 --> 00:25.000
as well as to open source.

00:25.880 --> 00:30.520
Thank you very much. Welcome also from my side, and very happy to be here and talk about the

00:30.520 --> 00:38.600
Flatland framework initiative that originally started at SBB, but has now evolved into its own association

00:38.600 --> 00:45.560
based in Bern, Switzerland. A little bit about the Swiss railway network: these numbers are

00:45.560 --> 00:50.920
only from SBB, but SBB is the biggest operator, so it gives you an idea of the Swiss railway

00:51.000 --> 01:02.600
network. There are about 3,300 kilometers of track in Switzerland, there are about 11,000 train runs

01:02.600 --> 01:15.080
per day, passenger train runs, about 1.3 million passengers transported every day, about 175,000 tons

01:15.160 --> 01:23.080
of freight, and a punctuality of about 92.5% in Switzerland.

01:29.080 --> 01:32.680
The source is SBB's open data portal, so I guess the numbers are correct.

01:38.680 --> 01:43.080
So just to give you an idea, a railway network is quite complex, especially if you have a

01:43.080 --> 01:49.720
dense schedule. Demand might also increase, so there might also be requirements for

01:49.720 --> 01:58.520
an even denser schedule, and so, yeah, this poses a lot of challenges for the railway network.

01:59.720 --> 02:05.080
You can also imagine you have something like a train that is delayed or, even worse,

02:05.080 --> 02:11.720
a breakdown that takes a long time to fix, somebody needs to take care of these trains that are

02:11.800 --> 02:18.280
already underway, on one hand to minimize delay, minimize impact, but also to come back to a nominal

02:18.280 --> 02:30.520
state as soon as possible. So this is done by human dispatchers, they navigate or they direct

02:30.520 --> 02:37.080
the trains. They don't drive the trains, but they decide which routes they have to take;

02:37.080 --> 02:44.440
they have to reschedule, to redispatch. So there's already a lot of research in operations research,

02:44.440 --> 02:54.520
traditionally from railway operators that also looks into how such tasks can be supported by

02:54.520 --> 03:00.840
algorithms, by software. But they have a similar problem to the human operators, which is that

03:00.920 --> 03:07.480
if we scale, if we look at bigger networks, like the whole of Switzerland, these traditional

03:07.480 --> 03:14.440
optimization algorithms, as well as human operators, don't scale very well. So it's hard

03:14.440 --> 03:20.440
to optimize the whole network, especially in a short amount of time, so here we talk about some

03:20.440 --> 03:28.040
minutes, where we have to find solutions, at least short term, so the idea was to look into

03:28.120 --> 03:34.920
novel approaches, so not just classical optimization, but also novel approaches, and about

03:34.920 --> 03:40.760
six years ago at SBB, the idea was let's look into machine learning and in particular

03:40.760 --> 03:47.240
into reinforcement learning and how this might help with this problem. So just a little bit about

03:47.240 --> 03:53.640
reinforcement learning: basically you have a reinforcement learning agent, the agent is

03:53.640 --> 03:59.160
depicted here on the right as the brain in a box; you have an environment, and the agent

03:59.160 --> 04:05.560
wants to act in this environment, and the agent can observe this environment. And in addition, in order

04:05.560 --> 04:12.280
to learn, you also need some measure, an indicator that tells you how good your action

04:12.280 --> 04:19.640
was. So this is a basic reinforcement learning setup, and so the idea was let's see how we can do

04:19.640 --> 04:25.800
with reinforcement learning for these kinds of problems. This worked very well in the beginning, in

04:25.800 --> 04:31.400
some early experiments, but it was also clear that this is, on one hand, a novel approach;

04:31.400 --> 04:35.560
it's not so established; there aren't years of research on reinforcement learning in railways,

04:35.560 --> 04:42.120
and also there are a lot of different approaches and so the question was like how can we

04:42.120 --> 04:46.920
make use of this technology without doing everything in-house? And so the Flatland framework

04:46.920 --> 04:52.840
was born. You see it here on the right, the nice animation; it's basically what

04:52.840 --> 05:00.600
you get out of it, at least visually, so you see the little trains moving, and on the right you

05:00.600 --> 05:05.800
see the basic elements, because the question was also not just how to use new technology,

05:05.800 --> 05:11.160
but also how to bring researchers and people working in reinforcement learning into the railway

05:11.320 --> 05:20.760
domain. And so the idea was let's create something simple that still gives enough complexity

05:20.760 --> 05:26.440
to solve a real-world problem, or at least to look into a real-world problem, but that's also

05:26.440 --> 05:32.040
accessible for people that don't know that much about railways. And so the Flatland framework

05:32.040 --> 05:37.800
basically is, on one hand, a discrete-time simulation on a 2D grid, so there are

05:37.800 --> 05:42.840
some things that are not possible in Flatland, but it works quite well for trains because they

05:42.840 --> 05:50.360
usually don't fly. And you only have a couple of simple cell types, simple things such as

05:50.360 --> 05:55.960
switches, straight lines, curves and so on, and basically everything in Flatland can be represented

05:55.960 --> 06:01.240
by these 10 cell types and rotations of them. Then we have a train, the agent, and the

06:01.560 --> 06:06.280
station, the target. So the task is simple: I'm a train, I start somewhere, I need to

06:07.160 --> 06:14.760
navigate to my train station. This might look simple at first, but the question is how:

06:15.320 --> 06:20.440
what's the reward, how can I observe the state? Because we have a little bit of the same problem as

06:20.440 --> 06:24.840
the classical optimization, if you want to look at the whole grid, the whole time, so every agent

06:24.920 --> 06:30.200
observes the whole grid. If we scale this to a network the size of Switzerland,

06:31.000 --> 06:37.080
this is again not possible. So these concepts of observation and reward are very interesting and

06:37.080 --> 06:43.640
very important for reinforcement learning, and so defining the actual observation and defining

06:43.640 --> 06:51.560
the actual reward is quite challenging. So again, the idea was like how can we use this, how can

06:51.640 --> 06:57.720
we use this approach for this kind of problem. So on one hand, we looked into what are possible

06:57.720 --> 07:02.680
observations. There are some basic observations provided by Flatland: the global

07:02.680 --> 07:09.080
observation, which is not so good. The local grid view is another quite intuitive one: I just look

07:09.080 --> 07:16.680
at some cells around me. But you might imagine, if you're a train on the tracks, the things left and

07:16.760 --> 07:22.600
right usually don't matter so much. However, it matters where you go. You have junctions,

07:22.600 --> 07:28.040
switches where you can go left and right, and so we have also built in this local tree observation

07:28.040 --> 07:34.840
that showed very promising results. However, this was just the beginning; we saw that this had potential,

07:35.640 --> 07:42.040
but it was way too much to do in-house, and it was also not enough to just open-source it, because,

07:43.000 --> 07:48.520
yeah, why should somebody care? And so the idea was, in the beginning, let's start with a challenge.

07:48.520 --> 07:53.720
A machine learning challenge, to bring people onto this topic. So the first Flatland challenge

07:53.720 --> 08:00.120
then started, everything was open source, and we had a lot of traction also on this challenge.

08:00.120 --> 08:06.200
So there really was a community that could help, not just people in-house or

08:06.200 --> 08:12.440
people that, by accident, stumbled upon the GitHub repository; we could actively engage

08:12.440 --> 08:22.440
a community. And one big opportunity here for open source was really like, you can also imagine

08:22.440 --> 08:25.960
that the Flatland environment could basically have been closed source,

08:25.960 --> 08:31.480
and you just have an API that can interact with the environment. But what I mentioned earlier about

08:32.360 --> 08:40.360
observations and also reward, you can imagine if you have a train run. How good is one decision?

08:40.360 --> 08:45.560
It's hard to say whether to go left or right or straight or stop. So you really need to

08:46.520 --> 08:51.800
take into account the whole picture. So you have all the train runs. At the end, you want to be there

08:51.800 --> 08:58.200
on time. But it's hard to say what action was actually responsible for being on time.
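
The credit-assignment problem described here can be made concrete with a small sketch. This is not Flatland code; the names and the discount factor are illustrative. With a sparse reward that only arrives at the end of the run, every earlier action is credited only through a discounted return computed backwards:

```python
def discounted_returns(rewards, gamma=0.99):
    """Backward pass: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Five decisions (left/right/straight/stop); only arrival reveals success.
episode_rewards = [0.0, 0.0, 0.0, 0.0, 1.0]  # 1.0 = arrived on time
print(discounted_returns(episode_rewards))
```

Every action's credit here depends on the whole remainder of the run, which is why being able to reshape rewards and observations, rather than treating the environment as a black box, matters so much.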

08:59.160 --> 09:06.120
And so this was really one important decision for flat land to be fully open source,

09:06.120 --> 09:11.640
because reward shaping and observation building are a huge part of reinforcement learning.

09:11.640 --> 09:16.360
And this is really something where the community then came in. And for the Flatland challenges,

09:16.360 --> 09:22.120
you could basically modify most of the Flatland environment. For the evaluation, there were some

09:22.120 --> 09:27.480
things you couldn't do, but for training basically you could do most of it. You could really

09:27.480 --> 09:31.560
adapt the environment. You could build your own wrappers to build your own observations and so on.
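
The wrapper idea can be sketched roughly like this; the class and method names are invented for illustration and are not the actual Flatland API. An observation wrapper leaves the environment untouched and only rebuilds what the agent sees, here cutting a small local window out of a full grid:

```python
class LocalViewWrapper:
    """Illustrative observation wrapper (hypothetical, not flatland-rl)."""

    def __init__(self, env, radius=2):
        self.env = env        # the wrapped environment stays unchanged
        self.radius = radius  # half-width of the local window

    def observe(self, grid, pos):
        """Cut a (2r+1) x (2r+1) window around `pos`, padding edges with -1."""
        r = self.radius
        row, col = pos
        window = []
        for i in range(row - r, row + r + 1):
            line = []
            for j in range(col - r, col + r + 1):
                if 0 <= i < len(grid) and 0 <= j < len(grid[0]):
                    line.append(grid[i][j])
                else:
                    line.append(-1)  # outside the grid
            window.append(line)
        return window

grid = [[0] * 6 for _ in range(6)]
grid[3][3] = 7  # some cell type of interest
view = LocalViewWrapper(env=None, radius=1).observe(grid, (3, 3))
```

The agent then trains on `view` instead of the full grid, without any change to the environment itself.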

09:32.600 --> 09:38.280
And what we saw was actually a lot of engagement. On one hand, we actually had extensions,

09:38.280 --> 09:45.160
published as well, for the Flatland environment. So it was not just a group that did this for themselves

09:45.160 --> 09:50.120
and didn't share it with anyone. On one hand, the condition was that if you wanted a prize,

09:50.120 --> 09:56.440
your solution also needed to be open source. But there were also really a lot of people

09:56.520 --> 10:01.960
that were just interested in the topic and happy to share. And so we had a lot of extensions,

10:01.960 --> 10:06.840
like, for example, here: you have the whole network, but then some people realized that actually only

10:06.840 --> 10:11.880
the cells where I have to take a decision matter for their approach. So, for example, here,

10:11.880 --> 10:19.000
the red cells here on the right are where you can stop, so this can have an impact. Or you have

10:19.000 --> 10:25.880
switches, where you can go left or right or straight. So there you have to take

10:26.840 --> 10:30.440
a decision. So they made a wrapper that basically masked all the

10:30.440 --> 10:37.400
non-decision cells. But we also had people that actually built tools for analysis.

10:37.400 --> 10:45.000
For example, Shivam was the driving force for this tool. So he built this whole

10:45.000 --> 10:49.800
tool where you could visualize all the train runs. You could see the decisions the agents made,

10:49.800 --> 10:56.440
the congestions, like here, for example. And you could analyze your whole train run and see

10:56.440 --> 11:02.920
where your model or your approach might still fail. This was also open source. Even during the

11:02.920 --> 11:08.840
competitions, other people could also use it. And we also had a lot of people engaging in actually

11:08.840 --> 11:14.440
creating the baselines for the challenges. So what you see on the top right is the training

11:15.240 --> 11:19.560
of the reinforcement learning models and a lot of different approaches. So also there,

11:19.560 --> 11:24.600
people just joined in and said, like, hey, I'm interested in this. I want to work on these baselines.

11:25.640 --> 11:32.360
What you see in orange is the global observation that I talked about before. So you see that the

11:32.360 --> 11:41.320
higher up, the better. So there you also have proof, or at least some evidence, that the global

11:41.320 --> 11:47.560
observation doesn't work that well. But you can also see that there is a learning curve. And yeah,

11:48.360 --> 11:53.560
and I think this is one of the big things that came out of the framework.
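
The decision-cell extension mentioned earlier, masking all cells where the agent has no real choice, could look roughly like this; the track encoding is invented for illustration and is not Flatland's real transition format:

```python
def is_decision_cell(outgoing_directions):
    """A cell needs a decision iff more than one outgoing direction exists."""
    return len(outgoing_directions) > 1

# Toy track: each cell lists its outgoing directions.
track = {
    (0, 0): ["E"],       # plain track: the move is forced
    (0, 1): ["E", "S"],  # switch: straight or turn, a real decision
    (0, 2): ["E"],
    (1, 1): ["S"],
}

# Keep only the cells where the agent actually has to choose.
decision_cells = {pos for pos, outs in track.items() if is_decision_cell(outs)}
print(decision_cells)
```

On plain track the action is forced, so querying the policy only at decision cells shrinks the effective problem considerably.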

11:55.560 --> 12:01.080
I think due to these efforts, it was also possible actually for

12:01.640 --> 12:07.160
multi-agent reinforcement learning to catch up. So what you see, the top line, the blue one,

12:07.160 --> 12:12.120
is an operations research approach. So it's still kind of a classical approach

12:12.120 --> 12:18.760
from multi-agent path finding. However, you see here also that the multi-agent reinforcement

12:18.760 --> 12:26.760
learning solutions are close. This was not the case in the first challenge. So from left to right,

12:26.760 --> 12:31.000
you have growing complexity of the networks, you have more trains, you have bigger environments,

12:31.800 --> 12:38.040
and again, like the higher up, the better. So one is basically a perfect score, and the further down,

12:38.120 --> 12:44.680
the worse. Here also, I think, an important mention for the competition: they had a limited amount

12:44.680 --> 12:50.680
of time to calculate the next step. So you couldn't, like, calculate for a whole hour

12:50.680 --> 12:55.800
for the next action to take. So this was also a constraint. And this

12:55.800 --> 13:05.000
is also why you see operations research, for example, performing as it does. Thanks also to these Flatland

13:05.080 --> 13:11.880
challenges and being open source, we also realized others have the same problem. And so this European

13:11.880 --> 13:20.520
research project, the Horizon Europe project AI4REALNET, started, that looks into AI for real-world

13:20.520 --> 13:25.720
network operations. We're not alone there with the trains. There's also aviation and the

13:25.720 --> 13:32.440
power grids. And we really realized that not only Flatland, but also other initiatives like it

13:32.520 --> 13:37.480
can also help each other. Because some approaches that work, for example, for aviation,

13:37.480 --> 13:45.960
might also work for railways, and also the power grid. And so this is really a nice story to see,

13:45.960 --> 13:51.560
like how also open source projects can help each other. In addition, what you see here on the left

13:51.560 --> 13:58.760
is a human. So it's not only about AI, but actually Flatland is also used to facilitate human

13:58.760 --> 14:04.360
AI interaction research. So part of AI4REALNET is also this research on how a human operator

14:04.360 --> 14:13.240
can interact with an AI system. So, to conclude: I think open source can really inspire. In our

14:13.240 --> 14:18.520
challenges, we had more than 4,000 submitted solutions, from more than 1,000 participants.

14:19.560 --> 14:25.640
And they were published. There are more than 100 papers published

14:25.640 --> 14:31.480
that directly reference papers we wrote, or directly made use of the Flatland environment.

14:32.520 --> 14:37.400
So from our point of view, this is really a success story. Open source made it possible; without open

14:37.400 --> 14:42.360
source, this would not have worked. And if you want to contribute, we're more than happy: go

14:42.360 --> 14:47.480
to our GitHub, read the docs. There's a starter kit; it takes about 15 minutes if you just

14:47.480 --> 14:52.120
copy and paste the tutorial to run your own environment and train your own agent. Thank you

14:52.120 --> 15:02.120
very much.
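
The closing pointer, run your own environment and train your own agent, can be illustrated with a self-contained toy; this is a made-up one-dimensional stand-in, not the actual flatland-rl API, but it mimics the reset/step loop that the starter material walks through:

```python
import random

class ToyRailEnv:
    """Made-up 1-D stand-in for a rail environment (not the flatland-rl API)."""
    ACTIONS = ["STOP", "FORWARD"]

    def __init__(self, length=10, target=9):
        self.length, self.target = length, target

    def reset(self):
        self.pos, self.t = 0, 0
        return self.pos

    def step(self, action):
        self.t += 1
        if action == "FORWARD" and self.pos < self.length - 1:
            self.pos += 1
        done = self.pos == self.target
        reward = 1.0 if done else 0.0  # sparse reward: only on arrival
        return self.pos, reward, done

# A random agent acting until the train reaches its target cell.
env = ToyRailEnv()
obs, done = env.reset(), False
random.seed(0)
while not done:
    obs, reward, done = env.step(random.choice(ToyRailEnv.ACTIONS))
print(f"arrived at cell {obs} after {env.t} steps")
```

The real starter kit on the Flatland GitHub follows the same reset/step shape, just with a 2D rail grid, multiple agents, and pluggable observation builders.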

15:06.120 --> 15:15.800
Yes, I wanted to know if you have worked on explainability of the models for the

15:15.800 --> 15:22.760
decision makers, policy makers. So the question was about explainability and if we worked on this

15:22.760 --> 15:31.080
for policy makers and so on. So the answer is: we at Flatland didn't, but there is actually

15:31.080 --> 15:35.960
AI4REALNET, the project I talked about before. Part of the project is also explainability,

15:37.560 --> 15:43.160
for multiple reasons. On one hand, because it's in general an interesting topic for AI to make

15:43.160 --> 15:49.960
models explainable, but also because of what happened in the past, also with other software, not just

15:49.960 --> 15:55.800
AI: if a human dispatcher, for example, or just a user of whatever AI assistant system,

15:56.840 --> 16:02.680
doesn't understand the solution that the AI provides, especially if it's really like an interaction.

16:03.240 --> 16:09.720
It's very hard to work with that system. So basically in AI4REALNET, there's both research

16:09.800 --> 16:16.600
on explainability of the models in general, for safety, robustness and so on, but also to enable

16:16.600 --> 16:34.680
really the human-AI interaction. So: what is the objective of the agent, was the question,

16:34.680 --> 16:42.920
whether it's just about delay or also something else. So basically in the challenges, it's not only delay,

16:44.600 --> 16:51.640
but there it's also kind of a very simplified problem. So you get a score at the end of a train run.

16:51.640 --> 16:56.600
It's also, for example, about completion rate. Basically it's the time you need to get there and

16:56.680 --> 17:05.480
the completion rate. Because what you see here, well, you don't see it, but there are actually no

17:05.480 --> 17:11.400
deadlocks here; but you can imagine that for the AI agent, it's not just about being there on time,

17:11.800 --> 17:16.600
but actually about avoiding deadlocks, which might sound easy, but it's not that easy a problem,

17:16.600 --> 17:22.280
especially if the network is getting big. And so part of the goal for the challenges was also

17:22.280 --> 17:29.640
to avoid deadlocks. But in principle, you can model any kind of question, well, any that you can

17:29.640 --> 17:36.600
represent in this environment, and the scoring you can basically also define yourself. So this was just

17:36.600 --> 17:43.400
what we did for the challenge to start with, but the environment also allows for other types of scores.

17:43.400 --> 17:45.400
It allows you to pursue a lot of goals.

