WEBVTT

00:00.000 --> 00:13.360
Hi everybody. I'm here to present the Mobility Database. I'm Isabelle de Robien. I work for MobilityData

00:13.360 --> 00:21.240
as a regional director for Europe. I'm based in Barcelona. I speak English, French and a

00:21.240 --> 00:26.880
bit of Spanish. It's a work in progress, so if there are any native Spanish speakers, come

00:26.880 --> 00:31.880
to talk to me afterwards. I need to practice. And the reason I do this work is to make sustainable

00:31.880 --> 00:39.280
transport the easiest choice (I forgot to put "transport" on the slide). I'm sure all of you take public transport

00:39.280 --> 00:47.000
on principle already. I try, but I'm also a bit lazy, and sometimes I need to choose between

00:47.000 --> 00:52.720
the easiest option and the most sustainable one, quite often actually. So my goal is to make sustainable

00:52.720 --> 01:00.800
transport the easiest option for other lazy people like me. So who is MobilityData? Who knows about

01:00.800 --> 01:11.360
MobilityData in the room? Oh, nice. I was expecting far fewer. We're a global non-profit, headquartered

01:11.360 --> 01:21.160
in Montreal, Canada. We're about 25 people. And we govern open data formats. All of you might

01:21.160 --> 01:27.240
know GTFS and GBFS. So we govern these standards. And something you might not know is that

01:27.240 --> 01:32.760
we also have a software engineering team that builds free and open source tools for the

01:32.760 --> 01:41.360
community of users. Our goal as an organization is to empower sustainable mobility stakeholders

01:41.360 --> 01:50.680
to make the most of their data, their digital assets. So you might already know GTFS and

01:50.680 --> 01:58.120
GBFS as what we call de facto standards: a format or specification that becomes

01:58.120 --> 02:07.560
a standard because of its worldwide adoption. GTFS covers transit networks, and GBFS covers

02:07.560 --> 02:15.000
shared mobility, shared modes, bike share, scooter share, car share. A lot of the work has

02:15.080 --> 02:21.080
been done in Europe to build this format because the market is big in Europe. And about

02:21.080 --> 02:31.240
a thousand shared mobility operators use this format today. What these formats do is handle the

02:31.240 --> 02:37.160
communication between the operators and the general public using

02:37.160 --> 02:42.440
the service. So they represent rider-facing information, nothing operational. It's

02:42.440 --> 02:47.880
really meant to share information with the public using the service, most of the time through

02:47.880 --> 02:54.680
trip-planning apps. These are a few examples. There's a huge open source community

02:54.680 --> 03:02.280
around these two formats. There is an official GitHub and Slack for GTFS and GBFS. We're

03:02.280 --> 03:08.360
maintaining the Slack. But there is also a very big community of folks building open source software.

03:08.600 --> 03:16.200
We also have a list we've maintained for a while called Awesome Transit, and we should link the two

03:16.200 --> 03:22.920
somehow. We try to keep up with all of the tools that are available, with a focus on free

03:22.920 --> 03:28.600
and open source. There are no rules for this list. Anyone can add their tool, whether it has one

03:28.600 --> 03:35.320
user or thousands of users. It's hard to keep up, so if you see tools that are not there,

03:35.320 --> 03:41.240
or if you build tools, please add them to this list. We try to maintain it and keep it up to date.

03:44.280 --> 03:48.840
So before introducing the Mobility Database, I want to talk about the problem it's meant to

03:48.840 --> 03:57.400
solve. We talked a little bit about data quality. Sorry for the Google Maps screenshot.

03:57.400 --> 04:03.960
I know it's bad to bring that to an open source conference, but the point is that this happens

04:04.040 --> 04:11.320
so often that we are stressed when we take public transport. We don't know what to do with this information.

04:12.680 --> 04:19.080
Having bad data is sometimes worse than having no data. It reduces trust. People

04:19.080 --> 04:25.640
leave the service, and it creates a lot of complaints. The people who can afford a taxi take

04:25.640 --> 04:30.360
a taxi, and the ones who can't have to find another way to work. They are very stressed.

04:30.360 --> 04:39.400
It takes a whole industry to improve data quality. Originally, I didn't have policy makers here

04:39.400 --> 04:48.040
and I feel very ashamed. When you spoke, I added policy makers, because in the open source ecosystem,

04:48.040 --> 04:53.400
we are a bit outside of what's happening in policy, I find. So I'm very glad you're here

04:54.040 --> 05:00.440
and I'm sorry that I didn't put policy makers on this slide. In our ecosystem, it's more

05:00.440 --> 05:06.600
the industry: the public transport authorities, the technology partners, the trip planners.

05:06.600 --> 05:11.720
I commit to collaborating more with policy makers, especially in Europe.

05:13.000 --> 05:18.280
So why do we have this bad experience in a world where everything is digital? We can do everything with AI.

05:18.280 --> 05:22.360
And yet when we take the bus, we get wrong information. The bus doesn't show up.

05:24.120 --> 05:30.360
It's very hard to translate data issues into problems for the rider. When the bus doesn't show up,

05:30.360 --> 05:36.680
it's a real experience. But when you look at a dataset, it's hard to trace the problem and say,

05:36.680 --> 05:42.600
well, this will lead to a bad rider experience. Sometimes it's a bit obscure. We don't know how to

05:42.600 --> 05:50.600
measure data quality. We don't have tools to measure it. We have accountability issues: oftentimes,

05:50.600 --> 05:57.400
between the operator and your app, there are five intermediaries. In Europe, there are the national

05:57.400 --> 06:02.440
access points, which are supposed to help with all of this, but they also add another stakeholder. Sometimes

06:02.440 --> 06:06.520
they point fingers: you're responsible, no, you're responsible. We don't know who's responsible

06:06.520 --> 06:13.640
for data quality. So one solution is a product I'm here to present, the Mobility

06:13.640 --> 06:22.120
Database. It's an open data platform containing data in the GTFS and GBFS formats, essentially.

06:22.120 --> 06:32.600
It's free and open source. It currently has data from 4,500 sources in 75 countries.

06:33.240 --> 06:39.800
And the reason we built this platform is data quality. I know we talked about availability.

06:39.800 --> 06:48.760
But quality is the main thing we're trying to solve with it. So each feed comes with data quality

06:48.760 --> 06:56.680
reports. Our team also builds validators that evaluate the quality of GTFS and GBFS data and

06:56.680 --> 07:05.080
provide an official way to measure quality. In some places, using these tools is required.

07:07.560 --> 07:13.240
Here's an example of what a page looks like. On the left, there's the metadata:

07:14.520 --> 07:19.800
license, contact information and the features that are included. Is it only timetables, or does it

07:19.800 --> 07:28.280
also include accessibility information, fares, etc.? It links to documentation. We just released visualizations

07:29.320 --> 07:35.960
last year. You can zoom in and out on these and see what the data looks like

07:35.960 --> 07:41.480
in practice. The idea is to help people understand the impact of having data

07:41.480 --> 07:48.200
problems. And we have the compliance reports according to the spec, not the law, in that case.

07:50.600 --> 07:57.160
And the value of this product is making this data available to many users.

07:59.640 --> 08:05.480
So we have cities that can use it for all of their planning, when they need to model different scenarios.

08:05.480 --> 08:10.520
We collaborate with the national access points. We exchange data, and some of them

08:10.520 --> 08:18.760
have integrated our tooling into their platforms. Of course, researchers too. Transport operators

08:18.760 --> 08:25.560
can use it to check the quality of their feed, and maybe, longer term, to see where their data is reused.

08:25.560 --> 08:30.120
For example, something we could add is which rider applications use it.

08:33.400 --> 08:40.680
So it has the reports from the official validators that we maintain. We have one for

08:41.560 --> 08:45.960
GTFS Schedule. We also have one for GTFS Realtime, but it's not yet integrated.

08:46.040 --> 08:55.880
We have one for GBFS. It has visualizations to troubleshoot problems. You can see the location

08:55.880 --> 09:06.040
of the stops and the shapes. There's a lot of metadata, and historical data is also available,

09:06.040 --> 09:10.280
or rather historical versions of the data, for GTFS Schedule only.

09:10.520 --> 09:20.520
And they are widely used. We have a lot of contributors. We have our team, but we are

09:20.520 --> 09:26.280
an open source project, so there are also a lot of people contributing. We are lucky to have

09:26.280 --> 09:35.240
a lot of them. But we struggle with keeping this platform up to date. These are some of our

09:35.240 --> 09:41.320
issues. I'm sure that any of you building products on open data are facing similar ones.

09:41.320 --> 09:45.800
Feeds become inactive, and we don't know if there is a replacement. We don't know what the

09:45.800 --> 09:53.640
license is, or the license is unclear. Incomplete data sources: you mentioned aggregators. Sometimes

09:53.640 --> 09:59.480
you have a bad aggregator, or incomplete small local sources. We don't know if the

09:59.480 --> 10:07.560
source is official. Is it maintained? Or is it an awesome contributor who did it for a hackathon?

10:07.560 --> 10:13.560
But then they don't do anything with it anymore. There's no coverage in certain areas, certain countries.

10:13.560 --> 10:22.760
We have no feed at all there. So I'm lucky to present this in a room full of developers.

10:23.400 --> 10:31.080
And so I couldn't help but ask you to support this project. I imagine you come from many

10:31.080 --> 10:36.680
different places in Europe, so I encourage you to go to the platform and check whether there is data

10:36.680 --> 10:43.960
from your city or your country. Check the quality. Bring it to the transport authority

10:43.960 --> 10:51.400
if they don't know about it. Add a feed if you know of one that is missing,

10:51.480 --> 10:59.160
or replace an outdated source. It's all on GitHub: open issues, whether for missing data

10:59.800 --> 11:04.920
or problems with quality. It also helps us go to the official source and say people are complaining

11:06.360 --> 11:12.680
about your source, and it's not only us saying it. Also open GitHub issues with your ideas

11:12.680 --> 11:20.360
on how to improve the platform: new features and functionality that would make it useful for you.

11:22.360 --> 11:28.120
And for the time that's left, I encourage those of you who have

11:28.120 --> 11:34.280
your computers to go ahead and open it. I'm going to show a little demo, but go ahead. I won't be mad if you go on your

11:34.280 --> 11:44.040
laptops or phones. Open it, browse it, and I will show a little demo. I think it's my last

11:44.040 --> 11:56.600
slide. How many minutes? 5 minutes left. Here's what it looks like. I'll make it a bit bigger.

11:57.400 --> 12:16.600
And I'll type TMB in Barcelona. So this is what the page looks like. We know it's an official feed,

12:16.600 --> 12:24.360
so it means they have confirmed that the source is maintained by them. We can download the GTFS.

12:24.360 --> 12:33.240
We can open the quality report with issues in the data.

12:36.840 --> 12:42.440
We have the service date range. Some of the things we are thinking about doing include, for example,

12:42.440 --> 12:49.640
sending notifications when the feed is about to expire, because having a feed expire without

12:49.640 --> 12:56.760
a replacement is actually a major problem for us. Then there's which features are included. So here we

12:56.760 --> 13:02.040
can see they have, for example, wheelchair accessibility. We can click on it and get to the documentation.
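
NOTE The feed-expiry notifications mentioned above are only an idea under discussion. This is a minimal Python sketch of how such a check could work, assuming a feed's service range ends at the latest end_date in GTFS calendar.txt (real feeds can also extend service through calendar_dates.txt or declare feed_end_date in feed_info.txt, which this sketch ignores). The function names and the 30-day threshold are hypothetical, not part of the Mobility Database.
```python
import csv
import io
from datetime import date, datetime
def parse_gtfs_date(value: str) -> date:
    # GTFS dates are written as YYYYMMDD, e.g. "20250331"
    return datetime.strptime(value.strip(), "%Y%m%d").date()
def service_end_date(calendar_csv: str) -> date:
    # Assumption: the service range ends at the latest end_date in calendar.txt
    # (calendar_dates.txt and feed_info.txt are ignored in this sketch)
    rows = csv.DictReader(io.StringIO(calendar_csv))
    return max(parse_gtfs_date(row["end_date"]) for row in rows)
def days_until_expiry(calendar_csv: str, today: date) -> int:
    # Negative values mean the feed has already expired
    return (service_end_date(calendar_csv) - today).days
def should_notify(calendar_csv: str, today: date, threshold_days: int = 30) -> bool:
    # Hypothetical notification rule: warn when service ends within threshold_days
    return days_until_expiry(calendar_csv, today) <= threshold_days
```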

13:02.840 --> 13:09.560
Another thing we're thinking about is showing which features could be added. If a feed

13:09.560 --> 13:14.520
only has minimal information, we can say: hey, you could add accessibility, here's the documentation.
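
NOTE The feature detection and "you could add this" suggestions described above could work roughly like this minimal Python sketch. It is an assumption, not the Mobility Database's or the validator's actual logic: it uses only four hypothetical rules (shapes.txt, wheelchair_boarding in stops.txt, fare_attributes.txt, pathways.txt), takes each GTFS file as a string of CSV text, and the function names are made up for illustration.
```python
import csv
import io
from typing import Dict, List, Optional, Set
# Hypothetical, simplified rules: feature name mapped to (file, column or None).
# None means the file only needs at least one data row.
FEATURE_RULES = {
    "Shapes": ("shapes.txt", None),
    "Wheelchair accessibility": ("stops.txt", "wheelchair_boarding"),
    "Fares": ("fare_attributes.txt", None),
    "Pathways": ("pathways.txt", None),
}
def has_feature(csv_text: str, column: Optional[str]) -> bool:
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return False
    if column is None:
        return True
    # Assumption based on wheelchair_boarding: empty or 0 means "no information",
    # so only other values count; a missing column or short row yields None here
    return any((row.get(column) or "").strip() not in ("", "0") for row in rows)
def detect_features(files: Dict[str, str]) -> Set[str]:
    # files maps a GTFS file name (e.g. "stops.txt") to its CSV contents
    found = set()
    for feature, (filename, column) in FEATURE_RULES.items():
        text = files.get(filename)
        if text is not None and has_feature(text, column):
            found.add(feature)
    return found
def suggest_features(files: Dict[str, str]) -> List[str]:
    # Features the feed does not include yet, e.g. to point producers at the docs
    return sorted(set(FEATURE_RULES) - detect_features(files))
```
Keeping the rules in one shared table is what lets producers and consumers apply the same definition of each feature.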

13:14.680 --> 13:21.400
And this is the big new thing, the visualization. So I'll open it up.

13:23.400 --> 13:30.840
I think we'll filter to just the subway and metro. This shows the shapes.

13:33.000 --> 13:37.560
So, the path the rider sees in a trip planner, and the location of the stops.

13:37.560 --> 13:48.520
And that's it. I want to keep time for questions. How much time do you have?

13:51.560 --> 14:01.400
There should be. Let's check. Maybe if you type STIB, you'll find it.

14:01.400 --> 14:08.040
Normally, if you type Brussels, you should, you should... why am I...?

14:12.360 --> 14:25.880
Yeah. Here's the feed for Brussels, or for the STIB. Any questions? Yes.

14:26.840 --> 14:33.800
I'm wondering, do you have an example of an unofficial feed that is, like, trusted by the community?

14:36.040 --> 14:49.000
Yes. The feeds from, well, we've removed them now, but we had some from GTFS.BE. It's a platform in Belgium.

14:49.560 --> 15:01.400
It was making the Belgian feeds available without a login, because otherwise you needed to sign a PDF document.

15:02.440 --> 15:06.040
But that's no longer the case, so we removed them. But that's an example.

15:19.960 --> 15:28.920
Me, the same thing, sorry. For example, a couple of people who use wheelchairs share information about transit, and sometimes they say,

15:28.920 --> 15:35.080
oh, this is listed as wheelchair accessible, and they go there and check. So do you look at, sort of,

15:36.280 --> 15:44.600
crowdsourced information, or just the feeds from the transit agency? We take both. Our philosophy is to add

15:44.680 --> 15:52.760
all of the data sources, whether crowdsourced, official, aggregated or disaggregated,

15:52.760 --> 16:02.920
and just document them so people can choose. And then, um, with regard to being able to compare

16:02.920 --> 16:11.480
feeds with each other, our validator measures what is included in the data. So when we say the

16:11.480 --> 16:19.960
data has wheelchair accessibility, it will mean the same thing in every country. If the producer says

16:19.960 --> 16:25.480
it, or rather if the producers and the consumers use the same tool to measure it, then they have the same

16:25.480 --> 16:32.040
definition. Okay. Yes. And then you. Yes. And then you. Yes.

16:32.120 --> 16:40.680
So, are you actually consuming the data? Well, we are

16:40.680 --> 16:46.680
pointing to the feeds. We consume them to build this platform, but we don't build any additional services.

16:46.680 --> 16:55.480
We consume them to display them, but we don't, um, we don't have an API on the data.

16:55.480 --> 17:00.120
No, we do have an API, so that's not completely accurate. And we have stable IDs as well.

17:00.200 --> 17:09.560
So, for example, in France, we rely on the national access point,

17:09.560 --> 17:17.240
transport.data.gouv.fr. And so, with Île-de-France Mobilités, they have a stable ID. So,

17:17.240 --> 17:21.720
if you look, they are in touch with Île-de-France Mobilités, they are the point of contact. And then we

17:23.480 --> 17:28.120
provide a URL based on the transport.data.gouv.fr feed.

17:28.120 --> 17:32.120
In that case, how do you deal with licensing? I mean, do you really

17:32.120 --> 17:40.760
get info about how the data is licensed? Yeah. Yeah, exactly. So, whatever opinions people might

17:40.760 --> 17:46.520
have on the weaknesses in France, well, can I ask five questions, or...? Also, I want you to be able

17:46.520 --> 17:53.240
to raise your hand the whole time. But, um, GTFS and NeTEx: it's obvious from both

17:53.240 --> 17:58.680
presentations, the EU guy's included, that NeTEx is the way things will be going in the EU.

17:58.680 --> 18:04.520
Yeah. Does your tool have any support for NeTEx, or are you planning to add it? Because having

18:04.520 --> 18:09.560
something like this that checks the quality of NeTEx would be very useful.

18:09.560 --> 18:18.200
Yes, some people have asked us, and we've considered it. But we govern GTFS and GBFS. And so,

18:18.680 --> 18:23.080
from a product perspective, it could make sense, but we also use the Mobility Database,

18:23.080 --> 18:29.800
more selfishly, to analyze the data and then make changes to the standards based on what we see,

18:29.800 --> 18:37.320
which we cannot really do with NeTEx. So it's possible, but currently it's not really

18:37.320 --> 18:43.880
a short-term plan, no. Yeah, we did not do a project there. It would be a waste for that purpose.

18:43.880 --> 18:54.440
Oh, cool. But we could cooperate as much as we can. Yeah. Okay. Sorry.

