WEBVTT

00:00.000 --> 00:11.000
Okay, awesome. Good afternoon, everyone. I'm glad everyone has made it to the

00:11.000 --> 00:18.760
3 o'clock at FOSDEM. My name is Vasim Musa. I'm the CTO of Open Cities Lab, a non-profit

00:18.760 --> 00:27.000
organization working out of South Africa, predominantly in the public sector, and it's

00:27.000 --> 00:33.880
a civic tech organization, generally working with cities around Africa and specifically

00:33.880 --> 00:39.360
in South Africa. So today I'll talk a little bit about open source approaches to a secure

00:39.360 --> 00:48.600
data exchange in South Africa's DPI. South Africa is going through a large digital transformation

00:48.600 --> 00:54.680
as many other African countries and South American countries as well. So I'm hoping to

00:54.680 --> 01:04.760
share our last 16 months of learning to build DPI, and yeah, the journey so far.

01:04.760 --> 01:11.240
So firstly, Open Cities Lab's vision: a city where every resident, regardless of income, identity,

01:11.240 --> 01:17.480
or location can access essential services they need with dignity and ease. And it's very

01:17.480 --> 01:24.080
important that the beginning part is a city, because generally this is where citizens interact

01:24.080 --> 01:28.520
with local government. And I think just two sessions ago Nicholas was mentioning

01:28.520 --> 01:36.280
local government initiatives for open source. And DPI, in many cases, is working

01:36.280 --> 01:41.240
at a very high level, generally at a national level, so to see service delivery impact

01:41.240 --> 01:49.720
with DPI is hard to track, but it's important. So what is DPI? It's a set of foundational

01:49.800 --> 01:56.440
digital systems that forms the backbone of modern societies. DPI enables secure and seamless

01:56.440 --> 02:02.880
interactions between people, businesses and governments. This is from UNDP. How I like to

02:02.880 --> 02:09.120
think of it is very much like roads. So just how roads are physical infrastructure,

02:09.120 --> 02:15.400
built to provide access to other infrastructure. So we use roads to go to hospitals, police

02:15.480 --> 02:21.240
stations, private sector, businesses, and so on. But in itself, it's also an infrastructure

02:21.240 --> 02:29.240
that needs to be built, secured, and able to transport various people throughout. On the next

02:29.240 --> 02:33.960
two slides, I've got a bit of personas about the current state of affairs in South Africa

02:33.960 --> 02:42.400
and how people access digital services. So we got the example here of Tandy, where, you know,

02:42.480 --> 02:46.720
maybe in the EU, it's a little bit different, but speaking to some people, I know there

02:46.720 --> 02:53.280
are some similarities. So incomplete information about where to access services. Oftentimes

02:53.280 --> 02:59.920
in South Africa, we have to visit physical locations to apply for certain things. Tandy

02:59.920 --> 03:05.520
must submit physical copies for verification. So, you know, just interacting with government

03:05.520 --> 03:12.160
needs physical copies of certain things. Oftentimes, they don't meet our needs in terms of

03:12.240 --> 03:18.240
services, and there are large amounts of delays, long queues, and limited payment options.

03:18.240 --> 03:23.520
So that's the current state of affairs. This was pre-digital transformation. It's been just

03:23.520 --> 03:28.400
over a year and there's a lot of national initiatives kind of building alongside to help

03:28.400 --> 03:35.840
speed this along. South Africa in general is highly digitized from the IT perspective. So

03:35.840 --> 03:40.560
there are a lot of great systems built for government. However, as you'll see, a little bit later

03:40.560 --> 03:47.840
in the talk, most of it is isolated in silos within the departments, and, you know, speaking

03:47.840 --> 03:54.320
and listening to open forum, summer yesterday, and many of the other EU cases, it seems like

03:54.320 --> 04:01.680
it's not just an African problem. It's across the world. So there's three core components to

04:01.680 --> 04:08.160
DPI. You have digital identity, digital payments, and the backbone of all of it is the data

04:08.160 --> 04:14.080
exchange. These then power all your digital services. So a very simple example, one could think

04:14.080 --> 04:20.400
about is renewing the passport. To renew your passport, you have to provide an ID. That

04:20.400 --> 04:27.040
check has to go against some interdepartmental verification. In South Africa, we have home affairs,

04:27.040 --> 04:31.520
department of home affairs that does the check, and that generally goes through the data exchange.

04:31.840 --> 04:37.520
Then some sort of payment is needed to then, you know, renew your passport, and you get

04:39.920 --> 04:45.920
the renewal process done. A key part here is not only digital IDs, but we were looking at

04:45.920 --> 04:52.960
verifiable credentials holistically. So we were thinking about how you would have a health card,

04:52.960 --> 04:59.840
a digital driving license, and not just digital ID. So that's just one example in this particular case.

05:00.000 --> 05:05.280
The focus of my talk is going to be on the exchange layer. It's something not seen by us

05:05.280 --> 05:12.000
generally as citizens. It's usually interdepartmental accessing of data, and extremely important.

05:12.000 --> 05:19.840
In fact, I think the most critical layer for DPI. But firstly, what is the data exchange component

05:19.840 --> 05:26.240
trying to solve? In the South African context, data currently exists in isolated silos, and this

05:26.320 --> 05:33.520
really hampers streamlined research, evidence-based policy making, government operations, and ultimately

05:33.520 --> 05:41.120
at local government service delivery. So the data exchange is critical. In fact, I would say

05:41.120 --> 05:47.120
it's the most critical part of DPI. It connects governments, scales nationally, and enables services

05:47.120 --> 05:53.120
for millions. And still, at conferences, I think even the last two or three times, I've got a

05:53.120 --> 05:59.520
few people just saying, oh, you're building a middleware, or an API gateway. But I'm hoping to

05:59.520 --> 06:07.280
give a bit more context in this talk. So yeah, over the last 16 months, we did a bit of research

06:07.280 --> 06:11.840
in the first six months around the data exchanges around the world, and what technology we

06:11.840 --> 06:17.520
should use, and we even started piloting in the last six or seven months with government agencies

06:17.520 --> 06:25.440
in South Africa. The challenge was set out for us back in 2023. So we were approached by the

06:25.440 --> 06:31.600
World Bank to do a feasibility study on what the data exchange for an African country would need

06:31.600 --> 06:36.960
in this particular case, South Africa. This was done alongside the leading department,

06:36.960 --> 06:42.080
national treasury in South Africa. And national treasury is like, I don't know what's the

06:42.080 --> 06:47.920
ministry equivalent here, but they handle our finance, they do our budgets, and they set the

06:47.920 --> 06:55.600
economic policy of the country. And what we needed to do was make use of integrated

06:55.600 --> 07:00.880
data sourced from various sources for different purposes. So exchange of data in real-time,

07:00.880 --> 07:06.160
bulk data transfer is still, you know, one of the main ways the government agencies move

07:06.160 --> 07:11.600
data is almost in large flat files and they're moving data. We even wanted to look at

07:11.600 --> 07:17.760
open data access and ultimately how all of that leads to evidence-based decision making.

07:19.520 --> 07:24.160
What we did in the design journey, so the initial four months, we did a needs assessment,

07:24.160 --> 07:29.760
meeting the government departments, the major ones that were identified to share data and exchange

07:30.720 --> 07:36.160
data. We did a good practice report and research. I link to that at the end of the

07:37.280 --> 07:42.560
presentation where you'll be able to see it. A high level architecture of what needs to be done

07:42.560 --> 07:49.440
to do a data exchange at scale for a country. And what we did not only in terms of research,

07:49.440 --> 07:55.600
but we actually prototyped software in government through use cases. So it wasn't theoretical,

07:55.600 --> 07:59.600
we actually did some real work and we used open source to do that.

08:02.080 --> 08:09.200
Okay, so initially it was called the integrated data lake project and we were given the brief

08:09.200 --> 08:15.360
around a data lake. It was a good starting point. So what is a data lake? Just an easy

08:17.600 --> 08:22.720
technical solution for structured, unstructured and even semi-structured data. Just a place to

08:22.720 --> 08:30.480
move a lot of data into, which generally to some extent made a bit of sense for a starting point.

08:31.600 --> 08:40.160
However, we'll soon see it's very limited and it wasn't effective in this particular exchange

08:40.160 --> 08:47.440
for an entire country. So the data lake started with three initial architectures. We were looking

08:47.440 --> 08:55.200
at centralized, federated, and hybrid (federated-centralized) systems. In the centralized one, we would move all the data into

08:55.200 --> 09:01.040
somewhere similar to where treasury is storing the data, and do a lot of the research,

09:01.040 --> 09:06.000
querying of the data in one place. This is technically challenging. There's a lot of data,

09:06.000 --> 09:11.600
you know, hundreds of terabytes of data actually being moved in national government. And not only

09:11.600 --> 09:16.320
would it be technically challenging, even from a mandate and policy perspective, you know, are they

09:16.320 --> 09:20.880
able to actually hold all that data and house it? And in South Africa, we have the personal

09:22.880 --> 09:30.320
protection act for identity called POPIA. And that would kind of break some of the rules around

09:30.320 --> 09:34.960
that, especially with like your revenue service, sharing data with certain other services.

09:36.400 --> 09:42.800
You know, it's quite an extensive legal landscape to navigate from a data exchange.

09:43.760 --> 09:49.360
Then we looked at federated. So can we put a data lake solution or data exchange solution

09:49.360 --> 09:55.840
in each department? And this made a lot of sense because kind of that's how our government operates

09:55.840 --> 10:03.120
where departments have various IT solutions and various data maturity. So some of our

10:03.120 --> 10:09.520
systems, like for revenue, are actually quite advanced. And then there are other departments

10:09.520 --> 10:16.960
which are much further behind in terms of IT and infrastructure. So with the mixture of both,

10:16.960 --> 10:22.560
on the good end really good high quality systems ready for integration and on the other end

10:22.560 --> 10:29.120
almost nothing available under-capacitated, we said a hybrid approach is more likely needed in the

10:29.120 --> 10:34.800
short term with the federated approach in the long term. So you would need a central system where

10:34.880 --> 10:40.320
those departments or ministries that didn't have capacity can work with and operate in.

10:40.320 --> 10:46.000
And then those that can federate off the bat, that have the IT systems, can start by federating.

10:48.320 --> 10:53.280
So there are advantages of the federated and hybrid model: enhanced data control. So

10:53.280 --> 10:57.680
the departments still maintain ownership of all the data. It's only access to the data,

10:58.480 --> 11:02.880
improved scalability, no single source of failure as well.

11:03.200 --> 11:08.640
Some of the challenges and something we're still grappling with today is

11:09.600 --> 11:14.880
even in federated, it's hard to ensure all departments adhere to a certain standard of exchanging

11:14.880 --> 11:23.040
data. So it is a tough challenge and there are some solutions going around, but it's still there.

11:23.040 --> 11:27.440
Also the technical complexities you're dealing with different ministries, different departments,

11:27.440 --> 11:33.360
with various authorities, and each one coming in with different levels of understanding of

11:33.360 --> 11:39.600
what data sharing, data exchange, actually means. And it's almost like a quid pro quo.

11:39.600 --> 11:44.720
Like you give me some data, I'll maybe give you some if it's valuable to me.

11:45.920 --> 11:51.040
So not only from a mandate perspective, but you actually have to kind of motivate why you'd want

11:51.040 --> 11:55.840
access to the data, even from a department sharing data with another department's perspective.

11:58.400 --> 12:03.600
We then looked at case studies. So we said, okay, as a technical solution, the data lake makes a lot

12:03.600 --> 12:10.240
of sense, but it's not encompassing and not broad enough. So we looked at a number of other countries

12:10.240 --> 12:16.560
that already implemented data exchanges and because we've been working in the open source space

12:16.560 --> 12:23.760
for the last decade, we were looking for open source solutions. Very high on the list came Rwanda's

12:23.760 --> 12:32.400
Irembo platform, India Stack, Brazil and UK. Brazil and UK were more around digitizing

12:33.120 --> 12:38.320
government information and not about the data exchange itself. They did have an open data platform

12:38.320 --> 12:45.840
which was particularly useful. India Stack was actually very potent, but not all of it was open

12:45.840 --> 12:50.800
and it was extremely hard to find good documentation on. So something we really struggled with

12:50.800 --> 12:58.560
off the bat, and we even engaged with them to a certain extent and just found it didn't fit the solution

12:58.560 --> 13:05.840
entirely. Now coming on to the next one, and I think this might not come as a surprise because throughout

13:06.880 --> 13:13.280
today I've just heard this particular software mentioned a number of times. We looked at Estonia's

13:14.160 --> 13:20.480
X-Road. So just by definition it's a secure, decentralized data exchange platform, almost

13:20.480 --> 13:27.840
fits the spec too perfectly which enables interoperability across public and private sector.

13:29.120 --> 13:36.720
And one of the nice things about X-Road is it's centrally managed but allows

13:37.600 --> 13:43.360
federated exchange of data. So the registry sits centrally, but departments can share

13:43.360 --> 13:49.440
data directly. So once you become a member as I'll explain in a bit you can share data directly

13:49.440 --> 13:54.400
with each other without the central operator needing to come into effect and that's particularly

13:54.400 --> 14:02.640
powerful in my opinion and fits the use case exactly. Now when motivating for X-Road we have to look

14:02.640 --> 14:08.080
at the underlying open source software. So firstly the license what does it provide from a

14:08.080 --> 14:15.600
feature perspective, and secondly what is the ecosystem for X-Road? So an X-Road ecosystem refers to

14:15.600 --> 14:22.560
one X-Road instance where members are part of the registry. And this is very important because

14:23.280 --> 14:29.680
just by setting up X-Road you don't automatically have a data exchange solution. You need members

14:29.680 --> 14:34.480
to come on. These departments need to sign legal agreements,

14:34.480 --> 14:41.280
NDAs, or MOUs, and we had to do a lot of legal work which I won't touch on in this particular

14:41.280 --> 14:48.640
presentation but there was a large legal undertaking to understand what the mandate of each department

14:48.640 --> 14:54.720
was and how they could access data and how that could be enforced through security protocols

14:54.800 --> 15:02.240
in the X-Road ecosystem. Now the X-Road operator controls who is allowed to join the community

15:02.240 --> 15:08.000
and defines regulation and practices in the ecosystem in this particular case this was treasury.

15:10.400 --> 15:15.360
So why we chose X-Road? The software is actively maintained with a strong community; we spoke to the

15:15.440 --> 15:22.080
NIIS Foundation. Estonia, Iceland and Finland have all been using it in production for over a decade,

15:23.200 --> 15:31.920
Estonia for over two decades. So it really had the data exchange in production for a number of years.

15:31.920 --> 15:40.880
Over and above that we were able to engage Petteri and his team very actively; he is the CTO of the

15:40.880 --> 15:49.280
NIIS Foundation. I think it was earlier today that someone mentioned the model of member

15:49.280 --> 15:55.200
organizations from countries coming together to create a sustainable open source community and

15:55.200 --> 16:01.600
NIIS Foundation is one of them, where Finland, Iceland and Estonia have all come together to

16:01.600 --> 16:10.960
back X-Road as the software being built. It also complies with all the relevant security standards,

16:10.960 --> 16:15.440
its interoperability is a design cornerstone and it has extensive documentation.

16:16.240 --> 16:18.880
I think one of the best I've seen from an open source perspective.

16:22.480 --> 16:30.720
This is a quote I found on LinkedIn recently coming shortly and I think it's one of the challenges

16:31.360 --> 16:40.000
I've seen with open source and it's from Arturo Morante. It says too often open source solutions

16:40.000 --> 16:46.800
are expected to be implemented by an in-house team which in practice lacks the mandate or resources to

16:46.800 --> 16:53.120
sustain them. So moving from I have a proprietary solution and I am the only one who can implement it

16:53.120 --> 16:59.040
to I have an open source solution and you are the only one who can implement it is progress but

16:59.120 --> 17:04.640
it's still not enough. Not all the time departments can maintain this infrastructure themselves,

17:04.640 --> 17:10.720
know how to operate it, and there need to be implementation partners who know the local context

17:10.720 --> 17:16.880
to come in and assist. So I think this is really nice, and from what I understand at FOSDEM,

17:16.880 --> 17:22.000
just looking around, it's a big thing. Not only how we build open source communities

17:22.080 --> 17:29.440
but how we implement the open source solutions. So the bottom line: X-Road creates a trusted

17:29.440 --> 17:37.120
transport network. Organizations maintain their own secure entry points. Very similar to train

17:37.120 --> 17:43.120
stations, you know exactly who's coming in and going out. Data travels in standardized

17:43.120 --> 17:49.120
secure containers. A central authority maintains order without controlling the data. So as you'll

17:49.440 --> 17:55.200
see in a bit, all the central server does is have a registry of the members, and that's

17:55.200 --> 18:02.560
enforced through public key infrastructure, and everyone can reach everyone else using the same system.

18:02.560 --> 18:08.400
So you plug in an API once, and it's available to every single other member organization, or let me

18:08.400 --> 18:13.840
rather say it's visible. It's not accessible it's visible to every other member organization.

18:14.160 --> 18:24.160
Technical architecture deep dive and what were our final recommendations. So we said for sure

18:24.160 --> 18:30.000
federated data exchange for a country makes a lot of sense, especially in South Africa's context.

18:31.520 --> 18:36.080
It was the way, both from a mandate and a technical perspective, to make sure it's federated.

18:36.080 --> 18:44.960
And something really nice about that is you then maintain data sovereignty so each department

18:44.960 --> 18:52.400
still holds all the data, and you are ensuring that the cross-government data exchange

18:52.400 --> 18:58.800
is actually just secure. Something about the South African government is most of the IT infrastructure

18:58.800 --> 19:06.480
is 90% on-prem. So you have very, very few systems that are actually in the cloud.

19:07.680 --> 19:12.640
This is maybe both a positive and a negative to some extent because it's hard to scale by

19:12.640 --> 19:19.760
procuring servers in government, especially because you need budget and capacity and so on.

19:20.640 --> 19:28.560
Yeah, so the components mainly included data owner and data consumer layers, which I'll show

19:28.560 --> 19:34.800
in the technical architecture, cross-cutting standards and policies for compliance and legislative

19:34.800 --> 19:40.640
requirements. We didn't touch on that, but there was a large undertaking from the lawyers' perspective

19:40.640 --> 19:46.640
to make sure that even something from a technical perspective we were suggesting was allowed and

19:46.640 --> 19:57.040
was going to be put into legislation. Yeah and let's dive into it. So on the left-hand side and

19:57.040 --> 20:02.000
the right-hand side we have the data provider internal environment. So in the case of renewing

20:02.000 --> 20:10.000
a passport this could be like home affairs. This is already existing environments where for example

20:10.000 --> 20:16.160
department of home affairs has servers ready to be used. However they would in our case be

20:16.160 --> 20:24.720
missing the API layer to share data with external entities. And you've got

20:24.720 --> 20:31.440
the data consumer environment on the other side. This could be like your social security agencies,

20:31.440 --> 20:36.960
so those giving out grants who want to check against your population registry and that's an

20:36.960 --> 20:44.720
example of interdepartmental sharing of data and this is where X-Road kicks in. X-Road has

20:44.800 --> 20:51.200
a central server, and this has the memberships of all those in the ecosystem, so home affairs

20:51.200 --> 20:57.040
would be onboarded onto the central server all your other departments would be a member as well.

20:57.840 --> 21:04.160
And the entry point, or the encapsulation for security, in X-Road is very aptly called the

21:04.160 --> 21:10.560
security server. It's the entry point and exit point for all data packages going in and out of the

21:10.560 --> 21:18.160
system. This is extremely important because this is your train station, your entry, your barrier.

21:18.160 --> 21:23.280
So whatever is coming in through your, you know exactly who's coming in, who's got access

21:23.280 --> 21:30.960
and who should be accessing what? So down to very granular and fine-grained policies for your

21:30.960 --> 21:38.640
APIs. So for example on the population registry if you wanted to allow only certain access by

21:38.720 --> 21:43.920
certain government agencies you can do that through the security server. It does mean that has to be

21:43.920 --> 21:50.640
high availability. It doesn't necessarily mean it needs to be very large servers. So in the case

21:50.640 --> 21:57.920
of Estonia we saw they had I think only about three or four servers on the population registry

21:57.920 --> 22:05.120
feeding almost the entire country, which is crazy, and that's because it's not doing any querying of

22:05.120 --> 22:11.920
some sort; it's just the gateway access point to go through and get the data that's happening over here.

22:16.320 --> 22:24.000
Data cataloging part is a cool feature of X-Road. So every member in the ecosystem can find out

22:24.000 --> 22:31.200
what is happening in the security servers in the network. So if a department's brought on board

22:31.280 --> 22:36.560
they can find out every API that's been made available by government. They can't access it.

22:36.560 --> 22:43.520
They still have to go through all your normal membership or MOUs, memorandums of understanding,

22:43.520 --> 22:53.920
go through all the legal, go through all the technical onboarding to become a member and get access to that

22:53.920 --> 23:00.640
particular API. But the cataloging part is really cool because it at least allows transparency

23:00.640 --> 23:06.640
to show what's out there from an organization. So in the example of you bringing on department

23:06.640 --> 23:12.320
of transportation, if they only have one API for driver's licenses but they're missing cars,

23:12.320 --> 23:18.160
they're missing all your other important key details. You know that's something very transparent

23:18.160 --> 23:28.560
in the network. So the secure part comes in here. So as part of the entire ecosystem

23:29.200 --> 23:34.400
trust services or the public key infrastructure is extremely important. This could either be

23:34.400 --> 23:41.680
third party or built by government and what it means is you need a certification authority

23:41.680 --> 23:47.040
and you need a timestamping authority. A certification authority assigns TLS certificates to

23:47.040 --> 23:53.280
your security servers, and that's checked against the registry in the central server, and that

23:53.280 --> 24:02.960
allows you to become a member. So they will assign a TLS certificate to your security server and

24:02.960 --> 24:08.000
that will be your authentication certificate. You get a second certificate for signing and that's

24:08.000 --> 24:14.320
for all outgoing and incoming requests of data. The timestamping authority is just to make sure

24:14.320 --> 24:21.040
something or every exchange of data is packaged and stamped you know at a point in time. So it's

24:21.040 --> 24:28.640
not on your server time or you know as an example I think Estonia mentioned they were requested

24:28.640 --> 24:37.760
from government in a court case to check against a transaction in the population registry

24:37.760 --> 24:43.920
and they had the timestamp of the exact time it went out and which department created that

24:45.200 --> 24:50.160
and it was for some particular lawsuit. So it's very important that these two audit controls are in

24:50.160 --> 24:54.240
place and like I mentioned it doesn't necessarily have to be built by government it would be

24:54.240 --> 25:02.480
really cool. In our case one of the other departments is assisting with this. In Europe's case there

25:02.480 --> 25:11.920
might already be some PKI available at a national government level. Okay so I think that's it.

25:12.000 --> 25:22.240
I just wanted to end on one last note for FOSDEM: thanks to all the EU colleagues

25:22.240 --> 25:30.080
and the amazing open source community we have out here. I think we're leveraging it; it's a global

25:30.080 --> 25:34.160
community, not just an EU approach, and I'm glad to be part of it.

