WEBVTT

00:48.000 --> 00:50.000
Green

00:50.000 --> 00:52.000
Green is good

00:52.000 --> 00:54.000
So rusty

00:54.000 --> 00:59.000
Thank you

00:59.000 --> 01:02.000
That's completely unnecessary, but we'll take it

01:02.000 --> 01:09.000
So yeah, I was just saying we're sorry about the just face stuff. We're a little bit rusty. We haven't spoken since about 2018 when we did the package management

01:09.000 --> 01:13.000
Devroom which I think you've been smashed into doing next year

01:13.000 --> 01:15.000
Back in the day, everybody's coming back to FOSDEM

01:15.000 --> 01:18.000
I just dumped you in it, you can punch me later

01:19.000 --> 01:20.000
So thank you all for coming

01:20.000 --> 01:24.000
I appreciate it. It's a really long event and this is the last session of a very long event

01:24.000 --> 01:27.000
And I'm standing between you and, most likely, a beer

01:27.000 --> 01:30.000
But yeah, thank you for listening

01:30.000 --> 01:34.000
And thank you all. So, for those of you who don't speak English as your first language

01:34.000 --> 01:36.000
I have a tendency to talk too quickly

01:36.000 --> 01:38.000
This is the last session

01:38.000 --> 01:42.000
It's likely that we'll overrun, and there'll be Q&A at the end, for which we have plenty of time

01:42.000 --> 01:47.000
So if I'm talking too quickly, just put your hand up and I'll start talking a little less quickly

01:48.000 --> 01:52.000
But yeah, thank you for coming and seeing us

01:52.000 --> 01:54.000
And the other

01:54.000 --> 01:55.000
Oh god

01:55.000 --> 01:56.000
Really

01:56.000 --> 01:58.000
Okay, so

01:58.000 --> 02:02.000
I did write a bit in this but it's a bit silly

02:02.000 --> 02:05.000
It says I kind of feel like Elton John closing down Glastonbury

02:05.000 --> 02:09.000
Which is basically just a way of introducing who we are and where we're from

02:09.000 --> 02:11.000
So Andrew

02:11.000 --> 02:15.000
Lives about ten minutes away from Glastonbury, the world's biggest festival

02:15.000 --> 02:17.000
And I live ten minutes away from nowhere

02:17.000 --> 02:19.000
Because I'm in the middle of nowhere in structure

02:19.000 --> 02:23.000
We've been working on open source for about ten years

02:23.000 --> 02:26.000
It all started with Heartbleed

02:26.000 --> 02:33.000
And Meredith Whittaker and Ben Laurie, who pulled me into some conversations about where the next Heartbleed comes from

02:33.000 --> 02:39.000
And I met Andrew, I think when I was giving a talk, and you were building a system called

02:39.000 --> 02:41.000
At the time was it called Libraries.io? I think it was, wasn't it?

02:41.000 --> 02:42.000
Yeah

02:42.000 --> 02:47.000
That was about being able to point people at good open source projects to kind of contribute to

02:47.000 --> 02:50.000
Years later, you know, we've worked on very many projects

02:50.000 --> 02:54.000
Most recently working on Ecosyste.ms, which we'll talk a little bit about

02:54.000 --> 02:56.000
But it's not a sales pitch or anything

02:56.000 --> 02:59.000
We kind of have to mention it in order to get the talk done

02:59.000 --> 03:03.000
But yeah, we've been working for ten years together

03:03.000 --> 03:05.000
And we've got dogs

03:05.000 --> 03:10.000
So on the left is Leo and on the right is Mabel

03:10.000 --> 03:15.000
I've got one dog, you don't, you have, is that Luna or Felix?

03:15.000 --> 03:19.000
That's Felix, that's Blossom the wookiee

03:19.000 --> 03:22.000
That's Luna, the latest addition

03:22.000 --> 03:25.000
So I just have the one dog Andrew's got four

03:25.000 --> 03:30.000
So our talk is labelled

03:30.000 --> 03:33.000
I'm not afraid of a bit of clickbait, I'm very sorry

03:34.000 --> 03:36.000
But I will defend it if necessary

03:36.000 --> 03:42.000
But the honest fact is that we're treating this genuinely as a kind of research question

03:42.000 --> 03:46.000
So after saying, you know, and working in this space for ten years

03:46.000 --> 03:51.000
Are we any closer to saying that we are supporting our digital infrastructure?

03:51.000 --> 03:55.000
And we're going to treat this as kind of two hypotheses

03:55.000 --> 03:58.000
The first part is going to look at our opinion

03:58.000 --> 04:02.000
Which is that usage is the best determinant of software criticality

04:03.000 --> 04:05.000
Which we hope to convince you of, we might not

04:05.000 --> 04:08.000
But genuinely, you know, we're here to try to convince ourselves

04:08.000 --> 04:11.000
We've already kind of convinced ourselves, but that's that

04:11.000 --> 04:14.000
And then part two, if you assume that that is true

04:14.000 --> 04:17.000
We believe that most funding that we see

04:17.000 --> 04:19.000
And there's a very specific word in there

04:19.000 --> 04:22.000
Is being misdirected

04:22.000 --> 04:26.000
It is not being, it's not used to fund critical infrastructure

04:26.000 --> 04:28.000
It's going elsewhere

04:29.000 --> 04:32.000
So what methods do we have at the moment?

04:32.000 --> 04:34.000
This is looking at the first part

04:34.000 --> 04:39.000
So is usage the best determinant of critical kind of open source project

04:39.000 --> 04:44.000
What methods do we currently use for identifying critical infrastructure

04:44.000 --> 04:48.000
Both within an organisation and within the border landscape

04:48.000 --> 04:51.000
And we've come up with basically two methods that we see in the world

04:51.000 --> 04:53.000
Which we're characterising thusly

04:53.000 --> 04:57.000
The first is the divining rod method, in which one person

04:57.000 --> 05:00.000
An organisation, a company, or maybe an individual

05:00.000 --> 05:03.000
Basically decides what's critical

05:03.000 --> 05:05.000
Decides what's, you know, important in the world

05:05.000 --> 05:09.000
Just kind of looks, using the crossed sticks, and says that's that

05:09.000 --> 05:12.000
Or we do the same thing with a group of people in the organisation

05:12.000 --> 05:14.000
Which we call the Ouija board method

05:14.000 --> 05:16.000
In which you all place a hand on the Ouija board

05:16.000 --> 05:19.000
And one of us is pushing a little bit more than the other

05:20.000 --> 05:23.000
And we end up, yes, veering towards your own projects

05:23.000 --> 05:26.000
I'm not going to say the name that you want to say

05:28.000 --> 05:31.000
But yeah, I believe that we can do better

05:31.000 --> 05:34.000
And I believe in evidence-based methods

05:34.000 --> 05:39.000
Because, you know, ultimately we've come from a mathematical background in robotics and computer security

05:39.000 --> 05:44.000
The evidence based methods are going to be the way in which we can do this

05:44.000 --> 05:46.000
I'm going to make a point about that

05:46.000 --> 05:51.000
But what evidence based methods can we access currently and what do we see in the world?

05:51.000 --> 05:55.000
Well, we see things like popcon, the popularity

05:55.000 --> 05:59.000
Contest for Debian packages, that basically just takes a log

05:59.000 --> 06:03.000
Of who is downloading and using packages in the Debian ecosystem

06:03.000 --> 06:05.000
We've got something similar in home brew

06:05.000 --> 06:08.000
Where they show you all the downloads that they've got

06:08.000 --> 06:11.000
And Andrew once scared a room full of Rubyists

06:11.000 --> 06:12.000
Is that what they call them?

06:12.000 --> 06:18.000
In 2018 by doing the same in an npm package, or was it a Ruby package?

06:18.000 --> 06:22.000
I put Google Analytics in my JavaScript packages

06:22.000 --> 06:25.000
So that I could get real time install information

06:25.000 --> 06:29.000
And then just presented the real time analytics page

06:29.000 --> 06:31.000
During your conference

06:31.000 --> 06:33.000
Which scared the life out of everyone in there

06:33.000 --> 06:37.000
Because they didn't realize how close that you could watch all of the activity

06:37.000 --> 06:40.000
Because post-install scripts of

06:40.000 --> 06:43.000
JavaScript packages can do whatever they like

06:43.000 --> 06:45.000
And they were all in the one room

06:45.000 --> 06:46.000
They were all in the room

06:46.000 --> 06:48.000
Which was good fun

06:48.000 --> 06:51.000
So we've got a bit of a close up of the results there

06:51.000 --> 06:52.000
Thank you very much

06:52.000 --> 06:54.000
This is incredibly impractical

06:54.000 --> 06:57.000
There are other things that we can do to gather data

06:57.000 --> 07:00.000
So we have something that is effectively doing that

07:00.000 --> 07:02.000
But on a larger scale: Scarf

07:02.000 --> 07:05.000
Which is basically a gateway in front of your own download URLs

07:05.000 --> 07:07.000
So that you gain access to the information

07:07.000 --> 07:09.000
Taking ownership a little bit more

07:09.000 --> 07:12.000
Or we can do the Linux Foundation's approach

07:12.000 --> 07:15.000
Which is aggregating data from Fastly, Snyk, etc.

07:15.000 --> 07:18.000
That is representing proprietary usage

07:18.000 --> 07:20.000
In an aggregate manner

07:20.000 --> 07:22.000
And they can continue to do this

07:22.000 --> 07:24.000
The Linux Foundation in 2022

07:24.000 --> 07:27.000
Published their mobilization strategy that said they want to create

07:27.000 --> 07:30.000
A data lake that companies can basically dump data into

07:30.000 --> 07:33.000
And that was one of their kind of main

07:33.000 --> 07:38.000
Streams of their report that was published just after their second

07:38.000 --> 07:41.000
I think it was meeting in the White House

07:41.000 --> 07:44.000
So you know who will like those

07:44.000 --> 07:48.000
My argument is basically regardless of how much information you

07:48.000 --> 07:52.000
Think you can collect on proprietary usage of open source software

07:52.000 --> 07:56.000
You are never going to be able to say that that's representative of the whole

07:56.000 --> 07:59.000
You are never going to boil that ocean

08:00.000 --> 08:02.000
I don't believe that is the right approach to take

08:02.000 --> 08:05.000
I think we can basically save ourselves some time

08:05.000 --> 08:09.000
By instead trying to gather all of the data that we can see

08:09.000 --> 08:11.000
The available data

08:11.000 --> 08:15.000
Being able to demonstrate a correlation to what is

08:15.000 --> 08:18.000
Representative of overall usage

08:18.000 --> 08:22.000
And then say that that correlation indicates

08:22.000 --> 08:28.000
A strong enough indication for us to say that what we can see in the real world

08:29.000 --> 08:33.000
Leads us to believe what, you know, holds for all

08:33.000 --> 08:37.000
usage of open source, which at this point I'm sure Andrew can say

08:37.000 --> 08:40.000
A lot better than me with some data

08:40.000 --> 08:45.000
Yes, now we agreed before not to make me the quant of open source

08:45.000 --> 08:50.000
But what we've ended up with is now I'm going to do the quant section of

08:50.000 --> 08:52.000
Of the talk

08:52.000 --> 08:54.000
Yes

08:54.000 --> 08:56.000
That was your plan all along, wasn't it?

08:56.000 --> 09:05.000
Okay, so, we want to be able to use all the data that we have about open source

09:05.000 --> 09:12.000
To try and get a good picture of how people use it within both within open source

09:12.000 --> 09:16.000
Projects and within closed source projects to be able to try and say

09:16.000 --> 09:20.000
Here is a global representation of usage of open source

09:20.000 --> 09:25.000
Especially from kind of like a relative perspective so you can say these projects are used more than these projects

09:25.000 --> 09:29.000
Which will help us define which projects are critical

09:29.000 --> 09:37.000
What I have been working on is collecting and normalizing data from every different software ecosystem possible

09:37.000 --> 09:44.000
And then mining the dependency data of every open source project that is available

09:44.000 --> 09:49.000
So far I have a database table in Postgres with 20 billion rows in it

09:49.000 --> 09:54.000
And it's a little bit of a headache but the value that you can get out of that

09:54.000 --> 10:07.000
And that we will explain is that we can use data from this open source usage to imply data about proprietary usage as well

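As a toy illustration of the mining step described in the talk (the real pipeline is a 20-billion-row Postgres table; the repo and package names here are made up), dependent-repo counts can be aggregated from (repo, package) dependency pairs like this:

```python
from collections import Counter

# Hypothetical mined manifest data: one (repo, package) pair per
# declared dependency, as might be extracted from lockfiles/manifests.
dependencies = [
    ("repo-a", "rails"), ("repo-a", "rake"),
    ("repo-b", "rails"), ("repo-b", "nokogiri"),
    ("repo-c", "rails"), ("repo-c", "rake"),
]

# Count how many distinct open source repos depend on each package --
# the usage signal the talk treats as a proxy for criticality.
dependent_repos = Counter(pkg for _, pkg in set(dependencies))

# Rank packages by dependent-repo count, most used first.
ranking = dependent_repos.most_common()
print(ranking)  # [('rails', 3), ('rake', 2), ('nokogiri', 1)]
```

The same aggregation, run over every mined manifest in every ecosystem, is what produces the relative usage ranking discussed next.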
10:07.000 --> 10:13.000
By looking at correlations with other measures that include proprietary usage of open source software

10:14.000 --> 10:21.000
And hopefully that will give us a number of

10:21.000 --> 10:26.000
Formulas and kind of statistics that give us confidence enough that we can draw those parallels

10:26.000 --> 10:32.000
Without having to collect all open source usage within proprietary software across the whole world

10:32.000 --> 10:39.000
And instead just use the data that is available and then kind of work just from that

10:39.000 --> 10:44.000
We actually are able to reproduce this data and share it and other people can do the same thing

10:44.000 --> 10:48.000
And where we don't have the data for downloads and so on

10:48.000 --> 10:54.000
We can infer it using the information that we do have because we've shown strong correlation in other ecosystems

10:54.000 --> 11:07.000
So yes we believe hypothesis number one we can work out what this usage figure is to be able to give us a picture of what are the most critical pieces of software

11:08.000 --> 11:19.000
Let's have a look at some examples of kind of the approach that we've taken to suggest that open source usage is a good parallel for

11:19.000 --> 11:25.000
Closed source usage, or at least good enough that we can work with that to then move forward

11:25.000 --> 11:31.000
This is a graph of downloads. This one is

11:31.000 --> 11:41.000
Packages, PHP packages, some of the most used PHP packages. As you see, you can imagine this long tail of dots that ends up at zero; zero is pretty massive

11:41.000 --> 11:43.000
But

11:43.000 --> 11:51.000
It's simplified just so that I could actually render this in a browser, because five million dots is too many dots for the charting library

11:51.000 --> 11:53.000
Yeah, well

11:53.000 --> 12:05.000
So the other axis that we have here is the number of open source repositories that depend on each one of these PHP packages

12:06.000 --> 12:15.000
Looking at the correlation between downloads, which includes everyone who is using PHP packages in their proprietary software and

12:15.000 --> 12:22.000
Their open source software against just how many times they're declared as a dependency in an open source project

12:22.000 --> 12:28.000
You can see this actually has quite a nice trend and we can look and see other ecosystems

12:28.000 --> 12:33.000
This is the Rust ecosystem, and it actually has an even stronger correlation

12:33.000 --> 12:44.000
It's still a little bit hard to see here, but you can definitely see the trend. Luckily, we have maths that can actually quantify that in a very useful way

12:44.000 --> 12:50.000
This is the Pearson correlation coefficient, comparing the number of

12:51.000 --> 12:54.000
It's been a long FOSDEM. The number of

12:54.000 --> 13:03.000
Repos, open source repos, that depend on these packages versus the number of downloads that those packages have received over all time

13:03.000 --> 13:13.000
If the correlation is above 0.8, it's very, very strong. We can be confident that one will infer the other: if one is large, the other will be large

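The rule of thumb above can be sketched as follows; the package figures are invented for illustration, not taken from the real datasets:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative (made-up) numbers: dependent-repo counts and all-time
# downloads for five packages in one ecosystem.
dependent_repos = [12, 150, 900, 4_000, 25_000]
downloads = [3_000, 40_000, 210_000, 1_100_000, 6_500_000]

r = pearson(dependent_repos, downloads)
print(round(r, 3))

# By the 0.8 rule of thumb from the talk, this counts as a very strong
# correlation: relatively speaking, one measure can stand in for the other.
assert r > 0.8
```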
13:14.000 --> 13:25.000
Going down the scale, right? Let's have a look at some examples of what we measured in open source usage for a number of very large software ecosystems

13:25.000 --> 13:39.000
So Rust and PHP packages are incredibly correlated. Basically, if you know how many people in open source depend upon a PHP package or a Rust package, you can guess the downloads are going to be at a similar level

13:40.000 --> 13:49.000
Relatively, like; it's not going to give you an absolute number, but relatively it's going to let you say: I can infer this. This is really helpful because, as you notice, there isn't a Go

13:49.000 --> 13:56.000
Row here; you can't get Go download figures, they're just not available. But we can measure dependent

13:56.000 --> 14:07.000
Repo usage in open source of Go packages, and if the correlation is strong, we can kind of infer what the downloads would be if we had them

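One way the inference described here could work, as a rough sketch with made-up numbers rather than the actual ecosyste.ms method: fit a line in log-log space on an ecosystem where both downloads and dependent-repo counts are observable, then apply it to an ecosystem, like Go, where only dependent-repo counts exist:

```python
import math

# Made-up figures for an ecosystem where both measures are observable
# (e.g. PHP or Rust): dependent-repo counts and all-time downloads.
dependent_repos = [10, 100, 1_000, 10_000]
downloads = [2_000, 25_000, 300_000, 3_500_000]

# Least-squares fit of log(downloads) = a * log(dependent_repos) + b,
# i.e. a straight line in log-log space.
xs = [math.log(x) for x in dependent_repos]
ys = [math.log(y) for y in downloads]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

def infer_downloads(repo_count):
    """Estimate downloads for a package where only dependent-repo
    counts exist (the situation described for Go modules)."""
    return math.exp(a * math.log(repo_count) + b)

# A hypothetical Go module depended on by 500 open source repos:
print(f"estimated downloads: {infer_downloads(500):,.0f}")
```

This only gives relative estimates, exactly as the talk says: it orders packages and sketches magnitudes, it doesn't recover true absolute download counts.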
14:07.000 --> 14:17.000
And people much smarter than us with a statistics background could do a lot more. I'm not really a quant, I'm a Ruby developer

14:17.000 --> 14:26.000
We have all of this data publicly available for people to build more interesting statistical models than I am able to do for a conference talk

14:26.000 --> 14:36.000
To help us do really interesting insights into different ecosystems and across all of open source potentially

14:36.000 --> 14:44.000
To be able to really build out strong models, where we believe the correlation looks very strong,

14:44.000 --> 14:50.000
To give us a picture of what proprietary usage of open source is even if we don't have that data

14:51.000 --> 14:59.000
And we actually had a funding proposal that we put out; it got turned down, unfortunately. Ooh, are you sure you want to throw that shade?

14:59.000 --> 15:05.000
But if anyone is interested in funding that research, then do come and say hello

15:06.000 --> 15:10.000
You can see

15:10.000 --> 15:16.000
So having established that we believe

15:16.000 --> 15:19.000
You can tell we don't do this often but we do work together a lot

15:20.000 --> 15:31.000
Right, okay. So, having proven, we believe, that there's a very strong correlation between observable usage, as we're saying, and kind of all usage of open source software

15:31.000 --> 15:39.000
The next thing is that if you agree with that can we demonstrate that funding for critical infrastructure is being misdirected

15:40.000 --> 15:56.000
At which point I have to obviously say no, don't I? Because the data that we do have, being predominantly from GitHub Sponsors and Open Collective, is not specifically directed to projects on the basis of their criticality

15:56.000 --> 16:07.000
So straight off the bat I have to kind of say no, but we can run with it and we can see where we get to and we can see the funding that we do have available that we can see

16:07.000 --> 16:15.000
We can take a look at that see where it's directed see how well it matches up and then we can talk a little bit about you know lack of data effectively

16:15.000 --> 16:22.000
There are plenty of other sources I'm going to talk about them a little bit in a second but I think now basically I have to sub you back in sorry

16:22.000 --> 16:33.000
Okay, so let's have a look at some more graphs and I kind of want to give an example of two different data sets and then see what happens when we smash them together

16:33.000 --> 16:37.000
And it's a little bit of a car crash but it should be fun

16:37.000 --> 16:47.000
So this is a graph of all of the accounts that are on GitHub sponsors by the number of people and other accounts that have sponsored them

16:47.000 --> 16:50.000
We don't have data for how much they gave

16:50.000 --> 16:53.000
GitHub we'd love that data that would be great

16:54.000 --> 16:58.000
But for now we can use this as a proxy

16:58.000 --> 17:03.000
It's a very nice graph; you might have seen other graphs like this

17:03.000 --> 17:10.000
The kind of long tail of anything ends up looking very much like this where there's one account at the end

17:10.000 --> 17:18.000
That has, like, more sponsors than absolutely everyone else, trailing off to these accounts that all have like one or two sponsors ever

17:18.000 --> 17:25.000
Again, we don't know exactly how much money zigtools received; it could be absolutely huge amounts of sponsorship

17:25.000 --> 17:27.000
But we have to go with what we've got

17:27.000 --> 17:31.000
We know that, what was it, $68 million last year was given via Sponsors

17:31.000 --> 17:33.000
Yeah, so

17:33.000 --> 17:34.000
I think it was

17:34.000 --> 17:39.000
Yeah, I have these... Abby's not going to answer that question right now; legal told her to stay quiet

17:40.000 --> 17:47.000
We also have very nice data from open collective there's maybe

17:47.000 --> 17:55.000
2,500 collectives in Open Source Collective, and again a very similar graph

17:55.000 --> 18:02.000
I like this because, you know, we've got two different data sources and the graph comes out looking kind of similar; that's encouraging. The area under

18:02.000 --> 18:07.000
This line is approximately 50 million dollars

18:07.000 --> 18:12.000
Which is cool, but also like you can't really see the line

18:12.000 --> 18:19.000
I've actually cut this graph off because it goes over into like the next building. It's very very long tail

18:19.000 --> 18:27.000
What's interesting here is then like we have these funding stats for how much projects have received in

18:27.000 --> 18:33.000
total amount of dollars. And let's use Open Collective, because we have dollars, not just number of sponsors

18:33.000 --> 18:45.000
Let's look at all of the Rust packages that are on Open Collective, and how much they have raised versus how much usage they have

18:45.000 --> 18:52.000
I wouldn't pay too much attention to the axis here, because the numbers in dollars are not 40 million

18:52.000 --> 19:01.000
Dollars that some of these Rust packages have raised; I had to squish them to get them to show up, otherwise all the funding looks like nothing compared to the number of downloads

19:01.000 --> 19:05.000
So it's relative right the

19:05.000 --> 19:14.000
We have some interesting things: some packages in Rust that are highly used and, you know, are on Open Collective, are actually receiving a good amount of money

19:15.000 --> 19:22.000
But there's also this kind of thing where the most downloaded doesn't necessarily have the most dollars collected

19:22.000 --> 19:32.000
If we step through some other package managers and other ecosystems you start to see everything goes a little bit wobbly and we start to lose our trends a little bit

19:33.000 --> 19:44.000
The Python one has NumPy on here, but of course NumPy is actually under the NumFOCUS foundation and doesn't have very good data here at all

19:44.000 --> 19:49.000
But NumFOCUS, as a foundation, has a lot of money to be able to support that project

19:49.000 --> 19:57.000
But we haven't mixed that in because it's really hard to discern exactly how much funding from foundations goes into individual projects

19:58.000 --> 20:05.000
JavaScript is actually really popular on Open Collective. It was one of the first communities to adopt that kind of open funding

20:05.000 --> 20:07.000
It's absolutely all over the place

20:07.000 --> 20:13.000
You'll notice a real correlation with some of these bars as well, right?

20:13.000 --> 20:17.000
These are all Babel's projects and they're broken up into small modules

20:17.000 --> 20:22.000
How are you supposed to say, like, okay, well, Babel receives lots of money through Open Collective

20:22.000 --> 20:30.000
But which dollar goes to which package? How can you then kind of align that? It's actually quite difficult to do, and how to

20:30.000 --> 20:35.000
Square that is really painful and then if you go to the Ruby ecosystem

20:35.000 --> 20:38.000
You're like oh no, what's happened here?

20:38.000 --> 20:46.000
We have almost no correlation between this funding data and the amount of usage of these projects. This is this is disastrous

20:46.000 --> 20:51.000
I have no confidence in this graph whatsoever, but it's actually based on real world data, right?

20:51.000 --> 20:55.000
We just don't have good data to be able to... we have ten minutes left

20:57.000 --> 21:01.000
So the answer is inconclusive

21:01.000 --> 21:04.000
We don't have enough data to be able to have any confidence in these graphs at all

21:04.000 --> 21:06.000
We're like well, that doesn't make any sense, right?

21:06.000 --> 21:10.000
But how do you know we don't know is kind of the problem

21:11.000 --> 21:16.000
Thank you, it is tiny, unlike me

21:17.000 --> 21:25.000
So as we said at the start of this section look you know the data that we're looking at is not targeted at critical projects

21:25.000 --> 21:32.000
We accept that but even if you were to say that it was the results are basically, you know, budget and not great

21:33.000 --> 21:37.000
So you know, going back to our original research question

21:37.000 --> 21:42.000
After a decade of saying that we need to support our digital infrastructure, are we any closer to saying that we do?

21:44.000 --> 21:51.000
That's basically it for my presentation, but unfortunately we've got a few more minutes and you have to listen to me

21:51.000 --> 21:54.000
So what do we need? How are we going to address this?

21:54.000 --> 21:56.000
Martin did such a good job on this.

21:56.000 --> 21:58.000
I know, such good

21:59.000 --> 22:04.000
So there are a few things we need and you know, I'm not saying I come up with all the best ideas

22:04.000 --> 22:09.000
But what I am going to do is just amplify good ideas that other people have had at this point

22:09.000 --> 22:13.000
Over a year ago just about I think it was a year and two weeks ago

22:13.000 --> 22:18.000
Frank Nagle and a few others wrote this report on the value of open source software

22:18.000 --> 22:23.000
Got some interesting kind of figures thrown around about the demand side

22:23.000 --> 22:27.000
And supply side value of open source. I suggest that you read it. It's free

22:27.000 --> 22:31.000
But I would like to shout them out, because Frank said exactly what I'm going to say

22:31.000 --> 22:36.000
Which is we need better data for financial support both direct and indirect

22:36.000 --> 22:39.000
So that's direct financial support and indirect financial support

22:39.000 --> 22:45.000
Which might be: we're employing someone for 50% of their time, at whatever rate they're employed on

22:45.000 --> 22:51.000
To work on that particular project so that we can map what resources are available to whichever project

22:52.000 --> 22:58.000
Now there are some people who are trying to do this and I think Emmy did not manage to make their talk this morning

22:58.000 --> 23:03.000
Because of a train, but shout out to IOI, Invest in Open Infrastructure

23:03.000 --> 23:10.000
They have been trying to map 400 million dollars' worth of funding, both state and philanthropic, at an institutional level

23:10.000 --> 23:16.000
To a bunch of open infrastructure projects, a lot of which are open source projects

23:16.000 --> 23:19.000
I'm not saying it's the same thing, please don't shoot me, Caitlin

23:19.000 --> 23:24.000
But yes, I would just like to give them a shout out and shout out to Emmy who

23:24.000 --> 23:27.000
Yeah, I wish I could have heard your talk

23:27.000 --> 23:29.000
And a spoiler alert on that

23:29.000 --> 23:32.000
It's really hard and it's very very manual to do right now

23:32.000 --> 23:40.000
Yeah, yeah, they've manually done that and there are very many issues including how to map actual money that is going to the project

23:40.000 --> 23:46.000
Versus being used in administration, and the argument about whether that would actually count as support for a project anyway

23:46.000 --> 23:55.000
It is super super complex and basically what we need is to agree on a way to describe these resources for this project

23:55.000 --> 23:58.000
That can be led from the project side or from the funder side

23:58.000 --> 24:01.000
There are initiatives that have been spoken about

24:01.000 --> 24:06.000
I think IOI may be released a report earlier this week or maybe it will be coming next week

24:06.000 --> 24:12.000
That says: hey, the 360Giving data standard was proposed a while back

24:12.000 --> 24:20.000
This is the method that we think we should use for describing how funding is being used and moving to these projects

24:20.000 --> 24:27.000
I actually worked on an open contracting data standard in 2012, when I worked in civic tech with the World Bank

24:27.000 --> 24:33.000
That was deployed in 2014 to describe I think it was most of New York state spending

24:33.000 --> 24:40.000
It's a similar kind of thing, and I'm not proposing another standard. I know the XKCD comic, no worries

24:40.000 --> 24:46.000
I'm just saying that there are ways in which we can do this and basically we need better information

24:46.000 --> 24:49.000
If we had that information what could we do with it?

24:49.000 --> 24:51.000
Do I take this bit or drink this bit?

24:51.000 --> 24:54.000
Yeah, this is no maths in this bit

24:54.000 --> 24:59.000
So let's imagine a world where my graphs weren't terrible

24:59.000 --> 25:03.000
But also we had enough data that you could be like okay, they're really weird

25:03.000 --> 25:08.000
But at least we can rely on them because like it was representative of actually where the money was flowing

25:08.000 --> 25:11.000
And where other kinds of support were flowing as well

25:11.000 --> 25:15.000
Some measure of that all kind of fluff together as well

25:15.000 --> 25:24.000
That would potentially be able to give you a picture of where critical projects are under-resourced or over-resourced

25:24.000 --> 25:30.000
And then when you come to make decisions about where should funding go to a project or where should resources go

25:30.000 --> 25:37.000
You'd be able to make a data-driven, let's say, decision, rather than using the Ouija board or using the divining rods

25:37.000 --> 25:43.000
To kind of just wave yourself around and land on whichever project happens to have a really easy way to donate to them

25:43.000 --> 25:49.000
You'd actually be able to go this is a critical project to my ecosystem that doesn't have enough support

25:49.000 --> 26:00.000
And I can confidently give them support until they get to the point where they are no longer, kind of, underproduced — that term, it's a Mako Hill one

26:00.000 --> 26:05.000
He had this concept of underproduction, which basically is like: these projects are really heavily used

26:05.000 --> 26:09.000
But they're absolutely crushed under the weight of the pressure of open source

26:09.000 --> 26:16.000
We can recognize that and we can give them more support so that they can stand up

26:16.000 --> 26:24.000
The project, to make it much stronger and more sustainable, which then has a knock-on effect of improving everything in its ecosystem

26:24.000 --> 26:33.000
And kind of being able to scale it up to support for the whole world of open source because the whole world runs on open source now

26:35.000 --> 26:37.000
Thank you, that was very smooth

26:37.000 --> 26:42.000
Yeah, so in addition to that I just want to say you know, we start with the research question

26:42.000 --> 26:45.000
Like can we say that we're supporting and sustaining all of open source?

26:45.000 --> 26:48.000
I'm not saying that finances the be all and end all of open source

26:48.000 --> 26:56.000
I'm saying that it's a constituent component we can probably represent some of the other implicit implied and you know kind of hidden

26:56.000 --> 26:58.000
Kind of support for projects

26:58.000 --> 27:06.000
But ultimately what I want to do is be able to prove for ourselves that the thing that we've been working on in our case for 10 years is actually working

27:07.000 --> 27:09.000
That would be lovely if we could do that. Thank you very much

27:10.000 --> 27:17.000
It's not having a go at anyone, it's not, you know, throwing any shade at anything, but having spent the time that we've spent in this space

27:17.000 --> 27:21.000
I would really really like to be able to know that I had some sort of impact. Thank you

27:22.000 --> 27:32.000
There's plenty more to come; we're going to continue working on this. I've managed somehow to intertwine a couple of organisations that I work at and am a director of

27:33.000 --> 27:35.000
So that we can do that

27:35.000 --> 27:40.000
There's very many opportunities for us to work together with many of you out there

27:40.000 --> 27:47.000
All of the data that we've just spoken about is freely available, as is all of the data on Ecosyste.ms

27:47.000 --> 27:56.000
This is not a sales pitch. It is just an open call to say that we are more than happy to work with you on any of these subjects because we genuinely care about the space

27:56.000 --> 28:03.000
So I think that is about it and now we've got a bit of time for questions if anyone has

28:08.000 --> 28:10.000
Just remember to repeat the question

28:14.000 --> 28:24.000
Yes, I have a question and I hope this fits on the bracket of more to come as opposed to reach forward

28:25.000 --> 28:34.000
If we accept the premise that criticality is — sorry, usage is a better measure of criticality

28:34.000 --> 28:42.000
Can you extend that with sort of metrics or indicators that are on the project level

28:42.000 --> 28:47.000
And by that I mean sort of indicators about sort of community or project health

28:48.000 --> 29:00.000
So, we have opinions. I will repeat the question: if we take it as a given that maybe usage is a determinant of criticality

29:00.000 --> 29:14.000
How might we include other scores and be able to maybe complement that data? So, the first thing to get out of the way is that some measures appear to actively punish projects

29:14.000 --> 29:21.000
By looking at them for not having, maybe, a correctly formatted license file — yes, I know that's another issue, but let's ignore that for a second

29:21.000 --> 29:27.000
Or, you know, not having great documentation and so on, at which point I think that is probably wrong

29:27.000 --> 29:35.000
If you accept that your project is one of the most used in a particular ecosystem and likely used in industry as a result of that which we've shown

29:35.000 --> 29:40.000
Why would you want to demote some sort of criticality score because they don't have certain documentation?

29:40.000 --> 29:49.000
It's still going to have an impact if it blows up. But yes, I think, you know, we're aware of the work of CHAOSS and the work of the OpenSSF

29:49.000 --> 30:02.000
And so on in drilling down into how you might direct your efforts based on measures that you might see in how well the project is managing the demands that are placed upon it

30:02.000 --> 30:05.000
Right so things like you know how well is


