WEBVTT

00:00.000 --> 00:10.760
Yes, thanks a lot. It's great being here. So we're the FSF, thanks a lot for the

00:10.760 --> 00:20.280
invitation, for having us. And yeah, so I'm the licensing and compliance manager at the FSF,

00:20.280 --> 00:29.960
and Zoë is the executive director. And we at the FSF are the experts on software freedom.

00:30.000 --> 00:34.960
And we have started taking our time to look into the machine learning phenomenon and how

00:34.960 --> 00:42.480
it relates to software freedom. And what we wanted to do today is to present our

00:42.480 --> 00:51.920
work in progress on this question. And, you know, we are not machine learning

00:51.920 --> 00:58.760
experts. There are probably more knowledgeable people here in the room. So we are

00:58.760 --> 01:04.560
rather coming here with some questions, some ideas that we have. And we are looking forward

01:04.560 --> 01:12.360
to hearing from you, what your opinions are, what your feedback is. And if you have any

01:12.360 --> 01:26.720
questions or comments, please tell us, or you can also write to us.

01:26.720 --> 01:35.960
Is it okay? Yeah, I can do it. It has a delay, I think. Can people hear me? Okay. So just

01:35.960 --> 01:39.920
a short introduction to the Free Software Foundation, for those of you that don't know

01:39.920 --> 01:47.120
us. We are a foundation that was founded in 1985. So this year is our 40th year of promoting

01:47.120 --> 01:55.120
computer user freedom. We were founded in 1985, as I just said, by Richard Stallman, who initiated

01:55.120 --> 02:01.120
the GNU Project. And to support the GNU Project when it got as successful as it did,

02:01.120 --> 02:10.920
he founded the FSF to take care of the organization around that project and more. So

02:10.920 --> 02:19.560
we have built through the years a community of strong user freedom supporters.

02:19.560 --> 02:26.120
And we still do a lot of work for the GNU Project. And we also support a bunch of other

02:27.120 --> 02:34.120
community projects. We also maintain something that is called the free software definition

02:34.120 --> 02:41.120
or what a lot of people, especially in this room, call the four freedoms. And we wrote

02:41.120 --> 02:49.120
and are the steward of the GNU General Public License, which is of course the main way for us to

02:49.120 --> 02:52.120
enforce this idea of the four freedoms.

02:57.120 --> 03:03.120
So there is probably no need here to explain why free software is important, but still

03:03.120 --> 03:11.120
I want to give the context for our work in this machine learning area. So,

03:11.120 --> 03:18.120
well, to put it in a nutshell, it's all about having control over your own computing.

03:18.120 --> 03:26.120
That's what free software lets you do. And it actually happens to be quite successful.

03:26.120 --> 03:33.120
I mean, it has been at the heart of software development for quite a long time. You can find free

03:33.120 --> 03:40.120
software almost anywhere, running on all kinds of devices. And why? Because people choose it, because they

03:40.120 --> 03:47.120
want freedom. They want to control the devices, the computing they use. And

03:47.120 --> 03:54.120
we are especially happy to see our strong copyleft licenses, the GNU

03:54.120 --> 04:01.120
GPL and the AGPL, among the most popular licenses used. There are some statistics here,

04:01.120 --> 04:10.120
copied from the report linked below, which I think prove that people are also interested

04:10.120 --> 04:15.120
in having the freedoms protected, because that's what the GPLs are intended for: to

04:16.120 --> 04:24.120
guarantee that the freedoms last. So free software is really crucial for society.

04:24.120 --> 04:30.120
And this means that being clear about what software freedom is, what free software is, is also

04:30.120 --> 04:39.120
important. As we already said, we developed and maintain the free software definition,

04:39.120 --> 04:46.120
the four freedoms, which are about really being clear on what you can do, what

04:46.120 --> 04:54.120
comes with free software. And obviously, again, it's, I think, a known fact that it's

04:54.120 --> 05:00.120
about being able to run the software for any purpose, to be able to study it, to be able

05:01.120 --> 05:16.120
to adapt it to your own needs, to copy it and redistribute it, to share it. And while we spend a lot

05:16.120 --> 05:21.120
of our time in our work to be clear about this and to make sure other people are clear

05:21.120 --> 05:26.120
about this, this is also something that comes into focus when we start thinking about

05:26.120 --> 05:33.120
machine learning: what these four freedoms mean in this area.

05:33.120 --> 05:40.120
And we perceive this machine learning work as a kind of natural next step in

05:40.120 --> 05:53.120
what we've been doing for the last 40 years, because this work on being clear, on making

05:54.120 --> 06:00.120
sure everyone knows what we are talking about, is a kind of work in progress, because

06:00.120 --> 06:07.120
the definition was published in '86, and probably not many of you know that it wasn't

06:07.120 --> 06:19.120
published as it is now; only around 1990 did the freedoms become

06:19.120 --> 06:26.120
bullet points in the document; before that, the document talked generally about what's

06:26.120 --> 06:32.120
important, what's necessary for having software freedom. And apart from

06:32.120 --> 06:38.120
having the definition, we've developed the GPLs, the licenses, the actual legal tools that

06:38.120 --> 06:43.120
grant people their freedoms and are intended to protect them. And obviously,

06:43.120 --> 06:50.120
you also know that the GPLs evolved, we've designed new versions of the GPLs to address

06:50.120 --> 07:05.120
new and evolving threats to freedom. And yes, so that's something that keeps us going,

07:05.120 --> 07:10.120
because every now and then there is some issue that comes up and we have to react and

07:10.120 --> 07:18.120
this, well machine learning seems like one of these things.

07:18.120 --> 07:26.120
So at the FSF, we have of course different areas that we focus on. So one of the things

07:26.120 --> 07:31.120
that we do is license enforcement; I do that with Krzysztof and another colleague

07:31.120 --> 07:37.120
who is right here, Craig Topham. We work really hard to make sure that those licenses

07:37.120 --> 07:44.120
are being enforced and that the meaning and intention of what free software is and what

07:44.120 --> 07:51.120
the four freedoms are continue to stay in place. So the maintenance of that definition is

07:51.120 --> 07:57.120
really important, and we need to make sure that it is simple and that people understand what

07:57.120 --> 08:04.120
is meant by it. And then to have something like the GPLs, to be able to put a big line

08:04.120 --> 08:13.120
under that and allow people to show the world that they want their software to be fully free

08:13.120 --> 08:22.120
by using our licenses, that's a tool that is of immense value. What we also do at the FSF

08:22.120 --> 08:29.120
is promote: of course we run campaigns and we focus on education, making sure that people

08:29.120 --> 08:36.120
know about the concepts and we especially and specifically talk a lot about freedom, where

08:36.120 --> 08:42.120
a lot of people in this room have switched to the open source term; of course the FSF is the

08:42.120 --> 08:47.120
staunch upholder of the term free software. So we're always the people in the room that go, but what

08:47.120 --> 08:53.120
about freedom? And we'll continue to do that, because we genuinely believe that holding

08:53.120 --> 08:58.120
on to that far end of the spectrum will make sure that we don't slowly start moving

08:58.120 --> 09:07.120
our way to a different place. So you don't necessarily need to be at our far end, but we do

09:07.120 --> 09:12.120
always hope that you can find appreciation for the fact that we will continue raising

09:12.120 --> 09:18.120
our hands and asking for freedom. And then something about the FSF that not everyone knows

09:18.120 --> 09:25.120
is that we run our organization fully on free software. So our tech team works really hard

09:25.120 --> 09:32.120
to also abide by the words that we use in our organization and make sure that our entire

09:32.120 --> 09:38.120
software stack, all of the servers that we run and all of the tools that we use, are all made

09:38.120 --> 09:44.120
and maintained using fully free software. So if you ever have any questions about that or advice

09:44.120 --> 09:49.120
that you need on doing the things that we're doing, please contact us, because we can always use

09:49.120 --> 09:56.120
expertise there. But we do that also again to make sure that we know what we are talking about.

09:56.120 --> 10:02.120
And then the last item that I think is worth mentioning is that the FSF is supported

10:02.120 --> 10:10.120
mostly, in the last few years over 95%, by individuals. So we don't have any

10:11.120 --> 10:17.120
large corporations. We don't have any big donors that are asking us to perform a certain way or to deliver

10:17.120 --> 10:26.120
certain tasks or to answer to any specific questions that they actually have that would push

10:26.120 --> 10:31.120
us in a certain direction. But it's actually individuals like every single one of you in this room

10:31.120 --> 10:39.120
that support us and give their hard-earned money to the FSF to allow us to continue what we're doing.

10:39.120 --> 10:44.120
So if you are one of those people then thank you.

10:44.120 --> 10:52.120
And so we want to use all of that expertise and all of the knowledge that we have gained over all of these

10:52.120 --> 11:00.120
years to focus on machine learning and that is mainly because machine learning is a way in which

11:00.120 --> 11:06.120
people are starting to do their computing more and more. We're noticing that it is a question that

11:06.120 --> 11:15.120
is just being posed to the FSF constantly. And while we have been sitting back a little bit

11:15.120 --> 11:24.120
and watching the developments unfold and discussing all of these potential implications internally,

11:24.120 --> 11:32.120
we now decided that it's time for us to step forward and start at least sharing with everyone

11:32.120 --> 11:37.120
what it is that we have concluded so far and how far we've come with that debate.

11:37.120 --> 11:45.120
And we hope to get you guys to fill in the gaps that we haven't answered yet,

11:45.120 --> 11:50.120
and to bring us questions as well that we do need to answer.

11:50.120 --> 11:56.120
Because machine learning is growing. I mean, as it says on this slide, these are just some of the statistics

11:56.120 --> 12:02.120
that we know are floating out there, but they are impressive numbers, you know, the size of the growth:

12:02.120 --> 12:13.120
60% growth since 2020, and then 535% growth to hit over half a trillion dollars in value by 2030.

12:13.120 --> 12:19.120
I mean these are just sort of vague numbers, but the size of these numbers shows that machine learning at this point

12:19.120 --> 12:29.120
is just becoming ubiquitous. And like I said before, more and more people are intentionally or even unintentionally

12:29.120 --> 12:37.120
using machine learning on a day-to-day basis. And so we are noticing that machine learning is an agent

12:37.120 --> 12:46.120
between users and software. It is used to develop software, which of course we've seen in the last couple of years,

12:46.120 --> 12:57.120
and I'm sure a lot of you are dealing with in your day-to-day work. And then machine learning is used more and more to actually perform software tasks itself.

12:57.120 --> 13:05.120
We're also noticing that there are a lot of freewashing attempts, or, I'm going to use a word that we usually don't

13:05.120 --> 13:13.120
use, openwashing, because the organizations and the direction that I'm talking about

13:13.120 --> 13:21.120
are of course people using open source licenses. So for this particular case I'll use those words,

13:21.120 --> 13:29.120
but these licenses are of course also based on our free software licenses. And what these organizations

13:29.120 --> 13:38.120
and these licenses do, even though they call themselves open or free, is not free. And so we want to make sure that it is clear for people

13:38.120 --> 13:48.120
that just like with free software we need a far end of the spectrum that decides what freedom is for machine learning.

13:49.120 --> 13:58.120
And so we are working really hard to make sure that people understand what that point is in the spectrum of machine learning

13:58.120 --> 14:04.120
which is growing beyond our concepts of what the spectrum is in software even.

14:04.120 --> 14:26.120
So, when we started to work on this intersection of freedom and machine learning... okay, so you might have noticed that we so far haven't used the more widely used term, artificial intelligence,

14:27.120 --> 14:40.120
and there's a purpose to that, because generally the FSF is really mindful of language. We really believe that you should be clear on what you're talking about,

14:40.120 --> 14:47.120
and you should really understand that words have meaning, and words can kind of frame the discussion.

14:47.120 --> 14:56.120
And the reason why we decided to stick to the term machine learning is because we believe it's much more accurate.

14:56.120 --> 15:10.120
So first of all, we are really not convinced that any of these machine learning applications or systems have any intelligence whatsoever, at least the way this is

15:10.120 --> 15:21.120
generally understood. They do not know; they do not have understanding. So let's say I want to find some recipe for preparing artichokes,

15:21.120 --> 15:36.120
and I ask a neural network for it. That neural network has no idea what an artichoke is; it just gives me, if it's an LLM, the most plausible, the most statistically probable

15:37.120 --> 15:57.120
response to my prompt. And of course there are not only LLMs; there are other types of machine learning that could be taught, or shown, let's say, to distinguish artichokes from some other types of things. But still,

15:57.120 --> 16:15.120
compared to humans, who could have some sense of what it is, they just operate on numbers; they are just big calculation machines. At least that's the way we think about them. So

16:16.120 --> 16:32.120
we think having this understanding of what we actually talk about and how this actually works is really important; it is a prerequisite to talking about how these tools affect our freedoms.

16:32.120 --> 16:47.120
And we also really insist on not focusing only on LLMs in this discussion, because there are a lot of other, differently operating machine learning applications,

16:48.120 --> 17:05.120
and they could affect our freedoms differently. Or, to put it differently, if we focus on LLMs only, if we try to, let's say, adjust our standards, maybe even lower our standards, because of some ways LLMs are

17:05.120 --> 17:21.120
made these days, I think we can lose something important. I mean, we could trade off our freedoms for some convenience. So being clear about that is, I think, important. Then we, of course,

17:22.120 --> 17:37.120
have to notice that machine learning has a lot of differences compared to software: it's prepared differently, people work on it differently. So there are all these questions:

17:37.120 --> 17:48.120
what exactly does the freedom to run, or the freedom to study, for example, mean in relation to machine learning systems, as compared to software?

17:49.120 --> 17:59.120
And also, they are built differently: machine learning applications include both software and non-software elements, which also has practical implications.

17:59.120 --> 18:13.120
If you want to make sure that people are free, what types of licenses should be used? How do you ensure, what do you have to deliver to users to ensure, that they can exercise their freedoms?

18:14.120 --> 18:29.120
And these elements, okay, you could roughly divide them into software and non-software, software and data. But when you look at it closer, depending on the type of machine learning application you want to examine for its freedom,

18:29.120 --> 18:50.120
it may turn out that you would want different types of non-software elements, provided to you in different ways, so that you are sure that you can really learn what's inside, well, to the extent possible, how it exactly works,

18:50.120 --> 18:58.120
and so on. So these are all the elements that we are trying to really understand better and have clarity on.

19:00.120 --> 19:04.120
Okay now I forgot if it's you or me.

19:04.120 --> 19:05.120
Okay.

19:08.120 --> 19:14.120
So yeah, so what do we have so far? I mean, as Krzysztof said, we've got all the

19:15.120 --> 19:22.120
software and the non-software elements; they all play a role, but the training data seems to be the most basic topic. I think we all know,

19:22.120 --> 19:32.120
from the debates that are going around, from people trying to discuss machine learning in a similar way as we are, because we know that we are not the only ones working on this,

19:33.120 --> 19:46.120
that training data seems to come up as the most debated topic every single time. And so that is the part that we are also focusing on and want to highlight as well.

19:47.120 --> 20:13.120
Machine learning is not just LLMs. LLMs account for the fact that training data often gets this very difficult, very hard implication, or the stamp of almost impossible: why would you go for a fully free system and include the free training data? Often when people say this, they have large language models in mind, of course, because of the vast amount of training data that they

20:13.120 --> 20:25.120
include, or the fact that maybe that training data isn't even obtainable anymore, or there are copyright issues that could be at play, or privacy issues, of course.

20:26.120 --> 20:40.120
But we want people to remember that we are not just looking at LLMs. And actually, when we go into the future of machine learning, there's a good chance that the number of smaller machine learning systems, and who knows what else will develop, will,

20:40.120 --> 20:48.120
well, outrun the difficulties that we currently see with training data because of the hype around the LLMs that we know.

20:49.120 --> 21:00.120
We should be aspirational. We are looking to define something that we can work with for the next 40 years. We're not afraid of change; we're not afraid of technological developments

21:00.120 --> 21:09.120
that make us have to look at ourselves and our own work to see if we need to make any additions or adapt to those situations.

21:10.120 --> 21:18.120
But we do want to set a standard here that we can live up to and we want people to look at that standard and want to create something that fits that model.

21:18.120 --> 21:39.120
We are creating criteria here that are ambitious and aspirational, and that help us maintain the freedoms that we have when we have full software freedom. And we want the same for people in the future with LLMs, where those freedoms might even be more uncertain.

21:40.120 --> 21:57.120
That of course also means that we should not lower our standards. It's okay if we need to break some models down to get to where we want to be. It's okay if something doesn't fit the free standard as we want to set it at this moment; you can work towards it.

21:58.120 --> 22:20.120
We want to take the four freedoms, of course, as our base for what we are doing: the freedom to study, the freedom to modify, the freedom to share, and of course the freedom to run. And when we are saying that, we are focusing of course on the freedom to study and the freedom to modify as the two that make machine learning a much more difficult

22:21.120 --> 22:44.120
topic to talk about, because of all the different elements that are at play when we are thinking about that. But the four freedoms have worked for us for the last 40 years, and so we believe that they will work for us for many, many more years to come, at least 40. And again, we'll be focusing on studying and modification.

22:44.120 --> 23:11.120
Yeah, so generally, whatever comes out of our work will be, well, we don't know yet exactly, but what we are sure of is that we will really insist that users who do their computing with machine learning

23:11.120 --> 23:21.120
should be given all that they really need to control this computing, so all the elements that are necessary to do it.

23:21.120 --> 23:44.120
Training data, as we said, is the thing that seems the most controversial, but we are pretty sure that without training data it is really not possible to fully exercise your freedoms when doing computing with machine learning.

23:44.120 --> 24:09.120
And I think we should not focus on, I would say, exceptions, where getting training data or working with training data is hard, because there are a lot of examples of machine learning where you can have training data and they are manageable. And maybe we could focus on developing technologies that help us

24:09.120 --> 24:16.120
have that situation as the rule.

24:16.120 --> 24:30.120
And we also believe that all these situations where people are given some degree of flexibility with models, where they could incrementally train them, that's also not enough,

24:30.120 --> 24:42.120
if we want to speak about freedoms. For example, when there is some bias already encoded, already taught into the machine learning system,

24:42.120 --> 24:53.120
at least from what I know, it's really not possible to remove that bias by incremental training. So that's something that really calls for

24:53.120 --> 25:05.120
having the possibility to really work on the models from scratch. And the same as is, or has been, our approach to defining free software,

25:05.120 --> 25:15.120
the approach is binary, in the sense that we want to talk about the systems as either free or non-free.

25:15.120 --> 25:25.120
And well, it's okay if there is some system that can't be free; it's not that we want to think really hard about how to make an exception for it.

25:25.120 --> 25:36.120
I mean, our mission is to focus on what software freedom is and to be clear about it.

25:36.120 --> 25:52.120
While we are working on this, we've already come up with some questions; sometimes we already have some ideas what the answer would be, sometimes we are still discussing. And we also count on

25:52.120 --> 26:10.120
your feedback. So just as an example, and some of these are questions that we discussed today or yesterday with members of the community here at the conference, for example, there's this, I think, really important question:

26:10.120 --> 26:33.120
say I have a machine learning system that does something really useful for society, something which is morally justifiable, and at the same time there is this real moral challenge for releasing the training data, like when data including some really private information was used to train the model, and it was not possible to use

26:33.120 --> 26:43.120
anonymized data to train the model because of the model's purpose. Then I think there is some kind of moral question: should the system be used, or

26:44.120 --> 26:57.120
should it not be used, because of the training data not being available for privacy reasons? So that's, I think, an important question.

26:58.120 --> 27:14.120
Second, well, is it really the case that you can't remove the bias by working on a model? Do you really always have to retrain the model from scratch? There is also a question, I'm not sure if you are familiar with this notion of

27:14.120 --> 27:35.120
free but trapped programs. There was this issue with Java: before we had a free implementation of Java, programs written in Java could have been released as free, but they were dependent on Java, so we called them trapped.

27:35.120 --> 27:48.120
This is something that we think can also happen with machine learning, when a machine learning application is dependent on lots of different things that might not be free.

27:48.120 --> 28:04.120
And I think training data encumbered with some legal limitations that prevent making the training data available might be such an example of a free but trapped machine learning application.

28:05.120 --> 28:28.120
But these are just examples of questions that we think should be tackled, and we are looking for more, because when we hear questions from you, that will, well, enlarge the scope and context of our research.

28:29.120 --> 28:31.120
Yeah, that's it.

28:32.120 --> 28:41.120
Yeah, so how will we be moving forward? I mean, we at the FSF have never been the fastest organization in the world. We would have liked

28:42.120 --> 28:49.120
to be able to come here today and go: these are the criteria. We didn't manage to do that, but we're also okay with that.

28:49.120 --> 29:00.120
We have a lot of work of debating ahead of us. We also have time, because this new development isn't going anywhere, and we will be learning more and more about it as we go.

29:00.120 --> 29:18.120
What we will do is make sure that we listen to people. So send us your questions. We'll talk to the technical and the philosophical experts about all of these details and about all of the questions that come in, and we will be publishing on the topic

29:18.120 --> 29:28.120
as we go forward, to make sure that we keep everyone informed, and you can send us more of your comments and questions based on the work that we're doing.

29:29.120 --> 29:38.120
So we haven't set ourselves a deadline, but we will definitely be publishing on the topic, making sure that we show you our progress and that we continue to remind you

29:39.120 --> 30:01.120
that this definition, or these criteria, that we're working on are on their way. It's not going to take us years, but it is going to take us a little bit longer, and we want to make sure that we create an aspirational conclusion as a goal for the future, something that we can live up to and all agree on, and something that guarantees our freedom.

30:01.120 --> 30:14.120
Yes, and I think we still have some time to open the discussion, so yes, we are looking forward to any questions or comments.

30:14.120 --> 30:16.120
Okay first one in front.

30:16.120 --> 30:32.120
Hello, thank you, this is very interesting stuff. You mentioned that you're going to be looking at where the data comes from, and I guess that means looking at things like using scraped data from GitHub and things like that.

30:32.120 --> 30:42.120
Are you also looking at aspects of machine learning which cause a lot of damage, like environmental impact and things like that, or is that something considered out of scope for your organization?

30:43.120 --> 30:47.120
It's a very good question; it's something that we keep in mind.

30:47.120 --> 31:06.120
So, as we have said, we at the FSF tend to focus on software freedom, but we can't avoid running into this issue, of course. I mean, one of the reasons why one of our campaigns is right to repair is also because we know, and we want to make sure, that where we can, and where these

31:06.120 --> 31:19.120
other social movements or issues hit the software freedom issue, we can actually talk about it. So, I mean, it's something that we know about and it's something that we definitely take into account.

31:19.120 --> 31:44.120
Can I guarantee you that it's an aspect that we'll put into our definition? I can probably guarantee you the opposite, because in any way that we would create a limitation, as much as we might personally feel that it's a limitation that would benefit the world, we can't call it free anymore.

31:45.120 --> 31:58.120
Thank you. Hello, yes, a question about regulations and transparency, in the GDPR, which is a European regulation about data privacy, for those who don't know about it.

31:58.120 --> 32:10.120
There is an article, article 22, which states that no automated decision about people should be made by an algorithm without being able to explain it.

32:10.120 --> 32:15.120
And that brings us to the question: how do you explain the decision of an LLM?

32:15.120 --> 32:22.120
For that you probably need to have access both to the training and to the data.

32:22.120 --> 32:34.120
So that can be leverage, I think, for free software. Do you work in that direction? Are you interested in working in that direction? I mean, the link between the current regulations

32:34.120 --> 32:56.120
for transparency and free software, given also that regulations for data protection and transparency are spreading everywhere. I mean, I think you have such regulations in 55% of countries around the world today.

32:57.120 --> 33:05.120
Can you just repeat for me one more time what exactly the rule was?

33:05.120 --> 33:14.120
GDPR article 22 requires transparency, the explainability, of automated decisions.

33:14.120 --> 33:44.120
[inaudible]

33:44.120 --> 34:14.120
[inaudible]

34:14.120 --> 34:20.360
easy to check what's happening inside, especially if you want to trust the system to make some

34:20.360 --> 34:28.600
decisions that affect people. I mean, yeah, at least indirectly this affects our work, because

34:28.600 --> 34:36.680
we want to make sure that people have the freedom to really be able to understand what's inside

34:37.480 --> 34:43.320
a machine learning application.

34:46.360 --> 34:57.560
Can you elaborate your thoughts on when open data is published or used under an open data

34:57.560 --> 35:06.360
license or a Creative Commons license? How do you see the attribution thing, with the freedoms,

35:07.240 --> 35:17.560
being handled in the future? Are you asking how to follow attribution when you work with

35:17.560 --> 35:23.320
data in machine learning? When I provide data under an open data or a Creative Commons license.

35:24.600 --> 35:32.280
Yeah, you're saying: my data passes the free test, it has the freedoms, but I want

35:32.360 --> 35:36.360
attribution. How do you see that from an FSF perspective?

35:40.200 --> 35:48.120
Well, I think, I would say, it's a challenge. I mean, okay, I still don't understand exactly in

35:48.120 --> 35:53.800
which particular scenario. Yeah, I'm going to say there's a couple scenarios that I can imagine

35:53.800 --> 36:02.200
your question being relevant to. People, keep your hands up. I think I've got this. Keep your hands up.

36:04.440 --> 36:09.480
Thank you very much. We didn't actually answer his question, so we can talk to you after

36:09.480 --> 36:15.640
the session. Thank you. Okay, so I wanted to address the first question. Listening to you:

36:15.640 --> 36:32.360
you're going to be subjected to a massive lawsuit attack by an unholy alliance of copyright trolls

36:33.240 --> 36:38.120
and government regulators, basically every enemy of progress and therefore humanity, is going to target

36:38.120 --> 36:42.120
and destroy the entire industry, which is going to set humanity ten years back.

36:47.720 --> 36:50.840
I don't, yeah, was there a question? I think it's more of a comment.

36:50.840 --> 37:03.640
I mean, yeah, to address what you're saying: we're not

37:04.520 --> 37:10.120
supporters of copyright either. We use it to make sure that our tools continue to be free,

37:10.120 --> 37:18.280
but it's not something that we support either. But at the moment, we don't have the option of not

37:18.280 --> 37:23.640
having copyright, and so we play with the system to do that. And as long as copyright is in

37:23.640 --> 37:27.080
existence, we'll continue to use it to our advantage.

37:42.120 --> 37:46.600
Yes, no, I agree with you. And this is, every day, something that we think about, absolutely.

37:46.600 --> 37:50.040
I mean, copyright is, of course, an issue that is being discussed

37:50.040 --> 37:54.680
on many levels when it comes to machine learning, from what goes into it to what comes out of it.

37:55.560 --> 37:59.720
So it's definitely something that we are taking seriously, and we think about it every step of the way.

38:00.920 --> 38:04.280
But, like I said, as long as it's still here, we need to use it to our advantage.

38:04.840 --> 38:15.160
So maybe quickly, if I understand, if you are concerned that we would say something that supports

38:15.720 --> 38:21.720
the abusive use of copyright, I'm sure we won't. That's actually part of

38:21.720 --> 38:28.600
why we want to take time working on this, because we don't want to support how copyright is used

38:28.600 --> 38:33.080
to block access to information or the flow of information.

38:35.240 --> 38:39.640
Yeah, an observation. Yeah, and I'll just add a little to what I was saying.

38:39.640 --> 38:44.680
Like, it's not that we, I'm also not saying that we're looking to abolish copyright. We do

38:44.680 --> 38:50.680
think that in some ways, of course, copyright is something that can support artists and push society

38:50.680 --> 38:57.160
forward. It's just the abuse of the system around it that we don't support. Copyright, of course,

38:57.160 --> 39:03.880
was created to make sure that we can build up on the work of others, and that's the part that we believe in.

39:04.280 --> 39:07.080
Thank you. I would like to ask a question.

39:07.720 --> 39:15.080
What's your stance on the Open Source AI Definition that the OSI is working on?

39:16.520 --> 39:21.800
I can imagine that there might be some friction. Can you explain a little bit about this?

39:23.880 --> 39:34.600
Sure. I mean, I think the OSI worked really hard to get to a conclusion.

39:35.560 --> 39:41.800
I respect the fact that they did, and I respect the fact that they felt like they had to publish

39:41.800 --> 39:47.640
a conclusion. Like I said, we don't necessarily believe that there should be a deadline on

39:47.640 --> 39:55.800
a conclusion, especially not if it's driven by anything but what we believe in, and framing and

39:55.800 --> 40:01.320
phrasing that the right way. I mean, the FSF is an organization that, like I said, is supported by

40:01.320 --> 40:08.200
individuals. We are driven by the needs of freedom, and that's what we really sort of want to hold on to.

40:09.640 --> 40:15.880
I don't want to all of a sudden make this into a presentation where we start accusing the OSI

40:15.880 --> 40:21.320
of having done everything wrong, but I can say that we are not supportive of their conclusion,

40:21.320 --> 40:29.160
and, of course, the fact that we are driving towards the conclusion ourselves,

40:29.240 --> 40:37.800
that training data should be included fully in our criteria, is the direction that we're going in,

40:37.800 --> 40:44.600
and that's not the direction that the OSI has taken. Just a quick response to your answer,

40:44.600 --> 40:57.880
because I have some trouble with the Open Source AI Definition, and also with the statement that the

40:57.880 --> 41:03.800
training data should always be open. And this is a struggle for myself, because I'm a supporter of

41:03.800 --> 41:15.720
the FSFE. The case that I can think of where I have my troubles is related to training data

41:15.720 --> 41:26.360
that is ethically very difficult. For instance, let's say you would develop a machine learning

41:26.440 --> 41:38.360
system to use against CSAM, then you need to have CSAM material to train this machine learning

41:38.920 --> 41:47.000
system, and by offering that type of very sensitive material, you know, this is the ethical

41:47.000 --> 41:52.200
dilemma that I have. I understand, and it's a dilemma that we also, I mean, it's the first one that is

41:52.280 --> 42:02.360
on our slide right here. So, of course, we've lived our lives by the binary of free

42:02.360 --> 42:08.040
and non-free, and it's the binary that we believe in. We genuinely believe that if you start

42:08.040 --> 42:14.680
naming something almost free or maybe free, it's not free, and if you start accepting those kinds of

42:15.560 --> 42:23.480
solutions, you're eating away at the possibility of full freedom, which is

42:23.480 --> 42:27.640
not just about software, but also about our lives in general, which is, you know,

42:27.640 --> 42:36.040
where a lot of our passion for this topic comes from. But of course, even we see that there are

42:36.040 --> 42:41.000
situations, at this moment anyway, where some examples are being given

42:41.880 --> 42:48.520
that show us that there are moments in which we have to consider that maybe a non-free system

42:50.040 --> 42:56.680
is morally justifiable. It's not, like, that's not the answer we're giving you, but it's a

42:56.680 --> 43:03.000
question we're posing ourselves as well. It's a difficult one. I think maybe in the room,

43:03.000 --> 43:13.000
there are also people that can respond to this, even. But I also think that us putting down a

43:13.000 --> 43:19.400
very hard binary criterion will also make other people think about how they can create systems

43:19.400 --> 43:25.560
without, you know, looking for the easy solution there, and maybe finding solutions themselves

43:25.560 --> 43:32.040
that make it possible to not violate people's privacy. Because, I mean, it's very easy to

43:32.040 --> 43:38.600
give people an out, and we don't want to do that either. Okay, we're using up the time.

43:38.600 --> 43:42.120
Let's try and get as many questions in as we can, so keep your questions short, if you want.

43:43.480 --> 43:48.760
Thank you very much for your talk. If we want a foundation model or an

43:48.760 --> 43:55.720
LLM or whatever, relying on free data and free hyperparameters, shall we build

43:56.040 --> 44:03.080
our own foundation models, as a free software initiative? And does this kind of initiative exist?

44:06.120 --> 44:12.680
You mean initiatives that work on models based on freely released data?

44:13.960 --> 44:19.240
Well, I believe there are, so first of all, from what I know, there are a lot of sources for

44:20.120 --> 44:25.160
data that are not encumbered with any restrictions on their distribution.

44:27.560 --> 44:35.400
So, yeah, I think this is something that we also want to, well, promote and support,

44:35.400 --> 44:42.120
maybe not as the FSF's work, we are doing something different, but yeah, I believe that

44:42.120 --> 44:47.560
that's the future for machine learning freedom. We want to be clear about what it is, and then

44:47.640 --> 44:50.760
people can follow and fill in the gaps.

44:54.200 --> 45:01.720
It's not really a question, just a thought, because we're talking about CC, CC0, openness.

45:01.720 --> 45:13.000
You still try not to combine the ideas of source code and data. For example, text

45:13.080 --> 45:20.120
semantics, from my point of view, I'm a typesetter, is source code. If I take texts as facts

45:20.120 --> 45:27.320
and just set up, from scratch, a decision management system, it can calculate, for example,

45:27.320 --> 45:35.000
that molesting a child is quite okay, with a higher percentage, if I do not add something

45:35.000 --> 45:44.120
like a law against rape, as simple as that. So, couldn't it be easier to say that the texts

45:44.120 --> 45:48.760
we use for training from scratch, because we have to train from scratch if we want to

45:49.800 --> 45:55.960
avoid bias and backdoors, just, let's say, need a different license,

45:56.920 --> 46:03.880
especially CC0, or at maximum CC BY, because we have to change the texts.

46:08.360 --> 46:14.360
You're not the first person to have mentioned it. It's definitely something we're thinking about

46:14.360 --> 46:18.280
as well, but it's a good observation and we're taking it with us.

46:25.960 --> 46:32.360
In GPLv3, for instance, you have an anti-DRM provision, and it doesn't affect freedom zero,

46:32.360 --> 46:43.400
because it defeats it by giving more freedoms. So, did you look into such licensing schemes,

46:43.400 --> 46:49.960
like to protect privacy with machine learning and so on, that give more freedom to,

46:49.960 --> 46:53.320
like, defeat bad purposes, or not?

46:57.320 --> 47:04.680
Well, the short answer would be, at least from my side: not yet. It's not that we already have

47:04.680 --> 47:10.440
some ideas for designing new licenses or new licensing mechanisms to address such issues.

47:11.000 --> 47:17.880
But, I mean, there are all the existing licenses for data that are free,

47:20.040 --> 47:26.600
and I think that's the first step: to be really clear that there is data freely licensed,

47:26.600 --> 47:29.480
or data in the public domain, not covered by copyright.

47:30.200 --> 47:38.840
So, right now, we live in a society where machine learning, called AI,

47:38.840 --> 47:44.120
is starting to be everywhere: the justice system, the legal system, the military,

47:44.120 --> 47:52.360
the hospitals. So, I wonder, how can we engage societies and the public, and how do we mobilize

47:52.360 --> 47:57.880
the public in this? Have you thought of this? Because I think right now, the conversation

47:57.880 --> 48:06.920
is quite limited. Previously, I was in a conversation in the EU policy room, and I felt like

48:06.920 --> 48:15.800
the conversation is, I think, very limited: in the European Parliament, or, say,

48:15.800 --> 48:20.680
some very technical people talking about technical stuff. And I wonder, how

48:20.680 --> 48:28.920
can we reach the people who, right now, see all this marketing

48:28.920 --> 48:33.240
news about AI, and are wondering: what is our stance, how can we fight this,

48:33.240 --> 48:39.160
how can we fight the fact that we are losing control? So, have you thought about it?

48:39.160 --> 48:44.600
And another question, really fast: how can we contact you in order to send ideas?

48:45.000 --> 48:49.000
So, the email address on the screen?

48:49.000 --> 48:56.760
Yeah, sorry, well, as I said, we have been thinking about it. I think it's really important

48:56.760 --> 49:03.560
to be clear and to explain in simple words, to non-technical people, what's at stake,

49:04.600 --> 49:08.920
because for a lot of people who are just trying to experiment with various

49:09.000 --> 49:13.080
LLMs, for them, they really don't see a difference between interacting with an

49:13.080 --> 49:19.800
LLM and some more traditional application, while there is a difference that they should,

49:21.240 --> 49:26.040
yeah, they should know what it is. So, yeah, I think we will be working on making it

49:26.920 --> 49:29.720
a good message to people about what's at stake.

49:29.720 --> 49:34.360
It's, I mean, it's important, we absolutely agree with you that it's important,

49:34.440 --> 49:43.160
but to be completely fair, I mean, we have been trying to tell the world for years

49:43.160 --> 49:48.840
that software should be fully free, and we're trying really hard to translate that into

49:48.840 --> 49:53.720
ways that we can talk to people that are not technological people, so that they understand these concepts,

49:53.720 --> 50:00.680
and we have always relied on drawing in different social movements, other people's concerns

50:00.680 --> 50:05.960
that have nothing to do with software, per se. So, just to summarize, this is a responsibility,

50:05.960 --> 50:13.000
I think that we all have, that isn't just limited to even the technical world, but really,

50:13.000 --> 50:16.040
yeah, we need to make sure that that's more understood.

50:16.040 --> 50:20.600
Thank you all very much. I know there are more questions, but we've run out of time.

50:20.600 --> 50:23.240
Thank you so much to Zoë and Krzysztof, that was tremendous.

50:23.240 --> 50:32.600
Thank you. Please don't forget to carry on the conversation: info@fsf.org.

50:32.600 --> 50:33.880
Thank you very much.

