WEBVTT

00:00.000 --> 00:17.280
I want to start by saying that I was not supposed to be giving this talk. We were

00:17.280 --> 00:22.540
supposed to be having Anna here from the accessibility team, which is an awesome team at

00:22.540 --> 00:26.140
Mozilla, but unfortunately her visa got rejected.

00:26.140 --> 00:33.820
So big boo for whoever rejected her visa.

00:33.820 --> 00:39.700
So then I took the time to come here to talk to all of you about something else.

00:39.700 --> 00:45.980
I came from the middle of nowhere, where I live, to beautiful Brussels and can't wait

00:45.980 --> 00:48.980
to give this great talk to you.

00:49.980 --> 00:55.980
All right, let's move on and see what we're actually going to be talking about.

00:55.980 --> 01:00.780
A quick intro to localization, what we're actually doing to localize at

01:00.780 --> 01:06.820
Mozilla, then the Mozilla language portal, that's something new, that's going to be the main

01:06.820 --> 01:15.860
the core part of this talk, then a little bit about Birdbox, questions, and then we go home.

01:15.860 --> 01:23.500
So it's really hard to say in just a few words what localization is and how we do it at

01:23.500 --> 01:24.500
Mozilla.

01:24.500 --> 01:33.500
So I thought I'd ask someone very intelligent and the answer that came up was very poetic

01:33.500 --> 01:39.540
and I decided to dig in a little bit, like what do you actually mean by that?

01:39.540 --> 01:45.980
And then the famous poet from the East actually clarified it.

01:45.980 --> 01:50.700
And I think it's a pretty good statement.

01:50.700 --> 01:55.580
The whole point of localization is to bring the open web to way more people than if we don't

01:55.580 --> 02:00.540
do localization, and we can prove that with numbers: there are way more Firefox users

02:00.540 --> 02:07.100
that use translated, localized versions of Firefox than English Firefox.

02:07.100 --> 02:09.940
So localization matters.

02:09.940 --> 02:17.580
And for a brief moment, let's dive into some stats.

02:17.580 --> 02:23.020
We don't only localize Firefox. At Mozilla, there's a bunch of projects going on.

02:23.020 --> 02:30.100
You've heard about many of them today, and there's a bunch of others.

02:30.100 --> 02:36.980
Usually in Pontoon, our localization platform, you're going to see around 30 projects

02:36.980 --> 02:40.740
that are being worked on, sometimes the number is a little bit higher, sometimes a little

02:40.740 --> 02:43.220
bit lower.

02:43.220 --> 02:50.220
We have a total of 369, and I had to double-check that number, 369 teams.

02:50.220 --> 02:55.180
So, locales into which we translate. A locale is a little bit different from a language

02:55.180 --> 02:59.340
because we have several regional variants of some languages like Spanish.

02:59.380 --> 03:04.780
We have separate localizations for Mexican Spanish, Spanish Spanish, and so forth.

03:04.780 --> 03:09.780
But 369 locales, it's a pretty huge number.

03:09.780 --> 03:16.900
I don't know if there are a lot of software projects, or even projects outside the software industry, that

03:16.900 --> 03:23.260
have so many translations or like book translations or whatever.

03:23.300 --> 03:32.980
We have about 1,000-plus volunteers who in their free time make sure that Firefox,

03:32.980 --> 03:39.860
Thunderbird, AMO, MDN, and SUMO are available for speakers of their language.

03:39.860 --> 03:44.900
They're awesome and I'm happy that I see some of them here and I'm happy that some

03:44.900 --> 03:48.700
of them are following us, as I just read in the chat.

03:48.700 --> 03:49.700
Hi.

03:53.700 --> 03:57.900
To give you an idea of the amounts, we're talking about half a million translations

03:57.900 --> 03:59.900
submitted per year.

03:59.900 --> 04:05.260
The number, of course, varies a little bit year by year, but just to give you an idea.

04:05.260 --> 04:09.420
And a translation, a string, could be something very simple, even like one letter,

04:09.420 --> 04:12.860
but it could also be like a whole paragraph.

04:12.860 --> 04:17.220
Sadly, I don't have a number to give you for the exact word count.

04:17.460 --> 04:22.660
Okay, so what is the language portal?

04:22.660 --> 04:26.100
It's something that doesn't exist yet.

04:26.100 --> 04:28.260
Actually, I'm looking at you, Ludo.

04:28.260 --> 04:35.420
For those of you who didn't come to the contributor meetup yesterday, I think it was

04:35.420 --> 04:39.620
you or Pascal who said something about that, we should be doing more things in the open

04:39.620 --> 04:44.140
to get the feedback, that's what we're trying to do here.

04:44.140 --> 04:53.620
We're in a very early stage of developing the Mozilla language portal, and it's a simple

04:53.620 --> 05:01.900
microsite, targeted at localization and translation experts globally from various industries,

05:01.900 --> 05:04.300
various projects.

05:04.300 --> 05:08.900
And the basic idea is to share stuff that we do at Mozilla, our translations, our best

05:08.900 --> 05:17.060
practices, even tools with them, and make them available for their use.

05:17.060 --> 05:22.140
For example, I know that some of you are probably not familiar with the term translation

05:22.140 --> 05:23.140
memory.

05:23.140 --> 05:28.620
We basically collect all the translations that have ever ended up in any project that

05:28.620 --> 05:34.500
we translate, either today or in the past, even if it's a project that doesn't exist

05:34.500 --> 05:37.380
anymore, and we store them.

05:37.380 --> 05:45.620
And that database is called translation memory, and it's useful to Chrome localizers,

05:45.620 --> 05:50.100
OpenOffice localizers, Linux distro localizers, everyone basically.

05:50.100 --> 05:55.580
And the language portal is going to have a simple search, which is going to allow people to search

05:55.580 --> 06:01.220
through it, download these translations and incorporate them into their tools, as well

06:01.220 --> 06:06.900
as leverage this content through the API.

06:06.900 --> 06:11.940
Similarly, through the exact same means, we're going to make it available for people to

06:11.940 --> 06:16.260
use our glossary, so a collection of terms.

06:16.260 --> 06:22.020
We have a list of terms that are frequently translated and part of translation strings,

06:22.020 --> 06:27.780
with their definitions, examples of use, and all of them are translated to maybe not

06:27.780 --> 06:32.580
369 locales, but to most of them.

06:32.580 --> 06:36.860
That's going to be available similarly to the translation memory.

06:36.940 --> 06:44.460
In addition to that, we have a process, the way we do localization, which is not necessarily

06:44.460 --> 06:51.820
specific to Mozilla. There's no strong reason why some startup or some add-on developer

06:51.820 --> 06:57.100
who wants to have their piece of software localized wouldn't be able to benefit from that.

06:57.100 --> 07:03.100
So we're going to make the relevant bits of our documentation available to people.

07:03.180 --> 07:08.780
We have a blog on which we not only write about stuff that's going on at Mozilla, but it's

07:08.780 --> 07:14.060
potentially interesting for the wider industry, so we're going to make this thing available

07:14.060 --> 07:15.500
on the language portal as well.

07:15.500 --> 07:22.140
We also have style guides, which will probably come in the later stages, not in the initial

07:22.140 --> 07:29.820
version of the language portal, but basically documents through which localization teams

07:29.900 --> 07:32.780
define the standards and rules under which they localize.

07:35.420 --> 07:40.780
We also have some language technology that we develop in-house. I've already mentioned

07:40.780 --> 07:47.900
Pontoon, which is our TMS system, but the one that I want to highlight, I know that this URL

07:47.900 --> 07:52.700
is probably not very visible, and I didn't put a QR code on.

07:53.020 --> 07:57.020
That's still half done, right?

07:57.020 --> 08:03.020
Yeah. The moz.l10n library is, I think, something that has the potential to become an

08:03.020 --> 08:13.420
industry-leading library to parse and serialize virtually any translation format into a

08:13.420 --> 08:19.500
data model that mimics the new standard, MessageFormat 2, which you might have heard of,

08:20.460 --> 08:26.060
and as such, allows a very rich representation of virtually any message stored in any

08:26.060 --> 08:33.180
format. This is actively worked on. We're going to be adding other features, checks, soon,

08:34.300 --> 08:39.420
and I'm pretty sure there's a potential for lots of users for that.

08:44.300 --> 08:48.460
This is just a sneak peek of what the home page is going to look like or might look like,

08:49.180 --> 08:52.860
there's another URL you don't see at the bottom where this is being developed.

08:53.900 --> 08:59.900
Feel free to provide suggestions, file bugs, once we're far enough along that you can actually find bugs,

09:01.260 --> 09:03.740
as well as provide patches. We accept those as well.

09:08.060 --> 09:14.540
And now that I'm at the very end, I just want to call out Birdbox, which is the framework

09:14.540 --> 09:23.740
that we use for building the Mozilla language portal. It's a framework developed by our

09:23.740 --> 09:29.980
MozMEAO team, Mozilla Marketing and Operations, I think, is what the acronym stands for,

09:31.020 --> 09:37.020
specifically Steve Jalim, who allowed me to mention his name on stage. Birdbox is basically

09:37.980 --> 09:43.020
helping developers build microsites like the one that I just discussed.

09:43.980 --> 09:50.460
With Mozilla branding, so it's leveraging Mozilla Protocol, the design system used by

09:50.460 --> 09:58.220
Mozilla.org and some other Mozilla assets, and it allows developers to then hand over

09:58.780 --> 10:03.740
content creation to non-developers, because it's built on Wagtail CMS,

10:05.260 --> 10:09.900
which basically allows anyone who can use Facebook to also edit these websites.

10:13.100 --> 10:17.100
All right, that's it for me. Thank you so much. Thank you very much.

10:23.820 --> 10:26.060
Do we have questions?

10:28.060 --> 10:36.620
Will this be almost like Crowdin? Do I see like a Crowdin copy coming up from Mozilla?

10:37.420 --> 10:42.060
No, the language portal is not going to be like Crowdin, but we already have a similar

10:42.060 --> 10:47.500
tool like Crowdin, called Pontoon. It's available at pontoon.mozilla.org. That's where

10:47.500 --> 10:51.820
translations come in. That's where people see a set of projects that we localize, they can

10:51.820 --> 10:58.620
join their teams, they see the strings, they can actually translate strings. So that's sort of like

10:58.620 --> 11:06.380
our translation management system. It's specifically targeted at people that are translating

11:06.460 --> 11:13.100
Mozilla software. The Mozilla language portal contains lots of or most of the translations or all

11:13.100 --> 11:17.420
of the translations that are submitted through Pontoon, but makes them available in a more,

11:18.780 --> 11:25.100
let's say, user-friendly fashion. If you go directly into Pontoon and try to find

11:25.100 --> 11:29.420
where to download some of these files, you can't really download them in the right form,

11:29.900 --> 11:36.700
you can't really search through them in Pontoon. So it's basically repackaging that content

11:36.700 --> 11:42.300
and making it available separately to everybody, not just to Mozilla localizers.

11:44.380 --> 11:45.340
More questions?

11:45.340 --> 12:00.540
Thank you. Sorry, it's a very simple question. Did you say that the main

12:00.540 --> 12:06.140
audience of this portal is going to be developers or who else? I didn't catch it.

12:06.140 --> 12:11.340
The main audience of the Mozilla language portal is going to be localization professionals,

12:11.420 --> 12:18.300
translators, from any project, from any software project they localize.

12:18.300 --> 12:23.340
So Linux distro localizers, OpenOffice localizers, Mozilla localizers,

12:23.340 --> 12:28.940
Microsoft localizers, translators that are maybe not even translating

12:28.940 --> 12:34.620
software. Maybe they would find some of the terms that we translate useful in translation

12:34.620 --> 12:42.060
of whatever they translate. So it's targeting translators. The part where I might have confused

12:42.060 --> 12:49.260
you was the documentation, the best practices. So if you have a project

12:49.260 --> 12:54.540
that you would like to get localized, then yes, we're basically also targeting developers with

12:54.540 --> 13:00.300
the documentation of processes, best practices, how to write the code such that it's

13:00.380 --> 13:05.180
localizable. We have these documents available today, but they're a little bit harder to find,

13:05.900 --> 13:12.540
and we think that if we put them in this, hopefully more visible, place, more people are

13:12.540 --> 13:23.420
actually going to benefit from them. Do I have more questions? I have a question. Um,

13:23.500 --> 13:31.260
today, if I want to translate something, I can use AI to do the translation.

13:31.260 --> 13:37.500
So what's the value proposition that is different from me just using AI to do my

13:37.500 --> 13:43.660
translation, because I'm just too busy to do the work, right? It's the quality, basically.

13:44.860 --> 13:51.660
At least today, it's very, very... okay. The quality is... a very fine answer.

13:51.740 --> 14:03.180
Okay. At least for me. No, for me, I contributed a bit in the past.

14:03.180 --> 14:07.900
For me, the most important thing is that we're not translating, we're localizing. So some

14:07.900 --> 14:13.820
of the AI will translate word by word, or the sentence, but we're in the culture. That's why you have

14:13.820 --> 14:17.980
different versions, like different locales, and we don't have only languages, because we

14:18.060 --> 14:21.900
localize something to your local culture, experience, or things like this.

14:36.380 --> 14:43.980
Yeah, so I have a similar question. Um, there's this trend of using machine translations

14:43.980 --> 14:50.860
for everything in software these days, and obviously the data set that Mozilla is intending to publish

14:50.860 --> 14:57.740
through the language portal will be of much higher quality, but how can you, let's say, um,

15:00.140 --> 15:05.420
motivate users to use the language portal instead of machine translations?

15:06.780 --> 15:13.740
Right. That's a fair question. We're already giving the ability to localizers

15:13.820 --> 15:19.340
to opt in to what we call pre-translation, which means that when new strings are made

15:19.340 --> 15:25.180
available, they already see them as translated. They're marked as still needing work, but they

15:25.180 --> 15:30.460
actually end up in the product if they don't act upon them in real time. And the

15:30.460 --> 15:36.460
way that works is it's basically using machine translation that's trained with our existing data,

15:36.460 --> 15:40.860
which is going to be available on the Mozilla language portal. Some localizers don't like that.

15:41.500 --> 15:47.340
They say, and we trust them, that it doesn't work for their locale. There are 369 locales.

15:48.860 --> 15:53.500
I doubt that the quality is the same for all the languages. I know that for mine, for Slovenian,

15:53.500 --> 15:59.500
which is spoken by two million people, it's unbelievably good. I know that there's, there's not

15:59.500 --> 16:06.700
so much material available in my language probably, but it's way better

16:07.020 --> 16:13.020
than going to a generic machine translation engine. So I think

16:13.020 --> 16:18.860
the question that Ludo asked might have a different answer at a different point in time in the future.

16:22.300 --> 16:29.820
There's a question in the chat. Is there any LLM use case involved in any part of the workflow?

16:29.820 --> 16:37.260
Also, where can I learn more about this? I'm not sure I got the question. Learn more

16:37.260 --> 16:47.100
about the language portal? Is there any LLM use case involved in any part of the workflow?

16:47.900 --> 16:57.660
Okay. And where can this person learn more on this? Okay. So, I think one part is what I explained

16:57.740 --> 17:04.540
in my answer to the previous question, but maybe the user is hinting at

17:04.540 --> 17:10.620
using these translations also to train these engines, but the answer to that is that I think,

17:10.620 --> 17:17.660
even though the numbers look high, and I'm no AI expert, but I think in terms of training material,

17:17.660 --> 17:25.180
these are really, really low numbers. Like, is there any AI involved? No, he says, in building that,

17:25.500 --> 17:40.540
no. I'm not sure I answered that question. Was that? Yeah, he's hiding. Any more questions?

17:46.140 --> 17:50.780
Thank you very much for attending the Mozilla room. We're going to clean up and close.

17:50.780 --> 17:57.100
Stickers are only available downstairs now. Okay, thank you. No more upstairs. Thank you very much.

