WEBVTT

00:00.000 --> 00:06.000
All right.

00:06.000 --> 00:07.000
All right.

00:07.000 --> 00:10.000
Thank you.

00:10.000 --> 00:12.000
Are we going to sound check, too?

00:12.000 --> 00:14.000
People can hear on the stream.

00:14.000 --> 00:15.000
All right.

00:15.000 --> 00:17.000
Well, I'm going to start with a confession.

00:17.000 --> 00:22.000
So the real reason why Evan and I are here today is to come and enjoy everything that

00:22.000 --> 00:27.000
Belgium has to offer, the good beer, the good cheese, the good chocolate.

00:27.000 --> 00:32.000
But we're also here to preach the gospel of open source, which I think is going to be easy with

00:32.000 --> 00:33.000
this crowd.

00:33.000 --> 00:37.000
It looks like we have a lot of open source enthusiasts here.

00:37.000 --> 00:44.000
But we're going to be up here talking about Apache Superset, so let's kick it off.

00:44.000 --> 00:45.000
All right.

00:45.000 --> 00:48.000
So let's start with some introductions.

00:48.000 --> 00:49.000
Yeah.

00:49.000 --> 00:50.000
Hi.

00:50.000 --> 00:51.000
I'm Evan Rusackas.

00:51.000 --> 00:53.000
I'm one of the projects.

00:53.000 --> 00:57.000
So I am here for the beer for real.

00:57.000 --> 00:58.000
But yeah.

00:58.000 --> 00:59.000
Yeah.

00:59.000 --> 01:01.000
You've got more beer cred than I do.

01:01.000 --> 01:02.000
And I'm Max.

01:02.000 --> 01:05.000
I'm the original creator of Apache Superset.

01:05.000 --> 01:08.000
You might know me or have heard me in places on the internet

01:08.000 --> 01:10.000
talking about Apache Airflow.

01:10.000 --> 01:16.000
If you're familiar with that; I assume we have a lot of data enthusiasts in the room.

01:16.000 --> 01:21.000
So I started Apache Superset back in 2015.

01:21.000 --> 01:24.000
It's almost like 10 years ago.

01:24.000 --> 01:31.000
And since then, I started a company called Preset that helps kind of foster

01:31.000 --> 01:33.000
and offer services in and around Superset.

01:33.000 --> 01:37.000
But today, we're really focused on talking about Superset.

01:37.000 --> 01:38.000
Yeah.

01:38.000 --> 01:41.000
And of course, one of the things is, you know, Global Switch Day.

01:41.000 --> 01:45.000
If we can talk you into one thing and one reason to come halfway across the globe.

01:45.000 --> 01:49.000
It's to switch from proprietary BI tools to open source BI.

01:49.000 --> 01:52.000
How many of you are using Superset?

01:52.000 --> 01:53.000
Yeah.

01:53.000 --> 01:57.000
How many folks are already using open source BI?

01:57.000 --> 02:02.000
And how many folks are still paying or your orgs are paying for proprietary BI?

02:02.000 --> 02:05.000
Everybody, everybody's got some Tableau licenses.

02:05.000 --> 02:08.000
And some of what we want to be preaching today.

02:08.000 --> 02:15.000
And the core message is that open source in this area, business intelligence, open source, data visualization,

02:15.000 --> 02:21.000
has gotten extremely competitive due to the work of the community over the past decade or so.

02:21.000 --> 02:28.000
And it is extremely competitive in pretty much all aspects of what you might want from a business intelligence tool.

02:28.000 --> 02:35.000
And ahead in many ways, you know, there's a lot of advantages you get from open source that you'll never get from a proprietary vendor.

02:35.000 --> 02:37.000
And we'll talk about some of that today.

02:37.000 --> 02:38.000
Yep.

02:38.000 --> 02:40.000
So really quickly what we're going to cover is all of this.

02:40.000 --> 02:41.000
What is the product?

02:41.000 --> 02:43.000
We're going to look a little bit at how it's built.

02:43.000 --> 02:46.000
We're going to talk about how you can tweak it, which is the fun stuff with open source.

02:46.000 --> 02:48.000
And then, of course, how you can get your hands on it.

02:48.000 --> 02:51.000
And, you know, talk your people into using it.

02:51.000 --> 02:54.000
To start off with exactly what it is.

02:54.000 --> 02:59.000
So what is Apache Superset? It is, of course, open source; we wouldn't be here

02:59.000 --> 03:02.000
if it was not open source. Enterprise-ready.

03:02.000 --> 03:04.000
Maybe not everyone here works at enterprises.

03:04.000 --> 03:07.000
There's probably students and college attendees.

03:07.000 --> 03:10.000
But by enterprise, we mean it is very much.

03:10.000 --> 03:16.000
So Apache Superset is an application that you install and that you can use for you and your team, right?

03:16.000 --> 03:21.000
In open source we have, you know, frameworks and pieces of infrastructure.

03:21.000 --> 03:24.000
We have things like databases, small libraries.

03:24.000 --> 03:27.000
But in this case, Apache Superset is an app.

03:27.000 --> 03:35.000
It's a web app that you can install on your network and serve your entire organization company team with very easily.

03:35.000 --> 03:39.000
And then business intelligence, if you're not familiar with the term, even though it is a, you know,

03:39.000 --> 03:43.000
it's kind of a dated aging term that probably started like 20, 25 years ago.

03:43.000 --> 03:49.000
But what it means really is like data consumption, data visualization, dashboarding, right?

03:49.000 --> 03:51.000
So Superset, we'll get a little bit deeper into it.

03:51.000 --> 03:58.000
But Superset is a tool that enables you and your team to, essentially, like connect a database,

03:58.000 --> 04:04.000
consume data, write some SQL (or not, if you're not a SQL writer), assemble a dashboard, share it with people.

04:04.000 --> 04:08.000
So it's really about data exploration, visualization, creating interactive dashboards.

04:09.000 --> 04:14.000
Now, of course, being open source, we need to talk for a second about the Apache Software Foundation.

04:14.000 --> 04:21.000
This is an Apache project, and what that really means is that no organization, like Preset or any of the other people that own,

04:21.000 --> 04:23.000
or maintain it are actually the owners.

04:23.000 --> 04:32.000
Apache owns the repo, Apache owns the trademarks, and they provide a governance model that allows us to responsibly maintain the software and maintain the community.

04:32.000 --> 04:37.000
And they provide all kinds of other resources like legal coverage and all sorts of other.

04:37.000 --> 04:43.000
Yeah, in many ways, like it keeps a commercial organization honest around the open source software that is around.

04:43.000 --> 04:49.000
So you'll see, you know, big companies at some point kind of shift their license and things like that. With the Apache Software Foundation,

04:49.000 --> 04:56.000
There's a really clear governance model around things that keeps everybody kind of fair and honest.

04:56.000 --> 05:00.000
And it's all based on meritocracy, as open source should be.

05:00.000 --> 05:03.000
Yeah, it's all about individuals, not companies, which is great.

05:03.000 --> 05:07.000
So we have to play by the rules of engagement, the Apache Way.

05:07.000 --> 05:14.000
This is the slide about me, so I already introduced myself, but maybe I'll talk a little bit about the genesis of Superset and how it came to be.

05:14.000 --> 05:18.000
So about, yeah, 10 years ago or so maybe like a little bit longer than that.

05:18.000 --> 05:20.000
I was at Airbnb.

05:20.000 --> 05:24.000
And a lot of the tools that we wanted to use for ourselves did not exist on the market.

05:24.000 --> 05:27.000
So we had Tableau licenses.

05:27.000 --> 05:33.000
We looked at different BI tools, but like none of the tools that existed out there would totally serve our use cases.

05:33.000 --> 05:42.000
And one of the issues is we had picked these big databases at the time, like Apache Druid and Presto, and Tableau did not play very well with these databases.

05:42.000 --> 05:47.000
And we had a set of like very kind of unique use cases.

05:47.000 --> 05:55.000
So we decided to build our own and, like often in open source, you need someone a little bit more delusional and kind of crazy to think, like,

05:55.000 --> 06:00.000
why don't we, why can't I build a new thing from scratch?

06:00.000 --> 06:04.000
That's kind of what happened, and the rest is history.

06:04.000 --> 06:10.000
Just a little bit around what we do, so part of what we do is we, you know, people use all sorts of databases.

06:10.000 --> 06:13.000
Every other day there's a new database that's created.

06:13.000 --> 06:20.000
Superset plays very nicely with all of the databases that exist in the world, pretty much.

06:20.000 --> 06:25.000
That means like whatever database you're using for your project, your company, you know,

06:25.000 --> 06:28.000
Superset can most likely speak to it very well.

06:28.000 --> 06:32.000
And just being open source too, it really fosters, you know,

06:32.000 --> 06:34.000
extensibility, connectivity, and integration.

06:34.000 --> 06:37.000
So we support all databases.

06:37.000 --> 06:38.000
Next slide.

06:38.000 --> 06:48.000
So it's used everywhere. One thing about Superset is we are a fairly big open source project by any open source metric.

06:48.000 --> 06:53.000
A lot of stars on GitHub, like lots of contributors, super high velocity.

06:53.000 --> 06:59.000
Lots of different organizations contributing to it, which is part of like,

06:59.000 --> 07:05.000
you know, when you make a decision to commit to an open source project or software for the long term,

07:05.000 --> 07:07.000
it's good to know that there's a really strong community around it.

07:07.000 --> 07:11.000
And this is very much the case with Superset.

07:11.000 --> 07:15.000
So, Evan, want to talk about some of the components, the bits and pieces?

07:16.000 --> 07:25.000
Yeah, obviously, the point of the tool is to build highly interactive dashboards and allow people to quickly get insight into any of the data that you have available to you.

07:25.000 --> 07:30.000
But working from the bottom up, there's a workflow that typically happens when you're building these dashboards.

07:30.000 --> 07:32.000
You would start obviously with connecting your data.

07:32.000 --> 07:37.000
So like Max said, we connect to pretty much anything that speaks SQL, but even the things that don't speak SQL,

07:37.000 --> 07:41.000
you can connect through some sort of intermediary normally, like Presto, you know,

07:41.000 --> 07:44.000
or Drill, something like that, some other query engine.

07:44.000 --> 07:46.000
So we can connect all sorts of stuff.
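
NOTE
Editor's sketch, not from the talk: connecting a database comes down to supplying a SQLAlchemy-style URI. The helper name, host, and credentials below are hypothetical placeholders; only the dialect://user:password@host:port/database shape is the point.
```python
from urllib.parse import quote_plus
#
def build_database_uri(dialect: str, user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a SQLAlchemy-style URI: dialect://user:password@host:port/database."""
    # Percent-encode credentials so characters like "@" or "/" do not break the URI.
    return f"{dialect}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"
#
# Hypothetical connection details, for illustration only.
uri = build_database_uri("postgresql", "analyst", "p@ss/word", "db.example.com", 5432, "sales")
print(uri)
```
A string of this shape is what would be pasted into the connection form; the exact dialect prefix depends on the database driver installed.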

07:46.000 --> 07:52.000
You just put in a URI here and then we drop you into SQL Lab, which is a fully fledged SQL IDE,

07:52.000 --> 07:57.000
where you can obviously write SQL queries, but you can share those with your co-workers.

07:57.000 --> 08:00.000
You can get a query history in case you were wondering what you changed in your SQL,

08:00.000 --> 08:03.000
between last week and this week or something like that.

08:03.000 --> 08:10.000
But that's where you basically select all of your dimensions and metrics and then you start to build up your data set.

08:11.000 --> 08:14.000
The dataset is one of the key entities in Superset.

08:14.000 --> 08:23.000
We have kind of a dataset-centric philosophy where you really want to use SQL to kind of build up something that's going to power your visualizations or power your dashboard.

08:23.000 --> 08:27.000
And we do that through either virtual or materialized data sets.
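
NOTE
Editor's sketch, not from the talk: one way to picture a virtual dataset is as a saved SELECT statement that chart queries wrap as a subquery. The table, column names, and the chart_query helper are hypothetical, and SQLite stands in for whatever database is connected; this is not Superset's internal code.
```python
import sqlite3
#
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [("EU", 10.0, "paid"), ("EU", 5.0, "refunded"), ("US", 7.5, "paid")])
#
# Hypothetical "virtual dataset": just a saved SELECT statement, not a table.
virtual_dataset_sql = "SELECT region, amount FROM orders WHERE status = 'paid'"
#
def chart_query(dataset_sql: str, dimension: str, metric: str) -> str:
    """Wrap the dataset's SQL as a subquery and aggregate on top of it."""
    return f"SELECT {dimension}, {metric} AS value FROM ({dataset_sql}) AS virtual_table GROUP BY {dimension} ORDER BY {dimension}"
#
rows = conn.execute(chart_query(virtual_dataset_sql, "region", "SUM(amount)")).fetchall()
print(rows)
```
The dimension and metric passed to chart_query play the role of the columns and metrics curated in the dataset editor.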

08:27.000 --> 08:34.000
And then when you create those, we have this data set editor, you see, which is a light semantic layer.

08:34.000 --> 08:36.000
It allows you to annotate your data.

08:36.000 --> 08:50.000
You can add in extra calculated columns and metrics and say what's temporal or what's the primary temporal column, things like that, to kind of curate what is in the data set.

08:50.000 --> 08:54.000
Once you have that, we go into this, which is our chart builder.

08:54.000 --> 08:59.000
And we have somewhere hovering around 40 or so different visualization plugins.

08:59.000 --> 09:04.000
We support any kind of visualization library that's built upon JavaScript.

09:04.000 --> 09:11.000
We have many of them integrated and you drag your columns from your data set to populate the visualization controls.

09:11.000 --> 09:20.000
So you select your dimensions, your metrics, you can apply filters and all sorts of other display properties of any given visualization.

09:20.000 --> 09:25.000
And then once you've built up enough of these charts, of course you drop them into a dashboard.

09:25.000 --> 09:31.000
And dashboards are not just a drag and drop collection of charts, but they also provide a lot of other means of interactivity.

09:31.000 --> 09:37.000
Like if you click on a bar in a bar chart and it's a cross filter that will be referenced in that filter bar on the left.

09:37.000 --> 09:41.000
But it also applies the filter to all of the other charts that receive it.

09:41.000 --> 09:47.000
You can create and share filters in that left filter bar to apply.

09:47.000 --> 09:53.000
to, well, anything you want, to any of the tabs and other dashboard components that are available there.

09:53.000 --> 09:57.000
And then once you've got your dashboard and it tells the data story you want to tell.

09:57.000 --> 10:02.000
Often one of the things people ask for is how can I share this on a periodic basis with my team.

10:02.000 --> 10:11.000
So we have this alerts and reports feature that allows you to send dashboards as essentially screenshots to your Slack workspace or to an email address.

10:11.000 --> 10:13.000
And you can do that on a scheduled basis.

10:13.000 --> 10:18.000
Or you can write some SQL query that'll set some thresholds as like an alert.

10:18.000 --> 10:22.000
And it'll then send the screenshot of the data for your team to review.
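
NOTE
Editor's sketch, not from the talk: the alert idea described above (a SQL query whose result is compared against a threshold) can be pictured like this. The function name, table, and threshold are hypothetical, SQLite stands in for the real database, and the notification step (screenshot to Slack or email) is left out.
```python
import sqlite3
#
def alert_triggered(conn: sqlite3.Connection, sql: str, threshold: float) -> bool:
    """Run a query that returns one numeric value and compare it to a threshold."""
    row = conn.execute(sql).fetchone()
    # Treat an empty result or NULL as zero so the comparison is always defined.
    value = row[0] if row and row[0] is not None else 0
    return value > threshold
#
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (kind TEXT)")
conn.executemany("INSERT INTO events VALUES (?)", [("error",), ("error",), ("ok",)])
fired = alert_triggered(conn, "SELECT COUNT(*) FROM events WHERE kind = 'error'", threshold=1)
print(fired)
```
A scheduler would run a check like this periodically and only send the report when the condition holds.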

10:22.000 --> 10:27.000
Maybe I'll take a slightly different take too, if you go back to the previous slide on this.

10:27.000 --> 10:33.000
So I know the talk right before was a dbt talk, which is a way to essentially build a DAG of SQL.

10:33.000 --> 10:38.000
The next talk after this one is an Apache Airflow talk that's also about writing data pipelines.

10:38.000 --> 10:41.000
So it's not necessarily as linear as it may seem here.

10:41.000 --> 10:44.000
But we assume that you have a database.

10:44.000 --> 10:51.000
Some flat-ish datasets or data structures that are typically, you know, large datasets with a lot of columns,

10:51.000 --> 10:55.000
normally with metrics and dimensions.

10:55.000 --> 11:00.000
Based on these data sets, you can do exploration, the explorer that we just looked at,

11:00.000 --> 11:02.000
assemble some dashboard.

11:02.000 --> 11:06.000
And then the SQL IDE, sometimes it's part of the workflow, sometimes it's kind of on the side.

11:06.000 --> 11:13.000
It's totally possible to use Superset and get a lot of value from it without necessarily writing any SQL at all.

11:13.000 --> 11:18.000
But we're pretty kind of SQL centric as a tool.

11:18.000 --> 11:21.000
And do you want to talk about the API and SDK?

11:21.000 --> 11:23.000
Yeah, so I'll talk a little bit about this.

11:23.000 --> 11:32.000
So, because we're an open source solution, and really often it's built by engineers to support entire teams,

11:32.000 --> 11:38.000
but also for engineers, pretty much anything you can do in Superset through the GUI or the UI,

11:38.000 --> 11:43.000
You can use my microphone, that's position one.

11:43.000 --> 11:47.000
Okay, I'll bring it up a little bit.

11:47.000 --> 11:49.000
Is that better? All right.

11:49.000 --> 11:54.000
So yeah, pretty much anything you can do in the GUI and the UI.

11:54.000 --> 11:56.000
You can do through the API.

11:56.000 --> 12:03.000
So you can define and create dashboards and charts, alerts and reports, all this stuff programmatically.

12:03.000 --> 12:05.000
So good API coverage.
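
NOTE
Editor's sketch, not from the talk: programmatic access typically means logging in for a token and then calling REST endpoints with it. The base URL and token are placeholders, and the endpoint paths and login fields reflect the editor's understanding of Superset's REST API and should be checked against the API reference for your version. The requests are only built here, not sent.
```python
import json
from urllib.request import Request
#
BASE_URL = "http://localhost:8088"  # assumption: a local Superset instance
#
def login_request(username: str, password: str) -> Request:
    """Build (but do not send) a login request that would return an access token."""
    body = {"username": username, "password": password, "provider": "db", "refresh": True}
    return Request(f"{BASE_URL}/api/v1/security/login", data=json.dumps(body).encode("utf-8"), headers={"Content-Type": "application/json"}, method="POST")
#
def list_dashboards_request(access_token: str) -> Request:
    """Build a GET request for the dashboard list, authenticated with a bearer token."""
    return Request(f"{BASE_URL}/api/v1/dashboard/", headers={"Authorization": f"Bearer {access_token}"}, method="GET")
#
req = list_dashboards_request("example-token")
print(req.get_method(), req.full_url)
```
Passing either object to urllib.request.urlopen would perform the call against a running instance.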

12:05.000 --> 12:08.000
There's the component.

12:08.000 --> 12:10.000
So the building blocks of Superset.

12:10.000 --> 12:15.000
So what we use to put Superset together is accessible through component libraries,

12:16.000 --> 12:25.000
front-end libraries, so you can also build your own application using bits and pieces of Superset.

12:25.000 --> 12:27.000
All right, I'll talk a little bit about what Preset is.

12:27.000 --> 12:38.000
So I think that Robert, a friend, Robert from Altinity, gave a talk this morning about how to create a commercial entity around your open source product without selling your soul.

12:38.000 --> 12:42.000
So we're very much in that mindset as well.

12:42.000 --> 12:50.000
In many ways, like, there's some really good incentives and interactions between the commercial entity around the open source project.

12:50.000 --> 13:00.000
In many ways, the fact that Preset is selling a managed service around Superset allows us to go and invest and grow the open source project.

13:00.000 --> 13:01.000
It's really positive.

13:01.000 --> 13:08.000
I'm going to go quickly on the commercial stuff, but we essentially offer a managed service that is kind of, you know,

13:08.000 --> 13:13.000
so if you don't want to run Superset on your own and be on call for it,

13:13.000 --> 13:15.000
You know, you can use a managed service.

13:15.000 --> 13:17.000
If you want to try it, it's going to be easy.

13:17.000 --> 13:20.000
It's an easy way to take Superset for a test drive.

13:20.000 --> 13:23.000
Then we also provide multiple workspaces.

13:23.000 --> 13:28.000
So usually when you deploy Superset yourself, you have one Superset instance typically,

13:28.000 --> 13:32.000
but we allow you to spin up multiple of these for various reasons.

13:32.000 --> 13:35.000
You might want to put them in different regions for compliance purposes,

13:35.000 --> 13:40.000
or you might want to give different instances to different teams or departments to keep them, you know, separated,

13:40.000 --> 13:47.000
or you might want to do different workflows like dev, stage, prod and kind of promote your business intelligence assets.

13:47.000 --> 13:49.000
We'll see how this thing works a little bit.

13:49.000 --> 13:51.000
Yeah, so I don't want to go to the here.

13:51.000 --> 13:55.000
I know the audience is fairly technical, so we'll look a little bit at how this works.

13:55.000 --> 14:01.000
This super abstract, logical model of how this works behind the scenes is, you know,

14:01.000 --> 14:03.000
you have a database at the bottom.

14:03.000 --> 14:05.000
Superset is kind of at the center of it.

14:05.000 --> 14:09.000
It's able to run SQL against your database, and it allows you to create

14:09.000 --> 14:16.000
charts and dashboards that you can publish and share with friends in your organization.

14:16.000 --> 14:19.000
Now let's look a tiny bit more at the internals.

14:19.000 --> 14:21.000
So there's going to be a line on there.

14:21.000 --> 14:22.000
Here's the back end.

14:22.000 --> 14:26.000
So we have a Python back end with Flask and SQLAlchemy, stuff like Celery.

14:26.000 --> 14:29.000
There's a little bit of pandas to do some processing in there.

14:29.000 --> 14:33.000
There's, I think, more than a hundred and fifty Python libraries that are

14:33.000 --> 14:35.000
powering this thing behind the scenes.

14:35.000 --> 14:39.000
The front end is mostly a React app.

14:39.000 --> 14:43.000
That's using all sorts of data visualization libraries and kind of

14:43.000 --> 14:48.000
bringing that into a very cohesive type place.

14:48.000 --> 14:50.000
So that's a high level, kind of.

14:50.000 --> 14:54.000
So we're north of 10,000 npm components on the front end.

14:55.000 --> 15:04.000
We have a Node.js sidecar that allows other different operations like asynchronous query operations.

15:04.000 --> 15:07.000
There's some web socket stuff that gets built in there.

15:07.000 --> 15:13.000
So probably not too much is going to shift into Node land, but there are a few of them.

15:13.000 --> 15:17.000
Yeah, it's like, it has become such a large web application.

15:17.000 --> 15:18.000
That's multi layered.

15:18.000 --> 15:21.000
So there's a little bit of Node in here in the back end.

15:21.000 --> 15:25.000
But it's not central to what we do.

15:25.000 --> 15:27.000
And it's mostly to manage the web socket.

15:27.000 --> 15:38.000
So that, you know, well, so I wish I had a better architecture.

15:38.000 --> 15:40.000
Let me repeat the question.

15:40.000 --> 15:42.000
What is in Superset?

15:42.000 --> 15:44.000
And what is the actual architecture?

15:44.000 --> 15:46.000
What does it take to run this thing?

15:46.000 --> 15:48.000
We don't have a good diagram handy.

15:48.000 --> 15:51.000
I can say it just kind of off the top of my head.

15:51.000 --> 15:56.000
So there is a Python web server that's running a Flask application.

15:56.000 --> 16:02.000
As part of it, there is a Celery asynchronous kind of worker back end.

16:02.000 --> 16:05.000
There is some optional components.

16:05.000 --> 16:07.000
So there's a metadata database.

16:07.000 --> 16:14.000
Multiple caching layers for result sets, and larger result sets to export CSVs and things like that.

16:14.000 --> 16:19.000
And then I won't get into the front end because it might take all day.

16:19.000 --> 16:21.000
You can deploy it in a number of means.

16:21.000 --> 16:22.000
You can deploy with Kubernetes.

16:22.000 --> 16:23.000
There's a Helm chart.

16:23.000 --> 16:26.000
There's a Kubernetes operator that's coming out.

16:26.000 --> 16:29.000
You can use Docker compose or you can do kind of pip install.

16:29.000 --> 16:32.000
Yeah, the easiest way if you want to deploy it today on your laptop,

16:32.000 --> 16:35.000
you would just clone the repo and run Docker compose up.

16:35.000 --> 16:37.000
And then if you want to know what's happening behind the scene,

16:37.000 --> 16:41.000
you open up that Docker compose file and see what's under there.

16:41.000 --> 16:46.000
There would be, I think, out of the box, there would be some Redis and Postgres, back end, front end

16:46.000 --> 16:49.000
and stuff and, you know, a few bells and whistles.

16:49.000 --> 16:51.000
Yeah, and that's how you get all this stuff out of the box.

16:51.000 --> 16:55.000
But a lot of people obviously besides running their internal business intelligence tool,

16:55.000 --> 16:57.000
they want to, you know, tweak it a little bit.

16:57.000 --> 16:59.000
This is the joy of open source, right?

16:59.000 --> 17:01.000
So you want to make it what you want it to be.

17:01.000 --> 17:04.000
And there's a lot of things you can do beyond the normal,

17:04.000 --> 17:08.000
like, let's just run superset and look at some use case data.

17:08.000 --> 17:11.000
You can, you can use embedded.

17:11.000 --> 17:16.000
So we provide an SDK called our embedded SDK that allows you to install a Superset dashboard

17:16.000 --> 17:18.000
as a React component.

17:18.000 --> 17:24.000
So you can pass a guest token from your host application and, basically, authenticate.

17:24.000 --> 17:29.000
And it follows all the RBAC rules and everything and shows your data in situ.
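
NOTE
Editor's sketch, not from the talk: the host application's back end usually asks Superset for a guest token scoped to one dashboard and hands it to the embedded front end. The payload shape below (user, resources, rls) reflects the editor's understanding of the guest token endpoint and should be verified against the Superset documentation; the dashboard id, username, and row-level clause are hypothetical.
```python
import json
#
def guest_token_payload(dashboard_id: str, username: str, rls_clauses: list[str]) -> dict:
    """Build the JSON body a host application might send when requesting a guest token."""
    return {
        # Identity shown for the embedded viewer (hypothetical user).
        "user": {"username": username},
        # Limit the token to a single embedded dashboard.
        "resources": [{"type": "dashboard", "id": dashboard_id}],
        # Optional row-level security clauses applied to the viewer's queries.
        "rls": [{"clause": clause} for clause in rls_clauses],
    }
#
payload = guest_token_payload("abc123", "viewer", ["customer_id = 42"])
print(json.dumps(payload, indent=2))
```
Keeping this request on the server side means the credentials used to mint guest tokens never reach the browser.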

17:29.000 --> 17:31.000
And you can really make it look like your brand.

17:31.000 --> 17:34.000
So here it is embedded in somebody else's website.

17:34.000 --> 17:38.000
And you can theme it and all sorts of stuff too.

17:38.000 --> 17:40.000
Yeah, I'll say a thing too about this.

17:40.000 --> 17:43.000
But like, you know, if you build a web application nowadays,

17:43.000 --> 17:48.000
there's often going to be a requirement to have a portion of your app that serves analytics

17:48.000 --> 17:49.000
to the users of your app.

17:49.000 --> 17:52.000
So if you're building a SaaS application or any web app,

17:52.000 --> 17:56.000
you're probably going to want to have some sort of dashboard in there.

17:56.000 --> 18:01.000
And we see it more and more that people use Superset as a way to power this part,

18:01.000 --> 18:04.000
the analytics interactive part of their application.

18:04.000 --> 18:06.000
And then you can do that through embedded.

18:06.000 --> 18:09.000
Of course, it's got to match the look and feel of your app.

18:09.000 --> 18:11.000
So you can talk about theming a little bit.

18:11.000 --> 18:14.000
Yeah, so we provide CSS templating.

18:14.000 --> 18:17.000
And this is surprisingly one of the most popular things people have come to our website

18:17.000 --> 18:21.000
for is more advice on how to do CSS work, which is something we don't provide

18:21.000 --> 18:26.000
a lot of warranty on, because the DOM of any web application is going to change over time.

18:26.000 --> 18:30.000
But there are a lot of very safe things you can do to theme your app however you want.

18:30.000 --> 18:36.000
You can change colors and logos and kind of go crazy with all of the little granular details of how it looks.

18:36.000 --> 18:43.000
And you can do that one off for a single dashboard or you can create a CSS template and reuse it across your instance.

18:43.000 --> 18:50.000
You can also create visualization color themes that are either categorical or, you know, gradient on a spectrum.

18:50.000 --> 18:52.000
So like hot to cold kind of things.

18:52.000 --> 18:57.000
And if you're using open source Superset, you can do that in the configuration file.

18:57.000 --> 19:00.000
And create a bunch of those little color palettes in code.

19:00.000 --> 19:06.000
Or if you happen to be using Preset, we provide a GUI so you can do that, you know, more interactively.
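
NOTE
Editor's sketch, not from the talk: in open source Superset the color palettes mentioned above are declared in the Python configuration file. The setting names and dictionary keys follow the editor's reading of Superset's default config and should be checked against your version; the palette ids, labels, and hex colors are hypothetical.
```python
# Sketch of entries for a superset_config.py file.
#
# Categorical palette: distinct colors assigned to series or categories.
EXTRA_CATEGORICAL_COLOR_SCHEMES = [
    {
        "id": "acme_brand",
        "description": "Hypothetical brand palette",
        "label": "Acme brand colors",
        "isDefault": False,
        "colors": ["#1B4965", "#62B6CB", "#BEE9E8", "#FFB703", "#FB8500"],
    }
]
#
# Sequential palette: a gradient on a spectrum, e.g. cold to hot.
EXTRA_SEQUENTIAL_COLOR_SCHEMES = [
    {
        "id": "acme_cold_to_hot",
        "description": "Hypothetical cold-to-hot gradient",
        "label": "Cold to hot",
        "isDiverging": False,
        "colors": ["#2166AC", "#F7F7F7", "#B2182B"],
    }
]
```
After a restart, palettes defined this way should show up alongside the built-in color schemes in the chart controls.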

19:06.000 --> 19:11.000
And of course, CSS customization is not theming.

19:11.000 --> 19:14.000
Theming is one of the things we're working on right now.

19:14.000 --> 19:16.000
And it's going to be done hopefully.

19:16.000 --> 19:17.000
In a few months.

19:17.000 --> 19:18.000
And we'll see.

19:18.000 --> 19:19.000
Yeah, we'll see.

19:19.000 --> 19:25.000
But it's a lot of work to do because we're built on so many front-end libraries, including Ant Design.

19:25.000 --> 19:27.000
It's our core component library.

19:27.000 --> 19:30.000
Another part is Superset creating a new plugin architecture,

19:30.000 --> 19:34.000
So it will be easier to build plugins for superset with all types.

19:34.000 --> 19:43.000
And then of course, if you want to get involved in the community and build more interesting stuff beyond just a plugin.

19:43.000 --> 19:49.000
We have these Superset Improvement Proposals (SIPs) that are kind of the bigger moves we're making in the community,

19:50.000 --> 19:57.000
including the extensibility and plugin architecture that we're looking at, new permissions models, user preference models, all kinds of stuff.

19:57.000 --> 19:59.000
And these are all done via consensus.

19:59.000 --> 20:00.000
That's the Apache way.

20:00.000 --> 20:01.000
We send them to the mailing list.

20:01.000 --> 20:03.000
Everybody has a big discussion.

20:03.000 --> 20:04.000
They get on board.

20:04.000 --> 20:05.000
We vote.

20:05.000 --> 20:08.000
And it's a very open non corporate process.

20:08.000 --> 20:11.000
So if you're curious about what's on the roadmap of Superset,

20:11.000 --> 20:12.000
one easy way

20:12.000 --> 20:15.000
to go look at what's happening now in the community

20:15.000 --> 20:20.000
is to go look at our GitHub issues and filter on the SIP label.

20:20.000 --> 20:28.000
Or you're going to see we have an actual SIP board with like dozens and dozens of proposals that are in different parts of the development life cycle.

20:28.000 --> 20:29.000
Yep.

20:29.000 --> 20:37.000
And then the other part you can get involved in people do get involved which makes open source business intelligence a really cool place to work is that we have all these different working groups.

20:37.000 --> 20:43.000
So we have a town hall where everybody kind of comes together once a month, but we have a lot of different working groups that are very ephemeral.

20:43.000 --> 20:46.000
So people are working on some geospatial stuff right now.

20:46.000 --> 20:48.000
Some people are working on security.

20:48.000 --> 20:52.000
There's software quality, developer experience, release management.

20:52.000 --> 20:54.000
All kinds of different little working groups.

20:54.000 --> 20:58.000
And we just break into our areas of interest and then come back and put it all together.

20:58.000 --> 21:04.000
That's kind of the beauty of having, like, you know, a mature open source project on the mature framework of the Apache Software Foundation.

21:04.000 --> 21:11.000
We have a lot of infrastructure or you know logistics to welcome just about anyone to contribute.

21:11.000 --> 21:20.000
So we see new contributors every month and then, you know, contributions of all shapes and sizes and flavors.

21:20.000 --> 21:24.000
I'm going to talk a little bit about like the future of business intelligence being open source.

21:24.000 --> 21:27.000
I think like the real idea there is like the future of software is open.

21:27.000 --> 21:30.000
Actually the present of software is open source.

21:30.000 --> 21:36.000
Like more and more you know all the cycles and bits and lines of code that execute.

21:36.000 --> 21:40.000
day in and day out, are more and more open source.

21:40.000 --> 21:52.000
We see that in areas where there's a lot of innovation, like the LLM world: not only is open source competitive, but it's ahead in a lot of cases too.

21:52.000 --> 22:01.000
And I think that is explained by the fact that open source is just a superior way to write software, in business intelligence and elsewhere.

22:01.000 --> 22:04.000
Some things about our community.

22:04.000 --> 22:12.000
So you know the adoption of Superset has been crazy; by all metrics, if you were to look at, you know, downloads and stars and contributors,

22:12.000 --> 22:16.000
There's you know we're very well ranked there.

22:16.000 --> 22:22.000
Another thing, the next one is more about the velocity, so we've had more than a thousand contributors to date.

22:22.000 --> 22:26.000
A lot of steady like monthly active contributors as well.

22:26.000 --> 22:30.000
And tons of pull requests so tons of momentum there.

22:30.000 --> 22:35.000
There's no vendor lock-in, so that keeps us honest at Preset.

22:35.000 --> 22:40.000
You know, as much as you might want to use a commercial solution if you don't want to run it on your own,

22:40.000 --> 22:47.000
you can go to a solution like Preset, but at the same time, if you're unhappy with your vendor, you can go straight back to open source.

22:47.000 --> 22:53.000
And we want to make sure that it's easy for people to onboard but also to offboard from our managed solution.

22:53.000 --> 23:02.000
Yeah, there's a CLI tool that allows you to move all of your Superset assets between open source instances, between Preset workspaces, across both. Very flexible.

23:02.000 --> 23:07.000
And of course you can't have a talk nowadays without talking about AI for a minute.

23:07.000 --> 23:11.000
Like Max said our repo is very active.

23:11.000 --> 23:19.000
We've merged somewhere north of 2000 PRs last year and we need all the help we can get, not just from contributors but from the bots.

23:19.000 --> 23:29.000
So one of the reasons we think BI and open source in general are going to win on open source grounds is that not only is the code available but the Slack is available.

23:29.000 --> 23:36.000
And everything's just wide open bots and AI in general have all of the context that proprietary software does not provide them.

23:36.000 --> 23:40.000
And so they're better able to answer questions and provide help on the repo.

23:40.000 --> 23:45.000
So we've got bots now that are closing a lot of issues on GitHub.

23:45.000 --> 23:49.000
They're able to answer questions very effectively. People say, "Thanks, bot," and the issue is closed.

23:49.000 --> 23:53.000
And they're also now helping us review PRs and merge and all of that.

23:53.000 --> 23:56.000
They automatically tag issues and PRs, and answer questions on Slack.

23:56.000 --> 23:57.000
It's all really great.

23:57.000 --> 24:01.000
Yeah we're getting slowly invaded by more and more agents in the repo.

24:01.000 --> 24:04.000
They're mostly helpful, and it's kind of scary,

24:04.000 --> 24:10.000
the pace at which it's happening. But we have yet to receive our first pull request fully written by an AI.

24:10.000 --> 24:12.000
But it's probably coming this year.

24:12.000 --> 24:14.000
I've got a hand in the crowd with a question.

24:14.000 --> 24:24.000
Yeah is there or is there.

24:24.000 --> 24:26.000
So let me repeat the question.

24:26.000 --> 24:31.000
So is there some form of like agent or AI integration in the product as it is today?

24:31.000 --> 24:32.000
Is that the right question?

24:32.000 --> 24:34.000
I think there is room for that.

24:34.000 --> 24:37.000
So what we realized trying to build something, and we did build

24:38.000 --> 24:45.000
a text-to-SQL kind of agent helping with SQL at Preset, is that the design patterns are not super clear yet.

24:45.000 --> 24:49.000
So it's hard to build a solution that will work for the community at large.

24:49.000 --> 24:54.000
People might want to use certain models or certain pieces of infrastructure in different ways.

24:54.000 --> 25:00.000
We're working on extensibility and a big extension framework inspired by the VS Code ecosystem.

25:00.000 --> 25:06.000
And then we'll make sure that we have all the plugs and sockets for integration and extension.

25:06.000 --> 25:17.000
So that's our angle on that: until the patterns around RAG and integrating with AIs are clear, we're going to offer an extension framework for people to build those.

25:17.000 --> 25:25.000
And of course they can open source their implementation and other people can decide which plug-ins or extensions they want to bring into it.

25:25.000 --> 25:28.000
We've got like three minutes left. How many slides do we have left?

25:28.000 --> 25:30.000
We're good. We're good. We're good. We're good.

25:30.000 --> 25:34.000
Obviously the main thing is you need to get your hands on it if you haven't already.

25:34.000 --> 25:42.000
How do you do that? The main way we would steer people toward is just go to the website and go to the repo and install the darn thing.

25:42.000 --> 25:48.000
So like Max said, docker compose up. It's really as simple as that, and then you can get it running locally on your machine.

25:48.000 --> 25:53.000
It might be a little more intricate if you want to deploy it, because there are a lot of things to secure and configure.

25:53.000 --> 26:00.000
There you set up RDS, Redis. It might be a little complicated to really get a production instance.

26:00.000 --> 26:05.000
But if you want to go on your laptop: git clone, docker compose up, and you can get running.

26:05.000 --> 26:10.000
You can also try it on Preset. It's kind of the easy way. If you go to preset.io, you can try it for free.

26:10.000 --> 26:15.000
If you like it, that doesn't mean you have to go with a commercial solution; you can go back to open source.

26:15.000 --> 26:22.000
So you know, there's just like training wheels there, and it depends if you want to be on call for this thing or if you want the people who write this software to kind of run it for you.

26:22.000 --> 26:27.000
Now it's free for five users forever. It's not a temporary free thing.

26:27.000 --> 26:34.000
Then of course, find us on Slack. Max and I are both on this Slack along with 18,000 other people.

26:34.000 --> 26:39.000
So you're in good company. Come say hello and ask all the questions you want.

26:39.000 --> 26:47.000
Yes, someone will answer your question. It might be an AI agent. It might be someone from the community. Or if all else fails, then it's probably going to end up being one of us.

26:47.000 --> 26:49.000
We keep the bots to a channel.

26:50.000 --> 26:53.000
Why is it called Superset?

26:53.000 --> 26:55.000
Well, it's called Superset.

26:55.000 --> 26:58.000
Why is it called Superset? We had looked for a name for a while.

26:58.000 --> 27:08.000
And the idea was just to, you know, I think the idea was to be kind of a superset of everything you want to do around analytics and BI.

27:08.000 --> 27:16.000
So slowly we're kind of growing and becoming a superset of everything you might want to do in analytics.

27:16.000 --> 27:21.000
And with all apologies to those joining us on the stream, one of the reasons we're here is we'd love to talk to you all about it.

27:21.000 --> 27:29.000
Show you a demo. Get a little hands-on. We're going to be at a table for OSA, the Open Source Analytics community, tomorrow during those hours.

27:29.000 --> 27:35.000
So come on down and we'll show you in person or you can talk to us after the talk.

27:35.000 --> 27:40.000
We'll be hanging out right around here. And are we out of time? More questions?

27:40.000 --> 27:44.000
One minute, there's more time for more questions over here.

27:44.000 --> 27:50.000
I saw at the beginning of the presentation that one of the organizations using Superset is Microsoft.

27:50.000 --> 27:53.000
So what argument does Microsoft have?

27:53.000 --> 27:56.000
I don't know, but there's a really good video online.

27:56.000 --> 28:00.000
That's about an hour long, of everything they've built on and on top of Superset.

28:00.000 --> 28:04.000
They actually forked the project and added all sorts of extensions and features.

28:04.000 --> 28:06.000
So you can watch that one-hour video.

28:06.000 --> 28:08.000
It's a big team that uses it.

28:08.000 --> 28:13.000
Sorry, the question was, you know, why is Microsoft using Superset rather than, basically, Power BI?

28:13.000 --> 28:18.000
Yeah, so they've actually, they've got a fork of it and they've been extending it with all kinds of extra functionality.

28:18.000 --> 28:21.000
They've got a visual query builder. The project there is called Titan.

28:21.000 --> 28:24.000
They've got a whole team around customizing that.

28:24.000 --> 28:30.000
I think it's about flexibility, customization, and like the pace at which you can customize open source.

28:30.000 --> 28:33.000
I don't think even being internal to Microsoft.

28:33.000 --> 28:37.000
I don't think they can really customize Power BI to their needs.

28:38.000 --> 28:40.000
All right.

28:40.000 --> 28:42.000
I think we're out of time.

28:42.000 --> 28:45.000
We'll take more questions outside if anybody wants to ask them.

28:45.000 --> 28:47.000
Thank you.

28:47.000 --> 28:48.000
Thank you all.

28:48.000 --> 28:49.000
Thank you online.

