WEBVTT

00:00.000 --> 00:12.000
I am very, very delighted to introduce Alexis Jacomy, who is the author of an exciting

00:12.000 --> 00:19.600
project, and his talk is going to be about developing custom UIs to explore graph databases using

00:19.600 --> 00:32.960
sigma.js. Take it away. Thanks. Hello, so I'm Alexis, I'm a web developer at OuestWare,

00:32.960 --> 00:40.160
which is a small firm in Nantes, France, and we mostly develop web applications for data exploration.

00:41.840 --> 00:47.760
We produce some open source code, including sigma.js and a tool which is named

00:47.760 --> 00:57.680
Gephi Lite, the web version of Gephi, something else graph-related, and I'm here to talk

00:57.680 --> 01:08.320
about developing custom UIs for exploring graph databases. So let's start with Ricardo. Ricardo is a

01:08.400 --> 01:18.640
research project from France to explore international commerce in the 19th and early 20th century.

01:20.240 --> 01:26.080
It was started, I think, at the Sciences Po médialab, by people who are also big fans of

01:26.800 --> 01:35.840
FOSDEM and open source software, and the main goal is to craft datasets to explore international

01:35.840 --> 01:47.680
trades between sovereign entities from 1830 to 1930. The core dataset is made of trades.

01:48.480 --> 01:54.640
Basically researchers took huge journals from various countries where they listed

01:55.360 --> 02:04.480
trades they had with other entities with varying currencies. In Ricardo, we don't care what people buy,

02:04.560 --> 02:14.800
we just care about the monetary flows. This is the core starting block of this project.

02:16.000 --> 02:23.360
The main issue is that the entities that are reported as partners from sovereign countries are

02:23.360 --> 02:30.640
not sovereign entities, and this data set is full of France, Southern France, Eastern Europe,

02:30.720 --> 02:39.760
US, Atlantic coast, this kind of thing, and this is not good because people wanted to explore

02:39.760 --> 02:46.640
trades between sovereign entities. So researchers started another project, which is named

02:46.640 --> 02:55.040
GeoPolHist, and it's a database of sovereign countries over time, because sovereignty evolves

02:55.120 --> 03:02.560
in the 19th and early 20th century, so it's even harder to apprehend, and we have this new

03:02.560 --> 03:08.720
dataset with entities and connections between them. Things like Paris is a part of France,

03:10.400 --> 03:19.120
France is sovereign from this date to this date, etc., etc. You can check the page of this project

03:19.200 --> 03:28.480
because they have lots of exploration tools for this specific dataset. Another part that sounds

03:28.480 --> 03:36.800
necessary, given what I told you: we have trades reported in varying currencies, in varying countries,

03:36.800 --> 03:45.520
at varying times, so people from Ricardo also crafted datasets to get all the

03:45.520 --> 03:52.720
exchange rates over time, and so we can finally have raw yearly trade reports,

03:52.720 --> 03:57.520
with normalized monetary values, and relations between entities, and that's what we're going

03:57.520 --> 04:04.000
to focus on today. We have this huge network between entities, and we have connections that are

04:04.000 --> 04:11.840
trades, and connections that are geopolitical, kind of. So let's put everything in the

04:11.840 --> 04:21.040
Neo4j database. Cue the slide with "draw the rest of the owl". We have a Neo4j database, nice,

04:23.040 --> 04:30.480
and we can open Neo4j browser. I don't know who is familiar with the Neo4j browser here.

04:33.520 --> 04:40.480
So I opened it in the Neo4j browser, and the first image I had when I opened something with this,

04:41.440 --> 04:48.800
because the issue is that for each year, for each pair of entities that reported trades

04:50.000 --> 04:59.200
with each other, I have one edge, and I have 105 years, so if I want to just draw the network,

04:59.200 --> 05:07.920
it's immediately unreadable. So we need to find better strategies to represent this mess.

05:08.240 --> 05:17.840
Also, we want to extract, so in the Ricardo project, they have lots of heuristics,

05:17.840 --> 05:25.040
etc., and code to actually generate network graphs of trades between sovereign entities.

05:25.040 --> 05:30.080
But here, since I kept the trades, the raw trades as they were in the initial dataset,

05:30.080 --> 05:40.400
I don't have necessarily direct trades. I can have trades between Paris and UK or Belgium and

05:40.400 --> 05:44.320
North of France and this kind of thing, and we want to be able to actually observe them.

05:46.560 --> 05:55.680
And finally, the Neo4j console was very good to actually spot some issues, because we have

05:55.760 --> 06:04.400
lots of entities that trade with themselves, and that's probably bugs in the scripts that

06:04.400 --> 06:14.560
took the original sources, and generated this raw trades dataset. So it's a bit tricky to explore,

06:14.560 --> 06:22.080
and it's a good use case for some custom UIs. So I'm going to talk about Sigma.js now,

06:22.400 --> 06:28.560
it's a JavaScript library we develop at OuestWare to draw networks on web pages,

06:29.760 --> 06:37.440
and it's focused, with its ecosystem, on building applications for network analysis.

06:38.480 --> 06:43.520
It's not very good at drawing schemas, Cytoscape.js will be better; it's not necessarily

06:43.520 --> 06:49.200
great at handling very custom renderings, and interactivity within small networks is

06:49.200 --> 06:55.600
difficult as well, but as soon as you want to display larger graphs, it's a very good tool.

06:55.600 --> 07:02.800
And one of the main reasons for that is that it just handles rendering, and another open source tool

07:03.760 --> 07:10.560
developed by the médialab of Sciences Po again, graphology, handles everything that is computation-related.

07:11.520 --> 07:14.960
So we have graphology, which is basically a graph model for JavaScript,

07:15.680 --> 07:22.080
that provides a lot of algorithms to compute metrics, scores, to apply layout to the network,

07:22.080 --> 07:28.560
etc. And then we give this refined network to Sigma.js, which is just going to render it on web

07:28.560 --> 07:38.400
pages, and then we can focus on interactivity. So I forgot with the slides.

07:38.480 --> 07:44.480
And now, we have some, there's lots of features. If you go to the website,

07:44.480 --> 07:53.120
you will see the list of features that Sigma provides, but let's dive into the application.

07:53.680 --> 07:59.760
And I think it's already used in many tools, and actually here, Gephi Lite could have been a

07:59.760 --> 08:05.040
good solution, because we don't have that much of a big network if we merge the edges together.

08:05.040 --> 08:10.720
I mean, we have around 4,000 entities, so Gephi Lite could have been a good solution,

08:10.720 --> 08:15.680
and also G.V(), which is unfortunately not open source, but they do a lot for

08:15.680 --> 08:21.200
open source; especially, they pay us to actually develop most of the recent features of Sigma.js,

08:21.200 --> 08:30.640
so big thanks to them. And that's basically, yeah, the Neo4j browser on

08:30.640 --> 08:39.200
steroids, which could have been a good solution. But let's go custom. Everything, the code from

08:39.200 --> 08:46.960
the application is on GitHub. I didn't want to host a Neo4j server, so if you want to run it,

08:46.960 --> 08:54.560
you have to run it yourself locally. But all the instructions are in the repository to build

08:54.800 --> 09:05.840
the datasets and run the application. Also, I tried to keep it as vanilla as I could. I used

09:05.840 --> 09:11.440
TypeScript, because JavaScript without TypeScript is too painful for me now, but there's no

09:11.440 --> 09:18.960
Vue, no React, no Angular, and I tried to just integrate Sigma with Web Components, which is not

09:18.960 --> 09:28.720
something I'm very used to doing, but it worked well. So, the first view I wanted to draw was

09:28.720 --> 09:36.240
ego networks. Basically, what are the neighbors of a given entity and how are they connected

09:36.240 --> 09:46.640
with each other? This is how it works. We have this Neo4j database. We will run a

09:46.640 --> 09:54.480
Cypher query. We will extract some raw graph data. We will use some graphology magic to get

09:55.840 --> 10:01.440
a drawable network with everything we want, and then we will give it to Sigma and get this

10:01.440 --> 10:08.480
interactive view. So, the Cypher query here is: we have a center C. We want to get all neighbors,

10:08.480 --> 10:15.200
and edges connecting my center to these neighbors. Sorry. And for each unique pair of neighbors,

10:15.200 --> 10:21.840
I want their relations, basically, and this will give me the ego network. In graphology,

10:21.840 --> 10:29.920
I will aggregate all parallel trades. So, to avoid having all those parallel edges as we saw earlier

10:29.920 --> 10:37.920
in the Neo4j browser, I aggregate them for all years, and I will have a size of the edge that is

10:37.920 --> 10:46.960
related to its monetary value. And then we set various graphical variables, and that's it for

10:46.960 --> 10:52.480
graphology. Then in Sigma, we just have to write a bit of code to handle parallel edges,

10:52.480 --> 10:59.600
because it's easy to tell Sigma that all edges are curved or all edges are

11:01.520 --> 11:07.040
arrows, but here I wanted to have straight edges, because I think it's more readable. That's

11:07.120 --> 11:14.080
my opinion. And parallel edges as curved. So, there's a bit of code there. Adding some buttons,

11:14.080 --> 11:21.280
captions, highlighting neighbors, interactions, this kind of thing. So, about the code itself,

11:25.120 --> 11:31.360
we have at some point a "prepare graph" function, which takes a data graph that kind of directly comes from

11:31.920 --> 11:42.400
Neo4j, and we run some graphology code, make parallel edges curved. That's what I said earlier.

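NOTE
A minimal sketch of the aggregation of parallel trades described in the cues above, written as dependency-free TypeScript rather than with graphology.
It is not the speaker's code: the Trade and AggregatedEdge shapes, the function names and the square-root sizing are assumptions made for illustration.
```typescript
// Hypothetical shape of a raw trade: one edge per (source, target, year).
interface Trade { source: string; target: string; year: number; value: number; }
interface AggregatedEdge { source: string; target: string; value: number; years: number; }
// Merge all yearly edges that share a source and a target into a single edge.
function aggregateTrades(trades: Trade[]): AggregatedEdge[] {
  const merged: Record<string, AggregatedEdge> = {};
  for (const { source, target, value } of trades) {
    const key = JSON.stringify([source, target]);
    if (!merged[key]) merged[key] = { source, target, value: 0, years: 0 };
    merged[key].value += value;
    merged[key].years += 1;
  }
  return Object.keys(merged).map((key) => merged[key]);
}
// Map a summed value to an edge thickness; the square root tames very large flows.
function edgeSize(value: number, maxValue: number, maxSize = 10): number {
  return maxValue > 0 ? Math.max(1, maxSize * Math.sqrt(value / maxValue)) : 1;
}
```
With something like this, the many yearly edges between the same two entities collapse into one directed edge whose size follows the summed monetary value.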
11:42.400 --> 11:45.520
For the position of the nodes, I will first put them all on a circle,

11:46.400 --> 11:52.480
then this comes from the next feature, but if I have some fixed nodes that I want to

11:52.480 --> 11:59.840
polarize my final view, as we will see later, I put them on a larger circle, and then I run some

11:59.920 --> 12:07.600
ForceAtlas2 algorithm. This is a kind of physics algorithm to get positions for the nodes

12:07.600 --> 12:14.560
based on the topology of the network. Most of the graph images we see in the literature

12:15.600 --> 12:21.280
that are, if they look like hairballs, this is the algorithm that has been used probably.

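NOTE
A minimal sketch of the initial placement described in the cues above: regular nodes on a circle, fixed nodes on a larger one.
This is an illustration under assumptions, not the speaker's code: the function names, the radius values and the factor of 3 are hypothetical. The force-directed step (ForceAtlas2 in the talk) would then refine these positions and is not implemented here.
```typescript
interface Point { x: number; y: number; }
// Spread node ids evenly on a circle of the given radius, centered on the origin.
function placeOnCircle(ids: string[], radius: number): Record<string, Point> {
  const positions: Record<string, Point> = {};
  ids.forEach((id, i) => {
    const angle = (2 * Math.PI * i) / ids.length;
    positions[id] = { x: radius * Math.cos(angle), y: radius * Math.sin(angle) };
  });
  return positions;
}
// Regular nodes go on an inner circle, fixed ("polarizing") nodes on a larger one.
function initialPositions(free: string[], fixed: string[], radius = 100): Record<string, Point> {
  return { ...placeOnCircle(free, radius), ...placeOnCircle(fixed, radius * 3) };
}
```
Starting from a circle gives the layout algorithm a deterministic, non-overlapping starting point, and the larger circle keeps the fixed nodes on the outside of the view.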
12:21.280 --> 12:31.920
And yeah, some interactions, I can show more code later if needed, and then we got this kind of

12:31.920 --> 12:47.120
views. So, the application itself looks like this. I want to explore reported partners of the entity

12:47.280 --> 12:55.360
Belgium. I want to include the center of this ego network. This is not mandatory because

12:56.320 --> 13:00.960
I know that every node I will have will be connected to this one, so this will bring a lot of

13:00.960 --> 13:08.640
noise, but let's try anyway. And I want all trades on the whole period, but I will only keep

13:08.960 --> 13:17.280
trades of over 500,000 dollars. So, if two entities traded for less than this amount,

13:17.280 --> 13:23.520
I will skip the edges. And I will keep only export edges for now.

13:29.600 --> 13:37.440
Okay, and here comes the network. I have small gaps shown at the square. I can roll over

13:39.040 --> 13:46.320
a node to see its neighbors and its context. This one is interesting because it appears a lot of

13:46.320 --> 14:01.680
time in lots of networks. And I wonder what it is. Sorry. Okay. But yeah, we see that we have lots

14:01.760 --> 14:08.320
of various entities that are not sovereign. I don't, maybe this one is, I don't know.

14:09.520 --> 14:20.560
South Africa is, I'm looking for the weird ones. Okay. Is it? I think it's a weird one.

14:21.040 --> 14:32.880
But yeah, this is a bit messy. Let's do the same one, but without the Belgium node in it.

14:36.720 --> 14:46.400
Okay. Yeah. Well, it's a nice hairball. I think we might be able to see more interesting things.

14:46.880 --> 14:58.080
This exact graph actually is the networks of, look, this is not supposed to be true.

14:59.760 --> 15:06.000
Basically trades and geopolitical relationships around Belgium only in the first 17 years of the

15:06.000 --> 15:14.480
dataset. There's still this "World estimation" node that takes most of the information.

15:16.800 --> 15:24.960
Actually, let's just remove the trades to see how it would behave. Okay. This is an easy one.

15:25.680 --> 15:32.880
So at least this gives me all the, thanks, all the direct relations, all the direct political

15:32.960 --> 15:39.600
relations I have between Belgium and other entities and this starts to get informative.

15:41.360 --> 15:49.680
But we can hope to do more. Okay. Yeah. I did this exact query, but for United States of America to

15:49.680 --> 15:56.800
list the political entities linked to the United States of America.

16:03.600 --> 16:10.000
Now, we want to see indirect trades between two sovereign entities. So I will take two

16:11.360 --> 16:18.320
entities and I want to see when they trade with each other, when one entity trades with

16:18.320 --> 16:24.960
a part of the other entity, or when parts of both entities trade together.

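NOTE
A minimal sketch of the three cases listed in the cues above (direct trade, trade with a part of the other entity, trade between parts of both), written as a plain TypeScript classifier.
It is not the speaker's Cypher query: the TradeEdge and PartOf shapes, the function names and the single-level "part of" lookup are assumptions made for illustration.
```typescript
interface TradeEdge { source: string; target: string; }
// Hypothetical lookup: maps an entity to the sovereign entity it is a part of.
type PartOf = Record<string, string | undefined>;
// Return the sovereign "owner" of an entity, or the entity itself if it has none.
function owner(entity: string, partOf: PartOf): string {
  return partOf[entity] ?? entity;
}
// Classify a trade relative to sovereign entities a and b:
// "direct" (a with b), "one-part" (one side is a part), "two-parts", or null if unrelated.
function classify(t: TradeEdge, a: string, b: string, partOf: PartOf): string | null {
  const from = owner(t.source, partOf);
  const to = owner(t.target, partOf);
  const linksBoth = (from === a && to === b) || (from === b && to === a);
  if (!linksBoth) return null;
  const parts = [t.source, t.target].filter((e) => e !== a && e !== b).length;
  return parts === 0 ? "direct" : parts === 1 ? "one-part" : "two-parts";
}
```
The three labels correspond to paths of length one, two and three between the two sovereign entities, counting one trade edge plus zero, one or two "part of" hops.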
16:25.920 --> 16:34.160
So I have a new Cypher query that gives me trades between the two entities and all the

16:34.160 --> 16:39.600
paths with depth two and all the paths with depth three, basically. Then I take the same

16:39.600 --> 16:50.080
graphology and sigma scripting code and I get some new networks. So according to the sources we have

16:51.040 --> 17:02.080
how does India trade with the United Kingdom between 1833 and 1938? I decided for this network

17:02.080 --> 17:07.040
not to display the direct trades. There is an option for it because it took too much of the

17:08.560 --> 17:14.960
information. But yeah, I see the United Kingdom trades with Bengal, China and Mumbai while India

17:14.960 --> 17:21.920
trades with British and South Africa and British Borneo. This is quite informative to me about

17:23.120 --> 17:30.640
the dataset. Now I know I can challenge the heuristics of the Ricardo code to see

17:30.640 --> 17:37.520
if the final trades contain all these trades I am observing right now.

17:37.680 --> 17:48.400
Yeah, this was another example between United Kingdom and United States of America and

17:48.400 --> 17:58.560
in this graph I kept the direct trades because for once I had significant edges that were

17:58.560 --> 18:07.840
not direct trades and in the reports United Kingdom trades a lot with Atlantic coast United

18:07.840 --> 18:16.000
States of America and this might be interesting for researchers to know how those trades are reported

18:16.080 --> 18:25.280
etc. Again, since I have more time and I already demoed some things, I can show more.

18:28.560 --> 18:34.960
Yeah, I won't draw too much but I don't know if you would Belgium and United Kingdom

18:34.960 --> 18:40.720
the main issue is that if I take Belgium and the United Kingdom I kind of expect just to get

18:40.720 --> 18:46.720
the direct trades because the countries are very close to each other. Yeah, okay, because just

18:46.720 --> 18:55.600
some very minor trades with British Western Africa in this time span. But most of the time

18:55.600 --> 19:00.320
this network in this view will look like this basically, which is also informative.

19:01.200 --> 19:12.480
And so I'm going to show a bit of code because I am early and I speak fast.

19:16.000 --> 19:25.200
For graphology, I showed some things. Once I have this graph, basically, sigma is

19:25.200 --> 19:33.760
instantiated. It's a class; you spawn it by giving it some settings, here with things

19:33.760 --> 19:41.200
that are documented, but it says that, yes, I might want to order nodes and edges by depth.

19:41.920 --> 19:50.240
I want to render the edge labels, etc., and then in my web components, when I have a new graph,

19:50.320 --> 19:59.520
I basically just give it to sigma and I tell it to refresh. And this is some advanced code to handle

19:59.520 --> 20:08.560
this: I want to highlight the network of neighbors when I hover a node, and to display the

20:08.560 --> 20:17.760
edge labels but it would be usable without it. Honestly, the part that was most annoying to me

20:18.720 --> 20:23.280
was the Cypher queries because I'm not used to writing them. But it worked well.

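NOTE
A minimal sketch of the neighbor highlighting on hover mentioned in the cues above, as dependency-free TypeScript.
It is similar in spirit to a reducer function that adjusts how each node is displayed, but it is not the speaker's code: the Edge and NodeDisplay shapes, the function names and the grey color are assumptions made for illustration.
```typescript
interface Edge { source: string; target: string; }
interface NodeDisplay { color: string; label: string | null; }
// Collect the hovered node and every node that shares an edge with it.
function neighborhood(hovered: string, edges: Edge[]): Record<string, true> {
  const keep: Record<string, true> = {};
  keep[hovered] = true;
  for (const { source, target } of edges) {
    if (source === hovered) keep[target] = true;
    if (target === hovered) keep[source] = true;
  }
  return keep;
}
// Reducer-style helper: grey out and unlabel nodes outside the hovered neighborhood.
// Passing null for keep means nothing is hovered, so the node is left untouched.
function reduceNode(id: string, display: NodeDisplay, keep: Record<string, true> | null): NodeDisplay {
  if (keep === null || keep[id]) return display;
  return { ...display, color: "#dddddd", label: null };
}
```
Keeping the stored graph data untouched and only changing what is displayed means the highlight can be switched on and off without recomputing the network.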
20:29.040 --> 20:39.360
So one of the good things about writing custom UIs is that,

20:39.440 --> 20:48.640
contrary to most solutions that are plug and play, like the browser or, I also think about, Neo4j

20:48.640 --> 20:53.760
Bloom for instance, where everything has to be done in the query, here we can cheat, because

20:54.400 --> 21:00.160
the data I will receive will be small enough to be displayed, or at least we hope so, because

21:00.720 --> 21:06.480
then if it's not small enough to be displayed no tool can actually work kind of and we can do

21:06.480 --> 21:10.800
we can use graphology to do things that are easier to do in graphology after the query.

21:11.760 --> 21:17.760
It kind of splits the difficulty. So in my case here, I didn't want to merge things in the

21:17.760 --> 21:23.120
Cypher query because it was too hard for me, and I just did it with graphology after, and it worked

21:23.120 --> 21:29.120
well, and it was easier to implement than if I had only one solution, which was the Cypher query.

21:29.200 --> 21:37.840
Scripting graphology is really I really love this tool and that's why sigma is built on it as well and

21:38.640 --> 21:49.360
there are very many different algorithms, and yeah, having this library just handling

21:49.360 --> 21:57.200
computation makes things so easy, I think; then sigma just does, yeah, rendering and

21:57.280 --> 22:07.280
interaction also when I when I display the the second view where I wanted to see all in the

22:07.280 --> 22:13.280
indirect trades between two sovereign entities, I actually couldn't find examples where I have

22:14.160 --> 22:20.400
three levels. I have never found any case where two entities both have a part

22:21.280 --> 22:29.360
that trades together, and I was curious, but it makes sense because the reporting entities

22:29.360 --> 22:37.360
are most of the time sovereign entities, so if I come back to this one, the reports I have

22:37.360 --> 22:45.040
from United Kingdom are from United Kingdom and not from British Western Africa so that explains

22:45.040 --> 22:49.040
why I didn't find it and I'm sad because I wrote the query for nothing.

22:50.480 --> 23:07.360
And thank you very much. If you have question we have to be the best thanks.

23:07.440 --> 23:28.240
Yeah, yeah, so the question was: can I explain what's the difference between sigma.js and

23:28.320 --> 23:38.160
Cytoscape. Cytoscape is really focused on drawing networks for schemas, I'd say, so they have lots

23:38.160 --> 23:49.840
of tools to actually draw lines in more schematic ways to get some huge diagrams, but if you use Cytoscape

23:49.920 --> 23:58.080
for huge networks and force-directed layouts... first of all, Cytoscape is all-in-one:

23:58.080 --> 24:05.920
it handles computation and rendering, to my knowledge, and it doesn't scale that much.

24:07.280 --> 24:14.480
It's a generic tool to draw network diagrams, I'd say, while sigma is a better tool to handle

24:14.480 --> 24:30.080
visual network analysis. I don't know if that's clear oh yeah sorry.

24:44.560 --> 25:03.200
Yeah, so the question was: is it possible with sigma.js to write applications where we

25:03.840 --> 25:10.720
modify the graph at runtime and have animations, etc.? Yes, it's completely possible. Here, for this,

25:11.280 --> 25:21.600
I wanted to keep the demo as low code as I could but I could actually go directly to the website

25:25.600 --> 25:34.800
you can run the layout algorithm live. This is an example

25:35.120 --> 25:44.880
of this kind of animation. And about altering the graph itself by adding data, I think we have

25:44.880 --> 25:56.240
the story somewhere, like... hmm, okay. I can move nodes, and here I wrote some code to say: when I click the

25:56.240 --> 26:04.160
stage, it pops a node and connects it to close nodes. I see that the Storybook is broken, but

26:04.160 --> 26:10.880
you have examples of this kind of interaction. Yes?

26:34.240 --> 26:41.520
So the question is how sigma.js behaves with graphs of thousands of nodes. It behaves very well, actually.

26:41.520 --> 26:47.440
The thing is, it uses WebGL, so if you don't alter the network, if you don't modify the data,

26:48.480 --> 26:55.200
zooming in, zooming out, panning and rotating the view, it's all done by the GPU,

26:55.840 --> 27:04.400
so if I come to this example we have an example for this so let's put 50,000 nodes and

27:05.840 --> 27:14.720
100,000 edges if I run the algorithm this is going to be very very slow

27:17.040 --> 27:25.120
but for zooming in zooming out no problem so if you have if your nodes positions are already

27:25.440 --> 27:36.080
handled, yeah, this works very well. It's WebGL, too. The labels are rendered in canvas, so

27:36.080 --> 27:42.320
this can slow things sometimes if you have too many of them, but there are ways to mitigate

27:42.320 --> 27:55.040
that. Yes? How do you compare sigma.js with Cytoscape? I think we already

27:55.040 --> 28:02.640
had the question, but no worries. Basically, Cytoscape, I think, has been really designed to

28:03.520 --> 28:14.720
handle rendering network diagrams with hierarchy, with more deterministic things, laid out most of

28:14.720 --> 28:21.760
the time in squares or rectangles and this kind of thing, and sigma doesn't handle network diagrams

28:21.760 --> 28:31.440
very well. It's made more for what we call visual network analysis, so it's more for:

28:31.440 --> 28:38.560
I have I have a network of things that are connected and I want to see patterns emerge by themselves

28:39.680 --> 28:48.480
so most of the time when you see sigma it's with this kind of visually appearing networks

28:49.280 --> 28:56.560
I think basically cytoscape is way more flexible and you can render things in way more

28:56.560 --> 29:03.440
different ways, while sigma is less flexible, but it scales better, I'd say.

