WEBVTT

00:00.000 --> 00:09.000
So, I'll skip this slide.

00:09.000 --> 00:16.000
So, most of you are probably familiar with the way MySQL and Postgres do replication.

00:16.000 --> 00:19.000
So, they have a single node. Everybody writes to that node.

00:19.000 --> 00:21.000
The transactions are committed.

00:21.000 --> 00:24.000
It's written to the binlog in MySQL.

00:24.000 --> 00:27.000
And in Postgres, there are like 500 ways to do it.

00:27.000 --> 00:29.000
The write-ahead log being one of them.

00:29.000 --> 00:32.000
But the concept is roughly the same.

00:32.000 --> 00:34.000
So, the transaction commits.

00:34.000 --> 00:37.000
So, you can think of it as a queue with a single producer.

00:37.000 --> 00:39.000
Multiple consumers. That's the abstraction.

00:39.000 --> 00:44.000
No big deal. Quite straightforward.

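The single-producer, multiple-consumer abstraction the speaker describes can be sketched in a few lines of Go. This is a hypothetical, minimal model (the type and method names are illustrative, not from any real replication code): one writer appends committed transactions to an ordered log, and each consumer reads the same log at its own offset, the way binlog replicas do.

```go
package main

import "fmt"

// ReplLog models a replication log: a single producer appends,
// every consumer sees every entry in the same order.
type ReplLog struct {
	entries []string
}

// Append is the single producer's write path.
func (l *ReplLog) Append(txn string) { l.entries = append(l.entries, txn) }

// ReadFrom returns everything a consumer at the given offset has not yet seen.
func (l *ReplLog) ReadFrom(offset int) []string { return l.entries[offset:] }

func main() {
	log := &ReplLog{}
	log.Append("txn-1")
	log.Append("txn-2")

	// Two consumers at different replication positions see the same order.
	fmt.Println(log.ReadFrom(0)) // [txn-1 txn-2]
	fmt.Println(log.ReadFrom(1)) // [txn-2]
}
```

Each consumer only needs to remember one integer (its position), which is why the speaker calls this model quite straightforward.
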
00:44.000 --> 00:46.000
Oh, sorry.

00:46.000 --> 00:53.000
TiDB is a database developed by PingCAP.

00:53.000 --> 00:58.000
So, it is a multi-writer, but not in the sense of Postgres and MySQL.

00:58.000 --> 01:01.000
It has disaggregated compute and storage.

01:01.000 --> 01:06.000
So, you can have n number of nodes, n number of compute nodes for SQL, for example.

01:06.000 --> 01:09.000
And n number of nodes for storage.

01:09.000 --> 01:13.000
It doesn't have the concept of a primary node.

01:13.000 --> 01:16.000
So, the data is automatically sharded.

01:16.000 --> 01:19.000
And the shards are spread across the storage nodes.

01:19.000 --> 01:23.000
And each of those shards, think of it as a page conceptually.

01:23.000 --> 01:26.000
It's called a region or a tablet in Spanner.

01:26.000 --> 01:28.000
It's inspired by Spanner.

01:28.000 --> 01:32.000
And each of the pages is a Raft group.

01:32.000 --> 01:35.000
So, the scale at which TiDB is usually used is about,

01:35.000 --> 01:39.000
let's say, 250 terabytes is quite common.

01:39.000 --> 01:45.000
And with the default page size of 256 megabytes, that's roughly about a million pages.

01:45.000 --> 01:48.000
So, that's a million Raft groups.

01:48.000 --> 01:53.000
Each page is replicated as three copies that are part of a Raft group.

01:53.000 --> 02:00.000
So, for TiDB, it's not as simple as Postgres or MySQL.

02:00.000 --> 02:02.000
So, your mental model has to change.

02:02.000 --> 02:06.000
And if you understand that part, the rest of whatever

02:06.000 --> 02:08.000
I have on the slides will make more sense.

02:08.000 --> 02:10.000
So, that's the challenge we are solving.

02:10.000 --> 02:15.000
And the other thing that I want to highlight that is very different from both of these,

02:15.000 --> 02:22.000
TiDB does online schema change, which means, unlike MySQL and Postgres,

02:22.000 --> 02:25.000
the replication doesn't block.

02:25.000 --> 02:31.000
But in MySQL and Postgres, once there's a DDL, you have to wait till the DDL is finished.

02:31.000 --> 02:33.000
And then the rest of the changes move.

02:33.000 --> 02:35.000
TiDB doesn't work like this.

02:35.000 --> 02:38.000
In TiDB, DDL is online.

02:38.000 --> 02:40.000
It's happening in the background.

02:40.000 --> 02:46.000
And so, whether the changes coming in the replication stream are changes to the same table or another table,

02:46.000 --> 02:49.000
the replication stream is still being propagated.

02:49.000 --> 02:51.000
So, that's how TiDB works.

02:51.000 --> 02:56.000
So, for TiDB, all this happens external to the storage nodes.

02:56.000 --> 03:02.000
So, the entire CDC is a distributed system in itself,

03:02.000 --> 03:05.000
an HA distributed system in itself.

03:05.000 --> 03:10.000
It's not just connecting one socket to another socket running somewhere and just streaming the data.

03:10.000 --> 03:11.000
It's not as simple as that.

03:11.000 --> 03:16.000
There is an entire HA system that has to store some of the state.

03:16.000 --> 03:27.000
So that there's back pressure, because the nodes that are producing the data also have their own Raft logs.

03:27.000 --> 03:30.000
And you don't want to block them because you can't read the data fast enough.

03:30.000 --> 03:32.000
So, they have to store some state.

03:32.000 --> 03:37.000
And if they crash, they have to resume from where they left off and not have to start from scratch.

03:37.000 --> 03:40.000
So, it's a far more complicated and complex problem.

03:40.000 --> 03:46.000
So, if you get that mental model correct, the rest of the slides will make more sense hopefully.

03:46.000 --> 03:49.000
So, in this diagram, you can see lots of data coming in.

03:49.000 --> 03:52.000
They're all writing to the Raft nodes, the TiKV nodes.

03:52.000 --> 03:56.000
Then those events are collected and they are sorted.

03:57.000 --> 04:06.000
The two main operations that are important parts of the stream: one is the sorting, because you have multiple writers writing.

04:06.000 --> 04:14.000
So, you need to order those events so that when you send them downstream, they are received in a totally ordered fashion.

04:14.000 --> 04:18.000
So, that's very important. That's a huge cost in this system.

04:18.000 --> 04:22.000
The other thing is, you have to handle the schema changes.

04:22.000 --> 04:28.000
So, I'll get to that in the rest of the slides.

04:28.000 --> 04:32.000
So, how does it plug into the system?

04:32.000 --> 04:35.000
It uses an observer pattern.

04:35.000 --> 04:42.000
So, there are two parts of the Raft protocol, the way TiKV uses it.

04:42.000 --> 04:45.000
So, you get the events coming into the Raft log.

04:45.000 --> 04:49.000
Once they're committed, then they're applied locally on the storage nodes.

04:49.000 --> 04:55.000
So, you put an observer in there that is observing these events and then it's pushing it to the CDC system.

04:55.000 --> 04:58.000
That's the mental picture you must have.

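That observer-at-apply idea can be sketched as follows. This is a hypothetical Go sketch for consistency with the rest of the talk's Go code (TiKV itself is written in Rust, and these names are illustrative, not TiKV's real API): the observer hooks the point where committed Raft entries are applied, so CDC is pushed events rather than polling the storage node.

```go
package main

import "fmt"

// ApplyObserver is notified for every committed entry applied locally.
type ApplyObserver interface {
	OnApply(key, value string, commitTS uint64)
}

// cdcObserver stands in for the hook that pushes events to the CDC system.
type cdcObserver struct{ out []string }

func (o *cdcObserver) OnApply(key, value string, commitTS uint64) {
	// In the real system this pushes the event into the CDC pipeline.
	o.out = append(o.out, fmt.Sprintf("%d:%s=%s", commitTS, key, value))
}

type Store struct {
	data      map[string]string
	observers []ApplyObserver
}

// applyCommitted runs after Raft commit: apply locally, then notify observers.
func (s *Store) applyCommitted(key, value string, commitTS uint64) {
	s.data[key] = value
	for _, o := range s.observers {
		o.OnApply(key, value, commitTS)
	}
}

func main() {
	obs := &cdcObserver{}
	s := &Store{data: map[string]string{}, observers: []ApplyObserver{obs}}
	s.applyCommitted("k1", "v1", 100)
	fmt.Println(obs.out) // [100:k1=v1]
}
```

The key design point is that only committed entries reach the observer, so the CDC stream never sees uncommitted writes.
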
04:58.000 --> 05:03.000
That's how it does it.

05:03.000 --> 05:09.000
So, I want to highlight this up front rather than do it later,

05:09.000 --> 05:15.000
because this is very important for explaining one of the bigger challenges that CDC solves.

05:15.000 --> 05:19.000
So, assuming you have an insert, then you have, let's say, an alter column,

05:19.000 --> 05:21.000
and then you have another insert.

05:21.000 --> 05:26.000
You have to set the barrier for when the DDL takes place.

05:26.000 --> 05:30.000
So, one of the things in distributed systems is, I mean even in a standalone system,

05:30.000 --> 05:33.000
You need some kind of monotonically increasing counter.

05:33.000 --> 05:37.000
If you have that counter and you can have a strict less than relation,

05:37.000 --> 05:42.000
or which in distributed systems would be happens before, you can solve most of these problems.

05:42.000 --> 05:46.000
And TiDB does that through what's called a TSO, or timestamp oracle.

05:46.000 --> 05:53.000
So, it has a component called PD, which generates this global, monotonically increasing timestamp.

05:53.000 --> 05:56.000
And that is unique and it's stamped onto every transaction.

05:56.000 --> 06:02.000
So, the second thing you need in any database or transactional or distributed system is,

06:02.000 --> 06:05.000
you need to know when that transaction started.

06:05.000 --> 06:09.000
And when it was committed, or when the changes were externalized.

06:10.000 --> 06:13.000
So, you want to propagate what was externalized.

06:13.000 --> 06:17.000
You don't want to send data across that hasn't been committed yet,

06:17.000 --> 06:21.000
because downstream could be a system that is not TiDB.

06:21.000 --> 06:23.000
TiDB can probably handle it.

06:23.000 --> 06:27.000
But let's say we are sending data to, I don't know,

06:27.000 --> 06:30.000
somebody writes a converter for MySQL or one of those things.

06:30.000 --> 06:35.000
And that may only work on ordered, committed transactions.

06:36.000 --> 06:39.000
So, once you're armed with these three things,

06:39.000 --> 06:42.000
the start, the commit, and like a,

06:42.000 --> 06:45.000
some kind of counter, this timestamp or any counter,

06:45.000 --> 06:48.000
which is unique in your cluster.

06:48.000 --> 06:51.000
Conceptually, it becomes very easy.

06:51.000 --> 06:56.000
You read the events, you do a happens-before, you do a sort, and Bob's your uncle.

06:56.000 --> 06:57.000
It's quite straightforward.

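The "happens-before plus sort" step described above can be sketched like this. A minimal, illustrative Go sketch (field and function names are assumptions, not real TiCDC code): once every event carries the TSO timestamps, a total order falls out of sorting by commit timestamp.

```go
package main

import (
	"fmt"
	"sort"
)

// Event carries the TSO timestamps stamped on every transaction.
type Event struct {
	StartTS  uint64
	CommitTS uint64
	Key      string
}

// sortByCommit establishes the happens-before order: an event whose
// commit timestamp is strictly less than another's happened before it.
func sortByCommit(events []Event) {
	sort.Slice(events, func(i, j int) bool {
		return events[i].CommitTS < events[j].CommitTS
	})
}

func main() {
	events := []Event{
		{StartTS: 5, CommitTS: 9, Key: "b"},
		{StartTS: 1, CommitTS: 3, Key: "a"},
	}
	sortByCommit(events)
	fmt.Println(events[0].Key, events[1].Key) // a b
}
```

Conceptually that one sort is the whole ordering problem; the hard part, as the talk says, is doing it at scale with HA.
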
06:57.000 --> 06:59.000
So, conceptually it's not so difficult.

06:59.000 --> 07:02.000
The engineering of this is quite difficult.

07:02.000 --> 07:06.000
It's the scale, it's HA; these are the difficult problems.

07:06.000 --> 07:09.000
And handling things like back pressure.

07:09.000 --> 07:12.000
Because as the Raft log is being created,

07:12.000 --> 07:14.000
there's a lot of space and other pressure.

07:14.000 --> 07:17.000
So, you have almost like a garbage collector that has to

07:17.000 --> 07:19.000
truncate those Raft logs.

07:19.000 --> 07:21.000
Otherwise, they just keep growing.

07:21.000 --> 07:25.000
But you can't truncate it unless the changes have been pushed across to your CDC

07:25.000 --> 07:27.000
and across to wherever they have to go.

07:27.000 --> 07:29.000
So, that's the challenge here.

07:29.000 --> 07:34.000
Once you have the timestamp, you know that

07:34.000 --> 07:39.000
you cannot push the changes that were impacted beyond that timestamp.

07:39.000 --> 07:43.000
We have a log service as part of this.

07:43.000 --> 07:45.000
It sort of squirrels it away on the side.

07:45.000 --> 07:49.000
It waits for that DDL barrier timestamp

07:49.000 --> 07:53.000
to be reached by the resolved timestamp.

07:53.000 --> 07:56.000
And until then, it's not pushing the entries after that.

07:56.000 --> 07:59.000
But anything else that is not touched by this is being replicated,

07:59.000 --> 08:04.000
which is not the case in MySQL and Postgres.

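The barrier behaviour just described can be sketched in a few lines. This is an illustrative Go sketch, not TiCDC's actual code (the `Feed` type and its fields are assumptions): DML past the DDL's barrier timestamp is squirreled away until the resolved timestamp reaches the barrier, while everything before the barrier keeps flowing.

```go
package main

import "fmt"

// Feed holds back events past a DDL barrier until the barrier is resolved.
type Feed struct {
	barrierTS  uint64
	resolvedTS uint64
	held       []uint64 // commitTS of events squirreled away on the side
	pushed     []uint64 // commitTS of events sent downstream
}

// Emit pushes an event downstream unless it falls beyond an unresolved barrier.
func (f *Feed) Emit(commitTS uint64) {
	if commitTS > f.barrierTS && f.resolvedTS < f.barrierTS {
		f.held = append(f.held, commitTS) // wait for the DDL barrier
		return
	}
	f.pushed = append(f.pushed, commitTS)
}

// Advance moves the resolved timestamp; once it reaches the barrier,
// the DDL has taken effect and the held events are released in order.
func (f *Feed) Advance(resolvedTS uint64) {
	f.resolvedTS = resolvedTS
	if f.resolvedTS >= f.barrierTS {
		f.pushed = append(f.pushed, f.held...)
		f.held = nil
	}
}

func main() {
	f := &Feed{barrierTS: 10}
	f.Emit(5)  // before the barrier: flows through
	f.Emit(12) // after the barrier: held on the side
	f.Advance(10)
	fmt.Println(f.pushed) // [5 12]
}
```

Note how the event before the barrier is never blocked, which is the property the speaker contrasts with MySQL and Postgres.
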
08:04.000 --> 08:06.000
So, that's why I wanted to highlight this.

08:06.000 --> 08:11.000
This is a very important part of what it achieves and what it does.

08:11.000 --> 08:15.000
So, we also learnt our lesson.

08:15.000 --> 08:18.000
The current architecture just didn't happen,

08:18.000 --> 08:20.000
like some kind of immaculate conception.

08:20.000 --> 08:22.000
It was a lot of pain to get to this.

08:22.000 --> 08:25.000
So, you have to start somewhere.

08:25.000 --> 08:27.000
Nobody was an expert at this.

08:27.000 --> 08:29.000
This was a new challenge for everybody.

08:29.000 --> 08:33.000
So, the first solution was, okay, we have a node.

08:33.000 --> 08:36.000
It reads the data, sorts it in memory.

08:36.000 --> 08:40.000
And it had a polling mechanism.

08:40.000 --> 08:43.000
Most of our frontend software is in Go.

08:43.000 --> 08:44.000
So, this is in Go.

08:44.000 --> 08:47.000
So, they all use Go channels and whatever else Go has.

08:47.000 --> 08:48.000
Sorted in memory.

08:48.000 --> 08:50.000
Then they had etcd in the background for HA,

08:50.000 --> 08:52.000
where the metadata was stored.

08:52.000 --> 08:55.000
And these worker nodes would poll data.

08:55.000 --> 08:57.000
Do whatever it takes.

08:57.000 --> 08:59.000
But the thing is growth.

08:59.000 --> 09:02.000
And the kind of data that the customers are storing.

09:02.000 --> 09:05.000
We would get OOM errors and all kinds of other problems.

09:05.000 --> 09:06.000
It didn't scale.

09:06.000 --> 09:11.000
Second step was, okay, you can have a hybrid spillover to disk.

09:11.000 --> 09:14.000
Then suddenly people had 6 million tables.

09:14.000 --> 09:17.000
We would run out of file handles.

09:18.000 --> 09:20.000
Third, final solution.

09:20.000 --> 09:23.000
Before CockroachDB changed the license,

09:23.000 --> 09:24.000
we made a fork.

09:24.000 --> 09:25.000
We write into an LSM.

09:25.000 --> 09:26.000
It's written in Go.

09:26.000 --> 09:27.000
So, it works quite well.

09:27.000 --> 09:28.000
And you can index it.

09:28.000 --> 09:29.000
So, the current model is:

09:29.000 --> 09:34.000
It uses Pebble to store the data.

09:34.000 --> 09:37.000
So, if you have a lot of pages.

09:37.000 --> 09:40.000
As I mentioned, so let's assume we have a million regions.

09:40.000 --> 09:42.000
Across three nodes.

09:42.000 --> 09:45.000
You will probably need more nodes, but let's say three nodes.

09:45.000 --> 09:51.000
So, if you are attaching to those, that roughly is about 330k pages per node.

09:51.000 --> 09:55.000
Now, imagine you have to put an observer on all of these.

09:55.000 --> 09:58.000
So, you have to have 330,000 observers.

09:58.000 --> 10:01.000
Each with an independent stream trying to stream it.

10:01.000 --> 10:02.000
It won't scale.

10:02.000 --> 10:05.000
So, this is one of the challenges.

10:05.000 --> 10:08.000
And then you don't want to poll either.

10:08.000 --> 10:11.000
It's just, it just doesn't work.

10:11.000 --> 10:13.000
But conceptually it will work.

10:13.000 --> 10:14.000
It doesn't scale.

10:14.000 --> 10:16.000
That's the hard engineering problem.

10:16.000 --> 10:17.000
Memory pressure.

10:17.000 --> 10:19.000
Huge problem.

10:19.000 --> 10:20.000
CPU overhead.

10:20.000 --> 10:23.000
So, the CPU overhead we'll also get into with the pipeline later.

10:23.000 --> 10:26.000
So, these are real bugs that we had.

10:26.000 --> 10:31.000
So, these are the lessons.

10:31.000 --> 10:33.000
So, we sat down and thought.

10:33.000 --> 10:35.000
You need decentralization.

10:35.000 --> 10:39.000
So, even though today all this microservices stuff is not fashionable.

10:39.000 --> 10:41.000
For this, you need something like that.

10:41.000 --> 10:44.000
So, that you can scale independent components.

10:44.000 --> 10:53.000
Especially the part that does the conversion from the internal format to the SQL or whatever

10:53.000 --> 10:56.000
raw format that is required by whatever your downstream is.

10:56.000 --> 11:00.000
That is a very heavy compute operation.

11:00.000 --> 11:07.000
And you want to be able to spread it as much as you can into a service, so that you can have multiple services running

11:08.000 --> 11:11.000
which read in parallel from some store.

11:11.000 --> 11:15.000
You want it to be event driven without polling.

11:15.000 --> 11:21.000
Polling, even in a general programming sense, works where you can hammer the system

11:21.000 --> 11:22.000
and always keep it busy.

11:22.000 --> 11:24.000
So with that kind of polling, there is no busy-wait.

11:24.000 --> 11:26.000
But it's not always the case.

11:26.000 --> 11:28.000
So, polling puts a threshold.

11:28.000 --> 11:31.000
You can't go faster than that.

11:31.000 --> 11:34.000
If the events are not coming fast enough.

11:35.000 --> 11:38.000
And so, a pub/sub operation can serve this better.

11:38.000 --> 11:42.000
So, we divided this whole thing into four abstract services.

11:42.000 --> 11:48.000
There is an upstream adapter that talks to TiKV.

11:48.000 --> 11:51.000
We have a log service which is stateful.

11:51.000 --> 11:53.000
A downstream adapter which does the conversion.

11:53.000 --> 11:56.000
And the coordinator is what coordinates the cluster.

11:56.000 --> 12:01.000
So, they are like four main services.

12:01.000 --> 12:03.000
So, if you go a little bit deeper.

12:03.000 --> 12:06.000
As I mentioned, you need the timestamp.

12:06.000 --> 12:11.000
You need the watermark, which is up to where the progress is on the Raft log.

12:11.000 --> 12:13.000
You need global aggregation.

12:13.000 --> 12:15.000
You need to know across your entire cluster.

12:15.000 --> 12:20.000
what the minimum is, so that you don't delete anything which is still needed.

12:20.000 --> 12:23.000
This is quite similar to how things like Aurora also work.

12:23.000 --> 12:25.000
With InnoDB's undo log.

12:25.000 --> 12:26.000
You can't.

12:26.000 --> 12:30.000
You have to look at where the read view is across your entire cluster.

12:30.000 --> 12:33.000
You can't just look at one node.
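That global aggregation is just a cluster-wide minimum. A minimal, illustrative Go sketch (the function name and map shape are assumptions): the watermark below which nothing may be deleted is the minimum resolved position across all nodes, just as the speaker describes for the Aurora-style read view.

```go
package main

import "fmt"

// globalWatermark returns the minimum resolved timestamp across all nodes.
// GC (Raft log truncation) must never go past this value, because the
// slowest consumer still needs everything above it.
func globalWatermark(perNode map[string]uint64) uint64 {
	first := true
	var min uint64
	for _, ts := range perNode {
		if first || ts < min {
			min, first = ts, false
		}
	}
	return min
}

func main() {
	perNode := map[string]uint64{"node-1": 105, "node-2": 98, "node-3": 120}
	fmt.Println(globalWatermark(perNode)) // 98
}
```

The point the talk makes is exactly this: you cannot look at one node's progress in isolation; the safe point is a property of the whole cluster.
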

12:33.000 --> 12:36.000
So, you calculate that using events.

12:36.000 --> 12:38.000
You sort them.

12:38.000 --> 12:44.000
And then transaction reconstruction is basically you map all the events of the transaction.

12:44.000 --> 12:47.000
All the changes to the transaction.

12:47.000 --> 12:48.000
You need.

12:48.000 --> 12:50.000
So, the mounter there is misnamed.

12:50.000 --> 12:52.000
But that's what it is in the source code on GitHub.

12:52.000 --> 12:55.000
The mounter is like a transformer.

12:55.000 --> 12:57.000
And this is the compute heavy part.

12:57.000 --> 13:02.000
So, it takes raw data because you want to sort on your opaque data because it's more compact.

13:02.000 --> 13:04.000
And it's a blob.

13:04.000 --> 13:05.000
It's easier to manage.

13:05.000 --> 13:09.000
Once it's converted into its SQL type, it becomes bigger.

13:09.000 --> 13:12.000
And it's more difficult to sort.

13:12.000 --> 13:18.000
So, the mounter, the transformer, is the part that you can have multiples of.

13:18.000 --> 13:23.000
And you can do lots in parallel.

13:23.000 --> 13:25.000
So, this is what it roughly looks like.

13:25.000 --> 13:30.000
You have a log service where all the changefeeds are coming in.

13:30.000 --> 13:32.000
It writes into Pebble.

13:32.000 --> 13:35.000
You don't do any polling.

13:35.000 --> 13:38.000
And the number of tables is no longer a problem.

13:38.000 --> 13:41.000
We don't store them based on table.

13:41.000 --> 13:48.000
We just store the events based on the changes to the pages and the transactions.

13:48.000 --> 13:51.000
So, some of the other optimizations we have to make,

13:51.000 --> 13:55.000
were that, if you look at number two,

13:55.000 --> 14:01.000
rather than have the naive approach of multiple connections for each Raft group,

14:01.000 --> 14:06.000
you just have one connection and then you multiplex it.

14:06.000 --> 14:11.000
And you do a connection per node rather than a connection per region, which will never scale.

14:11.000 --> 14:16.000
But you have to start somewhere and so that's how it started.

14:16.000 --> 14:21.000
So, you also need to order some of the events.

14:21.000 --> 14:24.000
For that internally, as an implementation detail,

14:24.000 --> 14:26.000
it uses, I think, a Go B-tree.

14:26.000 --> 14:28.000
Some B-tree implementation in Go.

14:28.000 --> 14:32.000
Also, you can put bounds on the memory that you use, because now you have storage.

14:32.000 --> 14:38.000
And you have the ability to put back pressure and slow the ingestion of your service.

14:38.000 --> 14:44.000
But you don't want any of the services to die.

14:44.000 --> 14:48.000
You're much better off telling the service to slow down.

14:48.000 --> 14:52.000
And then the user just needs to add more resources so that they can fix it.

14:52.000 --> 14:56.000
That's the basic idea of back pressure.

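That back-pressure idea maps naturally onto Go's bounded channels, which the talk mentions the frontend already uses. A minimal, illustrative sketch (not the real TiCDC code): when the consumer is slow, the producer blocks on the full buffer and slows down, instead of growing memory without bound and dying.

```go
package main

import "fmt"

func main() {
	// The buffer size is the memory budget; a full buffer is back pressure.
	events := make(chan int, 2)

	done := make(chan bool)
	go func() {
		// The (possibly slow) consumer drains the bounded buffer.
		for e := range events {
			fmt.Println("consumed", e)
		}
		done <- true
	}()

	for i := 0; i < 5; i++ {
		// This send blocks whenever the buffer is full: the producer
		// is slowed down rather than killed by an OOM.
		events <- i
	}
	close(events)
	<-done
}
```

The design choice here matches the talk: it is always better to slow a healthy service down than to let it crash and recover.
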
14:56.000 --> 15:00.000
I'll skip that part.

15:00.000 --> 15:05.000
So, this is the part about the transformer.

15:05.000 --> 15:11.000
So, the rough idea that I gave is the storage node.

15:11.000 --> 15:14.000
You have the puller, which is also misnamed.

15:14.000 --> 15:15.000
It's actually pushed.

15:15.000 --> 15:18.000
Then you sort the data, then transform the data.

15:18.000 --> 15:22.000
And the sink is where it receives the data, and it does some more magic.

15:22.000 --> 15:24.000
And then it sends it downstream.

15:24.000 --> 15:26.000
So, what does it do?

15:26.000 --> 15:30.000
It decodes the key value pairs, which is what the storage knows about.

15:30.000 --> 15:32.000
Storage doesn't know about SQL.

15:32.000 --> 15:36.000
It also needs to know the schema.

15:36.000 --> 15:40.000
Because schema change is online, you know,

15:40.000 --> 15:44.000
TiKV can have multiple schemas.

15:44.000 --> 15:49.000
And so, you shouldn't have to go back to the server to check the schema.

15:49.000 --> 15:50.000
That's an additional cost.

15:50.000 --> 15:54.000
You want a local version or cache of the schema, to which

15:54.000 --> 15:56.000
you can apply an update.

15:56.000 --> 16:00.000
And when you see the DDL, you do the update and then you use the latest schema.

16:00.000 --> 16:05.000
Because some of your data will be working on old schema and some will be on the new schema.

16:05.000 --> 16:09.000
So, you have to maintain a local schema.

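The local schema cache described above can be sketched as a small versioned map. This is an illustrative Go sketch under assumed names (not TiCDC's real schema storage): schema versions are kept keyed by the DDL's timestamp, so a row committed under an older schema version can still be decoded without a round trip to the server.

```go
package main

import "fmt"

// schemaVersion is one snapshot of a table's columns, effective from ddlTS.
type schemaVersion struct {
	ddlTS   uint64
	columns []string
}

// SchemaCache keeps every version seen, because during an online DDL
// some rows are encoded under the old schema and some under the new one.
type SchemaCache struct {
	versions []schemaVersion // appended in DDL order, so sorted by ddlTS
}

// ApplyDDL records the new version when its DDL event appears in the stream.
func (c *SchemaCache) ApplyDDL(ddlTS uint64, columns []string) {
	c.versions = append(c.versions, schemaVersion{ddlTS, columns})
}

// At returns the schema in effect for a row committed at commitTS:
// the latest version whose DDL timestamp is <= commitTS.
func (c *SchemaCache) At(commitTS uint64) []string {
	var cols []string
	for _, v := range c.versions {
		if v.ddlTS <= commitTS {
			cols = v.columns
		}
	}
	return cols
}

func main() {
	c := &SchemaCache{}
	c.ApplyDDL(1, []string{"id"})
	c.ApplyDDL(10, []string{"id", "name"}) // the online ALTER
	fmt.Println(c.At(5))  // [id]
	fmt.Println(c.At(12)) // [id name]
}
```

Keeping old versions around is the whole trick: the decoder picks the schema by the row's commit timestamp, never by wall-clock "latest".
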
16:09.000 --> 16:16.000
So, by decoupling the whole encoding logic,

16:16.000 --> 16:18.000
you can independently scale.

16:18.000 --> 16:22.000
And that's where the microservices thing comes about.

16:22.000 --> 16:24.000
So, how do you start all this?

16:24.000 --> 16:27.000
Because there's etcd, when you start the system up,

16:27.000 --> 16:29.000
the instances register.

16:29.000 --> 16:31.000
And these are all the different services.

16:31.000 --> 16:33.000
They register with etcd.

16:33.000 --> 16:37.000
They elect a coordinator that does more like housekeeping things.

16:37.000 --> 16:39.000
It doesn't do anything much more than that.

16:39.000 --> 16:42.000
And then you schedule the changefeeds that are going to talk to TiKV

16:42.000 --> 16:46.000
and just get the whole system running.

16:46.000 --> 16:48.000
So, because it's an HA system,

16:48.000 --> 16:50.000
it also has to do all these other tasks.

16:50.000 --> 16:52.000
Like dynamic registration.

16:52.000 --> 16:54.000
It has to do automatic recovery.

16:54.000 --> 16:56.000
If some service fails.

16:56.000 --> 16:59.000
So, that's the other aspect of this.

16:59.000 --> 17:02.000
So, that's, it's a complete system.

17:02.000 --> 17:04.000
It's not just like connecting sockets.

17:04.000 --> 17:06.000
That's the point I wanted to make.

17:06.000 --> 17:08.000
So, when there are failures,

17:08.000 --> 17:11.000
you know, when you're running at the scale at which TiKV runs,

17:11.000 --> 17:13.000
there are nodes going up and down.

17:13.000 --> 17:14.000
There's always some kind of problem.

17:14.000 --> 17:16.000
But you don't want the service to be affected.

17:16.000 --> 17:19.000
So, it handles all things like split brain

17:19.000 --> 17:24.000
and all the other things that come with any distributed systems.

17:24.000 --> 17:25.000
Also, it can do.

17:25.000 --> 17:27.000
So, in version one,

17:27.000 --> 17:29.000
as I mentioned, the crash recovery was hard.

17:30.000 --> 17:32.000
There was no intermediate state.

17:32.000 --> 17:34.000
The crash recovery had to go all the way back

17:34.000 --> 17:36.000
without any intermediate state

17:36.000 --> 17:39.000
and redo that RPC over TiKV all over again.

17:39.000 --> 17:41.000
So, now, because it has local state,

17:41.000 --> 17:42.000
it elects a leader.

17:42.000 --> 17:44.000
It knows I need to get the latest data from

17:44.000 --> 17:46.000
such-and-such log service that's running

17:46.000 --> 17:49.000
and then it rebuilds from where it left off

17:49.000 --> 17:52.000
and then carries on from there.

17:54.000 --> 17:55.000
These are more or less.

17:55.000 --> 17:56.000
So, upgrades and so on.

17:56.000 --> 17:58.000
It can handle partition tables quite easily.

17:59.000 --> 18:00.000
Let's skip this.

18:00.000 --> 18:02.000
Otherwise, I'll run over time.

18:04.000 --> 18:06.000
Yeah, this is also good enough.

18:06.000 --> 18:09.000
So, I've mentioned all this,

18:09.000 --> 18:11.000
but I just want to go over this again.

18:11.000 --> 18:14.000
So, the schema is also stored in the state service,

18:14.000 --> 18:15.000
which is the log service.

18:15.000 --> 18:17.000
The events are also there.

18:17.000 --> 18:19.000
The table, so it,

18:19.000 --> 18:21.000
table stream you can ignore.

18:21.000 --> 18:25.000
So, any local event in that cluster is also stored here.

18:26.000 --> 18:28.000
So, there's no metadata in etcd.

18:28.000 --> 18:31.000
This is the state of the system.

18:33.000 --> 18:36.000
So, the downstream adapter is for the events

18:36.000 --> 18:38.000
that are coming from the log service.

18:38.000 --> 18:41.000
It's more or less like a reader for the log service.

18:41.000 --> 18:45.000
Then it talks to the downstream.

18:48.000 --> 18:50.000
What the downstream adapter, as mentioned, also does

18:50.000 --> 18:53.000
is handle internal CDC node changes.

18:54.000 --> 18:56.000
So, what can it do?

18:56.000 --> 19:00.000
You can do massive changefeeds and it scales very easily.

19:00.000 --> 19:03.000
It's designed to scale by adding more services.

19:03.000 --> 19:08.000
Large tables can be split for more efficient storage in the service.

19:08.000 --> 19:12.000
Transaction integrity is preserved because the timestamp

19:12.000 --> 19:14.000
is exactly the same one that comes from the cluster.

19:14.000 --> 19:17.000
So, there is no ambiguity anywhere in the pipeline where

19:17.000 --> 19:22.000
You don't know how to order the events or order the transactions.

19:24.000 --> 19:29.000
So, it also supports Go plugins, other hooks,

19:29.000 --> 19:33.000
and it has an extensible architecture and we have connectors for Kafka

19:33.000 --> 19:38.000
and all sorts of things and whatever else is the fashion of the day.

19:38.000 --> 19:41.000
So, how much time for QA?

19:41.000 --> 19:43.000
Excellent.

19:43.000 --> 19:44.000
Okay.

19:44.000 --> 19:47.000
So, anyway, there's the GitHub URL.

19:47.000 --> 19:50.000
So, anybody wants to look at the code,

19:50.000 --> 19:53.000
contribute, learn, go for it.

19:53.000 --> 19:55.000
So, it scales linearly.

19:55.000 --> 19:58.000
It has very high throughput.

19:58.000 --> 20:01.000
It's a clear architecture relative to what it was before.

20:01.000 --> 20:06.000
And it uses the elasticity of the cloud to scale very easily.

20:06.000 --> 20:08.000
It's quite easy.

20:08.000 --> 20:09.000
So, that's it.

20:09.000 --> 20:10.000
Thank you.

20:10.000 --> 20:11.000
Any questions?

20:11.000 --> 20:18.000
Anyone?

20:18.000 --> 20:26.000
Anyone?

20:26.000 --> 20:31.000
Yeah?

20:31.000 --> 20:35.000
Yeah, it's got all the checks on everything.

20:35.000 --> 20:37.000
Yes, that means.

20:37.000 --> 20:42.000
This is used by people who run very large.

20:42.000 --> 20:45.000
Oh, how do you check the quality of the data?

20:45.000 --> 20:50.000
So, it has all the other details like checks on whatever else is required.

20:50.000 --> 20:52.000
Yes.

20:52.000 --> 20:53.000
Okay.

20:53.000 --> 20:54.000
Thank you.

