WEBVTT

00:00.000 --> 00:04.600
How are you?

00:06.600 --> 00:10.620
I think the bird gave this, is a thing serious?

00:10.940 --> 00:12.060
Why are we saying that?

00:12.060 --> 00:13.060
I know.

00:13.060 --> 00:17.500
But maybe the thing is u who's are knowing one thing

00:20.500 --> 00:21.300
why?

00:21.780 --> 00:22.520
I know you know twerry

00:22.520 --> 00:23.340
meaning anything

00:23.360 --> 00:23.860
You know how a girl tastes?

00:23.900 --> 00:25.160
Is this a human?

00:26.300 --> 00:27.480
I know I know

00:27.480 --> 00:28.480
I'm sure.

00:28.480 --> 00:29.480
Hi.

00:29.480 --> 00:30.480
Hi.

00:30.480 --> 00:31.480
Yeah.

00:31.480 --> 00:32.480
No?

00:32.480 --> 00:33.480
Where are you going?

00:33.480 --> 00:34.480
You don't?

00:34.480 --> 00:35.480
Yeah.

00:35.480 --> 00:36.480
That's me.

00:36.480 --> 00:37.480
It's very, very delayed.

00:37.480 --> 00:40.480
I can hear people talking about burgy beds.

00:40.480 --> 00:42.480
The future of Iceberg.

00:42.480 --> 00:43.480
He didn't know that was coming.

00:43.480 --> 00:45.480
I thought he was whispering that.

00:45.480 --> 00:46.480
So nobody could hear me.

00:46.480 --> 00:47.480
But my mic is right here.

00:47.480 --> 00:48.480
Yeah.

00:48.480 --> 00:49.480
Yeah.

00:49.480 --> 00:51.480
Oh, it is 10.

00:51.480 --> 00:52.480
30.

00:52.480 --> 00:53.480
No.

00:53.480 --> 00:56.480
That's such a way to feed.

00:56.480 --> 00:58.480
That's really annoying.

00:58.480 --> 00:59.480
No.

00:59.480 --> 01:00.480
I'm welcome to these things.

01:00.480 --> 01:01.480
I have an experience.

01:01.480 --> 01:02.480
We've got a great night today.

01:02.480 --> 01:04.480
So we're a little bit behind.

01:04.480 --> 01:06.480
If you talk first.

01:06.480 --> 01:07.480
Speakers.

01:07.480 --> 01:08.480
Which is what is the spec.

01:08.480 --> 01:10.480
So, Danica and Russell.

01:10.480 --> 01:11.480
Sorry, sir.

01:11.480 --> 01:12.480
Quick.

01:12.480 --> 01:14.480
That's all we need.

01:14.480 --> 01:16.480
We can introduce ourselves.

01:16.480 --> 01:18.480
In fact, we plan to do just that.

01:18.480 --> 01:20.480
So I guess we can take it away.

01:20.480 --> 01:21.480
Who are these people?

01:21.480 --> 01:22.480
Presdest.

01:22.480 --> 01:25.000
You probably shouldn't actually, but maybe by the end you will because we're going

01:25.000 --> 01:27.480
to have too much fun here.

01:27.480 --> 01:29.240
So to kick it off.

01:29.240 --> 01:30.240
I am Danica

01:30.240 --> 01:31.240
No, Louder.

01:31.260 --> 01:33.460
I'm going to leave some ears.

01:33.460 --> 01:34.460
Okay.

01:34.460 --> 01:35.460
My name is Danica Fine.

01:35.460 --> 01:39.640
I'm a developer advocate at Snowflake, focusing specifically on open

01:39.640 --> 01:40.640
source.

01:40.640 --> 01:43.520
I absolutely fry blue police bullets.

01:43.520 --> 01:47.920
So in the past, I kind of finished it.

01:47.920 --> 01:50.940
We can do some practices for the public, and excuse me.

01:50.940 --> 01:52.500
with this gentleman here.

01:52.500 --> 01:55.260
Hi everyone, so I'm Russell Spitzer.

01:55.260 --> 01:57.940
I am one of the Apache iceberg PMC members.

01:57.940 --> 01:59.740
I'm also a member of the Apache Polaris

01:59.740 --> 02:01.540
incubating PPMC.

02:01.540 --> 02:03.260
You might know me from a lot of my previous work

02:03.260 --> 02:04.540
and a lot of open source things.

02:04.540 --> 02:06.780
I've done a lot of stuff with Cassandra and Spark

02:06.780 --> 02:07.900
and all that.

02:07.900 --> 02:10.180
And I also currently work at Snowflake.

02:10.180 --> 02:12.620
And I'm excited to talk to you all about open source software

02:12.620 --> 02:14.420
today.

02:14.420 --> 02:17.060
So as you may know, we're going to be talking

02:17.060 --> 02:19.180
about the latest features that are being added

02:19.180 --> 02:22.100
to the V3 spec of Apache iceberg.

02:22.100 --> 02:25.140
But before we do that, we want to kind of set the stage,

02:25.140 --> 02:26.580
make sure that we're all on the same page here

02:26.580 --> 02:28.300
and what we're actually talking about.

02:28.300 --> 02:30.580
So Russell, what even is Apache iceberg?

02:30.580 --> 02:32.780
Yeah, well, so a lot of you might think

02:32.780 --> 02:35.060
that Apache iceberg is just this library

02:35.060 --> 02:36.700
and a bunch of code associated with it.

02:36.700 --> 02:39.500
But what I'd like to say first is that primarily

02:39.500 --> 02:42.700
Apache iceberg is a specification.

02:42.700 --> 02:46.140
It's a set of rules about how various libraries

02:46.140 --> 02:48.980
or engines would interact with the set of data.

02:48.980 --> 02:51.020
This is a really important distinction

02:51.020 --> 02:52.820
because this is one of the key factors

02:52.820 --> 02:55.700
that makes Apache iceberg such an interoperable system.

02:55.700 --> 02:58.060
We basically said, these are the rules first

02:58.060 --> 02:59.900
and then we have the code next.

02:59.900 --> 03:01.740
So for example, we might have a bunch of rules

03:01.740 --> 03:03.540
that are like when we add a new data file,

03:03.540 --> 03:06.780
what you do is you add it to something called a manifest.

03:06.780 --> 03:08.780
And then that's the important part.

03:08.780 --> 03:10.180
Then all the engines go out

03:10.180 --> 03:11.700
and they do their own implementation

03:11.700 --> 03:14.780
of how exactly they want that to happen.

03:14.780 --> 03:18.020
But because we have the rules all set out the exact same way,

03:18.020 --> 03:20.620
Trino, Spark, Python, whatever language

03:20.620 --> 03:22.980
you're using to interact with your iceberg table

03:22.980 --> 03:25.940
knows exactly what they have to do to be compatible

03:25.940 --> 03:30.820
with every other engine that's working with iceberg tables.

03:30.820 --> 03:33.100
So there's more than one specification

03:33.100 --> 03:34.460
within the Apache Iceberg project.

03:34.460 --> 03:36.780
And so let's quickly go over a couple of examples.

03:36.780 --> 03:38.620
So the one that we're going to be talking about today

03:38.620 --> 03:41.460
is basically the table format specification.

03:41.460 --> 03:44.180
This is the key specification, which is literally,

03:44.180 --> 03:46.220
this is how you're going to lay out files.

03:46.220 --> 03:49.140
This is how you transition from one state to the next.

03:49.140 --> 03:51.020
But we have a whole bunch of other great specs

03:51.020 --> 03:52.340
that we've written up.

03:52.340 --> 03:55.580
There's one on views, one for interacting as a catalog system.

03:55.580 --> 03:57.260
We're not going to talk too much about the catalog

03:57.260 --> 03:59.500
interactions today.

03:59.500 --> 04:02.540
And then one called Puffin, which is about ancillary statistics

04:02.540 --> 04:04.900
and other blobs we might want to have associated

04:04.900 --> 04:06.420
with our iceberg tables.

04:06.420 --> 04:07.980
And then of course, along with this,

04:07.980 --> 04:11.780
the Apache iceberg project has a bunch of Apache projects

04:11.780 --> 04:14.540
or subprojects for different implementations

04:14.540 --> 04:16.340
of these specifications.

04:16.340 --> 04:18.660
The one that you'll find at the apache/iceberg

04:18.660 --> 04:21.260
repository is our Java implementation.

04:21.260 --> 04:24.300
But we also are strongly in support of implementations

04:24.300 --> 04:26.260
like PyIceberg and iceberg-rust.

04:26.260 --> 04:28.260
And then of course, we have organizations that are not

04:28.260 --> 04:30.060
part of the Apache Foundation that are also

04:30.060 --> 04:32.500
doing implementations of the iceberg spec.

04:32.500 --> 04:34.620
And Trino is one of the examples of a product

04:34.620 --> 04:35.500
that's doing that.

04:35.500 --> 04:36.340
Right.

04:36.340 --> 04:38.500
So that's kind of the high level overview

04:38.500 --> 04:42.300
of what iceberg is really by definition.

04:42.300 --> 04:43.780
But how many of you are currently

04:43.780 --> 04:46.620
working with iceberg right now?

04:46.620 --> 04:46.980
All right.

04:46.980 --> 04:49.900
So this next bit is for the rest of you.

04:49.900 --> 04:52.420
So that we all know what we're talking about through the rest

04:52.420 --> 04:53.780
of the talk.

04:53.780 --> 04:56.260
So in order to actually look a little more forward

04:56.260 --> 04:58.740
at what the new features are, we need to set the stage

04:58.740 --> 05:00.140
with some actual definitions.

05:00.140 --> 05:03.180
And by that, I mean we need to look at the Iceberg architecture.

05:03.180 --> 05:05.140
Because we're going to be

05:05.140 --> 05:07.060
using a lot of these definitions as we move on.

05:07.060 --> 05:09.780
So we saw what Iceberg is.

05:09.780 --> 05:10.700
We have our set of rules.

05:10.700 --> 05:12.220
And it tells us how we're actually interacting

05:12.220 --> 05:13.940
and building out these files.

05:13.940 --> 05:16.340
So when you want to write data to an iceberg table,

05:16.340 --> 05:17.900
what we're really doing is we're writing data

05:17.900 --> 05:19.620
to some data files.

05:19.620 --> 05:20.580
Wonderful.

05:20.580 --> 05:23.060
These are by default formatted in Parquet.

05:23.060 --> 05:25.900
And then from there, iceberg has a wonderful layer

05:25.900 --> 05:28.340
of metadata that helps to coordinate those data files

05:28.340 --> 05:29.900
into iceberg tables.

05:29.900 --> 05:30.940
So we have our data files.

05:30.940 --> 05:34.340
And those data files are then tracked by manifests.

05:34.340 --> 05:38.980
So the manifests may point to a number of data files.

05:38.980 --> 05:40.540
But an important thing about these is that they

05:40.540 --> 05:43.460
will track a subset of data files so that when we actually

05:43.460 --> 05:45.980
want to query the data and those data files later on,

05:45.980 --> 05:49.340
we can prune off irrelevant data files.

05:49.340 --> 05:51.020
So we'll keep track of some statistics

05:51.020 --> 05:54.460
on the actual data that these files then point to.

05:54.460 --> 05:56.740
From there, we're going to roll those manifest files up

05:56.740 --> 05:59.540
into a manifest list, which is exactly what it sounds like.

05:59.540 --> 06:02.940
We have a list of manifest files in that manifest list.

06:02.940 --> 06:05.540
And then those statistics that were in the manifest files

06:05.540 --> 06:07.540
are also rolled up into our manifest list.

06:07.540 --> 06:08.740
And an important thing to note here

06:08.740 --> 06:11.460
is that the manifest list itself is actually

06:11.460 --> 06:15.020
effectively just a snapshot of what that table currently

06:15.020 --> 06:16.100
looks like.

06:16.100 --> 06:19.180
So all the files that are relevant to that table at this point

06:19.180 --> 06:22.460
in time are listed out in that manifest list.

06:22.460 --> 06:23.300
But we're not done here.

06:23.300 --> 06:24.580
You'd think we'd be done.

06:24.580 --> 06:27.300
But there are more metadata files, specifically

06:27.300 --> 06:28.860
a metadata file.

06:28.860 --> 06:31.500
We've got real clever with that naming here.

06:31.500 --> 06:37.260
So we have a metadata file to track the individual snapshots

06:37.260 --> 06:38.500
that exist within that table.

06:38.500 --> 06:41.180
So here we have snapshot 0 that is maintained

06:41.180 --> 06:43.020
by the manifest list that it points to.

06:43.020 --> 06:45.940
But we also include some additional non-data information

06:45.940 --> 06:48.180
in here, like the schema, how we're actually

06:48.180 --> 06:50.100
partitioning our data as well.

06:50.100 --> 06:51.660
Everything else that we need to sort of round out

06:51.660 --> 06:54.140
how we're organizing our data.

06:54.140 --> 06:56.020
But we don't stop there, OK?

06:56.020 --> 06:57.820
But wait, there's more.

06:57.820 --> 07:01.380
So we need two really important things.

07:01.380 --> 07:04.780
One, we need a way to link a table identifier

07:04.780 --> 07:07.940
to this metadata file that we have here.

07:07.940 --> 07:10.980
And then we also need a way to transactionally indicate

07:10.980 --> 07:14.980
that this metadata is the metadata that we care about right now.

07:14.980 --> 07:16.420
And if we want to update that, we need

07:16.420 --> 07:17.860
to be able to atomically do that.

07:17.860 --> 07:21.100
And so these two things are handled by an external bit,

07:21.100 --> 07:22.300
called a catalog.

07:22.300 --> 07:23.940
So this is what we're going to keep that mapping

07:23.940 --> 07:26.860
of our actual tables to the relevant metadata files

07:26.860 --> 07:28.700
that we care about.

07:28.700 --> 07:31.220
So if you want to add more data to your iceberg table,

07:31.220 --> 07:33.460
which you would probably want to do if you're

07:33.460 --> 07:35.700
actually working with Iceberg.

07:35.700 --> 07:37.740
So say we want to add some more data files.

07:37.740 --> 07:39.380
Great, you are free to do that.

07:39.380 --> 07:42.980
But naturally, we need to track that with a new manifest file.

07:42.980 --> 07:45.620
And then we're going to roll that up and create a new manifest

07:45.620 --> 07:46.260
list.

07:46.260 --> 07:48.820
But since we're adding data to an existing table,

07:48.820 --> 07:50.460
we're not just going to point to that manifest file.

07:50.460 --> 07:52.020
We're going to point to all of the manifest files

07:52.020 --> 07:54.260
that are relevant to that table now.

07:54.260 --> 07:57.020
And then from there, we'll create a new metadata file.

07:57.020 --> 08:00.100
Now we have a new snapshot because we've added new data

08:00.100 --> 08:00.820
to this table.

08:00.820 --> 08:04.980
We have a new valid state of this table.

08:04.980 --> 08:07.060
And so now we have snapshot 0 that we're still

08:07.060 --> 08:10.500
keeping track of, but we have the newest one, snapshot 1.

08:10.500 --> 08:12.380
And then, of course, at the very end, we

08:12.380 --> 08:15.220
want to be able to update the pointer in our catalog

08:15.220 --> 08:18.580
to this new metadata file so that we know what we're looking at.

08:18.580 --> 08:20.580
OK?

08:20.580 --> 08:22.700
That's your crash course in all things Iceberg.
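As a recap, the metadata chain just described (data files tracked by manifests, manifests rolled into a manifest list per snapshot, snapshots tracked by a metadata file, and a catalog holding the current pointer) can be sketched as a toy model. All class and method names below are invented for illustration; this is not the Iceberg library's API.

```python
# Toy model of the Iceberg metadata chain described above:
# data files -> manifests -> manifest list (snapshot) -> metadata file,
# with a catalog holding the single atomic pointer to current metadata.
from dataclasses import dataclass

@dataclass
class DataFile:
    path: str
    row_count: int          # stands in for per-file statistics used for pruning

@dataclass
class Manifest:             # tracks a subset of data files plus their stats
    data_files: list

@dataclass
class ManifestList:         # one snapshot: every manifest relevant right now
    manifests: list

@dataclass
class TableMetadata:        # tracks all snapshots, schema, partitioning, ...
    snapshots: list
    current_snapshot: int

class Catalog:              # maps table name -> current metadata, atomically
    def __init__(self):
        self._pointer = {}
    def commit(self, table, expected, new):
        # compare-and-swap: only succeed if nobody committed in between
        if self._pointer.get(table) is not expected:
            raise RuntimeError("concurrent commit detected")
        self._pointer[table] = new
    def load(self, table):
        return self._pointer[table]

# First commit: one data file, one manifest, snapshot 0.
cat = Catalog()
m0 = Manifest([DataFile("data-0.parquet", 100)])
meta0 = TableMetadata(snapshots=[ManifestList([m0])], current_snapshot=0)
cat.commit("db.events", None, meta0)

# Append: new manifest, new manifest list pointing at ALL relevant manifests,
# new metadata file carrying both snapshots, then swap the catalog pointer.
m1 = Manifest([DataFile("data-1.parquet", 50)])
meta1 = TableMetadata(snapshots=meta0.snapshots + [ManifestList([m0, m1])],
                      current_snapshot=1)
cat.commit("db.events", meta0, meta1)
```

Note that the append reuses the existing manifest `m0` rather than rewriting any data, which is exactly the point of the layered metadata.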

08:22.700 --> 08:23.780
You're good now.

08:23.780 --> 08:25.180
We can take it away.

08:25.180 --> 08:26.620
Cool.

08:26.620 --> 08:29.140
So all the stuff that Danica just went over

08:29.140 --> 08:32.460
is part of what's in the format specification.

08:32.460 --> 08:34.580
And that's a lot of what we're going to talk about today.

08:34.580 --> 08:36.020
And just to give you a quick history

08:36.020 --> 08:38.220
of where we've been before and where we're going to,

08:38.220 --> 08:41.500
I'm going to talk about how the spec is basically evolved.

08:41.500 --> 08:44.860
So we started out with V1 of the table format spec.

08:44.860 --> 08:48.700
This is Iceberg from about five years ago.

08:48.700 --> 08:52.580
And it was just the initial description of that system

08:52.580 --> 08:55.740
of manifest, manifest list, data files,

08:55.740 --> 08:58.420
as well as the rules about how we're going to put metrics

08:58.420 --> 09:00.260
at these various levels and how we're going to do

09:00.260 --> 09:01.860
partitioning.

09:01.860 --> 09:04.340
So with this, you had everything you needed to do

09:04.340 --> 09:05.860
to build a basic table.

09:05.860 --> 09:08.100
But one of the things that we realized really quickly,

09:08.100 --> 09:11.260
while we were working with this version of the table,

09:11.260 --> 09:13.260
is that we have a lot of difficulty doing

09:13.260 --> 09:16.860
row-level operations, which is an extremely common concern.

09:16.860 --> 09:19.260
Because we want to be GDPR compliant.

09:19.260 --> 09:22.860
We want to be basically very good to our users with privacy

09:22.860 --> 09:24.540
and getting rid of rows that they don't want

09:24.540 --> 09:26.140
to exist in our data sets.

09:26.140 --> 09:28.180
But that's traditionally very difficult

09:28.180 --> 09:31.620
if we have this layout where every single data file basically

09:31.620 --> 09:34.460
is either in the data set or not in the data set.

09:34.460 --> 09:37.700
So it makes it very difficult to do tiny updates.

09:37.700 --> 09:42.060
So V2 of the iceberg table spec introduced a lot of things

09:42.060 --> 09:43.900
to help out with row-level operations,

09:43.900 --> 09:46.700
specifically the concept of delete files.

09:46.700 --> 09:48.820
We're going to talk a lot more about delete files later,

09:48.820 --> 09:50.980
because one of the big changes in V3 is

09:50.980 --> 09:52.300
how we deal with those.

09:52.300 --> 09:54.340
But we introduced basically two ideas,

09:54.340 --> 09:56.780
equality deletes and position deletes,

09:56.780 --> 09:59.500
which allowed you to have a secondary set of files

09:59.500 --> 10:02.580
that just indicate a delta to your existing files.

10:02.580 --> 10:04.300
So you can imagine you would have a file

10:04.300 --> 10:06.860
with your actual rows, and then another file that says,

10:06.860 --> 10:10.460
actually, if you're at this point in time, ignore rows 5, 6, and 7,

10:10.460 --> 10:11.580
or something like that.

10:11.580 --> 10:14.980
Like I said, we'll go into more details about that later.

10:14.980 --> 10:18.020
And then today, you're here to hear about V3,

10:18.020 --> 10:20.500
which is the changes we're going to do over V2

10:20.500 --> 10:23.300
to hopefully get even more use cases into iceberg,

10:23.300 --> 10:25.100
improve the way we handle deletes.

10:25.100 --> 10:30.260
And of course, move us forward to an even more efficient data format.

10:30.260 --> 10:32.180
Tell us, what else is in V3, Russell?

10:32.180 --> 10:35.020
So V3 has a lot of things.

10:35.020 --> 10:38.820
Let's go over the headlining features.

10:38.820 --> 10:40.580
One thing to note before we go through this,

10:40.580 --> 10:42.820
V3 is going to be an upgrade process.

10:42.820 --> 10:45.980
So if you have a V2 table, you will have to upgrade your metadata.

10:45.980 --> 10:47.540
If you would like it to be a V3 table,

10:47.540 --> 10:49.140
this should be very lightweight, because it's only

10:49.140 --> 10:52.140
at the metadata layer, not at those data layers.

10:52.140 --> 10:54.220
But we have a couple of great things we're about to introduce.

10:54.220 --> 10:57.220
Like I mentioned, we're about to move into delete vectors

10:57.220 --> 10:58.460
from position delete files.

10:58.460 --> 11:01.820
We'll talk about the details of why this is important later.

11:01.900 --> 11:04.780
But basically, we're moving to a more efficient representation

11:04.780 --> 11:06.100
of our deletes.

11:06.100 --> 11:08.940
We're also going to introduce something called a variant type.

11:08.940 --> 11:11.260
If you're familiar with some data warehousing solutions,

11:11.260 --> 11:14.780
they already have this idea of a semi-structured data type

11:14.780 --> 11:17.060
that lives within your structured table.

11:17.060 --> 11:19.700
The really cool thing about variant is that suddenly,

11:19.700 --> 11:23.020
you can have this kind of varying structure from row to row,

11:23.020 --> 11:25.300
while still having structure within those rows.

11:25.300 --> 11:28.500
We'll talk about that implementation later as well.

11:28.500 --> 11:30.380
We're also adding a geometric type.

11:30.380 --> 11:33.500
This is a common concern for a lot of folks working with data,

11:33.500 --> 11:35.340
where we have parquet files, we're

11:35.340 --> 11:37.300
storing our geometric representations

11:37.300 --> 11:39.260
in all sorts of different ways, which makes them

11:39.260 --> 11:40.620
not very interoperable.

11:40.620 --> 11:42.580
And the key message of iceberg, right?

11:42.580 --> 11:44.420
It's we want to have those rules set out

11:44.420 --> 11:46.620
that say how every engine will deal with this data,

11:46.620 --> 11:49.500
so that we can have them all working in the same way.

11:49.500 --> 11:51.380
So we're making sure that geometric type

11:51.380 --> 11:54.180
is represented inside of the iceberg spec.

11:54.180 --> 11:56.660
We're also implementing features for row lineage.

11:56.660 --> 11:58.220
We've found that a lot of folks really

11:58.220 --> 12:01.340
want to be able to say when particular rows were added to their data set,

12:01.340 --> 12:02.500
or when they were updated.

12:02.500 --> 12:04.740
So we're adding that again as a first class feature

12:04.740 --> 12:07.340
into the specification, so that all the engines can

12:07.340 --> 12:10.740
work with row lineage in the exact same way.
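One way to picture row lineage is each row carrying markers for the snapshot that created it and the snapshot that last changed it. The field and function names below are invented for illustration; the V3 spec defines its own field names and semantics.

```python
# Sketch of row lineage as described: tag each row with when it was added
# and when it was last updated, using snapshot sequence numbers.
# All names here are invented, not the V3 spec's actual field names.

def write_rows(rows, sequence_number):
    return [{"value": r,
             "added_seq": sequence_number,
             "last_updated_seq": sequence_number} for r in rows]

def update_row(row, new_value, sequence_number):
    # An update keeps the original lineage but bumps the update marker.
    return {**row, "value": new_value, "last_updated_seq": sequence_number}

table = write_rows(["A", "B"], sequence_number=1)
table[1] = update_row(table[1], "B2", sequence_number=2)

print(table[1])  # {'value': 'B2', 'added_seq': 1, 'last_updated_seq': 2}
```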

12:10.740 --> 12:14.940
And finally, on this list, we have default values.

12:14.940 --> 12:18.100
Previously, we had this system where if you wanted

12:18.100 --> 12:21.060
to have a field be optional, your only option

12:21.060 --> 12:24.780
was to have null as your value when people were not submitting a value.

12:24.780 --> 12:26.620
We realize that there's a lot of use cases

12:26.620 --> 12:29.580
where people do want to have a value in that column,

12:29.580 --> 12:32.660
even if they haven't actually sent something in with the writer.

12:32.660 --> 12:34.540
So we're going to have that as part of the spec as well.

12:34.540 --> 12:36.500
This ends up being a little bit more complicated

12:36.500 --> 12:39.900
and I'll go into reasons why when we get to that part of the talk.
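Conceptually, a read-time default works like this sketch: a data file written before a column existed simply lacks the column, and the reader substitutes the schema's default rather than null. The schema shape and names here are invented for illustration, not Iceberg's actual representation.

```python
# Sketch of read-time default values: old files lack the new column, and
# the reader fills in the schema's default instead of null.

schema = {
    "id":     {"default": None},       # original field, no default needed
    "region": {"default": "unknown"},  # added later, with a default value
}

def read_row(stored_row: dict) -> dict:
    # Project the stored row onto the current schema, applying defaults
    # for any column the old data file does not contain.
    return {col: stored_row.get(col, spec["default"])
            for col, spec in schema.items()}

old_row = {"id": 1}                    # written before "region" existed
new_row = {"id": 2, "region": "emea"}  # written after

print(read_row(old_row))  # {'id': 1, 'region': 'unknown'}
```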

12:39.900 --> 12:42.980
But for now, we're going to go on a journey

12:42.980 --> 12:45.660
from the beginning of deletions.

12:45.660 --> 12:50.380
So I know I took you on an overview of the iceberg architecture,

12:50.380 --> 12:52.900
keep all of that in your mind, because it's going to be relevant here.

12:52.900 --> 12:55.380
So deletes, that's a simple thing.

12:55.380 --> 12:58.380
We all know how to delete things.

12:58.380 --> 13:01.700
There's actually a little bit more to it in iceberg.

13:01.700 --> 13:05.540
So there's two ways to actually think about deletions in iceberg.

13:05.540 --> 13:08.100
But as Russell mentioned, we're going to add in this concept

13:08.100 --> 13:11.180
of a delete file of a way to say, these are the rows

13:11.180 --> 13:15.300
that we don't really care about in the data set that we currently have.

13:15.300 --> 13:19.020
So the first way that you would think about our deletions

13:19.020 --> 13:22.220
is with copy-on-write, and then we also have merge-on-read.

13:22.220 --> 13:25.340
So in copy-on-write mode, if we have three rows

13:25.340 --> 13:29.780
in our Iceberg table, A, B, and C, and say we want to delete row C.

13:29.780 --> 13:31.100
It's no longer relevant to us.

13:31.100 --> 13:34.180
Well, then we're just going to copy on write.

13:34.180 --> 13:36.420
When we decide that we want to remove that row,

13:36.420 --> 13:40.540
we are going to rewrite the entire file without that row C.

13:40.540 --> 13:42.780
If you're thinking that sounds horribly inefficient...

13:42.780 --> 13:44.780
Yes, you're probably right.

13:44.780 --> 13:46.940
It can be horribly inefficient.

13:46.940 --> 13:49.260
But that's an option for you.
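Copy-on-write can be pictured as nothing more than rewriting the file without the doomed rows. A minimal sketch (the function and predicate are invented for illustration, not the Iceberg API):

```python
# Copy-on-write delete, as described above: to drop row C we rewrite the
# whole data file without it, and the new file replaces the old one.

def copy_on_write_delete(data_file: list, predicate) -> list:
    # Rewriting an entire file to remove one row is why this mode can be
    # expensive for small, frequent deletes.
    return [row for row in data_file if not predicate(row)]

old_file = ["A", "B", "C"]
new_file = copy_on_write_delete(old_file, lambda row: row == "C")
print(new_file)  # ['A', 'B'] -- old_file itself is untouched
```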

13:49.260 --> 13:54.540
With merge on read, we actually have two mechanisms for handling our deletes,

13:54.660 --> 13:57.420
depending on what type of deletes you want to use.

13:57.420 --> 14:01.980
But with merge-on-read, the construct here is that we are adding the idea of a delete file

14:01.980 --> 14:06.860
to tell us which rows are no longer relevant to us as of a current timestamp.

14:06.860 --> 14:11.140
So with the first type of deletes under merge-on-read, we have an equality delete.

14:11.140 --> 14:17.380
So with equality deletes, this delete file can apply to multiple data files.

14:17.380 --> 14:20.060
And it says whenever we see a row from a timestamp earlier,

14:20.060 --> 14:24.460
than, in this case, T1, and it has this value C in it,

14:24.460 --> 14:26.180
then we are going to ignore it.

14:26.180 --> 14:29.420
We wanted to remove that from our data set, ignore it.

14:29.420 --> 14:30.700
And when do we ignore it?

14:30.700 --> 14:33.740
On read, hence merge-on-read.

14:33.740 --> 14:37.900
Otherwise, we could be using what is called a position delete file.

14:37.900 --> 14:40.860
And so this file could also apply to multiple files,

14:40.860 --> 14:45.740
but you'll see that it gets a little more specific on determining which row we're going to ignore.

14:45.740 --> 14:50.380
So here we're storing a tuple that says, for a given data file, file A,

14:50.380 --> 14:52.340
we are going to ignore the third row.

14:52.340 --> 14:56.500
So it identifies that specific row by its index.
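The merge-on-read flow just described can be sketched in a few lines. This is a toy illustration: the file names, the `read` helper, and the delete-file shapes are invented, and the timestamp/sequence-number scoping of deletes is omitted. It is not how the Iceberg library represents delete files.

```python
# Merge-on-read, sketched: data files stay untouched and delete files are
# applied at read time. Equality deletes match on a value in any covered
# file; position deletes name a specific (data file, row index) pair.

data_files = {
    "file_a": ["A", "B", "C"],
    "file_b": ["C", "D", "E"],
}

equality_deletes = {"C"}            # ignore any row equal to C
position_deletes = {("file_a", 1)}  # ignore row index 1 of file_a ("B")

def read(data_files, equality_deletes, position_deletes):
    out = []
    for name, rows in data_files.items():
        for idx, row in enumerate(rows):
            if row in equality_deletes:          # merge step 1: equality
                continue
            if (name, idx) in position_deletes:  # merge step 2: position
                continue
            out.append(row)
    return out

print(read(data_files, equality_deletes, position_deletes))  # ['A', 'D', 'E']
```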

14:56.500 --> 15:00.420
So position delete files might seem simple,

15:00.420 --> 15:03.540
but they've got some fun things going on under the hood.

15:03.540 --> 15:06.340
So, namely, you can choose the granularity

15:06.340 --> 15:10.700
at which these position delete files actually apply to your data set.

15:10.700 --> 15:12.740
So if these are our delete files here,

15:12.740 --> 15:17.540
we can say that all these data files belong to one single partition.

15:17.540 --> 15:21.980
So then we can choose to have delete files apply at the partition level.

15:21.980 --> 15:24.540
So for this entire partition, all the data files in there,

15:24.540 --> 15:27.740
we are going to have our delete files applied to that partition.

15:27.740 --> 15:34.060
So all the deletes across those files are going to be stored in one delete file.

15:34.060 --> 15:38.620
Or, we can decide to have the granularity be at the file level.

15:38.620 --> 15:43.580
So for each individual data file, we are going to store a corresponding delete file

15:43.580 --> 15:47.340
that then tells us to ignore rows in that specific file.

15:47.340 --> 15:49.580
And so there are trade-offs here, obviously.

15:49.580 --> 15:54.220
So in iceberg, we have an ongoing problem called the small files problem,

15:54.220 --> 15:56.140
because we like well-named things in Iceberg.

15:56.140 --> 15:58.860
So generally, we don't want small files,

15:58.860 --> 16:02.300
because when you're actually running queries on your iceberg tables,

16:02.300 --> 16:04.700
file IO is going to be the biggest killer of performance.

16:04.700 --> 16:08.780
So where we can reduce the number of files that we are opening over time,

16:08.780 --> 16:10.140
we want to be doing that.

16:10.140 --> 16:12.300
So obviously here at the partition granularity,

16:12.300 --> 16:15.260
we get the benefit of only having to open one file.

16:15.260 --> 16:17.580
And for reading from all the files in that partition, that's great.

16:17.580 --> 16:20.060
We only have to open one delete file.

16:20.060 --> 16:21.660
But if you're only reading from one of those files,

16:21.660 --> 16:23.980
and you have to open the delete file and scan through it,

16:23.980 --> 16:25.980
well, then that sounds inefficient.

16:25.980 --> 16:28.700
So then you might want to use the file-based granularity,

16:28.700 --> 16:31.260
because then if you're reading just from a single data file,

16:31.260 --> 16:35.020
you only have to read that one corresponding delete file.

16:35.020 --> 16:37.740
But the trade-off there is that we have more small files,

16:37.740 --> 16:41.260
more of those small delete files that we have to deal with.

16:41.260 --> 16:44.460
So we're constantly battling between these two things.

16:44.460 --> 16:46.540
We want to find an optimal way to deal with them.
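The trade-off between the two granularities can be made concrete with a toy cost model that just counts file opens. The functions and numbers below are illustrative assumptions, not real Iceberg planner behavior.

```python
# Rough cost model of the granularity trade-off above: partition
# granularity means one shared delete file per partition; file granularity
# means one small delete file per data file.

def opens_partition_granularity(files_read: int) -> int:
    # One shared delete file covers the whole partition, so any query
    # touching the partition opens exactly one extra file.
    return files_read + 1

def opens_file_granularity(files_read: int) -> int:
    # Each data file drags in its own small delete file.
    return files_read * 2

# Reading a whole 10-file partition: partition granularity wins (11 vs 20).
print(opens_partition_granularity(10), opens_file_granularity(10))  # 11 20
# Reading a single file: both open 2 files, but the partition-level delete
# file is bigger and must be scanned for entries that mostly apply elsewhere.
```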

16:46.620 --> 16:49.580
And so this is where deletion vectors come in.

16:49.580 --> 16:52.620
So with the addition of deletion vectors in the V3 spec,

16:52.620 --> 16:55.580
we're going to replace this idea of the position delete files

16:55.580 --> 16:58.060
with a new type of file, the Puffin file.

16:58.060 --> 17:00.620
And we already kind of mentioned the Puffin file.

17:00.620 --> 17:01.420
This is actually a thing.

17:01.420 --> 17:04.780
It's adorably named, because they live on icebergs.

17:04.780 --> 17:05.900
That's the joke.

17:05.900 --> 17:08.780
It makes sense, okay, when you think about it.

17:08.780 --> 17:12.300
So Puffin files are already used in Iceberg.

17:12.300 --> 17:14.620
They're a very efficient lightweight file.

17:14.620 --> 17:17.420
And we use them to store statistics elsewhere in iceberg.

17:17.420 --> 17:22.780
And so we're repurposing them here to store our deletion vectors.

17:22.780 --> 17:27.260
So in this case, a single Puffin file can store many data files' worth

17:27.260 --> 17:29.660
of deletion vectors.

17:29.660 --> 17:32.460
And we're going to store those vectors at specific offsets

17:32.460 --> 17:35.660
within the Puffin file so that when we need to come and read

17:35.660 --> 17:37.740
those deletion vectors for a specific data file,

17:37.740 --> 17:39.500
we don't have to read through the entire Puffin file.

17:39.500 --> 17:42.780
We can access that by its offset within the file, okay.

17:42.780 --> 17:45.500
So we kind of get the best of both worlds here.

17:45.500 --> 17:49.740
We can store our data in one file so we get the benefit

17:49.740 --> 17:51.740
of a bigger file of deletion vectors.

17:51.740 --> 17:54.780
But still, we're only opening one file then

17:54.780 --> 17:59.180
and we can access that data specifically within the Puffin file.

17:59.180 --> 18:03.340
And then also, in our metadata, we're going to say which offset

18:03.340 --> 18:07.020
that deletion vector exists at within the Puffin file, okay.

18:07.020 --> 18:08.300
So this brings a lot of benefits.

18:08.300 --> 18:10.620
It means that for the small files problem,

18:10.620 --> 18:14.060
we don't have to worry about compacting our data later on,

18:14.060 --> 18:16.940
reducing the small files that we have

18:16.940 --> 18:19.340
and compressing them into a bigger file.

18:19.340 --> 18:21.340
We're going to have larger files and we're still going to be

18:21.340 --> 18:23.940
more performant because we're reducing that file

18:23.940 --> 18:28.460
I/O, and we can access exactly what data we need in that Puffin file, okay.

18:28.460 --> 18:31.420
So this is effectively what it'll look like here.

18:31.420 --> 18:34.620
It's pretty lightweight. For each of the data files,

18:34.620 --> 18:36.300
They are going to have their deletion vectors

18:36.300 --> 18:39.260
specifically at an offset in the Puffin file.

18:39.260 --> 18:42.700
We know exactly where that mapping is.
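Here is a minimal sketch of the offset trick just described, using pickled Python sets as stand-in deletion vectors. The actual Puffin format stores compressed bitmaps with its own binary layout; everything below (function names, the index shape) is invented for illustration.

```python
# Sketch of deletion vectors stored in one Puffin-like blob file: each data
# file's vector is written at a known offset, so a reader seeks straight to
# it instead of scanning the whole file.
import io
import pickle

def write_vectors(vectors: dict) -> tuple:
    buf = io.BytesIO()
    index = {}                            # data file -> (offset, length)
    for data_file, deleted_positions in vectors.items():
        blob = pickle.dumps(deleted_positions)
        index[data_file] = (buf.tell(), len(blob))
        buf.write(blob)
    return buf.getvalue(), index          # the index lives in table metadata

def read_vector(blob_file: bytes, index: dict, data_file: str) -> set:
    offset, length = index[data_file]     # jump straight to the right vector
    return pickle.loads(blob_file[offset:offset + length])

blob, index = write_vectors({
    "data-0.parquet": {2, 5},             # rows 2 and 5 are deleted
    "data-1.parquet": {0},
})
print(read_vector(blob, index, "data-1.parquet"))  # {0}
```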

18:42.700 --> 18:44.380
So what do you get out of this?

18:44.380 --> 18:47.820
First of all, you don't have to do anything to make use of this.

18:47.820 --> 18:48.460
That's wonderful.

18:48.460 --> 18:51.980
No user code changes necessary.

18:51.980 --> 18:54.220
And you get a lot of benefits here based on performance.

18:54.220 --> 18:56.220
Your query planning is going to be better.

18:56.220 --> 18:58.300
And maintenance tasks like compaction.

18:58.300 --> 19:00.460
You don't have to worry about that as much.

19:00.460 --> 19:02.860
So there's a lot of benefits to actually moving to this construct

19:02.860 --> 19:04.700
over the position files.

19:04.700 --> 19:05.660
All right, Russell.

19:05.660 --> 19:06.300
Variant types.

19:06.300 --> 19:07.180
How are we dealing with those?

19:07.180 --> 19:08.540
So we just talked a lot about planning.

19:08.540 --> 19:11.980
Now we're going to talk about core data types that we're adding into the system.

19:11.980 --> 19:17.260
So variant type is one of the most exciting things we're adding into iceberg.

19:17.900 --> 19:20.620
And it's really great for a lot of different scenarios.

19:20.620 --> 19:24.860
I'm sure everyone here has dealt with some kind of IoT or streaming ingest

19:24.860 --> 19:29.180
where the folks that are giving you data will not tell you what the schema is.

19:29.180 --> 19:31.500
Or they'll want to change it every few months.

19:31.500 --> 19:34.380
Or maybe they're giving you elements from a whole bunch of different sensors

19:34.380 --> 19:35.580
that have different formats.

19:36.060 --> 19:38.620
This is the time when you probably would like to have something that

19:38.620 --> 19:41.900
has a variable schema itself inside of your table format.

19:41.900 --> 19:44.300
So you can handle that appropriately.

19:44.300 --> 19:48.620
We're also dealing with a lot of places where maybe schema changes over time.

19:48.620 --> 19:52.060
And although we have a couple common fields, we have a few that just kind of come

19:52.060 --> 19:55.580
and go depending on which application version is publishing events.

19:56.700 --> 19:59.660
And finally, if anyone is actually using a map type,

19:59.660 --> 20:01.100
this is way more efficient.

20:01.100 --> 20:04.860
So commonly, I see people using maps as basically a string mapping

20:04.860 --> 20:09.180
to some other type, but within a normally typed database,

20:09.180 --> 20:13.420
that means you get one type for your value and you have one type for your key.

20:13.420 --> 20:15.740
And that ends up being very painful.

20:16.540 --> 20:19.900
All right, so let's go quickly through how this is actually implemented.

20:19.900 --> 20:21.740
So the way that a variant is actually going to look

20:21.740 --> 20:26.460
within an iceberg table is primarily stored as two different fields

20:26.460 --> 20:28.300
that belong to the same struct.

20:28.300 --> 20:31.900
You can imagine this with a metadata and value portion.

20:31.980 --> 20:35.020
The metadata portion is going to explain to us exactly

20:35.020 --> 20:39.100
what is the schema of this particular row's value portion?

20:39.100 --> 20:43.100
So I might have a particular row that has an x, which is an int,

20:43.100 --> 20:45.260
a y, which is a long, and an error, which is a string.

20:45.260 --> 20:47.740
We take that, we have a binary representation,

20:47.740 --> 20:50.940
which is then specified inside the variant spec.

20:50.940 --> 20:55.180
And then we have a value portion, which then has the actual values

20:55.180 --> 21:00.220
for this schema laid out according to the metadata.
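A toy stand-in for what's being described here (the real variant encoding is a compact binary form defined in the spec, not Python lists): the point is the split into a per-row metadata part naming the fields and their types, and a value part holding just the values in that order.

```python
# Illustrative only: model the metadata/value split of a variant.
def encode_variant(record, types):
    metadata = [(name, types[name]) for name in record]  # per-row schema
    value = [record[name] for name in record]            # just the values
    return metadata, value

def decode_variant(metadata, value):
    # The metadata tells us how to interpret the value portion.
    return {name: val for (name, _type), val in zip(metadata, value)}

meta, val = encode_variant({"x": 7, "y": 123456789, "error": "timeout"},
                           {"x": "int", "y": "long", "error": "string"})
assert decode_variant(meta, val) == {"x": 7, "y": 123456789, "error": "timeout"}
```

A different row in the same column can carry a completely different metadata part, which is what makes the schema per-row rather than per-table.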

21:00.300 --> 21:03.340
So this is all really good, but you're probably looking at this and saying,

21:03.340 --> 21:06.540
that sounds like I'm going to be deserializing and serializing things

21:06.540 --> 21:09.580
all of the time, and how in the world am I going to get any metrics

21:09.580 --> 21:11.100
that help me scan through this?

21:11.100 --> 21:15.180
I can't prune based on what the metadata blob looks like,

21:15.180 --> 21:16.780
because that's just a bunch of binary.

21:16.780 --> 21:18.300
My min/max values mean nothing.

21:19.180 --> 21:21.740
So there's one step forward past this.

21:21.740 --> 21:25.260
This is something that's implemented in a lot of commercial data lake houses

21:25.260 --> 21:28.220
and commercial data warehouses, and it's called

21:28.220 --> 21:30.780
shredding, or sub-columnarization.

21:30.780 --> 21:36.380
And basically, the idea is, what if I combine this idea of an unstructured type

21:36.380 --> 21:40.620
with some actual typed values for the things that appear very frequently?

21:40.620 --> 21:44.220
Now, the actual method for determining which are those common fields

21:44.220 --> 21:47.020
is an exercise that we're kind of leaving outside of the spec,

21:47.020 --> 21:51.100
but what we will be saying is that we are going to accept a variant type,

21:51.100 --> 21:55.340
which basically has an unstructured portion of the metadata and the value,

21:55.340 --> 21:59.900
and then a structured portion, where if for this particular row,

21:59.900 --> 22:03.100
we have the presence of one of these columns in a known type.

22:03.100 --> 22:06.300
We'll store that as an actual typed column in parquet.

22:06.300 --> 22:09.180
So suddenly, I now have an actual column with min/max values

22:09.180 --> 22:11.100
that I can use for pruning.

22:11.100 --> 22:14.220
So in this example, say we always see x.

22:14.220 --> 22:18.300
We always know that x is an integer and it appears all of the time in our data

22:18.300 --> 22:20.140
that's coming into this variant field.

22:21.100 --> 22:25.260
Basically, what we can make is an x typed sub column

22:25.260 --> 22:27.340
that exists within the same struct,

22:27.340 --> 22:30.380
and inside of the variant spec we describe that

22:30.380 --> 22:34.380
if you see that, you know that that's actually part of the same value,

22:34.380 --> 22:38.620
and if you see only values in that column, you can do pruning on it.

22:38.620 --> 22:43.020
So basically, a smart engine can take data that's been written with a lot of common columns

22:43.020 --> 22:47.660
and extract them out into typed columns that can be used for pruning.
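Here's a hedged sketch of shredding under illustrative names: a frequently-seen field is pulled out of the variant into a typed sub-column with real min/max statistics, and the planner can then prune files on that column.

```python
# Illustrative shredding: extract the common field "x" into a typed column,
# leaving whatever else each row carries in the untyped residual.
rows = [{"x": 5, "msg": "a"}, {"x": 17, "msg": "b"}, {"x": 9}]

typed_x = [r.pop("x", None) for r in rows]   # typed sub-column
residual = rows                              # stays in the variant blob

# Real min/max stats now exist for x, unlike for the opaque binary blob.
stats = {"min": min(v for v in typed_x if v is not None),
         "max": max(v for v in typed_x if v is not None)}

# A query like "x > 20" can skip this file entirely using those stats:
query_lower_bound = 20
can_skip_file = stats["max"] <= query_lower_bound
print(can_skip_file)  # True: no row here can satisfy x > 20
```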

22:47.660 --> 22:51.500
In this example, I'm showing what if y has been typed in this particular instance

22:51.500 --> 22:53.740
to be a Boolean, we can actually have it.

22:53.740 --> 22:57.580
So sometimes we say that it's typed, but if we don't fit that typing,

22:57.580 --> 23:01.100
we can still have a different value inside of the untyped portion.

23:01.100 --> 23:04.860
So engines can kind of be smart about whether or not they're going to use that information.

23:04.860 --> 23:07.580
So for example, if you're doing a query that says,

23:07.580 --> 23:12.860
I'm looking for when y is true, you know that you don't actually care if there's a y value in here,

23:12.860 --> 23:15.100
because if it's Boolean, it's going to be out there.

23:15.100 --> 23:18.780
And if it got stored as some other type of value, we know we don't care about it,

23:18.780 --> 23:21.180
and we can ignore pruning on that point.
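The reasoning above can be sketched like this (illustrative logic, not any actual engine's planner): for a predicate like `y = true`, the Boolean-typed shredded column is enough on its own, because a y stored under some other type in the untyped portion could never equal the Boolean true.

```python
# Hypothetical pruning decision for "WHERE y = true" over a shredded variant.
def file_may_match(shredded_y_stats):
    # stats is (min, max) over the typed Boolean sub-column,
    # or None if no row in the file shredded y as a Boolean.
    if shredded_y_stats is None:
        return False          # no Boolean y anywhere in this file
    _min_val, max_val = shredded_y_stats
    return max_val is True    # at least one row has y == true

assert file_may_match((False, True)) is True    # must scan this file
assert file_may_match((False, False)) is False  # safe to skip
assert file_may_match(None) is False            # safe to skip
```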

23:21.180 --> 23:24.700
Anyway, so that's what's really cool about semi-structured data and variant types.

23:26.300 --> 23:30.300
If you want to use this type in V3, you're going to actually have to change your application code,

23:30.300 --> 23:34.620
because you actually have to start writing a variant object, which will end up being different,

23:34.620 --> 23:37.900
depending on what engine you use, but it will be encoded.

23:37.900 --> 23:42.620
Like I said, if you're using shredding, there'll be no performance difference between this

23:42.700 --> 23:47.660
and using real typed columns for the values that are shredded.

23:47.660 --> 23:52.940
Those that stay inside that serialized blob, you pay a deserialization penalty any time you try to read them.

23:55.260 --> 23:59.900
Let's move on to geo types, because I think we're running a little low on time.

23:59.900 --> 24:02.700
If you thought we were talking fast before, we're going to speak even faster.

24:02.700 --> 24:06.620
Okay, so geotypes, you might need a little bit of motivation here.

24:07.340 --> 24:11.580
Actually, no, I probably don't need to sell the room on the fact that geotypes

24:12.140 --> 24:17.980
are useful. I'm not saying you should just succumb to peer pressure here, but geotypes are out there

24:17.980 --> 24:22.380
and they're here to stay. There's already been widespread adoption of geotypes in a

24:22.380 --> 24:28.540
number of projects within the iceberg ecosystem, like DuckDB, Hive, Flink, Spark; the list goes on.

24:28.540 --> 24:33.500
So it makes sense to support geotypes in iceberg as well. So we have succumbed to peer pressure,

24:33.500 --> 24:39.180
I guess you could say. So more practically, adding support for geotypes in iceberg would unlock

24:39.900 --> 24:43.820
a lot of benefits here. And namely, we're not just trying to store the geotype,

24:43.820 --> 24:47.580
although that is an important part of this, but we want to be able to interact with those geotypes.

24:47.580 --> 24:54.060
So enabling additional predicates, like being able to see if certain geometries are within other geometries,

24:54.060 --> 24:59.180
using other predicates, and then even going so far as to actually partition the data based

24:59.180 --> 25:03.580
on the geometry, the geotypes as well. So this is what we're adding as part of the V3 spec.

25:04.300 --> 25:08.380
So how do we make that happen in iceberg? Well, it starts by formalizing what we're actually

25:08.460 --> 25:12.780
talking about. So we have geometries, which are on flat surfaces, and then we also have geographies,

25:12.780 --> 25:16.700
which are on more interesting, curved surfaces. So we want to be able to support both of those types.

25:17.500 --> 25:22.780
And then the key thing to know about geotypes in iceberg is that it's a hard problem to solve for,

25:22.780 --> 25:27.260
okay? So going back to the column-wise metrics that we want to keep track of so that we can actually

25:27.260 --> 25:31.740
do effective pruning when we're querying data later on. We need to be able to capture that

25:31.740 --> 25:36.940
within the geotype. So to support the pruning of irrelevant data in iceberg, we actually

25:37.820 --> 25:42.220
capture those metrics as sort of a bounding box around the geometries, and that's how we determine,

25:42.220 --> 25:46.620
you know, what that sort of maximum in the range of data that is in that partition is.
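A minimal sketch of that bounding-box pruning, assuming the column stats are a simple (min_x, min_y, max_x, max_y) rectangle; real geo metrics also handle edge cases, such as shapes crossing the antimeridian, that are omitted here.

```python
# Two axis-aligned boxes intersect iff they overlap on both axes.
def boxes_intersect(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

file_bbox = (0.0, 0.0, 10.0, 10.0)       # stats recorded for one data file
query_region = (20.0, 20.0, 30.0, 30.0)  # region the query cares about

# No overlap means no geometry in the file can match a spatial predicate
# on that region, so the planner skips the file entirely.
print(boxes_intersect(file_bbox, query_region))  # False -> file can be skipped
```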

25:47.340 --> 25:51.900
And the one important thing to note here is that we're not just making this up on how we're dealing

25:51.900 --> 25:56.620
with it in iceberg. We're aligning with the implementation in parquet. Other file formats?

25:58.220 --> 26:01.820
Who's to say? We'll get there at some point, but it is aligned with parquet.

26:01.820 --> 26:06.220
If you're interested in other file formats, I encourage you to get involved in the iceberg project,

26:06.380 --> 26:10.460
because we currently do not have a lot of maintainers who are focused on anything other than parquet.

26:10.460 --> 26:14.860
This is a job offer, folks, for open source. So what does this all mean for you?

26:14.860 --> 26:19.740
Well, to use geotypes, obviously, you're going to have to use geotypes first and foremost,

26:19.740 --> 26:24.140
so there is some intervention required on your part. Only new columns are going to be able to use

26:24.140 --> 26:28.700
geotypes. You can't do promotion of old columns, so keep that in mind. And there will be some,

26:28.700 --> 26:33.420
you know, hands-on effort involving some user code here to take advantage of it. But the cool thing

26:33.420 --> 26:37.420
is that you get some stuff for free here. You'll get that pruning automatically and predicate

26:37.420 --> 26:41.580
pushed down. So there are some benefits to it. Row lineage.

26:41.580 --> 26:46.940
All right. Row lineage is basically the idea that for every single row we want to know exactly

26:46.940 --> 26:50.540
when it was made in the history of the table and when it was last modified. We're going to

26:50.540 --> 26:54.940
accomplish this for a couple different reasons. The main things that we want out of this are the

26:54.940 --> 27:01.020
ability to do better CDC and basically get deltas out of a table between different time points.

27:01.100 --> 27:05.740
So for that, we basically, like I said, need these kind of identifiers for when a row was made

27:05.740 --> 27:09.900
and when it was last changed. The way we're actually implementing this is through,

27:10.540 --> 27:15.020
we have an R there that shouldn't be there. It's just row ID and last updated sequence number.

27:15.020 --> 27:19.180
These are two fields that will now be required by the spec if you are deciding to enable

27:19.180 --> 27:24.060
Row lineage, which is basically then a contract that we've described for all of the engines to follow

27:24.060 --> 27:29.900
to properly fill in these values. Now, we're trying to do this in the lightest way possible. So,

27:29.980 --> 27:35.500
basically, when nothing has changed for any of your rows, there's no change to the data files,

27:35.500 --> 27:40.140
because we're going to keep the information required to populate these fields at the metadata layer.
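A rough sketch of the two lineage fields under assumed names (`row_id`, `last_updated_seq`): appended rows can derive their values from commit metadata (a starting row ID plus position), and only an update has to materialize a new value per row.

```python
# Illustrative only: how the two lineage fields behave across commits.
def assign_lineage(num_rows, first_row_id, commit_seq):
    # Appended rows: lineage is implied by the snapshot's metadata,
    # so nothing extra needs to be stored per row at write time.
    return [{"row_id": first_row_id + i, "last_updated_seq": commit_seq}
            for i in range(num_rows)]

def update_row(row, commit_seq):
    # An update keeps the row ID but stamps the new sequence number,
    # which is when the fields actually start costing storage.
    row["last_updated_seq"] = commit_seq
    return row

rows = assign_lineage(3, first_row_id=100, commit_seq=1)
update_row(rows[1], commit_seq=5)
assert rows[1] == {"row_id": 101, "last_updated_seq": 5}
assert rows[0]["last_updated_seq"] == 1  # untouched rows keep the append seq
```

Diffing two sequence numbers then gives you the CDC-style delta: every row whose `last_updated_seq` falls between them changed in that window.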

27:41.020 --> 27:44.460
Now, once you get through a point and those things have changed, you're actually updating

27:44.460 --> 27:48.940
rows. That's when the engines actually have to start taking account of what these values are

27:48.940 --> 27:55.100
and persisting them. So for appenders and people who are using this for the first time, no penalty.

27:55.100 --> 27:59.340
The moment you start doing updates, we start having an additional cost in terms of storage and

27:59.340 --> 28:05.260
engines need to do a little bit more work. So I'm going through this fast, but we're running a little

28:05.260 --> 28:10.220
low on time. If you're using this as an end user, like I said, it's something that must be enabled

28:10.220 --> 28:15.020
within your table. It's a new field in the spec. So even if you upgrade to V3, you do then

28:15.020 --> 28:19.900
have to turn it on. And once you do that, all the writers that you're using must be able to respect

28:19.900 --> 28:24.620
these rules. So like I said, this is something that we're kind of making an optional feature

28:24.700 --> 28:29.100
within V3. So you might have certain engines that will support it and some that won't, or might

28:29.100 --> 28:35.980
support it at points in the future. And that's really all I want to say about that. Let's go on to default

28:35.980 --> 28:41.740
values in our home stretch. The fastest overview of default values ever. So default values might seem

28:41.740 --> 28:46.620
like an easy problem to solve for, but it's actually not. You'll have a new perspective afterwards.

28:46.620 --> 28:52.860
But we are actually bringing, I think, the worst mistake in computer science to iceberg. Or actually,

28:52.940 --> 28:55.660
well, Nulls were already in there, so we're kind of solving for it at this point by adding

28:55.660 --> 29:01.020
the default values. Why bother? Well, Nulls aren't that great. You know, we can throw them in there,

29:01.020 --> 29:04.940
but they might break things later on for our readers. So to get around it, I'm sure we've all

29:04.940 --> 29:11.500
implemented in some way or form default values for ourselves. But they're also valuable in being

29:11.500 --> 29:16.940
able to specify common values as well. So in iceberg, as we said, Nulls can already exist,

29:16.940 --> 29:21.980
but default values are going to affect how we deal with those Nulls and actually when we deal with

29:21.980 --> 29:27.260
those Nulls, either at read time, where we replace the existing Nulls, or at write time, where we

29:27.260 --> 29:33.180
avoid storing the Nulls altogether. So how can you use default values in iceberg? Well, with

29:33.180 --> 29:37.660
some effort. So we're not going to hold your hands, but we're going to try to make it as easy as possible.

29:38.380 --> 29:42.620
So with those two things to consider, as I mentioned, at read time or at write time, we can handle

29:42.620 --> 29:48.380
our default values, our Nulls. We are adding two additional table parameters to help you manage those.
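The two parameters described next can be sketched roughly like this, with hypothetical function names: an initial default acts at read time on Nulls already stored, while a write default substitutes the value before a Null is ever written.

```python
# Illustrative only: the two places a default value can kick in.
def read_with_initial_default(stored_values, initial_default):
    # Read time: Nulls already on disk are replaced as they are read
    # (and permanently, once compaction rewrites the file).
    return [initial_default if v is None else v for v in stored_values]

def write_with_write_default(incoming_values, write_default):
    # Write time: the Null never reaches storage in the first place.
    return [write_default if v is None else v for v in incoming_values]

assert read_with_initial_default([1, None, 3], 0) == [1, 0, 3]
assert write_with_write_default([None, 7], -1) == [-1, 7]
```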

29:48.700 --> 29:54.300
So with initial default, we're looking at the Nulls that already exist in our table.

29:55.180 --> 29:59.900
And with this parameter, we're saying that when we actually read those data later on,

29:59.900 --> 30:06.380
we're going to replace those Nulls with the default value that we specify. And it might not sound

30:06.380 --> 30:10.300
like it at first, but this is sort of your one and done. You're reading this data and you're

30:10.300 --> 30:17.340
replacing those Nulls. So we also read data when we compact our files later on to avoid the small

30:18.220 --> 30:22.380
files. So when this data is rewritten, you're done. That Null does not exist anymore. You cannot

30:22.380 --> 30:27.420
go back and change this. That's why it's sort of final. We also have write default. So here

30:27.420 --> 30:32.620
we're avoiding adding those Nulls and replacing it with a default value to begin with. And you

30:32.620 --> 30:36.540
can change this whenever. And as a bonus, you can use these in structs. Cool. What's coming next,

30:36.540 --> 30:41.740
Russell? Cool. So obviously, we're not done with the format. Even though V3 has not come out yet,

30:41.740 --> 30:45.740
we're already starting to think about all the things that are coming in V4. So just as a quick

30:45.740 --> 30:50.700
send-off, join the Apache iceberg community and come and be involved in all the work we're doing

30:50.700 --> 30:56.460
for V4. We want you to be involved. The iceberg summit for 2025 is about to come up, and our

30:56.460 --> 31:00.300
call for papers is open now, if you have anything you want to say about iceberg. And of course,

31:00.300 --> 31:05.420
visit us on Slack and all kinds of fun places. And if you'd like to work at Snowflake

31:05.420 --> 31:09.980
like us, we are also hiring lots of open source folks. So thank you very much for your time.

31:09.980 --> 31:19.980
Thank you.
