WEBVTT

00:00.000 --> 00:16.680
Okay, we can start. Hello, my name is Genia, and today we're going to talk about mobility

00:16.680 --> 00:24.880
of virtual machines in Kubernetes clusters, and especially about storage live migration and

00:24.960 --> 00:32.960
cross-cluster live migration for your virtual machines. We will cover the how and why, the features and

00:32.960 --> 00:40.480
requirements, and we will see a short demo. First of all, how many of you are familiar with KubeVirt

00:40.480 --> 00:46.320
itself? Like, do you know what KubeVirt does? Yeah, most of you. Okay, for those who don't

00:46.320 --> 00:52.960
know KubeVirt: we run virtual machines in Kubernetes. And how many of you are familiar with live

00:53.040 --> 01:05.280
migration when we talk about KubeVirt? Okay, so let me explain. Usually when we talk about live migration

01:05.280 --> 01:11.360
in KubeVirt we mean compute live migration. It's been there since the beginning of

01:11.360 --> 01:18.800
KubeVirt, and it means that your virtual machine remains running while your virtual machine

01:18.880 --> 01:28.320
instance is moving from one node to another. One of the requirements for this type of live migration

01:28.320 --> 01:37.520
is that your VM must use shared storage, meaning your VM's PVC needs to have the ReadWriteMany

01:37.520 --> 01:43.520
access mode, because at some point you need to have two virt-launcher pods that will

01:43.680 --> 01:52.560
be connected to the same PVC. And in this case we have one VMI, a virtual machine instance,

01:52.560 --> 02:01.360
whose job is to synchronize between the source and the target, and this VMI serves as

02:01.360 --> 02:09.200
the point of truth for this synchronization. Remember this, you will need this information

02:09.200 --> 02:19.680
in five minutes. But I'm from the storage team, and we were like: everyone talks about live migration

02:19.680 --> 02:26.960
but we were not invited to the party, and so we invited ourselves. And we heard feedback from

02:26.960 --> 02:36.000
customers, from our users, who say: I want to live migrate my VM, but it uses some old storage

02:36.000 --> 02:45.120
with a ReadWriteOnce Filesystem PVC, but my VMs are so great, I want to keep them running. And we said:

02:45.120 --> 02:54.480
okay, we might have a solution for you. So we said we can do storage live migration. When we do storage

02:54.560 --> 03:04.800
live migration, your VMI still moves from one node to another, but your disk data also needs to

03:04.800 --> 03:15.040
be copied from the source to the target. So it means that there is no point in

03:15.040 --> 03:25.360
time when you have two pods that need to use the same PVC. So it means that even if your VM has

03:26.160 --> 03:33.360
simple storage with the ReadWriteOnce access mode, it can still be live migrated to another node.
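
NOTE
Editor's sketch, not from the talk: the access-mode rule described above can be
expressed as a small decision function. The function name is hypothetical and
is not a KubeVirt API; it only illustrates why ReadWriteMany allows plain compute
live migration while anything else needs the disk data copied to a new PVC.
```python
# Hypothetical helper, not a KubeVirt API: pick the migration type from PVC access modes.
def migration_kind(access_modes):
    """Return which kind of live migration a VM disk allows."""
    if "ReadWriteMany" in access_modes:
        # Source and target virt-launcher pods can attach the same PVC at once.
        return "compute"
    # No shared storage: disk data must be copied to a new PVC during migration.
    return "storage"
print(migration_kind(["ReadWriteOnce"]))
```
A VM with several disks would need storage live migration as soon as any one
of its disks falls into the second branch.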

03:34.800 --> 03:42.640
And this will work for any combination of volume mode and access mode. So you may have

03:42.640 --> 03:49.760
local storage with ReadWriteOnce Filesystem, and then you want to upgrade this storage, like

03:50.640 --> 03:57.040
you buy new storage, you install it, and it has, like, ReadWriteMany Block. You can migrate your VMs

03:57.040 --> 04:04.320
to use this new storage. So this is helpful when you upgrade your infrastructure,

04:04.320 --> 04:11.280
when you want to rebalance your storage, and when you want to retire your legacy storage solutions

04:11.280 --> 04:20.800
and adopt new ones. Now let me show you how you would do it if you want to storage live

04:20.800 --> 04:28.720
migrate one VM. We have a user guide, you can find instructions there, but in general you would

04:28.720 --> 04:34.960
need to update your KubeVirt CR. You may also need to add the feature gate for disk expansion; if

04:34.960 --> 04:42.720
you're using KubeVirt you most likely already have it, because there are very few things you can do without it.

04:42.720 --> 04:50.320
But the things that are related to storage live migration are the rollout strategy and the

04:50.320 --> 05:00.000
workload update methods. So you update your KubeVirt CR, you create your destination data volume with the

05:00.000 --> 05:08.800
blank source and your target storage class name. You can omit the storage class name;

05:08.800 --> 05:18.240
then we will just use whatever you have as the default in the cluster. And then you need to update

05:18.240 --> 05:29.600
your VM spec: just add the update volumes strategy and the new PVC or data volume names. Once you

05:29.600 --> 05:37.360
have updated your VM spec, you will see that a virtual machine instance migration resource has been created

05:37.360 --> 05:47.600
and your VM has started migrating. And in the end you will see your VMI moved to another node,

05:47.600 --> 05:56.400
and you have your new PVC, and your VM is using this new PVC. And we still keep the old PVC.
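
NOTE
Editor's sketch, not from the talk: the VM spec update described above, shown
as a plain dictionary transformation. The field names updateVolumesStrategy,
dataVolume and persistentVolumeClaim are assumptions based on the talk, and
the volume names are made up; check the user guide for your KubeVirt version.
```python
import copy
# Sketch only: field names are assumptions, not a verified KubeVirt schema.
def retarget_volumes(vm, renames):
    """Return a copy of a VM manifest whose volumes point at the new DataVolumes or PVCs."""
    new_vm = copy.deepcopy(vm)
    spec = new_vm["spec"]
    # Ask for the disk data to be migrated rather than the volume simply replaced.
    spec["updateVolumesStrategy"] = "Migration"
    for vol in spec["template"]["spec"]["volumes"]:
        if "dataVolume" in vol:
            old = vol["dataVolume"]["name"]
            vol["dataVolume"]["name"] = renames.get(old, old)
        elif "persistentVolumeClaim" in vol:
            old = vol["persistentVolumeClaim"]["claimName"]
            vol["persistentVolumeClaim"]["claimName"] = renames.get(old, old)
    return new_vm
vm = {"spec": {"template": {"spec": {"volumes": [{"name": "rootdisk", "dataVolume": {"name": "old-dv"}}]}}}}
patched = retarget_volumes(vm, {"old-dv": "new-dv"})
print(patched["spec"]["template"]["spec"]["volumes"][0]["dataVolume"]["name"])
```
The original manifest is left untouched, so the old volume name stays
available for the manual cleanup of the old PVC.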

05:56.560 --> 06:08.560
You can go and clean it up manually. So this was about migrating one VM. If you want

06:08.560 --> 06:16.000
to do it in bulk, we already have an orchestrator for that. It's available in KubeVirt, it's super fresh

06:16.000 --> 06:23.520
from the oven. You just need to install the KubeVirt migration operator. It will install the

06:23.520 --> 06:35.520
controller and all the CRDs you need, and you would just need to create a migration plan. In this case

06:35.520 --> 06:43.120
I'm showing you a multi-namespace migration plan, so you can list the namespaces and the virtual

06:43.120 --> 06:50.480
machines inside those namespaces and the disks that you want to migrate. In the part for the destination

06:50.560 --> 06:57.520
PVC you can specify the storage class name, volume mode, access mode, whatever you want,

06:58.880 --> 07:04.000
and just create the migration resource. Once you create the migration resource, the migration

07:04.000 --> 07:12.480
will start, and the migration resource will report the progress, and the plan will also show you how

07:12.480 --> 07:24.400
it is going. Okay, so we migrated our VMs to the new storage, but then it looked too simple,

07:25.440 --> 07:33.040
and we got feedback from the users who say: yeah, migrating the storage is nice, but

07:33.120 --> 07:43.440
what if I want to move my VM to another cluster? And we're like, okay, moving to another cluster

07:43.440 --> 07:51.040
sounds like a challenge, but it sounds like a very nice one, an interesting one to solve. So

07:53.200 --> 08:00.240
generally it can help you with your upgrades, again with your load balancing,

08:01.200 --> 08:07.360
infrastructure consolidation. If you have a multi-cluster environment you can just easily move your VMs

08:07.360 --> 08:20.240
between the clusters. And one of the use cases, like the most painful use case, is if you have a single-node

08:20.240 --> 08:29.120
cluster and you want to keep your VMs running. Without cross-cluster migration you wouldn't be able

08:29.120 --> 08:38.880
to do any maintenance on this single-node cluster without stopping the VM. So today, with this cross-cluster

08:38.880 --> 08:46.960
live migration, you can move your VM to another cluster, do whatever maintenance you want

08:46.960 --> 08:52.080
to do in your source cluster, and move it back or keep it in the new cluster. So

08:52.800 --> 08:59.280
life becomes easier, and you have more options for how to keep your VMs running.

09:04.160 --> 09:12.240
So this was the why, and now let's talk about the essence of cross-cluster live migration.

09:13.120 --> 09:22.400
It appears that cross-cluster live migration is just a variation of storage live migration.

09:23.440 --> 09:32.800
Let me explain why. You still need to move your VMI from one node to another, you still need to

09:32.800 --> 09:40.560
move your storage from the source to the target, and in this case it just happens that your target

09:41.520 --> 09:50.320
node resides in another cluster. So we said, okay, we're simple storage guys, this is storage

09:50.320 --> 09:59.760
migration everywhere, let's do it across the clusters. So how did we do it?

09:59.920 --> 10:10.560
Libvirt and QEMU give you solutions to live migrate your compute and storage, and this is

10:10.560 --> 10:16.880
not new, and this is already used in different virtualization solutions like

10:16.880 --> 10:25.680
OpenStack, and it's used for compute live migration. But in our case we had a problem:

10:25.760 --> 10:34.080
we need to somehow synchronize between the source and the target, and the source and the

10:34.080 --> 10:41.200
target are not even in one cluster. Because, you remember, when we talk about compute live migration

10:41.200 --> 10:51.040
you have a VMI, which is one VMI, and one VMI can do it all: it can access the source and

10:51.040 --> 11:00.960
the target and do this synchronization. So what we did, essentially the innovation bit that we

11:00.960 --> 11:11.840
added to KubeVirt, is the synchronization controller, which is now responsible for synchronizing

11:11.840 --> 11:22.000
between the source and the target. So we invented the decentralized live migration, that's what we

11:22.000 --> 11:30.720
called it. So with decentralized live migration you now have two VMIs, you have two virtual machine

11:30.720 --> 11:43.520
instance migrations, and you have a sync controller that serves as a source of truth between the

11:44.640 --> 11:56.320
source and destination VMIs. And a very important part of this is

11:56.320 --> 12:03.760
that your virt-handlers need to be connected through the migration network, and your sync controllers

12:03.760 --> 12:14.800
need to be able to talk to each other through the migration network. So now let's see how we would do it

12:14.800 --> 12:26.400
if we want to migrate our VM. Again, we have a user guide, you can find the steps there,

12:26.960 --> 12:37.680
but in general what we need to do in the KubeVirt CR: we need to activate the decentralized live migration

12:37.680 --> 12:42.800
feature gate. Once you activate the feature gate, you see that your virt-synchronization-controller

12:42.800 --> 12:51.680
pods appear in your cluster, and you need to wait for them to start running.

12:52.400 --> 13:00.800
And another thing: you need to specify the migration network. Networking deserves a talk of

13:00.800 --> 13:10.320
its own, and this talk happened here an hour ago. Miguel here explained it, he explained

13:10.480 --> 13:16.320
how you can do it. So once you update the migration network, you will see that your virt-handler pods

13:17.120 --> 13:25.360
will restart, and now they will know that they need to use this network for communication

13:25.360 --> 13:34.800
with the other virt-controller pods, for the migration itself. Okay, and then

13:35.040 --> 13:44.160
we would go to the target cluster and create a target virtual machine resource.

13:45.920 --> 13:52.400
The best way would be to copy the source VM spec and replace the networks and disks and

13:52.400 --> 13:59.520
config maps and secrets, whatever you need, so that it will match what you have in your target

13:59.520 --> 14:08.400
cluster. Then the most important part is to set the run strategy to WaitAsReceiver. WaitAsReceiver

14:08.400 --> 14:18.000
is a special run strategy which means that once you create your VM it does not start running;

14:18.000 --> 14:25.200
it creates the VM, it creates the VMI, but it's waiting for the migration to start and waiting

14:25.360 --> 14:35.200
to receive the data. And you also need to add the annotation for the restore run strategy, so once your migration

14:35.200 --> 14:43.280
is finished you will get the desired run strategy, like a normal run strategy of a VM. It's like,

14:43.360 --> 14:57.360
it can be Always, RerunOnFailure, or Manual. Next step: we are still in the target cluster, and in the
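
NOTE
Editor's sketch, not from the talk: preparing the target VM manifest as
described above, as a plain dictionary transformation. The WaitAsReceiver run
strategy is named in the talk; the annotation key below is an assumption and
should be taken from the user guide. The VM name and uid are made up.
```python
import copy
# Assumed annotation key; verify it against the user guide before using it.
RESTORE_ANNOTATION = "kubevirt.io/restore-run-strategy"
def make_receiver_vm(source_vm, restore_strategy="Always"):
    """Copy a source VM manifest and turn it into a waiting target VM."""
    target = copy.deepcopy(source_vm)
    meta = target.setdefault("metadata", {})
    # Drop fields that only make sense in the source cluster.
    for key in ("uid", "resourceVersion", "creationTimestamp"):
        meta.pop(key, None)
    target.pop("status", None)
    # Run strategy the VM should get back once the migration has finished.
    meta.setdefault("annotations", {})[RESTORE_ANNOTATION] = restore_strategy
    # The VM and VMI get created but wait for incoming migration data.
    target["spec"]["runStrategy"] = "WaitAsReceiver"
    return target
src = {"metadata": {"name": "vm1", "uid": "abc"}, "spec": {"runStrategy": "Always"}, "status": {}}
dst = make_receiver_vm(src)
print(dst["spec"]["runStrategy"], dst["metadata"]["annotations"])
```
Networks, disks, config maps and secrets still have to be adjusted by hand
to match what exists in the target cluster; this sketch does not cover that.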

14:57.360 --> 15:03.120
target cluster we need to create a virtual machine instance migration resource. This one looks

15:03.120 --> 15:11.760
pretty simple: name, namespace, migration ID, whatever ID you want, VMI name, your target VMI

15:11.760 --> 15:21.120
name, super easy. But now let's go to the source. Your source virtual machine instance migration needs to

15:21.120 --> 15:31.040
get the same migration ID, the VMI name, and it needs to get the connect URL. So this is the address,

15:31.040 --> 15:39.200
this is the synchronization address for your virtual machine instance migration to communicate

15:39.200 --> 15:48.320
with the other virt-handler pods in the target cluster. So you can get

15:50.560 --> 15:56.960
this synchronization address from two places. One is the KubeVirt CR: you have this synchronization address

15:56.960 --> 16:04.960
in the KubeVirt CR status. Or another place, if you don't have access to the KubeVirt CR, you can

16:04.960 --> 16:12.720
look at the target VMI YAML, which we created one step back. So once we created it, it will get

16:12.720 --> 16:21.680
the synchronization address in its YAML, so you can take it from there. So

16:21.680 --> 16:27.920
it will be just an IP address and the port, and you just need to copy it here.
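
NOTE
Editor's sketch, not from the talk: the pair of migration resources described
above, built as plain dictionaries. The receive and sendTo field layout is an
assumption based on the talk, and every name, ID and address is a placeholder
(203.0.113.10 is a documentation-only IP). Check the user guide for the schema.
```python
# Sketch only: field layout is assumed, all values are placeholders.
def migration_pair(migration_id, source_vmi, target_vmi, connect_url, namespace="default"):
    """Build the target and source VirtualMachineInstanceMigration manifests."""
    def base(name, vmi):
        return {
            "apiVersion": "kubevirt.io/v1",
            "kind": "VirtualMachineInstanceMigration",
            "metadata": {"name": name, "namespace": namespace},
            "spec": {"vmiName": vmi},
        }
    # Target side: only needs the migration ID and the target VMI name.
    target = base("migration-target", target_vmi)
    target["spec"]["receive"] = {"migrationID": migration_id}
    # Source side: same ID plus the target's synchronization address.
    source = base("migration-source", source_vmi)
    source["spec"]["sendTo"] = {"migrationID": migration_id, "connectURL": connect_url}
    return target, source
target, source = migration_pair("demo-id", "vm1", "vm1", "203.0.113.10:9185")
print(source["spec"]["sendTo"]["connectURL"])
```
The point of the sketch is the pairing: both resources carry the same
migration ID, and only the source carries the connect URL.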

16:28.640 --> 16:42.160
Okay, now requirements. Of course your two clusters need to be somehow connected. It can be an L2 or

16:42.160 --> 16:49.760
L3 network, because your synchronization controllers need to communicate with each other somehow

16:49.840 --> 17:01.760
and your virt-handlers need to communicate with each other. Also your cluster nodes, like source

17:01.760 --> 17:09.200
and target nodes, need to have compatible node architectures and VM CPU models. Generally

17:09.200 --> 17:15.760
it's required for every live migration, but for cross-cluster it's especially worth mentioning.

17:16.720 --> 17:23.040
And also you need to create, or make sure that your target cluster has, all the resources

17:23.040 --> 17:30.880
that your VM requires. So if your VM uses, like, a vGPU, make sure that you have it in the target,

17:30.880 --> 17:36.720
or you can use an orchestrator that can do it for you. We actually have an orchestrator, I will talk

17:36.720 --> 17:44.480
about it a bit later. And another thing that you need to do

17:44.480 --> 17:55.360
is to make your clusters trust each other. So each of your clusters has a KubeVirt instance

17:55.360 --> 18:01.600
and each KubeVirt instance has its own certificate authority. You need to exchange the certificates

18:01.600 --> 18:08.240
between the source and the target cluster so they can exchange the data. You can find

18:08.240 --> 18:16.960
the info on how to do it in the user guide, it's super easy. Yeah, and every time

18:16.960 --> 18:30.560
you talk about security it's super easy. Now, limitations: we will not live migrate, for storage or cross-cluster,

18:30.800 --> 18:36.960
disks that are shareable. And by shareable I'm not talking about ReadWriteMany disks, I'm talking

18:36.960 --> 18:45.360
about ReadWriteMany plus specially configured so that this disk has several writers. You can do it

18:45.360 --> 18:54.240
in KubeVirt, you need a special setup for it, but we cannot guarantee that we will keep your data

18:54.240 --> 19:02.480
consistent if you have multiple writers. Then we also do not support filesystem devices.

19:03.280 --> 19:09.520
Do not confuse them with the Filesystem volume mode; it's not the same, it's different.

19:09.520 --> 19:16.240
Filesystem devices are devices that you list under the domain devices filesystems,

19:17.040 --> 19:26.000
different things. And also LUNs, because we cannot guarantee that you have the same LUNs in the target.
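
NOTE
Editor's sketch, not from the talk: a pre-flight check for the three
limitations listed above. The function is hypothetical, and the field paths
(disks[].shareable, disks[].lun, devices.filesystems) are assumptions about
the VM spec layout; the disk and filesystem names are made up.
```python
# Hypothetical pre-flight check; field paths are assumptions about the VM spec.
def migration_blockers(vm):
    """List reasons a VM cannot be storage or cross-cluster live migrated."""
    devices = vm["spec"]["template"]["spec"]["domain"].get("devices", {})
    reasons = []
    for disk in devices.get("disks", []):
        # Shareable disks may have several writers, so consistency is not guaranteed.
        if disk.get("shareable"):
            reasons.append(f"disk {disk['name']} is shareable")
        # The same LUN may not exist in the target.
        if "lun" in disk:
            reasons.append(f"disk {disk['name']} is a LUN")
    # Filesystem devices are unrelated to the Filesystem volume mode.
    for fs in devices.get("filesystems", []):
        reasons.append(f"filesystem device {fs['name']}")
    return reasons
sample = {"spec": {"template": {"spec": {"domain": {"devices": {
    "disks": [{"name": "root", "disk": {}}, {"name": "shared", "disk": {}, "shareable": True}],
    "filesystems": [{"name": "fs0", "virtiofs": {}}],
}}}}}}
print(migration_blockers(sample))
```
An empty list means none of the three limitations applies; it does not check
the network, CPU model or certificate requirements mentioned earlier.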

19:29.120 --> 19:41.840
And now let's see a little demo, a short demo. It will not be as impressive as Miguel's

19:41.920 --> 19:49.600
demo. I'm not, like, sending ping floods. Network guys, they send the packets, they lose the packets.

19:49.600 --> 19:57.440
Us, we cannot lose the packets. We are chill in storage, we just don't lose the data.

19:58.400 --> 20:09.920
No, okay, I'm not so chill. But okay, so you can see:

20:11.200 --> 20:17.680
on the left is my source cluster, on the right is my target cluster. We create a virtual machine,

20:18.640 --> 20:33.520
it has a data volume, it now pulls the image from a registry. It also has a network.

20:33.520 --> 20:40.520
I've already done it for this

20:43.520 --> 20:48.520
I've already done it for this to import

20:52.520 --> 21:01.520
I just connect to the console and we write some super simple file

21:02.080 --> 21:06.080
a little special

21:13.760 --> 21:25.600
Now we go to the target cluster. I already have the VM YAML prepared for this.

21:31.520 --> 21:43.120
So I create my target VM. You see that the target VMI is also created. It still has the waiting-for-sync

21:43.120 --> 21:50.560
status, so it's just sitting there doing nothing, waiting.

21:50.560 --> 22:00.560
And you can see that I have the annotation for the restore run strategy,

22:06.560 --> 22:10.560
and my run strategy is WaitAsReceiver.

22:20.560 --> 22:31.920
And you see that the data volume also got created, but it's not populated, it's still waiting for the

22:31.920 --> 22:38.400
first consumer. And we create the VMI...

22:39.360 --> 22:44.400
VMIM, sorry.

22:46.400 --> 23:00.400
And now we want to get the connect URL from the KubeVirt CR.

23:08.400 --> 23:15.400
And our source VMIM

23:18.400 --> 23:26.400
has this connect URL. So we create it, and you see the migration started.

23:26.400 --> 23:32.400
This is the target VMIM, it started scheduling,

23:33.400 --> 23:39.400
and the source VM is still running.

23:39.400 --> 24:01.400
and now we connect to the console in the target cluster

24:09.400 --> 24:12.400
And you see it didn't even ask us for a password.

24:12.400 --> 24:19.400
So generally you can watch YouTube on this VM, and it will be migrated across the clusters

24:19.400 --> 24:22.400
and you wouldn't even notice.

24:23.400 --> 24:30.400
So that's it with the demo.

24:30.400 --> 24:36.400
Now, I talked about an orchestrator.

24:36.400 --> 24:43.400
There is an orchestrator for bulk cross-cluster migrations.

24:43.400 --> 24:48.400
It's available and open source, it's called Forklift,

24:48.400 --> 24:54.400
and it also has a nice UI, you can try it.

24:54.400 --> 25:01.400
And then the credits for developing the feature go to Alice Frosi, Alexander Wels, Alex Kalenyuk,

25:01.400 --> 25:07.400
the CDI team, and thanks to Miguel for the demo setup,

25:07.400 --> 25:13.400
and thanks to the community, the KubeVirt community, for reviews and acceptance,

25:13.400 --> 25:20.400
and thank you for watching.