A Comprehensive Guide to K8S Rancher Cluster Migration


Managing multiple Kubernetes clusters can be a daunting task, but Rancher simplifies the process by centralizing access, administration, and provisioning. However, when it comes to migrating clusters provisioned through Rancher, some limitations can arise that you need to be aware of.

Why Migrate Your K8S Rancher Cluster?

Rancher’s provisioning capabilities allow you to deploy clusters using various Kubernetes distributions and cloud providers through a single Terraform provider. However, one significant limitation is that you cannot unregister clusters provisioned through Rancher without destroying them. In contrast, imported clusters can be detached from one Rancher instance and re-imported into another with little effort.

Note: This article shares personal experiences and should not be considered an official procedure. Always back up your data before making any changes. For backups, I highly recommend using Velero, which we’ll discuss in a future article.

Current Scenario

In our case, we had an older Rancher instance running on an RKE cluster that we aimed to de-provision. Over time, we managed to detach several clusters as they were decommissioned. However, we still had two Azure AKS clusters, labeled “Rancher-Owned,” that were not scheduled for decommissioning anytime soon. Given that these are sensitive and longstanding production environments, we wanted to avoid extensive platform reconstruction.

Procedure Overview

The clusters in question are Azure AKS, but this procedure should work for other cluster types as long as you can generate a KUBECONFIG file independent of Rancher.

Step 1: Generate a KUBECONFIG

The first step is to create a local KUBECONFIG file, enabling you to manage the cluster directly via the API without going through Rancher. For Azure AKS, this is done using the Azure CLI:

export KUBECONFIG=~/.kube/tmp-aks.yaml
az aks get-credentials --resource-group MyRG --name MyCluster --file "$KUBECONFIG"

After running the above commands, you can verify the nodes in your cluster:

kubectl get nodes

Step 2: Clean Up Rancher

Next, you need to remove the Rancher agent and any traces of Rancher from the cluster. Use the Rancher Cleanup tool to do this:

git clone https://github.com/rancher/rancher-cleanup.git
cd rancher-cleanup
kubectl create -f deploy/rancher-cleanup.yaml

You can monitor the progress of this operation with:

kubectl -n kube-system logs -l job-name=cleanup-job -f

Once the job has completed, don’t forget to delete it so it can’t be re-run accidentally later:

kubectl delete -f deploy/rancher-cleanup.yaml

Now, your cluster should be restored to a standard Azure AKS environment without any Rancher traces.
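As a sanity check, a couple of quick greps can confirm that the cleanup actually removed everything. The cattle-* namespaces and rancher-named admission webhooks below are the usual artifacts a Rancher agent leaves behind; this is a rough spot check, not an exhaustive audit:

```shell
# Spot-check for leftover Rancher artifacts after the cleanup job.
# If nothing matches, the fallback message is printed instead.
kubectl get namespaces 2>/dev/null | grep -i cattle \
  || echo "no cattle-* namespaces remain"
kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations 2>/dev/null \
  | grep -i rancher \
  || echo "no Rancher admission webhooks remain"
```

If either grep still returns results, re-check the cleanup job logs before moving on.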

Step 3: Remove the Cluster from Rancher

At this point, your old Rancher instance’s UI will likely display several error messages indicating communication issues with the cluster. This is expected, since the Rancher agent and its RBAC service account have been removed. However, you cannot delete the cluster through the UI just yet: because Rancher still considers it a provisioned cluster, deleting it would trigger Rancher to destroy the underlying AKS resources.

To resolve this, you’ll need to edit the cluster configuration in Rancher. This configuration is stored as a Kubernetes custom resource on the local cluster that runs Rancher itself, which you can access directly with kubectl.

First, retrieve the cluster ID from the Rancher dashboard URL, which looks like this: https://rancher.mydomain.com/dashboard/c/c-12345/explorer#cluster-events.
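If you prefer the command line, the ID (the c-xxxxx segment) can also be pulled out of the dashboard URL with a quick sed one-liner, shown here against the example URL above:

```shell
# Extract the cluster ID from a Rancher dashboard URL.
url="https://rancher.mydomain.com/dashboard/c/c-12345/explorer#cluster-events"
cluster_id=$(echo "$url" | sed -n 's|.*/dashboard/c/\([^/]*\)/.*|\1|p')
echo "$cluster_id"   # c-12345
```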

Then, you’ll want to remove the finalizers that handle complete deprovisioning when the cluster is deleted:

kubectl get clusters.management.cattle.io
kubectl get clusters.management.cattle.io c-12345 -o jsonpath='{.metadata.finalizers}'
kubectl patch clusters.management.cattle.io c-12345 -p '{"metadata":{"finalizers":null}}' --type=merge

After running this command, you can return to the Rancher UI, navigate to Cluster Management, and proceed to delete the cluster.

Step 4: Reintegrate into the Destination Rancher Instance

Finally, you can follow the standard procedure to import your AKS cluster into the new Rancher instance. If you’re using Terraform, take your existing cluster code and set imported = true. Just be cautious about potential drift: run terraform plan first and make sure it doesn’t propose replacing the cluster:

resource "rancher2_cluster" "my-aks-to-import" {
  name        = "my-aks-to-import"
  description = "Terraform AKS Cluster"
  aks_config_v2 {
    cloud_credential_id = rancher2_cloud_credential.aks.id
    name                = var.aks_name
    resource_group      = var.aks_resource_group
    resource_location   = var.aks_region
    imported            = true
  }
}

After a few minutes, your cluster should be available in the new Rancher instance, ready for further management.

Conclusion

Migrating a K8S Rancher cluster doesn’t have to be complicated. By following the steps outlined in this guide, you can effectively manage your Kubernetes clusters without facing the common pitfalls associated with Rancher’s provisioning model. Remember to back up your data and proceed cautiously to ensure a smooth migration.