Fail over an application

In a disaster where one of your Kubernetes clusters is down and inaccessible, you can fail over the applications running on it to an operational Kubernetes cluster. To do this, stop your application on the source cluster and start it on an active Kubernetes cluster.

The examples on this page make the following assumptions. Update the values to match your environment:

  • Source Cluster is the Kubernetes cluster which is down and where your applications were originally running. The cluster domain for this source cluster is us-east-1a.
  • Destination Cluster is the Kubernetes cluster where the applications will be failed over. The cluster domain for this destination cluster is us-east-1b.

Deactivate the failed cluster domain

To fail over an application, you need to tell Stork and Portworx that one of your Kubernetes clusters is down. You do this by marking the cluster domain of the source cluster as inactive.

  1. Run the following command to deactivate the source cluster. You need to run this command on the destination cluster where Portworx is still running:

    storkctl deactivate clusterdomain us-east-1a
    Cluster Domain deactivate operation started successfully for us-east-1a
  2. Verify that your source cluster domain has been deactivated:

    storkctl get clusterdomainsstatus
    NAME                            LOCAL-DOMAIN   ACTIVE                           INACTIVE                         CREATED
    px-dr-cluster                   us-east-1a     us-east-1b (SyncStatusUnknown)   us-east-1a (SyncStatusUnknown)   29 Nov 22 22:09 UT

    The cluster domain of your source cluster is listed under INACTIVE, which indicates that it has been deactivated.
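
If you want to script the verification step, the check can be sketched as a small shell helper. This is only a sketch: the function name domain_is_inactive is hypothetical, and it assumes that the output keeps the column layout shown above (columns separated by two or more spaces, with INACTIVE as the fourth column).

```shell
#!/bin/sh
# Hypothetical helper, not part of storkctl.
# Reads `storkctl get clusterdomainsstatus` output on stdin and succeeds
# if the given cluster domain appears in the INACTIVE column.
# Assumption: columns are separated by two or more spaces and INACTIVE
# is the fourth column, as in the sample output on this page.
domain_is_inactive() {
  awk -v domain="$1" -F '  +' '
    NR > 1 && index($4, domain) > 0 { found = 1 }  # skip the header row
    END { exit !found }
  '
}

# Example usage (run on the destination cluster):
#   storkctl get clusterdomainsstatus | domain_is_inactive us-east-1a \
#     && echo "us-east-1a is inactive"
```

The helper only reads text from standard input, so you can try it against saved command output before wiring it into automation.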

Stop the application on the source cluster (if accessible or applicable)

If your source Kubernetes cluster is still alive and accessible, Portworx, Inc. recommends that you stop the applications before failing them over to the destination cluster.

Stop the applications by manually setting the replica count of your Deployments and StatefulSets to 0. This way, the application resources persist in Kubernetes, but the application itself does not run.

kubectl scale --replicas 0 statefulset/<your-app-name> -n <migrationnamespace>

The above command scales the StatefulSet of your application in the <migrationnamespace> namespace down to 0 replicas.
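
If a namespace contains several workloads, scaling them one by one can be tedious. The following is a minimal sketch that scales every Deployment and StatefulSet in a namespace to 0 by using the --all flag of kubectl scale. The function name scale_down_namespace and the KUBECTL override variable are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Hypothetical helper: scale all Deployments and StatefulSets in one
# namespace down to 0 replicas.
# KUBECTL is an illustrative override (for example, for a dry run with
# KUBECTL=echo); it defaults to the real kubectl binary.
scale_down_namespace() {
  ns="$1"
  for kind in deployment statefulset; do
    # kubectl may report an error for a kind that has no objects in the
    # namespace; this sketch continues with the next kind regardless.
    "${KUBECTL:-kubectl}" scale --replicas=0 "$kind" --all -n "$ns"
  done
}

# Example usage (run against the source cluster):
#   scale_down_namespace <migrationnamespace>
```

Because this affects every Deployment and StatefulSet in the namespace, use it only when the whole namespace belongs to the application you are failing over.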

Suspend the migrations on the source cluster (if accessible)

NOTE: Skip this section if autoSuspend is set. In that case, your migration schedules on the source cluster are suspended automatically, and you can proceed to the next section.

  1. Once the replicas for your application’s StatefulSet are set to 0, suspend the migration schedule on the source cluster. This ensures that your application’s StatefulSets on the destination cluster are not updated to 0 replicas. Run the following command to suspend the migration schedule:

    storkctl suspend migrationschedule migrationschedule -n <migrationnamespace>
  2. Verify that the schedule has been suspended:

    storkctl get migrationschedule -n <migrationnamespace>
    migrationschedule   <your-schedule-policy>   <your-clusterpair-name>   true      01 Dec 22 23:31 UTC   10s
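
The suspension check can also be scripted. The sketch below assumes that the fourth whitespace-separated column of the output is the suspend flag, as in the sample line above where it reads true. The function name schedule_is_suspended is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of storkctl.
# Reads `storkctl get migrationschedule -n <namespace>` output on stdin
# and succeeds if the named schedule shows "true" in the fourth column.
# Assumption: the fourth column is the suspend flag, as in the sample
# output on this page.
schedule_is_suspended() {
  awk -v name="$1" '
    $1 == name && $4 == "true" { found = 1 }
    END { exit !found }
  '
}

# Example usage (run against the source cluster):
#   storkctl get migrationschedule -n <migrationnamespace> \
#     | schedule_is_suspended migrationschedule
```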

Start the application on the destination cluster

You can have Stork activate migration either in all namespaces at once or in one namespace at a time. For performance reasons, if your migration schedule covers a large number of namespaces, Portworx, Inc. recommends that you activate one namespace at a time.

  1. Each application spec has an annotation that records its replica count on the source cluster. Run the following command to set the replica count of your application in a single namespace to the same number as on your source cluster:

    storkctl activate migration -n <migrationnamespace>

    To activate migration in all namespaces instead, run the following command:

    storkctl activate migration --all-namespaces

    Stork looks for the replica-count annotation and automatically scales each application to the correct number of replicas. Once the replica count is updated, the application starts running and the failover is complete.

  2. Verify that your application is up and running:

    kubectl get pods -n <migrationnamespace>
    zk-0   1/1     Running   0             3m18s
    zk-1   1/1     Running   0             2m54s
    zk-2   1/1     Running   0             99s

    All application pods (for example, the ZooKeeper pods) in the <migrationnamespace> namespace have the Running status, which indicates that your application is operational.
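
The two steps above can be sketched as a script that activates namespaces one at a time and then checks the pods. The function names activate_one_by_one and all_pods_running and the STORKCTL override variable are hypothetical. The pod check assumes the default kubectl get pods column order (NAME, READY, STATUS) with the header row suppressed by --no-headers.

```shell
#!/bin/sh
# Hypothetical helper: activate migration one namespace at a time and
# stop at the first failure.
# STORKCTL is an illustrative override; it defaults to storkctl.
activate_one_by_one() {
  for ns in "$@"; do
    "${STORKCTL:-storkctl}" activate migration -n "$ns" || return 1
  done
}

# Hypothetical helper: reads `kubectl get pods --no-headers` output on
# stdin and succeeds only if at least one pod is listed and every pod
# is Running with all of its containers ready (for example, 1/1).
all_pods_running() {
  awk '
    {
      seen = 1
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
    }
    END { exit !(seen && !bad) }
  '
}

# Example usage (run against the destination cluster):
#   activate_one_by_one <namespace-1> <namespace-2>
#   kubectl get pods -n <namespace-1> --no-headers | all_pods_running
```

Pods can take some time to start after activation, so a failed check right after activation does not necessarily mean the failover failed. Rerun the check after a short wait.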

Last edited: Tuesday, May 9, 2023