Failback an application


Once your unhealthy Kubernetes cluster is back up and running, the Portworx nodes in that cluster do not immediately rejoin the Portworx cluster. They stay in the Out of Quorum state until you explicitly activate their cluster domain.

After this domain is marked as Active, you can fail back your applications.
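
Before you activate the source cluster domain (us-east-1a in the examples below), you can confirm the current state of the cluster domains. The output below is illustrative only; until the domain is activated it is listed in the INACTIVE column, and its sync state depends on how long it was down:

    storkctl get clusterdomainsstatus
    NAME            LOCAL-DOMAIN   ACTIVE                INACTIVE                 CREATED
    px-dr-cluster   us-east-1a     us-east-1b (InSync)   us-east-1a (NotInSync)   29 Nov 22 22:09 UTC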

The examples on this page use the following conventions. Update them to the appropriate values for your environment:

  • Source Cluster is the Kubernetes cluster which is down and where your applications were originally running. The cluster domain for this source cluster is us-east-1a.
  • Destination Cluster is the Kubernetes cluster to which your applications were failed over. The cluster domain for this destination cluster is us-east-1b.

Reactivate your source cluster domain

Follow these steps from your destination cluster to initiate a failback:

  1. Run the following command to activate the source cluster:

    storkctl activate clusterdomain us-east-1a
    Cluster Domain activate operation started successfully for us-east-1a
  2. Verify that the source cluster domain is activated:

    storkctl get clusterdomainsstatus
    NAME                            LOCAL-DOMAIN   ACTIVE                                     INACTIVE   CREATED
    px-dr-cluster                   us-east-1a     us-east-1a (InSync), us-east-1b (InSync)              29 Nov 22 22:09 UTC
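
As an alternative to the storkctl activate clusterdomain command in step 1, the activation can also be expressed declaratively as a ClusterDomainUpdate resource. The following is a minimal sketch, assuming the stork.libopenstorage.org/v1alpha1 API version and a hypothetical resource name activate-us-east-1a; verify the schema against the Stork version in your environment:

    cat <<EOF | kubectl apply -f -
    apiVersion: stork.libopenstorage.org/v1alpha1
    kind: ClusterDomainUpdate
    metadata:
      name: activate-us-east-1a
    spec:
      # Cluster domain to act on and the desired state (true = activate)
      clusterdomain: us-east-1a
      active: true
    EOF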

Reverse sync your clusters

If the destination cluster has been running applications for some time, the state of your applications might differ from the source cluster: new resources may have been created, or data in your stateful applications may have changed on the destination cluster. To ensure that the source cluster has the most up-to-date applications before you fail back, reverse sync your clusters using the reverse migration schedule that you created previously.
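
For reference, the reverse migration schedule is a regular MigrationSchedule object on the destination cluster that points back at the source through its own cluster pair. The following is an abridged, illustrative view of what such an object might contain, assuming the stork.libopenstorage.org/v1alpha1 API version and the placeholder names used elsewhere on this page; the exact fields and values of your object depend on how you created it:

    kubectl get migrationschedule reversemigrationschedule -n <migrationnamespace> -o yaml
    apiVersion: stork.libopenstorage.org/v1alpha1
    kind: MigrationSchedule
    metadata:
      name: reversemigrationschedule
      namespace: <migrationnamespace>
    spec:
      schedulePolicyName: <your-schedule-policy>
      suspend: true            # currently suspended; step 1 below resumes it
      template:
        spec:
          clusterPair: <your-remote-clusterpair>
          includeResources: true
          startApplications: false
          namespaces:
          - <migrationnamespace>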

  1. Activate your reverse migration schedule on your destination cluster:

    storkctl resume migrationschedule reversemigrationschedule -n <migrationnamespace>
  2. Verify that at least one migration cycle has completed successfully:

    storkctl get migration -n <migrationnamespace>
    NAME                                                  CLUSTERPAIR                 STAGE   STATUS       VOLUMES   RESOURCES   CREATED               ELAPSED                               TOTAL BYTES TRANSFERRED
    reversemigrationschedule-interval-2023-02-01-201747   <your-remote-clusterpair>   Final   Successful   0/0       4/4         01 Feb 23 20:17 UTC   Volumes () Resources (21.71709746s)   0
  3. Deactivate the reverse migration schedule:

    storkctl suspend migrationschedule reversemigrationschedule -n <migrationnamespace>
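
After suspending the reverse schedule, you can optionally confirm that it is no longer active. The output below is illustrative; the SUSPEND column should read true for the reverse schedule:

    storkctl get migrationschedule -n <migrationnamespace>
    NAME                        POLICYNAME               CLUSTERPAIR                 SUSPEND   LAST-SUCCESS-TIME     LAST-SUCCESS-DURATION
    reversemigrationschedule    <your-schedule-policy>   <your-remote-clusterpair>   true      01 Feb 23 20:17 UTC   22s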

Stop the application on the destination cluster

Stop the applications on the destination cluster by setting the replica count of your deployments and statefulsets to 0:

storkctl deactivate migration -n <migrationnamespace>
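
To confirm that the workloads on the destination cluster have been scaled down before you fail back, check their replica counts. The output below is illustrative and assumes the same ZooKeeper example used later on this page, with a StatefulSet named zk:

    kubectl get statefulsets -n <migrationnamespace>
    NAME   READY   AGE
    zk     0/0     2d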

Start the application back on the source cluster

  1. After you have stopped the applications on the destination cluster, start them on the source cluster by restoring their replica counts:

    storkctl activate migration -n <migrationnamespace>
  2. Verify that your application pods (for example, ZooKeeper) are up and running:

    kubectl get pods -n <migrationnamespace>
    NAME   READY   STATUS        RESTARTS   AGE
    zk-0   1/1     Running       0          4m
    zk-1   1/1     Running       0          5m
    zk-2   1/1     Running       0          7m
  3. Resume the migration schedule:

    storkctl resume migrationschedule migrationschedule -n <migrationnamespace>
    MigrationSchedule migrationschedule resumed successfully
  4. Verify that the migration schedule is active:

    storkctl get migrationschedule -n <migrationnamespace>
    NAME                POLICYNAME               CLUSTERPAIR               SUSPEND   LAST-SUCCESS-TIME     LAST-SUCCESS-DURATION
    migrationschedule   <your-schedule-policy>   <your-clusterpair-name>   false     01 Dec 23 22:25 UTC   10s

    The false value in the SUSPEND column shows that the migration schedule is active on the source cluster, which means your application has successfully failed back to your source cluster.
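
As a final check, you can confirm on the source cluster that the workload replica counts have been restored along with the pods. The output below is illustrative and assumes a ZooKeeper StatefulSet named zk with three replicas:

    kubectl get statefulsets -n <migrationnamespace>
    NAME   READY   AGE
    zk     3/3     6m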


