Update HA Pair with Cluster Splitting

Applies to customer-managed instances of Alation

Follow these steps to update your High Availability (HA) pair by first splitting the cluster and then rebuilding it after the update.

Split and Update

To update:

  1. Make sure you have a valid backup or a full system image before you run the update.

  2. Separate both servers into standalone instances. On each of them, enter the Alation shell and run the standalone action:

    sudo /etc/init.d/alation shell
    
    alation_action cluster_enter_standalone_mode
    
  3. Update the Secondary instance first. You do not need to stop the Alation services for the duration of the update.

  4. On the Secondary, outside the Alation shell, unpack the update binary:

    sudo rpm -Uvh <path_to_the_update_package>/<alation-####>.rpm
    

    Note

    If you receive the error headerRead failed: hdr data: BAD, no. of bytes(...) out of range at this step, troubleshoot it using the recommendations in RPM Installation Error During Update.
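
A truncated or corrupted download is a common cause of RPM header errors. Before retrying, you can confirm the package file's integrity. The sketch below is illustrative: the package name is a placeholder, and a stand-in file is created only so the example runs end to end.

```shell
# Illustrative integrity check. The package name is a placeholder; a stand-in
# file is created here only so the sketch is runnable end to end.
pkg="alation-x.y.z.nnnn.rpm"
printf 'placeholder package contents' > "$pkg"
# sha256sum prints a 64-hex-digit digest followed by the file name; compare
# the digest against the checksum provided with your download, if available.
sha256sum "$pkg"
```

In practice, run sha256sum against the actual .rpm path and compare the digest with the checksum provided with your download.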

  5. On the Secondary, outside the Alation shell, initialize the update:

    sudo /etc/init.d/alation update
    
  6. You can monitor the progress by tailing /opt/alation/<alation-####>/var/log/installer.log (path outside of the Alation shell):

    tail -f /opt/alation/<alation-####>/var/log/installer.log
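
Beyond tailing the full log, it can help to filter for problem lines. A minimal sketch, using a stand-in file and sample lines in place of the real installer.log:

```shell
# Sketch: filter a log for problem lines instead of reading the full stream.
# A stand-in file with sample lines is used in place of the real installer.log.
log="installer.sample.log"
printf '%s\n' "INFO starting update step" "ERROR disk space check failed" "INFO step complete" > "$log"
grep -i error "$log"    # prints only the ERROR line
```

Against the live log, you could combine this with tail, e.g. tail -f <log_path> | grep -i --line-buffered error (GNU grep).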
    
  7. After the update of the Secondary server completes, log in to Alation using its IP address and validate the system.

  8. Proceed to update the Primary instance.

  9. On the Primary, launch a screen session so that the update is not interrupted if your SSH connection drops.

  10. On the Primary, outside of the Alation shell, unpack the RPM:

    sudo rpm -Uvh <path_to_the_update_package>/<alation-####>.rpm
    
  11. On the Primary, outside of the Alation shell, initialize the update:

    sudo /etc/init.d/alation update
    
  12. You can monitor the progress by tailing /opt/alation/<alation-####>/var/log/installer.log (path outside of the Alation shell):

    tail -f /opt/alation/<alation-####>/var/log/installer.log
    

    Note that #### represents the Alation version number in x.y.z.nnnn format (x = major, y = minor, z = patch, and nnnn = build), for example:

    tail -f /opt/alation/alation-4.14.7.20232/var/log/installer.log
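
The version components can be picked apart with plain shell parameter expansion. A small sketch using the example build name above:

```shell
# Sketch: split the build directory name from the example above into its
# version components using shell parameter expansion.
dir="alation-4.14.7.20232"
ver="${dir#alation-}"   # 4.14.7.20232
major="${ver%%.*}"      # 4  (x)
rest="${ver#*.}"
minor="${rest%%.*}"     # 14 (y)
rest="${rest#*.}"
patch="${rest%%.*}"     # 7  (z)
build="${rest#*.}"      # 20232 (nnnn)
echo "major=$major minor=$minor patch=$patch build=$build"
```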
    
  13. After the update is completed, log in to Alation using the server IP and validate the update.

  14. Rebuild the HA pair.

Rebuild HA Pair

To rebuild the HA pair:

  1. Put the updated Primary server back into active instance mode. This action is unsafe: it restarts the Alation services. Run this command from the Alation shell:

    alation_action cluster_enter_master_mode
    
  2. On the Secondary instance, disable instance protection. This action is safe. Run this command from the Alation shell:

    alation_conf alation.cluster.protected_instance -s False
    
  3. On the Primary instance, from inside the Alation shell, run the command to add the Secondary server to the cluster. This action is unsafe because it deletes the data of any instance that is not protected from replication:

    alation_action cluster_add_slaves
    
  4. On the Secondary instance, from inside the Alation shell, in a screen session, run the command to copy the KV store over from the Primary instance. This action is unsafe because it deletes all your KV store data on the target machine. It may take from five minutes to one hour depending on the size of the data:

    alation_action cluster_kvstore_copy
    
  5. On the Secondary instance, from inside the Alation shell, in the same screen session, run the command for Postgres replication. This action is unsafe because it deletes all your Postgres data on the target machine. It may take from 30 minutes to eight hours depending on the size of the data:

    alation_action cluster_replicate_postgres
    
  6. Check that replication is running: from the Primary, open <your_alation_URL>/monitor/replication. The page returns the byte lag for Postgres and Mongo (Mongo is present only in releases V R4 and older). Realistic byte-lag values mean replication is running and you have successfully rebuilt your HA pair. A value of “unknown” means replication is not running and may indicate a replication failure.
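
That check can be scripted. The sketch below assumes, per the description above, that the page returns “unknown” when replication is not running; the sample response strings and the helper function are illustrative, and the exact response format may differ on your release.

```shell
# Sketch: classify the /monitor/replication response. The matching below
# assumes the page returns "unknown" when replication is not running, as
# described above; the sample response strings are assumptions.
check_replication() {
  case "$1" in
    *unknown*) echo "replication may have failed" ;;
    *)         echo "replication appears to be running" ;;
  esac
}
check_replication "postgres byte lag: 2048"   # prints: replication appears to be running
check_replication "unknown"                   # prints: replication may have failed
```

You could pass it the live response, e.g. check_replication "$(curl -s <your_alation_URL>/monitor/replication)".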

  7. Synchronize the Event Bus data. From the Secondary, run:

    alation_action cluster_start_kafka_sync