Update HA Pair with Cluster Splitting¶
Applies to customer-managed instances of Alation.
Follow these steps to update your High Availability (HA) pair by first splitting the cluster and rebuilding it after the update.
Split and Update¶
To update:
Make sure you have a valid backup or a full system image before you run the update.
Separate both servers into standalone instances. On each server, enter the Alation shell and run the standalone action:
sudo /etc/init.d/alation shell
alation_action cluster_enter_standalone_mode
Start by updating the Secondary instance. It is not required to stop the Alation services for the duration of the update.
On the Secondary instance, disable the scheduled queries. See Enable or Disable Query Scheduling.
Start a Screen session to keep track of the update output.
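For example, a named Screen session can be started and later reattached if your SSH connection drops (the session name here is illustrative; any name works):

```shell
# Start a named Screen session for the update
screen -S alation-update

# Detach at any time with Ctrl-A d, then reattach later with:
screen -r alation-update
```

Running the update inside Screen keeps it alive even if your terminal disconnects mid-update.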
On Secondary, outside the Alation shell: unpackage the update binary:
sudo rpm -Uvh <path_to_the_update_package>/<alation-####>.rpm
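To confirm the package version now registered on the host, you can query the RPM database; the package name pattern below is an assumption, so adjust it to match your actual package name:

```shell
# List installed Alation packages and their versions (name pattern is illustrative)
rpm -qa | grep -i alation
```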
Note
If you receive the error headerRead failed: hdr data: BAD, no. of bytes(...) out of range at this step, troubleshoot using the recommendations in RPM Installation Error During Update.
On Secondary, outside the Alation shell: initialize the update:
sudo /etc/init.d/alation update
You can monitor the progress by tailing /opt/alation/<alation-####>/var/log/installer.log (path outside of the Alation shell):
tail -f /opt/alation/<alation-####>/var/log/installer.log
After the update of the Secondary server is completed, log in to Alation using the IP address and validate the system.
Proceed to update the Primary instance.
On Primary, launch a Screen session.
On Primary, outside of the Alation shell: unpackage the RPM.
sudo rpm -Uvh <path_to_the_update_package>/<alation-####>.rpm
On Primary, outside of the Alation shell: initialize the update.
sudo /etc/init.d/alation update
You can monitor the progress by tailing /opt/alation/<alation-####>/var/log/installer.log (path outside of the Alation shell):
tail -f /opt/alation/<alation-####>/var/log/installer.log
Note that #### represents the Alation version number in x.y.z.nnnn format (x = major, y = minor, z = patch, and nnnn = build), for example:
tail -f /opt/alation/alation-4.14.7.20232/var/log/installer.log
After the update is completed, log in to Alation using the server IP and validate the update.
Rebuild the HA pair.
Rebuild HA Pair¶
To rebuild the HA pair:
Put the updated Primary server back into the active instance mode. This action is unsafe: it includes a restart of the Alation services. Run this command from the Alation shell:
alation_action cluster_enter_master_mode
On the Secondary instance, disable instance protection. This action is safe. Run this command from the Alation shell:
alation_conf alation.cluster.protected_instance -s False
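To verify the change took effect, you can read the key back; this sketch assumes that running alation_conf with a key and no -s flag prints the current value (run from inside the Alation shell):

```shell
# Read back the protection flag to confirm it is now set to False (assumed behavior)
alation_conf alation.cluster.protected_instance
```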
On the Primary instance, from inside the Alation shell, run the command to add the Secondary server to the cluster. This action is unsafe as it deletes any instance that is not protected from replication.
alation_action cluster_add_slaves
On the Secondary instance, from inside the Alation shell, in a Screen session, run the command to copy the KV Store over from the Primary instance. This action is unsafe because it deletes all your KV Store data on the target machine. This may take from five minutes to one hour depending on the size of the data.
alation_action cluster_kvstore_copy
On the Secondary instance, from inside the Alation shell, in the same Screen session, run the command for the Postgres replication. This action is unsafe because it deletes all your Postgres data on the target machine. This may take from 30 minutes up to 8 hours depending on the size of the data.
alation_action cluster_replicate_postgres
Check that replication is happening: from the UI of the Primary, open the following URL: <your_alation_URL>/monitor/replication. It will return the byte lag for Postgres and Mongo (Mongo is only present in V R4 and older releases). If replication is running, it returns realistic byte lag values, which means you have successfully rebuilt your HA pair. If replication is not running, it returns “unknown”, which may indicate a replication failure.
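The replication check can also be done from a command line instead of a browser; this sketch assumes the monitor endpoint is reachable from your workstation and returns the byte-lag values as plain text (replace the placeholder with your actual Alation URL):

```shell
# Query the replication monitor endpoint on the Primary
curl -s "<your_alation_URL>/monitor/replication"
# A response of "unknown" suggests replication is not running
```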
Synchronize the Event Bus data. From the Secondary, run:
alation_action cluster_start_kafka_sync