MOVEit Automation: Failover node cannot save the state
What to do if your failover node isn't communicating.
There may be times when your primary and secondary nodes stop communicating in MOVEit Automation. Whether you use the desktop app or the web app, you may see a notification that your nodes cannot save the state file from the primary.
If this happens, there is a way to fix it. The cause is that one of the state files for a task is corrupted. You can identify the task by checking your logs for the task ID number. For this example, we will use the task ID 216657478.
In your logs you should see an entry similar to the following:
2023-08-13 03:43:44 z1 2250: FO: Command GETSTATEFORTASK failed: Failed to load state XML
2023-08-13 03:43:44 z2 2250: FO: Failed to send GETSTATEFORTASK command for task ID 216657478
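If the log is long, a small script can pull out the task IDs for you. The following is a minimal sketch in Python, assuming your log entries follow the same wording as the sample above; the function name find_failed_task_ids is made up for this illustration and is not part of MOVEit Automation.

```python
import re

# Matches the "Failed to send GETSTATEFORTASK" entry shown in the sample log.
# Assumption: your log uses the same wording as the sample above.
TASK_ID_PATTERN = re.compile(
    r"FO: Failed to send GETSTATEFORTASK command for task ID (\d+)"
)


def find_failed_task_ids(log_lines):
    """Return the unique task IDs found in the log lines, in first-seen order."""
    task_ids = []
    for line in log_lines:
        match = TASK_ID_PATTERN.search(line)
        if match and match.group(1) not in task_ids:
            task_ids.append(match.group(1))
    return task_ids


sample_log = [
    "2023-08-13 03:43:44 z1 2250: FO: Command GETSTATEFORTASK failed: Failed to load state XML",
    "2023-08-13 03:43:44 z2 2250: FO: Failed to send GETSTATEFORTASK command for task ID 216657478",
]

print(find_failed_task_ids(sample_log))
```

To use it on a real log, read the file's lines and pass them to the function in place of sample_log.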
This is how you can identify the task ID. Once you have the task ID, go into your logs folder for that ID and open one of the task run logs. Within it you will be able to see the name of the task. For this example, we will call the task “Test task”.
Open the task in the web admin or the desktop app and review it. What we want to find out is whether the task is set to collect only new files. If it is not, you can skip to the resolution below. If it is, you will need to arrange to remove any files MOVEit Automation has previously interacted with before continuing, so that those files are not processed more than once.
To resolve the issue, stop the services on both nodes. Start with the secondary node and then the primary, to prevent the system from failing over. Next, find the task folder inside your Statefiles folder. The default location is C:/Program files/MOVEit/MOVEit automation/Statefiles/. Once inside, identify the task's folder by its task ID and delete this folder on both nodes.
Once complete, start the services again, first on the primary and then on the secondary. Replication should now continue between both nodes.
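If you prefer to script the folder removal step, the following is a rough sketch in Python rather than an official tool. It assumes the task's folder inside the Statefiles folder is named exactly after the task ID, which you should confirm on your own system first. The function names are made up for this illustration. It defaults to a dry run that only lists what would be deleted, and it should only be run on each node while the services on both nodes are stopped.

```python
import shutil
from pathlib import Path


def find_task_state_folders(statefiles_dir, task_id):
    """List subfolders of statefiles_dir whose name equals the task ID.

    Assumption: the task's state folder is named exactly after the task ID.
    """
    root = Path(statefiles_dir)
    return sorted(
        entry for entry in root.iterdir()
        if entry.is_dir() and entry.name == str(task_id)
    )


def remove_task_state(statefiles_dir, task_id, dry_run=True):
    """Return the matching folders; delete them only when dry_run is False."""
    targets = find_task_state_folders(statefiles_dir, task_id)
    if not dry_run:
        for target in targets:
            shutil.rmtree(target)
    return targets
```

For example, remove_task_state("C:/Program files/MOVEit/MOVEit automation/Statefiles/", 216657478) lists the matching folder without touching it. Pass dry_run=False once you have checked the result, and repeat on the other node before restarting the services.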