Migration Best Practices
When we run a mainnet validator for a network, we always try to run a fully-synced non-validator node as backup. Below is the details about practicing validator swap with a fully synced node.
Scenario
We have two nodes: Node A (validator) and Node B (fully-synced non-validator). Both are running under the management of cosmovisor. We have a backup copy of priv_validator_key.json
from Node A. We want to move the validator role from Node A to Node B.
Best Practice
Double Check: Double check that we have the correct backup for
priv_validator_key.json
.Stop Node A: Stop
cosmovisor
service on Node A. Double check the service status to make sure it is off. On pundiscan explorer, you should see the validator missing blocks.Restart Node A As Non-Validator: Delete
priv_validator_key.json
from Node A and restartcosmovisor
service. Check 3 places to ensure no double-signing. First, the newly generatedpriv_validator_key.json
should be different from the one on the backup. Second, on pundiscan explorer, you should see the validator missing blocks. Third, run aBINARY status
command and make sure to seeVotingPower
as 0.Change Validator Role to Node B: Stop Node B's
cosmovisor
service. Replace Node B'spriv_validator_key.json
with the backup copy from A. Restart B'scosmovisor
service. Make sure it is making blocks on pundiscan explorer.House Clean: The specifics only apply to how we set up the cluster. You should adapt according to your setup.
Swap the server name with "sudo hostname xxx" on each server (for example, between "fxcore_mainnet" and "fxcore_mainnet_backup")
Swap the server shortcut in ~/.ssh/config file for each ssh login
Swap the log name in the promtail.yml file on each server and restart promtail.
Swap the server hosts in the monitoring server deployment script so Prometheus and Grafana get the servers right. Redeploy the monitoring script
Update the server names on the server provider (Hetzner, Contabo, AWS, GCP, Alicloud) to avoid confusion
Update the inventory file in the cosmos server deployment script. This is not very important since the deployment of both servers are completed. However, it is so to avoid future confusions because both servers will still be running, although in the opposite role of validator vs. non-validator.
Last updated