Article Id 197460

Description

 
This article describes how to restore the master role to the cluster's 'preferred' master unit after a fail-over has taken place.
The goal is to illustrate the use of the CLI command 'diag sys ha reset-uptime' in a simple scenario.
 
The command 'diag sys ha reset-uptime' is documented in the 'FortiOS Handbook: High Availability' document available at https://docs.fortinet.com.
 
It is recommended to first read the 'Primary unit selection' chapter of the 'FortiOS Handbook: High Availability'.
 
Note:
The use of 'diag sys ha reset-uptime' is only relevant when the cluster is configured with 'set override disable' (under 'config system ha').
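 
For reference, a minimal sketch of this setting in the CLI (all other HA parameters omitted):
 
config system ha
    set override disable
end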

 

Scope

 

FortiGate.


Solution

 

Pros and cons of 'set override enable':
 
Pros:
  • In a normal situation, the cluster master is the unit with the highest priority, so the master is always the same unit, which makes it easier to identify.
 
Cons:
  • If the master fails and recovers, it triggers a double fail-over: the first is expected, because the other unit takes over; the second happens when the original master comes back up and, since override is enabled, reclaims the master role based on its priority. To avoid this behavior, it is recommended to configure HA with 'set override disable'.
  • If the cluster is set up to monitor a certain link and that link is flapping on one node only, while remaining stable on the other, the fail-over will happen repeatedly, possibly cutting off network access entirely (see the sketch after this list).
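 
The link monitoring mentioned in the last point is configured with 'set monitor' under 'config system ha'. A minimal sketch, with an illustrative interface name:
 
config system ha
    set monitor "port1"     <- a fail-over is triggered if this monitored interface goes down on the master
end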
 
Once the original master has failed over in a 'set override disable' context, how can it be forced to take the master role again?
 
When the HA cluster is configured with 'set override disable', if the original 'Active' unit fails and re-joins the cluster after recovery, it is expected to join with the 'Backup' role (unless the HA uptime difference between the two units is less than 5 minutes; see the HA guide for more details).
 
It may be desirable to have the original master become master again.
This operation needs to be manually triggered by the administrator at a controlled time (during a maintenance window, for instance).
To achieve this, the administrator has to connect to the current master's CLI (console, telnet/SSH CLI, or GUI CLI) and issue the command 'diag sys ha reset-uptime'.
This resets the current master's internal HA uptime to 0, forcing the slave to have the higher HA uptime and therefore be promoted to new master.
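 
A minimal sketch of the procedure, using the standard 'get system ha status' command (its exact output varies by FortiOS release) to confirm the result:
 
On the current master:
    diag sys ha reset-uptime      <- reset this unit's HA uptime to 0
    get system ha status          <- verify that the master/slave roles have switched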
 
To illustrate this, let's use an example:
 
Unit A:
  • is the preferred master
  • has a priority of 200
  • is configured with 'set override disable'
 
Unit B:
  • is the preferred slave
  • has a priority of 100
  • is configured with 'set override disable'
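 
A sketch of the corresponding per-unit HA settings (group name, password and all other parameters omitted):
 
On unit A:
    config system ha
        set override disable
        set priority 200
    end
 
On unit B:
    config system ha
        set override disable
        set priority 100
    end
 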
Timeline:
|
|  t = 0 s : A and B have just booted
|              <- HA uptime difference is less than 5 minutes. As a consequence, it is ignored in the master election process.
|              <- A is promoted to master because its priority is higher than B's (200>100).
|
|
|  t = 1 min : A is rebooted
|              <-  A leaves the cluster but re-joins it as master after 2 minutes.
|                    This is expected because the HA uptime difference between A and B is less than 5 minutes.
|                    As a result, the HA aging condition is ignored in the election algorithm (and A's priority trumps B's priority).
|
|
...
| t = 15 min : A is rebooted again
|              <-   This time A re-joins the cluster as slave,
|                     because the HA uptime difference between A and B is greater than 5 minutes.
|
| The status is now: B=master, A=slave
|
|
|
| t = later... in a maintenance window
|       The administrator wishes to have the preferred master A back as the cluster master.
|       The administrator connects to the CLI of B (the current master) and issues the following command:
|     
|        diag sys ha reset-uptime
|      <- This resets B's internal HA uptime, making A the unit with the higher HA uptime.
|      <- A is promoted to master.
|      <- B is demoted to slave.
|
V
 
Note:
The default 5-minute 'HA uptime difference' margin is configurable.
Use 'set ha-uptime-diff-margin' under 'config system ha' for this (the default remains 5 minutes).
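 
A sketch of lowering the margin (the value is expressed in seconds on current FortiOS releases, so the 5-minute default corresponds to 300):
 
config system ha
    set ha-uptime-diff-margin 60     <- illustrative value: 1 minute
end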
 
The 5-minute margin can be lowered, but in the case of a failure that triggers repeated fail-overs between the units, it may not leave enough time to access a unit and remedy the situation.
 
Why is there a 5-minute margin in the master election process?
 
When two units are booted at more or less the same time, the one with the highest priority should be elected master.
This is needed to consistently have the same unit elected as master.
 
This is even desirable in a virtual cluster context, where virtual cluster 2 is expected to be master on the blade that is slave for virtual cluster 1.
For this, a time window is needed during which the HA aging condition is ignored.

 

How to check the HA uptime difference between members:

 

diagnose sys ha dump-by group
'FGVM16TM24000014': ha_prio/o=0/0, link_failure=0, pingsvr_failure=0, flag=0x00000001, mem_failover=0, uptime/reset_cnt=407/0 <----- '407' is the uptime difference, measured in seconds.
'FGVM16TM24000037': ha_prio/o=1/1, link_failure=0, pingsvr_failure=0, flag=0x00000000, mem_failover=0, uptime/reset_cnt=0/2 <----- '0' is for the device with the lowest HA uptime and '2' is the number of times HA uptime has been reset for this device.

 

The above shows how to identify the HA uptime difference between members. The member with 0 in the uptime column is the device with the lowest HA uptime. The example shows that the device with the serial number ending in 14 has an HA uptime that is 407 seconds higher than that of the other device in the HA cluster. The reset_cnt column indicates the number of times the HA uptime has been reset for that device.