FortiGate
ssener
Staff
Article Id 221176
Description

This article describes an issue where, while upgrading a two-chassis active-passive (a-p) environment, the secondary chassis upgrades itself but the failover does not take place automatically.


ssener_0-1660829155782.png

 

The upgrade process is initiated from the master. The slave upgrades itself and the master reports that the slave is back online, but the failover never takes place.

The master counts down until the upgrade times out, the slave reverts its firmware to the initial version, and the upgrade process ends.

(Both members remain on the starting FortiOS version and no failover happens, so there is no service disruption, but the upgrade is not completed either.)

The Comlog of the slave reports one successful upgrade, followed 20 minutes later by a rollback message:

 

ssener_1-1660829601085.png

 

(If console connectivity is available, the same messages can be tracked through the console as well.)

The upgrade failure does not occur on every upgrade path.

Remark: the 6.2.8 to 6.4.4 upgrade completes without issues. This article applies only to upgrades that do not complete.

Scope

5K - SLBC (a-p) / 6.2.8 -> 6.4.7 upgrade
Solution

In the expected workflow, after the master reports that all members of the slave chassis are up, a failover takes place with a 'Force-to-' flag and the initial master then upgrades itself.

 

When this does not happen, forcing the failover manually once the slave firewall blade is up and running allows the upgrade to complete.

 

The following steps help complete an upgrade that would otherwise fail:

 

1) Perform the regular pre-upgrade checks (take a backup, etc.) and verify that the cluster is in sync.


2) Initiate the upgrade from the master.


3) Wait for the slave to boot up.


4) Keep checking the Comlog (or console) of the MASTER. It keeps reporting messages such as 'Image upgrade in progress. 19 minutes before aborting'.

 

ssener_2-1660829869852.png

 

Seeing the 'All members of the secondary chassis are up' message on the master FortiGate blade is a prerequisite to proceed.


5) Verify that the slave booted up without configuration errors using 'get system startup-error-log', and check that there is no major config loss. Verify that the blades on the slave chassis are in the running state.


6) On the master FortiGate Comlog, once the message 'Image upgrade in progress. 7 minutes before aborting' from the controller is seen (7 is not an exact number; just give the chassis 10-12 minutes), force a failover.

Remark: DO NOT trigger the failover using PRIORITY!


7) Assuming chassis 1 is the master (the upgrade was initiated from the chassis 1 blade), the command is 'diagnose system ha force-slave-state by-chassis 3 1'
(3 means a 3-second delay; 1 means chassis ID 1 will be forced to slave.)


8) Once the failover is triggered, the Comlog of the INITIAL (now former) master will report the following:

 

ssener_3-1660830098082.png


9) Nothing special needs to be done after this step. Verify that the initial master is also upgraded to the same firmware and that it reports 'Config-Sync: Slave'.


10) Verify that the cluster is back in sync using the command 'diagnose sys confsync status' (this can take 5-10 minutes).
Both members should report in_sync=1.

 

11) Once the upgrade is finished and the units are in sync, remove the 'force-slave-state' flag using the command 'diagnose system ha force-slave-state clear 3'
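
The sync check in step 10 can also be scripted against captured command output. The following is a minimal Python sketch, not an official tool: the function name cluster_in_sync and the sample text are hypothetical, and it only assumes that the output of 'diagnose sys confsync status' contains one 'in_sync=<n>' field per listed member, as described in step 10. The real output has more fields, and their layout may differ between FortiOS versions.

```python
import re

# Matches the in_sync field mentioned in step 10, e.g. "in_sync=1".
IN_SYNC_RE = re.compile(r"\bin_sync=(\d+)")


def cluster_in_sync(status_output, expected_members=2):
    """Return True when at least `expected_members` in_sync fields are
    present in the captured output and every one of them equals 1.

    `status_output` is assumed to be the text captured from
    'diagnose sys confsync status' (for example, saved from a terminal session).
    """
    flags = IN_SYNC_RE.findall(status_output)
    return len(flags) >= expected_members and all(f == "1" for f in flags)


if __name__ == "__main__":
    # Hypothetical, heavily simplified sample; real output has more fields.
    sample = "blade-a, Master, in_sync=1\nblade-b, Slave, in_sync=1\n"
    print(cluster_in_sync(sample))
```

Because synchronization can take several minutes, a result of False shortly after the failover is expected; rerun the command and the check before clearing the 'force-slave-state' flag.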

 

Information on the Comlog feature: https://community.fortinet.com/t5/FortiGate/Technical-Tip-How-to-use-COMLog-feature/ta-p/195390
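
The Comlog monitoring in steps 4 to 7 can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names (minutes_before_abort, ready_to_force_failover, force_slave_command) are hypothetical, the Comlog text is assumed to have been captured as a plain string, and the message wording is taken from this article. Actual Comlog lines may carry extra prefixes such as timestamps. The default threshold of 7 minutes mirrors step 6 and is not an exact value.

```python
import re

# Countdown message quoted in steps 4 and 6 of this article.
COUNTDOWN_RE = re.compile(
    r"Image upgrade in progress\. (\d+) minutes? before aborting"
)
# Prerequisite message quoted after step 4.
SECONDARY_UP = "All members of the secondary chassis are up"


def minutes_before_abort(comlog_text):
    """Return the most recent countdown value in the text, or None if absent."""
    matches = COUNTDOWN_RE.findall(comlog_text)
    return int(matches[-1]) if matches else None


def ready_to_force_failover(comlog_text, threshold_minutes=7):
    """True when the secondary chassis is reported up and the latest
    countdown value is at or below the (adjustable) threshold."""
    remaining = minutes_before_abort(comlog_text)
    return (
        SECONDARY_UP in comlog_text
        and remaining is not None
        and remaining <= threshold_minutes
    )


def force_slave_command(chassis_id, delay_seconds=3):
    """Build the step 7 command for the chassis that must become slave."""
    return (
        "diagnose system ha force-slave-state by-chassis "
        f"{delay_seconds} {chassis_id}"
    )


if __name__ == "__main__":
    log = (
        "Image upgrade in progress. 19 minutes before aborting\n"
        "All members of the secondary chassis are up\n"
        "Image upgrade in progress. 7 minutes before aborting\n"
    )
    if ready_to_force_failover(log):
        print(force_slave_command(1))
```

The sketch only decides when the conditions from steps 4 and 6 are met and prints the command from step 7. The checks in step 5 (startup error log, blade state) still have to be done manually before the command is run on the master.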
