FortiSIEM
FortiSIEM provides Security Information and Event Management (SIEM) and User and Entity Behavior Analytics (UEBA)
Article Id 197470
Description

This article describes how to safely reboot the processing node(s).


Scope

Installation and Administration


Solution

Valid as of ZoneFox version 4.1

Preparing for Reboot

To safely reboot the processing node(s), you must first ensure that there is no data throughput. Do this by switching off the Collector Server (CS) on the Windows server(s). The CS can be managed through Internet Information Services (IIS) Manager on the Windows server: select the Collector Server site and, under Manage Website, click Stop.



With the CS stopped, the agents will no longer send events to the server; instead, they will cache events locally until the CS is back online.


Ensure that there are no events in the Kafka queue waiting to be written to Elasticsearch. You can check this by running the following command on the processing node(s):

/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group zf_logstash_live --describe

Your output should look something like this:



The ‘LAG’ column shows how many events are waiting to be written to Elasticsearch. When it is 0 for all topics, there is no backlog and you are free to continue the reboot process; otherwise, wait until the backlog clears. If these numbers do not decrease, contact ZoneFox support.
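If you want to script this check, the LAG values can be summed with awk. This is a minimal sketch, not part of the product: it assumes LAG is the fifth column of the describe output (TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG, ...); verify the column position against your own output first.

```shell
# Sum the LAG column from kafka-consumer-groups.sh --describe output.
# Assumes LAG is column 5; adjust if your Kafka version adds extra columns.
sum_lag() {
    awk 'NR > 1 && $5 ~ /^[0-9]+$/ { sum += $5 } END { print sum + 0 }'
}

# Usage sketch (run on the processing node): poll until the backlog is 0.
#   while lag=$(/opt/kafka/bin/kafka-consumer-groups.sh \
#           --bootstrap-server localhost:9092 \
#           --group zf_logstash_live --describe | sum_lag); \
#         [ "$lag" -gt 0 ]; do
#       echo "Backlog: $lag events; waiting..."; sleep 10
#   done
```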


Next, you must stop the services running on the processing node(s). First, stop the Logstash service by running the command:

systemctl stop logstash.service

(or ‘service logstash stop’ if you’re running Ubuntu 14).

Check that it has stopped successfully with:

systemctl status logstash.service

(or ‘service logstash status’ on Ubuntu 14).


Next, stop the Kafka service by running the command:

systemctl stop kafka.service

(or ‘service kafka stop’ if you’re running Ubuntu 14).

Check that it has stopped successfully with:

systemctl status kafka.service

(or ‘service kafka status’ on Ubuntu 14).


Next, stop the Zookeeper service. The command depends on your platform:

systemctl stop zookeeper.service (Ubuntu 16)

systemctl stop zookeeper-server.service (CentOS 7)

service zookeeper stop (Ubuntu 14)

Then use the corresponding status command to ensure that it has stopped successfully.


Finally, stop the AI service with the following command:

systemctl stop adai.service

(or ‘service adai stop’ if you’re running Ubuntu 14).

Check that it has stopped successfully with:

systemctl status adai.service

(or ‘service adai status’ on Ubuntu 14).
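The four stop commands above can be collected into one loop. The sketch below only prints the commands for your platform rather than running them (remove the echo to execute); the INIT and ZK_UNIT variables are assumptions you must set to match your platform.

```shell
# Print the stop commands in the order given above:
# Logstash, Kafka, Zookeeper, then the AI (adai) service.
# INIT:    "systemd" (Ubuntu 16 / CentOS 7) or "sysv" (Ubuntu 14).
# ZK_UNIT: "zookeeper" on Ubuntu 16, "zookeeper-server" on CentOS 7.
print_stop_commands() {
    for svc in logstash kafka "${ZK_UNIT:-zookeeper}" adai; do
        if [ "${INIT:-systemd}" = "systemd" ]; then
            echo "systemctl stop $svc.service"
        else
            echo "service $svc stop"
        fi
    done
}
```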

Recovering After Reboot

Before allowing data throughput again, you need to ensure that all components are ready.


First, ensure that the services you stopped in the previous section started up again after the reboot by using the status commands. If any of these services failed, attempt to start them again using systemctl start (or service <name> start if you are using Ubuntu 14). If any service still fails to start, consult its error logs and contact ZoneFox support for assistance.
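A quick way to run all four status checks at once is sketched below. This is an illustration only: the IS_ACTIVE variable is an assumption that lets you swap the probe command (the default assumes a systemd platform; Ubuntu 14 needs service-style commands), and the Zookeeper unit name must be changed to zookeeper-server on CentOS 7.

```shell
# Report which of the four processing-node services are running.
# IS_ACTIVE is the probe command; the default assumes systemd.
IS_ACTIVE=${IS_ACTIVE:-"systemctl is-active --quiet"}

check_services() {
    for svc in logstash kafka zookeeper adai; do
        if $IS_ACTIVE "$svc.service"; then
            echo "$svc: running"
        else
            echo "$svc: NOT running -- try: systemctl start $svc.service"
        fi
    done
}
```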


Next, you need to ensure the AI is set to anomaly detection mode. This can be done either through the API or the web interface. Using the API, for example, the curl command can be used like this:

curl -XPOST http://<processing node ip>:8144/api/live?type=ad

Alternatively, use the web interface: go to http://<processing node ip>:8144 in a modern web browser and switch the AI into anomaly detection mode by clicking the AD button in the Live Stats widget at the top right.



Once this is done, you can allow data throughput again. Go back to the Collector Server(s) that you switched off and select Start under Manage Website.

Checking Recovery

Finally, you should ensure that the system has recovered to its previous state. As an additional check, ensure that there is a throughput of events and alerts. The simplest and most effective way of doing this is to monitor the Threat Hunting, Policy Alerts, and AI Alerts pages for new items. However, for systems with multiple Windows servers or processing nodes, this may only tell you that some of the nodes have throughput.


You can also check the system status page in your ZoneFox console. Go to Admin - System and expand the graphs for the different components to check their throughput.



The AI web interface will also indicate whether that component is receiving events.



Finally you can check the kafka queue again with the command:

/opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group zf_logstash_live --describe



An increase in LOG-END-OFFSET means that new events are coming into the Kafka queue from the collector, while an increase in CURRENT-OFFSET means that Logstash is successfully reading events from Kafka and uploading them to Elasticsearch. The LAG is the difference between the two. A lag of several hundred is quite normal and can reach several thousand at peak times.
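To compare two snapshots numerically, the offset columns can be totalled with awk. This is a minimal sketch, assuming CURRENT-OFFSET and LOG-END-OFFSET are columns 3 and 4 of the describe output (TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG, ...); verify the positions against your own output first.

```shell
# Sum CURRENT-OFFSET (column 3) and LOG-END-OFFSET (column 4)
# across all topics; prints "<current total> <log-end total>".
totals() {
    awk 'NR > 1 && $3 ~ /^[0-9]+$/ { cur += $3; end += $4 }
         END { print cur + 0, end + 0 }'
}

# Usage sketch (run on the processing node, ~30 s apart):
#   snap() { /opt/kafka/bin/kafka-consumer-groups.sh \
#       --bootstrap-server localhost:9092 \
#       --group zf_logstash_live --describe | totals; }
#   before=$(snap); sleep 30; after=$(snap)
#   echo "before: $before / after: $after"
# If the second number grew, events are arriving from the collector;
# if the first grew, Logstash is reading them out of Kafka.
```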





















