FortiMail
Adrian_Buckley_FTNT
Article Id 195759
Purpose
This article describes a reliable way to perform the initial configuration of a Bayesian database. It applies to FortiMail 4.0.

Bayesian databases are the last line of defense against spam, and one of the few ways to gain further fractions of a percent in spam filtering once the detection rate is above 99%. Before a Bayesian database can be used, the FortiMail requires that it has been trained with a minimum number of emails, as described in the following table:
Database                   Spam   Not Spam
Global database            100    200
Domain/Personal database   100    200
To reach these levels the database needs to be trained, and the FortiMail has mechanisms to do this automatically. If the database does not reach these minimum levels, the system will not use it for email scanning.
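The minimum-level rule can be sketched in a few lines of Python. This is only an illustration of the thresholds in the table above; the function names are hypothetical and this is not how the FortiMail implements the check internally.

```python
# Minimum trained-message counts, taken from the table above.
MIN_SPAM = 100
MIN_NOT_SPAM = 200


def database_is_usable(spam_count: int, not_spam_count: int) -> bool:
    """Return True only when both counters meet their minimums."""
    return spam_count >= MIN_SPAM and not_spam_count >= MIN_NOT_SPAM


def remaining_training(spam_count: int, not_spam_count: int) -> dict:
    """Return how many more samples of each kind are still needed."""
    return {
        "spam": max(0, MIN_SPAM - spam_count),
        "not_spam": max(0, MIN_NOT_SPAM - not_spam_count),
    }
```

Both counters must reach their minimum: a database with plenty of good email but too little spam (or the reverse) is still not used for scanning.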

These counters continue to grow even without explicit training, because Bayesian scanning itself refines the database, a process often known as "learning".

Expectations, Requirements

Set up a trained baseline Bayesian database.


Configuration
Enable antispam options that will not cause false positives. It is important that the initial data the database is trained with is as accurate as possible. The options that are least likely to cause false detections are:
  • FortiGuard Antispam
  • DNSBL
  • SURBL
  • Heuristic
  • Image Scan - but only if it does not cause false positives in email
  • Deep Header (Black IP scanning) - but only if it does not cause false positives in email
  • Bayesian Scan, with "Use other techniques for auto training" enabled

Screenshot for FortiMail 4.0 (antispam profile): abuckley_FD31773_a_FD31773_Fortimail-Antispam-Profile(TB).jpg


It is important to use a spam action of Quarantine and Archiving while training is in progress. This allows the emails that are put into the SPAM and NOT SPAM portions of the database to be reviewed. The review is needed because some emails are specifically designed to corrupt Bayesian databases.

Some emails contain several paragraphs or pages from a news article, a novel, or random text at the bottom. Bayesian scanning does not ignore this extra text. It goes into the database, which then impairs its ability to properly distinguish spam from non-spam.
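The effect of padded text can be illustrated with a heavily simplified per-word Bayesian score. The sketch below is a generic textbook-style calculation with add-one smoothing, not the FortiMail's actual algorithm, and the sample messages are invented for the illustration.

```python
def word_spam_score(word, spam_msgs, ham_msgs):
    """Estimate how 'spammy' a word looks, from 0.0 (good) to 1.0 (spam).

    Simplified model: fraction of spam messages containing the word
    versus fraction of good messages containing it, with add-one smoothing.
    """
    spam_hits = sum(word in m.lower().split() for m in spam_msgs)
    ham_hits = sum(word in m.lower().split() for m in ham_msgs)
    p_spam = (spam_hits + 1) / (len(spam_msgs) + 2)
    p_ham = (ham_hits + 1) / (len(ham_msgs) + 2)
    return p_spam / (p_spam + p_ham)


# Invented training data: legitimate business mail and ordinary spam.
ham = ["quarterly report attached", "please review the quarterly report"]
clean_spam = ["cheap pills online", "buy cheap pills"]

# The same spam, padded with innocent-looking business text.
poisoned_spam = clean_spam + [
    "cheap pills quarterly report",
    "buy pills quarterly report",
]

before = word_spam_score("report", clean_spam, ham)
after = word_spam_score("report", poisoned_spam, ham)
print(f"'report' score before poisoning: {before:.2f}, after: {after:.2f}")
```

After the padded samples are trained as spam, the ordinary word "report" scores as more spam-like than before, which is how such messages push legitimate email toward false positives.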

Screenshot (sample spam email): abuckley_FD31773_a_FD31773-viagra-spam3.png

Once the database has been trained, it is important to make regular backups when it is working well. In this way if any issues arise due to the email it has been trained with, the database can be cleared out and restored to a working status without completely restarting the training process.

Automatic training should not be left running indefinitely. After the database reaches the minimum levels for operation, it is important to turn off automatic training and begin targeted training of the database. This can be done in two ways:

1. From the GUI, using email samples that were incorrectly identified by the Bayesian scan
2. Via email

Failure to disable automatic training will result in the Bayesian scan functioning incorrectly and may cause mail flow issues.

If email training is being used, it is also very important to use the training addresses properly. If a sample is sent to the wrong training address, the database will not be updated correctly. This could degrade detection, allowing spam through and producing false positives. The training addresses are used as follows:

is-spam – correct a false negative
is-not-spam – correct a false positive
learn-is-spam – new, never-before-seen spam
learn-is-not-spam – new, never-before-seen good email
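The choice of training address depends on two questions: is the sample spam, and is it a correction of a wrong verdict or a new sample? A minimal Python sketch of that decision follows. The helper name is hypothetical, and the "local-part@domain" address form and the example.com domain are assumptions for illustration only; use the training addresses configured on your FortiMail.

```python
# Keys: (sample is spam, sample corrects a wrong earlier verdict)
TRAINING_LOCAL_PARTS = {
    (True, True): "is-spam",             # spam that was missed (false negative)
    (False, True): "is-not-spam",        # good email flagged as spam (false positive)
    (True, False): "learn-is-spam",      # new, never-before-seen spam
    (False, False): "learn-is-not-spam", # new, never-before-seen good email
}


def training_address(is_spam: bool, is_correction: bool, domain: str) -> str:
    """Build the training address for a sample (address format is assumed)."""
    local_part = TRAINING_LOCAL_PARTS[(is_spam, is_correction)]
    return f"{local_part}@{domain}"


# Example: a spam message that reached an inbox is a false negative.
print(training_address(is_spam=True, is_correction=True, domain="example.com"))
```

Keeping the mapping in one place makes it harder to send a sample to the wrong address, which is the mistake described above.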
