
1 May 2024

Email Phishing Training Efficiency: An Experiment

Realizing that the classic training campaign ritual does not meet expectations and trying a more efficient – less effort, more impact – alternative.

90% of attacks involve email phishing, and around 40% of successful attacks are carried out through this vector. Despite efforts to filter out those intrusions, some messages will inevitably find their way in. The traditional defensive arsenal includes the combination of campaign simulation and training: vendors sell tools, insurers and auditors request such activities, and everyone sleeps better once the job is delivered. Yet few ask the question: are those campaigns effective?

Like many corporations, we have religiously stuck with the practice over the years. Meanwhile, we have neglected research questioning our approach, outdone ourselves in creativity to make such campaigns more appealing to our fellow colleagues, and respectfully reported click and participation rates to upper management. For a worldwide corporation, the cost of such training is not negligible: material translation, HR synchronization, or worse, chasing phished colleagues reluctant to attend education sessions. Costly, and rarely the best use of stretched InfoSec team members’ time.

The goal of a prevention campaign should not be to reach some delivery KPI. Instead, what matters is to alter the recipients’ behavior and make them less prone to let an attacker in. And if it matters, it can be measured. 

The Method

To convince ourselves, we decided to conduct a simple experiment. We randomly assigned office employees to two groups. One group – the test sample – was exposed to the classic protocol: a phishing campaign. Recipients could react by a) ignoring the email; b) reporting it through the dedicated client button; c) clicking on the insidious link; d) possibly entering their credentials on the opened page. Remote training followed, based on videos and quizzes. The second, control group received the email, but no further action was taken. Three weeks later, we released a second phishing campaign to the whole population and counted how many users demonstrated each behavior. To validate the training-effect hypothesis, one should observe a significant decrease in the dangerous behaviors c) and d) and an increase in the correct one, b). We sent the second campaign shortly after the first, reasoning that if training triggers any effect, it can only fade with time.
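As an illustration, the protocol boils down to a random split and a per-outcome count. The sketch below is our own minimal rendition in Python, assuming a flat list of employee IDs; the function names and the 50/50 split are illustrative, not the exact tooling we used.

```python
import random

# The four observable reactions, from harmless to dangerous.
OUTCOMES = ("ignored", "reported", "clicked", "credentials_entered")

def assign_groups(employees, test_fraction=0.5, seed=42):
    """Randomly split the population into a test and a control group."""
    rng = random.Random(seed)       # fixed seed makes the split reproducible
    shuffled = list(employees)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]   # (test, control)

def behavior_rates(events):
    """events maps employee id -> observed outcome; returns one rate per outcome."""
    total = len(events)
    return {o: sum(v == o for v in events.values()) / total for o in OUTCOMES}
```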

We are not interested in comparing measures between the two campaigns, as one email could be more enticing than the other. Moreover, we deliberately crafted only moderately deceptive emails (for example, an email with an HR signature and an attached Excel file named “bonuses.xlsx”, sent in May, could unleash a frantic click rate).

First Experiment: the “Classic” Training Campaign

The first experiment was conducted like most of our regular campaigns. The email, translated into the recipient’s language, was sent to 1,832 people (office employees and store managers) and contained only a phishing link (no credentials were requested). The test population was then invited to follow a dedicated 20-minute training sold by a major vendor. By “invited,” we mean that they received an invitation by email. For those who did not proceed, more emails would follow, then personal messages, phone calls, and eventually management escalation.

Despite all those efforts, 270 people assigned to the training did not follow it.

A second message was sent three weeks later to everyone, and behaviors were measured. In the report below, we split the test sample into test1, who followed the training, and test2, who did not.

                  control  test1  test2  test
Population size       934    628    270   898
Link clicked          29%    32%    25%   29%
Email reported         9%    18%     4%   14%

(Temporary) Conclusions

The first striking observation is that training does not positively influence behavior when the time comes to avoid clicking on a malicious link. That said, the difference between our measures is too small to demonstrate the even more detrimental effect shown in the research mentioned above – exposure to training inducing more dangerous behaviors.

However, all those efforts were not a total waste of time. People who followed the training doubled their propensity to properly report the malicious email. Worse, people avoiding the activity were four times less likely to report it than their trained colleagues – a simple button click. Somehow, training does not affect how many people open the door to an attacker, but it doubles the number of those who will signal the intruder – a signal that is, in practice, used to block similar messages and recall those already delivered.
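For readers who want to judge these differences for themselves, a standard two-proportion z-test is sufficient. The sketch below is our illustration, not part of the original analysis; it reconstructs approximate counts from the rates and cohort sizes in the table above.

```python
from math import erf, sqrt

def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # normal tail
    return z, p_value

# Click rates: control (29% of 934) vs. the whole test sample (29% of 898).
print(two_proportion_ztest(round(0.29 * 934), 934, round(0.29 * 898), 898))

# Report rates: control (9% of 934) vs. trained test1 (18% of 628).
print(two_proportion_ztest(round(0.09 * 934), 934, round(0.18 * 628), 628))
```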

Second Experiment: Inviting Riot’s Chatbot

The first experiment carried a shocking revelation: was all the work carried out over all those years – almost – useless? We promptly dismissed engaging in more training sessions, partly because of the effort required to run them and the little engagement demonstrated by the audience.

Motivation was low among the InfoSec team. But they have hacker souls, never resting in the face of adversity – that is part of the job description!

Gamification has been around in the prevention arena for a while, but OpenAI’s release of ChatGPT has opened many other possibilities. Among them, we crossed paths with Riot, a promising French company active in the field. The team decided to give it a try.

The experiment followed the same protocol: 200 random employees from five countries were assigned to test and control groups. The population size is significantly smaller than in the first experiment, limited by the number of demo licenses available. The only effort required is registering the test population in the tool, which then triggers the phishing emails (translated and randomly drawn from a pool of five). Our goal here is not to describe the Riot tool in detail. To summarize, a chatbot – audaciously named Albert – contacts the user through Teams messages, explains the mistake, if any, and engages in a conversation. The conversational agent naturally carries the thread’s context and proposes awareness training on email phishing. A significant change is that the InfoSec team had no follow-up to do after configuring the tool with the proper access – especially not the colleague-chasing part.
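Riot’s internals are not public, so the snippet below is only a hypothetical sketch of the general pattern – nudging a user over Teams after a simulated click, here via a standard incoming webhook. The URL, names, and message wording are all placeholders, not Riot’s actual API.

```python
import requests

# Placeholder webhook; a real deployment would use the URL issued by the
# Teams "incoming webhook" connector for the target channel or chat.
TEAMS_WEBHOOK_URL = "https://example.webhook.office.com/webhookb2/..."

def nudge_after_click(user_name: str, training_url: str) -> None:
    """Send a friendly Teams message after a simulated phishing click."""
    message = (
        f"Hi {user_name}! The link you just clicked was part of a phishing "
        f"simulation - no harm done. A five-minute refresher is here: {training_url}"
    )
    response = requests.post(TEAMS_WEBHOOK_URL, json={"text": message}, timeout=10)
    response.raise_for_status()   # fail loudly if the webhook rejects the payload
```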

Two weeks later (we shall return to the two- vs. three-week delay before the second round), another email was sent to the whole population, and data were ready to be collected. This time, we can also compare rates between the bootstrap and the final campaign, as the emails are drawn from the same pool – taking care never to send the same one twice to any given person.

                   Initial email (all)  control  test
Population size                    190       80   110
Link clicked                       22%      19%   10%
Credentials given                  16%      11%    6%
Email reported                     25%      25%   32%

At last! An effect is apparent. It is reassuring to see no noticeable difference between the initial email sent to all and the second one sent to the control cohort. In contrast, dangerous behaviors were roughly halved for the test population engaged by the chatbot.
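As with the first experiment, the differences can be checked for statistical weight; given cohorts of only 80 and 110, an exact test is the safer instrument. A minimal sketch, with counts reconstructed from the table above and SciPy assumed available:

```python
from scipy.stats import fisher_exact

# 2x2 contingency tables rebuilt from the rates: [clicked, did not click].
clicks = [[round(0.19 * 80), 80 - round(0.19 * 80)],      # control
          [round(0.10 * 110), 110 - round(0.10 * 110)]]   # test
print(fisher_exact(clicks))   # odds ratio and p-value

creds = [[round(0.11 * 80), 80 - round(0.11 * 80)],       # control
         [round(0.06 * 110), 110 - round(0.06 * 110)]]    # test
print(fisher_exact(creds))
```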

Conclusions

While the “classic” method showed little efficacy (confirming the literature), Riot’s chatbot demonstrated a clear behavior modification. It must be noted, however, that the experiment lasted only one week, whereas a regular mode would involve frequent interactions, as the operating cost is marginal.

Besides the behavioral change, other effects were unexpected – and welcome:

• 94% of the population completed the training within a couple of days and without human intervention, compared to 70% in the first experiment, which required massive effort. This difference explains why the second email was sent two weeks after the first one, instead of three weeks previously.

• 84% spontaneously completed another proposed training on password hygiene, and 75% one on spear phishing. People following InfoSec training of their own accord could be seen as a kind of Holy Grail.

• Even after only a week, users started referring to Albert in email communication with InfoSec (“you should tell Albert that this email looks suspicious”), and a satisfaction survey was full of praise.

• The load on the InfoSec team was marginal.
