Once again, the subject is spam.
Depending on the “stars” and the time of year, spam can account for anywhere between 70% and 90% of all email traffic.
Sounds like a lot, eh? But when you take all Internet traffic into consideration, it’s not actually that much: email accounts for only about 1% of it. On the other hand, you can’t just forget about spam. Here is a bit more about spam’s role in the cybercrime ecosystem. Combating this particular evil is part of the massive war we are waging on cybercriminals. It’s no exaggeration to say that if we fail on this front, the rest of our efforts will amount to nothing.
In other words, we love anti-spam technologies and promote them as much as possible. There is, however, a subtle difference from anti-malware technologies. More precisely, the quality of protection is judged by different criteria. For malware it’s fairly easy: the higher the detection level, the better. For spam it’s more important to have no false positives. This is quite reasonable: it’s much better for the user to take a couple of seconds to delete a spam message that sneaks through the filter than to miss important business correspondence. So protection against spam is, in a way, the more complicated task: we have to kill two birds with one stone, catching as much spam as possible while never blocking a legitimate message. In this difficult task, cloud technologies are a great help.
As I wrote earlier, we’ve been using cloud technologies for a while, and with considerable success. But one interesting detail has been overlooked, and unfairly so. In the cloud-based Kaspersky Security Network (KSN) (video, details), there’s a rather impressive anti-spam cloud. It started with the Urgent Detection System (UDS). The link to similar anti-malware technology is no coincidence: both are based on similar principles.
This is how traditional anti-spam technology works.
Let’s say an email arrives at a computer. It is immediately assailed by various anti-spam technologies, both local and cloud-based, which test the message and give verdicts. Based on these, the system decides whether this message lives or dies.
And this is what happens in the UDS.
The system takes a micro-signature from the email message and sends it to the cloud to check it against a dedicated spam database. Earlier we used 16-byte hashes; in 2011 we launched UDS2 (the second generation of UDS), which uses 4-byte fuzzy hashes. These cope better with obfuscated text and are therefore better at filtering out spam. Importantly, these hashes do not create extra work for the analyst, since the system generates them automatically from collected spam samples.
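To give a feel for why a fuzzy signature survives obfuscation, here is a rough sketch. It is not the actual UDS2 algorithm, which isn’t described here; it simply normalizes the text and truncates an ordinary SHA-256 hash to 4 bytes, a much cruder stand-in for real fuzzy hashing. The names micro_signature, SPAM_DB and is_known_spam are made up for the illustration, and a local set plays the role of the cloud database.

```python
import hashlib
import re


def micro_signature(text: str, size: int = 4) -> bytes:
    """Hypothetical micro-signature: normalize the text, hash it, truncate."""
    # Lowercase and keep letters only, so punctuation, digits and
    # spacing tricks do not change the result.
    normalized = re.sub(r"[^a-z]", "", text.lower())
    return hashlib.sha256(normalized.encode("utf-8")).digest()[:size]


# Stand-in for the cloud-side spam database (an assumption of this sketch).
SPAM_DB = {micro_signature("Buy cheap pills now")}


def is_known_spam(text: str) -> bool:
    """Ask the 'cloud' whether this message's signature is already known."""
    return micro_signature(text) in SPAM_DB


print(is_known_spam("B.u.y  CHEAP pills NOW!!!"))  # True
print(is_known_spam("Meeting at noon"))
```

The obfuscated message maps to the same 4 bytes as the original sample, so a single database entry covers both.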
The two main issues for anti-spam are: (i) how fast updates are developed (analysts are only human and their resources are limited); and (ii) how fast those updates are delivered to the user. For that reason, most developments in this field are now geared towards creating and refining a variety of dedicated technologies, like Möbius, and towards replacing human expertise as much as possible. This is where UDS2 has enormous potential.
You see, UDS2 has a feature called clustering. The first generation of UDS returned simple replies to users’ queries about whether a certain signature existed in the database or not. UDS2, however, groups similar signatures into clusters and calculates their spam reputations to boot! This in turn helps process spam automatically in the cloud.
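The difference between a yes/no lookup and a cluster reputation can be sketched in a few lines. Everything here is an assumption made for illustration: the ReputationCloud class, the rule that signatures sharing their high bits belong to one cluster, and the reputation formula (share of spam reports in the cluster). The real clustering is certainly more sophisticated.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Cluster:
    spam_reports: int = 0
    ham_reports: int = 0

    @property
    def reputation(self) -> float:
        """Share of spam reports in this cluster (0.0 if nothing is known)."""
        total = self.spam_reports + self.ham_reports
        return self.spam_reports / total if total else 0.0


class ReputationCloud:
    """Hypothetical cloud service that answers with a reputation, not yes/no."""

    def __init__(self, low_bits: int = 8):
        self.low_bits = low_bits
        self.clusters = defaultdict(Cluster)

    def cluster_key(self, signature: int) -> int:
        # Assumed grouping rule: signatures that differ only in their
        # lowest `low_bits` bits are treated as similar.
        return signature >> self.low_bits

    def report(self, signature: int, is_spam: bool) -> None:
        cluster = self.clusters[self.cluster_key(signature)]
        if is_spam:
            cluster.spam_reports += 1
        else:
            cluster.ham_reports += 1

    def lookup(self, signature: int) -> float:
        # .get() avoids creating empty clusters for unknown signatures.
        cluster = self.clusters.get(self.cluster_key(signature))
        return cluster.reputation if cluster else 0.0


cloud = ReputationCloud()
cloud.report(0x12345601, is_spam=True)
cloud.report(0x12345602, is_spam=True)
cloud.report(0x123456FF, is_spam=False)
print(cloud.lookup(0x12345680))  # a signature never seen before, same cluster
```

Note that the last lookup gets a meaningful answer for a signature nobody has reported yet, which is exactly what a plain exists-or-not database cannot do.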
The clustering and automatic spam processing are performed by Content Reputation technology. In 2012, we plan to release a number of new features based on it. The first is Rescan, which lets the user impose a short (20-30 minutes), optional delay on suspicious emails only. In real life, such messages make up no more than 1% of the entire mail traffic, so the amount of quarantined mail will not exceed 100 MB even in a large organization. This period of time will be sufficient for us to figure out what is spam and what isn’t, and add a signature.
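Here is a minimal sketch of the Rescan idea, under my own assumptions: the Quarantine class, its method names and the is_spam callback are hypothetical, and time is passed in explicitly so the example runs instantly. Suspicious messages are held for the delay and then checked again against whatever signatures have appeared in the meantime.

```python
from dataclasses import dataclass, field

RESCAN_DELAY_SECONDS = 20 * 60  # lower end of the 20-30 minute window


@dataclass
class Quarantine:
    """Hypothetical holding area for suspicious messages only."""

    delay: int = RESCAN_DELAY_SECONDS
    held: list = field(default_factory=list)  # (release_time, message) pairs

    def hold(self, message: str, now: float) -> None:
        self.held.append((now + self.delay, message))

    def release_due(self, now: float, is_spam):
        """Rescan every message whose delay has expired.

        Returns (delivered, blocked); messages still waiting stay in `held`.
        """
        delivered, blocked, still_held = [], [], []
        for release_time, message in self.held:
            if release_time > now:
                still_held.append((release_time, message))
            elif is_spam(message):
                blocked.append(message)
            else:
                delivered.append(message)
        self.held = still_held
        return delivered, blocked


known_spam = set()  # stand-in for signatures arriving from the cloud
quarantine = Quarantine()
quarantine.hold("win a prize", now=0)
quarantine.hold("quarterly report", now=0)

known_spam.add("win a prize")  # a signature is added during the delay
print(quarantine.release_due(now=RESCAN_DELAY_SECONDS, is_spam=known_spam.__contains__))
```

The point of the design is that the delay buys time for the cloud, while the legitimate message is still delivered automatically once the window closes.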
The second feature is Auto-ban, or automatic blocking of emails that were grouped into high spam reputation clusters. Naturally, they will then be checked manually by the analyst, and the blocking can be instantly rolled back if necessary. In this approach, the system prompts the human to do the analysis rather than the human asking the system, which makes things faster and more effective.
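The “system prompts the human” workflow could look roughly like this. It is only a sketch: the AutoBan class, the 0.9 threshold and the review queue are assumptions of mine, not details of the real feature.

```python
from dataclasses import dataclass, field

BAN_THRESHOLD = 0.9  # hypothetical cut-off for a "high" spam reputation


@dataclass
class AutoBan:
    threshold: float = BAN_THRESHOLD
    banned: set = field(default_factory=set)
    review_queue: list = field(default_factory=list)

    def consider(self, cluster_id: str, reputation: float) -> bool:
        """Ban the cluster right away if its reputation is high enough,
        and queue it for an analyst. Returns True if the cluster is banned."""
        if reputation >= self.threshold and cluster_id not in self.banned:
            self.banned.add(cluster_id)
            self.review_queue.append(cluster_id)
        return cluster_id in self.banned

    def review(self, cluster_id: str, analyst_confirms_spam: bool) -> None:
        """Analyst verdict: keep the ban, or roll it back instantly."""
        if cluster_id in self.review_queue:
            self.review_queue.remove(cluster_id)
        if not analyst_confirms_spam:
            self.banned.discard(cluster_id)


auto_ban = AutoBan()
auto_ban.consider("cluster-A", reputation=0.97)  # banned, sent for review
auto_ban.consider("cluster-B", reputation=0.40)  # left alone
auto_ban.review("cluster-A", analyst_confirms_spam=False)  # rollback
print(auto_ban.banned)
```

Blocking happens first and the human check follows, so the analyst works through a queue prepared by the system instead of hunting for candidates.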
Both features are now being put through their paces in internal tests and are delivering very encouraging results. For instance, Rescan reduces the amount of unrecognized spam by a factor of ten (yes, ten!) and improves the general detection rate from 99.50% to 99.95% (that’s right!). This is in fact among the best results across the entire industry, while still keeping the false positive rate at zero.
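In case the “factor of ten” and the two percentages look like separate claims, they are the same number seen from two sides. A quick check, using exact fractions to avoid floating-point noise (the helper name missed_share is just for this example):

```python
from fractions import Fraction


def missed_share(detection_rate_percent: str) -> Fraction:
    """Percentage of spam that slips through at a given detection rate."""
    return 100 - Fraction(detection_rate_percent)


before = missed_share("99.50")  # 0.50% of spam goes unrecognized
after = missed_share("99.95")   # 0.05% of spam goes unrecognized

print(before / after)  # 10
```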
We also have some other very ambitious plans. First of all, we intend to develop a multidimensional ‘clusterizer’ to group and calculate reputations for a number of other attributes on top of fuzzy hashes. There are oodles of such attributes in KSN.
I wonder if we’re going to make it into the elite 100/0 club (100% detection / 0 false positives) in testing this year? Wish us luck!