Finding the Needle in the Haystack. Introducing: Astraea.
Somewhere in the office there’s a carefully guarded little big black book that contains a collection of up-to-date KL facts & figures, which we use in public performances. You know, things like how many employees we have, how many offices and where, turnover, etc., etc. One of the most oft-used figures from this book is the daily number of new malicious programs – a.k.a. malware. And maybe this daily figure is so popular because of how incredibly fast it grows. Indeed, its growth amazed even me: a year ago it was 70,000 samples of malware – remember, per day; in May 2012 it was 125,000 per day; and now – by the hammer of Thor – it’s already… 200,000 a day!
I kid you not my friends: every single day we detect, analyze and develop protection against just that many malicious programs!
How do we do it?
Simply put, it all comes down to our expert know-how and the technologies that come about from it – about which another big black book could be compiled from the entries on this here blog (e.g., see the features tag). In publicizing our tech, some might ask if we aren’t afraid our posts are read by the cyber-swine. It’s a bit of a concern. But more important for us is users getting a better understanding of how their (our) protection works, and also what motivates the cyber-scoundrels and what tricks they use in their cyber-bogusness.
Anyway, today we’ll be adding another very important entry to this tech-tome – one on Astraea technology. This is one of the key elements of our KSN cloud system (video, details), which automatically analyzes notifications from protected computers and helps uncover hitherto unknown threats. In actual fact Astraea has a lot of other plusses going for it – plusses which, for a while now, our security analysts simply haven’t been able to imagine their working day without. So, as per my techie-blog post tradition, let me go through it all for you – step by step…
Let’s start with another key statistic from the BBB: 60 million (more than). That’s the number of folks who today use KSN. And when I say use, I mean constantly exchange with the cloud information about suspicious files, sites, system events, detections and lots more besides, all of which comes under the title “the epidemiological environment on the Internet”!
To analyze this huge KSN flow manually at the required tempo is, as you might guess, practically impossible. It’s like looking for a needle in a haystack. However, since that needle (and a highly valuable one at that) is in fact in (out?) there, searching for it is worthwhile, and solving this task is basically a matter of software engineering excellence.
Turns out that, with the right approach to the processing of such a flow, it’s possible to kill three proverbial birds with one stone: (i) to quickly, effectively and with a minimal amount of effort detect malware; (ii) to build up a highly valuable statistical base for keeping one’s finger on the proverbial malware pulse to keep up with trends in the field of virus writing; and (iii) to create a constantly developing automatic expert system able to automatically release “treatments” – with false positives kept to a bare minimum.
So there you have it! You now know the basic tenets of Astraea – a system for processing colossal volumes of data in order to extract from it required specific results, a.k.a. big data, a.k.a. autosearching for the needle in the haystack.
And now – to completely finish you off – yet more figures for you: more than 150 million KSN notifications run through Astraea every day, and out of those, ten million objects (files and websites) are given ratings!
So how does it work?
At the first stage, taking a big leaf out of the how-to-do-crowdsourcing book, Astraea gets notifications about suspicious files and sites from KSN participants. All the events are automatically analyzed and ranked from the standpoint of both significance (how prevalent and popular objects are) and danger. The level of danger is calculated on the basis of dynamically changing weights, meaning that between the notifications and the expert system there’s always feedback. The list of weights today is populated by several hundred criteria, which are regularly adjusted and readjusted by our analysts, and the list itself is updated. In essence the list represents a big chunk of knowledge of a qualified security analyst – a set of rules under which malware has a good chance of being spotted.
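To make the weighting idea a bit more tangible, here's a minimal sketch in Python. It is not Astraea's actual model: the criteria names, the point values and the 0 to 100 scale are all assumptions made up for illustration. It only shows how a danger score can be a sum of adjustable weights, and how re-tuning one weight changes the score for the same object.

```python
# Hypothetical criteria and point weights, for illustration only.
# The real list holds several hundred criteria tuned by analysts.
weights = {
    "no_digital_signature": 30,
    "in_autolaunch": 25,
    "suspicious_packer": 35,
}


def danger_score(features, weights):
    """Sum the weights of the criteria an object matches, capped at 100."""
    return min(100, sum(weights.get(name, 0) for name in features))


def significance(reporting_users, total_users):
    """Share of participants (in percent) that reported the object."""
    if total_users == 0:
        return 0.0
    return 100.0 * reporting_users / total_users


def adjust_weight(weights, criterion, new_value):
    """Feedback step: return a copy of the weights with one criterion re-tuned."""
    updated = dict(weights)
    updated[criterion] = new_value
    return updated


features = {"no_digital_signature", "in_autolaunch"}
score_before = danger_score(features, weights)

# An analyst decides that autolaunch presence deserves more weight.
tuned = adjust_weight(weights, "in_autolaunch", 40)
score_after = danger_score(features, tuned)
```

With these made-up numbers the same object scores 55 before the adjustment and 70 after it, which is the feedback loop in miniature.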
At the last stage Astraea returns its calculated rating back to KSN, where it becomes accessible to all users of our products, and in this way the chain closes up. Moreover, the bigger the statistical base, the more likely the uncovering and suppression of new malware outbreaks.
Thanks to the statistics we have on the behavior of malware on users’ computers, Astraea knows all about malware features – like the absence of digital signatures, presence in autolaunch, use of certain packers, etc. And when Astraea starts to receive notifications indicating that new files have malware features, it lowers the rating of the “warrant” for these files accordingly as per the accumulated data. As a result, when the rating of files reaches a critical threshold, the system marks them as all-out malicious, produces the necessary signatures, and transfers those signatures to users via KSN. And all completely automatically!
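Here's a rough sketch of that threshold logic, again in Python and again under loudly stated assumptions: the penalty values, the threshold of -100 and the names `FileRating` and `apply_notification` are all hypothetical, and signature generation is left out entirely. The point is just to show a rating sinking with each notification until the verdict flips.

```python
MALICIOUS_THRESHOLD = -100  # hypothetical critical threshold

# Hypothetical penalties per observed malware feature.
FEATURE_PENALTIES = {
    "no_digital_signature": 20,
    "in_autolaunch": 30,
    "suspicious_packer": 40,
}


class FileRating:
    """Tracks the running rating of one file as notifications arrive."""

    def __init__(self, file_hash):
        self.file_hash = file_hash
        self.rating = 0
        self.verdict = "unknown"

    def apply_notification(self, features):
        """Lower the rating for each reported feature, then re-check the threshold."""
        for name in features:
            self.rating -= FEATURE_PENALTIES.get(name, 0)
        if self.rating <= MALICIOUS_THRESHOLD:
            self.verdict = "malicious"
        return self.verdict


sample = FileRating("hypothetical-hash")
first = sample.apply_notification(["no_digital_signature", "in_autolaunch"])
second = sample.apply_notification(["suspicious_packer"])
third = sample.apply_notification(["no_digital_signature"])
```

In this toy run the rating goes to -50, then -90, then -110, so only the third notification pushes the file over the line.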
In a similar way the system conducts a preemptive search of malicious sites. It detects resources similar to previously revealed malicious hosts or sites pretending to be legitimate ones. Here too there’re a lot of criteria; for example, concurrence of e-mail addresses or the name of the owner, the date of registration of the resource, the presence of untrusted files on the host, etc.
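A small Python sketch of what comparing a new site against a known malicious one might look like. Everything here is an assumption for illustration: the `SiteRecord` fields, the seven-day registration window and the example hosts are invented, and the real system uses far more criteria than these four.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SiteRecord:
    host: str
    owner_email: str
    owner_name: str
    registered: date
    has_untrusted_files: bool


def matching_criteria(candidate, known_bad, max_days_apart=7):
    """List which (hypothetical) criteria link a candidate site to a known bad one."""
    matches = []
    if candidate.owner_email == known_bad.owner_email:
        matches.append("owner_email")
    if candidate.owner_name == known_bad.owner_name:
        matches.append("owner_name")
    if abs((candidate.registered - known_bad.registered).days) <= max_days_apart:
        matches.append("registration_date")
    if candidate.has_untrusted_files:
        matches.append("untrusted_files")
    return matches


known_bad = SiteRecord("bad.example", "owner@mail.example", "J. Doe",
                       date(2012, 5, 1), True)
candidate = SiteRecord("new.example", "owner@mail.example", "A. Smith",
                       date(2012, 5, 3), False)
hits = matching_criteria(candidate, known_bad)
```

Here the candidate shares an e-mail address and a registration window with the known bad host, so two criteria fire. How many matches it takes to lower a site's rating is a tuning decision this sketch doesn't try to answer.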
What’s important here is that the system doesn’t simply calculate ratings for files and sites; it correlates them so as to obtain more accurate verdicts. Thus, it’s logical to assume that a file downloaded from a site that was earlier noted in the distribution of malware receives a lower rating than a file downloaded from a “clean” site.
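The correlation can be pictured with a few lines of Python. This is a sketch under assumptions, not the real formula: the function name, the idea of ratings as signed integers (negative meaning suspicious) and the 50 percent influence factor are all invented for the example.

```python
def correlated_file_rating(file_rating, site_rating, site_influence_percent=50):
    """Pull a file's rating down when its source site has a bad rating.

    Ratings are hypothetical signed integers: negative means suspicious.
    A clean (non-negative) site leaves the file rating untouched.
    """
    penalty = min(0, site_rating) * site_influence_percent // 100
    return file_rating + penalty


from_bad_site = correlated_file_rating(file_rating=-20, site_rating=-60)
from_clean_site = correlated_file_rating(file_rating=-20, site_rating=10)
```

The same file ends up at -50 when it comes from a site rated -60, and stays at -20 when it comes from a clean one.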
It goes without saying that Astraea saves the whole history of interaction with KSN, which helps us then react to an outbreak at the moment it arises and locate its primary source, and also track its development – in both time and geography (which countries). Besides, these data can be used (i) to create specific reports and analyze trends, practically of any level of customization – different “tops” per countries, hosts, files, malware families, etc. (plus cross-referenced reports); (ii) for forecasting the development of cybercriminal activity in the profile of attacks in different industries; and (iii) for forecasts of the tempo of growth of specific maliciousness in its respective behavior and attacking platform profiles.
But there’s more!
Astraea is also a system of proactive detection. That is, it can detect not only already known threats, but also planned threats still just appearing in the heads of virus writers! By possessing a huge database of knowledge about how malware behaves in the real world, we can come up with behavior templates and add them to KSN too. The reaction time to new threats is currently 40 seconds; but with the proactive approach it will be equal to zero!
Another pro of Astraea: minimization of false positives.
On the one hand, the system works with both a gigantic statistical base and a highly-honed mathematical model, which together permit bringing the quantity of false detections down to a minimum. Since 2010, when Astraea stepped up for battle duty, our specialists can’t recall a single more or less significant incident.
On the other hand, a mechanism controlling the human factor is built into the system. This automatically checks on the fly each attempt of a security analyst to add a new entry to the black or white list.
A couple of simple examples:
File “ABC” is on the list of clean files (white list), but suddenly Astraea receives a notification that our product has found a Trojan in it. The system finds a false signature, flags it as a false positive, and initiates the process of testing and correcting the detection.
Or like so: a security analyst in some mad rush of passion (or hangover) adds the file “XYZ” to the blacklist. However, the file already is on the whitelist. The system tells the analyst that he probably got a little too worked up (or drunk last night), and doesn’t permit the addition of the new entry until the conflict is sorted out.
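Both examples boil down to a consistency check between two lists, which is easy to sketch in Python. The class and method names below (`VerdictLists`, `ListConflictError`, `is_false_positive`) are hypothetical, and the real mechanism surely does a lot more than compare two sets.

```python
class ListConflictError(Exception):
    """Raised when an entry would land on both the whitelist and the blacklist."""


class VerdictLists:
    def __init__(self):
        self.whitelist = set()
        self.blacklist = set()

    def add_to_whitelist(self, file_id):
        if file_id in self.blacklist:
            raise ListConflictError(f"{file_id} is already blacklisted")
        self.whitelist.add(file_id)

    def add_to_blacklist(self, file_id):
        if file_id in self.whitelist:
            raise ListConflictError(f"{file_id} is already whitelisted")
        self.blacklist.add(file_id)

    def is_false_positive(self, file_id):
        """A detection reported for a whitelisted file is flagged for review."""
        return file_id in self.whitelist


lists = VerdictLists()
lists.add_to_whitelist("ABC")
lists.add_to_whitelist("XYZ")

# First example: a detection notification arrives for whitelisted file "ABC".
abc_flagged = lists.is_false_positive("ABC")

# Second example: an analyst tries to blacklist the whitelisted file "XYZ".
try:
    lists.add_to_blacklist("XYZ")
    xyz_blocked = False
except ListConflictError:
    xyz_blocked = True
```

Raising an exception rather than silently overwriting the entry mirrors the behavior described above: the new entry simply doesn't go in until a human sorts out the conflict.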
Actually, Astraea on the whole is a system that’s expanding all the time, and there are simply too many examples of this to describe here.
With Astraea what we do is actively “dig” both wide and deep. We modernize the mathematical model of analysis of data, add new and reappraise existing criteria, bring in new technologies for raising the speed and quality of finding threats, and put into operation adjacent systems for building complex correlations. In general, our plans, as usual, are ambitious and far-reaching, but this can’t be a bad thing :). And since we’re at a peak of patent trollism, we’re steadily patenting these tasty morsels. Out of those already patented we have minimizing false positives, warning about virus outbreaks, and detecting previously unknown threats.