Notes, comment and buzz from Eugene Kaspersky – Official Blog

November 15, 2012

Finding the Needle in the Haystack. Introducing: Astraea.

Somewhere in the office there’s a carefully guarded little big black book that contains a collection of up-to-date KL facts & figures, which we use in public performances. You know, things like how many employees we have, how many offices and where, turnover, etc., etc. One of the most oft-used figures from this book is the daily number of new malicious programs – a.k.a. malware. And maybe this daily figure is so popular because of how incredibly fast it grows. Indeed, its growth amazed even me: a year ago it was 70,000 samples of malware – remember, per day; in May 2012 it was 125,000 per day; and now – by the hammer of Thor – it’s already… 200,000 a day!

I kid you not my friends: every single day we detect, analyze and develop protection against just that many malicious programs!

How do we do it?

Simply put, it all comes down to our expert know-how and the technologies that come about from it – about which another big black book could be compiled from the entries on this here blog (e.g., see the features tag). Since we publicize our tech, some might ask if we aren’t afraid our posts are read by the cyber-swine. It’s a bit of a concern. But more important for us is users getting a better understanding of how their (our) protection works, and also what motivates the cyber-scoundrels and what tricks they use in their cyber-bogusness.

Anyway, today we’ll be adding another very important entry to this tech-tome – one on Astraea technology. This is one of the key elements of our KSN cloud system (video, details), which automatically analyzes notifications from protected computers and helps uncover hitherto unknown threats. In actual fact Astraea has a lot of other plusses going for it – plusses that our security analysts have, for a while already, been unable to imagine their working day without. So, as per my techie-blog post tradition, let me go through it all for you – step by step…

Let’s start with another key statistic from the BBB: 60 million (more than). That’s the number of folks who today use KSN. And when I say use, I mean constantly exchange with the cloud information about suspicious files, sites, system events, detections and lots more besides, all of which comes under the title “the epidemiological environment on the Internet”!

Analyzing this huge KSN flow manually at the required tempo is, as you might guess, practically impossible. It’s like looking for a needle in a haystack. However, since that needle (and a highly valuable one at that) is in fact in (out?) there, searching for it is worthwhile, and solving this task is basically a matter of software engineering excellence.

Turns out that, with the right approach to the processing of such a flow, it’s possible to kill three proverbial birds with one stone: (i) to quickly, effectively and with a minimal amount of effort detect malware; (ii) to build up a highly valuable statistical base for keeping one’s finger on the proverbial malware pulse to keep up with trends in the field of virus writing; and (iii) to create a constantly developing automatic expert system able to automatically release “treatments” – with false positives kept to a bare minimum.

So there you have it! You now know the basic tenets of Astraea – a system for processing colossal volumes of data in order to extract from it required specific results, a.k.a. big data, a.k.a. autosearching for the needle in the haystack.

And now – to completely finish you off – yet more figures for you: more than 150 million KSN notifications run through Astraea every day, and out of those, ten million objects (files and websites) are given ratings!

So how does it work?

At the first stage, taking a big leaf out of the how-to-do-crowdsourcing book, Astraea gets notifications about suspicious files and sites from KSN participants. All the events are automatically analyzed and ranked from the standpoint of both significance (how prevalent and popular objects are) and danger. The level of danger is calculated on the basis of dynamically changing weights, meaning that between the notifications and the expert system there’s always feedback. The list of weights today is populated by several hundred criteria, which are regularly adjusted and readjusted by our analysts, and the list itself is updated. In essence the list represents a big chunk of knowledge of a qualified security analyst – a set of rules under which malware has a good chance of being spotted.
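
To make the weighting idea a bit more concrete, here’s a minimal Python sketch. It assumes integer “danger points” per criterion and a very simple feedback step. The criterion names, the point values and both function names are hypothetical, made up purely for illustration, and are not taken from Astraea itself.

```python
# Hypothetical criteria and point values (assumptions for illustration only;
# the post says the real list holds several hundred criteria).
WEIGHTS = {
    "unsigned": 30,
    "in_autolaunch": 25,
    "suspicious_packer": 35,
    "rarely_seen": 10,
}


def danger_score(observed, weights):
    """Add up the points of every criterion a notification triggered (max 100)."""
    return min(100, sum(points for name, points in weights.items() if name in observed))


def adjust_weight(weights, name, confirmed_malicious, step=5):
    """Feedback step: return a copy of the weights with one criterion nudged
    up when a verdict was confirmed, or down when it turned out to be wrong."""
    updated = dict(weights)
    delta = step if confirmed_malicious else -step
    updated[name] = max(0, min(100, updated[name] + delta))
    return updated


notification = {"unsigned", "in_autolaunch"}
print(danger_score(notification, WEIGHTS))  # 55

# An analyst confirms that "unsigned" was a good signal, so its weight grows.
tuned = adjust_weight(WEIGHTS, "unsigned", confirmed_malicious=True)
print(danger_score(notification, tuned))  # 60
```

The point of returning a copy in `adjust_weight` is that old and new weight lists can be compared side by side before the updated one goes live.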

At the last stage Astraea returns its calculated rating back to KSN, where it becomes accessible to all users of our products, and in this way the chain closes up. Moreover, the bigger the statistical base, the more likely the uncovering and suppression of new malware outbreaks.

Thanks to the statistics we have on the behavior of malware on users’ computers, Astraea knows all about malware features – like the absence of digital signatures, presence in autolaunch, use of certain packers, etc. And when Astraea starts to receive notifications indicating that new files have malware features, it lowers the rating of the “warrant” for these files accordingly as per the accumulated data. As a result, when the rating of files reaches a critical threshold, the system marks them as all-out malicious, produces the necessary signatures, and transfers those signatures to users via KSN. And all completely automatically!
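
Roughly speaking, the “rating drops until it hits a threshold” logic could look like the sketch below. It assumes a trust rating on a 0 to 100 scale that starts at 100; the penalty values, the threshold of 40 and the `FileRecord` class are all hypothetical and only there to show the mechanics.

```python
from dataclasses import dataclass

# Assumed values for illustration; the post does not give real numbers.
CRITICAL_THRESHOLD = 40
PENALTIES = {"unsigned": 15, "in_autolaunch": 20, "suspicious_packer": 30}


@dataclass
class FileRecord:
    file_hash: str
    trust: int = 100          # 100 = fully trusted, 0 = no trust left
    verdict: str = "unknown"

    def apply_notification(self, features):
        """Lower the trust rating for each malware feature reported,
        and mark the file malicious once the threshold is reached."""
        penalty = sum(PENALTIES.get(feature, 0) for feature in features)
        self.trust = max(0, self.trust - penalty)
        if self.trust <= CRITICAL_THRESHOLD:
            self.verdict = "malicious"
        return self.verdict


record = FileRecord("example-hash")
print(record.apply_notification(["unsigned"]))                            # unknown
print(record.apply_notification(["in_autolaunch", "suspicious_packer"]))  # malicious
print(record.trust)                                                       # 35
```

In this toy version every notification counts in full, so repeated reports of the same feature keep lowering the rating; a real system would presumably weigh that against how many distinct users reported it.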

In a similar way the system conducts a preemptive search of malicious sites. It detects resources similar to previously revealed malicious hosts or sites pretending to be legitimate ones. Here too there’re a lot of criteria; for example, concurrence of e-mail addresses or the name of the owner, the date of registration of the resource, the presence of untrusted files on the host, etc.
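
As a rough illustration of checking a new site against previously revealed malicious ones, here’s a small sketch using the criteria named above. The dictionary layout, the function name and the assumption that a registration within the last 30 days counts as suspicious are all mine; the post only says the registration date is one of the criteria.

```python
from datetime import date


def suspicion_reasons(site, known_bad_sites, today, recent_days=30):
    """Return a list of human-readable reasons why a site looks suspicious."""
    bad_emails = {bad["registrant_email"] for bad in known_bad_sites}
    bad_owners = {bad["owner"] for bad in known_bad_sites}

    reasons = []
    if site["registrant_email"] in bad_emails:
        reasons.append("same registrant e-mail as a known malicious site")
    if site["owner"] in bad_owners:
        reasons.append("same owner as a known malicious site")
    if (today - site["registered"]).days <= recent_days:
        reasons.append("registered very recently")  # assumed heuristic
    if site["untrusted_files"] > 0:
        reasons.append("hosts untrusted files")
    return reasons


known_bad = [{"registrant_email": "admin@bad.example", "owner": "J. Doe"}]
candidate = {
    "host": "shiny-new.example",
    "registrant_email": "admin@bad.example",
    "owner": "Someone Else",
    "registered": date(2012, 11, 1),
    "untrusted_files": 2,
}

for reason in suspicion_reasons(candidate, known_bad, today=date(2012, 11, 15)):
    print(reason)
```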

What’s important here is that the system doesn’t simply calculate ratings for files and sites; it correlates them so as to obtain more accurate verdicts. Thus, it’s logical to assume that a file downloaded from a site that was earlier noted in the distribution of malware receives a lower rating than a file downloaded from a “clean” site.
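
One simple way to picture this correlation is a weighted blend of the two ratings. The sketch below assumes both ratings live on a 0 to 100 scale (higher means more trusted) and that the source site contributes 30% of the final figure; the function name and the 30% share are assumptions for illustration, not how Astraea actually does it.

```python
def correlated_rating(file_rating, site_rating, site_share_percent=30):
    """Blend a file's own rating with the rating of the site it came from.

    Both ratings are assumed to be integers from 0 (bad) to 100 (clean).
    """
    file_share_percent = 100 - site_share_percent
    return (file_rating * file_share_percent + site_rating * site_share_percent) // 100


# The same file, once downloaded from a site with a bad history...
print(correlated_rating(file_rating=80, site_rating=10))  # 59
# ...and once from a "clean" site.
print(correlated_rating(file_rating=80, site_rating=90))  # 83
```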

It goes without saying that Astraea saves the whole history of interaction with KSN, which helps us then react to an outbreak at the moment it arises and locate its primary source, and also track its development – in both time and geography (which countries). Besides, these data can be used (i) to create specific reports and analyze trends, practically of any level of customization – different “tops” per countries, hosts, files, malware families, etc. (plus cross-referenced reports); (ii) for forecasting the development of cybercriminal activity in the profile of attacks in different industries; and (iii) for forecasts of the tempo of growth of specific maliciousness in its respective behavior and attacking platform profiles.
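
The different “tops” are, at heart, just counting. Here’s a minimal sketch with Python’s standard `collections.Counter`; the event fields and the sample data (country codes, family names) are invented for the example.

```python
from collections import Counter


def top_by(events, key, n=3):
    """Top-n values of one field, e.g. the most affected countries."""
    return Counter(event[key] for event in events).most_common(n)


def cross_top(events, key_a, key_b, n=3):
    """Cross-referenced top-n, e.g. (country, family) pairs."""
    return Counter((event[key_a], event[key_b]) for event in events).most_common(n)


# Invented sample detections, purely for illustration.
events = [
    {"country": "RU", "family": "FamilyA"},
    {"country": "RU", "family": "FamilyA"},
    {"country": "RU", "family": "FamilyB"},
    {"country": "DE", "family": "FamilyA"},
    {"country": "DE", "family": "FamilyB"},
    {"country": "BR", "family": "FamilyB"},
]

print(top_by(events, "country", n=2))               # [('RU', 3), ('DE', 2)]
print(cross_top(events, "country", "family", n=1))  # [(('RU', 'FamilyA'), 2)]
```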

But there’s more!

Astraea is also a system of proactive detection. That is, it can detect not only already known threats, but also planned threats still just appearing in the heads of virus writers! By possessing a huge database of knowledge about how malware behaves in the real world, we can come up with behavior templates and add them to KSN too. The reaction time to new threats is currently 40 seconds; but with the proactive approach it will be equal to zero!

Another pro of Astraea: minimization of false positives.

On the one hand, the system works with both a gigantic statistical base and a highly honed mathematical model, which together permit bringing the quantity of false detections down to a minimum. Since 2010, when Astraea stepped up for battle duty, our specialists haven’t been able to recall a single more or less significant incident.

On the other hand, a mechanism controlling the human factor is built into the system. It automatically checks, on the fly, each attempt by a security analyst to add a new entry to the black or white list.

A couple of simple examples:

File “ABC” is on the list of clean files (white list), but suddenly Astraea receives a notification that our product has found a Trojan in it. The system finds the faulty signature, flags the detection as a false positive, and initiates the process of testing and correcting the detection.

Or like so: a security analyst in some mad rush of passion (or hangover) adds the file “XYZ” to the blacklist. However, the file already is on the whitelist. The system tells the analyst that he probably got a little too worked up (or drunk last night), and doesn’t permit the addition of the new entry until the conflict is sorted out.
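
Both examples boil down to a consistency check between the two lists. Here’s a minimal sketch of that idea; the `VerdictLists` class, its method names and the returned strings are hypothetical and only illustrate the behavior described above.

```python
class ListConflict(Exception):
    """Raised when an entry would land on both lists at once."""


class VerdictLists:
    def __init__(self):
        self.whitelist = set()
        self.blacklist = set()
        self.false_positive_queue = []  # detections to re-test and correct

    def add_to_whitelist(self, file_id):
        if file_id in self.blacklist:
            raise ListConflict(f"{file_id} is already on the blacklist")
        self.whitelist.add(file_id)

    def add_to_blacklist(self, file_id):
        # Refuse the new entry until the conflict is sorted out.
        if file_id in self.whitelist:
            raise ListConflict(f"{file_id} is already on the whitelist")
        self.blacklist.add(file_id)

    def report_detection(self, file_id, signature):
        """A product reports that `signature` fired on `file_id`."""
        if file_id in self.whitelist:
            self.false_positive_queue.append((file_id, signature))
            return "suspected false positive"
        return "detection accepted"


lists = VerdictLists()
lists.add_to_whitelist("ABC")
lists.add_to_whitelist("XYZ")

# Example 1: a Trojan signature fires on the whitelisted file "ABC".
print(lists.report_detection("ABC", "hypothetical-trojan-signature"))

# Example 2: an analyst tries to blacklist the whitelisted file "XYZ".
try:
    lists.add_to_blacklist("XYZ")
except ListConflict as conflict:
    print("Rejected:", conflict)
```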

Actually, Astraea on the whole is a system that’s expanding all the time, and there are simply too many examples of this to describe here.

With Astraea what we do is actively “dig” both wide and deep. We modernize the mathematical model of analysis of data, add new and reappraise existing criteria, bring in new technologies for raising the speed and quality of finding threats, and put into operation adjacent systems for building complex correlations. In general, our plans, as usual, are ambitious and far-reaching, but this can’t be a bad thing :). And since we’re at a peak of patent trollism, we’re steadily patenting these tasty morsels. Out of those already patented we have minimizing false positives, warning about virus outbreaks, and detecting previously unknown threats.

Comments (6)

Namit k

Sounds like a breakthrough to me! And my mind wants to know more about it.. Hence request more tech bytes on it.


Harsh V Sharma

Lol! That is the first word that came to my mind after knowing about the technology you are using. It’s like from the movies like “Skynet” from Terminator… or like “Aarya” from Eagle Eye. In short, you have created an Artificial Intelligence System for Malware…! Keep it Up! :D


Harsh V Sharma

You must be using a Super Computer for that I guess. What is the name of your Super Computer? :D


Laurie chmiel

How, exactly, do I do to find and kill a console controlled chunk of malware? It uses Ichat, Itunes, Iphoto, terminal, and Bluetooth. And my hotspot. My upstairs neighbors airport express and a “device” plus when I go on line. I search for “sqilite”, delete.*, delete. Java script disabled, airport hotspot programmed for monk 3 MAC addresses


Laurie chmiel

Cut me off. Hmmm. Anyway I’ve been messing with this for at least a year. I’ve wiped x5,~ bought mountain lion and installed Kaspersky. It makes a fake one so I think its working. Same with sophos. It had to brute force thru hotspot. It knows how to find me, MAC ADDRESS. I filter it. Nothing. I use an IP in Belgium, and switch. Often. I have logs and files, I don’t know what you need.

Laurie chmiel


When I wipe drive, reinstall OS and it drops. Its package in spotlight. Then I in not allowed to upgrade again.


Peter Mathia (@Peter_Mathia)

Do you use GPUs in this system? If yes, which?
