September 18, 2012
Crowdsourcing in Security.
To think of all the yummy stuff the Internet has brought us, though interesting, would probably be a waste of time: by the time you’d have finished totting up all the scrumptiousness you remember, just as much new scrummyness would have appeared. But there is one particular Internet-delicacy concept that, due to its importance and value, should really never be overlooked, even in just a “Best Hits” of the Internet. This concept deserves closer consideration. And this concept is crowdsourcing.
I won’t go into loads of detail – you can get that at the other end of the Wikipedia link above (incidentally, Wikipedia is also a crowdsourcing project :) or via a search engine. Here, let me briefly go through the idea:
The WWW permits large numbers of folks from all over the world to very quickly all get together and combine efforts to solve some kind of difficult task or other. The result is collective intelligence, backed up by gigahertz, gigabytes and terabytes of computers and communication channels. Technically, it’s all about the sharing and allocation of computing power. For example, I remember well how at the end of the nineties many folks connected their computers at night to SETI@Home – a non-commercial project that searched for radio signals of extraterrestrial civilizations. The project is still going, with 1.2 million participants and a total processing power running up to 1.6 petaflops.
Perhaps surprisingly, you’ll generally find network crowdsourcing being applied in practically every sphere of life. And security is no exception. Recent examples: the international brainstorming that went into solving the Duqu Framework, and into trying to crack the mystery of the encrypted Gauss payload. (For the former, by the way, we received a rather flattering write-up on darkreading.com.) Still, these cases aren’t really the best examples of crowdsourcing at work…
The best example is probably to be found in the way we (KL) successfully process 125,000 samples of malware every day (up from 70,000 late last year). Of course, robots and other technologies of automation and data-flow analysis help, but the most important ingredient to make it all work – the statistical food – is furnished by you! Yes, you! The system’s a big you-scratch-my-back, I’ll-scratch-yours gig in which our users help both us and one another in the business of preventing cyber break-ins around the world, and in particular of tackling unknown threats. And everyone helps anonymously and voluntarily after having clearly expressed a willingness to take part, and none of it affects computer performance!
Let me tell you how it works.
The technological infrastructure for our crowdsourcing is our cloud-based KSN service. You can sign up for it when installing KIS, and enable/disable at any time in the settings. Let’s examine how crowdsourcing works with KSN.
Let’s say a software developer creates a program and uploads it to the Internet. The program looks interesting to users and they start to download and install it. When the program is run the traditional antivirus scanner detects no evil; however, proactive protection shows up some suspicious activity. KIS then sends a brief report (without any personal data whatsoever) about this activity to KSN. There it is checked against our analytical systems and knowledge bases, together with millions of other incomings from KSN participants, and a verdict is then sent back to the user’s computer as to whether the program could contain new, previously unknown malware intending to attack the system. If a definite verdict can’t be reached, users can check out the reputation of the program and take a decision accordingly as to whether the program’s safe or not. The whole process of checking in the cloud takes just a few seconds (or quicker – depending on the speed of the connection and how busy KSN is).
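To make that flow a bit more concrete, here is a minimal sketch in Python. It is an illustration only, not how KSN is actually built: the names `Report`, `Verdict` and `cloud_verdict`, and the `min_reports` and `threshold` values, are all assumptions invented for the example. The idea it shows is simply that anonymous reports about the same program are pooled, and a verdict comes back only when the crowd's evidence is strong enough.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    MALICIOUS = "malicious"
    CLEAN = "clean"
    UNKNOWN = "unknown"  # no definite verdict: the user falls back on reputation


@dataclass(frozen=True)
class Report:
    """A brief, anonymous report: it identifies the program, never the user."""
    file_hash: str
    suspicious: bool  # did proactive protection flag the program's behaviour?


def cloud_verdict(reports, file_hash, min_reports=5, threshold=0.8):
    """Pool all reports about one program and return a verdict.

    min_reports and threshold are hypothetical tuning values.
    """
    relevant = [r for r in reports if r.file_hash == file_hash]
    if len(relevant) < min_reports:
        return Verdict.UNKNOWN  # too little evidence from the crowd so far
    share = sum(r.suspicious for r in relevant) / len(relevant)
    if share >= threshold:
        return Verdict.MALICIOUS
    if share <= 1 - threshold:
        return Verdict.CLEAN
    return Verdict.UNKNOWN  # the crowd is split
```

With nine suspicious reports out of ten for the same hash, `cloud_verdict` returns `Verdict.MALICIOUS`; with only a couple of reports it returns `Verdict.UNKNOWN`, which corresponds to the case where the user checks the program's reputation instead.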
Websites are checked in the same way. If, for example, the anti-phishing heuristic module detects possibly evil intentions, the site’s URL is sent to KSN to be analyzed. Then, based on analysis of reports of other cloud participants, the verdict is given right in your browser window (if the URL Advisor module is turned on).
It’s true that with this form of crowdsourcing some problems soon arise. Really tricky suspicious items that can’t be analyzed automatically (and there are thousands of those every day) require the intervention of an analyst. Yes, good old doing things manually. But faced with thousands of suspicious files, all the same in terms of their potential threat, which ones does the analyst start with? He/she could of course simply process them in the order they arrived, and indeed many antivirus companies still do that… and learn about cyber outbreaks from the news :) You’ve guessed it, we sure don’t do things so lamely. Here a technology is needed to establish priorities for the incoming stream of suspicious objects. And here crowdsourcing rears its clever head again.
So how do we sort the feedback from KSN participants? On the basis of source rating, of course! I think you’ll agree it’s logical to assume that the personal verdict of a tech-savvy expert user should have a higher rating than that of a beginner. But how can you tell them apart? Who’s the expert and who’s the beginner?
Well, for several years already we’ve been developing technology that categorizes users (for which we recently obtained three patents in the USA – 8209758, 8214904, 8214905). The aim is to evaluate the expertise level of users based on a range of indicators that reveal themselves, first, upon installation of the antivirus software (static evaluation) and, second, during its use (dynamic evaluation).
For the former we look at the installation type selected (normal or advanced), whether the interactive mode is activated, and other non-personalized things. For the latter, we analyze the user’s KSN feedback, including the quantity and quality of reported threats, the share of false negatives, reaction speed, and more. Of course, the higher the rating of the source, the higher the weight of that source’s evaluation and, accordingly, the higher the priority for processing the data it sends.
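Here is a rough sketch in Python of how a source rating could drive the analyst's queue. To be clear, this is not the patented technology: the `Source` fields, the 0.3/0.7 split between static and dynamic evaluation, and the use of confirmed reports as a stand-in for "quality" are all assumptions made up for illustration. What it shows is the general principle: each source gets a rating, every report adds its source's rating to the suspicious object's weight, and the heaviest objects go to the front of the line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Non-personalized indicators about one KSN participant (hypothetical)."""
    advanced_install: bool   # static: advanced rather than normal installation
    interactive_mode: bool   # static: interactive mode switched on
    confirmed_reports: int   # dynamic: reports later confirmed as real threats
    total_reports: int       # dynamic: all reports sent so far


def source_rating(src):
    """Combine static and dynamic evaluation into a rating between 0 and 1."""
    static = 0.5 * src.advanced_install + 0.5 * src.interactive_mode
    dynamic = src.confirmed_reports / src.total_reports if src.total_reports else 0.0
    return 0.3 * static + 0.7 * dynamic  # assumed weights, for illustration only


def prioritize(reports):
    """Order suspicious objects for analysts, highest combined weight first.

    reports is an iterable of (object_id, Source) pairs.
    """
    weight = {}
    for object_id, src in reports:
        weight[object_id] = weight.get(object_id, 0.0) + source_rating(src)
    return sorted(weight, key=lambda object_id: weight[object_id], reverse=True)
```

In this toy model a single report from a highly rated source can outrank several reports from beginners, which is exactly the effect wanted when thousands of files otherwise look equally suspicious.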
Turns out the share of users deemed “expert” is only around 5% of the total number of KSN members. All the same, this still represents several million amply qualified folks with the required level of know-how to uncover unknown threats. In the future, maybe even in the next version of KIS, we’re thinking of giving especially gifted KSN members badges that could be used to decorate the interface of the product and maybe even for boasting on Facebook!
So there you have it – just how users come together to do the crowdsourcing thing and pool their strengths to be able to do effective battle against cyber break-ins! The important thing is that every participant, independent of his/her own personal input, gets the overall benefit of KSN’s crowdsourcing on the whole. Nice.