Machine learning in cybersecurity – an overview
ANYONE with even a passing involvement with cybersecurity – from junior helpdesk staff upwards – will be aware that the most susceptible points of failure in any network are the carbon-based entities operating computerized technology.
To you and me, that’s the human beings pressing buttons and tapping at screens.
And even the most battle-hardened cybersecurity officer will, at some stage of their life, have at least begun to click on a link found in an email of dubious provenance.
BYOD and the increased digital fluency of every organization’s users have increased the pressure on security teams, as users spend longer online, on more devices, doing more complex tasks, and interacting more with others on the LAN and the internet.
Endpoint security, therefore, is increasingly important. Traditional systems to protect endpoints largely work on the basis of signature detection and methods inherited from their firewall cousins, such as blacklists, whitelists, and databases of known exploits.
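To make the signature-based approach concrete, here is a minimal sketch in Python. It assumes a signature is simply the SHA-256 digest of a file's bytes; the blocklist contents and the `is_known_malware` name are hypothetical and do not come from any vendor's product.

```python
import hashlib

# Hypothetical blocklist of SHA-256 digests for known-bad files (illustrative only).
KNOWN_BAD_SHA256 = {
    hashlib.sha256(b"pretend-malware-payload").hexdigest(),
}

def is_known_malware(data: bytes, blocklist=KNOWN_BAD_SHA256) -> bool:
    """Return True if the SHA-256 digest of the data appears in the blocklist."""
    return hashlib.sha256(data).hexdigest() in blocklist

# An exact copy of a known sample is flagged.
print(is_known_malware(b"pretend-malware-payload"))
# Changing a single byte produces a different digest, so the variant is missed.
print(is_known_malware(b"pretend-malware-payload!"))
```

The second call shows the weakness of exact-match signatures: a trivially altered sample no longer matches, which is one reason suppliers look to statistical methods.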
Some endpoint protection suppliers are using machine learning to establish patterns of malicious behaviors and better recognize malware signatures – or so they claim. Few areas of technology, however, are today free from being termed “powered by AI,” or “deep-learning enabled.”
In cybersecurity, do these claims of AI/ML/DL hold water?
According to DARPA, the three “waves” of AI are:
- Handcrafted knowledge. Humans create rules which machines can follow. A simulacrum of logical reasoning can take place.
- Statistical learning. Machines perform probabilistic decisions, classifying and predicting data – but the context of those decisions is not “understood” by the machine.
- Contextual adaptation. In the third wave, the code itself can construct explanatory models for a real-life situation, and so, the systems are able to describe why characterizations (malware or harmless document) occur, just as a human might.
Most attempts to use AI/ML in cybersecurity fall into the first wave category: humans have defined rules and provided examples (this is a piece of malware, this is an executable free of viruses). The routines can identify, albeit in a limited way, what the likelihood is of an examined file being malicious.
Even at this limited level, ML-powered digital security measures still produce fewer false positives than methods not using this type of algorithm, and are an order of magnitude quicker than human involvement.
There are a few suppliers of cybersecurity solutions which offer products that operate according to DARPA’s second wave – see below for three of the leading examples – and it is therefore worthwhile to see how, in more detail, this wave operates.
The second wave of machine learning
Several factors reflect the intersection of data science and the latest cybersecurity platforms:
Runtime – where cogs whir
The training phase for machine learning can occur at the endpoint, somewhere on the local network, or can be cloud- or cluster-based. Training is usually highly processor-intensive and can sometimes take months of continuous data throughput before predictions achieve decent degrees of accuracy.
Features – what’s examined
Which specific aspects of the supplied learning materials are scrutinized? It could be file size and entropy, plus base 2 logarithms of file size, plus parsed sections of portable executables.
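As a rough illustration of the simpler features mentioned above, the following Python sketch computes file size, its base 2 logarithm, and the Shannon entropy of the byte values. The function names and the dictionary layout are assumptions made for this example; parsing portable executable sections is left out because it needs a dedicated parser.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 for empty or constant input, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def extract_features(data: bytes) -> dict:
    """Build a small feature dictionary from raw file bytes (hypothetical layout)."""
    size = len(data)
    return {
        "size": size,
        "log2_size": math.log2(size) if size else 0.0,
        "entropy": shannon_entropy(data),
    }

# Every byte value exactly once gives the maximum entropy; repeated bytes give zero.
print(extract_features(bytes(range(256))))
print(extract_features(b"AAAA"))
```

High entropy is often associated with packed or encrypted content, which is why it tends to appear as a feature, although on its own it says nothing definitive about whether a file is malicious.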
Data sets – the learning subject
The bigger the supplied data set of learning materials, the better – in general. Clearly, the data should be properly labeled (harmless, harmful, potentially harmful) and therefore of good quality. Even large amounts of data are of little use if they are unbalanced (too many positives in a data set, for example); the data should also be representative of what the solution will encounter once it goes into live deployment.
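A quick balance check on the labels can be sketched as follows. The 0.8 threshold and the function names are arbitrary assumptions for illustration, not a recommended standard.

```python
from collections import Counter

def label_shares(labels):
    """Return each label's share of the data set as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def is_unbalanced(labels, max_share=0.8):
    """Flag a data set in which any single label exceeds max_share (arbitrary threshold)."""
    return any(share > max_share for share in label_shares(labels).values())

# A made-up label list in which 95 percent of samples are marked harmful.
labels = ["harmful"] * 95 + ["harmless"] * 5
print(label_shares(labels))
print(is_unbalanced(labels))
```

A check like this only covers balance; whether the samples resemble what will be seen in live deployment still has to be judged separately.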
Human interaction – overseeing
Getting human involvement into the model creation can save time and (literally) power. Anscombe’s quartet is a significant example of how four very different data sets can produce practically the same fitted model and summary statistics when left unattended – with only one of them actually well described by that model.
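The effect can be reproduced in a few lines of Python. The sketch below uses the published Anscombe values and a plain least-squares fit; the `fit_line` helper and the dictionary layout are choices made for this example.

```python
# Anscombe's quartet: sets I to III share the same x values, set IV has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Each set yields a line very close to y = 0.5x + 3, despite very different shapes.
for name, (xs, ys) in ANSCOMBE.items():
    slope, intercept = fit_line(xs, ys)
    print(f"Set {name}: slope={slope:.2f}, intercept={intercept:.2f}")
```

Only a plot, or a person looking at the raw values, reveals that set II is a curve, set III has one outlier, and set IV is driven by a single extreme point, which is the case for keeping a human in the loop.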
Carbon Black
Massachusetts-headquartered Carbon Black expresses one of 2018’s undeniable cybersecurity truths: that humans, and more specifically the devices they use, are the greatest point of cybersecurity failure in the modern enterprise.
The company’s cloud-delivered endpoint security platform addresses this with a comprehensive endpoint protection system that enables security operations and incident response (IR) teams to combat all attack vectors now open to malicious adversaries.
The solution protects all connected devices, including laptops/desktops, servers, ATMs, POS terminals and cloud workloads, from acting as a portal into the business network.
When threats do appear on any device, the software (which has been continually monitoring and storing unfiltered data in the cloud) can track the exact way in which the malicious code is behaving – even if the endpoint in question is powered off.
The attack can be visualized and the appropriate remediation measures either automated or handled on a case-by-case basis. Key here is the speed with which measures can be taken – minutes, not days. Plus, the software supplies a single place from which all endpoints’ security measures are deployed, monitored and acted upon.
Carbon Black’s offering integrates well with other systems, and API integrations allow different technologies to be brought together, making siloed cybersecurity management a thing of the past.
Cylance
Endpoint protection systems play a constant balancing game between using resources to prevent attacks and leaving the system enough power to continue to function as a usable device.
However, Cylance’s endpoint platform uses only around 1 to 2 percent of endpoint resources – most solutions clock up at least 20 percent utilization overhead.
And despite the limited computing grunt of an average enterprise endpoint, CylancePROTECT prevents over 99 percent of cyberattacks, both malware-based and fileless, before payload deployment or execution.
The company’s second wave ML cyberattack mitigation systems help prevent the majority of memory-based attacks, rogue scripts, phishing attempts, zero-day malware instances, unwanted privilege escalations, and unwanted program installation.
CylanceOPTICS (whose upgraded routines were showcased at the RSA Conference in San Francisco on April 16 this year) uses machine learning modules to identify fileless attacks, unknown zero-days, and malicious application behaviors, and to automate detection and response for prevention-first security.
While server-based push of signature and code updates to endpoints is widely adopted by Cylance’s competitors, this relies on network and/or cloud connectivity in the majority of cases: this is not necessary for Cylance installs.
Additionally, the company’s cloud-based management console simplifies oversight and makes deployment much more straightforward. This lessens required staff resources and allows network traffic overheads for cybersecurity purposes to be kept to a minimum.
The CylanceAPI is a collection of RESTful APIs which deliver secure access to malware data, administration, and investigative routines. The APIs enable integration of Cylance’s tech into existing security frameworks and workflows – adding the most advanced ML capabilities out there at present for endpoints.
Symantec
According to Gartner, Symantec is the highest achiever for execution and vision in its survey, “Endpoint Protection Platforms Magic Quadrant.”
Whether or not this plaudit resonates with your organization, Symantec’s multi-layered endpoint protection methods have been shown to reduce false positive detections. They achieve this by using both a machine learning engine and more traditional behavioral analysis.
Memory leaks or the dissemination of malicious macros can run rampant in enterprises with a homogeneous desktop environment – a single new zero-day can spread quickly if appropriate action is not taken swiftly.
The ubiquity of Microsoft Office, installed locally, for instance, has led it to become prime material for targeted malware. The greater the number of targets, the greater the rewards for the hacker.
New zero-days appear all the time, so the ability to fine tune protection is an essential component in Symantec’s endpoint protection systems.
As an additional layer, the software (locally installed or cloud-based) can create honey traps, which, when deployed, are used to examine possible attack scenarios. The information from these simulations can be used to mitigate further incursion attempts.
Like the other companies listed here, Symantec’s products are configurable through open APIs, so integration with existing security stacks is viable.
*Some of the companies featured in this article are commercial partners of Tech Wire Asia