With handheld technology growing at an exponential rate, almost every advancement in the digital world gets more attention than ever, largely thanks to the ever-growing mobile ecosystem. There are now over two billion mobile devices across the world (including feature phones and tablets).
When it comes to the operating system (OS) that powers these smartphones, Google’s Android has clearly emerged as the winner over Apple’s iOS and Microsoft’s Windows Phone, with a lion’s share of 80 percent. This success and popularity can be attributed to the ease with which developers can build applications for Android and to its open-source availability.
With popularity comes a darker side: the vulnerabilities in Android. Because the OS is an open-source platform, it has become a breeding ground for miscreants who develop malware applications that exploit security and other flaws, leaving critical information such as user data out in the open and compromising privacy. Although Google has made a concerted effort to curb malware applications in recent years, it cannot be said to have eliminated the threat completely.
Discerning Malware Through Machine Learning
Studies and methods to detect malware in Android date back to its release in 2008. A plethora of software tools such as sandboxes and debuggers have been developed and used for malware analysis since then. However, with the staggering rise in malware outbreaks in recent times, the problem is difficult to contain with these tools alone. This prompted computer scientists and researchers to come up with machine learning (ML) methods.
The earliest studies explored ML techniques such as classification to differentiate harmful applications from genuine ones. Android Package Kits (generally known as ‘.apk’ files), which are Android’s application files, were extensively tested with ML algorithms to look for malicious code. These studies also analysed the code for discrepancies that may leave applications vulnerable to attacks.
The challenge, however, lies in capturing the right features of an application for ML. To tackle this, some studies used support vector machines (SVMs) to identify different classes of malware. These studies also used control flow graphs (CFGs) to represent the flow of an application and thereby boost detection. What was evident from this work was that feature extraction became easier than in previous studies. Subsequent research on identifying Android malware through ML turned to Bayesian classification approaches, which achieved significantly higher detection accuracy.
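To make the idea of Bayesian classification concrete, here is a minimal sketch of a Bernoulli naive Bayes classifier written in plain Python. It is not the method of any particular study: the feature names, the four training samples and the helper names `train_naive_bayes` and `predict` are all assumptions made for illustration, and each application is reduced to the set of features it exhibits.

```python
import math
from collections import defaultdict


def train_naive_bayes(samples, labels):
    """Fit a Bernoulli naive Bayes model on sets of binary features."""
    vocabulary = sorted(set().union(*samples))
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))
    for features, label in zip(samples, labels):
        class_counts[label] += 1
        for feature in features:
            feature_counts[label][feature] += 1

    total = len(samples)
    model = {"vocabulary": vocabulary, "priors": {}, "likelihoods": {}}
    for label, count in class_counts.items():
        model["priors"][label] = count / total
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        model["likelihoods"][label] = {
            f: (feature_counts[label][f] + 1) / (count + 2) for f in vocabulary
        }
    return model


def predict(model, features):
    """Return the label with the highest log posterior for a feature set."""
    best_label, best_score = None, -math.inf
    for label, prior in model["priors"].items():
        score = math.log(prior)
        for f in model["vocabulary"]:
            p = model["likelihoods"][label][f]
            # Both the presence and the absence of a feature are evidence.
            score += math.log(p if f in features else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: each app is the set of features observed in it.
train_samples = [
    {"SEND_SMS", "READ_CONTACTS", "INTERNET"},
    {"SEND_SMS", "RECEIVE_BOOT_COMPLETED", "INTERNET"},
    {"INTERNET", "CAMERA"},
    {"INTERNET", "ACCESS_FINE_LOCATION"},
]
train_labels = ["malware", "malware", "benign", "benign"]

model = train_naive_bayes(train_samples, train_labels)
print(predict(model, {"SEND_SMS", "INTERNET"}))
print(predict(model, {"INTERNET", "CAMERA"}))
```

A real detector would be trained on thousands of labelled applications and far richer features; the sketch only shows how per-feature probabilities combine into a verdict.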
Researchers have also looked into app permissions to see whether ML methods are effective here. The results were fairly good, with detection accuracies of up to 90 percent. Some developers have even started integrating ML into sandboxes, which has the potential to deal with vulnerabilities in applications’ online services as well.
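A figure such as 90 percent accuracy is easier to interpret alongside the detection rate and the false positive rate. The sketch below computes all three from a list of true labels and a list of predictions. The function name `detection_metrics` and the ten-app example are assumptions made for illustration and are not results from any study.

```python
def detection_metrics(true_labels, predicted_labels, positive="malware"):
    """Compute accuracy, detection rate and false positive rate."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == positive and guess == positive:
            tp += 1  # malware correctly flagged
        elif truth == positive:
            fn += 1  # malware that slipped through
        elif guess == positive:
            fp += 1  # genuine app wrongly flagged
        else:
            tn += 1  # genuine app correctly passed
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Hypothetical example: 4 malicious and 6 genuine apps, one malware missed.
truth = ["malware"] * 4 + ["benign"] * 6
predicted = ["malware"] * 3 + ["benign"] * 7

print(detection_metrics(truth, predicted))
```

In this made-up example the accuracy is 90 percent even though one in four malicious apps goes undetected, which is why accuracy alone can flatter a detector when genuine apps outnumber malicious ones.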
Types Of Malware Detection
Malware detection on computers is generally divided into two types: static analysis and dynamic analysis. The same applies to Android. Static analysis examines the functionality of an application or file without executing it, whereas dynamic analysis runs the file on a computer (or in a sandbox) to investigate the behaviour of the malware in depth.
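As a small illustration of static analysis, the sketch below reads the permissions an app requests from its manifest without ever running the app. It assumes the manifest has already been decoded to plain-text XML (inside a real .apk it is stored in a binary format, so a decoding tool such as apktool is needed first). The sample manifest, the package name and the helper name `requested_permissions` are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace used for android:* attributes in a manifest.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical, already-decoded manifest used only for this example.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.hypothetical">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <application android:label="Demo" />
</manifest>"""


def requested_permissions(manifest_xml):
    """Return the sorted permissions declared via <uses-permission>."""
    root = ET.fromstring(manifest_xml)
    name_attr = f"{{{ANDROID_NS}}}name"
    return sorted(
        {
            element.get(name_attr)
            for element in root.iter("uses-permission")
            if element.get(name_attr)
        }
    )


print(requested_permissions(SAMPLE_MANIFEST))
```

The resulting list is the kind of static feature that a permission-based classifier could consume; dynamic analysis would instead record what the app actually does while running.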
Since modern Android malware is loaded with extra elements such as evasion techniques, many recent ML studies focus on dynamic analysis. Static analysis can mitigate some of the effects of malware, but it struggles against techniques such as packing and encryption, and may therefore miss ‘traces’ that leave a device susceptible to attack.
Another form of malware that has become quite common lately is the botnet. What was once widespread on computers has now spread to mobile devices such as smartphones. Networks of devices infected by botnets can act independently and thus pose a peril to the mobile ecosystem. Botnet attacks are generally carried out without the user’s knowledge, and are now being deployed on smartphones powered by Android. Infected devices can be used to steal critical data or to launch distributed denial of service (DDoS) attacks such as HTTP Flood and Ping Flood, regardless of the mobile platform. Although ML-based detection has emerged to tackle DDoS, it has yet to make a mark in the mobile space.
ML models bring a proactive approach to eliminating malware. But as mentioned earlier, feature extraction still has considerable room for improvement. For example, part of a malware sample may go undetected if its features do not closely match those seen during training. A robust ML framework is therefore always recommended to counter modern shapeshifting malware and the damage it can inflict on sensitive information.
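One simple way to notice a feature mismatch is to measure how much of a new sample the model has never seen before. The sketch below does this for set-based features. The helper name `unseen_feature_ratio` and all feature names are assumptions made for illustration.

```python
def unseen_feature_ratio(training_samples, sample):
    """Fraction of a sample's features that never appeared in training."""
    known = set().union(*training_samples)
    features = set(sample)
    if not features:
        return 0.0
    return len(features - known) / len(features)


# Hypothetical training features and a new sample with unfamiliar traits.
training = [{"INTERNET", "SEND_SMS"}, {"INTERNET", "CAMERA"}]
new_sample = {"INTERNET", "LOAD_NATIVE_CODE", "DYNAMIC_DEX"}

print(round(unseen_feature_ratio(training, new_sample), 2))
```

A high ratio suggests the model is judging the sample on only a small part of its behaviour, so its verdict deserves less trust and the sample may warrant closer analysis.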