
Machine Learning: The Easiest Way To Detect Malware In Android OS


With handheld technology growing at an exponential rate, advances in the digital world attract more attention than ever. This can largely be credited to the ever-growing mobile phone ecosystem: there are currently over two billion mobile devices across the world, including feature phones and tablets.

When it comes to the operating system (OS) that powers these smartphones, Google’s Android has clearly emerged as the winner over Apple’s iOS and Microsoft’s Windows Phone, with a lion’s share of 80 percent. This success can be attributed to its open-source availability and the ease with which developers can build applications for it.

With popularity comes a darker side: vulnerabilities in Android. Because the OS is an open-source platform, it has become a breeding ground for miscreants who develop malware applications that exploit security and other flaws, leaving critical information such as user data and privacy out in the open. Although Google has made stringent efforts to curb malware applications in recent years, it has not completely eliminated the threat.

Discerning Malware Through Machine Learning

Studies and methods for detecting malware in Android date back to the OS’s release in 2008. Since then, a plethora of software tools such as sandboxes and debuggers have been developed and used for malware analysis. However, with the staggering rise in malware outbreaks in recent times, these tools alone are no longer enough to curtail the threat. This prompted computer scientists and researchers to come up with machine learning (ML) methods.

The earliest studies explored ML techniques such as classification to differentiate harmful applications from genuine ones. Android Package Kits (generally known as ‘.apk’ files), which are Android’s application files, were extensively tested with ML algorithms to look for malicious code. These studies also analysed the code for discrepancies that may leave applications vulnerable to attacks.
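To make the idea of scanning an .apk for suspicious code concrete, here is a minimal Python sketch. It relies only on the fact that an APK is a ZIP archive whose compiled code lives in .dex files; the watch-list of API names, the helper names and the toy APK are hypothetical, and a plain byte search is a crude stand-in for the proper DEX parsing a real study would perform.

```python
import io
import zipfile

# Hypothetical watch-list of API names; an illustrative choice, not a vetted
# set of malware indicators.
SUSPICIOUS_APIS = [b"sendTextMessage", b"getDeviceId", b"getSubscriberId", b"DexClassLoader"]


def apk_string_features(apk_bytes: bytes) -> list[int]:
    """Return a 0/1 vector: does any .dex file in the APK mention each API name?"""
    found = set()
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:  # an APK is a ZIP archive
        for name in apk.namelist():
            if name.endswith(".dex"):  # classes.dex, classes2.dex, and so on
                data = apk.read(name)
                found.update(api for api in SUSPICIOUS_APIS if api in data)
    return [1 if api in found else 0 for api in SUSPICIOUS_APIS]


def make_toy_apk(dex_payload: bytes) -> bytes:
    """Build a fake, in-memory APK (just a ZIP) for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as apk:
        apk.writestr("AndroidManifest.xml", b"<manifest/>")
        apk.writestr("classes.dex", dex_payload)
    return buf.getvalue()


toy = make_toy_apk(b"dex\n035\x00 Landroid/telephony/SmsManager; sendTextMessage")
print(apk_string_features(toy))  # [1, 0, 0, 0]
```

Vectors like this are what the classifiers discussed below would take as input.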

The challenge, however, lies in capturing the right features from applications for ML. To tackle this, some studies used support vector machines (SVM) to identify different classes of malware. Along with this, these studies used representations such as the control flow graph (CFG), which captures the flow of execution within an application, to boost detection. What was evident from these studies was that feature extraction became easier than in previous work. Subsequent research on identifying Android malware through ML turned to Bayesian classification approaches, which delivered significant gains in detection accuracy.
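As a rough illustration of the SVM approach, the sketch below trains a linear SVM by stochastic subgradient descent on the hinge loss, in plain Python. It assumes each app has already been reduced to a 0/1 feature vector (for example, which suspicious APIs or permissions it uses); the toy data, learning rate and regularisation strength are invented for demonstration, and practical work would more likely rely on a library such as scikit-learn.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100, seed=0):
    """Linear SVM fitted by stochastic subgradient descent on the
    L2-regularised hinge loss. X: 0/1 feature vectors; y: labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            if y[i] * score < 1:
                # Inside the margin: step along -(lam * w - y_i * x_i).
                w = [wj - lr * (lam * wj - y[i] * xj) for wj, xj in zip(w, X[i])]
                b += lr * y[i]  # bias is left unregularised
            else:
                # Outside the margin: only the L2 penalty shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 (flag as malware) or -1 (treat as benign)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy feature vectors: [suspicious_api, feature_a, feature_b, benign_marker]
X = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0],   # malware
     [0, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]   # benign
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

A linear kernel keeps the sketch short; the studies described above may well have used other kernels and much richer feature sets, such as ones derived from CFGs.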

Researchers have also looked into app permissions to see whether ML methods are effective there. The results were fairly good, with detection accuracies of up to 90 percent. Some developers even started integrating ML into sandboxes, which has the potential to deal with vulnerabilities in applications’ online services as well.
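Permission-based detection pairs naturally with the Bayesian classifiers mentioned earlier. The following sketch is a Bernoulli naive Bayes model over a handful of permission flags, written in plain Python; the four permissions, the six labelled apps and the smoothing constant are hypothetical, and a data set this small says nothing about the accuracies reported in the literature.

```python
import math

# Short names for android.permission.* constants (hypothetical feature set).
PERMISSIONS = ["SEND_SMS", "READ_CONTACTS", "INTERNET", "RECEIVE_BOOT_COMPLETED"]


def to_vector(perms):
    """1 if the app requests the permission, else 0."""
    return [1 if p in perms else 0 for p in PERMISSIONS]


def train_bernoulli_nb(X, y, alpha=1.0):
    """Class priors and Laplace-smoothed P(permission requested | class)."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        prior = len(rows) / len(X)
        probs = [(sum(col) + alpha) / (len(rows) + 2 * alpha) for col in zip(*rows)]
        model[label] = (prior, probs)
    return model


def predict_nb(model, x):
    """Pick the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, probs) in model.items():
        score = math.log(prior)  # logs avoid underflow with many permissions
        for xi, p in zip(x, probs):
            score += math.log(p if xi else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


X = [to_vector({"SEND_SMS", "READ_CONTACTS", "INTERNET", "RECEIVE_BOOT_COMPLETED"}),
     to_vector({"SEND_SMS", "RECEIVE_BOOT_COMPLETED"}),
     to_vector({"SEND_SMS", "READ_CONTACTS", "INTERNET"}),
     to_vector({"INTERNET"}),
     to_vector({"INTERNET", "READ_CONTACTS"}),
     to_vector({"INTERNET"})]
y = ["malware"] * 3 + ["benign"] * 3

model = train_bernoulli_nb(X, y)
print(predict_nb(model, to_vector({"SEND_SMS", "RECEIVE_BOOT_COMPLETED"})))  # malware
print(predict_nb(model, to_vector({"INTERNET"})))  # benign
```

The "naive" independence assumption between permissions is rarely true, but it keeps training to simple counting.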

Types Of Malware Detection

Malware detection in computers is generally divided into two types: static analysis and dynamic analysis. The same applies to Android OS. Static analysis examines the functionality of an application or file without executing it, whereas dynamic analysis runs the file on a computer (or in a sandbox) to investigate the malware’s behaviour in depth.
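A simple example of static analysis is reading the permissions an app declares in its manifest, without ever running it. The Python sketch below assumes the binary AndroidManifest.xml inside the APK has already been decoded to plain XML (for instance with apktool); the sample manifest and its package name are made up.

```python
import xml.etree.ElementTree as ET

# Attributes such as android:name live in this XML namespace.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def declared_permissions(manifest_xml: str) -> list[str]:
    """Static analysis: list <uses-permission> names from a decoded manifest."""
    root = ET.fromstring(manifest_xml)
    return sorted(
        elem.get(ANDROID_NS + "name")
        for elem in root.iter("uses-permission")
        if elem.get(ANDROID_NS + "name")
    )


SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.flashlight">
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <uses-permission android:name="android.permission.CAMERA"/>
    <application android:label="Flashlight"/>
</manifest>"""

print(declared_permissions(SAMPLE_MANIFEST))
# ['android.permission.CAMERA', 'android.permission.SEND_SMS']
```

A flashlight app asking to send SMS is the kind of mismatch a permission-based classifier can pick up; dynamic analysis would instead observe what the app actually does at runtime.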

Since modern Android malware is loaded with extra elements such as evasion techniques, recent studies increasingly focus on dynamic analysis in ML. Static analysis, on the other hand, is hampered by techniques such as packing and encryption: it can mitigate the effects of some malware, but it may miss ‘traces’ that leave the device susceptible to attacks.
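One common static triage heuristic related to packing and encryption is byte entropy: compressed or encrypted payloads look almost random, so their Shannon entropy approaches 8 bits per byte. This hypothetical Python sketch computes it; the 7.2 threshold is an illustrative assumption rather than a standard value, and a high score only suggests that static inspection will see little and dynamic analysis may be needed.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits per byte: 0.0 for constant data, up to 8.0 for uniformly spread bytes."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def looks_packed(data: bytes, threshold: float = 7.2) -> bool:
    # Hypothetical cut-off; packed or encrypted payloads tend to sit near 8.
    return shannon_entropy(data) > threshold


plain = b"Landroid/telephony/SmsManager;" * 100  # readable, repetitive code strings
random_blob = os.urandom(4096)                   # stands in for an encrypted payload
print(looks_packed(plain))        # False
print(looks_packed(random_blob))  # True
```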

Botnets

Another form of malware that has become quite popular lately is the botnet. Once found mainly on computers, botnets have now spread to mobile devices such as smartphones, including those powered by Android. Networks of devices compromised by botnets can act independently and thus pose a peril to the mobile ecosystem. Botnet attacks are generally carried out without the user’s knowledge. Infected devices can be used to steal critical data or to launch distributed denial of service (DDoS) attacks such as HTTP floods and ping floods, regardless of the mobile platform. Although ML detection has emerged to tackle DDoS, it has yet to make its mark in the mobile space.
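Flood-style DDoS traffic is usually characterised by unusually high request rates. The sketch below turns a log of (timestamp, source IP) pairs into per-window request counts, the kind of feature an ML-based detector might consume, and then applies a fixed threshold as a simple non-ML baseline. The log format, window size, threshold and addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical.

```python
from collections import Counter, defaultdict


def requests_per_window(log, window_s=10):
    """Group (timestamp_s, source_ip) records into fixed windows and count
    requests per source in each window."""
    counts = defaultdict(Counter)
    for ts, src in log:
        counts[int(ts // window_s)][src] += 1
    return counts


def flag_flooders(log, window_s=10, max_requests=50):
    """Non-ML baseline: sources exceeding max_requests in any single window."""
    flagged = set()
    for per_src in requests_per_window(log, window_s).values():
        flagged.update(src for src, n in per_src.items() if n > max_requests)
    return flagged


# Toy log: an ordinary client and a bot hammering the server for 20 seconds.
log = [(t * 2.0, "192.0.2.10") for t in range(5)]
log += [(t * 0.05, "192.0.2.66") for t in range(400)]
print(sorted(flag_flooders(log)))  # ['192.0.2.66']
```

An ML detector would replace the fixed threshold with a classifier trained on such counts and other traffic features.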

Conclusion

ML models bring a proactive approach to eliminating malware. But, as mentioned earlier, feature extraction has yet to see strong improvement. For example, a particular component of a malware sample may escape detection if its features do not closely match those seen during training. Therefore, a robust ML framework is recommended to counter modern shapeshifting malware and the adversarial damage it can inflict on sensitive information.



Abhishek Sharma

I research and cover the latest happenings in data science. My fervent interests are in the latest technology and humor/comedy (an odd combination!). When I'm not busy reading about these subjects, you'll find me watching movies or playing badminton.
