Google Doubles Down On Spammers With TensorFlow

Junk mail, or spam, comes in different levels of potency. Its effects can range from mere annoyance to, at the extreme, shutting down nuclear power plants. Spam emails are sent out in bulk by spammers and cybercriminals who are looking to do one or more of the following:

  • Make money from the small percentage of recipients that actually respond to the message
  • Run phishing scams to obtain passwords, credit card numbers, bank account details and more
  • Spread malicious code onto recipients’ computers

Nowadays, spammers are as inventive as the cybersecurity professionals who fight them. They send fake unsubscribe messages in an attempt to collect active email addresses, so clicking ‘unsubscribe’ may simply increase the amount of spam one receives.

With more than 1.5 billion email accounts, Gmail is a prime target for phishing attacks. Google’s spam detection mechanism, however, has become highly effective at thwarting them.

99.95% Is Not Enough

A couple of years ago, Google announced that it blocks 99.95% of spam emails.

Gmail has been using AI in addition to rule-based filters for years. While rule-based filters can block the most obvious spam, machine learning looks for new patterns that might suggest an email is not to be trusted. Algorithms trained in this way balance a huge number of metrics, everything from the formatting of an email to the time of day it’s sent.
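The split between fixed rules and learned signals can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Gmail's filter: the rule phrases, the two signals (share of capital letters, time of day) and the weights in toy_score are all hypothetical, and toy_score stands in for a trained model.

```python
import re
from dataclasses import dataclass


@dataclass
class Email:
    subject: str
    body: str
    hour_sent: int  # 0-23, hour at which the message was sent


# Hypothetical hand-written rules: phrases that are blocked outright.
RULE_PATTERNS = [
    re.compile(r"free money", re.IGNORECASE),
    re.compile(r"claim your prize", re.IGNORECASE),
]


def matches_rule(email):
    """Rule-based stage: catches only the most obvious spam."""
    text = f"{email.subject} {email.body}"
    return any(pattern.search(text) for pattern in RULE_PATTERNS)


def extract_signals(email):
    """Turn formatting and send time into numeric signals."""
    text = f"{email.subject} {email.body}"
    letters = [c for c in text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return {
        "upper_ratio": upper_ratio,
        "sent_at_night": 1.0 if email.hour_sent < 6 else 0.0,
    }


def toy_score(signals):
    """Stand-in for a trained model; the weights are made up."""
    return 0.6 * signals["upper_ratio"] + 0.4 * signals["sent_at_night"]


def classify(email, score_fn=toy_score, threshold=0.5):
    if matches_rule(email):
        return "spam (rule)"
    if score_fn(extract_signals(email)) >= threshold:
        return "spam (model)"
    return "inbox"


print(classify(Email("Free money inside", "Hello", 12)))
print(classify(Email("URGENT INVOICE", "OPEN NOW", 3)))
print(classify(Email("Lunch tomorrow", "Does noon work?", 11)))
```

The second message contains none of the blocked phrases, so only the scoring stage can catch it. That is the gap the learned part of the filter is meant to cover.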

Google now aims at blocking spam categories that previously were difficult to detect.

Google deployed TensorFlow to block image-based messages, emails with hidden embedded content, and messages from newly created domains that try to hide a low volume of spam within legitimate traffic.

Google wants to double down on the spammers who slip through in that remaining fraction of less than 0.1 percent, without accidentally blocking messages that are important to users.
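To put the percentages in perspective, the arithmetic can be worked through directly. The block rate is the 99.95% figure quoted above; the volume of one million spam messages is a hypothetical round number chosen only for illustration.

```python
from fractions import Fraction

BLOCK_RATE = Fraction(9995, 10000)  # 99.95%, the figure quoted above
MISS_RATE = 1 - BLOCK_RATE          # share of spam that still gets through


def expected_missed(spam_messages):
    """Expected number of spam messages that reach inboxes."""
    return spam_messages * MISS_RATE


print(float(MISS_RATE * 100))      # 0.05 (percent)
print(expected_missed(1_000_000))  # 500
```

A miss rate of 0.05 percent is consistent with "less than 0.1 percent", and at large volumes even that small fraction adds up to a lot of messages.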

Since the definition of spam changes from user to user, decision making (spam classification) at such a granular level depends on many factors, and ML-based protections help in a big way. Consider that every email has thousands of potential signals. Just because some of an email’s characteristics match those commonly associated with spam doesn’t necessarily mean it is spam.
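A minimal sketch of why a partial match is not enough: signals are weighed together, and a strong legitimate signal can outweigh several spam-like ones. The signal names, weights and bias below are hypothetical, and a logistic score is used here only as a simple stand-in for whatever model is actually deployed.

```python
import math

# Hypothetical weights: positive values push towards spam, negative away from it.
WEIGHTS = {
    "has_unsubscribe_link": 0.8,
    "many_images": 1.2,
    "new_sender_domain": 1.5,
    "sender_in_contacts": -3.0,
}
BIAS = -2.0


def spam_probability(signals):
    """Combine all signals into one probability with a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in signals.items())
    return 1 / (1 + math.exp(-z))


def is_spam(signals, user_threshold=0.5):
    """A per-user threshold lets the same score lead to different decisions."""
    return spam_probability(signals) >= user_threshold


# A newsletter from a known sender shares two traits with spam.
newsletter = {"has_unsubscribe_link": 1, "many_images": 1, "sender_in_contacts": 1}
# The same traits from a newly created domain.
unknown = {"has_unsubscribe_link": 1, "many_images": 1, "new_sender_domain": 1}

print(is_spam(newsletter))                  # False
print(is_spam(unknown))                     # True
print(is_spam(unknown, user_threshold=0.9)) # False
```

Both messages carry an unsubscribe link and many images, yet only one is flagged, and a user with a more tolerant threshold would still receive it.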

How Effective Is Text Classification With TensorFlow

TensorFlow offers the flexibility to easily train and experiment with different models in parallel to develop the most effective approach, instead of running one experiment at a time.
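The workflow of running several experiments side by side and keeping the best one can be sketched without TensorFlow. In this illustration the "models" are trivial keyword counters that differ in one setting (min_hits), and the labelled messages and word list are made up. Python threads are used to keep the example short, although they give concurrency rather than true parallel computation.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical labelled messages: 1 = spam, 0 = not spam.
DATA = [
    ("win a free prize now", 1),
    ("free money claim prize", 1),
    ("meeting notes attached", 0),
    ("free lunch on friday", 0),
    ("claim your free prize", 1),
    ("project status update", 0),
]
SPAM_WORDS = {"free", "prize", "claim", "money", "win"}


def evaluate(min_hits):
    """One experiment: flag a message if it has at least min_hits spam words."""
    correct = 0
    for text, label in DATA:
        hits = sum(word in SPAM_WORDS for word in text.split())
        predicted = 1 if hits >= min_hits else 0
        correct += predicted == label
    return min_hits, correct / len(DATA)


def run_experiments(candidates):
    """Evaluate all candidate settings concurrently and return the best."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(evaluate, candidates))
    best = max(results, key=lambda result: result[1])
    return best, results


best, results = run_experiments([1, 2, 3, 4])
print(best)  # (2, 1.0)
```

With min_hits=1 the message about a free lunch is wrongly flagged, which is why the stricter setting wins on this toy data.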

Even within Gmail, TensorFlow is being used in other security-related areas, such as phishing and malware detection.

Two high-level TensorFlow APIs are used for this kind of task:

  • Estimators, which represent a complete model. The Estimator API provides methods to train the model, to judge the model’s accuracy, and to generate predictions.
  • Datasets for Estimators, which build a data input pipeline. The Dataset API has methods to load and manipulate data and feed it into your model. The Dataset API meshes well with the Estimator API.
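The division of labour between the two can be imitated in plain Python. The sketch below is not the TensorFlow API: make_input_fn and NaiveBayesSpamModel are hypothetical names, the training messages are made up, and a naive Bayes word counter is swapped in for a neural network so that the example runs with the standard library alone. It assumes both classes appear in the training data.

```python
import math
import random
from collections import Counter


def make_input_fn(examples, batch_size=2, shuffle=False, seed=0):
    """Dataset role: return a function that yields batches of (text, label)."""
    def input_fn():
        rows = list(examples)
        if shuffle:
            random.Random(seed).shuffle(rows)
        for start in range(0, len(rows), batch_size):
            yield rows[start:start + batch_size]
    return input_fn


class NaiveBayesSpamModel:
    """Estimator role: one object that can train, evaluate and predict."""

    def __init__(self):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter()

    def train(self, input_fn):
        for batch in input_fn():
            for text, label in batch:
                self.class_counts[label] += 1
                self.word_counts[label].update(text.lower().split())
        return self

    def _log_score(self, words, label):
        vocab_size = len(set(self.word_counts[0]) | set(self.word_counts[1]))
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        for word in words:
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((self.word_counts[label][word] + 1) / (total_words + vocab_size))
        return score

    def predict(self, texts):
        predictions = []
        for text in texts:
            words = text.lower().split()
            predictions.append(max((0, 1), key=lambda label: self._log_score(words, label)))
        return predictions

    def evaluate(self, input_fn):
        correct = total = 0
        for batch in input_fn():
            predictions = self.predict([text for text, _ in batch])
            correct += sum(p == label for p, (_, label) in zip(predictions, batch))
            total += len(batch)
        return correct / total


# Hypothetical training data: 1 = spam, 0 = not spam.
TRAIN = [
    ("win free prize now", 1),
    ("claim free money", 1),
    ("meeting agenda attached", 0),
    ("lunch meeting tomorrow", 0),
]

model = NaiveBayesSpamModel().train(make_input_fn(TRAIN, shuffle=True))
print(model.evaluate(make_input_fn(TRAIN)))
print(model.predict(["free prize money", "meeting tomorrow"]))
```

The input function knows nothing about the model and the model knows nothing about where its data comes from, which is the separation the two bullets describe.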

A demonstration of how TensorFlow helps in spam detection with a few lines of code:

import tensorflow as tf
import pandas as pd

# Note: tf.estimator.inputs.pandas_input_fn and tf.train.AdagradOptimizer
# are TensorFlow 1.x APIs. Arguments shown as dots are elided in this excerpt.

def download_and_load_datasets(force_download=False):
    dataset = tf.keras.utils.get_file(.....)

# Training input on the whole training set with no limit on training epochs.
train_input_fn = tf.estimator.inputs.pandas_input_fn(....)

# Prediction on the whole training set.
predict_train_input_fn = tf.estimator.inputs.pandas_input_fn(....)

# Prediction on the test set.
predict_test_input_fn = tf.estimator.inputs.pandas_input_fn(...)

# Using a DNN classifier.
# embedded_text_feature_column is not defined in this excerpt.
estimator = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    feature_columns=[embedded_text_feature_column],
    n_classes=2,
    optimizer=tf.train.AdagradOptimizer(learning_rate=0.003))

estimator.train(...)

Check the full code here

Future Direction

Users still occasionally have to click the “not spam” button, which means they have had to wade through their spam folder to find a wanted email that was flagged as spam.

Tools like Gmail Postmaster were released in the past to analyse spam reports and data on delivery errors. A routine check of the spam folder often turns up missed newsletters or even job opportunities. While Google works to refine its algorithm to make the filtering more user-centric, it is a good habit to keep an eye on what is being labelled as spam.

 


Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.