Junk mail, or spam, comes in different levels of potency. Its effects can range from being merely annoying to shutting down nuclear power plants. Spam emails are sent out in mass quantities by spammers and cybercriminals who are looking to do one or more of the following:
- Make money from the small percentage of recipients that actually respond to the message
- Run phishing scams – in order to obtain passwords, credit card numbers, bank account details and more
- Spread malicious code onto recipients’ computers
Nowadays, spammers are as inventive as the cybersecurity professionals who fight them. They send fake unsubscribe letters in an attempt to collect active email addresses, so clicking ‘unsubscribe’ may simply increase the amount of spam one receives.
With more than 1.5 billion email accounts, Gmail is a leading target for phishing attacks. But Google’s spam detection mechanism has become very good at thwarting these attacks.
99.95% Is Not Enough
A couple of years ago, Google announced that it blocks 99.95% of spam emails.
Gmail has been using AI in addition to rule-based filters for years. While rule-based filters can block the most obvious spam, machine learning looks for new patterns that might suggest an email is not to be trusted. Algorithms trained in this way balance a huge number of metrics, everything from the formatting of an email to the time of day it’s sent.
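The difference between the two approaches can be sketched in plain Python. This is a toy illustration and not Gmail's system: the blocked phrases, the training messages and the names `rule_based_is_spam` and `NaiveBayesSpamModel` are all made up for the example, and naive Bayes is used only because it is one of the simplest models that learns from labelled mail.

```python
import math
from collections import Counter

# Hypothetical hand-written rule: flag messages containing known spam phrases.
BLOCKED_PHRASES = ("free money", "click here now")


def rule_based_is_spam(text):
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


class NaiveBayesSpamModel:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()

    def train(self, examples):
        for text, label in examples:
            self.message_counts[label] += 1
            self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.message_counts[label] / total_messages)
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocabulary)))
            log_scores[label] = score
        # Turn the two log scores into a probability for the spam class.
        highest = max(log_scores.values())
        exp_scores = {k: math.exp(v - highest) for k, v in log_scores.items()}
        return exp_scores["spam"] / (exp_scores["spam"] + exp_scores["ham"])


# Made-up training messages, for illustration only.
training_examples = [
    ("win a prize today claim your reward", "spam"),
    ("claim your reward before it expires", "spam"),
    ("meeting moved to tuesday afternoon", "ham"),
    ("please review the attached report", "ham"),
]

model = NaiveBayesSpamModel()
model.train(training_examples)

new_message = "claim your prize reward"
print(rule_based_is_spam(new_message))            # the fixed rule misses it
print(model.spam_probability(new_message) > 0.5)  # the learned model flags it
```

The new message contains none of the blocked phrases, so the fixed rule lets it through, while the model flags it because its words resemble the spam it was trained on.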
Google now aims to block categories of spam that were previously difficult to detect.
Google deployed TensorFlow to block image-based messages, emails with hidden embedded content, and messages from newly created domains that try to hide a low volume of spam within legitimate traffic.
Google wants to double down on the spammers who slip through in that remaining fraction of less than 0.1 percent, without accidentally blocking messages that are important to users.
Since the definition of spam changes from user to user, spam classification at such a granular level depends on many factors, and ML-based protections help in a big way. Consider that every email has thousands of potential signals. Just because some of an email’s characteristics match those commonly associated with spam doesn’t necessarily mean it is spam.
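One way to picture how many signals combine into a single decision is a weighted score passed through a logistic function. The sketch below is a minimal example under stated assumptions: the five signal names, their weights, the bias and the thresholds are all hypothetical, and a real system would learn such weights from labelled mail rather than hard-code them.

```python
import math

# Hypothetical signals and weights, chosen by hand for illustration.
# Positive weights push towards spam, negative weights push away from it.
SIGNAL_WEIGHTS = {
    "contains_link": 0.8,
    "sender_domain_is_new": 1.5,
    "all_caps_subject": 1.0,
    "sender_in_contacts": -3.0,
    "user_replied_before": -2.5,
}
BIAS = -1.5


def spam_score(signals):
    """Logistic combination of the signals present in a message (0 to 1)."""
    total = BIAS + sum(SIGNAL_WEIGHTS[name] for name in signals)
    return 1.0 / (1.0 + math.exp(-total))


def classify(signals, threshold=0.5):
    return "spam" if spam_score(signals) >= threshold else "not spam"


# A shouty message with a link, but from a known contact: not spam.
print(classify(["contains_link", "all_caps_subject", "sender_in_contacts"]))

# The same surface features from a newly created domain: spam.
print(classify(["contains_link", "all_caps_subject", "sender_domain_is_new"]))

# A single weak signal is not enough at the default threshold,
# but a user with a stricter (lower) threshold would see it flagged.
print(classify(["contains_link"]))
print(classify(["contains_link"], threshold=0.3))
```

The per-call `threshold` stands in for the idea that the same message may be spam for one user and not for another.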
How Effective Is Text Classification With TensorFlow
TensorFlow offers the flexibility to easily train and experiment with different models in parallel to develop the most effective approach, instead of running one experiment at a time.
Even within Gmail, TensorFlow is being used in other security-related areas, such as phishing and malware detection.
Two of TensorFlow’s high-level APIs are relevant here:
- Estimators, which represent a complete model. The Estimator API provides methods to train the model, to judge the model’s accuracy, and to generate predictions.
- Datasets for Estimators, which build a data input pipeline. The Dataset API has methods to load and manipulate data, and feed it into your model. The Dataset API meshes well with the Estimator API.
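The division of labour between the two can be sketched without TensorFlow at all. The following is a plain-Python stand-in, not TensorFlow code: `make_input_fn` and `MajorityClassEstimator` are hypothetical names, the data is made up, and the "model" simply predicts the most common training label. It only shows the pattern of an input function feeding batches into an object that can train, evaluate and predict.

```python
import random


def make_input_fn(features, labels, batch_size=2, shuffle=False, seed=0):
    """Return a zero-argument function that yields (features, labels) batches."""
    def input_fn():
        order = list(range(len(features)))
        if shuffle:
            random.Random(seed).shuffle(order)
        for start in range(0, len(order), batch_size):
            chunk = order[start:start + batch_size]
            yield [features[i] for i in chunk], [labels[i] for i in chunk]
    return input_fn


class MajorityClassEstimator:
    """Toy model: always predicts the label seen most often during training."""

    def __init__(self):
        self.label_counts = {}

    def train(self, input_fn):
        for _, label_batch in input_fn():
            for label in label_batch:
                self.label_counts[label] = self.label_counts.get(label, 0) + 1
        return self

    def predict(self, input_fn):
        majority = max(self.label_counts, key=self.label_counts.get)
        for feature_batch, _ in input_fn():
            for _example in feature_batch:
                yield majority

    def evaluate(self, input_fn):
        predictions = list(self.predict(input_fn))
        actual = [label for _, batch in input_fn() for label in batch]
        correct = sum(p == a for p, a in zip(predictions, actual))
        return {"accuracy": correct / len(actual)}


# Made-up messages; label 1 means spam, 0 means not spam.
texts = ["win a prize", "lunch at noon", "claim reward", "cheap pills"]
labels = [1, 0, 1, 1]

train_input_fn = make_input_fn(texts, labels, shuffle=True)
eval_input_fn = make_input_fn(texts, labels)

estimator = MajorityClassEstimator().train(train_input_fn)
metrics = estimator.evaluate(eval_input_fn)
print(metrics)
```

Keeping data loading in the input function and the learning logic in the estimator means either side can be swapped without touching the other.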
A demonstration of how TensorFlow helps in spam detection with a few lines of code:
import tensorflow as tf
import pandas as pd
dataset = tf.keras.utils.get_file(.....)
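# Note: tf.estimator.inputs.pandas_input_fn and tf.train.AdagradOptimizer are TensorFlow 1.x APIs
# (available under tf.compat.v1 in early TensorFlow 2.x releases).
# Training input on the whole training set with no limit on training epochs.
train_input_fn = tf.estimator.inputs.pandas_input_fn(....)
# Prediction on the whole training set.
predict_train_input_fn = tf.estimator.inputs.pandas_input_fn(....)
# Prediction on the test set.
predict_test_input_fn = tf.estimator.inputs.pandas_input_fn(...)
# Using a DNN classifier with two hidden layers (500 and 100 units) and two output classes.
# embedded_text_feature_column is assumed to be defined in the full code; it is not shown here.
estimator = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    feature_columns=[embedded_text_feature_column],
    n_classes=2,
    optimizer=tf.train.AdagradOptimizer(learning_rate=0.003))
estimator.train(...)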
Check the full code here
Users still occasionally have to click the “not spam” button, which essentially means wading through the spam folder to find an email that was wanted but flagged as spam.
Tools like Gmail Postmaster were released in the past to analyse spam reports and data on delivery errors. A routine check of the spam folder often turns up missed newsletters or even job opportunities. While Google works to refine its algorithm and make filtering more user-centric, it is a good habit to keep an eye on what is being labelled as spam.