What Is SSD & How It Improved Computer Vision Forever

Object detection systems usually hypothesise bounding boxes, resample pixels or features for each box, and then apply a high-quality classifier. This pipeline is computationally heavy and too slow for real-time applications.

Single shot detection, unlike other object detectors, doesn’t resample pixels or features for bounding boxes.

Object Detection With SSD

By eliminating bounding box proposals and the pixel or feature resampling stage that follows them, SSD (single shot detector) considerably improves the speed at which detection is carried out.

The single shot detection approach uses small convolutional filters to predict object categories and bounding box offsets, and applies these filters to several feature maps to perform detection at multiple scales.

This results in high-accuracy detection even with relatively low-resolution input images.

The category scores and box offsets for a fixed set of default bounding boxes are predicted by small convolutional filters applied to the feature maps.
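To get a feel for what "a fixed set of default boxes" means in practice, the short sketch below counts how many boxes and output values the predictors produce. The function name is hypothetical, the toy configuration (four default boxes per location, three classes) is an assumption for illustration, and the SSD300 layer sizes are taken from the SSD paper rather than from this article.

```python
def count_predictions(feature_maps, num_classes):
    """Return (total default boxes, total predicted values).

    feature_maps: list of (height, width, boxes_per_location) tuples.
    Each default box gets num_classes scores plus 4 box offsets.
    """
    total_boxes = 0
    total_outputs = 0
    for height, width, boxes_per_location in feature_maps:
        boxes = height * width * boxes_per_location
        total_boxes += boxes
        total_outputs += boxes * (num_classes + 4)
    return total_boxes, total_outputs


# Toy setup: an 8 x 8 and a 4 x 4 map with 4 default boxes per location
# (assumed), and 3 classes: cat, dog and background.
TOY_MAPS = [(8, 8, 4), (4, 4, 4)]
toy_boxes, toy_outputs = count_predictions(TOY_MAPS, num_classes=3)

# Layer sizes reported for the 300 x 300 model in the SSD paper,
# with 21 classes (20 object classes plus background).
SSD300_MAPS = [(38, 38, 4), (19, 19, 6), (10, 10, 6),
               (5, 5, 6), (3, 3, 4), (1, 1, 4)]
ssd_boxes, ssd_outputs = count_predictions(SSD300_MAPS, num_classes=21)

print(toy_boxes, toy_outputs)   # 320 2240
print(ssd_boxes, ssd_outputs)   # 8732 218300
```

All of these values come out of a single forward pass, which is where the "single shot" in the name comes from.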

Source: GoogleAI

The above figure illustrates how SSD works. The two animals in the picture, a cat on the left and a dog on the right, are marked with blue and red bounding boxes, which are the ground truth boxes for each object. A small set of default boxes with different aspect ratios is then evaluated, in a convolutional fashion, at each location of feature maps with different scales, such as 8 x 8 and 4 x 4.

For every default box, the network predicts shape offsets and confidences (conf) for all object categories.
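The shape offsets can be pictured as a small correction that moves and resizes a default box until it fits the object. The sketch below follows the centre/size parameterisation described in the SSD paper (centre shifts scaled by the box size, log-scaled width and height); the function names are hypothetical and real implementations often add extra scaling factors that are left out here.

```python
import math


def encode(gt_box, default_box):
    """Offsets that turn default_box into gt_box.

    Both boxes are (cx, cy, w, h) tuples: centre coordinates, width, height.
    """
    g_cx, g_cy, g_w, g_h = gt_box
    d_cx, d_cy, d_w, d_h = default_box
    return (
        (g_cx - d_cx) / d_w,    # centre shift in x, relative to box width
        (g_cy - d_cy) / d_h,    # centre shift in y, relative to box height
        math.log(g_w / d_w),    # log of the width ratio
        math.log(g_h / d_h),    # log of the height ratio
    )


def decode(offsets, default_box):
    """Apply predicted offsets to a default box to recover a (cx, cy, w, h) box."""
    t_cx, t_cy, t_w, t_h = offsets
    d_cx, d_cy, d_w, d_h = default_box
    return (
        d_cx + t_cx * d_w,
        d_cy + t_cy * d_h,
        d_w * math.exp(t_w),
        d_h * math.exp(t_h),
    )


default_box = (0.45, 0.55, 0.3, 0.3)
gt_box = (0.5, 0.5, 0.4, 0.2)
offsets = encode(gt_box, default_box)
recovered = decode(offsets, default_box)
```

During training the network regresses towards the output of `encode`; at inference time `decode` turns its predictions back into boxes.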

The model's loss is a weighted sum of a localisation loss (such as Smooth L1) and a confidence loss (such as Softmax).
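A minimal sketch of such a combined loss is given below, in plain Python for readability. The normalisation by the number of matched boxes and the weight `alpha` follow the formulation in the SSD paper; the function names and the list-based data layout are assumptions made for this example, not part of any library.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for large errors."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def softmax_cross_entropy(logits, target):
    """Negative log of the softmax probability assigned to the target class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum - logits[target]


def ssd_loss(loc_preds, loc_targets, conf_logits, conf_targets, alpha=1.0):
    """Weighted sum of confidence and localisation loss.

    loc_preds, loc_targets: 4-value offsets for the matched (positive) boxes.
    conf_logits, conf_targets: class logits and labels for every box kept
    in the loss (positives plus the selected negatives).
    """
    num_pos = len(loc_preds)
    if num_pos == 0:
        return 0.0  # no matched boxes: the loss is defined as zero
    loc = sum(
        smooth_l1(p - t)
        for pred, tgt in zip(loc_preds, loc_targets)
        for p, t in zip(pred, tgt)
    )
    conf = sum(
        softmax_cross_entropy(logits, label)
        for logits, label in zip(conf_logits, conf_targets)
    )
    return (conf + alpha * loc) / num_pos
```

Raising `alpha` makes box placement count for more relative to classification; the paper reports setting it to 1.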

In the case of SSD, the ground truth information needs to be assigned to specific outputs in the fixed set of detector outputs, unlike in other detectors where region proposals are generated before a final classifier is applied.

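Each ground truth box is first matched to the default box with the best Jaccard overlap, which ensures that every ground truth box has at least one matched default box. Default boxes are then also matched to any ground truth box with a Jaccard overlap above a threshold (0.5 in the SSD paper).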

This enables the network to predict high confidences for multiple overlapping default boxes (black dotted lines in the above figure) instead of picking only one with maximum overlap.

This approach is similar to that of MultiBox, except that MultiBox keeps only the single best-overlapping default box for each ground truth box.
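The matching strategy can be sketched in a few lines. Jaccard overlap is intersection over union; the matcher below first assigns default boxes by a threshold and then forces the best default box for each ground truth box to be a match. The function names and the corner-based box format are assumptions for this example, and the 0.5 threshold is the value used in the SSD paper.

```python
def jaccard(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def match(gt_boxes, default_boxes, threshold=0.5):
    """For each default box, return the index of its ground truth box or -1."""
    if not default_boxes:
        return []
    matches = [-1] * len(default_boxes)
    # Any default box overlapping a ground truth box by more than the
    # threshold becomes a positive for the best such ground truth box.
    for d, dbox in enumerate(default_boxes):
        best_gt, best_overlap = -1, threshold
        for g, gbox in enumerate(gt_boxes):
            overlap = jaccard(gbox, dbox)
            if overlap > best_overlap:
                best_gt, best_overlap = g, overlap
        matches[d] = best_gt
    # Every ground truth box also keeps its single best default box,
    # even when that overlap is below the threshold.
    for g, gbox in enumerate(gt_boxes):
        best_d = max(range(len(default_boxes)),
                     key=lambda d: jaccard(gbox, default_boxes[d]))
        matches[best_d] = g
    return matches


gt = [(0, 0, 10, 10)]
defaults = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
print(match(gt, defaults))  # [0, 0, -1, -1]
```

Two default boxes end up matched to the same ground truth box here, which is exactly the behaviour that lets the network predict high confidences for several overlapping boxes.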

Feature maps, be it 8 x 8 or 4 x 4, have different receptive field sizes. With SSD, the default boxes do not need to correspond to the actual receptive fields of each layer. Instead, the default boxes are tiled so that specific feature maps learn to be responsive to particular scales of objects.
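One way to tie each feature map to a particular object scale, as described in the SSD paper, is to space the default box scales linearly between a minimum and a maximum and then derive widths and heights from a few aspect ratios. The sketch below assumes the paper's values of 0.2 and 0.9 for the two extremes; the function names, the single-map fallback and the reduced set of aspect ratios are choices made for this example.

```python
import math


def default_box_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced scales, one per feature map, relative to image size."""
    if num_maps == 1:
        return [s_min]  # assumption: with a single map, use the smallest scale
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def box_shapes(scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """(width, height) of the default boxes for one scale.

    Width grows and height shrinks with the aspect ratio, so the box
    area stays equal to scale squared.
    """
    return [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in aspect_ratios]


scales = default_box_scales(6)
for s in scales:
    print(round(s, 2), [(round(w, 2), round(h, 2)) for w, h in box_shapes(s)])
```

The earliest, highest-resolution map gets the smallest boxes and the last map the largest ones, so small and large objects are handled by different layers.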

Suppose an object, say the dog in this figure, has been matched to a default box in the 4 x 4 feature map but not to any default box in the 8 x 8 feature map, because the boxes there have different scales. These unmatched default boxes are considered negatives during training.

After matching, most default boxes are negatives, which creates a significant imbalance between positive and negative training examples.

To balance this, the negatives are sorted by their confidence loss and only the highest-ranked ones are kept, so that the ratio between negatives and positives is at most 3:1.
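This selection step is usually called hard negative mining. A minimal sketch, assuming the per-box confidence losses have already been computed, could look like the following; the function name and argument layout are hypothetical.

```python
def hard_negative_mining(conf_losses, is_positive, neg_pos_ratio=3):
    """Return the sorted indices of the boxes kept for the confidence loss.

    conf_losses: confidence loss of every default box.
    is_positive: True where the default box was matched to a ground truth box.
    All positives are kept, plus the negatives with the highest loss,
    capped at neg_pos_ratio negatives per positive.
    """
    positives = [i for i, pos in enumerate(is_positive) if pos]
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    # Hardest negatives first: the ones the model is most wrong about.
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    kept_negatives = negatives[: neg_pos_ratio * len(positives)]
    return sorted(positives + kept_negatives)


losses = [0.1, 2.0, 0.5, 0.05, 1.5, 0.3, 0.9, 0.2]
positive = [True, False, False, False, False, False, False, False]
print(hard_negative_mining(losses, positive))  # [0, 1, 4, 6]
```

With one positive box, only the three negatives with the largest loss survive; the remaining easy negatives are ignored for that training step.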

Modelling With SSD

Step-by-step procedure:

  • A feedforward CNN produces a fixed-size collection of bounding boxes and scores for the presence of object class instances in those boxes.
  • An auxiliary structure is added to the base network to detect objects at multiple scales. This structure includes multi-scale feature maps for detection, convolutional predictors, and default boxes with several aspect ratios.
  • A non-maximum suppression step then produces the final detections.
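The non-maximum suppression step in the procedure above can be sketched as a greedy loop: keep the highest-scoring box, drop every remaining box that overlaps it too much, and repeat. The function names are hypothetical, the boxes are assumed to belong to a single class, and the 0.45 overlap threshold is the value reported in the SSD paper.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def non_max_suppression(boxes, scores, iou_threshold=0.45):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]
```

The second box overlaps the first by more than the threshold and has a lower score, so it is treated as a duplicate detection and removed.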

SSD is sensitive to the size of the bounding boxes: its performance drops as the objects get smaller, and it does considerably better on larger objects. SSD's default boxes are similar to the anchor boxes used by the region proposal network (RPN) in Faster R-CNN. However, rather than using these boxes to pool features for a later classifier, SSD directly produces a score for each object category in each box.

Given the same VGG-16 base architecture, SSD compares favourably with other object detectors (YOLO and Faster R-CNN) in both speed and accuracy.

Read more about SSD here

 

Ram Sagar

I have a master's degree in Robotics and I write about machine learning advancements.