
What is the Representer Theorem in Machine Learning?


Machine learning rests on concepts from mathematics and statistics, so a solid grounding in both subjects is essential for analysing ML in depth; developing new algorithms, in particular, demands statistical knowledge. There is a separate branch called Statistical Learning Theory that draws insights from statistics as well as from functional analysis. This article explores a result from statistical learning theory called the representer theorem, which finds application in areas such as pattern analysis and, specifically, Support Vector Machines (SVMs).

Kernel Methods And RKHS

In the statistical context of ML, kernel methods are a family of algorithms for pattern analysis, the task of identifying patterns in data. Most other algorithms require the data to be converted into explicit feature vectors before patterns can be detected. Kernel methods, by contrast, rely on similarity functions called kernels. A kernel lets an algorithm operate in a (possibly very high-dimensional) feature space while only ever computing the inner product between pairs of data points, without ever touching the coordinates of that space. This shortcut is commonly known as the kernel trick.

Kernels employ memory-based learning: instead of summarising the training data into a fixed parametric model, they compare new, unseen instances directly against the training instances stored in memory. A small sketch of the kernel trick follows below.
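As a concrete illustration of the kernel trick (a minimal sketch with made-up data, not from the original article): for the degree-2 polynomial kernel k(x, z) = (x·z)², the kernel value computed directly from the inputs agrees with an explicit inner product in a three-dimensional feature space that is never constructed.

```python
import numpy as np

# Sketch of the kernel trick. For the degree-2 polynomial kernel
# k(x, z) = (x . z)^2 on R^2, an explicit feature map is
# phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2) in R^3.

def poly_kernel(x, z):
    """Kernel value computed directly from input coordinates."""
    return np.dot(x, z) ** 2

def phi(x):
    """Explicit feature map for the same kernel."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Both quantities agree: the kernel evaluates the feature-space
# inner product without ever forming phi(x) or phi(z).
print(poly_kernel(x, z))          # 121.0
print(np.dot(phi(x), phi(z)))     # 121.0
```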

Kernel methods rose to prominence through advances in pattern recognition, particularly handwriting recognition. As development of kernel methods progressed, many of the underlying ideas were formalised in mathematics. One concept of particular importance in ML is the Reproducing Kernel Hilbert Space (RKHS), whose origins trace back to the Polish mathematician Stanisław Zaremba's work on harmonic functions.

As the name suggests, an RKHS is a special kind of Hilbert space, a vector space that generalises familiar two- and three-dimensional Euclidean space to arbitrary (possibly infinite) dimensions. In mathematical terms, it is defined as follows:

“A Hilbert space is a vector space H with an inner product (f, g) such that the norm defined by

||f|| = √(f, f)

turns H into a complete metric space.”

An RKHS is a Hilbert space of functions with one extra property: evaluating a function at a point is a continuous linear functional. Equivalently, there is a kernel k such that every function in the space satisfies the reproducing property f(x) = (f, k(·, x)), i.e. evaluation at a point x amounts to taking the inner product with the kernel centred at x. Together, kernels and the RKHS structure form the basis for the representer theorem.
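As a quick numerical sketch (with illustrative data, not from the article): for a function written as a kernel expansion f = Σ_i β_i k(·, z_i), the squared RKHS norm reduces to a finite quadratic form βᵀKβ over the Gram matrix K, exactly the identity the theorem below relies on.

```python
import numpy as np

# Sketch: for f = sum_i beta_i * k(., z_i), the squared RKHS norm is
# ||f||^2 = beta^T K beta, where K is the Gram matrix K[i, j] = k(z_i, z_j).

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel, a popular positive definite kernel."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

z = np.array([[0.0], [1.0], [2.0]])   # expansion points z_i
beta = np.array([1.0, -2.0, 0.5])     # coefficients beta_i

K = np.array([[rbf_kernel(zi, zj) for zj in z] for zi in z])
sq_norm = beta @ K @ beta

# Positive definiteness of k guarantees sq_norm >= 0.
print(sq_norm)
```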

Representer Theorem

The representer theorem was formulated with the help of this RKHS machinery. The motivation is that the RKHS associated with popular kernels (such as the Gaussian kernel) is infinite-dimensional, so minimising a risk functional over it is mathematically well defined but appears practically intractable for training a learning machine, which ultimately comes down to optimisation.

There are two cases of the representer theorem: one without prior assumptions on the form of the solution (nonparametric) and one with a partially parametric form (semiparametric). The statements below follow Schölkopf, Herbrich and Smola's paper “A Generalized Representer Theorem”:

Nonparametric representer theorem (Theorem 1): Suppose we are given a nonempty set X, a positive definite real-valued kernel k on X × X, a training sample (x_1, y_1), …, (x_m, y_m) ∈ X × R, a strictly monotonically increasing real-valued function g on [0, ∞), an arbitrary cost function c : (X × R²)^m → R ∪ {∞}, and a class of functions

F = {f ∈ R^X | f(·) = Σ_{i=1}^∞ β_i k(·, z_i), β_i ∈ R, z_i ∈ X, ||f|| < ∞}.

Here, ||·|| is the norm in the RKHS associated with k, i.e. for any z_i ∈ X, β_i ∈ R (i ∈ N),

||Σ_{i=1}^∞ β_i k(·, z_i)||² = Σ_{i=1}^∞ Σ_{j=1}^∞ β_i β_j k(z_i, z_j).

Then any f ∈ F minimising the regularised risk functional

c((x_1, y_1, f(x_1)), …, (x_m, y_m, f(x_m))) + g(||f||)

admits a representation of the form

f(·) = Σ_{i=1}^m α_i k(·, x_i).

Semiparametric representer theorem (Theorem 2): Suppose that, in addition to the assumptions of the previous theorem, we are given a set of M real-valued functions {ψ_p}_{p=1}^M on X with the property that the m × M matrix (ψ_p(x_i))_{ip} has rank M. Then any f’ := f + h, with f ∈ F and h ∈ span{ψ_p}, minimising the regularised risk

c((x_1, y_1, f’(x_1)), …, (x_m, y_m, f’(x_m))) + g(||f||)

admits a representation of the form

f’(·) = Σ_{i=1}^m α_i k(x_i, ·) + Σ_{p=1}^M β_p ψ_p(·),

with unique coefficients β_p ∈ R for all p = 1, …, M.

In both cases, the objective being minimised combines an arbitrary cost function c on the training data with a regulariser g applied to the RKHS norm of f. The practical payoff for ML is that although the minimisation runs over a potentially infinite-dimensional space of functions, any minimiser can be expressed as a finite kernel expansion over the m training points (plus, in the semiparametric case, M extra basis functions), so training reduces to finding finitely many coefficients.
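To see the theorem at work, here is a minimal sketch (with synthetic data, not from the article) using kernel ridge regression, a standard setting where the nonparametric theorem applies with squared-error cost and g(||f||) = λ||f||². The minimiser is exactly f(·) = Σ_{i=1}^m α_i k(·, x_i), and the coefficients have a closed form.

```python
import numpy as np

# Kernel ridge regression: minimise sum_i (y_i - f(x_i))^2 + lam * ||f||^2
# over the RKHS of k. By the representer theorem, the minimiser is
# f(.) = sum_i alpha_i k(., x_i), with alpha = (K + lam * I)^{-1} y.

def rbf_kernel_matrix(A, B, gamma=1.0):
    """Gram matrix of the Gaussian kernel between rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))             # training inputs x_i
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)  # noisy targets y_i

lam = 0.1
K = rbf_kernel_matrix(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # representer coefficients

# Predictions only ever touch the kernel between new points and the
# training points -- the finite expansion promised by the theorem.
X_new = np.array([[0.5], [1.5]])
f_new = rbf_kernel_matrix(X_new, X) @ alpha
print(f_new)
```

Note how the infinite-dimensional optimisation collapses to solving one m × m linear system, which is the representer theorem's practical content.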

Conclusion

These mathematical statements can be daunting for beginners, so it is worth reviewing the basics of kernel methods before working through the representer theorem. The theorem matters greatly when training models: it guarantees that the function minimising the regularised risk functional can be found by searching over finite kernel expansions of the training data, rather than over an entire infinite-dimensional function space.
