Human conversation is full of metaphors, analogies, allegories, synonyms and cultural references. Working out what a sentence actually means is a daunting task for machines; in a word, language is ambiguous.
Word Sense Disambiguation (WSD) was first formulated as a distinct computational task during the early days of machine translation in the 1940s, making it one of the oldest problems in computational linguistics.
In the 1990s, the statistical revolution swept through computational linguistics, and WSD became a paradigm problem on which to apply supervised machine learning techniques. Traditionally, there are four approaches to WSD: dictionary-based, semi-supervised, supervised and unsupervised.
Currently, kernel-based methods such as support vector machines have shown superior performance in supervised learning. However, supervised WSD methods treat senses as discrete labels and resort to predicting the Most Frequent Sense (MFS) for words unseen during training. This leads to poor performance on rare and unseen senses.
To address this problem, a team of NLP researchers at IISc Bangalore published a paper that won an outstanding paper award at the recently concluded annual meeting of the Association for Computational Linguistics (ACL) in Italy.
They propose Extended WSD Incorporating Sense Embeddings (EWISE), a supervised model to perform WSD by predicting over a continuous sense embedding space as opposed to a discrete label space.
Achieving State-Of-The-Art With EWISE
The above picture gives an overview of WSD in EWISE. A sequence of input tokens is encoded into context-aware embeddings using a BiLSTM and self-attention layer. The context-aware embeddings are then projected onto the space of sense embeddings.
The score for each sense in the sense inventory is obtained by taking the dot product of the sense embedding with the projected word embedding.
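This scoring step can be sketched in a few lines of numpy. The sense names and all the vectors below are random, illustrative stand-ins; in EWISE the sense embeddings come from the definition encoder and the projected word embedding from the BiLSTM context encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM = 8  # toy sense-embedding dimensionality
SENSES = ["tie.n.01 (neckwear)", "tie.n.02 (draw)", "tie.v.01 (fasten)"]

# Hypothetical stand-ins for the model's learned vectors.
sense_embeddings = rng.standard_normal((len(SENSES), EMB_DIM))
projected_word = rng.standard_normal(EMB_DIM)

# Score every sense in the inventory with a dot product, then
# normalise with a softmax to get a distribution over senses.
scores = sense_embeddings @ projected_word
probs = np.exp(scores - scores.max())
probs /= probs.sum()

predicted = SENSES[int(np.argmax(scores))]
print(predicted)
```

Because the prediction is a similarity in a continuous embedding space rather than a pick from a fixed label set, the same machinery can score senses that never appeared in the training data.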
The sense embedding for each sense in the inventory is generated using a BiLSTM-Max definition encoder, which is learnt using the training signal present in the WordNet graph. At its core, an LSTM preserves information from inputs that have already passed through it using its hidden state.
A bidirectional LSTM (BiLSTM) layer learns bidirectional long-term dependencies between time steps of time series or sequence data. These dependencies can be useful when you want the network to learn contexts in a given sentence.
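The idea behind a BiLSTM-Max encoder can be illustrated with a toy sketch: run one pass left-to-right and another right-to-left, concatenate the per-token states, then take an element-wise max over time. The simple tanh RNN below is a stand-in for a real gated LSTM cell, and all weights are random; this only demonstrates the dataflow, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def rnn_pass(inputs, W_x, W_h):
    """Run a minimal tanh RNN over `inputs`, returning all hidden states."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x in inputs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return np.stack(states)

# Toy definition: 5 tokens, 4-dim word vectors, 6-dim hidden states.
tokens = rng.standard_normal((5, 4))
W_x_f, W_h_f = rng.standard_normal((6, 4)), rng.standard_normal((6, 6))
W_x_b, W_h_b = rng.standard_normal((6, 4)), rng.standard_normal((6, 6))

fwd = rnn_pass(tokens, W_x_f, W_h_f)              # left-to-right states
bwd = rnn_pass(tokens[::-1], W_x_b, W_h_b)[::-1]  # right-to-left states
bi_states = np.concatenate([fwd, bwd], axis=1)    # one (5, 12) state per token

# The "Max" in BiLSTM-Max: an element-wise max over time yields a single
# fixed-size embedding for the whole definition, regardless of its length.
definition_embedding = bi_states.max(axis=0)      # shape (12,)
print(definition_embedding.shape)
```

The max-pooling step is what lets definitions of different lengths map into the same sense-embedding space.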
As shown in the above example, the word ‘tie’ in the sentence “he wore a tie” is rightly identified as neckwear and not as a ‘tie’ in the sporting sense.
For both context and definition encoding, the authors used BiLSTMs with a hidden size of 2048. The input embeddings for the BiLSTM were initialized with GloVe embeddings and kept fixed during training. The Adam optimizer was used for learning all the models.
The authors use definitions of senses available in WordNet to obtain sense embeddings. The model used an initial learning rate of 0.0001 and a batch size of 32, and was trained for a maximum of 200 epochs.
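Collected in one place, the reported training setup looks like the configuration below. The dictionary keys are illustrative names, not identifiers from the authors' code; only the values come from the paper.

```python
# Training configuration as reported in the paper; the variable and key
# names here are illustrative, not taken from the authors' code.
config = {
    "hidden_size": 2048,          # BiLSTM hidden size (context and definition)
    "input_embeddings": "GloVe",  # kept fixed during training
    "optimizer": "Adam",
    "learning_rate": 1e-4,
    "batch_size": 32,
    "max_epochs": 200,
}

for key, value in config.items():
    print(f"{key}: {value}")
```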
The results show that with only 20% of the training data, EWISE is able to outperform the most-frequent-sense baseline (WordNet S1).
Nurturing AI In India Through Institutions
The Machine And Language Learning (MALL) Lab at the Indian Institute of Science (IISc), Bangalore is a group of researchers, engineers, and students from the Department of Computational & Data Sciences (CDS) and the Department of Computer Science and Automation (CSA). The group is led by Partha Talukdar, and its research spans the areas of Machine Learning and Natural Language Processing. IISc’s MALL Lab actively collaborates with the likes of the Brain Research Group at Carnegie Mellon University.
This successful demonstration by IISc Bangalore stands as a testimony to the growing attention of Indian researchers towards AI. Advancements in the computational sciences such as this one fortify India’s bid to become a global tech hub.
With IISc and other premier institutes offering AI as a degree at the undergraduate and postgraduate level, there is a high possibility for breakthroughs in the world of machine learning, especially from India.
Access the full paper here.