In machine learning (ML) and artificial intelligence, every real-world problem carries its own complications and pitfalls. Even with efficient techniques, it is hard to anticipate factors as basic as ‘uncertainty’. In image classification, for example, if the image features in the data are not described in enough detail, the system’s output will be vague even when the learning algorithm classifies the images as designed.
This is just the tip of the iceberg when it comes to ambiguity in ML. However meticulously an ML system is designed, it will sometimes come across new, uncertain problems. The uncertainty may lie in any part of the system, whether in its goals or in the data it receives, and it leaves the results open to interpretation. In this article, we look at a few cases where ML has handled ambiguity well.
Case 1: Natural Language Processing
One of the earliest investigations of ambiguity in ML concerned natural language tasks, where the learning algorithms act as linear separators in the feature space. The aim was to resolve semantic as well as syntactic ambiguities in the language processed by the algorithms. In one study, Dan Roth, a professor at the University of Pennsylvania, US, presents a learning approach in which linear separators are used to resolve language ambiguity.
The study focuses on linguistic tasks such as word choice for machine translation, part-of-speech tagging and word-sense disambiguation. The research paper treats language learning as a disambiguation problem and applies the linear separator technique to it. The disambiguation problem is defined formally in terms of word predicates, their classifications and the features used for learning. The paper also discusses how various existing disambiguation methods can be used as linear separators.
The linear separator method in the study performed well compared with other methods such as naive Bayes and Transformation-Based Learning (TBL), offering a better alternative for resolving ambiguities in natural language.
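To make the idea of a linear separator concrete, here is a minimal sketch in Python. It is not the method from the study: it assumes a plain perceptron, a made-up training set for two senses of the word ‘bank’, and hypothetical helper names (features, train_perceptron, predict). Each context word becomes a binary feature, and the sign of a weighted sum decides the sense.

```python
from collections import defaultdict

FINANCE, RIVER = 1, -1  # the two senses of "bank" in this toy example


def features(context_words):
    """Turn the words around the ambiguous target into binary features."""
    return {f"ctx={word.lower()}" for word in context_words}


def train_perceptron(examples, epochs=10):
    """Learn one weight per feature; the sign of the score picks the sense.

    examples: list of (context_words, label) pairs, label is +1 or -1.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for context, label in examples:
            active = features(context)
            score = bias + sum(weights[f] for f in active)
            if label * score <= 0:  # wrong side of (or on) the separator
                for f in active:
                    weights[f] += label
                bias += label
    return weights, bias


def predict(weights, bias, context_words):
    """Return FINANCE or RIVER for a new context."""
    score = bias + sum(weights.get(f, 0.0) for f in features(context_words))
    return FINANCE if score > 0 else RIVER


# Hypothetical training contexts, invented for illustration only.
TRAIN = [
    (["deposit", "money", "account"], FINANCE),
    (["loan", "interest", "money"], FINANCE),
    (["river", "water", "fishing"], RIVER),
    (["muddy", "river", "shore"], RIVER),
]

weights, bias = train_perceptron(TRAIN)
sense = predict(weights, bias, ["money", "loan"])  # a financial context
```

The separator here is the set of contexts whose score is exactly zero. Contexts with no known words fall on that boundary, which is where a real system would need more features or more data.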
Case 2: DNA Sequencing
Advances in genomics have been so swift that they have generated vast amounts of data for sequencing. Sequencing is the process of determining the order of nucleotides in DNA to ascertain genetic information. Although machines already exist that perform sequencing quickly, a novel system called Ibis (improved base identification system) was developed at the Max Planck Institute for Evolutionary Anthropology in Germany to work with Illumina, an analyser that uses fluorescence to sequence DNA bases. Identifying the bases from these signals is called ‘base calling’.
The system uses ML and statistical methods such as clustering and support vector machines (SVMs). It improves base calling mainly by learning from the intensities (signal strengths) of the bases in millions of DNA molecules; these intensities are labelled for the ML process. The ambiguity lies in the intensities: if they are wrongly interpreted, or not captured correctly throughout the process, the whole sequencing result may be invalid. Ibis tackles this by using multiclass SVMs to interpret the intensity levels more reliably.
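As a rough illustration of classifying bases from intensities, the sketch below trains a small linear classifier on the multiclass hinge loss, the loss that a linear multiclass SVM minimises. It is a simplified stand-in and not Ibis itself: it omits the regularisation term and any kernel, and the training intensities, the four-channel layout and the function names (scores, train, call_base) are all assumptions made for this example.

```python
BASES = ["A", "C", "G", "T"]

# Hypothetical labelled reads: four channel intensities and the known base.
TRAIN = [
    ([1.0, 0.3, 0.1, 0.1], "A"), ([0.9, 0.4, 0.0, 0.1], "A"),
    ([0.3, 1.0, 0.1, 0.0], "C"), ([0.4, 0.8, 0.1, 0.1], "C"),
    ([0.1, 0.0, 1.0, 0.3], "G"), ([0.0, 0.1, 0.9, 0.4], "G"),
    ([0.1, 0.1, 0.3, 1.0], "T"), ([0.1, 0.0, 0.4, 0.8], "T"),
]


def scores(weights, x):
    """One linear score per base: the dot product of its weights with x."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]


def train(samples, max_epochs=200):
    """Subgradient steps on the multiclass hinge loss (no regularisation)."""
    n_features = len(samples[0][0])
    weights = [[0.0] * n_features for _ in BASES]
    for _ in range(max_epochs):
        updated = False
        for x, base in samples:
            y = BASES.index(base)
            s = scores(weights, x)
            # Strongest competing base for this read.
            r = max((k for k in range(len(BASES)) if k != y), key=lambda k: s[k])
            if s[y] - s[r] < 1.0:  # margin violated: push the two scores apart
                for i, x_i in enumerate(x):
                    weights[y][i] += x_i
                    weights[r][i] -= x_i
                updated = True
        if not updated:  # every read is separated with margin >= 1
            break
    return weights


def call_base(weights, x):
    """Return the called base and its score margin over the runner-up."""
    s = scores(weights, x)
    ranked = sorted(range(len(BASES)), key=lambda k: s[k], reverse=True)
    return BASES[ranked[0]], s[ranked[0]] - s[ranked[1]]


weights = train(TRAIN)
base, margin = call_base(weights, [0.95, 0.35, 0.05, 0.1])
```

The margin returned by call_base is one simple way to expose ambiguity: a read whose top two scores are close could be flagged as uncertain instead of being called with false confidence.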
Case 3: Image Classification To Recognise Words, a.k.a. Visual Words
One of the most challenging problems in ML is the use of verbal descriptions (such as a colour or a feature) for image classification, because such descriptions are open to many interpretations. Words that express visual depictions are usually not accounted for in ML techniques such as image classification, since doing so means considering both image and textual features. This produces a large amount of data, which can complicate classification further. Although some studies have taken both text and image into account when training ‘visual words’, they rely on the best possible definition of each word for each visual depiction.
One study that alleviated this problem came from researchers at the University of Amsterdam, who devised a ‘codebook’ containing a vocabulary of generic words mapped to image features through ML. The researchers tested it on five datasets and found that the image-word matching was significantly better.
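The codebook idea can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the researchers’ implementation: the codebook is a tiny hand-written list of 2-D codewords (real image descriptors have many more dimensions, and codebooks are usually learned from data), and the function names and the sigma value are hypothetical. The sketch compares hard assignment, where each descriptor votes for its single nearest visual word, with a soft assignment that spreads the vote across nearby words.

```python
import math

# Hypothetical codebook: each entry is one "visual word" in feature space.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def hard_histogram(descriptors, codebook):
    """Each descriptor adds one full vote to its nearest visual word."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: sq_dist(d, codebook[k]))
        hist[nearest] += 1.0
    return hist


def soft_histogram(descriptors, codebook, sigma=0.5):
    """Each descriptor splits its vote using Gaussian weights by distance."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        kernel = [math.exp(-sq_dist(d, c) / (2 * sigma ** 2)) for c in codebook]
        total = sum(kernel)
        for k, value in enumerate(kernel):
            hist[k] += value / total  # the shares of one descriptor sum to 1
    return hist


# The first descriptor lies exactly halfway between the first two words.
descriptors = [(0.5, 0.0), (0.1, 0.1)]
hard = hard_histogram(descriptors, CODEBOOK)
soft = soft_histogram(descriptors, CODEBOOK)
```

With hard assignment the halfway descriptor is forced onto a single word, so the tie is hidden. The soft histogram keeps that ambiguity visible by giving both nearby words an equal share of its vote.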
The few cases mentioned above cover only a small part of ML, which encompasses a host of different data such as images, videos and code. Ambiguity will be reduced only if more quality data is incorporated. In addition, the goal set for the ML system should be precise and in line with the requirements of the project at hand.