Natural Language Processing has garnered great attention of late for two reasons: there is enormous room for improvement, and any success promises to be immensely rewarding.
Neural networks, which are widely used for natural language understanding (NLU), usually process language by generating fixed- or variable-length vector-space representations. Starting with representations of individual words or even pieces of words, they aggregate information from surrounding words to determine the meaning of a given bit of language in context.
Large-scale knowledge graphs, meanwhile, are known for their ability to support NLP applications like semantic search and dialogue generation, and have shown promising results in drawing intelligent insights.
Knowledge Graphs (KGs) are intrinsically incomplete, and adopting reinforcement learning gives a model a better chance in targeted searches.
But incorporating knowledge graphs into language models comes with its own set of issues:
- Structured Knowledge Encoding: extracting and encoding related informative facts in KGs for language representation models is an important problem.
- Heterogeneous Information Fusion: the pre-training procedure for language representation is quite different from the knowledge representation procedure, leading to two individual vector spaces.
- Objective Design: a special pre-training objective is needed to fuse lexical, syntactic, and knowledge information.
To address these issues, researchers from Tsinghua University and Huawei Noah’s Ark Lab recently proposed a new model, ERNIE (Enhanced Language Representation with Informative Entities), which incorporates knowledge graphs (KGs) into pre-training on large-scale corpora for language representation.
For structured knowledge encoding, named entities mentioned in the text are first identified and then aligned to the corresponding entities in the KG.
For heterogeneous information fusion, ERNIE utilizes a BERT-like architecture and adds a new pre-training objective for better alignment of named entities.
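The first step, aligning entity mentions in the text to KG entries, can be sketched with a toy dictionary-based linker. This is only an illustration: the entity IDs and the simple longest-match strategy below are assumptions, not ERNIE's actual entity-linking pipeline.

```python
# Toy sketch of entity alignment: match token spans against a tiny,
# hypothetical KG vocabulary (surface form -> entity ID).
KG_ENTITIES = {
    "bob dylan": "Q392",
    "blowin' in the wind": "Q849112",
    "chronicles: volume one": "Q2715512",
}

def align_entities(tokens):
    """Return (start, end, entity_id) spans for mentions found in the KG."""
    spans = []
    n = len(tokens)
    for start in range(n):
        # Try the longest span first so multi-word mentions win.
        for end in range(n, start, -1):
            mention = " ".join(tokens[start:end]).lower()
            if mention in KG_ENTITIES:
                spans.append((start, end, KG_ENTITIES[mention]))
                break
    return spans

tokens = "Bob Dylan wrote Blowin' in the Wind in 1962".split()
print(align_entities(tokens))  # → [(0, 2, 'Q392'), (3, 7, 'Q849112')]
```

Once spans are linked, the model can look up the corresponding knowledge embeddings and fuse them with the token representations.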
How ERNIE Got The Better Of BERT
Pre-training on a binarised next-sentence prediction task helps the model with common NLP tasks like question answering and natural language inference.
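The binarised prediction task works on pairs of sentences: half the time the second sentence really follows the first in the corpus, half the time it is a random sentence. A minimal sketch of how such training pairs could be built (toy corpus and helper names are assumptions for illustration):

```python
# Sketch of building binarised next-sentence-prediction pairs:
# label 1 = sentence B truly follows A, label 0 = B is a random sentence.
import random

def make_nsp_pairs(sentences, rng):
    """Build (sentence_a, sentence_b, is_next) training examples."""
    pairs = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            pairs.append((sentences[i], sentences[i + 1], 1))  # true successor
        else:
            # Pick any sentence that is NOT the true successor.
            others = [s for j, s in enumerate(sentences) if j != i + 1]
            pairs.append((sentences[i], rng.choice(others), 0))
    return pairs

corpus = [
    "the man went to the store",
    "he bought a gallon of milk",
    "penguins are flightless birds",
]
for a, b, label in make_nsp_pairs(corpus, random.Random(0)):
    print(label, "|", a, "->", b)
```

The model then learns a binary classifier over these pairs, which is what gives it a notion of inter-sentence coherence.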
Unidirectional models are efficiently trained by predicting each word conditioned on the previous words in the sentence. However, it is not possible to train bidirectional models by simply conditioning each word on its previous and next words, since this would allow the word that’s being predicted to indirectly “see itself” in a multi-layer model.
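BERT sidesteps this by masking a fraction of the input tokens and training the model to predict only the masked positions, so no token can see itself. A minimal sketch of the input-corruption step (toy whitespace tokenizer; simplified to pure masking, omitting BERT's 80/10/10 mask/random/keep scheme):

```python
# Minimal sketch of masked-language-model input corruption,
# assuming a 15% masking rate as in the BERT paper.
import random

MASK, RATE = "[MASK]", 0.15

def mask_tokens(tokens, rng):
    """Replace a random subset of tokens with [MASK]; return the corrupted
    sequence and the (position, original_token) pairs to be predicted."""
    corrupted, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < RATE:
            corrupted.append(MASK)
            targets.append((i, tok))  # model must recover tok at position i
        else:
            corrupted.append(tok)
    return corrupted, targets

corrupted, targets = mask_tokens("the man went to the store".split(),
                                 random.Random(0))
print(corrupted, targets)
```

Because the loss is computed only at masked positions, the encoder can attend to both left and right context without leaking the answer.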
Bidirectional Encoder Representations from Transformers, or BERT, which was open-sourced in late 2018, offered new ground for tackling the intricacies of language understanding.
BERT’s creators claim that a question answering model can be trained on it in under 30 minutes. Given the number of steps BERT operates on, this is quite remarkable, and it was made possible by Google’s custom-built Cloud TPUs, which accelerate dense matrix multiplications and convolutions and minimise the time-to-accuracy when training large models.
Although BERT has been quite reliable in capturing rich semantic meaning from plain text, when asked the question “Is Bob Dylan a songwriter or a book author?” BERT’s response follows a tortuous path. It is certainly tricky even for a human being to wrap their mind around the fact that Bob Dylan, a musician, won the Nobel Prize for Literature.
To address these shortcomings of BERT and to capitalise on the advantages of knowledge graphs, the researchers demonstrate their findings on ERNIE.
ERNIE has achieved results comparable to the basic version of BERT on eight datasets of the GLUE (General Language Understanding Evaluation) benchmark, which indicates that ERNIE does not degrade performance on other common NLP tasks.
The above figure is an example of incorporating extra knowledge information for language understanding. The solid lines represent existing knowledge facts, the red dotted lines represent facts extracted from the sentence in red, and the green dot-dash lines represent facts extracted from the sentence in green.
As shown above, without knowing that Blowin’ in the Wind and Chronicles: Volume One are a song and a book respectively, it is difficult for existing pre-trained language representation models to recognise the two occupations of Bob Dylan, i.e., songwriter and writer, on the entity typing task. Furthermore, it is nearly impossible to extract fine-grained relations, such as composer and author, on the relation classification task.
To evaluate ERNIE’s performance, experiments were conducted primarily on two knowledge-driven NLP tasks: entity typing and relation classification. English Wikipedia was used as the pre-training corpus, and Wikidata was used to align the text (the knowledge embeddings were trained on Wikidata with TransE).
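TransE represents a fact (head, relation, tail) by requiring that the head vector plus the relation vector land near the tail vector. The scoring idea can be sketched as follows; the 3-dimensional vectors are made-up toy values for illustration (real training fits hundreds of dimensions over millions of triples with a margin-based ranking loss):

```python
# Sketch of the TransE scoring function: a triple (h, r, t) is plausible
# when h + r is close to t, i.e. the distance below is small.
import numpy as np

def transe_score(h, r, t):
    """Lower score = more plausible triple under TransE (L2 distance)."""
    return np.linalg.norm(h + r - t)

# Hypothetical embeddings for (Bob Dylan, occupation, songwriter).
head       = np.array([0.2, 0.5, -0.1])
rel        = np.array([0.1, -0.3, 0.4])
true_tail  = np.array([0.3, 0.2, 0.3])   # head + rel exactly
wrong_tail = np.array([-0.8, 0.9, -0.5])

# The true tail should score lower (closer) than a corrupted one.
print(transe_score(head, rel, true_tail) < transe_score(head, rel, wrong_tail))
```

These entity embeddings are what ERNIE injects alongside BERT’s token embeddings for the aligned entity mentions.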
Future Of ERNIE
Large-scale knowledge graphs are usually known for their ability to support NLP applications like semantic search or dialogue generation, and companies like Pinterest have already opted for a graph-based system (Pixie) for real-time, high-performance tasks.
The researchers behind ERNIE believe three important directions remain for future research:
- Inject knowledge into feature-based pre-training models such as ELMo.
- Introduce diverse structured knowledge into language representation models.
- Annotate more real-world corpora heuristically for building larger pre-training data.
Transfer learning with unsupervised pre-training forms the foundation of many natural language understanding systems, and BERT’s deep bidirectional architecture lends further weight to the above findings.
The Transformer network helps in gaining insights into how information flows through the architecture. This study also helped researchers demonstrate how sufficiently pre-trained models, when scaled to extreme sizes, lead to improvements even on small tasks.
The next big challenge for these NLP models is to reach a human-level understanding of language, a pursuit that dates back to the times of Leibniz and Descartes.
Know more about ERNIE here
Download the pre-trained model here.