Fundamentals of Google BERT and Transformers


BERT stands for Bidirectional Encoder Representations from Transformers. It is a technique for NLP pre-training developed by Google. Jacob Devlin and his colleagues from Google created and published BERT in 2018.

BERT helps Google Search better understand the context of words in a query and match that query to more relevant results. It is also used when choosing the text shown in featured snippets.

BERT was open-sourced in November 2018 and discussed in detail on the Google AI blog. Google has also given a few examples of how BERT helps in Search.

How does the BERT system work?

BERT is particularly helpful for searches where prepositions like “for” and “to” matter a lot to the meaning. Google gives the example of the search “2019 brazil traveler to the USA need a visa” and explains why the word “to” is important in it. The query is about a Brazilian traveling to the U.S., not the other way around. Previously, Google’s algorithms did not understand the importance of this connection and returned results about U.S. citizens traveling to Brazil. With BERT, Search is able to grasp this nuance and recognize that the very common word “to” actually matters a lot here, so Google can provide a much more relevant result for this query.


Working of Google BERT system

Similarly, consider a query like “do estheticians stand a lot at work.” Previously, Google’s systems took a keyword-matching approach, matching the term “stand-alone” in the result with the word “stand” in the query. But that isn’t the right use of the word “stand” in context. The BERT models, on the other hand, understand that “stand” is related to the concept of the physical demands of a job, and display a more useful response.

Understanding Google BERT for Natural Language Processing

Source: https://www.blog.google/products/search/search-language-understanding-bert/
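The keyword-matching behaviour described in the two examples above can be illustrated with a toy matcher. This is only a sketch under stated assumptions: it is not Google's ranking code, the helper names keywords and keyword_overlap are made up for this illustration, and the two result snippets are invented.

```python
import re

def keywords(text):
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_overlap(query, document):
    """Return the query words that also appear in the document."""
    return keywords(query) & keywords(document)

query = "do estheticians stand a lot at work"
# Hypothetical result snippets, written for this illustration only.
unhelpful = "Estheticians can work in a stand-alone spa or a salon"
helpful = "Being an esthetician means spending long hours on your feet"

# "stand-alone" splits into "stand" and "alone", so it matches the query word.
print(sorted(keyword_overlap(query, unhelpful)))
# The snippet that actually answers the question shares no exact words.
print(sorted(keyword_overlap(query, helpful)))

# Word order is invisible to a bag of keywords, so direction is lost too.
print(keywords("brazil traveler to usa") == keywords("usa traveler to brazil"))
```

A matcher like this rewards surface overlap and ignores both word order and word sense, which is the gap that a contextual model is meant to close.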


Improvement in other languages


A powerful characteristic of BERT systems is that they can take what they learn from one language and apply it to others. So Google can take models that learn from improvements in English (a language where the vast majority of web content exists) and apply them to other languages. This helps Search return more relevant results in the many languages in which it is offered.

Technology Behind Google BERT

BERT builds upon earlier work in pre-training contextual representations, including Semi-supervised Sequence Learning, Generative Pre-Training, ELMo, and ULMFiT. However, unlike these previous models, BERT is the first deeply bidirectional, unsupervised language representation pre-trained using only a plain text corpus, according to the Google AI blog.

Comparison between the architectures of Google BERT, OpenAI GPT, and ELMo
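To make "deeply bidirectional" more concrete, the sketch below compares the context available to a left-to-right model with the context available to a bidirectional one. It is a conceptual illustration only: visible_context is a hypothetical helper, and real models work on subword tokens and attention masks rather than on Python lists.

```python
def visible_context(tokens, position, bidirectional):
    """Return the other tokens a model may use when representing tokens[position]."""
    left = tokens[:position]
    # A left-to-right model cannot look at anything after the current position.
    right = tokens[position + 1:] if bidirectional else []
    return left + right

tokens = "brazil traveler to the usa".split()
position = tokens.index("to")

print(visible_context(tokens, position, bidirectional=False))
# ['brazil', 'traveler']
print(visible_context(tokens, position, bidirectional=True))
# ['brazil', 'traveler', 'the', 'usa']
```

In the left-to-right case the destination ("usa") is never seen when the word "to" is represented, which is why context from both sides helps with queries like the visa example.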

BERT makes use of the Transformer, an attention-based architecture that learns the context in which words are used and the relations between them. In its vanilla form, the Transformer consists of two mechanisms: an encoder that reads the text input and a decoder that produces a prediction for the task. Since BERT’s goal is to generate a language model, only the encoder is required.
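The attention idea can be sketched in a few lines of plain Python. This is a deliberately stripped-down version of scaled dot-product self-attention: it assumes tiny hand-written word vectors and omits the learned query, key, and value projections, the multiple heads, and the stacked layers that a real encoder has.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Mix every vector with all the others, weighted by scaled dot-product similarity."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Similarity of this position to every position, including itself.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # The new representation is a weighted average of all input vectors.
        outputs.append([sum(w * v[d] for w, v in zip(weights, vectors))
                        for d in range(dim)])
    return outputs

# Hypothetical 2-dimensional embeddings for three words.
embeddings = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
for row in self_attention(embeddings):
    print([round(x, 3) for x in row])
```

Because every position is scored against every other position, each output vector depends on the words on both its left and its right.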

BERT uses two training strategies:

Masked LM (MLM)

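Before word sequences are fed into BERT, 15% of the tokens in each sequence are selected for prediction, and most of them are replaced with a [MASK] token (in the original recipe, 80% of the selected tokens become [MASK], 10% are swapped for a random token, and 10% are left unchanged). The BERT model then attempts to predict the original value of the selected tokens, based on the context provided by the other tokens in the sequence.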
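A minimal sketch of the masking step follows. It assumes whitespace-separated words instead of BERT's subword tokens and always substitutes [MASK] at the chosen positions, so it is simpler than the published recipe; mask_tokens and its parameters are names chosen for this illustration.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, labels) for a simplified masked-LM example.

    labels[i] holds the original token at masked positions and None elsewhere.
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Mask at least one token so short sequences still yield a prediction target.
    num_to_mask = max(1, round(len(tokens) * mask_prob))
    masked_positions = set(rng.sample(range(len(tokens)), num_to_mask))

    masked_tokens, labels = [], []
    for i, token in enumerate(tokens):
        if i in masked_positions:
            masked_tokens.append(MASK_TOKEN)
            labels.append(token)  # the model is trained to recover this token
        else:
            masked_tokens.append(token)
            labels.append(None)   # no prediction is made for this position
    return masked_tokens, labels

tokens = "2019 brazil traveler to the usa need a visa".split()
masked, labels = mask_tokens(tokens, seed=0)
print(masked)
print(labels)
```

During training, the loss would be computed only at the positions where labels is not None.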

Next Sentence Prediction (NSP)

In the BERT training process, the model receives pairs of sentences as input and learns to predict whether the second sentence in the pair is the subsequent sentence in the original document. During training, 50% of the inputs are pairs in which the second sentence is the subsequent sentence in the original document, while in the other 50% a random sentence from the corpus is chosen as the second sentence. The assumption is that the random sentence will be disconnected from the first sentence.
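The pair construction described above might be sketched as follows. The function name make_nsp_examples, the other_sentences pool standing in for "the rest of the corpus", and the sample sentences are all assumptions made for this illustration.

```python
import random

def make_nsp_examples(document, other_sentences, seed=None):
    """Build (sentence_a, sentence_b, is_next) triples from one document.

    other_sentences must be a non-empty list of sentences from elsewhere
    in the corpus; it supplies the random "not next" candidates.
    """
    rng = random.Random(seed)
    examples = []
    for first, actual_next in zip(document, document[1:]):
        if rng.random() < 0.5:
            # Positive example: the true following sentence.
            examples.append((first, actual_next, True))
        else:
            # Negative example: a random sentence from another part of the corpus.
            examples.append((first, rng.choice(other_sentences), False))
    return examples

document = [
    "BERT reads the whole sentence at once.",
    "It uses the words on both sides of a token.",
    "This helps with short prepositions such as 'to'.",
]
other_sentences = [
    "The recipe calls for two cups of flour.",
    "The match ended in a draw.",
]

for sentence_a, sentence_b, is_next in make_nsp_examples(document, other_sentences, seed=0):
    print(is_next, "|", sentence_a, "|", sentence_b)
```

Over a large corpus the coin flip gives roughly half positive and half negative pairs, matching the 50/50 split in the text.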

RankBrain vs. BERT

RankBrain, Google’s first artificial intelligence method for understanding searches, is not dead. BERT is an additional method for deciphering queries, and Google uses whichever method is better suited to a given query. In fact, a single query can be handled by multiple methods, including BERT, together.



Founder of Aipoint, a very creative machine learning researcher who loves playing with data.