How Google’s Open Source BERT Model is Enhancing NLP
Josh Miramant, August 27, 2020
Bidirectional Encoder Representations from Transformers, otherwise known as BERT, is a pre-training approach that has drastically improved the efficiency and effectiveness of NLP models. Because Google has made BERT models open source, NLP models can now be improved across all industries. In this article, we take a look at how BERT is making NLP one of the most powerful and useful AI solutions in today's world.
Applying BERT models to Search
Google’s search engine is world-renowned for its ability to present relevant content, and Google has made this natural language processing model open source to the world.
The ability of a system to read and interpret natural language is becoming more and more vital as the world produces new data at an ever-increasing rate. Google’s pre-trained model of word meanings and phrases, which supports its ability to present relevant content, is open source. Beyond natural language processing, the BERT model can extract information from large amounts of unstructured data and can be applied to create search interfaces for any library.
In this article, we will compare search results before and after BERT and dissect an application in the energy sector.
BERT (Bidirectional Encoder Representations from Transformers) is a pre-training approach proposed by the Google AI Language group, developed to overcome a common issue of early NLP models: the lack of sufficient training data.
Let us elaborate, without going into too much detail:
Low-level (e.g. named entity recognition, topic segmentation) and high-level (e.g. sentiment analysis, speech recognition) NLP tasks require task-specific annotated datasets. Labeled datasets are hard to come by and expensive to assemble, yet they play a crucial role in the performance of both shallow and deep neural network models. High-quality inference results could only be achieved when millions or even billions of annotated training examples were available, which made many NLP tasks unapproachable. That is, until BERT was developed.
BERT is a general-purpose language representation model, trained on large corpora of unannotated text. When the model is exposed to large amounts of text content, it learns to understand context and relationships between words in a sentence. Unlike previous learning models that only represented meaning at a word level (bank would mean the same in “bank account” and “grassy bank”), BERT actually cares about context. That is, what comes before and after the word in a sentence. Context turned out to be a major missing capability of NLP models, with a direct impact on model performance. Designing a context-aware model such as BERT is known by many as the beginning of a new era in NLP.
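To make the “bank” example concrete, here is a minimal toy sketch. It is not BERT: the two-dimensional vectors are invented for illustration, and simple neighbor averaging stands in for BERT's far more sophisticated layers. It only shows the difference between a word-level lookup and a representation that depends on the words to the left and right.

```python
# Toy illustration only: these 2-d vectors are invented, and neighbor
# averaging is a stand-in for context modeling, not how BERT works.
STATIC = {
    "bank": (1.0, 1.0),
    "account": (2.0, 0.0),
    "grassy": (0.0, 2.0),
}

def static_vector(words, i):
    """Word-level lookup: the surrounding words are ignored."""
    return STATIC[words[i]]

def contextual_vector(words, i):
    """Average the word's vector with its left and right neighbors."""
    window = words[max(0, i - 1): i + 2]
    vecs = [STATIC[w] for w in window]
    return tuple(sum(v[d] for v in vecs) / len(vecs) for d in range(2))

a = "bank account".split()
b = "grassy bank".split()

# The lookup table returns the same vector for "bank" in both phrases.
print(static_vector(a, 0) == static_vector(b, 1))        # True
# The context-dependent version separates the two senses.
print(contextual_vector(a, 0), contextual_vector(b, 1))  # (1.5, 0.5) (0.5, 1.5)
```

In the first phrase “bank” is pulled toward “account”, in the second toward “grassy”, so the same word ends up with two different representations.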
Training BERT on large amounts of text content is a technique known as pre-training. This means that the model’s weights are adjusted for general text understanding and that more fine-grained, task-specific models can be built on top of it. The authors demonstrated the strength of this technique by applying BERT-based models to 11 NLP tasks and achieving state-of-the-art results.
The best thing is: pre-trained BERT models are open source and publicly available. This means that anyone can tackle NLP tasks and build their models on top of BERT. Nothing can beat that, right? Oh, wait: this also means that NLP models can now be trained (fine-tuned) on smaller datasets, without the need to train from scratch. The beginning of a new era, indeed.
These pre-trained models help companies cut the cost and time needed to deploy NLP models, whether for internal or external use. The effectiveness of well-trained NLP models is emphasized by Michael Alexis, CEO of teambuilding.com, a virtual team-culture building company.
“The biggest benefit of NLP is the scalable and consistent inference and processing of information.” - Michael Alexis, CEO of teambuilding.com
Michael describes how NLP can be applied to culture-fostering programs such as icebreakers or surveys. By analyzing employees' responses, a company can gain valuable insight into the state of its culture. This goes beyond analyzing the literal text: the model also “reads between the lines” to draw inferences about emotion, feel, and overall outlook. BERT can help in situations like this one by giving models a pre-trained foundation from which to uncover the nuances of language and provide more accurate insights.
The capability to model context has turned BERT into an NLP hero and has revolutionized Google Search itself. Below is a quote from the Google Search product team describing their testing experience while tuning BERT to understand the intent behind a query.
“Here are some of the examples that demonstrate BERT’s ability to understand the intent behind your search. Here’s a search for “2019 brazil traveler to USA needs a visa.” The word “to” and its relationship to the other words in the query are particularly important to understanding the meaning. It’s about a Brazilian traveling to the U.S. and not the other way around. Previously, our algorithms wouldn't understand the importance of this connection, and we returned results about U.S. citizens traveling to Brazil. With BERT, Search is able to grasp this nuance and know that the very common word “to” actually matters a lot here, and we can provide a much more relevant result for this query.” - Understanding searches better than ever before, by Pandu Nayak, Google Fellow and Vice President of Search.
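A small sketch shows why word order matters in a query like this one. It makes no claim about how Google Search worked before or after BERT. The queries are shortened versions of the one quoted above, and the adjacent-word pairs are only a simple stand-in for an order-aware representation.

```python
from collections import Counter

def bag_of_words(query):
    """Order-blind representation: word counts only."""
    return Counter(query.lower().split())

def ordered_bigrams(query):
    """Order-aware representation: adjacent word pairs, in sequence."""
    words = query.lower().split()
    return list(zip(words, words[1:]))

# Shortened, illustrative versions of the query from the quote.
q1 = "brazil traveler to usa"
q2 = "usa traveler to brazil"

# Counting words alone cannot tell the two travel directions apart.
print(bag_of_words(q1) == bag_of_words(q2))        # True
# Keeping track of what surrounds "to" separates them.
print(ordered_bigrams(q1) == ordered_bigrams(q2))  # False
```

A representation that ignores order treats both queries as identical, which is exactly the kind of confusion the quote describes.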
In our last piece on NLP and OCR, we illustrated some NLP uses in the real-estate sector and mentioned that “NLP tools are ideal information extraction tools”. Let us now look at the energy sector and see how disruptive NLP technologies like BERT enable new use cases.
Applications of NLP in the Energy Sector
NLP models can extract information from large amounts of unstructured data
One way in which NLP models can be used is for the extraction of critical information from unstructured text data. Emails, journals, notes, logs, and reports are all examples of text data sources that are part of businesses’ daily operations. Some of these documents may prove crucial in organizational efforts to increase operational efficiency and reduce costs.
When aiming to implement wind turbine predictive maintenance, failure reports may contain critical information about the behavior of different components. But since different wind turbine manufacturers have different data collection norms (i.e. maintenance reports come in different formats and even languages), manually identifying relevant data items could quickly become expensive for the plant owner. NLP tools can extract relevant concepts, attributes, and events from unstructured content. Text analytics can then be employed to find correlations and patterns in different data sources. This gives plant owners the chance to implement predictive maintenance based on quantitative measures identified in their failure reports.
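The extract-then-count workflow described above can be sketched in a few lines. This is a deliberately crude, rule-based stand-in for a trained NLP model: the component and event vocabularies and the report snippets are all hypothetical, and pairing terms simply because they appear in the same report is a simplifying assumption.

```python
import re
from collections import Counter

# Hypothetical vocabularies; a real system would learn these from data.
COMPONENTS = ("gearbox", "generator", "blade", "bearing")
EVENTS = ("overheating", "vibration", "crack", "leak")

def extract(report):
    """Return the (component, event) pairs mentioned in one free-text report."""
    words = set(re.findall(r"[a-z]+", report.lower()))
    return [(c, e) for c in COMPONENTS for e in EVENTS if c in words and e in words]

# Invented report snippets, standing in for differently formatted sources.
reports = [
    "Gearbox vibration detected; unit stopped.",
    "2020-03-02 | GEARBOX | vibration above threshold",
    "Crack found on blade during inspection",
]

# Count how often each pair shows up across all reports.
counts = Counter(pair for r in reports for pair in extract(r))
print(counts.most_common(1))  # [(('gearbox', 'vibration'), 2)]
```

Even this simple version turns differently formatted reports into counts that can be compared across sources. A BERT-based extractor would aim to do the same without hand-written vocabularies.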
NLP models can provide natural language search interfaces
Similarly, geoscientists working for oil and gas companies usually need to review many documents related to past drilling operations, well logs, and seismic data. Since such documents also come in different formats and are usually spread across a number of locations (both physical and digital), geoscientists can waste a lot of time looking for information in the wrong places. A viable solution in such a case would be an NLP-powered search interface, which would allow users to look up data in natural language. An NLP model could then correlate data across hundreds of documents and return a set of answers to the query. The workers can validate the output based on their own expert knowledge, and their feedback would further improve the model.
However, there are also technical considerations for deploying such models. First, industry-specific jargon can confuse traditional learning models that do not have the appropriate semantic understanding. Second, the models’ performance may be affected by the size of the training dataset. This is where pre-trained models such as BERT can prove beneficial. Contextual representations can model the appropriate word meaning and reduce the confusion caused by industry-specific terms. By using pre-trained models, it is possible to train the network on smaller datasets. This saves time, energy, and resources that would otherwise have been necessary for training from scratch.
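The overall shape of such a search interface can be sketched as follows: encode the query and every document, then rank documents by similarity. This is a minimal sketch under stated assumptions. The `embed` function here is a placeholder that only counts words, so it matches keywords rather than meaning. In a real system, a BERT-based encoder would take its place. The document names and snippets are invented for the example.

```python
import math
from collections import Counter

def embed(text):
    """Placeholder encoder: word counts. A BERT-based encoder would go here."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def search(query, documents, top_k=2):
    """Rank documents by similarity to the query and return the best matches."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), name) for name, text in documents.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:top_k] if score > 0]

# Hypothetical document snippets, invented for this example.
documents = {
    "well_log_12": "well log showing pressure loss during drilling",
    "seismic_03": "seismic survey of the northern block",
    "report_07": "drilling report on stuck pipe and pressure loss",
}

print(search("pressure loss while drilling", documents))
# ['well_log_12', 'report_07']
```

Swapping the placeholder for contextual embeddings is what would let the same ranking loop handle paraphrases and jargon. Expert feedback on the returned results could then be collected as training data.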
What about your own business?
Can you think of any NLP tasks that might help you cut down on costs and increase operational efficiency?
Josh Miramant is the CEO and founder of Blue Orange Digital, a data science and machine learning agency with offices in New York City and Washington DC. Miramant is a popular speaker, futurist, and a strategic business & technology advisor to enterprise companies and startups. He is a serial entrepreneur and software engineer who has built and scaled 3 startups. He helps organizations optimize and automate their businesses, implement data-driven analytic techniques, and understand the implications of new technologies such as artificial intelligence, big data, and the Internet of Things.
He has been featured on IBM ThinkLeaders, Dell Technologies, and NYC’s Top 10 AI Development and Custom Software Development Agencies, as reviewed on Clutch and YahooFinance, for his contributions to NLP, AI, and machine learning. He specializes in predictive maintenance, unified data lakes, supply chain/grid/marketing/sales optimization, anomaly detection, and recommendation systems, among other ML solutions for a multitude of industries.