What is natural language processing? AI for speech and text

Deep learning has improved machine translation and other natural language processing tasks by leaps and bounds

From a friend on Facebook:

Me: Alexa please remind me my morning yoga sculpt class is at 5:30am.

Alexa: I have added Tequila to your shopping list.

We talk to our devices, and sometimes they recognize what we are saying correctly. We use free services to translate foreign language phrases encountered online into English, and sometimes they give us an accurate translation. Although natural language processing has been improving by leaps and bounds, it still has considerable room for improvement.

My friend’s accidental Tequila order may be more appropriate than she thought. ¡Arriba!

What is natural language processing?

Natural language processing, or NLP, is currently one of the major successful application areas for deep learning, despite stories about its failures. The overall goal of natural language processing is to allow computers to make sense of and act on human language. We’ll break that down further in the next section.

Historically, natural language processing was handled by rule-based systems, built initially from hand-written rules for grammars, stemming, and the like. Aside from the sheer amount of work it took to write those rules by hand, they tended not to work very well.

Why not? Let’s consider what should be a simple example, spelling. In some languages, such as Spanish, spelling really is easy and has regular rules. Anyone learning English as a second language, however, knows how irregular English spelling and pronunciation can be. Imagine having to program rules that are riddled with exceptions, such as the grade-school spelling rule “I before E except after C, or when sounding like A as in neighbor or weigh.” As it turns out, the “I before E” rule is hardly a rule. Accurate perhaps three-quarters of the time, it has numerous classes of exceptions.

After pretty much giving up on hand-written rules in the late 1980s and early 1990s, the NLP community started using statistical inference and machine learning models. Many models and techniques were tried; few survived when they were generalized beyond their initial usage. A few of the more successful methods were used in multiple fields. For example, Hidden Markov Models were used for speech recognition in the 1970s and were adopted for use in bioinformatics—specifically, analysis of protein and DNA sequences—in the 1980s and 1990s.

Phrase-based statistical machine translation models still needed to be tweaked for each language pair, and their accuracy and precision depended mostly on the quality and size of the textual corpora available for supervised training. For French and English, the Canadian Hansard (proceedings of Parliament, bilingual by law since 1867) was and is invaluable for supervised learning. The proceedings of the European Union offer more languages, but cover fewer years.

In the fall of 2016, Google Translate suddenly went from producing, on average, “word salad” with only a vague connection to the meaning in the original language, to emitting polished, coherent sentences more often than not, at least for supported language pairs such as English-French, English-Chinese, and English-Japanese. Many more language pairs have been added since then.

That dramatic improvement was the result of a nine-month concerted effort by the Google Brain and Google Translate teams to move Google Translate from its old phrase-based statistical machine translation algorithms to a neural network trained with deep learning and word embeddings, built with Google’s TensorFlow framework. Within a year, neural machine translation (NMT) had replaced statistical machine translation (SMT) as the state of the art.

Was that magic? No, not at all. It wasn’t even easy. The researchers working on the conversion had access to a huge corpus of translations from which to train their networks, but they soon discovered that they needed thousands of GPUs for training, and that they would need to create a new kind of chip, a Tensor Processing Unit (TPU), to run Google Translate on their trained neural networks at scale. They also had to refine their networks hundreds of times as they tried to train a model that would be nearly as good as human translators.

Natural language processing tasks

In addition to the machine translation problem addressed by Google Translate, major NLP tasks include automatic summarization, co-reference resolution (determine which words refer to the same objects, especially for pronouns), named entity recognition (identify people, places, and organizations), natural language generation (convert information into readable language), natural language understanding (convert chunks of text into more formal representations such as first-order logic structures), part-of-speech tagging, sentiment analysis (classify text as favorable or unfavorable toward specific objects), and speech recognition (convert audio to text).

Major NLP tasks are often broken down into subtasks, although the latest-generation neural-network-based NLP systems can sometimes dispense with intermediate steps. For example, an experimental Google speech-to-speech translator called Translatotron can translate Spanish speech to English speech directly by operating on spectrograms without the intermediate steps of speech to text, language translation, and text to speech. Translatotron isn’t all that accurate yet, but it’s good enough to be a proof of concept.

Natural language processing methods

Like any other machine learning problem, NLP problems are usually addressed with a pipeline of procedures, most of which are intended to prepare the data for modeling. In his excellent tutorial on NLP using Python, DJ Sarkar lays out the standard workflow: Text pre-processing -> Text parsing and exploratory data analysis -> Text representation and feature engineering -> Modeling and/or pattern mining -> Evaluation and deployment.  
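To make those stages concrete, here is a minimal Python sketch of the workflow on a tiny, made-up corpus. The function names are invented for illustration, and the classifier (scikit-learn’s LogisticRegression) is just a placeholder; in a real project each stage would wrap the kinds of library calls described below.

```python
# Hypothetical sketch of the five-stage workflow on a toy two-review corpus.
from collections import Counter
from sklearn.linear_model import LogisticRegression

def preprocess(docs):
    # Text pre-processing: real pipelines also strip markup, expand
    # contractions, remove stopwords, and stem or lemmatize.
    return [doc.lower().split() for doc in docs]

def explore(tokenized_docs):
    # Text parsing / exploratory data analysis: simple term frequencies.
    return Counter(tok for doc in tokenized_docs for tok in doc)

def featurize(tokenized_docs, vocabulary):
    # Text representation / feature engineering: bag-of-words count vectors.
    return [[doc.count(term) for term in vocabulary] for doc in tokenized_docs]

def fit_model(features, labels):
    # Modeling and/or pattern mining: any classifier could slot in here.
    return LogisticRegression().fit(features, labels)

docs = ["The movie was great", "The movie was terrible"]
labels = [1, 0]  # toy sentiment labels: 1 = positive, 0 = negative
tokens = preprocess(docs)
vocab = sorted(explore(tokens))
clf = fit_model(featurize(tokens, vocab), labels)

# Evaluation and deployment: score a new document with the fitted model.
print(clf.predict(featurize(preprocess(["A great movie"]), vocab)))
```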

Sarkar uses Beautiful Soup to extract text from scraped websites, and then the Natural Language Toolkit (NLTK) and spaCy to preprocess the text by tokenizing, stemming, and lemmatizing it, as well as removing stopwords and expanding contractions. He then uses NLTK and spaCy to tag parts of speech, perform shallow parsing, and extract N-gram chunks for tagging: unigrams, bigrams, and trigrams. He uses NLTK and the Stanford Parser to generate parse trees, and spaCy to generate dependency trees and perform named entity recognition.
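The snippet below condenses those preprocessing steps into one short example. It assumes the relevant NLTK data packages (punkt, stopwords, averaged_perceptron_tagger) and spaCy’s en_core_web_sm model have already been downloaded, and the sample sentence is made up; it illustrates the same operations rather than reproducing Sarkar’s code.

```python
# Condensed illustration of the preprocessing steps; assumes the NLTK data
# packages punkt, stopwords, and averaged_perceptron_tagger plus spaCy's
# en_core_web_sm model are already downloaded.
import nltk
import spacy

text = "The strikers weren't negotiating with Acme Corp. in New York."

# Tokenizing, stemming, and stopword removal with NLTK
tokens = nltk.word_tokenize(text)
stemmer = nltk.stem.PorterStemmer()
stems = [stemmer.stem(tok) for tok in tokens]
stop_words = set(nltk.corpus.stopwords.words("english"))
content_tokens = [tok for tok in tokens if tok.lower() not in stop_words]

# Part-of-speech tags and N-gram chunks with NLTK
pos_tags = nltk.pos_tag(tokens)
bigrams = list(nltk.ngrams(tokens, 2))

# Lemmas, dependency relations, and named entities with spaCy
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
lemmas = [token.lemma_ for token in doc]
dependencies = [(token.text, token.dep_, token.head.text) for token in doc]
entities = [(ent.text, ent.label_) for ent in doc.ents]

print(entities)  # e.g., [('Acme Corp.', 'ORG'), ('New York', 'GPE')]
```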

Sarkar goes on to perform sentiment analysis using several unsupervised methods, since his example data set hasn’t been tagged for supervised machine learning or deep learning training. In a later article, Sarkar discusses using TensorFlow to access Google’s Universal Sentence Encoder model and perform transfer learning to analyze a movie review data set for sentiment analysis.
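As one concrete example of the unsupervised, lexicon-based approach, the snippet below scores two made-up reviews with NLTK’s VADER analyzer. It’s a generic illustration of the technique rather than Sarkar’s exact code, and it assumes the vader_lexicon resource has been downloaded.

```python
# Lexicon-based (unsupervised) sentiment scoring with NLTK's VADER analyzer;
# assumes nltk.download("vader_lexicon") has been run.
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
reviews = [
    "A brilliant, moving film with superb performances.",
    "A dull, overlong mess. I wanted my two hours back.",
]
for review in reviews:
    scores = analyzer.polarity_scores(review)  # neg/neu/pos plus compound
    label = "positive" if scores["compound"] >= 0.05 else "negative"
    print(f"{label:>8}  {scores['compound']:+.3f}  {review}")
```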

As you’ll see if you read these articles and work through the Jupyter notebooks that accompany them, there isn’t one universal best model or algorithm for text analysis. Sarkar constantly tries multiple models and algorithms to see which work best on his data.

For a review of recent deep-learning-based models and methods for NLP, I can recommend this article by an AI educator who calls himself Elvis.

Natural language processing services

You would expect Amazon Web Services, Microsoft Azure, and Google Cloud to offer natural language processing services of one kind or another, in addition to their well-known speech recognition and language translation services. And of course they do—not only generic NLP models, but also customized NLP.

Amazon Comprehend is a natural language processing service that extracts key phrases, places, people’s names, brands, events, and sentiment from unstructured text. Amazon Comprehend uses pre-trained deep learning models and identifies rather generic places and things. If you want to extend this capability to identify more specific language, you can customize Amazon Comprehend to recognize domain-specific entities and to categorize documents into your own categories.
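A minimal call to Amazon Comprehend through the boto3 SDK might look like the sketch below. The region and sample text are placeholders, and it assumes your AWS credentials are already configured.

```python
# Minimal Amazon Comprehend sketch with boto3; region and text are
# placeholders, and AWS credentials are assumed to be configured.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")
text = "I loved the new Acme Roadrunner 3000 I bought in Seattle last week."

entities = comprehend.detect_entities(Text=text, LanguageCode="en")
key_phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")

print([(e["Text"], e["Type"]) for e in entities["Entities"]])
print([p["Text"] for p in key_phrases["KeyPhrases"]])
print(sentiment["Sentiment"])  # e.g., POSITIVE
```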

Microsoft Azure has multiple NLP services. Text Analytics identifies the language, sentiment, key phrases, and entities of a block of text. The capabilities supported depend on the language.
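A rough sketch of calling the sentiment endpoint over REST follows. The endpoint host, subscription key, and API version (v2.1 here) are assumptions to replace with your own resource’s values; the client SDKs and newer API versions differ in the details.

```python
# Hypothetical REST call to the Text Analytics sentiment endpoint (v2.1);
# replace the endpoint and key with your own Azure resource's values.
import requests

endpoint = "https://YOUR_REGION.api.cognitive.microsoft.com"
key = "YOUR_TEXT_ANALYTICS_KEY"
payload = {"documents": [
    {"id": "1", "language": "en",
     "text": "The hotel was lovely and the staff were helpful."},
]}

response = requests.post(
    f"{endpoint}/text/analytics/v2.1/sentiment",
    headers={"Ocp-Apim-Subscription-Key": key},
    json=payload,
)
print(response.json())  # per-document sentiment scores
```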

Language Understanding (LUIS) is a customizable natural-language interface for social media apps, chat bots, and speech-enabled desktop applications. You can use a pre-built LUIS model, a pre-built domain-specific model, or a customized model with machine-trained or literal entities. You can build a custom LUIS model with the authoring APIs or with the LUIS portal.

For the more technically minded, Microsoft has released a paper and code showing you how to fine-tune a BERT NLP model for custom applications using the Azure Machine Learning Service.
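To give a rough idea of what such fine-tuning involves, here is a generic sketch using the Hugging Face transformers library and PyTorch rather than Microsoft’s released code. It performs a single gradient step on two made-up examples, assuming a recent version of transformers.

```python
# Generic BERT fine-tuning sketch with Hugging Face transformers and PyTorch
# (not Microsoft's released code); one gradient step on a made-up batch.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # fresh two-class classification head

texts = ["Great product, works as advertised.", "Support never answered my ticket."]
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)  # forward pass computes the loss
outputs.loss.backward()                  # backpropagate
optimizer.step()                         # one optimization step
print(float(outputs.loss))
```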

Google Cloud offers both a pre-trained natural language API and customizable AutoML Natural Language. The Natural Language API discovers syntax, entities, and sentiment in text, and classifies text into a predefined set of categories. AutoML Natural Language allows you to train a custom classifier for your own set of categories using deep transfer learning.
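For instance, analyzing sentiment and entities with the pre-trained Natural Language API through the google-cloud-language client library might look like the sketch below. It assumes your Google Cloud credentials are configured, and the exact client syntax varies between library versions.

```python
# Minimal sketch with the google-cloud-language client; assumes credentials
# are configured, and client syntax varies between library versions.
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="The new cafe on Main Street in Springfield is fantastic.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# Sentiment: score is negative-to-positive polarity, magnitude is strength.
sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
print(sentiment.score, sentiment.magnitude)

# Entities: people, places, organizations, and so on, with salience scores.
entities = client.analyze_entities(request={"document": document}).entities
print([(entity.name, entity.salience) for entity in entities])
```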

Copyright © 2019 IDG Communications, Inc.