HOW ALEXA USES NEURAL NETWORKS
NEURAL NETWORK
A neural network is a computational learning system that uses a network of functions to understand and translate input data of one form into a desired output, usually in another form. The concept of the artificial neural network was inspired by human biology, specifically the way neurons in the human brain work together to interpret input from the human senses.
What does a neural network consist of?
A typical neural network has anything from a few dozen to hundreds, thousands, or even millions of artificial neurons called units, arranged in a series of layers, each of which connects to the layers on either side. Some of them, known as input units, are designed to receive various forms of information from the outside world that the network will attempt to learn about, recognize, or otherwise process. Other units sit on the opposite side of the network and signal how it responds to the information it has learned; these are known as output units. In between the input units and output units are one or more layers of hidden units, which together form the majority of the artificial brain. Most neural networks are fully connected, which means each hidden unit and each output unit is connected to every unit in the layers on either side. The connection between one unit and another is represented by a number called a weight, which can be either positive (if one unit excites another) or negative (if one unit suppresses or inhibits another). The larger the magnitude of the weight, the more influence one unit has on the other. (This roughly corresponds to the way actual brain cells trigger one another across tiny gaps called synapses.)
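The layered, fully connected structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the 3-2-1 layer sizes, the weight and bias values, and the choice of a sigmoid activation are all hypothetical, picked only to show how every unit in one layer feeds every unit in the next through a weighted connection.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a weighted sum into the range (0, 1) (assumed activation)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs: list[float], weights: list[list[float]], biases: list[float]) -> list[float]:
    """Fully connected layer: every output unit is connected to every input unit.

    weights[i][j] is the weight on the connection from input unit j to
    output unit i; a positive weight excites, a negative weight inhibits.
    """
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs: list[float], layers) -> list[float]:
    """Propagate the input units' values through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical network: 3 input units -> 2 hidden units -> 1 output unit.
network = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),  # hidden layer
    ([[1.2, -0.7]], [-0.2]),                              # output layer
]
print(forward([1.0, 0.0, 1.0], network))
```

The sketch only shows the forward pass; training would consist of adjusting the weights and biases so that the output units produce the desired responses.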
HOW DOES A NEURAL NETWORK WORK?
The first type of artificial neuron we will explain is the perceptron.
A perceptron is a supervised learning algorithm for binary classifiers: it maps a vector of input variables (in the simplest case, binary values) to a single binary output. To compute that output, the perceptron follows these steps:
- Multiply each input xj by its weight wj, a real number that expresses how important that input is to the output.
- Add the products together to form the weighted sum ∑ wj xj.
- Apply the activation function: output 1 if the weighted sum is greater than a threshold value, and output 0 if it is less than or equal to the threshold. Writing the bias as b = -threshold, the same rule becomes: output 1 if ∑ wj xj + b > 0, otherwise output 0.
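The three steps above translate directly into code. The sketch below is a minimal, assumed implementation: the function names, the logical-AND training data, the learning rate of 0.1, and the epoch count are illustrative choices. The training loop uses the classic perceptron learning rule (nudge each weight by learning_rate × error × input), which the steps above do not spell out.

```python
def perceptron_output(weights: list[float], bias: float, inputs: list[int]) -> int:
    """Weighted sum, plus bias (= -threshold), then a step activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum + bias > 0 else 0


def train_perceptron(samples, n_inputs: int, learning_rate: float = 0.1, epochs: int = 20):
    """Classic perceptron learning rule: supervised updates driven by the error."""
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - perceptron_output(weights, bias, inputs)  # -1, 0, or 1
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Logical AND: a linearly separable problem that a single perceptron can learn.
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_samples, n_inputs=2)
print([perceptron_output(w, b, x) for x, _ in and_samples])  # [0, 0, 0, 1]
```

A single perceptron can only separate classes with a straight line (a hyperplane), which is why problems such as XOR need the hidden layers described earlier.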
ABOUT ALEXA
Amazon Alexa, also known simply as Alexa, is a virtual assistant AI technology developed by Amazon and first used in the Amazon Echo smart speakers developed by Amazon Lab126. It is capable of voice interaction, music playback, making to-do lists, streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and other real-time information, such as news. Alexa can also control several smart devices, acting as a home automation system. Users can extend Alexa's capabilities by installing "skills" (additional functionality developed by third-party vendors, more commonly called apps in other settings), such as weather programs and audio features.
HOW DOES ALEXA WORK?
Alexa is a cloud-based service with natural-language-understanding capabilities that powers devices like Amazon Echo, Echo Show, Echo Plus, Echo Spot, Echo Dot, and more. Alexa-like voice services have traditionally supported a small number of well-separated domains, such as calendar or weather. To extend Alexa's capabilities, Amazon released the Alexa Skills Kit in 2015 so that third-party developers could add to Alexa's voice-driven capabilities. These new third-party capabilities are called skills, and Alexa currently has more than 40,000 of them. Four out of five Alexa customers with an Echo device have used a third-party skill.
Finding the most relevant skill to handle a natural utterance is an open scientific and engineering challenge, for two reasons:
1. The sheer number of potential skills makes the task difficult. Unlike traditional digital assistants that have on the order of 10 to 20 built-in domains, Alexa must navigate more than 40,000. And that number increases each week.
2. Unlike traditional built-in domains that are carefully designed to stay in their swim lanes, Alexa skills can cover overlapping functionalities. For instance, there are dozens of skills that can respond to recipe-related utterances.
The problem here is essentially a large-scale domain classification problem over tens of thousands of skills. It is one of the many exciting challenges Alexa scientists and engineers are addressing with deep-learning technologies, so customer interaction with Alexa can be more natural and friction-free.
Alexa uses a two-step, scalable, and efficient neural shortlisting-reranking approach to find the most relevant skill for a given utterance. This section describes the first of those two steps, which relies on a neural model called Shortlister. Shortlister is a scalable and efficient architecture with a shared encoder, a personalized skill attention mechanism, and skill-specific classification networks.
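The shortlisting-reranking idea can be illustrated with a toy pipeline. This is a hypothetical sketch, not Alexa's code: the skill names, the first-pass scores, and the "extra signal" used by the second step are all invented, because the text above does not describe the reranker in detail. What it shows is the structure: a cheap first pass narrows tens of thousands of skills to a short list of k candidates, and only those k are rescored.

```python
import heapq
from typing import Callable


def shortlist(scores: dict[str, float], k: int) -> list[str]:
    """Step 1 (stand-in for Shortlister): keep the k highest-scoring skills."""
    return heapq.nlargest(k, scores, key=scores.get)


def rerank(candidates: list[str], rerank_score: Callable[[str], float]) -> str:
    """Step 2 (stand-in for the reranker): rescore only the short list."""
    return max(candidates, key=rerank_score)


# Hypothetical first-pass scores for a recipe-related utterance.
shortlister_scores = {
    "RecipeFinder": 0.81,
    "CookingTimer": 0.77,
    "QuickRecipes": 0.74,
    "WeatherNow": 0.05,
    "SportsScores": 0.02,
}
top = shortlist(shortlister_scores, k=3)
print(top)  # ['RecipeFinder', 'CookingTimer', 'QuickRecipes']

# Hypothetical extra signal that only the second step consults.
extra_signal = {"QuickRecipes": 0.10}
best = rerank(top, lambda s: shortlister_scores[s] + extra_signal.get(s, 0.0))
print(best)  # QuickRecipes
```

The design choice this illustrates is cost control: the second step can afford richer (and slower) features because it only ever sees k candidates rather than the full skill catalog.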
The shared encoder network is hierarchical: Its lower layers are character-based and orthography sensitive and learn to represent each word in terms of character structure or shape; its middle layers are word-based, and with the outputs from the lower layers, they learn to represent an entire utterance. The skill attention mechanism is a separate network that is personalized per user. It computes a summary vector that describes which skills are enabled in a given user’s profile and how relevant they are to the utterance representation. Both the utterance representation vector and the personalized skill-summary vector feed into a battery of skill-specific classification networks, one network for each skill.
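The attention and classification stages described above can be sketched with small vectors. This is a simplified illustration under several assumptions: the encoder is skipped (a hand-written vector stands in for its output), the attention is assumed to be standard dot-product attention with a softmax, and each "skill-specific classification network" is reduced to a single logistic unit. All skill names, embeddings, and weights are hypothetical.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(values: list[float]) -> list[float]:
    """Turn raw relevance scores into attention weights that sum to 1."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def skill_summary(utterance_vec, enabled_skill_embeddings):
    """Personalized skill attention (assumed dot-product form).

    Only the skills enabled in this user's profile are attended over; each is
    weighted by how relevant its embedding is to the utterance representation.
    """
    names = list(enabled_skill_embeddings)
    weights = softmax([dot(utterance_vec, enabled_skill_embeddings[n]) for n in names])
    summary = [
        sum(w * enabled_skill_embeddings[n][i] for w, n in zip(weights, names))
        for i in range(len(utterance_vec))
    ]
    return summary, dict(zip(names, weights))


def skill_scores(utterance_vec, summary_vec, classifiers):
    """One small classifier per skill; every classifier sees the same features."""
    features = utterance_vec + summary_vec
    return {
        skill: 1.0 / (1.0 + math.exp(-(dot(w, features) + b)))
        for skill, (w, b) in classifiers.items()
    }


# Hypothetical 2-dimensional vectors; a real encoder output would be far larger.
utterance = [1.0, 0.0]  # stand-in for the shared encoder's output
enabled = {"RecipeFinder": [2.0, 0.0], "WeatherNow": [0.0, 2.0]}
classifiers = {  # (weights over the 4 concatenated features, bias)
    "RecipeFinder": ([1.0, 0.0, 0.5, 0.0], -1.0),
    "WeatherNow": ([0.0, 1.0, 0.0, 0.5], -1.0),
}
summary, attention = skill_summary(utterance, enabled)
print(attention)
print(skill_scores(utterance, summary, classifiers))
```

Because the utterance vector points in the same direction as the RecipeFinder embedding, that skill receives most of the attention weight, and its classifier produces the higher score.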
During training, the system as a whole is evaluated on the basis of the skill classification networks’ outputs. Consequently, the shared encoder learns to represent utterances in a way that is useful for skill classification, and the personalized skill attention mechanism learns to attend to the most relevant skills.
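To make "evaluated on the basis of the skill classification networks' outputs" concrete, here is one possible training objective. The loss is an assumption (binary cross-entropy summed over per-skill classifiers), not the documented Alexa objective, and the scores are hypothetical. The key point is that the loss depends only on the classifier outputs. In an end-to-end differentiable model, its gradients therefore flow back through the classifiers into both the attention mechanism and the shared encoder.

```python
import math


def skill_classification_loss(scores: dict[str, float], correct_skill: str) -> float:
    """Binary cross-entropy summed over every skill-specific classifier (assumed loss).

    The correct skill's classifier is pushed toward 1, all others toward 0.
    """
    eps = 1e-12  # avoid log(0)
    loss = 0.0
    for skill, p in scores.items():
        target = 1.0 if skill == correct_skill else 0.0
        loss -= target * math.log(p + eps) + (1.0 - target) * math.log(1.0 - p + eps)
    return loss


scores = {"RecipeFinder": 0.9, "WeatherNow": 0.2}  # hypothetical classifier outputs
print(skill_classification_loss(scores, "RecipeFinder"))  # low: prediction agrees with label
print(skill_classification_loss(scores, "WeatherNow"))    # high: prediction disagrees
```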
In experiments, the system performed significantly better when it used the skill attention mechanism than when it relied only on a vector representing the user's enabled skills, with one bit for each skill. It performed better still when it used both in tandem than when it used either one in isolation.
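The "one bit for each skill" representation is simple to build. In the sketch below, the catalog and the enabled set are hypothetical, and combining the bit vector with the attention summary by concatenation is an assumed way of using "both in tandem"; the text does not say how the two are combined.

```python
def enabled_skill_bits(catalog: list[str], enabled: set[str]) -> list[float]:
    """One bit per skill in the catalog: 1.0 if the user has enabled it, else 0.0."""
    return [1.0 if skill in enabled else 0.0 for skill in catalog]


catalog = ["RecipeFinder", "WeatherNow", "SportsScores", "QuickRecipes"]  # hypothetical
bits = enabled_skill_bits(catalog, {"RecipeFinder", "QuickRecipes"})
print(bits)  # [1.0, 0.0, 0.0, 1.0]

# "Both in tandem": one assumed way to combine the signals is concatenation.
utterance_vec = [1.0, 0.0]   # hypothetical encoder output
summary_vec = [1.76, 0.24]   # hypothetical attention summary
features = utterance_vec + summary_vec + bits
print(len(features))  # 8
```

Note the trade-off: the bit vector grows with the catalog (at 40,000 skills it has 40,000 entries, or 5,000 bytes if packed as bits) and says nothing about relevance to the current utterance, whereas in this sketch the attention summary has a fixed size and is weighted by the utterance.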
While making the architecture scalable to tens of thousands of skills, Alexa's scientists and engineers kept practical constraints in mind by focusing on minimizing memory footprint and runtime latency, which are critical to the performance of high-scale production systems such as Alexa. Currently, inference consumes 50 megabytes of memory, and the p99 latency is 15 milliseconds. Moreover, the architecture is designed to efficiently accommodate new skills that become available between full-model retraining cycles.
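"p99 latency" means the 99th-percentile latency: 99% of requests finish in that time or less. The sketch below uses the nearest-rank definition of a percentile (monitoring tools sometimes interpolate instead), and the latency samples are hypothetical. It shows why production systems track p99 rather than the mean: a handful of slow requests barely move the average but dominate the tail.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies: 95 fast requests and 5 slow outliers.
latencies_ms = [10.0] * 95 + [40.0] * 5
print(percentile(latencies_ms, 99))           # 40.0
print(sum(latencies_ms) / len(latencies_ms))  # 11.5
```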
Thank you for reading.