Consumer Analytics using Natural Language Processing and Artificial Intelligence in the Cloud
By Veena Mokal* and Wolfgang Gentzsch*
Customers and shoppers have benefited greatly from advances in internet connectivity in recent years. As a result, rapidly growing e-commerce firms now generate genuinely big data. Social media, in turn, lets buyers express their opinions on a wide range of topics, such as the state of the economy, their unhappiness with specific products or services, or their satisfaction with a purchase.
The large number of consumer comments and product evaluations provides a wealth of useful information and has recently emerged as an important resource for both consumers and businesses. Consumers frequently seek quality information from online reviews before purchasing a product, and many businesses use online reviews as crucial input for their products, marketing, and customer relationship management.
Understanding the psychology behind online consumer behavior has therefore become key to competing in today's markets, which are characterized by ever-increasing competition and globalization.
Sentiment analysis and text analysis are applications of big data analysis that aim to aggregate and extract emotions and feelings from many types of reviews. This data, which is growing exponentially, is mainly available in unstructured form, making it impractical to interpret manually. As a result, Natural Language Processing (NLP) combined with Machine Learning, which focuses on gathering facts and opinions from the huge amount of information available on the internet, is crucial.
This article (based on a more extensive case study, free download here) presents the application of an NLP – Machine Learning model to predict sentiments based on consumer evaluations retrieved from social media and e-commerce websites. The NLP process consists of several steps:
1. Data Pre-processing and Feature Extraction turns the text into a predictable, analyzable format for the task. Tokenization, lower casing, stop word removal, stemming, lemmatization, and part-of-speech tagging are some of the stages involved.
2. Sentiment Analysis is performed on each review, categorizing it as excellent or poor and generating the corresponding sentiment.
3. Topic Modelling is used to find themes of interest in a set of review data. These themes are called aspects, and several different words may refer to the same aspect.
4. Algorithm Development creates a predictive model that can classify any input review statement using ML techniques that leverage statistical methods to compute sentiment scores. These models refine their own rules through repeated training on the data they are supplied.
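To make the pre-processing and classification steps more concrete, the following is a minimal sketch in plain Python, not the model used in the case study. It assumes a tiny hand-picked stop-word list, a crude suffix-stripping stemmer, and a multinomial Naive Bayes classifier with add-one smoothing as a stand-in for the ML technique; the names `preprocess` and `NaiveBayesSentiment` and the four training reviews are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption); real pipelines use fuller lists.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "i", "was", "to", "of", "very"}


def preprocess(text):
    """Lower-case, tokenize, drop stop words, and apply a crude suffix stemmer."""
    tokens = re.findall(r"[a-z']+", text.lower())
    stems = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue
        # Crude stemming: strip one common suffix (stand-in for a real stemmer).
        for suffix in ("ing", "ed", "s"):
            if tok.endswith(suffix) and len(tok) - len(suffix) >= 3:
                tok = tok[: -len(suffix)]
                break
        stems.append(tok)
    return stems


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one smoothing over bag-of-words counts."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.class_counts = Counter(labels)      # label -> number of reviews
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = preprocess(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = preprocess(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus summed log likelihoods of the tokens.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training reviews, for illustration only.
train_texts = [
    "I loved this phone, the battery is excellent",
    "Great quality and fast delivery",
    "Terrible product, it stopped working",
    "Poor quality, very disappointed",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(preprocess("The batteries stopped working"))
print(model.predict("excellent battery and great quality"))
print(model.predict("disappointed, poor product"))
```

In practice, lemmatization, part-of-speech tagging, and topic modelling would typically come from an established NLP library rather than hand-written rules like these; the sketch only shows how the stages connect.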
This research addresses the fundamental challenge of understanding customer behavior by using advanced Machine Learning algorithms that make key insights broadly accessible in real time. The approach is a useful resource for assessing affective information on social platforms and e-commerce channels, as it relies not only on domain-specific keywords but also on common-sense knowledge that allows cognitive and affective information to be extrapolated from natural language text.
Performance Benchmarking on Workstation and HPC Cloud
The NLP – Machine Learning algorithm for e-commerce is very compute-intensive. To complete the study, we therefore ran a performance analysis on a high-performance desktop machine with 16 CPU cores and 32 GB of RAM. The analysis was conducted to determine the computing requirements for processing millions of reviews as a next step in the HPC Cloud.
The HPC environment features the Python-based Anaconda platform, which aided in data analysis and the construction of predictive models. Dealing with such large volumes of data is a real challenge for this project and demands a significant amount of computing power. We therefore found that handling such massive amounts of data, and speeding up its processing, is best achieved by scaling the algorithm on cloud HPC.
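As a rough illustration of how a small workstation benchmark can inform sizing for a larger run, the sketch below times a toy per-review function and extrapolates to a bigger dataset. It is not the benchmark used in the study: `toy_process`, the sample data, the 5 million review count, and the `parallel_efficiency` factor are hypothetical, and the extrapolation assumes the work splits evenly across workers.

```python
import time


def measure_throughput(process, items):
    """Return items processed per second for one timed pass over `items`."""
    start = time.perf_counter()
    for item in items:
        process(item)
    elapsed = time.perf_counter() - start
    return len(items) / elapsed if elapsed > 0 else float("inf")


def estimate_runtime_seconds(n_items, items_per_second, n_workers=1,
                             parallel_efficiency=1.0):
    """Extrapolate runtime, assuming work splits evenly across workers.

    parallel_efficiency (0 < e <= 1) is a hypothetical factor for scaling
    losses; 1.0 means ideal linear speed-up.
    """
    effective_rate = items_per_second * n_workers * parallel_efficiency
    return n_items / effective_rate


def toy_process(review):
    # Stand-in for the real per-review NLP work (assumption).
    return review.lower().split()


# Made-up sample; a real benchmark would use representative reviews.
sample = ["Great product, fast delivery"] * 10_000
rate = measure_throughput(toy_process, sample)
print(f"{rate:,.0f} reviews/s on one core")

estimate = estimate_runtime_seconds(5_000_000, rate, n_workers=16,
                                    parallel_efficiency=0.8)
print(f"{estimate:.1f} s estimated for 5 million reviews on 16 workers")
```

Real NLP workloads rarely scale perfectly linearly, so an estimate like this is best treated as a lower bound to be checked against measurements on the target system.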
Further experiments conducted in the HPC Cloud environment will demonstrate the ability to remotely set up and run Big Data analysis as well as build AI models in the cloud. Next, the AI - Machine Learning model setup requirements will be pre-installed in UberCloud’s HPC container, allowing the user to access the tools without any prior installation or setup.
Acknowledgement: The authors would like to thank Praveen Bhat, HPC/Python Technology Consultant, for his support during the implementation and the benchmarking of the NLP application.
The complete case study can be downloaded for free HERE.
_____________
*) Veena Mokal is a Data Science Expert with an MBA in Business Analytics from the Institute of Management Technology in India. Wolfgang Gentzsch is co-founder and president of UberCloud who loves working with and encouraging young engineers to publish excellent research for the broader public.