As Deloitte gears up for MoveAmerica ’24, we decided to change direction and implement an onboard (NVIDIA DRIVE AGX) conversational digital assistant as part of an automotive digital dash that showcases multi-screen and multi-model interactions. We wanted to show how OEMs can quickly leverage open source and ready-made functions to support multiple operating systems, including Red Hat In-Vehicle Operating System, Android Automotive OS (AAOS), and QNX.
While it was not technically rocket science, the timelines were short and we needed to speed up a development process that included:
An onboard (offline) speech-to-text model, e.g., Whisper by OpenAI
Session management and voice activation, e.g., a wake-up word
Support for multiple target devices/SoCs (TI, i.MX, NVIDIA) and operating systems
Full control over the UI's look and feel, with state-of-the-art 2D and 3D graphics using Qt
A large language model (LLM) for the chatbot, including guardrails and a knowledge base to keep the conversation bounded and relevant to the driver, e.g., manuals, diagnostics, directions, and state of charge
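To make the guardrail and knowledge-base requirement concrete, here is a minimal Python sketch of one way to bound the conversation before anything reaches the LLM: a keyword allow-list rejects off-topic questions, and a naive keyword lookup pulls matching entries from a tiny knowledge base into the prompt. The topics, placeholder entries, and function names are all hypothetical; a production system would more likely use embedding-based retrieval and a dedicated guardrail framework.

```python
import re
from typing import List, Optional

# Hypothetical knowledge base with placeholder text; real entries would come
# from owner's manuals, diagnostics, navigation, and battery data.
KNOWLEDGE_BASE = {
    "tire pressure": "[manual excerpt: tire pressure]",
    "state of charge": "[manual excerpt: state of charge]",
    "wipers": "[manual excerpt: wipers]",
}

# Hypothetical allow-list of in-vehicle topics used as a simple guardrail.
ALLOWED_TOPICS = ("tire", "charge", "battery", "wiper", "route", "direction", "manual", "diagnostic")


def is_on_topic(question: str) -> bool:
    """Guardrail: accept only questions that mention an allowed topic."""
    q = question.lower()
    return any(topic in q for topic in ALLOWED_TOPICS)


def retrieve_context(question: str) -> List[str]:
    """Return entries whose key words all appear as words in the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    return [text for key, text in KNOWLEDGE_BASE.items() if all(w in words for w in key.split())]


def build_prompt(question: str) -> Optional[str]:
    """Build a grounded LLM prompt, or return None when the guardrail rejects the question."""
    if not is_on_topic(question):
        return None
    context = "\n".join(retrieve_context(question)) or "No matching manual entry."
    return (
        "You are an in-car assistant. Answer only from the context below.\n"
        f"Context:\n{context}\n"
        f"Driver: {question}\n"
        "Assistant:"
    )
```

When `build_prompt` returns None, the assistant can answer with a fixed refusal instead of calling the model at all, which is cheaper and more predictable than relying on the LLM itself to stay on topic.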
I will show you the overall demo next week.
What inspires me most, though, is that I was able to implement everything myself in less than two days. I decided to implement the offline speech-to-text, voice activation, and text-to-speech engine as a small application on my own x86 server.
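The small application described above chains wake-word detection, speech-to-text, the LLM, and text-to-speech. The Python sketch below shows only the session-gating logic, assuming the speech-to-text stage already delivers transcribed utterances with timestamps. The wake phrase "hey car", the 10-second timeout, and the `answer`/`speak` callables are hypothetical stand-ins for the real LLM and TTS engines.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

WAKE_WORD = "hey car"     # hypothetical wake-up phrase
SESSION_TIMEOUT_S = 10.0  # hypothetical inactivity timeout in seconds
# Wake phrase as a whole word, plus any trailing spaces/commas/periods.
WAKE_RE = re.compile(rf"{re.escape(WAKE_WORD)}\b[\s,.]*")


@dataclass
class VoiceSession:
    """Gate speech-to-text output behind a wake word, then route it to an LLM and TTS."""

    answer: Callable[[str], str]  # stand-in for the local LLM
    speak: Callable[[str], None]  # stand-in for the TTS engine
    active_until: float = -1.0    # session is active while now <= active_until
    history: List[str] = field(default_factory=list)

    def on_transcript(self, text: str, now: float) -> None:
        """Handle one transcribed utterance that arrived at time `now` (seconds)."""
        text = text.strip().lower()
        match = WAKE_RE.match(text)
        if match:
            self.active_until = now + SESSION_TIMEOUT_S
            text = text[match.end():]
            if not text:  # wake word alone: prompt the driver and wait
                self.speak("Yes?")
                return
        if now > self.active_until:
            return  # no active session: ignore background speech
        reply = self.answer(text)
        self.history += [f"user: {text}", f"assistant: {reply}"]
        self.speak(reply)
        self.active_until = now + SESSION_TIMEOUT_S  # each turn extends the session
```

In a real build, `answer` would call the local LLM and `speak` would hand the reply to the TTS engine; keeping them as plain callables makes the gating logic easy to test without audio hardware.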
The following command line was used to run the conversational AI after building it for the target (WSL/Intel x86) platform:
Whisper is OpenAI’s general-purpose speech recognition model. It is trained on a large dataset of diverse audio and comes pre-trained in several variants, e.g., base.en and tiny.en. Whisper can be installed (pip install -U openai-whisper) and used as a library inside Python; you download Whisper's model files and load them from your Python code.
However, I wanted to use an offline version of Whisper compiled for my target platform (Intel x86), and I was not about to start converting the PyTorch model to GGML myself. GGML is a tensor library written in C that runs on multiple hardware and OS platforms. Like PyTorch and TensorFlow, it is a machine learning library, but it is geared toward running models such as LLMs efficiently on everyday hardware, using techniques like quantization for speed and memory efficiency. Luckily, there is an open-source community building C/C++ implementations of various transformer and diffusion models: (i) llama2.c: inference Llama 2 in one file of pure C, (ii) whisper.cpp: a port of OpenAI’s Whisper model in C/C++, and (iii) stable-diffusion.cpp: Stable Diffusion in pure C/C++.
Whisper.cpp
whisper.cpp is an open-source implementation of OpenAI’s Whisper in C/C++ using GGML (a C library for machine learning), with the core tensor operations implemented in C. The original Whisper model and its weights are loaded by first converting them to a custom format. In principle, models from other frameworks (Keras, TensorFlow, PyTorch, MXNet, JAX, etc.) can be converted the same way; the Whisper models are in PyTorch. All I had to do was type "make", and the code compiled without much fuss.
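As a rough illustration of what "converting to a custom format" involves, the Python sketch below packs named float32 tensors (name, shape, raw data) into a single binary stream and reads them back. The TOYW magic bytes and the record layout are invented for this example; they are not GGML's actual file format, which also carries model hyperparameters and other metadata.

```python
import io
import math
import struct
from typing import BinaryIO, Dict, List, Tuple

MAGIC = b"TOYW"  # invented magic bytes for this toy format (not GGML's header)
Tensor = Tuple[List[int], List[float]]  # (shape, flat float32 data)


def write_tensors(f: BinaryIO, tensors: Dict[str, Tensor]) -> None:
    """Write a header, then one (name, shape, data) record per tensor, little-endian."""
    f.write(MAGIC)
    f.write(struct.pack("<I", len(tensors)))
    for name, (shape, data) in tensors.items():
        if math.prod(shape) != len(data):
            raise ValueError(f"{name}: shape {shape} does not match {len(data)} values")
        raw_name = name.encode("utf-8")
        f.write(struct.pack("<II", len(raw_name), len(shape)))
        f.write(raw_name)
        f.write(struct.pack(f"<{len(shape)}I", *shape))
        f.write(struct.pack(f"<{len(data)}f", *data))


def read_tensors(f: BinaryIO) -> Dict[str, Tensor]:
    """Read back the records written by write_tensors."""
    if f.read(4) != MAGIC:
        raise ValueError("not a TOYW file")
    (count,) = struct.unpack("<I", f.read(4))
    tensors: Dict[str, Tensor] = {}
    for _ in range(count):
        name_len, n_dims = struct.unpack("<II", f.read(8))
        name = f.read(name_len).decode("utf-8")
        shape = list(struct.unpack(f"<{n_dims}I", f.read(4 * n_dims)))
        n = math.prod(shape)
        data = list(struct.unpack(f"<{n}f", f.read(4 * n)))
        tensors[name] = (shape, data)
    return tensors


# Usage: round-trip one small hypothetical tensor through an in-memory file.
buf = io.BytesIO()
write_tensors(buf, {"layer0.weight": ([2, 2], [0.5, -1.25, 2.0, 3.0])})
buf.seek(0)
print(read_tensors(buf))
```

The point is that a single self-describing file lets the C/C++ runtime load weights without any Python or PyTorch dependency at inference time.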
The whisper.cpp project does much more, for example: (i) it offers plenty of command-line options, and (ii) it supports integer quantization of the Whisper ggml models. Quantized models require less memory and disk space and, depending on the hardware, can be processed more efficiently.
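To see why quantized models need less memory, here is a minimal Python sketch of symmetric 8-bit block quantization: each block of weights stores one scale (the block's largest magnitude divided by 127) plus one signed 8-bit integer per weight. This follows the general idea behind GGML's 8-bit formats, but the block size of 32 and the 16-bit scale used in the size estimate are assumptions, not a description of whisper.cpp's exact on-disk layout.

```python
from typing import List, Tuple

BLOCK = 32  # assumed number of weights per block
Block = Tuple[float, List[int]]  # (scale, int8 values)


def quantize_q8(x: List[float]) -> List[Block]:
    """Symmetric 8-bit block quantization: one scale per block plus int8 values."""
    blocks: List[Block] = []
    for i in range(0, len(x), BLOCK):
        chunk = x[i:i + BLOCK]
        amax = max(abs(v) for v in chunk)
        scale = amax / 127.0 if amax > 0 else 1.0  # map the largest magnitude to +/-127
        q = [max(-127, min(127, round(v / scale))) for v in chunk]
        blocks.append((scale, q))
    return blocks


def dequantize_q8(blocks: List[Block]) -> List[float]:
    """Recover approximate float weights: value = scale * int8."""
    return [scale * qi for scale, q in blocks for qi in q]


def bytes_per_weight(bits: int, block: int = BLOCK, scale_bytes: int = 2) -> float:
    """Storage per weight: packed integer bits plus the block's scale (assumed 16-bit)."""
    return bits / 8 + scale_bytes / block
```

Per weight, this costs 8/8 + 2/32 = 1.0625 bytes versus 4 bytes for float32, roughly a 3.8x reduction, at the price of a rounding error of at most half a scale step per weight.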
Compiling whisper.cpp was not hard; I just needed the build-essential package for my Ubuntu installation on WSL, which pulls in the packages considered essential for building software.
To be honest, everything was very well documented by ggerganov (Georgi Gerganov) on GitHub in the README.md. However, I did have to set up Windows Subsystem for Linux (root access, etc.) and get re-familiar with mounting drives under /mnt, make, setting up Python environments, and so on.
Beyond transcription, I also wanted to take the text and answer a question, which is the basis of the conversation, so an offline LLM would be needed. Thankfully, the same project has an example called talk-llama that let me choose my LLM, in my case the LLaMA 2 (llama-2-7b) GPT-style model from Hugging Face.
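A quick back-of-the-envelope calculation shows why quantization matters for running a 7B-parameter model offline. The sketch below estimates weight storage only, assuming a round 7 billion parameters; the KV cache, activations, and runtime overhead come on top, so treat the numbers as lower bounds.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9  # assumed round figure for llama-2-7b

for label, bits in [("float32", 32), ("float16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>8}: {weight_memory_gb(N_PARAMS, bits):5.1f} GB")
```

This prints 28.0, 14.0, 7.0, and 3.5 GB respectively, which is why a 4-bit model is far more practical than float16 on a single workstation or an in-vehicle SoC.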
The talk-llama tool supports session management to enable more coherent and continuous conversations.
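talk-llama handles session state internally; the sketch below is not its implementation but a minimal Python illustration of one common way to keep a conversation coherent within a fixed context window: keep the most recent turns that fit an approximate token budget and replay them in every prompt. Counting whitespace-separated words as tokens is a deliberate simplification (an assumption); a real system would use the model's own tokenizer.

```python
from collections import deque
from typing import Deque, Tuple


class ConversationMemory:
    """Keep the most recent turns that fit in an approximate token budget."""

    def __init__(self, max_tokens: int = 512):
        self.max_tokens = max_tokens
        self.turns: Deque[Tuple[str, str]] = deque()

    @staticmethod
    def _count(text: str) -> int:
        return len(text.split())  # crude proxy for tokens

    def add(self, role: str, text: str) -> None:
        """Append a turn, then drop the oldest turns until the budget fits.

        The newest turn is always kept, even if it alone exceeds the budget.
        """
        self.turns.append((role, text))
        while len(self.turns) > 1 and sum(self._count(t) for _, t in self.turns) > self.max_tokens:
            self.turns.popleft()

    def prompt(self, system: str) -> str:
        """Render the system instruction plus retained history for the LLM."""
        lines = [system] + [f"{role}: {text}" for role, text in self.turns]
        return "\n".join(lines) + "\nassistant:"
```

Dropping whole turns from the front keeps the prompt well-formed; a more elaborate variant could summarize old turns instead of discarding them.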
I also needed a text-to-speech engine. The options included: 1) Coqui TTS models, where installing from PyPI is the easiest option (offline); 2) gTTS (Google Text-to-Speech), a Python library and CLI tool that interfaces with Google Translate's text-to-speech API (online); and 3) Pico TTS, which uses the TTS binary from SVOX to produce spoken text (offline).
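Because these engines differ mainly in whether they need a network connection, a small selection layer can prefer offline engines and fall back to gTTS only when allowed. The Python sketch below only builds the command line for each tool's CLI (`tts` for Coqui, `pico2wave` for Pico, `gtts-cli` for gTTS); the flags shown are the commonly documented ones, but verify them against your installed versions. The preference order and the names `TTSEngine`, `pick_engine`, and `tts_command` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class TTSEngine:
    offline: bool
    command: Callable[[str, str], List[str]]  # (text, out_path) -> argv


# CLI invocations as commonly documented for each tool; check your installed versions.
ENGINES = {
    "coqui": TTSEngine(True, lambda text, out: ["tts", "--text", text, "--out_path", out]),
    "pico": TTSEngine(True, lambda text, out: ["pico2wave", "-w", out, text]),
    "gtts": TTSEngine(False, lambda text, out: ["gtts-cli", text, "--output", out]),
}
PREFERENCE = ["coqui", "pico", "gtts"]  # assumed order: offline engines first


def pick_engine(installed: Set[str], offline_only: bool = True) -> Optional[str]:
    """Return the first preferred engine that is installed and allowed."""
    for name in PREFERENCE:
        if name in installed and (ENGINES[name].offline or not offline_only):
            return name
    return None


def tts_command(text: str, out_path: str, installed: Set[str], offline_only: bool = True) -> List[str]:
    """Build the argv for the chosen engine, ready for subprocess.run(argv, check=True)."""
    engine = pick_engine(installed, offline_only)
    if engine is None:
        raise RuntimeError("no suitable text-to-speech engine is installed")
    return ENGINES[engine].command(text, out_path)
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Note that `pico2wave` expects a `.wav` output path, while `gtts-cli` writes MP3 audio.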
Lessons from Open Source
There is no substitute for coding if you want to deeply learn and internalize a technology.
Open source is about standing on the shoulders of others, and it allows for monumental gains in time to market. For example, I started with Pico for speech synthesis, then moved to Coqui, which is customizable and supports a range of languages and voices, making it a great choice for more human-friendly TTS applications.
As a novice (though I suspect this translates to the real world), I spent 80% of my integration effort on configuration management:
While containerization (e.g., Docker, OCI) is ultimately the answer to portability, once you move up to the product level there is a whole world of software bill of materials (SBOM) management from a product-line perspective, which is getting even more complicated for complex products like cars, where the OEM is starting from a world of parts.
With GPT model libraries, there is a world of issues and vulnerabilities to contend with. ML and DevOps practices alleviate them, but it's gnarly out there: tokens to authorize the use of models, guardrails and prompt engineering, local vs. online models, public vs. private models.
Living in the world of Linux has not changed much; it's still the same friendly environment, but it's now layered with machine learning/Gen AI frameworks, CUDA, etc. That is why we have Amazon Web Services, Azure, and Google: to abstract all of these issues away as infrastructure as code / as a service, so developers can focus on building value.
The most important lesson for me personally was how fortunate I was to study computer hardware and learn C in my early years. Those hard-knock days were a good foundation. Now I am able to tinker, "do it myself" in code, and "Make" pretty much anything.