Do not be afraid to “Make”: An automotive-grade, offline conversational digital assistant bootstrapped by open source

As Deloitte gears up for MoveAmerica ‘24, we decided to switch gears and implement an onboard (NVIDIA DRIVE AGX) conversational digital assistant as part of an automotive digital dash that showcases multi-screen and multi-model interactions. We wanted to show how OEMs can quickly leverage open source and ready-made functions to support multiple operating systems, including Red Hat In-Vehicle Operating System, Android Automotive OS (AAOS), and QNX.

While technically not rocket science, the timelines were short, and we needed to speed up a development process that included:

  • An onboard (offline) speech-to-text model, e.g., Whisper by OpenAI
  • Session management and voice activation, e.g., a wake-up word
  • Support for multiple target devices/SoCs (TI, i.MX, NVIDIA) and operating systems
  • Full control over the UI's look and feel, with state-of-the-art 2D and 3D graphics using Qt
  • A large language model for the chatbot implementation, including guardrails and a knowledge base to keep the conversation bounded and contextualized to the driver, e.g., manuals, diagnostics, directions, state of charge (a stub sketch of this loop follows the list)
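
Taken together, these requirements describe one loop: listen, detect the wake-up word, transcribe, ask the LLM, speak. The sketch below is purely illustrative; every function is a hypothetical stub standing in for the real Whisper, LLM, and TTS components, not the demo's actual code.

    # Illustrative skeleton of the wake-word -> STT -> LLM -> TTS loop (all stubs).
    WAKE_WORD = "ambition"  # the wake-up word used later in this post

    def transcribe(audio: bytes) -> str:
        # Stand-in for the onboard speech-to-text model (e.g., Whisper)
        return "ambition what is my state of charge"

    def ask_llm(question: str, history: list[str]) -> str:
        # Stand-in for the guarded, knowledge-base-backed LLM
        history.append(question)  # session management: keep conversational context
        return "Your state of charge is eighty percent."

    def speak(text: str) -> None:
        # Stand-in for the text-to-speech engine
        print(f"[TTS] {text}")

    history: list[str] = []
    text = transcribe(b"")  # b"" stands in for a microphone capture
    if text.startswith(WAKE_WORD):  # voice activation via wake-up word
        speak(ask_llm(text.removeprefix(WAKE_WORD).strip(), history))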

I will show you the overall demo next week.

What is most inspiring for me, though, is being able to implement everything myself in less than two days. I decided to implement the offline speech-to-text, voice activation, and text-to-speech engine as a small application on my own x86 server.

The following command line was used to run the conversational AI after building/compiling for the target (WSL / Intel x86) platform:

./talk-llama -mw ./models/ggml-base.en.bin -ml ./models/llama-2-7b.Q2_K.gguf -p "Walid" -t 8 -sf ./to_speak.txt -s ./gtts.sh -vth .60 -w "ambition"        

My Project Objectives:

  1. Use Whisper, OpenAI’s general-purpose speech recognition model, which is trained on a large dataset of diverse audio and comes pre-trained (e.g., base.en, tiny.en). Whisper can be installed (pip install -U openai-whisper) and used inside Python as a library: you can download Whisper's language model files and call them from your Python code, as sketched just after this list.
  2. However, I wanted to use an offline version of Whisper compiled for my target platform (Intel x86), and I was not about to start converting the PyTorch model to GGML myself. GGML is a C++ tensor library that can run on multiple hardware and OS platforms; similar in spirit to ML libraries such as PyTorch and TensorFlow, it allows LLMs to run on modest hardware with techniques like quantization for speed and efficiency. Luckily, there is an open-source community of C++ implementations of various transformer models: (i) llama2.c, inference of Llama 2 in one file of pure C; (ii) whisper.cpp, a port of OpenAI’s Whisper model in C/C++; and (iii) stable-diffusion.cpp, Stable Diffusion in pure C/C++.
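
For reference, here is what the pure-Python route from objective 1 looks like: a minimal sketch, assuming openai-whisper (and ffmpeg) is installed, where "sample.wav" is a hypothetical audio file.

    # Minimal sketch: Whisper as a Python library (pip install -U openai-whisper).
    import whisper

    model = whisper.load_model("base.en")    # downloads and caches the model on first use
    result = model.transcribe("sample.wav")  # "sample.wav" is a hypothetical input file
    print(result["text"])                    # the transcribed text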

whisper.cpp

  • An open-source project that reimplements OpenAI’s Whisper using GGML, a C library for machine learning. The core tensor operations are implemented in C using GGML. An original Whisper model and its weights are loaded by first converting them to a custom format; in principle the source model could come from any framework (Keras, TensorFlow, PyTorch, MXNet, JAX, etc.), and Whisper's models are in PyTorch. I had to type “make” and the C code compiled without too much fuss.
  • The whisper.cpp project does much more, for example: (i) plenty of command-line options, and (ii) integer quantization of the Whisper GGML models. Quantized models require less memory and disk space and, depending on the hardware, can be processed more efficiently (a toy illustration of the idea follows this list).
  • Compiling whisper.cpp was not so hard; I just needed build-essential for my Ubuntu installation on WSL, which pulls in the packages considered essential for building software.
  • To be honest, everything was very well documented by ggerganov (Georgi Gerganov) (github.com) in the README.md. However, I did have to set up Windows Subsystem for Linux with root access and get re-familiar with /mnt, make, setting up Python environments, and so on.
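
To make the quantization point concrete, here is a toy Python illustration of symmetric int8 quantization. This is a conceptual sketch only; GGML's real formats (such as the Q2_K variant in the model file shown earlier) use block-wise scales and even fewer bits.

    # Toy illustration of symmetric int8 quantization (not GGML's actual scheme).
    import numpy as np

    def quantize_q8(w):
        scale = float(np.abs(w).max()) / 127.0  # one scale for the whole tensor (toy)
        return np.round(w / scale).astype(np.int8), scale

    def dequantize_q8(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(4096).astype(np.float32)  # pretend these are model weights
    q, s = quantize_q8(w)
    print(f"fp32: {w.nbytes} bytes -> int8: {q.nbytes} bytes")  # 4x smaller
    print(f"max round-trip error: {np.abs(w - dequantize_q8(q, s)).max():.4f}")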


talk-llama

  • While I wanted to transcribe, I also wanted to take the text and then answer a question, which is the basis of the conversation. An offline LLM would be needed. Thankfully the same project has an example called talk-llama that let me choose my LLM, in my case the LLaMA (llama-2-7b) GPT model from Hugging Face.
  • The talk-llama tool supports session management to enable more coherent and continuous conversations.
  • I would also need a text-to-speech engine. The options: 1) Coqui TTS models, where installing from PyPI is the easiest option; 2) gTTS (Google Text-to-Speech), a Python library and CLI tool that interfaces with Google Translate's text-to-speech API (online); 3) the Pico Text-to-Speech (TTS) service, which uses the TTS binary from SVOX to produce spoken text (offline). Minimal sketches of the gTTS and Pico paths follow.
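
For illustration, here is roughly what the online (gTTS) and offline (Pico) paths look like from Python. The reply text is made up, and the Pico call assumes the libttspico-utils package is installed.

    # Online path: gTTS writes an MP3 via Google Translate's text-to-speech API.
    from gtts import gTTS
    gTTS("Your state of charge is eighty percent.", lang="en").save("reply.mp3")

    # Offline path: SVOX Pico via its command-line binary
    # (assumes: sudo apt install libttspico-utils).
    import subprocess
    subprocess.run(
        ["pico2wave", "-w", "reply.wav", "Your state of charge is eighty percent."],
        check=True,
    )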

Lessons from Open Source

  • There is no substitute for coding if you want to profoundly learn and internalize deep technology.
  • Open source is about standing on the shoulders of others, and it allows for monumental gains in time to market. For example, I started with Pico for speech synthesis, then moved to Coqui, which is customizable and supports a range of languages and voices, making it a great choice for more human-friendly TTS applications.
  • As a novice (though I suspect this translates into the real world somehow), 80% of my integration effort was spent on configuration management:

...libraries, SDKs, certificates, models, source code versions, C++ compiler versions, build dependencies, Python versions and virtual environments, sudo apt install nvidia-cuda-toolkit, package management, OS compatibility, etc....

  • While containerization (e.g., Docker, OCI) is ultimately the answer to portability, when you bubble up to the product there is a whole world of software bill of materials (SBOM) management from a product-line perspective, and it gets even more complicated for complex products like cars, where the OEM is starting from a world of parts.
  • With the use of GPT model libraries there is a world of issues and vulnerabilities to contend with. MLOps and DevOps alleviate some of this, but it's gnarly out there: tokens to authorize the use of models, guardrails and prompt engineering, local vs. online models, public vs. private.

  • Living in the world of Linux has not changed much; it's still the same friendly environment, but it's now layered with machine learning / Gen AI frameworks, CUDA, and more. That is why we have Amazon Web Services, Azure, and Google: to abstract these issues away as infrastructure as code / as a service, so developers can focus on building value.

The most important lesson for me personally was being fortunate to study computer hardware and learn C in my early years. Those hard-knock days were a good foundation. Now I am able to tinker, "do it myself" in code, and "Make" pretty much anything.



