Aleph Alpha: Rome Wasn't Built in a Day, and Neither Was the Pharia-1-LLM-7B Model

In the fast-paced world of AI development, understanding and optimizing the performance of models is essential. Recently, I conducted an experiment to evaluate the inference times and GPU utilization of the Aleph-Alpha/Pharia-1-LLM-7B model. This test aimed to provide a clear, data-driven picture of how the model performs under different conditions and what kind of computational resources it requires.

The Pharia-1-LLM-7B Model

The Pharia-1-LLM-7B model family was developed by Aleph Alpha Research. For this article, we tested the foundation model Pharia-1-LLM-7B-control-aligned. The model is designed for multilingual support and is specifically optimized for German, French, and Spanish, making it culturally and linguistically versatile. The models are available under the Open Aleph License, permitting non-commercial research and educational use.
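For orientation, here is a minimal sketch of how such a checkpoint can be loaded with the Hugging Face transformers library. The repository id, dtype, and the trust_remote_code flag are assumptions about how the weights are published, not an excerpt from our test script:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Aleph-Alpha/Pharia-1-LLM-7B-control-aligned"  # assumed Hugging Face repository id
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # keeps the 7B weights well within 64 GB
    trust_remote_code=True,
).to("cuda")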

The Setup

For our test, the Pharia-1-LLM-7B model was run on a Gustav AGX Orin platform with 64 GB of GPU RAM, supported by a Braincell module. The AGX Orin is a low-energy GPU system built on an ARM 64-bit architecture. This setup allowed us to load and test the model almost instantly, making it easier to benchmark its performance in real time. Throughout the test, the system consumed approximately 30-35 watts of power, which is impressively efficient given the computational load. We had to custom cross-compile PyTorch to run the model under JetPack 5.1 as well as under JetPack 6, to make sure we could accelerate the workload on all of the GPU's CUDA cores.
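After installing the cross-compiled build, a quick sanity check (generic PyTorch, not specific to our script) confirms that the Orin's GPU is actually visible to PyTorch:

import torch

print(torch.__version__)
print(torch.cuda.is_available())      # should print True on a working JetPack install
print(torch.cuda.get_device_name(0))  # should report the Orin's integrated GPU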


The Testing Process

The experiment involved processing a series of queries to measure how long it took for the model to generate responses. Importantly, the model was loaded into memory only once at the start of the test. After this initial load, multiple queries were sent to the model in a loop, simulating a real-world scenario where the model needs to handle a continuous stream of requests.

We enabled CPU profiling and a GPU memory check to capture detailed metrics about resource usage and to identify the most resource-intensive operations during inference.

Key steps in the process included:

  1. Timing Inference: Recording the time taken by the model to generate a response for each query.
  2. Profiling: Focusing on CPU and GPU memory activities to monitor which operations consumed the most time and resources.
  3. Resource Utilization: Measuring both the active and reserved GPU memory usage to ensure the model was running efficiently.

Loop questions
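A minimal sketch of the benchmark loop, assuming model and tokenizer were loaded once as shown above; the question list is abbreviated here:

import time

# Array of different questions
questions = [
    "When was Rome founded?",
    "What is the capital of France?",
    "Who invented the telephone?",
    # ... the remaining questions from the transcript below
]

for question in questions:
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    start_time = time.time()
    outputs = model.generate(**inputs, max_new_tokens=50)
    end_time = time.time()
    answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Q: {question}")
    print(f"A: {answer}")
    print(f"Time taken: {end_time - start_time:.4f} seconds")

The transcript of the run: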

Q: When was Rome founded?

A: Rome was founded on April 21, 753 BC, according to tradition.

Time taken: 3.0334 seconds


Q: What is the capital of France?

A: The capital of France is Paris.

Time taken: 0.8495 seconds


Q: Who invented the telephone?

A: The telephone was invented by Alexander Graham Bell. He made the first successful telephone call on March 10, 1876, to his assistant, Thomas A. Watson, saying, "Mr. Watson, come here!

Time taken: 5.2807 seconds


Q: How does photosynthesis work?

A: Photosynthesis is a process used by plants, algae, and some bacteria to convert sunlight into chemical energy in the form of glucose (a sugar). It occurs primarily in the chloroplasts, which contain chlorophyll (a green pigment)

Time taken: 5.2832 seconds


Q: What is quantum mechanics?

A: Quantum mechanics is a fundamental theory in physics that describes the behavior of matter and energy on microscopic scales, such as atoms and subatomic particles. It was developed in the early 20th century to overcome the limitations of classical physics in explaining phenomena.

Time taken: 5.2799 seconds


Q: Who was Albert Einstein?

A: Albert Einstein was a German-born theoretical physicist who is widely regarded as one of the most influential scientists of the 20th century. He is best known for his development of the theory of relativity and his mass-energy equivalence formula.

Time taken: 5.2893 seconds


Q: How many planets are there in the solar system?

A: In our solar system, there are eight planets: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. However, Pluto was reclassified as a dwarf planet in 2006 by the International Astronomical Union

Time taken: 5.2768 seconds


Q: What is the meaning of life?

A: The meaning of life varies for each individual, as it encompasses personal beliefs, values, and experiences. Generally, it refers to understanding one's purpose, finding happiness, and exploring the essence of existence.

Time taken: 5.2846 seconds


Q: How high is Mount Everest?

A: Mount Everest is approximately 8,848 meters (29,029 feet) tall. However, its height can vary slightly due to factors such as atmospheric pressure and climate.

Time taken: 4.3254 seconds


Q: What is artificial intelligence?

A: Artificial intelligence (AI) is the development of computer systems or software that can perform tasks that typically require human intelligence. These tasks may include problem-solving, learning, reasoning, perception, natural language understanding, and manipulation.

Time taken: 5.2832 seconds


Testing profiling:

import time
from torch.profiler import profile, record_function, ProfilerActivity

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA], record_shapes=True) as prof:
    with record_function("model_inference"):
        start_time = time.time()
        outputs = model.generate(**inputs, max_new_tokens=50)
        end_time = time.time()

# Detailed profiling for the last question
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))  # sort by CUDA time to see GPU-side hotspots
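The active and reserved GPU memory figures quoted in the results were read from PyTorch's memory counters; the exact calls are not part of the listing above, so this is a hedged reconstruction:

import torch

active_gb = torch.cuda.memory_allocated() / 1024**3   # memory occupied by live tensors
reserved_gb = torch.cuda.memory_reserved() / 1024**3  # memory held by the caching allocator
print(f"Active GPU memory:   {active_gb:.2f} GB")
print(f"Reserved GPU memory: {reserved_gb:.2f} GB")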

Results: Inference and GPU memory utilization

The results showed that the model's response times varied depending on the complexity of the queries:

  • Inference Times (on GPU): Response times ranged from 0.8434 seconds to 6.3229 seconds, scaling mainly with the length of the generated answer.
  • Profiling: Profiling revealed that operations such as aten::linear, aten::addmm, and aten::matmul were among the most time-consuming and were executing on the CPU rather than the GPU, which directly impacts overall model efficiency (see the sketch below).
  • GPU Memory Usage: The model consistently used around 13.5 GB of GPU memory, both active and reserved, indicating stable memory management across different queries.
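One plausible explanation for the CPU-bound aten ops is that part of the computation never reached the GPU. Below is a hedged sketch of pinning both the model and the inputs to CUDA; whether this was the actual bottleneck in our run was not verified:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)                               # move all weights onto the GPU
inputs = {k: v.to(device) for k, v in inputs.items()}  # keep input tensors on the same device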

Conclusion

The Pharia-1-LLM-7B-control model from Aleph Alpha demonstrated solid first-run performance across various queries, with response times that compare favorably to other models in the 7B to 8B parameter range. The additional alignment training in the Pharia-1-LLM-7B-control-aligned variant helps mitigate risks associated with model usage, making it suitable for proofs of concept, but not yet for critical applications.

A good first start, Aleph Alpha.
