GenAI: Struggling to choose the right foundation model?

Are you struggling to choose the right foundation model and infrastructure setup for your generative AI workload? AWS’s open-source FM Bench might be exactly what you need.

The Challenge of Model Selection

In today’s rapidly evolving generative AI landscape, organisations face a critical challenge: how do you select the optimal foundation model while balancing performance, cost, and accuracy? With numerous models available — from open-source options like Llama to proprietary solutions like Anthropic’s Claude — and various deployment options on AWS, making the right choice can feel overwhelming.

How can it help businesses?

FM Bench is an open-source tool from AWS that simplifies the selection and optimization of foundation models for generative AI. It benchmarks models across cost, performance, and accuracy, supporting various AWS services, instance types, and inference containers.

With FM Bench, businesses can:

  • Identify cost-effective instance types
  • Make data-driven decisions about model selection and infrastructure
  • Test custom datasets and fine-tuned models
  • Compare different serving strategies
  • Validate performance across workload sizes

FM Bench generates comprehensive reports with visualisations, recommendations, and insights. It supports a wide range of models and is continuously updated with new features.

What is FM Bench?

FM Bench is AWS’s answer to this challenge — an open-source Python package that provides comprehensive benchmarking capabilities for any foundation model deployed on AWS’s generative AI services. What makes FM Bench particularly powerful is its ability to evaluate models across three critical dimensions:

Performance: Measures inference latency and transaction throughput

Cost: Calculates dollar cost per transaction

Accuracy: Evaluates model responses using a panel of LLM judges
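
To make the cost dimension concrete: if a hypothetical instance costs $1.50 per hour on demand and sustains 300 transactions per minute, it handles 18,000 transactions per hour, so the cost per transaction works out to roughly $1.50 / 18,000, or about $0.00008. FM Bench performs this kind of calculation for every model and instance combination it benchmarks, using the throughput it actually measures rather than assumed figures.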

FM Bench is a powerful, flexible tool designed to run performance benchmarks and accuracy tests for any foundation model deployed on AWS generative AI services. Whether you’re using Amazon SageMaker, Amazon Bedrock, Amazon EKS, or Amazon EC2, FM Bench provides a standardised way to evaluate and compare models.

Key Features

  1. Universal Compatibility

  • Works with any AWS service (SageMaker, Bedrock, EKS, EC2)
  • Supports various instance types (g5, p4d, p5, Inf2)
  • Compatible with multiple inference containers (DeepSpeed, TensorRT, HuggingFace TGI)

2. Flexible Model Support

  • Open-source models (Llama, Mistral)
  • Third-party models through Bedrock (Claude, Cohere)
  • Custom fine-tuned models
  • First-party AWS models (Titan)

3. Sophisticated Evaluation System

  • Uses a panel of three LLM judges (Claude 3 Sonnet, Cohere Command-R+, Llama 2 70B)
  • Implements majority voting for accuracy assessment (a response counts as correct when at least two of the three judges agree)
  • Supports custom evaluation datasets

4. Automated Analysis

  • Generates comprehensive HTML reports
  • Provides interactive visualisations
  • Creates heat maps for cost-performance analysis
  • Tracks accuracy trajectories across different prompt sizes

How It Works

FM Bench simplifies the benchmarking process into three main steps:

  1. Configuration: Create a YAML file specifying your benchmarking parameters (or use pre-built configurations)
  2. Execution: Run a single command to execute the benchmarking suite
  3. Analysis: Review the auto-generated report with detailed insights and recommendations

The tool handles everything from model deployment to data collection and analysis, providing you with actionable insights about which model and infrastructure combination best meets your requirements.

Getting Started with FM Bench

Ready to optimise your foundation model deployment? Here’s how to get started:

  • Visit the FM Bench GitHub repository and star the project.
  • Join the FM Bench interest channel to engage with the community and developers.
  • Try FM Bench with your own models and datasets for valuable insights.

Follow these steps:

  1. Installation: Install FMBench using pip:

pip install fmbench

2. Create a configuration file: Create a YAML configuration file specifying the models, instance types, and evaluation parameters you want to test. You can find example configuration files in the FMBench GitHub repository; a minimal illustrative sketch also follows the run command below.

3. Run FMBench: Execute the benchmarking suite:

fmbench --config-file config-file-name.yml > fmbench.log 2>&1
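
To give a feel for what such a configuration might contain, here is a minimal illustrative sketch. The key names below are simplified placeholders rather than the exact FMBench schema, so treat this as a conceptual outline and start from one of the annotated example files in the FMBench GitHub repository for the authoritative structure.

# Illustrative sketch only: key names are placeholders, not the real FMBench schema
model:
  name: Llama2-7b                               # model to benchmark (example)
deployment:
  service: sagemaker                            # SageMaker, Bedrock, EKS or EC2
  instance_types: [ml.g5.xlarge, ml.g5.2xlarge] # instance types to compare
benchmark:
  dataset: LongBench                            # or a path to your own dataset
  concurrency_levels: [1, 2, 4]                 # parallel requests to test
  latency_threshold_seconds: 3                  # P95 latency target

Once the file is ready, pass its name to the fmbench command shown above; the tool handles model deployment, load generation, and data collection from there.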

4. Analyse the results: After the benchmarking is complete, FMBench writes a report in Markdown format, report.md, to the results directory. This report contains:

  • Price-performance comparisons across different models and instance types
  • Accuracy evaluations using a panel of LLM judges
  • Visualisations like heat maps and charts

5. Interpret the results: The report provides insights such as:

  • Optimal model and serving stack based on price-performance requirements
  • Model accuracy across different prompt sizes
  • Latency and throughput metrics
  • Cost estimates for running the benchmarks

The following figure shows a heat map of price-performance metrics for running the Llama2-13B model on various Amazon SageMaker instance types. The data comes from benchmarking the model with prompts from the LongBench Q&A dataset, with prompt lengths ranging from 3,000 to 3,840 tokens.

The key metrics displayed include:

  • Inference latency (P95 latency threshold set at 3 seconds)
  • Transactions per minute that can be supported
  • Concurrency level (number of parallel requests)

The chart allows you to quickly identify the most cost-effective and performant instance type options for your specific workload requirements. For example, at 100 transactions per minute, a single P4d instance would be the optimal choice, providing the lowest cost per hour. However, as throughput needs scale to 1,000 transactions per minute, utilising multiple g5.2xlarge instances becomes the recommended configuration, balancing cost and instance count.

This granular price-performance data empowers you to make informed decisions about the right serving infrastructure for deploying your Llama2-13B model in production, ensuring it meets your latency, throughput, and cost targets.

As another example, the benchmarking report includes charts, shown in the following figure, that illustrate the relationship between inference latency and prompt size across different concurrency levels. As expected, inference latency tends to increase as the prompt size grows.

However, what’s particularly interesting to observe is that the rate of latency increase is much more pronounced at higher concurrency levels. In other words, as you scale up the number of parallel requests being processed, the latency starts to rise more steeply as the prompt size increases.

These detailed latency versus prompt size charts provide valuable insight into how model performance scales under different workload conditions. This information can help you make more informed decisions about provisioning the right infrastructure to meet your latency requirements, especially as the complexity of the input prompts changes.

Customise for specific needs

You can modify the configuration file to benchmark specific models, use custom datasets, or evaluate fine-tuned models for your particular use case. To get started with FM Bench for benchmarking your own models, you can follow these steps:

  1. Install FM Bench as discussed before.
  2. Create a configuration file:


  • Use one of the provided config files in the FM Bench GitHub repo as a template
  • Edit the config file to specify your model, deployment settings, and test parameters
  • A simple annotated config file example is provided in the repo (config-llama2-7b-g5-quick.yml)

3. Prepare your data:


  • FM Bench supports using datasets from LongBench
  • You can also use your own custom dataset

4. Run FM Bench:

fmbench --config-file your_config_file.yml > fmbench.log 2>&1

5. View results:

  • FM Bench will generate a benchmarking report as a markdown file (report.md) in the results directory
  • Metrics and other result files will be stored in an S3 bucket and downloaded locally to the results directory

Key points for benchmarking your own models:

  • FM Bench is flexible and can benchmark models deployed on SageMaker, Bedrock, EKS, or EC2
  • You can use the “Bring your own endpoint” mode to benchmark already deployed custom models
  • Customise the config file to specify your model, instance types, inference containers, and other parameters
  • Use your own dataset or fine-tuned model by specifying it in the config file

The following are the key steps to create a configuration file for FM Bench:

  1. Choose a base configuration file:

  • Use an existing config file from the configs folder in the FM Bench GitHub repository as a starting point
  • Or edit an existing config file to customise it for your specific requirements

2. Specify the model details:

  • Model name/type (e.g. Llama2-7b)
  • Model source (e.g. Hugging Face model ID)

3. Define the deployment settings:

  • AWS service to use (SageMaker, Bedrock, EKS, EC2)
  • Instance types to benchmark (e.g. ml.g5.xlarge, ml.g5.2xlarge)
  • Inference container (e.g. huggingface-pytorch-tgi-inference)

4. Configure benchmarking parameters:

  • Dataset to use (e.g. LongBench or custom dataset)
  • Prompt sizes/token ranges to test
  • Number of concurrent requests
  • Latency thresholds
  • Accuracy requirements

5. Set constraints and metrics:

  • Price/cost limits
  • Performance targets (latency, throughput)
  • Accuracy thresholds

6. Specify output settings:

  • S3 bucket to store results
  • Report format preferences

7. Add any custom parameters:

  • Model-specific settings
  • Advanced inference options (e.g. tensor parallelism)

8. Save the configuration as a YAML file

The config-llama2-7b-g5-quick.yml file provided in the FM Bench repository serves as a good annotated example to reference when creating your own configuration.

Essential parameters to include in an FM Bench configuration file:

  1. Model details:

  • Model name/type (e.g. Llama2-7b)
  • Model source (e.g. Hugging Face model ID)

2. Deployment settings:

  • AWS service to use (SageMaker, Bedrock, EKS, EC2)
  • Instance types to benchmark (e.g. ml.g5.xlarge, ml.g5.2xlarge)
  • Inference container (e.g. huggingface-pytorch-tgi-inference)

3. Benchmarking parameters:

  • Dataset to use (e.g. LongBench or custom dataset)
  • Prompt sizes/token ranges to test
  • Number of concurrent requests
  • Latency thresholds
  • Accuracy requirements

4. Constraints and metrics:

  • Price/cost limits
  • Performance targets (latency, throughput)
  • Accuracy thresholds

5. Output settings:

  • S3 bucket to store results
  • Report format preferences

6. Custom parameters:

  • Model-specific settings
  • Advanced inference options (e.g. tensor parallelism)

The configuration file is written in YAML format. FM Bench provides example configuration files in its GitHub repository that can be used as templates and customised for specific benchmarking needs.

Key points:

  • The config file specifies all the details needed to run the benchmark
  • It allows customising the models, deployment, testing parameters, and constraints
  • Example config files are provided that can be modified as needed
  • The file format is YAML

A sample configuration file should include the following information (an illustrative sketch follows the list):

  1. Model details: Specifies the model name and source.
  2. Deployment settings: Defines the AWS service, instance types, and inference container.
  3. Benchmarking parameters: Sets the dataset, prompt sizes, concurrent requests, and latency threshold.
  4. Constraints and metrics: Specifies price limits and accuracy thresholds.
  5. Output settings: Defines where to store results and report format.
  6. Custom parameters: Includes optional model-specific settings.
  7. LLM Judges: Lists the models to be used for evaluation.
  8. Evaluation settings: Specifies parameters for model evaluation.
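
To make this concrete, here is an illustrative sketch of what such a file might look like. The key names are simplified placeholders rather than the exact FMBench schema; use the annotated example configurations in the FMBench GitHub repository as the authoritative reference and adapt one of those to your needs.

# Illustrative sketch only: key names are simplified placeholders, not the exact FMBench schema
model:
  name: Llama2-7b                                 # model details
  source: meta-llama/Llama-2-7b-chat-hf           # Hugging Face model ID (example)
deployment:
  service: sagemaker                              # SageMaker, Bedrock, EKS or EC2
  instance_types: [ml.g5.xlarge, ml.g5.2xlarge]
  inference_container: huggingface-pytorch-tgi-inference
benchmark:
  dataset: LongBench                              # or a path to your own dataset
  prompt_token_ranges: [[1000, 2000], [3000, 3840]]
  concurrent_requests: [1, 2, 4, 8]
  latency_threshold_seconds: 3                    # P95 latency target
constraints:
  max_cost_per_hour_usd: 5.0                      # hypothetical budget limit
  min_accuracy: 0.8                               # hypothetical accuracy threshold
output:
  s3_bucket: my-fmbench-results-bucket            # hypothetical bucket name
  report_format: markdown
custom:
  tensor_parallel_degree: 1                       # example of a model-specific setting
evaluation:
  llm_judges: [claude-3-sonnet, cohere-command-r-plus, llama-70b]
  voting: majority                                # correct if most judges agree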

Users can modify this template based on their specific benchmarking needs, adjusting parameters such as model names, instance types, constraints, and evaluation settings as required for their use case.

Real-World Benefits

Organisations using FM Bench can:

  • Make data-driven decisions about model selection
  • Optimise infrastructure costs
  • Ensure accuracy requirements are met
  • Compare different serving strategies
  • Validate performance across varying workload sizes

Latest Enhancements

Recent updates to FM Bench include:

  • Support for NVIDIA Triton Inference Server
  • Model evaluation using a panel of LLM judges (Llama 70B, Claude, and Cohere)
  • Compilation support for AWS Inferentia and Trainium chips
  • A dedicated website with comprehensive documentation

Call to action

FM Bench represents a significant step forward in making foundation model selection and optimization a more systematic and data-driven process. Whether you’re a platform team managing deployments at scale or an application team looking to optimise your specific workload, FM Bench provides the insights you need to make informed decisions about your generative AI infrastructure.

As the generative AI landscape continues to evolve, tools like FM Bench will play an increasingly important role in helping organisations navigate their AI infrastructure choices. The open-source nature and active development of FM Bench make it an invaluable resource for anyone working with foundation models on AWS.

Take the next step and start leveraging the power of FM Bench for your foundation model benchmarking and optimization needs. Join the FM Bench interest channel to engage with the development team, share your feedback, and contribute to the growth of this essential open-source tool.

Don’t let your foundation model deployment decisions be driven by guesswork — empower your team with the data-driven insights provided by FM Bench. Start today and unlock the full potential of your generative AI workloads on AWS.

