GenAI Weekly — Edition 8
Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs
Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.
Allen Institute for AI releases OLMo: A truly open LLM
The Allen Institute for AI (AI2) has released OLMo 7B, a truly open, state-of-the-art large language model, alongside its pre-training data and training code. This empowers researchers and developers to use the best open models to advance the science of language models collectively.
“Open foundation models have been critical in driving a burst of innovation and development around generative AI,” said Yann LeCun, Chief AI Scientist at Meta. “The vibrant community that comes from open source is the fastest and most effective way to build the future of AI.”
OLMo and its framework are designed to aid researchers in training and experimenting with large language models. Both are available for direct download on Hugging Face and on GitHub.
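If you want to kick the tires, the weights load like any other Hugging Face model. Here is a minimal sketch, assuming the allenai/OLMo-7B-hf checkpoint and a recent transformers release (the model id and generation settings are illustrative, not taken from the announcement):

```python
# Minimal sketch: load OLMo 7B from Hugging Face and generate a completion.
# The "allenai/OLMo-7B-hf" checkpoint name is an assumption; adjust it to
# whichever OLMo variant you want to try.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Language models are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```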
As we’ve previously discussed here, “open” can mean many things when organizations describe their models, but this is as open as open gets:
“With OLMo, open actually means ‘open’ and everyone in the AI research community will have access to all aspects of model creation, including training code, evaluation methods, data, and so on,” said Noah Smith, OLMo project lead, a senior director of NLP Research at AI2, and a professor in the UW’s Allen School. “AI was once an open field centered on an active research community, but as models grew, became more expensive, and started turning into commercial products, AI work started to happen behind closed doors. With OLMo we hope to work against this trend and empower the research community to come together to better understand and engage with language models in a scientific way, leading to more responsible AI technology that benefits everyone.”
Intel launches Gaudi 3 AI accelerator chip
The Intel Gaudi 3 accelerator will deliver significant performance improvements for training and inference on leading GenAI models. Specifically, Intel projects that, on average, Gaudi 3 will outperform the Nvidia H100 on both training and inference workloads.
We’ve discussed Nvidia’s moat before in this newsletter. I guess that we’ll discuss it more and more—especially if it goes away—albeit very slowly.
The lifecycle of a code AI completion
Generative AI, whether for code, text, images, or other use cases, appears as a magic black box to many users. Users typically navigate to a website, install an app, or set up an extension and start seeing the results of the AI tool. But have you ever wondered what goes into this magic black box or how it really works?
In this post, we want to demystify what goes into a code AI completion for Cody, our code AI assistant that knows your entire codebase. Leveraging a Large Language Model (LLM) to generate a code AI response is fairly trivial, but doing so in a production-grade application that serves many different use cases, coding languages, workflows, and other variables while achieving a high level of completion acceptance and developer happiness is a whole other thing. We’ll cover the importance of the underlying LLM but also expand the implementation to a fully featured AI engineering system that features various pre- and post-processing steps, discuss the role of context and how to retrieve it, and more as we explore the lifecycle of a code AI completion. Let’s dive in!
A fantastic and detailed dive into the challenges of building real-world applications with the current LLM stack and how to potentially overcome them.
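To make the moving parts a bit more concrete, here is a rough sketch of the shape such a pipeline tends to take: retrieve context, assemble a prompt, call the model, and post-process the raw output. The names and structure below are our own illustration, not Cody’s actual implementation:

```python
# Illustrative sketch of a code-completion pipeline: retrieve context,
# build a prompt, call an LLM, then clean up the raw output.
# This does not mirror Cody's real internals; all names are hypothetical.
from dataclasses import dataclass

@dataclass
class CompletionRequest:
    file_path: str
    prefix: str  # code before the cursor
    suffix: str  # code after the cursor

def retrieve_context(request: CompletionRequest) -> list[str]:
    """Pre-processing: gather related snippets (open files, imports,
    nearby symbols) to ground the completion in the codebase."""
    # A real system might use embeddings, keyword search, or a code graph.
    return [f"// context from files related to {request.file_path}"]

def build_prompt(request: CompletionRequest, context: list[str]) -> str:
    """Assemble retrieved context plus the code around the cursor."""
    return "\n".join(context) + "\n" + request.prefix

def call_llm(prompt: str) -> str:
    """Stand-in for the actual model call (local or hosted LLM)."""
    return "return a + b  # model output placeholder"

def post_process(raw: str, request: CompletionRequest) -> str:
    """Post-processing: trim whitespace and, in a real system, drop
    half-finished lines or completions that duplicate the suffix."""
    return raw.strip()

def complete(request: CompletionRequest) -> str:
    context = retrieve_context(request)
    prompt = build_prompt(request, context)
    raw = call_llm(prompt)
    return post_process(raw, request)

if __name__ == "__main__":
    print(complete(CompletionRequest("math.py", "def add(a, b):\n    ", "")))
```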
Groq CEO: ‘We No Longer Sell Hardware’
Groq CEO Jonathan Ross is adamant his company no longer sells hardware—the data center AI chip startup is now an AI cloud services provider.
“Long term, we always wanted to go there, but the realization was, you cannot sell chips as a startup, it’s just too hard,” Ross told EE Times in a recent in-person interview. “The reason is the minimum quantity of purchase for it to make sense is high, the expense is high, and no-one wants to take the risk of buying a whole bunch of hardware—it doesn’t matter how amazing it is.”
Groq’s customer is now the AI developer. Following a number of viral social media posts showcasing the latency of its rack-scale AI inference systems, the company currently has 70,000 developers registered for its real-time large language model (LLM) inference cloud service, GroqCloud, with 19,000 new applications running.
“You get the sort of developer traction we’ve gotten, and people want to buy hardware, but we are no longer selling hardware, because why would we at this point?” Ross said. “It’s not a pivot—we always intended to have a cloud service, we just expected we would do both.”
Hardware is hard.
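For developers who want to see what “selling to the AI developer” looks like in practice, GroqCloud exposes an OpenAI-compatible HTTP API, so a request is an ordinary chat-completions call. A minimal sketch; the model name is an assumption and may differ from what GroqCloud currently serves:

```python
# Minimal sketch of a GroqCloud request via its OpenAI-compatible endpoint.
# The model name ("mixtral-8x7b-32768") is an assumption; check GroqCloud's
# model list for what is currently available.
import os
import requests

resp = requests.post(
    "https://api.groq.com/openai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": "mixtral-8x7b-32768",
        "messages": [
            {"role": "user", "content": "Why does low latency matter for LLM apps?"}
        ],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```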
How faithful is the output of various LLMs in book-length summarization?
While long-context large language models (LLMs) can technically summarize book-length documents (>100K tokens), the length and complexity of the documents have so far prohibited evaluations of input-dependent aspects like faithfulness. In this paper, we conduct the first large-scale human evaluation of faithfulness and content selection on LLM-generated summaries of fictional books. Our study mitigates the issue of data contamination by focusing on summaries of books published in 2023 or 2024, and we hire annotators who have fully read each book prior to the annotation task to minimize cost and cognitive burden. We collect FABLES, a dataset of annotations on 3,158 claims made in LLM-generated summaries of 26 books, at a cost of $5.2K USD, which allows us to rank LLM summarizers based on faithfulness: Claude-3-Opus significantly outperforms all closed-source LLMs, while the open-source Mixtral is on par with GPT-3.5-Turbo. An analysis of the annotations reveals that most unfaithful claims relate to events and character states, and they generally require indirect reasoning over the narrative to invalidate. While LLM-based auto-raters have proven reliable for factuality and coherence in other settings, we implement several LLM raters of faithfulness and find that none correlates strongly with human annotations, especially with regard to detecting unfaithful claims. Our experiments suggest that detecting unfaithful claims is an important future direction not only for summarization evaluation but also as a testbed for long-context understanding. Finally, we move beyond faithfulness by exploring content selection errors in book-length summarization: we develop a typology of omission errors related to crucial narrative elements and also identify a systematic over-emphasis on events occurring towards the end of the book.
LLMs are just like people—they differ in their ability to both “understand” and “speak”.
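For a sense of what an LLM-based faithfulness rater involves, here is a bare-bones sketch of the core loop: pair each claim extracted from a summary with source text and ask a judge model for a verdict. This is our own illustration, not the paper’s actual prompts or protocol:

```python
# Illustrative sketch of an LLM-based faithfulness rater: for each claim
# extracted from a summary, ask a judge model whether the source text
# supports it. This is not FABLES' actual setup or prompt wording.

JUDGE_PROMPT = """You are verifying a summary of a book.
Source excerpt:
{source}

Claim from the summary:
{claim}

Answer with exactly one word, "faithful" or "unfaithful"."""

def rate_claim(claim: str, source: str, llm_call) -> bool:
    """Return True if the judge model deems the claim faithful.
    `llm_call` is any function mapping a prompt string to a model reply."""
    reply = llm_call(JUDGE_PROMPT.format(source=source, claim=claim))
    return reply.strip().lower().startswith("faithful")

def rank_summarizers(claims_by_model: dict[str, list[tuple[str, str]]], llm_call):
    """Score each summarizer by the fraction of its claims judged faithful."""
    return {
        model: sum(rate_claim(c, s, llm_call) for c, s in pairs) / len(pairs)
        for model, pairs in claims_by_model.items()
    }
```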