🚀 What is Trending in AI Research?: PromptTTS 2 + CoALA + BigVSAN + Verba + Persimmon-8B + Falcon 180B + AskIt...
Hey Folks!
This newsletter will discuss some cool AI research papers and AI tools. But before we start, we have included a small message from our sponsor.
Free Webinar- OpenAI for FinTech: Building a Stock Market Advisor Chatbot (Sponsored)
Date: Wednesday, September 13th, 10:00 am PDT
Dive into the thrilling intersection of FinTech and AI, where code meets commerce in unexpected ways! Join us for our upcoming webinar “OpenAI for FinTech: Building a Stock Market Advisor Chatbot,” where you’ll learn how to build an application leveraging LangChain, OpenAI, and a vector database for contextual retrieval. Perfect for software engineers, developers, and data analytics professionals, this is your chance to code, converse, and potentially predict the next big market move. Join us for a blend of finance, fun, and futuristic tech!
What You’ll Learn:
➡️ Microsoft Researchers Unveil PromptTTS 2: Revolutionizing Text-to-Speech with Enhanced Voice Variability and Cost-Effective Prompt Generation
This paper from Microsoft introduces PromptTTS 2, which aims to tackle two major issues: the inability to fully describe voice variability through text prompts (the one-to-many problem) and the limited availability of text prompt datasets. PromptTTS 2 employs a "variation network" that predicts voice attributes not captured by text prompts. It also features a prompt generation pipeline that uses a speech understanding model and a large language model to formulate high-quality text prompts for speech. Testing on a 44K-hour dataset shows that PromptTTS 2 outperforms previous methods in generating voices consistent with text prompts and allows for diverse voice sampling, thus providing users with more voice-generation options. Importantly, the prompt generation pipeline reduces the need for costly manual labeling.
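To make the idea concrete, here is a minimal sketch (in PyTorch, with illustrative class names and dimensions that are not from the paper) of how a variation network can sample the voice attributes a text prompt leaves unspecified:

```python
# Minimal sketch of the PromptTTS 2 idea (names and dimensions are illustrative,
# not the paper's actual implementation): a text-prompt embedding conditions a
# "variation network" that samples the voice attributes the prompt does not pin down.
import torch
import torch.nn as nn

class VariationNetwork(nn.Module):
    def __init__(self, prompt_dim=256, latent_dim=64):
        super().__init__()
        # Predict a distribution over residual voice attributes from the prompt embedding.
        self.to_stats = nn.Linear(prompt_dim, 2 * latent_dim)

    def forward(self, prompt_emb):
        mean, log_var = self.to_stats(prompt_emb).chunk(2, dim=-1)
        # Sampling here is what yields diverse voices for the same text prompt.
        eps = torch.randn_like(mean)
        return mean + eps * torch.exp(0.5 * log_var)

prompt_emb = torch.randn(1, 256)           # stand-in for a text-prompt encoder output
voice_attrs = VariationNetwork()(prompt_emb)
print(voice_attrs.shape)                   # torch.Size([1, 64])
```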
➡️ Princeton Researchers Propose CoALA: A Conceptual AI Framework to Systematically Understand and Build Language Agents
What is the best way to systematically design language agents that can perform tasks requiring grounding or reasoning? This paper from Princeton University presents a conceptual framework called Cognitive Architectures for Language Agents (CoALA). Drawing from the rich history of agent design in symbolic AI, it aims to systematize the development of large language models (LLMs) that can interact with external resources or use internal control flows like prompt chaining. The authors argue that LLMs share many properties with production systems, a class of symbolic AI systems. CoALA serves as a blueprint to bring together diverse methods for reasoning, grounding, learning, and decision-making in LLMs. The framework also highlights existing gaps and proposes future research directions for creating more capable language agents.
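As a rough illustration of the framework (my own simplification, not code from the paper), a CoALA-style agent can be pictured as modular memories plus a decision cycle that chooses between internal actions (reasoning, retrieval, learning) and external grounding actions:

```python
# Schematic rendering of CoALA's components: modular memories, internal vs.
# external actions, and a decision loop that selects one action per cycle.
from dataclasses import dataclass, field

@dataclass
class LanguageAgent:
    working_memory: dict = field(default_factory=dict)
    episodic_memory: list = field(default_factory=list)    # past experiences
    semantic_memory: list = field(default_factory=list)    # facts / world knowledge
    procedural_memory: list = field(default_factory=list)  # skills, prompts, code

    def internal_actions(self):
        # Reasoning, retrieval, and learning update memories, not the world.
        return ["reason", "retrieve", "learn"]

    def external_actions(self):
        # Grounding actions interact with the environment or external tools.
        return ["ground"]

    def decision_cycle(self, observation):
        self.working_memory["observation"] = observation
        # In practice an LLM proposes and evaluates candidate actions here;
        # this placeholder just lists the available action space.
        return self.internal_actions() + self.external_actions()

agent = LanguageAgent()
print(agent.decision_cycle("user asks a question"))
```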
➡️ Meet Open Interpreter: An Open-Source Locally Running Implementation of OpenAI’s Code Interpreter
Open Interpreter lets LLMs run code (Python, JavaScript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing. Open Interpreter equips developers with a broad array of capabilities: it enables effortless creation and editing of content in formats such as photos, videos, PDFs, and more; it can take control of a Chrome browser, facilitating efficient research and automation; and it seamlessly handles data-related tasks, letting users plot, clean, and analyze large datasets for informed decision-making.
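A minimal usage sketch, assuming the early 0.1.x-era interface (the project evolves quickly, so check the README for the current commands and API):

```python
# Quick start (reflects the early releases around the time of writing; may have changed):
#   pip install open-interpreter
#   interpreter            # launches the ChatGPT-like terminal interface
import interpreter

# Programmatic use: the model writes and, with your confirmation, runs code locally.
interpreter.chat("Plot the closing price of AAPL for the last month")
```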
➡️ Meet TinyLlama: A Small AI Model that Aims to Pretrain a 1.1B Llama Model on 3 Trillion Tokens
In the ever-evolving landscape of language model research, the quest for efficiency and scalability has led to a groundbreaking project – TinyLlama. This audacious endeavor, spearheaded by a research assistant at the Singapore University of Technology and Design, aims to pre-train a 1.1 billion parameter model on a staggering 3 trillion tokens within a mere 90 days, utilizing a modest setup of 16 A100-40G GPUs. The potential implications of this venture are monumental, as it promises to redefine the boundaries of what was once thought possible in the realm of compact language models. While existing models like Meta’s LLaMA and Llama 2 have already demonstrated impressive capabilities at reduced sizes, TinyLlama takes the concept a step further. The 1.1 billion parameter model occupies a mere 550MB of RAM, making it a potential game-changer for applications with limited computational resources.
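A quick back-of-the-envelope check of that memory figure, assuming the 550MB number refers to 4-bit quantized weights (1.1B parameters at half a byte each):

```python
# Rough memory estimate for a 1.1B-parameter model; the 550MB figure only works
# out under the assumption of 4-bit quantized weights (full precision is far larger).
params = 1.1e9
bytes_per_param_fp16 = 2      # half precision
bytes_per_param_4bit = 0.5    # 4-bit quantization

print(f"fp16:  {params * bytes_per_param_fp16 / 1e9:.1f} GB")   # ~2.2 GB
print(f"4-bit: {params * bytes_per_param_4bit / 1e6:.0f} MB")   # ~550 MB
```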
➡️ Adept AI Labs Open-Sources Persimmon-8B: A Powerful Fully Permissively-Licensed Language Model with <10 Billion Parameters
In recent times, the field of artificial intelligence has witnessed remarkable progress, particularly in the development of language models. At Marktechpost Media, we have covered many language models across a range of parameter counts and state-of-the-art results. Following this trend, we have another release, this time from Adept AI Labs: Persimmon-8B. Persimmon-8B is an open-source, fully permissively licensed model in the 8B class. The model holds immense potential for a wide array of applications, aiming to assist users in various computer-related tasks. However, it is important to note that in its raw form, the model’s outputs are not filtered for potentially toxic content, which raises a critical concern about the need for more refined evaluation techniques.
➡️ Meet Falcon 180B: The Largest Openly Available Language Model With 180 Billion Parameters
Technology Innovation Institute (TII) researchers introduced a groundbreaking language model: Falcon 180B. Falcon 180B represents a leap forward in language models, boasting 180 billion parameters. But what sets it apart from its predecessors and competitors is not just its size but also its promise of versatility and accessibility. While Falcon 180B is not the first large language model, it is distinctive in its open-access nature. Unlike many closed-source models that remain proprietary, Falcon 180B is designed to be available for research and commercial use. This shift towards open access aligns with a broader trend in the AI community, where transparency and collaboration are increasingly valued.
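For readers who want to try it, here is a hedged loading sketch with Hugging Face transformers; the model ID tiiuae/falcon-180B and its gated-access terms should be verified on the Hub, and running the full model requires hundreds of gigabytes of GPU memory or aggressive quantization:

```python
# Illustrative loading snippet (model ID and access terms should be checked on the
# Hugging Face Hub; hardware requirements for 180B parameters are substantial).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-180B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs accelerate

inputs = tokenizer("Falcon 180B is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0]))
```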
➡️ Bridging the Gap Between Clinicians and Language Models in Healthcare: Meet MedAlign, a Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
The paper introduces MedAlign, a benchmark dataset specifically designed to assess the ability of LLMs to follow complex, clinician-generated instructions in the realm of electronic health records (EHRs). Curated by 15 clinicians across 7 specialties, MedAlign features 983 natural language instructions, clinician-written reference responses, and 276 longitudinal EHRs. Using this dataset, the authors evaluated six general-domain LLMs and found substantial error rates, ranging from 35% for GPT-4 to 68% for MPT-7B-Instruct. The study also explored the impact of context length on accuracy and proposed automated metrics that correlate with clinician rankings for LLM performance evaluation. MedAlign is made publicly available to further research and drive improvements in LLM applications in healthcare.
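Schematically, the evaluation boils down to measuring how often an LLM's answer to a clinician instruction over an EHR fails review; the sketch below uses hypothetical field and function names, not the released dataset schema:

```python
# Hypothetical sketch of the kind of evaluation MedAlign enables (field names are
# illustrative): each record pairs an EHR, a clinician instruction, and a reference
# response; an LLM answer is judged correct or incorrect.
def error_rate(examples, generate, judge):
    errors = 0
    for ex in examples:
        answer = generate(instruction=ex["instruction"], ehr=ex["ehr"])
        if not judge(answer, ex["reference_response"]):
            errors += 1
    return errors / len(examples)

# e.g. error_rate(medalign_examples, gpt4_generate, clinician_review)
# The paper reports rates from 35% (GPT-4) to 68% (MPT-7B-Instruct) in such a setup.
```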
➡️ MIT Researchers Propose AskIt: A Domain-Specific Language for Streamlining Large Language Model Integration in Software Development
Researchers from MIT CSAIL have presented a new paper titled “AskIt: Unified Programming Interface for Programming with Large Language Models.” According to the researchers, this approach significantly lowers the development overhead for software professionals. AskIt is a domain-specific language designed for LLMs that can handle a wide array of tasks. It simplifies the integration process and reduces the distinction between LLM-based code generation and application integration by providing type-guided output control, template-based function declarations, and a unified interface.
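The following is not the actual AskIt API, but a hypothetical Python sketch of the idea it describes: declare a templated function once, and let the annotated return type guide how the LLM's output is parsed back into a native value:

```python
# Hypothetical illustration of type-guided, template-based LLM function declarations;
# function and helper names here are invented for the example, not AskIt's interface.
from typing import List, get_origin

def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call (e.g. the OpenAI API); returns canned text here.
    return "Tokyo\nDelhi\nShanghai"

def parse_as(return_type, raw: str):
    # Type-guided output handling: a list annotation means "one item per line".
    if get_origin(return_type) is list:
        return [line.strip() for line in raw.splitlines() if line.strip()]
    return raw

def llm_function(template: str):
    # Template-based declaration: the template string becomes the prompt.
    def wrap(fn):
        def inner(**kwargs):
            raw = call_llm(template.format(**kwargs))
            return parse_as(fn.__annotations__["return"], raw)
        return inner
    return wrap

@llm_function("List the three largest cities in {country}.")
def largest_cities(country: str) -> List[str]:
    ...

print(largest_cities(country="Japan"))   # ['Tokyo', 'Delhi', 'Shanghai'] from the stub
```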
➡️ Researchers from Sony Propose BigVSAN: Revolutionizing Audio Quality with Slicing Adversarial Networks in GAN-Based Vocoders
How can the performance of Generative Adversarial Network (GAN)-based vocoders be improved for synthesizing high-fidelity audio waveforms? This paper from Sony investigates the effectiveness of Slicing Adversarial Network (SAN), a modified GAN framework, in enhancing vocoding tasks. The authors propose a modification to the Least-Squares GAN, which is commonly used in vocoders, so as to make it compatible with the SAN framework. This involves altering the loss functions to meet SAN's requirements. Experimental results indicate that this SAN-adapted approach can enhance the performance of existing GAN-based vocoders, including BigVGAN, with only minor adjustments to the architecture.
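For context, the least-squares GAN objective that vocoders such as BigVGAN typically use looks like the following; the SAN-specific modification, which alters this loss and the discriminator's final projection, is described in the paper and omitted from this baseline sketch:

```python
# Standard least-squares GAN losses used in many GAN vocoders; BigVSAN's contribution
# is the SAN-compatible adjustment of this objective, which is not reproduced here.
import torch

def lsgan_d_loss(d_real, d_fake):
    # Discriminator pushes scores on real audio toward 1 and on generated audio toward 0.
    return torch.mean((d_real - 1) ** 2) + torch.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    # Generator pushes the discriminator's score on generated audio toward 1.
    return torch.mean((d_fake - 1) ** 2)

d_real, d_fake = torch.rand(8), torch.rand(8)
print(lsgan_d_loss(d_real, d_fake).item(), lsgan_g_loss(d_fake).item())
```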
What is Trending in AI Tools?