DAIR.AI reposted this

Cofounder & CEO at DAIR.AI | Ph.D. | Prev: Meta AI, Galactica LLM, Elastic | Prompting Guide (6M+ learners) | I teach how to build with AI ⬇️

VideoRAG: a framework that enhances RAG by leveraging video content as an external knowledge source.

Unlike existing RAG approaches that primarily focus on text or images, VideoRAG dynamically retrieves relevant videos based on queries and incorporates both their visual and textual elements into the generation process.

The framework uses Large Video Language Models (LVLMs) to process video content directly, enabling more effective capture of temporal dynamics, spatial details, and multimodal cues that static modalities often fail to convey.

For videos lacking textual descriptions, the authors propose using automatic speech recognition to generate transcripts, ensuring both visual and textual modalities can be leveraged.

The system achieves particularly strong results in domains requiring procedural knowledge or visual demonstrations, such as "Food & Entertaining" tasks.

Paper: https://lnkd.in/dBXPPh-T

Learn how to build RAG systems here: https://lnkd.in/eEiYwhVx
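To make the flow in the post concrete, here is a minimal sketch of retrieve-then-generate over a video corpus. It is not the paper's implementation: the `Video` class, `ensure_transcript`, `retrieve`, and `build_generation_input` are hypothetical names, the bag-of-words cosine score is a toy stand-in for the learned video and text embeddings a real system would use, and `visual_tags` stands in for visual features an LVLM would extract from frames.

```python
import math
from collections import Counter
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Video:
    video_id: str
    visual_tags: str                  # assumption: text stand-in for visual features
    transcript: Optional[str] = None  # may be missing for some videos


def ensure_transcript(video: Video, asr: Callable[[Video], str]) -> Video:
    """Fill in a transcript with ASR only when the video has none."""
    if video.transcript is None:
        return replace(video, transcript=asr(video))
    return video


def _bow(text: str) -> Counter:
    """Toy bag-of-words vector (a real system would use learned embeddings)."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, corpus: List[Video], k: int = 1) -> List[Video]:
    """Rank videos by similarity between the query and visual + textual content."""
    q = _bow(query)
    scored = [
        (_cosine(q, _bow(f"{v.visual_tags} {v.transcript or ''}")), v)
        for v in corpus
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [v for _, v in scored[:k]]


def build_generation_input(query: str, videos: List[Video]) -> str:
    """Assemble the text side of the input; frames would be passed to the LVLM separately."""
    context = "\n".join(
        f"[{v.video_id}] visual: {v.visual_tags} | transcript: {v.transcript}"
        for v in videos
    )
    return f"Retrieved videos:\n{context}\n\nQuestion: {query}"


# Hypothetical ASR stub: a real pipeline would call a speech recognition model here.
def stub_asr(video: Video) -> str:
    return {"v1": "knead the bread dough for ten minutes"}.get(video.video_id, "")


corpus = [
    Video("v1", "hands kneading dough on counter"),  # no transcript, so ASR is used
    Video("v2", "person changing bicycle tire", "remove the wheel and replace the tire"),
    Video("v3", "chef slicing onions", "slice the onions thinly"),
]
corpus = [ensure_transcript(v, stub_asr) for v in corpus]

query = "how to knead bread dough"
top = retrieve(query, corpus, k=1)
print(build_generation_input(query, top))
```

The point of the sketch is the ordering of steps: transcripts are generated before indexing so that every video contributes both modalities to retrieval and to the generation input.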

1 Comment

Daniel Svonava

Vector Compute @ Superlinked | xYouTube

When VideoRAG retrieves relevant videos, how does it determine the specific segments within those videos that are most relevant to the query? Is it relying on timestamps or some form of visual content analysis within the video?


