What happens when the web’s biggest users aren’t people anymore? AI agents now scrape content in real time, delivering answers instantly while bypassing the websites, ads, and creators that built the internet as we know it. Traffic is disappearing. Ad revenue is drying up. The old web model is breaking. But this isn’t just a challenge—it’s a chance to rebuild. Our latest article explores how AI, ads, and content are colliding to shape the next era of the web. Read more here: https://lnkd.in/eV9zrBZ9 #TrainingData #ContentMonetisation #GenerativeAI #ChatGPT #DataLicensing
Valyu
Technology, Information and Internet
London, England 456 followers
High-Quality Licensed Data for AI Models and Apps (Training & Context Enrichment)
About us
Generative AI has increased the demand for high-quality, diverse datasets for model training, performance and personalisation. This growing demand raises issues around training-data copyright, provenance, attribution, and compensation for content owners and platforms, which in turn create obstacles for model scaling, LLM application development, legal use and revenue allocation. Data licensing is crucial to addressing these issues: it ensures that data usage complies with legal standards, respects rights, and provides appropriate credit and compensation to content platforms and creators. Valyu is a smart-contract-based platform that connects data providers with AI companies seeking diverse, high-quality training data. We bridge the gap between content platforms and AI companies, facilitating the licensing, discovery, packaging and distribution of high-quality datasets. Our platform also offers data valuation and tooling to simplify dataset licensing, provenance, and distribution. Founded by leading academics and engineers from University College London (UCL), our team has extensive experience in enterprise data/ML companies and large-scale data infrastructure for AI. We use advances in ML, cryptography, and smart contracts to enable responsible data commercialisation for AI. Our mission is to accelerate AI with the responsible use and monetisation of data. We love building products that people enjoy and pushing the boundaries of engineering and research! :) #WeBuild 🛠️ Learn more at valyu.network
- Website
- https://www.valyu.network/
- Industry
- Technology, Information and Internet
- Company size
- 2-10 employees
- Headquarters
- London, England
- Type
- Privately Held
- Founded
- 2022
- Specialties
- Data Governance, Data Monetisation, Machine Learning, Data Licensing, Data Valuation, Copyright, LLMs, RAG, Data Provenance, and Attribution
Locations
-
Primary
18 Soho Square
London, England W1D 3QL, GB
Updates
-
The Make It Fair campaign is a wake-up call. AI is changing how content is consumed, but one thing hasn’t changed: content and creative work are the lifeblood of the web. Books, journalism, music, film, research: this work shapes how we learn, stay informed, and tell stories. It fuels everything from entertainment to science. Now, AI models are consuming this content at scale. Not just reading it, but retrieving, remixing, and repurposing it into new outputs. Creators across publishing, news, music, and film are asking a simple question: If AI depends on our work, why are we being left out of the equation? This isn’t about stopping AI. It’s about ensuring recognition, control, and fair compensation for the work that powers it. AI can’t function without high-quality content, and yet too often, that content is used without attribution, transparency, or payment. At Valyu, we’ve spent years, starting from our research at UCL, solving these hard technical problems. Attribution at scale works. We’ve built it. AI and content creators don’t have to be at odds. The web has always evolved, but it has never worked through invisible extraction. The Make It Fair campaign is demanding that writers, artists, journalists, musicians, researchers, and filmmakers aren’t erased in the age of AI. That they have a seat at the table. That creative work doesn’t become a one-way pipeline into AI models with no return. Publishing, journalism, music, and the creative industries are not just business sectors; they are how we tell stories, record history, and share knowledge. As AI applications and autonomous agents become a larger part of how information is accessed and used online, the need for clear attribution and compensation only grows. AI should work with the creative industries, not around them. Attribution, transparency, and fair compensation aren’t obstacles; they’re the foundation of a sustainable AI-driven web.
Learn more about the campaign 👉 https://lnkd.in/e5mDkJ8h Image source: News Media Association #MakeItFair #AI #Copyright #Publishing #Journalism #Music #Film #FairCompensation #UCL
-
-
Valyu reposted this
🚀 Valyu AI Playground is now live! Testing and using the Valyu Context API is now easier than ever—no code, no setup, just instant access to context. 🔍 What’s inside? ✅ Explore the Valyu Context API in a seamless, zero-code environment ✅ Test real-time retrieval from trusted sources instantly ✅ Fine-tune retrieval settings before integration ⚡ Why use the Context API? 🔹 Precision search that truly understands technical context 🔹 Instant access to deep academic knowledge beyond summaries and abstracts 🔹 Integrate knowledge into your applications and AI agents with 2 lines of code The future of deep search is here. Try the Valyu AI Playground today: https://lnkd.in/eC6NbsEz
-
🚨 Introducing ContextAPI 🚨: Multimodal Retrieval for Trusted, Premium Content. We’ve launched ContextAPI, an API that gives your AI models access to content that actually matters: citations, figures, tables, equations, and full text, not just links and summaries. Because the real work (the decisions, the breakthroughs) sits in the details. The figure that shows the result. The table that holds the data. The equation that proves it. But AI tools and search engines skim past this, leaving the hard stuff buried. We’re starting with arXiv and Wikipedia, with proprietary sources rolling out next through partnerships with publishers and content platforms. We’re working directly with partners to help unlock their data for AI use, with revenue sharing and attribution baked in. Because high-quality data should fuel the next generation of tools for researchers, analysts, lawyers, consultants, and anyone else doing serious knowledge work. We also wrote about why we built this and why search isn’t enough for real work: 👉 https://lnkd.in/epAne-6Z We’re also giving away free credits to get you started. Check it out: 🛠️ https://lnkd.in/eXXgKcbd 🎤 If you’re building RAG, LLM apps, or AI agents, you’ll get it. Join our Discord: https://lnkd.in/eJZQi4sZ
-
-
This Wednesday, join us at our upcoming event “AI Agents, Crawling and the Future of the Web”, where we'll discuss: 📌 AI-driven retrieval & content access 📌 Robots.txt & AI exclusions 📌 The future of opt-in vs. opt-out models Plus, hear from Thom Vaughan (Common Crawl) on AIPREF, a new proposal to help publishers control AI access. 🗓️ Wed, 19 Feb 2025 | 15:00 - 17:00 🔗 Still some spots left! Register now: https://lu.ma/8cm08ber #ContextEnrichment #WebCrawling #GenAI #CommonCrawl #TrainingData
Valyu x Common Crawl x UCL: AI Agents, Crawling and the Future of the Web · Luma
lu.ma
-
Great discussions at AWS’s Well Architected Enterprise event last night!⚡️ It was a pleasure sharing how we at Valyu approach scaling AI infrastructure with large-scale video/image datasets and retrieval systems. Many thanks to AWS for hosting, and to Simone Zucchet, Giuseppe Battista and the other speakers for the insightful conversations. 🤝 Looking forward to continuing these discussions! 🚀
-
-
🚀 Happening today! Our Co-Founder Hirsh Pithadia is speaking at AWS’s Well Architected Enterprise event this evening, sharing how we scale fast—even with petabyte-scale video/image datasets and retrieval systems handling tens of millions of vectors. If you’re working on AI at scale, don’t miss this! Join us tonight from 17:30 to 21:30. 📍 60 Holborn Viaduct, London EC1A 2FD Final spots available—register here! 👉 https://lnkd.in/eiH-2ZBF
Well Architected Enterprise: Supercharging your PoCs - Getting from idea to value faster!
aws-experience.com
-
🚀 Scaling fast with petabyte-scale data? We make it happen. Our Co-Founder, Hirsh Pithadia, is speaking at AWS's Well Architected Enterprise event on Wednesday, Feb 12, discussing how we move fast, even when working with large-scale video/image data and retrieval systems handling tens of millions of vectors. If you're building AI at scale, you won’t want to miss this. Join us! 📅 12 February 2025 🕛 17:30 - 21:30 🏢 60 Holborn Viaduct, London EC1A 2FD Register here 👉 https://lnkd.in/eiH-2ZBF
-
AI agents are now the web’s biggest users—crawling, retrieving, and learning from everything online. But where does that leave publishers, content platforms, and anyone creating content? We’re hosting an event with Common Crawl Foundation and UCL to unpack: ➡️ How AI-driven retrieval is changing content access on the web ➡️ Robots Exclusions for AI (robots.txt) ➡️ Signalling AI preferences ➡️ The future of opt-in vs. opt-out models We’ll also hear from Thom Vaughan (Common Crawl) on AIPREF, a new proposal to help publishers control AI access to their content. If you’re curious about the future of AI, the web, and content rights, join us. 📅 When: Wednesday, 19 February 2025, 15:00 - 17:00 GMT 🔗 Register here: https://lu.ma/8cm08ber #AI #WebCrawling #CommonCrawl #TrainingData #ContextEnrichment #DataLicensing
-
AI runs on data—but not all data is equal 🚫 The choice between public and proprietary datasets impacts everything from performance to commercial use. While licensed datasets offer quality, public datasets drive accessibility and research. Read more: https://lnkd.in/eQfuuX5x #TrainingData #ContextEnrichment #ScalingAI
Licensed Data for AI: Model Scaling and Performance with Public vs. Proprietary Data • Valyu Blog
valyu.network