OSAI tl;dr 9th ed — Making and using clear definitions
Byline: Editors of OSAI tl;dr
Week of June 16 - 22, 2024
This issue is dedicated to our dear sweet Selene, 2016 - 2024. Selene was the editorial team’s favorite soft black lap cat and emotional co-regulator, who was tragically killed this week. We mourn her loss. She will be missed by her adopted sister cat, Persephone, and the humans who have cared for and loved Selene since she was a kitten from the shelter.
This week the editorial team begins a more diligent practice of story classification. Frankly, we’ve been too loose about where we put stories on efforts that would not qualify as Open Source AI under the coming definition. Improving story classification is a natural outcome of the OSAI Definition getting closer, as well as the broader understanding of whether something is or is not Open Source AI.
Let’s define these sections once and for all:
OSAI News: For stories where the code is OSI-licensed, any non-code licenses are OSD-compliant, and the AI system components seem to align with the parts of the Open Source AI Definition.
OSAI WTFaux?: For stories about mixed-Open/closed projects claiming to be Open Source without qualifying where they are not. E.g., models derived from LLaMa and subject to the LLaMa Community License.
Fauxpen Source: For stories where the project is ignorantly or deliberately claiming an association to Open Source, including "open-source approach like model thinking" efforts that lack OSD-compliant licenses or practices.
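For the tinkerers among us, the three definitions above can be read as a small decision procedure. What follows is a minimal sketch, not our actual editorial tooling: the `Story` fields and the `classify` function are hypothetical names invented for illustration, each field stands for a human judgment call rather than an automated check, and treating "mixed" as "at least one genuinely open part" is our assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Story:
    # All fields are hypothetical editorial judgments, not automated checks.
    code_osi_licensed: bool       # code is under an OSI-approved license
    non_code_osd_compliant: bool  # weights, data, docs licenses meet the OSD
    aligns_with_osaid: bool       # components seem to match the OSAID parts
    claims_open_source: bool      # the project (or its press) says "Open Source"


def classify(story: Story) -> Optional[str]:
    """Return the newsletter section for a story, or None if no section fits."""
    if (story.code_osi_licensed and story.non_code_osd_compliant
            and story.aligns_with_osaid):
        return "OSAI News"
    if not story.claims_open_source:
        return None  # no Open Source claim, so nothing to call out
    # Assumption: "mixed" means at least one part is genuinely open.
    if story.code_osi_licensed or story.non_code_osd_compliant:
        return "OSAI WTFaux?"
    return "Fauxpen Source"


# Example: open code, restrictively licensed weights, marketed as Open Source.
derived_model = Story(
    code_osi_licensed=True,
    non_code_osd_compliant=False,
    aligns_with_osaid=False,
    claims_open_source=True,
)
print(classify(derived_model))
```

The order of the checks matters: a story only lands in the two call-out sections if it fails the OSAI News test and still makes an Open Source claim.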
Now, let’s set sail on the salty seas!
OSAI News
Fedora’s AI Future Fit: Red Hat's AI products are making waves in Fedora Linux. This includes updates to Fedora Workstation with InstructLab, Granite LLM, better PyTorch acceleration, and user experience improvements ranging from a better NVIDIA driver experience to enhancements for VR headsets. Fedora is getting smarter by the day, but does it dream of electric sheep?
Runway for OpenVLA: Generalist Robotics Model: OpenVLA is a 7B-parameter vision-language-action (VLA) generalist model for robotics, made to be adapted for specific needs. Researchers from Stanford, UC Berkeley, Google DeepMind, and the Toyota Research Institute collaborated to create this model, aiming to replace the many task-specific robot VLA models with one powerful general model. The model is trained on an Open Source dataset from the recently released Open X-Embodiment collaboration. It’s like having a Swiss Army knife robot trained on the entire Victorinox history of multitools.
Fetch.ai’s Big Grab: Fetch.ai merges its FET token with SingularityNET’s AGIX and Ocean Protocol’s OCEAN into a single ASI token, creating what they claim is the "largest decentralized open-source AI network." This network offers AI services, models, and data while preserving privacy and acting as a marketplace for developers. Talk about making waves in the data ocean!
Debating Open Source AI: The San Francisco Examiner explores the Open Source AI debate, explaining what qualifies as Open Source and what doesn’t. While the article accurately describes the basics, it never points directly to the Open Source Definition (OSD) or the OSI list of licenses, leaving room for interpretation. Thank you for explaining the rules of Monopoly, but you forgot to mention you need a board to play the game.
Free as in LibreChat: Privacy-First Chat: After a data leak scandal, Danny Avila created LibreChat, a fully Open Source web service that lets you connect to various LLMs while keeping your data private. Users can connect to services like ChatGPT or to local LLMs, keeping control over their data. Now you can have your secret conversation in a crowded room.
Keeping Track of Numenta’s Thousand Brains Project: Funded by the Gates Foundation, Numenta promises to create an Open Source AI platform by "reverse engineering the neocortex." This ambitious project aims to unlock new AI capabilities and make significant strides in understanding human cognition. If this is trying to build a better brain from scratch—good luck, folks!
Direction of NVIDIA’s HelpSteer2 Dataset: NVIDIA released HelpSteer2, an OSD-compliant, CC-BY 4.0-licensed dataset for training reward models that steer LLM responses toward helpfulness. This move aligns with NVIDIA's likely strategy to support Open Source to drive hardware sales. Seems they’re steering the conversation in the right direction.
InstructLab Schools Models: InstructLab is an Open Source tool for creating and improving AI models. This deep dive explains its functionality, showcasing how Red Hat and IBM are leading in AI development. A personal trainer for your AI model? Now hit the gym!
Making a Splash With Podman AI: Another Red Hat/IBM initiative, Podman AI runs on various desktops, making it easier to manipulate and run LLMs locally. This tool aims to democratize AI by making powerful models accessible to more users. AI all wrapped up in a pretty box, anyone?
Virtual First for ZymCTRL: Designing Sustainable Enzymes: ZymCTRL is an Open Source AI model for designing enzymes, aiming to promote sustainable industrial processes. Released under Apache 2.0, it leverages publicly available datasets for training. It’s a more eco-friendly approach to letting AI do the work.
AWS Integrates MLflow, World Holds Breath: Amazon’s SageMaker now supports MLflow, an Open Source tool for managing the machine learning lifecycle. This move reflects the now well-known AWS strategy of integrating Open Source solutions into its ecosystem. Let’s hope MLflow’s Open Source ecosystem stays robust amid the corporate jungle (HELO DBRX, WE C U IN REPO).
Mixture-of-Agents Framework Enters the Fray: Together AI introduces MoA, an AI framework using multiple LLMs for improved performance. This layered architecture allows for building robust and versatile AI systems, including choosing entirely Open Source AI models. Like a well-coordinated team of superheroes, each with their own unique power but able to form like Voltron.
Uncorking the OpenVINO 2024.2 Release: Intel’s AI toolkit, OpenVINO, gets an upgrade. This toolkit optimizes and deploys deep learning models, making AI development more efficient. OpenVINO is running like a well-oiled winery.
Intel’s Open Source Strategy: Intel’s CMO discusses optimizing over 500 Open Source models. Intel continues to ensure their hardware is the best platform for Open Source AI, mirroring their 25+ year strategy for selling more hardware through the spread of Open Source. When it comes to Open Source, Intel is keeping its chips in the game.
IBM’s Watsonx Bet: IBM's Watsonx banks on Open Source to drive enterprise AI at scale. Building trust and transparency is the name of the game for Big Blue.
Multilingual AI Partnership: A new partnership aims to curate Open Source models for all EU languages. It's just a press release for now, but we’ll keep our eyes peeled for more developments.
OSAI Opines
Fixing AI’s Original Sin: Tim O’Reilly has some ideas on solving AI copyright violations, which cross over to the Open Source ecosystem through copyright practices and innovations. Is ol' Tim preaching to the choir?
a16z — Open Source AI Champion? Andreessen Horowitz (a16z) is making bold statements about Open Source AI. Despite occasional hyperbole like "datacenter bombings," a16z remains a staunch defender of true Open Source AI. Just watch out for those flying servers.
OSAI FUD
Misinformation About Licenses: This post is full of fear, uncertainty, and doubt, with misinformation and misunderstandings about Open Source licenses, such as confusion over whether they can be used by businesses or only by non-profits. Sounds like the openwashing has them fully confused. It’s like they’re trying to navigate a floor scattered with sharp plastic toys while blindfolded.
Eagle Eye on: OSAI Legislation and Policy
a16z vs. SB 1047: a16z General Partner Anjney Midha breaks down why SB 1047 is problematic for startups and academics. Californians, take note: calling your assembly rep might just make a difference.
Senator Scott Wiener’s SB 1047 Update: California State Senator Scott Wiener’s bill SB 1047 passes the Assembly Privacy & Consumer Protection Committee 8-0. The bill aims to establish safety standards for AI development, but we’re wondering if developers feel the love.
Open Source AI Definition Update
Weekly Update: We can’t say it better than the regular "Open Source AI Definition — Weekly update June 17." Stay updated, folks!
Ranking Open Source AI Models: Nature magazine delves into the mix of models out there, much like OSAI tl;dr does. An excellent read for anyone trying to keep track of the wild world of AI definitions.
What is Open Source AI? Emerging Tech Brew provides a solid from-the-ground-up explanation of Open Source AI. Seems like the OSI’s road trip is sparking some good conversations.
AI Analysis: This piece provides an in-depth analysis of Open Source AI vs. closed-source AI. It’s likely AI-assisted, but hey, it’s accurate. We're not ones to cast AI assistance aspersions.
The Struggle to Define Open Source AI: OSI’s work around the OSAID is gaining traction. Well done, folks!
OSAI WTFaux?
Allen Institute’s Tulu v2.5 Models Mess: Tulu v2.5 Open RLHF Models from the Allen Institute are a confusing mess of licenses. Derived from LLaMa models, they’re stuck in licensing limbo. Further digging reveals even the Tulu 2.5 dataset is a mix of OSI-licensed and non-OSD-compliant licenses for the data. It’s like trying to untangle a ball of yarn with no end in sight.
NVIDIA’s VILA Project: NVIDIA’s VILA project is another example of Open Source confusion. The models are not truly Open Source, stuck under non-commercial restrictions and derived from LLaMa. Fully Open Source models seem like a distant dream for NVIDIA. But hey, at least the source code is fully Open, right? The lessons of the early 2000s seem to be sinking in at NVIDIA, slowly.
Game-Shaper-AI: This promising tool for evolving game stories lacks a license. We hope it’s just an oversight and not another case of fauxpen source.
Apple’s Licensing Confusion: Apple releases 20 new models and datasets under non-OSD-compliant licenses like cc-by-nc and apple-sample-code-license. Media outlets mistakenly call it Open Source. Apple’s licensing game is more confusing than their latest iOS update.
DeepSeek-Coder-V2: This model claims to be Open Source but is actually under a restrictive license. DeepSeek AI is another example of misleading Open Source claims. They are welcome to license as they wish, but it's disingenuous and deceitful to claim Open Source when you are not. Congrats on beating GPT-4 Turbo in coding, but you’re still not actually Open Source.
OpenSora 1.2: OpenSora is almost completely Open Source AI for video generation and compression, with a lot of emphasis on Open weights. However, the originating dataset is a mix of differently licensed sources, which makes it a gray area. It's also not clear if a non-Open LLM was involved in the original training runs. Let’s hope things don’t end up on the legal cutting room floor.
Llama3VerusGPT: This announcement is about a model under the non-Open Source Llama 3 license. We’re keeping an eye out for true Open Source AI releases beyond just the enabling code.
Fauxpen Source
None detected this week, stay tuned.
Helping Our Fellow Journalists
Geeky Gadgets: Meta’s LLaMa 3 is not Open Source; it’s under a restrictive license that doesn’t fit in the frictionless marketplace of ideas that Open Source licenses foster.
Coingeek: Alibaba’s Qwen2 model is not Open Source; the company is "leaning on an open-source strategy" and "adopting an open-source approach." More simply, they’re leaning on openwashing.
Yicai Global: Meta’s LLaMa family of models is not Open Source.
CIO Dive: Reporting from a Databricks press release, CIO Dive confuses closed LLMs like LLaMa with Open Source ones, and Mistral’s licensing remains a mystery.
Network World: SUSE’s AI plans seem vague about Open Source AI, and Network World doesn’t clarify.
##30##