OSAI more… 10th Edition — License Shenanigans Abound
Byline: The Editor-in-Chief (The EiC)
Week of June 23 - July 7, 2024
Welcome back, everyone, for this double-wide issue of OSAI tl;dr! The editorial team is not any more rested than before, but at least we got caught up a bit. We’re also hacking up our typewriters and printing presses, tinkering with some new ideas to improve things around here.
The two weeks covered here are just as much of a firehose as ever. One trend we find interesting is which stories gain more attention with multiple articles about them. This issue, for example, the coverage on Moshi and Sentient was fairly extensive, and both of those are examples where there are currently only promises of Open Source-to-come.
In the case of Sentient, the forward-looking claims about their upcoming Open Source plans seem out of alignment with the Open Source Definition (OSD). We hold steady at skeptical there, just as we do with Apple openwashing, and all the other licensing shenanigans we come across week after week at the newsdesk.
And don’t get us started on how common the misconception is about LLaMa models being Open Source; it’s always the biggest part of our final section in the newsletter, Helping Our Fellow Journalists.
Speaking of help, let’s get to that double-helping of OSAI news and blues!
OSAI News
Quantum Leaps and Bounds as Haiqu AI Transpiles New Ground: For all the Quantum Computing fans out there, a newly available Open Source transpilation tool, Rivet, addresses the bottleneck in prepping for complex quantum workflows, including quantum machine learning. Haiqu AI decided to release their internally built tool to assist with broader research efforts. A round of applause for teamwork. Now back to work!
Probllama Drama: Ollama Tool Fixes Remote Code Execution Vulnerability: If you're using the Open Source Ollama tool to run various LLMs, make sure you have the latest version. The "Probllama" vulnerability, a remote code execution issue, has been patched in version 0.1.34. No one wants their llama to go rogue!
Necklace Knows Best — Friend Wearable Captures Conversations for $69: Open Source wearable necklace Friend is hitting the market for $69. This nifty device automatically captures and transcribes your conversations to your phone using the AI model of your choice. The team behind this Kickstarter project also created buzz around their Open Glass concept, which won a recent hackathon in San Francisco. Quite the conversationalist!
Cephalo the Great Brings Multimodal Models for Materials Science: Researchers from MIT have introduced Cephalo, a series of multimodal vision-language models (V-LLMs) specifically designed for materials science applications. Cephalo bridges the gap between visual perception and language comprehension in analyzing and designing bio-inspired materials. Talk about brainy-mode!
Look Out! ClearML Brings AI Tools to the Public Sector: ClearML is partnering to bring their suite of tools to the public sector, streamlining AI workflows, experiment management, MLOps/LLMOps, and data management. It's time to get clear on AI management!
Childhood "Fsck Cancer" Model Atlas Announced: The Childhood Cancer Model Atlas (CCMA) has been announced, offering an advanced pediatric cancer hub with an Open Source bank of childhood cancer tumor tissue samples and AI data-mining tools. As we shoulder the battle to beat childhood cancer, remember to give at least one fsck at a time!
Picollo and Chips: SenseML Releases AutoML Solution for Edge Devices: Following previous announcements, SenseML has released Picollo AI, an AutoML solution for edge devices, which automates the tasks of applying machine learning to real-world problems. The code is under an AGPL 3.0 license, with a for-pay analytics studio SaaS offering. Big solutions for small devices!
Lilypad on the Fly, Announces IncentiveNet Initiative: LilyPad is fostering an Open Source-based decentralized AI data economy by leveraging idle GPU processing power. Their "IncentiveNet" initiative will reward participants who provide services, utility, and value to the network with Lilybit credits. Hop on the LilyPad for some cool incentives!
Mars5 Prosody Pros: Camb AI Releases TTS System: MARS5 TTS is an Open Source text-to-speech system from Camb AI, offering exceptional prosodic control and voice cloning capabilities with less than 5 seconds of audio input. What's prosody? It's the rhythm and pattern of sounds in speech. Get ready to clone that tone!
LLM Lightbulb Moment for UCSC Students: A 13-Watt LLM: Graduate students at UCSC have run an LLM effectively on just 13 watts of power, akin to a modern LED lightbulb. They achieved this by removing matrix multiplication (MatMul) from the LLM training and inference process, using a ternary number system. Bright idea!
EkStep Steps Up: Hosting DPGs for Free on AWS and Google Cloud: EkStep, a non-profit, is now hosting their digital public goods (DPGs) for free on AWS and Google Cloud. These providers are also working with EkStep to create an AI bot as a DPG. We’ll see if their not-actual-Open Source examples from this article meet the standards of a DPG. Way to step up, EkStep!
Swiss Study Blows Horn About Open Source Ubiquity and AI Adoption: According to the Open Source Study Switzerland 2024 by CH Open and swissICT, conducted by Bern University of Applied Sciences, Open Source software stands for "innovation, improves interoperability, strengthens data protection and increases digital sovereignty." This year's study also looks at the use of Open Source AI tools and models. That's the echoing call of Swiss precision in AI.
A Slow Burn: PyTorch and Openwashing at AI_dev Europe: The New Stack highlights the recent AI_dev Europe conference, including the slow, steady rise of PyTorch and the ongoing discussions about openwashing. It's a slow burn!
Side-by-Side With LLM Comparator for Qualitative Analysis: LLM Comparator, an interactive visualization tool with a Python library, helps analyze side-by-side LLM evaluation results at the example and slice levels. Discover insights like "Model A's responses are better than B's on email rewriting tasks because Model A generates bulleted lists more often." It's a tale of two models!
IDE-al Development: Theia IDE for AI Components: We're reading between the lines here a bit, but we see integrated development environments (IDEs) as crucial for manipulating AI components. The Theia IDE, an established Open Source project from the Eclipse ecosystem, integrates with over 3,000 VS Code plugins. Is it the ideal IDE for your AI needs?
Open Model Initiative: Competing with Stable Diffusion: With an earnest open letter offering a way around and forward from Stable Diffusion, three companies launched a new alliance called the Open Model Initiative (OMI). They make strong claims about adhering entirely to Open Source for business and Community reasons, and it certainly seems like they've got the idea of an organizationally diverse ecosystem at play.
AI-Biome-Tree-Huggers July 18 Webinar: July 18 brings a webinar for the release of an Open Source AI foundation model for world forests. It’s hosted by the World Resources Institute (WRI), the Global Restoration Initiative, and the Land & Carbon Lab. This model was a vital nutrient in helping produce the world’s first global map of tree canopy height at a one-meter resolution. This resolution allows the detection of single trees at a global scale. Now AI will be able to see both the forest and the trees.
Conversing With Moshi is a Mouthful: With more than a little bit of hyperbole, Kyutai announced Moshi, an experimental conversational AI. Conversational AI means it simulates speech and responds to what it hears. They've released a demo restricted to 5 minutes to showcase the low latency and responsiveness. So, not finding anything in our research about the Open Source claims, we of course decided to ask Moshi about it.
We asked Moshi where to find its source code, and Moshi said it’s in GitLab instead of GitHub. We asked for a link, and Moshi said it couldn't provide one; rather, it said to search for "Moshi AI" on gitlab.com. (It also said it's used by "French military ground forces" after misidentifying the make of a helicopter picked up on mic flying over the newsroom.) Perplexed and concerned by Moshi's certainty in response, we asked, "Are you playing a character role?" Moshi replied it was just talking. So we don't know if it was hallucinating or not! Keep your eyes open on the git forge sites and let us know what you see or hear. A permissive Open Source release is promised soon, and they were so earnest about it, we remain ever hopeful.
RTX AI Toolkit: NVIDIA's Latest Open Source Release: NVIDIA has opened up their RTX AI Toolkit, enabling customization and deployment of models. Someone backed up the Snap-On tool van, yelling, "Come and get your AI toolkit!"
Helping Lawyers Work Smarter Not Harder — OpenContracts for Analyzing Contracts: OpenContracts is an Open Source AI tool for analyzing and annotating contracts. We're predicting many more of these types of niche tools enabled by Open Source AI. It's time to boost your lawyer with AI!
OSAI Opines
Is Open Source the Best Path Towards AI Democratization?: Well, this here is an opinion all right! We're not saying we agree with it, but we do think it is at least trying to be well-reasoned. However, it feels as if the author may not understand the true goal of Open Source methods — a frictionless R&D experience.
Open Source Conflicts with Crucial AI Software From a Grad Student's Perspective: This tells the backstory of a smart opinion piece about AlphaFold 3 published in Undark recently and covered by OSAI tl;dr in the 7th edition. The backstory covers how the author, a computer science PhD student named Bryce Johnson, turned what started as a demand-action article into an OpEd fully covering the topic while still holding a solid opinion.
Open Source AI's Vital Role for Decision-Making: This article makes a strong case for why Open Source AI is necessary where AI is helping people make decisions: it's not a black box. It focuses on the example of an AI tool that makes recommendations to the Court about criminal bail conditions. Bottom line from the researcher in the example? Researcher Kosuke Imai said, "(P)eople are biased as well. 'The advantage of AI or an algorithm is that it can be made transparent,' he said. The key is to have open-source AI that is readily available for empirical evaluation and analysis."
OSAI FUD
Nothing new this week. Maybe we’re being too kind about the FUD we read? The fact is, much of what we see is scant and in-passing. When we find some big and juicy ones, though, you’ll be the first to read about ‘em!
Eagle Eye on: OSAI Legislation and Policy
SB 1047: Y Combinator's AI Startups Raise Concerns: If you've seen an increase lately in noise and posts about California's SB 1047, it may be due to Y Combinator organizing their host of AI startups. These 140 startups are participating in raising attention and concerns about the legislation by signing a letter to the chairs of the committees currently studying the legislation.
California's SB 1047 Seeks to Safeguard Against AI Risks: According to the Indian Express, California’s controversial AI bill, SB 1047, aims to safeguard against AI's existential threats, including nuclear war and catastrophic harm. Critics argue it misses the forest for the trees.
Judge Dismisses Parts of AI Copyright Lawsuit: The lawsuit in the Northern District of California against OpenAI/GitHub/Microsoft for allegedly misappropriating and misusing copyrighted materials is fizzling out. The judge dismissed part of the case because the claimants "failed to show their code was reproduced identically." From potentially-landmark-case to who-knows-where?
AI Running for Mayor: The Cheyenne Experiment: This story about an AI running for mayor in Cheyenne, Wyoming, is a little fringe but noteworthy because the human-part of this candidacy is pledging to move to an Open Source LLM if elected. That human is being viewed as the actual candidate and is the one on the ballot, but he says he’ll be using an AI chat interface to make all his Mayoral decisions. After being burned by terms-of-service, he says he is pursuing an Open Source solution next. Is running AI on-premise coming to a Cheyenne basement soon?
SB 1047 Details for Further Research: Here are the California Senate press release, the full bill, and a link to compare versions of the bill as it was amended. The latter is useful for figuring out what or who may have influenced the bill’s author(s) to include certain sections.
Open Source AI Definition Update
Weekly Update from The OSI Team: Here's your weekly update from the OSI team. The debates continue, and everything worth saying is probably being said here:
Can Open Source AI End OpenAI’s Dominance?: This article explores the dynamics around Open Source AI and questions if it can end OpenAI’s dominance. It's a kind of weather prediction — it may be stormy now, but since good weather will eventually arrive, you're never wrong saying, "The Sun will be out soon enough!"
Evaluating Open Source AI Models: Radboud University Study: Mark Dingemanse et al. from Radboud University conducted an evaluation of 40 large-scale LLMs claiming to be Open Source. The results are revealing, and not in a good-for-your-digestion way.
Mozilla’s Dataset Convening: Notes and Observations: Mozilla had the Dataset Convening recently, resulting in numerous observations and notes about open source datasets as part of Open Source AI. It's like there's a zeitgeist of concern about the Openness and availability of datasets.
OSAI WTFaux?
AppMap's Open Source Claims Under Scrutiny: AppMap seems like a potentially useful AI coding assistant, and they proudly claim to be Open Source. However, inconsistencies in their GitHub repository make us wonder what is sloppiness, what is a mistake, what is intentional, and what extends beyond just the occasionally-missing `LICENSE` file.
Apple's 4M: Sample Code License Woes: Apple's ml-4m, a framework for training any-to-any multimodal foundation models, is Open Source on the software side, but the model weights are under a non-commercial-only license called the Sample Code License. We think it’s sample-y horrible.
Sentient: Open Source or Openwashing?: Sentient, an aspiring Open Source hey-let’s-all-build-models-together platform, raises eyebrows with their mixed messages about monetizable models and enforced alignment with the community via blockchain. Is it hyperbolic-overpromising, openwashing, or something worse?
Building a RAG App with a Not-Open Source Model: While we started to like the tutorial on building a RAG application, it disappoints by not actually using an Open Source model! It's a prime example of signal and noise traveling together.
InternLM2.5-7B-Chat: Open Source Framework, Proprietary Model: InternLM2.5-7B-Chat offers a nice Open Source framework but the model weights are under a proprietary license. But there's a nice Open Source framework around it! You know, in case you need something not-invented-here.
Fauxpen AI
VentureBeat's AI Innovation Awards: LLaMa in the Open Source Category: VentureBeat's 6th annual AI Innovation Awards includes a category for "Generating AI Open Source Contribution," but they mistakenly included Meta's LLaMa models, which are not Open Source. Let's help them get it right!
Helping Our Fellow Journalists
We're continuing to find lots of instances of confusion, and with our intention to be helpful, we think it would be good to make a categorized list and spend less time rambling at other journalists who probably aren't paying attention to us anyway.
LLaMa Not Open Source: Here is the list of articles that mistakenly refer to Meta’s LLaMa family of models as Open Source when they are not and don’t provide the benefits you get from actual open source.
Apple Model Not Open Source: Until Apple gets their license approved or switches to a different license, a bunch of their supposed open source AI releases simply aren't. Here’s some misreporting resulting from that:
Stable Diffusion Not Open Source: This is a reminder that among the many problems Stability AI has been dealing with, their models are not actually Open Source.
Alibaba Qwen Not Open Source: Whether you call it Qwen or Tongyi Qianwen, it’s not Open Source.
Google Gemma Not Open Source: Google’s Gemma and Gemma 2 models aren’t often written about as Open Source — which they are not — but here are some.
ClearML's Fractional GPU Not Open Source: A new one for the list, ClearML’s Fractional GPU software is not Open Source because it has restrictions on the field of use (no commercial use allowed).
DeepSeek Coder Not Open Source: Unfortunately, we now have to add DeepSeek Coder V2 as not having an OSAID-compliant license for the model.
H2O.AI Not Open Source: Despite their claims and Apache 2.0 license on everything, it doesn't actually include the LLM: there is a disclaimer on Hugging Face that requests and requires you to agree to terms that are not OSD-compliant. And also, it’s your responsibility to check back if the terms have changed underneath your usage. Yeah, the literal opposite of the value of Open Source.