Hello Humans!
I’m still in battery-saving mode, so updates may be a little less frequent than usual across my channels. As we kick off 2025, a writer whose work I admired last year agreed to write this guest post: none other than James Wang of the newsletter Weighty Thoughts.
James Wang Introduces His Newsletter and Upcoming Book
(0:41)
“A lot of things have been changing in the world—technology, geopolitics, and the basis of our micro and macro economies. This publication is my attempt to make some sense of all of it.” - James Wang
James Wang is also writing a book, which really grabbed my attention because I like his style of writing. I asked him for his macro insights on the state of AI as we take on 2025. I find his work memorable, from his take on Nvidia to his discussion of the AI bubble, among others.
Some writers just strike a balanced tone and analysis that rings well to the inner ear, and that’s how I experience the work of the Weighty Thoughts newsletter. If you agree, I invite you to follow his work more closely. As you know, one of the mandates of this publication is to encourage what I call “emerging writers” on our AI and emerging-tech topics. This is also to expose you to a greater variety of voices and perspectives.
James Wang is the writer of Weighty Thoughts and author of the upcoming August 2025 book, “What You Need to Know About AI.” He is a partner at Creative Ventures, where he has been investing in AI since 2016 and was previously at Google X and Bridgewater Associates.
I just want to quickly mention James’s educational qualifications, since they carry some weight for me as a reader, though I only realized it now. James holds an MBA from UC Berkeley, where he was a Jack Larson Fellow in Entrepreneurship, a BA with Honors from Dartmouth, and an MS in Computer Science from Georgia Tech. He also holds a Data Science Specialization from the Johns Hopkins Bloomberg School of Public Health and completed the course track for a PhD Designated Emphasis in Computational Science and Engineering at UC Berkeley. As a reader, I find James accessible and insightful, with a synthesis of how things connect that I find especially satisfying.
By James Wang, January 2025.
In Summary (a tl;dr before diving deeper)
2024 was not the year that generative AI slowed down.
In fact, we ended the year with a flood of exciting generative AI news and releases. We got o3’s unveiling, which received the best score of any model on the ARC-AGI benchmark—and shocked the world with an over-$1,000-per-query cost while doing it.
Just a week before that, Google put itself back into the generative AI game with an impressive showing: Gemini 2.0 and the Veo 2 video-generation model, which blew OpenAI’s Sora out of the water. In particular, Veo 2’s ability to maintain object permanence and accurately represent physics showed just how far generative AI had come in video.
However, it’s worth remembering that before the rush in December, much of the media was asking serious questions about the scalability of AI. Numerous articles were questioning whether it was the end of the AI boom, now that we were seeing diminishing returns from throwing more hardware and compute at problems.
This was even as our paradigms fundamentally changed with o1’s release: chain-of-thought (CoT) reasoning created an entirely new dimension of scaling with test-time compute, which o3 only built on.
Agentic AI, or agents, was also a massive part of the conversation that I shouldn’t omit, but it is more an incremental change than a revolutionary one, as I’ll explain.
Finally, in addition to Google putting itself back into the game, we saw major Chinese open-source models, led by DeepSeek, starting to make major strides on benchmarks and mindshare.
I’d classify the major events into these high-level themes, highlighting useful articles from the year:
Scaling: Have We Hit a Wall?
Reasoning: A New Dimension of AI Scaling—and Cost
Agentic AI: Expanding Uses of Generative AI
More Global Competition: The Rise of Chinese Models
Within these, we’ll also interweave some interesting highlights and news during the year that didn’t make it into the top-level headings.
Scaling: Have We Hit a Wall?
In the midst of everyone’s excitement about all the new model releases in December, it’s easy to forget that one of the major narratives in mid-2024 was whether our rapid progress in generative AI had come to an end. As per The Verge: “AI insiders worry that model progress is hitting a scaling wall.”
This isn’t an idle question. It is true that performance on major benchmarks had been leveling off for some time. Additionally, many open-weight models were catching up with closed-weight models due to the leveling off. This also raised the question of whether these AI companies would be able to sustain their breakneck pace of raising money to keep pushing forward—led, of course, by OpenAI’s $6.6B raise at a valuation of $157B in October.
Gary Marcus, a long-time critic of the current modality of deep learning AI, including LLMs, raised fundamental questions about whether there was much more that could be done along this tech path. Besides that, scaling has been an exponential process—and, like any exponential process within the physical world, it has to end eventually.
Each generation of AI has required an order of magnitude more resources than the last.
However (though not entirely contradicting Gary Marcus), many other commentators, including myself, have described scaling as a far more complicated process than simply throwing more compute, parameters, and data at problems.
Not to appeal to authority, but this has always been the case. In the midst of all of this, it’s easy to overlook one of the other major events during 2024: the dispute between OpenAI and Elon Musk, and the publication by OpenAI of a huge number of texts, emails, and other communications between the founders of OpenAI and Musk.
The reason I mention it is an interesting snippet from those archives: a 2016 note from Ilya Sutskever, then Chief Scientist at OpenAI.
As his sentiment reflects, compute is certainly useful, especially for iterating quickly, but fundamental progress is not simply from more brute force—whether in volume of data or compute.
It is progress in our models and representations that matters and has brought us to today. I made this point in 2023: we never would have gotten our current computer vision performance, which kicked off the deep learning boom, without ImageNet and AlexNet’s fundamental advancements—no matter how much compute we threw at it. Compute has always been overrated as the limiting factor of AI.
(This is also covered in a guest post on this publication with Timothy Lee.)
Similar to Moore’s Law in semiconductors for many years, the exponential pace was not just a natural phenomenon or fundamental law in technology. It was a business choice, with extremely clever engineering by smart people, that was not linear, simple, or a foregone conclusion. We invented new materials, techniques, and even went 3D in our chip designs. As my recent article described: scaling is a choice.
The same will be the case with generative AI, and at this point, we clearly haven’t reached the end of the road. Especially since we went “3D” in AI as well, with reasoning.
Reasoning: A New Dimension of AI Scaling—and Cost
The debut of o1 and CoT reasoning was a fundamental shift in the direction of generative AI. The cost and speed of tokens had already become a major factor in competition between models and AI companies, but we were starting to see efficiencies from better trained and quantized models.
Test-time compute added an entirely new dimension. Instead of cost merely being dictated by the size of models—and, in particular, the cost of training them—we can now scale costs and resources at the inference level as well.
The way this has been popularly described is that the models can “think” for longer before responding.
Scaling accuracy with more time to “think” (and more cost), from The Information.
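The shape of that accuracy-vs.-cost curve can be mimicked with a toy simulation. Note the assumption here: I use simple majority voting over repeated samples (a self-consistency scheme) as a stand-in for whatever o1 and o3 actually do internally, just to show how spending more inference-time compute on the same model buys accuracy at a roughly linear cost.

```python
import random
from collections import Counter

# Toy illustration of test-time scaling: each sample from a fixed model is
# right 60% of the time; majority voting over more samples converges toward
# the right answer, at ~n times the inference cost.

def sample_answer(p_correct: float, rng: random.Random) -> str:
    return "correct" if rng.random() < p_correct else "wrong"

def majority_vote(n_samples: int, p_correct: float, rng: random.Random) -> str:
    votes = Counter(sample_answer(p_correct, rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

def accuracy(n_samples: int, trials: int = 2000, p_correct: float = 0.6) -> float:
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    wins = sum(majority_vote(n_samples, p_correct, rng) == "correct"
               for _ in range(trials))
    return wins / trials

for n in (1, 9, 49):  # odd n avoids tie-breaking
    print(f"{n:2d} samples -> accuracy {accuracy(n):.2f}, cost ~{n}x")
```

Even in this crude form, the curve flattens: each extra increment of accuracy costs disproportionately more compute, which is exactly the dynamic behind the >$1,000-per-query benchmark runs.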
It’s worth keeping in mind that this is not a more limited technique of simply ensembling models (in other words, stacking models on top of each other, with a supervisor model helping the base LLM, similar to how RAG feeds in data). This is something both more—and less—interesting.
Nathan Lambert at Interconnects has a useful deep dive on the topic, but fundamentally, these models “talk to themselves” and generate a significant number of tokens before actually responding to the user. You can replicate this yourself (and people have) with “plain old LLMs.” The magic of these models is that they don’t require multiple human interventions to make it work.
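The “replicate it yourself with plain old LLMs” pattern is essentially two calls instead of one: ask for a reasoning scratchpad first, then condition the final answer on it. A minimal sketch, where `generate` is a hypothetical stand-in for any text-completion API (stubbed with canned responses so the control flow runs):

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM completion call (stubbed)."""
    if "Reason step by step" in prompt:
        return "7 got off, 5 + 13 = 18 got on, so 23 - 7 + 18 = 34."
    return "34"

def answer_with_reasoning(question: str) -> str:
    # Step 1: spend extra "thinking" tokens before committing to an answer.
    scratchpad = generate(f"Reason step by step, but do not answer yet:\n{question}")
    # Step 2: condition the final answer on the reasoning trace.
    # Reasoning models do this internally; users only see the final answer.
    return generate(f"{question}\nReasoning:\n{scratchpad}\nFinal answer only:")

question = "A bus has 23 people. 7 get off, then 5 + 13 get on. How many now?"
print(answer_with_reasoning(question))  # -> 34
```

The point of models like o1 is that this loop happens inside a single call, with the scratchpad hidden, rather than requiring you to orchestrate (and pay attention to) each intervention yourself.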
This is “Chain of Thought” reasoning as described by OpenAI (taking them at their word) and, taken to an extreme, it can cost a huge amount of money per query—as we saw with the run data OpenAI shared about o3 on the ARC-AGI benchmark, with a mind-boggling >$1,000-per-query cost.
With this eye-watering cost, though, we did see incredible performance that blew everything that came before out of the water.
That being said, there is still an open question as to what specific use cases would warrant this kind of massive price tag—especially when OpenAI is still wrangling with the question of extracting value (i.e., subscriptions) from users of its existing models, as we saw with the debut of its $200/month ChatGPT Pro plan in December and fairly open speculation about rolling out $2,000/month enterprise plans.
While OpenAI has been leading the way with its o1 and o3 models, we are seeing models from other labs with this kind of test-time compute. Google’s Gemini 2.0 Flash Thinking Mode, Alibaba’s QwQ, and DeepSeek’s R1 are just a few. I’d expect to see more of these going forward into 2025 as all the major labs explore possibilities in this area.
Regardless, it’s clear that as we see this kind of scale-out of “reasoning models,” we won’t see demand for high-performance computing hardware, led by Nvidia’s GPUs, die down anytime soon.
Additionally, we likely won’t see power demands die down anytime soon either. Even before CoT reasoning became a huge conversation topic, we saw Microsoft announce a deal to reopen Three Mile Island, and Meta put out a call for proposals for up to 4 GW of new nuclear capacity.
Data centers running AI can’t just connect directly into the existing power grid, given their incredible requirements. As this trend of ever-scaling compute goes higher—and even more with test-time scaling—we should expect more of these kinds of projects to be announced as we go forward.
Agentic AI: Expanding Uses of Generative AI
When I was in grad school, agents were usually the province of reinforcement learning (specifically, for me, in a graduate seminar on “Learning in Sequential Decision Problems”) trying to navigate unfamiliar environments.
My academic work was mostly kind of boring. But to get an idea of what this kind of thing is about, I like referring to a fun example of someone teaching an RL agent to play Pokémon Red from scratch.
Agentic AI, or agents, in 2024 has not been referring to that. As per Nathan Lambert, “In the current zeitgeist, an ‘AI agent’ is anything that interacts with the digital or physical world during its output token stream.” Nvidia has a more specific definition, requiring that agents perceive (take inputs), reason (the LLM step), act (connect with tools), and learn (a feedback loop).
In either case, one of the key characteristics is plugging in LLMs (mostly) to APIs or other “tools” that can allow them to interact with things of real consequence, and not merely be a chat interface.
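That perceive/reason/act/learn loop is simpler than it sounds. Here is a toy sketch of it; every name below is a hypothetical illustration (not a real framework), with the “LLM step” stubbed out as a simple rule so the loop is runnable:

```python
# Toy perceive / reason / act / learn loop. All names are hypothetical.

def perceive(env: dict) -> str:
    return f"Inbox has {env['unread']} unread emails."

def reason(observation: str) -> str:
    # Stand-in for the LLM step: decide which tool to call next.
    return "do_nothing" if "0 unread" in observation else "archive_all"

TOOLS = {
    "archive_all": lambda env: env.update(unread=0),
    "do_nothing": lambda env: None,
}

def act(action: str, env: dict) -> None:
    TOOLS[action](env)  # the "tool call": an endpoint with real consequences

def run_agent(env: dict, steps: int = 2) -> list[str]:
    log = []
    for _ in range(steps):
        obs = perceive(env)   # perceive: take inputs from the environment
        action = reason(obs)  # reason: the model picks an action
        act(action, env)      # act: invoke a tool / API
        log.append(action)    # learn: the changed env feeds the next loop
    return log

env = {"unread": 7}
print(run_agent(env))  # first pass archives; the second finds nothing to do
```

The interesting (and risky) part is entirely in `act`: once the tool endpoints book flights or send emails rather than mutate a toy dictionary, the model’s decisions have real consequences.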
Agents were too much a part of the conversation in 2024 to ignore, but that being said, they are less a paradigm shift than an incremental step toward AI becoming more and more capable.
Agents are simply generative AI starting to become good enough and trusted enough to be plugged into endpoints that interact with things that have real consequences. Basically, instead of AIs helping navigate, we’re giving LLMs (and other models) the wheel to drive the car.
Obviously, that also requires being able to see what’s going on in the environment, having tools to act, and evaluating one’s actions—which is where we start to overlap with a more traditional RL state-action-reward framework.
Given this, the pipeline of agents will ultimately include tools on the output end (tools to interact with things of real consequence—either literally physical, or of “digital consequence,” like booking airline tickets) and tools on the input end (like RAG) to make them more useful. The popularity of NotebookLM, which has been around for a while but really took off in 2024 with its podcast-creation functionality, presaged some of this. One can imagine an AI agent that is like NotebookLM on the input side and that fills in fields of a grant application on the output end.
(This publication also did a great tutorial on NotebookLM.)
Expect this to happen more and more because that is precisely where real value will start to come from generative AI—not just being something that chats with you or generates an image or video, but can do things for you directly.
More Global Competition: The Rise of Chinese Models
There was widespread skepticism about whether China would be able to catch up with the US and Europe (but mostly the US) in AI. With the rise of DeepSeek and Alibaba’s Qwen on AI benchmarks, there is no longer any real room for doubt.
If you missed it, I’d highly suggest one of the rare interviews with the DeepSeek CEO, translated from Chinese and covered by ChinaTalk. It offers a glimpse of why a company like DeepSeek—internally funded by its associated hedge fund with massive compute resources, and free to pursue advancement without immediate monetization—has been able to succeed. After all, this looks a lot like Google, Meta, Microsoft, and many of the other leading AI labs in the US.
Nevertheless, this is only likely to spur on even more geopolitical competition and tension.
With 2024’s rollout of more chip controls—including the ones that debuted at the end of the year (the last hurrah in this realm by the Biden Administration)—we are not likely to have seen the last of geopolitics inserting itself into AI and AI development. These controls have been, and likely will continue to be, too little, too late to achieve any material strategic objective.
That being said, Liang Wenfeng (梁文锋), DeepSeek’s CEO, stated in the translated interview that one of the main things holding back Chinese companies has been the US-originated chip controls.
Liang Wenfeng: We do not have financing plans in the short term. Money has never been the problem for us; bans on shipments of advanced chips are the problem.
Test-time compute, which makes inference dramatically more computationally expensive, is only going to exacerbate these difficulties.
However, we should also expect that constraints create innovations. DeepSeek v3 uses significantly less compute for both training and inference. It reportedly had a training price tag of only $5.5mm on H800 GPUs over two months, vs. GPT-4’s reported training cost of over $100mm. One should remember: the H800 is the dumbed-down version of the H100 that Nvidia had to create to be allowed to sell into the Chinese market. Additionally, its inference cost per token is also a fraction of that of the other frontier models (see the chart below).
Perhaps we might see innovations within the Chinese ecosystem for efficiency from scarcity.
Sam Altman recently obliquely threw shade at the Chinese models, stating that it’s easy to “copy something that you know works.” While that is, perhaps, fair, it seems undeniable that these Chinese models and model companies have arrived and will be major players within generative AI through 2025 and likely beyond.
If you liked what you read here
Check out my Substack, Weighty Thoughts, and my upcoming book on AI, which is meant to be an A-to-Z of what you need to know to get up to speed on the history, technology, and where AI is already deeply impacting the world outside the headlines.
The goal is to build your intuition, so you can make up your own mind on what the implications are, even as AI rapidly develops.
Thanks!
“Moore’s law wasn’t a natural law—the reality was it took a ton of creativity and effort from engineers. The same is true of AI, despite the focus on mass compute.” - James Wang
Further Reading
Do note that Creative Ventures also has a Newsletter.
His upcoming book looks to be a comprehensive “primer” on AI’s history, technology, economics, and business implications.
Think of it as a wide-spanning survey of “what you need to know about AI” that you can give to everyone from your friends who were shocked by ChatGPT and confused about the future—all the way to engineers or business executives who want to round out their knowledge.
I wonder if AI will have periods that look like plateaus but are actually periods where AI development resources shift from capability building to product and service integration, and to next-gen learning by doing. Having this cycle seems like it would help AI shift more toward knowledge creation.