A few weeks ago, Meenal Nalwaya, who heads up multimodal LLMs at Meta, and I were chatting about what is going on in the world of agentic infra globally. There are a bunch of opinions on what agents are and what they can do, but no clear canonical architecture has emerged. Even something as basic as a personal agent booking "a flight ticket to SFO tomorrow" needs the agent to know what "today" is, where "here" is (to book something from here to SFO), my flight + seat + meal preferences, my FF number, and my ID & credit card info. Then, once booked, it has to schedule an Uber ride, get my boarding passes & place them on my calendar, ensure a pickup & a hotel once I land, and so much more. A simple task like this has so, so many complexities.
We felt we needed to dig into what architectural framework is needed to solve some of this. We also felt that this problem needs a 2-sided approach: you can build a bicycle, but you also need roads with bicycle lanes or it doesn't work. While we believe this utopian world where agents can book us a flight, a car, and a hotel is not here yet, we wanted to write a series of articles to help us think about the opportunities and challenges ahead. This is the first part of the series. Feedback & comments welcome (link in the first comment), and hope you enjoy the Homer Simpson references :) Part 1: "Endless Homers, Infinite Agents":