Our mission is to advance humanity's understanding of AI by examining the inner workings of advanced AI models, a field known as AI interpretability. As a research-driven product organization, we bridge the gap between the theoretical science of interpretability and its practical applications.
We're building critical infrastructure that empowers developers to understand, edit, and debug AI models at scale, ensuring the creation of safer and more reliable systems.
Goodfire is a public benefit corporation headquartered in San Francisco.
Excited to announce our collaboration with Patrick Hsu, Brian Hie, Dave Burke, and the rest of the team at Arc Institute on interpreting Evo 2, their groundbreaking biological foundation model.
Evo 2 is an incredible model and a significant advance in AI for biology: it processes up to one million DNA letters at once and is trained on genomes spanning bacteria to humans. Through our interpretability work, we've found that the model has learned to recognize sophisticated biological concepts, from basic DNA elements to complex protein structures. We're extracting hidden biological knowledge, revealing novel patterns, and advancing genome engineering, all with mechanistic interpretability!
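For those curious what "discovering concepts" looks like in practice, here's a minimal, hypothetical sketch (not our actual pipeline): given SAE feature activations over a DNA sequence and an annotation track such as exon positions, you can score how well each feature lines up with the annotation. All arrays below are random placeholders.

```python
# Hypothetical sketch: score how well sparse-autoencoder (SAE) features
# align with a genome annotation track (e.g. exon positions).
# The arrays below are random placeholders, not real Evo 2 activations.
import numpy as np

rng = np.random.default_rng(0)
seq_len, n_features = 10_000, 512

# (seq_len, n_features) SAE feature activations for one DNA sequence
feature_acts = rng.exponential(0.1, size=(seq_len, n_features))
# Binary annotation mask over the same positions (1 = inside an exon)
exon_mask = rng.random(seq_len) < 0.2

def f1_alignment(acts: np.ndarray, mask: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """F1 score between each binarized feature and the annotation mask."""
    fired = acts > threshold                          # (seq_len, n_features)
    tp = (fired & mask[:, None]).sum(axis=0)
    precision = tp / np.maximum(fired.sum(axis=0), 1)
    recall = tp / max(mask.sum(), 1)
    return 2 * precision * recall / np.maximum(precision + recall, 1e-9)

scores = f1_alignment(feature_acts, exon_mask)
print("best-aligned feature:", scores.argmax(), "F1 =", scores.max().round(3))
```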
The Arc team has been beyond impressive - an incredible group of scientists and engineers who were a pleasure to work with. More to come on this partnership soon!
You can read more here:
Technical preprint: https://lnkd.in/eA-jMPQu
Interactive visualization tool: https://lnkd.in/eAPswScg
Goodfire blog post: https://lnkd.in/ejyAp65Q
Arc blog post: https://lnkd.in/e6EJsxmK
Shoutout to Myra Deng, Liv Gorton, Nicholas Wang, Nam Nguyen, Tom McGrath, Daniel Balsam, and the entire Goodfire team for their fantastic work here.
I'm hiring a Chief of Staff at Goodfire (AI interpretability research lab)! Would appreciate any leads from my network in finding this person.
This is an incredibly intensive role with one objective: making me and the leadership team more effective. This person will shadow me and do a bit of everything to make the business successful. They should be a generalist with a technical background who can internalize and apply the latest research in AI and interpretability.
As with the rest of our positions, it's full-time and in person in San Francisco. If this role isn't quite right, we're hiring for a number of technical positions, which you can check out here - https://lnkd.in/gapupaYQ
Chief of staff JD - https://lnkd.in/e42C2RER
Thrilled to announce that I started an AI research fellowship with Goodfire to explore how we can use AI interpretability to make LLMs more reliable and controllable.
Tom McGrath, Eric Ho, and Daniel Balsam have built something truly special in the Ember SDK, and I'm excited to show what I've been working on with them.
I’m incredibly excited to announce Goodfire Ember — the first hosted mechanistic interpretability API, with inference support for generative models like Llama 3.3 70B. This makes large-scale interpretability work accessible to the broader community and is already being used by partners like Rakuten, Haize Labs, and Apollo Research to improve model performance, increase security, and extract new understanding from models.
We think this is the start of building a set of tools to accelerate alignment research, as well as unlocking a new development paradigm that harnesses the latent intelligence already present inside models.
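For a feel of the workflow, here's a rough sketch of a feature-steering call through the SDK. The method names below (Client, features.search, Variant.set, chat.completions.create) are recalled from the launch docs and may not match the current API exactly, so treat this as illustrative rather than authoritative.

```python
# Illustrative only: method names approximate the Ember SDK at launch and
# may differ from the current API. Check the official docs before running.
import goodfire

client = goodfire.Client(api_key="YOUR_API_KEY")
variant = goodfire.Variant("meta-llama/Llama-3.3-70B-Instruct")

# Find interpretable features related to a concept, then nudge one of them.
features = client.features.search("formal, professional tone", model=variant, top_k=3)
variant.set(features[0], 0.5)  # positive values strengthen the feature

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Summarize our launch announcement."}],
    model=variant,
)
print(response)  # structure mirrors OpenAI-style chat completions
```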
Try it yourself: https://lnkd.in/eVc34XD2.
Read more about our launch: https://lnkd.in/erNbcTmG
X thread: https://lnkd.in/ecPu5Mrt
If you think aligning AGI is the most important problem in the world, we’re hiring at https://lnkd.in/gapupaYQ.
We're open-sourcing Sparse Autoencoders (SAEs) for Llama 3.3 70B and Llama 3.1 8B! These are the first open-source SAEs at this scale and capability level, which should help expand access to mechanistic interpretability research on increasingly capable models.
Announcement - https://lnkd.in/ecqCmBpX
SAEs on HuggingFace - https://lnkd.in/exWgen4B
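For anyone new to SAEs, the core computation is small. Below is a generic PyTorch sketch of the standard encode/decode step; the released checkpoints may use different parameter names, dimensions, and activation functions, so check the HuggingFace repo before adapting it.

```python
# Minimal sparse-autoencoder sketch (generic; the released checkpoints may
# differ in parameter names, dimensions, and activation function).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_features) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(d_features, d_model) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_features))
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Sparse, non-negative feature activations for each token position.
        return torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reconstruct the residual-stream activation from the features.
        return self.encode(x) @ self.W_dec + self.b_dec

# Toy dimensions; in practice d_model matches the model's hidden size
# (8192 for Llama 3.3 70B) and d_features is much larger.
sae = SparseAutoencoder(d_model=1024, d_features=8192)
acts = torch.randn(4, 1024)        # pretend residual-stream activations
features = sae.encode(acts)        # (4, 8192) sparse feature activations
reconstruction = sae(acts)         # (4, 1024) reconstructed activations
print(features.shape, reconstruction.shape)
```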
AI is hard to control and engineer. I wrote a post about how feature steering, an interpretability technique, can change this dynamic.
https://lnkd.in/grUp6szS
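As a toy illustration of the underlying mechanic (not the post's exact method): feature steering amounts to adding a scaled feature direction into a layer's activations during the forward pass. A hook-based sketch might look like the following, where the model, layer, and steering vector are all placeholders.

```python
# Toy sketch of activation steering with a forward hook (illustrative only;
# the model choice, layer, and steering vector are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in so the sketch runs anywhere
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

d_model = model.config.hidden_size
steer_vec = torch.randn(d_model)   # in practice: an SAE feature's decoder direction
steer_vec = steer_vec / steer_vec.norm()
strength = 4.0                     # how hard to push the feature

def add_feature(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + strength * steer_vec.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# Register the hook on a mid-depth transformer block's output.
handle = model.transformer.h[6].register_forward_hook(add_feature)

ids = tok("The weather today is", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=20, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```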
If you're interested in what we're building at Goodfire, we're hiring! https://lnkd.in/gapupaYQ
Incredibly excited to announce our research preview, which is now live. You can access it at https://lnkd.in/eDUXHftC, and read more about it on our blog (https://lnkd.in/eAK5VNEC).
In this preview, we've created a desktop interface that helps you understand and control Llama 3's behavior. You can see Llama 3's internal features (the building blocks of its responses) and precisely adjust them to create new Llama variants.
Check it out, and let us know what you think! Shoutout to the team at Goodfire - Myra Deng, Daniel Balsam, and Tom McGrath for the incredible work.
Y'arr mateys! In honor of International Talk Like a Pirate Day, we're releasing an on-theme sneak peek of our research preview. We show that feature steering may enable more persistent and robust modifications to a language model's behavior than traditional inference-time techniques like prompting.
Basically, the model can't stop talking like a pirate.
Shoutout to Myra Deng for the great video and the rest of the Goodfire team for the awesome work, and sign up for our waitlist here - https://lnkd.in/ezFJ4hKS.
How sure are you that you can tell when social media accounts are bots? What about as AI improves?
I've been slow to share about this on LinkedIn, but Nicholas Thompson's post is a nice occasion:
Introducing "personhood credentials"
In a new paper, co-authored with researchers from ~20 organizations and my OpenAI teammates Zoë Hitzig and David Schnurr, we ask: "What are AI-proof ways to tell who's real online?"
As AI-generated content becomes more realistic, photos and even videos of someone may not be enough to trust that an account isn't just a fake trying to scam you.
Current solutions won't be enough: we can no longer rely on AI lacking certain abilities, like typing in the letters of a CAPTCHA puzzle.
What we want is a way to access AI's transformative benefits - like helping to regenerate a person’s lost voice - without these abilities being leveraged for deception at scale. Further, people shouldn't have to give up privacy or inclusivity in the process.
To that end, we propose personhood credentials: a privacy-preserving tool that shows you're a person without revealing which one.
Importantly, these are backed by two things AI can't fake, no matter how good it gets: showing up in the real world, and secure cryptography.
Personhood credentials can be issued by a range of trusted entities, like governments or foundations; you enroll by showing you’re a real person who hasn’t yet gotten one. Then, you can validate this with websites without revealing your identity.
The core requirements are that these credentials must be limited (so people can’t get many and give them to AI) and highly private—ensuring anonymity and unlinkable activity, even if websites or issuers collude. People and sites then have an optional tool to show there’s a real person behind an account, without showing anything more.
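To make "anonymous but verifiable" concrete, here's a textbook RSA blind-signature sketch, one cryptographic building block such schemes can draw on. It is not the paper's protocol, and the tiny hardcoded key is insecure; it only illustrates how an issuer can sign a credential without later being able to link it back to the enrollment.

```python
# Toy RSA blind signature (textbook parameters, NOT secure, NOT the paper's
# protocol): the issuer signs a blinded value, so the unblinded credential
# can later be verified without the issuer linking it to the enrollment.
import hashlib
from math import gcd

# Tiny textbook RSA key pair held by the credential issuer.
p, q = 61, 53
n, e = p * q, 17
d = pow(e, -1, (p - 1) * (q - 1))

def h(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

# 1. Holder blinds a secret credential value with a random factor r.
credential = h(b"my-secret-credential-token")
r = 19
assert gcd(r, n) == 1
blinded = (credential * pow(r, e, n)) % n

# 2. Issuer signs the blinded value after an in-person check;
#    it never sees `credential` itself.
blind_sig = pow(blinded, d, n)

# 3. Holder unblinds the signature.
signature = (blind_sig * pow(r, -1, n)) % n

# 4. Any website can verify the signature against the issuer's public key.
assert pow(signature, e, n) == credential
print("credential verified without linking it to enrollment")
```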
In the paper, we discuss a number of factors that must be carefully managed to maximize the benefits of these systems—like equitable access and checks on power—as well as a range of recommendations for preparing the Internet for AI's impacts and for making personhood credentials a viable option.
I'll include the paper below; would be grateful for any feedback!
The most interesting thing in tech: a smart idea for proving that you are real in an age of AI, personhood credentials. Using cryptography, you can verify that you are a real human without revealing personal information you'd rather keep private. Like PGP, it's based on a system of public and private keys, and it's the best idea I've heard yet for solving a problem that is becoming ever more important.