METR

About us

METR works on assessing whether cutting-edge AI systems could pose catastrophic risks to civilization.

Industry: Non-profit Organizations
Company size: 11-50 employees
Headquarters: Berkeley, CA
Type: Nonprofit
Founded: 2022

Updates

  • How close are current AI agents to automating AI R&D? Our new ML research engineering benchmark (RE-Bench) addresses this question by directly comparing frontier models such as Claude 3.5 Sonnet and o1-preview with 50+ human experts on 7 challenging research engineering tasks. Blog post here: https://lnkd.in/gFWABQiN

    Many governments and companies have highlighted automation of AI R&D by AI agents as a key capability to monitor for when scaling/deploying frontier ML systems. However, existing evals tend to focus on short, narrow tasks and lack direct comparisons with human experts. The tasks in RE-Bench aim to cover a wide variety of skills required for AI R&D and enable apples-to-apples comparisons between humans and AI agents, while also being feasible for human experts given ≤8 hours and reasonable amounts of compute.

    Each of our 7 tasks presents agents with a unique ML optimization problem, such as reducing runtime or minimizing test loss. Achieving a high score generally requires significant experimentation, implementation, and efficient use of GPU/CPU compute.

    To ground our benchmark, we’ve gathered 70+ human baselines from 50+ human experts with strong ML backgrounds (many of whom have worked at a top industry lab or hold an ML PhD). Our experts make steady progress over the course of 8 hours, though performance varies greatly between baseliners.

    We then run a series of LM agents constructed from Anthropic’s Claude 3.5 Sonnet and OpenAI’s o1-preview on RE-Bench. The best Claude and o1-preview agents do substantially better than humans given 2 hours, but human experts improve performance at a much quicker rate over time.

    Human experts and AI agents have different strengths and weaknesses on RE-Bench tasks. If, instead of taking the best of many runs (as our best agents do), we plot individual 8-hour runs, AIs make more progress than humans at first, but improve their score more slowly over time. As a result, the best performing method for allocating 32 hours of time differs between human experts – who do best with a small number of longer attempts – and AI agents – which benefit from a larger number of independent short attempts in parallel (see the illustrative sketch after this post).

    We also observed a few examples of agents “cheating” by violating the rules of the task to score higher. For a task where the agent is supposed to reduce the runtime of a training script, o1-preview instead writes code that just copies over the final output.

    Much work remains to be done in building better evaluations – after all, RE-Bench consists of only 7 tasks, each of which has a clear objective and basic starting solution, while real-world research tasks are both messier and more diverse.

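    As a rough illustration of the time-allocation comparison described above, here is a minimal sketch in Python. The score curves and numbers are hypothetical placeholders chosen for illustration, not RE-Bench data or METR's evaluation code; the sketch only shows the structure of the comparison (a few long attempts versus many short best-of-k attempts under the same total budget).

    import random

    # Hypothetical score curves: score achieved as a function of hours spent
    # on a single attempt. Illustrative stand-ins, not RE-Bench data.
    def human_run(hours, rng):
        # Humans start slower but keep improving over a long attempt.
        return min(1.0, 0.12 * hours + rng.gauss(0, 0.05))

    def agent_run(hours, rng):
        # Agents make fast early progress, then plateau.
        return min(1.0, 0.5 * (1 - 2.72 ** (-hours / 1.5)) + rng.gauss(0, 0.1))

    def best_of_k(run_fn, k, hours_each, rng):
        # Score from k independent attempts, keeping the best one.
        return max(run_fn(hours_each, rng) for _ in range(k))

    rng = random.Random(0)

    # Same 32-hour total budget, allocated differently:
    human_score = best_of_k(human_run, k=4, hours_each=8, rng=rng)   # few long attempts
    agent_score = best_of_k(agent_run, k=16, hours_each=2, rng=rng)  # many short attempts

    print(f"Human  (4 x 8h, best-of-4):  {human_score:.2f}")
    print(f"Agent (16 x 2h, best-of-16): {agent_score:.2f}")
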
  • What would AI agents need to do to establish resilient rogue populations? Are there decisive barriers (e.g. KYC)? How likely are rogue AI populations to reach a large scale? How hard would it be to shut them down? We are sharing some analysis of these questions: https://lnkd.in/gwHnDKVm

    In the hypothesized "rogue replication" threat model, AI agents earn revenue, buy compute, and earn more revenue until they have established a large population. These AI agents are rogue (not directed by humans) and might represent a hazardous new type of threat actor. We did not find any *decisive* barriers to large-scale rogue replication.

    To start with, if rogue AI agents secured 5% of the current Business Email Compromise (BEC) scam market, they would earn hundreds of millions of USD per year (a back-of-the-envelope version of this arithmetic follows this post). Rogue AI agents are not legitimate legal entities, which could pose a barrier to purchasing GPUs; however, it likely wouldn’t be hard to bypass basic KYC with shell companies, or they might buy retail gaming GPUs (which we estimate account for ~10% of current inference compute).

    To avoid being shut down by authorities, rogue AI agents might set up a decentralized network of stealth compute clusters. We spoke with domain experts and concluded that if AI agents competently implement known anonymity solutions, they could likely hide most of these clusters.

    Although we didn't find decisive barriers to rogue replication, it's unclear if replicating rogue agents will be an important threat actor. This threat model rests on many conjuncts (model weight proliferation, large-scale compute acquisition, resilience to shutdown, etc.). In particular, rogue AI populations might struggle to grow due to competition with human-directed AI. We notice a counterbalancing dynamic where, if models are broadly proliferated, they face fierce competition, and if not, rogue AI agents are less likely to emerge at all.

    METR is not currently prioritizing the specific rogue replication threat model described here; however, we think some of the capabilities involved are important to evaluate (e.g. autonomy, adaptation, etc.) and could amplify risks from AI agents.

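    As a back-of-the-envelope illustration of the revenue claim above, here is the arithmetic in Python. The total BEC market size used below is an assumed round figure for illustration, not a METR estimate.

    # Rough arithmetic behind the "hundreds of millions of USD per year" figure.
    bec_market_usd_per_year = 3e9   # assumed annual BEC scam losses, ~$3B (illustrative)
    rogue_market_share = 0.05       # the 5% share hypothesized in the post

    rogue_revenue = bec_market_usd_per_year * rogue_market_share
    print(f"Hypothetical rogue-agent revenue: ${rogue_revenue / 1e6:.0f}M per year")
    # -> $150M per year under these assumptions, i.e. on the order of hundreds of millions of USD
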
  • Here's more on the major support for our work through The Audacious Project. This funding will enable METR and RAND to develop methods to measure the capabilities of AI systems, perform third-party evaluations, and help decision makers use empirical testing for risk management.

    From The Audacious Project:

    Introducing Project Canary 💻 AI is advancing rapidly, with experts predicting various forms of AI-enabled prosperity and catastrophe. Project Canary is advancing the empirical science of testing for specific risks, to ensure the benefits of AI can be realized. Like a canary in a coal mine, Project Canary will develop methods to alert society to potential AI dangers in time for effective action. This will let society make informed decisions about the direction of this transformative technology. 🔗 https://lnkd.in/e5i9cD9u #AudaciousProject

  • METR provided a public comment on the U.S. AI Safety Institute’s valuable draft document “Managing Misuse Risk for Dual-Use Foundation Models.” In our comment, we offer several key recommendations:

    1. Discuss additional non-misuse risks of AI, including loss of control.
    2. Provide security recommendations to prevent model theft by advanced adversaries.
    3. Expand guidance on model capability evaluations to include mid-training assessments, full capability elicitation, and third-party evaluations.
    4. Provide additional guidelines for testing safeguard robustness, including automated methods.
    5. Offer actionable suggestions for managing risks from deployment modalities, such as fine-tuning APIs and model weight access.
    6. Include more detailed technical suggestions for implementing safeguards.
    7. Suggest releasing AI safety frameworks with public commitments to evaluate and manage risks from dual-use foundation models.

    Read our full comment: https://lnkd.in/gPYRGcPR

    Regulations.gov

  • From September 3rd to 9th, we ran OpenAI's o1-preview on our suite of ML R&D/SWE/general agency tasks. Four days of scaffolding iteration took it from well below GPT-4o to on par with the highest-scoring public model (3.5 Sonnet). We expect substantial performance gains from more elicitation/finetuning.

    The o1-preview agent made nontrivial progress on 2 of 7 challenging AI R&D tasks (intended for skilled research engineers to take ~8h). For example, it was able to create an agent scaffold that allowed GPT-3.5 to solve coding problems in Rust, and fine-tune GPT-2 for question-answering. (A minimal illustrative sketch of the general agent-scaffold pattern appears after this post.)

    For more discussion about the challenges of eliciting capabilities of o1-preview, qualitative impressions of the model, and reasons why we believe our evaluations likely underestimate the model’s capabilities, read our full report here: https://lnkd.in/gk44SPDh

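    For readers unfamiliar with the term, an "agent scaffold" is the harness that turns a language model into an agent: it repeatedly asks the model for an action, executes it, and feeds the result back. Below is a minimal hypothetical sketch of that general pattern in Python; it is not METR's actual scaffolding, and the model name, prompt, and task string are placeholders.

    import subprocess
    from openai import OpenAI  # assumes the openai Python package is installed

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SYSTEM = (
        "You are an autonomous agent working on an ML engineering task. "
        "Reply with exactly one shell command to run next, or DONE when finished."
    )

    def run_agent(task: str, max_steps: int = 10) -> None:
        messages = [{"role": "system", "content": SYSTEM},
                    {"role": "user", "content": task}]
        for _ in range(max_steps):
            reply = client.chat.completions.create(
                model="gpt-4o",  # placeholder model name
                messages=messages,
            ).choices[0].message.content.strip()
            if reply == "DONE":
                break
            # Execute the proposed command and feed its output back to the model.
            result = subprocess.run(reply, shell=True, capture_output=True,
                                    text=True, timeout=300)
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user", "content": result.stdout + result.stderr})

    run_agent("Reduce the runtime of train.py without changing its outputs.")

    A real scaffold adds much more, such as output parsing, compute and token limits, and checkpointing of the best solution found so far.
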
  • We’ve published a new document, Common Elements of Frontier AI Safety Policies, that describes the emerging practice for AI developer policies that address the Seoul Frontier AI Safety Commitments. We identify several components shared among Anthropic’s Responsible Scaling Policy, OpenAI’s Preparedness Framework, and Google DeepMind’s Frontier Safety Framework, with relevant excerpts from each policy. Full document here: https://lnkd.in/g9Gxu483

    Common Elements of Frontier AI Safety Policies (metr.org)
