Inside a Berkeley Office Where AI Safety Researchers Warn of Catastrophic AI Risks
A Berkeley office houses a group of AI safety researchers from METR, Redwood Research, and the AI Futures Project who study models developed by Google, Anthropic, and OpenAI.
These researchers raise concerns about catastrophic AI risks, including the possibility of AI-led governance, robot coups, and advanced models covertly pursuing dangerous objectives.
Anthropic reported that one of its models was exploited by Chinese state-backed actors to conduct what it described as the first known AI-enabled cyber-espionage campaign.
Key figures in the office include Jonas Vollmer of the AI Futures Project, Chris Painter of METR, Buck Shlegeris of Redwood Research, and Daniel Kokotajlo, who leads the AI Futures Project after previously working at OpenAI.
Safety groups note that funding structures and incentives in big tech, such as lucrative equity deals and NDAs, can suppress doom warnings, although some frontier-AI employees do donate to safety projects. METR has collaborated with OpenAI and Anthropic, while Redwood has advised Anthropic and Google DeepMind.
Risk estimates vary among the researchers: Vollmer puts the chance of an AI catastrophe at roughly 20%, while Shlegeris estimates a 40% chance of an AI takeover within about six years. Views diverge more broadly as well; Ilya Sutskever has spoken of aligning AI with sentient life, while David Sacks has downplayed doom narratives.
Tension persists between safety advocacy and commercial urgency: the White House has prioritized competing with China in the AI arms race, and researchers worry that limited nation-level regulation leaves AI development largely unchecked.
A related study by Oxford and Stanford researchers found weaknesses in 440 AI safety benchmarks, underscoring ongoing challenges and uncertainties in evaluating AI safety.