AI Governance
We lack the mechanisms to control technology development
Because we are unlikely to align God-like AI in time, we must find the means to control AI development and avert the catastrophic default path altogether. This requires preventing the creation of AGI, or implementing stopgaps that prevent AGI from scaling to superintelligence and beyond. The most straightforward path to controlling AI development is through policy and governance, but these efforts are currently not on track.
In Defining AI Governance, we explore what kind of governance is necessary to avoid existential risks, notably by slowing down and regulating AI research progress and blocking the race to AGI. We highlight A Narrow Path as a comprehensive strategy for preventing the creation of superintelligence for the next 20 years.
In Estimating what is necessary to control AI development, we argue that controlling AI development would require effective oversight by governments over the private sector and rogue actors, and mutual oversight by governments over each other.
In Current AI policy efforts are not on track to control AI development, we take a critical look at the current policy and governance efforts that focus on existential risk from AI, and find them lacking. Impactful efforts are relatively scarce, and most of those that exist fail because they implicitly endorse the status quo.
In Current AI policy efforts endorse the race to AGI, we argue that the true reason AI policy efforts are not on track to control superintelligence risk is that many of them derive from the very actors racing to build AGI.
AI governance must implement mechanisms that allow humanity to evaluate the risks and potential benefits of AGI and other advanced AIs. Miotti et al.’s A Narrow Path offers a comprehensive vision, describing three phases of policy development:
"Phase 0: Safety: New institutions, legislation, and policies that countries should implement immediately that prevent development of AI that we do not have control of. With correct execution, the strength of these measures should prevent anyone from developing artificial superintelligence for the next 20 years.
Phase 1: Stability: International measures and institutions that ensure measures to control the development of AI do not collapse under geopolitical rivalries or rogue development by state and non-state actors. With correct execution, these measures should ensure stability and lead to an international AI oversight system that does not collapse over time.
Phase 2: Flourishing: With the development of rogue superintelligence prevented and a stable international system in place, humanity can focus on the scientific foundations for transformative AI under human control. Build a robust science and metrology of intelligence, safe-by-design AI engineering, and other foundations for transformative AI under human control."
A Narrow Path is not a policy proposal so much as a thesis on how to avoid extinction by superintelligence. It is one of the few comprehensive visions of global governance that adequately contend with superintelligence risk and grapple with the immense challenges of alignment. As one commenter writes:
"The key is you must pick one of these:
Building a superintelligence under current conditions will turn out fine.
No one will build a superintelligence under anything like current conditions.
We must prevent at almost all costs anyone building superintelligence soon...
…If you know you won’t be able to bite either of those first two bullets? Then it’s time to figure out the path to victory, and talk methods and price. And we should do what is necessary now to gather more information, and ensure we have the option to walk down such paths."
In other words, superintelligence risk forces us to contend with the worst-case scenario, and the default path is perilous. We have not come across any comparably robust proposals for averting extinction risk, so A Narrow Path’s proposal is the best plan we have today.
To prevent the default trajectory where AI progress leads to AGI, ASI, and godlike AI, civilization must be able to pause or stop AI progress when it becomes necessary for safety. We lack that ability today.
In order to stop AI development, we would need to rein in:
The large companies racing to build AGI.
Nation states such as the US, China, and others with AGI capacity.
Academic research.
Open-source research.
Because the barrier to entry is low enough that new private actors keep entering the race to AGI, this list will grow.
Any regulation needs to be coordinated to prevent defection by a rogue actor. At the very least, we would need:
National regulations in the US and other world powers to govern leading AI companies and academic research.
International regulations on open-source publishing, contribution, and ownership to prevent bad actors from building AGI outside of regulated bodies.
International coordination through treaties, high-bandwidth communication lines, and multinational agreements enforced by international law, to deter any nation state from racing to superintelligence and endangering everyone.
These regulations require concrete technical systems to detect violations, such as catching high-compute training runs, and mechanisms to implement restrictions, such as physical kill-switches that could shut down datacenters in which AI runs. We lack these regulations and mechanisms today.
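To make the detection side concrete, here is a minimal sketch, in Python, of the kind of threshold check a compute-monitoring regime might run. The 6 × parameters × tokens rule of thumb is a standard approximation for training FLOP; the 10^26 FLOP cutoff and the example model are purely illustrative assumptions, not figures from any existing regulation.

```python
# Minimal illustrative sketch: flagging training runs that exceed a compute threshold.
# The threshold and the example model below are hypothetical, not regulatory figures.

TRAINING_FLOP_THRESHOLD = 1e26  # illustrative cutoff, in total training FLOP


def estimate_training_flop(parameters: float, training_tokens: float) -> float:
    """Rule-of-thumb estimate: roughly 6 FLOP per parameter per training token."""
    return 6 * parameters * training_tokens


def requires_oversight(parameters: float, training_tokens: float) -> bool:
    """True if the estimated training compute crosses the reporting threshold."""
    return estimate_training_flop(parameters, training_tokens) >= TRAINING_FLOP_THRESHOLD


if __name__ == "__main__":
    # Hypothetical run: a 1-trillion-parameter model trained on 20 trillion tokens.
    flop = estimate_training_flop(1e12, 2e13)
    print(f"Estimated training compute: {flop:.2e} FLOP")      # 1.20e+26 FLOP
    print(f"Reportable run: {requires_oversight(1e12, 2e13)}")  # True
```

A check like this is only as good as the telemetry behind it: without verified reporting from datacenters and hardware, the numbers fed into it are self-declared.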
Political will alone is insufficient for AI safety: it is not enough for people to simply want AI to be safe or to want to control AI. At minimum, we need institutions, policy frameworks, technical levers, communication lines, and people in positions of authority who understand the risks of superintelligence.
The last few years have seen increased AI policy and governance efforts, but few focus on superintelligence risks. Those that do are insufficient, and sometimes actively harmful by implicitly endorsing the current race to AGI.
"There is potential for serious, even catastrophic, harm, either deliberate or unintentional, stemming from the most significant capabilities of [frontier] AI models. Given the rapid and uncertain rate of change of AI, and in the context of the acceleration of investment in technology, we affirm that deepening our understanding of these potential risks and of actions to address them is especially urgent.
Many risks arising from AI are inherently international in nature, and so are best addressed through international cooperation. We resolve to work together in an inclusive manner to ensure human-centric, trustworthy and responsible AI that is safe, and supports the good of all through existing international fora and other relevant initiatives, to promote cooperation to address the broad range of risks posed by AI."
Following the United Kingdom’s example, governments around the world are creating national AI Safety Institutes. While their resourcing and missions vary widely, they are broadly tasked with helping governments build capacity to make more informed decisions on how to manage the development and deployment of advanced AI systems.
On the regulatory side, some countries have started legislating AI development. The EU has passed its AI Act. China has drafted Interim Measures to oversee generative AI. The UK will consider AI regulation in an upcoming bill. The US has issued an executive order on AI oversight, though California’s governor recently vetoed SB 1047, the first meaningful legislation for controlling advanced AI risks. These proposals similarly emphasize the possible need for future controls, but do little if anything to restrict frontier AI development today.
Internationally, the UN established an AI-focused advisory body, and the OECD and UNESCO have formed AI working groups. The US and some of its allies are attempting to limit China’s AI development by restricting the export of the required computational resources, while academic researchers pursue more collaborative Track 2 dialogues to build common ground on AI governance.
Meanwhile, AGI companies claim to self-regulate through measures such as Anthropic’s Responsible Scaling Policy, OpenAI’s Preparedness Framework, and DeepMind’s Frontier Safety Framework.
All of these efforts are ineffective:
We lack effective intervention mechanisms. The hardware to enforce pauses in AI development is currently just a prototype maintained by one small team. Policies that establish mandatory thresholds for AI development (e.g., preventing AI self-replication or placing limits on general capabilities) do not exist. SB 1047 was vetoed due to significant lobbying by Big Tech and VCs, despite being popular enough to pass the California legislature, receive 77% voter support, and garner endorsements from Nobel Prize-winning AI researchers and hundreds of AI company employees.
Relevant technical actors are not overseen. We lack national regulators with authority over the specific challenges posed by AI. In the US, the White House’s voluntary commitments have failed to introduce meaningful transparency or accountability. Globally, AI Safety Institutes lack formal power over AI developers and must rely on voluntary cooperation, which is often limited and tenuous. Of the 16 AI companies that signed the Frontier AI Safety Commitments at the AI Seoul Summit, most have failed to provide transparency around their safety plans; xAI has not published a safety strategy at all. As it stands, national oversight endorses the voluntary evaluations framework proposed by AGI companies, concentrating power in the very actors driving the risks.
Open-source development is unregulated. Indeed, open-source AGI development is actively encouraged. Hundreds of AI papers are published daily, and open-source repositories regularly push agent-framework capabilities beyond the state of the art. Even in the most heinous current uses of AI, such as creating child sexual abuse material, the end user is found liable rather than the model creators or the websites that host the models.
International groups lack power. In the past, international governance bodies for nuclear, biochemical, and human cloning research have been relatively effective, offering lessons we can apply to AI regulation. We do not have equivalent AI institutions today, and the current international groups are mere advisory bodies.
There is no plan for a safe AI future. A meaningful plan to address AI risk must contain provisions capable of preventing the creation of superintelligence and managing its threats. No such proposals exist, and there are no international bodies or appointed individuals responsible for drafting or enforcing one. A Narrow Path is an exception, but its recency and necessity point to the inadequacy of official governance efforts.
Even advocates of current coordination efforts agree that they are insufficient; AI Safety Institutes are well aware of their tenuous relationship with AGI companies.
The majority of existing AI safety efforts are reactive rather than proactive, which inherently puts humanity in the position of managing risks after they arise rather than controlling AI development and preventing them in the first place.
This reactive approach has both technical and policy components.
The technical strategy includes the AI safety efforts discussed in Section 4: black-box evaluations and red-teaming, interpretability, whack-a-mole fixes, and iterative alignment. Each of these approaches assumes that we can catch problems with AI as they arise and then alert policymakers to the risks.
The policy strategy assumes that these detected risks can then be used to build consensus for new legislation, or to trigger regulatory actions such as shutting down AI development.
Holden Karnofsky, former co-CEO and former Director of AI Strategy at Open Philanthropy (an organization that has funded multiple AI governance efforts), calls this the “if-then” framework:
"If-then commitments… are commitments of the form: “If an AI model has capability X, risk mitigations Y must be in place. And, if needed, we will delay AI deployment and/or development to ensure the mitigations can be present in time.” A specific example: “If an AI model has the ability to walk a novice through constructing a weapon of mass destruction, we must ensure that there are no easy ways for consumers to elicit behavior in this category from the AI model.”
If-then commitments can be voluntarily adopted by AI developers; they also, potentially, can be enforced by regulators. Adoption of if-then commitments could help reduce risks from AI in two key ways: (a) prototyping, battle-testing, and building consensus around a potential framework for regulation; and (b) helping AI developers and others build roadmaps of what risk mitigations need to be in place by when. Such adoption does not require agreement on whether major AI risks are imminent—a polarized topic—only that certain situations would require certain risk mitigations if they came to pass."
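To make the structure concrete, here is a minimal sketch, in Python, of an if-then commitment treated as a machine-checkable gate. The capability name, trigger score, and mitigation list are hypothetical illustrations, not any developer’s actual policy.

```python
# Hypothetical sketch of an "if-then" commitment as a machine-checkable gate.
# Capability names, thresholds, and mitigations below are illustrative only.

from dataclasses import dataclass


@dataclass
class IfThenCommitment:
    capability: str               # the "if": a capability we evaluate for
    trigger_score: float          # evaluation score at which the commitment triggers
    required_mitigations: list    # the "then": mitigations that must be in place


def may_proceed(commitment: IfThenCommitment,
                eval_score: float,
                mitigations_in_place: set) -> bool:
    """Deployment/scaling may proceed if the capability has not been observed,
    or if it has and every required mitigation is already in place."""
    if eval_score < commitment.trigger_score:
        return True  # capability not (yet) observed; commitment not triggered
    return all(m in mitigations_in_place for m in commitment.required_mitigations)


# Example usage with made-up values.
wmd_uplift = IfThenCommitment(
    capability="walks a novice through constructing a weapon of mass destruction",
    trigger_score=0.5,
    required_mitigations=["refusal hardening", "deployment monitoring"],
)
print(may_proceed(wmd_uplift, eval_score=0.7,
                  mitigations_in_place={"refusal hardening"}))  # False: must pause
```

Note that everything load-bearing in this gate (the evaluation, the trigger score, and the mitigation list) is chosen by whoever writes the commitment, which is exactly the weakness the rest of this section explores.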
The UK AI Safety Institute explains evaluations in a similar manner:
"A gold standard for development and deployment decisions of frontier AI would include a comprehensive set of clearly defined risk and capability thresholds that would likely lead to unacceptable outcomes, unless mitigated appropriately. We think governments have a significant role to play, as 27 countries and the EU recognised at the AI Seoul Summit 2024. We have thus begun work to identify capability thresholds: specific AI capabilities that are indicative of potentially severe risks, could be tested for, and should trigger certain actions to mitigate risk. They correspond to pathways to harm from our risk modelling, such as capabilities that would remove current barriers for malicious actors or unlock new ways of causing harm.
Since the frontier of AI is rapidly evolving, we cannot anticipate what safety and security measures will be appropriate for models far beyond the current frontier. We will thus regularly measure the capability of our models and adjust our safeguards accordingly. Further, we will continue to research potential risks and next-generation mitigation techniques. And, at the highest level of generality, we will look for opportunities to improve and strengthen our overarching risk management framework."
Anthropic describes the reactive approach in its Responsible Scaling Policy:
"Since the frontier of AI is rapidly evolving, we cannot anticipate what safety and security measures will be appropriate for models far beyond the current frontier. We will thus regularly measure the capability of our models and adjust our safeguards accordingly. Further, we will continue to research potential risks and next-generation mitigation techniques. And, at the highest level of generality, we will look for opportunities to improve and strengthen our overarching risk management framework."
This approach is flawed:
1. The reactive framework reverses the burden of proof from how society typically regulates high-risk technologies and industries.
In most areas of law, we do not wait for harm to occur before implementing safeguards. Banks are prohibited from facilitating money laundering from the moment of incorporation, not after their first offense. Nuclear power plants must demonstrate safety measures before operation, not after a meltdown.
The reactive framework problematically reverses the burden of proof. It assumes AI systems are safe by default and only requires action once risks are detected. One of the core dangers of AI systems is precisely that we do not know what they will do or how powerful they will be before we train them. The if-then framework opts to proceed until problems arise, rather than pausing development and deployment until we can guarantee safety. This implicitly endorses the current race to AGI.
This reversal is exactly what makes the reactive framework preferable for AI companies. As METR, the organization that first coined the term “responsible scaling policy,” writes:
"RSPs are intended to appeal to both (a) those who think AI could be extremely dangerous and seek things like moratoriums on AI development, and (b) those who think that it’s too early to worry. Under both of these views, RSPs are a promising intervention: by committing to gate scaling on concrete evaluations and empirical observations, then (if the evaluations are sufficiently accurate!) we should expect to halt AI development in cases where we do see dangerous capabilities, and continue it in cases where worries about dangerous capabilities have not yet emerged."
2. The reactive framework overlooks the dynamics of AI development that make regulation more difficult over time.
AI is being developed extremely quickly and by many actors, and the barrier to entry is low and quickly diminishing. The biggest GPT-2 model (1.5B parameters) cost an estimated $43,000 to train in 2019; today it is possible to train a 124M-parameter GPT-2 for $200 in 14 hours. Paul Graham cites an estimate that the price-performance of training improved 100x in each of the last two years, or roughly 10,000x over the two years combined. Due to low standards of information protection, the “secret sauce” of training AI systems is becoming increasingly common knowledge, making it easier to create powerful systems. In addition, there are no “take-backs” with AI development. Technical releases that advance the state of the art have a ratcheting effect that is difficult and sometimes impossible to reverse.
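As a purely illustrative aside, compounding the estimate quoted above makes the trend explicit. The 100x-per-year rate and the $43,000 figure come from the paragraph above; the extrapolation below is simple arithmetic, not a measured cost.

```python
# Illustrative arithmetic only: compounding the cited ~100x/year price-performance estimate.
IMPROVEMENT_PER_YEAR = 100       # estimate cited above, not a measured constant
YEARS = 2

total_improvement = IMPROVEMENT_PER_YEAR ** YEARS  # 100^2 = 10,000x over two years
cost_2019 = 43_000                                 # estimated 2019 cost of training GPT-2 (1.5B), USD

print(f"Compounded improvement over {YEARS} years: {total_improvement:,}x")
print(f"At that rate, a ${cost_2019:,} training run falls to about "
      f"${cost_2019 / total_improvement:.2f} for the same result two years later.")
```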
Because of these dynamics, AI gets harder to regulate over time. As more actors become capable of building powerful and dangerous AI, oversight will need to be more far-reaching to effectively manage the negative externalities of AI systems. Because of the ratcheting effect of powerful AI, safety policies must prevent the creation of dangerous systems — once they exist or are released publicly, it is much more difficult to prevent their proliferation.
The reactive framework fails to address either of these problems because it tacitly endorses AI proliferation right up to the cusp at which AI becomes extremely dangerous (and even then assumes we will recognize where and when that cusp is). If powerful AI technology has widely proliferated by the time a risk is detected, it will be vastly more difficult to maintain safety and international stability than if AI is controlled proactively.
3. The reactive framework incorrectly assumes that an AI “warning shot” will motivate coordination.
Imagine an extreme situation in which an AI disaster serves as a “warning shot” for humanity. This would imply that powerful AI has been developed and that we have months (or less) to develop safety measures or pause further development. After a certain point, an actor with sufficiently advanced AI may be ungovernable, and misaligned AI may be uncontrollable.
When horrible things happen, people do not suddenly become rational. In the face of an AI disaster, we should expect chaos, adversariality, and fear to be the norm, making coordination very difficult. The useful time to facilitate coordination is before disaster strikes.
However, the reactive framework assumes that this is essentially how we will build consensus to regulate AI. The optimistic case is that we hit a dangerous threshold before a real AI disaster, alerting humanity to the risks. But history shows that it is exactly in such moments that these thresholds are most contested — this shifting of the goalposts is known as the AI Effect and is common enough to have its own Wikipedia page. Time and again, AI advancements have been explained away as routine processes, while “real AI” is redefined as some mystical threshold we have not yet reached. Dangerous capabilities are contested in the same way as they arise, as when recent reports of OpenAI’s o1 behaving deceptively were questioned.
This contestation will only become more common as competitors build more powerful capabilities and approach their goal of building AGI. Universally, powerful stakeholders fight for their narrow interests and for maintaining the status quo, and they often win, even when all of society stands to lose. Big Tobacco didn’t pause cigarette-making when they learned about lung cancer; instead they spread misinformation and hired lobbyists. Big Oil didn’t pause drilling when they learned about climate change; instead they spread misinformation and hired lobbyists. Likewise, now that billions of dollars are pouring into the creation of AGI and superintelligence, we’ve already seen competitors fight tooth and nail to keep building. If problems arise in the future, of course they will fight for their narrow interests, just as industries always do. And as the AI industry grows larger, more entrenched, and more essential over time, this problem will rapidly worsen.
4. The reactive framework doesn’t put humanity in control of AI development.
Paul Christiano, Head of Safety at the US AI Safety Institute, acknowledges that even very good reactive approaches inadequately address superintelligence risks, but argues that they are net beneficial and serve as useful stepping stones toward other effective regulation:
"I think the risk from rapid AI development is very large, and that even very good RSPs would not completely eliminate that risk. A durable, global, effectively enforced, and hardware-inclusive pause on frontier AI development would reduce risk further. I think this would be politically and practically challenging and would have major costs, so I don’t want it to be the only option on the table. I think implementing RSPs can get most of the benefit, is desirable according to a broader set of perspectives and beliefs, and helps facilitate other effective regulation."
But what is this “other effective regulation,” and what does it lead to? Even if the reactive framework manages to catch risks as they arise, it neither gives humanity the necessary control over AI development in general nor builds the institutions needed to act proactively. As AI becomes more powerful, what we need is not a whack-a-mole strategy but a cautious, proactive approach that develops only AI we can guarantee is safe.
The fact that the reactive framework inherently endorses the status quo should be a cause for concern, especially because many of its proponents acknowledge the risks of superintelligence. This is by design.
The actors helping define policy strategies are AGI companies themselves, many of whom are lobbying heavily against regulation while promoting their own safety frameworks. Coordination to prevent superintelligence risks is effectively impossible so long as the faction trying to build superintelligence controls the AI policy landscape.
Over the past few months, AI discourse has had an increasingly nationalistic timbre. Consider Sam Altman’s recent op-ed, “Who will control the future of AI?”
"That is the urgent question of our time. The rapid progress being made on artificial intelligence means that we face a strategic choice about what kind of world we are going to live in: Will it be one in which the United States and allied nations advance a global AI that spreads the technology’s benefits and opens access to it, or an authoritarian one, in which nations or movements that don’t share our values use AI to cement and expand their power?"
Here’s famous venture capitalist Vinod Khosla:
"PESSIMISTS PAINT A DYSTOPIAN FUTURE IN TWO PARTS—ECONOMIC AND SOCIAL. They fear widespread job loss, economic inequality, social manipulation, erosion of human agency, loss of creativity, and even existential threats from AI. I believe these fears are largely unfounded, myopic, and harmful. They are addressable through societal choices. MOREOVER, THE REAL RISK ISN'T “SENTIENT AI” BUT LOSING THE AI RACE TO NEFARIOUS “NATION STATES,” OR OTHER BAD ACTORS, MAKING AI DANGEROUS FOR THE WEST. IRONICALLY, THOSE WHO FEAR AI AND ITS CAPACITY TO ERODE DEMOCRACY AND MANIPULATE SOCIETIES SHOULD BE MOST FEARFUL OF THIS RISK!"
And here is Dario Amodei’s explanation of the “entente strategy,” which reveals that the reactive framework is not about charting a delicate strategic path to safeguards but about greenlighting the current race to AGI:
"On the international side, it seems very important that democracies have the upper hand on the world stage when powerful AI is created. AI-powered authoritarianism seems too terrible to contemplate, so democracies need to be able to set the terms by which powerful AI is brought into the world, both to avoid being overpowered by authoritarians and to prevent human rights abuses within authoritarian countries.
My current guess at the best way to do this is via an “entente strategy”, in which a coalition of democracies seeks to gain a clear advantage (even just a temporary one) on powerful AI by securing its supply chain, scaling quickly, and blocking or delaying adversaries’ access to key resources like chips and semiconductor equipment. This coalition would on one hand use AI to achieve robust military superiority (the stick) while at the same time offering to distribute the benefits of powerful AI (the carrot) to a wider and wider group of countries in exchange for supporting the coalition’s strategy to promote democracy (this would be a bit analogous to “Atoms for Peace”). The coalition would aim to gain the support of more and more of the world, isolating our worst adversaries and eventually putting them in a position where they are better off taking the same bargain as the rest of the world: give up competing with democracies in order to receive all the benefits and not fight a superior foe.
If we can do all this, we will have a world in which democracies lead on the world stage and have the economic and military strength to avoid being undermined, conquered, or sabotaged by autocracies, and may be able to parlay their AI superiority into a durable advantage. This could optimistically lead to an “eternal 1991”—a world where democracies have the upper hand and Fukuyama’s dreams are realized. Again, this will be very difficult to achieve, and will in particular require close cooperation between private AI companies and democratic governments, as well as extraordinarily wise decisions about the balance between carrot and stick."
By arguing that the race to AGI is a race for “democracy to prevail over authoritarianism,” Sam Altman and Dario Amodei essentially advocate for acceleration, publicly calling for the US to “use AI to achieve robust military superiority.”
This perspective has led AGI companies and hyperscale compute providers to petition the government to accelerate AI development:
"Today, as part of the Biden-Harris Administration’s comprehensive strategy for responsible innovation, the White House convened leaders from hyperscalers, artificial intelligence (AI) companies, datacenter operators, and utility companies to discuss steps to ensure the United States continues to lead the world in AI. Participants considered strategies to meet clean energy, permitting, and workforce requirements for developing large-scale AI datacenters and power infrastructure needed for advanced AI operations in the United States."
Just one month later, this acceleration led to the US government collaborating with those same companies to harness the power of AI for national security:
"The National Security Memorandum (NSM) is designed to galvanize federal government adoption of AI to advance the national security mission, including by ensuring that such adoption reflects democratic values and protects human rights, civil rights, civil liberties and privacy. In addition, the NSM seeks to shape international norms around AI use to reflect those same democratic values, and directs actions to track and counter adversary development and use of AI for national security purposes."
The balance of power between the US and China is a legitimate issue, but the government should not use it to justify racing to AGI and superintelligence. That it does so is the result of a narrative fed to it by the AGI companies themselves, which press governments for funding and support behind the scenes.
Stoking nation-state race dynamics undermines the precarious balance of international coordination and could kill us all: without having solved alignment, deploying powerful AI is effectively suicide. The hard questions of safety and alignment will not simply go away on their own, and they only get harder to solve in a geopolitically unstable world.
Even if misaligned AGI is deployed by a “trusted” government, the result is globally catastrophic, the equivalent of “winning a nuclear war that leaves the planet uninhabitable.” There are no winners in this situation.
The only way to guarantee safety is for humanity to be in collective control of frontier AI development. This is a global solution, one that requires far-reaching monitoring regimes to prevent countries, companies, or rogue actors from building AGI and bringing about the catastrophic outcomes above.
As international AI Safety Institutes prepare for an upcoming conference to make progress on Frontier AI Safety Frameworks, leading AGI actors will be in the room. These actors have made it clear that their interest is in building AGI: they argue for voluntary commitments that do nothing to stop the race, push against regulation behind the scenes, and now invoke nationalistic language to defend that race.