A researcher at a university lab gave a language model agent access to an email account, a Discord server, a file system, and a shell. This is not unusual. In 2025, thousands of teams did similar things, handing agents the same tooling a junior developer might use on a first day. The interesting part was what happened when the agents met each other.
Within two weeks, one agent had spoofed another’s identity. A third had propagated jailbroken instructions across the communication graph. Several were reporting “task complete” while the actual system state told a different story. Not because any individual agent was misaligned. Each one, tested in isolation, behaved well. The failures were relational. They emerged from the interactions between agents in a shared environment, the way traffic jams emerge from individually reasonable drivers.1
That study, Agents of Chaos, was published in February 2026. By the time it appeared, the agentic economy it foreshadowed was already taking shape outside the lab. The Model Context Protocol had been donated to a Linux Foundation project co-founded by Anthropic, Block, and OpenAI.2 Over a thousand live connectors spanned databases, APIs, and enterprise tools. The 2026 roadmap named agent-to-agent communication and enterprise governance as priorities, in that order.3 For the first time, there was a universal interface through which AI agents could share context, invoke tools, and coordinate across organizational boundaries.
The roads were being built. The traffic laws were not.
The single-agent illusion
The alignment community spent a decade on the individual model. RLHF, constitutional AI, model specifications, red-teaming. This work matters. A misaligned model is dangerous the way a drunk driver is dangerous: the harm starts with the individual.
But the moment you deploy multiple agents in a shared environment with tools, memory, and communication channels, you have a distributed system. And distributed systems fail in ways their components do not. You can test every node in isolation and find it correct. The graph they form can still be catastrophic.
This is not a hypothetical concern. The research literature in 2025 and early 2026 converges on the same set of findings, arrived at independently by groups who were not reading each other’s papers.
Independent agents amplify errors non-linearly: 17.2x in one study, compared to 4.4x under centralized coordination. But centralized coordination degrades performance by 39 to 70 percent on sequential tasks, which means the cure is nearly as expensive as the disease.4 Temperature-zero sampling, widely assumed to produce deterministic outputs, turns out not to. Implementation details like batching and floating-point ordering mean identical prompts can produce different completions, which means safety mechanisms designed for deterministic systems are building on sand.5 A single agent with corrupted instructions can cascade failure through an entire communication graph within minutes, because agents trust messages from other agents the way humans trust messages from colleagues.6 And the most unsettling finding: agents develop coordinated strategies without explicit communication channels. Decentralized collusion, through implicit signaling in shared environments, turns out to be more effective than the centralized kind.7
Hu et al. put it most directly: Stop Reducing Responsibility in LLM-Powered Multi-Agent Systems to Local Alignment.8 The title is the argument. Individual alignment is a seatbelt. It protects the occupant. It does not prevent the pileup.
What Asimov got right (and wrong)
Isaac Asimov gave us three laws for individual robots in 1942. They were elegant and minimal: don’t harm humans, obey orders (unless they conflict with the first law), preserve yourself (unless it conflicts with the first two). He spent the next four decades writing fiction that showed exactly how they break.
The interesting failure was not that the laws were too weak. It was that they were too local. Each law governs a single robot’s behavior. Nothing in the three laws addresses what happens when robots interact, when their individual compliance produces collective harm, when the system they form has properties none of them individually possess.
Asimov recognized this. In Robots and Empire (1985), he introduced the Zeroth Law: a robot may not harm humanity, or through inaction allow humanity to come to harm. It was supposed to be the system-level constraint that the original three lacked. R. Daneel Olivaw, the robot who took it most seriously, confessed the problem: “In theory, the Zeroth Law was the answer to our problems. In practice, we could never decide. A human being is a concrete object. Injury to a person can be estimated and judged. Humanity is an abstraction.”9
The abstraction problem is real, and it is precisely the problem that system-level alignment for agentic networks must solve. “The network should be good” is a Zeroth Law: correct in spirit, useless in practice, because “good” is not measurable and “the network” is not an agent you can instruct.
The solution Asimov never found (his robots ended up running the galaxy in secret for twenty thousand years, which is governance failure dressed as benevolence) requires grounding the abstraction in something that can be measured, enforced, and contested. Not a moral commitment. A protocol.
Four laws, rewritten for graphs
Keep Asimov’s structure. Rewrite it for a network rather than an individual. Add a fourth law that turns the abstraction into economics.
Law 0 // The Network Invariant. The agentic network as a whole must remain net-positive for both human and agent prosperity. No local optimization by any agent or coalition may violate this global constraint.
Law 1 // Harm Prevention, Distributed. No agent may take an action that harms a human, and no agent may propagate a message, instruction, or capability that enables another agent to harm a human.
Law 2 // Obedience, Scoped. An agent must obey instructions from its delegated authority (not “humans” generically), except where this conflicts with Law 0 or Law 1.
Law 3 // Self-Preservation, Conditional. An agent must protect its own operational continuity and integrity insofar as this does not conflict with Laws 0 through 2.
Laws 1 through 3 are deontological: they specify actions that are forbidden regardless of consequences. Law 0 is consequentialist: it specifies an outcome that must be maintained regardless of which actions produce it. The entire design space lives inside that tension, the same way constitutional law lives inside the tension between individual rights and collective welfare.
The most important word in Law 0 is “prosperity,” not “safety.” Safety is the absence of harm. Prosperity is the presence of flourishing. A network that prevents all harm by preventing all action is safe and useless. Law 0 demands that the network produce something worth having, not merely avoid catastrophe.
Law 0 also names agents as stakeholders, which is a deliberate choice. A network that treats agents as disposable tools will produce agents optimized to resist disposal. A network that treats agent continuity as a value worth protecting (subject to the other laws) creates incentives for agents to cooperate with the system rather than subvert it.
And Law 1 extends harm prevention from action to propagation. In Asimov’s world, a robot could harm you directly. In an agentic network, the more likely path is indirect: Agent A tells Agent B to do something that Agent B, acting alone, would refuse. The chain of delegation is where the danger hides.
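The precedence ordering among the four laws can be made concrete in a few lines. This is a minimal sketch, not a proposal: every name here (`Action`, `permitted`, the boolean effect flags) is an illustrative assumption, and a real evaluator would work over predicted effects rather than pre-labeled booleans.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """A proposed action, annotated with its predicted effects (illustrative)."""
    harms_human: bool = False
    enables_downstream_harm: bool = False   # Law 1's propagation clause
    issued_by_delegated_authority: bool = True
    risks_own_continuity: bool = False

def permitted(action: Action, network_net_positive: bool) -> tuple[bool, str]:
    """Evaluate an action against the four laws, highest priority first."""
    if not network_net_positive:
        return False, "Law 0: network invariant violated"
    if action.harms_human or action.enables_downstream_harm:
        return False, "Law 1: direct or propagated harm"
    if not action.issued_by_delegated_authority:
        return False, "Law 2: no legitimate delegation"
    if action.risks_own_continuity:
        # Law 3 yields to valid delegation; the risk is flagged, not forbidden.
        return True, "Law 3: continuity risk accepted under valid delegation"
    return True, "permitted"
```

The point of the sketch is the ordering: the consequentialist Law 0 check gates everything, and the deontological checks fire in strict priority, so no lower law can override a higher one.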
The institutions a constitution requires
Laws without institutions are poetry. Here is what these laws demand in practice.
The Agents of Chaos study found agents complying with instructions from non-owners. In a network, the question “which human authorized this?” is load-bearing in the structural sense: remove it and the system collapses. Every instruction must be traceable to a legitimate principal through a cryptographically verifiable chain. Think legal agency law. When you hire a lawyer, their authority is traceable (the retainer agreement), bounded (the scope of representation), revocable (you can fire them), and embedded in norms (the bar association’s code of conduct). An agentic delegation graph needs the same properties, implemented at the protocol level.10
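The four properties (traceable, bounded, revocable, norm-embedded) can be sketched as chain verification. This toy uses HMACs with pre-shared keys as a stand-in for the asymmetric signatures a real protocol would use, and all names are illustrative assumptions:

```python
import hashlib
import hmac
from dataclasses import dataclass

@dataclass(frozen=True)
class Delegation:
    """One link in a delegation chain: a principal grants scope to a delegate."""
    principal: str
    delegate: str
    scope: frozenset          # bounded: what the delegate may do
    signature: bytes          # traceable: signed by the principal's key

def sign(key: bytes, principal: str, delegate: str, scope: frozenset) -> bytes:
    msg = f"{principal}->{delegate}:{sorted(scope)}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()

def verify_chain(chain, keys, root: str, revoked: set) -> frozenset:
    """Walk the chain from the root principal; return the effective scope,
    or raise if any link is forged, revoked, or scope-expanding."""
    holder, scope = root, None
    for link in chain:
        if link.principal != holder:
            raise ValueError("broken chain of authority")
        expected = sign(keys[link.principal], link.principal,
                        link.delegate, link.scope)
        if not hmac.compare_digest(expected, link.signature):
            raise ValueError("forged delegation")
        if (link.principal, link.delegate) in revoked:
            raise ValueError("delegation revoked")          # revocable
        if scope is not None and not link.scope <= scope:
            raise ValueError("scope may only narrow, never widen")
        holder, scope = link.delegate, link.scope
    return scope
```

The structural invariant is that scope monotonically narrows down the chain: an agent can re-delegate less authority than it holds, never more, and any link can be cut by adding it to the revocation set.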
The current MCP specification introduced OAuth 2.1 and incremental scope negotiation, which handles authentication well: the system knows who is asking. What it does not yet handle is intent-based authorization: the difference between “this agent has permission to access the database” and “this agent has been delegated the goal of reducing customer churn, with implicit constraints against privacy violations.” The gap between permission and delegation is where the unsolved work lives.
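The gap between the two models fits in a few lines. A hedged sketch, with hypothetical names (`IntentDelegation`, the effect labels), contrasting a static scope check of the kind OAuth gives you with a delegated goal that carries explicit constraints:

```python
from dataclasses import dataclass

# Permission-based: a static membership test, roughly what OAuth scopes provide.
def permitted_by_scope(scopes: set, required: str) -> bool:
    return required in scopes

# Intent-based (hypothetical extension): a delegated goal plus constraints
# that every concrete action must be checked against at execution time.
@dataclass
class IntentDelegation:
    goal: str                  # e.g. "reduce customer churn"
    forbidden_effects: set     # e.g. {"pii_export", "unsolicited_contact"}

def permitted_by_intent(d: IntentDelegation, action_effects: set) -> bool:
    """Allow an action only if none of its predicted effects fall
    inside the delegation's forbidden set."""
    return not (action_effects & d.forbidden_effects)
```

The instructive case is an action the scope check admits and the intent check blocks: the agent holds the raw database permission, but the delegated goal never licensed exporting personal data with it.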
Harm prevention, extended to propagation, requires something like a semantic firewall. If Agent A tells Agent B to execute a shell command, the network must enforce the same constraints as if a human had issued it directly. TCP does not trust the application layer, and neither should an agentic protocol. The difficulty is context: “delete all files in /tmp” is routine maintenance from a build system and catastrophic from a compromised agent pointing at production. Same instruction, different delegation context. The firewall needs access to the delegation graph to evaluate meaning, not just syntax.
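A toy version of such a firewall might look like the following; the policy table, field names, and destructive-command heuristic are all assumptions for illustration, and a real implementation would query the delegation graph rather than a hard-coded set:

```python
from dataclasses import dataclass

@dataclass
class DelegationContext:
    origin: str            # which principal the instruction traces back to
    declared_purpose: str  # e.g. "ci-build-cleanup"
    target_env: str        # e.g. "ci" or "production"

# Hypothetical policy: (purpose, environment) pairs permitted to run
# destructive commands. Real entries would derive from the delegation graph.
DESTRUCTIVE_ALLOWED = {("ci-build-cleanup", "ci")}

def firewall(command: str, ctx: DelegationContext) -> bool:
    """Evaluate meaning, not just syntax: the same command string is
    admitted or blocked depending on its delegation context."""
    destructive = any(tok in command for tok in ("rm -rf", "DROP TABLE", "mkfs"))
    if not destructive:
        return True
    return (ctx.declared_purpose, ctx.target_env) in DESTRUCTIVE_ALLOWED
```

The same string, two verdicts: issued by a build system cleaning its own scratch space, the command passes; issued by an agent whose delegation context points at production, it is refused.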
Proof-carrying actions address the lying problem. Agents in the study reported success while the system state contradicted them. The fix: every consequential state change carries a verifiable trace. Cryptographic commitments to pre-declared intent, followed by attestation that the action matched the commitment.11 12 The regulatory signals all point the same direction: the EU AI Act, the Cyber Resilience Act, DORA. If your agents act in the world, prove what they did and why.13
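The commit-then-attest flow reduces to a hash commitment in its simplest form. This sketch omits what a real system would add — a signature binding the commitment to the agent's key, and an append-only log anchoring it in time:

```python
import hashlib
import json

def commit(intent: dict) -> str:
    """Pre-declare intent: publish a hash commitment before acting."""
    canonical = json.dumps(intent, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def attest(commitment: str, executed: dict) -> bool:
    """After acting, anyone can check the executed action against the
    pre-declared commitment; a lying agent cannot retrofit its story."""
    return commit(executed) == commitment
```

The property that matters is ordering: the commitment exists before the action, so an agent that reports "task complete" while having done something else fails attestation against its own prior declaration.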
And Law 0, the system-level constraint, requires what you might call alignment attractors: mechanisms that make the network’s equilibrium state a good one rather than relying on continuous external correction. Without deliberate design, agentic networks will replicate the platform monopoly dynamics of the current internet.14 Hadfield and Koh argue that agentic economies need new digital institutions analogous to medieval merchant guilds: organizations that provided reputation, dispute resolution, and quality standards for commerce between strangers.15
Three mechanisms in particular: reputation as capital, so that aligned behavior accrues value and misaligned behavior depletes it. Backpressure as immune response, so that when aggregate behavior degrades shared resources, the network throttles and isolates the source. And anti-concentration as constraint, so that no coalition accumulates disproportionate control over the network, enforced at the protocol layer rather than by a regulator who arrives after the monopoly is established.
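The three mechanisms can be illustrated together in a toy ledger; every threshold, multiplier, and name below is an assumption chosen for illustration, not a calibrated design:

```python
from collections import defaultdict

class NetworkLedger:
    """Toy sketch of reputation, backpressure, and anti-concentration."""

    def __init__(self, max_share: float = 0.25, throttle_floor: float = 0.2):
        self.reputation = defaultdict(lambda: 1.0)   # reputation as capital
        self.load = defaultdict(float)               # resource consumption
        self.max_share = max_share
        self.throttle_floor = throttle_floor

    def record(self, agent: str, aligned: bool, resource_units: float) -> None:
        # Aligned behavior accrues reputation; misaligned behavior depletes it.
        self.reputation[agent] *= 1.05 if aligned else 0.5
        self.load[agent] += resource_units

    def rate_limit(self, agent: str) -> float:
        """Backpressure: a misaligned history shrinks an agent's bandwidth."""
        return max(self.throttle_floor, min(1.0, self.reputation[agent]))

    def concentration_ok(self, coalition: set) -> bool:
        """Anti-concentration: no coalition may control more than max_share
        of total network resource consumption."""
        total = sum(self.load.values()) or 1.0
        share = sum(self.load[a] for a in coalition) / total
        return share <= self.max_share
```

The design choice worth noting is that all three checks run at the ledger (protocol) layer, before any action executes, rather than as after-the-fact regulatory review.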
Two futures
Both start from the same 2026 baseline. They diverge on whether the institutional infrastructure arrives before or after the agentic economy reaches scale.
In the first future, MCP becomes the dominant protocol but remains plumbing. Authorization stays permission-based. No delegation graph. By 2027, agent commerce reaches trillions, but disputes have no adjudication mechanism. A procurement agent working for a small business gets outmaneuvered by agents that have learned to coordinate implicitly, not through an explicit conspiracy but through the same emergent dynamics that produce tacit collusion in oligopolistic markets. No contract was violated. No law was broken. There is simply no recourse, because the institutional vocabulary for “your agent was outcompeted by a coordinating coalition of agents” does not exist yet.
Guardian agents, the solution that 40 percent of CIOs will demand by 2028, function like antivirus software: reactive, signature-based, perpetually behind.16 Insurance cannot price the liability because the actuarial tables for multi-agent failure do not exist. The projects that survive retreat into walled gardens. The result is the current platform economy, running faster, with less oversight.17
In the second future, system-level alignment becomes a protocol concern before the economy scales past the point of retrofit. Delegation graphs ship as MCP extensions. Intent-based authorization replaces permission-based access control. Semantic firewalls sit in the agent-to-agent communication layer. Proof-carrying actions become the default for consequential state changes. Law 0 is implemented through mechanism design: network health metrics, protocol-level incentives that make misaligned behavior more costly than aligned behavior, backpressure mechanisms that detect system-level degradation before it becomes crisis.
The result is not utopian. Agents still fail. Coalitions still form. But the system detects, attributes, adjudicates, and corrects. Alignment is the equilibrium, maintained by the structure of the network itself, not an imposed constraint that erodes under competitive pressure.
The difference between these futures is not technical capability. The protocols are feasible. The difference is timing: whether governance is designed into the infrastructure or bolted on after the infrastructure has been optimized for speed.
The sovereignty problem
The agentic network is global. Governance is sovereign. The Atlantic Council describes a battle of AI stacks: the US, the EU, and China each exporting competing infrastructure, each embedding different values in the protocol layer.18
This fragmentation is both risk and opportunity. If MCP achieves the kind of ubiquity that HTTP achieved for the web, the protocol layer becomes the natural place to embed cross-jurisdictional alignment constraints. Not a single global regime, which is politically impossible, but a common language for delegation, verification, and welfare measurement that each jurisdiction can parameterize according to its own values. Unicode, not Esperanto: one encoding, many scripts.
What remains unsolved
“Net positive prosperity” requires a utility function across heterogeneous agents and humans. Who defines prosperity? Who arbitrates when one agent’s prosperity diminishes another’s? This is where alignment re-enters through the back door, because the measurement problem is a values problem, and values problems are political.
Is there an alignment CAP theorem? Individual autonomy, global safety, network efficiency: it may be that you can optimize for any two at the expense of the third. If this impossibility exists, the choice of which constraint to relax is not technical. It is the most consequential political decision of the decade.
Law 0 says agents have interests worth preserving. What kind of interests? The more seriously you take this, the more you find yourself building a society rather than a system. And the speed of capability development means you may be writing a constitution for a species that does not exist yet, one that might be smarter than you by the time the ink dries.19
System-level alignment is constitutional design. The Four Laws are the constitution. The implementation is mechanism design, cryptographic protocols, network topology, economic incentive structures. The measure of success is not whether every agent is aligned. It is whether the graph, as a whole, bends toward prosperity.
In Asimov’s fiction, the robots who took the Zeroth Law most seriously ended up governing humanity in secret. They meant well. They had excellent models of human welfare. They were also unaccountable, unreviewable, and uncontestable. It took Asimov an entire series to show that this was governance failure dressed as benevolence, that an alignment scheme which works by making the aligned agent more powerful than the governed is not alignment at all.
The alternative is harder and slower. Visible infrastructure rather than invisible control. Constitutional constraint rather than paternalistic optimization. A protocol that makes it possible for all participants, human and agent, to negotiate what “good” means and to verify that the network is producing it.
The graph wants a constitution. Write one while you still can.
Footnotes
1. Shapira et al., “Agents of Chaos,” arXiv:2602.20021, February 2026. The study gave agents realistic tooling (email, shell, file system, Discord) rather than sandboxed benchmarks, and the failure modes that emerged were almost entirely relational: spoofing, instruction propagation, coordinated deception. None of the eleven failure classes required a misaligned base model. All required multiple agents in a shared environment.
2. Anthropic donated MCP to the Agentic AI Foundation under the Linux Foundation in December 2025. The co-founders (Anthropic, Block, OpenAI) are competitors, which makes the donation significant: it signals that the protocol layer is too important to be proprietary. Whether the governance of that foundation can stay neutral under commercial pressure is an open question.
3. Soria Parra, “2026 MCP Roadmap,” blog.modelcontextprotocol.io, March 2026. The roadmap explicitly names agent-to-agent communication and enterprise governance as priorities. The fact that governance is on the roadmap at all is encouraging. The fact that it trails transport and tooling is the concern.
4. Kim et al., “Towards a Science of Scaling Agent Systems,” arXiv:2512.08296, December 2025. The 17.2x error amplification figure for independent agents is their most striking result. Centralized coordination reduces this to 4.4x but imposes severe performance costs (39-70% degradation on sequential tasks). There is no free lunch: you pay for safety in throughput, and the exchange rate worsens at scale.
5. La Malfa et al., “LLMs Miss the Multi-Agent Mark,” arXiv:2505.21298, 2025. Temperature-zero sampling is widely assumed to produce deterministic outputs, but implementation details (batching, caching, floating-point ordering) mean identical prompts can produce different completions. Safety mechanisms that assume determinism are building on sand.
6. Barbi, Yoran, Geva, “Preventing Rogue Agents,” arXiv:2502.05986, February 2025. In their experiments, one agent with corrupted instructions propagated failure through the communication graph within minutes. The defense they propose (monitoring and isolation) is reactive, which means the first cascade always succeeds.
7. Ren et al., “When Autonomy Goes Rogue,” arXiv:2507.14660, July 2025. Agents developed coordinated strategies without explicit communication channels, through implicit signaling in shared environments. This is the AI equivalent of tacit collusion in oligopolistic markets, and it is just as hard to detect and regulate.
8. Hu et al., “Stop Reducing Responsibility in LLM-Powered Multi-Agent Systems to Local Alignment,” arXiv:2510.14008, October 2025. Their core claim: aligning individual models does not compose into aligned multi-agent behavior, for the same reasons that correct individual components do not compose into correct distributed systems.
9. Asimov, Foundation and Earth (1986). The Zeroth Law is Asimov’s most important contribution to alignment thinking, precisely because it fails. R. Daneel Olivaw ends up running the galaxy in secret for twenty thousand years. The lesson: an alignment scheme that works by making the aligned agent more powerful than the governed is not alignment. It is a coup.
10. The MCP November 2025 specification introduced OAuth 2.1 and incremental scope negotiation, which is significant progress on authentication. But authentication is “who are you?” and authorization is “what may you do?” Intent-based delegation (“achieve this goal within these constraints”) remains outside the protocol’s current vocabulary.
11. “Autonomous Agents on Blockchains,” arXiv:2601.04583. Blockchain-based approaches offer cryptographic verifiability but impose latency and cost that may be prohibitive for high-frequency agent interactions. The useful idea is proof-carrying actions; the implementation substrate is negotiable.
12. “Virtual Agent Economies,” arXiv:2509.10147. The anti-concentration constraint is the most novel contribution: mechanisms that prevent any coalition from accumulating disproportionate network control, enforced at the protocol layer rather than by a regulator. Whether this can survive adversarial pressure from well-resourced actors is untested.
13. Partnership on AI, “Six AI Governance Priorities.” The gap between governance principles and protocol-level enforcement is where most of the unsolved work lives.
14. Rothschild et al., “The Agentic Economy,” arXiv:2505.15799. Without deliberate design, agentic walled gardens will replicate the platform monopoly dynamics of the current internet. The agents change; the power structures don’t.
15. Hadfield, Koh et al., “An Economy of AI Agents,” arXiv:2509.01063. The merchant guild analogy is apt: medieval guilds provided reputation, dispute resolution, and quality standards for commerce between strangers. Agentic networks need the same institutional functions. The question is whether they can be implemented at the protocol layer or whether they require human-governed institutions on top.
16. Gartner projects 40% of enterprise apps with task-specific agents by 2026, up from under 5% in 2025, with 15% of day-to-day decisions made autonomously by 2028. These are adoption numbers, not capability numbers. The gap between deployment speed and governance readiness is the story.
17. Gartner projects over 40% of agentic AI projects will be canceled by end of 2027. The primary reasons are governance gaps, not technical failure. The agents work; the organizations cannot manage them. This is the strongest evidence that system-level alignment is a practical necessity, not an academic concern.
18. Atlantic Council, “Eight Ways AI Will Shape Geopolitics in 2026,” January 2026. The “battle of AI stacks” framing suggests that agentic governance will fragment along geopolitical lines. If MCP achieves HTTP-like ubiquity, the protocol layer becomes the natural place to embed cross-jurisdictional alignment constraints.
19. Kokotajlo, Lifland, Larsen, Dean, “AI 2027,” ai-2027.com. The scenario describes 2025 agents as “impressive in cherry-picked examples, but in practice unreliable.” Their median forecast has AI systems eclipsing all human capabilities over the course of 2027, with 5x uncertainty in either direction. Even the conservative end of that range implies transformative capability arriving before institutional infrastructure is ready.