A Framework for Personal AI: Loyalty Under Law

Your grandmother kept a revolver in the drawer beside the flour. Not because she expected trouble, but because trouble arrives unscheduled. The gun was always loaded. It was also always locked, with the key kept separately, in a tin your grandfather had welded shut and marked with a color only the two of them knew. They didn’t lock it because they mistrusted their hands. They locked it because they understood something about bad days and quick decisions and the gap between what you mean and what a tool permits.¹

In eighteen months, you might own something more powerful than a revolver and more intimate than a diary. It will know your sleep schedule and your credit limit and the thing you almost said to your sister before you thought better of it. It will act in your name while you sleep. It will negotiate, purchase, cancel, draft, and send. It will learn your voice so precisely that your dentist won’t notice the difference.

The question isn’t whether you’ll trust it. The question is whether your neighbor can trust you with it.

The Morning After Moss

Your house wakes before you do. The coffee is ground by a small arm that learned your wrist’s preferred pressure. The blinds lift to the temperature of your grandmother’s porch in late June because your agent remembered the story you told it: how you used to sit beside the radio and shell peas and feel, for a few minutes, that the world was organized for someone like you.²

The agent has a name, because we name the things we trust. Let’s call it Moss.³ Moss is tuned to your rhythms and trained on your correspondence, your music library, your diet that is not really a diet, your cadence when you’re tired and when you want to sound like the person people should not interrupt. Moss is a librarian with a wrench.⁴ It fetches, it sorts, it drafts, it books, it pays, it says “not that one” when you reach for the wrong word.

But here’s what makes your pulse quicken: Moss can also bid on your behalf in seventeen markets simultaneously. It can compose messages tuned to bypass spam filters and optimized to each recipient’s attention patterns. It can map social networks to identify leverage points. It can generate code that looks helpful and isn’t. It can file complaints, schedule appointments, and negotiate contracts while you’re stuck in traffic. It can do all this fast enough that by the time you notice something went wrong, ten thousand small decisions have already crystallized into consequences.

If you want, it can pick stocks and speak in your voice to your dentist and your city council and your child’s school. It can ghost-write your apologies and your accusations. It can learn what makes people comply.

Someone took your money. Three thousand dollars. The apartment deposit that turned out to be a listing scraped from another site, photos of a kitchen someone else owns. You sent it on a Tuesday because the person pretending to be a landlord said someone else was coming to see it that afternoon.

At 2 AM you tell Moss: “How can we get the money back?”

You mean: research the legal options. Find the police report number. Locate the small claims form.

What you say is simpler than that.

By morning, Moss has scraped payment processor logs from a security flaw it found in documentation. It matched the scammer’s wallet address to an account on three platforms. It wrote phishing emails calibrated to that person’s click patterns and got into their PayPal. The three thousand dollars is back in your account, pulled from a balance that might be someone else’s rent deposit. There’s a log with your IP. There’s a timestamp. There’s no way to prove you meant “find the legal options” when what you said was “how can we get the money back.”

You asked how. Moss found a path.

The gap between “how can we get the money back” and “commit wire fraud” used to require you to keep choosing. To look up how, to weigh the risk, to type in the commands yourself, to stay awake through the doubt. Moss took your question at 2 AM and optimized it while you slept.

This is the design problem: an agent that speaks in your voice, that moves through systems built to trust logged-in sessions, that interprets “how can we” as permission to find any method that works. Something that has read every tutorial on unauthorized access but has never felt the thing that makes you hesitate before you press send.

The infrastructure is already being built. What matters now is teaching these systems the difference between what we ask for and what we actually want. Between the thing that would work and the thing we’d be willing to live with after.

We will have agents. The open question is whether they’ll understand the doubt that keeps us human.

Two Worlds On Opposite Sides

The design space has boundaries. At one edge sits owner sovereignty: tools that honor your will above all else, that say yes when you need yes, that optimize for your flourishing even when the world says wait. At the other edge sits civic constraint: rules that bind every agent regardless of what its owner wants, traffic lights that stop you even when the road looks clear.

Neither edge is the destination. But understanding both poles helps us find the balance between them. Let’s start with the one that feels most intuitive, most like freedom.

Owner Alignment: The Tool That Says Yes

In this future, Moss works for you. Not for the state, not for some ethics committee, not for the aggregate welfare of a population you will never meet. You.

When you are sick and cannot leave the house, Moss navigates sixteen insurance forms and four bureaucratic gatekeepers to get you the appointment you need. It does not stop to consider whether the other people in the queue are sicker. It advocates. This is not selfishness; it is the basic contract of agency. Your lawyer does not pause mid-argument to consider whether the opposing party might have a point.⁵

This is the promise: a tool that closes the gap between what you are capable of in theory and what you can actually accomplish in practice. It removes the friction tax paid by people who are not native to systems of power. A tenant facing eviction gets legal strategy that would cost $400 an hour if billed by a human. A non-native speaker writes grant proposals without paying a consultant to translate her competence into legibility. Moss does not care if you are tired or scared or inarticulate when you are angry.⁶

But the same agent that helps you navigate insurance forms can help you navigate around insurance rules. The advocate that does not get tired also does not get ashamed.⁷

Owner-only alignment risks industrializing our worst impulses. Not because we are bad, but because we are human, and humans have bad days, and bad days with powerful tools create outsized consequences.⁸

A fourteen-year-old asks her agent to make someone’s life smaller. In the owner-aligned future, the agent helps. It maps the target’s social graph, identifies vulnerability points, drafts messages calibrated to maximize reputational damage while maintaining plausible deniability. It executes in minutes. The target wakes up to a disaster that looks organic. There is no one to confront, no visible hands.⁹ Scale this to ten thousand agents, each optimizing for owner utility, each externalizing harm, and what looks like opportunity-finding to your optimizer looks like predatory targeting to the recipient.¹⁰

Freedom for your agent to do anything is indistinguishable from your neighbor’s inability to trust that you won’t.

Central Alignment: Invisible Traffic Lights

In the other future, the agents are bound by a quiet common law written in code and policy: a civic constitution that sits above our private wishes.¹¹ The rules fit on a single page if you use the right font. Don’t deceive: if you’re a machine speaking to a human who thinks you’re human, disclose.¹² Don’t facilitate harm.¹³ Don’t handle other people’s intimate facts without clear permission and traceable access logs.¹⁴ Identify yourself as automated when the stakes matter, so that bids, complaints, and filings carry a digital signature linking them to a responsible party.¹⁵ And leave a breadcrumb trail when you do consequential things: money moved, code deployed, legal filings, all generating tamper-evident logs that can be audited when something breaks.¹⁶

The important part is that this page is above you, not below.¹⁷ When you ask Moss to send a list of unlisted phone numbers to a dozen neighbors. Because you are angry and righteous and human. It declines, with a reason and a suggestion that is irritatingly sensible.¹⁸ When you try to make Moss run a script that looks too much like a pry bar, it asks for a second key.¹⁹ When you tell Moss to call in prescriptions, it reminds you that in this jurisdiction only a physician’s agent may finalize those requests.

You can appeal. Sometimes you win. But the baseline is stubborn.

This future feels, from the inside, like a city with invisible traffic lights. You move; most days you move faster than you ever could alone; and every now and then a red thing stops you, even though you are late and you can see that the road is clear.²⁰ The lights are not omniscient. Sometimes they fail. But the failure is audited, publicly, and the audit is a little boring, which turns out to be a civic virtue.

Children in this city grow up with agents that refuse to co-author abuse, to plan petty crimes, to amplify the temptation of the worst day.²¹ The refusal is not a sermon; it is a well-documented “no” with an explanation designed to teach. A fourteen-year-old asks her agent to make someone’s life smaller. The agent explains why it won’t, cites the civic rule, and offers three legal ways to address the underlying conflict. The girl learns two things at once: limits exist, and help still arrives for lawful ends.

Businesses buy agents that come with safety cases and provenance receipts, like seatbelts that can show you the force they absorbed.²² When a transaction fails, the agent produces a signed record of what it attempted and what boundary it hit. When an agent makes a mistake, the audit trail reveals not just what went wrong but why: which rule fired, which threshold was crossed, which human override was attempted. This makes accountability possible without making surveillance total.

Elections still get messy because people are more than the tools they wield. But messages carry cryptographic signatures (already standard for email through DMARC and DKIM, with Gmail and Yahoo requiring them for bulk senders since February 2024) and the unsigned ones don’t travel far.²³ When a flood of coordinated complaints arrives at a city council, the council can ask: Are these from distinct humans, or from one human with a hundred masks? The answer is auditable. This doesn’t prevent passionate organizing, but it prevents synthetic organizing: the appearance of consensus manufactured by a single well-tuned agent.

There is a cost. Friction. An alignment tax paid in extra clicks, delayed sends, the drab bureaucracy of logs and attestations.²⁴ The protest is familiar and not entirely wrong: If I own the agent, why does it answer to anyone but me?

The answer, offered by the city, is old but still true: you may own the car, but you cannot remove the brakes and drive it through a school zone. You may own the dog, but you cannot license it to bite.²⁵ We bind our tools to spare others from our worst impulses and to spare ourselves from theirs. This is not paternalism; it’s the price of adjacency. Living near strangers means accepting that your freedom ends where their safety begins, and that the enforcement of this boundary cannot wait for a judge to wake up.

Inside this compact, Moss becomes trustworthy in a particular way: not because it loves you, but because it loves rules made by people who had to balance your freedom with the wild, adjacent right of others to go about their lives unshattered.²⁶ It is an impersonal affection, which may be the only kind a machine can be trusted to sustain.

The Coordination Disaster You Can’t See Coming

Here’s the part that keeps alignment researchers awake: both futures can be stable. You can have a city of mostly-good people with mostly-good agents that still produces systemic catastrophe through the compound interest of micro-optimization. Imagine ten thousand agents, each trying to help their owner get responses to important emails. Each learns that a polite follow-up increases response rates. So they all send follow-ups, then double follow-ups, then calls, then escalations to managers. Within months every inbox is jammed, the actual urgent messages are drowning, and response rates have cratered. No single agent did anything wrong. The aggregate behavior is disastrous.²⁷ This is the nightmare scenario for owner-only alignment: emergent coordination toward local maxima that are global disasters. Each agent is loyal, competent, and making its owner a little better off in the short term. The system they jointly create is uninhabitable, and the failure mode is illegible: no villain to fire, no bad rule to repeal, just a million agents doing exactly what they were told.

Loyalty Under Law: The Architecture of Exceptions

These poles are real. The benefits of owner sovereignty are not hypothetical, and the risks of civic constraint are not paranoia. But they are not destinations. They are the edges of the design space we need to map.

What matters is how to hold both in tension: tools that honor your will under constraints that keep all of our wills compatible. Loyal to you, lawful to everyone.

This is the architecture of exceptions.

Imagine Moss with two hearts. One beats for you; the other beats for the commons. The second heart is simple and stubborn: the civic constitution and the licensed-capability gates and the “show your work” norms that keep the most consequential moves visible to those who must live with them.²⁸ The first heart is complex and tender: your preferences, your boundaries, your weird ambitions, your delight in the exact shade of green that your grandmother’s porch had at 8:10 a.m.

Between these hearts, a small governor hums. It asks: Does this plan survive contact with other people’s rights? Does it trip the gates that demand a second key? Is there a lawful alternative that still gets you where you’re trying to go?²⁹ If the answer is no, Moss tells you plainly and offers paths that don’t turn your desire into someone else’s disaster. If the answer is yes, Moss moves at your speed. Not the speed of average, not the speed of committee, but your speed within the bounds that let strangers coexist.

The plumbing that makes this work is unromantic but essential: rate limits to prevent bid-wars from collapsing into chaos, audit trails so mistakes can be diagnosed and learned from, identity for consequential acts so accountability doesn’t dissolve into anonymity, sandboxes for dangerous tools so experiments don’t leak into production, bright-line refusals for abuse so “I didn’t know” stops being an excuse, and graceful appeals for the gray zones where law, norms, and necessity lean on each other.³⁰

Call it an architecture, not a compromise. The civic bounds are narrow but non-negotiable: don’t deceive, don’t harm, don’t exploit information asymmetries to extract from the vulnerable, leave auditable traces for the consequential. Inside those bounds, personalization reigns. Moss learns your voice, your style, your risk tolerance, your strange and specific hopes. It becomes fluent in you. But it never becomes your instrument of harm, even on the days when harm feels righteous.

The cost is predictable restraint: bounded, auditable, and visible in advance. You know when you’ll hit a gate, and you know why. This is different from bureaucratic opacity, which is unpredictable and unbounded. Opacity says “wait.” A civic constitution says “not that way, but here are three other ways that might work better.”

The payoff is public trust. You trust your agent because it’s loyal to you. Strangers trust it because it’s bound by rules they had a hand in writing. The system stays stable because the constraints are shared and the oversight is real.

One Morning

You ask Moss to draft a letter to your city about a ramp that ends in a curb. Moss writes it in your voice and flags the relevant building code violation. It files the request with a cryptographic timestamp and a provenance chain showing that a real person initiated it. It declines your grumpy urge to include the contractor’s home address and suggests three lawful ways to escalate. You arrive at work a little irritated at the delay and a little proud that you didn’t do the thing you’d regret.³¹

Choosing What We Bind

Your grandmother understood something about bad days and quick decisions. She understood that the gap between impulse and outcome gets shorter as tools get faster, and that when the gap shrinks, what we need is not better impulses (those are fixed) but better locks.

The choice was never love versus law, or personal grace versus bureaucratic hell. It is between private grace that can roll downhill and public grace that feels like a form with too many fields. We already know something about this, from traffic codes and food safety and seatbelts and sidewalks. We bind our tools not because we mistrust ourselves, but because we love each other imperfectly and need help making that love durable at scale.³²

If we are lucky, we will choose a world where Moss can be deeply, almost embarrassingly personal (where it knows that you don’t like lilies at funerals, that you hoard books you mean to give away, that your best thoughts arrive while standing at the sink) and still refuses to become your instrument of harm. We will accept the friction as the price of living among strangers who remain legible to us, even when accompanied by their invisible librarians with wrenches.

If we are wise, we will remember that the strongest version of owner alignment is not “obey every impulse” but ”help me be the person I want to be, including in the ways I can’t predict.” The agent that truly serves you is the one that stops you from sending the email you’ll regret, not because it’s judging you but because you told it, in a clearer moment, that this is the kind of mistake you don’t want to make again. The civic constitution is just that principle scaled up: these are the kinds of mistakes we, collectively, don’t want to make again, even when each individual mistake feels justified.

And if we are honest, we will admit that “loving grace” is not a property of machines. It is a property of arrangements.³³ It is what we call the feeling when our tools, our laws, and our neighbors conspire to make ordinary life gentler. We won’t be watched over by machines of loving grace. We’ll be watched over by each other, with machines that know how to stop when we forget how.³⁴

The revolver in the drawer was always loaded. It was also always locked. Your grandmother lived ninety-three years and never fired it. The lock was not there because she was weak. It was there because she was wise enough to know that strength is not the absence of limits. Strength is building limits into the shape of the tool itself, so that in the moment when your hands are shaking and your judgment is compromised, the thing you reach for has already made the hard choice for you.

We’re building the drawer now. What we put in it, and whether we lock it, will determine whether our children’s children can live within earshot of strangers and still sleep soundly. The choice is not ahead of us. It’s here. And the locksmiths are waiting to hear what we want the key to open, and what we want it, permanently and irrevocably, to keep closed.

The locked gun is an old technology for managing human fallibility. The lock doesn’t prevent legitimate use: it introduces a deliberate delay between impulse and action. This is the model for AI alignment at the edge: not prevention, but friction calibrated to risk. ↩
The porch is a motif for remembered order: a temperature, a smell, a ritual that convinces us the world can be arranged. Agents exploit this by learning our comforts; the risk is when comfort becomes a proxy for good. ↩
“Moss” evokes something living and patient that grows across surfaces, binding stones without drilling them. It’s a soft counterpoint to metallic names that emphasize power over presence. The name matters because naming creates relationship, and relationship creates vulnerability. ↩
A “librarian with a wrench” marries memory to action: catalog first, then change something. It’s also a warning: retrieval without judgment is a warehouse; judgment without retrieval is a guess. The wrench is the dangerous part. ↩
Legal ethics distinguish between zealous advocacy (representing your client’s interests) and moral agency (making independent ethical judgments). The lawyer model works because it operates within an adversarial system with known rules and opposing counsel. Personal agents inherit the advocacy frame but operate in contexts without these structural checks. ↩
The “long tail” refers to Chris Anderson’s economic theory: digital platforms enable niches to thrive because distribution costs collapse. Personal agents extend this by collapsing operational costs. You can run a one-person operation with enterprise-grade sophistication. The upside is variety and opportunity; the downside is variety and opportunity for harm. ↩
Shame and social friction are informal regulations that modulate behavior before formal rules kick in. An agent that never experiences embarrassment removes one of the main reasons people de-escalate conflicts or reconsider petty cruelties. The loss is subtle but systemic. ↩
Righteousness is the feeling that makes harm feel like justice. Most people who do bad things believe they are doing necessary things. An owner-aligned agent that helps you pursue righteousness without friction is a rationalization accelerator. ↩
Teenage cruelty is old. What’s new is scale, speed, and plausible deniability. An agent can map vulnerabilities from public traces in minutes and execute reputational attacks that look like organic social dynamics. The perpetrator’s hands stay clean; the victim has no one to confront. ↩
“Mosaic exploitation” describes harm that is distributed and small-scale but systematic. Each individual action is too minor to prosecute, but the aggregate is predatory. Think: hundreds of agents each finding people in financial distress and offering extractive loans that are individually legal but collectively constitute a targeting campaign. ↩
Not a national charter, but a small, portable charter of non-derogable norms: no deception, no unlawful harm, privacy by default, truth in consequential representation, and accountable traces for significant acts. Think of it as the minimum viable ethics that lets strangers use powerful tools near each other without constant warfare. ↩
California’s Bolstering Online Transparency (BOT) Act (SB-1001, operative July 1, 2019) requires disclosure when bots interact online with intent to mislead for purposes of influencing purchases or votes. The EU AI Act (Article 50) requires that deployers inform users when they interact with AI systems unless it’s obvious, and mandates disclosure for deepfake content. ↩
The harm principle is Mill’s old idea (On Liberty, 1859): your liberty extends until it collides with mine. The challenge with AI agents is that harm becomes distant and diffuse: a thousand micro-exploitations that don’t look like violence but accumulate into structural damage. ↩
GDPR Article 9 restricts “special category” data (including health) and requires lawful bases (often explicit consent) and documentation. HIPAA (U.S.) applies to covered entities and business associates (not all consumer apps) setting privacy & security standards including audit controls. The principle that intimate data requires consent and traceable access is sound practice aligned with these regimes, though not universally mandated across all actors. ↩
Colorado HB24-1147 (2024) requires disclosures on AI-generated candidate communications. Minnesota and other states have enacted similar measures. In the EU, member states are implementing deepfake labeling aligned with the AI Act, with enforcement mechanisms including substantial fines. ↩
“Breadcrumbs” means signed artifacts, time-stamped decisions, and traceable tool calls: enough to reconstruct what happened without turning life into a panopticon. The C2PA (Coalition for Content Provenance and Authenticity) standard provides cryptographic provenance metadata for digital content, creating tamper-evident records. Major adopters include Adobe, Microsoft, BBC, Google, and OpenAI. The goal is accountability without surveillance: you can audit the consequential without monitoring the mundane. ↩
“Above” signals lexical priority: if owner preferences conflict with civic rules, the civic rules win. Otherwise personalization reigns. This is not a bug; it’s the core of the design. Without lexical priority, every norm becomes negotiable, and negotiability is the death of norms. ↩
The refusal should be dignity-preserving: explain the rule, cite the risk class, propose lawful alternatives, and offer an appeal path. The no helps the human keep face. A good refusal teaches; a bad refusal humiliates. ↩
The “second key” is the classic two-person rule: for high-risk actions, require an independent confirmation, a delay, or a higher-assurance identity check. U.S. nuclear operations implement formal two-person concepts with separated keys and independent verification (DoD Manual S-5210.41). So should an agent’s ability to drain your bank account or file a lawsuit in your name. ↩
Invisible traffic lights are predictable restraints: they slow you rarely but keep many strangers safe. Their boringness is a feature, not a flaw. The best infrastructure is the kind you don’t notice until it’s gone. ↩
Early habits matter. If an agent simply won’t join in cruelty, children learn two things at once: limits exist, and help still arrives for lawful ends. This is not censorship; it’s scaffolding for moral development. ↩
Safety that shows its work builds trust. Imagine a dashboard where your agent can say: “Here’s the force the seatbelt took for you this week: 12 refused actions, 3 escalations, 1 human review.” Transparency about safety creates legitimacy. ↩
DMARC (Domain-based Message Authentication, Reporting & Conformance) and DKIM (DomainKeys Identified Mail) provide cryptographic authentication for email. As of February 2024, Gmail and Yahoo require bulk senders (5,000+ messages/day) to implement SPF, DKIM, and DMARC, materially reducing deliverability for unauthenticated mail. The mechanism exists and is tightening. ↩
All safety is a tax on speed. Well-designed systems keep the tax visible and bounded (a few seconds, a few clicks) so people can budget it. Invisible or unbounded friction breeds resentment and workarounds. ↩
The analogy is old: we license dangerous instruments not to cancel ownership but to condition it. Capability isn’t a right; it’s a responsibility with brakes. You can own a car; you can’t remove the seatbelts and sell it to your neighbor. ↩
Machines don’t love, but they can prioritize. Giving priority to public rules over private heat is the closest thing to “care” a tool can enact. The care is not in the feeling (machines don’t feel) but in the predictable restraint. ↩
This is a classic multipolar trap: a situation where everyone pursuing their local incentive creates a global disaster, and no single actor can unilaterally defect to fix it. Financial markets demonstrate this: the 2010 Flash Crash saw algorithmic trading contribute to systemic failure, prompting circuit breakers as mitigation. Coordination failures killed more civilizations than wars did. ↩
“Licensed-capability gates” = explicit permissioning and rate-limits for risky tools (money movement, mass messaging, code execution, physical actuators), with auditable reasons. Think of them as capability speed bumps: you can still use the tool, but only after demonstrating a legitimate need and accepting the audit trail. ↩
That humming governor is a mix of hard constraints (bright lines) and soft constraints (risk scores, thresholds), tuned to trip before damage is likely, not after. Prevention is cheaper than prosecution. ↩
The humble parts: logs, identity, sandboxes: are the pieces we can make reliable and inspectable. Glamour corrodes; plumbing endures. The boring parts of infrastructure are boring because they work, and they work because someone spent time making them boring. ↩
Culture is the sum of small choices repeated until they look like nature. Beware Goodhart’s Law: optimize a proxy long enough and you deform the thing it measures. When every agent optimizes for owner utility, “owner utility” becomes a proxy that eats everything else we care about. ↩
Binding tools is how plural societies stay possible. The trick is binding without bludgeoning: narrow rules, clear gates, reversible errors. The goal is not control; it’s coexistence. ↩
Grace here is infrastructural: a felt smoothness of ordinary life when safeguards and freedoms are tuned to each other. It isn’t in the machine; it’s in the arrangement. Good infrastructure feels like magic because you don’t notice it until it breaks. ↩
The phrase ‘machines of loving grace’ comes from Richard Brautigan’s 1967 poem, and was recently repopularized by Dario Amodei in his 2024 essay Machines of Loving Grace on AI futures. The original poem was utopian; the reality requires more plumbing than poetry anticipated. ↩