A university cluster runs a fairshare scheduler. The algorithm is simple: if you’ve used a lot of compute recently, you wait longer; if you’ve used less, you move up. Sounds neutral. It isn’t.
On shared clusters, you don’t “run code.” You submit a job request (how many GPUs, for how long, with what memory) and a scheduler decides whose work starts now and whose work waits. That decision is the gate to participation. And the gate is configured.
And the queue doesn’t start at Slurm. Before you submit anything, someone decided who gets an account, what partitions you can touch, what counts as “allowed use,” what gets logged, and what you’re permitted to export. Admissions is policy too. It’s just policy that happens once, quietly, off to the side, so it feels like “not the queue.”
Here’s what fairshare actually encodes: on many clusters, usage is tracked at the lab or project account level, not the individual level.1 A student joining a group that’s been running heavy workloads starts “in debt” before submitting her first job. The algorithm can’t tell “I’m new” apart from “my lab has been busy.” It sees the account’s history, not hers, and nothing in it explicitly privileges first contact. “Neutral” scheduling quietly inherits institutional hierarchies.
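The sketch below shows how account-level tracking puts a newcomer “in debt.” It assumes the simplified “classic” fairshare factor described in Slurm’s multifactor documentation, roughly F = 2^(-U/S), where U is the account’s normalized usage and S is its normalized share. Real sites often layer Fair Tree and usage decay on top of this. The lab names, shares, and GPU-hours are hypothetical.

```python
# Simplified "classic" fairshare factor: F = 2 ** (-U / S), where U is the
# account's normalized usage and S its normalized share of the cluster.
# Ignores decay and Slurm's Fair Tree algorithm; all names and numbers are hypothetical.

def fairshare_factor(norm_usage: float, norm_shares: float) -> float:
    """1.0 with no usage, 0.5 when usage equals shares, approaching 0 as usage grows."""
    return 2 ** (-norm_usage / norm_shares)

# GPU-hours consumed this period, keyed by (account, user).
usage = {
    ("heavy_lab", "senior_postdoc"): 9_000.0,
    ("heavy_lab", "new_student"): 0.0,
    ("light_lab", "new_student_b"): 0.0,
}
shares = {"heavy_lab": 0.5, "light_lab": 0.5}  # each lab is entitled to half the cluster
total = sum(usage.values())

def account_usage(account: str) -> float:
    """Normalized usage summed over the whole account: the level the scheduler sees."""
    return sum(h for (acct, _user), h in usage.items() if acct == account) / total

for account, user in [("heavy_lab", "new_student"), ("light_lab", "new_student_b")]:
    factor = fairshare_factor(account_usage(account), shares[account])
    print(f"{user} ({account}): personal usage {usage[(account, user)]:.0f}, factor {factor:.2f}")
```

Neither student has used anything. Even so, the one in the busy lab starts with a quarter of the other’s fairshare factor.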
Or: many clusters give scheduling preference to larger jobs because packing big allocations is more efficient than juggling many small ones.2 The researcher testing an idea with a ten-minute experiment waits behind the established lab running production workloads. The efficiency metric encodes a value: scale matters more than exploration.
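Here is a toy multifactor priority score in the spirit of Slurm’s plugin: a weighted sum of factors, each normalized to [0, 1]. The weights, the cluster size, the one-week age cap, and the linear job-size factor are assumptions chosen for illustration. They are not any particular site’s configuration.

```python
from dataclasses import dataclass

# Hypothetical site weights; real sites choose their own (Slurm calls these
# PriorityWeightAge, PriorityWeightFairshare, PriorityWeightJobSize, ...).
WEIGHTS = {"age": 1_000, "fairshare": 10_000, "job_size": 5_000}
CLUSTER_GPUS = 512
MAX_AGE_HOURS = 168.0  # age factor saturates after a week (assumption)

@dataclass
class Job:
    name: str
    gpus: int
    hours_waiting: float
    fairshare: float  # account-level factor in [0, 1]

def priority(job: Job) -> float:
    """Weighted sum of factors normalized to [0, 1], multifactor-style."""
    factors = {
        "age": min(job.hours_waiting / MAX_AGE_HOURS, 1.0),
        "fairshare": job.fairshare,
        # "Larger jobs are favored": the factor grows with the share of the cluster requested.
        "job_size": job.gpus / CLUSTER_GPUS,
    }
    return sum(WEIGHTS[name] * value for name, value in factors.items())

quick_idea = Job("ten-minute idea", gpus=1, hours_waiting=0.0, fairshare=0.6)
production = Job("production run", gpus=256, hours_waiting=0.0, fairshare=0.4)
for job in (quick_idea, production):
    print(f"{job.name}: priority {priority(job):.0f}")
```

With these weights, the production run outranks the quick experiment even though its lab has the worse fairshare standing. The size weight is doing the work.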
These aren’t bugs. They’re the scheduler working as designed. The queue is always policy. The question is whether we admit it.
The library closes at six. A student arrives at 5:47 and submits a GPU job that should take minutes. Queue position: 1,847. Estimated wait: 11 hours. She closes the laptop.
What’s Happening
In January 2025, OpenAI announced Stargate: $500 billion over four years for AI infrastructure, with $100 billion deployed immediately.3 By September, they reported nearly 7 gigawatts of planned capacity and over $400 billion committed, ahead of schedule.4 In Memphis, xAI built what it calls “the world’s largest” AI supercomputer in 122 days, now expanding toward 2 gigawatts.5 Microsoft, Alphabet, Amazon, and Meta are each guiding to roughly $70-125 billion in annual capex, with data centers and AI infrastructure described as the dominant drivers of the increases.6
The NAIRR Task Force, the U.S. government’s own blueprint for a national AI research resource, estimated that a real public compute floor would cost $2.6 billion over six years.7 Annualized, that’s roughly $430 million per year. Stargate’s $500 billion over four years works out to $125 billion per year. The ratio is about 290 to 1.
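The arithmetic behind the ratio, using the figures cited above (rounded):

```latex
\frac{\$2.6\,\text{B}}{6\ \text{yr}} \approx \$433\,\text{M/yr},
\qquad
\frac{\$500\,\text{B}}{4\ \text{yr}} = \$125\,\text{B/yr},
\qquad
\frac{\$125{,}000\,\text{M/yr}}{\$433\,\text{M/yr}} \approx 289 \approx 290.
```

Cutting the private figure tenfold still leaves a gap of roughly 29 to 1.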
These aren’t perfectly comparable numbers. Private capex bundles land, power, buildings, and proprietary stacks; a public floor would be structured differently and would not try to match frontier training capacity. NAIRR’s estimate also mixes acquisition with operating functions in a way that doesn’t map neatly onto a hyperscaler’s capex line. But even if you haircut the private number aggressively, the order-of-magnitude gap in political commitment is still the story.
And yes: there is a NAIRR Pilot.8 The point of the Pilot is to learn. The point of a floor is to exist. The Pilot is not a durable entitlement layer at the scale the Task Force described, and it doesn’t solve the procurement reality of supply-constrained silicon. The private buildout is ahead of schedule; the public floor is still a proposal.
This concentration isn’t inherently evil. Private investment built capacity faster than any government program could. But it means that access to compute, the precondition for participating in technical AI work, is increasingly governed by the pricing decisions and terms of service of a handful of companies answering to shareholders, not publics.
The question isn’t whether concentration happened. It’s what we do about it.
When I say “the right to compute,” I don’t mean a right to unlimited GPUs on demand. I mean a right to a baseline on-ramp: enough to learn, prototype, and participate. Paired with due process: published rules, consistent enforcement, and an appeal path when you’re denied. Not an open bar. A floor. And a constitution for the queue.
The Objection That Matters
The obvious argument for public compute is equity: the student who can’t pay, the clinic that can’t bid, the city planner priced out of running flood simulations. This argument is true but incomplete. It invites the response: subsidize the edges, let markets do the rest.
The real argument is about governance. And the strongest objection to public compute isn’t that markets work. It’s that public systems will be captured by the same institutions that already dominate research.
This objection is serious. Consider:
Cumulative advantage in science funding is well documented: the “Matthew effect,” where past success predicts future awards.9 NSF allocations favor teams with track records and proposal-writing machinery. Even INCITE, DOE’s flagship allocation program, reserves 10% of allocatable time for an early-career track. That is a tacit admission that proposal review at scale tends to concentrate access among established groups.10 Public compute could easily become another channel for the already-resourced to entrench their position.
The EU’s AI Factories, with their 19 sites and “free and fast access” language, could end up serving the same well-connected universities that already dominate European research funding.11 The UK’s AIRR could become a perk for Russell Group institutions while everyone else waits.
And queues can be gamed. Political pressure creates hidden fast lanes. Powerful PIs get their jobs expedited with a phone call. The scheduler looks neutral on paper while the actual policy is “know the right people.” We’ve seen this before. We’ll see it again.
So why build public compute at all?
The Answer to Capture
The problem isn’t that queues encode policy. They always do. The problem is that queues encode policy while pretending to be neutral technical systems.
A private cloud’s queue is priced. That’s a policy: access goes to those who can pay, with tiers for those who can pay more. But the policy is visible. You know the rules. You can decide whether to play.
A research cluster’s queue runs fairshare. That’s also a policy: recent heavy users wait, light users advance. But the weights are often opaque. How much does job size matter? How does priority decay? What happens when two users at the same fairshare level compete? The scheduler decides, and the decisions are buried in configuration files that users never see.
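“How does priority decay?” is one of those buried dials. The sketch below assumes exponential decay with a configurable half-life; Slurm exposes a comparable setting, PriorityDecayHalfLife. The burst size and the half-lives are hypothetical.

```python
# Exponentially decayed usage: each past burst counts for 0.5 ** (age / half_life).
# The half-life is a site setting; the values used here are hypothetical.

def decayed_usage(events: list[tuple[float, float]], now_day: float, half_life_days: float) -> float:
    """events: (day_used, gpu_hours) pairs; returns the usage the scheduler still 'remembers'."""
    return sum(
        gpu_hours * 0.5 ** ((now_day - day) / half_life_days)
        for day, gpu_hours in events
        if day <= now_day
    )

burst = [(0.0, 1_000.0)]  # one 1,000 GPU-hour burst on day 0
for half_life in (1.0, 7.0, 30.0):
    remembered = decayed_usage(burst, now_day=14.0, half_life_days=half_life)
    print(f"half-life {half_life:>4.0f} days: {remembered:7.1f} GPU-hours still counted on day 14")
```

Under a one-day half-life, a lab that burned 1,000 GPU-hours two weeks ago is effectively forgotten. Under a thirty-day half-life, more than 70% of that usage still counts against it. The cluster and the history are identical; only the policy differs.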
The constitutional move isn’t to eliminate policy from queues. It’s to make the policy visible, contestable, and accountable.
Imagine a queue where you can see your position, the reason for it, and what would change it. Where new users get explicit priority, because a public option that makes you prove you deserve access before you can learn is just credentialism with extra steps. Where the scheduling weights are published, the top decile’s share of capacity is tracked, and when someone jumps ahead for a health emergency you see the justification and receive a credit for next cycle.
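Here is one way “see your position, the reason for it, and what would change it” could look in practice. The priority score is broken into published per-factor contributions, with an explicit first-contact boost. An emergency lane writes its justification to an audit log and credits everyone it displaced. The weights, the size of the boost, the credit amount, and the names `explain` and `grant_emergency` are all hypothetical design choices.

```python
from dataclasses import dataclass

# Hypothetical published weights: the point is that they are printed, not buried.
PUBLISHED_WEIGHTS = {"fairshare": 10_000, "age": 1_000, "first_contact": 20_000}
MAX_AGE_HOURS = 168.0

@dataclass
class Request:
    user: str
    fairshare: float      # account-level factor in [0, 1]
    hours_waiting: float
    prior_jobs: int       # 0 means this is the user's first job
    credits: float = 0.0  # owed when an emergency jumped ahead of this user

def explain(req: Request) -> dict[str, float]:
    """Per-factor contributions, so a user can see why she sits where she does."""
    parts = {
        "fairshare": PUBLISHED_WEIGHTS["fairshare"] * req.fairshare,
        "age": PUBLISHED_WEIGHTS["age"] * min(req.hours_waiting / MAX_AGE_HOURS, 1.0),
        "first_contact": PUBLISHED_WEIGHTS["first_contact"] if req.prior_jobs == 0 else 0.0,
        "credit": req.credits,
    }
    parts["total"] = sum(parts.values())
    return parts

def grant_emergency(queue: list[Request], beneficiary: str, justification: str,
                    credit: float, audit_log: list[str]) -> None:
    """Let one job jump the line, but log why and credit everyone it displaced."""
    displaced = [r for r in queue if r.user != beneficiary]
    for r in displaced:
        r.credits += credit
    audit_log.append(f"EMERGENCY {beneficiary}: {justification}; "
                     f"credit {credit:.0f} to {len(displaced)} users")

newcomer = Request("first_year", fairshare=0.25, hours_waiting=1.0, prior_jobs=0)
veteran = Request("senior_pi", fairshare=0.9, hours_waiting=1.0, prior_jobs=412)
log: list[str] = []
grant_emergency([newcomer, veteran], "ward_model", "hospital surge forecast", 500, log)
for req in (newcomer, veteran):
    print(req.user, {k: round(v) for k, v in explain(req).items()})
print(log[0])
```

The first-contact weight is deliberately large enough to outweigh an inherited fairshare debt. Whether that is the right number is exactly the kind of argument published weights are meant to invite.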
The thresholds would be arguable. Demand spikes would stress soft guarantees. Determined actors would probe for loopholes. The point isn’t to get the numbers right on the first try. It’s to make them visible so they can be argued about and adjusted.
This doesn’t prevent capture. But it makes capture auditable. You can point to the logs and say: this policy claims to prioritize new users, but median first-contact wait times have tripled this quarter. What happened?
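Once first-contact wait times are logged, the audit described here takes only a few lines of code. The sketch below assumes per-quarter lists of wait times for users’ first-ever jobs. The data and the 2x alert threshold are hypothetical.

```python
from statistics import median

def first_contact_alerts(waits_by_quarter: dict[str, list[float]], ratio: float = 2.0) -> list[str]:
    """Flag any quarter whose median first-job wait is `ratio`x or more the prior quarter's.
    Assumes the dict's insertion order is chronological."""
    quarters = list(waits_by_quarter)
    alerts = []
    for prev, cur in zip(quarters, quarters[1:]):
        before = median(waits_by_quarter[prev])
        after = median(waits_by_quarter[cur])
        if before > 0 and after / before >= ratio:
            alerts.append(f"{cur}: median first-contact wait {before:.1f}h -> {after:.1f}h "
                          f"({after / before:.1f}x)")
    return alerts

# Hypothetical wait times (hours) for users' first-ever jobs.
waits = {
    "2025Q1": [0.5, 1.0, 1.5, 2.0, 0.8],
    "2025Q2": [2.5, 3.0, 4.0, 3.5, 2.0],
}
for alert in first_contact_alerts(waits):
    print(alert)
```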
The Access Gap Shapes the Field
Here’s what I didn’t understand until I looked at who’s in the room for AI safety debates: the people discussing how to align AI systems are overwhelmingly the people who have compute to train them.
This isn’t a conspiracy. It’s selection. If you want to do empirical alignment research, you need to run experiments, which means you need GPUs. The researchers who can get GPUs are at well-funded labs or well-resourced universities. The researchers who can’t are… not in the conversation.
The result is that the values being encoded into AI systems are the values of people who already have access. The research agenda is shaped by what’s tractable for people with compute. The safety concerns that get attention are the ones legible to insiders.
Public compute wouldn’t fix this directly. But it would change who can participate in the conversation. A graduate student at a regional university with a genuinely novel idea about interpretability could actually test it. A nonprofit working on AI applications for public defenders could train a model without begging a hyperscaler for donated credits. The field’s Overton window would widen because different people would be inside it.
This is the non-obvious case for public compute: the access gap isn’t just unfair, it’s distorting. We’re building transformative technology, and the builders are a narrow slice of humanity, selected for resource access rather than insight.
One Story
Dr. Elena Kovacs is a composite; the details and locations are altered, but the situation is real and recurring. She’s an internist at a federally qualified health center in rural Wisconsin. Her clinic network serves about 28,000 patients across four counties, most on Medicaid or uninsured. Two years ago, she taught herself Python. Last year, she built a model to flag early sepsis from EHR vitals and labs: temperature trends, heart rate variability, white cell trajectories. She trained it on borrowed compute: a colleague at a nearby R1 had unused GPU hours on an NIH-funded allocation and let her run jobs at 3 a.m. when contention was low.
The model works. In her validation cohort, it flags sepsis eight hours earlier than the standard SIRS criteria, with fewer false positives. She presented a poster at an informatics conference. Two other rural health networks in the upper Midwest asked if they could deploy it.
She can’t share it.
The weights were trained on time borrowed from a grant that didn’t cover this project. The cloud provider’s terms of service are ambiguous about whether model derivatives can be redistributed for clinical use. Her health center’s compliance officer won’t approve distribution without legal clarity. The university’s tech transfer office sent a letter asserting potential IP interest because their allocation was used, even though no university employee touched the code. There’s no clean chain of custody, no governance framework designed for this situation, no way to say definitively: these weights were trained on public resources and can be shared freely for public benefit.
So a model that catches sepsis eight hours early sits on a hard drive. The other networks will either reinvent it (some other physician teaching herself Python, scrounging borrowed compute, running training jobs at 3 a.m.) or they won’t. Either way, the early warning stays locked in one clinic while patients elsewhere wait for the blood cultures that Elena’s model could have prompted at shift change.
This is exactly what the NAIRR Task Force’s “NAIRR-Secure” concept was designed for: secure enclaves for legally protected health data, governed by the Five Safes framework, with protocols for privacy, civil rights, and stewardship of outputs, including what can be exported and under what review.12 The Task Force left specific IP and licensing defaults to implementation, so a public option should set them explicitly. For NAIRR-Open, outputs should be open by default, with weights, code, and results belonging to the researcher. For NAIRR-Secure, export should be reviewed under standardized data-use and IP terms, so Elena knows before she starts whether she can share. The architecture exists on paper. The appropriations don’t match the ambition. So Elena’s model sits in a drawer, and her neighbors keep using screening criteria from 1992.
The Sustainability Constraint
One more thing a public queue can do that a private queue can’t: govern its own footprint.
Data centers consumed roughly 1.5% of global electricity in 2024 (415 terawatt-hours), and demand has grown about 12% a year since 2017.13 Large facilities can use up to 5 million gallons of water per day.14 The metrics to measure this exist: PUE (power usage effectiveness) and WUE (water usage effectiveness) are ISO-standardized.15 The EU now requires data center sustainability reporting, with public disclosure at least in aggregated form.16
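Below are the two ratios defined by the ISO/IEC standards cited in the footnotes, in simplified form. The standards also fix measurement boundaries and reporting periods, which are omitted here.

```latex
\mathrm{PUE} = \frac{E_{\text{total facility}}}{E_{\text{IT equipment}}} \ \ (\geq 1),
\qquad
\mathrm{WUE} = \frac{W_{\text{site}}\ [\mathrm{L}]}{E_{\text{IT equipment}}\ [\mathrm{kWh}]}
```

A facility at PUE 1.2 draws 1.2 kWh from the grid for every kWh that reaches the servers.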
A public compute option can make these constraints binding. Every job carries a carbon cost. That cost is visible to the user and counts against a sustainability budget. Flexible jobs (the ones that can run tonight instead of right now, or in a region with cleaner grid mix) get priority. The queue becomes a mechanism for governing externalities, not just access.
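Here is a minimal sketch of a job that “carries a carbon cost.” It assumes an hourly grid-intensity forecast, a fixed PUE to account for facility overhead, and a site-wide carbon budget. All numbers are hypothetical. The model is also deliberately crude: it prices the whole run at the intensity of its start hour.

```python
from dataclasses import dataclass

@dataclass
class CarbonJob:
    name: str
    energy_kwh: float        # estimated IT energy for the whole run
    earliest_hour: int
    latest_start_hour: int   # equal to earliest_hour for "right now" jobs

class CarbonBudget:
    """A pool of kg CO2 that every job is charged against."""
    def __init__(self, kg: float) -> None:
        self.remaining = kg

    def charge(self, kg: float) -> None:
        if kg > self.remaining:
            raise ValueError(f"job needs {kg:.1f} kg CO2; only {self.remaining:.1f} kg left")
        self.remaining -= kg

def schedule(job: CarbonJob, forecast_g_per_kwh: list[float], budget: CarbonBudget,
             pue: float = 1.2) -> tuple[int, float]:
    """Start the job in the lowest-intensity hour of its window and charge the budget.
    Simplification: the start hour's intensity is assumed to hold for the whole run."""
    window = range(job.earliest_hour, job.latest_start_hour + 1)
    start = min(window, key=lambda hour: forecast_g_per_kwh[hour])
    kg = job.energy_kwh * pue * forecast_g_per_kwh[start] / 1000  # g -> kg, incl. facility overhead
    budget.charge(kg)
    return start, kg

# Hypothetical hourly grid-intensity forecast (g CO2/kWh): cleaner late in the evening.
forecast = [400.0] * 24
forecast[22], forecast[23] = 150.0, 120.0

budget = CarbonBudget(kg=1_000.0)
now_job = CarbonJob("needs results now", energy_kwh=100.0, earliest_hour=17, latest_start_hour=17)
flex_job = CarbonJob("can wait until tonight", energy_kwh=100.0, earliest_hour=17, latest_start_hour=23)
for job in (now_job, flex_job):
    hour, kg = schedule(job, forecast, budget)
    print(f"{job.name}: start {hour}:00, {kg:.1f} kg CO2")
print(f"budget remaining: {budget.remaining:.1f} kg")
```

The flexible job costs less than a third of the carbon for the same work. A public queue could turn that difference into priority, or into a smaller charge against the user’s allocation.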
Private clouds can do carbon-aware scheduling. Some do. But they’re not accountable to anyone for the tradeoffs. A public option can publish the dials: here’s how much carbon we’re burning, here’s the marginal cost of running your job now versus overnight, here’s the share of our capacity going to water-stressed regions. The sustainability is auditable, like the access.
What It Takes
A baseline entitlement. Not a lottery, not a grant competition, but a floor. Enough compute to learn, to prototype, to participate. The number will be contested. Start somewhere. Adjust.
First-contact priority. The student’s first job runs fast. Merit review comes later, once she knows whether this is something she wants to pursue. You don’t make people prove they deserve access before they can discover what access enables.
Federated and strategic capacity. Federation should mostly mean a unified access plane: one identity, one allocation ledger, one policy layer, across many resource providers. Most jobs still run on a single site with a contiguous cluster; truly cross-site distributed training is a separate hard problem and shouldn’t be the default assumption.
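The sketch below models “one identity, one allocation ledger, one policy layer” over several providers. It follows the text’s assumption that each job runs whole on a single site. The site names, the GPU-hour ledger, and the most-free-GPUs placement rule are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    name: str
    free_gpus: int

@dataclass
class Federation:
    """One identity and one allocation ledger across many providers (all names hypothetical)."""
    sites: list[Site]
    ledger: dict[str, float] = field(default_factory=dict)  # user -> remaining GPU-hours

    def submit(self, user: str, gpus: int, hours: float) -> str:
        cost = gpus * hours
        if self.ledger.get(user, 0.0) < cost:
            raise PermissionError(f"{user}: allocation too small for {cost:.0f} GPU-hours")
        # Place the whole job on ONE site with enough free GPUs; never split it across sites.
        candidates = [s for s in self.sites if s.free_gpus >= gpus]
        if not candidates:
            raise RuntimeError(f"no single site has {gpus} free GPUs right now")
        site = max(candidates, key=lambda s: s.free_gpus)  # placement rule is an assumption
        site.free_gpus -= gpus
        self.ledger[user] -= cost
        return site.name

fed = Federation(sites=[Site("regional_u", 8), Site("national_lab", 64)],
                 ledger={"student": 500.0})
print(fed.submit("student", gpus=32, hours=4.0))   # fits only at national_lab
print(fed.ledger["student"])                        # 500 - 128
try:
    fed.submit("student", gpus=40, hours=1.0)      # 8 + 32 free in total, but no single site has 40
except RuntimeError as err:
    print(err)
```

The last request fails even though the federation has 40 free GPUs in total, because no single site has 40. That is the contiguous-cluster constraint in miniature.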
But the harder constraint is physical. If the public option is serious, it can’t be built on leftover capacity and donated credits. In a tight supply environment, “writing a check” is not the same as securing silicon. A durable compute floor requires strategic procurement: at the mild end, long-term offtake and joint purchasing that can actually reserve supply; at the strong end, prioritization tools that governments already use for critical infrastructure during shortages. The point is not to nationalize anything. It’s to acknowledge that a public floor is infrastructure, and infrastructure requires supply certainty.
Supply certainty is necessary but not sufficient. Access isn’t only a scheduling or procurement problem. It’s an on-ramp problem.
Human middleware. Access to a GPU is useless if you don’t know how to containerize a workload or optimize for throughput. A public option that simply hands out logins tends to be captured by those who already possess deep technical literacy. We need a corps of Research Software Engineers (RSEs) whose explicit mandate is to unblock new users, optimize inefficient code, and provide the “on-ramp” support that a commercial cloud’s sales engineering team provides to enterprise clients. If the interface is just a raw SSH terminal, we haven’t democratized anything; we’ve just subsidized the computer science department.
Visible governance. Publish the scheduling weights. Track top-decile share. Report wait times by user class. Make capture auditable.
An accountable operating entity. Someone has to set the weights, hear appeals, and decide when emergency lanes are justified. The NAIRR blueprint proposes an Operating Entity; the question is what constraints it operates under. At minimum: published meeting minutes, independent annual audit, formal appeals process with response SLAs, statutory limits on policy changes without public comment, and a governance board that includes researchers, public-interest representatives, and rotating members from underrepresented institution types. The goal isn’t to eliminate discretion but to make discretion visible and contestable.
A minimum viable year one. If you want this to be real, build the smallest version that changes who can participate:
- Stand up the Operating Entity with published queue policy from day one.
- Launch first-contact access broadly (the learning/prototyping floor), with dashboards that can embarrass you if it fails.
- Pilot NAIRR-Secure with a small number of high-need enclaves (health, education, public administration) so outputs like Elena’s aren’t trapped in legal fog.
- Staff the RSE on-ramp like it matters, because it does.
- Publish monthly metrics: median first-contact start time, 90th-percentile wait time, top-decile share, emergency-lane usage, and sustainability burn.
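Most of these numbers can be computed directly from an ordinary job log. The sketch below assumes a hypothetical per-job record schema and a nearest-rank 90th percentile. It takes “top decile” to mean the top 10% of users by GPU-hours consumed.

```python
import math
from collections import defaultdict
from statistics import median

def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

def monthly_metrics(jobs: list[dict]) -> dict[str, float]:
    """Each job: user, wait_hours, gpu_hours, first_job, emergency, carbon_kg (hypothetical schema)."""
    per_user: dict[str, float] = defaultdict(float)
    for job in jobs:
        per_user[job["user"]] += job["gpu_hours"]
    usage = sorted(per_user.values(), reverse=True)
    top_n = max(1, math.ceil(len(usage) / 10))  # users in the top decile
    return {
        # raises StatisticsError if no first-time jobs ran this month
        "median_first_contact_wait_h": median(j["wait_hours"] for j in jobs if j["first_job"]),
        "p90_wait_h": percentile([j["wait_hours"] for j in jobs], 90),
        "top_decile_share": sum(usage[:top_n]) / sum(usage),
        "emergency_lane_share": sum(j["emergency"] for j in jobs) / len(jobs),
        "carbon_kg": sum(j["carbon_kg"] for j in jobs),
    }

# Hypothetical month: one heavy lab plus nine newcomers running small first jobs.
jobs = [{"user": "big_lab", "wait_hours": 0.5, "gpu_hours": 900.0,
         "first_job": False, "emergency": False, "carbon_kg": 300.0}]
jobs += [{"user": f"newcomer{i}", "wait_hours": float(i), "gpu_hours": 10.0,
          "first_job": True, "emergency": i == 9, "carbon_kg": 3.0} for i in range(1, 10)]

for name, value in monthly_metrics(jobs).items():
    print(f"{name}: {value:.2f}")
```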
Don’t start with “a grant program for excellence.” Start with a floor that lets new people try.
Durable funding. The E-Rate program has funded school and library connectivity for nearly 30 years through the Universal Service Fund, a small, broadly shared contribution mechanism overseen by the FCC.17 The model matters more than the exact dollar figure: a dedicated, boring, durable revenue stream that turns “connectivity for schools” from a yearly budget fight into infrastructure. Compute can follow the same pattern. The CREATE AI Act would establish NAIRR formally.18 The blueprint exists. The hard part is treating it like the infrastructure it is.
The Real Stakes
The case for public compute isn’t that private provision is evil. It’s that the queue is where we decide who gets to participate in technical work, and we’re currently outsourcing that decision to a handful of companies’ terms of service.
The student who closes her laptop at 5:47 isn’t failing a class. She’s being sorted out of a field. The clinic with the life-saving model in a drawer isn’t just frustrated; it’s part of a pattern where the benefits of AI accrue to the resourced while the underserved wait. The city planner who can’t run the better flood simulation isn’t just inconvenienced. He’s making a $40 million bond decision with worse information than he could have.
These are governance failures, not market failures. The market is working as designed. If we want different outcomes, we need a public option that encodes different values, visibly, so we can argue about them in the open.
The student closes her laptop. Queue position: 1,847. Estimated wait: 11 hours.
Somewhere in Wisconsin, Elena checks the hard drive where her model sits. Still there. Still unshareable.
Right now, the queue is policy that no one voted for, no one can audit, and no one the student could talk to has the power to change.
Footnotes
1. Slurm’s multifactor priority documentation describes fairshare as prioritizing jobs from accounts that are “under-serviced” relative to their allocated share, with usage tracked at the account level. The SchedMD “Priority and Fair Trees” documentation explicitly illustrates how hierarchical fairshare can cause a new user to inherit their group’s accumulated usage debt. See SchedMD, “Multifactor Priority Plugin” and “Priority and Fair Trees.”
2. NASA’s High-End Computing Capability documentation explicitly states that “larger jobs are favored” in their PBS scheduling configuration, explaining that this improves overall system utilization. See NASA Advanced Supercomputing, “How PBS Schedules Jobs.”
3. OpenAI, “Announcing The Stargate Project,” January 21, 2025. OpenAI describes Stargate as “a new company which intends to invest $500 billion over the next four years building new AI infrastructure for OpenAI in the United States” and states it will “begin deploying $100 billion immediately.” The initial partners are OpenAI, SoftBank, Oracle, and MGX.
4. OpenAI, “OpenAI, Oracle, and SoftBank expand Stargate with five new AI data center sites,” September 23, 2025. The announcement states that five new sites bring Stargate to “nearly 7 gigawatts of planned capacity and over $400 billion in investment over the next three years,” putting them “on a clear path to securing the full $500 billion, 10-gigawatt commitment we announced in January by the end of 2025, ahead of schedule.” The flagship Abilene, Texas facility is described as “already up and running.”
5. xAI describes Colossus as “the world’s largest supercomputer” and states it was “brought online in 122 days” on its corporate website. Reuters (December 30, 2025) separately reports that xAI is acquiring additional buildings in Memphis to expand capacity, with plans targeting nearly 2 gigawatts of training capacity and scaling beyond one million GPUs.
6. These figures are drawn from company statements and financial reporting. Microsoft: Brad Smith, “The golden opportunity for American AI,” Microsoft Blog, January 3, 2025, states the company plans to invest “approximately $80 billion” in FY2025 on AI-enabled data centers. Alphabet: Reuters (October 29, 2025) reports Alphabet raised its 2025 capex guidance to $91-93 billion. Amazon: Reuters (October 30, 2025) reports capex guidance suggesting approximately $125 billion for 2025, driven largely by AWS and AI infrastructure. Meta: Meta Investor Relations, “Third Quarter 2025 Results,” states 2025 capex outlook of $70-72 billion. These figures represent total capex, not AI-specific spending, but company statements indicate AI infrastructure is the primary driver of the increases.
7. NAIRR Task Force, “Strengthening and Democratizing the U.S. Artificial Intelligence Innovation Ecosystem: An Implementation Plan for a National Artificial Intelligence Research Resource,” January 2023. The report estimates $2.6 billion over an initial six-year period, including $2.25 billion for resource providers, $55-65 million per year for an Operating Entity, and $5 million per year for evaluation. It also describes a refresh cadence requiring new $750 million investments every two years to keep resources state-of-the-art. The Task Force explicitly argues that the concentration of compute resources in well-resourced organizations “constrains diversity” and limits “public-sector capability.”
8. NSF describes the NAIRR Pilot (launched January 2024) as a proof-of-concept intended to refine the design of a full NAIRR. The Pilot is time-limited and structured around contributed resources and learning-by-doing, not a statutory, appropriated compute entitlement at the scale envisioned in the Task Force implementation plan.
9. The “Matthew effect” in science funding, where past success predicts future awards independent of proposal quality, is well documented. See Bol, Thijs, et al., “The Matthew effect in science funding,” PNAS 115(19), 2018. The paper finds that among equally qualified applicants, early grant recipients accumulate significantly more funding and publications over time.
10. INCITE awards can be multi-year with annual renewals, structurally favoring persistence. The program explicitly reserves capacity for an early-career track (10% of allocatable time in the 2025 awards) as an anti-capture mechanism. See Oak Ridge Leadership Computing Facility, “INCITE Program Awards Supercomputing Time to 81 High-Impact Projects,” November 2024.
11. EuroHPC Joint Undertaking, “Five Years of the EuroHPC Joint Undertaking,” December 2025. The AI Factories initiative now includes 19 AI Factories and 13 AI Factory Antennas across EU member states, with stated goals of providing “free and fast access” to AI-optimized compute for researchers, startups, SMEs, and public-sector bodies. Whether this access is genuinely broad or captured by established institutions remains to be seen as the program scales.
12. The NAIRR Task Force report proposes two primary resource zones: NAIRR-Open for general research access and NAIRR-Secure for work involving legally protected data. NAIRR-Secure would provide “secure enclaves” governed by privacy and civil-rights protections, explicitly referencing the Five Safes framework (safe projects, safe people, safe settings, safe data, safe outputs) developed for health and census data. The framework addresses data stewardship, export review, and conditions for sharing derivatives, though specific IP and licensing defaults would need to be established in implementation.
13. International Energy Agency, “Energy and AI,” published April 10, 2025. The report states that data centers accounted for approximately 1.5% of global electricity consumption in 2024 (415 TWh), with demand growing at roughly 12% annually since 2017. The IEA notes that AI workloads are significantly more energy-intensive than traditional data center operations, and projects continued rapid growth through 2030.
14. Brookings Institution, “AI, data centers, and water,” November 2025. The report states that a “typical” data center uses approximately 300,000 gallons of water per day for cooling, while large hyperscale facilities can consume up to 5 million gallons daily. Water usage varies significantly by cooling technology and local climate, with evaporative cooling systems in hot climates representing the highest consumption.
15. ISO/IEC 30134-2:2016 defines Power Usage Effectiveness (PUE) as the ratio of total facility energy to IT equipment energy. A PUE of 1.0 would mean all energy goes to computing, while typical values range from 1.2 to 2.0. ISO/IEC 30134-9:2022 defines Water Usage Effectiveness (WUE) in liters per kilowatt-hour. These standards enable consistent measurement and comparison across facilities, and are already used in industry sustainability reporting.
16. Commission Delegated Regulation (EU) 2024/1364 of 14 March 2024 establishes an EU-wide scheme for rating data center sustainability. Operators must make required information public and communicate KPIs to a European database. The database is publicly available in aggregated form, while individual data-center information may be protected where confidentiality or trade secrets apply.
17. FCC, “E-Rate: Universal Service Program for Schools and Libraries.” E-Rate was established by the Telecommunications Act of 1996 and is funded through the Universal Service Fund (USF), which collects contributions from telecommunications providers (often passed through to consumers). The FCC sets an inflation-adjusted annual E-Rate funding cap (for Funding Year 2025: $5,058,637,966). The USF overall raises roughly $8-9 billion annually across its programs (E-Rate plus others). In FCC v. Consumers’ Research, No. 24-354 (June 27, 2025), the Supreme Court upheld the constitutionality of the universal-service contribution scheme.
18. H.R. 2385, the CREATE AI Act of 2025, was introduced in March 2025 to formally establish the National AI Research Resource as authorized federal infrastructure. The White House “America’s AI Action Plan” (July 2025) separately calls for building “a lean and sustainable NAIRR operations capability.” Legislative authorization and executive intent both exist; sustained appropriations do not.