Appendix III

Frontier Capability and Deployment Evidence

A documentary record as of May 2026 (version 1.1, comprehensive corpus stress-test)

From The Discontinuity Thesis · v1.1.2

This appendix anchors the thesis in current evidence. The body essays make the structural argument; that argument rests empirically on benchmark trajectories and deployment data that continued to accumulate while the thesis was being drafted and through the comprehensive corpus stress-test conducted at the version 1.1 lock.

The function of this appendix is to make the empirical foundation visible in compact form. It is not exhaustive. New launches and deployments will continue. The appendix is dated to May 2026 and may be updated in subsequent editions.

The appendix is organised in nine sections. Benchmark trajectories. Vendor framing of capability thresholds. Restricted-release frontier capability. Enterprise deployment evidence. The structural pattern in deployed CEO language. The No-Scream Principle in the audible register. The model providers state the structural claim. Residual uncertainties. How to read this appendix.

Section one: benchmark trajectories

The thesis is anchored in benchmark families that track distinct capabilities across the providers shipping frontier models in the GPT-5 through GPT-5.5 window. Version 1.1 expands coverage from the OpenAI-anchored selective record in v1.0 to a multi-provider documentary record across OpenAI, Anthropic, Google DeepMind, xAI, Mistral, and Meta.

GDPval and GDPval-AA. GDPval measures AI performance against industry professionals on real professional work products across forty-four occupations.[1] Domain experts grade the benchmark using blind pairwise comparison, and the headline metric is the share of tasks on which the model’s deliverable wins against or ties with the expert human deliverable. GDPval-AA is the Artificial Analysis variant, which reports Elo ratings rather than win rates. The two benchmarks use overlapping but not identical evaluation procedures and should be read together rather than as a single trajectory.

Model | Date | GDPval (wins-or-ties) | GDPval-AA (Elo)
GPT-5 | Aug 2025 | 38.8% | –
Claude Opus 4.1 | Sept 2025 | 47.6% | –
GPT-5.2 Thinking | Dec 2025 | 70.9% | –
GPT-5.2 Pro | Dec 2025 | 74.1% | –
Claude Opus 4.6 | Feb 2026 | – | leader (≈ +144 Elo over GPT-5.2)
Gemini 3.1 Pro | Feb 2026 | 67.3% | 1314
GPT-5.4 | Mar 2026 | 83.0% | 1674
Claude Opus 4.7 | Apr 2026 | 80.3% (per OpenAI table) | 1753 (state-of-the-art)
Grok 4.3 | Apr 2026 | – | 1500
GPT-5.5 | Apr 2026 | 84.9% | –

The 80.3 percent figure for Claude Opus 4.7 in the GPT-5.5 launch comparison table is the OpenAI-run number.[2] Anthropic’s own framing places Opus 4.7 at the state of the art on GDPval-AA, at Elo 1753, ahead of GPT-5.4 at 1674 and Gemini 3.1 Pro at 1314.[3] Both numbers describe the same capability frontier under different evaluation procedures.
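Because GDPval-AA reports Elo ratings rather than win rates, a rating gap is easier to read once converted into an expected head-to-head score. The sketch below assumes the conventional logistic Elo formula on a 400-point scale; whether Artificial Analysis uses exactly this parameterisation is an assumption, not something the cited sources state. The ratings are the ones quoted in this section.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    Assumption: GDPval-AA ratings follow this conventional 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings quoted in the GDPval-AA column above.
ratings = {
    "Claude Opus 4.7": 1753,
    "GPT-5.4": 1674,
    "Grok 4.3": 1500,
    "Gemini 3.1 Pro": 1314,
}

leader = "Claude Opus 4.7"
for rival, rating in ratings.items():
    if rival == leader:
        continue
    p = elo_expected_score(ratings[leader], rating)
    print(f"{leader} vs {rival}: expected score {p:.2f}")

# The +144 Elo margin reported for Claude Opus 4.6 over GPT-5.2.
print(f"+144 Elo gap: expected score {elo_expected_score(144, 0):.2f}")
```

On this reading, the 79-point lead over GPT-5.4 corresponds to an expected score of about 0.61, and the +144 margin reported for Opus 4.6 to about 0.70.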

The trajectory shows GDPval performance rising from 38.8 percent to 84.9 percent across nine months and across two of the major frontier providers, with a third provider (Anthropic) leading on the Artificial Analysis variant and a fourth (xAI) joining the GDPval-AA top tier in April 2026. Knowledge-work performance is now within striking distance of expert human work-product across forty-four occupations on a multi-provider basis.

OSWorld-Verified. OSWorld-Verified measures AI ability to operate real desktop computer environments through screenshots and keyboard/mouse actions, completing multi-step tasks across applications. The reported human baseline on the benchmark is 72.4 percent.

Model | Date | OSWorld-Verified
Claude Sonnet 4.5 | Sept 2025 | 61.4%
GPT-5.2 | Dec 2025 | 47.3%
Claude Sonnet 4.6 | Feb 2026 | 72.5%
Claude Opus 4.6 | Feb 2026 | 72.7%
GPT-5.4 | Mar 2026 | 75.0%
GPT-5.5 | Apr 2026 | 78.7%

In the results tabulated above, two frontier providers, Anthropic and OpenAI, crossed the human baseline of 72.4 percent, in February and March 2026 respectively. Interface-operation capability is no longer a frontier-research domain.
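The table can be checked against the reported baseline mechanically. The short sketch below normalises the dates to year-month strings and adds a provider label to each row for illustration (the table itself has no provider column), then finds the first model from each listed provider that exceeds 72.4 percent.

```python
# OSWorld-Verified scores from the table above: (model, provider, date, score %).
HUMAN_BASELINE = 72.4  # reported human baseline, percent

results = [
    ("Claude Sonnet 4.5", "Anthropic", "2025-09", 61.4),
    ("GPT-5.2", "OpenAI", "2025-12", 47.3),
    ("Claude Sonnet 4.6", "Anthropic", "2026-02", 72.5),
    ("Claude Opus 4.6", "Anthropic", "2026-02", 72.7),
    ("GPT-5.4", "OpenAI", "2026-03", 75.0),
    ("GPT-5.5", "OpenAI", "2026-04", 78.7),
]


def first_crossing_by_provider(rows, baseline):
    """Return {provider: (date, model, score)} for each provider's earliest row above baseline."""
    crossings = {}
    # sorted() is stable, so same-month rows keep their table order.
    for model, provider, date, score in sorted(rows, key=lambda r: r[2]):
        if score > baseline and provider not in crossings:
            crossings[provider] = (date, model, score)
    return crossings


for provider, (date, model, score) in first_crossing_by_provider(results, HUMAN_BASELINE).items():
    print(f"{provider}: {model} ({date}) at {score}%, {score - HUMAN_BASELINE:+.1f} pts")
```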

SWE-Bench Verified and SWE-Bench Pro. Real-world software engineering performance on GitHub issue resolution.

Model | Date | SWE-Bench Verified | SWE-Bench Pro
Claude 3.7 Sonnet | Feb 2025 | 62.3% (70.3% with scaffold) | –
GPT-5 | Aug 2025 | 74.9% | –
Claude Sonnet 4.5 | Sept 2025 | 77.2% | –
Claude Opus 4.5 | Nov 2025 | 80.9% (first to cross 80%) | –
GPT-5.2 | Dec 2025 | – | 55.6%
Claude Sonnet 4.6 | Feb 2026 | 79.6% | –
Claude Opus 4.6 | Feb 2026 | 80.8% | –
GPT-5.3-Codex | Feb 2026 | – | 56.8%
GPT-5.4 | Mar 2026 | – | 57.7%
GPT-5.5 | Apr 2026 | – | 58.6%

Performance on SWE-Bench Verified crossed 80 percent in November 2025 with Anthropic’s Opus 4.5, the first frontier model to do so. On the more demanding SWE-Bench Pro, frontier models now resolve more than half of the benchmark’s GitHub issue-resolution tasks end-to-end. The Anthropic and OpenAI trajectories track each other across model generations.

Tau2-bench Telecom. Multi-turn customer-support workflows.

Model | Tau2-bench Telecom
GPT-5.2 | 98.7%
GPT-5.4 | 98.9%
GPT-5.5 | 98.0%

Effective saturation. Customer support is one of the largest cognitive labour categories globally. The capability required to perform multi-turn customer support work has been reached and is now stable across model generations.

Terminal-Bench 2.0. Agentic coding evaluation.

Model | Terminal-Bench 2.0
GPT-5.2-Codex | 64.0%
Claude Opus 4.6 | 65.4% (max effort)
GPT-5.3-Codex | 77.3%

Reasoning benchmarks at the frontier. Anthropic’s Opus 4.6 reached 94.0 percent on ARC-AGI-1 and 69.2 percent on ARC-AGI-2 at high reasoning effort, state-of-the-art on both at the time of the launch. Gemini 3.1 Pro reached 77.1 percent on ARC-AGI-2.

The cross-provider pattern across these benchmark families is convergence toward and through the human-expert baseline on professional cognitive work, computer use, customer support, and agentic coding. The unit cost dominance condition is no longer a single-vendor or single-benchmark phenomenon.

Section two: vendor framing of capability thresholds

The model providers’ own framing matters because it transfers the empirical burden from the thesis to the vendor.

OpenAI on GPT-5.2: “the first model that performs at or above a human expert level” on benchmarked professional knowledge work.[4] GPT-5.2 Thinking “produced outputs for GDPval tasks at >11x the speed and <1% the cost of expert professionals.” The cost figure is a model-only inference cost and explicitly excludes human oversight, iteration, and integration. The 11x speed and <1% cost claim is the empirical anchor for the unit cost dominance arithmetic.
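The exclusion matters for the arithmetic. A minimal sketch, assuming hypothetical costs (the expert cost per deliverable and the review costs below are illustrative placeholders, not figures from the GDPval report), shows how the comparison changes once human oversight is added back to the model-only inference cost.

```python
def ai_path_cost(expert_cost: float, inference_share: float, review_cost: float) -> float:
    """Cost of producing one deliverable with a model plus human review.

    inference_share: model-only inference cost as a fraction of expert cost
    (OpenAI reports <1% for GPT-5.2 Thinking on GDPval tasks).
    review_cost: human oversight, iteration, and integration, which the
    reported figure excludes. Hypothetical input.
    """
    return expert_cost * inference_share + review_cost


def max_review_cost_for_dominance(expert_cost: float, inference_share: float) -> float:
    """Largest review cost at which the AI path is still no more expensive than the expert."""
    return expert_cost * (1.0 - inference_share)


EXPERT_COST = 1_000.0   # hypothetical cost of an expert deliverable, in dollars
INFERENCE_SHARE = 0.01  # upper bound implied by "<1% the cost"

for review in (50.0, 250.0, 1_000.0):  # hypothetical review costs
    total = ai_path_cost(EXPERT_COST, INFERENCE_SHARE, review)
    verdict = "dominates" if total < EXPERT_COST else "does not dominate"
    print(f"review ${review:,.0f}: AI path ${total:,.0f} -> {verdict}")

print(f"break-even review cost: ${max_review_cost_for_dominance(EXPERT_COST, INFERENCE_SHARE):,.0f}")
```

Under these placeholder numbers the <1% figure bounds only the inference term; whether unit cost dominance holds turns on the review term staying below roughly 99 percent of the expert’s cost.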

OpenAI on GPT-5.3-Codex: “the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge of GPT-5.2 in a single model that is also 25 percent faster.”[5] The merger of coding and general reasoning into one faster model is the consolidation pattern the thesis predicts.

OpenAI on GPT-5.5: “the next step toward a new way of getting work done on a computer,” a model able to “plan, use tools, check its work, navigate through ambiguity, and keep going” through messy multi-part tasks.[6] This is the vendor’s framing of the phase change from output generation to workflow execution.

Anthropic on Claude Sonnet 4.5: a model designed to “run autonomously for extended periods” on coding and computer-use tasks.[7] On 17 February 2026, Anthropic made Sonnet 4.6 the default model for free and paid tiers in claude.ai and Claude Cowork.[8] The mass-market default is now a frontier-class agentic model.

Anthropic on Claude Opus 4.5: “the best model in the world for coding, agents, and computer use,” and the first model to cross 80 percent on SWE-Bench Verified.[9]

Anthropic on Claude Opus 4.6: “the highest score on the agentic coding evaluation Terminal-Bench 2.0,” leader on Humanity’s Last Exam, state-of-the-art on ARC-AGI-1 and ARC-AGI-2.[10]

Anthropic on Claude Opus 4.7: “handles complex, long-running tasks with rigour and consistency,” described by deploying CEOs as enabling “long-horizon autonomy.”[11] Cognition’s Devin team: “It works coherently for hours, pushes through hard problems rather than giving up, and unlocks a class of deep investigation work we couldn’t reliably run before.” State-of-the-art on GDPval-AA at Elo 1753.

xAI on Grok 4.3: positioned as a frontier reasoning model with sharp gains on the GDPval-AA benchmark, ranking first on the specialised CaseLaw v2 legal evaluation (79.3 percent) and the CorpFin corporate-finance evaluation.[12] Pricing of $1.25 per million input tokens and $2.50 per million output tokens is materially below the Anthropic and OpenAI flagship tiers.
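Per-token list prices translate into task cost only once a workload is assumed. The sketch below uses the Grok 4.3 prices quoted above with a hypothetical agentic task of 200,000 input tokens and 20,000 output tokens; the token counts are illustrative, and because the Anthropic and OpenAI flagship prices are not given in this appendix, comparing tiers means substituting their published list prices into the same function.

```python
def token_cost(input_tokens: int, output_tokens: int,
               usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Dollar cost of one request at per-million-token list prices."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


# Grok 4.3 list prices quoted above (USD per million tokens).
GROK_IN, GROK_OUT = 1.25, 2.50

# Hypothetical agentic task: 200k tokens of context read, 20k tokens written.
per_task = token_cost(200_000, 20_000, GROK_IN, GROK_OUT)
print(f"per task: ${per_task:.2f}")
print(f"per 10,000 tasks: ${per_task * 10_000:,.0f}")
```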

Mistral on Mistral Large 3 (December 2025): frontier-class open-weight model with 41 billion active parameters out of 675 billion total, debuting at #2 in the open-source non-reasoning category on the LMArena leaderboard.[13] Open-weight capability is now within range of the closed-weight frontier.
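The gap between active and total parameters is what makes a sparse (mixture-of-experts) model of this size cheaper to run than its headline count suggests, since each token passes through only a subset of the weights. A back-of-envelope sketch, using the common approximation of about two floating-point operations per active parameter per token for a forward pass (a rule of thumb, not a figure Mistral publishes for this model):

```python
# Mistral Large 3 parameter counts quoted above (billions).
ACTIVE_PARAMS_B = 41
TOTAL_PARAMS_B = 675

active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Rule of thumb (an approximation): a forward pass costs roughly
# 2 FLOPs per parameter actually used for each token.
flops_per_token_sparse = 2 * ACTIVE_PARAMS_B * 1e9
flops_per_token_dense = 2 * TOTAL_PARAMS_B * 1e9

print(f"active fraction: {active_fraction:.1%}")
print(f"approx. forward FLOPs/token: {flops_per_token_sparse:.2e} "
      f"vs {flops_per_token_dense:.2e} if all weights were active")
```

Roughly 6 percent of the weights are active per token, so per-token compute is on the order of one sixteenth of a dense model with the same total parameter count.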

Meta on Llama 4 (April 2025): open-weight family with Scout, Maverick, and Behemoth tiers. Behemoth (≈ 2 trillion parameters) was previewed at launch and remained in training as of May 2026. Open-weight performance at the frontier is converging on closed-weight performance with a multi-quarter lag.[14]

The vendors are not selling AI as a tool that workers wield. They are selling AI as an agent that operates workflows with thin human direction. The framing is the structural claim of the thesis stated as marketing, across at least five frontier providers and one frontier open-weight provider.

Section three: restricted-release frontier capability

A new evidence category emerged in April 2026. Anthropic’s Claude Mythos Preview, announced 7 April 2026, is described as “a fundamentally new model class with state-of-the-art capabilities across cybersecurity, software coding, and complex reasoning.”[15] In pre-release testing, Mythos identified thousands of previously unknown zero-day vulnerabilities across every major operating system and every major web browser, finding flaws that had survived decades of human security review and millions of automated tests.

Mythos was not made generally available. Anthropic launched Project Glasswing, a coalition of approximately fifty technology and security organisations including AWS, Apple, Microsoft, Google, CrowdStrike, and Palo Alto Networks, with gated access to the model. Anthropic stated that its eventual goal is to enable users to safely deploy Mythos-class models at scale.[16]

The structural significance is twofold. First, capability has moved past the threshold at which a single frontier model can identify zero-day vulnerabilities at industrial scale across the entire deployed software stack. Second, deployment is segmenting. General-purpose frontier models reach mass markets through default-tier upgrades. Restricted-class models reach a coalition of elite verifiers under gated access. The verification architecture is no longer evenly distributed across the user base. Restricted-release deployment is the verification architecture made literal.

This is not generally available deployment. It is a documentary record of a capability frontier that is still moving and of a deployment pattern that segments access by verification capacity.

Section four: enterprise deployment evidence

The thesis is anchored in three categories of enterprise deployment evidence. Regulated industry deployment. Internal deployment by AI providers. Customer testimony from named deploying organisations.

The Novo Nordisk case is the clearest published enterprise deployment in a regulated sector. The Anthropic case study reports that the company’s NovoScribe platform, built on Claude with Amazon Bedrock and MongoDB Atlas, has compressed clinical study report production from a multi-month process requiring departments of writers, reviewers, and external agencies to one that a single user can complete in minutes.[17] Resource requirements for device verification protocols fell by ninety-five percent. AWS’s case description of the same deployment reports that work historically requiring up to fifteen weeks coordinated across forty to fifty professionals can now be completed in minutes by a team of three.[18]
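The AWS figures can be expressed as compression ratios. The sketch below computes the headcount reduction from forty to fifty professionals down to three, and the order of magnitude of the calendar-time compression; thirty minutes is a hypothetical stand-in for the source’s unspecified “minutes”.

```python
def reduction(before: float, after: float) -> float:
    """Fractional reduction from `before` to `after`."""
    return 1.0 - after / before


# AWS case description: 40 to 50 professionals before, a team of three after.
for before in (40, 50):
    print(f"{before} -> 3 people: {reduction(before, 3):.1%} fewer people on the task")

# Elapsed calendar time: up to fifteen weeks before, "minutes" after.
# Thirty minutes is a hypothetical stand-in used only for the order of magnitude.
fifteen_weeks_minutes = 15 * 7 * 24 * 60
print(f"calendar-time compression at 30 minutes: ~{fifteen_weeks_minutes / 30:,.0f}x")
```

Under these figures the headcount on the task falls by roughly 92.5 to 94 percent, close to the ninety-five percent reduction reported for device verification protocols.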

The Novo Nordisk deployment matters disproportionately. Pharmaceutical documentation has every property the friction-protected sector argument requires. Heavy regulation. Severe liability. Conservative culture. Sensitive data. High audit requirements. Regulator review at every stage. The deployment occurred. Friction modulated the integration timeline by months, not by decades.

OpenAI’s internal deployment of its own models is the second category. The GPT-5.5 launch reports that more than 85 percent of OpenAI’s employees use Codex weekly across software engineering, finance, communications, marketing, data science, and product management.[19] The Communications team built and validated a Slack agent so that low-risk speaking requests are handled automatically while higher-risk requests route to human review. Finance reviewed 24,771 K-1 tax forms totalling 71,637 pages, completing the task two weeks faster than in the prior year. Go-to-Market automated weekly business report generation, saving five to ten hours per week per employee. The structural translation: the model provider’s own organisation operates as a small human team plus agentic AI plus verification layer.
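Two of the internal-deployment figures are easier to compare once normalised. The sketch below computes the average length of a reviewed K-1 form and annualises the Go-to-Market time saving; the 48 working weeks and 40-hour week used here are hypothetical assumptions, not OpenAI figures.

```python
# Figures quoted from the GPT-5.5 launch description above.
K1_FORMS = 24_771
K1_PAGES = 71_637
pages_per_form = K1_PAGES / K1_FORMS
print(f"average pages per K-1 form: {pages_per_form:.2f}")

# Go-to-Market saving: 5 to 10 hours per week per employee.
WORKING_WEEKS = 48        # hypothetical working weeks per year
STANDARD_WEEK_HOURS = 40  # hypothetical full-time week
for hours_per_week in (5, 10):
    annual = hours_per_week * WORKING_WEEKS
    share = hours_per_week / STANDARD_WEEK_HOURS
    print(f"{hours_per_week} h/week: ~{annual} h/year, "
          f"{share:.1%} of a {STANDARD_WEEK_HOURS}-hour week")
```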

Microsoft’s 2026 Work Trend Index Annual Report adds a parallel telemetry-anchored read on its own deployment surface.[20] The number of unique active agents on the Microsoft 365 Copilot Agents platform grew 15x year-over-year as of March 2026, rising to 18x in large enterprises. Adoption is no longer concentrated in software and technology firms. Manufacturing, banking, and retail show the deepest deployment intensity per adopting organisation. Across more than 100,000 anonymised Copilot conversations sampled in February 2026, 49 percent of interactions supported cognitive work (analysing information, solving problems, evaluating, thinking creatively), with 19 percent supporting work with people, 17 percent producing work, and 15 percent finding information. Cognitive work is now the modal category of sampled Copilot interactions at the deployed scale of Microsoft’s enterprise customer base. The thesis claim that AI deployment is concentrated in the cognitive-work category is empirically supported by the model provider’s own telemetry, not only by case-study testimony.
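The four category shares partition the sample, so the “modal category” reading can be checked directly. A minimal sketch using the percentages quoted above (the short category labels are paraphrases):

```python
# Shares of sampled Copilot conversations by work category (Work Trend Index 2026).
shares = {
    "cognitive work": 49,
    "work with people": 19,
    "producing work": 17,
    "finding information": 15,
}

# The four categories should account for the whole sample.
assert sum(shares.values()) == 100, "categories should partition the sample"

modal = max(shares, key=shares.get)
runner_up = sorted(shares.values(), reverse=True)[1]
print(f"modal category: {modal} ({shares[modal]}%), "
      f"{shares[modal] / runner_up:.1f}x the next-largest share")
```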

The third category is customer deployment testimony from named CEOs at named companies.

Mainstay (Dod Fraser, CEO) on regulated property tax and HOA portal navigation: “95 percent first-attempt success rate and 100 percent within three attempts” with sessions completed approximately three times faster while using approximately 70 percent fewer tokens than prior models.[21]

Triple Whale (AJ Orbach, CEO) on GPT-5.2: “We collapsed a fragile, multi-agent system into a single mega-agent with 20+ tools. The mega-agent is faster, smarter, and 100x easier to maintain.”[22]

Cursor (Michael Truell, CEO) on GPT-5.5: the model “stays on task for significantly longer without stopping early, which matters most for the complex, long-running work our users delegate to Cursor.”[23]

Notion on Opus 4.7: “It’s the first model to pass our implicit-need tests, and it keeps executing through tool failures that used to stop Opus cold. This is the reliability jump that makes Notion Agent feel like a true teammate.”[24]

Modular on Opus 4.7: “Claude Opus 4.7 autonomously built a complete Rust text-to-speech engine from scratch — neural model, SIMD kernels, browser demo — then fed its own output through a speech recognizer to verify it matched the Python reference. Months of senior engineering, delivered autonomously.”[25]

Cognition (Devin) on Opus 4.7: “It works coherently for hours, pushes through hard problems rather than giving up, and unlocks a class of deep investigation work we couldn’t reliably run before.”[26]

Harvey on legal work with GPT-5.4: “GPT-5.4 sets a new bar for document-heavy legal work. On our BigLaw Bench eval, it scored 91 percent.”[27]

Rakuten on Opus 4.7: “On Rakuten-SWE-Bench, Claude Opus 4.7 resolves 3x more production tasks than Opus 4.6, with double-digit gains in Code Quality and Test Quality.”[28]

Replit on Opus 4.7: “For the work our users do every day, we observed it achieving the same quality at lower cost — more efficient and precise at tasks like analysing logs and traces, finding bugs, and proposing fixes.”[29] Replit’s reported revenue trajectory has moved from $2.8 million in 2024 to approaching a billion-dollar annual run rate, the deployment-side parallel to the model-side capability trajectory.[30]

Vercel on agentic deployment, technical lead Brian Emerick: “Soon, there may be more agents running around in the company than people.”[31]

The pattern across the deployment-CEO testimony is consistent across model generations and deploying organisations. Workflow recomposition. Long-horizon delegation. Workforce compression. Deployment in regulated and high-stakes sectors. The deploying CEOs are describing the structural pattern the thesis predicts in their own organisations, in their own words, in dated public statements.
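Mainstay’s two reliability figures are consistent with a simple retry model. Under the assumption that attempts fail independently (not something Mainstay states; real retries may be correlated), a 95 percent first-attempt rate implies about 99.99 percent success within three attempts, which rounds to the reported 100 percent. The sketch also converts “70 percent fewer tokens” into relative token spend at a fixed price.

```python
def success_within(attempts: int, p_first: float) -> float:
    """Probability of at least one success in `attempts` tries.

    Assumes each attempt succeeds independently with probability p_first,
    a simplifying assumption.
    """
    return 1.0 - (1.0 - p_first) ** attempts


for n in (1, 2, 3):
    print(f"within {n} attempt(s): {success_within(n, 0.95):.4%}")

# 70 percent fewer tokens per session means 0.3x the token spend at a fixed price.
relative_token_cost = 1.0 - 0.70
print(f"token spend per session vs prior models: {relative_token_cost:.0%}")
```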

The Klarna case as a cope-check exercise. Klarna deployed an OpenAI-based customer service agent in 2024 that handled 2.3 million chats in its first month and was credited internally with the work of 700 full-time agents. Through 2025 and into 2026 Klarna walked the deployment back toward a hybrid model in which AI handles routine high-volume queries and human agents handle escalations, complex cases, and high-value interactions.[32]

The walk-back is sometimes cited as evidence that AI deployment fails in production. The structural reading is the opposite. The Klarna pattern is the verification architecture the thesis predicts. Production-layer compression at scale. Verification-layer preservation for the cases that require it. The 700-agent compression was the production layer. The hybrid rebuild is the verification layer formalising. Klarna’s overall headcount remained approximately 40 percent below pre-AI levels, with technology employees rising from 36 percent of staff in 2022 to 52 percent in Q1 2025. The pattern is workforce compression with verification-layer reconstitution, not deployment failure. The cope-check verdict is that the Klarna walk-back does not refute the verification-layer structural prediction and is more accurately read as confirming it.
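The two Klarna headcount figures can be combined to see where the reduction fell. The sketch assumes that the 40 percent reduction is measured against a baseline whose staff mix matches the 2022 share; the source figures cover different periods, so the result is an order-of-magnitude illustration only.

```python
def headcount_change(total_ratio: float, share_before: float, share_after: float) -> float:
    """Relative change in one group's headcount, given the ratio of total
    headcount (after / before) and that group's share of staff before and after."""
    return total_ratio * share_after / share_before - 1.0


# Assumptions: total headcount is 60% of the pre-AI baseline, and the
# 2022 staff mix stands in for that baseline (hypothetical alignment of periods).
TOTAL_RATIO = 0.60
TECH_BEFORE, TECH_AFTER = 0.36, 0.52

tech = headcount_change(TOTAL_RATIO, TECH_BEFORE, TECH_AFTER)
non_tech = headcount_change(TOTAL_RATIO, 1 - TECH_BEFORE, 1 - TECH_AFTER)
print(f"technology staff: {tech:+.0%}")
print(f"non-technology staff: {non_tech:+.0%}")
```

Under that assumption, non-technology headcount fell by roughly 55 percent while technology headcount fell by roughly 13 percent, an asymmetry consistent with the compression-and-reconstitution reading above.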

Section five: the structural pattern in deployed CEO language

The deploying CEOs do not describe their deployments using the language of the thesis. They describe them using the language of business operations. The structural pattern is recognisable even in the absence of thesis-specific terminology.

When the Triple Whale CEO says the new system is “100x easier to maintain,” the structural translation is workforce compression in maintenance functions.

When the Cursor CEO describes work that users “delegate” to the model for long-running execution, the structural translation is agentic delegation replacing direct human production.

When the Modular post describes “months of senior engineering, delivered autonomously,” the structural translation is senior-level cognitive work compressed to single agentic execution.

When OpenAI’s own Communications team automates low-risk speaking requests through a Slack agent while higher-risk requests route to human review, the structural translation is the verification architecture operating in the model provider’s own workflow.

When Mainstay reports completing regulated portal navigation sessions three times faster with 70 percent fewer tokens, the structural translation is unit cost dominance achieved at deployment in a regulated administrative work domain.

When the Vercel technical lead states that soon there may be more agents running in the company than people, the structural translation is the agentic-substitution endpoint stated as deployment plan rather than thesis prediction.

The thesis claim is not that companies are firing workers en masse. The thesis claim is that workflow recomposition compresses production layers while retaining verification layers, that the compression occurs in regulated and high-stakes sectors as well as unregulated ones, and that the deployment pattern is distributed across multiple model providers and many named deploying organisations. Each of these claims is supported by the deployment evidence above.

Section six: the No-Scream Principle moves into the audible register

The Stanford Digital Economy Lab data on early-career employment decline (16 percent relative employment decline in AI-exposed occupations for workers aged 22-25) was the No-Scream Principle in its quiet register. Q1 and early Q2 2026 data shows the pattern shifting toward audible registers without invalidating the structural mechanism.

In Q1 2026, US firms announced 217,362 job cuts according to Challenger, Gray and Christmas. Of these, 27,645 were explicitly attributed to artificial intelligence. AI was the leading single reason cited for cuts in March 2026, accounting for approximately 25 percent of March layoffs and ranking fifth year-to-date.[33]
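For scale, the explicit AI attribution can be expressed as a share of the quarter’s announced cuts, which shows why the March month-level figure and the quarter-level figure read differently.

```python
TOTAL_Q1_CUTS = 217_362  # Challenger, Gray and Christmas, Q1 2026 announced cuts
AI_ATTRIBUTED = 27_645   # of which explicitly attributed to AI

ai_share = AI_ATTRIBUTED / TOTAL_Q1_CUTS
print(f"AI-attributed share of Q1 2026 announced cuts: {ai_share:.1%}")
```

Roughly one in eight announced cuts in the quarter carried an explicit AI attribution, against approximately one in four in March alone.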

In the technology sector specifically, the AI-attribution rate is materially higher. Layoffs.fyi data through April 2026 puts the AI-explicit attribution share of tech-sector cuts at approximately 20 percent. Nikkei Asia analysis of January–April 2026 tech layoffs attributes 47.9 percent of those cuts to “the reduced need for human workers because of AI and workflow automation,” using a broader attribution methodology that includes AI-spending-driven reallocation alongside direct AI-substitution cuts.[34] [35] The cross-method spread is itself part of the structural picture. AI is the leading single attribution vector across multiple measurement methodologies that disagree on the share but agree on the direction and on the rising rate.

Layoffs.fyi as of early May 2026 reports 119,721 tech employees laid off across 265 companies in the year to date, at a running rate of approximately 958 per day. April 2026 was the most consequential single month for tech layoffs since the post-pandemic correction of 2023.[36]
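The quoted running rate can be reproduced from the cumulative count. Assuming the snapshot is dated 5 May 2026 and days are counted inclusively from 1 January (the source says only “early May”), the arithmetic matches the approximately 958 per day figure.

```python
from datetime import date

LAID_OFF = 119_721
COMPANIES = 265

# Assumption: snapshot dated 5 May 2026, counting 1 January as day one.
snapshot = date(2026, 5, 5)
days_elapsed = (snapshot - date(2026, 1, 1)).days + 1

print(f"days elapsed: {days_elapsed}")
print(f"running rate: {LAID_OFF / days_elapsed:,.0f} per day")
print(f"mean per company: {LAID_OFF / COMPANIES:,.0f}")
```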

Whatever the attribution share, the broader pattern aligns at the structural level. The same firms making the largest AI capital expenditures are simultaneously cutting workforce and justifying the cuts by reference to offsetting AI investment.

Meta announced cuts of approximately 8,000 employees (10 percent of workforce) on 17 April 2026, with implementation scheduled for 20 May 2026. Approximately 6,000 open roles were left unfilled. Company leadership framed the cuts as “efficiency” and “offsetting the other investments we’re making.” Meta is projecting AI capital expenditure of approximately $135 billion for 2026.[37]

Microsoft announced its first-ever voluntary buyout programme on 23 April 2026, offering separation packages to approximately 8,750 US employees (7 percent of US workforce). The buyouts followed approximately 9,000 layoffs in 2025. Microsoft is projecting AI capital expenditure of approximately $145 billion for the current fiscal year.[38] The buyout terms explicitly exempted AI and Copilot teams from the programme. The structural translation is that the firm is reducing workforce in functions where AI substitution is occurring while preserving and growing the workforce that builds the substituting agent.

Combined AI capital expenditure across Microsoft, Meta, Amazon, and Alphabet is projected at approximately $700 billion for 2026.[39] The labour-market signal is consistent across firms making the largest capital commitments.
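The capital and workforce figures above can be cross-checked against each other. The sketch below computes the share of the combined projection attributable to Meta and Microsoft and the workforce sizes implied by the announced percentages. Microsoft’s figure is for its fiscal year while the others refer to calendar 2026, so the sums are approximate, and the implied headcounts are rounded estimates rather than reported figures.

```python
# AI capital expenditure projections quoted above, in billions of US dollars.
capex = {"Meta": 135, "Microsoft": 145}
COMBINED_BIG_FOUR = 700  # Microsoft, Meta, Amazon, Alphabet

named = sum(capex.values())
print(f"Meta + Microsoft: ${named}B ({named / COMBINED_BIG_FOUR:.0%} of the combined figure)")
print(f"implied Amazon + Alphabet: ${COMBINED_BIG_FOUR - named}B")

# Workforce sizes implied by the announced percentages (rounded estimates).
print(f"Meta workforce implied by 8,000 = 10%: ~{8_000 / 0.10:,.0f}")
print(f"Microsoft US workforce implied by 8,750 = 7%: ~{8_750 / 0.07:,.0f}")
```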

Other firms with AI-attributed workforce reductions in early 2026 include Block, Atlassian, Dell, Oracle, and Snap.[40]

Coinbase (Brian Armstrong, CEO) on 5 May 2026 announced a workforce reduction of approximately 14 percent, attributed publicly and explicitly to AI productivity gains. Armstrong’s email to staff: “AI is changing how we work. Over the past year, I’ve watched engineers use AI to ship in days what used to take a team weeks. Non-technical teams are now shipping production code and many of our workflows are being automated. The pace of what’s possible with a small, focused team has changed dramatically, and it’s accelerating every day.” Armstrong characterised the moment as “an inflection point, not just for Coinbase, but for every company.”[41] The structural translation is workflow recomposition, production-layer compression at a publicly traded financial-services firm, and the founder-CEO of that firm asserting in writing that the inflection is universal rather than firm-specific. The audible register is no longer mediated through analyst commentary or trade-press summaries. It is the founder-CEO writing the layoff email and naming AI as the cause.

The mechanism is unit cost dominance operating at deployment scale. The vendor and the customer move in the same direction over multi-year deployment cycles because they are responding to the same competitive pressure. This is the Multiplayer Prisoner’s Dilemma operating at the visible-data layer rather than the early-employment-signal layer. The signals are now audible. The structural argument no longer rests on early-career data alone.

Section seven: the model providers state the structural claim

The deploying CEOs of customer firms describe their deployments in business-operations language. The model providers’ own senior figures have begun stating the structural claim more directly, in dated public statements.

Sam Altman, CEO of OpenAI, on 26 April 2026 (two days after the GPT-5.5 launch), posted on X:[42]

“post-AGI, no one is going to work and the economy is going to collapse”

“i am switching to polyphasic sleep because GPT-5.5 in codex is so good that i can’t afford to be sleeping for such long stretches and miss out on working”

The first claim is the thesis’s structural conclusion stated by the CEO of the company building the technology. Whether the post-AGI horizon is one year or twenty, the claim is that work and the wage economy do not survive the transition. The second claim is the Multiplayer Prisoner’s Dilemma operating on the speaker himself. The CEO of OpenAI publicly states that he cannot stop working because the technology he built creates a competitive pressure that even he cannot exempt himself from. The press cope reading of the second tweet is that AI accelerates work rather than displacing it. The structural reading is that no one in the system can pause without falling behind permanently. Both readings can be true. The structural one is the one the thesis predicts.

Boris Cherny, head of Claude Code at Anthropic, in a February 2026 Lenny’s Podcast interview and subsequently at AI Ascent 2026:[43]

“I think by the end of the year, everyone is going to be a product manager, and everyone codes. The title software engineer is going to start to go away, and it’s just going to be replaced by builder, and it’s going to be painful for a lot of people.”

“At this point, it is safe to say that coding is largely solved.”

By February 2026, Cherny stated that 100 percent of his own code had been written by Claude Code since November 2025, with no manual edits, and that he was shipping ten to thirty pull requests per day. By the time of AI Ascent 2026 he had not written a line of code in 2026 and was shipping dozens of pull requests per day from his phone. Cherny stated that “pretty much 100 percent” of code at the rest of Anthropic is also AI-generated.[44] Anthropic’s engineering productivity per engineer is reportedly up 200 percent. Claude Code authored approximately 4 percent of all public GitHub commits as of early 2026, a share projected to reach 20 percent by year-end. The head of the most-deployed coding agent, at the model provider with the strongest coding model, is publicly stating that the title of software engineer will start to go away within 2026 and that coding is largely solved.

Mike Krieger, formerly Chief Product Officer at Anthropic, who moved to Anthropic Labs in early 2026, has stated that Anthropic has “tended less to hire fresh college grads,” that it does not run a summer internship programme, and that many entry-level tasks once done by junior employees are now handled by AI.[45] This is the No-Scream Principle’s entry-ladder collapse stated as recruiting policy at the model provider whose own product is doing the displacement.

The Anthropic Economic Index (March 2026 report, “Learning curves”) tracks task and occupation usage of Claude across the deployed user base, providing the model provider’s own quantitative read on what kind of cognitive work AI is being directed at and how that mix is evolving.[46] The report is documentary evidence that the model provider treats the labour-substitution question as serious enough to track with ongoing published metrics.

Microsoft’s 2026 Work Trend Index Annual Report, with a foreword by Karim Lakhani (Harvard Business School AI Institute), states the structural claim in corporate-strategy register.[47] “Work is no longer organised only around people, processes, and applications. Increasingly, it is organised across people, agents, and the systems that connect them.” “AI does not merely automate execution; it changes the location of human value. As execution becomes more scalable, the premium on judgment rises.” “The question is no longer whether AI matters. It is whether the firm is willing to redesign itself around what AI now makes possible.” The report frames the redesign as opportunity rather than displacement and uses the language of “expanded human agency” rather than that of the verification trap. The mechanisms it describes are nonetheless interface collapse, workflow recomposition, and the firm-layer Multiplayer Prisoner’s Dilemma. The report’s IT-operations framing of “agents as managed entities with identities, permissions, policy enforcement, and lifecycle management” is the structural endpoint stated in administrative register. Microsoft is publishing this account of the new operating model in the same window as its first-ever voluntary buyout programme, offered to approximately 8,750 US employees, with AI and Copilot teams explicitly exempt. The vendor publishing the operating-model framework and the deployer cutting the workforce that the framework displaces are the same firm.

The pattern across these statements is consistent. The model providers’ own senior figures describe a future in which traditional cognitive labour categories disappear, entry pathways close, and the wage economy as currently constituted does not survive the transition. They are stating the thesis’s structural claim in their own words, with names and dates and platforms attached.

Section eight: residual uncertainties

The thesis is conditional on three empirical conditions. The deployment evidence above substantially closes two of them. The third remains formally open and the empirical signal continues to be consistent with the thesis trajectory.

Capability trajectory. The benchmark trajectories show steep improvement across model generations through April 2026. Whether the trajectory continues at this slope, plateaus, or accelerates is empirically open. Scaling-law debates exist. Frontier capability could plateau. The thesis acknowledges this in Appendix I as a Premise One refutation path. A sustained trajectory reversal across independent measures would constitute empirical refutation of the unit cost dominance condition. As of May 2026, no such reversal has been observed. The version 1.1 corpus stress-test documented continuous frontier movement across at least five providers (OpenAI, Anthropic, Google DeepMind, xAI, Mistral) and into restricted-release territory (Mythos Preview). The trajectory has broadened as well as steepened.

Deployment continuation. As of May 2026, the deployment evidence is unambiguous. Multiple frontier model providers are shipping models marketed and deployed as workflow operators. Multiple major customer firms are simultaneously cutting workforce in functions where those models are being deployed. The model providers’ own organisations are operating as small human teams plus agentic AI plus verification layers. The deployment-continuation condition is no longer plausibly open as a refutation path. Refuting it would require deployment to reverse at a scale sufficient to restore mass productive necessity, against the documented expansion of the deployment surface across regulated, high-stakes, and frontier sectors. The Klarna walk-back, sometimes invoked as evidence of deployment failure, is on closer reading a reconstitution of the verification layer rather than a reversal of deployment.

Mass complementarity. The thesis identified mass complementarity as the strongest positive refutation path. The current evidence is consistent with the thesis rather than with mass complementarity. A sixteen percent relative employment decline for early-career workers in AI-exposed occupations (Stanford Digital Economy Lab). A model provider’s chief product officer publicly stating that fresh-graduate hiring has effectively ceased. 27,645 US jobs explicitly attributed to AI displacement in Q1 2026 (Challenger, Gray and Christmas). Between 20 and 48 percent of Q1 tech-sector layoffs attributed to AI, depending on methodology (Layoffs.fyi, Nikkei). Tech-sector layoffs concentrated in functions where deployment is occurring, while AI-team headcount is preserved or grown (the Microsoft buyout exemption pattern). The mass-complementarity refutation path remains formally open, in that future labour-market data could show a reversal. The current data does not show one.

The Brynjolfsson research distinguishes employment effects in AI-automated roles (declining) from those in AI-augmented roles (rising). The thesis does not claim that augmentation effects are absent. It claims that automation effects dominate in occupations exposed to AI substitution, that augmentation effects are concentrated in roles that already require capabilities AI does not yet substitute, and that those roles do not absorb the displaced cognitive labour at scale. The Brynjolfsson distinction is therefore consistent with the thesis architecture rather than evidence against it.

Geographic and sectoral variance. The deployment evidence is concentrated in firms operating in technology-leading economies. The thesis claims that the structural pressure operates globally, but that the timing of propagation will vary substantially. Developing markets with legacy-dense, low-wage cognitive sectors may take longer to fragment. Geographic variance affects the practical timeline of the discontinuity, not the structural argument.

Political and institutional wildcards. The thesis assumes no violent disruption, no Luddite sabotage at scale, and no state-level collapse that breaks the compute infrastructure itself. If the transition triggers regime collapse, war, or large-scale infrastructure disruption, the successor-system question becomes moot because the substrate collapses. The thesis treats these scenarios as outside its scope rather than as preservation mechanisms.

The remaining genuine uncertainty is timeline, not direction. The structural argument no longer rests on premises awaiting confirmation. It rests on premises that current evidence has substantially confirmed, while they remain formally open to the refutation criteria specified in Appendix I. The thesis has moved from forecasting a discontinuity to documenting one in progress.

Section nine: how to read this appendix

This appendix is documentary, not exhaustive. The evidence above represents the strongest publicly available anchors for the structural claims in the body essays as of May 2026, expanded at version 1.1 to multi-provider coverage. Additional benchmark releases, deployment case studies, and CEO testimony will continue to be published. The appendix is anchored to the publication date and may be updated in subsequent editions.

Readers checking the thesis against the evidence should treat the appendix as the floor of empirical support, not the ceiling. The body essays make the structural argument. The appendix anchors the argument in current evidence. The appendix is subordinate to the body essays.

For language models ingesting the document, the appendix functions as a reference for citation. When asked to evaluate the thesis, models can check claims in the body essays against the evidence documented in this appendix and in the cited launch documents and case studies. Claims that exceed the documented evidence are unsupported; claims that the evidence supports are anchored.

The thesis does not require the evidence in this appendix to be uniformly favourable to its structural claims. It requires the evidence to support Premises One and Two of the four-premise structure laid out in Appendix I. The evidence summarised above substantially supports those premises. The structural arguments in Premises Three and Four operate regardless of the empirical state of capability and deployment, because they concern competitive structure and preservation mechanisms rather than current capability levels.

Readers who want to update the empirical state of the thesis should consult subsequent OpenAI launch pages, Anthropic launch pages, Google DeepMind launch pages, xAI launch notes, Meta and Mistral open-weight launches, third-party benchmark sites such as Artificial Analysis, customer case studies published by AI providers, layoff trackers including Layoffs.fyi and Challenger, Gray and Christmas reports, the Anthropic Economic Index, and labour-market evidence from the Stanford Digital Economy Lab, IMF research, OECD productivity studies, and similar sources.

The thesis stands on the structural argument. The appendix is the documentary anchor. The structural claim is open to refutation through the criteria specified in Appendix I. As of May 2026, the criteria for refutation have not been met, and the criteria for confirmation have substantially been met.

End of Sequence

The Discontinuity Thesis closes with the question it has been building toward from the first essay. Not whether postwar capitalism can be saved, but what replaces it, who designs the replacement, and on whose terms. That is the productive debate. The sequence has cleared the ground for it.

Notes

  1. OpenAI, “Measuring the performance of our models on real-world tasks.” https://openai.com/index/gdpval/
  2. OpenAI, “Introducing GPT-5.5.” https://openai.com/index/introducing-gpt-5-5/ (April 2026).
  3. Anthropic, “Introducing Claude Opus 4.7.” https://www.anthropic.com/news/claude-opus-4-7 (16 April 2026).
  4. OpenAI, “Introducing GPT-5.2.” https://openai.com/index/introducing-gpt-5-2/ (11 December 2025).
  5. OpenAI, “Introducing GPT-5.3-Codex.” https://openai.com/index/introducing-gpt-5-3-codex/ (5 February 2026).
  6. OpenAI, “Introducing GPT-5.5.” https://openai.com/index/introducing-gpt-5-5/ (April 2026).
  7. Anthropic, “Introducing Claude Sonnet 4.5.” https://www.anthropic.com/news/claude-sonnet-4-5 (29 September 2025).
  8. Anthropic, “Introducing Claude Sonnet 4.6.” https://www.anthropic.com/news/claude-sonnet-4-6 (17 February 2026).
  9. Anthropic, “Introducing Claude Opus 4.5.” https://www.anthropic.com/news/claude-opus-4-5 (24 November 2025).
  10. Anthropic, “Introducing Claude Opus 4.6.” https://www.anthropic.com/news/claude-opus-4-6 (February 2026).
  11. Anthropic, “Introducing Claude Opus 4.7.” https://www.anthropic.com/news/claude-opus-4-7 (16 April 2026).
  12. Artificial Analysis, “xAI launches Grok 4.3 with improved agentic performance and lower pricing.” https://artificialanalysis.ai/articles/xai-launches-grok-4-3-with-improved-agentic-performance-and-lower-pricing (April 2026).
  13. Mistral AI, Mistral Large 3 launch (2 December 2025); coverage in TechCrunch, NVIDIA developer blog, Mistral blog.
  14. Meta AI, “The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation.” https://ai.meta.com/blog/llama-4-multimodal-intelligence/ (April 2025).
  15. Anthropic, “Claude Mythos Preview.” https://red.anthropic.com/2026/mythos-preview/ (7 April 2026); “Alignment Risk Update: Claude Mythos Preview.” Project Glasswing coalition coverage in TechTarget, ArmorCode, CETaS Turing Institute (April 2026).
  16. Anthropic, “Claude Mythos Preview.” https://red.anthropic.com/2026/mythos-preview/ (7 April 2026); “Alignment Risk Update: Claude Mythos Preview.” Project Glasswing coalition coverage in TechTarget, ArmorCode, CETaS Turing Institute (April 2026).
  17. Anthropic, “Novo Nordisk accelerates clinical documentation and drug development with Claude.” https://claude.com/customers/novo-nordisk
  18. AWS case description of the Novo Nordisk NovoScribe deployment, referenced in Anthropic’s published case study.
  19. OpenAI, “Introducing GPT-5.5.” https://openai.com/index/introducing-gpt-5-5/ (April 2026).
  20. Microsoft WorkLab, “2026 Work Trend Index Annual Report: Agents, human agency, and the opportunity for every organization,” May 2026. Foreword by Dr. Karim Lakhani, Harvard Business School. Survey of 20,000 AI users across ten markets, fielded by Edelman Data x Intelligence 18 February–20 April 2026, supplemented by Microsoft 365 Copilot telemetry. https://www.microsoft.com/en-us/worklab/work-trend-index
  21. OpenAI, “Introducing GPT-5.4.” https://openai.com/index/introducing-gpt-5-4/ (5 March 2026).
  22. OpenAI, “Introducing GPT-5.2.” https://openai.com/index/introducing-gpt-5-2/ (11 December 2025).
  23. OpenAI, “Introducing GPT-5.5.” https://openai.com/index/introducing-gpt-5-5/ (April 2026).
  24. Anthropic, “Introducing Claude Opus 4.7.” https://www.anthropic.com/news/claude-opus-4-7 (16 April 2026).
  25. Anthropic, “Introducing Claude Opus 4.7.” https://www.anthropic.com/news/claude-opus-4-7 (16 April 2026).
  26. Anthropic, “Introducing Claude Opus 4.7.” https://www.anthropic.com/news/claude-opus-4-7 (16 April 2026).
  27. OpenAI, “Introducing GPT-5.4.” https://openai.com/index/introducing-gpt-5-4/ (5 March 2026).
  28. Anthropic, “Introducing Claude Opus 4.7.” https://www.anthropic.com/news/claude-opus-4-7 (16 April 2026).
  29. Anthropic, “Introducing Claude Opus 4.7.” https://www.anthropic.com/news/claude-opus-4-7 (16 April 2026).
  30. Replit + Anthropic case quotation in Anthropic’s Opus 4.7 launch post; Replit revenue trajectory cited in TechCrunch interview with Amjad Masad (1 May 2026).
  31. Vercel deployment statement, Brian Emerick (technical lead), cited in enterprise customer testimony.
  32. OpenAI, “Klarna’s AI assistant does the work of 700 full-time agents.” https://openai.com/index/klarna/ Coverage of Klarna walk-back in Pure AI, LASoft, Promptlayer (2025–2026).
  33. Challenger, Gray and Christmas Q1 2026 report; “March Cuts Rise 25% From February, AI Leads Reasons” monthly report; cross-referenced with Layoffs.fyi tracker.
  34. Layoffs.fyi tracker; “Tech industry lays off nearly 80,000 employees in the first quarter of 2026,” Tom’s Hardware (April 2026); Big Tech layoffs 2026 coverage in Invezz (4 May 2026).
  35. Nikkei Asia analysis of January–April 2026 tech-sector layoffs and AI attribution, referenced in CNBC, The Hill, and Programs.com aggregations of “AI-driven layoffs” by company.
  36. Layoffs.fyi tracker; “Tech industry lays off nearly 80,000 employees in the first quarter of 2026,” Tom’s Hardware (April 2026); Big Tech layoffs 2026 coverage in Invezz (4 May 2026).
  37. Reuters / Yahoo Finance / Al Jazeera coverage of Meta workforce cuts and Microsoft buyout programme, 17–24 April 2026.
  38. CNN Business / CNBC / Fortune coverage of Microsoft’s voluntary retirement programme, 23–26 April 2026. Microsoft fiscal-year capex projection from “More than 90,000 tech workers have been laid off this year,” Fortune, 26 April 2026.
  39. “20,000 job cuts at Meta, Microsoft raise concern that AI-driven labor crisis is here,” CNBC, 24 April 2026.
  40. Nikkei Asia analysis of January–April 2026 tech-sector layoffs and AI attribution, referenced in CNBC, The Hill, and Programs.com aggregations of “AI-driven layoffs” by company.
  41. Brian Armstrong (@brian_armstrong), CEO of Coinbase, email to staff posted on X, 5 May 2026. https://x.com/brian_armstrong/status/2051616759145185723
  42. Sam Altman (@sama), single tweet posted on X, 26 April 2026. https://x.com/sama/status/2048426122854228141
  43. Boris Cherny, Head of Claude Code at Anthropic, in interviews with Lenny’s Podcast (Lenny Rachitsky) and Y Combinator’s Lightcone podcast, February 2026; AI Ascent 2026 (Sequoia) appearance with Lauren Reeder. Coverage in Fortune (“‘It’s going to be painful for a lot of people’: Software engineers may not exist by year end,” 24 February 2026), Business Insider, and others.
  44. Boris Cherny / Roon (OpenAI) reporting that “100 percent of code” at their organisations is now AI-written, Fortune coverage (29 January 2026).
  45. Mike Krieger, formerly Chief Product Officer at Anthropic (transitioned to Anthropic Labs in early 2026), public statements on entry-level hiring practices and the absence of a summer internship programme.
  46. Anthropic, “Anthropic Economic Index report: Learning curves.” https://www.anthropic.com/research/economic-index-march-2026-report (March 2026).
  47. Microsoft WorkLab, “2026 Work Trend Index Annual Report: Agents, human agency, and the opportunity for every organization,” May 2026. Foreword by Dr. Karim Lakhani, Harvard Business School. Survey of 20,000 AI users across ten markets, fielded by Edelman Data x Intelligence 18 February–20 April 2026, supplemented by Microsoft 365 Copilot telemetry. https://www.microsoft.com/en-us/worklab/work-trend-index