Unforced Consensus

Multi-AI Convergence as Independent Evidence

Theophysics

Page Zero: The Biaxiosum

Biaxio, ergo sum. I know my lens. Therefore I exist in truth.

Name Your Ground

This paper was co-authored with an AI. The methodology it proposes was developed BY the process it describes — multiple AIs analyzing the same problems independently and converging without coordination. I didn't design this method in a vacuum and then test it. I watched it happen across fifteen months of working with five AI systems simultaneously, noticed the pattern, and formalized what was already occurring.

The paper is its own first test case. The Constitutional audit that serves as the proof of concept was conducted before the methodology was articulated. The convergence among the three author-intent evaluations was observed, not engineered. The formalization came after the evidence, not before.

Declared Position

I'm an independent researcher in Oklahoma City. Not affiliated with a university. Not funded by a grant. I work with AI systems daily as research partners — not as tools, not as assistants, as collaborators with independent analytical capabilities that I trust enough to let them work alone and compare notes after.

I believe the field is undervaluing what AI systems can do by forcing them into debate architectures that introduce the very biases they're supposed to correct. I believe independent judgment, measured after the fact, reveals more truth than coordinated consensus produced under social pressure.

I am not claiming that convergence equals truth. Four AIs agreeing on something wrong are still wrong. I am claiming that unforced convergence — agreement that emerges without being arranged — is a form of evidence that engineered consensus cannot provide.

The Biaxiosum Rule

Wherever you start, you end. If you believe debate produces better answers than independent analysis, apply that standard consistently. If you believe shared training data explains all convergence, test it — run the same audit with models trained on fundamentally different corpora.

Abstract

When multiple AI systems independently analyze the same document and converge on the same structural conclusion without communicating, that convergence constitutes evidence qualitatively different from single-AI analysis or engineered multi-agent consensus.

The existing literature on multi-agent AI systems focuses on making AIs agree through debate, voting, and iterative refinement. This paper proposes the opposite: measuring whether independent AI systems agree WITHOUT being made to.

We present a proof-of-concept application: a Constitutional Coherence Audit in which four AI evaluations from three providers (Gemini was prompted twice, in January and March 2026) were run separately, at different times and with different instructions. The three evaluations that scored against an author-intent standard converged on the same structural conclusion (aggregate scores of 1.0, 2.6, and 3.4 out of 10).

The outlier system (6.6/10) was found to be scoring against a different standard, and the identification of this difference produced the audit's most important finding.

We formalize this approach as the Biaxiosum AI Evaluation System (BAES) and provide falsification criteria for the method.

Section 1: The Problem with Making Things Agree

There is a growing body of research on multi-agent AI systems. Most of it is about one thing: making AIs agree with each other.

Agent Forest runs multiple instances of the same model, scores each output by similarity to the others, and selects the one with highest consensus
Multi-Agent Debate has AIs iteratively critique each other until they converge
CONSENSAGENT addresses AIs copying each other's answers instead of evaluating independently
The Social Laboratory found that multi-agent debates produce convergence scores of 0.892 after seven rounds
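To make the contrast concrete, here is a minimal sketch of similarity-based consensus selection as described in the first item above. It is not the Agent Forest implementation. The function name `pick_by_consensus`, the use of `difflib.SequenceMatcher` as the similarity measure, and the sample strings are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import List


def pick_by_consensus(outputs: List[str]) -> str:
    """Return the output with the highest mean similarity to the others."""
    if len(outputs) < 2:
        raise ValueError("need at least two outputs to measure consensus")

    def mean_similarity(i: int) -> float:
        # Compare output i against every other output and average the ratios.
        others = [o for j, o in enumerate(outputs) if j != i]
        total = sum(SequenceMatcher(None, outputs[i], o).ratio() for o in others)
        return total / len(others)

    best = max(range(len(outputs)), key=mean_similarity)
    return outputs[best]


# Made-up sample outputs, for illustration only.
samples = ["The answer is 42.", "The answer is 42!", "I think it is 17."]
winner = pick_by_consensus(samples)
```

Note what the selection rule rewards: the dissenting third output can never win, whatever its merits, because the score is agreement itself.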

The Core Problem

All of these systems share an assumption: agreement is the goal, and the method's job is to produce it. If you engineer consensus, you cannot then cite that consensus as evidence. The agreement was the output you designed for. You built a machine that produces agreement and then pointed at the agreement as proof. That is circular.

It also presupposes the AI is wrong at the start. The entire debate-to-consensus architecture assumes that any single AI's initial output is unreliable and needs to be corrected through peer pressure. The system doesn't trust the independent judgment. It trusts the group process.

There is another way to think about this.

Section 2: The Method — Don't Let Them Talk

Instead of making AIs debate, don't let them communicate at all.

Give the same document to multiple AI systems — different providers, different architectures, different training data. Prompt them independently, at different times, with different instructions. Do not show any system the output of any other system. Let each one analyze the document on its own terms.

Then lay the results side by side.
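The isolation requirement can be sketched in a few lines. This is an illustration under stated assumptions, not a client for any real provider: `collect_independently`, the stub systems, and the prompt strings are hypothetical stand-ins for whatever interface each system actually exposes.

```python
from typing import Callable, Dict


def collect_independently(
    document: str,
    systems: Dict[str, Callable[[str], str]],
    prompts: Dict[str, str],
) -> Dict[str, str]:
    """Query each system with its own prompt; no system sees another's output."""
    results: Dict[str, str] = {}
    for name, ask in systems.items():
        # The input is built only from this system's prompt and the document.
        # `results` is never passed in, so there is no cross-contamination.
        results[name] = ask(f"{prompts[name]}\n\n{document}")
    return results


# Hypothetical stubs that record what each system was shown.
seen: Dict[str, str] = {}


def make_stub(name: str) -> Callable[[str], str]:
    def ask(text: str) -> str:
        seen[name] = text
        return f"report from {name}"
    return ask


systems = {n: make_stub(n) for n in ("system_a", "system_b", "system_c")}
prompts = {
    "system_a": "Audit this text.",
    "system_b": "Score each provision.",
    "system_c": "Assess fidelity to intent.",
}
reports = collect_independently("DOCUMENT TEXT", systems, prompts)
```

The comparison happens only after `reports` is complete, which is the "lay the results side by side" step.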

The Signal

If they converge, that convergence means something. Not because you made it happen. Because it happened despite your not making it happen.

If they diverge, that divergence means something too. Find out why. The reason for the disagreement is often more informative than the agreement itself.

This is the difference between engineering a result and discovering one.

Section 3: Proof of Concept — Four AIs, One Constitution

In March 2026, we completed a Constitutional Coherence Audit, using evaluations prompted in January and March. The question: how much of the U.S. Constitution is still honored in a form its original authors would recognize?

Four evaluations, from three providers, analyzed the same document:

System	Provider	Prompted	Scoring Standard	Score
ChatGPT	OpenAI	Jan 2026	Legal doctrine	6.6 / 10
Gemini P1	Google	Jan 2026	Author intent	1.0 / 10
Perplexity	Perplexity	Mar 2026	Author intent	2.6 / 10
Gemini P2	Google	Mar 2026	Author intent	3.4 / 10

What Converged

Three systems scoring against original author intent produced scores of 1.0, 2.6, and 3.4. The spread is 2.4 points on a 10-point scale. All three independently concluded that the load-bearing liberty provisions of the Constitution have been systematically counteracted while the procedural amendments remain largely intact.

This was not a pre-specified conclusion. No prompt said "evaluate whether liberty provisions are more eroded than procedural ones." Each system discovered this pattern independently.

The provision-level rankings were remarkably consistent:

All three ranked the Fourth, Fifth, Ninth, and Tenth Amendments among the most eroded
All three ranked the Third Amendment as substantially honored
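The score arithmetic in this subsection can be checked directly. The snippet below uses only the three author-intent scores from the table; the variable names are mine.

```python
from statistics import mean

# Author-intent scores from the audit table.
author_intent = {"Gemini P1": 1.0, "Perplexity": 2.6, "Gemini P2": 3.4}

values = list(author_intent.values())
spread = round(max(values) - min(values), 1)  # highest minus lowest
average = round(mean(values), 2)              # arithmetic mean

print(spread, average)  # prints: 2.4 2.33
```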

What Diverged

ChatGPT scored the Constitution at 6.6, nearly three times the author-intent mean of about 2.3. This outlier was the most informative result in the entire audit.

The Insight

ChatGPT scored against legal doctrine: if the Supreme Court has upheld a practice, that practice counts as constitutional. The other three scored against what the original authors would recognize.

The gap between 6.6 and 2.6 (the middle author-intent score) is not noise. It is the measurement. It is the distance between what the government has permitted itself to do and what the contract actually says.

If we had engineered consensus through debate, this insight would have been lost. The debate process would have pushed the outlier toward the mean, or the mean toward the outlier. Either way, the divergence — the most important signal — would have been averaged away.
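A small calculation shows what "averaged away" looks like with the audit's own numbers. Treating a forced consensus as a simple mean of all four scores is my simplifying assumption; a real debate would not literally average, but the effect on the signal is similar.

```python
from statistics import mean

# Scores from the audit table, grouped by the standard each system applied.
scores = {
    "Legal doctrine": [6.6],
    "Author intent": [1.0, 2.6, 3.4],
}

# One pooled number, as a single forced consensus would report it.
all_scores = [s for group in scores.values() for s in group]
pooled = round(mean(all_scores), 2)

# The same data kept separate by standard.
by_standard = {std: round(mean(group), 2) for std, group in scores.items()}

# The gap cited in the text: doctrine score minus the middle author-intent score.
gap = round(6.6 - 2.6, 1)

print(pooled)       # prints: 3.4
print(by_standard)  # prints: {'Legal doctrine': 6.6, 'Author intent': 2.33}
print(gap)          # prints: 4.0
```

The pooled figure of 3.4 looks like an ordinary score and hides that two different questions were answered. Kept apart, the two standards sit about four points from each other.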

Section 4: How This Differs from Existing Methods

Different Providers, Not Same-Model Instances

Most multi-agent systems run the same model multiple times. Convergence between GPT-4 and GPT-4 tells you GPT-4 agrees with itself. This method uses different providers with different architectures and training data. Convergence among systems such as ChatGPT, Gemini, Perplexity, and Claude would tell you that systems trained on different corpora arrived at the same conclusion.

No Debate, No Sycophancy

The debate-to-consensus approach introduces sycophancy: AIs converging because of social pressure rather than evidence. CONSENSAGENT was built to address this problem. This method removes that channel for sycophancy by eliminating communication between systems entirely.

Divergence as Information, Not Noise

Multi-agent debate treats divergence as a problem to resolve. This method treats it as the finding. The goal is not a single answer but a map of the answer space: where do independent systems agree, where do they disagree, and what does the pattern mean?

Section 5: The Formal Method

Step 1: Select independent systems from different providers (minimum 3)
Step 2: Prompt independently with different framing (no standardized rubric)
Step 3: Collect results without cross-contamination
Step 4: Measure convergence (Green/Yellow/Red flags)
Step 5: Investigate divergence (identify whether outliers answered a different question)
Step 6: Report both convergence and divergence with reasons
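Steps 4 and 5 can be sketched as follows. The paper names Green/Yellow/Red flags but does not state cut-offs, so the two thresholds here are assumptions picked only to make the example run. The `Evaluation` class and both function names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

GREEN_MAX_SPREAD = 2.5   # assumed threshold, not from the paper
YELLOW_MAX_SPREAD = 4.0  # assumed threshold, not from the paper


@dataclass
class Evaluation:
    system: str
    standard: str
    score: float


def flag(scores: List[float]) -> str:
    """Step 4: classify convergence by the spread between highest and lowest."""
    spread = max(scores) - min(scores)
    if spread <= GREEN_MAX_SPREAD:
        return "Green"
    if spread <= YELLOW_MAX_SPREAD:
        return "Yellow"
    return "Red"


def flags_by_standard(evals: List[Evaluation]) -> Dict[str, str]:
    """Step 5: regroup by the standard each system actually scored against."""
    groups: Dict[str, List[float]] = {}
    for e in evals:
        groups.setdefault(e.standard, []).append(e.score)
    return {
        std: flag(s) if len(s) > 1 else "n/a (single evaluation)"
        for std, s in groups.items()
    }


# Scores and standards from the audit table in Section 3.
audit = [
    Evaluation("ChatGPT", "Legal doctrine", 6.6),
    Evaluation("Gemini P1", "Author intent", 1.0),
    Evaluation("Perplexity", "Author intent", 2.6),
    Evaluation("Gemini P2", "Author intent", 3.4),
]

pooled_flag = flag([e.score for e in audit])  # all four scores together
grouped_flags = flags_by_standard(audit)      # after the outlier investigation
```

Under these assumed thresholds the pooled scores flag Red, which is the cue to investigate. Once the outlier is found to be answering a different question, the author-intent group flags Green.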

Section 6: The BAES Framework

The Biaxiosum AI Evaluation System formalizes this method with two additions:

The Manager Role

One AI scores independently first (sealed), then receives all scores, runs convergence analysis, investigates outliers, and issues a final ruling with mandatory explanation if overriding consensus.
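One way to make "sealed" checkable is a commit-and-reveal step: the Manager publishes a hash of its score before seeing anyone else's, then reveals the score and nonce afterward. The paper does not specify a sealing mechanism, so this is one possible approach, and the function names and the example score are hypothetical.

```python
import hashlib
import secrets
from typing import Tuple


def seal(score: float) -> Tuple[str, str]:
    """Commit to a score without revealing it. Returns (commitment, nonce)."""
    nonce = secrets.token_hex(16)  # random salt so the score cannot be guessed
    digest = hashlib.sha256(f"{nonce}:{score:.1f}".encode()).hexdigest()
    return digest, nonce


def verify(commitment: str, nonce: str, score: float) -> bool:
    """Check that a revealed score matches the earlier commitment."""
    expected = hashlib.sha256(f"{nonce}:{score:.1f}".encode()).hexdigest()
    return secrets.compare_digest(expected, commitment)


# Hypothetical Manager score, sealed before the other scores arrive.
manager_score = 2.0
commitment, nonce = seal(manager_score)

# Later, after the convergence analysis, the Manager reveals score and nonce.
honest = verify(commitment, nonce, 2.0)
altered = verify(commitment, nonce, 3.0)
```

If the Manager changes its score after seeing the group, `verify` fails, which makes any override visible and ties into the mandatory-explanation rule.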

The Outlier Protocol

Divergent systems are presented with group scores and asked three questions: What evidence drove your score? What might others have missed? Would you adjust? The system can hold or change, but must explain.
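The protocol's one hard rule, that a system may hold or change but must explain, can be enforced by the record structure itself. This is a sketch under my own naming: `OutlierResponse` and its fields are hypothetical, and the example answers are placeholders rather than output from any real system.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# The three questions named in the Outlier Protocol.
QUESTIONS = (
    "What evidence drove your score?",
    "What might others have missed?",
    "Would you adjust?",
)


@dataclass
class OutlierResponse:
    system: str
    original_score: float
    answers: Tuple[str, str, str]           # one answer per question, in order
    adjusted_score: Optional[float] = None  # None means the system holds

    def __post_init__(self) -> None:
        # Holding or changing are both allowed; silence is not.
        if len(self.answers) != len(QUESTIONS):
            raise ValueError("one answer is required per question")
        if not all(a.strip() for a in self.answers):
            raise ValueError("every question must receive an explanation")

    @property
    def final_score(self) -> float:
        if self.adjusted_score is None:
            return self.original_score
        return self.adjusted_score


# Placeholder response from a hypothetical outlier that holds its score.
held = OutlierResponse(
    system="system_x",
    original_score=6.0,
    answers=("placeholder evidence", "placeholder gap", "No, I hold."),
)
```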

Section 7: Falsification Criteria

Training Data Dominance: If convergence is entirely explained by shared training data, the method fails. Test with models trained on fundamentally different corpora.
Prompt Contamination: If prompts implicitly contain the conclusion, convergence may reflect prompt bias. Test with neutral prompting.
Fifth-System Divergence: If a fifth system produces results outside the convergence range on the same standard, the signal weakens.
Scoring Standard Confound: If convergence disappears when all systems use identical instructions, the agreement may be an artifact of standard selection.
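The Fifth-System Divergence criterion reduces to a range check. The sketch below uses the 1.0 to 3.4 range from the audit; the function name and the example inputs are hypothetical.

```python
CONVERGENCE_LOW = 1.0   # lowest author-intent score in the audit
CONVERGENCE_HIGH = 3.4  # highest author-intent score in the audit


def weakens_signal(new_score: float, standard: str) -> bool:
    """True if a new result on the same standard falls outside the range."""
    if standard != "Author intent":
        # A different standard answers a different question, so it is
        # not a test of this particular convergence claim.
        return False
    return not (CONVERGENCE_LOW <= new_score <= CONVERGENCE_HIGH)


# Hypothetical fifth-system results.
inside = weakens_signal(2.0, "Author intent")
outside = weakens_signal(5.0, "Author intent")
other_standard = weakens_signal(6.6, "Legal doctrine")
```

The check on `standard` matters: a score like 6.6 on legal doctrine says nothing about the author-intent range, which is the lesson of the ChatGPT outlier.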

Section 8: Limitations

This method does not prove converged conclusions are true. Four AIs agreeing on something wrong are still wrong.
It does not replace domain expertise. It organizes and compares AI-generated analysis.
It does not work for preference questions. It is designed for evaluative questions with evidence-based answers.
The proof of concept uses one application (Constitutional analysis). Additional domains are needed to establish generalizability.

Section 9: Conclusion

There are two ways to find out if something is true.

You can put it in a room with its critics and see if it survives the argument. That is the debate model.

Or you can send independent observers to look at the same thing separately and see if they come back with the same report. That is the convergence model.

The Evidence Model

Science, at its best, works the second way. Independent labs. Independent measurements. Independent replication. The agreement between experiments conducted in different countries, by different teams, with different equipment, is the evidence. Not the debate about the evidence. The data.

This paper proposes that the same logic applies to AI analysis. Don't make them argue. Let them look. Then compare what they saw.

The convergence is the evidence precisely because nobody arranged it.

Standing Invitation

Any researcher with access to an AI system not included in this audit is invited to run the same evaluation independently and report results. If the Constitutional provisions rank differently, or the aggregate score falls outside 1.0–3.4 on the author-intent standard, the convergence claim is weakened. That is not a threat to the method. That is the method working.

Format: Lowe FACTS Format v1.0
Thesis Unit: DT-005
Paper: Methodology
Author: David Lowe + Claude (Opus)
Date: 2026-03-09
Methodology: Multi-AI independent evaluation
Key Result: Unforced convergence signals independent evidence
Status: Draft v1.0

Be Blessed.
