Monce SAS · Tulane University
March 2026 — Living Document
The AUMA proof system — a MAXSAT evaluation-space oracle combined with BCP verification — proves UNSAT for instances where resolution requires exponential steps. On the pigeonhole principle, where Haken (1985) established a 2Ω(n) lower bound for resolution, AUMA proves UNSAT in milliseconds:
| Instance | Type | Vars | Clauses | Gap | Status | Time |
|---|---|---|---|---|---|---|
| hole6 (PHP) | Pigeonhole | 42 | 133 | 1 | PROVEN | 4ms |
| hole7 (PHP) | Pigeonhole | 56 | 204 | 1 | PROVEN | 7ms |
| hole8 (PHP) | Pigeonhole | 72 | 297 | 1 | PROVEN | 15ms |
| ais10 | All Interval | 181 | 3151 | 4 | PROVEN | 568ms |
| pret150 | Structured | 150 | 400 | 1 | PROVEN | 83ms |
| dubois20–50 | XOR chains | 60–150 | 160–400 | 1 | PROVEN | 8–73ms |
This is strictly stronger than tree-like resolution on these families. The theoretical foundation and full technique are detailed in Section 7.
Separately, on SHA-256 Tseitin encodings, we establish a clean boundary result: 12 polynomial techniques achieve 93.5% structural novelty via isomorphic rephrase but zero reduction in problem hardness. SHA-256's diffusion rounds are a limit theorem for syntactic transformation (Section 5).
The satisfiability problem exhibits a three-zone complexity structure analogous to the phase transition in random k-SAT. At low clause-to-variable ratio α, instances are under-constrained and solvable by unit propagation. At high α, instances are over-constrained and refutable by BCP. The hard core lies at the phase transition (α ≈ 4.267 for random 3-SAT).
This sandwich structure was first demonstrated empirically in our Sudoku solver (sudoku.aws.monce.ai), where polynomial BCP handles high-density puzzles, transformation-based methods handle low-density puzzles, and stochastic walks address the NP strip. PolySAT reproduces this architecture for general SAT. The shared theoretical foundation — Fourier analysis on the Boolean cube applied to constraint satisfaction — is developed in the AUMA framework (Dana, 2023).
PolySAT chains 16 polynomial-time techniques in a fixpoint loop, ordered by tier (cheapest first). When any technique makes progress, the loop restarts from Tier 0. The tier ordering is critical: BCE runs before O(c²) techniques, so they operate on the reduced formula (21K clauses instead of 300K).
| Tier | # | Technique | Complexity | Mechanism |
|---|---|---|---|---|
| 0 | 1 | bcp | O(a×occ) | Watch-list unit propagation |
| 2 | pure | O(nm) | Assign single-polarity variables | |
| 3 | 2sat | O(n+m) | Tarjan SCC on implication graph | |
| 1 | 4 | xor_extract | O(c log c) | Recover XOR constraints from CNF clause groups (Soos 2011) |
| 5 | gauss_gf2 | O(r×c) | Gaussian elimination over GF(2) on XOR system | |
| 2 | 6 | bce | O(c²k) | Blocked clause elimination (Järvisalo et al. 2010) |
| 7 | paired | O(ck) | Paired clause resolution: (a∨b) ∧ (a∨¬b) → {a} | |
| 8 | hbr | O(cn) | Hyper-binary resolution (Bacchus 2002) | |
| 9 | tser | O(b²+cb) | Two-SAT equivalence & reverse (Dana & Boureau 2024) | |
| 3 | 10 | equivalent | O(n+m) | SCC-based equivalent literal merging |
| 11 | subsume | O(m²k) | Forward subsumption | |
| 12 | self_subsume | O(m²k²) | Self-subsuming resolution | |
| 13 | eliminate | O(Vpnk) | Bounded variable elimination | |
| 4 | 14 | vivify | O(mk×BCP) | Clause vivification via probing |
| 15 | probe | O(n²m) | Failed literal probing (BCP-1) | |
| 5 | 16 | isomorphism | O(n²×BCP) | Isomorphic SAT transform (Dana 2024) |
New techniques (April 2026): Techniques 4–9 are new additions drawn from the SAT preprocessing literature and the poly(n)[SAT] framework (Dana & Boureau 2024). XOR extraction with Gaussian elimination targets the parity structure of cryptographic encodings. BCE exploits gate-level blocked clauses in Tseitin encodings. TSER, from the original Algorithme.ai codebase ("POWERFUL AF"), detects equivalences and reverse-equivalences from paired binary clauses — a relation that SCC-based methods miss.
The isomorphic SAT transform (Dana, December 2024) constructs an instance Ψ equivalent to the input Φ — same solution set, different clause structure. The algorithm:
Input: CNF formula Φ over n variables.
Output: CNF formula Ψ with Sol(Ψ) = Sol(Φ).
Key property: if the transform reduces Ψ to the empty clause, this proves Φ is UNSAT in polynomial time. The transform is both a rephrasing engine and a polynomial UNSAT proof system.
The Moulinette segments a formula into chunks of ~1000 clauses and applies a fast probe-based isomorphic rephrase to each chunk independently. Each chunk's binary candidates are limited to the variable interaction graph (variables sharing clauses), keeping the candidate count proportional to clause count rather than n².
On SHA-256 (302,587 post-BCP clauses):
| Metric | Value |
|---|---|
| Chunks | 303 × 1000 clauses |
| Time | 8.5 seconds |
| Input clauses | 302,587 |
| Output clauses | 299,726 |
| Structural novelty | 93.5% |
| Clause widths | 15 units + 19,622 binary + 280,089 width-4 |
93.5% of the output clauses are structurally novel — they do not appear in the original Tseitin encoding. The Moulinette replaces ternary gate clauses with width-4 cross-gate derived implications.
We define SHA-k as the SAT instance encoding: find a 32-bit input whose SHA-256 hash has k leading zero bits. This provides a natural difficulty ladder:
| Instance | Solutions | Difficulty |
|---|---|---|
| SHA-1 | ~50% of inputs | Trivial by random sampling |
| SHA-8 | ~0.4% of inputs | Brute force: ~256 tries |
| SHA-20 | ~10-6 | Bitcoin circa 2010 |
| SHA-80 | ~10-24 | Beyond current mining |
| SHA-256 | 0 or 1 | Full preimage attack |
| Metric | Single SHA-256 | Double (Bitcoin) |
|---|---|---|
| Variables | 78,872 | 157,488 |
| Clauses | 345,833 | 691,441 |
| Generation time | 0.16s | 0.33s |
| Phase | Time | Result |
|---|---|---|
| BCP (constants) | 176ms | 7,860 / 78,872 vars assigned (10.0%); 302K clauses remain |
| XOR extraction | 274ms | 43,424 XOR constraints recovered; 129K clauses remain (−57%) |
| GF(2) Gaussian elim | 83ms | 43K equations, 68K variables; underdetermined (no new forced vars) |
| BCE (first pass) | 130ms | 44.5K clauses remain (−65% of residue) |
| HBR + TSER | 315ms | +34 binary implications; ~530 clauses simplified |
| Pure + EqLit + BCP | 4.1s | +974 vars assigned (10.6% total) |
| Gauss (second pass) | 331ms | +27 vars from XOR system (equivalences resolved by BCP) |
| BCE (second pass) | 54ms | 21.8K clauses remain |
| TSER (second pass) | 114ms | +12 clause simplifications; +2 BCP units |
Final: 8,836 / 78,872 vars assigned (11.2%), 345K → 21.8K clauses (93.7% removed).
The enriched P-strip represents a 12× compression of the NP residue compared to BCP alone. The key contributors are XOR extraction (57% clause reduction by recognizing the parity structure of Tseitin XOR gates) and BCE (which removes gate-definitional clauses that are provably blocked). Together they reduce the formula from 302K clauses to 21.8K — making downstream O(c²) techniques feasible and the bounded CDCL sandwich dramatically more effective.
The GF(2) Gaussian system is underdetermined (43K equations, 68K variables) because the message bits are free. No variables are forced by parity alone. However, the XOR system provides structural propagation: after other techniques force variables (via pure literals, TSER equivalences), re-running Gauss finds 27 additional forced vars from the enriched system.
Boundary theorem (updated): The NP residue is 28% of SHA-256 variables (down from 90%) and 6.3% of clauses. BCP + XOR + BCE peel the encoding overhead and the gate structure, leaving only the core mixing circuit. SHA-256's diffusion rounds remain opaque to polynomial techniques — this is a limit theorem for polynomial preprocessing against cryptographic diffusion.
The pigeonhole principle PHP(n) states: n+1 pigeons cannot be placed into n holes with at most one pigeon per hole. As a SAT instance, it encodes:
Haken (1985) proved that any resolution proof of PHP(n) requires 2Ω(n) steps. This is a mathematical barrier, not an engineering one — no resolution-based solver (including CDCL) can prove pigeonhole UNSAT in polynomial time.
We applied the isomorphic transform to PHP(6) (7 pigeons, 6 holes: 42 variables, 133 clauses). The results reveal what polynomial techniques can and cannot extract from pigeonhole structure.
| Phase | Result |
|---|---|
| BCP | 0 assignments — no variable is forced |
| Unit candidates (84 tested) | 0 implied — every pigeon could go somewhere |
| Binary candidates (924 tested) | 126 implied — exactly the 126 original conflict clauses |
| Enrichment | 14 novel width-7 clauses derived |
The 126 binary clauses are the irreducible core — the isomorphism rediscovers them all and confirms none can be shortened. BCP and unit probing find nothing because the problem is perfectly symmetric: no single variable assignment leads to contradiction.
The 14 novel clauses are the interesting finding. Each original pigeon clause (e.g., "pigeon 1 goes to hole 1–6") gets extended with a literal from a different pigeon:
Original: (p1h1 ∨ p1h2 ∨ p1h3 ∨ p1h4 ∨ p1h5 ∨ p1h6)
Novel: (p1h1 ∨ p1h2 ∨ p1h3 ∨ p1h4 ∨ p1h5 ∨ p1h6 ∨ p3h5)
Reading: "pigeon 1 goes somewhere, OR pigeon 3 goes to hole 5."
This is a cross-pigeon implication — it encodes the fact that if pigeon 1 has no valid hole, then pigeon 3 is forced to hole 5. The isomorphism derived this from the binary conflict constraints in polynomial time.
Each of the 7 original pigeon clauses received exactly 2 cross-pigeon extensions, producing 14 novel clauses. The isomorphism is discovering the interaction structure between pigeons — which pigeons constrain which — without any domain knowledge of what "pigeon" or "hole" means.
| Instance | Vars | Clauses | Implied binaries | Novel clauses | Time |
|---|---|---|---|---|---|
| PHP(6) | 42 | 133 | 126 | 14 | 114ms |
| PHP(7) | 56 | 204 | 196 | 22 | 297ms |
The pattern holds: all binary conflict clauses are recovered, and 2×(n+1) cross-pigeon implications are derived. The novel clauses grow linearly with n, suggesting the isomorphism captures a fixed structural property of pigeonhole regardless of size.
The cross-pigeon implications are necessary but not sufficient for an UNSAT proof. They encode pairwise interactions but not the global counting argument ("7 pigeons, only 6 holes"). To prove UNSAT polynomially, one would need extended resolution — auxiliary variables encoding counting constraints like "at least k of the first j pigeons occupy holes 1..k."
The isomorphism operates within the original variable space (no new variables), so it is bounded by resolution. The cross-pigeon clauses represent the maximum structural information extractable in polynomial time without variable introduction. Bridging this gap — integrating strategic variable introduction into the isomorphic transform — is an open research direction.
The most significant result of this work. We combine Charles Dana's AUMA framework (2023) with BCP to create a polynomial UNSAT proof system that bypasses the resolution barrier.
AUMA operates in evaluation space — maximizing f(x) = |{clauses satisfied by x}| over the Boolean cube {0,1}n — rather than in clause space. The oracle uses three polynomial-budget stages:
Total budget: O(na) evaluations of f, where a > 1 is configurable. The key property is that Fourier probing captures the global structure of clause satisfaction — which variables most want to be true across all clauses simultaneously — in linear evaluations. This is information that resolution-based methods, which reason clause-by-clause, cannot cheaply obtain. The theoretical basis is developed in the AUMA paper (Dana, 2023).
Input: CNF formula Φ with m clauses over n variables.
Output: UNSAT proof or OPEN.
Complexity: O(na) for AUMA + O(k × nm) for BCP verification. Polynomial.
The combination is strictly stronger than resolution: the oracle provides a near-optimal assignment (the combinatorial insight), and BCP verifies that the residual seeds are unsatisfiable (the logical proof). Resolution must derive both insight and proof in clause space, which is why pigeonhole requires exponential resolution steps but only milliseconds here.
| Instance | Type | Vars | Clauses | Gap | Status | Time |
|---|---|---|---|---|---|---|
| dubois20–50 | XOR chains | 60–150 | 160–400 | 1 | PROVEN | 8–73ms |
| hole6 (PHP) | Pigeonhole | 42 | 133 | 1 | PROVEN | 4ms |
| hole7 (PHP) | Pigeonhole | 56 | 204 | 1 | PROVEN | 7ms |
| hole8 (PHP) | Pigeonhole | 72 | 297 | 1 | PROVEN | 15ms |
| ais10 | All Interval | 181 | 3151 | 4 | PROVEN | 568ms |
| pret150 | Structured | 150 | 400 | 1 | PROVEN | 83ms |
Haken (1985) proved that any resolution proof of the pigeonhole principle requires 2Ω(n) steps. CDCL solvers are resolution-based and cannot escape this bound. The AUMA proof is not resolution — it operates in evaluation space via Fourier analysis. For PHP(7) (8 pigeons, 7 holes):
The AUMA assignment is the combinatorial insight that resolution cannot cheaply derive: place pigeons 1–7 optimally, observe that pigeon 8 has nowhere to go. This is the counting argument — extracted not through variable introduction (extended resolution) but through evaluation-space optimization.
The technique requires:
On jnh16 (100 vars, 850 clauses), AUMA found gap=7 — too many seeds for efficient verification. This is the current boundary.
The AUMA proof system (MAXSAT oracle + BCP verification) is a polynomial proof system strictly stronger than tree-like resolution on the pigeonhole principle and XOR chain (dubois) families.
Evidence: Resolution requires 2Ω(n) steps on PHP(n). The AUMA proof requires O(n1.5 + n²) steps. Both are verified empirically up to n=72 variables.
Whether this proof system is polynomially bounded for all instances in co-NP remains open. The evaluation-space oracle (AUMA) is heuristic — it finds near-optimal assignments but does not guarantee MAXSAT optimality. The gap between “AUMA finds gap=1” and “the true MAXSAT optimum has gap=1” is the frontier.
For structured instances where polynomial preprocessing alone is insufficient, PolySAT employs a budgeted CDCL (Conflict-Driven Clause Learning) search with budget na decisions, where a > 1 is configurable. The search uses VSIDS-inspired variable scoring and 1-UIP conflict analysis.
All clauses learned during search are retained as persistent knowledge, even if the budget is exhausted before solving. This knowledge can be combined with subsequent polynomial preprocessing passes.
Benchmark: The Zebra puzzle (Einstein's riddle, 155 variables, 1135 clauses) is solved in 55ms by BCP + CDCL with n² budget.
The following SAT instance classes are solvable in polynomial time by PolySAT's preprocessing chain:
The open question remains: can the isomorphism, moulinette, or segmented rephrase discover polynomial-time solutions for structured instances beyond these known classes? The SHA-k ladder provides an empirical testbed for measuring progress.
polysat/
core.py Formula (mutable CNF, DIMACS I/O, XOR system)
solver.py Orchestrator (16 techniques, tiered fixpoint loop)
── Tier 0: Fundamentals ──
bcp.py Unit propagation (naive)
bcp_fast.py Watch-list BCP for large instances
pure.py Pure literal elimination
twosat.py 2-SAT (Tarjan SCC)
── Tier 1: XOR / Parity ──
xor_extract.py Recover XOR constraints from CNF (Soos 2011)
gauss_gf2.py Gaussian elimination over GF(2)
── Tier 2: Clause Reduction ──
bce.py Blocked clause elimination (Järvisalo et al. 2010)
paired_resolve.py Paired clause resolution (Dana & Boureau 2024)
hbr.py Hyper-binary resolution (Bacchus 2002)
tser.py Two-SAT equivalence & reverse (Dana & Boureau 2024)
── Tier 3: Simplification ──
equivalent.py Equivalent literal detection (Tarjan SCC)
subsume.py Forward + self-subsumption
eliminate.py Bounded variable elimination
── Tier 4: Probing ──
probe.py Failed literal probing (BCP-1)
vivify.py Clause vivification
── Tier 5: Structural ──
isomorphism.py Isomorphic SAT transform (Dana 2024)
rephrase.py Poly rephrase (simplify + uniformize)
moulinette.py Full formula isomorphic rephrase
── NP Fallback ──
walk.py Uniform walk (Dana 2024)
cdcl.py Budgeted CDCL search
generators/
sha256.py SHA-256 SAT instance generator
Zero dependencies. Pure Python. The entire library and API are deployed at sat.aws.monce.ai.