Inkwell Finance Documentation

CONFIDENTIAL & PROPRIETARY © 2026 Inkwell Finance, Inc. All Rights Reserved. This document is for informational purposes only and does not constitute legal, tax, or investment advice, nor an offer to sell or a solicitation to buy any security or other financial instrument. Any examples, structures, or flows described here are design intent only and may change.

Every number in this section is status-tagged — measured, measured-with-caveat, projected, or extrapolated — and backed by a committed run log.

N=32 match cycle · H100

7.77 s on an NVIDIA H100 SXM, every output decrypted and asserted against the plaintext expected value. The correctness-preserving floor on the current FHE backend.

Atomic min · H100

~15 ms mean over 20 iterations, 20/20 outputs decrypted and asserted. The one-comparison floor — every higher-level op composes from it.

Verification discipline

Four status tags (measured / measured-with-caveat / projected / extrapolated) govern every number on these pages. See Methodology for the citation rules.

The full picture: single-batch compute is measured on hardware from consumer CPU through H100; correctness is verified on every cited measurement; and the cells that can’t yet clear correctness (CKKS past the bootstrap noise floor) are tagged so no one mis-cites them.

What’s in this section

Benchmarks

The headline scheme × hardware figures with the verification discipline that backs every number.

Methodology

The four status tags and the rules for citing each. Why the discipline matters for an FHE product.

Why CKKS fails

Why CKKS at production matching depth does not clear correctness, and why programmable-bootstrap is the correctness-preserving path.

SIMD amortization

How packing delivers near-free per-batch amortization on a saturated GPU.

Scaling dashboard ↗

Interactive model anchored to the measured headline numbers.

The short version

Correctness-preserving path is programmable-bootstrap FHE. Each homomorphic primitive folds one bootstrap, so there is no chain-depth limit on the bisection.
CKKS at production bootstrap chain depth does not clear correctness on Dagon’s matching circuit. We measure it for throughput parity on the analysis pages, but we do not ship it.
H100 is the production target. Consumer GPU (4070 Ti) stays available for local development; measurements show a 5–6× consumer → datacenter speedup on the same workload.
The next-generation backend is a target, not a measurement. Every projected figure for it is derived from published primitive cost and tagged accordingly.

BenchmarksHeadline measured wall-clock for Dagon's match engine on a datacenter GPU, with the verification discipline behind every cited number.