CONFIDENTIAL & PROPRIETARY © 2026 Inkwell Finance, Inc. All Rights Reserved. This document is for informational purposes only and does not constitute legal, tax, or investment advice, nor an offer to sell or a solicitation to buy any security or other financial instrument. Any examples, structures, or flows described here are design intent only and may change.
Every number on this page is tagged with a status — measured, measured-with-caveat, projected, or extrapolated-from-SOTA — and backed by a committed log. No figure on this page is an opaque estimate. For the methodology that defines those tags, see Methodology.

Headline

N=32 match cycle · H100

7.77 s wall-clock on an NVIDIA H100 SXM, with every output decrypted and asserted against the plaintext expected value. Correctness-preserving per-op programmable bootstrap.

Atomic homomorphic min · H100

15.1 ms mean (equal to the p50) over 20 iterations. 20/20 decryptions match the plaintext answer. This is the one-comparison floor: every higher-level op is composed from it.
The headline numbers were measured on a fresh H100 SXM cloud pod. Inter-pod variance on this hardware is typically a low double-digit percentage; the 7.77 s figure is a stable reference within that envelope.
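
The mean and p50 reported for the atomic min can be reproduced from per-iteration samples with a harness shaped roughly like the one below. This is a minimal sketch, not the committed harness: `time_iterations`, `summarize`, and `placeholder_min` are hypothetical names, and the placeholder runs a plaintext `min` rather than any FHE call. On a GPU the timed callable would also need to block until the device has finished (for example via `cudaStreamSynchronize`) before the timer stops.

```python
import statistics
import time


def time_iterations(op, iterations=20):
    """Run op() `iterations` times and return per-call latencies in milliseconds."""
    samples_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        op()  # assumed to return only once the work is complete
        samples_ms.append((time.perf_counter() - start) * 1e3)
    return samples_ms


def summarize(samples_ms):
    """Return the sample count, mean, and p50 (median) of the latencies."""
    return {
        "n": len(samples_ms),
        "mean_ms": statistics.fmean(samples_ms),
        "p50_ms": statistics.median(samples_ms),
    }


def placeholder_min():
    # Hypothetical stand-in for the homomorphic min; this is plaintext only.
    return min(7, 3)


summary = summarize(time_iterations(placeholder_min, iterations=20))
print(summary)
```

When the mean and the p50 agree, as they do for the atomic min above, the latency distribution has no heavy tail pulling the mean away from the median.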

Programmable-bootstrap headlines

The match cycle uses a programmable-bootstrap FHE backend. Every homomorphic primitive folds in one bootstrap, which refreshes ciphertext noise, so chain depth is not a correctness constraint. Headline figures on the production target (H100 SXM):
| Workload | Wall-clock | Verification |
| --- | --- | --- |
| Atomic min(u32, u32) | 15.1 ms mean | 20/20 outputs verified |
| N=32 match cycle | 7.77 s | every output decrypted and asserted |
Verification runs on the host after cudaStreamSynchronize; its wall-clock is reported separately from the match timer. Decrypt overhead is well under one percent of the match wall-clock.
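
Keeping the two timers apart can be sketched as follows. This is an illustration under stated assumptions rather than the actual harness: `run_with_separate_timers` and `decrypt_overhead_fraction` are hypothetical names, `match_fn` is assumed to return only after the device has synchronized, and `decrypt_fn` is assumed to run on the host.

```python
import time


def run_with_separate_timers(match_fn, decrypt_fn):
    """Time the match and the verification decrypt as two disjoint intervals."""
    t0 = time.perf_counter()
    ciphertexts = match_fn()  # assumed to block until the GPU work is done
    t1 = time.perf_counter()  # the match timer stops here
    plaintexts = decrypt_fn(ciphertexts)  # host-side decrypt, outside the match timer
    t2 = time.perf_counter()
    return plaintexts, t1 - t0, t2 - t1


def decrypt_overhead_fraction(match_s, decrypt_s):
    """Decrypt wall-clock as a fraction of the match wall-clock."""
    if match_s <= 0:
        raise ValueError("match wall-clock must be positive")
    return decrypt_s / match_s


# Usage with plaintext placeholders standing in for the real match and decrypt.
outputs, match_s, decrypt_s = run_with_separate_timers(
    lambda: [1, 2, 3],
    lambda cts: list(cts),
)
print(outputs, match_s, decrypt_s)
```

Reporting the two intervals separately lets a reviewer confirm that verification cost is not hidden inside, or subtracted from, the match figure.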

CKKS reference (parity-fail exhibit)

A leveled CKKS path on the same hardware was run as a throughput reference. At the bootstrap chain depth this matcher needs, CKKS does not preserve correctness: the wall-clock measurement completes but the decrypted answer diverges from the plaintext shadow beyond tolerance. We cite this number only with the failure mode stated inline; see Why CKKS fails at scale for the structural reason.
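
A parity check of this kind can be sketched as below. It is a minimal illustration only: `parity_check` is a hypothetical name, the tolerance is a caller-supplied parameter rather than the value used in the committed logs, and the sample inputs are made up to show both outcomes.

```python
def parity_check(decrypted, shadow, tolerance):
    """Compare decrypted approximate outputs with the plaintext shadow.

    Returns (ok, worst_error), where ok is True only if every output is
    within `tolerance` of its shadow value.
    """
    if len(decrypted) != len(shadow):
        raise ValueError("decrypted and shadow outputs differ in length")
    worst_error = max(
        (abs(d - s) for d, s in zip(decrypted, shadow)),
        default=0.0,
    )
    return worst_error <= tolerance, worst_error


# Illustrative values only: the second case diverges beyond tolerance.
print(parity_check([1.0, 2.0], [1.0, 2.0], tolerance=1e-3))
print(parity_check([1.0, 2.5], [1.0, 2.0], tolerance=1e-3))
```

A run can finish and produce a wall-clock figure while this check still fails, which is why the timing and the parity result have to be reported together.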

Next-generation FHE (projected — pre-alpha)

The production REFHE substrate is expected to deliver a substantial per-primitive speedup over current TFHE-style benchmarks once mainnet alpha lands. Concrete margins will be measured at that point. Any pre-mainnet figure for this backend is treated as extrapolated-from-SOTA under the methodology discipline: it is derived from published primitive costs, not from measurement.

Verification scope

Every measured TFHE-style number on this page decrypts every output after the match timer stops and asserts equality against the plaintext expected value. A final invariant assertion in each harness guards against silent undercounts. Decrypt wall-clock is reported separately so reviewers can confirm verification runs outside the match timer.
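
The per-output assertion and the final invariant can be sketched as follows. This is an assumption-labelled illustration, not the harness itself: `verify_outputs` is a hypothetical name, and the inputs are plain integers standing in for decrypted values.

```python
def verify_outputs(decrypted, expected):
    """Assert every decrypted output equals its expected plaintext value.

    Returns the number of outputs checked.
    """
    checked = 0
    for i, (got, want) in enumerate(zip(decrypted, expected)):
        assert got == want, f"output {i}: decrypted {got}, expected {want}"
        checked += 1
    # Final invariant: zip() stops at the shorter input, so a missing output
    # would otherwise pass silently. Require that every output was checked.
    assert checked == len(expected) == len(decrypted), (
        f"checked {checked} outputs, expected {len(expected)}, "
        f"decrypted {len(decrypted)}"
    )
    return checked


print(verify_outputs([3, 1, 4], [3, 1, 4]))
```

Without the final count assertion, a harness that decrypted only some of the outputs would still report success, which is the silent undercount the invariant guards against.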

Methodology

The four status tags (measured, measured-with-caveat, projected, extrapolated-from-SOTA) and the rules for citing each.

Why CKKS fails at scale

What the CKKS H100 parity failure means, the absolute bootstrap noise floor, and why programmable-bootstrap is the correctness-preserving path.

SIMD amortization

How packing takes single-batch latency down to per-batch amortized cost on a saturated GPU.

Interactive scaling dashboard

Walk through the headline measured anchors interactively.