Dagon’s single-batch match latency has a floor. Its per-batch amortized cost does not.

The property

FHE operations are O(1) per ciphertext regardless of how many of the ring’s slots carry meaningful data. A bootstrap on a ciphertext whose internal ring supports thousands of slots takes the same wall-clock time whether all slots are populated or only a handful are. At typical batch sizes, that leaves a large unused slice of ring capacity, which we can fill by interleaving independent matches into the same ciphertexts. Every interleaved match rides the same compute pass.
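
To see the property concretely, here is a toy analogy in plain numpy (numpy arrays standing in for ring slots; this is an analogy only, not real FHE): a dense vector operation costs the same wall-clock whether one slot or every slot carries live data.

```python
import numpy as np
import time

# Toy analogy for the O(1)-per-ciphertext property: a dense vector op
# costs the same whether 1 slot or all slots carry live data. Plain
# numpy stands in for FHE ring slots; nothing here is real FHE.

RING_SLOTS = 1 << 15

one_live = np.zeros(RING_SLOTS)
one_live[0] = 1.0                       # a single populated slot
all_live = np.random.rand(RING_SLOTS)   # every slot populated

for name, vec in [("1 slot live", one_live), ("all slots live", all_live)]:
    t0 = time.perf_counter()
    for _ in range(10_000):
        vec = vec * 1.000001 + 0.5      # stand-in for one per-ciphertext op
    print(f"{name}: {time.perf_counter() - t0:.3f} s")  # near-identical
```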

What we see in practice

On a consumer-class GPU, the measured wall-clock times for a single-batch match and a fully packed match (many concurrent batches interleaved into the same ring) differ by only a few percent. Everything else divides: the amortized cost per batch at full-ring pack is two to three orders of magnitude lower than the single-batch number. The GPU is already kernel-saturated on a single batch; memory and compute throughput sit near 100% of device capacity during match. The remaining ring slots aren’t idle for lack of optimization; they’re idle because no batch is riding them. Interleaving fixes that.
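
The division is worth writing out. In the sketch below, only the 7.77 s H100 anchor quoted later on this page is taken from the source; the 3% packing overhead and the pack factor of 128 are illustrative assumptions.

```python
# Amortization arithmetic for the behavior described above. Only the
# 7.77 s single-batch anchor is quoted from the page; the overhead and
# pack factor are illustrative assumptions.

single_batch_s = 7.77      # measured H100 N=32 anchor (from the text)
pack_overhead = 1.03       # assumption: fully packed run is ~3% slower
k = 128                    # assumption: batches interleaved at full pack

packed_wall_clock_s = single_batch_s * pack_overhead
amortized_per_batch_ms = packed_wall_clock_s / k * 1000

print(f"packed run total:    {packed_wall_clock_s:.2f} s")      # ~8.00 s
print(f"amortized per batch: {amortized_per_batch_ms:.0f} ms")  # ~63 ms
```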

Two packing modes

Stride-aware packing

Interleaves independent batches at regular slot positions inside one ciphertext. Rotation keys are provisioned per stride. Amortization is nearly free up to the ring’s capacity: each additional interleaved batch adds almost no wall-clock.
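
A minimal plaintext sketch of what stride interleaving means for slot layout (the layout, names, and toy sizes here are illustrative assumptions, not Dagon’s actual format):

```python
# Sketch of a stride-aware slot layout. Batch i's j-th element lands at
# slot j * STRIDE + i, so up to STRIDE independent batches interleave in
# one ciphertext. In a real scheme, rotation keys at stride multiples
# would let the match circuit step through a batch's slots.

RING_SLOTS = 16   # toy ring; real rings support thousands of slots
STRIDE = 4        # up to 4 batches interleaved in this toy

def pack(batches):
    assert len(batches) <= STRIDE
    slots = [0] * RING_SLOTS
    for i, batch in enumerate(batches):
        for j, v in enumerate(batch):
            slots[j * STRIDE + i] = v
    return slots

def unpack(slots, i):
    # batch i occupies every STRIDE-th slot, starting at offset i
    return slots[i::STRIDE]

packed = pack([[1, 2, 3, 4], [5, 6, 7, 8]])
print(packed)             # [1, 5, 0, 0, 2, 6, 0, 0, 3, 7, 0, 0, 4, 8, 0, 0]
print(unpack(packed, 1))  # [5, 6, 7, 8]
```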

Multi-ciphertext loop

Runs the full graph repeatedly on separate ciphertexts. Useful only when concurrency needs exceed what stride-packing can fit in a single ring. Net-negative on a saturated GPU, because per-graph setup cost dominates.

For realistic concurrent match volumes, the stride-aware path covers the whole working range with no meaningful wall-clock penalty.
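
The implied dispatch rule, sketched under assumed capacity numbers (the ring size, slots-per-batch, and the function itself are hypothetical):

```python
# Hypothetical dispatch rule implied by the two modes (a sketch, not
# production logic): stay on stride packing until concurrency exceeds
# what one ring can hold, then spill into extra ciphertexts.

RING_SLOTS = 4096                            # assumed ring capacity
SLOTS_PER_BATCH = 32                         # assumed slots per match batch
MAX_PACKED = RING_SLOTS // SLOTS_PER_BATCH   # 128 batches per ring

def plan(concurrent_batches):
    """Return (ciphertexts_needed, batches_in_last_ring)."""
    full, rem = divmod(concurrent_batches, MAX_PACKED)
    return full + (1 if rem else 0), rem or MAX_PACKED

print(plan(100))  # (1, 100): stride packing alone covers it
print(plan(300))  # (3, 44):  only now does the multi-ciphertext loop engage
```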

Market-scale throughput

Two different numbers answer two different questions:
  • Single-batch latency — the UX number. How long does a trader wait for a batch to clear? Seconds. The measured H100 N=32 anchor is 7.77 s (every output decrypted and asserted).
  • Amortized-per-batch cost — the throughput number. How many match batches can the engine clear per unit of time at full packing? Milliseconds.

On a saturated datacenter GPU, the amortized cost can cross into sub-Solana-slot territory at full-ring pack: the engine clears a batch in less wall-clock time than a block takes to produce. Network I/O and on-chain settlement become the scaling bottleneck at that point, not FHE.
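
The sub-slot comparison is plain division. In the sketch below, the 7.77 s anchor is the measured number above, the pack factor is an assumption, and 400 ms is Solana’s nominal slot time.

```python
# Arithmetic behind the sub-Solana-slot claim. Only the 7.77 s anchor is
# quoted from the text; the pack factor is an assumption, and 400 ms is
# Solana's nominal target slot time.

SOLANA_SLOT_MS = 400
single_batch_s = 7.77     # measured H100 N=32 anchor
pack_factor = 128         # assumption: full-ring pack

amortized_ms = single_batch_s / pack_factor * 1000
print(f"amortized per batch:    {amortized_ms:.0f} ms")             # ~61 ms
print(f"clears within one slot: {amortized_ms < SOLANA_SLOT_MS}")   # True
```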

A timing measurement, not yet a production guarantee

The amortization measurements are cost-correct: real kernels executing on real hardware, producing real wall-clock times. Semantic correctness of every packing variant against the plaintext reference is an ongoing engineering gate; some packing modes are still being hardened for production use. The throughput numbers are what the infrastructure can do; production rollout follows when the full correctness matrix is green.
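
Conceptually, the gate is a matrix check: every packing variant, on every test input, must agree with the plaintext reference. A stub-level sketch (every name here is a hypothetical placeholder, not Dagon’s API):

```python
# Stub-level sketch of the correctness matrix described above. All names
# are hypothetical; the stubs stand in for the real encrypt -> packed
# match -> decrypt pipeline and its plaintext reference.

def plaintext_match(orders):
    return sorted(orders)          # reference semantics (stub)

def fhe_match(orders, packing_mode):
    return sorted(orders)          # packed FHE path (stub)

def correctness_matrix_green(order_sets, packing_modes):
    """Green only if every (mode, input) cell matches the reference."""
    return all(
        fhe_match(orders, mode) == plaintext_match(orders)
        for mode in packing_modes
        for orders in order_sets
    )

print(correctness_matrix_green([[3, 1, 2], [9, 7]], ["stride", "multi_ct"]))
```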

Benchmarks

End-to-end wall-clock times on consumer and datacenter GPUs, and what drives them.

Interactive dashboard ↗

Toggle stride, scheme, and sign complexity to see the amortization curve shift against measured anchors.