Autotune-as-a-Service • GGUF / llama.cpp • Credits-based

Find the best quant, ctx, and batch settings for your target hardware, automatically.

Stop guessing and rerunning benchmarks. Sigilant runs controlled tests across variants and returns a winner plus artifacts you can trust: run summary, per-variant metrics, and gates (quality + RAM).


Credits-first (no subscription required)

Buy credits, run an autotune, get results. Consider a subscription later, only once you have repeat usage.

Target hardware matching

Validate on CPU or cloud flavors that approximate your real deployment environment.

Trustable artifacts

Each run produces structured JSON + CSV metrics, plus gates and a clear winner decision.

How it works

Choose model + target

Select your GGUF model and the target device profile (local CPU or cloud).

Run controlled benchmarks

We benchmark candidate variants across quantization level (quant), context size (ctx), batch size, and other settings.
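To make the idea of "candidate variants" concrete, here is a minimal sketch of how a search space over quant, ctx, and batch could be enumerated. The function name and the specific values are assumptions chosen for illustration, not Sigilant's actual defaults or API; the quant names are standard llama.cpp quantization types.

```python
from itertools import product

# Hypothetical search space: illustrative values only, not Sigilant defaults.
QUANTS = ["Q4_K_M", "Q5_K_M", "Q8_0"]   # llama.cpp quantization types
CTX_SIZES = [2048, 4096]                # context window sizes in tokens
BATCH_SIZES = [128, 512]                # prompt-processing batch sizes


def candidate_variants(quants, ctx_sizes, batch_sizes):
    """Return every (quant, ctx, batch) combination as a dict."""
    return [
        {"quant": q, "ctx": c, "batch": b}
        for q, c, b in product(quants, ctx_sizes, batch_sizes)
    ]


variants = candidate_variants(QUANTS, CTX_SIZES, BATCH_SIZES)
print(len(variants))  # 3 quants x 2 ctx sizes x 2 batch sizes = 12
```

Even this small grid yields 12 variants, which is why running them by hand gets repetitive quickly.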

Get the winner + artifacts

Receive a clear winner decision plus run-level summary, gates, and exportable metrics.
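One plausible way to combine gates with a winner decision is sketched below: drop every variant that fails a quality or RAM gate, then pick the fastest of what remains. The class, field names, thresholds, and numbers are all invented for illustration, and using perplexity as the quality metric is an assumption; the page does not specify how Sigilant scores quality or ranks variants.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    name: str
    tokens_per_sec: float  # measured throughput; higher is better
    perplexity: float      # assumed quality proxy; lower is better
    peak_ram_mb: float     # peak memory observed during the run


def passes_gates(result, max_perplexity, max_ram_mb):
    """A variant passes only if it clears both the quality and RAM gates."""
    return (result.perplexity <= max_perplexity
            and result.peak_ram_mb <= max_ram_mb)


def pick_winner(results, max_perplexity, max_ram_mb):
    """Return the fastest variant that passes both gates, or None."""
    passing = [r for r in results
               if passes_gates(r, max_perplexity, max_ram_mb)]
    if not passing:
        return None
    return max(passing, key=lambda r: r.tokens_per_sec)


# Made-up numbers, purely to exercise the logic.
results = [
    VariantResult("Q4_K_M-ctx4096-b512", 42.0, 6.9, 5200.0),  # fails quality
    VariantResult("Q5_K_M-ctx4096-b512", 35.0, 6.4, 6100.0),  # passes both
    VariantResult("Q8_0-ctx4096-b512", 24.0, 6.2, 9000.0),    # fails RAM
]
winner = pick_winner(results, max_perplexity=6.5, max_ram_mb=8000.0)
print(winner.name if winner else "no variant passed the gates")
```

Filtering before ranking is what keeps a "fast but broken" variant from winning: the fastest variant in this toy data is rejected by the quality gate.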

Why Sigilant

Speed without guesswork

Automate repetitive benchmarking and focus on shipping.

Gates built-in

Quality + memory gating helps avoid “fast but broken” variants.

Credible outputs

Artifacts are structured for audits, sharing, and regression comparisons.

Designed for real deployments

Test on hardware profiles that resemble your production environment.

FAQ

Is this a subscription?

Not initially. Early access is credits-based: purchase credits, run autotunes, top up as needed.

What do I get after a run?

A winner decision plus structured artifacts (JSON/CSV) including per-variant metrics and gate rollups.
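As a rough idea of how JSON and CSV artifacts like these might be consumed in a script, the sketch below parses a run summary and a per-variant metrics table using only the Python standard library. The file contents, field names, and column names are hypothetical; the real artifact schema may differ.

```python
import csv
import io
import json

# Hypothetical artifact contents, inlined so the example is self-contained.
SUMMARY_JSON = (
    '{"winner": "Q5_K_M-ctx4096-b512",'
    ' "gates": {"quality": "pass", "ram": "pass"}}'
)
METRICS_CSV = """variant,tokens_per_sec,peak_ram_mb
Q4_K_M-ctx4096-b512,42.0,5200
Q5_K_M-ctx4096-b512,35.0,6100
"""


def load_run(summary_text, metrics_text):
    """Parse the run summary and index per-variant metric rows by name."""
    summary = json.loads(summary_text)
    rows = csv.DictReader(io.StringIO(metrics_text))
    by_variant = {row["variant"]: row for row in rows}
    return summary, by_variant


summary, by_variant = load_run(SUMMARY_JSON, METRICS_CSV)
winner_row = by_variant[summary["winner"]]

# csv.DictReader yields strings, so convert before doing arithmetic.
print(summary["winner"], float(winner_row["tokens_per_sec"]))
```

Because the outputs are plain JSON and CSV, the same approach works for diffing two runs in a regression check or loading metrics into a spreadsheet.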

Can I pay by invoice?

Yes, for teams. We can support payment by invoice or bank transfer for annual or larger credit blocks.