Decisions you can defend. Reasoning you can audit.

Underwriting, claims, credit, regulatory review, clinical decision support — the decisions that matter need more than a fluent answer from a frontier model. We build reasoning systems that combine LLMs with rules engines, optimization, and policy retrieval — with the evidence trail, citations, and counterfactuals that hold up to a regulator.

Why a reasoning model alone isn’t enough.

Chat with a frontier reasoning model and you’ll get a confident answer. Ship that same answer into an underwriting workflow and you’ll get a regulator’s question you can’t answer. Production decision systems are compound systems — and Gartner is already tracking enterprise adoption slowing the moment security, evidence, and audit gaps surface.

Confident hallucinations on the edge cases

Reasoning models still fail on long-tail conditions, and the explanations they give afterwards do not always reflect the reasoning that actually produced the answer. Without grounded evidence and hard constraints, the model improvises in exactly the places that matter most.

No hard constraints

A reasoner without a rules engine and a solver will negotiate around the policy lines it shouldn’t cross. Hard constraints — eligibility, regulatory limits, capital rules — belong in deterministic code, not in a prompt.
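A minimal sketch of what "in deterministic code" can look like, assuming a hypothetical credit application. The field names, thresholds, and function names are illustrative assumptions, not taken from any specific policy or rules engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    # Hypothetical fields, for illustration only.
    applicant_age: int
    requested_amount: float
    debt_to_income: float


# Illustrative limits; real values come from the controlling policy.
MIN_AGE = 18
MAX_AMOUNT = 50_000.0
MAX_DTI = 0.45


def check_hard_constraints(app: Application) -> list:
    """Return the violated constraints; an empty list means none were violated."""
    violations = []
    if app.applicant_age < MIN_AGE:
        violations.append("eligibility: applicant below minimum age")
    if app.requested_amount > MAX_AMOUNT:
        violations.append("limit: requested amount exceeds product maximum")
    if app.debt_to_income > MAX_DTI:
        violations.append("regulatory: debt-to-income above permitted ratio")
    return violations


def enforce(proposed_decision: str, app: Application) -> str:
    """Override an 'approve' proposal whenever any hard constraint fails."""
    if proposed_decision == "approve" and check_hard_constraints(app):
        return "decline"
    return proposed_decision


within_limits = Application(applicant_age=30, requested_amount=10_000.0, debt_to_income=0.30)
over_limit = Application(applicant_age=30, requested_amount=80_000.0, debt_to_income=0.30)
print(enforce("approve", within_limits))
print(enforce("approve", over_limit), check_hard_constraints(over_limit))
```

The point of the design is that no wording in a prompt can move these limits: the model's proposal is an input to `enforce`, never a way around it.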

Black-box outputs

A score with no evidence is useless in adjudication. Glass-box outputs — cited policy clause, retrieved evidence, counterfactual reasoning, confidence — are now the production standard, not a nice-to-have.

Rubber-stamping by design

When the operator UX nudges humans toward “Approve,” you lose the value of the human review and inherit every model error. The interface is part of the decision system; we engineer it that way.

Compound systems, not single prompts.

A reasoning model orchestrates specialist components — each engineered for its job, all evaluated together. This is the architecture that holds up under audit.

  • 01 · Define the decision and the evidence space. What’s being decided, against what policy, with what evidence sources, under what regulatory regime. Most projects skip this — and inherit ambiguity into every downstream component.
  • 02 · Retrieve the evidence. Policy clauses, prior decisions, source documents, internal data. Grounded retrieval feeds the reasoner so the answer is anchored in the controlling documents, not in the model’s parametric memory.
  • 03 · Reason with hard constraints. A reasoning model proposes. A rules engine enforces. A solver optimizes where trade-offs exist. Each component does the job it’s best at — and only that job.
  • 04 · Produce glass-box output. Decision plus cited evidence plus policy reference plus counterfactual plus confidence. Formatted for audit, not for the demo. The output is the artifact a regulator or an internal auditor can replay.
  • 05 · Engineer the human-in-the-loop UX. Operator interfaces that surface uncertainty, encourage disagreement, route exceptions correctly, and log every override. Rubber-stamping is a design failure, not a user failure.
  • 06 · Evaluate and govern. Continuous evaluation against historical decisions and shadow runs. Alignment with EU AI Act high-risk obligations, ISO/IEC 42001, and sector regulators built in from day one — not bolted on after legal review.
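Step 06 can be sketched as a shadow run: replay historical cases through the system without acting on the results, then compare its calls with the decisions that were actually made. The function name, case fields, and the toy decision function below are hypothetical, chosen for illustration.

```python
from collections import Counter
from typing import Callable


def shadow_run(cases: list, decide: Callable[[dict], str]) -> dict:
    """Replay historical cases through `decide` and summarize agreement."""
    confusion = Counter()
    disagreements = []
    for case in cases:
        predicted = decide(case["features"])
        actual = case["historical_decision"]
        confusion[(actual, predicted)] += 1
        if predicted != actual:
            disagreements.append(case["case_id"])
    total = sum(confusion.values())
    agreed = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
    return {
        "agreement_rate": agreed / total if total else 0.0,
        "confusion": dict(confusion),
        "disagreements": disagreements,  # candidates for expert review
    }


def toy_decide(features: dict) -> str:
    # Toy stand-in for the system under evaluation (an assumption, not a real model).
    return "approve" if features["score"] >= 600 else "decline"


history = [
    {"case_id": "c1", "features": {"score": 700}, "historical_decision": "approve"},
    {"case_id": "c2", "features": {"score": 550}, "historical_decision": "decline"},
    {"case_id": "c3", "features": {"score": 620}, "historical_decision": "decline"},
    {"case_id": "c4", "features": {"score": 580}, "historical_decision": "decline"},
]
report = shadow_run(history, toy_decide)
print(report["agreement_rate"], report["disagreements"])
```

A disagreement is not automatically a system error, since the historical decision may be the one that was wrong. That is why the sketch returns the disagreeing case IDs for review rather than just a score.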
[Diagram: within an audit, evaluation, and regulation layer, the Reasoner draws on Policy retrieval (controlling docs), Evidence (case data, priors), a Rules engine (hard constraints), and a Solver (optimize trade-offs) to produce a Glass-box decision (decision, evidence, policy citation, counterfactual, confidence), which then goes to Human review.]
Reasoner orchestrates policy, evidence, rules, and solver — outputting decisions with cited evidence, policy reference, and counterfactual. Humans review exceptions. Everything is logged.
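One way to read this architecture as code is the minimal sketch below, in which the reasoner is a plain callable that proposes and a deterministic rule layer has the final word. Every name here (`decide`, `Clause`, `stub_reasoner`, the clause IDs and limits) is hypothetical, and the reasoner is a stub standing in for a model call.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Clause:
    clause_id: str
    topics: frozenset
    text: str


@dataclass
class Decision:
    outcome: str
    citations: list
    rule_violations: list = field(default_factory=list)
    needs_human_review: bool = False


# Illustrative policy clauses, not real underwriting rules.
POLICY = [
    Clause("UW-4.2", frozenset({"flood"}), "Flood zone properties require a flood endorsement."),
    Clause("UW-7.1", frozenset({"limits"}), "Maximum insured value is 2,000,000."),
]


def retrieve(case: dict) -> list:
    """Topic-tag lookup; a stand-in for a real retrieval layer."""
    return [c for c in POLICY if c.topics & case["topics"]]


def stub_reasoner(case: dict, evidence: list):
    """Stand-in for a model call: proposes an outcome and cites its evidence."""
    return "approve", [c.clause_id for c in evidence]


def rules(case: dict) -> list:
    """Deterministic hard constraints, evaluated independently of the reasoner."""
    violations = []
    if case["insured_value"] > 2_000_000:
        violations.append("UW-7.1: insured value above maximum")
    return violations


def decide(case: dict, reasoner=stub_reasoner) -> Decision:
    evidence = retrieve(case)
    outcome, citations = reasoner(case, evidence)
    # Keep only citations that point at evidence that was actually retrieved.
    known = {c.clause_id for c in evidence}
    citations = [cid for cid in citations if cid in known]
    violations = rules(case)
    if violations:
        outcome = "decline"  # the rule layer overrides the proposal
    review = bool(violations) or not citations
    return Decision(outcome, citations, violations, needs_human_review=review)


over_limit = decide({"topics": {"flood"}, "insured_value": 2_500_000})
in_limit = decide({"topics": {"flood"}, "insured_value": 1_000_000})
print(over_limit.outcome, in_limit.outcome)
```

The sketch routes a case to human review when a rule fired or when the proposal arrives with no valid citation. Both conditions are illustrative choices, not a prescribed policy.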

Decision systems we ship.

Three production decision patterns — each engineered for a different shape of high-stakes work.

01 · Adjudication systems

Underwriting, claims, credit, KYC/KYB. Compound reasoning over policy and evidence, with citation back to the controlling document and full audit trail per decision. Built for regulated lines of business and for the operators who have to defend the call.

02 · Triage & prioritization systems

Risk scoring, exception routing, escalation logic. The agent doesn’t act on the work — it makes the call that determines who or what does. The highest-impact decision layer in most operations, and the one most teams skip past.
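As an illustration of exception routing, the sketch below maps a risk score and a model confidence to a review queue. The queue names and thresholds are assumptions made for the example; in practice they would be set and revalidated against the operation's own data.

```python
from enum import Enum


class Queue(Enum):
    STRAIGHT_THROUGH = "straight_through"
    STANDARD_REVIEW = "standard_review"
    SENIOR_REVIEW = "senior_review"


# Illustrative thresholds, not recommendations.
HIGH_RISK = 0.8
MEDIUM_RISK = 0.4
MIN_CONFIDENCE = 0.7


def route(risk_score: float, confidence: float) -> Queue:
    """Pick the queue for a case; both inputs are expected in [0, 1]."""
    if not (0.0 <= risk_score <= 1.0 and 0.0 <= confidence <= 1.0):
        raise ValueError("risk_score and confidence must be in [0, 1]")
    if risk_score >= HIGH_RISK:
        return Queue.SENIOR_REVIEW
    if confidence < MIN_CONFIDENCE or risk_score >= MEDIUM_RISK:
        return Queue.STANDARD_REVIEW
    return Queue.STRAIGHT_THROUGH


print(route(0.1, 0.95), route(0.1, 0.5), route(0.9, 0.99))
```

Note that low confidence alone is enough to pull a low-risk case out of straight-through processing. The routing call is deterministic, so it can be logged and replayed like any other decision.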

03 · Decision-support copilots

Clinical, legal, compliance. Reasoning over evidence, structured for the human expert to decide. Built around the FDA’s “decision support, not replacement” framing — the standard now spreading well beyond healthcare.

What we engineer around the reasoner.

The model is one piece. These are the layers that make a reasoning system trustworthy in production.

01 · Evidence & policy retrieval

Grounded retrieval over the controlling documents — policy manuals, regulatory texts, prior decisions, internal data. Permission-aware, version-controlled, citation-preserving by default.
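To make "permission-aware, version-controlled, citation-preserving" concrete, here is a minimal sketch under simplifying assumptions: chunks carry their own access roles and effective dates, and matching is a plain substring test rather than a real search index. All names and the sample documents are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PolicyChunk:
    doc_id: str
    version: str
    clause: str
    effective_from: date
    allowed_roles: frozenset
    text: str

    def citation(self) -> str:
        return f"{self.doc_id} v{self.version} §{self.clause}"


def retrieve(chunks, query_terms, user_roles, as_of):
    """Return (citation, text) pairs the user may see, as of a given date."""
    # Permission filter and effective-date filter come before any matching.
    visible = [
        c for c in chunks
        if c.allowed_roles & user_roles and c.effective_from <= as_of
    ]
    # Keep only the latest effective version of each (document, clause).
    latest = {}
    for c in visible:
        key = (c.doc_id, c.clause)
        if key not in latest or c.effective_from > latest[key].effective_from:
            latest[key] = c
    hits = [
        c for c in latest.values()
        if any(term.lower() in c.text.lower() for term in query_terms)
    ]
    return [(c.citation(), c.text) for c in hits]


chunks = [
    PolicyChunk("CLAIMS-MANUAL", "1", "3.1", date(2024, 1, 1),
                frozenset({"adjuster"}), "Water damage claims require photos."),
    PolicyChunk("CLAIMS-MANUAL", "2", "3.1", date(2025, 1, 1),
                frozenset({"adjuster"}), "Water damage claims require photos and a plumber report."),
    PolicyChunk("FRAUD-GUIDE", "1", "2.4", date(2024, 1, 1),
                frozenset({"fraud"}), "Water damage fraud indicators."),
]
current = retrieve(chunks, ["water"], {"adjuster"}, date(2025, 6, 1))
earlier = retrieve(chunks, ["water"], {"adjuster"}, date(2024, 6, 1))
print(current)
print(earlier)
```

Passing `as_of` explicitly is what lets a decision made last year be replayed against the policy version that was in force at the time.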

02 · Hard constraints & solvers

Rules engines for deterministic policy. Optimization solvers for trade-offs (capital, capacity, pricing, eligibility). The reasoner proposes within a space defined by code, not by hope.
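A toy example of the solver's job, assuming a hypothetical set of risks that each consume capital and earn premium. Exhaustive search is used here only to keep the sketch dependency-free; it stands in for a real optimization solver and does not scale beyond a handful of items.

```python
from itertools import combinations


def best_portfolio(risks: list, capital_limit: float):
    """Choose the subset with the highest premium whose capital fits the limit."""
    best, best_premium = (), 0.0
    for size in range(1, len(risks) + 1):
        for combo in combinations(risks, size):
            capital = sum(r["capital"] for r in combo)
            premium = sum(r["premium"] for r in combo)
            if capital <= capital_limit and premium > best_premium:
                best, best_premium = combo, premium
    return [r["id"] for r in best], best_premium


# Illustrative figures only.
risks = [
    {"id": "A", "premium": 60.0, "capital": 50.0},
    {"id": "B", "premium": 50.0, "capital": 30.0},
    {"id": "C", "premium": 45.0, "capital": 30.0},
    {"id": "D", "premium": 20.0, "capital": 40.0},
]
chosen, premium = best_portfolio(risks, capital_limit=60.0)
print(chosen, premium)
```

Taking the single most attractive risk first would pick A and leave no room for anything else. The search finds that B and C together earn more within the same capital limit, which is the kind of trade-off a language model should not be asked to eyeball.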

03 · Audit trail & explainability

Glass-box outputs, decision logs, counterfactuals, model and prompt versioning, replay capability. Everything a regulator, an internal auditor, or a customer adjudicator can pull apart.
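One common way to make a decision log tamper-evident is to chain each record to the hash of the previous one. The sketch below shows that technique using only the standard library; the class name and record fields are assumptions, and a production log would also need durable storage and access control.

```python
import hashlib
import json


def _digest(entry: dict, prev_hash: str) -> str:
    """Hash the entry together with the previous record's hash."""
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class DecisionLog:
    """Append-only, hash-chained log of decision records (in memory)."""

    def __init__(self):
        self._records = []

    def append(self, entry: dict) -> str:
        stored = json.loads(json.dumps(entry))  # private copy, JSON-safe
        prev = self._records[-1]["hash"] if self._records else ""
        record_hash = _digest(stored, prev)
        self._records.append({"entry": stored, "prev": prev, "hash": record_hash})
        return record_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered record breaks it."""
        prev = ""
        for rec in self._records:
            if rec["prev"] != prev or rec["hash"] != _digest(rec["entry"], prev):
                return False
            prev = rec["hash"]
        return True


log = DecisionLog()
log.append({"case_id": "c1", "outcome": "decline",
            "model_version": "m-1", "prompt_version": "p-7"})
log.append({"case_id": "c2", "outcome": "approve",
            "model_version": "m-1", "prompt_version": "p-7"})
intact = log.verify()

# Simulate tampering with an earlier record.
log._records[0]["entry"]["outcome"] = "approve"
after_tamper = log.verify()
print(intact, after_tamper)
```

Recording the model and prompt versions in each entry is what makes replay meaningful: the auditor can see which configuration produced the call.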

Decisions that earn their keep.

The strongest 2026 use cases share a shape: contestable, high-stakes calls where the answer has to be defended — with cited evidence, a policy reference, and an audit trail a regulator can replay.

Insurance underwriting

Broker submission triage, dynamic pricing, line-of-business adjudication. Compound reasoning over policy, exposure, and broker history — with the evidence the underwriter actually needs.

Claims adjudication

A multi-agent pipeline from first notice of loss (FNOL) through damage assessment, reserves, and fraud screen, with a full evidence trail from first report through settlement. Built for the audit a year later, not the demo today.

Credit & lending decisions

Scorecards combined with LLM-narrated reasoning that can be defended to a regulator. Taktile and competitors have set the bar on glass-box adjudication; we build to that bar in your stack.
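A minimal sketch of the scorecard half of that pairing, assuming a hypothetical points table. The attributes, bins, and points are invented for illustration. The idea is that the score and its reason codes are computed deterministically, and any narrative is written from those reasons rather than generated freely.

```python
BASE_POINTS = 500

# Each bin is (exclusive upper bound, points, adverse reason or None).
# Illustrative values only.
SCORECARD = {
    "months_at_employer": [
        (12, -40, "Short time with current employer"),
        (60, 0, None),
        (float("inf"), 30, None),
    ],
    "utilization": [
        (0.3, 40, None),
        (0.7, 0, None),
        (float("inf"), -60, "High credit utilization"),
    ],
}


def score(applicant: dict):
    """Return (total points, adverse reasons ordered by largest point loss)."""
    total = BASE_POINTS
    reasons = []
    for attribute, bins in SCORECARD.items():
        value = applicant[attribute]
        for upper, points, reason in bins:
            if value < upper:
                total += points
                if reason:
                    reasons.append((points, reason))
                break
    reasons.sort()  # most negative points first
    return total, [text for _, text in reasons]


weak = score({"months_at_employer": 6, "utilization": 0.85})
strong = score({"months_at_employer": 72, "utilization": 0.10})
print(weak)
print(strong)
```

In this arrangement the language model's role is to explain the listed reasons in plain language. It does not get to introduce reasons the scorecard did not produce.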

Regulatory & compliance review

KYC/KYB, sanctions, AML triage, regulatory exam preparation. Evidence-cited, policy-grounded, fully logged — the work that historically swallows specialist headcount.

Clinical decision support

Evidence-cited recommendations for clinicians, built to the FDA’s “decision support, not replacement” framing. HIPAA-ready reasoning workspaces (per OpenAI’s and Anthropic’s Q1 2026 healthcare offerings) are now the floor.

Internal audit & investigation

Evidence assembly, pattern detection, narrative drafting under audit standards. Compound reasoning where pure retrieval and pure narrative both fall short.

Frequently asked questions.

When is a reasoning-tier model worth the latency and cost?

Whenever the decision is contestable. Adjudication, triage, anything a regulator or a customer might dispute — the reasoning premium pays for itself in defensibility. For drafting, summarization, and retrieval, a fast model is still the right call; we route work to the right tier rather than putting everything through Opus or GPT-5.

How do we satisfy the EU AI Act and our sector regulator on these decisions?

By engineering for it from day one. Most decisions we work on fall under the EU AI Act’s high-risk regime. Annex III obligations are now deferred to 2 December 2027 under the May 2026 Digital Omnibus, but the substantive requirements (risk management, data governance, logging, human oversight, transparency) haven’t softened. We map the system against EU AI Act Articles 9–15, ISO/IEC 42001, and your sector regulator (PRA, EIOPA, FDA, etc.) before we build, and we ship the documentation that closes any gaps the mapping finds.

How do we combine an LLM reasoner with our existing rules engine?

Don’t replace the rules engine — orchestrate around it. The LLM reasons over evidence and proposes a decision; the rules engine enforces hard constraints; a solver handles trade-offs where they exist. Existing investments in Drools, ODM, Camunda DMN, or a custom policy engine stay in place — the reasoner sits above them, not instead of them.

Do we own the decision system you build with us?

Yes. The code, the prompts, the evaluation harness, the retrieval pipelines, the rules and solver integrations — all yours, in your repos. We design every engagement so your team can operate, extend, and revalidate the system independently. Enablement is built into the project, not sold back to you as a retainer.

What does “glass-box output” actually mean in production?

Every decision ships with five things: the call, the cited evidence, the controlling policy reference, a counterfactual (“what would have flipped this”), and a calibrated confidence. It is the artifact your operator, your internal auditor, and your regulator can pull apart twelve months later. If your decision system can’t produce that artifact today, it isn’t production-ready.
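As a sketch, the five-part artifact can be modeled as a record that refuses to exist without its evidence and policy reference. The class and field names are assumptions for illustration. The range check on confidence only validates the number; calibration itself has to be established separately, through evaluation.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GlassBoxDecision:
    call: str
    evidence: tuple          # citations to the retrieved evidence
    policy_reference: str    # the controlling clause
    counterfactual: str      # what would have flipped the call
    confidence: float

    def __post_init__(self):
        if not self.evidence:
            raise ValueError("a decision needs at least one cited piece of evidence")
        if not self.policy_reference:
            raise ValueError("a decision needs a controlling policy reference")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")

    def to_json(self) -> str:
        """Serialize with stable key order so stored records compare cleanly."""
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical example values.
decision = GlassBoxDecision(
    call="decline",
    evidence=("inspection-report-17, p. 3",),
    policy_reference="UW-7.1",
    counterfactual="Insured value at or below 2,000,000 would have been eligible.",
    confidence=0.86,
)
record = json.loads(decision.to_json())
print(record["call"], record["policy_reference"])
```

Making the record immutable (`frozen=True`) and validating it on construction means an incomplete decision fails at creation time instead of surfacing during an audit.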

Can we run the reasoner on-prem or in a sovereign environment?

Usually yes. For sovereignty-constrained or regulated workloads we deploy on Azure/AWS/GCP sovereign regions, on private inference (Databricks Mosaic, Snowflake Cortex, IBM watsonx), or fully on-prem with open-weight reasoners. The retrieval, rules, solver, and audit layers run in your environment by default; the reasoner placement is a design choice we make against your data residency and latency constraints.

Where to start.

Decision System Review · 3 weeks · fixed fee

Bring us a high-stakes decision workflow.

Underwriting, claims, credit, adjudication, clinical, compliance. We map the decision against modern compound reasoning architectures, identify the rules, solver, and retrieval components needed, and deliver a target architecture with regulatory mapping (EU AI Act, ISO/IEC 42001, sector regime), an evaluation plan, and a sequenced build path.

What you get:
  • A decision-system readiness assessment scored against twelve criteria.
  • A target architecture for the reasoner, retrieval, rules, and solver layers in your environment.
  • A regulatory mapping against EU AI Act Articles 9–15, ISO/IEC 42001, and your sector regulator.
  • A staged delivery plan with timelines and effort estimates.
  • One workshop with your decision-owning, risk, and engineering leads.

Led by a senior consultant. Fixed scope, fixed fee.

Book a Decision System Review
Start the conversation

Ready to ship a decision system your auditors will accept?

A 30-minute conversation with a senior consultant. Bring a decision workflow you need to defend — underwriting, claims, credit, regulatory review, clinical. We’ll tell you whether reasoning-tier architecture is the right answer, where the rules, solver, and retrieval components belong, and what a Decision System Review would surface.
