Models and Routing

How EQUIRE routes AI workloads across model tiers, attributes usage to your organization, enforces zero-retention, and applies per-task timeouts.

EQUIRE runs every AI request through the Vercel AI Gateway when it is configured, and falls back to calling Anthropic directly when it is not. Each workload is bound to a model tier (a semantic alias) and a feature tag (the workflow the request originated from). Tier and tag together determine the exact model used, the data-handling guarantees applied, and how the request is attributed to your organization.

Model Tiers

The platform exposes five semantic aliases. Code calls a tier by name; the resolver picks the concrete model based on whether the Gateway is in play.

| Tier | Purpose | Gateway model | Direct fallback |
|---|---|---|---|
| chat | General reasoning, deal Q&A, drafting | anthropic/claude-sonnet-4.6 | claude-sonnet-4-6 |
| fast | Cheap classification, gap detection, simple summaries | anthropic/claude-haiku-4-5 | claude-haiku-4-5 |
| critical | Highest-stakes reasoning: IC memo critical sections, expert opinion | anthropic/claude-opus-4.7 | claude-opus-4-7 |
| extraction | Document extraction with long context (up to 64k tokens) | anthropic/claude-sonnet-4.6 | claude-sonnet-4-6 |
| verification | Lightweight second-pass verification of extracted values | anthropic/claude-haiku-4-5 | claude-haiku-4-5 |

Aliases let admins swap underlying models without touching feature code. Calling chat always returns the currently-blessed mid-tier model, even after a version bump.
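As a rough illustration of the alias idea, the tier table can be read as a lookup keyed by tier name. This is a minimal sketch, not EQUIRE's actual resolver: `TIER_MODELS` and `resolveModel` are hypothetical names, and only the tier names and model IDs are taken from the table above.

```typescript
// Hypothetical sketch: tier names and model IDs come from the tier table;
// the type and function names are illustrative assumptions.
type Tier = "chat" | "fast" | "critical" | "extraction" | "verification";

interface TierModels {
  gateway: string; // model ID used when the Gateway is in play
  direct: string; // model ID used on the direct-Anthropic fallback
}

const TIER_MODELS: Record<Tier, TierModels> = {
  chat: { gateway: "anthropic/claude-sonnet-4.6", direct: "claude-sonnet-4-6" },
  fast: { gateway: "anthropic/claude-haiku-4-5", direct: "claude-haiku-4-5" },
  critical: { gateway: "anthropic/claude-opus-4.7", direct: "claude-opus-4-7" },
  extraction: { gateway: "anthropic/claude-sonnet-4.6", direct: "claude-sonnet-4-6" },
  verification: { gateway: "anthropic/claude-haiku-4-5", direct: "claude-haiku-4-5" },
};

// Feature code asks for a tier by name; only this function knows the concrete model.
function resolveModel(tier: Tier, useGateway: boolean): string {
  const entry = TIER_MODELS[tier];
  return useGateway ? entry.gateway : entry.direct;
}
```

With this shape, a version bump is a one-line change to the table, and callers that ask for `chat` never need to change.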

Gateway versus Direct Fallback

  • Gateway (preferred) — used when AI_GATEWAY_API_KEY or a Vercel OIDC token is configured. Adds attribution, zero-retention, and unified routing across providers.
  • Direct Anthropic — used when only ANTHROPIC_API_KEY is set. Same models, but no Gateway-side attribution metadata; embeddings are not available on this fallback.

The platform does not silently mix providers. If you have only Anthropic configured, every request uses Anthropic; if the Gateway is configured, every request uses the Gateway.
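The all-or-nothing provider choice described above could be sketched as a single decision made from configuration. The function name, the `VERCEL_OIDC_TOKEN` variable name, and the error raised when nothing is configured are assumptions for illustration; the source only states which credentials select which path.

```typescript
// Hypothetical sketch of the provider decision. The environment is passed in
// as a plain object so the example stays self-contained.
type Provider = "gateway" | "anthropic";
type Env = Record<string, string | undefined>;

function detectProvider(env: Env): Provider {
  // Assumption: the OIDC token is exposed as VERCEL_OIDC_TOKEN.
  const hasGateway = Boolean(env.AI_GATEWAY_API_KEY) || Boolean(env.VERCEL_OIDC_TOKEN);
  if (hasGateway) return "gateway"; // the Gateway wins whenever it is configured
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  // Assumption: with no credentials at all, failing loudly is the safest behavior.
  throw new Error("No AI provider configured");
}
```

Because the decision is made once from configuration rather than per request, requests are never split between the two paths.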

How Features Map to Tiers

Every workload that calls the AI is tagged with a feature (from a closed enum). The tag drives both attribution and the default tier. Admin overrides (below) can change the model behind a feature without changing the tag.

Chat Surfaces

| Feature tag | Default tier | Notes |
|---|---|---|
| chat | chat (Sonnet) | Deal-mode chat assistant |
| portfolio-chat | chat (Sonnet) | Portfolio-mode chat |
| mandate-dashboard | chat (Sonnet) | Mandate-dashboard advisor |
| digest | chat (Sonnet) | Prospecting digest narrative |

Documents and Extraction

| Feature tag | Default tier | Notes |
|---|---|---|
| document-processing | extraction then verification | Sonnet 64k for the extraction pass; Haiku 8k for the verification pass |

document-processing is the umbrella tag for the entire ingestion pipeline. Extraction and verification share the same tag so usage reports show one line for the workflow rather than splitting across passes.

IC Memo

ic-memo mixes tiers section-by-section based on stakes:

  • Opus (critical) — Executive Summary, Investment Thesis, Market Analysis, Risk Factors
  • Sonnet (chat) — all other narrative sections
  • Haiku (fast) — section classification and gap detection

The mix is fixed in code rather than configurable per memo, so every memo gets the same provenance posture. Admin overrides can swap the model under any tier without changing the section-to-tier mapping.
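A fixed section-to-tier mapping of this kind might look like the sketch below. The section names and tiers come from the list above; the `MemoTask` values and the function name are hypothetical, chosen only to separate drafting from the classification and gap-detection passes.

```typescript
// Hypothetical sketch of the fixed ic-memo mapping. Task names are assumptions.
type MemoTier = "critical" | "chat" | "fast";
type MemoTask = "draft" | "classify" | "gap-detection";

const CRITICAL_SECTIONS = new Set<string>([
  "Executive Summary",
  "Investment Thesis",
  "Market Analysis",
  "Risk Factors",
]);

function tierForMemoWork(task: MemoTask, section: string): MemoTier {
  // Classification and gap detection always run on the fast tier.
  if (task !== "draft") return "fast";
  // Drafting uses the critical tier for the four high-stakes sections only.
  return CRITICAL_SECTIONS.has(section) ? "critical" : "chat";
}
```

The function returns a tier, not a model, which is what lets an admin override change the model under a tier without touching this mapping.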

Other Workflows

| Feature tag | Default tier | Notes |
|---|---|---|
| expert-opinion | critical (Opus) | Per-assumption expert commentary |
| valuation | chat (Sonnet) | Valuation analyst pipeline |
| deal-health-coherence | chat (Sonnet) | Tier-2 coherence analyzer |
| deal-health-deep-scan | chat (Sonnet) | Tier-3 deep scan |
| research | chat or fast | Sonnet for narrative, Haiku for filtering |
| origination | chat (Sonnet) | Origination AI advisor and prospect briefs |
| organization-enrichment | chat (Sonnet) | Org research cache |
| corbis-research | chat (Sonnet) | Corbis MCP research agent |
| deliverable | chat (Sonnet) | Deliverable drafting |
| image-gen | n/a | Image generation routes through a separate provider, not the LLM stack |

Anthropic-Only Features

A small set of features is restricted to Anthropic models even when the Gateway has alternatives configured:

  • document-processing
  • extraction (legacy alias retained for older usage rows)
  • deal-health-deep-scan

These features rely on capabilities — native PDF vision, long-context caching, multi-step reasoning depth — that are currently best-served by Anthropic. If an admin tries to override one of these to a non-Anthropic model, the override is silently rejected and the default tier is used. The admin UI shows the same restriction so the constraint is visible up front.
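The rejection rule can be sketched as a small guard applied when an override is read. This is an illustration under assumptions: `applyOverride` is a hypothetical name, and treating any model ID with the `anthropic/` prefix as an Anthropic model is an inference from the Gateway model IDs shown on this page.

```typescript
// Hypothetical sketch of the Anthropic-only guard.
const ANTHROPIC_ONLY = new Set<string>([
  "document-processing",
  "extraction", // legacy alias
  "deal-health-deep-scan",
]);

function applyOverride(feature: string, defaultModel: string, override?: string): string {
  if (!override) return defaultModel; // no override set: use the default tier's model
  // Assumption: the provider is identified by the "anthropic/" prefix.
  const isAnthropic = override.indexOf("anthropic/") === 0;
  if (ANTHROPIC_ONLY.has(feature) && !isAnthropic) {
    return defaultModel; // silently rejected, no error is raised
  }
  return override;
}
```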

Admin Overrides

Platform admins can override the model behind any feature tag without changing application code. Overrides live in Supabase (credeals.platform_ai_config) and are read with a 60-second in-memory cache.

The admin UI exposes a closed allowlist of models that can be set as overrides:

  • anthropic/claude-opus-4.7
  • anthropic/claude-sonnet-4.6
  • anthropic/claude-haiku-4-5
  • openai/gpt-5.4
  • openai/gpt-5

OpenAI models are accepted only for features that are not on the Anthropic-only list. Setting a model that violates the Anthropic-only constraint is silently rejected, and clearing an override returns the feature to its default tier on the next cache refresh.

Override changes take effect within 60 seconds; no application restart is required.
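The 60-second window follows directly from the cache lifetime. The sketch below shows a generic time-to-live cache of that shape; `cachedFor` is a hypothetical helper, and the loader is synchronous here only to keep the example short (a real read from Supabase would be asynchronous). The clock is injectable so the expiry behavior is easy to see.

```typescript
// Hypothetical sketch of a TTL-cached config read.
function cachedFor<T>(ttlMs: number, load: () => T, now: () => number = Date.now): () => T {
  let value: T | undefined;
  let loadedAt = 0;
  let loaded = false;
  return () => {
    const t = now();
    // Reload on first use, or once the cached copy is at least ttlMs old.
    if (!loaded || t - loadedAt >= ttlMs) {
      value = load();
      loadedAt = t;
      loaded = true;
    }
    return value as T;
  };
}

// Usage: overrides are re-read at most once per 60 seconds.
let clockMs = 0;
let loadCount = 0;
const getOverrides = cachedFor(
  60000,
  (): Record<string, string> => {
    loadCount += 1;
    return { valuation: "openai/gpt-5" };
  },
  () => clockMs,
);
```

In the worst case a change written just after a refresh is picked up one full TTL later, which matches the "within 60 seconds" guarantee.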

Gateway Attribution

Every AI call goes out with attribution metadata so usage reports, audit trails, and cost analytics line up with the workflow that triggered the call.

What Gets Sent

  • feature — the workflow tag (e.g. ic-memo, valuation, chat)
  • org — your organization ID, omitted only for system-level calls with no org context
  • user — the user who initiated the action, when available
  • zeroDataRetention: true — always set, on every call
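Put together, the four fields might be assembled as follows. This is a sketch only: `buildAttribution` and the `Attribution` shape are hypothetical, and how the object is attached to a Gateway request is deliberately left out because the source does not specify it.

```typescript
// Hypothetical sketch of the attribution payload described by the list above.
interface Attribution {
  feature: string;
  org?: string; // omitted for system-level calls with no org context
  user?: string; // included only when the initiating user is known
  zeroDataRetention: true; // typed as the literal true so it cannot be turned off
}

function buildAttribution(feature: string, org?: string, user?: string): Attribution {
  const meta: Attribution = { feature, zeroDataRetention: true };
  if (org) meta.org = org;
  if (user) meta.user = user;
  return meta;
}
```

Typing `zeroDataRetention` as the literal `true` is one way to make "always set, on every call" a compile-time property instead of a convention.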

Telemetry Allowlist

EQUIRE writes telemetry events to its own observability layer alongside the Gateway. The allowed metadata keys are limited to a closed set including feature, orgId, surface, documentType, documentId, attachmentCount, pdfPageCount, tenantCount, confidence, verificationMode, readiness, mandateCount, steps, and hitStepLimit. Anything outside the allowlist is dropped before being recorded.
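An allowlist filter of this kind is straightforward to sketch. The key set below contains only the keys named above; the source describes the set as "including" these, so the real list may be longer. `filterTelemetry` is a hypothetical name.

```typescript
// Hypothetical sketch of the telemetry allowlist filter.
const ALLOWED_KEYS = new Set<string>([
  "feature", "orgId", "surface", "documentType", "documentId",
  "attachmentCount", "pdfPageCount", "tenantCount", "confidence",
  "verificationMode", "readiness", "mandateCount", "steps", "hitStepLimit",
]);

function filterTelemetry(meta: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(meta)) {
    // Anything not explicitly allowed is dropped before recording.
    if (ALLOWED_KEYS.has(key)) out[key] = meta[key];
  }
  return out;
}
```

For example, `filterTelemetry({ feature: "chat", steps: 3, prompt: "..." })` keeps `feature` and `steps` and drops `prompt`.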

Crucially, prompts and completions are never persisted in telemetry. recordInputs and recordOutputs are hard-coded to false. The only place a prompt or completion lives is in transit — and the Gateway is configured for zero retention.

Zero Data Retention

The Gateway is configured to not store prompts or completions. Every request carries zeroDataRetention: true, which the Gateway honors by short-circuiting any request-body logging or training-set capture on the upstream provider side.

In practical terms:

  • Your prompts and the model's responses are not retained by the Gateway after the response is delivered.
  • They are not used to train any model.
  • They are not visible in cross-org analytics.

EQUIRE's own database stores the outputs of AI work (extracted fields, IC memo text, valuation assumptions, audit log entries) where they are needed for the product. Those outputs live under your org's RLS scope and are deleted when you delete the underlying deal or when scheduled account-level deletion runs.

Timeouts

Every AI call carries an explicit timeout drawn from a small set of presets. Total timeout is the upper bound for the whole call; chunk timeout is the maximum gap between streamed tokens before the call is aborted.

| Preset | Total | Chunk | Used for |
|---|---|---|---|
| quick | 15s | 5s | Lightweight classification, gap checks |
| standard | 60s | 15s | Most chat, narrative, and deal-tool calls |
| verification | 60s | 15s | Haiku verification pass on extracted documents |
| pdfVerification | 180s | 45s | PDF vision verification; vision TTFT can be 20–45s for 30–100 page OMs before tokens flow |
| extraction | 300s | 60s | Long-context document extraction (up to 64k output tokens) |

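pdfVerification is deliberately much longer than the standard verification preset (180s total and 45s chunk, versus 60s and 15s). Native-PDF vision ingestion has a long time-to-first-token before any streaming starts, so a shorter preset would abort valid calls mid-think. Only extraction, at 300s, has a longer total.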
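To make the interaction between the two limits concrete, the sketch below replays a recorded timeline of chunk arrivals against a preset and reports which limit, if any, would have fired. The preset values come from the table above; `checkTimeline` is a hypothetical helper, and it assumes the wait for the first token counts as a chunk gap measured from the start of the call.

```typescript
// Hypothetical sketch: replay chunk arrival times against a timeout preset.
type PresetName = "quick" | "standard" | "verification" | "pdfVerification" | "extraction";
interface TimeoutPreset { totalMs: number; chunkMs: number; }

const TIMEOUTS: Record<PresetName, TimeoutPreset> = {
  quick: { totalMs: 15000, chunkMs: 5000 },
  standard: { totalMs: 60000, chunkMs: 15000 },
  verification: { totalMs: 60000, chunkMs: 15000 },
  pdfVerification: { totalMs: 180000, chunkMs: 45000 },
  extraction: { totalMs: 300000, chunkMs: 60000 },
};

type Outcome = "ok" | "chunk-timeout" | "total-timeout";

// chunkTimesMs: ascending arrival times in ms since the call started.
// endMs: the time at which the stream finished.
function checkTimeline(preset: TimeoutPreset, chunkTimesMs: number[], endMs: number): Outcome {
  let last = 0; // assumption: the first gap is measured from call start
  for (const t of chunkTimesMs.concat(endMs)) {
    const chunkDeadline = last + preset.chunkMs;
    const deadline = Math.min(chunkDeadline, preset.totalMs);
    if (t > deadline) {
      // Report whichever limit would have expired first.
      return chunkDeadline <= preset.totalMs ? "chunk-timeout" : "total-timeout";
    }
    last = t;
  }
  return "ok";
}
```

Under these assumptions, a PDF call whose first token arrives at 30s is aborted by the verification preset (15s chunk limit) but completes under pdfVerification (45s chunk limit).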

Embeddings

When the Gateway is configured, embeddings use openai/text-embedding-3-small (1536 dimensions). They carry the same feature, orgId, and userId attribution as LLM calls.

Direct Anthropic does not provide an embeddings endpoint. When only the Anthropic fallback is in place, embedding calls fail soft — the helper returns an empty array and any features that require embeddings degrade gracefully (text search falls back to keyword matching, research narratives skip the semantic-similarity stage). Production deployments should always configure the Gateway so embeddings are available.
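The fail-soft contract can be sketched as a helper that returns an empty array when no embedding provider is available, paired with callers that check for that case. Both function names and the `EmbedFn` type are hypothetical; the source only states that the helper returns an empty array and that text search falls back to keyword matching.

```typescript
// Hypothetical sketch of fail-soft embeddings.
type EmbedFn = (text: string) => Promise<number[]>;

// embed is undefined when only the direct-Anthropic fallback is configured.
async function embedSoft(text: string, embed?: EmbedFn): Promise<number[]> {
  if (!embed) return []; // fail soft: no exception, just an empty vector
  return embed(text);
}

// Callers treat an empty vector as "no embeddings" and degrade accordingly.
function searchMode(embedding: number[]): "semantic" | "keyword" {
  return embedding.length > 0 ? "semantic" : "keyword";
}
```

Returning an empty array instead of throwing keeps the degraded path in ordinary control flow, at the cost of requiring every caller to check the length before using the vector.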

Where to Go Next

  • For the data-handling and human-in-the-loop posture of every AI surface, see Trust and safety.
  • For which tools each chat mode exposes, see Tool reference.
  • For specialist agents that drive the bulk of document-processing and corbis-research traffic, see Specialist agents.