Models and Routing
How EQUIRE routes AI workloads across model tiers, attributes usage to your organization, enforces zero-retention, and applies per-task timeouts.
EQUIRE runs every AI request through the Vercel AI Gateway when configured, with a direct-Anthropic fallback. Each workload is bound to a model tier (semantic alias) and a feature tag (workflow it originated from). Tier and tag together determine the exact model used, the data-handling guarantees applied, and how the request is attributed to your organization.
Model Tiers
The platform exposes five semantic aliases. Code calls a tier by name; the resolver picks the concrete model based on whether the Gateway is in play.
| Tier | Purpose | Gateway model | Direct fallback |
|---|---|---|---|
| `chat` | General reasoning, deal Q&A, drafting | `anthropic/claude-sonnet-4.6` | `claude-sonnet-4-6` |
| `fast` | Cheap classification, gap detection, simple summaries | `anthropic/claude-haiku-4-5` | `claude-haiku-4-5` |
| `critical` | Highest-stakes reasoning: IC memo critical sections, expert opinion | `anthropic/claude-opus-4.7` | `claude-opus-4-7` |
| `extraction` | Document extraction with long context (up to 64k tokens) | `anthropic/claude-sonnet-4.6` | `claude-sonnet-4-6` |
| `verification` | Lightweight second-pass verification of extracted values | `anthropic/claude-haiku-4-5` | `claude-haiku-4-5` |
Aliases let admins swap underlying models without touching feature code. Calling `chat` always returns the currently blessed mid-tier model, even after a version bump.
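As a rough illustration of how a tier alias might resolve to a concrete model, here is a minimal sketch. The model IDs come from the tier table; the names `TIER_MODELS` and `resolveModel` are hypothetical and are not taken from EQUIRE's codebase.

```typescript
// Hypothetical sketch: names and shapes are illustrative only.
type Tier = "chat" | "fast" | "critical" | "extraction" | "verification";

interface TierModels {
  gateway: string; // model ID used when the Gateway is configured
  direct: string; // model ID used on the direct-Anthropic fallback
}

// Values copied from the tier table in this section.
const TIER_MODELS: Record<Tier, TierModels> = {
  chat: { gateway: "anthropic/claude-sonnet-4.6", direct: "claude-sonnet-4-6" },
  fast: { gateway: "anthropic/claude-haiku-4-5", direct: "claude-haiku-4-5" },
  critical: { gateway: "anthropic/claude-opus-4.7", direct: "claude-opus-4-7" },
  extraction: { gateway: "anthropic/claude-sonnet-4.6", direct: "claude-sonnet-4-6" },
  verification: { gateway: "anthropic/claude-haiku-4-5", direct: "claude-haiku-4-5" },
};

// Feature code asks for a tier by name; only this function knows model IDs.
function resolveModel(tier: Tier, gatewayConfigured: boolean): string {
  const models = TIER_MODELS[tier];
  return gatewayConfigured ? models.gateway : models.direct;
}
```

Because callers only ever pass a tier name, a version bump would touch one table entry rather than every call site.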
Gateway versus Direct Fallback
- Gateway (preferred): used when `AI_GATEWAY_API_KEY` or a Vercel OIDC token is configured. Adds attribution, zero-retention, and unified routing across providers.
- Direct Anthropic: used when only `ANTHROPIC_API_KEY` is set. Same models, but no Gateway-side attribution metadata; embeddings are not available on this fallback.
The platform does not silently mix providers. If you have only Anthropic configured, every request uses Anthropic; if the Gateway is configured, every request uses the Gateway.
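The all-or-nothing provider choice can be sketched as a single decision made from configuration. This is an assumption-laden illustration: `selectProvider`, the `AiEnv` shape, the `VERCEL_OIDC_TOKEN` field name, and the behavior when no credentials exist are all hypothetical.

```typescript
// Hypothetical sketch of the provider decision described above.
type Provider = "gateway" | "anthropic";

interface AiEnv {
  AI_GATEWAY_API_KEY?: string;
  VERCEL_OIDC_TOKEN?: string; // assumed name for the Vercel OIDC token
  ANTHROPIC_API_KEY?: string;
}

function selectProvider(env: AiEnv): Provider {
  // Gateway credentials win whenever they are present.
  if (env.AI_GATEWAY_API_KEY || env.VERCEL_OIDC_TOKEN) return "gateway";
  // Otherwise fall back to direct Anthropic if its key is set.
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  // Assumption: with no credentials at all, fail loudly.
  throw new Error("No AI provider credentials configured");
}
```

The decision depends only on configuration, never on the individual request, which is what keeps providers from being mixed.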
How Features Map to Tiers
Every workload that calls the AI is tagged with a feature (from a closed enum). The tag drives both attribution and the default tier. Admin overrides (below) can change the model behind a feature without changing the tag.
Chat Surfaces
| Feature tag | Default tier | Notes |
|---|---|---|
| `chat` | `chat` (Sonnet) | Deal-mode chat assistant |
| `portfolio-chat` | `chat` (Sonnet) | Portfolio-mode chat |
| `mandate-dashboard` | `chat` (Sonnet) | Mandate-dashboard advisor |
| `digest` | `chat` (Sonnet) | Prospecting digest narrative |
Documents and Extraction
| Feature tag | Default tier | Notes |
|---|---|---|
| `document-processing` | `extraction` then `verification` | Sonnet 64k for the extraction pass; Haiku 8k for the verification pass |
document-processing is the umbrella tag for the entire ingestion pipeline. Extraction and verification share the same tag so usage reports show one line for the workflow rather than splitting across passes.
IC Memo
ic-memo mixes tiers section-by-section based on stakes:
- Opus (`critical`): Executive Summary, Investment Thesis, Market Analysis, Risk Factors
- Sonnet (`chat`): all other narrative sections
- Haiku (`fast`): section classification and gap detection
The mix is fixed in code rather than configurable per memo, so every memo gets the same provenance posture. Admin overrides can swap the model under any tier without changing the section-to-tier mapping.
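A fixed section-to-tier mapping of this kind might look like the following sketch. The section titles and tiers come from the list above; `tierForMemoWork`, `MemoTask`, and the exact-title matching are hypothetical simplifications.

```typescript
// Hypothetical sketch of a fixed section-to-tier mapping for ic-memo.
type MemoTier = "critical" | "chat" | "fast";
type MemoTask = "narrative" | "classification" | "gap-detection";

// Sections listed above as Opus (critical) work.
const CRITICAL_SECTIONS: ReadonlySet<string> = new Set([
  "Executive Summary",
  "Investment Thesis",
  "Market Analysis",
  "Risk Factors",
]);

function tierForMemoWork(task: MemoTask, sectionTitle: string): MemoTier {
  // Classification and gap detection always use the cheap tier.
  if (task !== "narrative") return "fast";
  // Narrative writing is critical only for the high-stakes sections.
  return CRITICAL_SECTIONS.has(sectionTitle) ? "critical" : "chat";
}
```

The function returns a tier rather than a model ID, so an admin override can change the model under a tier without this mapping moving.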
Other Workflows
| Feature tag | Default tier | Notes |
|---|---|---|
| `expert-opinion` | `critical` (Opus) | Per-assumption expert commentary |
| `valuation` | `chat` (Sonnet) | Valuation analyst pipeline |
| `deal-health-coherence` | `chat` (Sonnet) | Tier-2 coherence analyzer |
| `deal-health-deep-scan` | `chat` (Sonnet) | Tier-3 deep scan |
| `research` | `chat` or `fast` | Sonnet for narrative, Haiku for filtering |
| `origination` | `chat` (Sonnet) | Origination AI advisor and prospect briefs |
| `organization-enrichment` | `chat` (Sonnet) | Org research cache |
| `corbis-research` | `chat` (Sonnet) | Corbis MCP research agent |
| `deliverable` | `chat` (Sonnet) | Deliverable drafting |
| `image-gen` | n/a | Image generation routes through a separate provider, not the LLM stack |
Anthropic-Only Features
A small set of features is restricted to Anthropic models even when the Gateway has alternatives configured:
- `document-processing`
- `extraction` (legacy alias retained for older usage rows)
- `deal-health-deep-scan`
These features rely on capabilities (native PDF vision, long-context caching, multi-step reasoning depth) that are currently best served by Anthropic. If an admin tries to override one of these to a non-Anthropic model, the override is silently rejected and the default tier is used. The admin UI shows the same restriction so the constraint is visible up front.
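The silent-rejection rule can be illustrated with a short sketch. It assumes the provider can be read from the `anthropic/` prefix of a Gateway model ID; that assumption and the names `ANTHROPIC_ONLY_FEATURES` and `applyOverride` are hypothetical.

```typescript
// Hypothetical sketch of the Anthropic-only override check.
const ANTHROPIC_ONLY_FEATURES: ReadonlySet<string> = new Set([
  "document-processing",
  "extraction", // legacy alias
  "deal-health-deep-scan",
]);

function applyOverride(
  feature: string,
  defaultModel: string,
  overrideModel?: string,
): string {
  // No override set: use the default tier's model.
  if (!overrideModel) return defaultModel;
  // Assumption: Gateway model IDs carry the provider as a prefix.
  const isAnthropic = overrideModel.startsWith("anthropic/");
  if (ANTHROPIC_ONLY_FEATURES.has(feature) && !isAnthropic) {
    // Rejected silently: no error, the default simply stays in effect.
    return defaultModel;
  }
  return overrideModel;
}
```

Returning the default instead of throwing keeps a misconfigured override from breaking document ingestion at request time.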
Admin Overrides
Platform admins can override the model behind any feature tag without changing application code. Overrides live in Supabase (credeals.platform_ai_config) and are read with a 60-second in-memory cache.
The admin UI exposes a closed allowlist of models that can be set as overrides:
- `anthropic/claude-opus-4.7`
- `anthropic/claude-sonnet-4.6`
- `anthropic/claude-haiku-4-5`
- `openai/gpt-5.4`
- `openai/gpt-5`
OpenAI models are accepted for non–Anthropic-only features. Setting a model that violates the Anthropic-only constraint is silently rejected, and clearing an override returns the feature to its default tier on the next cache refresh.
Override changes take effect within 60 seconds; no application restart is required.
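The 60-second window follows from how a time-based cache behaves. The sketch below is a simplified, synchronous illustration: `OverrideCache` is a hypothetical name, the `load` callback stands in for the Supabase read (which would be asynchronous in practice), and the clock is injected so the behavior is easy to exercise.

```typescript
// Hypothetical sketch of a 60-second in-memory override cache.
type OverrideMap = Record<string, string>; // feature tag -> model ID

class OverrideCache {
  private cached: OverrideMap | null = null;
  private loadedAtMs = 0;
  private readonly load: () => OverrideMap; // stand-in for the database read
  private readonly now: () => number; // injected clock, in milliseconds
  private readonly ttlMs: number;

  constructor(load: () => OverrideMap, now: () => number, ttlMs: number = 60 * 1000) {
    this.load = load;
    this.now = now;
    this.ttlMs = ttlMs;
  }

  // Returns the override for a feature, or undefined if none is set.
  get(feature: string): string | undefined {
    const t = this.now();
    if (this.cached === null || t - this.loadedAtMs >= this.ttlMs) {
      // Cache is empty or stale: reload the whole override table.
      this.cached = this.load();
      this.loadedAtMs = t;
    }
    return this.cached[feature];
  }
}
```

Under this model a cleared override keeps applying until the cached copy expires, which matches the "next cache refresh" wording above.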
Gateway Attribution
Every AI call goes out with attribution metadata so usage reports, audit trails, and cost analytics line up with the workflow that triggered the call.
What Gets Sent
- `feature`: the workflow tag (e.g. `ic-memo`, `valuation`, `chat`)
- `org`: your organization ID, omitted only for system-level calls with no org context
- `user`: the user who initiated the action, when available
- `zeroDataRetention: true`: always set, on every call
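To make the shape of that metadata concrete, here is a minimal sketch of assembling it. The field names follow the list above; the `Attribution` interface and `buildAttribution` helper are hypothetical, and how the object is attached to a Gateway request is deliberately left out.

```typescript
// Hypothetical sketch of the attribution metadata for one AI call.
interface Attribution {
  feature: string;
  org?: string; // omitted for system-level calls with no org context
  user?: string; // omitted when no initiating user is known
  zeroDataRetention: true; // typed as the literal true so it cannot be disabled
}

function buildAttribution(feature: string, orgId?: string, userId?: string): Attribution {
  const meta: Attribution = { feature, zeroDataRetention: true };
  if (orgId) meta.org = orgId;
  if (userId) meta.user = userId;
  return meta;
}
```

Typing `zeroDataRetention` as the literal `true` is one way to let the compiler reject any call site that tries to turn it off.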
Telemetry Allowlist
EQUIRE writes telemetry events to its own observability layer alongside the Gateway. The allowed metadata keys are limited to a closed set including feature, orgId, surface, documentType, documentId, attachmentCount, pdfPageCount, tenantCount, confidence, verificationMode, readiness, mandateCount, steps, and hitStepLimit. Anything outside the allowlist is dropped before being recorded.
Crucially, prompts and completions are never persisted in telemetry. recordInputs and recordOutputs are hard-coded to false. The only place a prompt or completion lives is in transit — and the Gateway is configured for zero retention.
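An allowlist filter of the kind described in this section can be sketched in a few lines. The key names are the ones listed above, though the source describes the set as "including" them, so the real list may be longer; `filterTelemetry` and `TelemetryValue` are hypothetical names.

```typescript
// Hypothetical sketch of a telemetry metadata allowlist filter.
const ALLOWED_TELEMETRY_KEYS: ReadonlySet<string> = new Set([
  "feature", "orgId", "surface", "documentType", "documentId",
  "attachmentCount", "pdfPageCount", "tenantCount", "confidence",
  "verificationMode", "readiness", "mandateCount", "steps", "hitStepLimit",
]);

type TelemetryValue = string | number | boolean;

function filterTelemetry(
  metadata: Record<string, TelemetryValue>,
): Record<string, TelemetryValue> {
  const kept: Record<string, TelemetryValue> = {};
  for (const [key, value] of Object.entries(metadata)) {
    // Keys outside the allowlist are dropped, never recorded.
    if (ALLOWED_TELEMETRY_KEYS.has(key)) kept[key] = value;
  }
  return kept;
}
```

For example, passing `{ feature: "chat", prompt: "...", steps: 3 }` would keep `feature` and `steps` and drop `prompt`.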
Zero Data Retention
The Gateway is configured to not store prompts or completions. Every request carries zeroDataRetention: true, which the Gateway honors by short-circuiting any request-body logging or training-set capture on the upstream provider side.
In practical terms:
- Your prompts and the model's responses are not retained by the Gateway after the response is delivered.
- They are not used to train any model.
- They are not visible in cross-org analytics.
EQUIRE's own database stores the outputs of AI work (extracted fields, IC memo text, valuation assumptions, audit log entries) where they are needed for the product. Those outputs live under your org's RLS scope and are deleted when you delete the underlying deal or when a scheduled account-level deletion runs.
Timeouts
Every AI call carries an explicit timeout drawn from a small set of presets. Total timeout is the upper bound for the whole call; chunk timeout is the maximum gap between streamed tokens before the call is aborted.
| Preset | Total | Chunk | Used for |
|---|---|---|---|
| `quick` | 15s | 5s | Lightweight classification, gap checks |
| `standard` | 60s | 15s | Most chat, narrative, and deal-tool calls |
| `verification` | 60s | 15s | Haiku verification pass on extracted documents |
| `pdfVerification` | 180s | 45s | PDF vision verification; vision TTFT can be 20–45s for 30–100 page OMs before tokens flow |
| `extraction` | 300s | 60s | Long-context document extraction (up to 64k output tokens) |
pdfVerification is deliberately three times longer than the plain verification preset (180s total and 45s chunk, versus 60s and 15s). Native-PDF vision ingestion has a long time-to-first-token before any streaming starts, so a shorter preset would abort valid calls mid-think.
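The interaction between the two limits can be sketched as a pure check over timestamps. The preset values come from the table above; `TIMEOUT_PRESETS` and `timeoutReason` are hypothetical names, and the sketch assumes the chunk limit also covers the wait for the first token (the caller passes the start time as the last-chunk time until a token arrives).

```typescript
// Hypothetical sketch of the total and chunk timeout checks.
interface TimeoutPreset {
  totalMs: number; // upper bound for the whole call
  chunkMs: number; // maximum gap between streamed tokens
}

type PresetName = "quick" | "standard" | "verification" | "pdfVerification" | "extraction";

// Values copied from the preset table in this section.
const TIMEOUT_PRESETS: Record<PresetName, TimeoutPreset> = {
  quick: { totalMs: 15000, chunkMs: 5000 },
  standard: { totalMs: 60000, chunkMs: 15000 },
  verification: { totalMs: 60000, chunkMs: 15000 },
  pdfVerification: { totalMs: 180000, chunkMs: 45000 },
  extraction: { totalMs: 300000, chunkMs: 60000 },
};

// Returns which limit was exceeded, or null if the call may continue.
function timeoutReason(
  preset: PresetName,
  startedAtMs: number,
  lastChunkAtMs: number,
  nowMs: number,
): "total" | "chunk" | null {
  const { totalMs, chunkMs } = TIMEOUT_PRESETS[preset];
  if (nowMs - startedAtMs > totalMs) return "total";
  if (nowMs - lastChunkAtMs > chunkMs) return "chunk";
  return null;
}
```

With these numbers, a PDF verification call that has produced no tokens after 40 seconds is still within limits, while the same silence under the plain `verification` preset would already have tripped the chunk timeout.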
Embeddings
When the Gateway is configured, embeddings use openai/text-embedding-3-small (1536 dimensions). They carry the same feature, orgId, and userId attribution as LLM calls.
Direct Anthropic does not provide an embeddings endpoint. When only the Anthropic fallback is in place, embedding calls fail soft — the helper returns an empty array and any features that require embeddings degrade gracefully (text search falls back to keyword matching, research narratives skip the semantic-similarity stage). Production deployments should always configure the Gateway so embeddings are available.
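The fail-soft behavior can be illustrated with a deliberately simplified, synchronous sketch. A real embedding call would be asynchronous; `Embedder`, `embedOrEmpty`, and `chooseSearchMode` are hypothetical names used only to show the empty-array convention and the keyword fallback.

```typescript
// Hypothetical, synchronous sketch of fail-soft embeddings.
type Embedder = (text: string) => number[];

// Returns an embedding when an embedder exists, otherwise an empty array.
function embedOrEmpty(text: string, embedder?: Embedder): number[] {
  if (!embedder) return []; // no Gateway configured: fail soft, do not throw
  return embedder(text);
}

// Callers treat an empty embedding as the signal to degrade.
function chooseSearchMode(queryEmbedding: number[]): "semantic" | "keyword" {
  return queryEmbedding.length > 0 ? "semantic" : "keyword";
}
```

Using an empty array as the sentinel means downstream code needs only a length check, not error handling, to decide whether to fall back.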
Where to Go Next
- For the data-handling and human-in-the-loop posture of every AI surface, see Trust and safety.
- For which tools each chat mode exposes, see Tool reference.
- For specialist agents that drive the bulk of `document-processing` and `corbis-research` traffic, see Specialist agents.