Document Ingestion

How EQUIRE turns uploaded OMs, rent rolls, and operating statements into a structured deal record — pipeline stages, upload limits, reconciliation, embeddings, and valuation seeding.

EQUIRE accepts the file types listed under Supported uploads and limits and runs each document through a processing pipeline. Extracted values (occupancy, rent, NOI, tenant names, loan terms) link back to their source documents and text snippets, so you can verify them without hunting through the originals.

Supported uploads and limits

Accepted extensions match the upload policy: .pdf, .xlsx, .xls, .docx, .doc, .pptx, .ppt, .txt, .csv, .zip. Size caps apply per file: PDFs up to 100 MB, ZIP archives up to 500 MB, and all other accepted types up to 50 MB. The ZIP cap is the container limit; documents inside the archive are held to the per-type caps (PDF 100 MB, other supported documents 50 MB), and the extracted documents may total at most 500 MB. Images (PNG, JPEG, WebP) dropped on the Documents page or extracted from ZIPs route to Property Photos (up to 12 per deal, 15 MB each). Uploads larger than the small form-data threshold (4 MB) may be sent through a direct-upload path instead of a multipart form.
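The caps above can be expressed as a small pre-upload check. This is a minimal sketch, not EQUIRE's validation code: the function and type names are hypothetical, the image extension spellings are assumed, and it assumes 1 MB = 1024 × 1024 bytes, which the policy text does not specify. It does not cover the 12-photos-per-deal limit or the checks on ZIP members after expansion.

```typescript
// Hypothetical helper names; only the numeric limits come from the policy text.
const MB = 1024 * 1024; // assumption: binary megabytes

const DOCUMENT_EXTENSIONS = [".pdf", ".xlsx", ".xls", ".docx", ".doc", ".pptx", ".ppt", ".txt", ".csv"];
const PHOTO_EXTENSIONS = [".png", ".jpg", ".jpeg", ".webp"]; // assumed spellings for PNG, JPEG, WebP

type UploadVerdict =
  | { ok: true; route: "documents" | "property_photos" }
  | { ok: false; reason: string };

// Lower-cased extension including the dot, or "" when there is none.
function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot < 0 ? "" : fileName.slice(dot).toLowerCase();
}

function checkUpload(fileName: string, sizeBytes: number): UploadVerdict {
  const ext = extensionOf(fileName);

  // Images are routed to Property Photos and capped at 15 MB each.
  if (PHOTO_EXTENSIONS.indexOf(ext) >= 0) {
    return sizeBytes <= 15 * MB
      ? { ok: true, route: "property_photos" }
      : { ok: false, reason: "photo exceeds 15 MB" };
  }

  let capMb = 0;
  if (ext === ".pdf") capMb = 100;
  else if (ext === ".zip") capMb = 500; // container limit only
  else if (DOCUMENT_EXTENSIONS.indexOf(ext) >= 0) capMb = 50;
  else return { ok: false, reason: "unsupported extension: " + ext };

  return sizeBytes <= capMb * MB
    ? { ok: true, route: "documents" }
    : { ok: false, reason: ext + " exceeds " + capMb + " MB" };
}
```

For example, `checkUpload("rent_roll.xlsx", 60 * MB)` is rejected because spreadsheets fall under the 50 MB cap, while a 60 MB PDF passes.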

ZIP archives are expanded into the supported CRE document formats and property photos they contain; nested ZIPs and unsupported file types are skipped.

Document types you can ingest

| Document type | Typical formats |
| --- | --- |
| Offering memorandum | PDF |
| Rent roll | PDF, Excel, CSV |
| Operating statement / T-12 | PDF, Excel |
| Lease abstracts | PDF, Word |
| Loan and debt schedules | PDF |
| Purchase and sale agreements | PDF |
| Broker packages | PDF |
| Comp sheets | PDF, Excel |
| Presentations | PDF, .pptx |

What gets extracted

Offering memorandum

Property identity (address, asset class, year built, total SF), asking price, in-place NOI, cap rate, occupancy, and tenant or lease abstracts included in the package.

Rent roll

Tenant name, suite or unit, leased square footage, rents, lease commencement and expiration, renewal options, and recovery structure. Summary or aggregate rows (floorplan averages, grand totals) are filtered where the pipeline treats them as non-tenant rows.
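To illustrate what filtering summary rows can look like, here is a minimal sketch. The label pattern, the row shape, and the rule "summary label and no unit" are all assumptions made for this example; the pipeline's actual rules are not documented here.

```typescript
// Hypothetical row shape for illustration.
interface RentRollRow {
  tenantName: string;
  unit: string | null;
  leasedSf: number | null;
}

// Assumed label pattern: totals, subtotals, and averages.
const SUMMARY_LABEL = /\b(grand total|sub-?total|totals?|averages?|avg)\b/i;

// A row counts as a summary row only when it looks like one AND has no unit,
// so a real tenant such as "Total Wine & More" in suite A-1 is kept.
function isSummaryRow(row: RentRollRow): boolean {
  const hasUnit = row.unit !== null && row.unit.trim() !== "";
  return !hasUnit && SUMMARY_LABEL.test(row.tenantName);
}

function tenantRows(rows: RentRollRow[]): RentRollRow[] {
  return rows.filter((row) => !isSummaryRow(row));
}
```

Any heuristic of this kind can misfire in either direction, which is why missed real rows can be restored manually on Extracted Data.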

Operating statement / T-12

Effective gross revenue, operating expenses by line item, NOI, and occupancy by period.
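These figures are tied together by the standard identity NOI = effective gross revenue − total operating expenses, so an extracted statement can be sanity-checked arithmetically. The sketch below assumes a hypothetical record shape and a tolerance of one currency unit; the sample numbers are made up for illustration.

```typescript
// Hypothetical shape for one extracted period.
interface OperatingPeriod {
  label: string;
  effectiveGrossRevenue: number;
  expenseLines: { [lineItem: string]: number };
  reportedNoi: number;
}

function totalExpenses(period: OperatingPeriod): number {
  let total = 0;
  for (const key of Object.keys(period.expenseLines)) {
    total += period.expenseLines[key];
  }
  return total;
}

// Reported NOI minus the NOI implied by revenue and expenses; 0 means it ties out.
function noiGap(period: OperatingPeriod): number {
  const impliedNoi = period.effectiveGrossRevenue - totalExpenses(period);
  return period.reportedNoi - impliedNoi;
}

function noiTiesOut(period: OperatingPeriod, tolerance: number = 1): boolean {
  return Math.abs(noiGap(period)) <= tolerance;
}

// Made-up sample: 1,200,000 revenue less 396,000 of expenses implies 804,000 NOI.
const samplePeriod: OperatingPeriod = {
  label: "T-12",
  effectiveGrossRevenue: 1200000,
  expenseLines: {
    real_estate_taxes: 180000,
    insurance: 45000,
    utilities: 75000,
    repairs_maintenance: 60000,
    management_fee: 36000,
  },
  reportedNoi: 804000,
};
```

A nonzero gap usually means a line item was missed or a subtotal was read as a line item, which is worth checking against the source snippet.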

Debt and loan schedules

Loan amount, rate, IO period, amortization, maturity, and lender where present.

Pipeline flow (orchestration)

Processing is driven by the document process route. At a high level, after classification and extraction (including verification substages), the run continues with:

  1. Reconciliation — merge into the deal schema, detect conflicts, persist merged state.
  2. Artifacts and validation — persist field candidates, decisions, and validation issues.
  3. Property sync — align property metadata across documents where applicable.
  4. Valuation projection — deterministic assumption seeding from the merged schema (populateAssumptions with AI inference skipped in this step). Full AI-heavy valuation work still happens when you open the Valuation tab and run the model builder; see Valuation DCF.
  5. Chunking and embeddings — text is chunked and embedded into document_chunks for retrieval after reconciliation (not between extraction and merge).
  6. Finalization — document status, review issues, and downstream staleness (e.g. IC memo) are updated.
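As a rough picture of the chunking half of step 5, the sketch below splits text into fixed-size, overlapping character windows. The window size, the overlap, and the chunk shape are assumptions for illustration; the actual chunking strategy and the embedding call that fills document_chunks are not described in this guide.

```typescript
// Hypothetical chunk shape; not the document_chunks schema.
interface TextChunk {
  index: number;
  start: number; // character offset into the source text
  text: string;
}

function chunkText(text: string, size: number, overlap: number): TextChunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: TextChunk[] = [];
  const step = size - overlap; // each window starts `step` characters after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ index: chunks.length, start: start, text: text.slice(start, start + size) });
    if (start + size >= text.length) break; // this window already reached the end
  }
  return chunks;
}
```

With `chunkText("abcdefghij", 4, 1)` the windows are "abcd", "defg", and "ghij"; the one-character overlap keeps text that straddles a boundary retrievable from at least one chunk.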

If wall-clock processing approaches the soft time budget, non-critical continuation work (projection and embedding) may be deferred to a follow-up job recorded on the ingestion payload rather than blocking the first response.
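The deferral behavior can be sketched as a loop that checks elapsed time before each step and sets aside non-critical steps once the budget is spent. Everything here is hypothetical: the step shape, the choice to compare with `>=`, and the injected clock are illustration only, and the real route may decide "approaching the budget" differently.

```typescript
interface PipelineStep {
  name: string;
  critical: boolean; // critical steps always run in the first pass
  run: () => void;
}

interface RunOutcome {
  completed: string[];
  deferred: string[]; // in the real system these would be recorded on the ingestion payload
}

// `now` is injected so the budget logic can be exercised with a fake clock.
function runWithBudget(steps: PipelineStep[], softBudgetMs: number, now: () => number): RunOutcome {
  const startedAt = now();
  const outcome: RunOutcome = { completed: [], deferred: [] };
  for (const step of steps) {
    const overBudget = now() - startedAt >= softBudgetMs;
    if (overBudget && !step.critical) {
      outcome.deferred.push(step.name); // leave for a follow-up job
      continue;
    }
    step.run();
    outcome.completed.push(step.name);
  }
  return outcome;
}
```

Marking only projection and embedding as non-critical mirrors the text above: reconciliation and finalization still complete in the first pass, so the document reaches a usable state even when the continuation work runs later.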

Processing stages you may see

Document processing stages (enum DocumentProcessingStage) include, in order of progression:

queued → classification → extraction → verification_deterministic → verification_llm → verification_evidence → reconciliation → projection → completed (or failed). Ingestion jobs can also schedule continuation stages embedding and projection on the job payload when work is split across runs.
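If you display these stages in your own tooling, an ordered list makes progress easy to compute. The enum below is re-declared for illustration: its member names and string values are assumed from the stage names listed above and may not match the application's actual DocumentProcessingStage definition. The helper functions are hypothetical.

```typescript
// Assumed member names and values, inferred from the stage list.
enum DocumentProcessingStage {
  Queued = "queued",
  Classification = "classification",
  Extraction = "extraction",
  VerificationDeterministic = "verification_deterministic",
  VerificationLlm = "verification_llm",
  VerificationEvidence = "verification_evidence",
  Reconciliation = "reconciliation",
  Projection = "projection",
  Completed = "completed",
  Failed = "failed",
}

// Happy-path order; "failed" can occur at any point, so it has no position.
const PROGRESSION: DocumentProcessingStage[] = [
  DocumentProcessingStage.Queued,
  DocumentProcessingStage.Classification,
  DocumentProcessingStage.Extraction,
  DocumentProcessingStage.VerificationDeterministic,
  DocumentProcessingStage.VerificationLlm,
  DocumentProcessingStage.VerificationEvidence,
  DocumentProcessingStage.Reconciliation,
  DocumentProcessingStage.Projection,
  DocumentProcessingStage.Completed,
];

function isTerminal(stage: DocumentProcessingStage): boolean {
  return stage === DocumentProcessingStage.Completed || stage === DocumentProcessingStage.Failed;
}

// Fraction of the happy path reached (0 to 1), or null for "failed".
function progressOf(stage: DocumentProcessingStage): number | null {
  const position = PROGRESSION.indexOf(stage);
  return position < 0 ? null : position / (PROGRESSION.length - 1);
}
```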

Classification, extraction, and verification

Classification uses filename signals, content keywords, and model-based fallback when needed. Extraction uses native PDF handling or text parsing by format. Verification passes check extracted values against source evidence before reconciliation.
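The classification cascade (filename first, then content keywords, then a model) might look roughly like the sketch below. The patterns, keyword lists, two-hit threshold, and type names are all invented for this example, and the model is represented by a caller-supplied function rather than any real API.

```typescript
type DocType = "offering_memorandum" | "rent_roll" | "operating_statement" | "unknown";

interface Classification {
  type: DocType;
  via: "filename" | "keywords" | "model";
}

interface FilenameSignal { type: DocType; pattern: RegExp; }

// Assumed filename signals. Explicit delimiters are used instead of \b
// because "_" counts as a word character and would hide "OM" in "Main_St_OM.pdf".
const FILENAME_SIGNALS: FilenameSignal[] = [
  { type: "rent_roll", pattern: /rent[\s_-]?roll/i },
  { type: "operating_statement", pattern: /(^|[^a-z0-9])t-?12([^a-z0-9]|$)|operating[\s_-]?statement/i },
  { type: "offering_memorandum", pattern: /(^|[^a-z0-9])om([^a-z0-9]|$)|offering[\s_-]?memo/i },
];

// Assumed content keywords, matched case-insensitively.
const CONTENT_KEYWORDS: { type: DocType; keywords: string[] }[] = [
  { type: "rent_roll", keywords: ["lease expiration", "suite", "tenant"] },
  { type: "operating_statement", keywords: ["net operating income", "operating expenses"] },
  { type: "offering_memorandum", keywords: ["investment highlights", "offering"] },
];

function classify(fileName: string, text: string, modelFallback: (text: string) => DocType): Classification {
  for (const signal of FILENAME_SIGNALS) {
    if (signal.pattern.test(fileName)) return { type: signal.type, via: "filename" };
  }
  const lower = text.toLowerCase();
  let best: DocType = "unknown";
  let bestHits = 0;
  for (const entry of CONTENT_KEYWORDS) {
    const hits = entry.keywords.filter((k) => lower.indexOf(k) >= 0).length;
    if (hits > bestHits) { best = entry.type; bestHits = hits; }
  }
  // Require at least two keyword hits before trusting the content signal.
  if (bestHits >= 2) return { type: best, via: "keywords" };
  return { type: modelFallback(text), via: "model" };
}
```

Ordering the checks from cheapest to most expensive means the model is only consulted when the cheaper signals are inconclusive. When any of these signals gets the type wrong, reclassify the document from the Documents tab.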

Conflict detection

When two documents disagree on the same field, EQUIRE records a conflict and routes it to the Review tab instead of silently merging. You choose the winning value or enter a manual override.
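In outline, conflict detection groups candidate values by field and flags any field whose documents disagree. The sketch below uses a hypothetical candidate shape and exact comparison of values; the actual pipeline may normalize values or apply tolerances before deciding that two documents disagree.

```typescript
// Hypothetical shapes for illustration.
interface FieldCandidate {
  field: string; // e.g. "occupancy"
  value: number | string;
  documentId: string;
}

interface FieldConflict {
  field: string;
  candidates: FieldCandidate[];
}

function detectConflicts(candidates: FieldCandidate[]): FieldConflict[] {
  // Prototype-free object so field names never collide with built-in keys.
  const byField: { [field: string]: FieldCandidate[] } = Object.create(null);
  for (const c of candidates) {
    if (!byField[c.field]) byField[c.field] = [];
    byField[c.field].push(c);
  }

  const conflicts: FieldConflict[] = [];
  for (const field of Object.keys(byField)) {
    const group = byField[field];
    const distinct: string[] = [];
    for (const c of group) {
      const key = String(c.value); // exact comparison; no rounding or unit handling
      if (distinct.indexOf(key) < 0) distinct.push(key);
    }
    if (distinct.length > 1) conflicts.push({ field: field, candidates: group });
  }
  return conflicts;
}
```

Keeping every candidate on the conflict, rather than only the differing values, is what lets a reviewer see which document said what before choosing a winner.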

Documents are processed and stored within your organization's account. Do not upload material non-public information you are not authorized to handle in this environment. For how AI is used in processing, see Processing AI data.

Source provenance

Field-level attribution

Fields in the deal record reference originating documents. On Extracted Data, opening a value shows the source document and the snippet that supported it.

Manual overrides

User-entered values take the highest precedence among downstream valuation inputs and are marked as user-sourced.
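The precedence rule reduces to "an override, when present, wins". A minimal sketch with hypothetical names follows; it models only the two tiers this guide states (user-entered versus extracted) and says nothing about how extracted sources rank among themselves.

```typescript
type ValueSource = "user" | "extracted";

// Hypothetical shape; documentId is only meaningful for extracted values.
interface SourcedValue {
  value: number;
  source: ValueSource;
  documentId?: string;
}

function resolveInput(extracted: SourcedValue | null, override: number | null): SourcedValue | null {
  // Compare against null rather than truthiness so an override of 0 is honored.
  if (override !== null) return { value: override, source: "user" };
  return extracted;
}
```

For example, an extracted cap rate of 0.055 with an override of 0.06 resolves to 0.06 marked as user-sourced; clearing the override falls back to the extracted value and its source document.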

Reviewing conflicts

The Review tab lists open review items: data conflict items, source review items, and field check items. Field checks include low-confidence values, missing fields, clarifications, diligence confirmations, and financial verification work. Resolving them updates the deal schema and valuation inputs. Document review status is recomputed when conflicts are cleared (including auto-heal paths when issues are resolved).

The workbench lanes (Blocking Data, Source Review, and Supporting Checks) are a recommended order of operations over the same queue, not a second set of totals.

Worker dispatch and admins

Background workers can invoke processing with x-worker-secret validated against PIPELINE_WORKER_SECRET (and related internal headers). Preview deployments should set this secret if worker dispatch is required; ops details live in internal pipeline documentation, not in this user guide.

Email and intake (optional path)

Inbound email with attachments can create or enrich deals via the intake pipeline (webhook-verified, org-resolved). User-facing surfaces include pending-attach flows (GET /api/intake/pending-attach, PATCH /api/intake/[recordId]/dismiss). Operational email setup is documented for admins in the repo's docs/operations/email-setup.md; it is not duplicated here.
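For client code that talks to these two endpoints, it helps to build the method and path in one place. The sketch below only constructs request descriptors; the helper names are hypothetical, and request bodies, authentication, and response shapes are not documented in this guide, so they are left out.

```typescript
// Hypothetical descriptor type; only the methods and paths come from the text above.
interface IntakeRequest {
  method: "GET" | "PATCH";
  path: string;
}

function pendingAttachRequest(): IntakeRequest {
  return { method: "GET", path: "/api/intake/pending-attach" };
}

function dismissRequest(recordId: string): IntakeRequest {
  if (recordId.trim() === "") throw new Error("recordId is required");
  // Encode the id so characters such as "/" cannot alter the route.
  return { method: "PATCH", path: "/api/intake/" + encodeURIComponent(recordId) + "/dismiss" };
}
```

A caller would pass the resulting method and path to its HTTP client of choice, along with whatever session credentials the deployment requires.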

Limits and known gaps

  • Non-English documents: extraction targets English.
  • Scanned PDFs: low-quality scans may yield partial extraction; higher-quality scans or alternative formats help. Vision-style reading may be attempted when the pipeline classifies a document as a scan.
  • Rent roll summary rows: aggregates may be filtered; restore missed real rows manually on Extracted Data.
  • Large documents: processing continues in the background; watch the Documents tab for status.

Reclassifying a misclassified document

Reclassify and reprocess from the Documents tab without re-uploading when the type was wrong.

After ingestion

Resolved data feeds Rent roll & financial data cleanup, then Valuation DCF, IC Memo, and Deliverables. For the full workspace tour, see Getting started.
