Document Ingestion
How EQUIRE turns uploaded OMs, rent rolls, and operating statements into a structured deal record — pipeline stages, upload limits, reconciliation, embeddings, and valuation seeding.
EQUIRE accepts supported file types (see below) and runs each document through a processing pipeline. Extracted values — occupancy, rent, NOI, tenant names, loan terms — link back to source documents and text snippets so you can verify without hunting through originals.
Supported uploads and limits
Accepted extensions match the upload policy: .pdf, .xlsx, .xls, .docx, .doc, .pptx, .ppt, .txt, .csv, .zip. PDFs can be up to 100 MB, and other accepted document types are capped at 50 MB per file. ZIP archives can be up to 500 MB; that figure is the container limit. Documents inside an archive follow the per-type caps (PDF 100 MB, other supported documents 50 MB), and the extracted documents can total at most 500 MB. Images (PNG, JPEG, WebP) dropped on the Documents page or extracted from ZIPs route to Property Photos (up to 12 per deal, 15 MB each). Large multipart payloads may use direct-upload paths when they exceed the small form-data threshold (4 MB).
ZIP archives expand to supported CRE formats and property photos; nested zips and unsupported types are skipped per policy.
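As a rough illustration of the per-file limits above, the following sketch checks a single upload against the size caps and decides where it would route. The function and type names (`checkUpload`, `UploadCheck`) and the `.jpg`/`.jpeg` spellings for JPEG are assumptions made for this example, not EQUIRE's actual implementation; the sketch also ignores the per-deal photo count and the limits that apply to ZIP members.

```typescript
// Illustrative only: caps come from the limits described above,
// but every name in this block is hypothetical.
const MB = 1024 * 1024;

const DOCUMENT_EXTENSIONS: string[] = [
  ".pdf", ".xlsx", ".xls", ".docx", ".doc", ".pptx", ".ppt", ".txt", ".csv", ".zip",
];
// Assumed file extensions for the PNG, JPEG, and WebP image types.
const IMAGE_EXTENSIONS: string[] = [".png", ".jpg", ".jpeg", ".webp"];

type UploadCheck =
  | { ok: true; route: "documents" | "property_photos" }
  | { ok: false; reason: string };

function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot === -1 ? "" : fileName.slice(dot).toLowerCase();
}

// Returns the size cap in bytes, or undefined for unsupported types.
function maxBytesFor(ext: string): number | undefined {
  if (ext === ".pdf") return 100 * MB;
  if (ext === ".zip") return 500 * MB;
  if (DOCUMENT_EXTENSIONS.indexOf(ext) !== -1) return 50 * MB;
  if (IMAGE_EXTENSIONS.indexOf(ext) !== -1) return 15 * MB;
  return undefined;
}

function checkUpload(fileName: string, sizeBytes: number): UploadCheck {
  const ext = extensionOf(fileName);
  const cap = maxBytesFor(ext);
  if (cap === undefined) {
    return { ok: false, reason: `unsupported type: ${ext || "(none)"}` };
  }
  if (sizeBytes > cap) {
    return { ok: false, reason: `exceeds ${cap / MB} MB cap for ${ext}` };
  }
  const isImage = IMAGE_EXTENSIONS.indexOf(ext) !== -1;
  return { ok: true, route: isImage ? "property_photos" : "documents" };
}
```

For example, `checkUpload("model.xlsx", 60 * MB)` is rejected because spreadsheets fall under the 50 MB cap, while an 80 MB PDF passes.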
Document types you can ingest
| Document type | Typical formats |
|---|---|
| Offering memorandum | |
| Rent roll | PDF, Excel, CSV |
| Operating statement / T-12 | PDF, Excel |
| Lease abstracts | PDF, Word |
| Loan and debt schedules | |
| Purchase and sale agreements | |
| Broker packages | |
| Comp sheets | PDF, Excel |
| Presentations | PDF, .pptx |
What gets extracted
Offering memorandum
Property identity (address, asset class, year built, total SF), asking price, in-place NOI, cap rate, occupancy, and tenant or lease abstracts included in the package.
Rent roll
Tenant name, suite or unit, leased square footage, rents, lease commencement and expiration, renewal options, and recovery structure. Summary or aggregate rows (floorplan averages, grand totals) are filtered where the pipeline treats them as non-tenant rows.
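To make the summary-row filtering concrete, here is a minimal sketch of one possible heuristic. The label pattern, the `RentRollRow` shape, and the rule that a summary row has no unit are all assumptions for illustration; the pipeline's actual rules are not documented here.

```typescript
// Hypothetical row shape and heuristic, for illustration only.
interface RentRollRow {
  tenantName: string;
  unit: string | null;
  leasedSf: number | null;
}

// Assumed labels that tend to mark aggregate rows.
const SUMMARY_LABEL = /\b(grand total|subtotal|totals?|average|avg)\b/i;

function isSummaryRow(row: RentRollRow): boolean {
  const hasUnit = row.unit !== null && row.unit.trim() !== "";
  // Requiring a missing unit keeps real tenants whose names contain
  // words like "Total" from being dropped.
  return !hasUnit && SUMMARY_LABEL.test(row.tenantName);
}

function tenantRows(rows: RentRollRow[]): RentRollRow[] {
  return rows.filter((row) => !isSummaryRow(row));
}
```

Any heuristic of this kind can misfire in either direction, which is why filtered rows may occasionally need to be restored by hand.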
Operating statement / T-12
Effective gross revenue, operating expenses by line item, NOI, and occupancy by period.
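Because NOI is effective gross revenue minus total operating expenses, the extracted fields can be cross-checked against each other. The sketch below recomputes NOI per period and lists periods where the reported figure differs. The type names and the rounding tolerance are assumptions for this example, not part of the documented pipeline.

```typescript
// Hypothetical shapes; field names are illustrative.
interface ExpenseLine {
  name: string;
  amount: number;
}

interface OperatingPeriod {
  label: string; // e.g. "2024-01"
  effectiveGrossRevenue: number;
  expenses: ExpenseLine[];
  reportedNoi: number;
}

// NOI = effective gross revenue - sum of operating expense line items.
function recomputeNoi(period: OperatingPeriod): number {
  const totalExpenses = period.expenses.reduce((sum, line) => sum + line.amount, 0);
  return period.effectiveGrossRevenue - totalExpenses;
}

// Returns labels of periods whose reported NOI differs from the
// recomputed value by more than the tolerance (assumed: 1 currency unit).
function noiDiscrepancies(periods: OperatingPeriod[], tolerance: number = 1): string[] {
  return periods
    .filter((p) => Math.abs(recomputeNoi(p) - p.reportedNoi) > tolerance)
    .map((p) => p.label);
}
```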
Debt and loan schedules
Loan amount, rate, IO period, amortization, maturity, and lender where present.
Pipeline flow (orchestration)
Processing is driven by the document process route. At a high level, after classification and extraction (including verification substages), the run continues with:
- Reconciliation — merge into the deal schema, detect conflicts, persist merged state.
- Artifacts and validation — persist field candidates, decisions, and validation issues.
- Property sync — align property metadata across documents where applicable.
- Valuation projection — deterministic assumption seeding from the merged schema (populateAssumptions with AI inference skipped in this step). Full AI-heavy valuation work still happens when you open the Valuation tab and run the model builder; see Valuation DCF.
- Chunking and embeddings — text is chunked and embedded into document_chunks for retrieval after reconciliation (not between extraction and merge).
- Finalization — document status, review issues, and downstream staleness (e.g. IC memo) are updated.
If wall-clock processing approaches the soft time budget, non-critical continuation work (projection and embedding) may be deferred to a follow-up job recorded on the ingestion payload rather than blocking the first response.
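The deferral described above can be pictured as a simple budget check. In this sketch, the function name, the safety margin, and the plan shape are hypothetical; only the idea that projection and embedding are the deferrable stages comes from this guide.

```typescript
// Hypothetical names; only the two deferrable stages come from the guide.
type ContinuationStage = "projection" | "embedding";

interface ContinuationPlan {
  runNow: ContinuationStage[];
  // Stages that would be recorded on the ingestion payload for a follow-up job.
  deferred: ContinuationStage[];
}

function planContinuation(
  elapsedMs: number,
  softBudgetMs: number,
  safetyMarginMs: number,
): ContinuationPlan {
  const stages: ContinuationStage[] = ["projection", "embedding"];
  // Defer when the run is within the safety margin of the soft budget.
  const nearBudget = elapsedMs + safetyMarginMs >= softBudgetMs;
  return nearBudget
    ? { runNow: [], deferred: stages }
    : { runNow: stages, deferred: [] };
}
```

With an assumed 60-second budget and a 10-second margin, a run that has used 55 seconds would defer both stages, while a run at 20 seconds would complete them inline.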
Processing stages you may see
Document processing stages (enum DocumentProcessingStage) include, in order of progression:
queued → classification → extraction → verification_deterministic → verification_llm → verification_evidence → reconciliation → projection → completed (or failed). Ingestion jobs can also schedule continuation stages embedding and projection on the job payload when work is split across runs.
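If you are building a status display on top of these stages, an ordered list makes progress easy to derive. The sketch below is not the actual DocumentProcessingStage definition; it only reuses the stage names listed above, and the helper names are assumptions.

```typescript
// Stage names as listed above; this is a sketch, not the real enum.
const STAGE_ORDER = [
  "queued",
  "classification",
  "extraction",
  "verification_deterministic",
  "verification_llm",
  "verification_evidence",
  "reconciliation",
  "projection",
  "completed",
] as const;

type ProgressStage = (typeof STAGE_ORDER)[number];
type ProcessingStage = ProgressStage | "failed";

function isTerminal(stage: ProcessingStage): boolean {
  return stage === "completed" || stage === "failed";
}

// Fraction of the way through the pipeline, or null for failed runs.
function progressFraction(stage: ProcessingStage): number | null {
  if (stage === "failed") return null;
  return STAGE_ORDER.indexOf(stage) / (STAGE_ORDER.length - 1);
}
```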
Classification, extraction, and verification
Classification uses filename signals, content keywords, and model-based fallback when needed. Extraction uses native PDF handling or text parsing by format. Verification passes check extracted values against source evidence before reconciliation.
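The classification order described above (filename signals first, then content keywords, then a model) can be sketched as a rule pass that returns null when a model-based fallback would be needed. The patterns, keyword lists, and two-hit threshold here are invented for illustration and are not EQUIRE's actual signals.

```typescript
// All patterns and keywords below are hypothetical examples.
type DocType = "offering_memorandum" | "rent_roll" | "operating_statement";

interface Rule {
  type: DocType;
  fileName: RegExp;
  keywords: string[]; // lowercase phrases searched for in the text
}

interface RuleMatch {
  type: DocType;
  via: "filename" | "keywords";
}

const RULES: Rule[] = [
  { type: "rent_roll", fileName: /rent[\s_-]*roll/i, keywords: ["lease expiration", "monthly rent"] },
  {
    type: "operating_statement",
    fileName: /t[\s_-]?12|operating[\s_-]*statement/i,
    keywords: ["net operating income", "operating expenses"],
  },
  {
    type: "offering_memorandum",
    fileName: /offering[\s_-]*memo|(^|[\s_-])om([\s_.-]|$)/i,
    keywords: ["offering memorandum", "investment highlights"],
  },
];

// Returns a match, or null when a model-based fallback would be needed.
function classifyByRules(fileName: string, text: string): RuleMatch | null {
  for (const rule of RULES) {
    if (rule.fileName.test(fileName)) return { type: rule.type, via: "filename" };
  }
  const lower = text.toLowerCase();
  let best: RuleMatch | null = null;
  let bestHits = 1; // assumed threshold: at least two keyword hits
  for (const rule of RULES) {
    const hits = rule.keywords.filter((k) => lower.indexOf(k) !== -1).length;
    if (hits > bestHits) {
      best = { type: rule.type, via: "keywords" };
      bestHits = hits;
    }
  }
  return best;
}
```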
Conflict detection
When two documents disagree on the same field, EQUIRE records a conflict and routes it to the Review tab instead of silently merging. You choose the winning value or enter a manual override.
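The behavior above can be summarized as: group candidate values by field, merge when every source agrees, and raise a conflict otherwise. The following sketch assumes exact-match comparison and hypothetical type names; the real pipeline may normalize values or apply tolerances that are not described here.

```typescript
// Hypothetical shapes; comparison is exact-match for simplicity.
interface FieldCandidate {
  field: string;
  value: string | number;
  documentId: string;
}

interface ConflictItem {
  field: string;
  candidates: FieldCandidate[];
}

interface ReconcileResult {
  merged: FieldCandidate[];
  conflicts: ConflictItem[]; // would be routed to review, not merged
}

function reconcile(candidates: FieldCandidate[]): ReconcileResult {
  const byField: { [field: string]: FieldCandidate[] } = {};
  const order: string[] = [];
  for (const c of candidates) {
    const existing = byField[c.field];
    if (existing) {
      existing.push(c);
    } else {
      byField[c.field] = [c];
      order.push(c.field);
    }
  }
  const result: ReconcileResult = { merged: [], conflicts: [] };
  for (const field of order) {
    const group = byField[field] ?? [];
    const first = group[0];
    if (first === undefined) continue;
    const disagree = group.some((c) => c.value !== first.value);
    if (disagree) result.conflicts.push({ field, candidates: group });
    else result.merged.push(first);
  }
  return result;
}
```

For instance, if an offering memorandum reports 95% occupancy and the rent roll implies 92%, the field lands in `conflicts` with both candidates attached rather than being merged.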
Documents are processed and stored within your organization's account. Do not upload material non-public information you are not authorized to handle in this environment. For how AI is used in processing, see Processing AI data.
Source provenance
Field-level attribution
Fields in the deal record reference originating documents. On Extracted Data, opening a value shows the source document and the snippet that supported it.
Manual overrides
User-entered values take the highest precedence among downstream valuation inputs and are marked as user-sourced.
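A minimal sketch of that precedence rule, assuming a hypothetical `resolveInput` helper and a two-value source tag (the real record may track more source kinds):

```typescript
// Hypothetical helper illustrating "user value wins" precedence.
type ValueSource = "user" | "document";

interface ResolvedValue<T> {
  value: T;
  source: ValueSource;
}

function resolveInput<T>(
  extracted: T | undefined,
  userOverride: T | undefined,
): ResolvedValue<T> | undefined {
  // A user-entered value always wins and is tagged as user-sourced.
  if (userOverride !== undefined) return { value: userOverride, source: "user" };
  if (extracted !== undefined) return { value: extracted, source: "document" };
  return undefined;
}
```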
Reviewing conflicts
The Review tab lists open review items: data conflict items, source review items, and field check items. Field checks include low-confidence values, missing fields, clarifications, diligence confirmations, and financial verification work. Resolving them updates the deal schema and valuation inputs. Document review status is recomputed when conflicts are cleared (including auto-heal paths when issues are resolved).
The workbench lanes (Blocking Data, Source Review, and Supporting Checks) are a recommended order of operations over the same queue, not a second set of totals.
Worker dispatch and admins
Background workers can invoke processing with x-worker-secret validated against PIPELINE_WORKER_SECRET (and related internal headers). Preview deployments should set this secret if worker dispatch is required; ops details live in internal pipeline documentation, not in this user guide.
Email and intake (optional path)
Inbound email with attachments can create or enrich deals via the intake pipeline (webhook-verified, org-resolved). User-facing surfaces include pending-attach flows (GET /api/intake/pending-attach, PATCH /api/intake/[recordId]/dismiss). Operational email setup is documented for admins in the repo's docs/operations/email-setup.md; it is not duplicated here.
Limits and known gaps
- Non-English documents: extraction targets English.
- Scanned PDFs: low-quality scans may yield partial extraction; higher-quality scans or alternative formats help. Vision-style reading may be attempted when the pipeline identifies a document as a scan.
- Rent roll summary rows: aggregates may be filtered; restore missed real rows manually on Extracted Data.
- Large documents: processing continues in the background; watch the Documents tab for status.
Reclassifying a misclassified document
Reclassify and reprocess from the Documents tab without re-uploading when the type was wrong.
After ingestion
Resolved data feeds Rent roll & financial data cleanup, then Valuation DCF, IC Memo, and Deliverables. For the full workspace tour, see Getting started.