Data Ingestion Readiness Plan
This document defines the preparatory plan for the reliable measurement backbone. It covers ENG-5169, ENG-5239, and ENG-5240.
It is deliberately a documentation and specification artifact. It does not change ingestion behavior, database schema, background jobs, or API responses yet.
Purpose
The measurement layer needs to answer one operational question before the model layer can be trusted:
Are we collecting, normalizing, and deriving the data needed for the model
and portfolio analytics every day, and can we explain every gap?
Gordon validated the dashboard as a useful consolidated display, but the next phase depends on source completeness. The system should not simply say that a value is missing. It should explain whether the gap is caused by identity mapping, provider coverage, entitlement, market hours, adjusted contracts, expired horizons, unsupported fields, or a retryable ingestion failure.
Scope
| Ticket | Scope | Preparatory output |
|---|---|---|
| ENG-5169 | Reliable Amberdata ingestion planning for execution quotes, forward marks, terminal marks, liquidity, and readiness diagnostics. | Source contract, ingestion domains, provider questions, and readiness gates. |
| ENG-5239 | Durable checkpointed ingestion jobs. | Job contract for resumable, idempotent, non-interactive ingestion. |
| ENG-5240 | Coverage probe and reason taxonomy. | Reason buckets that explain missing data without mutating analytics. |
Current Foundation
The system already captures daily Amberdata snapshots for the core crypto instruments:
- Greeks
- IV surface
- spot
- open interest
- BTC and ETH snapshot history
- Alpha trades for ICOI and IMST
This is the correct foundation. The next readiness layer should make the data capture auditable and repeatable enough that model pages can distinguish:
- data that is loaded and fresh
- data that is stale
- data that is missing because the provider returned no rows
- data that is missing because the contract identity is unresolved
- data that is unsupported by the chosen provider
- data that is pending because the horizon has not matured
- data that exists only as a proxy, fallback, or estimate
Source Domains
| Domain | Needed for | Current policy |
|---|---|---|
| Amberdata BTC/ETH daily snapshots | Market setup, regime, surface, term structure, probability model. | Continue daily capture and surface coverage diagnostics. |
| Amberdata TradFi option quotes | MSTR/COIN execution benchmark and forward marks. | Use only for traded-contract execution and outcome marks when identity and source coverage are valid. |
| Alpha trade feed | ICOI and IMST trade rows. | Current production trade source. Keep idempotent source IDs and portfolio scope. |
| Future SMA feed | SMA copilot and restructuring evaluation. | Deferred until field contract and derivative-equivalent mapping are reviewed. |
| Accounting/PMS/broker/settlement | Realized P&L and terminal outcomes. | Required before realized P&L becomes decision-grade. |
| Model estimates | Exploration, scenario analysis, fallback sensitivity. | Must remain visibly estimated and excluded from sourced aggregates by default. |
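The model-estimates policy in the last row can be illustrated with a small aggregate helper. This is a minimal sketch, not an existing function: the row fields `value` and `quality` and the name `sourced_mean` are assumptions made for illustration. It excludes estimated rows unless the caller opts in, and it returns `None` instead of zero when nothing sourced is available.

```python
from typing import Optional


def sourced_mean(rows: list, include_estimates: bool = False) -> Optional[float]:
    """Average row values, excluding non-sourced rows by default.

    Each row is a dict with hypothetical keys:
      value:   float or None (None means the economics are missing)
      quality: "sourced" or "estimated"
    """
    values = [
        row["value"]
        for row in rows
        if row["value"] is not None
        and (include_estimates or row["quality"] == "sourced")
    ]
    if not values:
        # No usable data: report "unavailable", never a misleading 0.0.
        return None
    return sum(values) / len(values)
```

Returning `None` keeps a missing aggregate distinguishable from a genuine zero on any page that renders it.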
Durable Job Contract
Any broad historical or rolling ingestion process should run as a durable job, not as a browser-bound or interactive admin request.
Durable jobs should be:
- checkpointed: store progress by source, portfolio, instrument, date window, and job stage
- idempotent: rerunning a job should update the same logical observations without duplicating rows
- resumable: failures should restart from the last completed checkpoint
- bounded: request windows should respect provider limits, including one-hour quote windows where applicable
- auditable: record source URL family, provider status, row counts, timestamps, retry count, and final reason state
- non-blocking: long-running backfills should not depend on an open browser or a single request lifecycle
- source-labelled: every written observation should carry source and quality metadata
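The properties above can be sketched as a small backfill loop. This is an illustration under stated assumptions, not the planned implementation: `JobStore` is an in-memory stand-in for durable storage, and `fetch`, the checkpoint key layout, and the `quality` label are hypothetical. The sketch splits the date range into one-hour windows, skips windows that already have a checkpoint, and writes observations by logical key so that reruns overwrite rather than duplicate.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterator


def hourly_windows(start: datetime, end: datetime) -> Iterator[tuple]:
    """Split [start, end) into request windows of at most one hour (bounded)."""
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(hours=1), end)
        yield cursor, upper
        cursor = upper


@dataclass
class JobStore:
    """In-memory stand-in for durable storage (hypothetical)."""
    completed: set = field(default_factory=set)       # finished checkpoint keys
    observations: dict = field(default_factory=dict)  # logical key -> row
    audit: list = field(default_factory=list)         # one entry per window


def run_backfill(store: JobStore, source: str, instrument: str,
                 start: datetime, end: datetime, fetch: Callable) -> None:
    for lower, upper in hourly_windows(start, end):
        checkpoint = (source, instrument, lower.isoformat())
        if checkpoint in store.completed:
            continue  # resumable: this window finished in an earlier run
        rows = fetch(instrument, lower, upper)  # may raise; checkpoint stays open
        for row in rows:
            # Idempotent: the logical key makes a rerun overwrite the same row.
            key = (source, instrument, row["timestamp"])
            store.observations[key] = {**row, "source": source, "quality": "sourced"}
        store.audit.append({
            "checkpoint": checkpoint,
            "row_count": len(rows),
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        })
        store.completed.add(checkpoint)  # mark done only after the writes succeed
```

If `fetch` raises partway through, the failed window has no checkpoint, so the next run resumes from it without re-requesting the completed windows.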
Job Matrix
| Job | Primary purpose | Inputs | Output state |
|---|---|---|---|
| Daily market snapshot | Maintain BTC/ETH Greeks, IV surface, spot, and OI coverage. | Currency, snapshot date, Amberdata endpoints. | available, partial, stale, unavailable, not_entitled. |
| Execution quote backfill | Source bid/ask/mid near Alpha trade execution. | Trade identity, execution timestamp, tolerance. | quoted, stale, unavailable, identity_unmapped, unsupported, not_entitled. |
| Forward mark backfill | Source 1d/7d/30d marks for traded contracts. | Trade identity, horizon, target timestamp. | available, pending, unavailable, expired_before_horizon, terminal_mark_available. |
| Terminal mark backfill | Source expiry or terminal quote/settlement for expired-before-horizon rows. | Trade identity, expiry timestamp, terminal source. | terminal_mark_available, terminal_mark_unavailable, settlement_required. |
| Liquidity observation backfill | Source traded-contract OI, volume, spread, quote age. | Trade identity, timestamp, provider fields. | available, partial, field_not_returned, unavailable. |
| Coverage probe | Explain why requested analytics can or cannot be sourced. | Trade set, identity map, provider probes. | Reason buckets only; no economics mutation. |
| SMA source validation | Validate future SMA feed fields before ingestion. | SMA extract sample and mapping rules. | ready, needs_mapping, needs_source_field, unsupported. |
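One way to keep job outputs within the states listed in the matrix is a lookup that rejects anything else. This is a minimal sketch: the job keys (for example `execution_quote_backfill`) and the function name are hypothetical slugs derived from the table, and the coverage probe is omitted because it returns reason buckets rather than a single state.

```python
# Allowed output states per job, copied from the job matrix.
ALLOWED_STATES = {
    "daily_market_snapshot": {
        "available", "partial", "stale", "unavailable", "not_entitled"},
    "execution_quote_backfill": {
        "quoted", "stale", "unavailable", "identity_unmapped",
        "unsupported", "not_entitled"},
    "forward_mark_backfill": {
        "available", "pending", "unavailable",
        "expired_before_horizon", "terminal_mark_available"},
    "terminal_mark_backfill": {
        "terminal_mark_available", "terminal_mark_unavailable",
        "settlement_required"},
    "liquidity_observation_backfill": {
        "available", "partial", "field_not_returned", "unavailable"},
    "sma_source_validation": {
        "ready", "needs_mapping", "needs_source_field", "unsupported"},
}


def validate_output_state(job: str, state: str) -> str:
    """Return the state unchanged, or raise if the job may not emit it."""
    if job not in ALLOWED_STATES:
        raise KeyError(f"unknown job: {job}")
    if state not in ALLOWED_STATES[job]:
        raise ValueError(f"{job} cannot emit state {state!r}")
    return state
```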
Coverage Reason Taxonomy
Coverage probes should return explicit reason buckets. They should not mutate trade economics or invent marks.
| Reason bucket | Meaning | Operator implication |
|---|---|---|
| identity_mapped | Trade resolved to an approved derivative-equivalent identity. | Source lookup may proceed. |
| identity_unmapped | Required contract terms or mapping approval are missing. | Analytics should be withheld until mapped. |
| adjusted_unsupported | Adjusted root or deliverable cannot be sourced through the current provider path. | Needs authoritative adjusted-contract mapping or another source. |
| proxy_used | A proxy or fallback was used for a limited analytic context. | Display proxy caveat and exclude from unsupported metric families. |
| provider_discovery_required | Provider symbol format, endpoint, or entitlement is uncertain. | Engineering/provider validation needed. |
| execution_quote_available | Fresh same-contract bid/ask/mid exists near execution. | Execution benchmark can be sourced. |
| execution_quote_stale | Quote exists but outside tolerance. | Exclude from fresh execution aggregates or show stale caveat. |
| execution_quote_empty | Provider returned success with no quote rows. | Treat as unavailable unless provider explains alternate query requirements. |
| execution_quote_not_entitled | Provider denies access. | Entitlement or alternate source required. |
| liquidity_field_missing | Provider returned quote but not OI or volume. | Liquidity is partial; do not show zero OI/volume. |
| forward_mark_available | Mark exists at requested horizon. | Forward outcome can be computed for that row. |
| forward_mark_pending | Horizon has not matured yet. | Show pending, not unavailable or zero P&L. |
| forward_mark_unavailable | Horizon matured but no sourced mark exists. | Exclude from forward aggregates. |
| expired_before_horizon | Requested horizon occurs after option expiry. | Attempt terminal mark; do not look for impossible live quote. |
| terminal_mark_available | Terminal quote or settlement source exists. | Expiry-aware outcome can be sourced. |
| terminal_mark_unavailable | No terminal source exists. | Keep outcome unavailable. |
| outside_market_hours | Target timestamp falls outside regular market session. | Use approved nearest-session policy or keep unavailable. |
| retry_exhausted | Provider requests failed after retry policy. | Keep unavailable with retry metadata. |
| unsupported_structure | Instrument structure is outside current model/source support. | Withhold decision-grade analytics. |
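The forward-mark buckets in the table can be told apart from timestamps alone, without touching trade economics. The sketch below is illustrative: the function name, its parameters, and the order of the checks (expiry first, then maturity, then mark presence) are assumptions, not an approved policy.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Reason(str, Enum):
    FORWARD_MARK_AVAILABLE = "forward_mark_available"
    FORWARD_MARK_PENDING = "forward_mark_pending"
    FORWARD_MARK_UNAVAILABLE = "forward_mark_unavailable"
    EXPIRED_BEFORE_HORIZON = "expired_before_horizon"


def classify_forward_mark(executed_at: datetime, expiry: datetime,
                          horizon: timedelta, as_of: datetime,
                          mark: Optional[float]) -> Reason:
    """Return a reason bucket only; never computes or alters P&L."""
    target = executed_at + horizon
    if target > expiry:
        # The horizon falls after expiry: a terminal mark is needed instead.
        return Reason.EXPIRED_BEFORE_HORIZON
    if target > as_of:
        # The horizon has not matured yet: pending, not unavailable.
        return Reason.FORWARD_MARK_PENDING
    if mark is None:
        # Matured, but no sourced mark exists.
        return Reason.FORWARD_MARK_UNAVAILABLE
    return Reason.FORWARD_MARK_AVAILABLE
```

Checking expiry before maturity means a 30d horizon on a contract that expires in 10 days is routed to the terminal-mark path immediately instead of sitting in pending.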
Data Readiness Output
Every readiness output should be intelligible to an operator. A useful response is not just a percentage. It should say what is ready, what is missing, and what action is required.
Minimum readiness shape:
data_readiness = {
scope,
as_of,
requested_rows,
analyzable_rows,
blocked_rows,
pending_rows,
estimated_rows,
proxy_rows,
reason_counts,
source_counts,
oldest_source_timestamp,
newest_source_timestamp,
next_action
}
Example operator wording:
Execution benchmark coverage is blocked mostly by adjusted-contract identity and
provider-empty quote responses. Forward 7d outcomes are additionally blocked by
expired-before-horizon rows that need terminal marks.
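The minimum readiness shape above could be assembled from per-row probe results roughly as follows. This is a sketch under assumptions: the input row keys (`state`, `reason`, `source`, `source_timestamp`, `estimated`, `proxy`) and the rule that derives `next_action` from the most common blocking reason are hypothetical, chosen only to show how each field might be filled.

```python
from collections import Counter
from datetime import datetime, timezone


def summarize_readiness(scope: str, as_of: datetime, rows: list) -> dict:
    """Build the data_readiness shape from probe rows (hypothetical keys)."""
    states = Counter(row["state"] for row in rows)
    blocking = Counter(row["reason"] for row in rows if row["state"] == "blocked")
    stamps = [row["source_timestamp"] for row in rows
              if row.get("source_timestamp") is not None]
    if blocking:
        reason, count = blocking.most_common(1)[0]
        next_action = f"Resolve {reason} ({count} of {len(rows)} rows)"
    else:
        next_action = "No blocking gaps"
    return {
        "scope": scope,
        "as_of": as_of,
        "requested_rows": len(rows),
        "analyzable_rows": states["analyzable"],
        "blocked_rows": states["blocked"],
        "pending_rows": states["pending"],
        "estimated_rows": sum(1 for row in rows if row.get("estimated")),
        "proxy_rows": sum(1 for row in rows if row.get("proxy")),
        "reason_counts": dict(Counter(row["reason"] for row in rows)),
        "source_counts": dict(Counter(row["source"] for row in rows)),
        # None (not a placeholder date) when no row carries a timestamp.
        "oldest_source_timestamp": min(stamps) if stamps else None,
        "newest_source_timestamp": max(stamps) if stamps else None,
        "next_action": next_action,
    }
```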
Provider Validation Questions
These questions can be investigated before Gordon makes business decisions because they do not force an economic assumption.
| Question | Why it matters |
|---|---|
| What symbol format should be used for MSTR/COIN historical option quotes? | Prevents false unavailable states caused by malformed identifiers. |
| Does Amberdata support adjusted roots such as 2MSTR and 2COIN? | Controls whether adjusted Alpha rows can become decision-grade from Amberdata alone. |
| What does HTTP 200 with empty data mean for a valid historical option lookup? | Distinguishes no coverage from no trades from wrong query format. |
| Are contract OI and volume available from the same level-1 endpoint or a different endpoint? | Determines whether execution-liquidity can become complete. |
| What are the historical retention limits by endpoint? | Controls backfill feasibility and default lookback. |
| Can the provider return terminal close/settlement values for expired options? | Controls expiry-aware forward outcome coverage. |
| What batching and rate limits apply to one-hour quote requests? | Required for durable backfills and checkpoint sizing. |
| How are market holidays and outside-market-hours timestamps represented? | Required for nearest-session policy. |
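Until the provider answers the HTTP 200 question above, an empty success can be kept in its own bucket and not folded into a generic unavailable state. The sketch below shows one possible mapping from a final provider response to the execution-quote reason buckets. It is hedged on several assumptions: the function name and parameters are hypothetical, 401/403 are assumed to signal an entitlement denial, and the caller is assumed to have already applied its retry policy so that any other non-200 status is final.

```python
from datetime import datetime, timedelta, timezone


def classify_execution_quote(final_status: int, rows: list,
                             executed_at: datetime,
                             tolerance: timedelta) -> str:
    """Map a final provider response to an execution-quote reason bucket."""
    if final_status in (401, 403):
        # Assumption: the provider signals entitlement denial this way.
        return "execution_quote_not_entitled"
    if final_status != 200:
        # Assumption: retries already ran, so this failure is final.
        return "retry_exhausted"
    if not rows:
        # Success with no rows is ambiguous; keep it distinct and explainable.
        return "execution_quote_empty"
    # Distance from execution to the closest quote, in either direction.
    nearest = min(abs(row["timestamp"] - executed_at) for row in rows)
    if nearest <= tolerance:
        return "execution_quote_available"
    return "execution_quote_stale"
```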
Decision Points
| Decision | Why it matters | Can proceed now? |
|---|---|---|
| What source is authoritative for execution quotes? | Fill quality depends on same-contract bid/ask/mid around execution. | Build source-state contract now; source approval still needed. |
| What source is authoritative for forward marks? | 1d/7d/30d outcome P&L depends on sourced marks. | Define job and quality states now; production marks need source validation. |
| What source is authoritative for terminal/expiry marks? | Expired-before-horizon rows need settlement, close, or terminal quote logic. | Define terminal states now; source approval still needed. |
| Are adjusted roots such as 2MSTR and 2COIN supported directly? | Determines whether adjusted rows can be sourced without fallback. | Provider investigation can proceed now. |
| Is standard-root fallback acceptable for any metric family? | Controls whether fallback rows are exploratory or decision-grade. | Keep caveated now; business approval needed for use. |
| What release threshold is acceptable if every gap has a reason code but coverage is incomplete? | 100% source coverage may not be realistic. | Build reason taxonomy now; threshold needs sponsor/team review. |
| What nearest-session policy should apply outside market hours? | Quote timing can change execution and mark interpretation. | Document states now; production policy needs approval. |
Implementation Boundaries
Preparatory work can proceed now:
- document source contracts and provider questions
- define durable job and checkpoint requirements
- define coverage reason buckets
- expose non-mutating readiness diagnostics in later implementation
- keep source-quality states separate from economic values
Implementation should wait for business/source answers before:
- treating proxy or fallback marks as decision-grade
- using adjusted standard-root fallback in sourced aggregates
- displaying realized P&L
- ingesting SMA trades into production scope
- making strong trade warnings dependent on incomplete source families
Acceptance For Preparatory Stage
Preparatory documentation is complete when:
- the reliable measurement backbone is described as a first-class product pillar
- every remaining data gap can be assigned to a named reason bucket
- durable backfill jobs have a clear non-interactive operating contract
- provider-validation questions are explicit
- no documentation implies missing economics should be shown as zero
- no documentation implies a proxy market context is the same as a traded-contract mark