Data Ingestion Readiness Plan

This document defines the preparatory plan for the reliable measurement backbone. It covers ENG-5169, ENG-5239, and ENG-5240.

It is deliberately a documentation and specification artifact. It does not yet change ingestion behavior, database schema, background jobs, or API responses.

Purpose

The measurement layer needs to answer one operational question before the model layer can be trusted:

Are we collecting, normalizing, and deriving the data needed for the model
and portfolio analytics every day, and can we explain every gap?

Gordon validated the dashboard as a useful consolidated display, but the next phase depends on source completeness. The system should not simply report that a value is missing. It should explain whether the gap is caused by identity mapping, provider coverage, entitlement, market hours, adjusted contracts, expired horizons, unsupported fields, or a retryable ingestion failure.

Scope

| Ticket | Scope | Preparatory output |
| --- | --- | --- |
| ENG-5169 | Reliable Amberdata ingestion planning for execution quotes, forward marks, terminal marks, liquidity, and readiness diagnostics. | Source contract, ingestion domains, provider questions, and readiness gates. |
| ENG-5239 | Durable checkpointed ingestion jobs. | Job contract for resumable, idempotent, non-interactive ingestion. |
| ENG-5240 | Coverage probe and reason taxonomy. | Reason buckets that explain missing data without mutating analytics. |

Current Foundation

The system already captures daily Amberdata snapshots for the core crypto instruments:

  • Greeks
  • IV surface
  • spot
  • open interest
  • BTC and ETH snapshot history
  • Alpha trades for ICOI and IMST

This is the correct foundation. The next readiness layer should make the data capture auditable and repeatable enough that model pages can distinguish:

  • data that is loaded and fresh
  • data that is stale
  • data that is missing because the provider returned no rows
  • data that is missing because the contract identity is unresolved
  • data that is unsupported by the chosen provider
  • data that is pending because the horizon has not matured
  • data that exists only as a proxy, fallback, or estimate

Source Domains

| Domain | Needed for | Current policy |
| --- | --- | --- |
| Amberdata BTC/ETH daily snapshots | Market setup, regime, surface, term structure, probability model. | Continue daily capture and surface coverage diagnostics. |
| Amberdata TradFi option quotes | MSTR/COIN execution benchmark and forward marks. | Use only for traded-contract execution and outcome marks when identity and source coverage are valid. |
| Alpha trade feed | ICOI and IMST trade rows. | Current production trade source. Keep idempotent source IDs and portfolio scope. |
| Future SMA feed | SMA copilot and restructuring evaluation. | Deferred until field contract and derivative-equivalent mapping are reviewed. |
| Accounting/PMS/broker/settlement | Realized P&L and terminal outcomes. | Required before realized P&L becomes decision-grade. |
| Model estimates | Exploration, scenario analysis, fallback sensitivity. | Must remain visibly estimated and excluded from sourced aggregates by default. |
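The "excluded from sourced aggregates by default" policy for model estimates can be illustrated with a minimal sketch. The `Observation` type, the quality labels (`sourced`, `proxy`, `estimated`), and the `sourced_total` helper are hypothetical names chosen for illustration; they are not part of any existing schema.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Observation:
    value: float
    source: str   # hypothetical label, e.g. "amberdata" or "model_estimate"
    quality: str  # hypothetical label: "sourced", "proxy", or "estimated"


def sourced_total(observations: Iterable[Observation],
                  include_estimates: bool = False) -> float:
    """Sum values, leaving estimates out unless the caller opts in.

    Proxy rows are never included in this sketch; whether a proxy may
    enter any aggregate is a business decision the plan leaves open.
    """
    allowed = {"sourced"}
    if include_estimates:
        allowed.add("estimated")
    return sum(o.value for o in observations if o.quality in allowed)
```

The opt-in flag keeps the default path safe: a caller has to ask explicitly before an estimate can reach an aggregate.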

Durable Job Contract

Any broad historical or rolling ingestion process should run as a durable job, not as a browser-bound or interactive admin request.

Durable jobs should be:

  • checkpointed: store progress by source, portfolio, instrument, date window, and job stage
  • idempotent: rerunning a job should update the same logical observations without duplicating rows
  • resumable: failures should restart from the last completed checkpoint
  • bounded: request windows should respect provider limits, including one-hour quote windows where applicable
  • auditable: record source URL family, provider status, row counts, timestamps, retry count, and final reason state
  • non-blocking: long-running backfills should not depend on an open browser or a single request lifecycle
  • source-labelled: every written observation should carry source and quality metadata
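A minimal sketch of how the checkpointed, idempotent, resumable, bounded, and source-labelled properties could fit together. Everything here is an assumption made for illustration: the `JobKey` layout, the `fetch` callable, and the in-memory `checkpoints` and `store` dictionaries stand in for durable tables and a real provider client that this plan does not yet define.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterator, List, Tuple

Window = Tuple[datetime, datetime]
JobKey = Tuple[str, str, str, str]  # hypothetical: (source, portfolio, instrument, stage)


def hourly_windows(start: datetime, end: datetime) -> Iterator[Window]:
    """Split [start, end) into windows of at most one hour (bounded)."""
    cursor = start
    while cursor < end:
        nxt = min(cursor + timedelta(hours=1), end)
        yield cursor, nxt
        cursor = nxt


def run_backfill(job_key: JobKey,
                 start: datetime,
                 end: datetime,
                 fetch: Callable[[datetime, datetime], List[dict]],
                 checkpoints: Dict[JobKey, datetime],
                 store: Dict[tuple, dict]) -> None:
    """Process each window in order, resuming after the last checkpoint."""
    resume_from = checkpoints.get(job_key, start)
    for w_start, w_end in hourly_windows(resume_from, end):
        # If fetch raises, the checkpoint is not advanced, so a rerun
        # retries exactly this window.
        rows = fetch(w_start, w_end)
        for row in rows:
            # Upsert keyed by logical identity: a rerun overwrites the same
            # observation instead of duplicating it (idempotent).
            key = (job_key[0], row["contract"], row["ts"])
            store[key] = {**row, "source": job_key[0]}  # source-labelled
        checkpoints[job_key] = w_end  # checkpointed per completed window
```

Advancing the checkpoint only after a window's rows are written means a crash can repeat work but cannot skip it, which is why the write itself has to be idempotent.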

Job Matrix

| Job | Primary purpose | Inputs | Output state |
| --- | --- | --- | --- |
| Daily market snapshot | Maintain BTC/ETH Greeks, IV surface, spot, and OI coverage. | Currency, snapshot date, Amberdata endpoints. | available, partial, stale, unavailable, not_entitled. |
| Execution quote backfill | Source bid/ask/mid near Alpha trade execution. | Trade identity, execution timestamp, tolerance. | quoted, stale, unavailable, identity_unmapped, unsupported, not_entitled. |
| Forward mark backfill | Source 1d/7d/30d marks for traded contracts. | Trade identity, horizon, target timestamp. | available, pending, unavailable, expired_before_horizon, terminal_mark_available. |
| Terminal mark backfill | Source expiry or terminal quote/settlement for expired-before-horizon rows. | Trade identity, expiry timestamp, terminal source. | terminal_mark_available, terminal_mark_unavailable, settlement_required. |
| Liquidity observation backfill | Source traded-contract OI, volume, spread, quote age. | Trade identity, timestamp, provider fields. | available, partial, field_not_returned, unavailable. |
| Coverage probe | Explain why requested analytics can or cannot be sourced. | Trade set, identity map, provider probes. | Reason buckets only; no economics mutation. |
| SMA source validation | Validate future SMA feed fields before ingestion. | SMA extract sample and mapping rules. | ready, needs_mapping, needs_source_field, unsupported. |
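One way to keep job outputs within the states listed in the matrix is a small lookup that rejects anything else at write time. This is a sketch only: the snake_case job keys and the `validate_output_state` helper are hypothetical, and the coverage probe is omitted because it returns reason buckets, not a single output state.

```python
from typing import Dict, Set

# Allowed output states per job, transcribed from the job matrix.
# The job keys are hypothetical identifiers chosen for this sketch.
JOB_OUTPUT_STATES: Dict[str, Set[str]] = {
    "daily_market_snapshot": {
        "available", "partial", "stale", "unavailable", "not_entitled"},
    "execution_quote_backfill": {
        "quoted", "stale", "unavailable", "identity_unmapped",
        "unsupported", "not_entitled"},
    "forward_mark_backfill": {
        "available", "pending", "unavailable", "expired_before_horizon",
        "terminal_mark_available"},
    "terminal_mark_backfill": {
        "terminal_mark_available", "terminal_mark_unavailable",
        "settlement_required"},
    "liquidity_observation_backfill": {
        "available", "partial", "field_not_returned", "unavailable"},
    "sma_source_validation": {
        "ready", "needs_mapping", "needs_source_field", "unsupported"},
}


def validate_output_state(job: str, state: str) -> str:
    """Return the state unchanged, or raise if the matrix does not allow it."""
    allowed = JOB_OUTPUT_STATES.get(job)
    if allowed is None:
        raise KeyError(f"unknown job: {job}")
    if state not in allowed:
        raise ValueError(f"{state!r} is not a valid output state for {job}")
    return state
```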

Coverage Reason Taxonomy

Coverage probes should return explicit reason buckets. They should not mutate trade economics or invent marks.

| Reason bucket | Meaning | Operator implication |
| --- | --- | --- |
| identity_mapped | Trade resolved to an approved derivative-equivalent identity. | Source lookup may proceed. |
| identity_unmapped | Required contract terms or mapping approval are missing. | Analytics should be withheld until mapped. |
| adjusted_unsupported | Adjusted root or deliverable cannot be sourced through the current provider path. | Needs authoritative adjusted-contract mapping or another source. |
| proxy_used | A proxy or fallback was used for a limited analytic context. | Display proxy caveat and exclude from unsupported metric families. |
| provider_discovery_required | Provider symbol format, endpoint, or entitlement is uncertain. | Engineering/provider validation needed. |
| execution_quote_available | Fresh same-contract bid/ask/mid exists near execution. | Execution benchmark can be sourced. |
| execution_quote_stale | Quote exists but outside tolerance. | Exclude from fresh execution aggregates or show stale caveat. |
| execution_quote_empty | Provider returned success with no quote rows. | Treat as unavailable unless provider explains alternate query requirements. |
| execution_quote_not_entitled | Provider denies access. | Entitlement or alternate source required. |
| liquidity_field_missing | Provider returned quote but not OI or volume. | Liquidity is partial; do not show zero OI/volume. |
| forward_mark_available | Mark exists at requested horizon. | Forward outcome can be computed for that row. |
| forward_mark_pending | Horizon has not matured yet. | Show pending, not unavailable or zero P&L. |
| forward_mark_unavailable | Horizon matured but no sourced mark exists. | Exclude from forward aggregates. |
| expired_before_horizon | Requested horizon occurs after option expiry. | Attempt terminal mark; do not look for impossible live quote. |
| terminal_mark_available | Terminal quote or settlement source exists. | Expiry-aware outcome can be sourced. |
| terminal_mark_unavailable | No terminal source exists. | Keep outcome unavailable. |
| outside_market_hours | Target timestamp falls outside regular market session. | Use approved nearest-session policy or keep unavailable. |
| retry_exhausted | Provider requests failed after retry policy. | Keep unavailable with retry metadata. |
| unsupported_structure | Instrument structure is outside current model/source support. | Withhold decision-grade analytics. |
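The forward-mark and terminal-mark buckets above can be sketched as a pure classification function. The function name, its parameters, and the ordering of the checks are assumptions made for illustration. The sketch only returns reason buckets and never computes or alters a mark, in line with the non-mutation rule. It also does not treat "expiry has not happened yet" as a separate case for terminal marks; that policy is left open here.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def forward_mark_reasons(executed_at: datetime,
                         expiry: datetime,
                         horizon_days: int,
                         now: datetime,
                         mark: Optional[float],
                         terminal_mark: Optional[float]) -> List[str]:
    """Return reason buckets for one trade row at one horizon."""
    target = executed_at + timedelta(days=horizon_days)
    if target > expiry:
        # The horizon falls after expiry, so no live quote can exist at the
        # target time. Only a terminal mark can resolve the row.
        outcome = ("terminal_mark_available" if terminal_mark is not None
                   else "terminal_mark_unavailable")
        return ["expired_before_horizon", outcome]
    if now < target:
        # Not matured yet: pending, never unavailable or zero P&L.
        return ["forward_mark_pending"]
    if mark is not None:
        return ["forward_mark_available"]
    return ["forward_mark_unavailable"]
```

For example, a 7d horizon on a contract that expires three days after execution yields `expired_before_horizon` plus a terminal-mark bucket, while a 1d horizon checked two hours after execution yields `forward_mark_pending`.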

Data Readiness Output

Every readiness output should be intelligible to an operator. A useful response is not just a percentage. It should say what is ready, what is missing, and what action is required.

Minimum readiness shape:

data_readiness = {
  scope,
  as_of,
  requested_rows,
  analyzable_rows,
  blocked_rows,
  pending_rows,
  estimated_rows,
  proxy_rows,
  reason_counts,
  source_counts,
  oldest_source_timestamp,
  newest_source_timestamp,
  next_action
}
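As a sketch, the shape above could be assembled from per-row probe results as follows. The `RowStatus` type, the `quality` labels, and the rule that assigns each row to exactly one of analyzable, blocked, pending, estimated, or proxy are assumptions for illustration; the plan does not yet define how rows are partitioned.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class RowStatus:
    reason: str                      # a reason bucket from the taxonomy
    source: Optional[str] = None     # e.g. "amberdata" (hypothetical label)
    quality: str = "sourced"         # hypothetical: sourced, proxy, estimated
    source_timestamp: Optional[datetime] = None


# Assumed partition rule: these reasons make a row analyzable or pending.
ANALYZABLE = {"execution_quote_available", "forward_mark_available",
              "terminal_mark_available"}
PENDING = {"forward_mark_pending"}


def build_readiness(scope: str, as_of: datetime, rows: List[RowStatus],
                    next_action: str) -> dict:
    """Summarize probe results without touching any economic value."""
    buckets: Counter = Counter()
    for row in rows:
        if row.reason in PENDING:
            buckets["pending"] += 1
        elif row.quality == "estimated":
            buckets["estimated"] += 1
        elif row.quality == "proxy" or row.reason == "proxy_used":
            buckets["proxy"] += 1
        elif row.reason in ANALYZABLE:
            buckets["analyzable"] += 1
        else:
            buckets["blocked"] += 1
    stamps = [r.source_timestamp for r in rows if r.source_timestamp]
    return {
        "scope": scope,
        "as_of": as_of,
        "requested_rows": len(rows),
        "analyzable_rows": buckets["analyzable"],
        "blocked_rows": buckets["blocked"],
        "pending_rows": buckets["pending"],
        "estimated_rows": buckets["estimated"],
        "proxy_rows": buckets["proxy"],
        "reason_counts": dict(Counter(r.reason for r in rows)),
        "source_counts": dict(Counter(r.source for r in rows if r.source)),
        "oldest_source_timestamp": min(stamps) if stamps else None,
        "newest_source_timestamp": max(stamps) if stamps else None,
        "next_action": next_action,
    }
```

Because every row lands in exactly one bucket, the five row counts always sum to `requested_rows`, which gives operators a quick consistency check.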

Example operator wording:

Execution benchmark coverage is blocked mostly by adjusted-contract identity and
provider-empty quote responses. Forward 7d outcomes are additionally blocked by
expired-before-horizon rows that need terminal marks.

Provider Validation Questions

These questions can be investigated before Gordon makes business decisions because they do not force an economic assumption.

| Question | Why it matters |
| --- | --- |
| What symbol format should be used for MSTR/COIN historical option quotes? | Prevents false unavailable states caused by malformed identifiers. |
| Does Amberdata support adjusted roots such as 2MSTR and 2COIN? | Controls whether adjusted Alpha rows can become decision-grade from Amberdata alone. |
| What does HTTP 200 with empty data mean for a valid historical option lookup? | Distinguishes no coverage from no trades from wrong query format. |
| Are contract OI and volume available from the same level-1 endpoint or a different endpoint? | Determines whether execution-liquidity can become complete. |
| What are the historical retention limits by endpoint? | Controls backfill feasibility and default lookback. |
| Can the provider return terminal close/settlement values for expired options? | Controls expiry-aware forward outcome coverage. |
| What batching and rate limits apply to one-hour quote requests? | Required for durable backfills and checkpoint sizing. |
| How are market holidays and outside-market-hours timestamps represented? | Required for nearest-session policy. |
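Until the provider explains what HTTP 200 with empty data means, a probe can at least keep that case distinct from entitlement denial and from exhausted retries. The sketch below assumes that 401 and 403 signal an entitlement denial and that any other non-200 status is retryable. Both are unverified assumptions about the provider, and `classify_quote_response` is a hypothetical helper, not an existing function. Freshness against the execution tolerance is a separate check and is not covered here.

```python
from typing import Optional, Sequence


def classify_quote_response(status_code: int,
                            rows: Sequence[dict],
                            attempts: int,
                            max_attempts: int) -> Optional[str]:
    """Map one provider response to a reason bucket.

    Returns None when no final reason applies yet and the caller
    should retry under its retry policy.
    """
    if status_code in (401, 403):
        # Assumption: the provider signals missing entitlement this way.
        return "execution_quote_not_entitled"
    if status_code == 200:
        # Success with no rows stays its own bucket; it is not collapsed
        # into a generic "unavailable" before the provider clarifies it.
        return "execution_quote_available" if rows else "execution_quote_empty"
    if attempts >= max_attempts:
        return "retry_exhausted"
    return None
```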

Decision Points

| Decision | Why it matters | Can proceed now? |
| --- | --- | --- |
| What source is authoritative for execution quotes? | Fill quality depends on same-contract bid/ask/mid around execution. | Build source-state contract now; source approval still needed. |
| What source is authoritative for forward marks? | 1d/7d/30d outcome P&L depends on sourced marks. | Define job and quality states now; production marks need source validation. |
| What source is authoritative for terminal/expiry marks? | Expired-before-horizon rows need settlement, close, or terminal quote logic. | Define terminal states now; source approval still needed. |
| Are adjusted roots such as 2MSTR and 2COIN supported directly? | Determines whether adjusted rows can be sourced without fallback. | Provider investigation can proceed now. |
| Is standard-root fallback acceptable for any metric family? | Controls whether fallback rows are exploratory or decision-grade. | Keep caveated now; business approval needed for use. |
| What release threshold is acceptable if every gap has a reason code but coverage is incomplete? | 100% source coverage may not be realistic. | Build reason taxonomy now; threshold needs sponsor/team review. |
| What nearest-session policy should apply outside market hours? | Quote timing can change execution and mark interpretation. | Document states now; production policy needs approval. |

Implementation Boundaries

Preparatory work can proceed now:

  • document source contracts and provider questions
  • define durable job and checkpoint requirements
  • define coverage reason buckets
  • specify the non-mutating readiness diagnostics to expose in a later implementation
  • keep source-quality states separate from economic values

Implementation should wait for business/source answers before:

  • treating proxy or fallback marks as decision-grade
  • using standard-root fallback for adjusted contracts in sourced aggregates
  • displaying realized P&L
  • ingesting SMA trades into production scope
  • making strong trade warnings dependent on incomplete source families

Acceptance For Preparatory Stage

Preparatory documentation is complete when:

  • the reliable measurement backbone is described as a first-class product pillar
  • every remaining data gap can be assigned to a named reason bucket
  • durable backfill jobs have a clear non-interactive operating contract
  • provider-validation questions are explicit
  • no documentation implies missing economics should be shown as zero
  • no documentation implies a proxy market context is the same as a traded-contract mark