Operator Runbook
This runbook describes production workflows for checking health, syncing Alpha trades, running sync diagnostics, triggering snapshots, dispatching alerts, and triaging missing numbers.
Production URLs
| Service | Canonical URL |
|---|---|
| Frontend | https://vol-frontend-244916812493.us-east4.run.app |
| Backend | https://vol-backend-244916812493.us-east4.run.app |
Use canonical project-number URLs in docs, tickets, and handoff notes. Treat *.a.run.app as Cloud Run internal output only.
Health Check
Backend:
curl -s -o /dev/null -w '%{http_code}\n' \
https://vol-backend-244916812493.us-east4.run.app/api/health
Expected:
200
Frontend:
curl -s -o /dev/null -w '%{http_code}\n' \
https://vol-frontend-244916812493.us-east4.run.app/
Expected:
200
Alpha Trade Sync Workflow
Use /trades in the frontend.
1. Check Sync Scope
The OMS/PMS Sync card should show:
- Portfolios: IMST, ICOI
- Include pending: no by default
- Last successful sync timestamp
- Watermark
If the card shows a configuration error, check backend env vars and Secret Manager references.
2. Incremental Preview
Keep Dry run checked and leave the full-scope diagnostic unchecked.
Expected if no new Alpha trades exist:
- Fetched = 0
- Normalized = 0
- Rejected = 0
- Premium delta = 0.00
- Quantity delta = 0.00
- Sync diagnostic scope is incremental delta
Use incremental preview to check whether there are new or changed source rows.
3. Full Sync Diagnostic Preview
Check Dry run and the full-scope diagnostic.
Use this when you need to confirm the selected Alpha scope is fully loaded, or after any sync logic change.
Expected for a clean loaded scope:
- Source rows equals cached rows
- Source quantity equals cached quantity
- Source premium USD equals cached premium USD
- Quantity delta is 0.0
- Premium delta USD is 0.0
- No unmatched source IDs
- No unmatched local IDs
4. Real Sync
Only turn Dry run off after the preview is acceptable.
After a real sync:
- Last successful sync time should update.
- Watermark should advance if new source rows were processed.
- Trade table should show Alpha rows with portfolio, status, recon status, price per contract, multiplier, premium, and fees.
Current Verified Alpha Scope
The current production scope is:
portfolio_names=ICOI,IMST|include_pending=false
Known clean full-scope sync diagnostic evidence:
source_row_count: 749
local_scope_row_count: 749
source_total_quantity: 269267.0
local_total_quantity: 269267.0
quantity_delta: 0.0
source_total_premium_usd: 466630651.75
local_total_premium_usd: 466630651.75
premium_delta_usd: 0.0
unmatched_source_trade_ids: []
unmatched_local_trade_ids: []
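The clean-scope criteria above can be checked mechanically. A minimal sketch, assuming the diagnostic payload uses the field names shown in the evidence block (the tolerance value is an assumption, not a documented backend constant):

```python
# Illustrative clean-scope check for a full sync diagnostic payload.
# Field names mirror the evidence block above; tol is an assumed tolerance.

def diagnostic_problems(diag: dict, tol: float = 1e-9) -> list:
    """Return human-readable problems; an empty list means the scope is clean."""
    problems = []
    if diag["source_row_count"] != diag["local_scope_row_count"]:
        problems.append("row count mismatch")
    if abs(diag["quantity_delta"]) > tol:
        problems.append(f"quantity delta {diag['quantity_delta']}")
    if abs(diag["premium_delta_usd"]) > tol:
        problems.append(f"premium delta {diag['premium_delta_usd']}")
    if diag["unmatched_source_trade_ids"]:
        problems.append("unmatched source trade IDs")
    if diag["unmatched_local_trade_ids"]:
        problems.append("unmatched local trade IDs")
    return problems

clean = {
    "source_row_count": 749, "local_scope_row_count": 749,
    "quantity_delta": 0.0, "premium_delta_usd": 0.0,
    "unmatched_source_trade_ids": [], "unmatched_local_trade_ids": [],
}
assert diagnostic_problems(clean) == []
```

Any non-empty result means the scope is not clean and a real sync should not be run.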
Trade Analytics Backfill And Freshness
Real Alpha syncs from /trades now run this enrichment pipeline automatically after the Alpha rows are written. Use the manual checks below after deploying trade-performance analytics changes, when investigating a partial post-sync analytics result, or when the /performance page reports stale or missing enrichment.
The pipeline enriches synced Alpha executions with:
- trade-time dashboard market context
- execution benchmark quality
- persisted forward outcome marks
- quality counts for missing or stale enrichment
- freshness state for the last successful analytics run
- derived-on-read opportunity alignment
The analytics backfill reuses internal_trade_sync_state under the trade_analytics_pipeline source. It is safe to re-run because the underlying enrichment tables upsert by source trade ID and context key.
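The re-run safety claim above comes down to keyed upserts. A minimal sketch of the idea, with illustrative table and key names (the real schema is not shown here):

```python
# Why re-running the backfill is safe: enrichment rows are upserted by
# (source_trade_id, context_key), so a repeat run overwrites in place
# instead of duplicating rows. Names here are illustrative stand-ins.

enrichment_table = {}  # keyed by (source_trade_id, context_key)

def upsert_enrichment(source_trade_id, context_key, payload):
    enrichment_table[(source_trade_id, context_key)] = payload

upsert_enrichment("T-1", "market_context", {"spot": 101.2})
upsert_enrichment("T-1", "market_context", {"spot": 101.3})  # re-run: overwrite
assert len(enrichment_table) == 1
```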
1. Check Trade Analytics Status
curl -sS \
'https://vol-backend-244916812493.us-east4.run.app/api/internal/trades/analytics/status?source=alpha_sync'
Optional portfolio-scoped status:
curl -sS \
'https://vol-backend-244916812493.us-east4.run.app/api/internal/trades/analytics/status?source=alpha_sync&portfolio=IMST'
Read the response as:
- quality.total_trades: Alpha rows in the selected scope.
- quality.missing_context: rows without trade-time market context.
- quality.stale_context: rows where the nearest market context was outside tolerance.
- quality.unmapped_context: rows whose Alpha underlying does not map to a dashboard market proxy.
- quality.missing_benchmark: rows without an execution benchmark record.
- quality.unavailable_benchmark: rows where no defensible quote/model mark exists.
- quality.available_outcomes: rows with a sourced 7d forward mark and computable P&L.
- quality.pending_outcomes: rows where the 7d horizon has not matured yet.
- quality.unavailable_outcomes: rows without sourced outcome marks.
- freshness.last_successful_sync_at: last completed analytics backfill state write.
- freshness.last_error: last non-dry-run pipeline error, if any.
Unavailable counts are quality states, not zero-valued metrics. Do not interpret missing context, unavailable benchmarks, or unavailable outcomes as neutral performance.
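A sketch of reading the counts this way, assuming the field names from the status response above: coverage is computed only over rows with usable enrichment, and unavailable rows are disclosed separately instead of being folded in as zeros.

```python
# Illustrative interpretation of outcome quality counts: unavailable rows
# are a disclosed quality state, never a 0-valued metric.

def outcome_coverage(quality: dict) -> dict:
    total = quality["total_trades"]
    return {
        "coverage_pct": 100.0 * quality["available_outcomes"] / total if total else None,
        "pending": quality["pending_outcomes"],
        "unavailable": quality["unavailable_outcomes"],  # disclosed, not 0 P&L
    }

q = {"total_trades": 749, "available_outcomes": 500,
     "pending_outcomes": 49, "unavailable_outcomes": 200}
summary = outcome_coverage(q)
```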
2. Dry-Run The Backfill
ADMIN_SECRET="$(gcloud secrets versions access latest --secret=vol-admin-secret)"
curl -sS -X POST \
https://vol-backend-244916812493.us-east4.run.app/api/internal/trades/analytics/backfill \
-H "x-admin-secret: ${ADMIN_SECRET}" \
-H 'Content-Type: application/json' \
-d '{
"source": "alpha_sync",
"limit": 500,
"dry_run": true,
"include_context": true,
"include_execution": true,
"include_outcomes": false
}'
Expected:
- status is ok or partial_error.
- rows_processed shows rows processed by persisted enrichment steps.
- steps lists market_context, execution_benchmarks, opportunity_alignment, and outcome_attribution.
- opportunity_alignment is derived_on_read; it recomputes from persisted context rows when the API is requested.
- outcome_attribution reads persisted marks when present. It remains blank/unavailable until sourced marks are backfilled.
- Dry run does not update freshness state.
- If one step fails, the others still run and the failed step is explicit in errors.
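Triaging the response can be scripted. A hedged sketch, assuming errors entries carry a step field (the exact payload shape may differ):

```python
# Sketch of triaging a backfill response: partial_error still reports the
# steps that ran, and failed steps are listed explicitly in "errors".

def triage_backfill(resp: dict):
    ok = resp.get("status") == "ok"
    failed_steps = [e["step"] for e in resp.get("errors", [])]
    return ok, failed_steps

resp = {"status": "partial_error", "rows_processed": 480,
        "errors": [{"step": "execution_benchmarks", "message": "timeout"}]}
ok, failed = triage_backfill(resp)
assert not ok and failed == ["execution_benchmarks"]
```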
3. Run The Backfill
Only run this after the dry-run output is acceptable:
curl -sS -X POST \
https://vol-backend-244916812493.us-east4.run.app/api/internal/trades/analytics/backfill \
-H "x-admin-secret: ${ADMIN_SECRET}" \
-H 'Content-Type: application/json' \
-d '{
"source": "alpha_sync",
"limit": 500,
"dry_run": false,
"include_context": true,
"include_execution": true,
"include_outcomes": false
}'
After the real run, repeat the status check and verify:
- freshness.last_successful_sync_at updated when there were no pipeline errors.
- freshness.last_error is empty for a clean run.
- freshness.last_run_summary.rows_processed reflects the latest non-dry-run enrichment batch.
- Missing/stale/unavailable counts are understood and documented before relying on Performance.
- /performance loads and discloses quality states rather than converting unavailable values to zero.
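The freshness verification can be expressed as a single predicate. A sketch under the assumption that timestamps are ISO-8601 strings and the field names match the status response:

```python
# Illustrative post-run freshness check: a run is clean only when last_error
# is empty and last_successful_sync_at advanced past the pre-run value.

from datetime import datetime

def freshness_is_clean(freshness: dict, previous_sync_at=None) -> bool:
    if freshness.get("last_error"):
        return False
    current = freshness.get("last_successful_sync_at")
    if current is None:
        return False
    if previous_sync_at is None:
        return True
    return datetime.fromisoformat(current) > datetime.fromisoformat(previous_sync_at)
```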
4. Backfill Forward Outcome Marks
Forward marks are source-intensive because the backend looks up TradFi option quotes for the actual MSTR/COIN contract around each horizon. Run this as its own bounded batch, starting with dry run:
curl -sS -X POST \
https://vol-backend-244916812493.us-east4.run.app/api/internal/trades/outcomes/backfill \
-H "x-admin-secret: ${ADMIN_SECRET}" \
-H 'Content-Type: application/json' \
-d '{
"source": "alpha_sync",
"limit": 100,
"lookback_days": 365,
"horizons": [1, 7, 30],
"dry_run": true,
"quote_window_minutes": 30,
"max_mark_staleness_minutes": 60
}'
If the dry run shows acceptable source coverage, rerun with "dry_run": false.
Interpretation:
- unrealized_mark means a sourced mark and P&L are available.
- pending means the horizon has not matured.
- unavailable means no fresh sourced mark or required economic input exists.
- stale mark quality means a quote existed but was outside tolerance and is excluded from P&L.
- Missing or stale marks are excluded from averages; they are not zero P&L.
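The exclusion rule above can be sketched directly: only usable sourced marks contribute to an average, so excluded rows never dilute it toward zero (row shape is illustrative, not the real schema):

```python
# Averaging rule for forward marks: pending, unavailable, and stale rows are
# excluded entirely rather than averaged in as 0.0 P&L.

def average_pnl(rows: list):
    usable = [r["pnl"] for r in rows if r["state"] == "unrealized_mark"]
    return sum(usable) / len(usable) if usable else None

rows = [
    {"state": "unrealized_mark", "pnl": 120.0},
    {"state": "unrealized_mark", "pnl": -20.0},
    {"state": "pending", "pnl": None},      # horizon not matured
    {"state": "unavailable", "pnl": None},  # no fresh sourced mark
]
assert average_pnl(rows) == 50.0  # (120 - 20) / 2, not diluted by excluded rows
```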
5. Operational Sequence
Standard production sequence:
- Run Alpha full-scope sync diagnostic preview on /trades.
- Run Alpha real sync if the preview is clean.
- Review the post-sync analytics summary shown on /trades.
- Verify /performance.
- If the post-sync analytics summary is partial or failed, check analytics status.
- Dry-run analytics backfill.
- Run analytics backfill if the dry run is acceptable.
- Re-check analytics status.
- Verify /performance again.
Portfolio filtering scopes execution and outcome backfills. Use the global source=alpha_sync run for production refreshes unless you are deliberately refreshing one portfolio.
Trigger Daily Snapshot Manually
Snapshots are normally scheduled through Cloud Scheduler with an OIDC token from the compute service account. The scheduled/default run targets the latest complete UTC date (today - 1). To trigger manually, use the admin endpoint with the backend admin secret.
ADMIN_SECRET="$(gcloud secrets versions access latest --secret=vol-admin-secret)"
curl -sS -X POST \
https://vol-backend-244916812493.us-east4.run.app/api/admin/snapshot \
-H "x-admin-secret: ${ADMIN_SECRET}" \
-H 'Content-Type: application/json' \
-d '{}'
Pass target_date=YYYY-MM-DD only when you intentionally want a specific UTC snapshot date. Avoid broad historical backfills through the generic admin route unless the snapshot type is known to be historical/as-of safe.
Do not print the admin secret.
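The default date rule above (latest complete UTC date, today - 1) can be made explicit. A minimal sketch; the function name is illustrative, not a backend symbol:

```python
# Default snapshot date: the scheduled/default run targets the latest
# complete UTC date, i.e. today - 1 in UTC.

from datetime import datetime, timedelta, timezone

def default_snapshot_date(now=None) -> str:
    now = now or datetime.now(timezone.utc)
    return (now.date() - timedelta(days=1)).isoformat()

# A run shortly after midnight UTC on 2024-06-15 snapshots 2024-06-14.
assert default_snapshot_date(datetime(2024, 6, 15, 0, 30, tzinfo=timezone.utc)) == "2024-06-14"
```

Pass an explicit target_date only when you intend to override this default.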
Verify Snapshot-Backed Pages
Start with /data. It is the first-class snapshot observability page and should show:
- Latest status for gex, iv_surface, vol_metrics, spot, and oi by currency.
- Freshness, source timestamp, record count, created time, and suspicious count/freshness warnings.
- Latest status is evaluated against the latest complete UTC snapshot date (today - 1). Current-day partial rows can still appear in the coverage matrix, but should not make the latest-health cards look stale or suspicious.
- Recent coverage matrix by UTC snapshot date so missing or empty rows are visible without running SQL.
- Probability readiness for BTC/ETH/SOL 7d and 30d, including density_quality.state, area, input points, surface timestamp, and spot source.
- A remediation-history panel. In the first pass this is derived from snapshot tables only; durable manual backfill audit history still needs deployment/runbook notes until a persisted remediation event table is added.
After snapshot changes or backfills, check:
- /data: snapshot capture health, coverage gaps, and probability readiness.
- /analytics: gamma profile, live GEX magnet/repeller levels with freshness state, thresholds, spot-vol, smile tracking, regime stationarity.
- /probability: implied density, conditional density, touch probabilities, and the experimental portfolio candidate screen. The regime overlay uses stored gex_hourly history, not Model Diagnostics live strike-level GEX.
- /flow: GEX time series and conditional returns.
No-data states usually mean one of:
- Not enough snapshot history.
- Latest snapshot missing required detail rows.
- Spot history missing from gex_hourly.
- Amberdata endpoint returned no payload for the selected currency/period.
Alert Signals And Slack Dispatch
On /analytics, the Alert Signals card can refresh current alert candidates and dispatch to Slack through an admin route.
Safe workflow:
- Click refresh.
- Review alert text and severity.
- Use dry run first where available.
- Dispatch only if alert text is acceptable.
If Slack dispatch fails:
- Check backend Slack webhook configuration.
- Check that the admin proxy path allowlist includes /api/alerts/dispatch.
- Check backend logs for upstream HTTP errors.
Missing Number Triage
Overview price missing
Check:
- /api/reference-rates?asset=btc
- If reference rates return forbidden/unavailable, the backend should fall back to the derivatives metrics underlying price where possible.
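The fallback order can be sketched as below. The fetchers are stand-in callables, not real backend functions, and PermissionError stands in for a forbidden upstream response:

```python
# Illustrative overview-price fallback: reference rate first, then the
# derivatives metrics underlying price, else no price at all.

def overview_price(fetch_reference_rate, fetch_derivatives_underlying):
    try:
        price = fetch_reference_rate()
        if price is not None:
            return price, "reference_rate"
    except PermissionError:  # forbidden/unavailable upstream response
        pass
    fallback = fetch_derivatives_underlying()
    if fallback is not None:
        return fallback, "derivatives_underlying"
    return None, "unavailable"  # blank the number rather than guess
```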
Volume missing
Check:
- /api/volume-24h?currency=BTC
- Backend aggregates Amberdata level-1 quotes by volumeUSD and contract volume.
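A minimal sketch of that aggregation, assuming rows carry a volumeUSD field and a contract-volume field named volume (the exact row shape is an assumption):

```python
# Illustrative 24h volume aggregation across level-1 quote rows.

def aggregate_volume(rows: list) -> dict:
    return {
        "volume_usd": sum(r.get("volumeUSD", 0.0) for r in rows),
        "contract_volume": sum(r.get("volume", 0.0) for r in rows),
    }

rows = [{"volumeUSD": 1000.0, "volume": 2.0}, {"volumeUSD": 500.0, "volume": 1.5}]
assert aggregate_volume(rows) == {"volume_usd": 1500.0, "contract_volume": 3.5}
```

An empty payload aggregates to zeros, which is distinct from the endpoint returning no payload at all.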
IV/RV or probability no-data
Check:
- Snapshot history exists for iv_surface.
- gex_hourly has recent spot rows.
- Requested DTE has enough surface points.
- Probability density quality passed. Degraded density usually means the selected IV surface is sparse, malformed, or still backed by old mixed-timestamp rows.
For IV surface remediation after the timestamp-preservation fix:
- Back up or export Cloud SQL before changing historical rows.
- Deploy the backend migration and snapshotter fix.
- Re-run iv_surface snapshots for affected currencies/dates, for example through the admin snapshot route for recent dates or back-end/backfill.py --types iv_surface for a controlled historical range.
- Confirm iv_surface_snapshots has distinct source timestamps per day and that probability API responses report density_quality.state = ok.
- If Amberdata retention prevents rebuilding older dates, treat those historical probability outputs as unavailable/degraded rather than using mixed surfaces.
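The distinct-timestamps-per-day check can be scripted against exported rows. A hedged sketch with an illustrative row shape (a real check would query iv_surface_snapshots directly):

```python
# Flag snapshot days whose source timestamp also appears on another day,
# which indicates pre-fix mixed-timestamp rows rather than per-day surfaces.

from collections import defaultdict

def days_with_shared_timestamps(rows: list) -> list:
    days_by_ts = defaultdict(set)
    for r in rows:
        days_by_ts[r["source_ts"]].add(r["snapshot_date"])
    shared = set()
    for days in days_by_ts.values():
        if len(days) > 1:
            shared |= days
    return sorted(shared)
```

An empty result means every snapshot day carries its own source timestamp.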
Trade P&L blank
This may be correct. P&L should remain blank when required valuation inputs are missing or untrustworthy.
For Alpha rows, current known missing inputs include:
- traded_iv
- spot_at_trade
- sourced execution-time bid/ask quote for the exact MSTR/COIN option contract
- sourced forward mark for the exact MSTR/COIN option contract and horizon
- authoritative realized P&L or validated lifecycle accounting
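The blank-is-correct rule can be sketched as a display guard. The required-input names below follow the list above, with hypothetical short names for the quote and mark inputs:

```python
# P&L renders only when every required valuation input is present;
# otherwise the cell stays blank instead of showing a misleading 0.00.

REQUIRED_INPUTS = ("traded_iv", "spot_at_trade", "execution_quote", "forward_mark")

def pnl_display(row: dict) -> str:
    if any(row.get(k) is None for k in REQUIRED_INPUTS):
        return ""  # blank, not 0.00
    return f"{row['pnl']:.2f}"
```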
Production Incident Rule
If the dashboard shows a number that appears wrong, prioritize data accuracy over availability:
- Identify source endpoint/table.
- Check raw payload or DB row.
- Check normalization and unit conversion.
- Check UI formatting.
- If source or calculation cannot be verified, hide/blank the number rather than guessing.