Trade Copilot Evaluation Specification
This document defines the preparatory contract for accident-avoidance warnings, trade-action taxonomy, restructuring comparison, and candidate-screen diagnostics. It covers ENG-5246, ENG-5247, ENG-5248, and ENG-5250.
It is not an implementation and it does not authorize live recommendations. It defines the product and data contract that should be reviewed before code changes.
Purpose
The first valuable copilot behavior is accident avoidance:
Do not let the trader add exposure in a part of the surface where the current
regime and conditional distribution say the trade is poorly compensated.
The later behavior is restructuring evaluation:
Given the current book and regime, compare credible alternatives such as
buying back May, selling July, using June, trading a vertical, selling a 1x2,
or doing nothing.
Ticket Coverage
| Ticket | Product concern | Preparatory output |
|---|---|---|
| ENG-5246 | Accident-avoidance warnings for short upside vol restructurings. | Warning triggers, severity, evidence, and wording policy. |
| ENG-5247 | Trade intent and action taxonomy. | Bounded action menu and intent-source policy. |
| ENG-5248 | V1 restructuring evaluator. | Comparison dimensions and output contract. |
| ENG-5250 | Candidate-screen diagnostics and filter provenance. | Filter counts, row reasons, and empty-state explanations. |
Accident-Avoidance Warning Contract
Gordon's example:
- trader is short May upside calls
- spot rallies
- trader buys back one strike and sells more of a higher strike, such as a 1x2-style roll
- in a spot-up/vol-up regime, this can increase short vol in exactly the area where vol may reprice higher
The system should eventually detect this type of structure and explain the risk.
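The structure in Gordon's example can be sketched as a simple leg check. This is a minimal illustration, not the detection logic: the `Leg` type, the `spot_up_vol_up` regime label, and the assumption that every leg is a call with signed quantities (positive = buy, negative = sell) are all hypothetical. Only the warning type string comes from this specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Leg:
    # Hypothetical leg shape; all legs are assumed to be calls.
    expiry: str
    strike: float
    quantity: int  # signed contracts: positive = buy, negative = sell


def is_ratio_roll_up(legs: list[Leg], spot: float) -> bool:
    """True when, within one expiry, the trade buys a strike and sells
    more contracts of a higher strike that sits above spot."""
    by_expiry: dict[str, list[Leg]] = {}
    for leg in legs:
        by_expiry.setdefault(leg.expiry, []).append(leg)
    for group in by_expiry.values():
        bought = [leg for leg in group if leg.quantity > 0]
        sold = [leg for leg in group if leg.quantity < 0 and leg.strike > spot]
        if any(s.strike > b.strike and -s.quantity > b.quantity
               for b in bought for s in sold):
            return True
    return False


def ratio_roll_warning(legs: list[Leg], spot: float, regime: str) -> Optional[str]:
    """Return a warning type only when the structure and the regime coincide."""
    if regime == "spot_up_vol_up" and is_ratio_roll_up(legs, spot):
        return "short_upside_vol_in_positive_spot_vol_region"
    return None


roll = [Leg("May", 100.0, 1), Leg("May", 110.0, -2)]
print(ratio_roll_warning(roll, spot=105.0, regime="spot_up_vol_up"))
```

A real detector would also need option type, existing book positions, and the regime classifier's trust state before emitting anything stronger than `info`.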
Minimum warning output:
accident_warning = {
trade_or_scenario_id,
warning_type,
warning_severity,
surface_region,
exposure_change,
regime_evidence,
conditional_distribution_evidence,
term_structure_evidence,
liquidity_evidence,
trust_state,
source_quality,
message,
caveats
}
Warning types:
| Warning type | Meaning |
|---|---|
| short_upside_vol_in_positive_spot_vol_region | Proposed action adds short upside vol where spot-up/vol-up behavior is active. |
| short_band_underpriced_by_conditional | Proposed short option lies in a strike band where conditional probability is materially above vanilla. |
| front_month_gamma_reintroduced | Proposed action adds front-month gamma/short-vol exposure after the thesis says front-month sensitivity is dangerous. |
| term_structure_wrong_bucket | Proposed action sells the tenor most sensitive to the current regime when a later tenor appears less sensitive. |
| low_confidence_no_strong_warning | Possible issue exists but trust/source quality is too weak for assertive wording. |
Severity:
| Severity | Use |
|---|---|
| info | Evidence is weak, missing, or exploratory. |
| caution | Evidence suggests risk, but trust is discounted or source coverage is incomplete. |
| danger | Evidence, trust, and exposure change all support a clear accident-avoidance warning. |
| blocked | Identity/source state is too poor to evaluate the action. |
Warning wording must be risk-control language, not formal trade advice.
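The severity tiers above could be selected along the following lines. This is a sketch under stated assumptions: the numeric inputs, the `identity_ok` flag, and both threshold defaults are hypothetical placeholders, since the specification leaves the danger threshold as a parameter still to be decided.

```python
def warning_severity(
    evidence_strength: float,   # assumed scale: 0.0 (none) to 1.0 (strong)
    trust: float,               # assumed scale: 0.0 to 1.0
    adds_exposure: bool,        # does the action add exposure in the flagged region?
    identity_ok: bool,          # can the contract/source identity be resolved?
    danger_threshold: float = 0.8,   # placeholder, not an approved value
    caution_threshold: float = 0.5,  # placeholder, not an approved value
) -> str:
    """Map evidence, trust, and exposure change onto the four severity tiers."""
    if not identity_ok:
        # The action cannot be evaluated at all.
        return "blocked"
    if (evidence_strength >= danger_threshold
            and trust >= danger_threshold
            and adds_exposure):
        # Evidence, trust, and exposure change all agree.
        return "danger"
    if evidence_strength >= caution_threshold:
        # Evidence suggests risk, but trust or exposure does not fully support it.
        return "caution"
    return "info"


print(warning_severity(0.9, 0.9, adds_exposure=True, identity_ok=True))
print(warning_severity(0.9, 0.4, adds_exposure=True, identity_ok=True))
```

Checking `blocked` first keeps a poor identity or source state from ever being reported as a weaker or stronger warning.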
Trade Intent And Action Taxonomy
The current shortcut that "buy means long vol" and "sell means short vol" is useful for a first pass but is not enough for a copilot. The system needs action and intent labels.
Action taxonomy:
| Action | Meaning |
|---|---|
| open_long_vol | Buy option exposure intended to own volatility/convexity. |
| open_short_vol | Sell option exposure intended to harvest volatility premium. |
| close_or_reduce | Buy back short exposure or sell long exposure to reduce risk. |
| roll_strike | Move exposure from one strike to another in the same tenor. |
| roll_tenor | Move exposure from one expiry/tenor to another. |
| vertical_spread | Buy one strike and sell another in the same expiry. |
| calendar_or_diagonal | Trade same or related strike exposure across expiries. |
| ratio_spread | Trade unequal quantities across strikes, including 1x2 structures. |
| hedge_overlay | Add exposure intended to hedge existing book risk. |
| do_nothing | Explicitly leave exposure unchanged. |
| unknown | Source data cannot determine the action. |
Intent source:
| Source | Meaning |
|---|---|
| explicit_source | Trade feed or user input supplies strategy intent. |
| manual_tag | Operator manually tags the trade. |
| inferred_lifecycle | System infers action from position lifecycle and nearby trades. |
| direction_fallback | System falls back to the assumption that buy means long vol and sell means short vol. |
| unknown | No defensible intent is available. |
Every score should report the intent source. If intent is only inferred or direction-based, confidence should be lower.
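One way to carry the intent source alongside a score is a small discount table. The sketch below is illustrative only: the discount factors are invented placeholders, and the ordering between `inferred_lifecycle` and `direction_fallback` is an assumption. The specification only requires that inferred and direction-based intent lower confidence.

```python
from enum import Enum


class IntentSource(str, Enum):
    EXPLICIT_SOURCE = "explicit_source"
    MANUAL_TAG = "manual_tag"
    INFERRED_LIFECYCLE = "inferred_lifecycle"
    DIRECTION_FALLBACK = "direction_fallback"
    UNKNOWN = "unknown"


# Hypothetical multipliers; real values would need review.
INTENT_DISCOUNT = {
    IntentSource.EXPLICIT_SOURCE: 1.0,
    IntentSource.MANUAL_TAG: 1.0,
    IntentSource.INFERRED_LIFECYCLE: 0.7,
    IntentSource.DIRECTION_FALLBACK: 0.4,
    IntentSource.UNKNOWN: 0.0,
}


def score_with_intent(base_confidence: float, source: IntentSource) -> dict:
    """Return the discounted confidence together with the source that produced it."""
    return {
        "intent_source": source.value,
        "confidence": base_confidence * INTENT_DISCOUNT[source],
    }


print(score_with_intent(0.8, IntentSource.INFERRED_LIFECYCLE))
```

Returning the source in the same record as the confidence means no score can be displayed without its provenance.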
V1 Restructuring Evaluator
The first evaluator should be a bounded comparator, not a full optimizer. It should compare realistic actions Gordon mentioned:
- buy back May and do nothing else
- buy back May and sell June
- buy back May and sell July
- trade a vertical spread
- trade a calendar or diagonal
- sell a 1x2 or similar ratio structure
- do nothing
Scoring dimensions:
| Dimension | Question |
|---|---|
| Regime fit | Does the action add or remove exposure in the current spot-vol regime? |
| Surface edge | Is the relevant strike/tenor rich or cheap after smile and conditional adjustments? |
| Conditional distribution impact | Does the action sell options where conditional probability exceeds vanilla? |
| Term structure | Does the action move exposure into a more or less sensitive tenor? |
| Liquidity | Is the contract tradable enough to support the comparison? |
| Source quality | Are marks, identity, and surface inputs direct, proxy, estimated, stale, or unavailable? |
| Portfolio exposure | Does the action reduce or increase existing concentrated risk? |
| Confidence | Do trust-engine and source states support the conclusion? |
Minimum evaluator output:
restructuring_candidate = {
scenario_id,
action_type,
legs,
regime_fit_score,
surface_edge_score,
conditional_distribution_score,
term_structure_score,
liquidity_score,
source_quality_score,
portfolio_exposure_score,
confidence,
warning_flags,
rank,
explanation
}
The explanation should be direct. Example:
Selling more May upside improves premium but increases short exposure in a
front-month region where spot-vol sensitivity is positive and conditional
probability exceeds vanilla. Confidence is discounted because execution
liquidity is partial.
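The bounded comparator described in this section could rank a fixed menu of candidates as sketched below. The aggregation rule (an unweighted mean of the seven dimension scores, scaled by confidence) is a hypothetical choice made for illustration; the specification does not define weights or a combining formula. Scores are assumed to lie in [0, 1] with higher meaning more favorable.

```python
from dataclasses import dataclass, field
from statistics import mean

# Score fields taken from the minimum evaluator output.
DIMENSIONS = (
    "regime_fit_score",
    "surface_edge_score",
    "conditional_distribution_score",
    "term_structure_score",
    "liquidity_score",
    "source_quality_score",
    "portfolio_exposure_score",
)


@dataclass
class RestructuringCandidate:
    scenario_id: str
    action_type: str
    scores: dict[str, float]
    confidence: float
    warning_flags: list[str] = field(default_factory=list)
    rank: int = 0  # 0 means not yet ranked


def composite(candidate: RestructuringCandidate) -> float:
    """Assumed aggregation: mean dimension score scaled by confidence."""
    return mean(candidate.scores[d] for d in DIMENSIONS) * candidate.confidence


def rank_candidates(candidates: list[RestructuringCandidate]) -> list[RestructuringCandidate]:
    """Sort best-first and write a 1-based rank onto each candidate."""
    ordered = sorted(candidates, key=composite, reverse=True)
    for position, candidate in enumerate(ordered, start=1):
        candidate.rank = position
    return ordered


menu = [
    RestructuringCandidate("s1", "ratio_spread", {d: 0.6 for d in DIMENSIONS}, 0.5,
                           ["short_upside_vol_in_positive_spot_vol_region"]),
    RestructuringCandidate("s2", "do_nothing", {d: 0.5 for d in DIMENSIONS}, 1.0),
]
for c in rank_candidates(menu):
    print(c.rank, c.scenario_id, c.action_type)
```

Including `do_nothing` in the menu keeps the comparison honest: every active alternative has to beat leaving the book unchanged.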
Candidate-Screen Diagnostics
The candidate screen ranks live option quotes against expected payoff under the conditional density and applies filters. Gordon and operators need to know why candidates appear or disappear.
Every candidate view should report:
- starting candidate count
- count removed by DTE filter
- count removed by open-interest filter
- count removed by premium filter
- count removed by liquidity/source-quality filter
- count removed by conditional-edge threshold
- final candidate count
- reason for empty results
- source timestamps and trust state
Filter provenance output:
candidate_filter_summary = {
starting_count,
after_dte_count,
after_open_interest_count,
after_premium_count,
after_liquidity_count,
after_conditional_edge_count,
final_count,
removed_counts_by_filter,
empty_state_reason,
filter_settings,
source_quality
}
Candidate rows should be described as construction context, not recommendations, until the warning and recommendation policy is approved.
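The filter counts above can be produced by applying filters in a fixed order and recording what each one removes. This is a minimal sketch: the row fields, filter names, thresholds, and the wording of `empty_state_reason` are hypothetical, and only two of the listed filters are shown.

```python
from typing import Callable, Optional

Row = dict
Filter = tuple[str, Callable[[Row], bool]]  # (name, keep-predicate)


def summarize_filters(rows: list[Row], filters: list[Filter]) -> dict:
    """Apply filters in order and record per-filter removals and running counts."""
    summary: dict = {"starting_count": len(rows), "removed_counts_by_filter": {}}
    remaining = list(rows)
    emptied_by: Optional[str] = None
    for name, keep in filters:
        kept = [row for row in remaining if keep(row)]
        summary["removed_counts_by_filter"][name] = len(remaining) - len(kept)
        summary[f"after_{name}_count"] = len(kept)
        if remaining and not kept and emptied_by is None:
            emptied_by = name  # first filter that removed the last candidates
        remaining = kept
    summary["final_count"] = len(remaining)
    if not rows:
        summary["empty_state_reason"] = "no_starting_candidates"
    elif emptied_by is not None:
        summary["empty_state_reason"] = f"all_removed_by_{emptied_by}"
    else:
        summary["empty_state_reason"] = None
    return summary


quotes = [
    {"dte": 10, "open_interest": 500},
    {"dte": 40, "open_interest": 5},
    {"dte": 45, "open_interest": 900},
]
filters = [
    ("dte", lambda r: r["dte"] >= 30),
    ("open_interest", lambda r: r["open_interest"] >= 100),
]
print(summarize_filters(quotes, filters))
```

Because each filter sees only the rows that survived the previous one, the removal counts depend on filter order, so the order should be recorded in `filter_settings`.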
Decision Points
| Decision | Why it matters | Can proceed now? |
|---|---|---|
| Which action alternatives should V1 compare? | Keeps the evaluator bounded and trader-relevant. | Define menu now; final menu needs Gordon review. |
| What warning severity can appear before P&L data is complete? | Accident avoidance may be useful before outcome attribution is complete. | Define severity tiers now; strong warnings need approval. |
| What confidence threshold is required for danger warnings? | Prevents overconfident warnings from weak data. | Parameterize and document now. |
| Who supplies trade intent? | Buy/sell alone cannot identify hedges, closes, rolls, or spreads. | Add intent-source policy now; source integration later. |
| Should candidate rows be described as recommendations? | Affects legal/business interpretation and trader behavior. | Keep as diagnostics until approved. |
Preparatory Acceptance
This specification is complete when:
- accident warnings have named triggers and severity states
- trade actions and intent-source labels are defined
- restructuring comparison dimensions are explicit
- candidate filters have provenance requirements
- warning language is separated from formal recommendations
- low-confidence and blocked states remain first-class outcomes