Catalog Resource Pipeline
This document describes the resource-level search pipeline after the outcome state simplification.
Core Model
Searchability is not liveness. A resource is search eligible when both are true:
- it has derived metadata: `indexed = true`
- it has an active embedding for the configured resource embedding model
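
The eligibility rule above can be sketched as a single predicate. This is a minimal illustration, not the pipeline's actual code: the `SearchFacts` class and its field names are hypothetical stand-ins for whatever the catalog rows expose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchFacts:
    # Hypothetical field names, chosen for illustration only.
    indexed: bool               # derived metadata exists (indexed = true)
    has_active_embedding: bool  # active embedding for the configured model


def is_search_eligible(facts: SearchFacts) -> bool:
    """Both conditions must hold; liveness is deliberately not an input."""
    return facts.indexed and facts.has_active_embedding
```

Note that no probe or health field appears in the signature, which mirrors the point that searchability is not liveness.
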
Base Flow
The stage historically named `probe` is now a schema loading step. Its only job
is to load endpoint schema information from available specs and, when needed,
from the payment challenge. It does not perform raw liveness checks and does
not write live/error/timeout health state.
Terminal Outcomes
The durable terminal outcome is stored on `catalog.indexed_resources`:
- `pipeline_outcome`
- `pipeline_outcome_reason`
- `pipeline_outcome_at`
- `pipeline_failure_kind`
| Outcome | Meaning | How it happens | Retry behavior | Search impact |
|---|---|---|---|---|
| `success` | The resource has metadata and, when embeddings are configured, an active embedding. | Schema/source metadata was available, enrichment produced indexed metadata, and embedding exists. | No automatic success refresh is scheduled by liveness. Future scanner/material changes can mark it due again. | Search eligible when origin/search filters also pass. |
| `failed` | The latest attempt could not finish, but the resource is recoverable or may recover. | Source metadata is unavailable but prior searchable metadata is preserved, enrichment failed, embedding failed, worker crashed, dispatch failed, or another transient/system issue occurred. | Backoff retry: 1h, 4h, 12h, 1d, 3d, then 7d. Manual retry can force it sooner. | Existing metadata/embedding may still make it search eligible. Failed outcome alone does not hide it. |
| `dead` | The pipeline has no information it can use to derive searchable metadata, and this is treated as non-recoverable until rediscovery changes the resource. | No schema/source metadata is available and there is no prior searchable metadata to preserve, or discovery removes the resource. | Not scheduled for retry. A material rediscovery can reset it to `unknown` and make it due again. | Not search eligible unless rediscovered and successfully enriched later. |
| `unknown` | Legacy or newly discovered resource that has not finalized under the new outcome model. | New rows and material resets start here. Migration preserves `unknown` where no stronger outcome can be inferred. | Eligible when it has missing schema/metadata/embedding work and is due. | Search eligibility still depends only on metadata plus embedding. |
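
The backoff schedule for `failed` outcomes (1h, 4h, 12h, 1d, 3d, then 7d) could be computed along these lines. This sketch assumes the schedule is keyed on a consecutive failure count and that every failure after the fifth waits 7 days; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Delays from the table above; the last entry repeats for all later failures
# (an assumption about what "then 7d" means).
BACKOFF_STEPS = [
    timedelta(hours=1),
    timedelta(hours=4),
    timedelta(hours=12),
    timedelta(days=1),
    timedelta(days=3),
    timedelta(days=7),
]


def retry_delay(failure_count: int) -> timedelta:
    """Return the wait after the Nth consecutive failure (1-based)."""
    if failure_count < 1:
        raise ValueError("failure_count must be at least 1")
    index = min(failure_count, len(BACKOFF_STEPS)) - 1
    return BACKOFF_STEPS[index]


def next_retry_at(failed_at: datetime, failure_count: int) -> datetime:
    """Earliest time an automatic retry becomes due."""
    return failed_at + retry_delay(failure_count)
```

A manual retry would bypass this calculation entirely, for example by setting the retry time to the current time.
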
Lifecycle Fields
The pipeline state shown in admin is separate from terminal outcome:
| Queue state | Meaning |
|---|---|
| `queued` | A dispatcher selected the resource and created an active lease. |
| `running` | A worker owns the lease and is executing schema/enrich/embed. |
| `ready` | No lease exists, work is needed, and the retry time is due. |
| `waiting` | Work is needed, but retry backoff has not elapsed. |
| `failed` | Latest terminal outcome is failed and retry is due. |
| `healthy` | No current pipeline work is needed and the resource is not dead. |
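
One way to derive the queue state from the table above is sketched below. Several details are assumptions rather than documented behavior: the `lease` values (`"dispatched"`, `"owned"`), the precedence of `failed` over `ready` when both descriptions match, treating a `failed` outcome as pending work, and returning `None` for dead resources, for which the table defines no queue state.

```python
from datetime import datetime, timezone
from typing import Optional


def queue_state(
    *,
    lease: Optional[str],          # hypothetical: None, "dispatched", or "owned"
    needs_work: bool,
    outcome: str,                  # "success", "failed", "dead", or "unknown"
    retry_at: Optional[datetime],  # None is treated as "due now"
    now: datetime,
) -> Optional[str]:
    # An active lease wins over everything else.
    if lease == "owned":
        return "running"
    if lease == "dispatched":
        return "queued"
    # Dead resources are not scheduled; the table gives them no queue state.
    if outcome == "dead":
        return None
    due = retry_at is None or retry_at <= now
    # Assumption: a failed outcome always counts as pending work.
    pending = needs_work or outcome == "failed"
    if pending and not due:
        return "waiting"
    if outcome == "failed":
        return "failed"  # assumed to take precedence over "ready"
    if pending:
        return "ready"
    return "healthy"
```
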
Pipeline work is needed when any of these is true:
- both input and output schemas are missing and the resource has not already succeeded with other source metadata
- latest outcome is `failed`
- enrichment hash is stale
- enriched spec hash differs from the current spec hash
- embedding text hash is missing
- active embedding is missing
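
The conditions listed above can be expressed as a function that returns the reasons a resource still needs work. This is an illustrative sketch: the `WorkInputs` fields and the reason strings are hypothetical, and scheduling exclusions (such as dead resources not being retried) are outside its scope.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WorkInputs:
    # Hypothetical field names mirroring the conditions in the list above.
    has_input_schema: bool
    has_output_schema: bool
    succeeded_with_source_metadata: bool
    pipeline_outcome: str
    enrichment_hash_stale: bool
    enriched_spec_hash: Optional[str]
    current_spec_hash: Optional[str]
    embedding_text_hash: Optional[str]
    has_active_embedding: bool


def work_reasons(r: WorkInputs) -> List[str]:
    """Collect every matching condition, in the order the list gives them."""
    reasons = []
    if (not r.has_input_schema and not r.has_output_schema
            and not r.succeeded_with_source_metadata):
        reasons.append("schema_missing")
    if r.pipeline_outcome == "failed":
        reasons.append("outcome_failed")
    if r.enrichment_hash_stale:
        reasons.append("enrichment_hash_stale")
    if r.enriched_spec_hash != r.current_spec_hash:
        reasons.append("spec_hash_changed")
    if r.embedding_text_hash is None:
        reasons.append("embedding_text_hash_missing")
    if not r.has_active_embedding:
        reasons.append("active_embedding_missing")
    return reasons


def needs_work(r: WorkInputs) -> bool:
    return bool(work_reasons(r))
```

Returning the reasons rather than a bare boolean makes it easier to show in an admin view why a resource is due.
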
Schema Lookup Versus Health
These are intentionally different concepts:
| Concern | Purpose | Pipeline role |
|---|---|---|
| Schema lookup | Find endpoint/spec information that can improve metadata and embedding text. | Used when available, but scanner/discovery source metadata can also drive enrichment. |
| Health monitor | Measure runtime availability, latency, and recent failures. | Advisory/post-filter only. Does not block enrichment, embedding, or search eligibility. |
Live Search Meaning
Live semantic search joins active resource embeddings and requires `indexed = true`. It also requires normal search filters such as non-removed resource,
non-dismissed origin, protocol match, score threshold, and usage filtering unless
broad search is requested.
It does not require:
- `probe_status = 'live'`
- `last_live_at`
- consecutive probe failure grace
indexed = true, non-removed
resource, and non-dismissed origin. It does not require an active embedding.
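
The live semantic search requirements described in this section can be sketched as a post-filter over candidates. All names here are hypothetical, and the sketch assumes that "unless broad search is requested" relaxes only the usage filter; the real query may apply these conditions in SQL instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical fields for illustration.
    indexed: bool
    has_active_embedding: bool
    removed: bool
    origin_dismissed: bool
    protocol: str
    score: float
    has_usage: bool
    probe_status: str  # present only to show that it is ignored


def passes_live_search(
    c: Candidate, *, protocol: str, min_score: float, broad: bool = False
) -> bool:
    # Core eligibility: metadata plus active embedding.
    if not (c.indexed and c.has_active_embedding):
        return False
    # Normal search filters.
    if c.removed or c.origin_dismissed:
        return False
    if c.protocol != protocol or c.score < min_score:
        return False
    # Assumption: broad search skips only the usage filter.
    if not broad and not c.has_usage:
        return False
    # probe_status is intentionally never consulted.
    return True
```
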
Admin UI Reading
The Resources admin page separates three axes:
- Search: metadata plus active embedding.
- Pipeline: latest indexing outcome and queue state.
- Health: runtime availability signal, separate from search and pipeline eligibility.
- `success`: metadata and active embedding exist.
- `failed`: latest attempt failed and can retry.
- `dead`: no source information for embedding; not scheduled.
- `unknown`: not finalized under the new model yet.