Catalog Resource Pipeline

This document describes the resource-level search pipeline after the outcome state simplification.

Core Model

Searchability is not liveness. A resource is search eligible when both of the following are true:
  • it has derived metadata: indexed = true
  • it has an active embedding for the configured resource embedding model
The main pipeline does not decide whether a resource is live. Runtime health is a separate monitor signal and can be used as a post-filter or operator warning, but it is not a prerequisite for enrichment, embedding, or search eligibility.
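The eligibility rule above can be read as a two-condition predicate. The following is a minimal sketch, not the actual implementation: the `Resource` class and its field names are hypothetical stand-ins for whatever the catalog row exposes, and the health field is included only to show that it is never consulted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    # Hypothetical in-memory view of a catalog row; field names are illustrative.
    indexed: bool                          # derived metadata exists
    active_embedding_model: Optional[str]  # model of the active embedding, if any
    health_status: str = "unknown"         # monitor signal; deliberately unused below


def is_search_eligible(resource: Resource, configured_model: str) -> bool:
    """True when metadata exists and the active embedding matches the configured model."""
    return resource.indexed and resource.active_embedding_model == configured_model
```

Under this reading, a resource whose health monitor reports an error is still eligible as long as both conditions hold.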

Base Flow

The stage historically named probe is now a schema loading step. Its only job is to load endpoint schema information from available specs and, when needed, from the payment challenge. It does not perform raw liveness checks and does not write live/error/timeout health state.
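One way to picture the schema loading step is as an ordered lookup with a fallback. This sketch assumes each source is a zero-argument callable that returns a schema or `None`; `load_schema`, `SchemaSource`, and the dict-shaped schema are hypothetical names chosen for illustration, not the pipeline's real interfaces.

```python
from typing import Callable, Iterable, Optional

Schema = dict  # assumption: a schema is represented as a plain dict
SchemaSource = Callable[[], Optional[Schema]]


def load_schema(
    spec_sources: Iterable[SchemaSource],
    payment_challenge_source: SchemaSource,
) -> Optional[Schema]:
    """Return the first schema found in the specs, else try the payment challenge."""
    for source in spec_sources:
        schema = source()
        if schema is not None:
            return schema
    # Fallback used only when no spec yielded a schema.
    return payment_challenge_source()
```

Note that the function returns only schema information. It records no live, error, or timeout state, which matches the step's narrowed responsibility.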

Terminal Outcomes

The durable terminal outcome is stored on catalog.indexed_resources:
  • pipeline_outcome
  • pipeline_outcome_reason
  • pipeline_outcome_at
  • pipeline_failure_kind
| Outcome | Meaning | How it happens | Retry behavior | Search impact |
| --- | --- | --- | --- | --- |
| success | The resource has metadata and, when embeddings are configured, an active embedding. | Schema/source metadata was available, enrichment produced indexed metadata, and embedding exists. | No automatic success refresh is scheduled by liveness. Future scanner/material changes can mark it due again. | Search eligible when origin/search filters also pass. |
| failed | The latest attempt could not finish, but the resource is recoverable or may recover. | Source metadata is unavailable but prior searchable metadata is preserved, enrichment failed, embedding failed, worker crashed, dispatch failed, or another transient/system issue occurred. | Backoff retry: 1h, 4h, 12h, 1d, 3d, then 7d. Manual retry can force it sooner. | Existing metadata/embedding may still make it search eligible. Failed outcome alone does not hide it. |
| dead | The pipeline has no information it can use to derive searchable metadata, and this is treated as non-recoverable until rediscovery changes the resource. | No schema/source metadata is available and there is no prior searchable metadata to preserve, or discovery removes the resource. | Not scheduled for retry. A material rediscovery can reset it to unknown and make it due again. | Not search eligible unless rediscovered and successfully enriched later. |
| unknown | Legacy or newly discovered resource that has not finalized under the new outcome model. | New rows and material resets start here. Migration preserves unknown where no stronger outcome can be inferred. | Eligible when it has missing schema/metadata/embedding work and is due. | Search eligibility still depends only on metadata plus embedding. |
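The backoff schedule for the failed outcome (1h, 4h, 12h, 1d, 3d, then 7d) can be sketched as a lookup over a fixed list. This is an illustration under two assumptions that the text does not confirm: that a per-resource consecutive failure counter exists, and that "then 7d" means every later retry also waits 7 days. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Delays taken from the failed outcome's retry behavior.
BACKOFF_SCHEDULE = [
    timedelta(hours=1),
    timedelta(hours=4),
    timedelta(hours=12),
    timedelta(days=1),
    timedelta(days=3),
    timedelta(days=7),
]


def retry_delay(consecutive_failures: int) -> timedelta:
    """Delay after the Nth consecutive failure (1-based).

    Assumption: once the schedule is exhausted, the delay stays at 7 days.
    """
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be >= 1")
    index = min(consecutive_failures, len(BACKOFF_SCHEDULE)) - 1
    return BACKOFF_SCHEDULE[index]


def next_retry_at(failed_at: datetime, consecutive_failures: int) -> datetime:
    """Earliest time the resource becomes due again after a failure."""
    return failed_at + retry_delay(consecutive_failures)
```

A manual retry would bypass this calculation by making the resource due immediately; that path is not modeled here.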

Lifecycle Fields

The pipeline state shown in admin is separate from terminal outcome:
| Queue state | Meaning |
| --- | --- |
| queued | A dispatcher selected the resource and created an active lease. |
| running | A worker owns the lease and is executing schema/enrich/embed. |
| ready | No lease exists, work is needed, and the retry time is due. |
| waiting | Work is needed, but retry backoff has not elapsed. |
| failed | Latest terminal outcome is failed and retry is due. |
| healthy | No current pipeline work is needed and the resource is not dead. |
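The queue states can be read as a decision order over a few inputs. The sketch below is one plausible reading, not the admin page's actual logic. All parameter names are hypothetical, "work needed" is passed in as a precomputed boolean, and two choices are assumptions: `failed` takes precedence over `ready` when both descriptions match, and dead resources get no queue state because none is defined for them.

```python
from datetime import datetime, timezone
from typing import Optional


def queue_state(
    *,
    has_lease: bool,
    worker_started: bool,
    work_needed: bool,
    pipeline_outcome: str,
    next_retry_at: Optional[datetime],
    now: datetime,
) -> Optional[str]:
    """Derive the admin queue state; returns None where no state is defined."""
    if has_lease:
        # A lease exists: either a worker has picked it up or it is still queued.
        return "running" if worker_started else "queued"
    if pipeline_outcome == "dead":
        return None  # assumption: no queue state applies to dead resources
    if not work_needed:
        return "healthy"
    due = next_retry_at is None or next_retry_at <= now
    if not due:
        return "waiting"
    # Assumption: "failed" wins over "ready" when the latest outcome is failed.
    return "failed" if pipeline_outcome == "failed" else "ready"
```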
Work is needed when any of the following is true:
  • both input and output schemas are missing and the resource has not already succeeded with other source metadata
  • latest outcome is failed
  • enrichment hash is stale
  • enriched spec hash differs from the current spec hash
  • embedding text hash is missing
  • active embedding is missing
Stale health probes are not a pipeline work trigger.
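The work triggers listed above can be sketched as a function that returns the reasons that apply, so an empty list means no work is needed. The `WorkInputs` class, its field names, and the reason strings are hypothetical. The sketch covers only the listed triggers; excluding dead resources from scheduling would be a separate check and is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkInputs:
    # Hypothetical flattened view of one resource; names are illustrative.
    has_input_schema: bool
    has_output_schema: bool
    succeeded_with_other_source_metadata: bool
    pipeline_outcome: str  # "success" | "failed" | "dead" | "unknown"
    enrichment_hash_stale: bool
    enriched_spec_hash: Optional[str]
    current_spec_hash: Optional[str]
    embedding_text_hash: Optional[str]
    has_active_embedding: bool
    health_probe_stale: bool = False  # present only to show that it is ignored


def work_reasons(r: WorkInputs) -> List[str]:
    """Return every listed trigger that applies to this resource."""
    reasons = []
    if (not r.has_input_schema and not r.has_output_schema
            and not r.succeeded_with_other_source_metadata):
        reasons.append("schemas_missing")
    if r.pipeline_outcome == "failed":
        reasons.append("latest_outcome_failed")
    if r.enrichment_hash_stale:
        reasons.append("enrichment_hash_stale")
    if r.enriched_spec_hash != r.current_spec_hash:
        reasons.append("spec_hash_changed")
    if r.embedding_text_hash is None:
        reasons.append("embedding_text_hash_missing")
    if not r.has_active_embedding:
        reasons.append("active_embedding_missing")
    # r.health_probe_stale is intentionally never read.
    return reasons
```

Returning reasons rather than a bare boolean is a design choice made for this sketch; it makes it easier to see in logs or an admin view why a resource became due.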

Schema Lookup Versus Health

These are intentionally different concepts:
| Concern | Purpose | Pipeline role |
| --- | --- | --- |
| Schema lookup | Find endpoint/spec information that can improve metadata and embedding text. | Used when available, but scanner/discovery source metadata can also drive enrichment. |
| Health monitor | Measure runtime availability, latency, and recent failures. | Advisory/post-filter only. Does not block enrichment, embedding, or search eligibility. |
This means a resource can have a failed health check and still be enriched, embedded, and search eligible if schema/source metadata exists.

Live Search Meaning

Live semantic search joins active resource embeddings and requires indexed = true. It also requires normal search filters such as non-removed resource, non-dismissed origin, protocol match, score threshold, and usage filtering unless broad search is requested. It does not require:
  • probe_status = 'live'
  • last_live_at
  • consecutive probe failure grace
Autocomplete remains metadata based and requires indexed = true, non-removed resource, and non-dismissed origin. It does not require an active embedding.
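The difference between the two search paths can be sketched as two predicates over the same candidate. This is an illustration only: the real filtering presumably happens in the search query, the `Candidate` fields are hypothetical, `has_usage` is a placeholder for whatever usage filtering means in practice, and it is an assumption that broad search lifts only the usage filter.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical per-resource view; field names are illustrative.
    indexed: bool
    has_active_embedding: bool
    removed: bool
    origin_dismissed: bool
    protocol: str
    score: float
    has_usage: bool
    probe_status: str = "error"  # health signal; never consulted below


def passes_semantic_search(
    c: Candidate, *, protocol: str, min_score: float, broad: bool = False
) -> bool:
    """Live semantic search: metadata, active embedding, and normal filters."""
    if not (c.indexed and c.has_active_embedding):
        return False
    if c.removed or c.origin_dismissed:
        return False
    if c.protocol != protocol or c.score < min_score:
        return False
    # Assumption: a broad search skips only the usage filter.
    return broad or c.has_usage


def passes_autocomplete(c: Candidate) -> bool:
    """Autocomplete: metadata based, no embedding required."""
    return c.indexed and not c.removed and not c.origin_dismissed
```

Neither predicate reads `probe_status`, so a resource with a failing health probe can pass both.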

Admin UI Reading

The Resources admin page separates three axes:
  • Search: metadata plus active embedding.
  • Pipeline: latest indexing outcome and queue state.
  • Health: runtime availability signal, separate from search and pipeline eligibility.
Use outcome filters to inspect terminal results:
  • success: metadata and active embedding exist.
  • failed: latest attempt failed and can retry.
  • dead: no source information from which to derive searchable metadata; not scheduled for retry.
  • unknown: not finalized under the new model yet.
Use health filters only to inspect runtime monitor behavior.