DUNIN7 · LOOMWORKS · RECORD
record.dunin7.com
Status Current
Path phases/phase-34-external-service-polling/phase-34-cr-external-service-polling-v0_2.md

Loomworks — Phase 34: External-service polling for specialists — CR

Version. 0.2
Date. 2026-05-03
CR number. CR-2026-046
Provenance. Claude.ai CR drafting session, revised after CC pre-flight audit (phase-34-cr-audit-v0_1.md). Operator: Marvin Percival.
Status. Working draft. Awaiting Operator approval.
Strategy document. loomworks-engine-implementation-strategy-v0_2.md Section 4.
Sits alongside. Operator Layer v0.6.
Supersedes. v0.1 (same date). Addresses 10 blockers, 4 recommended changes, and 8 non-blocking findings from CC's pre-flight audit. Major substantive changes: specialist-always-writes pattern (no engine-side projection of RenderEvent from polling outcomes); (engagement_id, declared_render_type_id) lookup instead of the nonexistent app.state.agent_registry; per-poll session with asyncio.gather for concurrency; full per-event-kind payload schema; the v0.1 §3.6 dispatch switch relocated into a task wrapper that runs inside the BackgroundAgentRunner's task.


1. What this builds

Today's specialist contract assumes the specialist completes synchronously inside the BackgroundAgentRunner's task. For external long-running services (Claude Code Dispatch building a website over hours, 3D printing services running prints over hours to days, video generators running renders over minutes to hours), this assumption does not hold. The specialist needs to dispatch to the external service, return a handle, and have the engine poll for completion by calling back into the specialist, which in turn queries the external service.

This CR adds external-service polling for render specialists in three pieces:

Piece 1 — Specialist contract extension. RenderSpecialist.produce_render(...) may now return RenderEvent | ExternalProductionHandle | None (the existing RenderEvent | None plus the new ExternalProductionHandle for external dispatch). Specialists that complete in-process continue to return RenderEvent directly (or None after handling failure via mark_failed); specialists that dispatch externally return a handle and additionally implement poll_external_work(...).

Piece 2 — ExternalProductionRecord and awaiting_external job state. A new MemoryObject tracks in-flight external work tied to a render_jobs row. The render_jobs.status CHECK constraint extends to admit awaiting_external between dispatched and completed | failed. Three new event kinds (external_production_dispatched, external_production_polled, external_production_resolved) drive the record's lifecycle through the standard projector pattern.

Piece 3 — The polling loop. A background task started by the FastAPI lifespan walks external_production_records_view for rows whose next-poll deadline has elapsed, opens a fresh session per poll, calls each specialist's poll_external_work, and acts on the response. The loop survives process restart: state lives in event-log-canonical form; the loop on startup queries the view and resumes polling.

This is the first of seven engine phases (Phases 34 through 40) the Operator Layer architecture commits the engine to. It unblocks Arc 6 Phase B (application-rendering adapter) and Phase C (3D printing adapter), and it is a prerequisite for Phase 37 (adapter chaining and composition).


2. Strategy decisions consumed

From engine implementation strategy v0.2 Section 4:

S1 — External-service polling for specialists, not async dispatch in general. The engine already has substantial async machinery (BackgroundAgentRunner from Phase 3, render_jobs operational table from Phase 10, HTTP 202 dispatch with caller polling). The actual gap is narrower: the specialist's own dispatch to an external long-running service. This CR fills only that gap; existing synchronous specialists continue to work unchanged.

S2 — State lives in the database, not in process memory. The polling loop's state survives restart. An awaiting_external job row plus its ExternalProductionRecord (event-log-canonical, materialized into external_production_records_view) is sufficient for any process to resume polling. No work is lost across deploys, restarts, or crashes.

S3 — Specialist-declared polling interval. The polling interval is what the specialist returns in the ExternalProductionHandle. The engine does not impose a one-size-fits-all interval. The specialist may also update the interval on subsequent polls (back off as production runs longer).

S4 — Cancellation is deferred. Phase 34 ships polling without a cancel pathway. The Operator can see that an external job is in flight; cancellation is future work.

S5 — Smart scheduling is deferred. Phase 34 polls at the specialist's declared interval, period. Exponential backoff on errors, adaptive intervals based on historical completion times, rate-limiting against external services — all future work.


3. Substrate changes

3.1 Specialist contract extension

src/loomworks/agents/render_specialist.py is extended with the new return shape, the polling method, two new dataclasses, and one new exception.

New types and exception:


# src/loomworks/agents/render_specialist.py — additions

import logging
from dataclasses import dataclass
from datetime import datetime
from typing import Literal
from uuid import UUID

from sqlalchemy.ext.asyncio import AsyncSession

_logger = logging.getLogger(__name__)


@dataclass
class ExternalProductionHandle:
    """Returned by specialists that dispatch to external services
    rather than completing in-process. Indicates the engine should
    poll the specialist for completion via poll_external_work().

    The specialist owns the external_job_id format; the engine treats
    it as opaque. The polling_interval_seconds is how often the engine
    will call poll_external_work; the specialist may update it on
    subsequent polls (e.g., back off as production runs longer).
    """
    external_job_id: str
    polling_interval_seconds: int
    progress_hint: str | None = None


@dataclass
class ExternalProductionFailure:
    """Returned by poll_external_work when the external service
    reports the work has failed. The polling loop transitions the
    render_jobs row to 'failed' and surfaces the failure_detail.

    Distinct from a poll attempt that itself raises (network error
    talking to the external service, etc.), which is logged and
    retried at the next tick rather than terminating the job.
    """
    failure_detail: str
    error_code: str | None = None  # Specialist-defined error code if useful


class UnexpectedSpecialistResultError(ValueError):
    """Raised when produce_render or poll_external_work returns a value
    of an unexpected type. Caught by the dispatch helper and the polling
    loop respectively, marking the job failed with this message."""
    pass

RenderSpecialist.produce_render — extended return type, signature unchanged:


class RenderSpecialist:
    # ... existing fields and methods unchanged ...

    async def produce_render(
        self,
        *,
        job_id: UUID,
        engagement_id: UUID,
        confirmed_shape_event_ref: MemoryRef,
        triggered_by: ActorRef,
        trigger: str,
        db: AsyncSession,
    ) -> RenderEvent | ExternalProductionHandle | None:
        """Execute one render-production job end-to-end.

        Existing semantics preserved. Returns one of:

        - RenderEvent: synchronous completion. The specialist has
          already called _append_render_produced and mark_completed
          before returning. The dispatch wrapper does not write
          additional events for this pathway.
        - None: failure. The specialist has already called mark_failed
          before returning. The dispatch wrapper does not write
          additional events for this pathway.
        - ExternalProductionHandle: NEW in Phase 34. The specialist
          dispatched to an external service and is returning a handle
          for the polling loop to track. The dispatch wrapper records
          an external_production_dispatched event (creating the
          ExternalProductionRecord) and transitions the render_jobs
          row to 'awaiting_external'. The specialist MUST NOT have
          called mark_completed or mark_failed; the row is still
          'dispatched' at the moment of return.

        The job_id, engagement_id, confirmed_shape_event_ref, and
        triggered_by are passed for the specialist's use; the
        declared_render_type_ref is read from self (set in __init__).
        """
        ...

RenderSpecialist.poll_external_work — new method:


    async def poll_external_work(
        self,
        *,
        render_job_id: UUID,
        engagement_id: UUID,
        external_job_id: str,
        db: AsyncSession,
    ) -> RenderEvent | ExternalProductionHandle | ExternalProductionFailure:
        """Called by the engine's polling loop at the interval declared
        in the most recent ExternalProductionHandle. Returns one of:

        - RenderEvent: production completed successfully. The
          specialist MUST have called _append_render_produced before
          returning, in the same fashion as the synchronous pathway.
          The polling loop writes only the external_production_resolved
          event (success outcome) and transitions the render_jobs row
          to 'completed' with mark_completed.
        - ExternalProductionHandle: still in progress. The polling
          loop writes an external_production_polled event with the
          (possibly updated) interval and progress_hint; the
          ExternalProductionRecord materializes the new values.
        - ExternalProductionFailure: external service reported
          failure. The specialist MUST NOT have called mark_failed;
          the polling loop writes external_production_resolved
          (failure outcome) and transitions the render_jobs row to
          'failed' with mark_failed.

        IDEMPOTENCE CONTRACT (load-bearing for restart safety). The
        engine may call this method again for the same external_job_id
        after a process restart, even if a prior call was already in
        flight when the prior process died. Specialists MUST tolerate
        being asked the same question twice. Treat each invocation as
        a query of the external service's current state, not a
        state-changing action. If the specialist needs to take side
        effects (e.g., download a completed artifact), the specialist
        should make those side effects idempotent themselves (e.g.,
        check for an existing local artifact before downloading).

        Specialists that never return ExternalProductionHandle from
        produce_render do not need to override this method. The default
        implementation raises NotImplementedError; the polling loop
        will never call it for synchronous specialists.
        """
        raise NotImplementedError(
            "Specialist does not support external polling. "
            "Override poll_external_work if produce_render returns "
            "ExternalProductionHandle."
        )

3.2 New MemoryObject — ExternalProductionRecord

Tracks in-flight external work. Standard MemoryObject pattern: canonical persistence in memory_events, generic materialized view in current_memory_objects, plus a dedicated operational view (external_production_records_view) for polling-loop queries.


# src/loomworks/engagement/types.py — addition to discriminated union

class ExternalProductionRecord(MemoryObject):
    object_type: Literal["external_production_record"] = "external_production_record"

    # Linkage
    render_job_id: UUID                              # The render_jobs row this tracks
    declared_render_type_ref: MemoryRef              # Used to resolve the specialist
                                                     # via get_render_specialist(...)
    specialist_ref: ActorRef                         # The specialist that dispatched the work
                                                     # (forensic / audit only; lookup uses declared_render_type_ref)

    # External identity
    external_job_id: str                             # Specialist-defined external identifier (opaque to engine)

    # Polling state (mutated through external_production_polled events)
    polling_interval_seconds: int                    # Current interval; specialist may update on each poll
    progress_hint: str | None = None                 # Last reported progress (specialist-defined free text;
                                                     # logged at INFO; intended for eventual operator visibility
                                                     # via the Operator Layer; specialists should write strings
                                                     # safe for non-technical operator readers)

    # Lifecycle timestamps
    dispatched_at: datetime                          # From the external_production_dispatched event
    last_polled_at: datetime | None = None           # From the latest external_production_polled or _resolved event
    completed_at: datetime | None = None             # From the external_production_resolved event

    # Resolution outcome (populated when external_production_resolved arrives)
    render_event_object_id: UUID | None = None       # Set on success-resolved; the produced RenderEvent's
                                                     # object_id (the specialist appended the RenderEvent
                                                     # itself; this records the link for downstream queries)
    failure_detail: str | None = None                # Set on failure-resolved
    error_code: str | None = None                    # Set on failure-resolved

Why no state field. The job row (render_jobs.status) is the source of truth for lifecycle. The ExternalProductionRecord carries the polling-specific fields and the success/failure resolution payload. Lifecycle state is derivable from a join: (completed_at IS NULL) → polling, (completed_at IS NOT NULL AND failure_detail IS NULL) → completed, otherwise failed. Leaving state off the record removes one place where the record and the job row could diverge.

Per-event-kind payload schema (the load-bearing detail for projector implementation):

| Field | Written to (event_kind) | Source | Read at view |
|---|---|---|---|
| render_job_id | external_production_dispatched payload | passed by dispatch wrapper | view column (immutable) |
| declared_render_type_ref | external_production_dispatched payload | passed by dispatch wrapper | view column (immutable) |
| specialist_ref | external_production_dispatched payload | passed by dispatch wrapper | view column (immutable) |
| external_job_id | external_production_dispatched payload | from the handle | view column (immutable) |
| polling_interval_seconds | external_production_dispatched payload (initial); external_production_polled payload (updates) | from each handle | view column (latest value) |
| progress_hint | external_production_dispatched payload (initial); external_production_polled payload (updates) | from each handle | view column (latest value) |
| dispatched_at | external_production_dispatched event timestamp | event row's created_at | view column (immutable) |
| last_polled_at | derived: latest event timestamp from the union of external_production_polled and external_production_resolved events for this object | projector recomputes on each event | view column (latest) |
| completed_at | external_production_resolved event timestamp | event row's created_at | view column (set once) |
| render_event_object_id | external_production_resolved payload (success outcome only) | passed by polling loop | view column (set once) |
| failure_detail | external_production_resolved payload (failure outcome only) | from ExternalProductionFailure.failure_detail | view column (set once) |
| error_code | external_production_resolved payload (failure outcome only) | from ExternalProductionFailure.error_code | view column (set once) |
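
Read row by row, the schema defines a fold from one object's event stream to one view row. A minimal sketch, assuming events arrive in log order and that each memory_events row exposes event_kind, created_at, and a dict payload; EventRow and project_record are hypothetical names, not the engine's projector API. Immutable fields come only from the dispatched event, polled events overwrite the latest-value fields, last_polled_at tracks the newest polled or resolved timestamp, and resolution fields are set once.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

IMMUTABLE_FIELDS = (
    "render_job_id",
    "declared_render_type_ref",
    "specialist_ref",
    "external_job_id",
)


@dataclass
class EventRow:
    """Hypothetical shape of one memory_events row for this object."""
    event_kind: str
    created_at: datetime
    payload: dict[str, Any] = field(default_factory=dict)


def _advance_last_polled(view: dict[str, Any], ts: datetime) -> None:
    current = view["last_polled_at"]
    view["last_polled_at"] = ts if current is None else max(current, ts)


def project_record(events: list[EventRow]) -> dict[str, Any]:
    """Folds one object's events (assumed in log order) into a view row."""
    view: dict[str, Any] = {
        "last_polled_at": None,
        "completed_at": None,
        "render_event_object_id": None,
        "failure_detail": None,
        "error_code": None,
    }
    for ev in events:
        if view["completed_at"] is not None:
            # Resolution fields are set once; nothing may follow resolved.
            raise ValueError("record already resolved")
        p = ev.payload
        if ev.event_kind == "external_production_dispatched":
            for name in IMMUTABLE_FIELDS:
                view[name] = p[name]
            view["polling_interval_seconds"] = p["polling_interval_seconds"]
            view["progress_hint"] = p.get("progress_hint")
            view["dispatched_at"] = ev.created_at
        elif ev.event_kind == "external_production_polled":
            view["polling_interval_seconds"] = p["polling_interval_seconds"]
            view["progress_hint"] = p.get("progress_hint")
            _advance_last_polled(view, ev.created_at)
        elif ev.event_kind == "external_production_resolved":
            view["completed_at"] = ev.created_at
            _advance_last_polled(view, ev.created_at)
            # Success carries render_event_object_id; failure carries
            # failure_detail and (optionally) error_code.
            view["render_event_object_id"] = p.get("render_event_object_id")
            view["failure_detail"] = p.get("failure_detail")
            view["error_code"] = p.get("error_code")
        else:
            raise ValueError(f"unexpected event kind {ev.event_kind!r}")
    return view
```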

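Three event kinds:

- external_production_dispatched: written by the dispatch wrapper (§3.6) when produce_render returns an ExternalProductionHandle. Creates the ExternalProductionRecord and carries its immutable linkage fields plus the initial polling_interval_seconds and progress_hint.
- external_production_polled: written by the polling loop when poll_external_work returns an ExternalProductionHandle, or when the poll attempt itself raises. Carries the current polling_interval_seconds and progress_hint and advances last_polled_at.
- external_production_resolved: written by the polling loop when poll_external_work returns a RenderEvent (success outcome, carrying render_event_object_id) or an ExternalProductionFailure (failure outcome, carrying failure_detail and error_code), and when the loop fails the job itself (specialist no longer registered, unexpected return type). Sets completed_at; the record is no longer due for polling.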

3.3 render_jobs status extension

The render_jobs.status column has a CHECK constraint (ck_render_jobs_status, defined in migration 0022) enumerating ('queued', 'dispatched', 'completed', 'failed'). Migration 0052 (§4) drops and recreates this constraint with 'awaiting_external' added.

The full set after Phase 34: ('queued', 'dispatched', 'awaiting_external', 'completed', 'failed').
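
The constraint change can be exercised locally with a stand-in. This sketch uses Python's built-in sqlite3 in place of Postgres and creates the table fresh rather than performing migration 0052's drop-and-recreate; RENDER_JOB_STATUSES, status_check_sql, make_demo_db, and try_insert are hypothetical helpers. It shows the post-Phase 34 constraint admitting 'awaiting_external' while still rejecting unknown statuses.

```python
import sqlite3

RENDER_JOB_STATUSES = (
    "queued",
    "dispatched",
    "awaiting_external",  # new in Phase 34
    "completed",
    "failed",
)


def status_check_sql(statuses: tuple[str, ...] = RENDER_JOB_STATUSES) -> str:
    # The statuses are fixed literals defined above, not user input,
    # so inlining them into DDL is safe here.
    quoted = ", ".join(f"'{s}'" for s in statuses)
    return f"CONSTRAINT ck_render_jobs_status CHECK (status IN ({quoted}))"


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE render_jobs ("
        "id INTEGER PRIMARY KEY, "
        "status TEXT NOT NULL, "
        f"{status_check_sql()})"
    )
    return conn


def try_insert(conn: sqlite3.Connection, status: str) -> bool:
    """Returns True if the CHECK constraint accepts the status."""
    try:
        conn.execute("INSERT INTO render_jobs (status) VALUES (?)", (status,))
        return True
    except sqlite3.IntegrityError:
        return False
```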

The status transitions:


queued → dispatched
dispatched → completed                   (specialist returned RenderEvent — synchronous success)
dispatched → failed                      (specialist returned None — synchronous failure)
dispatched → awaiting_external           (specialist returned ExternalProductionHandle — new in Phase 34)
awaiting_external → awaiting_external    (poll returned ExternalProductionHandle — loop continues)
awaiting_external → completed            (poll returned RenderEvent — external success)
awaiting_external → failed               (poll returned ExternalProductionFailure — external failure)
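
The transition list above is small enough to encode as a lookup table. A minimal sketch; ALLOWED_TRANSITIONS, IllegalTransitionError, and check_transition are hypothetical names. The CR does not say whether the engine enforces transitions in code; this only makes the allowed edges explicit and checkable.

```python
ALLOWED_TRANSITIONS: dict[str, frozenset[str]] = {
    "queued": frozenset({"dispatched"}),
    "dispatched": frozenset({"completed", "failed", "awaiting_external"}),
    "awaiting_external": frozenset({"awaiting_external", "completed", "failed"}),
    "completed": frozenset(),  # terminal
    "failed": frozenset(),     # terminal
}


class IllegalTransitionError(ValueError):
    """Raised for a render_jobs status change the table does not allow."""


def check_transition(current: str, new: str) -> None:
    if new not in ALLOWED_TRANSITIONS.get(current, frozenset()):
        raise IllegalTransitionError(
            f"render_jobs status {current!r} -> {new!r} is not allowed"
        )
```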

3.4 The polling loop

A background asyncio task started by the FastAPI lifespan. Walks external_production_records_view for rows that are not yet resolved (completed_at IS NULL) and whose next-poll deadline has elapsed: last_polled_at + polling_interval_seconds × 1 second, or dispatched_at + polling_interval_seconds × 1 second for never-polled rows. Opens a fresh session per poll. Polls run concurrently via asyncio.gather.


# src/loomworks/engagement/external_polling.py — new module

import asyncio
import logging
from datetime import datetime, timezone
from typing import TYPE_CHECKING

from sqlalchemy.ext.asyncio import async_sessionmaker, AsyncEngine, AsyncSession

from loomworks.agents.render_dispatch import get_render_specialist
from loomworks.agents.render_specialist import (
    ExternalProductionFailure,
    ExternalProductionHandle,
    UnexpectedSpecialistResultError,
)
from loomworks.engagement.types import RenderEvent

if TYPE_CHECKING:
    from loomworks.engagement.types import ExternalProductionRecord

_logger = logging.getLogger(__name__)

LOOP_TICK_SECONDS = 5  # How often the loop wakes to check for due polls.
                       # Fast enough that a 30-second polling interval is approximately
                       # honored; slow enough that the loop is not a hot tick.


class ExternalPollingLoop:
    """Background task that polls in-flight external work.

    On startup: starts the loop. The loop's first tick reads
    external_production_records_view for unresolved records and
    polls any that are due.
    On each tick: walks unresolved records whose deadline has elapsed,
    polls each in its own session via asyncio.gather.
    On shutdown: signals the loop to stop, waits up to timeout for
    in-flight polls to complete, then cancels.
    """

    def __init__(
        self,
        *,
        db_engine: AsyncEngine,
        session_factory: async_sessionmaker,
    ):
        self._db_engine = db_engine
        self._session_factory = session_factory
        self._task: asyncio.Task | None = None
        self._stopped = asyncio.Event()

    async def start(self) -> None:
        """Called from the FastAPI lifespan during startup."""
        _logger.info(
            "external_polling_loop.started tick_seconds=%d",
            LOOP_TICK_SECONDS,
        )
        self._task = asyncio.create_task(self._run())

    async def stop(self, *, timeout: float = 30.0) -> None:
        """Called from the FastAPI lifespan during shutdown."""
        self._stopped.set()
        if self._task:
            try:
                await asyncio.wait_for(self._task, timeout=timeout)
            except asyncio.TimeoutError:
                self._task.cancel()
        _logger.info("external_polling_loop.stopped")

    async def _run(self) -> None:
        """The loop. Wakes every LOOP_TICK_SECONDS, processes due polls."""
        while not self._stopped.is_set():
            try:
                await self._tick()
            except Exception as exc:
                # A tick failure should not stop the loop. Log and continue.
                _logger.exception(
                    "external_polling_loop.tick_failed",
                    exc_info=exc,
                )
            try:
                await asyncio.wait_for(
                    self._stopped.wait(), timeout=LOOP_TICK_SECONDS
                )
            except asyncio.TimeoutError:
                pass  # Normal — wake up and tick

    async def _tick(self) -> None:
        """One pass through unresolved external_production_records.

        Reads due records using a brief session, then polls each in
        its own session via asyncio.gather. Per-poll session boundaries
        ensure that one slow poll cannot block another, and one poll's
        failure cannot roll back another poll's writes.
        """
        async with self._session_factory() as db:
            due_records = await self._get_due_records(db)

        if not due_records:
            return

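        results = await asyncio.gather(
            *(self._poll_one(record) for record in due_records),
            return_exceptions=True,  # A raised exception in one poll
                                     # must not propagate and abort the others
        )
        # return_exceptions=True hands exceptions back as results instead
        # of raising them; log them so a failed poll is not silently lost.
        for record, outcome in zip(due_records, results):
            if isinstance(outcome, BaseException):
                _logger.error(
                    "external_polling_loop.poll_failed render_job_id=%s exc=%r",
                    record.render_job_id,
                    outcome,
                )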

    async def _get_due_records(
        self, db: AsyncSession
    ) -> list["ExternalProductionRecord"]:
        """Returns ExternalProductionRecord rows in
        external_production_records_view that are unresolved
        (completed_at IS NULL) and whose next-poll deadline has
        elapsed:
            COALESCE(last_polled_at, dispatched_at)
              + polling_interval_seconds * INTERVAL '1 second'
              <= NOW()
        """
        ...  # Implementation: SELECT against external_production_records_view

    async def _poll_one(self, record: "ExternalProductionRecord") -> None:
        """Polls one record. Opens its own session. Each session
        commits independently; one poll's failure does not affect
        the others.
        """
        async with self._session_factory() as db:
            async with db.begin():
                await self._poll_one_inner(record, db)

    async def _poll_one_inner(
        self, record: "ExternalProductionRecord", db: AsyncSession
    ) -> None:
        """Resolves the specialist by (engagement_id, declared_render_type_id),
        calls poll_external_work, acts on the response.
        """
        specialist = get_render_specialist(
            engagement_id=record.engagement_id,
            declared_render_type_id=record.declared_render_type_ref.id,
        )
        if specialist is None:
            await self._mark_failed(
                record,
                failure_detail=(
                    f"Specialist for declared_render_type "
                    f"{record.declared_render_type_ref.id} no longer registered"
                ),
                error_code="specialist_unregistered",
                db=db,
            )
            return

        try:
            # Run the specialist's call inside a SAVEPOINT. If its own DB
            # work fails, only the savepoint rolls back, so the outer
            # transaction is still usable for the bookkeeping write below.
            async with db.begin_nested():
                result = await specialist.poll_external_work(
                    render_job_id=record.render_job_id,
                    engagement_id=record.engagement_id,
                    external_job_id=record.external_job_id,
                    db=db,
                )
        except Exception as exc:
            # Polling itself failed. Different from ExternalProductionFailure,
            # which is the external service reporting failure. Log and
            # leave the record for the next tick. last_polled_at is updated
            # to prevent immediate re-polling.
            _logger.warning(
                "external_polling_loop.poll_raised "
                "render_job_id=%s declared_render_type_id=%s exc=%s",
                record.render_job_id,
                record.declared_render_type_ref.id,
                exc,
            )
            await self._update_polled_timestamp_only(record, db=db)
            return

        if isinstance(result, RenderEvent):
            await self._mark_completed(record, render_event=result, db=db)
        elif isinstance(result, ExternalProductionHandle):
            await self._mark_polling_continues(
                record,
                polling_interval_seconds=result.polling_interval_seconds,
                progress_hint=result.progress_hint,
                db=db,
            )
        elif isinstance(result, ExternalProductionFailure):
            await self._mark_failed(
                record,
                failure_detail=result.failure_detail,
                error_code=result.error_code,
                db=db,
            )
        else:
            # Specialist returned an unexpected type. Defensive fail.
            await self._mark_failed(
                record,
                failure_detail=(
                    f"Specialist returned unexpected type "
                    f"{type(result).__name__}; expected RenderEvent, "
                    f"ExternalProductionHandle, or ExternalProductionFailure"
                ),
                error_code="poll_returned_unexpected_type",
                db=db,
            )

    async def _mark_completed(
        self,
        record: "ExternalProductionRecord",
        *,
        render_event: RenderEvent,
        db: AsyncSession,
    ) -> None:
        """The specialist's poll_external_work has appended the
        RenderEvent itself (per the specialist contract — see §3.1
        poll_external_work docstring). This helper writes only the
        external_production_resolved event (success outcome) and
        transitions the render_jobs row to 'completed' via
        mark_completed.
        """
        # 1. Append external_production_resolved event with success payload
        #    (render_event_object_id = render_event.object_id)
        # 2. Call mark_completed(job_id=record.render_job_id,
        #                        render_event_object_id=render_event.object_id,
        #                        db=db)
        ...

    async def _mark_failed(
        self,
        record: "ExternalProductionRecord",
        *,
        failure_detail: str,
        error_code: str | None,
        db: AsyncSession,
    ) -> None:
        """Writes external_production_resolved event with failure
        payload and transitions the render_jobs row to 'failed' via
        mark_failed.
        """
        # 1. Append external_production_resolved event with failure payload
        # 2. Call mark_failed(job_id=record.render_job_id,
        #                     error=failure_detail, db=db)
        ...

    async def _mark_polling_continues(
        self,
        record: "ExternalProductionRecord",
        *,
        polling_interval_seconds: int,
        progress_hint: str | None,
        db: AsyncSession,
    ) -> None:
        """Writes external_production_polled event. Projector
        materializes the new polling_interval_seconds and progress_hint
        on the view; last_polled_at is recomputed from the new event's
        timestamp.
        """
        # Append external_production_polled event with payload:
        #   { "polling_interval_seconds": polling_interval_seconds,
        #     "progress_hint": progress_hint }
        ...

    async def _update_polled_timestamp_only(
        self,
        record: "ExternalProductionRecord",
        *,
        db: AsyncSession,
    ) -> None:
        """For poll attempts that raised: write an
        external_production_polled event with no field changes (same
        polling_interval_seconds, same progress_hint). This advances
        last_polled_at via the projector's recomputation, preventing
        immediate re-polling on the next tick.
        """
        await self._mark_polling_continues(
            record,
            polling_interval_seconds=record.polling_interval_seconds,
            progress_hint=record.progress_hint,
            db=db,
        )
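
The loop's control flow (a stop event that doubles as the tick timer, and gather with return_exceptions=True isolating failures) can be exercised without a database. A minimal sketch: ToyPollingLoop is hypothetical and replaces records and sessions with plain coroutine functions. A failing poll comes back as a result, is counted and logged, and does not stop its siblings or later ticks; stop() sets the event, which wakes the pending wait_for immediately instead of waiting out the tick.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable

_logger = logging.getLogger("toy_polling_loop")


class ToyPollingLoop:
    """Hypothetical miniature of ExternalPollingLoop: no DB, no records."""

    def __init__(
        self, polls: list[Callable[[], Awaitable[None]]], tick_seconds: float = 0.01
    ) -> None:
        self._polls = polls
        self._tick_seconds = tick_seconds
        self._stopped = asyncio.Event()
        self._task: asyncio.Task | None = None
        self.ticks = 0
        self.failures = 0

    async def start(self) -> None:
        self._task = asyncio.create_task(self._run())

    async def stop(self, timeout: float = 1.0) -> None:
        self._stopped.set()  # wakes _run's wait_for immediately
        if self._task is not None:
            try:
                await asyncio.wait_for(self._task, timeout=timeout)
            except asyncio.TimeoutError:
                pass  # wait_for has already cancelled the task

    async def _run(self) -> None:
        while not self._stopped.is_set():
            await self._tick()
            try:
                await asyncio.wait_for(self._stopped.wait(), timeout=self._tick_seconds)
            except asyncio.TimeoutError:
                pass  # normal: tick interval elapsed

    async def _tick(self) -> None:
        self.ticks += 1
        results = await asyncio.gather(
            *(poll() for poll in self._polls), return_exceptions=True
        )
        for result in results:
            if isinstance(result, BaseException):
                self.failures += 1
                _logger.warning("poll raised: %r", result)
```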

3.5 Lifespan integration

src/loomworks/api/app.py's _lifespan is extended to start and stop the polling loop alongside the existing BackgroundAgentRunner. The polling loop receives the engine and session factory directly; it does not require a separate "agent registry" attached to app.state (the registry it needs is the existing module-global _SPECIALIST_REGISTRY in render_dispatch.py, accessed via get_render_specialist).


@asynccontextmanager
async def _lifespan(app: FastAPI):
    settings = app.state.settings
    async with _create_db_engine(settings) as engine:
        app.state.db_engine = engine
        app.state.session_factory = _make_session_factory(engine)
        async with AsyncSession(engine) as session:
            await ensure_administrative_engagement(session)
            await ensure_seed_requirements(session)
            await session.commit()
        app.state.agent_runner = BackgroundAgentRunner(
            session_factory=app.state.session_factory,
        )

        # NEW in Phase 34
        from loomworks.engagement.external_polling import ExternalPollingLoop
        app.state.external_polling_loop = ExternalPollingLoop(
            db_engine=engine,
            session_factory=app.state.session_factory,
        )
        await app.state.external_polling_loop.start()

        try:
            yield
        finally:
            # NEW in Phase 34 — stop polling loop before agent runner shutdown
            await app.state.external_polling_loop.stop(timeout=30)
            await app.state.agent_runner.shutdown(timeout=30)

Pre-flight note. The exact app.state field names and the BackgroundAgentRunner construction may differ from the sketch; CC inspects src/loomworks/api/app.py at Step 0 and confirms (1) the session factory's name on app.state, (2) whether BackgroundAgentRunner already takes a session_factory kwarg, (3) the precise order in which existing startup steps run. The functional contract is what matters: the polling loop has access to the session factory and starts after app.state carries everything it needs.

3.6 Render dispatch — handling the new return shape

The current dispatch path is fire-and-forget through runner.dispatch(...), which returns None. The specialist commits its own success path (synchronous: _append_render_produced then mark_completed) and its own failure path (synchronous: mark_failed then return None) from inside the dispatched task. The dispatch helper (_enqueue_and_dispatch in render_dispatch.py) does not inspect the specialist's return value.

Phase 34 introduces a third return type — ExternalProductionHandle — that requires the engine to write an event (external_production_dispatched) and transition the job-row status to awaiting_external before the specialist's task completes. This work cannot live in _enqueue_and_dispatch (the dispatch helper has already returned by the time the specialist runs) and cannot live in the specialist itself (the specialist returning ExternalProductionHandle should not have to know about the engine's projector machinery).

The structurally correct location is a specialist task wrapper that runs inside runner.dispatch's task closure, calls produce_render, inspects the return value, and either lets the existing pathways stand (synchronous success or synchronous failure) or writes the dispatch event and transitions the row (external dispatch). The wrapper is what gets passed to runner.dispatch(agent_fn=...); the specialist's produce_render is called by the wrapper.
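
The timing argument can be seen in a toy model. A minimal sketch, assuming a fire-and-forget runner whose dispatch schedules a task and returns None; ToyRunner, Handle, run_wrapped, and enqueue_and_dispatch are hypothetical names, not the engine's API. The enqueueing helper returns before the specialist has run, so only code inside the task (the wrapper) ever sees the return value.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class Handle:
    """Hypothetical stand-in for ExternalProductionHandle."""
    external_job_id: str


class ToyRunner:
    """Hypothetical fire-and-forget runner; call dispatch inside a running loop."""

    def __init__(self) -> None:
        self.tasks: list[asyncio.Task] = []

    def dispatch(self, *, agent_fn: Callable[[], Awaitable[None]]) -> None:
        self.tasks.append(asyncio.create_task(agent_fn()))
        # Returns None: the caller never sees agent_fn's result.


async def run_wrapped(produce: Callable[[], Awaitable[object]], outcomes: list) -> None:
    """Runs inside the task, so it is the only place the result is visible."""
    result = await produce()
    if isinstance(result, Handle):
        # Where the real wrapper records external_production_dispatched
        # and marks the job awaiting_external.
        outcomes.append(("awaiting_external", result.external_job_id))
    else:
        outcomes.append(("already_finalized", result))


def enqueue_and_dispatch(runner: ToyRunner, produce, outcomes: list) -> int:
    runner.dispatch(agent_fn=lambda: run_wrapped(produce, outcomes))
    return len(outcomes)  # the specialist has not run yet
```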


# src/loomworks/agents/render_dispatch.py — addition

async def _run_specialist_with_external_dispatch_handling(
    *,
    specialist: "RenderSpecialistLike",
    job_id: UUID,
    engagement_id: UUID,
    confirmed_shape_event_ref: MemoryRef,
    declared_render_type_ref: MemoryRef,
    triggered_by: ActorRef,
    trigger: str,
    db: AsyncSession,
) -> None:
    """Wrapper that runs inside the BackgroundAgentRunner's task
    closure. Calls specialist.produce_render and acts on the return
    value:

    - RenderEvent: synchronous success. The specialist already wrote
      _append_render_produced and called mark_completed. The wrapper
      does nothing further.
    - None: synchronous failure. The specialist already called
      mark_failed. The wrapper does nothing further.
    - ExternalProductionHandle: the specialist dispatched externally.
      The wrapper writes the external_production_dispatched event
      (creating the ExternalProductionRecord MemoryObject) and
      transitions the render_jobs row to 'awaiting_external'. The
      polling loop will pick up the record on its next tick.
    - Anything else: defensive failure. The wrapper calls mark_failed
      with an UnexpectedSpecialistResultError message.
    """
    try:
        result = await specialist.produce_render(
            job_id=job_id,
            engagement_id=engagement_id,
            confirmed_shape_event_ref=confirmed_shape_event_ref,
            triggered_by=triggered_by,
            trigger=trigger,
            db=db,
        )
    except Exception as exc:
        # The specialist raised. Specialists normally catch their own
        # errors, call mark_failed, and return None; this branch catches
        # anything that escapes that handling.
        await mark_failed(
            job_id=job_id,
            error=f"specialist raised: {exc}",
            db=db,
        )
        return

    if isinstance(result, RenderEvent):
        # Synchronous success — specialist already finalized. Nothing to do.
        return
    elif result is None:
        # Synchronous failure — specialist already finalized. Nothing to do.
        return
    elif isinstance(result, ExternalProductionHandle):
        # External dispatch — write the dispatched event and transition
        # the job row.
        await _record_external_production_dispatched(
            job_id=job_id,
            engagement_id=engagement_id,
            declared_render_type_ref=declared_render_type_ref,
            specialist_actor=specialist.agent_actor,
            handle=result,
            db=db,
        )
        await mark_awaiting_external(job_id=job_id, db=db)
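    else:
        # Defensive: produce_render returned a type outside its contract.
        err = UnexpectedSpecialistResultError(
            f"Specialist produce_render returned unexpected type "
            f"{type(result).__name__}; expected RenderEvent, "
            f"ExternalProductionHandle, or None"
        )
        await mark_failed(job_id=job_id, error=str(err), db=db)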
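The wrapper's branching can be exercised without a database. The following self-contained simulation mirrors its decision logic against an in-memory store. RenderEvent, ExternalProductionHandle, FakeJobStore, and run_with_external_dispatch_handling are hypothetical stand-ins for illustration only. The real types live in render_specialist.py, and the real helpers write events and update render_jobs.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class RenderEvent:  # hypothetical stand-in
    object_id: str


@dataclass
class ExternalProductionHandle:  # hypothetical stand-in
    external_job_id: str
    polling_interval_seconds: float


@dataclass
class FakeJobStore:
    """In-memory stand-in for render_jobs plus the dispatched-event log."""
    status: dict = field(default_factory=dict)
    dispatched_events: list = field(default_factory=list)


async def run_with_external_dispatch_handling(produce, job_id, store):
    """Mirrors the wrapper's branching against the fake store."""
    try:
        result = await produce()
    except Exception as exc:
        store.status[job_id] = f"failed: specialist raised: {exc}"
        return
    if isinstance(result, RenderEvent) or result is None:
        return  # The specialist already finalized the job itself.
    if isinstance(result, ExternalProductionHandle):
        store.dispatched_events.append((job_id, result.external_job_id))
        store.status[job_id] = "awaiting_external"
        return
    store.status[job_id] = f"failed: unexpected {type(result).__name__}"


async def demo() -> FakeJobStore:
    store = FakeJobStore()

    async def external():
        return ExternalProductionHandle("ext-1", 5.0)

    async def unexpected():
        return 42

    async def raises():
        raise RuntimeError("network down")

    async def synchronous():
        return RenderEvent("render-1")

    await run_with_external_dispatch_handling(external, "job-a", store)
    await run_with_external_dispatch_handling(unexpected, "job-b", store)
    await run_with_external_dispatch_handling(raises, "job-c", store)
    await run_with_external_dispatch_handling(synchronous, "job-d", store)
    return store


store = asyncio.run(demo())
```

Only the external-dispatch and error branches touch the store. The synchronous job ("job-d") is left entirely to the specialist, which is what makes the wrapper a pass-through for existing specialists.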

_enqueue_and_dispatch is amended to dispatch the wrapper rather than produce_render directly:


# Before:
await self.runner.dispatch(
    agent_fn=specialist.produce_render,
    job_id=job_id,
    engagement_id=self.engagement_id,
    confirmed_shape_event_ref=confirmed_shape_event_ref,
    triggered_by=triggered_by,
    trigger=trigger,
    db=db,
)

# After:
await self.runner.dispatch(
    agent_fn=_run_specialist_with_external_dispatch_handling,
    specialist=specialist,
    job_id=job_id,
    engagement_id=self.engagement_id,
    confirmed_shape_event_ref=confirmed_shape_event_ref,
    declared_render_type_ref=declared_render_type_ref,
    triggered_by=triggered_by,
    trigger=trigger,
    db=db,
)
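The amendment relies on runner.dispatch forwarding its keyword arguments to agent_fn unchanged, so specialist and declared_render_type_ref can travel as ordinary kwargs. The sketch below uses a hypothetical MiniRunner, not the real BackgroundAgentRunner, whose internals this CR does not restate. It illustrates the fire-and-forget shape: dispatch schedules the task and returns None, so any inspection of produce_render's result has to happen inside the task.

```python
import asyncio


class MiniRunner:
    """Hypothetical fire-and-forget runner; illustrative only."""

    def __init__(self) -> None:
        self._tasks: set = set()

    async def dispatch(self, *, agent_fn, **kwargs) -> None:
        # Schedule agent_fn(**kwargs) and return immediately; the caller
        # never sees agent_fn's return value.
        task = asyncio.create_task(agent_fn(**kwargs))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def drain(self) -> None:
        # Test helper: wait for every in-flight task.
        await asyncio.gather(*self._tasks)


seen: list = []


async def fake_specialist(*, job_id):
    return f"handle-for-{job_id}"


async def wrapper(*, specialist, job_id, declared_render_type_ref):
    # The wrapper, not the dispatcher, is the only place the result is visible.
    result = await specialist(job_id=job_id)
    seen.append((job_id, declared_render_type_ref, result))


async def main() -> None:
    runner = MiniRunner()
    returned = await runner.dispatch(
        agent_fn=wrapper,
        specialist=fake_specialist,
        job_id="job-1",
        declared_render_type_ref="render-type-7",
    )
    assert returned is None  # dispatch is fire-and-forget
    await runner.drain()


asyncio.run(main())
```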

Two new helpers added to render_dispatch.py (sketched; final form during Step 4):


async def _record_external_production_dispatched(
    *,
    job_id: UUID,
    engagement_id: UUID,
    declared_render_type_ref: MemoryRef,
    specialist_actor: ActorRef,
    handle: ExternalProductionHandle,
    db: AsyncSession,
) -> "ExternalProductionRecord":
    """Appends the external_production_dispatched event, creating
    the ExternalProductionRecord MemoryObject. Returns the persisted
    record."""
    ...


async def mark_awaiting_external(*, job_id: UUID, db: AsyncSession) -> None:
    """Transitions render_jobs.status from 'dispatched' to
    'awaiting_external'. Mirrors mark_dispatched / mark_completed /
    mark_failed in convention."""
    ...

mark_awaiting_external lives alongside the existing mark_* helpers, in the same module as mark_completed and mark_failed (likely src/loomworks/agents/render_dispatch.py or src/loomworks/engagement/render.py; pre-flight confirms). It is the only new lifecycle helper.
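The lifecycle the new helper slots into can be summarized as a transition table. The sketch below assumes only the transitions implied by this CR: mark_awaiting_external moves dispatched to awaiting_external, and the polling loop resolves awaiting_external to completed or failed. ALLOWED_TRANSITIONS, IllegalTransition, and transition() are hypothetical names, and the real mark_* helpers may not enforce transitions this way.

```python
from typing import Literal

RenderJobStatus = Literal[
    "queued", "dispatched", "awaiting_external", "completed", "failed"
]

# Hypothetical table of the transitions this CR relies on.
ALLOWED_TRANSITIONS: dict[RenderJobStatus, frozenset[RenderJobStatus]] = {
    "queued": frozenset({"dispatched"}),
    "dispatched": frozenset({"completed", "failed", "awaiting_external"}),
    "awaiting_external": frozenset({"completed", "failed"}),
    "completed": frozenset(),  # terminal
    "failed": frozenset(),  # terminal
}


class IllegalTransition(ValueError):
    """Raised when a status change is not in ALLOWED_TRANSITIONS."""


def transition(current: RenderJobStatus, target: RenderJobStatus) -> RenderJobStatus:
    """Return target if current -> target is allowed, else raise."""
    if target not in ALLOWED_TRANSITIONS.get(current, frozenset()):
        raise IllegalTransition(f"{current} -> {target}")
    return target
```

In this model, awaiting_external is reachable only from dispatched, and both terminal states reject any further change.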

3.7 A test specialist for exercising the polling pathway

src/loomworks/agents/test_external_specialist.py — a deterministic test specialist that can be configured to return ExternalProductionHandle from produce_render and a configurable sequence of poll outcomes from poll_external_work. Used by all the polling-loop tests. Mirrors the StubRenderSpecialist pattern from Phase 10.


class StubExternalRenderSpecialist(RenderSpecialist):
    """A render specialist that simulates external dispatch + polling.

    Configured at instantiation with:
    - external_job_id: returned in the initial handle from produce_render
    - initial_polling_interval_seconds: returned in the initial handle
    - poll_outcomes: list of outcomes to return from successive
      poll_external_work calls; each outcome is one of:
          RenderEvent (the specialist also calls _append_render_produced
              before returning, mirroring the synchronous specialist)
          ExternalProductionHandle
          ExternalProductionFailure
          Exception (raised — exercises the poll_raised pathway)

    Each call to poll_external_work consumes one entry from poll_outcomes.
    Tests exercise specific scenarios by constructing the desired sequence.
    """
    ...
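The outcome-sequence mechanism can be sketched independently of the RenderSpecialist base class. OutcomeScript is a hypothetical helper, not part of the CR's contract, and plain strings stand in for handles and render events. It consumes one configured outcome per poll and raises when the outcome is an exception instance, which is how the stub would exercise the poll_raised pathway.

```python
import asyncio
from collections import deque
from typing import Any, Iterable


class OutcomeScript:
    """Hypothetical helper: replays a fixed sequence of poll outcomes."""

    def __init__(self, outcomes: Iterable[Any]) -> None:
        self._outcomes = deque(outcomes)
        self.calls = 0

    async def next_outcome(self) -> Any:
        self.calls += 1
        if not self._outcomes:
            # The test polled more often than it scripted; fail loudly.
            raise AssertionError(f"poll #{self.calls} has no scripted outcome")
        outcome = self._outcomes.popleft()
        if isinstance(outcome, Exception):
            raise outcome  # exercises the poll_raised pathway
        return outcome


async def demo() -> list:
    script = OutcomeScript(["handle-1", RuntimeError("upstream 503"), "render-event"])
    seen = []
    for _ in range(3):
        try:
            seen.append(await script.next_outcome())
        except RuntimeError as exc:
            seen.append(f"raised: {exc}")
    return seen


results = asyncio.run(demo())
```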

4. Migrations

The next available migration number is 0051 (verified at audit; highest applied is 0050).

4.1 Migration 0051 — external_production_records_view and event-kind support

The ExternalProductionRecord MemoryObject is canonical in memory_events; materialization into current_memory_objects happens through the existing projector infrastructure. This migration adds the dedicated operational view and indexes for polling-loop queries.

What the migration does:

  1. Creates external_production_records_view materialized from current_memory_objects filtered by object_type = 'external_production_record', with the polling-loop fields pre-flattened from the JSONB payload into typed columns. Pattern parallels Phase 9's shape_events_view and Phase 10's render_events_view.
  2. Indexes:
  3. The projector dispatch (in src/loomworks/memory/projector.py) is extended with apply_event_to_external_production_records_view for the three new event kinds (per the per-event-kind payload schema in §3.2). This is code work, not migration work, but it is wired in conjunction with this migration.
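The way the three event kinds fold into one view row can be sketched as a pure function. The payload keys used here (external_job_id, polling_interval_seconds, progress_hint, outcome, render_event_object_id, error_code, failure_detail) and the dispatched_at column are assumptions standing in for the §3.2 payload schema, which this section does not reproduce. The real apply_event_to_external_production_records_view writes to the database rather than to a dict.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from typing import Any


def apply_external_production_event(
    row: dict[str, Any] | None,
    kind: str,
    payload: dict[str, Any],
    at: datetime,
) -> dict[str, Any]:
    """Fold one event into a view row (hypothetical, in-memory)."""
    if kind == "external_production_dispatched":
        return {
            "external_job_id": payload["external_job_id"],
            "polling_interval_seconds": payload["polling_interval_seconds"],
            "progress_hint": payload.get("progress_hint"),
            "dispatched_at": at,
            "last_polled_at": None,
            "completed_at": None,
            "render_event_object_id": None,
            "error_code": None,
            "failure_detail": None,
        }
    if row is None:
        raise ValueError(f"{kind} arrived before external_production_dispatched")
    row = dict(row)  # return a new row; never mutate the caller's copy
    if kind == "external_production_polled":
        row["last_polled_at"] = at
        for key in ("polling_interval_seconds", "progress_hint"):
            if key in payload:
                row[key] = payload[key]
    elif kind == "external_production_resolved":
        # Assumed: completed_at marks any resolution, success or failure,
        # so the record stops being due for polling.
        row["completed_at"] = at
        if payload["outcome"] == "success":
            row["render_event_object_id"] = payload["render_event_object_id"]
        else:
            row["error_code"] = payload.get("error_code")
            row["failure_detail"] = payload.get("failure_detail")
    else:
        raise ValueError(f"unknown event kind: {kind}")
    return row


t0 = datetime(2026, 5, 3, 12, 0, tzinfo=timezone.utc)
row = apply_external_production_event(
    None, "external_production_dispatched",
    {"external_job_id": "ext-1", "polling_interval_seconds": 5}, t0,
)
row = apply_external_production_event(
    row, "external_production_polled",
    {"progress_hint": "40%"}, t0 + timedelta(seconds=5),
)
row = apply_external_production_event(
    row, "external_production_resolved",
    {"outcome": "success", "render_event_object_id": "obj-9"},
    t0 + timedelta(seconds=10),
)
```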

4.2 Migration 0052 — render_jobs.status CHECK constraint extension

DDL change confirmed at audit. The constraint ck_render_jobs_status exists (migrations/versions/0022_phase_10_render_events_view_and_render_jobs.py:170–173) and restricts status to ('queued', 'dispatched', 'completed', 'failed').

The migration drops ck_render_jobs_status and recreates it with the extended set:


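# 0052_phase_34_render_jobs_status_awaiting_external.py
# (revision / down_revision identifiers omitted here; follow the
# convention used by the existing migrations.)

from alembic import op


def upgrade() -> None:
    # Recreate the constraint with 'awaiting_external' added.
    op.drop_constraint("ck_render_jobs_status", "render_jobs", type_="check")
    op.create_check_constraint(
        "ck_render_jobs_status",
        "render_jobs",
        "status IN ('queued', 'dispatched', 'awaiting_external', 'completed', 'failed')",
    )


def downgrade() -> None:
    # PostgreSQL validates existing rows when the narrower constraint is
    # re-added, so this fails while any row still has status
    # 'awaiting_external'; such rows must be resolved first.
    op.drop_constraint("ck_render_jobs_status", "render_jobs", type_="check")
    op.create_check_constraint(
        "ck_render_jobs_status",
        "render_jobs",
        "status IN ('queued', 'dispatched', 'completed', 'failed')",
    )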
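Any SQL engine that enforces CHECK constraints shows why the constraint has to change before a job can enter awaiting_external. The sketch below uses Python's built-in sqlite3 purely for illustration. The substrate runs on PostgreSQL, and SQLite cannot alter a CHECK constraint in place, so the table is simply created once per status set. The accepts() helper is hypothetical.

```python
import sqlite3

OLD = "('queued', 'dispatched', 'completed', 'failed')"
NEW = "('queued', 'dispatched', 'awaiting_external', 'completed', 'failed')"


def accepts(status_set: str, status: str) -> bool:
    """Return True if a render_jobs-like table with this CHECK accepts status."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE render_jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
            f"CONSTRAINT ck_render_jobs_status CHECK (status IN {status_set}))"
        )
        conn.execute("INSERT INTO render_jobs (status) VALUES (?)", (status,))
        return True
    except sqlite3.IntegrityError:  # raised on CHECK constraint failure
        return False
    finally:
        conn.close()
```

Under the Phase 10 set, accepts(OLD, "awaiting_external") is False. Under the extended set, it is True.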

4.3 No third migration

Audit confirmed there is no CHECK constraint on memory_events.event_kind; the column is String(64) free text (src/loomworks/memory/events.py:62–77). The three new event kinds (external_production_dispatched, external_production_polled, external_production_resolved) are added as code-level constants only; no schema change is required.

The v0.1 migration §4.3 is removed in v0.2.
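Because event_kind is free text, the new kinds only need to exist in code. The sketch below assumes a module-level constants style. The actual location is wherever existing event-kind constants live (Step 2), and every name here is hypothetical. It also includes an import-time guard that each kind fits the String(64) column.

```python
from typing import Final, Literal, get_args

EXTERNAL_PRODUCTION_DISPATCHED: Final = "external_production_dispatched"
EXTERNAL_PRODUCTION_POLLED: Final = "external_production_polled"
EXTERNAL_PRODUCTION_RESOLVED: Final = "external_production_resolved"

# Literal type lets a type checker narrow code that handles only these kinds.
ExternalProductionEventKind = Literal[
    "external_production_dispatched",
    "external_production_polled",
    "external_production_resolved",
]

EXTERNAL_PRODUCTION_EVENT_KINDS: Final = frozenset(get_args(ExternalProductionEventKind))

# memory_events.event_kind is String(64); fail at import if a name outgrows it.
_EVENT_KIND_MAX_LENGTH = 64
if any(len(kind) > _EVENT_KIND_MAX_LENGTH for kind in EXTERNAL_PRODUCTION_EVENT_KINDS):
    raise ValueError("event kind exceeds memory_events.event_kind length")
```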


5. Substrate tests

New test file tests/test_external_polling.py. Eleven tests covering the contract, the polling lifecycle, failure modes, restart safety, and concurrency.

5.1 Contract tests

  1. test_specialist_returning_handle_creates_record — A specialist's produce_render returns ExternalProductionHandle. After dispatch completes, verify (a) the dispatch wrapper has called _record_external_production_dispatched, (b) an ExternalProductionRecord exists with the correct fields, (c) the render_jobs row has transitioned to awaiting_external via mark_awaiting_external.
  2. test_specialist_returning_render_event_does_not_create_record — A specialist returns RenderEvent (the existing synchronous behavior — specialist itself called _append_render_produced and mark_completed). After dispatch completes, verify no ExternalProductionRecord is created and the render_jobs row is completed. Existing synchronous specialists are unaffected.
  3. test_specialist_returning_none_does_not_create_record — A specialist returns None (synchronous failure pathway — specialist itself called mark_failed). After dispatch completes, verify no ExternalProductionRecord is created and the render_jobs row is failed.

5.2 Polling-loop happy-path tests

  1. test_polling_loop_calls_poll_at_interval — Configure a stub specialist with a 1-second polling interval and a sequence of two ExternalProductionHandle returns followed by a RenderEvent. Drive the loop with LOOP_TICK_SECONDS = 0.5. Verify the polling loop calls poll_external_work three times and the third call's result drives the render_jobs transition to completed. Verify two external_production_polled events were written and one external_production_resolved event was written.
  2. test_specialist_can_update_polling_interval — Configure a stub whose first poll returns ExternalProductionHandle(polling_interval_seconds=2) and second returns ExternalProductionHandle(polling_interval_seconds=10). Drive the loop with a fast tick. Verify the loop honors the updated interval: after the second poll, the record does not become due again until 10 seconds have elapsed, not 2.
  3. test_progress_hint_propagates_through_record — Configure a stub whose successive polls update progress_hint. Verify each external_production_polled event payload carries the new hint and the external_production_records_view row materializes the latest value.
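The interval behavior these tests pin down reduces to one rule: a record is due when it is unresolved and its interval has elapsed since the last poll, or since dispatch if it has never been polled. Below is a sketch of that rule as a pure function. It assumes the view exposes last_polled_at, polling_interval_seconds, and completed_at plus a dispatched_at column (an assumed name). DueCheckRow, next_due_at, and is_due are hypothetical. In the real loop, _get_due_records would express the same predicate as a query against external_production_records_view.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DueCheckRow:
    """Hypothetical projection of the view's polling columns."""
    dispatched_at: datetime
    last_polled_at: datetime | None
    polling_interval_seconds: float
    completed_at: datetime | None = None


def next_due_at(row: DueCheckRow) -> datetime:
    # Measure from the last poll, or from dispatch if never polled.
    anchor = row.last_polled_at if row.last_polled_at is not None else row.dispatched_at
    return anchor + timedelta(seconds=row.polling_interval_seconds)


def is_due(row: DueCheckRow, now: datetime) -> bool:
    # Resolved records are never due, whatever their interval says.
    return row.completed_at is None and next_due_at(row) <= now


t0 = datetime(2026, 5, 3, 12, 0, tzinfo=timezone.utc)
# State after the second poll (at t0 + 4s) raised the interval from 2s to 10s.
after_second_poll = DueCheckRow(
    dispatched_at=t0,
    last_polled_at=t0 + timedelta(seconds=4),
    polling_interval_seconds=10,
)
```

At t0 + 6s, the old 2-second interval would already have made the record due. With the updated 10-second interval, it is not due until t0 + 14s.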

5.3 Failure tests

  1. test_specialist_failure_marks_job_failed — Configure a stub whose poll returns ExternalProductionFailure. Verify the polling loop writes external_production_resolved with the failure payload, calls mark_failed, and the render_jobs row transitions to failed. Verify failure_detail and error_code materialize on the external_production_records_view row.
  2. test_specialist_unregistered_marks_job_failed — Create an ExternalProductionRecord whose declared_render_type_ref does not resolve via get_render_specialist. Run a tick. Verify the loop marks the job failed with error_code="specialist_unregistered".
  3. test_poll_raising_does_not_terminate_loop — Configure a stub whose first poll raises an exception. Verify the loop logs a warning, writes an external_production_polled event (advancing last_polled_at via the projector), does not transition the job to failed, and the next tick polls again (where the second outcome is, e.g., a successful RenderEvent). The job ultimately completes successfully.
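Taken together, the happy-path and failure tests pin down how each poll outcome maps to an action. The sketch below expresses that mapping with hypothetical stand-in types. The action names echo the helpers listed in Step 5 (_mark_completed, _mark_failed, _mark_polling_continues, _update_polled_timestamp_only), but the function itself, classify_poll, is illustrative and writes no events.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Tuple


@dataclass
class RenderEvent:  # hypothetical stand-in
    object_id: str


@dataclass
class ExternalProductionHandle:  # hypothetical stand-in
    polling_interval_seconds: float
    progress_hint: Optional[str] = None


@dataclass
class ExternalProductionFailure:  # hypothetical stand-in
    error_code: str
    failure_detail: str


PollFn = Callable[[], Awaitable[object]]


async def classify_poll(poll: Optional[PollFn]) -> Tuple[str, object]:
    """Map one poll attempt to (action, detail).

    poll is None when get_render_specialist could not resolve a specialist.
    """
    if poll is None:
        return "mark_failed", "specialist_unregistered"
    try:
        outcome = await poll()
    except Exception as exc:
        # Not terminal: record the attempt (advancing last_polled_at) and
        # let the next tick poll again.
        return "update_polled_timestamp_only", repr(exc)
    if isinstance(outcome, RenderEvent):
        return "mark_completed", outcome.object_id
    if isinstance(outcome, ExternalProductionFailure):
        return "mark_failed", outcome.error_code
    if isinstance(outcome, ExternalProductionHandle):
        return "mark_polling_continues", outcome.polling_interval_seconds
    # Assumed policy for results outside the contract.
    return "mark_failed", f"unexpected {type(outcome).__name__}"


async def _demo():
    async def done():
        return RenderEvent("obj-1")

    async def still_running():
        return ExternalProductionHandle(polling_interval_seconds=10)

    async def boom():
        raise TimeoutError("slow upstream")

    return [
        await classify_poll(None),
        await classify_poll(done),
        await classify_poll(still_running),
        await classify_poll(boom),
    ]


demo_results = asyncio.run(_demo())
```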

5.4 Restart safety

  1. test_loop_resumes_in_flight_polls_after_restart — Create an ExternalProductionRecord in the database (insert via direct event append). Simulate restart by instantiating a fresh ExternalPollingLoop against the same database (no carry-over state). Verify the new loop's first tick reads the record, polls the (re-registered) specialist, and processes the result correctly. Exercises the canonical recovery pathway.

5.5 Concurrency

  1. test_concurrent_polls_for_different_specialists — Create three ExternalProductionRecord rows for three different specialists. Configure one specialist's poll_external_work to await an asyncio.Event before returning; configure the other two to return immediately. Drive a tick. Verify (a) all three polls start concurrently (the slow poll is pending but does not block the others), (b) the two fast polls commit their external_production_polled events independently, before the slow specialist's asyncio.Event is set, and (c) once that asyncio.Event is set, the slow poll completes and commits its own external_production_polled event. Each poll has its own session, so one slow poll cannot block another.
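The concurrency property this test checks can be demonstrated with plain asyncio. In this sketch, session_factory is a hypothetical async context manager standing in for the app's session factory, and "commit" only records an ordering. Each poll owns its session, and asyncio.gather lets the fast polls finish while the slow one is still waiting on its asyncio.Event.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Optional

commits: list = []


@asynccontextmanager
async def session_factory(name: str):
    # Hypothetical per-poll session: one per poll, never shared.
    yield {"owner": name}
    commits.append(name)  # "commit" on clean exit


async def poll_one(name: str, gate: Optional[asyncio.Event]) -> str:
    async with session_factory(name) as session:
        if gate is not None:
            await gate.wait()  # the slow external service
        return session["owner"]


async def tick() -> list:
    gate = asyncio.Event()
    polls = asyncio.gather(
        poll_one("slow", gate),
        poll_one("fast-1", None),
        poll_one("fast-2", None),
    )
    await asyncio.sleep(0.01)  # let the fast polls run to completion
    snapshot = list(commits)  # fast polls committed; slow still waiting
    gate.set()
    await polls
    return snapshot


before_release = asyncio.run(tick())
```

before_release holds only the two fast polls, and the slow poll commits last, after the asyncio.Event is set.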

6. Order of operations

Auto-mode posture. Step 0 auto. Steps 1–5 auto, Checkpoint A halts for substrate verification. Steps 6–8 auto, Checkpoint B halts for the full-pathway smoke test and tagging.

Step 0 — Pre-flight verification and CR archive.

The audit (phase-34-cr-audit-v0_1.md) resolved most pre-flight questions definitively. CC's pre-flight at execution time is verification, not first-time inspection:

  1. Archive this CR to docs/phase-crs/phase-34-cr-external-service-polling-v0_2.md.
  2. Verify src/loomworks/agents/render_specialist.py line numbers near 95, 147–222 still match the audit's references (line numbers can drift with intermediate work).
  3. Verify src/loomworks/agents/render_dispatch.py lines near 192–193 (registry globals) and 436–469 (_enqueue_and_dispatch) still match.
  4. Verify src/loomworks/api/app.py lifespan structure (audit reports lines 55–216; verify the session_factory exposure and BackgroundAgentRunner construction match the §3.5 sketch, and adjust the sketch if not).
  5. Confirm migrations/versions/ highest applied is still 0050 (next available 0051). If new migrations have landed since the audit, renumber 0051/0052 accordingly and update the CR's migration filenames in implementation notes.
  6. Re-baseline the test count: run uv run pytest --collect-only and record the current count. The audit reported 1,284. Adjust the expected baselines in §1 and §6 if the count has drifted.

Commit: Phase 34 step 0: CR archive and pre-flight verification.

Step 1 — Specialist contract extension and test specialist.

  1. Extend src/loomworks/agents/render_specialist.py with ExternalProductionHandle, ExternalProductionFailure, UnexpectedSpecialistResultError, the extended produce_render return type (signature unchanged: job_id, engagement_id, confirmed_shape_event_ref, triggered_by, trigger, db), and the poll_external_work method with the default NotImplementedError body. Include _logger import.
  2. Create src/loomworks/agents/test_external_specialist.py with StubExternalRenderSpecialist per §3.7.

Verification: imports succeed, types resolve correctly. The existing StubRenderSpecialist from Phase 10 still works unchanged (it returns RenderEvent | None, which is a subtype of RenderEvent | ExternalProductionHandle | None).

Commit: Phase 34 step 1: specialist contract extension and test specialist.

Step 2 — ExternalProductionRecord MemoryObject.

  1. Add ExternalProductionRecord to src/loomworks/engagement/types.py's discriminated union per §3.2 (no state field; render_event_object_id for success-resolved linkage).
  2. Add the three event kinds (external_production_dispatched, external_production_polled, external_production_resolved) wherever event-kind constants live (audit reports event_kind is free-text String(64); the constants are documentation/type-narrowing only).
  3. Register external_production_record as a recognized object_type in the deserializer (src/loomworks/memory/registry.py, per Phase 9/10 patterns).

Verification: imports succeed; round-tripping an ExternalProductionRecord through serialization and deserialization works via existing test patterns.

Commit: Phase 34 step 2: ExternalProductionRecord MemoryObject.

Step 3 — Migrations 0051 and 0052, and projector extension.

  1. Write Alembic migration 0051 per §4.1: creates external_production_records_view (materialized from current_memory_objects with the polling-loop fields flattened) and the indexes named in §4.1.
  2. Write Alembic migration 0052 per §4.2: drops ck_render_jobs_status, recreates with awaiting_external added.
  3. Extend the projector in src/loomworks/memory/projector.py with apply_event_to_external_production_records_view handling the three new event kinds per the payload schema in §3.2.
  4. Run alembic upgrade head. Verify the schema has the new view, indexes, and updated CHECK constraint. Verify that alembic downgrade -2 (reverting 0052, then 0051) followed by alembic upgrade head round-trips cleanly; a single downgrade -1 would exercise only 0052.

Verification: schema inspection confirms view, indexes, and constraint. A manual event-append test (insert an external_production_dispatched event, query the view, confirm the row appears) confirms the projector wire-up.

Commit: Phase 34 step 3: migrations 0051 and 0052, projector extension.

Step 4 — Dispatch wrapper, lifecycle helpers, and dispatch-helper amendment.

  1. Add _run_specialist_with_external_dispatch_handling to src/loomworks/agents/render_dispatch.py per §3.6.
  2. Add _record_external_production_dispatched helper that writes the external_production_dispatched event with the correct payload per §3.2.
  3. Add mark_awaiting_external helper alongside the existing mark_* helpers (location TBD at pre-flight; same module as mark_completed/mark_failed).
  4. Amend _enqueue_and_dispatch in render_dispatch.py to dispatch _run_specialist_with_external_dispatch_handling rather than specialist.produce_render directly, passing specialist and declared_render_type_ref as kwargs.

Verification: uv run pytest -v against the existing test suite — no regressions. Existing synchronous specialists continue to dispatch and complete unchanged (the wrapper is a pass-through for RenderEvent and None returns).

Commit: Phase 34 step 4: dispatch wrapper and lifecycle helpers.

Step 5 — Polling loop module and lifespan integration.

  1. Create src/loomworks/engagement/external_polling.py with ExternalPollingLoop per §3.4. Each poll opens its own session, and due polls run concurrently via asyncio.gather. _get_due_records queries external_production_records_view. Emit structured info-level logs at external_polling_loop.started and external_polling_loop.stopped per B20.
  2. Implement the helpers (_get_due_records, _poll_one, _poll_one_inner, _mark_completed, _mark_failed, _mark_polling_continues, _update_polled_timestamp_only).
  3. Extend src/loomworks/api/app.py's _lifespan per §3.5. The polling loop starts after the agent runner starts and stops before the agent runner shuts down.
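The ordering requirement (start the loop after the runner, stop it before the runner) is the usual nested-lifespan pattern. Here is a runnable sketch with hypothetical Runner and PollingLoop stand-ins that only log their transitions. The real _lifespan in app.py also wires up the session factory and other startup work per §3.5, all omitted here.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Optional

events: list = []


class Runner:  # hypothetical stand-in for BackgroundAgentRunner
    async def start(self) -> None:
        events.append("runner.start")

    async def shutdown(self) -> None:
        events.append("runner.shutdown")


class PollingLoop:  # hypothetical stand-in for ExternalPollingLoop
    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None

    async def _run(self) -> None:
        while True:
            await asyncio.sleep(0.01)  # one tick; real work omitted

    async def start(self) -> None:
        self._task = asyncio.create_task(self._run())
        events.append("external_polling_loop.started")

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass  # expected: we cancelled it
        events.append("external_polling_loop.stopped")


@asynccontextmanager
async def lifespan():
    runner = Runner()
    await runner.start()
    polling_loop = PollingLoop()
    await polling_loop.start()
    try:
        yield
    finally:
        # Tear down in reverse start order.
        await polling_loop.stop()
        await runner.shutdown()


async def main() -> None:
    async with lifespan():
        events.append("serving")


asyncio.run(main())
```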

Verification: uv run pytest -v — no regressions. Application starts and stops cleanly; the external_polling_loop.started / external_polling_loop.stopped log lines appear at startup and shutdown.

Commit: Phase 34 step 5: polling loop module and lifespan integration.

Checkpoint A — Substrate complete.

Operator verifies:

Step 6 — Polling-loop tests.

Write the 11 tests in tests/test_external_polling.py per §5. Use StubExternalRenderSpecialist for all polling pathway tests. Use monkeypatch to set LOOP_TICK_SECONDS to a small value (e.g., 0.1) for the duration of each test. Instantiate ExternalPollingLoop directly (not through the app lifespan) for fine-grained control.

Verification: uv run pytest -v tests/test_external_polling.py green. All 11 tests pass.

Commit: Phase 34 step 6: external polling tests.

Step 7 — Full suite verification.

Run the full substrate suite. All existing tests plus the 11 new ones pass.

Verification: uv run pytest -v green. Test count: ~1,295+ (audit baseline 1,284 + 11 new), 2 skipped (carried forward).

Commit: Phase 34 step 7: full suite green.

Step 8 — Implementation notes.

Write implementation notes to docs/phase-impl-notes/phase-34-implementation-notes-v0_1.md recording:

Commit: Phase 34 step 8: implementation notes.

Checkpoint B — Final.

Operator verifies via a manual smoke test:

  1. Register a StubExternalRenderSpecialist configured to return ExternalProductionHandle with a 5-second polling interval and three poll outcomes: ExternalProductionHandle, ExternalProductionHandle, then a RenderEvent (where the third call also appends the render_produced event before returning).
  2. Trigger a render via the existing render request endpoint.
  3. Observe: the render_jobs row transitions to awaiting_external immediately. The ExternalProductionRecord appears in external_production_records_view with completed_at = NULL. After ~10 seconds, the record's progress_hint updates and last_polled_at advances. After ~15 seconds, the render_jobs row transitions to completed and the RenderEvent appears in render_events_view. The external_production_records_view row now has completed_at set and render_event_object_id populated.
  4. On a second test run, restart the app process mid-flight (between polls). Verify the in-flight ExternalProductionRecord resumes polling under the new process and reaches completed.

On acceptance: tag the substrate repo as phase-34-external-service-polling. Bump the manifest.


7. Acceptance gate

This CR is accepted when:

  1. Substrate: all tests pass (~1,295+, 2 skips). Baseline number reconciled at Step 0 verification.
  2. The RenderSpecialist contract supports RenderEvent | ExternalProductionHandle | None from produce_render.
  3. Specialists returning ExternalProductionHandle cause the dispatch wrapper to write external_production_dispatched (creating the ExternalProductionRecord) and transition the render_jobs row to awaiting_external.
  4. Existing specialists returning RenderEvent or None continue to work unchanged (the wrapper is a pass-through for those types).
  5. The polling loop starts at app startup with the external_polling_loop.started log line and stops cleanly at shutdown with external_polling_loop.stopped.
  6. The polling loop opens a fresh session per poll and runs polls concurrently via asyncio.gather.
  7. The polling loop calls each specialist's poll_external_work at the specialist-declared interval, with the specialist able to update the interval on subsequent polls.
  8. Successful poll completion (specialist returned RenderEvent and itself appended render_produced) writes external_production_resolved (success outcome) and transitions the job to completed.
  9. Failure responses (ExternalProductionFailure) write external_production_resolved (failure outcome) and transition the job to failed.
  10. Specialist resolution uses get_render_specialist(engagement_id, declared_render_type_id) — no new registry capability is introduced.
  11. The polling loop survives process restart: in-flight ExternalProductionRecord rows resume polling under the new process via the projector-driven view.
  12. Migrations 0051 and 0052 round-trip cleanly (up then down then up).
  13. The poll_external_work docstring includes the idempotence contract (B5).

8. Post-CR state


9. What this CR does not build


10. Kickoff prompt for the Claude Code session


Read the Change Request document at the path I supply below. This is
CR-2026-046 v0.2, the Phase 34 Change Request. You are the executing
agent named in the CR.

CR path: ~/Downloads/phase-34-cr-external-service-polling-v0_2.md

Phase 34 adds external-service polling for render specialists. The
specialist contract extends produce_render's return type to
RenderEvent | ExternalProductionHandle | None. Specialists that
dispatch to an external service return a handle; a new dispatch
wrapper records the external production and transitions the job row
to awaiting_external. A new MemoryObject (ExternalProductionRecord)
tracks in-flight external work. A new background task
(ExternalPollingLoop) opens per-poll sessions and gathers polls
concurrently.

Note: v0.2 supersedes v0.1 after a CC pre-flight audit identified 10
blockers, 4 recommended changes, and 8 non-blocking findings. The
audit lives at ~/Downloads/phase-34-cr-audit-v0_1.md (read this for
context — it explains why several substrate paths look different
from typical "specialist returns RenderEvent" patterns).

Key points:
  - produce_render signature UNCHANGED: job_id, engagement_id,
    confirmed_shape_event_ref, triggered_by, trigger, db. Only the
    return type extends to include ExternalProductionHandle.
  - Specialist always writes the render_produced event itself, in
    both synchronous and external-polling pathways. The polling
    loop only writes external_production_resolved and transitions
    the job row.
  - Specialist resolution uses the existing
    get_render_specialist(engagement_id, declared_render_type_id) —
    no new registry capability.
  - Per-poll session, asyncio.gather for concurrency. One slow
    specialist does not block others.
  - poll_external_work has an idempotence contract documented in
    its docstring.
  - 11 substrate tests covering contract, happy path, failure
    modes, restart safety, concurrency.

Substrate baseline at audit: 1,284 tests, 2 skips. Re-baseline at
Step 0 if drifted.

Migrations: 0051 (operational view + projector indexes), 0052
(render_jobs.status CHECK constraint extension). No third migration
— event_kind is free text.

Step 0: archive CR + pre-flight verification (audit-resolved facts,
just confirm line numbers haven't drifted).

Steps 1-5 auto, Checkpoint A halts for substrate verification.
Steps 6-8 auto, Checkpoint B halts for manual smoke test and tagging.

Implementation notes at Step 8:
docs/phase-impl-notes/phase-34-implementation-notes-v0_1.md

DUNIN7 — Done In Seven LLC — Miami, Florida Phase 34: External-service polling for specialists — CR v0.2 — 2026-05-03