Version. 0.1
Date. 2026-05-08
Status. Investigation. Two-turn exploration covering both extensibility (skills versus agents, operation-not-format granularity) and Operator interaction (purpose-declared upload, no file-type picker, engine handles format detection). Filed as a single record because the two halves are coupled: the extensibility model determines what the Operator-facing interaction can hide.
Author. Claude.ai (investigation layer). Operator: Marvin Percival.
Provenance. Conversation began with the Operator asking whether it is reasonable to consider every file type discretely and construct a skill or agent per type to manage upload and prepare it for memory. The Operator surfaced examples (text files, foreign-language text, PDFs, DOCX, voice files, video files, XML, JSON, XLS, PPTX) and observed that an all-encompassing solution feels unable to adapt to future needs without refactoring or new construction. The Operator proposed plug-and-play methodology as more appropriate. A follow-up turn asked how the Operator would inform the Companion that a file is ready to upload: should the Operator select the file type, or simply say "I have a file to upload," with the Companion presenting a file selection dialog and "the magic happens"? The two halves of the conversation map onto extensibility and Operator interaction, and are filed together.
Informed by.
Methodology v0.20: extensibility-first principle ("the framework specifies the boundary specialists attach at, not an enumerated list"); plain-English-communication-with-Operators principle (decision-requests in plain English, not technical vocabulary); only-show-what-is-available principle (no disabled buttons, no grayed-out options); plain-terms-discipline (technical vocabulary belongs only in code-translation contexts; methodology nouns stay first-class).
Phase 16 amendments (multi-stage extraction, multi-output extraction, non-text-shaped knowledge as strain categories; metadata-driven runtime as a future-trigger option).
Queued Directions v0.2: "Upload facility extension" entry.
Skill-versus-agent distinction (methodology v0.13+: skills are bounded operations against well-defined inputs; agents are goal-pursuing over multiple steps with bounded autonomy).
Phase 38 specification grammar declaration (declare grammar, register specialists that accept it, engine matches at runtime; the same pattern fits upload).
Closed-loop engagement investigation (Companion as attribution channel; observation/attribution separation).
Knowledge elevation pathway investigation (engagement-context-aware interpretation as Companion work).
Memory space extensibility investigation (access set computation; the Companion's reasoning informed by engagement, organization, domain Memory).
The Operator opened with two coupled observations.
First: every file type wants different processing. Text files are the root case; foreign-language text needs translation; PDFs and DOCX need conversion; voice files need transcription; video files need both visual interpretation and transcription; XML, JSON, XLS, PPTX all need their own processing. An all-encompassing solution feels unable to adapt to future needs without refactoring or new construction. Plug-and-play methodology seems more appropriate.
Second: how does the Operator inform the Companion that a file is ready to upload? Two candidate interaction patterns — Operator-declares-file-type with file picker afterward, or Operator-declares-intent with file picker and engine-side detection. Which is right?
What lands: plug-and-play is correct, and the methodology already commits to it through extensibility-first. The refinement is that the plug-and-play unit is the transformation operation (a skill) rather than the file-type handler (one per format), with interpretation kept at the agent layer. On the interaction side, Operator-declared file type fails three methodology principles. The right pattern is: the Operator declares purpose in plain English, the Companion offers upload, the engine handles detection, the Companion interprets in engagement context, and the Operator commits. The "magic happens" framing is right, but the magic has to be honest about its limits: the engine asks Operator-shaped questions about purpose, content, and consequence, never about format.
The extensibility-first principle is already a Loomworks commitment from prior conversation: the framework specifies the boundary specialists attach at, not an enumerated list. Upload is an obvious place that principle applies. The engine should not enumerate file types it can ingest because the enumeration is wrong on the day it is written and gets more wrong over time.
Three observations support this:
The Phase 16 amendment work already filed two strain categories that the upload pathway needs to handle without enumeration: multi-stage extraction (one input becomes a chain of processing steps) and multi-output extraction (one input produces multiple distinct outputs). The metadata-driven runtime as a future-trigger option was filed there too. The architectural posture is settled at the methodology level: uploads are processed by something specialist-shaped, not by an enumerated handler list.
The Operator's question conflates two things that the methodology has good vocabulary for separating, and pulling them apart is the load-bearing move.
Transformation is bounded — this is the methodology's "skill" sense (bounded operation against well-defined inputs). PDF-to-text is a skill. Audio-to-transcript is a skill. Foreign-text-to-English is a skill. Video-to-transcript-with-timestamps is a skill. Each has well-defined inputs and outputs; each can be commodified, swapped, or chained. Skills compose — a foreign-language PDF needs PDF-to-text then translation, which is two skills in sequence rather than a third skill called "foreign-PDF-handler."
Interpretation is unbounded and engagement-specific — this is agent territory. What does this contract mean for the engagement's open questions? What does this voicemail's content imply about Memory the engagement already holds? Which assertions does this PDF support, contradict, or extend? Interpretation is the work of bringing transformed content into the engagement's reasoning about itself, and that's the Companion (or a domain-specific agent) doing what only an agent can do.
The plug-and-play instinct from the question is correct, but the right grain is transformation skills. Interpretation should not be plug-and-play — it should be the Companion's work, informed by engagement context, with the Operator commit gate at the end.
If transformation and interpretation are conflated into one "upload handler per file type," brittleness follows. Every new file type wants its own end-to-end handler; handlers duplicate interpretation logic across types; the engagement's reasoning about uploaded content fragments across handler boundaries. That's the failure mode the Operator's question is pointing at.
If transformation and interpretation are separated, the composition pattern is clean:
Step 1 needs to be extensible (new types emerge). Step 2 needs to be extensible (new transformations emerge). Step 3 is engagement-specific Companion work. Step 4 is unchanged. The plug-and-play surface is at steps 1 and 2; everything else is the existing lifecycle.
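A minimal sketch of that lifecycle, assuming the four steps are detection, transformation, interpretation, and Operator commit (which is how the surrounding text reads). Every name here (`DETECTION_RULES`, `SKILL_CHAINS`, `HeldAssertion`, the `acme-renewal` engagement label) is hypothetical and chosen for illustration; the `interpret` function is only a placeholder for Companion work.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

@dataclass
class HeldAssertion:
    text: str
    committed: bool = False

# Step 1 surface (hypothetical): detection rules, suffix -> artifact kind.
DETECTION_RULES: Dict[str, str] = {".txt": "plain_text"}

# Step 2 surface (hypothetical): skill-chains keyed by artifact kind.
SKILL_CHAINS: Dict[str, List[Callable[[str], str]]] = {"plain_text": [str.strip]}

def detect(filename: str) -> str:
    for suffix, kind in DETECTION_RULES.items():
        if filename.lower().endswith(suffix):
            return kind
    raise LookupError(f"no detection rule matches {filename!r}")

def transform(kind: str, raw: str) -> str:
    content = raw
    for skill in SKILL_CHAINS[kind]:
        content = skill(content)
    return content

def interpret(content: str, engagement: str) -> List[HeldAssertion]:
    # Step 3 placeholder: the real work is the Companion's, in engagement context.
    return [HeldAssertion(f"{engagement}: {line}") for line in content.splitlines() if line]

def commit(held: List[HeldAssertion], approved: Set[int]) -> List[HeldAssertion]:
    # Step 4: nothing becomes committed without the Operator's approval.
    for index in approved:
        held[index].committed = True
    return [assertion for assertion in held if assertion.committed]

raw_text = "  term is 12 months\nrenews yearly  "
held = interpret(transform(detect("notes.txt"), raw_text), "acme-renewal")
committed = commit(held, approved={0})
print([assertion.text for assertion in committed])
```

Only the two dictionaries change when a new type or transformation arrives; `interpret` and `commit` are untouched, which is the point of placing the plug-and-play surface at steps 1 and 2.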
Most transformation work is genuinely skill-shaped. PDF-to-text doesn't need to deliberate; it has a job and it does it. Translation has the same shape — bounded operation, well-defined inputs, swappable implementations. Audio transcription likewise.
Some transformation work has agent character at the edges. Video interpretation is the obvious case: a video can be transcribed (skill), but understanding what's happening visually — diagrams on a whiteboard, body language in a meeting, a demonstrated technique in a how-to clip — is closer to agent work than skill work because the judgment about what to extract is engagement-dependent. A physiotherapy engagement watching a movement assessment video extracts different things than a sales engagement watching the same video.
The clean line: transformation skills produce a faithful rendering of the artifact; agents extract engagement-relevant interpretation from it. A skill can transcribe a video; an agent decides what about the transcribed content matters for this engagement's Memory.
The Operator's phrasing — "every file type discretely and therefore construct a skill or agent" — puts the granularity at the file type. That is close but slightly coarse. The granularity is at the transformation operation, not the file type.
PDF and DOCX both need text extraction; the text extraction is the skill, not "PDF handling" and "DOCX handling" as distinct skills. Foreign-language anything needs translation; translation is the skill, not "foreign-PDF handling" and "foreign-DOCX handling" as distinct skills. Composition is what handles novelty: a French-language scanned PDF is OCR-then-translate, two existing skills chained, not a fresh handler.
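The OCR-then-translate composition can be sketched as plain function chaining. This is an illustration under stated assumptions: the `ocr` and `translate_fr_en` functions are toy stand-ins (a real skill would call an OCR or translation engine), and `chain` is a hypothetical helper, not an existing Loomworks API.

```python
from typing import Callable

# In this sketch a skill is any bounded function from content to content.
Skill = Callable[[str], str]

def chain(*skills: Skill) -> Skill:
    """Compose skills left to right into a single skill."""
    def run(content: str) -> str:
        for skill in skills:
            content = skill(content)
        return content
    return run

def ocr(content: str) -> str:
    # Stand-in for OCR: strips a marker that represents "this was a scan".
    return content.replace("[scan]", "").strip()

# Toy vocabulary so the example runs without a translation engine.
FRENCH_TO_ENGLISH = {"bonjour": "hello", "contrat": "contract"}

def translate_fr_en(content: str) -> str:
    return " ".join(FRENCH_TO_ENGLISH.get(word, word) for word in content.split())

# No "french-scanned-pdf handler": two existing skills, chained.
french_scanned_pdf = chain(ocr, translate_fr_en)
print(french_scanned_pdf("[scan] bonjour contrat"))  # hello contract
```

The composed chain is itself a skill, so it can be chained again without any special casing.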
This makes the plug-and-play story stronger. The catalog of skills stays smaller than the catalog of file types, because skills compose. And the catalog grows along axes of operation (extract text, transcribe, translate, parse structure, OCR) rather than along axes of format (PDF, DOCX, MP3, MP4, XLSX). When a new format emerges — the holographic-document format ten years from now — what's needed is a new detection rule and a new extraction skill, not a refactor of every existing handler.
Detection should not require an enumerated handler per file type, because file extensions lie, MIME types are sometimes wrong, and content sniffing is heuristic. The cleaner pattern is a registry of detection rules — extension patterns, magic-byte signatures, content signatures — that map onto transformation skill-chains. Adding a new file type means adding a detection rule and registering a transformation skill-chain, not subclassing an upload handler.
This matches Phase 38's specification grammar approach: declare the grammar, register specialists that accept it, the engine matches at runtime. The same pattern fits upload: declare the artifact type (or detection signature), register a transformation skill-chain that handles it, the engine matches at upload time.
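One possible shape for such a registry, offered as a sketch rather than a design decision. `DetectionRule`, `register`, and `match` are hypothetical names, and the `.holo` suffix and `HOLO` signature are invented to stand for a future format. The `%PDF-` prefix is the real leading signature of PDF files.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class DetectionRule:
    artifact_type: str
    suffixes: Tuple[str, ...]
    magic: Optional[bytes]        # leading bytes, if the format has a signature
    skill_chain: Tuple[str, ...]  # names of registered transformation skills

REGISTRY: List[DetectionRule] = []

def register(rule: DetectionRule) -> None:
    REGISTRY.append(rule)

def match(filename: str, head: bytes) -> List[DetectionRule]:
    """Return every rule whose suffix and signature both agree with the upload."""
    hits = []
    for rule in REGISTRY:
        suffix_ok = filename.lower().endswith(rule.suffixes)
        magic_ok = rule.magic is None or head.startswith(rule.magic)
        if suffix_ok and magic_ok:
            hits.append(rule)
    return hits

register(DetectionRule("pdf", (".pdf",), b"%PDF-", ("extract_text",)))
register(DetectionRule("plain_text", (".txt",), None, ()))

# A new format later: one new rule and one new skill name, no refactor.
register(DetectionRule("holo_doc", (".holo",), b"HOLO", ("extract_holo_text",)))

print([rule.skill_chain for rule in match("contract.pdf", b"%PDF-1.7 ...")])
print(match("contract.pdf", b"PK\x03\x04"))  # the extension lies, so nothing matches
```

Returning a list rather than a single rule is deliberate in this sketch: zero, one, or several hits is exactly the information the engine needs to decide whether to proceed or to ask the Operator.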
Asking the Operator to declare the file type makes the Operator do work the engine should do. That fails three principles already in play.
Plain-English-communication-with-Operators principle. Decision-requests are in plain English with examples, not technical vocabulary. "What file type is this?" is technical vocabulary, and worse, it is a question the Operator may not be able to answer correctly. An Operator who has a .docx from a client knows it as a Word document, not as a format. An Operator who has a .pages file from someone on a Mac, or a screen-recording-export from a meeting tool, may genuinely not know what category their file falls under. Asking them is asking them to do work that is not theirs.
Only-show-what-is-available principle. No disabled buttons, no grayed-out options. A file-type picker has the inverse problem — it is an enumerated menu, and the moment that menu does not include the Operator's actual file type, the Operator either cannot proceed or picks something close-enough and corrupts the detection. Either failure mode is bad. Better to not have the menu.
Extensibility-first principle. The plug-and-play unit is transformation operations, not file-type handlers. A user-facing file-type picker locks the type-selection moment to the Operator interaction layer, which makes detection-rule changes either invisible (the picker shows the same options regardless of registered handlers) or churning (every new transformation skill bumps the picker). Both are wrong shapes.
"The magic happens" reads as flippant but is actually the architectural commitment worth being careful about. If the engine is going to handle detection, the detection has to be honest about its limits and surface uncertainty cleanly. Three cases worth thinking through.
Confident detection. Extension matches a registered handler; magic-byte signature confirms; transformation skill-chain runs; held assertions are drafted. The Companion narrates what happened in plain English ("I read the contract and pulled out the parties, the term, and the renewal clause as held assertions for your review"). Operator commits or doesn't.
Ambiguous detection. The artifact matches multiple plausible handlers (a .txt file might be plain text, might be structured data masquerading as text, might be a transcript from somewhere). The Companion asks the Operator a plain-English question about purpose, not type. "Is this a transcript I should treat as someone speaking, or a document I should treat as written content?" The question is about what the content is for the engagement, which is a question only the Operator can answer; it is not about what the file is, which is the engine's job.
Failed detection. No registered handler matches. The Companion says so plainly: "I'm not sure how to read this kind of file. Can you tell me what's in it, in your own words?" The Operator can describe it, paste content, or skip — and the system records that this artifact type exists in the wild without a handler, which is the input that drives the next transformation-skill registration.
The point: "the engine handles detection" does not mean "the engine never asks the Operator anything." It means the engine asks Operator-shaped questions, not engine-shaped questions. The Operator answers about purpose and content; the engine handles type and format.
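The three cases can be sketched as a single decision over the list of candidate handlers that detection produced. The names `Outcome`, `resolve`, and `UNHANDLED_LOG` are hypothetical; the two questions are the ones quoted above. In practice the ambiguous-case question would depend on which handlers are in contention, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import List, Optional

# Artifacts seen in the wild with no handler; input to future skill registration.
UNHANDLED_LOG: List[str] = []

@dataclass(frozen=True)
class Outcome:
    status: str                     # "confident", "ambiguous", or "failed"
    handler: Optional[str] = None   # set only when detection is confident
    question: Optional[str] = None  # plain-English question for the Operator

def resolve(filename: str, candidates: List[str]) -> Outcome:
    if len(candidates) == 1:
        return Outcome("confident", handler=candidates[0])
    if candidates:
        # Ask about purpose, never about format.
        return Outcome(
            "ambiguous",
            question=("Is this a transcript I should treat as someone speaking, "
                      "or a document I should treat as written content?"),
        )
    UNHANDLED_LOG.append(filename)
    return Outcome(
        "failed",
        question=("I'm not sure how to read this kind of file. "
                  "Can you tell me what's in it, in your own words?"),
    )

print(resolve("contract.pdf", ["pdf"]).status)
print(resolve("notes.txt", ["plain_text", "transcript"]).status)
print(resolve("export.xyz", []).status, UNHANDLED_LOG)
```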
The Operator's input should not be "upload this file" as a command. It should be a declaration about what they want to do, with the upload being one way to do it.
"I have a contract I want to add to engagement memory" → Companion proposes a file picker plus the option to paste text plus the option to type contents directly. Three pathways, all leading to the same held-assertion-drafting flow. The Operator picks the one that matches what they actually have.
"I have a file" without further context → Companion asks one question first: "What's in it?" The answer drives the rest. If the Operator says "a recording of yesterday's meeting," the file picker comes up with audio/video filtering hinted (not enforced — the picker still accepts anything; the hint is just a default). If the Operator says "a competitor's pricing sheet," the picker comes up with document/spreadsheet hinting.
The plain-terms-discipline observation: the Operator's vocabulary is "contract," "recording," "pricing sheet," "transcript," "report" — purpose-shaped nouns. The engine's vocabulary is "PDF," "DOCX," "M4A," "XLSX" — format-shaped nouns. The interaction translates between them: Operator says purpose, engine handles format. Asking the Operator to speak format is asking them to translate in the wrong direction.
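A small sketch of that translation direction: the Operator's purpose-shaped phrase selects a picker hint, and the hint only reorders what the picker shows first. The phrase table, `picker_hint`, and `picker_order` are hypothetical, and simple substring matching stands in for whatever understanding the Companion would actually apply.

```python
from typing import Dict, List, Tuple

# Purpose-shaped phrases -> hinted categories (illustrative entries only).
PURPOSE_HINTS: Dict[str, Tuple[str, ...]] = {
    "recording": ("audio", "video"),
    "pricing sheet": ("document", "spreadsheet"),
}

def picker_hint(stated_purpose: str) -> Tuple[str, ...]:
    text = stated_purpose.lower()
    for phrase, hint in PURPOSE_HINTS.items():
        if phrase in text:
            return hint
    return ()  # no hint: the picker opens with no default filter

def picker_order(categories: List[str], hint: Tuple[str, ...]) -> List[str]:
    # Hinted categories are listed first; nothing is removed or disabled.
    return sorted(categories, key=lambda category: category not in hint)

hint = picker_hint("a recording of yesterday's meeting")
print(hint)
print(picker_order(["document", "audio", "spreadsheet", "video"], hint))
```

Because `picker_order` returns every category it was given, an Operator whose file falls outside the hint can still select it.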
There is a domain where Operator-declared type might seem useful: when the same file format means different things in different engagements. A .csv of clinical observations is a different thing than a .csv of sales pipeline data, even though the engine reads both as tabular data the same way. The transformation is identical; the interpretation (which is agent work, not skill work) is different.
This argues for a different shape than file-type selection: engagement-context-aware interpretation. The engine detects the format, applies the transformation skill-chain, and the Companion interprets the transformed content in the engagement's context — drawing on engagement Memory, the engagement's seed, the kinds of assertions the engagement typically holds. The clinical engagement's Companion reads the CSV through the lens of "what does this engagement know about clinical observations." The sales engagement's Companion reads it through "what does this engagement know about pipeline data." Same skill-chain, different interpretation, and the Operator never has to declare anything about the file beyond "I have this thing I want to add."
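The shared-transformation, different-interpretation split can be illustrated with one CSV and two engagement lenses. This is a sketch: `parse_table` uses Python's standard `csv` module as the transformation skill, while `LENSES` and `interpret` are hypothetical stand-ins for the Companion's engagement-aware reading, and the sample row is invented.

```python
import csv
import io
from typing import Dict, List

def parse_table(raw: str) -> List[Dict[str, str]]:
    """Transformation skill: the same for every engagement."""
    return list(csv.DictReader(io.StringIO(raw)))

# Interpretation differs by engagement; these templates are placeholders.
LENSES: Dict[str, str] = {
    "clinical": "Observation recorded for {name}: {value}",
    "sales": "Pipeline entry for {name}: {value}",
}

def interpret(rows: List[Dict[str, str]], engagement_kind: str) -> List[str]:
    template = LENSES[engagement_kind]
    return [template.format(**row) for row in rows]

raw = "name,value\nitem-1,42\n"
rows = parse_table(raw)              # identical output in both engagements
print(interpret(rows, "clinical"))   # ['Observation recorded for item-1: 42']
print(interpret(rows, "sales"))      # ['Pipeline entry for item-1: 42']
```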
This is where the closed-loop and elevation pathway investigations from earlier in the session connect. The Companion's interpretation work is informed by engagement Memory (and, when the architecture extends, by organization Memory and domain Memory through the access set computation named in the memory-space-extensibility investigation §5.1). Format detection is engine work; interpretation is Companion work; the Operator's job is purpose declaration plus commit approval. Three roles, three layers, and no technical vocabulary required of the Operator.
There is one case where the Operator probably should be invited to declare more explicitly, and it is a discipline question rather than a usability question: when the artifact contains something the Operator should be making a deliberate decision about, not letting the engine decide implicitly.
Examples — uploading a confidential file when the engagement's access mode would expose it; uploading a file that contains content the engagement's Memory should not accept (PII the engagement is not authorized to hold; copyrighted material the engagement should not be ingesting; content from a different access mode that needs an explicit decision to cross over).
In those cases the Companion should not smooth the upload into "magic happens" — it should pause and surface the decision plainly. "This file contains personal contact information for several people. The engagement's current settings don't store contact information by default. Do you want to add it anyway, or skip it?" The discipline is the same as elsewhere — the engine surfaces the decision when there is a real one to make; the Operator decides; the engine records the decision with provenance.
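A sketch of such a pause, assuming a hypothetical engagement setting `stores_contact_info` and using a deliberately crude email pattern as a stand-in for real detection of personal contact information. The function returns the plain-English question when a decision is needed and `None` when the upload can proceed.

```python
import re
from typing import Optional

# Crude stand-in: real PII detection would be far more careful than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def consequence_question(content: str, stores_contact_info: bool) -> Optional[str]:
    found = set(EMAIL.findall(content))
    if found and not stores_contact_info:
        return ("This file contains personal contact information. "
                "The engagement's current settings don't store contact "
                "information by default. Do you want to add it anyway, or skip it?")
    return None  # no decision to surface; the normal flow continues

print(consequence_question("Reach Ana at ana@example.com", stores_contact_info=False))
print(consequence_question("Reach Ana at ana@example.com", stores_contact_info=True))
print(consequence_question("No contacts here.", stores_contact_info=False))
```

Whatever the Operator answers would then be recorded with provenance, which this sketch leaves out.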
Initial framing under consideration in turn one — "every file type discretely and therefore construct a skill or agent to manage the upload."
Why it didn't quite land. The granularity is at the operation, not the format. PDF and DOCX both need text extraction; the skill is text extraction, not PDF-handling and DOCX-handling. Skills compose; the catalog of skills is smaller than the catalog of file types because composition handles novelty.
What landed. Plug-and-play is right, and the right unit is transformation operations (skills), with detection as a registry of rules mapping signatures to skill-chains, and interpretation kept at the agent layer where the Companion can do its work informed by engagement context.
Initial framing under consideration in turn two — two candidate patterns: Operator-declares-file-type, or Operator-declares-intent with engine-side detection.
Why the file-type-declaration framing fell. Three methodology principles fail: plain-English-communication (technical vocabulary in Operator path), only-show-what-is-available (enumerated menu has the wrong failure mode when the menu is incomplete), extensibility-first (the picker locks type-selection to the interaction layer in a way that fights detection-rule changes).
What landed. Operator declares purpose in plain English, Companion offers upload pathway plus alternatives, file gets uploaded, engine detects format, transformation skill-chain runs, Companion drafts held assertions in engagement context, Operator reviews and commits. No file-type picker. No format vocabulary in the Operator's path. The places where the engine asks the Operator something are about purpose, content, or consequence — never about format.
A subtler observation that emerged from the second turn. "Magic happens" is right but only if the magic is honest about its limits. The engine should not pretend to confident detection when detection is ambiguous; the engine should not silently fail when no handler matches; the engine should not smooth past consequential decisions just because the Operator said "I have a file." The magic does real work behind the seam, and the seam protects the Operator from format vocabulary while still surfacing genuine decisions that need a human.
Five commitments specifically should not bend under upload-pathway pressure.
Uploads do not write to Memory directly. The transformation skill-chain produces transformed content; the Companion drafts held assertions from that content; the Operator commits. Same lifecycle as every other contribution. There is no shortcut where uploaded content becomes committed Memory because it came from a file the Operator selected.
Held assertions drafted from uploads carry provenance pointing at the original artifact, the transformation chain that processed it, and (when the agent fabric extends) the agents that participated in interpretation. The full lineage is reconstructable. An assertion's history must walk back to "uploaded by Operator on date D, transformed via skill chain S, interpreted by Companion C, committed by Operator on date E."
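One way to make that lineage concrete is an immutable record that is copied, not mutated, at commit time. The `Provenance` class and its field names are assumptions made for illustration, as are the sample values.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional, Tuple

@dataclass(frozen=True)
class Provenance:
    artifact: str
    uploaded_by: str
    uploaded_on: date
    skill_chain: Tuple[str, ...]
    interpreted_by: str
    committed_by: Optional[str] = None
    committed_on: Optional[date] = None

    def lineage(self) -> str:
        """Walk the assertion's history back to the original upload."""
        parts = [
            f"uploaded by {self.uploaded_by} on {self.uploaded_on.isoformat()}",
            f"transformed via {' > '.join(self.skill_chain)}",
            f"interpreted by {self.interpreted_by}",
        ]
        if self.committed_by and self.committed_on:
            parts.append(
                f"committed by {self.committed_by} on {self.committed_on.isoformat()}"
            )
        return ", ".join(parts)

held = Provenance("contract-scan", "Operator", date(2026, 5, 8),
                  ("ocr", "translate"), "Companion")
# Commit produces a new record; the held record stays as it was.
committed = replace(held, committed_by="Operator", committed_on=date(2026, 5, 9))
print(committed.lineage())
```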
The Operator's path uses purpose-shaped vocabulary ("contract," "recording," "pricing sheet"), not format-shaped vocabulary ("PDF," "DOCX," "M4A"). The methodology already commits to this; the upload pathway is one place where the temptation to introduce format vocabulary is acute and should be resisted.
Transformation work is skill-shaped (bounded, swappable, composable). Interpretation work is agent-shaped (engagement-contextual, deliberative, Companion-attributed). Conflating them leads to the failure mode the question started with — handlers per file type that duplicate interpretation logic. Keeping them separate keeps the architecture extensible.
When detection is ambiguous, the engine asks the Operator a purpose-shaped question. When detection fails, the engine says so plainly and offers alternatives. The engine never silently picks a wrong handler or smoothly proceeds past a consequential decision. "Magic happens" is the experience; honest detection is the discipline behind it.
What is the right data structure for the detection registry? Extension patterns are easy; magic-byte signatures need a binary-matching layer; content signatures may need MIME-type plus content sniffing; some artifacts (a PDF that's actually a scanned image versus a PDF with text content) need a two-stage detection. A registry that supports all of these without being unwieldy is real design work. Probably a tiered match — fast extension/MIME match first, fall through to content signature, fall through to ask-the-Operator. Worth filing as a design question for when the upload pathway enters the build.
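The tiered fall-through can be sketched as an ordered list of matchers. The tier names, tables, and `detect` function are hypothetical. The byte signatures are real: PDF files begin with `%PDF-`, and ZIP containers (which include DOCX, XLSX, and PPTX) begin with `PK\x03\x04`. Two-stage detection, such as telling a scanned PDF from a text PDF, is not covered here.

```python
from typing import Callable, Dict, List, Optional, Tuple

Matcher = Callable[[str, bytes], Optional[str]]

SUFFIXES: Dict[str, str] = {".txt": "plain_text", ".csv": "table"}
SIGNATURES: Dict[bytes, str] = {b"%PDF-": "pdf", b"PK\x03\x04": "zip_container"}

def by_suffix(filename: str, head: bytes) -> Optional[str]:
    for suffix, kind in SUFFIXES.items():
        if filename.lower().endswith(suffix):
            return kind
    return None

def by_signature(filename: str, head: bytes) -> Optional[str]:
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return None

# Cheapest tier first; adding a tier means appending to this list.
TIERS: List[Tuple[str, Matcher]] = [("suffix", by_suffix), ("signature", by_signature)]

def detect(filename: str, head: bytes) -> Tuple[str, Optional[str]]:
    for tier_name, matcher in TIERS:
        kind = matcher(filename, head)
        if kind is not None:
            return tier_name, kind
    return "ask_operator", None  # final tier: a plain-English question

print(detect("notes.txt", b"hello"))
print(detect("export.bin", b"%PDF-1.4"))
print(detect("mystery.xyz", b"\x00\x01"))
```

Returning the deciding tier alongside the result gives provenance something to record about how confident the detection was.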
If skills compose (PDF-to-text then translation for a French PDF), the engine needs a way to declare and execute chains. Three plausible shapes:
Probably starts as static chains and evolves toward declarative when the static cases prove brittle. Companion-orchestrated may be appropriate for the hardest cases (uncertain content, multi-stage extraction with branching) while never being the default. Worth filing.
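To make the static-versus-declarative contrast concrete, here is one reading of "declarative", labeled as an assumption: each skill declares the kind of content it accepts and produces, and the engine plans a chain by searching those declarations. The skill names, content kinds, and `plan` function are all hypothetical. A static chain would instead hard-code the list `["ocr", "translate"]` for a known artifact type.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# skill name -> (input kind, output kind); every entry is illustrative.
DECLARED: Dict[str, Tuple[str, str]] = {
    "ocr": ("scanned_page", "foreign_text"),
    "extract_text": ("pdf", "foreign_text"),
    "translate": ("foreign_text", "english_text"),
}

def plan(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest skill-chain from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        kind, path = queue.popleft()
        if kind == goal:
            return path
        for name, (source, target) in DECLARED.items():
            if source == kind and target not in seen:
                seen.add(target)
                queue.append((target, path + [name]))
    return None  # no chain exists: a new skill needs registering

print(plan("scanned_page", "english_text"))  # ['ocr', 'translate']
print(plan("pdf", "english_text"))           # ['extract_text', 'translate']
print(plan("audio", "english_text"))         # None
```

Under this reading, registering a new skill is a single entry in `DECLARED`, and chains that use it become available without being written by hand.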
The Companion's interpretation of transformed content is engagement-contextual and might be wrong in engagement-specific ways. The closed-loop investigation surfaced the held → committed cycle as the right shape for outcomes flowing back into Memory; the same shape applies to interpretation quality. When the Operator rejects a held assertion drafted from an upload, that's a signal — the interpretation was off, and the next interpretation in this engagement should learn from it. Whether and how that signal is captured is open.
When the agent fabric (closed-loop investigation §3 and prior memory) lands, transformation work and interpretation work both potentially involve agents. Where do the credentials sit (the FORAY/OVA integration strategy investigation's Authorizer protocol)? How does attribution flow from the artifact through the transformation chain through the interpretation agent through the Companion to the held assertion's provenance? Worth filing as work that lights up when both upload-pathway and agent-fabric are simultaneously in build scope.
A single uploaded artifact may produce multiple held assertions (a meeting recording produces decisions, action items, and observations as separate assertion candidates). The Phase 16 multi-output extraction strain is named; the question is how the held assertions relate. Are they linked by shared provenance? Reviewed and committed together, or independently? When the Operator commits some and rejects others, what happens to the artifact's overall provenance? Worth filing for multi-output design work.
If a transformation skill improves over time (a better OCR engine, a better translation model), should existing uploads be re-processed under the improved skill? The non-erasure discipline argues against silent re-processing — the existing held/committed assertions stand as the record of what was true at decision time. But there's a real case for offering re-processing as an explicit Operator action when a skill improves significantly. How that gets surfaced and what provenance the re-processed assertions carry is open.
The separation between transformation (skill-shaped, composable, format-aware) and interpretation (agent-shaped, engagement-contextual, Companion-attributed) is the strongest architectural commitment in this investigation. It should govern upload-pathway design when the work begins. Worth recording as a soft architectural commitment so it isn't re-litigated each time the question of "should we add a handler for X" comes up — the answer is consistently "what transformation operation is needed; is it new?"
The rule that skills are catalogued along axes of operation (extract text, transcribe, translate, parse structure, OCR) rather than format (PDF, DOCX, MP3, MP4, XLSX) is the practical expression of the separation in §7.1. It keeps the skill catalog smaller than the format catalog and makes composition the natural answer to novelty. Worth recording alongside §7.1.
The interaction model — Operator declares purpose, engine handles format, Companion interprets in engagement context, Operator commits — is the standing pattern for upload as well as for several other Operator-engine interactions where format-versus-purpose is the relevant axis (probably also for content extraction from external sources, for engagement seed material, for any case where the Operator brings something in that the engine has to make sense of). Worth recording as a standing pattern, not just as upload-specific guidance.
Detection as a registry of signatures mapping to skill-chains is structurally the same pattern as Phase 38's specification grammar declaration (declare grammar, specialists declare what they accept, engine matches at runtime). Worth recording the parallel because it suggests the same engine machinery may be applicable, reducing the design surface for upload-pathway work.
The clinical-CSV-versus-sales-CSV example shows that the apparent need for "different processing per engagement" is actually a need for different interpretation per engagement, with shared transformation. This is a useful framing because it preserves the operation-not-format granularity rule (transformation stays uniform) while honoring the engagement-specificity instinct (interpretation differs). Worth recording so the framing is preserved when the question arises in other contexts.
This investigation is the sixth in the session's coherent arc:
The first five investigations focused on how Memory flows, scales, and is governed. This sixth investigation focuses on how Memory is fed — the input edge of the architecture. It connects forward to several prior pieces:
The session's meta-observation continues to hold: the methodology has the structural answers; the build is at the simplest case; the work is in connecting structural answers to build sequencing. This investigation does that connecting work for the upload pathway specifically, and contributes the transformation/interpretation separation and operation-not-format granularity rules as standing patterns that govern upload-pathway design when it enters the build.
This investigation was produced through a two-turn conversation. The first turn responded to the Operator's plug-and-play question with the transformation/interpretation separation and operation-not-format granularity. The second turn responded to the Operator's interaction-model question with the purpose-declaration pattern and the "magic happens" honesty principle. The two turns were filed together because the extensibility model (turn one) determines what the Operator-facing interaction (turn two) can hide — the cleaner the separation, the less the Operator needs to know about format.
The trajectory worth preserving for future work: the conversation moved from a tooling question ("should we have a handler per file type?") to an architectural recognition (transformation versus interpretation; operation versus format) to an interaction-model question (Operator purpose-declaration versus type-selection) to a discipline observation ("magic happens" must be honest about its limits). Each stage built on the prior — the separation in turn one is what makes the interaction model in turn two clean.
The strongest synthesizing observation: the upload pathway is one place where Loomworks's standing principles (extensibility-first, plain-English-communication, skill-versus-agent, plain-terms-discipline) compose into a coherent design pattern that doesn't require new methodology. The work is recognizing the application; the answers follow.
DUNIN7, Done In Seven LLC, Miami, Florida
Loomworks Upload Pathway Investigation, v0.1, 2026-05-08