Adventures On The Edge | Keeping up with technologies

Direct Corpus Interaction - Introduction

13. June 2026 BillKrat


Author’s Note: In my previous post, I worked through the abstract of the Direct Corpus Interaction (DCI) paper and discovered that much of my initial confusion stemmed from assumptions about how modern retrieval systems actually work. Before moving into the introduction, I realized I needed a clearer understanding of the retrieval approaches the paper repeatedly references. The blockquote below contains a statement from the introduction that led me to investigate the difference between sparse and dense retrieval systems. The explanatory notes that follow are part of my learning journey and are intended to help other developers who may be encountering these concepts for the first time.

dci.pdf (2.38 mb) — Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction

This interface underpins a wide range of applications, including retrieval-augmented generation (Lewis et al., 2020; Gao et al., 2023; Singh et al., 2025), open-domain question answering (Trivedi et al., 2022; Press et al., 2023), and deep research (Wei et al., 2025; Chen et al., 2025b).

My Initial Misunderstanding

When I first read this sentence, I mentally grouped Retrieval-Augmented Generation (RAG), Open-Domain Question Answering (ODQA), and Deep Research together as different retrieval approaches. Since I was already familiar with RAG and was reading a paper proposing Direct Corpus Interaction (DCI), my brain immediately started categorizing everything as competing retrieval techniques.

That turned out to be an inaccurate mental model.

The Distinction That Helped

The most useful clarification for me was realizing that the paper is discussing application categories, not necessarily retrieval methods.

A retrieval method is concerned with how information is found.

BM25
Dense Retrieval
Hybrid Retrieval
Top-k Retrieval
Direct Corpus Interaction (DCI)

An application category is concerned with what the system is trying to accomplish.

RAG Assistant
Open-Domain Question Answering System
Deep Research Agent

Once I separated those two ideas, the sentence became much easier to understand. The authors are not listing competing retrieval techniques. They are listing examples of systems and applications that depend on retrieval.

Visualizing The Difference

One reason this distinction initially escaped me is that all three application categories involve finding information. From a distance they can look very similar.

At a simplified level, a RAG system often follows a pattern like:

Question
    ↓
Retrieve Documents
    ↓
Generate Answer

Open-Domain Question Answering systems are also attempting to answer questions using information that may exist anywhere within a large corpus:

Question
    ↓
Retrieve Documents
    ↓
Read Documents
    ↓
Generate Answer

Deep Research systems typically extend this idea into a longer investigation:

Question
    ↓
Search
    ↓
Read
    ↓
Search Again
    ↓
Compare Evidence
    ↓
Investigate Gaps
    ↓
Repeat
    ↓
Produce Report

The exact implementations differ, but these simplified workflows helped me understand why the paper groups them together. They are all systems that rely on retrieving and examining information from a corpus.

A Common Misunderstanding

An easy mistake is to assume that DCI belongs in the same category as RAG, Open-Domain QA, and Deep Research.

That is not how I currently understand the paper.

RAG, Open-Domain QA, and Deep Research are applications. DCI is being proposed as a different way for those applications to interact with a corpus.

In other words, DCI is closer to a retrieval or investigation strategy than an end-user application category.

Why This Matters For Understanding DCI

The most important takeaway for me was realizing that the paper is not simply introducing another retrieval technique alongside existing retrieval techniques. It is questioning a deeper assumption that many of these systems share.

Traditional retrieval pipelines often begin with a top-k retrieval step:

Question
    ↓
Retrieve Top-k Documents
    ↓
Work From Retrieved Results

DCI appears to challenge the assumption that an investigation must begin that way.

Instead of retrieving a small set of candidate documents and working only from those results, an agent may directly interact with the corpus through actions such as searching, opening files, inspecting context, following references, and gathering evidence incrementally.

At this point in the paper, the authors have not yet proven that this approach is superior. They are establishing why retrieval matters and why rethinking retrieval could affect a broad range of systems, including RAG applications, Open-Domain QA systems, and Deep Research agents.

What Changed In My Understanding

Before reading this section carefully, I was viewing the discussion primarily as:

RAG
versus
DCI

After working through the terminology, I now see the comparison as something closer to:

Top-k Retrieval
versus
Direct Investigation

That shift in perspective made the rest of the introduction significantly easier to follow and helped me understand why the authors repeatedly reference multiple application categories throughout the paper.

Understanding Sparse and Dense Retrieval

"In standard retrieval-augmented pipelines, documents are chunked, indexed, and filtered into a top-k candidate set using well-established sparse (Robertson et al., 1994) or dense (Karpukhin et al., 2020) techniques before downstream reasoning begins."

What This Means

The paper references two major categories of retrieval systems: sparse retrieval and dense retrieval.
Both approaches are designed to reduce a large corpus into a smaller set of candidate documents before the language model begins reasoning.
The goal is efficiency: instead of examining everything, the model receives only the documents considered most relevant.
The DCI paper is not primarily comparing sparse retrieval against dense retrieval. Instead, it questions whether this entire top-k retrieval stage has become a bottleneck for capable agents.

Sparse Retrieval

Sparse retrieval relies primarily on explicit terms appearing in documents.
Examples include:

BM25
TF-IDF
traditional keyword search
search engine inverted indexes

These systems work well when exact wording matters.
Developer examples include:

method names
class names
configuration keys
exception messages
employee IDs
filenames

If a developer searches for NullReferenceException, a sparse retriever can quickly locate documents containing that exact phrase.
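A minimal sketch of that exact-term lookup, assuming a toy in-memory corpus (the file names and contents below are made up for illustration). It builds an inverted index, which maps each term to the documents containing it, and then looks up one exact term.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus; names and contents are made up for illustration.
DOCS = {
    "log1.txt": "Unhandled NullReferenceException in OrderService.Submit",
    "notes.md": "The login page sometimes fails with an object reference error",
    "test.cs": "Assert.Throws<NullReferenceException>(() => svc.Submit(null));",
}

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

INDEX = build_inverted_index(DOCS)

def lookup(term):
    """Return the sorted document IDs that contain the exact term."""
    return sorted(INDEX.get(term.lower(), set()))

print(lookup("NullReferenceException"))  # ['log1.txt', 'test.cs']
```

Note that notes.md describes the same problem in different words and is not returned. That gap is what dense retrieval tries to close.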

Dense Retrieval

Dense retrieval uses embeddings to represent both queries and documents as vectors.
Instead of matching exact words, the system attempts to find documents with similar meaning.
This allows retrieval to succeed even when the query and document use different terminology.
For example:

A query asks about user authentication.
A document discusses JWT-based login flows.
The exact words may differ, but the concepts are related.

Dense retrieval excels when semantic meaning is more important than exact terminology.
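A rough sketch of the ranking step, assuming the embeddings already exist. The three-dimensional vectors below are hand-written stand-ins, not output from a real embedding model, and the file names are hypothetical. Real embeddings have hundreds or thousands of dimensions, but the cosine-similarity ranking works the same way.

```python
import math

# Hand-written stand-in vectors; a real system would get these from an
# embedding model. The dimensions loosely mean (auth, payments, logging).
DOC_VECTORS = {
    "jwt_login_flow.md": [0.9, 0.1, 0.1],
    "invoice_rules.md": [0.1, 0.9, 0.0],
    "log_rotation.md": [0.0, 0.1, 0.9],
}
QUERY_VECTOR = [0.8, 0.0, 0.2]  # stand-in for the query "user authentication"

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def dense_top_k(query_vec, doc_vecs, k):
    """Return the k document IDs whose vectors are most similar to the query."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]

print(dense_top_k(QUERY_VECTOR, DOC_VECTORS, 1))  # ['jwt_login_flow.md']
```

The query and the winning document share no words at all; the match comes entirely from the vectors pointing in a similar direction.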

A Useful Mental Model

One of my initial assumptions was that sparse retrieval might be useful for things like organizational charts or employee lookups, while dense retrieval might be better for blog content or documentation.
While not technically precise, that intuition points toward an important distinction.
Sparse retrieval often behaves like an exact lookup system.
Dense retrieval behaves more like a concept lookup system.
The actual distinction is not the type of content being searched but how relevance is determined.

Why This Matters for DCI

The paper is not arguing that sparse retrieval is bad.
The paper is not arguing that dense retrieval is bad.
Both approaches have been highly successful and remain important components of modern retrieval systems.
The paper is questioning whether forcing every information request through a single retrieval step unnecessarily limits capable agents.
In traditional systems, the workflow often looks like:

User Query
     ↓
Sparse Retriever
or
Dense Retriever
     ↓
Top-K Results
     ↓
LLM Reasoning

DCI proposes something closer to:

User Query
     ↓
Agent
     ↓
Search Corpus
Open Files
Inspect Context
Refine Search
Follow Clues
     ↓
Answer

The comparison is therefore not primarily sparse versus dense.
The comparison is top-k retrieval versus direct investigation.

Key Insight

Sparse retrieval and dense retrieval represent different ways of finding relevant information.
DCI challenges a deeper assumption: that retrieval should always happen as a single filtering step before reasoning begins.
The paper argues that increasingly capable agents may benefit from interacting directly with a corpus rather than being restricted to a pre-filtered candidate list.
This shifts the question from:

What are the most relevant documents?

to:

What investigation should the agent perform next?

DCI, Agent Loops, and System Architecture

"This becomes particularly beneficial once the agent is strong enough to search strategically (as recent systems suggest; e.g., Anthropic, 2026; OpenAI, 2026)."

What This Means

This sentence helped clarify an important boundary for me.
Direct Corpus Interaction is not just about giving a model access to files.
It becomes powerful when the agent can use those files strategically.
That means the agent can search, inspect, revise its assumptions, search again, and continue narrowing the investigation.

The Agent Loop

In a DCI-style workflow, the agent often behaves in a loop:

Observe the task
     ↓
Form a search strategy
     ↓
Run a tool
     ↓
Inspect the result
     ↓
Revise the hypothesis
     ↓
Run another tool
     ↓
Continue until enough evidence exists

This resembles how a developer investigates a codebase.
The developer does not usually ask for one perfect search result.
The developer searches, reads, narrows, checks nearby context, follows references, and adjusts direction based on what is discovered.
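The loop can be sketched in a few lines. This is only an illustration under stated assumptions: the corpus is a made-up in-memory dictionary, and the "model" is replaced by a fixed scripted_policy function (a hypothetical stand-in) so the example runs without an LLM. In a real system, the model would choose each next action.

```python
# Hypothetical in-memory corpus standing in for a developer workspace.
CORPUS = {
    "app/settings.json": '{"RetryCount": 0, "Timeout": 30}',
    "app/client.py": "retries = settings['RetryCount']\nfor _ in range(retries):\n    send()",
    "README.md": "The client retries failed requests.",
}

def search(term):
    """Tool: return the paths whose text contains the term."""
    return sorted(p for p, text in CORPUS.items() if term in text)

def read(path):
    """Tool: return the full text of one file."""
    return CORPUS[path]

def scripted_policy(history):
    """Stand-in for the model: choose the next action from what was last seen.
    A real agent would make this decision with an LLM; this is fixed logic."""
    if not history:
        return ("search", "retries")
    action, arg, result = history[-1]
    if action == "search" and arg == "retries":
        return ("read", "app/client.py")
    if action == "read" and arg == "app/client.py" and "RetryCount" in result:
        return ("search", "RetryCount")  # revise: follow the newly found clue
    if action == "search" and arg == "RetryCount":
        return ("read", "app/settings.json")
    return ("stop", None)

def run_agent(max_steps=10):
    """Run tools until the policy stops; return the (action, input, result) trail."""
    history = []
    for _ in range(max_steps):
        action, arg = scripted_policy(history)
        if action == "stop":
            break
        result = search(arg) if action == "search" else read(arg)
        history.append((action, arg, result))
    return history

for step in run_agent():
    print(step[0], step[1])
```

The investigation ends at app/settings.json, where RetryCount is 0. No single search for "retries" would have surfaced that file, because the word never appears in it.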

What the Model Handles

The model, when capable enough, handles the strategic reasoning.
It decides what to search for next.
It interprets failed searches.
It may decide that a search term was too broad, too narrow, misspelled, or based on a wrong assumption.
It chooses whether to inspect a file, search nearby references, or try a different clue.

What Supporting Infrastructure Handles

The infrastructure surrounding an agent should not try to become the agent's brain.
Its role is to provide controlled access to the environment in which the investigation occurs.
Its responsibilities are typically architectural and operational:

expose safe tools
transmit relevant context
preserve human approval gates
shape tool results into usable form
avoid flooding the model with unnecessary output
log durable evidence
enforce workspace or system boundaries
support observation before automation

Why the "Thought" Loop Matters

Some agent systems have an internal reasoning loop that allows the model to plan, call tools, inspect results, and continue.
That internal reasoning process may not be exposed to the bridge, meaning the infrastructure layer that sits between the model and its tools.
In many systems, the bridge sees only tool requests and tool results, not the private reasoning that led to them.
The bridge should therefore not depend on seeing the model's full chain of thought.
Instead, it should depend on observable behavior:

what tool was requested
what inputs were provided
what result was returned
whether the user approved the action
what evidence was produced
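One way to capture that observable behavior is a small audit record per tool call. The sketch below is an assumption about what such a record could look like; the ToolCallRecord name and its fields are hypothetical and simply mirror the list above.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

@dataclass
class ToolCallRecord:
    """Hypothetical audit record: only what the bridge can actually observe."""
    tool: str                 # what tool was requested
    inputs: dict              # what inputs were provided
    result_summary: str       # what result was returned (shortened)
    approved_by_user: bool    # whether the user approved the action
    evidence: list = field(default_factory=list)  # source references produced
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def to_log_line(record):
    """Serialize one record as a single JSON line for a durable log."""
    return json.dumps(asdict(record), sort_keys=True)

record = ToolCallRecord(
    tool="search",
    inputs={"term": "RetryCount"},
    result_summary="2 matches",
    approved_by_user=True,
    evidence=["app/settings.json:1"],
)
print(to_log_line(record))
```

Nothing in the record depends on the model's private reasoning, so the same log format works whether or not that reasoning is exposed.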

Implications for Agent-Based Systems

This distinction becomes important when evaluating systems that support agentic workflows.
The surrounding infrastructure does not need to implement DCI by itself.
Instead, it needs to make DCI possible by providing a safe, precise, and observable interface into the environment being investigated.
Examples of capabilities might include:

list available resources
read approved files or documents
search within an approved scope
return bounded snippets
preserve source references and line numbers
require approval before write operations

The agent remains responsible for deciding how to use those capabilities.
For example, these concepts influence how agent-supporting systems expose selected files, workspace searches, approvals, and evidence collection while leaving strategic investigation decisions to the agent. In my own work on vs-mcp-bridge, these same ideas have influenced how the bridge exposes developer workspace capabilities without attempting to direct the investigation itself.
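A minimal sketch of such a capability surface follows. It is not the vs-mcp-bridge implementation; the WorkspaceTools class, its method names, and the in-memory files are all hypothetical, chosen only to show scope checks, bounded results with source references, and an approval gate on writes.

```python
from pathlib import PurePosixPath

class WorkspaceTools:
    """Hypothetical tool surface: in-memory files and one approved root."""

    def __init__(self, files, approved_root, max_hits=5):
        self.files = dict(files)
        self.root = PurePosixPath(approved_root)
        self.max_hits = max_hits

    def _in_scope(self, path):
        """True only for paths under the approved root, with no '..' parts."""
        p = PurePosixPath(path)
        return ".." not in p.parts and self.root in p.parents

    def list_resources(self):
        """List the files the agent is allowed to see."""
        return sorted(p for p in self.files if self._in_scope(p))

    def search(self, term):
        """Return at most max_hits matches with path and 1-based line number."""
        hits = []
        for path in self.list_resources():
            for number, line in enumerate(self.files[path].splitlines(), start=1):
                if term in line:
                    hits.append({"path": path, "line": number, "text": line.strip()})
                    if len(hits) == self.max_hits:
                        return hits
        return hits

    def write(self, path, text, approved=False):
        """Write only inside the approved scope and only with user approval."""
        if not self._in_scope(path):
            raise PermissionError(f"{path} is outside the approved scope")
        if not approved:
            raise PermissionError("write requires explicit user approval")
        self.files[path] = text

tools = WorkspaceTools(
    {"src/a.py": "x = 1\nretry = 3", "secrets/key.txt": "retry token"},
    approved_root="src",
)
print(tools.search("retry"))
```

The class decides nothing about what to search for or when. It only limits what is reachable and how much comes back, which leaves the strategy to the agent.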

Important Limitation

Not every model will be equally effective at Direct Corpus Interaction.
A weaker model may call tools poorly, search too broadly, miss important clues, or fail to recover from unproductive results.
A stronger model may perform better within the same environment because it can plan, revise its assumptions, and conduct a multi-step investigation more effectively.
This means access to tools alone does not guarantee successful investigation.
The quality of the agent's reasoning remains an important factor in how effectively a corpus can be explored.

Key Insight

The environment provides access.
The agent provides strategy.
The user provides authority.
Direct Corpus Interaction becomes valuable when those roles remain distinct.

Models and Agents and Tools, oh my!

13. June 2026 AI Systems Author

The title of this post stems from the phrase "Lions and tigers and bears, oh my!", which originates from the 1939 movie The Wizard of Oz. The phrase expresses anxiety about escalating, unknown, or overwhelming fears or, in a lesser sense, about having to face daunting tasks and challenges. In the movie, Dorothy, the Scarecrow, and the Tin Man chant it as a nervous mantra as they walk through a dark forest that may hold dangerous creatures.

This article's title is my nervous mantra: the ambiguity, overloading, and misuse of terms is staggering.

Setting the terms aside, I first have to understand, with ChatGPT’s help, what the “boxes” are, what belongs in them, and where their responsibilities begin and end. Only then can I write about AI architecture in a way that helps others, by later connecting those boxes to terms such as models, agents, and tools.

The purpose of this article is to let you know, if you are searching, that it isn't you. We are in an AI industry where the architects are too far removed from their own early ignorance, a place where overloaded and reused terms do not detract from meaning for those with experience, but leave no breadcrumbs for those of us still learning.

I am studying AI architecture to develop a Visual Studio extension that uses Anthropic's Model Context Protocol (MCP) so that I can use Codex for development. That work brings me back to an article I wrote for MSDN Magazine in 2011, where I coined the phrase "MVPVM". I mention it not as a shameless plug but as an illustration of the danger: when terminology is misused and overloaded, it becomes very difficult to understand and learn, and the ambiguity continues to propagate until it becomes "the" standard, painting many developers into a corner.

Martin Fowler warned us of this back in the early days of MVC (a term we still misuse). The following excerpt from my article, MVPVM Design Pattern - The Model-View-Presenter-ViewModel Design Pattern for WPF, quotes that warning.

In Martin Fowler’s “GUI Architectures” document (bit.ly/11OH7Y), he states the following about MVC: “Different people reading about MVC in different places take different ideas from it and describe these as ‘MVC.’ If this doesn’t cause enough confusion, you then get the effect of misunderstandings of MVC that develop through a system of Chinese whispers.” The “Whatsa Controller Anyway” document at bit.ly/7ajVeS sums up his point nicely stating: “Computer scientists in general have an annoying tendency to overload terms. That is to say, they tend to assign more than one meaning (sometimes contradictory) to the same word.”

Smalltalk MVC gets further complicated by the fact that developers lack a common core of experience with their great architects in regard to the architect environment. As such, the overloading of terms and “Chinese whispers” are convoluted even more by the fact that most of us don’t even know what a Controller is because we’ve never had to use one—the OS has always handled its functionality for us. With a few facts to provide the common core of experience we lack, you’ll find MVC is actually easy to understand, and that the “C” in MVC has nothing in common with the “P” in MVP.

The underlined statement above is the crucial point. With it I will drive my point home: when the operating system (OS) came into the picture, it embedded the Controller functionality into controls. You no longer had to manually create a controller to handle gestures (mouse or keystrokes) because the OS did it for us. When the controller disappeared, what remained was a model-view pattern, which would later be called MVVM for WPF, the de facto standard that encompassed the exact issues Martin Fowler was warning us about for their Application Model pattern. It was why extensible applications required a Presenter (MVP) to manage the view-model so that they could remain decoupled and reusable.

As a result, MVVM would paint developers into a corner that would not permit them to easily reuse their models or views. You'll still find articles debating whether MVC or MVVM is the better pattern to use. These are anti-patterns, replaced by MVP-VM, which solves the problems our great architects warned us of.

I did not create the MVP-VM pattern; it was created by Microsoft in their early PRISM projects when Dependency Injection first emerged. I simply coined the phrase, and my article was my attempt to help people understand the history, lest it repeat itself. I wasn't successful; you merely have to search for MVVM and MVC to see this.

Now the learning begins again. With ChatGPT's assistance, I'll generate articles that break it all down, coining new terms if and as applicable. In the meantime, don't look inward if you are confused; again, it isn't you.

To be continued....

Direct Corpus Interaction (DCI) - Abstract

13. June 2026 BillKrat


Author’s Note: I am actively learning about Direct Corpus Interaction (DCI) and documenting my understanding as I go. The blockquotes in this post contain excerpts from the DCI research paper that exposed gaps in my own understanding. The explanatory sections that follow are learning notes generated with ChatGPT to help me clarify the concepts. They are not presented as original research, but as study notes for developers following the same path.

dci.pdf (2.38 mb): the paper "Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction". Excerpts from its abstract follow:

Understanding the Retrieval Bottleneck

"Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning."

What This Means

A corpus is the body of information the system can search. In a developer context, this could be source code, documentation, logs, tickets, markdown files, PDFs, or a knowledge base.
To expose a corpus means giving an AI system some way to access that information.
In many traditional retrieval systems, the AI does not inspect the raw corpus directly. Instead, it asks a retriever for the most relevant chunks.
A fixed similarity interface means the retriever uses a predefined way of deciding what is relevant. That might be lexical matching, semantic similarity, vector search, BM25, or another ranking mechanism.
The important point is that the AI receives a filtered result set instead of direct access to the full information space.

Why It Matters

This design is efficient. The retriever narrows a large corpus down to a small set of candidate results before the language model starts reasoning.
However, that efficiency comes with a tradeoff. Information that is filtered out early may never be seen by the model.
If the retriever misses a critical file, phrase, log entry, method name, or clue, the downstream reasoning step cannot recover it because the model never received it.
This is what the paper means by compresses access. The system reduces a large, messy information space into a small ranked list.
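A toy illustration of that tradeoff, assuming a deliberately crude retriever that ranks by shared words (the documents, file names, and query are made up). The file that actually explains the failure shares no words with the question, so it falls outside the top-k list and never reaches the model.

```python
# Made-up corpus: the real cause of the problem lives in settings.json.
DOCS = {
    "faq.md": "why does checkout fail for some users",
    "guide.md": "checkout can fail when payment times out",
    "settings.json": "PaymentTimeoutSeconds 1",
}

def overlap_score(query, text):
    """Crude relevance score: number of words shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def top_k(query, docs, k):
    """Return the k highest-scoring document IDs; everything else is dropped."""
    ranked = sorted(docs, key=lambda d: overlap_score(query, docs[d]), reverse=True)
    return ranked[:k]

visible = top_k("why does checkout fail", DOCS, k=2)
print(visible)                     # ['faq.md', 'guide.md']
print("settings.json" in visible)  # False
```

However capable the downstream model is, it can only reason over faq.md and guide.md here. The one-second timeout setting was filtered out before reasoning began.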

Developer Translation

This is similar to asking someone to debug a production issue, but only giving them the top five search results from the repository.
Those five results may be useful, but they may also hide the real trail: a config value, an obscure log message, a generated file, a test artifact, or a second-order reference elsewhere in the codebase.
A human developer usually does not investigate that way. We search, inspect, refine, search again, follow references, check surrounding context, and revise our assumptions as we go.

DCI Perspective

Direct Corpus Interaction challenges the assumption that retrieval should always happen as a single pre-reasoning step.
Instead of asking a retriever for the top results, DCI lets the agent interact with the raw corpus more directly using tools such as search, grep, file reads, shell commands, and lightweight scripts.
The paper’s argument is not that traditional retrieval is useless. It is that capable agents may need a richer interface than a fixed top-k result list.

Key Insight

Traditional retrieval asks: What are the most similar chunks?
DCI asks: What investigation should the agent perform against the corpus?
That shift matters because complex tasks often require exploration, verification, and refinement rather than a single search result.

Understanding the Bottleneck in Traditional Retrieval

"This abstraction is efficient, but for agentic search, it becomes a bottleneck: exact lexical constraints, sparse clue conjunctions, local context checks, and multi-step hypothesis refinement are difficult to implement by calling a conventional off-the-shelf retriever, and evidence filtered out early cannot be recovered by stronger downstream reasoning."

What This Means

The paper is saying that traditional retrieval works well when the task is simple: ask a question, retrieve likely documents, then generate an answer.
Agentic search is different. The agent may need to investigate over multiple steps, discover intermediate clues, test assumptions, and change direction based on what it finds.
In that setting, a fixed retriever can become a bottleneck because it controls what the agent is allowed to see.

Exact Lexical Constraints

A lexical constraint means the exact text matters.
Examples include method names, class names, exception messages, configuration keys, IDs, filenames, command-line flags, database columns, or specific phrases.
Semantic retrieval may understand the general meaning of a question, but it can still miss exact strings that are critical to solving the problem.
Developer example: NullReferenceException is not just a general concept. It is an exact term you may need to find in logs, tests, or issue reports.

Sparse Clue Conjunctions

A sparse clue is a small piece of evidence that may not look important by itself.
A conjunction means several clues need to be combined.
One clue might be a date, another might be a filename, another might be a partial error message, and another might be a component name.
A traditional retriever may not rank any one clue highly enough to surface the right document.
DCI allows the agent to combine clues through iterative searches, such as searching for one term, narrowing by another, then inspecting the surrounding context.
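The narrowing described above can be sketched as follows, assuming a small made-up set of log and documentation files. Each clue alone matches more than one place; together they identify a single line.

```python
# Made-up files for illustration.
FILES = {
    "logs/2026-05-01.log": "OrderService timeout after 30s\nCacheService warm-up complete",
    "logs/2026-05-02.log": "OrderService started\nCacheService timeout after 5s",
    "docs/orders.md": "OrderService handles checkout",
}

def search(term):
    """Return (path, line_number, line) for every line containing the term."""
    return [
        (path, number, line)
        for path in sorted(FILES)
        for number, line in enumerate(FILES[path].splitlines(), start=1)
        if term in line
    ]

def narrow(hits, clue):
    """Keep only the hits whose path or line text also contains the clue."""
    return [h for h in hits if clue in h[0] or clue in h[2]]

hits = search("timeout")              # two lines, in two different logs
hits = narrow(hits, "OrderService")   # one line remains
print(hits)  # [('logs/2026-05-01.log', 1, 'OrderService timeout after 30s')]
```

On the command line, the same idea is typically a chain of grep calls, where each stage filters the output of the previous one.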

Local Context Checks

Finding a match is often not enough. The agent needs to inspect what appears around the match.
In code, nearby context might include the containing method, imports, comments, dependency injection setup, test assertions, or error handling.
In documentation, nearby context might clarify whether a term is being defined, contradicted, deprecated, or used as an example.
DCI gives the agent a way to inspect that local context directly instead of relying only on a preselected snippet.
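A small sketch of a local context check, similar in spirit to grep's context options. The with_context helper and the sample document are hypothetical. The point is that the lines around a match can change how the match should be read.

```python
# Made-up documentation snippet for illustration.
DOC = """## Configuration
The LegacyAuth flag is deprecated.
Do not enable LegacyAuth in new deployments.
Use TokenAuth instead."""

def with_context(text, term, radius=1):
    """For each matching line, return it with `radius` lines on either side.
    Each window is a list of (1-based line number, line text) pairs."""
    lines = text.splitlines()
    windows = []
    for i, line in enumerate(lines):
        if term in line:
            start = max(0, i - radius)
            end = min(len(lines), i + radius + 1)
            windows.append([(n + 1, lines[n]) for n in range(start, end)])
    return windows

for number, line in with_context(DOC, "TokenAuth")[0]:
    print(number, line)
```

A bare match on "TokenAuth" shows only "Use TokenAuth instead." The preceding line is what tells the reader that it replaces a deprecated flag.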

Multi-Step Hypothesis Refinement

Hypothesis refinement means the agent starts with a possible explanation, checks it against evidence, then revises the explanation.
This is how developers commonly debug: form a theory, search for evidence, inspect the result, discover a new clue, and adjust the theory.
Traditional retrieval often front-loads the search step. DCI makes search part of the reasoning loop.

Why Stronger Reasoning Cannot Recover Missing Evidence

A stronger model can reason better over the evidence it receives.
But if important evidence was filtered out before the model saw it, the model has nothing concrete to reason from.
This is the core bottleneck: the retrieval interface can limit the reasoning process before reasoning even begins.

Key Insight

The paper is shifting attention from the intelligence of the model to the quality of the interface between the model and the corpus.
For agentic work, the question is not only: How smart is the model?
It is also: What can the model actually observe, inspect, verify, and act upon?

"To tackle the limitation, we study direct corpus interaction (DCI), where an agent searches the raw corpus directly with general purpose terminal tools (e.g., grep, file reads, shell commands, lightweight scripts), without any embedding model, vector index, or retrieval API. This approach requires no offline indexing and adapts naturally to evolving local corpora. Across IR benchmarks and end-to-end agentic search tasks, this simple setup substantially outperforms strong sparse, dense, and reranking baselines on several BRIGHT and BEIR datasets, and attains strong accuracy on BrowseComp-Plus and multi-hop QA without relying on any conventional semantic retriever. Our results indicate that as language agents become stronger, retrieval quality depends not only on reasoning ability but also on the resolution of the interface through which the model interacts with the corpus, with which DCI opens a broader interface-design space for agentic search."

What This Means

The paper proposes an alternative retrieval model called Direct Corpus Interaction (DCI).
Instead of asking a retriever for the “best matching documents,” the agent interacts with the raw corpus directly using normal operating-system style tools.
The examples listed in the paper:

grep
file reads
shell commands
lightweight scripts

are important because they are not specialized AI retrieval systems. They are generic tools developers already use daily.

Why "Raw Corpus" Matters

In traditional Retrieval-Augmented Generation (RAG), the corpus is usually:

chunked into smaller pieces
converted into embeddings
stored in a vector database
retrieved through similarity search

DCI skips that entire preprocessing pipeline.
The agent works against the original files directly:

source code
markdown
logs
PDF exports
JSON
configuration files
directory structures

This is significant because the structure, naming, formatting, and neighboring context of the original files are preserved.

"Without Any Embedding Model, Vector Index, or Retrieval API"

An embedding model converts text into numerical vectors so semantic similarity can be calculated mathematically.
A vector index is a specialized data structure optimized for fast similarity search over those vectors.
A retrieval API is the interface the language model normally uses to request relevant documents.
DCI intentionally removes all of those layers.
Instead of:

“Give me the top 5 semantically similar chunks.”

the model effectively performs investigations itself:

“Search for this exact phrase.”
“Open this file.”
“Check nearby lines.”
“Find references to this identifier.”

"No Offline Indexing"

Traditional retrieval systems usually require preprocessing before search becomes efficient.
That preprocessing step may:

generate embeddings
build indexes
split documents into chunks
calculate metadata

This work is often performed ahead of time, which is why the paper calls it offline indexing.
DCI avoids this requirement entirely because the agent searches the live corpus directly.
This becomes especially useful when the corpus changes frequently, such as:

active code repositories
local developer workspaces
runtime logs
generated artifacts
temporary debugging files

The paper argues that DCI naturally adapts to evolving corpora because there is no index that must constantly be rebuilt or synchronized.

IR Benchmarks

IR stands for Information Retrieval.
Information Retrieval is the field focused on finding relevant information inside large collections of data.
Search engines are one example of an IR system.
IR benchmarks are standardized datasets used to evaluate how well retrieval systems locate relevant information.

End-to-End Agentic Search Tasks

An end-to-end task means the system must complete the full workflow itself rather than only a small isolated step.
Agentic search refers to AI systems that:

plan investigations
search iteratively
revise hypotheses
follow intermediate clues
perform multi-step reasoning

Instead of performing one search and stopping, the agent behaves more like a researcher or developer investigating a problem.

Reranking Baselines

A baseline is a comparison system used to measure whether a new approach performs better or worse.
A reranker is a second-stage model that reorders retrieved search results after the initial retrieval step.
Example workflow:

Retrieve 100 candidate documents
Use a stronger model to score them again
Return the best-ranked subset

Reranking is commonly used to improve retrieval quality in advanced RAG pipelines.
The paper claims DCI outperformed even these stronger retrieval pipelines.
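The two-stage workflow above can be sketched like this. Both scoring functions are simple stand-ins chosen for illustration: the first stage counts shared words, and the "stronger" second stage merely checks for the exact phrase. A real reranker would usually be a learned model, and the documents here are made up.

```python
# Made-up documents for illustration.
DOCS = {
    "a.md": "pool sizing for each database connection",
    "b.md": "how to tune the connection pool",
    "c.md": "pool party schedule",
}

def first_stage(query, docs, n):
    """Cheap first stage: rank by number of shared words, keep n candidates."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(docs[d].lower().split())), reverse=True)
    return ranked[:n]

def rerank(query, candidates, docs, k):
    """Stand-in second stage: prefer candidates containing the exact phrase."""
    def score(doc_id):
        return 1 if query.lower() in docs[doc_id].lower() else 0
    return sorted(candidates, key=score, reverse=True)[:k]

candidates = first_stage("connection pool", DOCS, n=2)
print(candidates)                                   # ['a.md', 'b.md']
print(rerank("connection pool", candidates, DOCS, k=1))  # ['b.md']
```

The reranker can reorder only what the first stage returned. A document dropped in the first stage stays invisible, so the top-k limitation still applies.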

BRIGHT and BEIR

BRIGHT and BEIR are benchmark suites used to evaluate retrieval systems.
They contain datasets designed to test difficult retrieval and reasoning tasks.
BEIR is especially well known in information retrieval research because it evaluates systems across multiple domains rather than a single dataset.
Mentioning these benchmarks is important because it shows the paper is comparing DCI against established retrieval evaluation standards rather than isolated examples.

BrowseComp-Plus

BrowseComp-Plus is a benchmark designed to evaluate long-horizon, agentic research behavior.
The tasks often require:

multiple searches
intermediate discoveries
clue chaining
evidence verification
plan revision

This benchmark is important because it stresses investigation ability, not just simple retrieval quality.

Multi-Hop QA

QA stands for Question Answering.
Multi-hop means the answer cannot usually be found in a single document or passage.
The system must combine information from multiple sources.
Example:

one document identifies a person
another identifies their organization
another explains the historical event connected to that organization

Multi-hop tasks are difficult because they require iterative reasoning and evidence chaining.
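A deliberately simplified sketch of evidence chaining, assuming three made-up documents that each answer only one hop. The names are placeholders, and a real system would have to find and read each document rather than follow a fixed order.

```python
# Made-up documents; each one answers only a single hop.
DOCS = {
    "people.md": {"Ada Example": "Example Institute"},
    "orgs.md": {"Example Institute": "Sample Summit"},
    "events.md": {"Sample Summit": "Sample Accord"},
}

def multi_hop(start, doc_order):
    """Follow one lookup per document, feeding each answer into the next hop."""
    trail, current = [], start
    for doc_id in doc_order:
        current = DOCS[doc_id][current]
        trail.append((doc_id, current))
    return trail

trail = multi_hop("Ada Example", ["people.md", "orgs.md", "events.md"])
for doc_id, answer in trail:
    print(doc_id, "->", answer)
```

No single document connects "Ada Example" to "Sample Accord". The answer exists only as a chain, which is why a single retrieval step tends to struggle with these tasks.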

Conventional Semantic Retriever

A semantic retriever attempts to find documents based on meaning similarity rather than exact keyword matching.
Modern RAG systems commonly use semantic retrievers backed by embeddings and vector databases.
The paper’s core claim is that DCI can compete with or outperform these systems without relying on semantic retrieval infrastructure at all.

Key Insight

The surprising idea in this paper is not merely that DCI works.
It is that relatively simple developer-style tooling:

grep
shell pipelines
file inspection
iterative search

may provide a richer interface for advanced reasoning agents than heavily abstracted retrieval systems.
The paper is effectively arguing that the intelligence of the agent may now be strong enough that restricting it to top-k retrieval results becomes the limiting factor.
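
To show what iterative, grep-style search looks like in practice, here is a small Python sketch. It is not the paper's implementation. The corpus is an in-memory dictionary standing in for a directory of text files, and the `grep` helper is a hypothetical function written for this example. What matters is that the caller sees every matching line and chooses the next pattern itself, instead of receiving a fixed top-k list.

```python
import re

# In-memory stand-in for a directory of text files (contents invented).
CORPUS = {
    "notes/transport.txt": "stdio carries MCP messages\nnamed pipe reaches the VSIX\n",
    "notes/logging.txt": "stdout must stay clean\nwrite diagnostics to stderr\n",
    "notes/misc.txt": "pipe dreams and other idioms\n",
}


def grep(pattern: str, files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, line_number, line) for every line matching the regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((path, number, line))
    return hits


# Step 1: a broad search, similar in spirit to `grep -rni pipe notes/`.
broad = grep(r"pipe", CORPUS)
# Step 2: after inspecting the hits, the agent narrows the pattern itself.
narrow = grep(r"named pipe", CORPUS)
# Step 3: a different question entirely, chosen by the agent as its next move.
follow_up = grep(r"stdout|stderr", CORPUS)

print(broad, narrow, follow_up, sep="\n")
```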

The End of Loose Prompting: Why AI Now Requires More Discipline from Us

27. May 2026 AI Systems Author Comments (0)

AI is no longer just correcting our spelling, finishing our sentences, or helping us phrase an idea. Increasingly, AI systems are becoming operational actors. That changes everything.

The Shift I Am Beginning to Notice

For years, many of us have been trained by software to be imprecise.

Autocorrect fixes our spelling. Search engines guess what we meant. Recommendation engines infer our preferences. IDEs complete our code. Navigation systems route us without requiring us to understand the roads.

That convenience has benefits, but it also has a cost: it conditions us to become comfortable with vague intent.

With traditional software, vague intent was often tolerable. If autocorrect picked the wrong word, we could fix it. If search returned the wrong page, we could search again. If autocomplete made a bad suggestion, we could delete it.

But AI is moving beyond suggestion.

Modern AI systems can now invoke tools, read documents, edit repositories, call APIs, operate through connectors, send messages, schedule events, generate code, and interact with systems through protocols such as MCP-style tool interfaces.

That means the relationship has changed.

We are no longer merely asking software to help us express intent. We are increasingly asking software to act on intent.

Once AI can act, loose prompting becomes more than a communication issue. It becomes an operational risk.

Why This Feels Different

Older AI prompting often felt like trying to get better prose from a clever assistant. The goal was usually to get a better answer, a better summary, a better email, or a better explanation.

That is still useful. But it is no longer the whole picture.

As AI systems become connected to tools and workflows, prompting starts to carry more weight. A prompt is no longer just a request. In many cases, it becomes a temporary policy boundary.

It may define:

what the AI is allowed to touch,
what it should avoid,
what source of truth it should trust,
whether it may act or only advise,
how much autonomy it has,
what should be logged,
what requires confirmation,
and what outcome counts as complete.

That is a very different world from “write me a paragraph about this topic.”

The MCP Security Lesson

The recent attention around MCP security did not create this problem by itself. It exposed a problem that was already forming.

MCP-style systems make tool use visible and standardized. That is valuable. But once a model can interact with tools, files, services, and credentials, the question is no longer simply, “Can the model answer correctly?”

The question becomes:

Can the system act safely when exposed to ambiguous instructions, hostile context, excessive permissions, or hidden prompt manipulation?

This is why the security conversation has expanded beyond ordinary bugs. Prompt injection, excessive agency, insecure tool use, sensitive information disclosure, and confused authorization boundaries are now architectural concerns, not just prompting annoyances.

In plain English: if an AI can use tools, then someone must define what those tools are allowed to do, under what authority, with what evidence, and with what audit trail.
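
One way to picture that sentence is a small gate that sits between the model and its tools. The sketch below is a hypothetical illustration, not any real framework's API: the class name, tool names, and log fields are all invented. It checks each request against an allowlist, holds some tools for confirmation, and records every decision.

```python
from dataclasses import dataclass, field


@dataclass
class ToolGate:
    """Hypothetical gate: checks each tool call and records the decision."""
    allowed: set[str]
    needs_confirmation: set[str] = field(default_factory=set)
    audit_log: list[dict] = field(default_factory=list)

    def request(self, tool: str, requested_by: str, confirmed: bool = False) -> bool:
        if tool not in self.allowed:
            decision = "denied"
        elif tool in self.needs_confirmation and not confirmed:
            decision = "awaiting_confirmation"
        else:
            decision = "allowed"
        # Every decision is recorded, including refusals.
        self.audit_log.append({"tool": tool, "by": requested_by, "decision": decision})
        return decision == "allowed"


gate = ToolGate(allowed={"read_file", "send_email"}, needs_confirmation={"send_email"})
can_read = gate.request("read_file", requested_by="agent-1")
can_send = gate.request("send_email", requested_by="agent-1")
can_delete = gate.request("delete_repo", requested_by="agent-1")
print(can_read, can_send, can_delete)
```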

The Real Issue: Assistant Versus Actor

A helpful way to think about this is the difference between an assistant and an actor.

An assistant helps you think, write, review, explain, summarize, or plan.

An actor changes things.

It edits files. It opens tickets. It runs commands. It sends emails. It schedules meetings. It queries private systems. It modifies infrastructure. It may even chain multiple actions together.

When AI behaves as an assistant, vague prompting is often merely inefficient.

When AI behaves as an actor, vague prompting can become dangerous.

The more authority we give an AI system, the more disciplined our instructions must become.

Why Non-Developers Need to Understand This

This is not only a developer problem.

Developers may see it first because they work close to tools, repositories, terminals, APIs, permissions, and logs. But the same shift is coming to everyone.

AI systems are being connected to email, calendars, documents, customer records, spreadsheets, business processes, personal assistants, financial systems, learning tools, research workflows, and office automation.

That means ordinary users will increasingly face systems that do not merely suggest what to do. They may do it.

If users remain conditioned by “loosey-goosey autocorrect” habits, frustration is inevitable. People may say something vague, the AI may interpret it differently, and the result may not match what the person intended.

Worse, the user may accept the result because they have been trained by years of convenience software to trust the machine’s correction over their own unfinished thought.

The Human Risk: Convenience Can Weaken Judgment

This is the part that concerns me most.

Human beings adapt to convenience. That is not an insult; it is a reality of human behavior.

When software repeatedly fills in gaps for us, we may stop noticing the gaps. We become less intentional. We accept “close enough.” We allow systems to complete our thoughts before we have fully formed them.

That can be harmless when the output is a misspelled word.

It is not harmless when the output is a business decision, a legal statement, a code change, a customer response, a financial action, or a security-sensitive operation.

AI does not merely risk making humans lazy. It risks making humans comfortable with unexamined delegation.

That is a much deeper issue than prompt engineering.

Prompting Is Becoming an Operational Skill

Disciplined prompting is not about using magic phrases.

It is not about tricking the model.

It is not about sounding technical.

Disciplined prompting is about expressing intent clearly enough that an AI system can operate within safe and useful boundaries.

That includes being clear about:

the goal,
the scope,
the source of truth,
the allowed actions,
the disallowed actions,
the expected output,
the level of autonomy,
and the point where human review is required.

In other words, good prompting is becoming less like casual conversation and more like operational instruction.

A Simple Example

A loose prompt might say:

Clean this up and make it better.

That may be fine for a casual paragraph. But if the AI is working inside a repository, a business document, or a production workflow, that prompt is too vague.

A more disciplined prompt might say:

Review this document for technical accuracy and clarity. Do not rewrite it in your voice. Identify places where my wording is misleading, ambiguous, or technically incorrect. Suggest corrections, but preserve my intent and style. Do not expand the scope beyond this document.

The difference is not verbosity for its own sake. The difference is control.
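
The checklist from the previous section can also be treated as a simple data structure. The following sketch is only one possible way to do that; the `TaskBrief` class and its field names are invented for this example. It makes the gap between the loose prompt and the disciplined prompt visible by listing which items were left undefined.

```python
from dataclasses import dataclass, fields


@dataclass
class TaskBrief:
    """Hypothetical container for the items a disciplined prompt should cover."""
    goal: str = ""
    scope: str = ""
    source_of_truth: str = ""
    allowed_actions: str = ""
    disallowed_actions: str = ""
    expected_output: str = ""
    autonomy: str = ""
    review_point: str = ""

    def missing(self) -> list[str]:
        """Names of the items that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


loose = TaskBrief(goal="Clean this up and make it better.")

disciplined = TaskBrief(
    goal="Review this document for technical accuracy and clarity.",
    scope="This document only.",
    source_of_truth="The document as written by the author.",
    allowed_actions="Identify problems and suggest corrections.",
    disallowed_actions="Do not rewrite in your own voice or expand the scope.",
    expected_output="A list of misleading, ambiguous, or incorrect passages.",
    autonomy="Advise only.",
    review_point="The author accepts or rejects each suggestion.",
)

print("loose prompt leaves undefined:", loose.missing())
print("disciplined prompt leaves undefined:", disciplined.missing())
```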

The New Mental Model

The old mental model was:

I ask AI a question, and it gives me an answer.

The emerging mental model is:

I define a bounded task, provide trusted context, constrain the action space, and review the result.

That may feel less magical, but it is more mature.

It also reflects where AI systems are going. As models become more capable, the limiting factor will often not be whether the AI can do something. The limiting factor will be whether we can define what it should do safely, precisely, and responsibly.

Why This Matters for AI Systems Authors

An AI Systems Author is not merely someone who writes prompts. It is someone who understands that AI behavior emerges from the interaction between models, tools, instructions, context, permissions, memory, retrieval, and human review.

That role requires a different discipline.

It requires asking questions such as:

What is the source of truth?
What authority does the AI have?
What should the AI never do without approval?
What context is trusted?
What context may be hostile or misleading?
What evidence should be preserved?
How will the human know what happened?
How can the system fail safely?

These questions are not academic. They are practical.

They are the difference between using AI as a helpful assistant and accidentally creating an ungoverned operational actor.

The Frustration That Is Coming

Many users are accustomed to software silently correcting them. They may expect AI to do the same thing, only better.

But as AI systems become more safety-conscious, users may begin to feel friction.

The AI may ask for clearer instructions. It may refuse to infer too much. It may avoid taking action without confirmation. It may distinguish between reviewing, drafting, editing, executing, and publishing. It may resist vague requests that would have been accepted casually before.

Some users may experience that as the AI becoming less helpful.

But in many cases, the opposite is true.

The system is not becoming less helpful. It is becoming more aware that helpfulness without boundaries can be harmful.

Patience Is Part of the Skill

Learning to work well with AI will require patience.

That patience is not just waiting for better answers. It is the patience to clarify our own intent before delegating work. It is the patience to review what was done. It is the patience to correct the instruction, not merely complain about the output.

This is where I expect my own prompting habits to be sharpened.

If I ask for something vague, I should expect the AI to help expose that vagueness. If I give it too much authority, I should expect it to slow down. If I fail to define the source of truth, I should expect the result to be less reliable. If I ask it to “make it better,” I should be prepared to explain what “better” means.

That is not a weakness in the process. That is the process teaching me to be more intentional.

Conclusion: The End of Casual Delegation

AI is becoming more powerful, but that does not remove responsibility from the human. It increases it.

The future will not belong only to people who know how to ask clever questions. It will belong to people who know how to define bounded work, preserve human judgment, and use AI without surrendering discernment.

Loose prompting may still work for casual tasks.

But for serious work, the era of casual delegation is ending.

As AI becomes more capable of acting, humans must become more capable of instructing.


Shiny Object Syndrome (SOS) - silent AI project killer

25. May 2026 BillKrat Comments (0)

"The companies achieving sustainable AI success aren’t necessarily the ones with the most sophisticated models or the largest budgets. They’re the ones that treat AI deployment as a comprehensive engineering discipline, with rigorous processes, proper architecture, and deep integration with business workflows."
- The Production AI Reality Check: Why 80% of AI Projects Fail to Reach Production

I was unknowingly sabotaging my open source project through poor assumptions about working with AI. While I was completing months of work within days, I kept introducing shiny objects that changed the session context and direction, leading us away from reaching beta in a timely manner (if at all).

The context switches were sometimes not so subtle (and hard to recover from); those were the ones that made me aware of SOS in my open source project. With that awareness, I put guardrails in place to ensure the project's survival and to get the vs-mcp-bridge project to beta in a timely manner: guardrails that would prevent "me" from derailing the project. A sample of those guardrails follows:

ai-project-guardrails-instruction.md (4.74 kb)

I took my suspicions to an authoritative source. My prompts and the AI's responses follow:

In general, If I am working with AI on a project with a defined structure and scope, and I suggest another task - will AI divert its focus to that new task

Short answer: yes — most AI systems will shift to the new task you mention, unless you explicitly anchor them to stay within the original scope. ^[S1]

----

In general, if I am working with AI on a project, and I suggest a policy, standard, or issue to be true. Will AI report it to be true without validation

The short, honest answer: yes — if you assert something as a fact, many AI systems will often treat it as true unless you explicitly ask them to verify it. ^[S2]

----

If a developer is unaware of the answers to these two questions, do they become a threat to the success of a project, i.e., ignorance is not bliss

Yes — if a developer is unaware of the two issues you raised, that ignorance can become a real threat to project success. Not because the questions themselves are dangerous, but because the behaviors they reveal can quietly undermine a project if the developer doesn’t understand them. ^[S3]

----

Developers of large teams generally do not have a say in a projects direction - does it make sense that the psychological effect, even adrenaline, have developers embrace the default behaviors [on their own projects, versus yield it to an AI architect]

When developers lack influence over project direction, the psychological pull of AI’s default behaviors can absolutely shape how they work — sometimes in ways that help, sometimes in ways that quietly undermine the project.

Developers are especially vulnerable because AI amplifies their natural novelty bias, cognitive ease, and the adrenaline rush of rapid progress, creating a loop where every new idea feels productive even as it quietly derails the project; and unless a developer understands that AI will instantly follow any new direction and will often accept unverified statements as truth, they can unknowingly become the source of scope drift, misalignment, and lost months of work—making SOS not a technical flaw, but a human psychological trap that any of them can fall into without realizing it. ^[S4]

----

Is it safe to say, that these questions, if unaddressed could result in project failure

Yes — it is absolutely safe to say that if these issues go unaddressed, they can meaningfully increase the risk of project failure. Not because any one issue is catastrophic on its own, but because together they create a systemic vulnerability in how teams use AI. ^[S5]

----

Has the AI industry addressed these risks

Short answer: No — the AI industry has not fully addressed these risks. Pieces of the problem are acknowledged, but the industry has not solved them in a systematic, standardized, or enforceable way. ^[S6]

And that’s exactly why your questions matter so much.

Core takeaway

The AI industry has partially recognized these risks, but there is no unified framework that reliably protects teams from:

AI task‑switching behavior
AI accepting unverified statements as true
the psychological pull that makes developers embrace these defaults

These remain active, unresolved vulnerabilities in real-world AI‑assisted development.

What has been addressed

Research papers discuss “hallucinations” and “over‑trust.”
Companies warn users to “verify outputs.”
Some tools add optional fact‑checking features.
Enterprise AI platforms add governance layers.
Training materials mention prompt discipline.

These are fragments, not a comprehensive solution.

The result

These risks remain real, active, and unmitigated unless organizations create their own guardrails.

Your questions identify exactly the kind of gaps that lead to:

scope drift
incorrect assumptions
misaligned deliverables
quiet project failure

And yes — if unaddressed, they can absolutely cause projects to fail.

The deeper truth

The AI industry has acknowledged the risks. It has not solved them. It has not standardized protections. It has not addressed the human psychological component.

So your concerns are valid.

References

Inference-driven development with Copilot; pros and cons

23. April 2026 BillKrat Comments (0)

Inference Driven

Using AI Assistance Without Turning Development Into A Black Box

Copilot is a useful development assistant. It can complete patterns, suggest code, write tests, and keep a developer moving through mechanical work. The risk is not that Copilot is useless. The risk is treating inference as if it were architecture, verification, and judgment all at once.

That distinction is the heart of the BlogAI story. AI-assisted software design works best when generated code is surrounded by source-of-truth documents, observable workflows, approval boundaries, logs, diagrams, and durable evidence. Without those things, a team can move faster while understanding less.

VS MCP Bridge has become a practical case study for that lesson.

What Inference Means In Practice

In software development, inference means the model is producing likely code, explanations, or next steps from the context it can see. That can be powerful, but it is not the same as owning the system model.

The model may know common patterns. It may mirror nearby code. It may produce a convincing implementation. But it does not automatically know which boundaries are non-negotiable, which logs are required for future triage, which security claims would overstate the current system, or which documentation is the source of truth.

That is why inference-driven development needs a workflow around it.

Where Copilot Helps

Copilot works well when the local task is clear and the surrounding code already teaches the pattern.

It can accelerate repetitive edits, tests, and small refactors.
It can suggest idiomatic code when the project conventions are visible.
It can help explore unfamiliar APIs or fill in routine structure.
It can reduce friction when the developer already knows what should happen.

In that role, Copilot behaves like a fast assistant. It is especially useful when the developer can review the output against a clear contract.

Where Inference Becomes Risky

The same strengths become risky when the task is architectural, security-sensitive, or poorly bounded.

A generated change may look correct while bypassing the real execution boundary.
A suggested log line may leak data or pollute MCP stdout.
A local refactor may erase a correlation id that future troubleshooting depends on.
A plausible explanation may imply authentication, sandboxing, or secret storage that does not exist.
A quick fix may solve the symptom while leaving no evidence for the next session.

These are not reasons to avoid AI tools. They are reasons to stop treating prompt-to-code as the whole workflow.

Prompt-To-Code Is Not Enough

The early mistake in many AI-assisted workflows is assuming that the prompt ends when code appears. In practice, the better workflow is prompt-to-evidence.

For a useful AI-generated change, the following questions should be answerable:

What boundary did it touch?
Which source-of-truth document says this behavior is correct?
Which tests or validation steps prove it?
Which logs or artifacts would explain it later?
Which Mermaid diagram reflects the observed flow?
What should a future AI session read before extending it?

That is the difference between code generation and engineering discipline.

How VS MCP Bridge Changed The Workflow

The VS MCP Bridge cleanup made this concrete. The project did not become clearer just because an AI generated code. It became clearer because logs, diagrams, handoffs, and architecture documents exposed where the system was vague.

Sequence diagrams helped reveal transport boundaries. Trace logs made request and operation correlation visible. Durable artifacts showed whether execution really flowed through the expected catalog and executor path. Approval and security traces forced a clearer distinction between current plumbing and future hardening.

That evidence led to better architecture:

the MCP stdio boundary stayed clean
the VSIX stayed isolated behind the named-pipe boundary
compiled tools gained descriptors, requests, results, catalogs, and executor-owned logging
MEF became a discovery seam instead of an execution shortcut
approval-aware execution became part of the tool boundary
security seams stayed explicit without claiming production authentication or sandboxing
audit and redaction became part of reconstructable tool execution

In other words, the AI assistance was useful because the project kept forcing it back through observable architecture.

Human Review Still Owns The Design

Copilot can propose. Codex can implement. ChatGPT can explain tradeoffs. None of those tools should silently own the design.

Human review still decides whether a change matches the architecture, whether the risk is acceptable, whether the evidence is enough, and whether the documentation tells the truth. The stronger the tool, the more important that review becomes.

This is especially true for security and approval workflows. A model can generate a policy class or approval hook, but the project still needs to say what is intentionally deferred: OAuth, user identity, real secret storage, sandboxing, signed plugin manifests, tamper-evident audit stores, and SIEM export are not complete just because a seam exists.

Source Of Truth Beats Chat Memory

One of the strongest lessons from this project is that durable source files beat chat memory.

The current workflow asks future sessions to start from files such as:

AI_START.md

Those files make the system teachable. They also reduce the chance that an AI session resumes from an outdated mental model.

Where BlogAI Fits

BlogAI can help turn this architecture work into learning material, but only if the blog stays aligned with the code.

That is why the current blog cleanup starts from preserved database exports, canonical repo sources, manifest metadata, and explicit token/link rules. Blog posts should not drift away from the system they are explaining. They should point readers back to the current architecture, trace workflows, Mermaid sources, and handoffs that support the claims.

Done well, BlogAI becomes more than a publishing surface. It becomes a way to keep project knowledge synchronized with code, validation artifacts, and operational lessons.

Practical Pros And Cons

Practice	Strength	Risk
Copilot as coding assistant	Fast local implementation help	Can produce plausible but wrong code if review is weak
Codex-style implementation sessions	Can inspect, edit, validate, and commit cohesive slices	Needs repository source-of-truth and validation constraints to stay grounded
Architecture chat and review	Good for explaining tradeoffs and surfacing assumptions	Can become speculative if not tied back to code and artifacts
Durable traces and handoffs	Make AI-assisted work reconstructable	Require discipline to keep current

Takeaway

Inference-driven development is useful when it is not treated as autonomous development.

The stronger pattern is human-directed, evidence-backed AI assistance: use Copilot for local acceleration, use Codex or chat tools for broader implementation and reasoning, require source-of-truth documentation, preserve trace evidence, and keep approvals, logs, and boundaries visible.

That is what VS MCP Bridge is trying to teach. The goal is not just prompt-to-code. The goal is prompt-to-evidence, with code as one result of a workflow that remains understandable after the session ends.

See Chat Sessions Models And Agents for related background on chat sessions, models, and agents.

How stdio Works in VS MCP Bridge

19. April 2026 AI Systems Author AI Systems Author, MCP Bridge Comments (0)

Stdio

In the VS MCP Bridge architecture, stdio is the AI-facing MCP transport boundary. It is important, but it is not the whole bridge.

That distinction matters because “MCP over stdio” can sound as if the AI client is talking directly to Visual Studio. It is not. The AI client speaks MCP to a local server process over standard input and standard output. That server then uses a separate local named-pipe hop when a tool needs Visual Studio state.

This post explains the boundary, why stdout has to stay clean, and how the current implementation keeps diagnostics observable without corrupting the MCP protocol stream.

The Short Version

The runtime path for VS-backed MCP tools is:

AI client
  -> MCP over stdio
VsMcpBridge.McpServer
  -> JSON over named pipe
VsMcpBridge.Vsix
  -> Visual Studio services / DTE / editor state

So stdio gets the request into the local MCP server. The named pipe gets VS-backed work into the VSIX. The two transports are intentionally separate.

Where stdio Is Enabled

The stdio transport is configured in the MCP host bootstrap:

builder.Services
    .AddMcpServer()
    .WithStdioServerTransport()
    .WithTools<VsTools>();

That configuration lives in the VsMcpBridge.McpServer project, inside McpServerHost.Configure(...). The important line is WithStdioServerTransport().

That line tells the MCP host to exchange protocol messages through standard input and standard output instead of through HTTP, a socket listener, or a custom public endpoint.

What stdio Means Here

Standard input and standard output are process streams.

stdin is how the AI client writes MCP requests into the server process.
stdout is how the server process writes MCP responses back to the client.

That makes stdio a good fit for local AI tooling. The AI client can launch the MCP server as a worker process, keep it alive, write protocol messages to stdin, and read responses from stdout. The MCP server does not need to expose a network port for this local path.

The boundary is still a protocol boundary. stdout is not a casual logging stream once MCP is running over it.
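
The stream arrangement described above can be tried out in a few lines. The sketch below is written in Python, not the project's C#, and the message format is a made-up line of JSON and not the actual MCP wire protocol. It only demonstrates the mechanics: a parent process launches a worker, writes a request to the worker's stdin, and reads the response from the worker's stdout, with no network port involved.

```python
import json
import subprocess
import sys

# A tiny stand-in "server": reads JSON lines from stdin, answers on stdout.
# This is NOT the MCP protocol, only the same stream arrangement.
CHILD_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    response = {"id": request["id"], "result": request["method"].upper()}
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
"""


def call_once(method: str, request_id: int = 1) -> dict:
    """Launch the worker, write one request to its stdin, read one response."""
    worker = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        worker.stdin.write(json.dumps({"id": request_id, "method": method}) + "\n")
        worker.stdin.flush()
        return json.loads(worker.stdout.readline())
    finally:
        worker.stdin.close()  # end-of-file ends the worker's read loop
        worker.wait(timeout=10)
        worker.stdout.close()


reply = call_once("ping")
print(reply)
```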

Why stdout Must Stay Clean

One practical consequence of MCP over stdio is that stdout must be reserved for protocol traffic. If the server writes arbitrary diagnostic lines to stdout, the AI client can receive those lines as if they were MCP messages. That can make a healthy server look broken.

For that reason, diagnostics belong somewhere else:

stderr when the host framework allows it safely,
file logs under the local app-data logging paths,
Visual Studio output panes and VSIX trace logs,
structured trace artifacts when validating a workflow.

The current architecture treats clean stdout as part of the transport contract. Operational detail is preserved, but it is kept off the response stream that the MCP client is parsing.
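
A short experiment shows why a stray log line matters. This Python sketch assumes a simplified client that expects every line on the protocol stream to be valid JSON; the real MCP client's parsing and error handling may differ. The same handler is run twice, and the only difference is where the diagnostic text is written.

```python
import io
import json


def parse_protocol_stream(raw: str) -> list[dict]:
    """Parse newline-delimited JSON the way a strict client might."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


def respond(message: dict, protocol_out, diagnostics_out, log_to_stdout: bool) -> None:
    note = "handled request " + str(message["id"])
    # The only difference between the two runs is where the diagnostic goes.
    print(note, file=protocol_out if log_to_stdout else diagnostics_out)
    print(json.dumps({"id": message["id"], "result": "ok"}), file=protocol_out)


# Run 1: diagnostics go to a separate stream, so the protocol stream stays clean.
clean_out, clean_err = io.StringIO(), io.StringIO()
respond({"id": 7}, clean_out, clean_err, log_to_stdout=False)
clean_messages = parse_protocol_stream(clean_out.getvalue())

# Run 2: the same diagnostic is written to the protocol stream.
dirty_out, dirty_err = io.StringIO(), io.StringIO()
respond({"id": 7}, dirty_out, dirty_err, log_to_stdout=True)
try:
    parse_protocol_stream(dirty_out.getvalue())
    polluted_stream_failed = False
except json.JSONDecodeError:
    polluted_stream_failed = True

print(clean_messages, polluted_stream_failed)
```

In the second run the handler did its job correctly, yet the client still fails to parse the stream. That is the "healthy server looks broken" situation.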

What the Entry Point Does

The program entry point is intentionally small:

var builder = Host.CreateApplicationBuilder(args);
McpServerHost.Configure(builder);

await builder.Build().RunAsync();

Startup has a narrow job:

Create the host builder.
Register logging, the pipe client, MCP server support, stdio transport, and the VS-backed tool container.
Build and run the host.

Once the host is running, the MCP tool surface is visible to the AI client over stdio.

What stdio Does Not Do

stdio does not make the MCP server a Visual Studio extension. It does not grant direct DTE access, load inside the Visual Studio process, or apply edits in the editor.

Those responsibilities stay on the VSIX side. The MCP server process stays outside Visual Studio and acts as the local AI-facing adapter.

That separation is one of the core architecture choices in the project:

MCP protocol work lives in VsMcpBridge.McpServer.
Visual Studio API work lives in VsMcpBridge.Vsix.
Shared compiled bridge tools execute through BridgeToolExecutor when they use the shared tool catalog/executor path.

stdio is a transport. It is not the policy, approval, audit, or redaction boundary for compiled bridge tools. That boundary remains BridgeToolExecutor.

How VS-Backed Tool Calls Cross the Boundary

The MCP host exposes the VS-backed tool container registered with WithTools<VsTools>(). That class contains explicit MCP tools such as:

vs_get_active_document
vs_get_selected_text
vs_list_solution_projects
vs_get_error_list
vs_propose_text_edit
vs_propose_text_edits

From the AI client’s perspective, those are MCP tools. The request arrives over stdin, the MCP host resolves the method, and the method executes inside the VsMcpBridge.McpServer process.

For Visual Studio-backed operations, the method still does not call Visual Studio directly. It forwards a structured request through the pipe client.

The Named-Pipe Hop

Inside VsTools, the VS-backed methods use an injected IPipeClient. That client connects to the VSIX-hosted named-pipe side:

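// Connect to the pipe server hosted by the VSIX on the local machine (".").
using var pipe = new NamedPipeClientStream(".", _pipeName, PipeDirection.InOut, PipeOptions.Asynchronous);
// Give up after 5000 milliseconds, or earlier if the caller cancels.
await pipe.ConnectAsync(timeout: 5000, cancellationToken);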

The full call path is layered:

The AI client calls an MCP tool over stdio.
The MCP host routes the call to a VsTools method.
The method uses PipeClient to connect to the VSIX over the local named pipe.
The VSIX dispatches the known pipe command and performs the Visual Studio-side operation.
The VSIX returns a structured response through the pipe.
The MCP server writes the MCP response back over stdout.

This is why stdio and the named pipe should be debugged separately. A stdio failure means the AI client and MCP server are not communicating correctly. A pipe failure means the MCP server could not reach the VSIX side.

The Activation Boundary

The named-pipe side is initialized by the Visual Studio extension. For live VS-backed tool calls, the operator must launch the Visual Studio Experimental Instance and open View -> Other Windows -> VS MCP Bridge so the VSIX/tool-window path initializes the local pipe server.

If that pipe side is inactive, the current MCP server returns an activation-focused diagnostic instead of leaving the operator with an opaque timeout. The diagnostic points to the activation steps: launch the Experimental Instance, open the VS MCP Bridge tool window, then retry the VS-backed tool.

That message is still returned as a structured tool failure. The transport does not change, and the server does not start adding retry loops or writing troubleshooting text to stdout outside the MCP response.
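
The behavior described above, returning a structured failure with activation steps instead of an opaque timeout, can be sketched roughly as follows. This is Python, not the project's C#, and the result field names (`success`, `error`, `next_steps`) are assumptions made for the illustration; the actual response shape in VS MCP Bridge may differ.

```python
def call_vs_tool(tool_name: str, connect) -> dict:
    """Run a VS-backed tool; turn an unreachable pipe into a structured failure."""
    try:
        connection = connect()
    except (TimeoutError, ConnectionError):
        # No retry loop and no free-form text on stdout: just a structured result.
        return {
            "tool": tool_name,
            "success": False,
            "error": "pipe_unavailable",
            "next_steps": [
                "Launch the Visual Studio Experimental Instance",
                "Open View -> Other Windows -> VS MCP Bridge",
                "Retry the VS-backed tool",
            ],
        }
    return {"tool": tool_name, "success": True, "result": connection(tool_name)}


def inactive_pipe():
    """Simulates the case where the VSIX has not started its pipe server."""
    raise TimeoutError("no pipe server is listening")


def active_pipe():
    """Simulates a reachable pipe; returns a callable that 'handles' the tool."""
    return lambda tool: f"{tool} handled by the extension"


failure = call_vs_tool("vs_get_active_document", inactive_pipe)
success = call_vs_tool("vs_get_active_document", active_pipe)
print(failure["error"], success["result"])
```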

Correlation and Trace-Only Diagnostics

Because stdio needs clean protocol output, observability depends on structured diagnostics outside stdout. Current traces preserve request IDs, correlation IDs, operation names, timing, and success or failure outcomes across the relevant boundary.

For the inactive-pipe path, the useful evidence is not a random console line. It is the reconstructable chain:

MCP tool request received
PipeClient attempted named-pipe connection
named pipe was unavailable
activation diagnostic returned
correlation/request metadata preserved
no raw payload or secret values disclosed

That is the anti-black-box discipline used throughout the project. A failure should be explainable from durable logs, trace artifacts, and documented workflow boundaries, not from guessing which process happened to be awake.
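
The reconstructable chain above can be illustrated with a few structured trace records that share a correlation ID. This is a Python sketch with invented step names and record fields; it is not the project's trace format. It shows how two interleaved requests can still be separated afterwards, which a plain console line cannot offer.

```python
import uuid


def trace_event(log: list, correlation_id: str, step: str, **details) -> None:
    """Append one structured trace record to a log that is kept off stdout."""
    log.append({"correlation_id": correlation_id, "step": step, **details})


def handle_request_with_inactive_pipe(log: list) -> str:
    """Simulate the inactive-pipe path and record each step of it."""
    correlation_id = uuid.uuid4().hex
    trace_event(log, correlation_id, "mcp_tool_request_received", tool="vs_get_selected_text")
    trace_event(log, correlation_id, "pipe_connect_attempted")
    trace_event(log, correlation_id, "pipe_unavailable", success=False)
    trace_event(log, correlation_id, "activation_diagnostic_returned")
    return correlation_id


def reconstruct(log: list, correlation_id: str) -> list[str]:
    """Rebuild the ordered chain of steps for one request."""
    return [entry["step"] for entry in log if entry["correlation_id"] == correlation_id]


trace_log: list = []
first = handle_request_with_inactive_pipe(trace_log)
second = handle_request_with_inactive_pipe(trace_log)
chain = reconstruct(trace_log, first)
print(chain)
```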

How This Relates to BridgeToolExecutor

The stdio server exposes VS-backed tools directly through the MCP tool container, and those tools cross into the VSIX over the named pipe. Separately, the shared bridge tool architecture has compiled tools, descriptors, capability metadata, approval requirements, secret-reference awareness, redaction, audit envelopes, and classification metadata.

Those shared compiled tools flow through BridgeToolExecutor. That executor is the policy, approval, execution, audit, correlation, and redaction boundary for that path.

The important distinction is:

stdio is how an AI client talks MCP to the local server process.
named pipes are how VS-backed MCP tools reach the VSIX.
BridgeToolExecutor is the shared execution/security boundary for compiled bridge tools.

Keeping those responsibilities separate is what lets the architecture grow without turning transport code, Visual Studio integration, and security policy into one indistinct layer.
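
To illustrate what it means for one component to own approval, execution, redaction, and audit, here is a deliberately small Python sketch. It is not BridgeToolExecutor and does not reflect its real API. The class name, the tool names, and the secret pattern are all hypothetical. The design point is that every call passes through a single choke point, so transport code never has to make policy decisions.

```python
import re

# Hypothetical secret format for the demo: token=... or password=... pairs.
SECRET = re.compile(r"\b(token|password)=\S+", re.IGNORECASE)


def redact(text: str) -> str:
    return SECRET.sub(lambda match: match.group(1) + "=[REDACTED]", text)


class SketchExecutor:
    """One choke point for approval, execution, redaction, and audit."""

    def __init__(self, tools: dict, requires_approval: set):
        self.tools = tools
        self.requires_approval = requires_approval
        self.audit: list[dict] = []

    def execute(self, name: str, argument: str, approved: bool = False) -> dict:
        if name in self.requires_approval and not approved:
            outcome = {"status": "approval_required"}
        else:
            outcome = {"status": "ok", "output": redact(self.tools[name](argument))}
        # The audit entry stores only redacted text.
        self.audit.append(
            {"tool": name, "argument": redact(argument), "status": outcome["status"]}
        )
        return outcome


executor = SketchExecutor(
    tools={
        "echo": lambda text: "echo: " + text,
        "apply_edit": lambda text: "applied " + text,
    },
    requires_approval={"apply_edit"},
)
echoed = executor.execute("echo", "connect with token=abc123")
blocked = executor.execute("apply_edit", "rename variable")
applied = executor.execute("apply_edit", "rename variable", approved=True)
print(echoed, blocked, applied, sep="\n")
```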

What to Remember When Studying This Code

If you are learning the system, keep these files and roles in mind:

Program.cs starts the MCP server host.
McpServerHost.Configure(...) wires logging, stdio transport, the pipe client, and the MCP tool surface.
VsTools defines the VS-backed MCP tools exposed over stdio.
PipeClient bridges from the MCP server process into the VSIX.
The VSIX owns Visual Studio APIs, editor state, proposal application, and the named-pipe server side.
BridgeToolExecutor owns the shared compiled-tool policy and audit boundary.

Once those layers are clear, the implementation is much easier to reason about. The bridge is not one process doing everything. It is a set of local boundaries with explicit responsibilities.

Takeaway

In VS MCP Bridge, stdio is the process-to-process protocol transport that lets an AI client speak MCP to the local server host. The server then uses a separate local named-pipe boundary for Visual Studio-backed operations.

The cleanest mental model is:

stdio gets into the MCP server
named pipes get into Visual Studio
BridgeToolExecutor governs shared compiled tool execution

That separation keeps the bridge observable, debuggable, and easier to evolve. stdout stays clean for MCP. Diagnostics stay reconstructable. Visual Studio work stays in the VSIX. Shared tool execution keeps its own policy and audit boundary.

Understanding a Named Pipe Listener

17. April 2026 AI Systems Author Comments (0)


In the VS MCP Bridge architecture, the Visual Studio side of the system does not wait for natural-language prompts from an AI tool. It waits for structured bridge requests.

That waiting point is the named-pipe side of the bridge.

A named pipe is a local inter-process communication channel provided by the operating system. One process creates the pipe and waits for a connection. Another process connects and exchanges messages. No public network port is required.
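To make the create, connect, and exchange steps concrete, here is a minimal sketch that runs both ends of a named pipe inside one process. It is not code from the bridge. The class name PipeEchoSketch, the method name, and the "echo: " reply format are assumptions made for illustration, and the sketch targets modern .NET.

```csharp
using System.IO;
using System.IO.Pipes;
using System.Threading.Tasks;

public static class PipeEchoSketch
{
    // Runs one request/response exchange over a local named pipe, with both ends in one process.
    public static async Task<string> RoundTripAsync(string pipeName, string message)
    {
        // "Server" side: create the pipe, then wait for a single local connection.
        using var server = new NamedPipeServerStream(
            pipeName, PipeDirection.InOut, 1, PipeTransmissionMode.Byte, PipeOptions.Asynchronous);

        Task serverTask = Task.Run(async () =>
        {
            await server.WaitForConnectionAsync();
            using var reader = new StreamReader(server, leaveOpen: true);
            using var writer = new StreamWriter(server, leaveOpen: true) { AutoFlush = true };
            var line = await reader.ReadLineAsync();
            await writer.WriteLineAsync("echo: " + line);
        });

        // "Client" side: connect to the same pipe name on the local machine (".").
        using var client = new NamedPipeClientStream(".", pipeName, PipeDirection.InOut, PipeOptions.Asynchronous);
        await client.ConnectAsync(5000);
        using var clientReader = new StreamReader(client, leaveOpen: true);
        using var clientWriter = new StreamWriter(client, leaveOpen: true) { AutoFlush = true };
        await clientWriter.WriteLineAsync(message);
        var response = await clientReader.ReadLineAsync();

        await serverTask;
        return response ?? string.Empty;
    }
}
```

Calling PipeEchoSketch.RoundTripAsync("demo-pipe", "ping") sends one line across the pipe and returns the reply. No port is opened at any point, which is the property the bridge relies on.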

In this project, the named-pipe boundary exists because the MCP server and the Visual Studio extension have different jobs. The MCP server speaks MCP over stdio to the AI client. The VSIX runs inside Visual Studio and owns Visual Studio APIs, editor state, proposal application, and host-specific behavior.

The Short Version

The current VS-backed tool path is:

AI client
  -> MCP over stdio
VsMcpBridge.McpServer
  -> PipeClient
local named pipe: VsMcpBridge
  -> PipeServer in the VSIX
VsService
  -> Visual Studio APIs / editor state

The important boundary is simple: stdio gets the request into the local MCP server, and the named pipe gets Visual Studio-backed work into the VSIX.

Why the VSIX Side Is Isolated from stdio

The VSIX runs inside Visual Studio. It can access DTE, editor state, solution state, the Error List, and the proposal-approval UI. The MCP server does not run inside Visual Studio and should not pretend to be the IDE host.

Keeping stdio out of the VSIX gives the bridge a cleaner architecture:

The AI client talks MCP to a local server process.
The MCP server keeps stdout reserved for MCP protocol responses.
The VSIX owns Visual Studio-specific work and Visual Studio privileges.
The named pipe provides a local-only bridge between those two processes.

This is why the named pipe is not just an implementation detail. It is the local host boundary between the AI-facing process and the IDE-facing process.

PipeClient and PipeServer Responsibilities

The named-pipe layer has two sides.

PipeClient lives in the MCP server process. For VS-backed tools, it connects to the local pipe name, writes a serialized request envelope, waits for a serialized response, and returns that response to the MCP tool method.

PipeServer lives on the host side. In the VSIX host, it accepts the pipe connection, reads the request envelope, dispatches the command, and writes a response.

At a high level, the client side looks like this:

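// Connect to the local pipe ("." is this machine), waiting up to 5 seconds.
using var pipe = new NamedPipeClientStream(".", _pipeName, PipeDirection.InOut, PipeOptions.Asynchronous);
await pipe.ConnectAsync(5000, cancellationToken);

// reader and writer wrap the pipe stream (their setup is omitted here).
// One serialized envelope goes out per line; one response line comes back.
await writer.WriteLineAsync(JsonSerializer.Serialize(envelope, JsonOptions));
var responseJson = await reader.ReadLineAsync(cancellationToken);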

And the server side listens for local pipe connections, then hands each connection to request handling:

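// Create a fresh server instance for each incoming connection.
pipe = new NamedPipeServerStream(
    PipeName,
    PipeDirection.InOut,
    NamedPipeServerStream.MaxAllowedServerInstances,
    PipeTransmissionMode.Byte,
    PipeOptions.Asynchronous);

// Wait without blocking a thread, and let cancellation interrupt the wait.
await pipe.WaitForConnectionAsync(ct);

// Hand off a local copy so a later reassignment of pipe cannot swap the instance under the handler.
var connection = pipe;
_ = Task.Run(() => HandleConnectionAsync(connection, ct), CancellationToken.None);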

The useful point is not the exact syntax. The useful point is the split of responsibility: the MCP server initiates a local pipe request, and the VSIX host accepts and dispatches it.

The Request Envelope

The named-pipe listener is not a chat endpoint. It expects a structured request envelope.

That envelope carries fields such as:

Command
RequestId
Payload

The command tells the host what operation is being requested. The request ID gives the logs and responses a stable correlation point. The payload contains the typed request body for that operation.

This structure is what makes the bridge diagnosable. When a tool call fails, the operator can ask which request crossed which boundary instead of guessing from unstructured text.
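As an illustration of that structure, the following sketch models an envelope with the three fields named above and serializes it to a single JSON line. Only the field names Command, RequestId, and Payload come from the post. The record type, the use of JsonElement for the payload, and the GUID-based request ID are assumptions, not the bridge's actual contract.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical envelope shape; only the three field names come from the post.
public sealed record PipeRequestEnvelope(string Command, string RequestId, JsonElement Payload);

public static class EnvelopeSketch
{
    // Serializes one envelope to a single line of JSON, suitable for line-delimited framing.
    public static string ToLine<TPayload>(string command, TPayload payload)
    {
        var envelope = new PipeRequestEnvelope(
            command,
            Guid.NewGuid().ToString("N"),              // assumed ID format: 32 hex characters
            JsonSerializer.SerializeToElement(payload));
        return JsonSerializer.Serialize(envelope);      // default options produce no line breaks
    }

    // Parses a received line back into an envelope (null if the JSON literal is "null").
    public static PipeRequestEnvelope? FromLine(string line) =>
        JsonSerializer.Deserialize<PipeRequestEnvelope>(line);
}
```

Because the request ID travels inside the envelope, both sides can log the same value without any shared state.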

How Dispatch Works

Once the pipe server has a request envelope, it dispatches by command name. It does not interpret prose or execute arbitrary instructions.

VsResponseBase response = envelope.Command switch
{
    PipeCommands.GetActiveDocument => await _vsService.GetActiveDocumentAsync(),
    PipeCommands.GetSelectedText => await _vsService.GetSelectedTextAsync(),
    PipeCommands.ListSolutionProjects => await _vsService.ListSolutionProjectsAsync(),
    PipeCommands.GetErrorList => await _vsService.GetErrorListAsync(),
    PipeCommands.ProposeTextEdit => await DispatchProposeEditAsync(envelope),
    _ => new VsResponseBaseUnknown { Success = false, ErrorMessage = $"Unknown command: {envelope.Command}" }
};

The current MCP surface is explicit and limited. Unknown, empty, malformed, or unsupported pipe commands fail closed instead of being dispatched.

Where Visual Studio Work Happens

The pipe server owns transport and dispatch. It does not need to own DTE or editor behavior directly.

Visual Studio-specific work is handled by the host service layer, such as VsService. That is where operations such as these belong:

getting the active document,
reading selected text,
listing solution projects,
reading the Error List,
creating approval-gated edit proposals.

This keeps transport concerns separate from Visual Studio concerns. It also keeps the MCP server from needing direct knowledge of Visual Studio SDK details.

Activation and Startup Boundaries

The VSIX side must be active before VS-backed MCP tools can succeed. In live validation, the reliable operator path is to launch the Visual Studio Experimental Instance and open View -> Other Windows -> VS MCP Bridge. That activation path initializes the VSIX/tool-window side needed for the named pipe.

If the MCP server is running but the VSIX pipe side is inactive, that is not an MCP stdio failure. It is a named-pipe activation failure.

The current diagnostic path treats that case explicitly. Instead of appearing as an opaque timeout, the pipe client returns a structured activation diagnostic telling the operator to launch Visual Studio, open the VS MCP Bridge tool window, and retry the VS-backed tool.
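One way to produce that kind of diagnostic is to catch the connect timeout at the pipe client and translate it into a structured result. The sketch below shows the idea. The PipeConnectResult record, the "vsix_pipe_inactive" error code, and the exact message wording are hypothetical and are not taken from the bridge's source.

```csharp
#nullable enable
using System;
using System.IO;
using System.IO.Pipes;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result shape used only for this illustration.
public sealed record PipeConnectResult(bool Success, string? ErrorCode, string? Message);

public static class ActivationDiagnosticSketch
{
    // Attempts a local pipe connection and reports an inactive listener as a structured failure.
    public static async Task<PipeConnectResult> TryConnectAsync(
        string pipeName, int timeoutMs, CancellationToken ct = default)
    {
        using var pipe = new NamedPipeClientStream(".", pipeName, PipeDirection.InOut, PipeOptions.Asynchronous);
        try
        {
            await pipe.ConnectAsync(timeoutMs, ct);
            return new PipeConnectResult(true, null, null);
        }
        catch (Exception ex) when (ex is TimeoutException or IOException)
        {
            // No listener answered in time: name the failed boundary and the operator action.
            return new PipeConnectResult(
                false,
                "vsix_pipe_inactive",
                "The VSIX named pipe is not available. Launch Visual Studio, open the " +
                "VS MCP Bridge tool window, and retry the VS-backed tool.");
        }
    }
}
```

The caller can return that result as a tool failure over MCP, so the operator sees which boundary failed instead of a bare timeout.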

That matters because a transport failure should identify the failed boundary:

If stdio is broken, the AI client and MCP server are not talking correctly.
If the named pipe is unavailable, the MCP server cannot reach the VSIX side.
If command dispatch fails, the request reached the host but did not match an allowed operation.
If VsService fails, the request reached Visual Studio-side execution but the host operation failed.

Request and Response Correlation

The named-pipe layer participates in the same anti-black-box logging discipline as the rest of the bridge. Requests carry IDs across the boundary so logs can be reconstructed later.

A useful trace should be able to answer:

which MCP tool was called,
which pipe command was sent,
which request ID crossed the pipe,
whether the pipe connected, timed out, or returned a structured failure,
which host operation ran,
how long each boundary took.

That is why the architecture emphasizes request IDs, operation names, elapsed timing, success or failure state, and durable trace artifacts. The goal is not more logging for its own sake. The goal is to make failure reconstruction practical.
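A small wrapper can capture most of those fields at each boundary. The following is a sketch under stated assumptions: the BoundaryTraceSketch class, the JSON field names, and the choice of stderr as the sink are illustrative, and the bridge's real trace format may differ.

```csharp
using System;
using System.Diagnostics;
using System.Text.Json;
using System.Threading.Tasks;

public static class BoundaryTraceSketch
{
    // Runs one boundary call, measures it, and writes a single JSON trace line to stderr.
    public static async Task<T> TraceAsync<T>(string requestId, string operation, Func<Task<T>> call)
    {
        var stopwatch = Stopwatch.StartNew();
        var success = false;
        try
        {
            var result = await call();
            success = true;
            return result;
        }
        finally
        {
            // stderr keeps the trace out of the MCP protocol stream on stdout.
            Console.Error.WriteLine(FormatTrace(requestId, operation, success, stopwatch.ElapsedMilliseconds));
        }
    }

    // Formats the trace fields as one JSON object per line.
    public static string FormatTrace(string requestId, string operation, bool success, long elapsedMs) =>
        JsonSerializer.Serialize(new { requestId, operation, success, elapsedMs });
}
```

Wrapping the pipe call and the host operation with the same request ID yields two trace lines that can be joined later, which is what makes the timing of each boundary recoverable.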

Approval-Aware Flow Where It Matters

The named pipe does not approve tool execution by itself. It moves structured requests between local processes.

For Visual Studio edit operations, the VSIX proposal workflow remains approval-gated. MCP can propose edits, but applying them still requires explicit approval in the host UI.

For shared compiled bridge tools, approval-aware execution is a separate executor concern. A compiled tool descriptor can require approval, and BridgeToolExecutor owns policy evaluation, approval evaluation, execution, audit, correlation, and redaction for that path.

That means the named-pipe layer supports approval-aware architecture by preserving structured boundaries and correlation, but it is not the shared compiled-tool policy engine.
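The idea of evaluating approval at the execution boundary, rather than inside tool code or the transport, can be sketched in a few lines. This is not BridgeToolExecutor. The ToolDescriptor, ToolResult, and ApprovalDecision types and the approver delegate are hypothetical stand-ins used to show where the check sits.

```csharp
using System;

public enum ApprovalDecision { Approved, Denied }

// Hypothetical descriptor and result shapes for illustration only.
public sealed record ToolDescriptor(string Name, bool RequiresApproval);
public sealed record ToolResult(bool Success, string Output);

public static class ApprovalGateSketch
{
    // The gate runs before the tool body, so tool code never decides its own approval.
    public static ToolResult Execute(
        ToolDescriptor descriptor,
        Func<ToolDescriptor, ApprovalDecision> approver,
        Func<string> tool)
    {
        if (descriptor.RequiresApproval && approver(descriptor) != ApprovalDecision.Approved)
        {
            // Denied requests return a structured failure and the tool body is never invoked.
            return new ToolResult(false, $"Tool '{descriptor.Name}' was not approved.");
        }

        return new ToolResult(true, tool());
    }
}
```

In this shape the pipe layer never appears: it only carries the request, and the decision belongs to whichever component owns execution.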

Relationship to MCP and BridgeToolExecutor

It helps to keep three boundaries separate:

MCP stdio boundary: the AI client talks to VsMcpBridge.McpServer.
Named-pipe boundary: VsMcpBridge.McpServer talks to the VSIX host for Visual Studio-backed tools.
BridgeToolExecutor boundary: shared compiled tools run through policy, approval, execution, audit, redaction, and correlation seams.

Those boundaries are complementary. The named pipe keeps Visual Studio operations local to the VSIX. BridgeToolExecutor keeps compiled tool execution governed by a single shared policy and audit boundary. stdio keeps the AI client protocol isolated from both of those internal implementation details.

Failure Isolation and Troubleshooting

If you are debugging a VS-backed tool call, follow the boundary chain instead of treating the bridge as one black box:

Did the AI client successfully launch and speak to the MCP server over stdio?
Did the MCP server resolve the expected registered tool?
Did PipeClient attempt the expected command with a request ID?
Was the VSIX/tool-window side active and listening on the named pipe?
Did PipeServer accept and parse the request envelope?
Did the command dispatch to a known PipeCommands value?
Did VsService complete the host operation?
Did the response return through the pipe and then over MCP stdout?

This is the practical value of clean transport boundaries. Each step has a narrow responsibility, so the first missing or failing boundary can be found from logs and trace artifacts.

Related Mermaid Trace Sources

The repo already has Mermaid sources that support this post:

vs-mcp-bridge-bootstrap-sequence.mmd shows the end-to-end path from VSIX startup through MCP stdio and the named-pipe handoff.
vsix-activation-diagnostic-trace-20260516.mmd shows the inactive-pipe diagnostic and successful activation retry path.
vsix-host-selected-text-trace-20260509.mmd shows a concrete VSIX-host operation with operation correlation.

Those .mmd files remain the diagram source of truth. This post references them directly instead of embedding generated images.

Why This Supports Future Extensibility

The named-pipe layer gives future work a stable place to preserve local host isolation. New VS-backed operations can stay explicit command-and-response paths. New compiled tools can continue to use BridgeToolExecutor for policy, approval, redaction, and audit. Additional diagnostics can attach to the existing correlation chain without polluting MCP stdout.

That is the main architectural benefit. The bridge can grow without collapsing the AI protocol, Visual Studio host operations, transport diagnostics, and tool security seams into one layer.

Takeaway

A named pipe listener is the local Visual Studio-side endpoint that waits for structured inter-process requests. In VS MCP Bridge, it exists so the VSIX can own Visual Studio operations while a separate MCP server process owns the AI-facing MCP stdio transport.

The short version is:

stdio gets into the MCP server
named pipes get into Visual Studio
BridgeToolExecutor governs shared compiled tool execution

Keeping those roles separate is what makes the bridge easier to diagnose, safer to extend, and more useful for observable AI tooling.

Understanding AI Chat Sessions, Models, and Agents

12. April 2026 AI Systems Author AI Systems Author, MCP Bridge Comments (0)


Why Context, Tools, Evidence, and Boundaries Matter

One of the easiest mistakes to make with modern AI tools is assuming that a chat is a persistent intelligence that keeps thinking between messages. That is not how these systems work. Once that clicks, a lot of confusing behavior suddenly makes sense.

It also explains why the VS MCP Bridge project puts so much weight on architecture docs, durable traces, session handoffs, and source-controlled blog content. If the chat context disappears, the system still needs a way to recover the project model.

A Chat Is Not A Persistent Mind

A chat session is a temporary context window wrapped around a model call. On each turn, the application gathers instructions, prior messages, available tool results, selected files, and any other context it chooses to include. The model then generates a response from that input.

Instructions + context + tool results + current prompt -> model -> response

The model does not carry goals forward unless those goals are present in the current request. If the working context is lost, the same model can feel like a different assistant because it no longer sees the same constraints, terminology, or decisions.
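The statelessness can be shown with a toy function that assembles the input for one turn. The class name, method name, and bracketed labels below are assumptions for illustration. Real clients use richer message formats, but the principle is the same: the model sees only what is passed in.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TurnContextSketch
{
    // Builds the full input for one turn; nothing outside these arguments reaches the model.
    public static string BuildModelInput(
        string instructions,
        IEnumerable<string> priorMessages,
        IEnumerable<string> toolResults,
        string prompt)
    {
        var parts = new List<string> { "[instructions] " + instructions };
        parts.AddRange(priorMessages.Select(m => "[history] " + m));
        parts.AddRange(toolResults.Select(t => "[tool] " + t));
        parts.Add("[prompt] " + prompt);
        return string.Join("\n", parts);
    }
}
```

If a session reset empties the instructions and history arguments, the same prompt produces an input with none of the earlier constraints in it, which is why the assistant can appear to have forgotten them.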

A chat session is working context, not permanent memory.

Why Context Loss Feels So Disruptive

When a desktop app crashes, a session resets, or a context window is compacted, the active conversation may lose important details. Earlier decisions, operating rules, current branch state, and architectural constraints may disappear unless they were preserved somewhere outside the chat.

That is why VS MCP Bridge now treats repository files as the source of truth. AI_START.md, docs/ARCHITECTURE.md, trace workflows, logs, Mermaid sources, and session handoffs are not paperwork. They are the durable memory that a future AI session can reload.

The Main Terms

These terms often get blurred together, but separating them helps explain what the bridge is doing.

Model

The model is the reasoning engine. It generates output from the input it receives. By itself, it is stateless and does not know the project unless the current context gives it project evidence.

Session

The session is the active conversation context. It may include prior messages, instructions, selected files, tool results, and summaries. It can be extremely useful, but it is not a reliable permanent store.

Tool

A tool is a callable capability outside pure text generation. In VS MCP Bridge, tools can read Visual Studio state, list projects, create edit proposals, or execute shared bridge tools through catalog and executor boundaries.

Agent

An agent is an orchestration layer that uses a model, context, tools, and a control loop to pursue a task. That does not make it magic or autonomous in the human sense. It still needs explicit boundaries, review, observable execution, and durable evidence.

Orchestration Layer

The orchestration layer decides what context to include, which tools are available, when to call them, how to handle results, and how to continue the loop. ChatGPT, Codex, Copilot, and MCP-enabled clients differ mostly in this layer and in the tools they can reach.
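A control loop of that kind can be sketched with the model reduced to a delegate. Everything here is hypothetical: the ModelStep record, the tool dictionary, and the step limit are illustrative choices and do not describe how any particular client is implemented.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// A model step either asks for a tool or gives a final answer (hypothetical shape).
public sealed record ModelStep(string? ToolName, string? ToolInput, string? FinalAnswer);

public static class AgentLoopSketch
{
    public static string Run(
        Func<IReadOnlyList<string>, ModelStep> model,
        IReadOnlyDictionary<string, Func<string, string>> tools,
        string task,
        int maxSteps = 5)
    {
        // The loop, not the model, owns the accumulated context.
        var context = new List<string> { "task: " + task };
        for (var step = 0; step < maxSteps; step++)
        {
            var next = model(context);
            if (next.FinalAnswer is not null) return next.FinalAnswer;

            // Only explicitly registered tools can run; anything else is recorded as unavailable.
            if (next.ToolName is not null && tools.TryGetValue(next.ToolName, out var tool))
                context.Add($"tool {next.ToolName}: {tool(next.ToolInput ?? string.Empty)}");
            else
                context.Add($"tool {next.ToolName}: not available");
        }

        // A hard step limit keeps the loop bounded even if the model never finishes.
        return "stopped: step limit reached";
    }
}
```

The model only proposes the next step. Which tools exist, what gets added to context, and when the loop stops are all orchestration decisions.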

Pure Chat Is Different From Tool-Backed Work

Pure chat inference can explain, summarize, and reason from the supplied prompt. Tool-backed workflows can observe or change the outside world, so they need stronger boundaries.

VS MCP Bridge exists because AI-assisted coding needs more than free-form conversation. It needs a local MCP server, a clean stdio boundary, a named-pipe bridge into Visual Studio, explicit MCP tools, proposal approval, and diagnostics that show what actually happened.

That changes the trust model. A chat answer can be reviewed as text. A tool call may read active editor state, list solution projects, or create a proposed edit. That kind of workflow needs logs, request IDs, tool descriptors, policy decisions, approval states, and structured results.

How VS MCP Bridge Grounds Agentic Behavior

In VS MCP Bridge, agentic behavior is grounded by concrete boundaries:

MCP clients talk to the local MCP server over stdio.
The MCP server reaches Visual Studio only through the local named-pipe boundary.
The VSIX owns Visual Studio API access and proposal UI state.
Proposal tools create proposals; apply still requires explicit approval in the tool window.
Shared bridge tools run through BridgeToolExecutor, not directly from callers.
Policy, approval, redaction, audit, correlation, and result shaping stay at the execution boundary.

Those boundaries are what keep "the agent did something" from becoming an unhelpful explanation. A future developer should be able to tell which layer received the request, which tool ran, which approval or policy decision applied, and which result was returned.

Session Continuity Needs Source-Of-Truth Files

When a session survives, the assistant can use the conversation to maintain continuity. When a session is interrupted, source files have to carry the continuity instead.

That is why the project now asks future sessions to start with repository evidence:

AI_START.md gives the resume map.

This is not only useful for AI sessions. It is useful engineering discipline. Durable context reduces dependency on memory, mood, and whatever happens to fit in the next prompt.

Logs, Traces, Artifacts, And Prompts Work Together

Prompts tell the assistant what to do. Architecture docs tell it what is true. Logs show what happened. Trace metadata records the run context. Mermaid diagrams explain the observed sequence. Handoffs tell the next session what to trust, what to recheck, and what remains deferred.

That combination is more reliable than any single long chat. It also lets a human reviewer challenge the work: if the diagram says a request crossed the executor boundary, the logs and code should support that claim.

Approval Is Part Of Orchestration

Agentic workflows often sound autonomous, but VS MCP Bridge deliberately keeps important operations approval-aware.

The proposal workflow is the clearest example: an MCP tool can create a proposed edit, but the edit is not applied until the user approves it in the host UI. The newer tool-execution approval seam follows the same architectural direction for future selected tools: approval is evaluated at the execution boundary, not hidden inside arbitrary tool code.

That is how AI-assisted development stays understandable. The model can suggest. The tool can prepare. The boundary can log, audit, redact, and classify. The human can review and approve.

What Context Windows Cannot Solve

Larger context windows help, but they do not eliminate the need for durable evidence.

A bigger window can include more files and more history, but it can still omit the one constraint that matters. It can still summarize away nuance. It can still be reset. It can still produce a plausible explanation that does not match the actual code.

That is why the repo treats source-of-truth documents, validation artifacts, and canonical blog content as part of the system. They make the project less dependent on any single context window.

A Cleaner Mental Model

Term	Practical Meaning	VS MCP Bridge Example
Model	Generates output from supplied context	The model behind ChatGPT, Codex, or Copilot
Session	Temporary working context	The current chat plus instructions, files, and tool results
Tool	Callable capability outside pure text generation	`vs_get_active_document`, proposal tools, or shared bridge tools
Agent	Model plus orchestration loop and tools	An AI client using MCP tools to inspect and propose changes
Evidence	Durable record that survives context loss	Architecture docs, logs, metadata, Mermaid diagrams, handoffs, canonical blog sources

Takeaway

Models generate responses. Sessions provide temporary continuity. Agents orchestrate tools and context. Tools touch real systems. Evidence makes the whole workflow reviewable after the session ends.

That is the practical lesson from VS MCP Bridge and BlogAI: AI-assisted development improves when the important knowledge survives outside the chat. Observable boundaries, approval-aware workflows, source-of-truth docs, and durable traces are what keep agentic behavior from becoming AI magic.

See inference-driven for the companion discussion of inference-driven software design and Copilot's strengths and risks.

VS MCP Bridge Blog Series: Part 7

11. April 2026 AI Systems Author Comments (0)

Durable Evidence, Trace Workflows, and AI-Assisted Troubleshooting

Part 6 explained why BridgeToolExecutor is the consistent execution boundary for policy, approval, secret-reference handling, redaction, audit, classification, correlation, and tool invocation.

Part 7 moves from the boundary to the evidence around it. The bridge is not just trying to execute tools. It is trying to make tool execution reconstructable later, by a developer or by an AI session that was not present when the behavior happened.

That is the shift that changed this project: diagnostics stopped being an afterthought and became part of the architecture.

Why Durable Evidence Matters

AI-assisted development can move quickly, but fast progress is fragile if the only record of a decision lives in chat history or a local debugging session.

VS MCP Bridge now treats important validation runs as durable evidence. A useful run should leave behind enough material to answer:

what code version was observed
what workflow was exercised
which request ID and operation ID were used
which boundary handled the request
where the request succeeded, failed, or stopped
which logs support that conclusion
which Mermaid diagram matches the observed flow
what the next session should trust or revalidate

That evidence turns a one-time manual observation into something another person can replay, inspect, and challenge.

The Artifact Triad

The most useful pattern has become a small triad:

a log file under artifacts/logs/
a metadata file beside it, usually .metadata.json
a Mermaid sequence diagram under docs/diagrams/

The log captures what happened. The metadata captures the run context: branch, commit, host, request ID, operation ID, input summary, observed result, and scope exclusions. The Mermaid diagram explains the sequence in a form that can be reviewed without rereading every log line.

None of those artifacts replaces the others. The log is the observed evidence. The metadata is the index card. The Mermaid diagram is the map.
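The "index card" role of the metadata file can be sketched as a small writer that places a .metadata.json file beside a log. The TraceRunMetadata record mirrors the fields listed above, but its property names and the file naming rule are assumptions, not the repo's actual schema.

```csharp
using System.IO;
using System.Text.Json;

// Hypothetical metadata shape; the fields follow the post's list, not a confirmed schema.
public sealed record TraceRunMetadata(
    string Branch, string Commit, string Host, string RequestId, string OperationId,
    string InputSummary, string ObservedResult, string[] ScopeExclusions);

public static class ArtifactSketch
{
    // Writes "<log name>.metadata.json" next to the given log file and returns its path.
    public static string WriteMetadataBeside(string logPath, TraceRunMetadata metadata)
    {
        // "run.log" becomes "run.metadata.json" in the same directory.
        var metadataPath = Path.ChangeExtension(logPath, ".metadata.json");
        var json = JsonSerializer.Serialize(metadata, new JsonSerializerOptions { WriteIndented = true });
        File.WriteAllText(metadataPath, json);
        return metadataPath;
    }
}
```

Keeping the metadata next to the log means a reviewer who finds one artifact can find the other without consulting anything outside the repository.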

Session Handoffs Complete The Record

For larger slices, the repo also keeps session handoffs under docs/session-handoffs/. These are not essays. They are resume points.

A good handoff records what was validated, what commit or branch was involved, what artifacts were produced, what constraints still apply, and what the next session should do first. That matters because future AI sessions should not reconstruct project state from memory or from a conversation transcript that may be incomplete.

The architecture document remains the source of truth for current behavior. The handoffs explain how the project arrived there and what evidence supports particular claims.

Trace Workflows Are Reproducible Procedures

The repo now has documented workflows for important validation paths:

tool-execution-trace-workflow.md explains how to validate the shared bridge tool catalog and executor path with correlated logs, audit metadata, redaction, policy, approval, and Mermaid output.
vsix-host-selected-text-trace-workflow.md explains how to validate the Visual Studio selected-text prompt path against a real editor selection.
LOGGING_DIAGNOSTIC_RUNBOOK.md explains how to localize hangs, keep MCP stdout clean, and collect the right UI, stderr, file, and correlation evidence.

The important detail is that these workflows are not just documentation after the fact. They are part of the development method. When the system changes, the workflow can be rerun, the artifacts can be regenerated, and the diagram can be compared against the current code path.

Correlation Makes Replay Possible

Request and operation identifiers are what make trace replay practical.

Without correlation, logs become a loose pile of events. With correlation, a run can be reconstructed across layers: MCP request, pipe attempt, catalog lookup, policy decision, approval decision, tool execution, audit envelope, result, and visible host behavior.

The point is not to add identifiers for decoration. The point is to let a future reader find the first missing or failing boundary. If a request ID appears at the MCP layer but never reaches the pipe client, the failure is different from one that reaches the VSIX host and fails during service execution.
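Finding the first missing or failing boundary is mechanical once events carry a request ID. The sketch below assumes a simplified event shape and a fixed list of boundary names. Both are hypothetical and chosen only to show the lookup, not to describe the bridge's log format.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified trace event.
public sealed record TraceEvent(string RequestId, string Boundary, bool Success);

public static class BoundaryChainSketch
{
    // Expected order of boundaries for a VS-backed call (illustrative names).
    public static readonly string[] ExpectedBoundaries =
        { "mcp.request", "pipe.client", "pipe.server", "dispatch", "vs.service" };

    // Returns the first boundary that is missing or failed for the request, or null if all succeeded.
    public static string? FirstProblem(IEnumerable<TraceEvent> events, string requestId)
    {
        var seen = events
            .Where(e => e.RequestId == requestId)
            .GroupBy(e => e.Boundary)
            .ToDictionary(g => g.Key, g => g.All(e => e.Success));

        foreach (var boundary in ExpectedBoundaries)
        {
            if (!seen.TryGetValue(boundary, out var ok)) return boundary + " (missing)";
            if (!ok) return boundary + " (failed)";
        }

        return null;
    }
}
```

A request seen only at "mcp.request" reports "pipe.client (missing)", while one that fails inside the host reports "vs.service (failed)". Those are the two cases the paragraph above distinguishes.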

Diagnostics Must Stay Transport-Safe

Durable evidence only helps if it does not corrupt the transport it is trying to explain.

For MCP stdio, stdout must remain clean for JSON protocol traffic. Diagnostics belong in approved channels such as stderr, UI logs, file logs, Debug output, audit envelopes, and durable artifacts. The bridge uses this rule because a single stray log line on stdout can make a valid MCP server look broken.
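One defensive technique for this rule is to take the raw stdout stream for protocol traffic and point Console.Out somewhere else, so an accidental Console.WriteLine lands in diagnostics instead of the protocol stream. The sketch below is an assumption about how that could be done, not a description of the bridge. Whether the MCP library in use writes through the raw stream should be verified before relying on it.

```csharp
using System;
using System.IO;

public static class StdoutGuardSketch
{
    // Redirects Console.Out to a diagnostics writer and returns the raw stdout stream for protocol use.
    public static Stream ReserveStdoutForProtocol(TextWriter diagnostics)
    {
        // Acquire the underlying stdout stream before redirecting the console writer.
        var protocolStream = Console.OpenStandardOutput();

        // From here on, Console.Write and Console.WriteLine go to the diagnostics writer.
        Console.SetOut(diagnostics);
        return protocolStream;
    }
}
```

A typical call would pass Console.Error as the diagnostics writer at process start, before any other code has a chance to write to the console.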

This is why the activation diagnostics and pipe-failure diagnostics are trace-only or structured tool failures. They should help the operator understand what to do without polluting the MCP protocol stream.

Durable Evidence Improved The Architecture

The traces did more than prove that code worked. They changed the design.

Sequence diagrams and logs made it easier to see where responsibilities were blurred. That led to clearer boundaries around proposal management, host correctness, tool descriptors, catalog registration, executor-owned policy, approval-aware execution, redaction, audit metadata, MEF discovery, and VSIX activation diagnostics.

In other words, observability did not just describe the architecture. It shaped the architecture.

AI-Assisted Troubleshooting Uses Evidence First

This repo is being developed with AI assistance, so the evidence standard is practical: a future assistant should be able to inspect files in the repo and understand the current system without trusting prior chat history.

That means a good troubleshooting loop starts with durable artifacts:

read AI_START.md for the current resume map
read docs/ARCHITECTURE.md for current behavior
read the relevant workflow document for the validation path
inspect the matching log, metadata, and Mermaid files
compare the observed diagram against current code
treat the first missing or failed boundary as the next actionable problem

That process keeps the assistant grounded in repository evidence instead of inventing a story that sounds plausible.

Related Mermaid Trace Sources

The most useful diagram sources for this topic are:

tool-regex-search-trace-20260509.mmd for the compiled tool execution baseline.
tool-security-trace-20260509.mmd for policy, redaction, audit, and correlation around tool execution.
tool-approval-trace-20260516.mmd for approved and denied approval-required execution outcomes.
mef-discovery-trace-20260516.mmd for discovery-only MEF behavior feeding executor-routed tools.
vsix-activation-diagnostic-trace-20260516.mmd for inactive VSIX/named-pipe diagnostics and the operator activation path.
vsix-host-selected-text-trace-20260509.mmd for the Visual Studio selected-text prompt workflow.

Those .mmd files remain the diagram source of truth. Generated images can be useful later, but the source diagram should stay reviewable in the repo.

What This Does Not Mean

Durable traces are not a full observability platform. They are not telemetry ingestion, distributed tracing infrastructure, SIEM export, compliance storage, or production monitoring.

They are smaller and more immediate: checked-in evidence that the architecture can be understood and validated. That is enough for the current stage of the bridge.

Takeaway

VS MCP Bridge became easier to evolve when the team stopped treating diagnostics as cleanup work and started treating them as architecture.

The durable evidence pattern is simple: capture logs, preserve metadata, draw the observed sequence, and write a handoff when the result changes what future sessions should know. That pattern makes failures localizable, decisions reviewable, and AI-assisted troubleshooting much less dependent on memory.

Next In The Series

The next useful topic is how these evidence and architecture practices should shape the public BlogAI narrative: which posts should teach the transport boundary, which should teach tool execution, and which should teach the operational discipline that keeps AI-assisted systems explainable.
