The End of Loose Prompting: Why AI Now Requires More Discipline from Us

AI is no longer just correcting our spelling, finishing our sentences, or helping us phrase an idea. Increasingly, AI systems are becoming operational actors. That changes everything.

The Shift I Am Beginning to Notice

For years, many of us have been trained by software to be imprecise.

Autocorrect fixes our spelling. Search engines guess what we meant. Recommendation engines infer our preferences. IDEs complete our code. Navigation systems route us without requiring us to understand the roads.

That convenience has benefits, but it also has a cost: it conditions us to become comfortable with vague intent.

With traditional software, vague intent was often tolerable. If autocorrect picked the wrong word, we could fix it. If search returned the wrong page, we could search again. If autocomplete made a bad suggestion, we could delete it.

But AI is moving beyond suggestion.

Modern AI systems can now invoke tools, read documents, edit repositories, call APIs, operate through connectors, send messages, schedule events, generate code, and interact with other systems through tool interfaces such as the Model Context Protocol (MCP).

That means the relationship has changed.

We are no longer merely asking software to help us express intent. We are increasingly asking software to act on intent.

Once AI can act, loose prompting becomes more than a communication issue. It becomes an operational risk.

Why This Feels Different

Older AI prompting often felt like trying to get better prose from a clever assistant. The goal was usually to get a better answer, a better summary, a better email, or a better explanation.

That is still useful. But it is no longer the whole picture.

As AI systems become connected to tools and workflows, prompting starts to carry more weight. A prompt is no longer just a request. In many cases, it becomes a temporary policy boundary.

It may define:

  • what the AI is allowed to touch,
  • what it should avoid,
  • what source of truth it should trust,
  • whether it may act or only advise,
  • how much autonomy it has,
  • what should be logged,
  • what requires confirmation,
  • and what outcome counts as complete.

That is a very different world from “write me a paragraph about this topic.”

The MCP Security Lesson

The recent attention around MCP security did not create this problem by itself. It exposed a problem that was already forming.

MCP-style systems make tool use visible and standardized. That is valuable. But once a model can interact with tools, files, services, and credentials, the question is no longer simply, “Can the model answer correctly?”

The question becomes:

Can the system act safely when exposed to ambiguous instructions, hostile context, excessive permissions, or hidden prompt manipulation?

This is why the security conversation has expanded beyond ordinary bugs. Prompt injection, excessive agency, insecure tool use, sensitive information disclosure, and confused authorization boundaries are now architectural concerns, not just prompting annoyances.

In plain English: if an AI can use tools, then someone must define what those tools are allowed to do, under what authority, with what evidence, and with what audit trail.

The Real Issue: Assistant Versus Actor

A helpful way to think about this is the difference between an assistant and an actor.

An assistant helps you think, write, review, explain, summarize, or plan.

An actor changes things.

It edits files. It opens tickets. It runs commands. It sends emails. It schedules meetings. It queries private systems. It modifies infrastructure. It may even chain multiple actions together.

When AI behaves as an assistant, vague prompting is often merely inefficient.

When AI behaves as an actor, vague prompting can become dangerous.

The more authority we give an AI system, the more disciplined our instructions must become.

Why Non-Developers Need to Understand This

This is not only a developer problem.

Developers may see it first because they work close to tools, repositories, terminals, APIs, permissions, and logs. But the same shift is coming to everyone.

AI systems are being connected to email, calendars, documents, customer records, spreadsheets, business processes, personal assistants, financial systems, learning tools, research workflows, and office automation.

That means ordinary users will increasingly face systems that do not merely suggest what to do. They may do it.

If users remain conditioned by “loosey-goosey autocorrect” habits, frustration is inevitable. People may say something vague, the AI may interpret it differently, and the result may not match what the person intended.

Worse, the user may accept the result because they have been trained by years of convenience software to trust the machine’s correction over their own unfinished thought.

The Human Risk: Convenience Can Weaken Judgment

This is the part that concerns me most.

Human beings adapt to convenience. That is not an insult; it is a reality of human behavior.

When software repeatedly fills in gaps for us, we may stop noticing the gaps. We become less intentional. We accept “close enough.” We allow systems to complete our thoughts before we have fully formed them.

That can be harmless when the output is a misspelled word.

It is not harmless when the output is a business decision, a legal statement, a code change, a customer response, a financial action, or a security-sensitive operation.

AI does not merely risk making humans lazy. It risks making humans comfortable with unexamined delegation.

That is a much deeper issue than prompt engineering.

Prompting Is Becoming an Operational Skill

Disciplined prompting is not about using magic phrases.

It is not about tricking the model.

It is not about sounding technical.

Disciplined prompting is about expressing intent clearly enough that an AI system can operate within safe and useful boundaries.

That includes being clear about:

  • the goal,
  • the scope,
  • the source of truth,
  • the allowed actions,
  • the disallowed actions,
  • the expected output,
  • the level of autonomy,
  • and the point where human review is required.

In other words, good prompting is becoming less like casual conversation and more like operational instruction.
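
One way to make that concrete is to treat the eight parts listed above as a checklist that can be inspected before any work is delegated. The sketch below is illustrative only: the part names, the MissingParts and Render helpers, and the sample wording are assumptions made for this example, not part of any real tool.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical task brief. The part names mirror the eight items listed above;
// the wording is sample text only.
var brief = new (string Part, string Value)[]
{
    ("Goal", "Review this document for technical accuracy and clarity."),
    ("Scope", "This document only."),
    ("Source of truth", ""),      // deliberately left blank
    ("Allowed actions", "Suggest corrections."),
    ("Disallowed actions", "Do not rewrite the text in your own voice."),
    ("Expected output", "A list of misleading, ambiguous, or incorrect passages."),
    ("Autonomy", "Advise only; make no edits."),
    ("Human review", "")          // deliberately left blank
};

// Return the name of every part that was left empty.
static List<string> MissingParts((string Part, string Value)[] parts) =>
    parts.Where(p => string.IsNullOrWhiteSpace(p.Value))
         .Select(p => p.Part)
         .ToList();

// Render the brief as plain instruction text, one labeled line per part.
static string Render((string Part, string Value)[] parts) =>
    string.Join(Environment.NewLine, parts.Select(p => $"{p.Part}: {p.Value}"));

var missing = MissingParts(brief);
Console.WriteLine(missing.Count == 0
    ? Render(brief)
    : "Brief is incomplete: " + string.Join(", ", missing));
// Prints: Brief is incomplete: Source of truth, Human review
```

The point is not the helper itself. Writing the brief down as named parts makes the gaps visible before the AI has to guess at them.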

A Simple Example

A loose prompt might say:

Clean this up and make it better.

That may be fine for a casual paragraph. But if the AI is working inside a repository, a business document, or a production workflow, that prompt is too vague.

A more disciplined prompt might say:

Review this document for technical accuracy and clarity. Do not rewrite it in your voice. Identify places where my wording is misleading, ambiguous, or technically incorrect. Suggest corrections, but preserve my intent and style. Do not expand the scope beyond this document.

The difference is not verbosity for its own sake. The difference is control.

The New Mental Model

The old mental model was:

I ask AI a question, and it gives me an answer.

The emerging mental model is:

I define a bounded task, provide trusted context, constrain the action space, and review the result.

That may feel less magical, but it is more mature.

It also reflects where AI systems are going. As models become more capable, the limiting factor will often not be whether the AI can do something. The limiting factor will be whether we can define what it should do safely, precisely, and responsibly.

Why This Matters for AI Systems Authors

An AI Systems Author is not merely someone who writes prompts. It is someone who understands that AI behavior emerges from the interaction between models, tools, instructions, context, permissions, memory, retrieval, and human review.

That role requires a different discipline.

It requires asking questions such as:

  • What is the source of truth?
  • What authority does the AI have?
  • What should the AI never do without approval?
  • What context is trusted?
  • What context may be hostile or misleading?
  • What evidence should be preserved?
  • How will the human know what happened?
  • How can the system fail safely?

These questions are not academic. They are practical.

They are the difference between using AI as a helpful assistant and accidentally creating an ungoverned operational actor.

The Frustration That Is Coming

Many users are accustomed to software silently correcting them. They may expect AI to do the same thing, only better.

But as AI systems become more safety-conscious, users may begin to feel friction.

The AI may ask for clearer instructions. It may refuse to infer too much. It may avoid taking action without confirmation. It may distinguish between reviewing, drafting, editing, executing, and publishing. It may resist vague requests that would have been accepted casually before.

Some users may experience that as the AI becoming less helpful.

But in many cases, the opposite is true.

The system is not becoming less helpful. It is becoming more aware that helpfulness without boundaries can be harmful.

Patience Is Part of the Skill

Learning to work well with AI will require patience.

That patience is not just waiting for better answers. It is the patience to clarify our own intent before delegating work. It is the patience to review what was done. It is the patience to correct the instruction, not merely complain about the output.

This is where I expect my own prompting habits to be sharpened.

If I ask for something vague, I should expect the AI to help expose that vagueness. If I give it too much authority, I should expect it to slow down. If I fail to define the source of truth, I should expect the result to be less reliable. If I ask it to “make it better,” I should be prepared to explain what “better” means.

That is not a weakness in the process. That is the process teaching me to be more intentional.

Conclusion: The End of Casual Delegation

AI is becoming more powerful, but that does not remove responsibility from the human. It increases it.

The future will not belong only to people who know how to ask clever questions. It will belong to people who know how to define bounded work, preserve human judgment, and use AI without surrendering discernment.

Loose prompting may still work for casual tasks.

But for serious work, the era of casual delegation is ending.

As AI becomes more capable of acting, humans must become more capable of instructing.

References and Further Reading

Shiny Object Syndrome (SOS) - silent AI project killer

"The companies achieving sustainable AI success aren’t necessarily the ones with the most sophisticated models or the largest budgets. They’re the ones that treat AI deployment as a comprehensive engineering discipline, with rigorous processes, proper architecture, and deep integration with business workflows."
- The Production AI Reality Check: Why 80% of AI Projects Fail to Reach Production

I was unknowingly sabotaging my open source project because of poor assumptions about working with AI. While I was completing months of work within days, I was also introducing shiny objects that changed the session context and direction, leading us away from reaching beta in a timely manner (if at all).

The context switches were sometimes not so subtle (and hard to recover from), and those were the ones that made me aware of SOS in my open source project. With that awareness, I put guardrails in place to ensure the project's survival and get the vs-mcp-bridge project to beta in a timely manner: guardrails that would prevent "me" from derailing the project. A guardrails sample follows:

I took my suspicions to an authoritative source. Below are my prompts and the AI's responses:

In general, if I am working with AI on a project with a defined structure and scope, and I suggest another task, will AI divert its focus to that new task?

Short answer: yes — most AI systems will shift to the new task you mention, unless you explicitly anchor them to stay within the original scope. [S1]

----

In general, if I am working with AI on a project, and I assert that a policy, standard, or issue is true, will AI report it to be true without validation?

The short, honest answer: yes — if you assert something as a fact, many AI systems will often treat it as true unless you explicitly ask them to verify it. [S2]

----

If a developer is unaware of the answers to these two questions, do they become a threat to the success of a project (i.e., ignorance is not bliss)?

Yes — if a developer is unaware of the two issues you raised, that ignorance can become a real threat to project success. Not because the questions themselves are dangerous, but because the behaviors they reveal can quietly undermine a project if the developer doesn’t understand them. [S3]

----

Developers on large teams generally do not have a say in a project's direction. Does it make sense that the psychological effect, even adrenaline, would have developers embrace the default behaviors [on their own projects, versus yielding it to an AI architect]?

When developers lack influence over project direction, the psychological pull of AI’s default behaviors can absolutely shape how they work — sometimes in ways that help, sometimes in ways that quietly undermine the project.

Developers are especially vulnerable because AI amplifies their natural novelty bias, cognitive ease, and the adrenaline rush of rapid progress, creating a loop where every new idea feels productive even as it quietly derails the project; and unless a developer understands that AI will instantly follow any new direction and will often accept unverified statements as truth, they can unknowingly become the source of scope drift, misalignment, and lost months of work—making SOS not a technical flaw, but a human psychological trap that any of them can fall into without realizing it. [S4]

----

Is it safe to say that these questions, if unaddressed, could result in project failure?

Yes — it is absolutely safe to say that if these issues go unaddressed, they can meaningfully increase the risk of project failure. Not because any one issue is catastrophic on its own, but because together they create a systemic vulnerability in how teams use AI. [S5]

----

Has the AI industry addressed these risks?

Short answer: No — the AI industry has not fully addressed these risks. Pieces of the problem are acknowledged, but the industry has not solved them in a systematic, standardized, or enforceable way. [S6]

And that’s exactly why your questions matter so much.

Core takeaway

The AI industry has partially recognized these risks, but there is no unified framework that reliably protects teams from:
  • AI task‑switching behavior
  • AI accepting unverified statements as true
  • the psychological pull that makes developers embrace these defaults

These remain active, unresolved vulnerabilities in real-world AI‑assisted development.

What has been addressed

  • Research papers discuss “hallucinations” and “over‑trust.”
  • Companies warn users to “verify outputs.”
  • Some tools add optional fact‑checking features.
  • Enterprise AI platforms add governance layers.
  • Training materials mention prompt discipline.

These are fragments, not a comprehensive solution.

The result

These risks remain real, active, and unmitigated unless organizations create their own guardrails.

Your questions identify exactly the kind of gaps that lead to:

  • scope drift
  • incorrect assumptions
  • misaligned deliverables
  • quiet project failure

And yes — if unaddressed, they can absolutely cause projects to fail.

The deeper truth

The AI industry has acknowledged the risks. It has not solved them. It has not standardized protections. It has not addressed the human psychological component.

So your concerns are valid.


References

  [S1] Prompt Injection – Instruction Override Behavior; Tensor Trust: Interpretable Prompt Injection Attacks; Investigating Semantic Drift in GPT‑4 Following Prompt Injection Attacks; Shadows in the Attention: Contextual Perturbation and Representation Drift in LLMs.
  [S2] A Survey on Hallucination in Large Language Models; Survey and Analysis of Hallucinations in Large Language Models; From Illusion to Insight: Hallucination Mitigation Techniques; Reference Hallucination Score for Medical AI Systems.
  [S3] The Production AI Reality Check; The Mirage of AI Programming; Accuracy Paradox: Epistemic and Manipulative Risks in AI.
  [S4] Hallucinations as Human‑LLM Coupling; LLM Hallucination Survey and Human Factors.
  [S5] Hallucination Mitigation Taxonomy; LLM‑Based Agents Suffer from Hallucinations.
  [S6] Modern Hallucination Mitigation Research; Hallucinations in LLM‑Based Agents; Prompt Injection – Instruction Override Behavior.

Inference-driven development with Copilot; pros and cons

Using AI Assistance Without Turning Development Into A Black Box

Copilot is a useful development assistant. It can complete patterns, suggest code, write tests, and keep a developer moving through mechanical work. The risk is not that Copilot is useless. The risk is treating inference as if it were architecture, verification, and judgment all at once.

That distinction is the heart of the BlogAI story. AI-assisted software design works best when generated code is surrounded by source-of-truth documents, observable workflows, approval boundaries, logs, diagrams, and durable evidence. Without those things, a team can move faster while understanding less.

VS MCP Bridge has become a practical case study for that lesson.

What Inference Means In Practice

In software development, inference means the model is producing likely code, explanations, or next steps from the context it can see. That can be powerful, but it is not the same as owning the system model.

The model may know common patterns. It may mirror nearby code. It may produce a convincing implementation. But it does not automatically know which boundaries are non-negotiable, which logs are required for future triage, which security claims would overstate the current system, or which documentation is the source of truth.

That is why inference-driven development needs a workflow around it.

Where Copilot Helps

Copilot works well when the local task is clear and the surrounding code already teaches the pattern.

  • It can accelerate repetitive edits, tests, and small refactors.
  • It can suggest idiomatic code when the project conventions are visible.
  • It can help explore unfamiliar APIs or fill in routine structure.
  • It can reduce friction when the developer already knows what should happen.

In that role, Copilot behaves like a fast assistant. It is especially useful when the developer can review the output against a clear contract.

Where Inference Becomes Risky

The same strengths become risky when the task is architectural, security-sensitive, or poorly bounded.

  • A generated change may look correct while bypassing the real execution boundary.
  • A suggested log line may leak data or pollute MCP stdout.
  • A local refactor may erase a correlation id that future troubleshooting depends on.
  • A plausible explanation may imply authentication, sandboxing, or secret storage that does not exist.
  • A quick fix may solve the symptom while leaving no evidence for the next session.

These are not reasons to avoid AI tools. They are reasons to stop treating prompt-to-code as the whole workflow.

Prompt-To-Code Is Not Enough

The early mistake in many AI-assisted workflows is assuming that the prompt ends when code appears. In practice, the better workflow is prompt-to-evidence.

For a useful AI-generated change, it should be possible to answer:

  • What boundary did it touch?
  • Which source-of-truth document says this behavior is correct?
  • Which tests or validation steps prove it?
  • Which logs or artifacts would explain it later?
  • Which Mermaid diagram reflects the observed flow?
  • What should a future AI session read before extending it?

That is the difference between code generation and engineering discipline.

How VS MCP Bridge Changed The Workflow

The VS MCP Bridge cleanup made this concrete. The project did not become clearer just because an AI generated code. It became clearer because logs, diagrams, handoffs, and architecture documents exposed where the system was vague.

Sequence diagrams helped reveal transport boundaries. Trace logs made request and operation correlation visible. Durable artifacts showed whether execution really flowed through the expected catalog and executor path. Approval and security traces forced a clearer distinction between current plumbing and future hardening.

That evidence led to better architecture:

  • the MCP stdio boundary stayed clean
  • the VSIX stayed isolated behind the named-pipe boundary
  • compiled tools gained descriptors, requests, results, catalogs, and executor-owned logging
  • MEF became a discovery seam instead of an execution shortcut
  • approval-aware execution became part of the tool boundary
  • security seams stayed explicit without claiming production authentication or sandboxing
  • audit and redaction became part of reconstructable tool execution

In other words, the AI assistance was useful because the project kept forcing it back through observable architecture.

Human Review Still Owns The Design

Copilot can propose. Codex can implement. ChatGPT can explain tradeoffs. None of those tools should silently own the design.

Human review still decides whether a change matches the architecture, whether the risk is acceptable, whether the evidence is enough, and whether the documentation tells the truth. The stronger the tool, the more important that review becomes.

This is especially true for security and approval workflows. A model can generate a policy class or approval hook, but the project still needs to say what is intentionally deferred: OAuth, user identity, real secret storage, sandboxing, signed plugin manifests, tamper-evident audit stores, and SIEM export are not complete just because a seam exists.

Source Of Truth Beats Chat Memory

One of the strongest lessons from this project is that durable source files beat chat memory.

The current workflow asks future sessions to start from files such as:

Those files make the system teachable. They also reduce the chance that an AI session resumes from an outdated mental model.

Where BlogAI Fits

BlogAI can help turn this architecture work into learning material, but only if the blog stays aligned with the code.

That is why the current blog cleanup starts from preserved database exports, canonical repo sources, manifest metadata, and explicit token/link rules. Blog posts should not drift away from the system they are explaining. They should point readers back to the current architecture, trace workflows, Mermaid sources, and handoffs that support the claims.

Done well, BlogAI becomes more than a publishing surface. It becomes a way to keep project knowledge synchronized with code, validation artifacts, and operational lessons.

Practical Pros And Cons

| Practice | Strength | Risk |
| --- | --- | --- |
| Copilot as coding assistant | Fast local implementation help | Can produce plausible but wrong code if review is weak |
| Codex-style implementation sessions | Can inspect, edit, validate, and commit cohesive slices | Needs repository source-of-truth and validation constraints to stay grounded |
| Architecture chat and review | Good for explaining tradeoffs and surfacing assumptions | Can become speculative if not tied back to code and artifacts |
| Durable traces and handoffs | Make AI-assisted work reconstructable | Require discipline to keep current |

Takeaway

Inference-driven development is useful when it is not treated as autonomous development.

The stronger pattern is human-directed, evidence-backed AI assistance: use Copilot for local acceleration, use Codex or chat tools for broader implementation and reasoning, require source-of-truth documentation, preserve trace evidence, and keep approvals, logs, and boundaries visible.

That is what VS MCP Bridge is trying to teach. The goal is not just prompt-to-code. The goal is prompt-to-evidence, with code as one result of a workflow that remains understandable after the session ends.

See Chat Sessions Models And Agents for related background on chat sessions, models, and agents.

How stdio Works in VS MCP Bridge

In the VS MCP Bridge architecture, stdio is the AI-facing MCP transport boundary. It is important, but it is not the whole bridge.

That distinction matters because “MCP over stdio” can sound as if the AI client is talking directly to Visual Studio. It is not. The AI client speaks MCP to a local server process over standard input and standard output. That server then uses a separate local named-pipe hop when a tool needs Visual Studio state.

This post explains the boundary, why stdout has to stay clean, and how the current implementation keeps diagnostics observable without corrupting the MCP protocol stream.

The Short Version

The runtime path for VS-backed MCP tools is:

AI client
  -> MCP over stdio
VsMcpBridge.McpServer
  -> JSON over named pipe
VsMcpBridge.Vsix
  -> Visual Studio services / DTE / editor state

So stdio gets the request into the local MCP server. The named pipe gets VS-backed work into the VSIX. The two transports are intentionally separate.

Where stdio Is Enabled

The stdio transport is configured in the MCP host bootstrap:

builder.Services
    .AddMcpServer()
    .WithStdioServerTransport()
    .WithTools<VsTools>();

That configuration lives in the VsMcpBridge.McpServer project, inside McpServerHost.Configure(...). The important line is WithStdioServerTransport().

That line tells the MCP host to exchange protocol messages through standard input and standard output instead of through HTTP, a socket listener, or a custom public endpoint.

What stdio Means Here

Standard input and standard output are process streams.

  • stdin is how the AI client writes MCP requests into the server process.
  • stdout is how the server process writes MCP responses back to the client.

That makes stdio a good fit for local AI tooling. The AI client can launch the MCP server as a worker process, keep it alive, write protocol messages to stdin, and read responses from stdout. The MCP server does not need to expose a network port for this local path.

The boundary is still a protocol boundary. stdout is not a casual logging stream once MCP is running over it.

Why stdout Must Stay Clean

One practical consequence of MCP over stdio is that stdout must be reserved for protocol traffic. If the server writes arbitrary diagnostic lines to stdout, the AI client can receive those lines as if they were MCP messages. That can make a healthy server look broken.

For that reason, diagnostics belong somewhere else:

  • stderr when the host framework allows it safely,
  • file logs under the local app-data logging paths,
  • Visual Studio output panes and VSIX trace logs,
  • structured trace artifacts when validating a workflow.

The current architecture treats clean stdout as part of the transport contract. Operational detail is preserved, but it is kept off the response stream that the MCP client is parsing.
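
A minimal sketch of that contract follows, using only the .NET base class library. It is not the project's logging code: the Send and Diag helpers and the two StringWriter stand-ins for stdout and stderr are assumptions made so the result can be inspected in-process. Because MCP messages are JSON-RPC, a quick check is whether every line on the protocol stream parses as JSON.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.Json;

// Stand-ins for the two process streams so the sketch can inspect them afterwards.
var protocolOut = new StringWriter();   // plays the role of stdout
var diagnostics = new StringWriter();   // plays the role of stderr or a log file

// Hypothetical helpers: protocol traffic and diagnostics never share a stream.
void Send(string json) => protocolOut.WriteLine(json);
void Diag(string message) => diagnostics.WriteLine($"[diag] {message}");

Diag("connecting to named pipe");
Send("{\"jsonrpc\":\"2.0\",\"id\":1,\"result\":{}}");
Diag("request 1 completed");

// True when the line is well-formed JSON, which is what a client parser expects.
static bool IsJson(string line)
{
    try
    {
        using var doc = JsonDocument.Parse(line);
        return true;
    }
    catch (JsonException)
    {
        return false;
    }
}

var lines = protocolOut.ToString().Split('\n', StringSplitOptions.RemoveEmptyEntries);
bool clean = lines.All(l => IsJson(l.Trim()));
Console.WriteLine($"stdout lines: {lines.Length}, all valid JSON: {clean}");
// Prints: stdout lines: 1, all valid JSON: True
```

If Diag wrote to the protocol stream instead, the check would fail on the first "[diag]" line, which is the failure a client would see. In a host that uses Microsoft.Extensions.Logging, the console provider's LogToStandardErrorThreshold option can route console log output to stderr; whether VS MCP Bridge relies on that option is not stated here.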

What the Entry Point Does

The program entry point is intentionally small:

var builder = Host.CreateApplicationBuilder(args);
McpServerHost.Configure(builder);

await builder.Build().RunAsync();

Startup has a narrow job:

  1. Create the host builder.
  2. Register logging, the pipe client, MCP server support, stdio transport, and the VS-backed tool container.
  3. Build and run the host.

Once the host is running, the MCP tool surface is visible to the AI client over stdio.

What stdio Does Not Do

stdio does not make the MCP server a Visual Studio extension. It does not grant direct DTE access, load inside the Visual Studio process, or apply edits in the editor.

Those responsibilities stay on the VSIX side. The MCP server process stays outside Visual Studio and acts as the local AI-facing adapter.

That separation is one of the core architecture choices in the project:

  • MCP protocol work lives in VsMcpBridge.McpServer.
  • Visual Studio API work lives in VsMcpBridge.Vsix.
  • Shared compiled bridge tools execute through BridgeToolExecutor when they use the shared tool catalog/executor path.

stdio is a transport. It is not the policy, approval, audit, or redaction boundary for compiled bridge tools. That boundary remains BridgeToolExecutor.

How VS-Backed Tool Calls Cross the Boundary

The MCP host exposes the VS-backed tool container registered with WithTools<VsTools>(). That class contains explicit MCP tools such as:

  • vs_get_active_document
  • vs_get_selected_text
  • vs_list_solution_projects
  • vs_get_error_list
  • vs_propose_text_edit
  • vs_propose_text_edits

From the AI client’s perspective, those are MCP tools. The request arrives over stdin, the MCP host resolves the method, and the method executes inside the VsMcpBridge.McpServer process.

For Visual Studio-backed operations, the method still does not call Visual Studio directly. It forwards a structured request through the pipe client.

The Named-Pipe Hop

Inside VsTools, the VS-backed methods use an injected IPipeClient. That client connects to the VSIX-hosted named-pipe side:

using var pipe = new NamedPipeClientStream(".", _pipeName, PipeDirection.InOut, PipeOptions.Asynchronous);
await pipe.ConnectAsync(timeout: 5000, cancellationToken);

The full call path is layered:

  1. The AI client calls an MCP tool over stdio.
  2. The MCP host routes the call to a VsTools method.
  3. The method uses PipeClient to connect to the VSIX over the local named pipe.
  4. The VSIX dispatches the known pipe command and performs the Visual Studio-side operation.
  5. The VSIX returns a structured response through the pipe.
  6. The MCP server writes the MCP response back over stdout.

This is why stdio and the named pipe should be debugged separately. A stdio failure means the AI client and MCP server are not communicating correctly. A pipe failure means the MCP server could not reach the VSIX side.

The Activation Boundary

The named-pipe side is initialized by the Visual Studio extension. For live VS-backed tool calls, the operator must launch the Visual Studio Experimental Instance and open View -> Other Windows -> VS MCP Bridge so the VSIX/tool-window path initializes the local pipe server.

If that pipe side is inactive, the current MCP server returns an activation-focused diagnostic instead of leaving the operator with an opaque timeout. The diagnostic points to the activation steps: launch the Experimental Instance, open the VS MCP Bridge tool window, then retry the VS-backed tool.

That message is still returned as a structured tool failure. The transport does not change, and the server does not start adding retry loops or writing troubleshooting text to stdout outside the MCP response.
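
The behavior described above can be sketched as follows. This is an assumption-laden illustration, not the project's PipeClient: the CallVsAsync name, the "vs_pipe_inactive" error code, the JSON shape, and the random pipe name are all hypothetical. The only library behavior it relies on is that NamedPipeClientStream.ConnectAsync throws TimeoutException when no server answers within the timeout.

```csharp
using System;
using System.IO.Pipes;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: try the pipe once and turn a timeout into a structured
// failure instead of an unhandled exception or a silent retry loop.
static async Task<string> CallVsAsync(string pipeName, int timeoutMs, CancellationToken ct)
{
    using var pipe = new NamedPipeClientStream(
        ".", pipeName, PipeDirection.InOut, PipeOptions.Asynchronous);
    try
    {
        await pipe.ConnectAsync(timeoutMs, ct);
    }
    catch (TimeoutException)
    {
        // Illustrative shape only; the real diagnostic format is not shown in this post.
        return JsonSerializer.Serialize(new
        {
            success = false,
            error = "vs_pipe_inactive",
            hint = "Launch the Experimental Instance, open the VS MCP Bridge tool window, then retry."
        });
    }

    // A real call would write the request and read the response here.
    return JsonSerializer.Serialize(new { success = true });
}

// No server is listening on this randomly named pipe, so the call times out.
string pipeName = "hypothetical-pipe-" + Guid.NewGuid().ToString("N");
string result = await CallVsAsync(pipeName, 200, CancellationToken.None);

// In a real server this JSON would travel inside the MCP tool response.
Console.Error.WriteLine(result);
```

The design choice worth noticing is that the failure is data, not console text: the caller receives one structured result either way, and nothing extra is written to stdout.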

Correlation and Trace-Only Diagnostics

Because stdio needs clean protocol output, observability depends on structured diagnostics outside stdout. Current traces preserve request IDs, correlation IDs, operation names, timing, and success or failure outcomes across the relevant boundary.

For the inactive-pipe path, the useful evidence is not a random console line. It is the reconstructable chain:

MCP tool request received
PipeClient attempted named-pipe connection
named pipe was unavailable
activation diagnostic returned
correlation/request metadata preserved
no raw payload or secret values disclosed
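
A chain like the one above can be sketched as JSON-lines trace events that share one correlation ID. Everything named here is hypothetical (the Trace helper, the step names, the field names, and the in-memory StringWriter standing in for a trace file); the project's actual trace format is not shown in this post.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;

var traceLog = new StringWriter();                    // stand-in for a trace file, never stdout
string correlationId = Guid.NewGuid().ToString("N");  // shared by every event in one request

// Hypothetical helper: one JSON object per line, metadata only, no payloads or secrets.
void Trace(string step, string outcome) =>
    traceLog.WriteLine(JsonSerializer.Serialize(new
    {
        correlationId,
        step,
        outcome,
        utc = DateTime.UtcNow.ToString("O")
    }));

Trace("mcp_tool_request_received", "ok");
Trace("pipe_connect_attempted", "ok");
Trace("pipe_unavailable", "failed");
Trace("activation_diagnostic_returned", "ok");

// Reconstruct the chain later by filtering on the correlation ID.
var chain = traceLog.ToString()
    .Split('\n', StringSplitOptions.RemoveEmptyEntries)
    .Select(line => JsonSerializer.Deserialize<Dictionary<string, string>>(line)!)
    .Where(e => e["correlationId"] == correlationId)
    .Select(e => e["step"])
    .ToList();

Console.Error.WriteLine(string.Join(" -> ", chain));
```

Because each event carries the same ID and no payload, the sequence can be rebuilt from the log alone without exposing request contents.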

That is the anti-black-box discipline used throughout the project. A failure should be explainable from durable logs, trace artifacts, and documented workflow boundaries, not from guessing which process happened to be awake.

How This Relates to BridgeToolExecutor

The stdio server exposes VS-backed tools directly through the MCP tool container, and those tools cross into the VSIX over the named pipe. Separately, the shared bridge tool architecture has compiled tools, descriptors, capability metadata, approval requirements, secret-reference awareness, redaction, audit envelopes, and classification metadata.

Those shared compiled tools flow through BridgeToolExecutor. That executor is the policy, approval, execution, audit, correlation, and redaction boundary for that path.

The important distinction is:

  • stdio is how an AI client talks MCP to the local server process.
  • named pipes are how VS-backed MCP tools reach the VSIX.
  • BridgeToolExecutor is the shared execution/security boundary for compiled bridge tools.

Keeping those responsibilities separate is what lets the architecture grow without turning transport code, Visual Studio integration, and security policy into one indistinct layer.

What to Remember When Studying This Code

If you are learning the system, keep these files and roles in mind:

  • Program.cs starts the MCP server host.
  • McpServerHost.Configure(...) wires logging, stdio transport, the pipe client, and the MCP tool surface.
  • VsTools defines the VS-backed MCP tools exposed over stdio.
  • PipeClient bridges from the MCP server process into the VSIX.
  • The VSIX owns Visual Studio APIs, editor state, proposal application, and the named-pipe server side.
  • BridgeToolExecutor owns the shared compiled-tool policy and audit boundary.

Once those layers are clear, the implementation is much easier to reason about. The bridge is not one process doing everything. It is a set of local boundaries with explicit responsibilities.

Takeaway

In VS MCP Bridge, stdio is the process-to-process protocol transport that lets an AI client speak MCP to the local server host. The server then uses a separate local named-pipe boundary for Visual Studio-backed operations.

The cleanest mental model is:

stdio gets into the MCP server
named pipes get into Visual Studio
BridgeToolExecutor governs shared compiled tool execution

That separation keeps the bridge observable, debuggable, and easier to evolve. stdout stays clean for MCP. Diagnostics stay reconstructable. Visual Studio work stays in the VSIX. Shared tool execution keeps its own policy and audit boundary.

Understanding a Named Pipe Listener

In the VS MCP Bridge architecture, the Visual Studio side of the system does not wait for natural-language prompts from an AI tool. It waits for structured bridge requests.

That waiting point is the named-pipe side of the bridge.

A named pipe is a local inter-process communication channel provided by the operating system. One process creates the pipe and waits for a connection. Another process connects and exchanges messages. No public network port is required.

In this project, the named-pipe boundary exists because the MCP server and the Visual Studio extension have different jobs. The MCP server speaks MCP over stdio to the AI client. The VSIX runs inside Visual Studio and owns Visual Studio APIs, editor state, proposal application, and host-specific behavior.

The Short Version

The current VS-backed tool path is:

AI client
  -> MCP over stdio
VsMcpBridge.McpServer
  -> PipeClient
local named pipe: VsMcpBridge
  -> PipeServer in the VSIX
VsService
  -> Visual Studio APIs / editor state

The important boundary is simple: stdio gets the request into the local MCP server, and the named pipe gets Visual Studio-backed work into the VSIX.

Why the VSIX Side Is Isolated from stdio

The VSIX runs inside Visual Studio. It can access DTE, editor state, solution state, the Error List, and the proposal-approval UI. The MCP server does not run inside Visual Studio and should not pretend to be the IDE host.

Keeping stdio out of the VSIX gives the bridge a cleaner architecture:

  • The AI client talks MCP to a local server process.
  • The MCP server keeps stdout reserved for MCP protocol responses.
  • The VSIX owns Visual Studio-specific work and Visual Studio privileges.
  • The named pipe provides a local-only bridge between those two processes.

This is why the named pipe is not just an implementation detail. It is the local host boundary between the AI-facing process and the IDE-facing process.

PipeClient and PipeServer Responsibilities

The named-pipe layer has two sides.

PipeClient lives in the MCP server process. For VS-backed tools, it connects to the local pipe name, writes a serialized request envelope, waits for a serialized response, and returns that response to the MCP tool method.

PipeServer lives on the host side. In the VSIX host, it accepts the pipe connection, reads the request envelope, dispatches the command, and writes a response.

At a high level, the client side looks like this:

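// Connect to the local pipe; "." means this machine, so no network port is involved.
using var pipe = new NamedPipeClientStream(".", _pipeName, PipeDirection.InOut, PipeOptions.Asynchronous);
await pipe.ConnectAsync(timeout: 5000, cancellationToken); // faults with TimeoutException if nothing is listening

// Line-delimited JSON: one request envelope out, one response line back.
// The reader/writer construction is assumed here; the excerpt elides it.
using var reader = new StreamReader(pipe);
using var writer = new StreamWriter(pipe) { AutoFlush = true };
await writer.WriteLineAsync(JsonSerializer.Serialize(envelope, JsonOptions));
var responseJson = await reader.ReadLineAsync(cancellationToken);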

And the server side listens for local pipe connections, then hands each connection to request handling:

pipe = new NamedPipeServerStream(
    PipeName,
    PipeDirection.InOut,
    NamedPipeServerStream.MaxAllowedServerInstances,
    PipeTransmissionMode.Byte,
    PipeOptions.Asynchronous);

pipe.WaitForConnection();
_ = Task.Run(() => HandleConnectionAsync(pipe, ct), CancellationToken.None);

The useful point is not the exact syntax. The useful point is the split of responsibility: the MCP server initiates a local pipe request, and the VSIX host accepts and dispatches it.

The Request Envelope

The named-pipe listener is not a chat endpoint. It expects a structured request envelope.

That envelope carries fields such as:

  • Command
  • RequestId
  • Payload

The command tells the host what operation is being requested. The request ID gives the logs and responses a stable correlation point. The payload contains the typed request body for that operation.

This structure is what makes the bridge diagnosable. When a tool call fails, the operator can ask which request crossed which boundary instead of guessing from unstructured text.
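
As a minimal sketch of that envelope shape, assuming System.Text.Json and line-delimited framing: the record name PipeRequestEnvelope, the EnvelopeCodec helper, and the use of JsonElement for the payload are all illustrative assumptions, not the repo's actual types. Only the three field names come from the description above.

```csharp
using System;
using System.Text.Json;

// Hypothetical envelope type for illustration. Only the field names
// (Command, RequestId, Payload) are taken from the post.
public sealed record PipeRequestEnvelope(string Command, string RequestId, JsonElement Payload);

public static class EnvelopeCodec
{
    // Serialize to a single line so the pipe can use line-delimited framing.
    public static string ToLine(PipeRequestEnvelope envelope) =>
        JsonSerializer.Serialize(envelope);

    // Returns null instead of throwing when the line is not a usable envelope,
    // so the caller can answer with a structured failure.
    public static PipeRequestEnvelope? TryParse(string? line)
    {
        if (string.IsNullOrWhiteSpace(line)) return null;
        try
        {
            var envelope = JsonSerializer.Deserialize<PipeRequestEnvelope>(line);
            if (envelope is null) return null;
            if (string.IsNullOrWhiteSpace(envelope.Command)) return null;
            if (string.IsNullOrWhiteSpace(envelope.RequestId)) return null;
            return envelope;
        }
        catch (JsonException)
        {
            // Malformed JSON or a wrong field type is treated as "no envelope".
            return null;
        }
    }
}
```

With a shape like this, a receiver can reject a line before dispatch whenever the command or request ID is missing, which keeps the correlation point intact for every request that does get through.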

How Dispatch Works

Once the pipe server has a request envelope, it dispatches by command name. It does not interpret prose or execute arbitrary instructions.

VsResponseBase response = envelope.Command switch
{
    PipeCommands.GetActiveDocument => await _vsService.GetActiveDocumentAsync(),
    PipeCommands.GetSelectedText => await _vsService.GetSelectedTextAsync(),
    PipeCommands.ListSolutionProjects => await _vsService.ListSolutionProjectsAsync(),
    PipeCommands.GetErrorList => await _vsService.GetErrorListAsync(),
    PipeCommands.ProposeTextEdit => await DispatchProposeEditAsync(envelope),
    _ => new VsResponseBaseUnknown { Success = false, ErrorMessage = $"Unknown command: {envelope.Command}" }
};

The current MCP surface is explicit and limited. Unknown, empty, malformed, or unsupported pipe commands fail closed instead of being dispatched.
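
One way to picture the fail-closed rule is an explicit allow-list checked before any dispatch happens. The sketch below is an assumption-laden illustration: CommandGate is a hypothetical helper, and the string values are guesses that mirror the PipeCommands cases shown in the switch, not confirmed constants from the repo.

```csharp
using System;
using System.Collections.Generic;

public static class CommandGate
{
    // Hypothetical allow-list. The names mirror the PipeCommands cases in the
    // dispatch switch, but the actual string values are assumptions.
    private static readonly HashSet<string> Allowed = new(StringComparer.Ordinal)
    {
        "GetActiveDocument",
        "GetSelectedText",
        "ListSolutionProjects",
        "GetErrorList",
        "ProposeTextEdit",
    };

    // Returns true only for an exact, known command. Everything else is
    // rejected with a message suitable for a structured failure response.
    public static bool TryAccept(string? command, out string error)
    {
        if (string.IsNullOrWhiteSpace(command))
        {
            error = "Missing command.";
            return false;
        }
        if (!Allowed.Contains(command))
        {
            error = $"Unknown command: {command}";
            return false;
        }
        error = string.Empty;
        return true;
    }
}
```

The comparison is ordinal and case-sensitive on purpose in this sketch: a near-miss such as "geterrorlist" is rejected rather than normalized into something dispatchable.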

Where Visual Studio Work Happens

The pipe server owns transport and dispatch. It does not need to own DTE or editor behavior directly.

Visual Studio-specific work is handled by the host service layer, such as VsService. That is where operations such as these belong:

  • getting the active document,
  • reading selected text,
  • listing solution projects,
  • reading the Error List,
  • creating approval-gated edit proposals.

This keeps transport concerns separate from Visual Studio concerns. It also keeps the MCP server from needing direct knowledge of Visual Studio SDK details.

Activation and Startup Boundaries

The VSIX side must be active before VS-backed MCP tools can succeed. In live validation, the reliable operator path is to launch the Visual Studio Experimental Instance and open View -> Other Windows -> VS MCP Bridge. That activation path initializes the VSIX/tool-window side needed for the named pipe.

If the MCP server is running but the VSIX pipe side is inactive, that is not an MCP stdio failure. It is a named-pipe activation failure.

The current diagnostic path treats that case explicitly. Instead of appearing as an opaque timeout, the pipe client returns a structured activation diagnostic telling the operator to launch Visual Studio, open the VS MCP Bridge tool window, and retry the VS-backed tool.

That matters because a transport failure should identify the failed boundary:

  • If stdio is broken, the AI client and MCP server are not talking correctly.
  • If the named pipe is unavailable, the MCP server cannot reach the VSIX side.
  • If command dispatch fails, the request reached the host but did not match an allowed operation.
  • If VsService fails, the request reached Visual Studio-side execution but the host operation failed.
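
To make "identify the failed boundary" concrete, here is a small sketch of how a pipe client might translate connection exceptions into a structured failure. BoundaryFailure and PipeFailureMapper are hypothetical names introduced for illustration, and the boundary labels are assumptions. The one library fact relied on is that NamedPipeClientStream's timed connect calls report a TimeoutException when no server accepts the connection in time.

```csharp
using System;
using System.IO;

// Hypothetical result type. The post only says the failure is "structured".
public sealed record BoundaryFailure(string Boundary, string Message);

public static class PipeFailureMapper
{
    public static BoundaryFailure FromException(Exception ex) => ex switch
    {
        // No listener accepted the connection within the timeout:
        // treat it as an activation problem, not an opaque timeout.
        TimeoutException => new BoundaryFailure(
            "named-pipe",
            "VS MCP Bridge pipe is not active. Launch the Visual Studio Experimental Instance, " +
            "open View -> Other Windows -> VS MCP Bridge, then retry the VS-backed tool."),

        // The pipe connected but broke while reading or writing.
        IOException => new BoundaryFailure("named-pipe", "Pipe connection was lost: " + ex.Message),

        // Anything else is reported without guessing at a boundary.
        _ => new BoundaryFailure("unclassified", ex.GetType().Name + ": " + ex.Message),
    };
}
```

The returned value would travel back inside the normal tool response, so the operator sees the activation steps without anything extra being written to stdout.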

Request and Response Correlation

The named-pipe layer participates in the same anti-black-box logging discipline as the rest of the bridge. Requests carry IDs across the boundary so logs can be reconstructed later.

A useful trace should be able to answer:

  • which MCP tool was called,
  • which pipe command was sent,
  • which request ID crossed the pipe,
  • whether the pipe connected, timed out, or returned a structured failure,
  • which host operation ran,
  • how long each boundary took.

That is why the architecture emphasizes request IDs, operation names, elapsed timing, success or failure state, and durable trace artifacts. The goal is not more logging for its own sake. The goal is to make failure reconstruction practical.

Approval-Aware Flow Where It Matters

The named pipe does not approve tool execution by itself. It moves structured requests between local processes.

For Visual Studio edit operations, the VSIX proposal workflow remains approval-gated. MCP can propose edits, but applying them still requires explicit approval in the host UI.

For shared compiled bridge tools, approval-aware execution is a separate executor concern. A compiled tool descriptor can require approval, and BridgeToolExecutor owns policy evaluation, approval evaluation, execution, audit, correlation, and redaction for that path.

That means the named-pipe layer supports approval-aware architecture by preserving structured boundaries and correlation, but it is not the shared compiled-tool policy engine.

Relationship to MCP and BridgeToolExecutor

It helps to keep three boundaries separate:

  • MCP stdio boundary: the AI client talks to VsMcpBridge.McpServer.
  • Named-pipe boundary: VsMcpBridge.McpServer talks to the VSIX host for Visual Studio-backed tools.
  • BridgeToolExecutor boundary: shared compiled tools run through policy, approval, execution, audit, redaction, and correlation seams.

Those boundaries are complementary. The named pipe keeps Visual Studio operations local to the VSIX. BridgeToolExecutor keeps compiled tool execution governed by a single shared policy and audit boundary. stdio keeps the AI client protocol isolated from both of those internal implementation details.

Failure Isolation and Troubleshooting

If you are debugging a VS-backed tool call, follow the boundary chain instead of treating the bridge as one black box:

  1. Did the AI client successfully launch and speak to the MCP server over stdio?
  2. Did the MCP server resolve the expected registered tool?
  3. Did PipeClient attempt the expected command with a request ID?
  4. Was the VSIX/tool-window side active and listening on the named pipe?
  5. Did PipeServer accept and parse the request envelope?
  6. Did the command dispatch to a known PipeCommands value?
  7. Did VsService complete the host operation?
  8. Did the response return through the pipe and then over MCP stdout?

This is the practical value of clean transport boundaries. Each step has a narrow responsibility, so the first missing or failing boundary can be found from logs and trace artifacts.

Related Mermaid Trace Sources

The repo already has Mermaid sources that support this post.

Those .mmd files remain the diagram source of truth. This post references them directly instead of embedding generated images.

Why This Supports Future Extensibility

The named-pipe layer gives future work a stable place to preserve local host isolation. New VS-backed operations can stay explicit command-and-response paths. New compiled tools can continue to use BridgeToolExecutor for policy, approval, redaction, and audit. Additional diagnostics can attach to the existing correlation chain without polluting MCP stdout.

That is the main architectural benefit. The bridge can grow without collapsing the AI protocol, Visual Studio host operations, transport diagnostics, and tool security seams into one layer.

Takeaway

A named pipe listener is the local Visual Studio-side endpoint that waits for structured inter-process requests. In VS MCP Bridge, it exists so the VSIX can own Visual Studio operations while a separate MCP server process owns the AI-facing MCP stdio transport.

The short version is:

stdio gets into the MCP server
named pipes get into Visual Studio
BridgeToolExecutor governs shared compiled tool execution

Keeping those roles separate is what makes the bridge easier to diagnose, safer to extend, and more useful for observable AI tooling.

Understanding AI Chat Sessions, Models, and Agents

Why Context, Tools, Evidence, and Boundaries Matter

One of the easiest mistakes to make with modern AI tools is assuming that a chat is a persistent intelligence that keeps thinking between messages. That is not how these systems work. Once that clicks, a lot of confusing behavior suddenly makes sense.

It also explains why the VS MCP Bridge project puts so much weight on architecture docs, durable traces, session handoffs, and source-controlled blog content. If the chat context disappears, the system still needs a way to recover the project model.

A Chat Is Not A Persistent Mind

A chat session is a temporary context window wrapped around a model call. On each turn, the application gathers instructions, prior messages, available tool results, selected files, and any other context it chooses to include. The model then generates a response from that input.

Instructions + context + tool results + current prompt -> model -> response

The model does not carry goals forward unless those goals are present in the current request. If the working context is lost, the same model can feel like a different assistant because it no longer sees the same constraints, terminology, or decisions.

A chat session is working context, not permanent memory.
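
The formula above can be read as a plain function: the model input is rebuilt from scratch on every turn. The sketch below is illustrative only. TurnContext, ModelInputBuilder, and the bracketed section labels are hypothetical, and real clients assemble context in their own formats.

```csharp
using System.Collections.Generic;
using System.Text;

// Everything the model will see on one turn. Nothing outside this record reaches it.
public sealed record TurnContext(
    string Instructions,
    IReadOnlyList<string> PriorMessages,
    IReadOnlyList<string> ToolResults,
    string CurrentPrompt);

public static class ModelInputBuilder
{
    // Flattens the turn into one text input. The labels are arbitrary markers
    // chosen for this sketch.
    public static string Build(TurnContext turn)
    {
        var sb = new StringBuilder();
        sb.AppendLine("[instructions]").AppendLine(turn.Instructions);
        foreach (var message in turn.PriorMessages) sb.AppendLine("[history] " + message);
        foreach (var result in turn.ToolResults) sb.AppendLine("[tool] " + result);
        sb.AppendLine("[prompt]").Append(turn.CurrentPrompt);
        return sb.ToString();
    }
}
```

If PriorMessages comes back empty after a reset, an earlier decision such as "stay on this branch" simply is not in the input anymore, which is why the same model can behave like a different assistant.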

Why Context Loss Feels So Disruptive

When a desktop app crashes, a session resets, or a context window is compacted, the active conversation may lose important details. Earlier decisions, operating rules, current branch state, and architectural constraints may disappear unless they were preserved somewhere outside the chat.

That is why VS MCP Bridge now treats repository files as the source of truth. AI_START.md, docs/ARCHITECTURE.md, trace workflows, logs, Mermaid sources, and session handoffs are not paperwork. They are the durable memory that a future AI session can reload.

The Main Terms

These terms often get blurred together, but separating them helps explain what the bridge is doing.

Model

The model is the reasoning engine. It generates output from the input it receives. By itself, it is stateless and does not know the project unless the current context gives it project evidence.

Session

The session is the active conversation context. It may include prior messages, instructions, selected files, tool results, and summaries. It can be extremely useful, but it is not a reliable permanent store.

Tool

A tool is a callable capability outside pure text generation. In VS MCP Bridge, tools can read Visual Studio state, list projects, create edit proposals, or execute shared bridge tools through catalog and executor boundaries.

Agent

An agent is an orchestration layer that uses a model, context, tools, and a control loop to pursue a task. That does not make it magic or autonomous in the human sense. It still needs explicit boundaries, review, observable execution, and durable evidence.

Orchestration Layer

The orchestration layer decides what context to include, which tools are available, when to call them, how to handle results, and how to continue the loop. ChatGPT, Codex, Copilot, and MCP-enabled clients differ mostly in this layer and in the tools they can reach.

Pure Chat Is Different From Tool-Backed Work

Pure chat inference can explain, summarize, and reason from the supplied prompt. Tool-backed workflows can observe or change the outside world, so they need stronger boundaries.

VS MCP Bridge exists because AI-assisted coding needs more than free-form conversation. It needs a local MCP server, a clean stdio boundary, a named-pipe bridge into Visual Studio, explicit MCP tools, proposal approval, and diagnostics that show what actually happened.

That changes the trust model. A chat answer can be reviewed as text. A tool call may read active editor state, list solution projects, or create a proposed edit. That kind of workflow needs logs, request ids, tool descriptors, policy decisions, approval states, and structured results.

How VS MCP Bridge Grounds Agentic Behavior

In VS MCP Bridge, agentic behavior is grounded by concrete boundaries:

  • MCP clients talk to the local MCP server over stdio.
  • The MCP server reaches Visual Studio only through the local named-pipe boundary.
  • The VSIX owns Visual Studio API access and proposal UI state.
  • Proposal tools create proposals; apply still requires explicit approval in the tool window.
  • Shared bridge tools run through BridgeToolExecutor, not directly from callers.
  • Policy, approval, redaction, audit, correlation, and result shaping stay at the execution boundary.

Those boundaries are what keep "the agent did something" from becoming an unhelpful explanation. A future developer should be able to tell which layer received the request, which tool ran, which approval or policy decision applied, and which result was returned.

Session Continuity Needs Source-Of-Truth Files

When a session survives, the assistant can use the conversation to maintain continuity. When a session is interrupted, source files have to carry the continuity instead.

That is why the project now asks future sessions to start with repository evidence.

This is not only useful for AI sessions. It is useful engineering discipline. Durable context reduces dependency on memory, mood, and whatever happens to fit in the next prompt.

Logs, Traces, Artifacts, And Prompts Work Together

Prompts tell the assistant what to do. Architecture docs tell it what is true. Logs show what happened. Trace metadata records the run context. Mermaid diagrams explain the observed sequence. Handoffs tell the next session what to trust, what to recheck, and what remains deferred.

That combination is more reliable than any single long chat. It also lets a human reviewer challenge the work: if the diagram says a request crossed the executor boundary, the logs and code should support that claim.

Approval Is Part Of Orchestration

Agentic workflows often sound autonomous, but VS MCP Bridge deliberately keeps important operations approval-aware.

The proposal workflow is the clearest example: an MCP tool can create a proposed edit, but the edit is not applied until the user approves it in the host UI. The newer tool-execution approval seam follows the same architectural direction for future selected tools: approval is evaluated at the execution boundary, not hidden inside arbitrary tool code.

That is how AI-assisted development stays understandable. The model can suggest. The tool can prepare. The boundary can log, audit, redact, and classify. The human can review and approve.

What Context Windows Cannot Solve

Larger context windows help, but they do not eliminate the need for durable evidence.

A bigger window can include more files and more history, but it can still omit the one constraint that matters. It can still summarize away nuance. It can still be reset. It can still produce a plausible explanation that does not match the actual code.

That is why the repo treats source-of-truth documents, validation artifacts, and canonical blog content as part of the system. They make the project less dependent on any single context window.

A Cleaner Mental Model

Term | Practical Meaning | VS MCP Bridge Example
Model | Generates output from supplied context | The model behind ChatGPT, Codex, or Copilot
Session | Temporary working context | The current chat plus instructions, files, and tool results
Tool | Callable capability outside pure text generation | vs_get_active_document, proposal tools, or shared bridge tools
Agent | Model plus orchestration loop and tools | An AI client using MCP tools to inspect and propose changes
Evidence | Durable record that survives context loss | Architecture docs, logs, metadata, Mermaid diagrams, handoffs, canonical blog sources

Takeaway

Models generate responses. Sessions provide temporary continuity. Agents orchestrate tools and context. Tools touch real systems. Evidence makes the whole workflow reviewable after the session ends.

That is the practical lesson from VS MCP Bridge and BlogAI: AI-assisted development improves when the important knowledge survives outside the chat. Observable boundaries, approval-aware workflows, source-of-truth docs, and durable traces are what keep agentic behavior from becoming AI magic.

See the companion post, inference-driven, for a discussion of inference-driven software design and Copilot's strengths and risks.

VS MCP Bridge Blog Series: Part 7

Source of Truth: docs/ARCHITECTURE.md
Status: Canonical repo cleanup aligned to the current architecture as of 2026-05-16. This post continues the core series from tool execution security seams into durable traces, reconstructable evidence, operational validation, and AI-assisted troubleshooting workflows.

Durable Evidence, Trace Workflows, and AI-Assisted Troubleshooting

Part 6 explained why BridgeToolExecutor is the consistent execution boundary for policy, approval, secret-reference handling, redaction, audit, classification, correlation, and tool invocation.

Part 7 moves from the boundary to the evidence around it. The bridge is not just trying to execute tools. It is trying to make tool execution reconstructable later, by a developer or by an AI session that was not present when the behavior happened.

That is the shift that changed this project: diagnostics stopped being an afterthought and became part of the architecture.

Why Durable Evidence Matters

AI-assisted development can move quickly, but fast progress is fragile if the only record of a decision lives in chat history or a local debugging session.

VS MCP Bridge now treats important validation runs as durable evidence. A useful run should leave behind enough material to answer:

  • what code version was observed
  • what workflow was exercised
  • which request id and operation id were used
  • which boundary handled the request
  • where the request succeeded, failed, or stopped
  • which logs support that conclusion
  • which Mermaid diagram matches the observed flow
  • what the next session should trust or revalidate

That evidence turns a one-time manual observation into something another person can replay, inspect, and challenge.

The Artifact Triad

The most useful pattern has become a small triad:

  • a log file under artifacts/logs/
  • a metadata file beside it, usually .metadata.json
  • a Mermaid sequence diagram under docs/diagrams/

The log captures what happened. The metadata captures the run context: branch, commit, host, request id, operation id, input summary, observed result, and scope exclusions. The Mermaid diagram explains the sequence in a form that can be reviewed without rereading every log line.

None of those artifacts replaces the others. The log is the observed evidence. The metadata is the index card. The Mermaid diagram is the map.
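
A metadata "index card" of this kind could be modeled roughly as follows. This is a sketch under stated assumptions: the record name, the camelCase JSON property names, and the rule for deriving the metadata path from the log path are all hypothetical. The field list itself follows the run context described above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical shape for the run-context file that sits beside a log.
public sealed record TraceRunMetadata(
    string Branch,
    string Commit,
    string Host,
    string RequestId,
    string OperationId,
    string InputSummary,
    string ObservedResult,
    IReadOnlyList<string> ScopeExclusions);

public static class TraceMetadataWriter
{
    private static readonly JsonSerializerOptions Options = new()
    {
        WriteIndented = true, // keeps the checked-in file reviewable in diffs
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    };

    public static string ToJson(TraceRunMetadata metadata) =>
        JsonSerializer.Serialize(metadata, Options);

    // Assumed naming rule: swap the log extension for ".metadata.json".
    public static string MetadataPathFor(string logPath) =>
        Path.ChangeExtension(logPath, ".metadata.json");
}
```

Keeping the request ID and operation ID in the metadata is what lets a reader jump from the index card to the matching lines in the log.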

Session Handoffs Complete The Record

For larger slices, the repo also keeps session handoffs under docs/session-handoffs/. These are not essays. They are resume points.

A good handoff records what was validated, what commit or branch was involved, what artifacts were produced, what constraints still apply, and what the next session should do first. That matters because future AI sessions should not reconstruct project state from memory or from a conversation transcript that may be incomplete.

The architecture document remains the source of truth for current behavior. The handoffs explain how the project arrived there and what evidence supports particular claims.

Trace Workflows Are Reproducible Procedures

The repo now has documented workflows for important validation paths.

The important detail is that these workflows are not just documentation after the fact. They are part of the development method. When the system changes, the workflow can be rerun, the artifacts can be regenerated, and the diagram can be compared against the current code path.

Correlation Makes Replay Possible

Request and operation identifiers are what make trace replay practical.

Without correlation, logs become a loose pile of events. With correlation, a run can be reconstructed across layers: MCP request, pipe attempt, catalog lookup, policy decision, approval decision, tool execution, audit envelope, result, and visible host behavior.

The point is not to add identifiers for decoration. The point is to let a future reader find the first missing or failing boundary. If a request id appears at the MCP layer but never reaches the pipe client, the failure is different from one that reaches the VSIX host and fails during service execution.
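
The "first missing or failing boundary" idea can be sketched as a small query over correlated events. Everything named here is an assumption made for illustration: TraceEvent, TraceReplay, and the boundary labels in ExpectedOrder do not come from the repo, and a real trace would carry more fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified log event: one row per boundary crossing.
public sealed record TraceEvent(string RequestId, string Boundary, bool Succeeded);

public static class TraceReplay
{
    // Assumed boundary order for a VS-backed tool call.
    public static readonly IReadOnlyList<string> ExpectedOrder = new[]
    {
        "mcp-request", "pipe-client", "pipe-server", "dispatch", "vs-service",
    };

    // Returns the first boundary that is missing or failed for the given
    // request ID, or null when every expected boundary succeeded.
    public static string? FirstProblemBoundary(IEnumerable<TraceEvent> events, string requestId)
    {
        var byBoundary = events
            .Where(e => e.RequestId == requestId)
            .GroupBy(e => e.Boundary)
            .ToDictionary(g => g.Key, g => g.All(e => e.Succeeded));

        foreach (var boundary in ExpectedOrder)
        {
            if (!byBoundary.TryGetValue(boundary, out var succeeded) || !succeeded)
            {
                return boundary;
            }
        }
        return null;
    }
}
```

A request that logged only "mcp-request" and a request that logged a failed "pipe-client" event both point at the pipe client in this sketch, but the raw events still show whether the boundary was never reached or was reached and failed.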

Diagnostics Must Stay Transport-Safe

Durable evidence only helps if it does not corrupt the transport it is trying to explain.

For MCP stdio, stdout must remain clean for JSON protocol traffic. Diagnostics belong in approved channels such as stderr, UI logs, file logs, Debug output, audit envelopes, and durable artifacts. The bridge uses this rule because a single stray log line on stdout can make a valid MCP server look broken.

This is why the activation diagnostics and pipe-failure diagnostics are trace-only or structured tool failures. They should help the operator understand what to do without polluting the MCP protocol stream.
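
In code, the rule mostly comes down to never handing diagnostics to the stdout writer. A minimal sketch, assuming a hypothetical StderrDiagnostics helper and an invented key=value line format:

```csharp
using System;
using System.IO;

public static class StderrDiagnostics
{
    // Writes one diagnostic line to the supplied sink. In an stdio MCP server
    // the sink would be Console.Error or a file writer, never Console.Out.
    public static void Write(TextWriter sink, string requestId, string operation, string message)
    {
        sink.WriteLine($"[{DateTimeOffset.UtcNow:O}] request={requestId} op={operation} {message}");
    }
}
```

A call such as StderrDiagnostics.Write(Console.Error, "req-1", "pipe.connect", "named pipe unavailable") keeps the protocol stream untouched, and passing the sink explicitly makes an accidental stdout write easier to spot in review.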

Durable Evidence Improved The Architecture

The traces did more than prove that code worked. They changed the design.

Sequence diagrams and logs made it easier to see where responsibilities were blurred. That led to clearer boundaries around proposal management, host correctness, tool descriptors, catalog registration, executor-owned policy, approval-aware execution, redaction, audit metadata, MEF discovery, and VSIX activation diagnostics.

In other words, observability did not just describe the architecture. It shaped the architecture.

AI-Assisted Troubleshooting Uses Evidence First

This repo is being developed with AI assistance, so the evidence standard is practical: a future assistant should be able to inspect files in the repo and understand the current system without trusting prior chat history.

That means a good troubleshooting loop starts with durable artifacts:

  • read AI_START.md for the current resume map
  • read docs/ARCHITECTURE.md for current behavior
  • read the relevant workflow document for the validation path
  • inspect the matching log, metadata, and Mermaid files
  • compare the observed diagram against current code
  • treat the first missing or failed boundary as the next actionable problem

That process keeps the assistant grounded in repository evidence instead of inventing a story that sounds plausible.

Related Mermaid Trace Sources

The most useful diagram sources for this topic are the Mermaid files under docs/diagrams/.

Those .mmd files remain the diagram source of truth. Generated images can be useful later, but the source diagram should stay reviewable in the repo.

What This Does Not Mean

Durable traces are not a full observability platform. They are not telemetry ingestion, distributed tracing infrastructure, SIEM export, compliance storage, or production monitoring.

They are smaller and more immediate: checked-in evidence that the architecture can be understood and validated. That is enough for the current stage of the bridge.

Takeaway

VS MCP Bridge became easier to evolve when the team stopped treating diagnostics as cleanup work and started treating them as architecture.

The durable evidence pattern is simple: capture logs, preserve metadata, draw the observed sequence, and write a handoff when the result changes what future sessions should know. That pattern makes failures localizable, decisions reviewable, and AI-assisted troubleshooting much less dependent on memory.

Next In The Series

The next useful topic is how these evidence and architecture practices should shape the public BlogAI narrative: which posts should teach the transport boundary, which should teach tool execution, and which should teach the operational discipline that keeps AI-assisted systems explainable.

VS MCP Bridge Blog Series: Part 6

Source of Truth: docs/ARCHITECTURE.md
Status: Canonical repo cleanup aligned to the current architecture as of 2026-05-16. This post continues the core series from discovery into the security, policy, approval, audit, and redaction seams around tool execution.

Security Seams Around Tool Execution

Part 5 described how VS MCP Bridge keeps tool discovery explicit. Compiled tools are the default, MEF is an opt-in discovery seam, and discovered tools are not allowed to become their own execution system.

Part 6 looks at the next question: what happens after a tool is found?

The answer is the same for compiled tools, discovered tools, fake test tools, and future extension points: tool execution must flow through BridgeToolExecutor. That is the boundary where policy, approval, secret-reference checks, audit, redaction, correlation, and the actual tool call are kept together.

This is not a claim that VS MCP Bridge already has production authentication, OAuth, RBAC, vault-backed secrets, sandboxed plugins, or a compliance audit system. It does not. The current work is security architecture plumbing: narrow contracts and observable boundaries that make future hardening possible without scattering security decisions through every tool.

The Boundary Is The Security Feature

The most important rule is simple: a tool should not be able to bypass the executor just because it was registered successfully.

Discovery answers what tools exist. Execution answers whether this request is allowed to run, whether approval is required, what metadata must be audited, what should be redacted, and what result is returned.

Keeping those responsibilities in one boundary matters because every new tool adds risk. A search tool, a Visual Studio-backed tool, a future file-writing tool, or a future extension-provided tool may all have different capabilities, but they should still be evaluated through the same execution path.

What The Executor Owns

The current BridgeToolExecutor owns the security-sensitive execution sequence:

  • resolve the requested tool from the catalog
  • build a security context with tool metadata, request metadata, correlation identifiers, capabilities, approval requirement, and secret references
  • evaluate the configured tool execution policy before running the tool
  • resolve required secret references through the broker seam without making raw secrets part of normal flow
  • ask the approval service when a descriptor requires approval
  • invoke the tool only after policy, secret-reference, and approval checks allow it
  • redact sensitive values before logging or audit output
  • emit an audit envelope with structured outcome, classification, policy, approval, capability, secret-reference, and correlation metadata

That ordering is intentional. Policy denial stops execution before approval. Approval denial stops execution before the tool runs. Secret-reference failure stops execution before unresolved secret material can become accidental runtime behavior.
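
The ordering can be sketched as a chain of gates in which each denial returns before the next check is even asked. This is not BridgeToolExecutor's implementation. GatedExecution, GateOutcome, and the delegate-based parameters are hypothetical stand-ins for the policy, secret-broker, and approval services, and the sketch omits catalog lookup, redaction, and audit.

```csharp
using System;

public enum GateOutcome { Allowed, DeniedByPolicy, SecretUnresolved, DeniedByApproval }

// Output is null whenever the tool did not run.
public sealed record ExecutionResult(GateOutcome Outcome, string? Output);

public static class GatedExecution
{
    // Gate order follows the list above: policy, then secret references,
    // then approval, and only then the tool itself.
    public static ExecutionResult Run(
        Func<bool> policyAllows,
        Func<bool> secretsResolved,
        bool requiresApproval,
        Func<bool> approvalGranted,
        Func<string> invokeTool)
    {
        if (!policyAllows())
            return new ExecutionResult(GateOutcome.DeniedByPolicy, null);

        if (!secretsResolved())
            return new ExecutionResult(GateOutcome.SecretUnresolved, null);

        // Approval is consulted only when the descriptor asks for it.
        if (requiresApproval && !approvalGranted())
            return new ExecutionResult(GateOutcome.DeniedByApproval, null);

        return new ExecutionResult(GateOutcome.Allowed, invokeTool());
    }
}
```

Because every early return happens before invokeTool is called, a test can assert that a policy denial never reaches the approval delegate and that an approval denial never reaches the tool.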

Policy Before Execution

The policy seam is deliberately lightweight today. The default policy allows execution so existing behavior stays unchanged. A capability-aware policy can be configured to allow or deny tools based on declared required capabilities, but it is not a user identity system and it is not RBAC.

That distinction is important. Capability metadata is declarative plumbing. It lets a tool say, for example, "this tool requires a Visual Studio document read capability" or "this tool will touch proposal state." A policy can inspect that metadata. The project has not yet attached those capabilities to authenticated users, accounts, roles, or external authorization decisions.
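A minimal Python sketch of that idea follows. The `Descriptor` type, the two policy functions, and the capability strings are hypothetical stand-ins, not the project's real policy types. The point is only that the policy inspects declared metadata and never looks at a user or role.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    tool_id: str
    required_capabilities: frozenset = frozenset()


def default_policy(descriptor):
    """Default behavior described above: allow everything."""
    return True


def capability_policy(allowed_capabilities):
    """Build a policy that allows a tool only if every declared capability is allowed."""
    allowed = frozenset(allowed_capabilities)

    def evaluate(descriptor):
        # Subset check: no identity, account, or role is consulted here.
        return descriptor.required_capabilities <= allowed

    return evaluate
```

For example, `capability_policy({"vs.document.read"})` would allow a tool that declares only that capability and deny one that also declares a hypothetical `proposal.write`.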

Even so, the seam is valuable now because the executor and audit pipeline can already preserve the evidence needed to explain why a request was allowed or denied.

Approval-Aware Execution

Approval-aware execution follows the same pattern. A tool descriptor can declare that execution requires approval. The executor sees that requirement and calls IToolExecutionApprovalService before invoking the tool.

The default service allows execution, so existing tools continue to run unless they explicitly opt into approval. Tests cover both paths: an approval-required tool can be allowed and executed, or denied and returned as a structured failure without running the tool.

This is not the same thing as a finished user-facing prompt system. It is the execution seam that a prompt system can use later. The important architectural point is that approval is not implemented inside each tool. It lives at the execution boundary.

Secrets Flow By Reference

Secret handling follows the same conservative pattern. The current architecture has secret-reference contracts and a broker seam, not real secret storage.

That means tools can describe required secrets as structured references instead of requiring raw values in tool descriptors, logs, prompts, or audit payloads. The default broker returns an unresolved or not-configured result rather than a value. That is intentionally boring behavior, but it is safer than pretending a real vault exists before one has been designed.

The key rule for future tool authors is that secret values should flow by reference, not by payload. Logs and audit envelopes may record reference metadata and resolution outcome, but not raw secret values.
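One way to picture "by reference, not by payload" is the Python sketch below. The `SecretReference` fields, the `Resolution` values, and the `audit_entry` helper are assumptions made for illustration. The real contracts may carry different metadata.

```python
from dataclasses import dataclass
from enum import Enum


class Resolution(Enum):
    RESOLVED = "resolved"
    UNRESOLVED = "unresolved"
    NOT_CONFIGURED = "not_configured"


@dataclass(frozen=True)
class SecretReference:
    name: str     # logical name of the secret, never its value
    purpose: str  # why the tool needs it


class DefaultBroker:
    """Mirrors the default described above: nothing is ever resolved."""

    def resolve(self, reference):
        return Resolution.NOT_CONFIGURED


def audit_entry(reference, outcome):
    # Only reference metadata and the resolution outcome are recorded.
    return {
        "secret_name": reference.name,
        "purpose": reference.purpose,
        "resolution": outcome.value,
    }
```

Nothing in the audit entry has a field that could hold a raw value, so a future vault-backed broker could be swapped in without changing what logs are allowed to contain.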

Redaction Is Part Of The Boundary

Redaction is not a final line of defense, and it should not be treated as a substitute for careful data flow. It is still an essential part of the boundary because diagnostics are only useful if operators can safely read them.

The bridge redactor masks obvious secret-like keys and values before they enter logs and audit envelopes. That lets traces remain useful while reducing the chance that credentials, tokens, passwords, or synthetic test sentinels leak into developer-visible output.
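A key-and-value masking pass of that kind might look like the following Python sketch. The key pattern, the mask string, and the bearer-token check are assumptions chosen for the example. They are not the bridge redactor's actual rules.

```python
import re

# Assumed list of secret-like key fragments; the real redactor may differ.
SENSITIVE_KEY = re.compile(r"(password|secret|token|api[_-]?key|credential)", re.IGNORECASE)
MASK = "***REDACTED***"


def redact(payload):
    """Return a copy of payload with secret-like keys and values masked."""
    if isinstance(payload, dict):
        return {
            key: MASK if SENSITIVE_KEY.search(str(key)) else redact(value)
            for key, value in payload.items()
        }
    if isinstance(payload, list):
        return [redact(item) for item in payload]
    if isinstance(payload, str) and payload.lower().startswith("bearer "):
        return MASK  # value-based rule: looks like an authorization header
    return payload
```

The function returns a new structure rather than mutating its input, so the tool still receives the original arguments while logs and audit output see only the masked copy.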

This matters for anti-black-box diagnostics. The project needs enough evidence to reconstruct what happened, but not so much raw payload detail that the diagnostic trail becomes a secret leak.

Audit Envelopes Carry The Explanation

Audit envelopes are the durable explanation layer around tool execution. They preserve the request and operation correlation metadata, the tool name, policy decision, approval decision, required capabilities, secret-reference metadata, redacted payload summaries, terminal outcome, and structured classification.

The classification metadata is intentionally small: category, severity, risk, and outcome. A successful tool execution can be informational and low risk. A policy denial can be warning and medium risk. An unresolved secret can be warning and high risk. An exception path can be error and high risk.
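The four examples map naturally onto a lookup table. In this Python sketch the severity and risk values come from the paragraph above, while the category strings, the outcome keys, and the `classify` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    category: str
    severity: str
    risk: str
    outcome: str


# Severity and risk follow the examples in the text; categories are assumed.
_TABLE = {
    "success": Classification("execution", "informational", "low", "success"),
    "policy_denied": Classification("policy", "warning", "medium", "policy_denied"),
    "secret_unresolved": Classification("secret", "warning", "high", "secret_unresolved"),
    "exception": Classification("execution", "error", "high", "exception"),
}


def classify(outcome):
    """Look up the classification for a terminal outcome key."""
    return _TABLE[outcome]
```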

That is observability plumbing, not a SIEM integration or compliance framework. The value is that a future session can inspect an audit record and understand the shape of the decision without relying on chat history or a debugger session.

Compiled And Discovered Tools Follow The Same Path

This is where Part 5 and Part 6 connect. MEF discovery can find extension-provided tools, but discovery does not grant those tools a private execution lane. They still become catalog entries, and execution still flows through the same executor boundary.

That keeps extensibility from erasing the security model. A discovered tool can have descriptor metadata, required capabilities, approval requirements, and secret references. The executor can then evaluate those declarations the same way it evaluates compiled tools.

The current system is intentionally not a plugin sandbox. Future sandboxing, signed manifests, capability attestation, and remote authorization are deferred. The useful thing today is that the project has a place to attach those decisions later.

Related Mermaid Trace Sources

The repo already has Mermaid sources that support this topic:

Those .mmd files are the diagram source of truth. This post references them directly rather than embedding generated images.

What Is Still Deferred

The current bridge has security seams, not finished enterprise security. The following remain intentionally deferred:

  • OAuth and authentication
  • user identity, roles, and RBAC
  • real secret storage or vault integration
  • encrypted persistence
  • remote authorization
  • plugin sandboxing
  • signed plugin manifests
  • tamper-evident audit storage
  • SIEM export or compliance reporting

Calling those items out is part of the architecture. It prevents the current plumbing from being oversold, and it gives future work a clear place to land.

Takeaway

VS MCP Bridge does not make tool execution safer by hiding it. It makes execution safer by routing it through a visible boundary that owns policy, approval, secret-reference handling, redaction, audit, classification, correlation, and execution.

That boundary is what keeps compiled tools, discovered tools, and future tools from becoming black boxes. It also gives the project a practical path from today's lightweight seams toward stronger security without pretending that future hardening is already complete.

Next In The Series

The next useful topic is how durable trace artifacts, logs, and validation handoffs turn these seams into operational evidence that another developer or AI session can reconstruct without relying on memory.

VS MCP Bridge Blog Series: Part 5

Source of Truth: docs/ARCHITECTURE.md
Status: Canonical repo cleanup aligned to the current architecture as of 2026-05-16. This post continues the core series from compiled tool execution into discovery, extensibility seams, and traceable tool registration.

Tool Discovery, Extension Seams, and Non-Black-Box Registration

Part 4 explained the compiled bridge tool execution boundary: tools expose descriptors, requests carry correlation metadata, results return structured success or failure, and BridgeToolExecutor owns policy, approval, redaction, audit, and logging.

Part 5 looks at the next question: how do tools get into the catalog without turning extension into a black box?

The current answer is intentionally conservative. Compiled tools are still the default. MEF exists only as a discovery seam. Discovered tools do not execute during discovery, do not bypass the executor, and do not turn the bridge into a production plugin sandbox.

Extensibility Starts With Metadata

A bridge tool is discoverable because it has a descriptor, not because it happens to be a class that can be loaded.

The descriptor gives the catalog and the operator-facing traces a way to answer basic questions before execution:

  • what is the tool id?
  • what is the human-readable name and description?
  • where did it come from?
  • which host does it belong to?
  • does it declare required capabilities?
  • does it require approval before execution?

That metadata is what keeps extensibility understandable. A discovered tool should not appear as an anonymous method call. It should be visible as a catalog entry with an identity.

Compiled Discovery Is The Baseline

The stable path is still compiled discovery.

CompiledBridgeToolDiscovery adapts DI-registered IBridgeTool instances into the catalog. CompiledBridgeToolCatalog collects discovered tools and creates the lookup used by IBridgeToolExecutor.

That means the normal path is simple:

DI registers compiled tools
CompiledBridgeToolDiscovery returns those tools
CompiledBridgeToolCatalog exposes descriptors and lookup
BridgeToolExecutor executes by tool id

This is the path used by the concrete compiled tools such as RegexTextSearchTool and Bm25TextSearchTool. It is predictable, testable, and does not require runtime directory loading.

Why MEF Exists At All

The MEF seam exists because future tool extension should have a place to plug in without changing the executor contract.

But the important word is discovery. MefBridgeToolDiscovery can scan explicitly configured directories for exported IBridgeTool implementations. It can add discovered tools to the same catalog used by compiled tools.

It does not:

  • execute tools during discovery,
  • authorize tool calls,
  • change MCP transport,
  • move Visual Studio commands into tools,
  • add hot reload or dynamic unload,
  • provide production sandboxing,
  • let plugin authors own core audit, redaction, policy, approval, or correlation behavior.

That distinction keeps the seam useful without pretending it is a full plugin platform.

Opt-In Discovery Matters

MEF directory discovery is disabled by default. A host or test has to explicitly enable it through BridgeToolDiscoveryOptions.EnableMefDirectoryDiscovery and provide directories and a search pattern.

That default matters because hidden directory scanning is the kind of behavior that makes tool systems difficult to reason about. The bridge should not silently discover and expose new tools unless the host intentionally opted into that behavior.

The current options are small:

  • EnableMefDirectoryDiscovery,
  • MefDirectories,
  • MefSearchPattern.

This is enough to prove the seam without making policy or packaging decisions prematurely.
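The opt-in gate can be sketched in a few lines of Python. The field names mirror the three options above in snake case. The `"*.dll"` default and the `directories_to_scan` helper are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryOptions:
    enable_mef_directory_discovery: bool = False  # disabled unless the host opts in
    mef_directories: list = field(default_factory=list)
    mef_search_pattern: str = "*.dll"  # assumed default, not taken from the source


def directories_to_scan(options):
    """Return nothing to scan unless discovery was explicitly enabled."""
    if not options.enable_mef_directory_discovery:
        return []
    return list(options.mef_directories)
```

Configuring directories alone is not enough in this sketch. The enable flag has to be set as well, which is what keeps scanning from happening silently.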

Discovery Has Its Own Diagnostics

Discovery can fail in boring ways that are still important during triage:

  • the configured directory is missing,
  • a candidate assembly cannot be loaded,
  • the directory contains no exported bridge tools,
  • the same tool id appears more than once.

The current trace and tests make these cases visible. Missing directories are logged and tolerated. Invalid assembly loads are logged. Discovery completion records the assembly count and tool count. Duplicate tool ids fail at catalog construction because ambiguous identity would corrupt the rest of the execution evidence.

That logging is not decoration. It is the difference between “no tool appeared” and “the configured directory was missing” or “the assembly failed to load.”
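Those diagnostics can be modeled with a short Python sketch. The `discover` function, the logger name, and the `load_tools` callback are hypothetical. Real MEF loading is replaced by a callback so the tolerate-and-log behavior is the only thing on display.

```python
import logging
from pathlib import Path

log = logging.getLogger("bridge.discovery")  # hypothetical logger name


def discover(directories, pattern, load_tools):
    """Scan directories; load_tools(path) returns tool ids or raises on a bad file."""
    assemblies, tools = 0, []
    for directory in map(Path, directories):
        if not directory.is_dir():
            log.warning("discovery directory missing: %s", directory)
            continue  # tolerated: a missing directory does not abort discovery
        for candidate in sorted(directory.glob(pattern)):
            try:
                loaded = load_tools(candidate)
            except Exception as exc:
                log.warning("assembly load failed: %s (%s)", candidate.name, exc)
                continue
            assemblies += 1
            tools.extend(loaded)
    log.info("discovery complete: assemblies=%d tools=%d", assemblies, len(tools))
    return assemblies, tools
```

An empty result with a "directory missing" warning beside it is a very different triage story from an empty result with no log line at all.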

Catalog Registration Is Not Execution

A discovered descriptor only proves that the tool is available. It does not prove the tool ran.

The MEF trace makes that distinction explicit. The fake MEF tool is discovered, its descriptor appears in the catalog, and its execution count remains zero until the executor is called.

The execution path is still:

caller creates BridgeToolRequest
BridgeToolExecutor logs start
catalog resolves tool id
policy evaluates descriptor and request
approval runs only if required
secret references resolve or fail safely
tool ExecuteAsync is invoked
audit envelope records terminal outcome
result preserves request id and operation id

That is the key invariant for future extension work: discovery can contribute tools, but execution remains centralized.
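The zero-execution-count check is easy to picture in Python. Everything in this sketch is a hypothetical stand-in: `FakeTool`, its id, `build_catalog`, and the one-line `execute` are not the project's test fixtures, and the policy, approval, and audit steps are omitted to keep the focus on registration versus execution.

```python
class FakeTool:
    tool_id = "fake.discovered.tool"  # hypothetical id

    def __init__(self):
        self.execution_count = 0

    def execute(self, arguments):
        self.execution_count += 1
        return {"echo": arguments}


def build_catalog(tools):
    """Registering a tool stores it by id and never calls execute."""
    return {tool.tool_id: tool for tool in tools}


def execute(catalog, tool_id, arguments):
    """The only place in this sketch where a tool body runs."""
    return catalog[tool_id].execute(arguments)
```

After `build_catalog` the fake tool is resolvable and its count is still zero. The count changes only when `execute` is called.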

Future Extension Is Not Future Confusion

Extensibility often fails when it adds power faster than it adds evidence.

The bridge takes the opposite approach. Before treating directory-loaded tools as a production plugin story, it establishes observable boundaries:

  • catalog descriptors show what was discovered,
  • logs show discovery start and completion,
  • warnings show missing directories and assembly-load failures,
  • executor logs show actual execution start and completion,
  • audit envelopes show terminal outcomes,
  • request and operation ids connect the path.

That lets future work grow from evidence rather than from guesses.

Relationship To The BlogAI Widget Links

The development BlogAI site now has a preserved TextBox widget settings update that points readers at stable references on main for the architecture document, canonical blog sources, and Mermaid trace files.

That is relevant to this series because the blog itself is part of the same anti-black-box discipline. The site should point readers to durable source files, not temporary branches or chat history. The widget update is documented in widget-settings-row-26512-update-20260516.md.

Related Mermaid Trace Sources

The repo already has Mermaid sources that support this topic:

Those .mmd files are the diagram source of truth. This post references them directly rather than embedding generated images.

Takeaway

Part 5 is about a restraint that matters: extensibility should not bypass observability.

The current model is:

compiled tools are the default
MEF is opt-in discovery only
catalog registration exposes metadata
BridgeToolExecutor remains the execution boundary
logs and audit preserve what happened

That gives the bridge a path toward future tool extension without losing the ability to explain where a tool came from, why it was allowed or denied, and what result it produced.

Next In The Series

The next useful topic is the security and policy seam around tool execution: capability metadata, approval-aware execution, secret references, redaction, audit classification, and the work intentionally deferred until the architecture has stronger production requirements.

VS MCP Bridge Blog Series: Part 4

Source of Truth: docs/ARCHITECTURE.md
Status: Canonical repo cleanup aligned to the current architecture as of 2026-05-16. This post continues the core series from host correctness into compiled bridge tools, catalog/executor boundaries, and approval-aware execution.

Compiled Tools, Execution Boundaries, and Observable Results

Part 3 focused on Visual Studio host correctness: UI-thread-sensitive work, tool-window state, proposal lifecycle ownership, and why IProposalManager keeps approval state from becoming incidental UI behavior.

Part 4 moves one layer deeper into the shared tool execution architecture.

The bridge now has a compiled tool path where shared tools are described, discovered, selected, executed, logged, audited, and returned through a single boundary. That boundary is important because tools are where an AI-assisted system can easily become opaque. A tool call should not be a mystery box. It should have a descriptor, a request, a result, a correlation trail, and a predictable failure shape.

Why A Tool Boundary Exists

The MCP server exposes a small Visual Studio-backed tool surface over stdio, but the shared bridge also needs a place for reusable compiled tools that are not themselves Visual Studio commands.

The design goal is conservative:

callers ask for a bridge tool
catalog resolves the tool
executor owns policy, approval, logging, audit, redaction, and execution
tool returns a structured result

That shape keeps runtime behavior inspectable. A caller should not instantiate arbitrary tool classes and run them directly. If a tool matters enough to be part of the bridge, it should flow through IBridgeToolExecutor.

The Basic Tool Contract

The smallest unit is IBridgeTool. It has two responsibilities:

  • publish a BridgeToolDescriptor,
  • execute a BridgeToolRequest and return a BridgeToolResult.

The descriptor is the tool's contract surface. It gives the bridge enough metadata to explain what the tool is before it runs:

  • Id, Name, and Description,
  • Category, Source, and Host,
  • RequiredCapabilities for future capability-aware policy,
  • ApprovalRequirement for tools that must stop for an approval decision.

The request carries the execution identity:

  • ToolId,
  • RequestId,
  • OperationId,
  • structured arguments.

The result carries the same identity back out:

  • ToolId,
  • RequestId,
  • OperationId,
  • Success,
  • Message,
  • ErrorCode,
  • structured result data.

That identity round trip matters. It lets logs, audit envelopes, tests, and caller-visible results all point to the same operation.
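The round trip can be sketched with two Python dataclasses. These are stand-ins that mirror the field lists above. They are not the real BridgeToolRequest and BridgeToolResult types, and the `result_for` helper is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class BridgeToolRequest:
    tool_id: str
    request_id: str
    operation_id: str
    arguments: dict = field(default_factory=dict)


@dataclass(frozen=True)
class BridgeToolResult:
    tool_id: str
    request_id: str
    operation_id: str
    success: bool
    message: str = ""
    error_code: Optional[str] = None
    data: dict = field(default_factory=dict)


def result_for(request, success, message="", error_code=None, data=None):
    """Build a result that carries the request's identity back out unchanged."""
    return BridgeToolResult(
        request.tool_id, request.request_id, request.operation_id,
        success, message, error_code, data or {},
    )
```

Building every result from the request, on success and failure paths alike, is one simple way to make it hard to return a result that has lost its correlation ids.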

Catalog First, Then Executor

IBridgeToolCatalog answers two questions:

  • what tools are available?
  • can this specific ToolId be resolved?

The current catalog implementation is CompiledBridgeToolCatalog. It builds an in-memory lookup from discovered IBridgeTool instances. Duplicate tool ids fail early, because ambiguous tool identity would make policy, logging, and audit evidence unreliable.

The catalog also tolerates an empty tool set. Empty catalog behavior matters in tests and host composition because it proves the bridge can represent “no tools are registered” without inventing hidden defaults.

Unknown tools fail through the executor as structured results. The caller receives ErrorCode = UnknownTool, and the request and operation ids are preserved. That is the anti-black-box pattern in small form: even a failure has a shape.
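Those three catalog behaviors fit in a short Python sketch. The `DuplicateToolIdError` type, `build_catalog`, and the dictionary-shaped result are assumptions. Only the `UnknownTool` error code is taken from the text above.

```python
class DuplicateToolIdError(ValueError):
    """Raised at catalog construction when two tools share an id."""


def build_catalog(tools):
    """tools is an iterable of (tool_id, callable) pairs; empty input is allowed."""
    catalog = {}
    for tool_id, tool in tools:
        if tool_id in catalog:
            raise DuplicateToolIdError(tool_id)  # fail early, before any execution
        catalog[tool_id] = tool
    return catalog


def execute(catalog, tool_id, request_id, operation_id):
    """Return a structured result even when the tool id cannot be resolved."""
    identity = {"tool_id": tool_id, "request_id": request_id, "operation_id": operation_id}
    tool = catalog.get(tool_id)
    if tool is None:
        return {**identity, "success": False, "error_code": "UnknownTool", "data": None}
    return {**identity, "success": True, "error_code": None, "data": tool()}
```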

Compiled Discovery Is The Default Path

CompiledBridgeToolDiscovery adapts DI-registered compiled tools into the catalog. The default shared registration wires the bridge tool services so callers can resolve:

  • IBridgeToolCatalog,
  • IBridgeToolExecutor,
  • compiled tool implementations such as RegexTextSearchTool and Bm25TextSearchTool.

There is also a MEF discovery seam, but it is discovery-only and explicitly constrained. MEF does not own execution, policy, approval, audit, redaction, or transport. Discovered tools still have to run through BridgeToolExecutor.

BridgeToolExecutor Is The Boundary

BridgeToolExecutor is the important part of the design. It is not just a convenience wrapper around tool.ExecuteAsync. It is the shared execution boundary.

Today that boundary owns:

  • start and completion logging,
  • redacted request and result trace payloads,
  • catalog lookup,
  • unknown-tool failure,
  • IToolExecutionPolicy evaluation,
  • descriptor-declared required capability metadata,
  • approval evaluation when ApprovalRequirement = Required,
  • secret-reference resolution through the broker seam,
  • tool invocation,
  • structured cancellation and exception results,
  • BridgeAuditEnvelope emission,
  • classification metadata for terminal outcomes,
  • request and operation correlation preservation.

That is why callers should not bypass the executor. Bypassing it would also bypass the evidence that makes tool behavior reconstructable.

Approval-Aware Execution

The approval-aware execution seam is intentionally small. A tool descriptor can mark itself as requiring approval. If it does, BridgeToolExecutor asks IToolExecutionApprovalService for a decision after policy evaluation and before tool execution.

If approval is denied, the tool is not invoked. The result is a structured failure with ErrorCode = ApprovalDenied. The audit envelope records the approval requirement, decision, and redacted reason. Correlation metadata is preserved.

This is separate from the Visual Studio proposal approval workflow described in Part 3. Proposal approval is the host UI workflow for applying edits. Tool execution approval is a shared executor checkpoint for selected compiled tools.
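The checkpoint itself is small enough to sketch in Python. The `execute_with_approval` function and its callable parameters are hypothetical, and policy evaluation is assumed to have already passed. The sketch shows only the two properties described above: a denial prevents invocation, and tools that do not require approval never reach the approval service.

```python
def execute_with_approval(approval_required, approve, invoke):
    """Ask for approval only when the descriptor requires it, then invoke."""
    if approval_required and not approve():
        # Denied: the tool body is never called.
        return {"success": False, "error_code": "ApprovalDenied", "data": None}
    return {"success": True, "error_code": None, "data": invoke()}
```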

The First Concrete Proof: Regex Text Search

RegexTextSearchTool is the first concrete proof of the compiled bridge tool path. It is deliberately small:

  • descriptor id: bridge.regexTextSearch,
  • source: compiled,
  • host: shared,
  • arguments: pattern or query, input text or entries, case sensitivity, max results,
  • result data: matches, match count, total match count, and whether results were limited.
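The argument and result shape can be illustrated with a small Python function. This is a sketch under assumptions, not the RegexTextSearchTool implementation: the function name, the default values, and the result key names are hypothetical, and only a single input string is handled.

```python
import re


def regex_text_search(pattern, text, case_sensitive=False, max_results=10):
    """Return limited matches plus the counts needed to tell if output was truncated."""
    flags = 0 if case_sensitive else re.IGNORECASE
    all_matches = [match.group(0) for match in re.finditer(pattern, text, flags)]
    matches = all_matches[:max_results]
    return {
        "matches": matches,
        "match_count": len(matches),
        "total_match_count": len(all_matches),
        "limited": len(all_matches) > len(matches),
    }
```

Reporting both the returned count and the total count lets a caller distinguish "there were only two matches" from "there were more, and the limit cut them off".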

The point is not that regex search is the final search story. The point is that it proves the path:

DI registration
catalog descriptor
executor lookup
policy check
tool invocation
structured result
correlated logs
audit envelope

Bm25TextSearchTool extends the same compiled path with request-scoped in-memory ranking. It does not add persistence, crawling, or a background search service. That restraint matters because the architecture is still proving the boundary before turning it into a broad plugin system.

Tests Make The Boundary Real

The shared tests cover the shape of the boundary rather than only the happy path. They verify that:

  • DI resolves the catalog and executor,
  • compiled tools appear in the catalog,
  • empty catalogs are allowed,
  • duplicate tool ids fail fast,
  • unknown tools return structured failure,
  • fake tools can be invoked through the executor,
  • request and operation ids survive execution,
  • policy denial prevents execution,
  • approval denial prevents execution,
  • normal tools skip approval by default,
  • audit metadata records policy, approval, capabilities, secrets, and classification data.

Those tests are not incidental. They are what stop the executor from becoming a label on top of unstructured tool calls.

Related Mermaid Trace Sources

The repo already has Mermaid sources that show the compiled tool boundary from several angles:

Those .mmd files are the diagram source of truth. This post references them directly rather than embedding generated images.

Takeaway

The compiled bridge tool architecture is valuable because it turns tool execution into an observable contract.

A tool is not just a method call. It has a descriptor, a request, a result, a catalog entry, a policy path, optional approval, redacted logs, an audit envelope, and correlation metadata. That structure gives future tools room to grow without making the runtime harder to understand.

The working rule is simple:

tools can be extensible
execution must stay centralized
evidence must stay reconstructable

That is how the bridge supports future extensibility without becoming black-box infrastructure.

Next In The Series

The next useful topic is how the bridge turns these execution boundaries into durable validation evidence: logs, metadata, diagrams, and handoffs that let future AI sessions reconstruct what actually happened instead of relying on chat history.