Buy or build agent security controls for MCP?
Bottom Line: For regulated teams, the decision is not whether to add security controls to an MCP-based agent stack — it is where to place the enforcement boundary. Buying a policy gateway and audit tooling (such as microsoft/mcp-gateway) typically compresses time-to-control from months to weeks and offloads telemetry and access-control infrastructure to a vendor. Building enforcement inside the MCP stack preserves tighter ownership over scopes, audit logs, and approval paths — critical for SOC 2 and HIPAA environments where evidentiary integrity and log chain-of-custody cannot be delegated to a third party. The practical answer for most regulated teams is a split: buy the policy gateway and observability layer, build the approval workflows and incident-response logic in-house. The minority case for full in-house builds exists, but only where the engineering capacity and compliance mandate explicitly justify the maintenance burden.
MCP's authorization specification establishes a protocol-native security foundation through OAuth 2.1 conventions, but it explicitly does not deliver a complete governance stack. Regulated teams sourcing agent security tooling face a landscape where vendors sell MCP-adjacent gateway products that centralize policy enforcement and audit logging, while open-source frameworks like LangGraph and the OpenAI Agents SDK leave those controls under application ownership. Neither path is free: vendor tooling introduces third-party data custody and contract risk; custom builds carry sustained engineering overhead across every tool update and compliance cycle.
What regulated teams are actually buying when they buy MCP security
What is an MCP gateway? An MCP gateway is a reverse proxy and control plane that sits between MCP clients (Claude Desktop, LangGraph orchestrators, OpenAI Agents SDK workflows) and MCP servers, enforcing access control, routing, and observability at the boundary rather than inside each server. Microsoft's mcp-gateway describes itself as "a reverse proxy" and "a control plane for managing the MCP server lifecycle (deploy, update, delete)" with enterprise integration points including telemetry, access control, and observability.
When a regulated team purchases agent security tooling in the MCP context, it is typically buying four distinct control surfaces, each with different outsource-vs-retain logic:
| Control Surface | Typically Outsourced to Vendor | Typically Retained In-House |
|---|---|---|
| Policy gateway enforcement (allow/deny/rate-limit at boundary) | ✅ Faster to deploy; vendor maintains rule engine | ❌ When policy logic encodes proprietary business rules |
| Telemetry and observability (latency, error rates, tool call volume) | ✅ Standard infrastructure; low compliance sensitivity | ❌ When telemetry contains PII or PHI that cannot leave your perimeter |
| Audit logs with evidentiary integrity | ❌ Chain-of-custody risk if vendor controls log writes | ✅ Log signing, retention schedules, and SIEM integration stay internal |
| Approval workflows and human-in-the-loop gates | ❌ Approval UX tied to internal identity and role structure | ✅ Approval state, exception handling, and audit trail stay internal |
This split is where vendor pitches tend to blur the picture: they often bundle all four surfaces together, even though a regulated team usually should not delegate every control to the same external system.
Policy enforcement at the boundary
Destructive agent actions — deleting files, sending email, executing financial transactions — should be blocked or held at the gateway boundary, not inside the LLM's reasoning loop. The MCP tools specification explicitly states that implementations should present confirmation prompts to users for operations to ensure a human is in the loop. That recommendation maps directly to a policy gateway function: intercept the tool call before execution, evaluate it against a policy, and either approve, deny, or route to a human approver.
| Enforcement Placement | Latency Impact | Audit Coverage | Failure Mode if Compromised |
|---|---|---|---|
| Inside the LLM prompt (soft guardrails) | Minimal | None — no external record | Prompt injection bypasses entirely |
| Inside the MCP server handler | Low | Server-side only | Lateral movement to other servers undetected |
| At a gateway/reverse proxy (boundary) | Moderate, but still operationally acceptable for most deployments | Full cross-server visibility | Single point of failure; gateway bypass exposes all servers |
| Client-side orchestrator (LangGraph, Agents SDK) | Low | Orchestrator logs only | Orchestrator compromise = full bypass |
Boundary enforcement at a dedicated gateway is the correct default for destructive actions. The trade-off is that it introduces a single point of failure: if the gateway is misconfigured or bypassed, no downstream server has independent enforcement. Defense in depth requires server-side validation as a secondary check even when a gateway is present. In a Zero Trust Architecture, this boundary check matters because no client or tool should be treated as implicitly trusted after initial authentication.
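The intercept, evaluate, and route flow described above can be sketched in a few lines. This is a minimal illustration, not the behavior of any particular gateway product: the tool names, the `ToolCall` shape, and the `evaluate` function are all hypothetical, and a real deployment would load policy from configuration rather than hard-code it.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    HOLD_FOR_APPROVAL = "hold_for_approval"


@dataclass(frozen=True)
class ToolCall:
    agent_id: str
    tool_name: str
    arguments: dict


# Hypothetical policy data; tool names are illustrative only.
ALLOWED_TOOLS = {"search_docs", "read_file", "delete_file", "send_email"}
DESTRUCTIVE_TOOLS = {"delete_file", "send_email"}


def evaluate(call: ToolCall) -> Decision:
    """Decide on a tool call before it is forwarded to any MCP server."""
    if call.tool_name not in ALLOWED_TOOLS:
        # Default-deny: anything not on the allowlist never reaches a server.
        return Decision.DENY
    if call.tool_name in DESTRUCTIVE_TOOLS:
        # Destructive actions are held for a human approver.
        return Decision.HOLD_FOR_APPROVAL
    return Decision.ALLOW


if __name__ == "__main__":
    for name in ("read_file", "delete_file", "drop_database"):
        call = ToolCall(agent_id="agent-1", tool_name=name, arguments={})
        print(name, "->", evaluate(call).value)
```

The same check can run a second time inside each server handler, which is the defense-in-depth arrangement the paragraph above recommends.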
Audit logs, approvals, and incident response ownership
The controls easiest to outsource are telemetry and access-control enforcement; the controls that must stay internal under SOC 2 and HIPAA are audit log integrity, approval chain-of-custody, and incident-response workflows. OpenAI's Agents SDK documentation explicitly states that applications own orchestration, tool execution, approvals, and state when using the SDK's application-owned path — meaning a build path keeps those controls in-house by design.
| Control | Outsource Risk | In-House Cost | Compliance Implication |
|---|---|---|---|
| Audit log storage | Log tampering by vendor; retention gaps | Storage + SIEM integration engineering | SOC 2 and HIPAA evidence requests are harder to satisfy when log custody is external |
| Approval workflow UX | Vendor outage blocks agent operations | UX build + role/identity integration | Internal role hierarchy must map to approver assignments |
| Incident response runbooks | Vendor SLA may not match your RTO | Security team must author and test | Incident evidence must be producible under regulatory inquiry |
| Security alerting | Alert routing depends on vendor uptime | Pager/SIEM integration in-house | False-negative alerts during an incident are a compliance finding |
The pattern for regulated teams is: buy the gateway for enforcement velocity, retain the log sink and approval state in infrastructure you control, and own the incident-response runbooks regardless of which layer generated the alert.
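One way to keep the log sink under your own control is to have every enforcement layer emit the same structured record to infrastructure you operate. The sketch below assumes a hypothetical `AuditRecord` schema and JSON Lines as the sink format; neither comes from the MCP specification or from any vendor, and the field names are placeholders to adapt to your SIEM.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    # Hypothetical schema; adapt field names to your own SIEM.
    timestamp: str
    agent_id: str
    tool_name: str
    decision: str
    approver: Optional[str]
    source_layer: str  # which layer produced the record, e.g. "gateway"


def make_record(agent_id: str, tool_name: str, decision: str,
                approver: Optional[str] = None,
                source_layer: str = "gateway",
                now: Optional[datetime] = None) -> AuditRecord:
    moment = now or datetime.now(timezone.utc)
    return AuditRecord(moment.isoformat(), agent_id, tool_name,
                       decision, approver, source_layer)


def to_log_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line for an in-house sink."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    print(to_log_line(make_record("agent-7", "delete_file", "hold_for_approval")))
```

Recording `source_layer` on every entry keeps the incident-response runbook independent of which layer raised the alert.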
What the MCP standards body already gives you
MCP's authorization specification provides OAuth 2.1-based trust establishment between clients and servers — it does not provide a governance stack. As the MCP authorization documentation states directly: "MCP uses standardized authorization flows to build trust between MCP clients and MCP servers" and "Its design doesn’t focus on one specific authorization or identity system, but rather follows the conventions outlined for OAuth 2.1." The draft MCP authorization specification further mandates that MCP servers MUST implement OAuth 2.0 Protected Resource Metadata (RFC 9728), and authorization servers and MCP clients MAY support OAuth 2.0 Dynamic Client Registration (RFC 7591).
| Protocol Capability | Delivered by MCP Standard | Requires Additional Build or Buy |
|---|---|---|
| Client-server trust establishment | ✅ OAuth 2.1 flows | — |
| Protected Resource Metadata | ✅ RFC 9728 (MUST implement) | — |
| Dynamic Client Registration | ✅ RFC 7591 (MAY support) | — |
| Tool-level access policies | ❌ | Policy gateway or custom middleware |
| Structured audit logs | ❌ | External log sink or vendor tooling |
| Human approval gates | ❌ | Orchestrator-level or gateway-level build |
| Prompt injection defense | ❌ | Input/output validation (build or buy) |
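To make the Protected Resource Metadata row concrete, the sketch below builds the kind of JSON document RFC 9728 describes and derives the well-known URL where a client would fetch it. The host names and scope strings are placeholders, and the helper functions are illustrative rather than part of any MCP SDK; the field names (`resource`, `authorization_servers`, `scopes_supported`, `bearer_methods_supported`) are taken from RFC 9728.

```python
import json
from urllib.parse import urlsplit

WELL_KNOWN_SUFFIX = "/.well-known/oauth-protected-resource"


def metadata_url(resource: str) -> str:
    """Insert the well-known segment between the host and any resource path."""
    parts = urlsplit(resource)
    path = parts.path.rstrip("/")
    return f"{parts.scheme}://{parts.netloc}{WELL_KNOWN_SUFFIX}{path}"


def build_metadata(resource: str, authorization_servers: list,
                   scopes: list) -> dict:
    """Assemble a minimal metadata document using RFC 9728 field names."""
    return {
        "resource": resource,
        "authorization_servers": authorization_servers,
        "scopes_supported": scopes,
        "bearer_methods_supported": ["header"],
    }


if __name__ == "__main__":
    # Placeholder hosts and scopes, for illustration only.
    resource = "https://mcp.example.com"
    doc = build_metadata(resource, ["https://auth.example.com"],
                         ["inventory:read", "inventory:write"])
    print(metadata_url(resource))
    print(json.dumps(doc, indent=2))
```

Note what the document does not contain: nothing in it says which tools are allowed, who approves them, or where calls are logged. That gap is the right-hand column of the table above.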
Authorization, tool discovery, and scoped access
MCP's built-in authorization reduces identity plumbing work but does not resolve the governance question of which tools are exposed and under what policy. The MCP authorization specification describes a pattern where an agent calls a secure MCP tool to check inventory at a specific store without impersonating the end user — the server acts on behalf of the agent with its own credentials. This is useful: it eliminates the need to pass user tokens through the agent chain.
| Scoped Access Question | MCP Authorization Answers | You Still Decide |
|---|---|---|
| How does the client authenticate to the server? | OAuth 2.1 token flow | Which scopes to grant per tool |
| Can the agent act without impersonating the user? | Yes — server acts with its own credentials | Whether that privilege is appropriate for destructive tools |
| How are new clients registered? | Dynamic Client Registration (RFC 7591) | Whether to allow dynamic registration or lock to pre-approved clients |
| Which tools does the server advertise? | Tool discovery via MCP protocol | Which tools to expose at all — this is a governance decision |
The buy-vs-build impact: MCP's authorization removes the need to invent token exchange from scratch, giving a build path a head start on identity plumbing. But it does not remove the need to decide — and enforce — which tools are accessible under which conditions.
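The governance decision in the last table row can be expressed as a mapping from tools to the scope each one requires. The following is a minimal sketch with invented tool and scope names; MCP does not define this mapping, so treat it as one possible shape for custom middleware or a gateway rule set.

```python
# Hypothetical mapping; tool and scope names are illustrative assumptions.
TOOL_REQUIRED_SCOPE = {
    "inventory.check": "inventory:read",
    "inventory.adjust": "inventory:write",
    "orders.refund": "orders:refund",
}


def visible_tools(granted_scopes: set) -> list:
    """Return only the tools a client's granted scopes entitle it to see."""
    return sorted(
        tool for tool, scope in TOOL_REQUIRED_SCOPE.items()
        if scope in granted_scopes
    )


def may_call(tool: str, granted_scopes: set) -> bool:
    """Deny unknown tools outright; otherwise require the mapped scope."""
    required = TOOL_REQUIRED_SCOPE.get(tool)
    return required is not None and required in granted_scopes


if __name__ == "__main__":
    scopes = {"inventory:read"}
    print(visible_tools(scopes))
    print(may_call("orders.refund", scopes))
```

Filtering the advertised tool list and checking again at call time are two separate controls; the second still matters when a client calls a tool it was never shown.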
Why protocol support does not remove governance work
OAuth 2.1 authorization establishes trust; it does not prevent a trusted agent from executing a prompt-injected instruction. The MCP security best practices page is explicitly framed as complementary to the authorization specification, identifying session hijacking, tool poisoning, and prompt injection as security risks that persist after authorization is configured. The MCP tools specification requires servers to validate all tool inputs and sanitize tool outputs to reduce the risk of injection attacks.
| Residual Risk After OAuth 2.1 Auth | Mitigated by Protocol? | Requires Policy Gate or Audit? |
|---|---|---|
| Prompt injection via tool response | ❌ | Input/output validation at gateway or server |
| Tool poisoning (malicious server advertising destructive tools) | ❌ | Tool allowlist enforcement |
| Session hijacking | ❌ | Token rotation, short-lived credentials |
| Unauthorized data exfiltration via allowed tool | ❌ | Audit logs + anomaly detection |
| Policy gateway bypass via direct server access | ❌ | Network segmentation + server-side auth |
The practical implication: a team that deploys MCP with OAuth 2.1 and no policy gate has completed only the identity portion of the governance work for a regulated environment. The remaining work — injection defense, audit coverage, approval flows, and incident response — lives outside the protocol.
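As one example of a control that lives outside the protocol, the short-lived-credential mitigation for session hijacking can be enforced with a lifetime check on token claims. This sketch assumes the token signature has already been verified by a proper OAuth/JWT library and only inspects the standard `iat` and `exp` claims; the 15-minute ceiling is an arbitrary illustrative policy, not a value from the MCP documentation.

```python
import time
from typing import Optional

# ASSUMPTION: a 15-minute ceiling, chosen only for illustration.
MAX_TOKEN_LIFETIME_SECONDS = 15 * 60


def token_is_acceptable(claims: dict, now: Optional[float] = None) -> bool:
    """Reject expired tokens and tokens issued with an overly long lifetime.

    `claims` is assumed to come from a token whose signature was already
    verified elsewhere; this function does no cryptographic checking.
    """
    current = time.time() if now is None else now
    issued_at = claims.get("iat")
    expires_at = claims.get("exp")
    if issued_at is None or expires_at is None:
        return False  # fail closed when lifetime cannot be determined
    if current >= expires_at:
        return False
    return (expires_at - issued_at) <= MAX_TOKEN_LIFETIME_SECONDS


if __name__ == "__main__":
    print(token_is_acceptable({"iat": 900, "exp": 1500}, now=1000))
```

A check like this belongs on the server as well as the gateway, so that direct server access does not skip it.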
Cost, integration effort, and ownership over a 12-month horizon
Over a 12-month horizon, the dominant cost difference between buying and building is not licensing versus engineering salaries — it is ongoing maintenance drag. No authoritative public TCO benchmark specific to MCP security tooling versus custom build exists in the primary source materials; the figures below are scenario estimates based on the control surfaces described in the official documentation.
Watch Out: Treat all cost ranges in this section as scenario estimates, not benchmarks. Actual spend depends heavily on team size, tool count, compliance scope, and vendor contract terms.
| Scenario | Year-1 Spend Estimate | Primary Cost Drivers | Ongoing Annual Drag |
|---|---|---|---|
| Buy gateway + buy audit tooling | Scenario estimate: roughly $80K–$200K for a mid-market SaaS deployment | Vendor licensing, integration engineering, contract negotiation | Licensing renewals, vendor audit prep, SLA monitoring |
| Buy gateway + build audit/approvals in-house | Scenario estimate: roughly $40K–$120K plus about 1–2 engineer-quarters | Gateway licensing + internal log sink + approval workflow build | Gateway renewals + internal maintenance for approval logic |
| Build everything in-house | Scenario estimate: roughly 3–5 engineer-quarters upfront | Gateway, policy engine, log pipeline, approval UX, incident runbooks | Full maintenance burden across every MCP protocol update |
| No explicit security layer | Near-zero upfront | Technical debt | Compliance remediation cost at audit time — typically higher than prevention |
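To compare the rows above on one axis, engineer-quarters have to be converted to currency. The sketch below does that arithmetic using the table's own ranges plus one loudly labeled assumption: a loaded cost of $60,000 per engineer-quarter, which is a placeholder to replace with your own figure, not a benchmark.

```python
# ASSUMPTION: placeholder loaded cost per engineer-quarter; substitute your own.
ENGINEER_QUARTER_COST = 60_000

# Ranges copied from the scenario table above: (low, high).
SCENARIOS = {
    "buy_gateway_and_audit": {"cash": (80_000, 200_000), "engineer_quarters": (0, 0)},
    "buy_gateway_build_approvals": {"cash": (40_000, 120_000), "engineer_quarters": (1, 2)},
    "build_everything": {"cash": (0, 0), "engineer_quarters": (3, 5)},
}


def year_one_range(name: str, cost_per_quarter: int = ENGINEER_QUARTER_COST) -> tuple:
    """Return (low, high) year-1 spend with engineer time priced in."""
    scenario = SCENARIOS[name]
    low = scenario["cash"][0] + scenario["engineer_quarters"][0] * cost_per_quarter
    high = scenario["cash"][1] + scenario["engineer_quarters"][1] * cost_per_quarter
    return low, high


if __name__ == "__main__":
    for name in SCENARIOS:
        low, high = year_one_range(name)
        print(f"{name}: ${low:,} to ${high:,}")
```

Under that assumed rate the three paths overlap heavily in year one, which is consistent with the point that ongoing maintenance drag, not upfront spend, is what separates them.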
Upfront spend versus time-to-control
A purchased gateway reaches usable governance — policy enforcement, telemetry, access control — in weeks; a custom build typically requires a full quarter before the first production enforcement policy is live. Microsoft's mcp-gateway delivers telemetry, access control, observability, and lifecycle management as pre-built capabilities. The OpenAI Agents SDK explicitly positions application-owned approvals and state as the engineering team's responsibility in the build path — meaning a team that builds must supply and maintain those surfaces from scratch.
| Capability | Buy Path (Gateway Product) | Build Path (In-House) |
|---|---|---|
| Policy gateway enforcement live | Scenario estimate: 2–6 weeks for integration and config | Scenario estimate: 8–16 weeks for design, build, and test |
| Telemetry and observability | Included at gateway | 4–8 weeks (log pipeline + dashboards) |
| Security alerting | Vendor-provided alert templates | Internal SIEM rule authoring |
| Approval workflow | Often limited or absent in gateway products | Custom build required regardless of path |
| Incident response runbooks | Vendor guidance only | Must be authored in-house either way |
The approval workflow row is the critical insight: gateway products rarely eliminate the need to build approval UX and exception-handling logic internally. Both paths require that investment; the buy path only shortens the time to the first enforced policy.
Ongoing ownership: logging, approvals, and response workflows
The hidden cost of the build path is not the initial sprint — it is the recurring engineering cost of keeping approval logic and audit log pipelines current as MCP protocol semantics evolve. The MCP tools specification recommends confirmation prompts for tool operations; as the protocol evolves, any custom approval UX tied to tool invocation semantics must be updated. The Agents SDK documentation places approval state under application ownership, which means every SDK version update that changes the approval or state API surface requires an internal engineering response.
| Recurring Operation | Buy Path Ownership | Build Path Ownership | Compliance Impact if Neglected |
|---|---|---|---|
| Audit log retention and rotation | Shared (vendor stores; team configures) | Full in-house | SOC 2 / HIPAA audit findings |
| Approval exception handling | Team authors exceptions; vendor routes | Full in-house | Runaway agent actions under emergency access |
| Incident response evidence collection | Vendor provides export; team assembles | Full in-house | Regulatory inquiry cannot be satisfied |
| Protocol update compatibility | Vendor ships updates | Team engineers update | Policy gaps after MCP version upgrade |
Decision framework for regulated environments
The MCP security decision is not binary — it is a boundary placement question. The MCP security best practices documentation explicitly treats security as an additive layer over authorization, not a replacement for governance. The gateway versus custom-build choice applies differently to each control surface.
| Control Surface | Buy | Build | Split |
|---|---|---|---|
| Policy gateway enforcement | ✅ Compliance deadlines, small security team | ✅ Proprietary policy logic, data residency constraint | ✅ Buy gateway, add custom rules via policy-as-code |
| Audit log integrity | ❌ Chain-of-custody risk for HIPAA/SOC 2 | ✅ SIEM integration, log signing in-house | ✅ Buy telemetry, retain log sink in-house |
| Approval workflows | ❌ Vendor UX rarely maps to internal roles | ✅ Own identity hierarchy, exception logic | Rare — approvals almost always built in-house |
| Incident response | ❌ Vendor SLA ≠ your RTO | ✅ Security team owns runbooks | ✅ Buy alerting, build response procedures |
The split rule is simple: buy the enforcement layer when speed, telemetry, and boundary control are the priority; build the internal workflows when audit integrity, approval ownership, or proprietary policy logic must remain inside your trust boundary.
When buying is the safer default
Teams with compliance deadlines measured in weeks, limited security engineering capacity, or no existing agent governance infrastructure should default to purchasing a gateway product. The mcp-gateway control plane delivers telemetry, access control, observability, and lifecycle management as day-one capabilities. For a team that needs to demonstrate policy enforcement to an auditor before a SOC 2 assessment window closes, shipping a custom gateway is the wrong trade-off.
| Signal | Buy Is Appropriate |
|---|---|
| First MCP deployment with no existing security infrastructure | ✅ |
| Compliance deadline within one quarter | ✅ |
| Security team headcount under 3 FTEs | ✅ |
| No requirement for log custody to remain fully on-premises | ✅ |
| Vendor is established and holds a SOC 2 attestation | ✅ |
| Audit scope limited to tool-call volume and access patterns (not content) | ✅ |
Before signing, verify the vendor's SOC 2 Type II report, confirm their log retention window matches your regulatory minimum, and test their incident-response SLA against your actual RTO. These details are often absent from marketing materials and must be contractually specified. Microsoft's mcp-gateway lists telemetry, access control, observability, and lifecycle management among its enterprise-facing capabilities, which makes it a reasonable first candidate to evaluate.
When building inside the MCP stack is justified
Building is justified when regulatory requirements demand log chain-of-custody you cannot contractually obtain from a vendor, when your policy logic encodes proprietary business rules that cannot be exposed externally, or when your MCP tool surface changes frequently enough that a vendor's update cycle becomes a governance gap. The OpenAI Agents SDK guidance explicitly places orchestration, tool execution, approvals, and state under application ownership — which is appropriate when tighter control is the requirement, not a preference.
| Signal | Build Is Justified |
|---|---|
| HIPAA or PCI DSS requires log content to remain on-premises | ✅ |
| Policy logic encodes proprietary business rules or client-specific scopes | ✅ |
| MCP tool surface changes weekly (rapid product iteration) | ✅ |
| Audit log evidentiary chain must be demonstrably unbroken | ✅ |
| Policy gateway rules must be version-controlled alongside application code | ✅ |
| Security engineering team can sustain 0.5–1 FTE of ongoing MCP security maintenance | ✅ |
The maintenance cost is real: every MCP specification update that affects tool invocation semantics, authorization flows, or session management requires an internal engineering response. Teams that build should budget that response explicitly rather than treating it as overhead.
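The buy and build signal tables above can be combined into a rough triage function. This is a deliberately simplified sketch: the `Signals` fields are a hypothetical subset of the rows above, and the rule ordering is one reading of the split rule rather than a formal decision procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    # Hypothetical subset of the signal tables above.
    deadline_within_one_quarter: bool
    security_ftes: int
    logs_must_stay_on_premises: bool
    proprietary_policy_logic: bool
    can_sustain_maintenance_fte: bool


def recommend(s: Signals) -> str:
    """Return 'buy', 'build', or 'split' from a handful of signals."""
    buy_pressure = s.deadline_within_one_quarter or s.security_ftes < 3
    build_pressure = s.logs_must_stay_on_premises or s.proprietary_policy_logic
    if build_pressure and not buy_pressure and s.can_sustain_maintenance_fte:
        return "build"
    if build_pressure:
        # Custody or policy constraints exist, but speed or capacity argues
        # for buying the enforcement layer and keeping the rest in-house.
        return "split"
    return "buy"


if __name__ == "__main__":
    print(recommend(Signals(True, 2, True, False, False)))
```

The function returns "build" only when the team can sustain the ongoing maintenance, which mirrors the budgeting point in the paragraph above.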
Vendor risk, compliance gaps, and failure modes to test before you sign
The questions that expose vendor risk in MCP security tooling are not about feature checklists — they are about what breaks under audit and what fails during a live incident. The MCP security best practices documentation identifies session hijacking and prompt injection as live attack vectors for MCP implementations. The MCP tools specification requires input and output validation to prevent injection attacks. Neither the protocol nor a vendor product eliminates these risks automatically — they are test conditions for any agent security tooling deployment.
| Failure Mode | Vendor Risk Signal | In-House Risk Signal |
|---|---|---|
| Prompt injection bypasses policy gateway | Vendor's input validation not configurable | Custom validation logic missed edge cases |
| Tool poisoning via unvetted MCP server | Vendor tool allowlist not enforced at boundary | No allowlist enforcement built |
| Session hijacking via stolen token | Vendor's token rotation interval too long | Token rotation not implemented |
| Audit log gap during vendor outage | Vendor SLA doesn't cover log delivery | Log pipeline not redundant |
| Policy exception escalates silently | Vendor approval path has no fallback | Exception logic not tested under load |
Questions to ask about data retention and log integrity
External audit tooling creates a new audit surface: the vendor's own log pipeline becomes an evidentiary dependency. The Agents SDK documentation confirms that applications can retain approval state and logs in-house; the mcp-gateway repository advertises telemetry and observability, but no public retention-period guarantee or log-integrity SLA appears in the available source materials.
| Question | Why It Matters | Red Flag Answer |
|---|---|---|
| What is the minimum log retention period, and is it contractually guaranteed? | Audit evidence can only be produced for periods the retained logs cover | "Configurable by customer" without a minimum floor |
| Are audit logs write-once and cryptographically signed? | Evidentiary integrity requires tamper evidence | "Logs are stored in our database" with no signing mechanism |
| Can you export the complete log corpus on demand? | Regulatory inquiry requires full log production | Export limited by rate or record count |
| Where do log writes fail when the vendor's pipeline is degraded? | Gaps in log coverage are audit findings | "Best-effort delivery" during outages |
| Who controls the encryption keys for stored logs? | Key custody affects evidentiary chain | Vendor holds master keys with no customer-managed key option |
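For the tamper-evidence question, a hash chain is one common way to make after-the-fact edits detectable: each entry's hash covers the previous entry's hash, so altering any record breaks every hash that follows it. The sketch below is a minimal in-memory illustration under that assumption; it is not a description of how any vendor stores logs, and production use would add signing with customer-held keys and durable, append-only storage.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash that anchors the first entry


def _digest(prev_hash: str, event: dict) -> str:
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list, event: dict) -> dict:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {"event": event, "prev_hash": prev_hash,
             "hash": _digest(prev_hash, event)}
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Recompute every hash; any edited or reordered entry fails the check."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"tool": "read_file", "decision": "allow"})
    append_entry(log, {"tool": "delete_file", "decision": "deny"})
    print(verify_chain(log))
    log[0]["event"]["decision"] = "deny"  # simulate tampering
    print(verify_chain(log))
```

A vendor that can answer the signing question should be able to describe an equivalent verification step that you can run yourself on an exported log corpus.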
Failure modes in approval paths and policy exceptions
Approval workflows fail in two predictable ways under real operational pressure: humans route around them under urgency, or the exception path becomes the default path. The MCP tools specification recommends that implementations present confirmation prompts for tool operations to keep a human in the loop. The Agents SDK documentation frames approvals as application-owned, meaning emergency-access logic and exception handling are the implementer's responsibility in either path.
| Failure Mode | In-House Symptom | Vendor Symptom |
|---|---|---|
| Emergency access bypass | Ops team creates standing "break-glass" exception that never expires | Vendor emergency access path not logged at same fidelity as standard path |
| Approval queue saturation | Agents block on pending approvals; team creates blanket pre-approvals | Vendor approval SLA degrades under load; timeouts auto-approve |
| Policy exception becomes policy | One-time exception propagates to production policy without review | Vendor configuration drift not captured in change log |
| Approver availability gap | No backup approver defined; single person becomes bottleneck | Vendor on-call for approval system failures not mapped to your incident response |
Test these scenarios explicitly before go-live. For regulated environments, the test protocol should include: simulating a saturated approval queue, revoking an approver's access mid-session, injecting a prompt into a tool response to verify gateway-level filtering, and triggering a gateway outage to confirm log continuity.
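Two of the failure modes in the table, timeouts that auto-approve and break-glass exceptions that never expire, come down to default behavior that can be pinned in code and tested. The sketch below is a minimal illustration with hypothetical names and a 5-minute timeout chosen arbitrarily; it shows a fail-closed timeout and an expiring emergency grant, not a complete approval system.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: arbitrary illustrative timeout.
APPROVAL_TIMEOUT_SECONDS = 300


@dataclass(frozen=True)
class ApprovalRequest:
    tool_name: str
    requested_at: float
    decided: Optional[bool] = None  # None means no approver has responded


def resolve(request: ApprovalRequest, now: float) -> str:
    """Resolve a request; an unanswered request times out to a denial."""
    if request.decided is True:
        return "approved"
    if request.decided is False:
        return "denied"
    if now - request.requested_at >= APPROVAL_TIMEOUT_SECONDS:
        return "denied_timeout"  # fail closed, never auto-approve
    return "pending"


@dataclass(frozen=True)
class BreakGlassGrant:
    granted_to: str
    expires_at: float  # every emergency grant carries an expiry


def break_glass_active(grant: BreakGlassGrant, now: float) -> bool:
    return now < grant.expires_at


if __name__ == "__main__":
    req = ApprovalRequest("delete_file", requested_at=0.0)
    print(resolve(req, now=10.0))
    print(resolve(req, now=300.0))
```

A saturated-queue test then reduces to submitting many requests with no approver response and asserting that none of them resolve to "approved".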
FAQ
How do you secure MCP?
Security for MCP deployments requires layering controls above the protocol's native OAuth 2.1 authorization. The MCP security best practices documentation identifies prompt injection, session hijacking, and tool poisoning as active risk categories. Minimum effective controls include: a policy gateway enforcing tool allowlists and destructive-action gates; structured audit logs with tamper evidence; input/output validation at the server or gateway boundary; and human-in-the-loop approval for irreversible actions.
Does MCP support authorization?
Yes, with an important scope caveat. MCP Authorization uses OAuth 2.1 conventions for client-server trust and mandates OAuth 2.0 Protected Resource Metadata (RFC 9728). This handles authentication and delegated access — it does not handle tool-level policy enforcement, audit logging, or approval workflows. Those require additional controls.
What is an MCP gateway?
A reverse proxy and control plane that enforces access, telemetry, and tool routing at the boundary. microsoft/mcp-gateway is one open-source example that describes itself in these terms.
Should I build or buy security tooling?
Buy the gateway for enforcement velocity; build or tightly control audit logs and approvals.
What are the risks of MCP in enterprise?
Prompt injection, tool poisoning, session hijacking, audit log gaps, and approval path bypass are the main failure modes to test.
Which audit logs must stay in-house?
Logs containing PHI, PII, or approval chain-of-custody for HIPAA or SOC 2 scope should stay in-house.
Sources & References
- Model Context Protocol — Official Site — Home page for the open-source MCP standard; confirms broad ecosystem support, tool discovery, and delegated agent actions
- MCP Authorization Tutorial — Official documentation describing OAuth 2.1-based authorization flows and the MCP trust model
- MCP Authorization Specification (Draft) — Normative specification mandating RFC 9728 (Protected Resource Metadata) and permitting RFC 7591 (Dynamic Client Registration)
- MCP Authorization Specification (2025-03-26) — Stable version describing agent-credential patterns and scoped access examples
- MCP Security Best Practices — Official guidance identifying prompt injection, session hijacking, and tool poisoning as MCP-specific attack vectors; explicitly complementary to the authorization spec
- MCP Tools Specification (2025-06-18) — Normative spec requiring input/output validation and human-in-the-loop confirmation prompts for tool operations
- microsoft/mcp-gateway — GitHub Repository — Microsoft's MCP Gateway described as a reverse proxy and control plane with telemetry, access control, observability, and lifecycle management
- OpenAI Agents SDK Guide — OpenAI's guidance distinguishing application-owned orchestration, approvals, and state from hosted editor paths