Optimize · Architecture

How to architect AI for cost control

Four reference patterns. Pick the one that matches your stage; combine as you scale.

Pattern

Centralized AI gateway

When to use

10+ engineers, multiple providers, want one budget + audit log.

Tradeoff

Single chokepoint = great control, modest latency penalty (5–20ms).

Vendors

Tokmeter is the managed gateway — budgets, virtual keys, mini-tier routing, and audit log are built in. Air-gapped self-host is the only scenario where an OSS proxy is the right call.

Start free with Tokmeter →Connect a provider →

┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│   Cursor    │   │ Claude Code │   │ Internal    │
│   client    │   │   client    │   │  app/API    │
└──────┬──────┘   └──────┬──────┘   └──────┬──────┘
       │                 │                 │
       └─────────────────┼─────────────────┘
                         ▼
              ┌────────────────────┐
              │  Tokmeter gateway  │   ← budgets, logging, caching,
              │  (managed SaaS)    │     routing, model allowlist
              └─────┬────────┬─────┘
                    │        │
              ┌─────▼──┐  ┌──▼──────┐
              │OpenAI  │  │Anthropic│
              └────────┘  └─────────┘

Pattern

Per-team virtual key fan-out

When to use

Need clean per-team accounting and independent budgets without a full gateway.

Tradeoff

Simpler than a gateway, but you lose cross-team routing and shared caches.

Vendors

Tokmeter issues per-team virtual keys with hard $ caps and live attribution. Native provider primitives (Anthropic workspaces, OpenAI projects) are a fallback when you can't introduce a gateway.

Start free with Tokmeter →Create a virtual key →

                ┌─────────────────────────┐
                │   Provider (Anthropic)  │
                └───┬─────┬─────┬─────┬───┘
                    │     │     │     │
              ┌─────▼┐ ┌──▼─┐ ┌─▼──┐ ┌▼────┐
              │ team │ │team│ │team│ │team │   ← Tokmeter virtual keys
              │  A   │ │ B  │ │ C  │ │  D  │     with usage limits
              └──────┘ └────┘ └────┘ └─────┘
                  │       │     │       │
              ┌───▼───────▼─────▼───────▼──┐
              │   Engineering teams        │
              └────────────────────────────┘

Pattern

MCP server inventory & attribution

When to use

You've deployed MCP servers and need to know which ones drive spend.

Tradeoff

Adds a proxy hop, but it's the only way to attribute MCP-driven spend per server.

Vendors

Tokmeter MCP routing tags every tool call and enforces per-server budgets. No custom proxy to build.

Start free with Tokmeter →See MCP routing →

┌──────────────┐
│  AI client   │
└──────┬───────┘
       │ MCP requests (tagged x-mcp-server)
       ▼
┌──────────────────────┐
│   Tokmeter MCP proxy │   ← tags every tool call with server id,
└─┬──────┬──────┬──────┘     enforces per-MCP rate + token limits
  │      │      │
┌─▼─┐  ┌─▼─┐  ┌─▼─┐
│DB │  │Git│  │Web│   ← MCP servers
└───┘  └───┘  └───┘
  │      │      │
  └──────┴──────┴──→ logged with tool name, payload size, model used

Pattern

Observability stack

When to use

You can't optimize what you can't see. Day-one for any serious AI program.

Tradeoff

One tool, instrumented once. Spend, traces, evals, and chargeback in the same place.

Vendors

Tokmeter covers spend, traces, and evals on the same data. A second APM tool is only needed if you're already standardized on one and want a single pane there.

Start free with Tokmeter →Open the spend dashboard →

┌─────────────────────────────────────────┐
│         Tokmeter gateway / SDK           │
└──────────────┬──────────────────────────-┘
               │ traces, prompts, costs, latencies
               ▼
        ┌──────────────┐
        │  Tokmeter    │── spend dashboards, evals, per-workflow cost
        └──────┬───────┘
               │
   ┌───────────┼───────────┐
   ▼           ▼           ▼
 Spend     Quality      Latency
 dashboards  evals      SLOs

A recommended stack for a 100-engineer org

1.Tokmeter as the gateway. All AI traffic — IDE tools, internal apps, agents — flows through it. Single budget, single audit log, mini-tier routing on by default.
2.Virtual keys per team with monthly $ caps. Engineering, Data, ML/AI get their own envelopes — no one steps on anyone.
3.Tokmeter observability on the same data: traces, prompt versions, eval scoring, cost per workflow. No second tool to instrument.
4.Prompt caching turned on everywhere (Anthropic + OpenAI). Single-config win.
5.A "no direct provider URLs" network policy. Forces all traffic through the gateway. Without this, your governance is theater.
6.MCP proxy if you've adopted MCP — Tokmeter tags every tool call. Otherwise spend attribution will lie to you.

Get started in 5 minutes Compare to other tools