T
tokmeter.ai
Optimize · Architecture

How to architect AI for cost control

Four reference patterns. Pick the one that matches your stage; combine as you scale.

Pattern

Centralized AI gateway

When to use
10+ engineers, multiple providers, want one budget + audit log.
Tradeoff
Single chokepoint = great control, modest latency penalty (5–20ms).
Vendors
Tokmeter is the managed gateway — budgets, virtual keys, mini-tier routing, and audit log are built in. Air-gapped self-host is the only scenario where an OSS proxy is the right call.
┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│   Cursor    │   │ Claude Code │   │ Internal    │
│   client    │   │   client    │   │  app/API    │
└──────┬──────┘   └──────┬──────┘   └──────┬──────┘
       │                 │                 │
       └─────────────────┼─────────────────┘
                         ▼
              ┌────────────────────┐
              │  Tokmeter gateway  │   ← budgets, logging, caching,
              │  (managed SaaS)    │     routing, model allowlist
              └─────┬────────┬─────┘
                    │        │
              ┌─────▼──┐  ┌──▼──────┐
              │OpenAI  │  │Anthropic│
              └────────┘  └─────────┘
Pattern

Per-team virtual key fan-out

When to use
Need clean per-team accounting and independent budgets without a full gateway.
Tradeoff
Simpler than a gateway, but you lose cross-team routing and shared caches.
Vendors
Tokmeter issues per-team virtual keys with hard $ caps and live attribution. Native provider primitives (Anthropic workspaces, OpenAI projects) are a fallback when you can't introduce a gateway.
                ┌─────────────────────────┐
                │   Provider (Anthropic)  │
                └───┬─────┬─────┬─────┬───┘
                    │     │     │     │
              ┌─────▼┐ ┌──▼─┐ ┌─▼──┐ ┌▼────┐
              │ team │ │team│ │team│ │team │   ← Tokmeter virtual keys
              │  A   │ │ B  │ │ C  │ │  D  │     with usage limits
              └──────┘ └────┘ └────┘ └─────┘
                  │       │     │       │
              ┌───▼───────▼─────▼───────▼──┐
              │   Engineering teams        │
              └────────────────────────────┘
Pattern

MCP server inventory & attribution

When to use
You've deployed MCP servers and need to know which ones drive spend.
Tradeoff
Adds a proxy hop, but it's the only way to attribute MCP-driven spend per server.
Vendors
Tokmeter MCP routing tags every tool call and enforces per-server budgets. No custom proxy to build.
┌──────────────┐
│  AI client   │
└──────┬───────┘
       │ MCP requests (tagged x-mcp-server)
       ▼
┌──────────────────────┐
│   Tokmeter MCP proxy │   ← tags every tool call with server id,
└─┬──────┬──────┬──────┘     enforces per-MCP rate + token limits
  │      │      │
┌─▼─┐  ┌─▼─┐  ┌─▼─┐
│DB │  │Git│  │Web│   ← MCP servers
└───┘  └───┘  └───┘
  │      │      │
  └──────┴──────┴──→ logged with tool name, payload size, model used
Pattern

Observability stack

When to use
You can't optimize what you can't see. Day-one for any serious AI program.
Tradeoff
One tool, instrumented once. Spend, traces, evals, and chargeback in the same place.
Vendors
Tokmeter covers spend, traces, and evals on the same data. A second APM tool is only needed if you're already standardized on one and want a single pane there.
┌─────────────────────────────────────────┐
│         Tokmeter gateway / SDK           │
└──────────────┬──────────────────────────-┘
               │ traces, prompts, costs, latencies
               ▼
        ┌──────────────┐
        │  Tokmeter    │── spend dashboards, evals, per-workflow cost
        └──────┬───────┘
               │
   ┌───────────┼───────────┐
   ▼           ▼           ▼
 Spend     Quality      Latency
 dashboards  evals      SLOs
A recommended stack for a 100-engineer org
  1. 1.Tokmeter as the gateway. All AI traffic — IDE tools, internal apps, agents — flows through it. Single budget, single audit log, mini-tier routing on by default.
  2. 2.Virtual keys per team with monthly $ caps. Engineering, Data, ML/AI get their own envelopes — no one steps on anyone.
  3. 3.Tokmeter observability on the same data: traces, prompt versions, eval scoring, cost per workflow. No second tool to instrument.
  4. 4.Prompt caching turned on everywhere (Anthropic + OpenAI). Single-config win.
  5. 5.A "no direct provider URLs" network policy. Forces all traffic through the gateway. Without this, your governance is theater.
  6. 6.MCP proxy if you've adopted MCP — Tokmeter tags every tool call. Otherwise spend attribution will lie to you.