AI · Also: LLM token cost, Tokens

    Token cost

    Per-token pricing charged by LLM providers for input and output — the primary cost driver for AI applications at scale.

    Updated 2026-04-22 · 3 min read

    Definition

    Large language model providers price requests by the token, a unit roughly the size of a word fragment. Input (prompt plus context) and output (the generated completion) are metered separately, usually at different rates, with output typically priced several times higher than input (often 3–5×).
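As a rough illustration of separate metering, the sketch below computes the cost of a single request from an input rate and an output rate. The `TokenPrice` type, the `request_cost` function, and the rates are all hypothetical and chosen only to make the arithmetic easy to follow; they are not any provider's actual pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical price sheet, in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request, metering input and output separately."""
    input_cost = input_tokens * price.input_per_million / 1_000_000
    output_cost = output_tokens * price.output_per_million / 1_000_000
    return input_cost + output_cost


# Illustrative rates only (assumption): output is priced at 4x input.
example_price = TokenPrice(input_per_million=2.0, output_per_million=8.0)

# A request with a 3,000-token prompt and a 500-token completion.
cost = request_cost(example_price, input_tokens=3_000, output_tokens=500)
print(f"Cost per request: ${cost:.4f}")
```

With these assumed rates the prompt is six times longer than the completion, yet the completion still accounts for 40% of the cost, which is why both sides of the meter are worth tracking.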

    Why it matters

    For most production AI workloads, token cost dwarfs everything else on the bill. The biggest wins come from reducing tokens rather than squeezing a last few percent out of infrastructure: prompt compression, retrieval-based context pruning, caching deterministic prefixes, and picking the smallest model that meets the quality bar.
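To give a feel for the scale involved, here is a back-of-the-envelope sketch comparing a monthly bill before and after pruning retrieved context. The request volume, token counts, and rates are assumptions made up for illustration, and `monthly_cost` is a hypothetical helper, not part of any provider SDK.

```python
def monthly_cost(
    requests_per_month: int,
    input_tokens: int,
    output_tokens: int,
    input_per_million: float,
    output_per_million: float,
) -> float:
    """Estimate a monthly bill in USD from average per-request token counts."""
    per_request = (
        input_tokens * input_per_million + output_tokens * output_per_million
    ) / 1_000_000
    return requests_per_month * per_request


# Assumed workload: 1M requests per month, illustrative rates of $2 / $8 per 1M tokens.
baseline = monthly_cost(1_000_000, 6_000, 400, 2.0, 8.0)  # context stuffed into the prompt
pruned = monthly_cost(1_000_000, 2_000, 400, 2.0, 8.0)    # context pruned before sending

savings_fraction = (baseline - pruned) / baseline
print(f"Baseline: ${baseline:,.0f}  Pruned: ${pruned:,.0f}  Saved: {savings_fraction:.0%}")
```

Under these assumptions, cutting average input from 6,000 to 2,000 tokens roughly halves the bill without changing the model or the infrastructure.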

    Optimization levers

    • Shorten prompts — system prompts balloon quietly; audit them.
    • Truncate retrieved context — reranking beats stuffing.
    • Cache prefix tokens where providers support it (Anthropic, OpenAI).
    • Route by difficulty — small model first, escalate only when needed.
    • Quantise or self-host once volume justifies the fixed cost.
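The routing lever above can be sketched as a simple cascade: try the small model first and escalate only when its answer fails a quality check. Everything here is hypothetical, including the `route_by_difficulty` and `expected_cost_per_request` names, the stub models, and the acceptance check; a real system would call provider APIs and use a task-specific evaluator.

```python
from typing import Callable, Tuple

Model = Callable[[str], str]


def route_by_difficulty(
    prompt: str,
    small: Model,
    large: Model,
    accept: Callable[[str, str], bool],
) -> Tuple[str, str]:
    """Try the small model first; escalate to the large one only if the draft is rejected."""
    draft = small(prompt)
    if accept(prompt, draft):
        return "small", draft
    return "large", large(prompt)


def expected_cost_per_request(
    small_cost: float, large_cost: float, escalation_rate: float
) -> float:
    """Every request pays for the small model; escalated requests also pay for the large one."""
    return small_cost + escalation_rate * large_cost


# Stub models standing in for real API calls (assumption for illustration).
def small_stub(prompt: str) -> str:
    # Pretend the small model gives up on long prompts by returning nothing.
    return "" if len(prompt) > 40 else "short answer"


def large_stub(prompt: str) -> str:
    return "detailed answer"


def non_empty(prompt: str, draft: str) -> bool:
    # Placeholder quality check: accept any non-empty draft.
    return bool(draft.strip())


easy = route_by_difficulty("What is 2 + 2?", small_stub, large_stub, non_empty)
hard = route_by_difficulty(
    "Summarise the attached contract and list every indemnity clause.",
    small_stub,
    large_stub,
    non_empty,
)
blended = expected_cost_per_request(small_cost=0.001, large_cost=0.02, escalation_rate=0.25)
print(easy[0], hard[0], f"{blended:.4f}")
```

Note the trade-off in this design: escalated requests pay for both models, so the cascade only saves money when the escalation rate stays low enough that the blended cost is below the large model's cost alone.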
