Token-based pricing
Billing for LLM API usage by tokens processed — input and output text converted to billable units that scale with every request.
Updated 2026-05-23 · 3 min read
Definition
Token-based pricing charges for large language model API usage according to the number of tokens (chunks of text) processed on input and output. Vendors define tokens differently, but the bill grows with token volume, and per-token rates typically rise with model tier.
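As a rough illustration of the definition, the cost of one request can be sketched as each token count multiplied by its rate. This is a minimal sketch under assumptions that are not part of this entry: rates are quoted per one million tokens, input and output carry separate rates, and the names `TokenRates` and `request_cost` and all dollar figures are hypothetical placeholders, not any vendor's price sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Hypothetical price sheet, in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, rates: TokenRates) -> float:
    """Cost of one request: each token count times its per-token rate."""
    return (
        input_tokens * rates.input_per_million
        + output_tokens * rates.output_per_million
    ) / 1_000_000


# Placeholder rates for illustration only, not real vendor prices.
example_rates = TokenRates(input_per_million=2.0, output_per_million=8.0)

cost = request_cost(input_tokens=1_500, output_tokens=500, rates=example_rates)
print(f"${cost:.4f} per request")
```

Real price sheets may add other dimensions (for example, discounts or different units), so the token counts and rates should come from the provider's own usage reports and pricing page.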
Why it matters
Pilot costs rarely predict production spend. A workflow that looked affordable at thousands of requests per day can become a material line item at millions — especially with premium models.
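To make the scaling concrete, the sketch below projects monthly spend from daily request volume. The function name `monthly_spend`, the average cost per request, and both volume figures are assumptions chosen for illustration; none of them come from a real workload.

```python
def monthly_spend(requests_per_day: int, cost_per_request: float, days: int = 30) -> float:
    """Project spend over `days` days at a constant daily request volume."""
    return requests_per_day * cost_per_request * days


# Hypothetical average cost of one request, in US dollars.
ASSUMED_COST_PER_REQUEST = 0.007

pilot = monthly_spend(5_000, ASSUMED_COST_PER_REQUEST)           # thousands of requests per day
production = monthly_spend(5_000_000, ASSUMED_COST_PER_REQUEST)  # millions of requests per day

print(f"pilot:      ${pilot:,.0f} per month")
print(f"production: ${production:,.0f} per month")
```

Under these assumed numbers the same workflow goes from about a thousand dollars a month to about a million. The per-request cost is unchanged; only the volume is different.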
Related Terms
Token cost
The per-token price LLM providers charge for input and output, and the primary cost driver for AI applications at scale.
Large language model
An AI model trained on large text corpora that powers chat, coding, and search assistants, usually billed per token or via enterprise seat licenses.
AI inference cost
The cost to run trained models in production — API calls, GPU compute, and hosted endpoints — distinct from one-off training spend.