Usage & Billing

JoyToken billing is built around tier wallets, router model rates, billing-service calculation, usage records, and analytics aggregation. Successful model responses keep the OpenAI-compatible structure and add JoyToken cost fields in metadata.billing.

Request Lifecycle

Validate API Key
-> policy and wallet precheck
-> route model
-> estimate credits
-> wallet freeze
-> provider invoke
-> calculate usage
-> record usage
-> wallet settle or release

Credit Tiers

The current system manages wallets and routing across three tiers:

TierCommon codeUse
economyECost-first, lower-cost models, batch work
standardSDefault balanced tier
premiumPHigher-quality, higher-cost, or higher-SLA requests

Do not hand-calculate credits. The gateway asks router-service for rates and billing-service calculates cost from tokens and rates.

Freeze

When wallet quota is enabled, the gateway freezes credits before invoking the provider.

Estimation inputs:

InputSource
Input token estimateRoughly derived from message character count
Output token estimateRequest max_tokens; otherwise gateway default max tokens; otherwise 1024
Input/output priceSelected model customer rates returned by router
Minimum freezeWhen estimate is above zero, at least 0.000001 credits; otherwise configured default freeze or 1 credit

Provider failure releases the freeze. If billing calculation or usage recording fails after provider success, the gateway also releases instead of charging incorrectly.

Cost Calculation

The gateway calls:

  • router-service GetBillingRates: fetch model rates.
  • billing-service Calculate: compute credits_used, usd_cost, provider cost, and margin.
  • billing-service RecordUsage: idempotently record usage by request_id.
  • wallet-service Settle: settle the freeze with actual credits.

Common metadata.billing fields:

FieldMeaning
credits_usedCredits consumed by this request
input_tokensInput tokens
output_tokensOutput tokens
cached_input_tokensCache read / cached input tokens
cached_output_tokensCache write / cached output tokens

Usage Record Fields

The gateway sends these fields when recording usage:

FieldDescription
request_idIdempotency key for the usage record, generated as usage-*
user_id / tenant_idPersonal or organization ownership
api_key_idAPI key attribution
model_keyRequested or routed model key
modelProvider-returned model, or selected model when missing
tierBilling tier
billing_modePERSONAL or ORG
providerActual provider
input_tokens / output_tokensToken usage
cache_read_tokens / cache_write_tokensCache-related tokens
upstream_latency_msProvider latency
failover_triggeredWhether provider failover occurred
customer_*_rateCustomer-facing rates
provider_*_rate_usdProvider cost rates

Streaming Billing

Streaming responses forward provider chunks and parse usage as the stream progresses. Before the stream ends, the gateway appends a metadata event:

Streaming tail
data: {"metadata":{"tier":"standard","billing":{"credits_used":"0.2288","input_tokens":54,"output_tokens":545}}}
data: [DONE]

If the stream does not produce usable usage, the gateway releases the freeze and records the stream as not settleable from usage.

Analytics

Billing and analytics services provide these aggregate views:

ViewUse
Usage summaryTotal requests, tokens, credits, USD cost
Cost trendDaily, weekly, monthly cost trend
Model rankingRanking by model
API key rankingCost attribution by key
Member rankingTeam member cost attribution
Tier distributioneconomy / standard / premium distribution
Usage metricsCache hit rate, decision latency, upstream latency

Cost Control

  • Use separate API keys per environment and workflow.
  • Give IDE, agent, and RAG indexing workloads separate budgets.
  • Prefer model: "auto" plus tier policy for cost governance.
  • Pin model or tier for critical production paths to avoid cost drift.
  • Use X-Request-ID to connect application logs with JoyToken Usage / Billing.