Usage & Billing | JoyToken

JoyToken billing is built around tier wallets, router model rates, billing-service calculation, usage records, and analytics aggregation. Successful model responses keep the OpenAI-compatible structure and add JoyToken cost fields in metadata.billing.

Request Lifecycle

Validate API Key
  -> policy and wallet precheck
  -> route model
  -> estimate credits
  -> wallet freeze
  -> provider invoke
  -> calculate usage
  -> record usage
  -> wallet settle or release

Credit Tiers

The current system manages wallets and routing across three tiers:

Tier	Common code	Use
`economy`	`E`	Cost-first, lower-cost models, batch work
`standard`	`S`	Default balanced tier
`premium`	`P`	Higher-quality, higher-cost, or higher-SLA requests

Do not calculate credits by hand. The gateway asks router-service for the model's rates, and billing-service calculates the cost from the token counts and those rates.

Freeze

When wallet quota is enabled, the gateway freezes credits before invoking the provider.

Estimation inputs:

Input	Source
Input token estimate	Roughly derived from message character count
Output token estimate	Request `max_tokens`; otherwise gateway default max tokens; otherwise 1024
Input/output price	Selected model customer rates returned by router
Minimum freeze	When the estimate is above zero, at least `0.000001` credits; otherwise the configured default freeze, or 1 credit if none is configured
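The estimation rules in the table can be sketched as follows. This is only an illustration, not the gateway's code: the 4-characters-per-token ratio, the per-1,000-token rate unit, and the function and parameter names are all assumptions, since the source does not specify them.

```python
from decimal import Decimal
from typing import Optional

CHARS_PER_TOKEN = 4                  # assumption: the real ratio is not documented
MIN_FREEZE = Decimal("0.000001")     # minimum freeze when the estimate is above zero
FALLBACK_MAX_TOKENS = 1024           # used when neither request nor gateway sets a limit
FALLBACK_FREEZE = Decimal("1")       # used when no default freeze is configured


def estimate_freeze(
    messages: list,
    input_rate: Decimal,
    output_rate: Decimal,
    max_tokens: Optional[int] = None,
    gateway_default_max_tokens: Optional[int] = None,
    default_freeze: Optional[Decimal] = None,
) -> Decimal:
    """Estimate credits to freeze; rates are assumed to be credits per 1,000 tokens."""
    # Input tokens: rough estimate from total message character count (rounded up).
    chars = sum(len(str(m.get("content", ""))) for m in messages)
    input_tokens = -(-chars // CHARS_PER_TOKEN)
    # Output tokens: request max_tokens, else gateway default, else 1024.
    # A value of 0 is treated the same as "not set" in this sketch.
    output_tokens = max_tokens or gateway_default_max_tokens or FALLBACK_MAX_TOKENS
    estimate = (input_tokens * input_rate + output_tokens * output_rate) / 1000
    if estimate > 0:
        return max(estimate, MIN_FREEZE)
    return default_freeze if default_freeze is not None else FALLBACK_FREEZE
```

Using `Decimal` rather than `float` keeps very small amounts such as `0.000001` exact, which matters when the minimum freeze is applied.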

A provider failure releases the freeze. If billing calculation or usage recording fails after the provider succeeds, the gateway also releases the freeze rather than risk an incorrect charge.
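The freeze, settle, and release behavior can be illustrated with a small in-memory model. `InMemoryWallet` and `run_with_freeze` are hypothetical names used only for this sketch; the real wallet-service interface is not shown in this document. What the caller receives when billing fails after a successful provider call is also not specified, so the sketch simply re-raises in both failure cases.

```python
from decimal import Decimal
from typing import Any, Callable


class InMemoryWallet:
    """Hypothetical stand-in for wallet-service: tracks balance and frozen credits."""

    def __init__(self, balance: Decimal) -> None:
        self.balance = balance
        self.frozen = Decimal("0")

    def freeze(self, amount: Decimal) -> None:
        if amount > self.balance - self.frozen:
            raise RuntimeError("insufficient credits")
        self.frozen += amount

    def release(self, amount: Decimal) -> None:
        # Return frozen credits without charging anything.
        self.frozen -= amount

    def settle(self, frozen_amount: Decimal, actual: Decimal) -> None:
        # Drop the freeze and charge only the actual credits used.
        self.frozen -= frozen_amount
        self.balance -= actual


def run_with_freeze(
    wallet: InMemoryWallet,
    freeze_amount: Decimal,
    invoke_provider: Callable[[], Any],
    bill: Callable[[Any], Decimal],
) -> Any:
    """Freeze, invoke, bill, then settle; release on any failure in between."""
    wallet.freeze(freeze_amount)
    try:
        response = invoke_provider()      # provider failure lands in except
        credits_used = bill(response)     # calculation or recording failure too
    except Exception:
        wallet.release(freeze_amount)
        raise
    wallet.settle(freeze_amount, credits_used)
    return response
```

Settling only after both the provider call and billing succeed means a partial failure can never leave credits charged without a matching usage record.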

Cost Calculation

The gateway calls:

router-service GetBillingRates: fetch model rates.
billing-service Calculate: compute credits_used, usd_cost, provider cost, and margin.
billing-service RecordUsage: idempotently record usage by request_id.
wallet-service Settle: settle the freeze with actual credits.
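RecordUsage is described above as idempotent on request_id. A minimal in-memory sketch of that property is shown below; `UsageRecord` and `UsageLedger` are hypothetical names, and the actual billing-service storage and return values are not documented here.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageRecord:
    request_id: str          # idempotency key, e.g. "usage-..."
    input_tokens: int
    output_tokens: int
    credits_used: Decimal


class UsageLedger:
    """Hypothetical ledger that stores at most one record per request_id."""

    def __init__(self) -> None:
        self._records: dict = {}

    def record_usage(self, record: UsageRecord) -> bool:
        """Store the record once; return False if request_id was already recorded."""
        if record.request_id in self._records:
            return False
        self._records[record.request_id] = record
        return True

    def total_credits(self) -> Decimal:
        return sum((r.credits_used for r in self._records.values()), Decimal("0"))
```

With this shape, a retried RecordUsage call for the same request cannot double-count usage, which is what makes retries after a timeout safe.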

Common metadata.billing fields:

Field	Meaning
`credits_used`	Credits consumed by this request
`input_tokens`	Input tokens
`output_tokens`	Output tokens
`cached_input_tokens`	Cache read / cached input tokens
`cached_output_tokens`	Cache write / cached output tokens
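A client can read these fields from the response body roughly as follows. This is a sketch under assumptions: `read_billing` is a hypothetical helper, and defaulting absent fields to zero is a choice made for the example, not documented gateway behavior.

```python
import json
from decimal import Decimal


def read_billing(response_body: str) -> dict:
    """Extract metadata.billing from a JSON response body, defaulting absent fields to 0."""
    payload = json.loads(response_body)
    billing = payload.get("metadata", {}).get("billing", {})
    return {
        # credits_used may arrive as a string; Decimal keeps it exact.
        "credits_used": Decimal(str(billing.get("credits_used", "0"))),
        "input_tokens": int(billing.get("input_tokens", 0)),
        "output_tokens": int(billing.get("output_tokens", 0)),
        "cached_input_tokens": int(billing.get("cached_input_tokens", 0)),
        "cached_output_tokens": int(billing.get("cached_output_tokens", 0)),
    }
```

Because the OpenAI-compatible structure is preserved, existing client code that ignores `metadata` keeps working; only code that wants cost data needs a helper like this.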

Usage Record Fields

The gateway sends these fields when recording usage:

Field	Description
`request_id`	Idempotency key for the usage record, generated as `usage-*`
`user_id` / `tenant_id`	Personal or organization ownership
`api_key_id`	API key attribution
`model_key`	Requested or routed model key
`model`	Provider-returned model, or selected model when missing
`tier`	Billing tier
`billing_mode`	`PERSONAL` or `ORG`
`provider`	Actual provider
`input_tokens` / `output_tokens`	Token usage
`cache_read_tokens` / `cache_write_tokens`	Cache-related tokens
`upstream_latency_ms`	Provider latency
`failover_triggered`	Whether provider failover occurred
`customer_*_rate`	Customer-facing rates
`provider_*_rate_usd`	Provider cost rates

Streaming Billing

Streaming responses forward provider chunks and parse usage as the stream progresses. Before the stream ends, the gateway appends a metadata event:

Streaming tail

data: {"metadata":{"tier":"standard","billing":{"credits_used":"0.2288","input_tokens":54,"output_tokens":545}}}
data: [DONE]

If the stream yields no usable usage data, the gateway releases the freeze and records that the stream could not be settled from usage.
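On the client side, the metadata event can be picked out of the stream with a small parser. The sketch below assumes the stream arrives as `data:`-prefixed lines like the tail shown above; `extract_stream_billing` is a hypothetical helper name.

```python
import json
from typing import Iterable, Optional


def extract_stream_billing(lines: Iterable[str]) -> Optional[dict]:
    """Return the billing object from the metadata event, or None if none was seen."""
    billing = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue                      # skip blank lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break                         # end-of-stream sentinel
        try:
            event = json.loads(data)
        except json.JSONDecodeError:
            continue                      # ignore chunks that are not valid JSON
        metadata = event.get("metadata") if isinstance(event, dict) else None
        if isinstance(metadata, dict):
            billing = metadata.get("billing")
    return billing
```

A `None` result corresponds to a stream that ended without a metadata event, so the caller should not assume a cost was reported for it.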

Analytics

Billing and analytics services provide these aggregate views:

View	Use
Usage summary	Total requests, tokens, credits, USD cost
Cost trend	Daily, weekly, monthly cost trend
Model ranking	Ranking by model
API key ranking	Cost attribution by key
Member ranking	Team member cost attribution
Tier distribution	economy / standard / premium distribution
Usage metrics	Cache hit rate, decision latency, upstream latency

Cost Control

Use separate API keys per environment and workflow.
Give IDE, agent, and RAG indexing workloads separate budgets.
Prefer model: "auto" plus tier policy for cost governance.
Pin model or tier for critical production paths to avoid cost drift.
Use X-Request-ID to connect application logs with JoyToken Usage / Billing.
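As an illustration of the last point, an application can generate the request ID itself, log it, and send it in the X-Request-ID header. The sketch below only builds the headers and payload and makes no network call. The `Bearer` authorization scheme is assumed from OpenAI compatibility, and `build_call` and the `app-` ID prefix are hypothetical.

```python
import logging
import uuid
from typing import Optional

logger = logging.getLogger("app.joytoken")


def build_call(api_key: str, messages: list, request_id: Optional[str] = None):
    """Return (request_id, headers, payload) for one chat completion call."""
    # Generate an ID when the caller does not supply one.
    request_id = request_id or f"app-{uuid.uuid4()}"
    headers = {
        "Authorization": f"Bearer {api_key}",   # assumption: Bearer scheme
        "Content-Type": "application/json",
        "X-Request-ID": request_id,
    }
    # model "auto" leaves model selection to the router and tier policy.
    payload = {"model": "auto", "messages": messages}
    # Log the same ID so application logs can be joined with usage records.
    logger.info("joytoken call request_id=%s model=%s", request_id, payload["model"])
    return request_id, headers, payload
```

Logging the ID before the call is sent means that even requests which fail or time out can still be traced.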