Rate Limits | JoyToken | Documentation

JoyToken does not currently enforce a traditional fixed QPS rate limit as its primary limit. The limits it does implement are API key quotas, policy constraints, and a wallet balance precheck, and client documentation should describe those actual behaviors.

If the gateway later adds requests-per-minute, concurrency, or tokens-per-minute limits, document them here. Do not promise fixed QPS, RPM, or TPM quotas yet.

Current Enforcement

Limit	Source	Behavior
Daily quota	`limit_daily` from API key validation + Finance usage counter	Returns `402 insufficient_quota`
Weekly quota	`limit_weekly` from API key validation + Finance usage counter	Returns `402 insufficient_quota`
Wallet balance	wallet-service / finance client	Insufficient balance in the current tier returns `402 insufficient_quota`; `auto` mode may try another tier
Fixed model	API key `fixed_model`	Requests for other concrete models return `403 policy_rejected`
Model blacklist	Policy snapshot	Matching model returns `403 policy_rejected`
Tier allowlist	Policy snapshot / API key tier / request `tier`	Disallowed tier returns `403 policy_rejected`
IP allowlist/blocklist	Policy snapshot	Disallowed source IP returns `403 policy_rejected`

Wallet Fallback

When the request uses `model: "auto"` or omits `model`, JoyToken may try a different tier when the selected tier lacks balance. The current order is:

Current tier	Fallback order
`premium`	`standard` -> `economy`
`standard`	`premium` -> `economy`
`economy`	`standard` -> `premium`

If the request specifies a concrete model, the gateway does not automatically switch to another model because of insufficient balance.
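
The fallback rules above can be sketched as a lookup table. This is a client-side illustration only, not the gateway's implementation; the names `Tier`, `fallbackOrder`, and `tiersToTry` are hypothetical, and only the tier values and their order come from the table.

```typescript
// Hypothetical names for illustration; tier values and order mirror the table above.
type Tier = "premium" | "standard" | "economy";

const fallbackOrder: Record<Tier, readonly Tier[]> = {
  premium: ["standard", "economy"],
  standard: ["premium", "economy"],
  economy: ["standard", "premium"],
};

// Returns the tiers in the order they may be attempted: the selected tier
// first, then its fallbacks. Fallback applies only when `model` is "auto"
// or omitted; a concrete model stays on the selected tier.
function tiersToTry(current: Tier, model?: string): Tier[] {
  const isAuto = model === undefined || model === "auto";
  return isAuto ? [current, ...fallbackOrder[current]] : [current];
}
```

For example, `tiersToTry("standard")` yields `standard`, `premium`, `economy`, while passing any concrete model name yields only the selected tier.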

Client Handling

Error	Retry?	Recommendation
`400 invalid_request_error`	No	Fix request body, `messages`, or body size
`401 missing_api_key`	No	Add Bearer API key
`403 invalid_api_key`	No	Check whether API key is valid and `ACTIVE`
`403 policy_rejected`	No	Adjust policy, IP, tier, or model
`402 insufficient_quota`	No	Top up, switch billing account, adjust key quota, or lower tier
`502 routing_error`	Short backoff	Could be routing service or candidate model issue
`502 upstream_error`	Short backoff	Could be temporary provider failure
`503` / `504`	Yes	Use exponential backoff and keep request logs
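
A client might encode the table above as a small decision function. The sketch below is one possible mapping, assuming the caller has already extracted the HTTP status; `RetryAdvice` and `retryAdvice` are hypothetical names, and treating every unlisted status as non-retryable is an assumption rather than documented gateway behavior.

```typescript
// Hypothetical helper: maps an HTTP status to the retry guidance in the table above.
type RetryAdvice = "no-retry" | "short-backoff" | "exponential-backoff";

function retryAdvice(status: number): RetryAdvice {
  switch (status) {
    case 502:
      // routing_error or upstream_error: possibly a temporary failure.
      return "short-backoff";
    case 503:
    case 504:
      return "exponential-backoff";
    default:
      // 400, 401, 402, and 403 need a fix to the request, key, policy, or
      // balance. Unlisted statuses are also treated as non-retryable here,
      // which is an assumption of this sketch.
      return "no-retry";
  }
}
```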

Exponential Backoff Example

retry.ts

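// Statuses worth retrying: gateway and upstream failures that may be transient.
const retryableStatuses = new Set([502, 503, 504]);

export async function withJoyTokenRetry<T>(fn: () => Promise<T>, attempts = 3) {
  let lastError: unknown;

  for (let attempt = 0; attempt < attempts; attempt += 1) {
    try {
      return await fn();
    } catch (error: any) {
      lastError = error;
      // Support errors that expose the HTTP status directly or on a response object.
      const status = error?.status ?? error?.response?.status;
      // Give up on non-retryable statuses or once the final attempt has failed.
      if (!retryableStatuses.has(status) || attempt === attempts - 1) {
        throw error;
      }
      // Wait 300 ms, 600 ms, 1200 ms, ... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, 300 * 2 ** attempt));
    }
  }

  // Reached only when attempts <= 0, in which case fn is never called.
  throw lastError;
}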
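
To make the wait schedule concrete, the sketch below computes the delays the retry loop above would sleep between attempts. `backoffDelays` is a hypothetical helper written for this illustration; it assumes the same 300 ms base and doubling factor, and that no sleep follows the final attempt.

```typescript
// Hypothetical helper: lists the delays (in ms) slept between attempts,
// assuming a 300 ms base that doubles after each failed attempt.
function backoffDelays(attempts: number, baseMs = 300): number[] {
  const delays: number[] = [];
  // The final attempt throws instead of sleeping, so there are attempts - 1 delays.
  for (let attempt = 0; attempt < attempts - 1; attempt += 1) {
    delays.push(baseMs * 2 ** attempt);
  }
  return delays;
}
```

With the default of three attempts, the delays are 300 ms and 600 ms, so a request that fails every time spends roughly 0.9 seconds waiting in addition to the request time itself.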

Retrying `402` and `403` responses in a loop usually only amplifies traffic, because the request keeps failing for the same reason. Fix the account, policy, or request configuration first.