Rate Limits

The gateway does not currently enforce a traditional fixed QPS rate limit as its primary limit. The mechanisms it does implement are API key quotas, policy constraints, and a wallet balance precheck. Client docs should describe those real behaviors.

If the gateway later adds requests-per-minute, concurrency, or tokens-per-minute limits, document them here. Until then, do not promise fixed QPS, RPM, or TPM quotas.

Current Enforcement

| Limit | Source | Behavior |
| --- | --- | --- |
| Daily quota | limit_daily from API key validation + Finance usage counter | Returns 402 insufficient_quota |
| Weekly quota | limit_weekly from API key validation + Finance usage counter | Returns 402 insufficient_quota |
| Wallet balance | wallet-service / finance client | Insufficient balance on the current tier returns 402 insufficient_quota; auto mode may try another tier |
| Fixed model | API key fixed_model | Requests for other concrete models return 403 policy_rejected |
| Model blacklist | Policy snapshot | Matching model returns 403 policy_rejected |
| Tier allowlist | Policy snapshot / API key tier / request tier | Disallowed tier returns 403 policy_rejected |
| IP allowlist/blocklist | Policy snapshot | Disallowed source IP returns 403 policy_rejected |
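
Because several limits share one error code, a client may want to map a response back to the limits that could have produced it. The following is a minimal sketch built only from the table above. The names `LimitCause`, `causesByError`, and `possibleCauses` are hypothetical and not part of any JoyToken SDK, and the sketch assumes the client has already extracted the HTTP status and error code from the response.

```typescript
// Hypothetical labels for the limits listed in the enforcement table.
type LimitCause =
  | "daily_quota"
  | "weekly_quota"
  | "wallet_balance"
  | "fixed_model"
  | "model_blacklist"
  | "tier_allowlist"
  | "ip_rules";

// Each "status:code" pair maps to every limit that the table says returns it.
const causesByError: Record<string, LimitCause[]> = {
  "402:insufficient_quota": ["daily_quota", "weekly_quota", "wallet_balance"],
  "403:policy_rejected": ["fixed_model", "model_blacklist", "tier_allowlist", "ip_rules"],
};

// Returns the candidate causes, or an empty list for errors that are not limit-related.
function possibleCauses(status: number, code: string): LimitCause[] {
  return causesByError[`${status}:${code}`] ?? [];
}
```

A support tool could use the returned list to decide which settings to inspect first, for example key quotas and wallet balance for a 402.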

Wallet Fallback

If a request sets model: "auto" or omits model, JoyToken may fall back to a different tier when the selected tier lacks balance. The current fallback order is:

| Current tier | Fallback order |
| --- | --- |
| premium | standard -> economy |
| standard | premium -> economy |
| economy | standard -> premium |

If the request specifies a concrete model, the gateway does not automatically switch to another model because of insufficient balance.
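
The fallback rules above can be sketched as a lookup table plus a small resolver. This is an illustration under stated assumptions, not the gateway's implementation: `resolveTier` and the `hasBalance` callback are hypothetical names, `hasBalance` stands in for the wallet balance precheck, and a `null` result stands for the case where the request would end in 402 insufficient_quota.

```typescript
type Tier = "premium" | "standard" | "economy";

// Fallback order copied from the Wallet Fallback table.
const fallbackOrder: Record<Tier, Tier[]> = {
  premium: ["standard", "economy"],
  standard: ["premium", "economy"],
  economy: ["standard", "premium"],
};

// hasBalance is a hypothetical stand-in for the wallet balance precheck.
function resolveTier(
  selected: Tier,
  model: string | undefined,
  hasBalance: (tier: Tier) => boolean,
): Tier | null {
  if (hasBalance(selected)) return selected;

  // Only auto mode (model "auto" or omitted) is allowed to switch tiers.
  const isAuto = model === undefined || model === "auto";
  if (!isAuto) return null;

  // Try the remaining tiers in the documented order.
  for (const tier of fallbackOrder[selected]) {
    if (hasBalance(tier)) return tier;
  }
  return null;
}
```

For example, with balance only on the economy tier, an auto request that starts on premium resolves to economy, while the same request with a concrete model resolves to `null`.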

Client Handling

| Error | Retry? | Recommendation |
| --- | --- | --- |
| 400 invalid_request_error | No | Fix request body, messages, or body size |
| 401 missing_api_key | No | Add Bearer API key |
| 403 invalid_api_key | No | Check whether API key is valid and ACTIVE |
| 403 policy_rejected | No | Adjust policy, IP, tier, or model |
| 402 insufficient_quota | No | Top up, switch billing account, adjust key quota, or lower tier |
| 502 routing_error | Short backoff | Could be routing service or candidate model issue |
| 502 upstream_error | Short backoff | Could be temporary provider failure |
| 503 / 504 | Yes | Use exponential backoff and keep request logs |
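
The Retry? column can be encoded as a small decision function so that retry logic stays in one place. This is a minimal sketch: `RetryAdvice` and `retryAdvice` are hypothetical names, and treating any status not listed in the table as non-retryable is an assumption rather than documented gateway behavior.

```typescript
type RetryAdvice = "none" | "short_backoff" | "exponential_backoff";

// Maps an HTTP status to the retry recommendation from the Client Handling table.
function retryAdvice(status: number): RetryAdvice {
  if (status === 503 || status === 504) return "exponential_backoff";
  if (status === 502) return "short_backoff"; // routing_error or upstream_error
  // 400, 401, 402, 403: fix the request, key, account, or policy instead.
  // Assumption: statuses not listed in the table are also treated as non-retryable.
  return "none";
}
```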

Exponential Backoff Example

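retry.ts

```ts
// Statuses that are worth retrying; 4xx errors need a configuration fix instead.
const retryableStatuses = new Set([502, 503, 504]);

export async function withJoyTokenRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
): Promise<T> {
  let lastError: unknown;

  for (let attempt = 0; attempt < attempts; attempt += 1) {
    try {
      return await fn();
    } catch (error: any) {
      lastError = error;
      // Support both plain errors with a status and HTTP-client errors with a response.
      const status = error?.status ?? error?.response?.status;
      // Give up on non-retryable statuses or once the last attempt has failed.
      if (!retryableStatuses.has(status) || attempt === attempts - 1) {
        throw error;
      }
      // Exponential backoff: 300 ms, 600 ms, 1200 ms, ...
      await new Promise((resolve) => setTimeout(resolve, 300 * 2 ** attempt));
    }
  }

  // Reached only when attempts is less than 1.
  throw lastError;
}
```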

Retrying 402 and 403 responses in a loop does not resolve them and usually just amplifies traffic. Fix the account, policy, or request configuration first.