Rate Limits
The project does not currently enforce a traditional fixed-QPS rate limit as its primary limit. The mechanisms actually implemented are API key quota, policy constraints, and a wallet balance precheck. Client docs should describe these real behaviors.
If the gateway later adds requests-per-minute, concurrency, or token-per-minute limits, add them here. Do not promise fixed QPS, RPM, or TPM quotas yet.
Current Enforcement
Wallet Fallback
When the request uses model: "auto" or omits model, JoyToken may try a different tier when the selected tier lacks balance. Current order:
If the request specifies a concrete model, the gateway does not automatically switch to another model because of insufficient balance.
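The two rules above can be sketched as a small decision function. This is only an illustration, not the gateway's code: the function name `resolve_tier`, the per-tier balance map, the `estimated_cost` parameter, and the tier names in the usage note are all hypothetical, and the fallback order is passed in by the caller rather than assumed.

```python
from typing import Mapping, Optional, Sequence


def resolve_tier(
    model: Optional[str],
    selected_tier: str,
    fallback_order: Sequence[str],
    balances: Mapping[str, float],
    estimated_cost: float,
) -> Optional[str]:
    """Return the tier to bill, or None if the request should be rejected.

    All names here are hypothetical; the real precheck may differ.
    """

    def has_balance(tier: str) -> bool:
        return balances.get(tier, 0.0) >= estimated_cost

    # The selected tier is always tried first.
    if has_balance(selected_tier):
        return selected_tier

    # A concrete model never switches tiers because of insufficient balance.
    if model is not None and model != "auto":
        return None

    # model is "auto" or omitted: walk the caller-supplied fallback order.
    for tier in fallback_order:
        if tier != selected_tier and has_balance(tier):
            return tier
    return None
```

For example, with hypothetical tiers `"tier_a"` and `"tier_b"`, an `"auto"` request whose selected tier `"tier_a"` is empty resolves to `"tier_b"` if that tier has enough balance, while the same request naming a concrete model resolves to `None`.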
Client Handling
Exponential Backoff Example
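A minimal client-side sketch, assuming the caller wraps its HTTP request in a `send` callable that returns `(status, body)`. The helper names, the set of retryable statuses, and the delay parameters are assumptions for illustration, not documented gateway behavior. The one rule taken from this page is that 402 and 403 are returned to the caller immediately rather than retried.

```python
import random
import time
from typing import Callable, Tuple

# Not retried: the caller must fix the account, policy, or request first.
NON_RETRYABLE = {402, 403}
# Assumption: only these transient server errors are worth retrying.
RETRYABLE = {500, 502, 503, 504}


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() up to max_attempts times, sleeping between retryable failures."""
    status, body = send()
    for attempt in range(max_attempts - 1):
        if status < 400:
            return status, body
        if status in NON_RETRYABLE or status not in RETRYABLE:
            return status, body  # surface the error instead of retrying
        sleep(backoff_delay(attempt))
        status, body = send()
    return status, body
```

The attempt count is bounded so a persistent failure ends with the last error being returned to the caller instead of looping forever.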
Retrying 402 and 403 responses indefinitely usually just amplifies traffic. Fix the account, policy, or request configuration first.