Lesson 4. API Limits and Request Costs#
Goal: understand that APIs have limits and how to avoid exceeding them.
What Are Rate Limits#
APIs can't handle unlimited requests. So services set limits:
- requests per second (e.g., 10 requests/sec)
- requests per minute (e.g., 100 requests/min)
- requests per day (e.g., 10,000 requests/day)
If you exceed the limit, the API returns an error:
{
"error": "Rate limit exceeded. Try again in 60 seconds."
}
Why Limits Exist#
Limits protect the service from:
- overload (if everyone makes a million requests at once, the service goes down)
- abuse (e.g., DDoS attacks)
- inefficient use (if your agent makes 1000 requests instead of 1, that's bad code)
How to Find Out the Limits#
Limits are described in the API documentation. Look for sections named:
- Rate Limits
- Quotas
- Usage Limits
Example: OpenAI API (ChatGPT)
On the free tier (2026):
- 3 requests per minute (RPM)
- 200 requests per day (RPD)
On the paid tier (Pay-as-you-go):
- 3,500 requests per minute (RPM) for GPT-5.2
- 10,000 requests per minute (RPM) for GPT-4o mini
How to Avoid Exceeding Limits#
1. Make requests only when needed
Don't make a request for every user message if you can handle it locally.
2. Use caching
If data doesn't change often (e.g., product list), cache it for 5–10 minutes.
3. Use batch requests
Some APIs support bulk requests (e.g., read 100 clients in one request instead of 100 separate requests).
4. Add delays (throttling)
If the API allows 10 requests per second, add a 100 ms delay between requests.
5. Handle errors
If the API returns "Rate limit exceeded", wait the specified time and retry.
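Item 2 (caching) can be sketched as a tiny in-memory cache with a time-to-live (TTL). Everything here is illustrative: `fetch_products` stands in for a real API call, and the 300-second TTL is the 5-minute figure from the text.

```python
import time

class TTLCache:
    """Tiny in-memory cache: keeps each value for ttl_seconds, then refetches."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, stored_at)

    def get_or_fetch(self, key, fetch):
        """Return the cached value if still fresh, otherwise call fetch() once."""
        entry = self._store.get(key)
        if entry is not None:
            value, stored_at = entry
            if time.monotonic() - stored_at < self.ttl:
                return value  # still fresh: no API request needed
        value = fetch()  # cache miss or expired: one real request
        self._store[key] = (value, time.monotonic())
        return value

# Usage: the product list is fetched once, then served from cache for 5 minutes.
calls = 0
def fetch_products():  # hypothetical API call
    global calls
    calls += 1
    return ["widget", "gadget"]

cache = TTLCache(ttl_seconds=300)
cache.get_or_fetch("products", fetch_products)
cache.get_or_fetch("products", fetch_products)
print(calls)  # → 1: the second lookup hit the cache
```

Ten users asking for the same product list within those five minutes cost one API request instead of ten.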
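Item 3 (batch requests) as a sketch: instead of one request per client, split the IDs into chunks and send each chunk in one call. The batch size of 100 and the `fetch_clients_bulk` endpoint are hypothetical — check your API's docs for its actual bulk operations and limits.

```python
def chunked(items, size):
    """Split a list into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

request_count = 0
def fetch_clients_bulk(ids):  # hypothetical bulk endpoint: many IDs, one request
    global request_count
    request_count += 1
    return [{"id": i} for i in ids]

client_ids = list(range(250))
clients = []
for batch in chunked(client_ids, 100):  # 250 IDs -> 3 requests, not 250
    clients.extend(fetch_clients_bulk(batch))

print(request_count)  # → 3
print(len(clients))   # → 250
```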
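Items 4 and 5 combine naturally: space requests out so you stay under the limit, and when a rate-limit error arrives anyway, wait the time the API asks for and retry. This is a minimal sketch — `RateLimitError`, `fake_api`, and the 10 requests/sec limit are all stand-ins for whatever your real API and client library provide.

```python
import time

class RateLimitError(Exception):
    """Simulates a 'Rate limit exceeded' response from the API."""
    def __init__(self, retry_after):
        self.retry_after = retry_after  # seconds the API asks us to wait

MIN_INTERVAL = 0.1  # 10 requests/sec allowed -> at least 100 ms between requests
_last_request = 0.0

def throttled_call(api_call, max_retries=3):
    """Call api_call(), keeping MIN_INTERVAL between requests and retrying on rate limits."""
    global _last_request
    for attempt in range(max_retries + 1):
        wait = MIN_INTERVAL - (time.monotonic() - _last_request)
        if wait > 0:
            time.sleep(wait)  # throttle: never exceed 10 req/sec
        _last_request = time.monotonic()
        try:
            return api_call()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # out of retries: let the caller handle it
            time.sleep(err.retry_after)  # wait exactly as long as the API asked

# Usage with a fake API that rejects the first call, then succeeds:
attempts = []
def fake_api():
    attempts.append(1)
    if len(attempts) == 1:
        raise RateLimitError(retry_after=0.05)
    return {"status": "ok"}

print(throttled_call(fake_api))  # → {'status': 'ok'} after one retry
```

Real HTTP APIs usually signal this with status code 429 and a `Retry-After` header; the pattern is the same.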
Request Costs#
Many APIs are paid. Cost depends on:
- number of requests (e.g., $0.01 per 1000 requests)
- data volume (e.g., $0.02 per 1 GB transferred)
- resource usage (e.g., OpenAI charges per token count)
Example: OpenAI API (2026)
- GPT-5.2 Pro: $0.015 per 1000 input tokens, $0.045 per 1000 output tokens
- GPT-5.2: $0.01 per 1000 input tokens, $0.03 per 1000 output tokens
- GPT-4o mini: $0.00015 per 1000 input tokens, $0.0006 per 1000 output tokens (roughly 65x cheaper than GPT-5.2 on input!)
Alternatives (Chinese models, 2026):
- DeepSeek-R1: $0.003 per 1000 input tokens (over 3x cheaper than GPT-5.2)
- GLM-4.5: $0.004 per 1000 input tokens
- Kimi K2: $0.005 per 1000 input tokens
What is a token? Roughly 4 characters. The phrase "Hello, how are you?" is ~5 tokens.
Example calculation:
- User writes 100 characters (~25 input tokens)
- Agent replies 400 characters (~100 output tokens)
- Cost (GPT-5.2): (25 / 1000) × $0.01 + (100 / 1000) × $0.03 = $0.00025 + $0.003 = $0.00325 (~$0.003 per conversation)
- Cost (DeepSeek-R1, assuming the same price for output tokens): (25 / 1000) × $0.003 + (100 / 1000) × $0.003 = $0.000075 + $0.0003 ≈ $0.0004 per conversation (~9x cheaper!)
- At 1000 conversations per month → GPT-5.2: ~$3.25/month, DeepSeek: ~$0.40/month
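The calculation above is easy to script so you can plug in your own traffic and any model's price list. The token estimate uses the same "one token ≈ 4 characters" heuristic; the prices are the illustrative per-1000-token figures from this lesson, not live pricing.

```python
def estimate_tokens(text_chars):
    """Rough heuristic: one token is about 4 characters of English text."""
    return text_chars / 4

def conversation_cost(input_tokens, output_tokens, price_in, price_out):
    """Cost of one exchange; prices are in dollars per 1000 tokens."""
    return (input_tokens / 1000) * price_in + (output_tokens / 1000) * price_out

# The worked example: 100-char question (~25 tokens), 400-char reply (~100 tokens).
in_tok = estimate_tokens(100)    # 25.0
out_tok = estimate_tokens(400)   # 100.0

gpt = conversation_cost(in_tok, out_tok, price_in=0.01, price_out=0.03)
deepseek = conversation_cost(in_tok, out_tok, price_in=0.003, price_out=0.003)

print(round(gpt, 5))         # → 0.00325  dollars per conversation
print(round(deepseek, 6))    # → 0.000375 dollars per conversation
print(round(gpt * 1000, 2))  # → 3.25    dollars/month at 1000 conversations
```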
How to Control Costs#
1. Set limits in the service settings
Many services let you set a spending limit (e.g., "no more than $50 per month").
2. Monitor usage
Check API usage stats once a week.
3. Use cheaper models
If GPT-5.2 is too expensive → use GPT-4o mini (roughly 65x cheaper) or Chinese models like DeepSeek/GLM (2–3x cheaper).
4. Optimize prompts
Shorter prompts → fewer tokens → lower cost.