Module 09, Lesson 4


Hands-on: Zapier

Lesson 4. API Limits and Request Costs#

Goal: understand that APIs have limits and how to avoid exceeding them.

What Are Rate Limits#

No API can handle unlimited requests, so services set limits:

  • requests per second (e.g., 10 requests/sec)
  • requests per minute (e.g., 100 requests/min)
  • requests per day (e.g., 10,000 requests/day)

If you exceed the limit, the API returns an error (usually with HTTP status 429 Too Many Requests):

{
  "error": "Rate limit exceeded. Try again in 60 seconds."
}
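A client can detect this situation from the status code and decide how long to wait. Below is a minimal sketch; it assumes the API uses HTTP 429 plus an optional `Retry-After` header, which is a common convention but not guaranteed by every service:

```python
# Sketch: detect a rate-limit response and decide how long to wait.
# Assumes HTTP 429 plus an optional Retry-After header (a common
# convention; check your API's docs for its actual signals).

def wait_time_if_rate_limited(status_code: int, headers: dict, default: float = 60.0):
    """Return seconds to wait before retrying, or None if not rate-limited."""
    if status_code != 429:
        return None
    retry_after = headers.get("Retry-After")
    return float(retry_after) if retry_after else default

print(wait_time_if_rate_limited(200, {}))                     # None -> no wait needed
print(wait_time_if_rate_limited(429, {"Retry-After": "60"}))  # 60.0
```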

Why Limits Exist#

Limits protect the service from:

  • overload (if everyone makes a million requests at once, the service goes down)
  • abuse (e.g., DDoS attacks)
  • inefficient use (if your agent makes 1000 requests instead of 1, that's bad code)

How to Find Out the Limits#

Limits are published in the API documentation. Look for sections named:

  • Rate Limits
  • Quotas
  • Usage Limits

Example: OpenAI API (ChatGPT)

On the free tier (2026):

  • 3 requests per minute (RPM)
  • 200 requests per day (RPD)

On the paid tier (Pay-as-you-go):

  • 3,500 requests per minute (RPM) for GPT-5.2
  • 10,000 requests per minute (RPM) for GPT-4o mini

How to Avoid Exceeding Limits#

1. Make requests only when needed

Don't make a request for every user message if you can handle it locally.
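One way to do this is to answer common questions from a local lookup first and spend an API request only on everything else. A sketch (the `call_api` function here is a hypothetical placeholder, not a real endpoint):

```python
# Sketch: answer frequent questions locally; only unknown questions
# cost an API request. call_api is a hypothetical placeholder.

LOCAL_ANSWERS = {
    "working hours": "We are open 9:00-18:00, Monday to Friday.",
    "delivery": "Delivery takes 2-3 business days.",
}

def handle_message(text: str) -> str:
    for keyword, answer in LOCAL_ANSWERS.items():
        if keyword in text.lower():
            return answer          # no API request spent
    return call_api(text)          # only unknown questions cost a request

def call_api(text: str) -> str:    # placeholder for a real API call
    return f"(API answer for: {text})"

print(handle_message("What are your working hours?"))
```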

2. Use caching

If data doesn't change often (e.g., product list), cache it for 5–10 minutes.
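A time-based cache can be sketched in a few lines. Here `fetch_products` is a stand-in for a real API call, and the 300-second TTL matches the 5-minute suggestion above:

```python
import time

# Sketch: cache an expensive API result for a fixed number of seconds.
# fetch_products is a stand-in for a real API call.

_cache = {}  # key -> (timestamp, value)

def cached(key, fetch, ttl=300):
    """Return the cached value if younger than ttl seconds, else refetch."""
    now = time.time()
    if key in _cache:
        ts, value = _cache[key]
        if now - ts < ttl:
            return value
    value = fetch()
    _cache[key] = (now, value)
    return value

calls = 0
def fetch_products():
    global calls
    calls += 1
    return ["chair", "table"]

cached("products", fetch_products)
cached("products", fetch_products)   # served from cache, no second fetch
print(calls)  # 1 -> only one real request was made
```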

3. Use batch requests

Some APIs support bulk requests (e.g., read 100 clients in one request instead of 100 separate requests).
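The saving comes from chunking your items to the batch size the API allows. A sketch, assuming a hypothetical limit of 100 items per request:

```python
# Sketch: split 250 client IDs into batches of 100, so 3 requests
# replace 250 individual ones (the real batch size is in the API docs).

def batches(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]

client_ids = list(range(250))
groups = batches(client_ids, 100)
print(len(groups))      # 3 requests instead of 250
print(len(groups[-1]))  # the last batch carries the remaining 50 IDs
```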

4. Add delays (throttling)

If the API allows 10 requests per second, add a 100 ms delay between requests.
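A minimal throttler just remembers when the last request went out and sleeps for the remainder of the interval. A sketch:

```python
import time

# Sketch: keep at most `rate_per_second` requests per second by
# sleeping between calls (10 req/s -> a 0.1 s minimum interval).

class Throttler:
    def __init__(self, rate_per_second: float):
        self.interval = 1.0 / rate_per_second   # e.g. 10/s -> 0.1 s
        self.last = 0.0

    def wait(self):
        elapsed = time.monotonic() - self.last
        if elapsed < self.interval:
            time.sleep(self.interval - elapsed)
        self.last = time.monotonic()

throttle = Throttler(10)          # limit: 10 requests per second
for _ in range(3):
    throttle.wait()
    # ... make the API request here ...
```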

5. Handle errors

If the API returns "Rate limit exceeded", wait the specified time and retry.
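A retry loop for this can be sketched as follows; `fake_api` stands in for a real endpoint and is scripted to fail twice with a rate-limit error before succeeding, and the short `wait_seconds` is only to keep the example fast (in production you would wait the time the API asked for):

```python
import time

# Sketch: retry after a rate-limit error. fake_api stands in for a real
# endpoint; it fails twice with a rate-limit error, then succeeds.

attempts = 0
def fake_api():
    global attempts
    attempts += 1
    if attempts < 3:
        raise RuntimeError("Rate limit exceeded. Try again in 60 seconds.")
    return {"status": "ok"}

def call_with_retry(func, max_retries=5, wait_seconds=0.01):
    for attempt in range(max_retries):
        try:
            return func()
        except RuntimeError as err:
            if "Rate limit" not in str(err) or attempt == max_retries - 1:
                raise
            time.sleep(wait_seconds)  # in production: wait the time the API asked for

print(call_with_retry(fake_api))  # {'status': 'ok'} after two retries
```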

Request Costs#

Many APIs are paid. Cost depends on:

  • number of requests (e.g., $0.01 per 1000 requests)
  • data volume (e.g., $0.02 per 1 GB transferred)
  • resource usage (e.g., OpenAI charges per token count)

Example: OpenAI API (2026)

  • GPT-5.2 Pro: $0.015 per 1000 input tokens, $0.045 per 1000 output tokens
  • GPT-5.2: $0.01 per 1000 input tokens, $0.03 per 1000 output tokens
  • GPT-4o mini: $0.00015 per 1000 input tokens, $0.0006 per 1000 output tokens (roughly 50–70x cheaper than GPT-5.2!)

Alternatives (Chinese models, 2026):

  • DeepSeek-R1: $0.003 per 1000 input tokens (~3x cheaper than GPT-5.2)
  • GLM-4.5: $0.004 per 1000 input tokens
  • Kimi K2: $0.005 per 1000 input tokens

What is a token? Roughly 4 characters of English text. The phrase "Hello, how are you?" is ~5 tokens.

Example calculation:

  • User writes 100 characters (~25 input tokens)
  • Agent replies 400 characters (~100 output tokens)
  • Cost (GPT-5.2): (25 / 1000) × $0.01 + (100 / 1000) × $0.03 = $0.00025 + $0.003 = $0.00325 (~$0.003 per conversation)
  • Cost (DeepSeek-R1, assuming output tokens are priced like input): (25 / 1000) × $0.003 + (100 / 1000) × $0.003 = $0.000075 + $0.0003 ≈ $0.0004 per conversation (~9x cheaper)
  • At 1000 conversations per month → GPT-5.2: ~$3.25/month, DeepSeek: ~$0.40/month
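The same estimate can be written as a small helper: convert characters to tokens with the ~4-characters-per-token rule of thumb, then multiply by the per-1000-token prices. A sketch:

```python
# Sketch: estimate tokens from character count (~4 chars per token,
# a rough rule of thumb) and compute cost per conversation.

def estimate_tokens(text_chars: int) -> float:
    return text_chars / 4

def conversation_cost(in_chars, out_chars, in_price, out_price):
    """Prices are in dollars per 1000 tokens."""
    in_tokens = estimate_tokens(in_chars)
    out_tokens = estimate_tokens(out_chars)
    return in_tokens / 1000 * in_price + out_tokens / 1000 * out_price

# 100-char question, 400-char answer, GPT-5.2 prices from the table above
cost = conversation_cost(100, 400, in_price=0.01, out_price=0.03)
print(round(cost, 5))  # 0.00325
```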

How to Control Costs#

1. Set limits in the service settings

Many services let you set a spending limit (e.g., "no more than $50 per month").

2. Monitor usage

Check API usage stats once a week.
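Beyond the service dashboard, the agent itself can keep a running count and refuse to spend past a budget. A minimal in-process sketch (the $50 cap is just the example figure from above):

```python
# Sketch: a minimal in-process usage tracker, so the agent can stop
# itself before exceeding a monthly budget (the $50 cap is an example).

class UsageTracker:
    def __init__(self, monthly_budget: float):
        self.budget = monthly_budget
        self.spent = 0.0
        self.requests = 0

    def record(self, cost: float):
        self.requests += 1
        self.spent += cost

    def can_spend(self, cost: float) -> bool:
        return self.spent + cost <= self.budget

tracker = UsageTracker(monthly_budget=50.0)
tracker.record(0.00325)            # one conversation at GPT-5.2 prices
print(tracker.requests, round(tracker.spent, 5))
```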

3. Use cheaper models

If GPT-5.2 is too expensive → use GPT-4o mini (~50–70x cheaper) or Chinese models DeepSeek/GLM (~2.5–3x cheaper on input tokens).

4. Optimize prompts

Shorter prompts → fewer tokens → lower cost.