AI Agents for Non-Techies

Lesson 3. Monitoring: How to Keep an Eye on Agent Health

Goal: set up simple monitoring so you know when the agent is down or behaving incorrectly.

What Is Monitoring

Monitoring is continuously watching how the agent works so you notice problems in time:

agent stopped responding
agent returns errors
agent is slow
agent exceeded API limits

Basic Metrics to Monitor

1. Uptime (availability)

Percentage of time the agent is working.

Example:
If the agent worked 23 hours out of 24 → Uptime = 95.8%

Target: aim for 99%+ (less than 1% downtime)
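The arithmetic behind this metric can be sketched in a few lines of Python. The function names are illustrative only and are not part of any monitoring tool; the second function shows how much downtime a given target actually allows.

```python
def uptime_percent(hours_up: float, hours_total: float) -> float:
    """Share of the period during which the agent was working, in percent."""
    if hours_total <= 0:
        raise ValueError("hours_total must be positive")
    return hours_up * 100 / hours_total


def allowed_downtime_minutes(target_percent: float, period_hours: float = 24) -> float:
    """Downtime budget, in minutes, that still meets the uptime target."""
    return period_hours * 60 * (100 - target_percent) / 100


print(round(uptime_percent(23, 24), 1))        # 95.8
print(round(allowed_downtime_minutes(99), 1))  # 14.4 minutes per day
```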

2. Response Time

How long the agent takes to process a request.

Example:
User asked a question → agent replied in 3 seconds → Response Time = 3s

Target: under 5 seconds for text requests
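If you are curious how response time is measured in practice, here is a minimal Python sketch. It assumes the agent can be called as an ordinary function; `fake_agent` is a hypothetical stand-in that just waits a moment, and `timed_call` is an illustrative helper name.

```python
import time


def timed_call(handler, *args):
    """Run handler(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = handler(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_agent(question: str) -> str:
    # Hypothetical stand-in for a real agent call.
    time.sleep(0.05)
    return f"Answer to: {question}"


reply, seconds = timed_call(fake_agent, "What are your opening hours?")
print(f"Replied in {seconds:.2f}s, within the 5-second target: {seconds < 5}")
```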

3. Error Rate

Percentage of requests that ended in error.

Example:
Out of 100 requests, 5 failed → Error Rate = 5%

Target: under 1% (more than 99% of requests succeed)
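As a rough sketch, the same calculation in Python might look like this. It assumes each logged request has a status of either "failed" or something else; the value "success" is an assumption about how your log is written.

```python
def error_rate_percent(statuses: list[str]) -> float:
    """Percentage of requests whose status is 'failed'."""
    if not statuses:
        return 0.0
    failed = sum(1 for status in statuses if status == "failed")
    return failed * 100 / len(statuses)


# Made-up sample: 100 requests, 5 of them failed.
statuses = ["success"] * 95 + ["failed"] * 5
print(error_rate_percent(statuses))  # 5.0
```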

4. Request Rate

How many requests the agent handles per hour / day.

Example:
Agent processed 500 requests in a day → Request Rate = 500/day

Target: no fixed number; track growth over time (if requests spike, you may need to scale up or raise your API limits)
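To see request rate per hour rather than per day, one option is to group request timestamps by clock hour. The sketch below assumes you have a list of timestamps from your log; the dates are made-up sample data and the function name is illustrative.

```python
from collections import Counter
from datetime import datetime


def requests_per_hour(timestamps: list[datetime]) -> Counter:
    """Count requests in each clock hour."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )


# Made-up sample data: two requests in the 09:00 hour, one in the 10:00 hour.
timestamps = [
    datetime(2024, 5, 1, 9, 5),
    datetime(2024, 5, 1, 9, 40),
    datetime(2024, 5, 1, 10, 15),
]
for hour, count in sorted(requests_per_hour(timestamps).items()):
    print(hour.strftime("%Y-%m-%d %H:00"), count)
```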

How to Set Up Simple Monitoring

Option 1: Uptime Monitoring (for webhook bots)

If your agent works via webhook (e.g., Telegram bot on n8n), use an availability checker:

UptimeRobot (free for up to 50 monitors)
Pingdom (paid, more powerful)
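Healthchecks.io (simple, with a free plan; it works the other way around: your workflow pings it on a schedule, and it alerts you when the pings stop)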

How it works:

You give the service your webhook URL
The service sends a test request every 5 minutes
If the webhook doesn't respond → the service sends you a notification (email, SMS, Telegram)
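For the curious, here is roughly what a single check looks like, as a minimal Python sketch. The function names and the example.com URL are placeholders, not part of UptimeRobot or any other service, and a real service does more than this. A scheduler would call `check_and_notify` every 5 minutes.

```python
import http.client
import urllib.request


def is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # Unreachable server, timeout, HTTP 4xx/5xx, or a malformed URL.
        return False


def check_and_notify(url: str, notify, probe=is_up) -> bool:
    """Probe the URL once and call notify(message) if it is down."""
    up = probe(url)
    if not up:
        notify(f"Agent is not responding: {url}")
    return up


# Offline demonstration with a fake probe that always reports "down".
check_and_notify("https://example.com/webhook", print, probe=lambda url: False)
```

In real use you would drop the `probe` argument so the default `is_up` sends an actual request, and replace `print` with a function that sends an email or Telegram message.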

Setup in UptimeRobot:

Sign up at uptimerobot.com
Add a new monitor: your webhook URL, type: HTTP(s)
Configure notifications (email or Telegram)
Save

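You'll now get notified if the agent stops responding. One caveat: uptime checkers typically send a plain GET or HEAD request, so if your webhook accepts only POST, the monitor may report it as down even when the agent is healthy. If that happens, point the monitor at a URL that does answer such requests.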

Option 2: Log-Based Monitoring

If you log agent actions in Google Sheets / Airtable, set up automatic checks:

Example: error threshold notification

Logic:

Every hour (or once a day) a workflow runs (Zapier / n8n)
Workflow reads logs from the last hour
Counts errors (Status = failed)
If errors > 10 → sends a Telegram notification: "Attention! 15 errors in the last hour. Check your agent."
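The logic above can be sketched in Python as follows. This is an illustration, not the n8n implementation: it assumes each log row is a dictionary with a `Status` column (as in the steps above) and a `Timestamp` column, which is an assumed name. The sample rows are made up.

```python
from datetime import datetime, timedelta
from typing import Optional

ERROR_THRESHOLD = 10  # alert when there are more errors than this in one hour


def build_alert(rows: list[dict], now: datetime) -> Optional[str]:
    """Return an alert message if failures in the last hour exceed the threshold."""
    cutoff = now - timedelta(hours=1)
    errors = sum(
        1
        for row in rows
        if row["Status"] == "failed" and cutoff <= row["Timestamp"] <= now
    )
    if errors > ERROR_THRESHOLD:
        return f"Attention! {errors} errors in the last hour. Check your agent."
    return None


# Made-up sample: 15 failures in the last hour plus one that is too old to count.
now = datetime(2024, 5, 1, 12, 0)
rows = [
    {"Timestamp": now - timedelta(minutes=i), "Status": "failed"}
    for i in range(1, 16)
]
rows.append({"Timestamp": now - timedelta(hours=3), "Status": "failed"})
print(build_alert(rows, now))
# Attention! 15 errors in the last hour. Check your agent.
```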

Implementation in n8n:

Trigger: Schedule Trigger, called Cron in older n8n versions (every hour)
Action 1: Google Sheets → Read (logs from last hour)
Action 2: Code node, called Function in older n8n versions (count Status = failed)
Action 3: IF (if errors > 10)
Action 4 (true): Telegram → Send Message ("Attention! ...")

Option 3: Built-in Platform Tools

Many platforms have built-in monitoring:

Zapier: Zap History (formerly Task History) + Email Alerts (Zapier sends email on error)
Make: History + Notifications
n8n: Error Workflow (a workflow that runs when an error occurs)

Example: Error Workflow in n8n

Create a new workflow with an Error Trigger
Add a Telegram → Send Message node
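Configure the message: "Error in workflow {{ $json.workflow.name }}. Details: {{ $json.execution.error.message }}"
Save and activate
In each workflow you want to watch, open its settings and select this workflow as the Error Workflow

Now you'll get a Telegram notification whenever one of those workflows fails during an automatic run. Manual test runs do not trigger the Error Workflow.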

What to Do When Monitoring Shows a Problem

Problem 1: agent not responding (Uptime = 0%)

Possible causes:

server down (if self-hosted)
account balance depleted (if cloud platform)
webhook broken (wrong URL, expired SSL certificate)

What to do:

Check server / platform status
Check account balance
Check webhook (send a test request manually)
Restart the workflow / bot
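Sending a test request manually can be done from a browser or a tool like curl, or with a short script. The Python sketch below is one way to do it and to translate the failure into the likely causes listed above. It assumes the webhook accepts a POST with an empty JSON body, which may not match your setup; the function names are illustrative.

```python
import ssl
import urllib.error
import urllib.request


def classify_failure(exc: Exception) -> str:
    """Map a failed request to one of the likely causes."""
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 404:
            return "wrong URL (404 Not Found)"
        return f"webhook reachable but returned HTTP {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        if isinstance(exc.reason, ssl.SSLError):
            return "SSL problem (for example an expired certificate)"
        return f"server unreachable: {exc.reason}"
    return f"unexpected error: {exc}"


def send_test_request(url: str, payload: bytes = b"{}") -> str:
    """Send one POST request and describe the outcome in plain words."""
    try:
        request = urllib.request.Request(
            url,
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            return f"OK (HTTP {response.status})"
    except Exception as exc:  # broad on purpose: this is a diagnostic helper
        return classify_failure(exc)
```

Calling `send_test_request` with your own webhook URL returns a one-line description you can act on, such as an HTTP status or a note that the server is unreachable.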

Problem 2: high Error Rate (>5%)

Possible causes:

API issues (rate limit exceeded, API unavailable)
invalid data (e.g., wrong email format)
logic error in the agent

What to do:

Open logs, find errors
Check ErrorMessage
Fix the issue (increase limits, fix data, fix logic)
Test
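When there are many errors, it helps to group them by message so you fix the most common cause first. Here is a minimal Python sketch, assuming each log row has `Status` and `ErrorMessage` columns; the sample rows and the helper name are made up for illustration.

```python
from collections import Counter


def top_errors(rows: list[dict], limit: int = 3) -> list[tuple[str, int]]:
    """Return the most frequent ErrorMessage values among failed rows."""
    messages = Counter(
        row["ErrorMessage"] for row in rows if row["Status"] == "failed"
    )
    return messages.most_common(limit)


# Made-up sample log rows.
rows = [
    {"Status": "failed", "ErrorMessage": "429 Too Many Requests"},
    {"Status": "failed", "ErrorMessage": "429 Too Many Requests"},
    {"Status": "failed", "ErrorMessage": "Invalid email format"},
    {"Status": "success", "ErrorMessage": ""},
]
for message, count in top_errors(rows):
    print(count, message)
```

In this sample the rate-limit error is the most frequent, so increasing limits or slowing down requests would be the first thing to try.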

Problem 3: slow Response Time (>10 seconds)

Possible causes:

slow API (e.g., OpenAI overloaded)
too many steps in the workflow
no caching

What to do:

Measure time for each step (in n8n this is visible in Executions)
Find the slowest step
Optimize (caching, faster API, parallel requests)
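The two ideas above, finding the slowest step and caching, can be sketched in Python like this. The step names and durations are made-up sample values, and `cached_lookup` is a hypothetical stand-in for a slow API call.

```python
import time
from functools import lru_cache


def slowest_step(durations: dict[str, float]) -> tuple[str, float]:
    """Return the (step name, seconds) pair with the longest duration."""
    return max(durations.items(), key=lambda item: item[1])


@lru_cache(maxsize=256)
def cached_lookup(question: str) -> str:
    """Answer a question; repeated questions are served from the cache."""
    time.sleep(0.1)  # stands in for a slow API call
    return f"Answer to: {question}"


# Made-up per-step timings, in seconds.
durations = {"Read sheet": 0.4, "Call LLM API": 8.2, "Send Telegram message": 0.3}
print(slowest_step(durations))  # ('Call LLM API', 8.2)

cached_lookup("opening hours")  # slow: performs the call
cached_lookup("opening hours")  # fast: served from the cache
print(cached_lookup.cache_info().hits)  # 1
```

Caching only suits questions whose answers stay the same for a while; for anything that changes from request to request, it would return stale replies.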