Skip to main content

Agent Monitoring

Health status

Each endpoint and agent has a computed health_status field based on the time since the last report:

StatusConditionDefault threshold
onlineLast seen < 15 minutes ago15 min
degradedLast seen 15–60 minutes ago60 min
offlineLast seen > 60 minutes ago

Thresholds are configurable via Settings → Notifications in the dashboard or via PUT /api/settings/notifications.

Offline alerts

The server's Health Monitor runs every 5 minutes and:

  1. Finds all agents with health_status = offline
  2. Creates an agent_offline alert for each (deduplicated — only one per agent)
  3. Sends notification via email/webhook if configured
  4. Automatically resolves the alert when the agent reconnects

Dashboard indicators

  • Endpoints page: Red offline badge, yellow degraded badge next to affected endpoints
  • Agents page: Same badges, plus "Show offline only" filter
  • Dashboard overview: Orange banner with count of offline endpoints
  • Navbar: Pending approvals badge (refreshes every 60s)

Alerting workflow

  1. Agent stops sending reports (e.g. workstation shut down, network issue)
  2. Health Monitor detects last_seen > offline_threshold
  3. Alert created: type=agent_offline, severity=high
  4. Email/webhook notification sent (if configured, subject to dedup window)
  5. Admin sees alert in dashboard under Alerts → Open
  6. Admin acknowledges or resolves the alert
  7. Agent reconnects → Health Monitor resolves the alert automatically
  8. Resolved alerts remain visible under Alerts → Resolved for audit purposes

Checking server logs

# Follow server logs
docker compose -f deploy/docker-compose.community.yml logs edr-backend -f

# Filter for health monitor events
docker compose -f deploy/docker-compose.community.yml logs edr-backend | grep health_monitor

# Follow agent logs (systemd)
journalctl -u sielum-agent -f

Key log messages

Log messageMeaning
agent connectedgRPC connection established
heartbeat receivedHeartbeat from agent processed
report receivedFull telemetry report processed
policy violationAlert generated
health_monitor: agent offlineOffline alert created
health_monitor: agent reconnectedAlert resolved