Agent Monitoring
Health status
Each endpoint and agent has a computed `health_status` field based on the time since its last report:

| Status | Condition | Default upper bound |
|---|---|---|
| `online` | Last seen less than 15 minutes ago | 15 min |
| `degraded` | Last seen 15–60 minutes ago | 60 min |
| `offline` | Last seen more than 60 minutes ago | n/a |

Both thresholds are configurable via Settings → Notifications in the dashboard or via `PUT /api/settings/notifications`.
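The classification in the table can be sketched as a small shell function. This is an illustration under stated assumptions, not the server's implementation: `last_seen` is taken to be a Unix timestamp in seconds, the function name and argument order are hypothetical, and an age of exactly 15 minutes counts as `degraded` while exactly 60 minutes still counts as `degraded`, matching the table's "less than" and "more than" bounds.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify an agent from its last report time.
# Args: last_seen (epoch s), now (epoch s),
#       [degraded_after_min=15], [offline_after_min=60]
health_status() {
  local last_seen=$1 now=$2
  local degraded_after=$(( ${3:-15} * 60 ))  # minutes -> seconds
  local offline_after=$(( ${4:-60} * 60 ))   # minutes -> seconds
  local age=$(( now - last_seen ))
  if (( age < degraded_after )); then
    echo online
  elif (( age <= offline_after )); then
    echo degraded
  else
    echo offline
  fi
}

now=$(date +%s)
health_status "$(( now - 20 * 60 ))" "$now"   # prints "degraded"
```

Passing the optional third and fourth arguments plays the role of changing the thresholds in the notification settings.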
Offline alerts
The server's Health Monitor runs every 5 minutes and:
- Finds all agents with `health_status = offline`
- Creates an `agent_offline` alert for each (deduplicated, so only one per agent)
- Sends a notification via email/webhook, if configured
- Automatically resolves the alert when the agent reconnects
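The steps above can be modelled as one pass over in-memory state. This is a hypothetical sketch, not the server's code: `LAST_SEEN`, `OPEN_ALERT`, and `monitor_pass` are illustrative names, notification is reduced to a printed line, and it assumes that any report newer than the offline threshold counts as a reconnect. It requires bash 4+ for associative arrays.

```bash
#!/usr/bin/env bash
# Hypothetical in-memory sketch of one Health Monitor pass (bash 4+).
declare -A LAST_SEEN=()    # agent id -> epoch seconds of last report
declare -A OPEN_ALERT=()   # agent id -> "1" while an agent_offline alert is open

OFFLINE_THRESHOLD_S=$(( 60 * 60 ))   # default offline threshold: 60 min

monitor_pass() {
  local now=$1 agent age
  for agent in "${!LAST_SEEN[@]}"; do
    age=$(( now - ${LAST_SEEN[$agent]} ))
    if (( age > OFFLINE_THRESHOLD_S )); then
      # Deduplicate: only create an alert if none is open for this agent.
      if [[ -z ${OPEN_ALERT[$agent]:-} ]]; then
        OPEN_ALERT[$agent]=1
        echo "create agent_offline alert for $agent; notify if configured"
      fi
    elif [[ -n ${OPEN_ALERT[$agent]:-} ]]; then
      # Assumption: a recent report counts as a reconnect, so resolve.
      OPEN_ALERT[$agent]=""
      echo "resolve agent_offline alert for $agent"
    fi
  done
}

# A shell loop with the same 5-minute cadence would be:
# while true; do monitor_pass "$(date +%s)"; sleep 300; done
```

Keeping the open-alert marker per agent is what makes repeated passes idempotent: an agent that stays offline across several runs produces a single alert.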
Dashboard indicators
- Endpoints page: red `offline` badge and yellow `degraded` badge next to affected endpoints
- Agents page: the same badges, plus a "Show offline only" filter
- Dashboard overview: orange banner showing the number of offline endpoints
- Navbar: pending approvals badge (refreshes every 60 s)
Alerting workflow
- Agent stops sending reports (e.g., the workstation is shut down or there is a network issue)
- Health Monitor detects `last_seen > offline_threshold`
- Alert created: `type=agent_offline`, `severity=high`
- Email/webhook notification sent (if configured, subject to the notification dedup window)
- Admin sees the alert in the dashboard under Alerts → Open
- Admin acknowledges or resolves the alert
- Agent reconnects → Health Monitor resolves the alert automatically
- Resolved alerts remain visible under Alerts → Resolved for audit purposes
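The lifecycle above can be modelled as a small state machine. This is a hypothetical sketch: the state names (`open`, `acknowledged`, `resolved`), the event names, and `next_alert_state` are illustrative, and it assumes that an acknowledged alert is still auto-resolved on reconnect and that a resolved alert is never reopened (a later outage would raise a new alert instead).

```bash
#!/usr/bin/env bash
# Hypothetical agent_offline alert lifecycle.
# Args: current state, event; prints the next state.
next_alert_state() {
  local state=$1 event=$2
  case "$state:$event" in
    open:acknowledge)
      echo acknowledged ;;
    open:resolve|acknowledged:resolve|open:agent_reconnected|acknowledged:agent_reconnected)
      echo resolved ;;
    resolved:*)
      echo resolved ;;   # assumption: kept for audit, never reopened
    *)
      echo "$state" ;;   # events that do not apply leave the state unchanged
  esac
}

state=open
state=$(next_alert_state "$state" acknowledge)
state=$(next_alert_state "$state" agent_reconnected)
echo "$state"   # prints "resolved"
```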
Checking server logs
```bash
# Follow server logs
docker compose -f deploy/docker-compose.community.yml logs edr-backend -f

# Filter for health monitor events
docker compose -f deploy/docker-compose.community.yml logs edr-backend | grep health_monitor

# Follow agent logs (systemd)
journalctl -u sielum-agent -f
```
Key log messages
| Log message | Meaning |
|---|---|
| `agent connected` | gRPC connection established |
| `heartbeat received` | Heartbeat from agent processed |
| `report received` | Full telemetry report processed |
| `policy violation` | Policy violation detected; alert generated |
| `health_monitor: agent offline` | Offline alert created |
| `health_monitor: agent reconnected` | Agent reconnected; offline alert resolved |
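To get a quick count of Health Monitor events rather than reading them line by line, the backend log output can be piped through a small awk filter. This sketch assumes each event appears on a single line containing the literal message text from the table; `health_event_counts` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Count Health Monitor offline/reconnect events in log text read from stdin.
# Assumes one event per line containing the literal message text.
health_event_counts() {
  awk '
    /health_monitor: agent offline/     { offline++ }
    /health_monitor: agent reconnected/ { reconnected++ }
    END { printf "offline=%d reconnected=%d\n", offline, reconnected }
  '
}

# Example against the Community deployment:
# docker compose -f deploy/docker-compose.community.yml logs edr-backend | health_event_counts
```

A persistent gap between the two counts suggests agents that went offline and have not reconnected yet.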