Overview
Phala Cloud CVMs expose Prometheus-compatible /metrics endpoints from two sources: the built-in dstack-guest-agent (system-level metrics on port 8090) and individual services like dstack-kms (business metrics on their own ports). You can integrate Datadog by adding a Datadog Agent container as a sidecar in your Docker Compose file. No application code changes are needed.
This guide covers the guest-agent integration (CPU, memory, disk) first, then shows how to extend the pattern to any service that exposes Prometheus metrics, using dstack-kms as a concrete example.
Prerequisites
- A Datadog account with an API Key
- The Datadog site for your account (e.g., us5.datadoghq.com, datadoghq.com, eu.datadoghq.com)
- Your CVM deployed with --public-sysinfo enabled (default: true) for guest-agent metrics
- Each service must enable its own /metrics endpoint (e.g., core.metrics.enabled = true in KMS)
Step 1: Add Datadog Agent to Your Docker Compose
Add a datadog-agent service to your docker-compose.yml.
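A minimal sketch of such a service follows. The image tag, the placeholder API key, the chosen site, and the host mounts are assumptions based on common Datadog Agent container setups rather than values confirmed by this guide, so adjust them to your account and CVM.

```yaml
services:
  datadog-agent:
    # Assumed image tag; pin the version you have tested.
    image: gcr.io/datadoghq/agent:7
    # Host networking lets the agent reach the guest-agent on port 8090.
    network_mode: host
    environment:
      # Placeholder; supply your real API key.
      - DD_API_KEY=<your-api-key>
      # Use the Datadog site that matches your account.
      - DD_SITE=us5.datadoghq.com
      - DD_LOGS_ENABLED=true
      - DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true
    volumes:
      # Assumed mounts commonly used for container discovery and host metrics.
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
    restart: unless-stopped
```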
Understanding network_mode: host
The network_mode: host setting puts the Datadog Agent directly on the CVM’s network stack. This is required for scraping dstack-guest-agent because it runs as a systemd service on port 8090 — not inside Docker. Without host networking, the agent can’t reach port 8090 at all.
But this rule applies only to systemd-level services. If your scrape target is another Docker container (like KMS or any application you deployed in the compose file), you have two options:
- Option A: Bridge network. Remove network_mode: host from the agent. Both containers share the default compose network, so the agent can reach your service via Docker DNS (https://kms:8000/metrics). This avoids the host’s iptables NAT and keeps configuration simpler.
- Option B: Host network. Keep network_mode: host and use the host-mapped port (https://127.0.0.1:12001/metrics). This works for standard CVMs but can fail on TDX CVMs due to kernel-level iptables differences.
Step 2: Configure OpenMetrics Check for Guest-Agent Metrics
The Datadog Agent collects container logs and host metrics automatically. To collect custom Prometheus metrics, you tell the agent where to scrape them via a conf.yaml file.
The dstack-guest-agent endpoint is at http://127.0.0.1:8090/metrics. Create conf.d/openmetrics.d/conf.yaml in your project to point the agent at it.
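A minimal sketch of that file is shown here. It assumes the current OpenMetrics check options (openmetrics_endpoint, namespace, metrics) and collects every metric with a catch-all pattern; narrow the pattern if you only need a subset.

```yaml
# conf.d/openmetrics.d/conf.yaml
init_config:

# instances sits at the root indentation level, not under init_config.
instances:
  - openmetrics_endpoint: http://127.0.0.1:8090/metrics
    namespace: "dstack"
    metrics:
      # Assumed catch-all pattern; replace with specific metric names if preferred.
      - ".*"
```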
The namespace: "dstack" setting puts a prefix in front of every collected metric, so system_uptime becomes dstack.system_uptime in Datadog.
The most common YAML trap: instances must be a top-level key. If you nest it under init_config, the check loads but silently finds zero valid instances. Any layout works as long as instances sits at the file’s root indentation level. An empty init_config: on its own line is harmless, but instances must never be indented under it.
Step 3: Deploy to CVM
CVMs have a read-only filesystem. The only writable path is /var/volatile/dstack/persistent/. Your conf.yaml must go there, then get mounted into the agent container.
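The mount could look like the fragment below. The datadog/ subdirectory under the persistent path is a hypothetical layout chosen for illustration; only the persistent root and the agent's conf.d/openmetrics.d/ directory come from this guide.

```yaml
services:
  datadog-agent:
    volumes:
      # Hypothetical source path under the writable persistent directory,
      # mounted read-only at the location where the agent loads the check.
      - /var/volatile/dstack/persistent/datadog/conf.d/openmetrics.d/conf.yaml:/etc/datadog-agent/conf.d/openmetrics.d/conf.yaml:ro
```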
Alternative: Embed Config in the Agent’s Command
When you can’t or don’t want to use volume mounts (for instance, when your config is generated by another container on a shared volume), you can have the Datadog Agent write its own conf.yaml at startup. You do this in the agent’s command in your compose file.
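One way to sketch this is a shell wrapper that writes the file with printf and then hands control to the agent. The final exec line assumes the image's default start command is /bin/entrypoint.sh, which you should verify against the agent image version you run.

```yaml
services:
  datadog-agent:
    command:
      - sh
      - "-c"
      - |
        mkdir -p /etc/datadog-agent/conf.d/openmetrics.d
        printf 'init_config:\n\ninstances:\n  - openmetrics_endpoint: http://127.0.0.1:8090/metrics\n    namespace: dstack\n    metrics:\n      - ".*"\n' > /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml
        # Assumption: this is the image's default start command.
        exec /bin/entrypoint.sh
```

Keeping the whole config in a single printf format string means the generated file's indentation depends only on the explicit spaces after each \n, not on how the compose file itself is indented.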
Step 4: Verify
Check Agent Status
If SSH is available, run agent status inside the Datadog Agent container and look for an openmetrics check showing [OK] with a non-zero metric sample count.
Verify in Datadog Dashboard
- Open Datadog at <your-site>.datadoghq.com
- Go to Metrics > Explorer
- Search for dstack.system_uptime to confirm guest-agent metrics are flowing
- Go to Logs and filter by source:nginx (or your service name) to confirm logs
Integrating Custom Service Metrics
The same pattern works for any service that exposes a Prometheus /metrics endpoint. Here’s the concrete setup for dstack-kms; the patterns apply to dstack-gateway, dstack-vmm, or your own services.
Prerequisite: Enable the Metrics Endpoint
Each service controls its /metrics endpoint via its own configuration. For KMS, set core.metrics.enabled = true in kms.toml.
Docker Compose Setup
Since KMS runs as a Docker container, not a systemd service, we put the Datadog Agent on the bridge network and use Docker DNS to reach it. No host networking needed.
- No network_mode: host. The agent talks to KMS via Docker DNS (kms:8000), using the container’s internal port, not the host-mapped one.
- tls_verify: false because KMS uses a self-signed certificate. For production, switch to a trusted CA and set this to true.
- namespace: dstack_kms to prevent metric name collisions with the guest-agent’s dstack.* namespace.
- conf.yaml is generated inline with printf instead of mounted from a file. This avoids cross-container volume issues.
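Put together, the points above could be sketched as the compose file below. The KMS image reference is a hypothetical placeholder, the agent image tag and environment values are assumptions, and the exec line assumes /bin/entrypoint.sh is the agent image's default start command.

```yaml
services:
  kms:
    # Hypothetical placeholder; use the KMS image you actually deploy.
    image: <your-kms-image>
    ports:
      # Host port 12001 maps to the container's internal port 8000.
      - "12001:8000"

  datadog-agent:
    # Assumed image tag.
    image: gcr.io/datadoghq/agent:7
    # No network_mode: host, so the agent joins the default compose network.
    environment:
      - DD_API_KEY=<your-api-key>
      - DD_SITE=us5.datadoghq.com
    command:
      - sh
      - "-c"
      - |
        mkdir -p /etc/datadog-agent/conf.d/openmetrics.d
        printf 'init_config:\n\ninstances:\n  - openmetrics_endpoint: https://kms:8000/metrics\n    namespace: dstack_kms\n    tls_verify: false\n    metrics:\n      - ".*"\n' > /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml
        # Assumption: this is the image's default start command.
        exec /bin/entrypoint.sh
```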
KMS Metrics Reference
| Metric | Type | Description |
|---|---|---|
| dstack_kms_attestation_requests_total | counter | Total attestation requests handled |
| dstack_kms_attestation_failures_total | counter | Failed attestation requests |
Available Guest-Agent Metrics
The dstack-guest-agent exposes 19 system-level metrics. All appear under the dstack.* namespace in Datadog.
System metrics: system_os_name, system_os_version, system_kernel_version, system_cpu_model, system_num_cpus, system_uptime, system_load_average_1m, system_load_average_5m, system_load_average_15m
Memory metrics: system_memory_total, system_memory_available, system_memory_used, system_memory_free, system_swap_total, system_swap_used
Disk metrics: disk_total_size, disk_free_size, disk_used_size, disk_usage_percentage
Troubleshooting
Metrics: Only seeing default Datadog metrics, not your service’s
Your OpenMetrics check isn’t loading. The most common cause is YAML formatting. Double-check that instances is a top-level key in conf.yaml (see the format examples in Step 2).
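As a quick reference, the sketch below contrasts the broken nesting (commented out) with a layout the check can load. The endpoint and namespace are the guest-agent values from this guide; the catch-all metrics pattern is an assumption.

```yaml
# Wrong: instances is nested under init_config, so no instances are found.
# init_config:
#   instances:
#     - openmetrics_endpoint: http://127.0.0.1:8090/metrics

# Right: instances sits at the file's root indentation level.
init_config:

instances:
  - openmetrics_endpoint: http://127.0.0.1:8090/metrics
    namespace: "dstack"
    metrics:
      - ".*"
```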
Other things to verify:
- Can you curl the metrics endpoint from outside the CVM? If curl https://<cvm-ip>:12001/metrics returns nothing, the service’s metrics endpoint isn’t running.
- Using network_mode: host? The agent might not reach a Docker container’s host-mapped port on TDX CVMs. Try removing host networking and switching to Docker DNS.
- On TDX CVMs, network_mode: host combined with container port mapping can fail silently due to kernel-level iptables rules. Switch to bridge networking when scraping other Docker containers.
conf.yaml in Docker Compose crashing the agent
If you embedded your config directly in a Docker Compose command: block using a heredoc (cat <<EOF), the YAML block scalar (|) might be pulling in unexpected indentation. This breaks both the compose file and the generated config.
Always use printf for inline YAML generation inside compose command: blocks. It produces clean output with no indentation surprises.
Volume-mounted config not updating
If you’re mounting the config file from a shared Docker volume that another container writes to, cp -r inside the agent’s command can silently create nested paths. When /etc/datadog-agent/conf.d/openmetrics.d/ already exists in the agent image, cp -r source_dir target_dir/ creates target_dir/source_dir/ instead of copying into the target.
Fix: always rm -rf the target directory before copying, or just use printf inline to avoid file sharing altogether.
No logs appearing in Datadog
The agent collects logs in tail mode, so it only picks up new entries after it starts. Generate some traffic to your application and logs should appear within seconds. If you disabled DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL, add Datadog log labels (com.datadoghq.ad.logs) to each container.
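A label of that kind might look like the fragment below. The nginx service and the source and service values are illustrative, borrowed from the source:nginx filter used in the verification step.

```yaml
services:
  nginx:
    labels:
      # Tells the agent to collect this container's logs and tag their source.
      com.datadoghq.ad.logs: '[{"source": "nginx", "service": "nginx"}]'
```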
Guest-agent /metrics returns “Service not found”
The CVM was deployed with --no-public-sysinfo. Redeploy with --public-sysinfo (the default).
Cannot mount config file (read-only file system)
CVMs have a read-only filesystem. Use /var/volatile/dstack/persistent/ for all config files and mount from there.

