Use this page when an SDX-managed site or service does not behave as expected. Start with the narrowest symptom, then work outward: first device reachability, then service configuration, and finally workflow or automation state.

First Checks

Before feature-specific troubleshooting, check:
  • The site exists in the expected workspace.
  • The site status and last heartbeat are current.
  • The management VPN is connected or recently connected.
  • The control plane policy still allows the required management services.
  • The fault log shows the same symptom you are investigating.
  • Recent scripts, workflows, or policy changes did not coincide with the issue.
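
The heartbeat check above can be sketched as a simple age comparison. This is a minimal illustration only: the `STALE_AFTER` window and the `heartbeat_is_current` helper are assumptions made for the example, since this page does not state how old a heartbeat may be before a site is treated as offline.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical window; the real threshold used by the platform is not
# documented on this page.
STALE_AFTER = timedelta(minutes=5)


def heartbeat_is_current(
    last_heartbeat: datetime,
    now: Optional[datetime] = None,
    stale_after: timedelta = STALE_AFTER,
) -> bool:
    """Return True when the last heartbeat is no older than stale_after."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_heartbeat <= stale_after
```

Pass timezone-aware timestamps for both values so the subtraction is well defined.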

Site Is Offline

  1. Check the site’s last heartbeat time.
  2. Confirm the local router has outbound internet access.
  3. Confirm outbound access to SDX endpoints is not blocked by an upstream firewall.
  4. Review recent WAN faults or ISP issues.
  5. If the router is reachable locally, check whether the SDX management interface still exists.
  6. Use Recreate Management VPN only when you have reason to believe the tunnel configuration is missing or corrupted.
The platform determines site health from router check-ins, so a site with working local LAN traffic can still appear offline if the management path is blocked.
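
To test outbound access from a host on the same network as the router, a plain TCP connection attempt is often enough to reveal an upstream firewall block. The sketch below uses only the Python standard library. The host and port you pass in are your own values; this page does not list the SDX endpoint names, so none are assumed here.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A True result only shows that the TCP handshake completed. It does not prove that the management tunnel itself can be established.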

Management VPN Is Missing Or Broken

  1. Open the site controls.
  2. Run Recreate Management VPN.
  3. If management firewall rules are also suspect, run Recreate Management Filter.
  4. Watch the orchestration or site job output.
  5. Confirm the site returns online and that management tasks work again.
If a newer MikroTik device returns the error "failure: not allowed by device-mode", enable advanced mode on the device before retrying the relevant setup action. The portal surfaces the RouterOS command when this condition applies.

Device Job Or Script Failed

  1. Open the scheduled script, workflow run, or site orchestration log.
  2. Find the target site outcome rather than relying only on the parent job status.
  3. Check whether the script depends on a RouterOS feature not present on that device.
  4. Confirm the altostrat-api user and control plane policy still permit required actions.
  5. Retry on one test site before relaunching a broad rollout.
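
Step 2 matters because a parent job status can hide individual site failures. The sketch below groups per-site outcomes from a job record. The record shape (a `targets` list with `site` and `status` keys) is hypothetical and chosen only for illustration; it is not the SDX API response format.

```python
from typing import Dict, List


def site_outcomes(job: dict) -> Dict[str, List[str]]:
    """Group target site names by their individual outcome."""
    grouped: Dict[str, List[str]] = {}
    for target in job.get("targets", []):
        grouped.setdefault(target["status"], []).append(target["site"])
    return grouped


# Hypothetical job record: the parent reports "completed" even though
# one target site failed.
job = {
    "status": "completed",
    "targets": [
        {"site": "branch-01", "status": "success"},
        {"site": "branch-02", "status": "failed"},
    ],
}
print(site_outcomes(job))
```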

WAN Failover Is Not Switching

  1. Confirm each WAN tunnel has the expected interface and gateway.
  2. Confirm priorities are saved in the intended order.
  3. Check live WAN health for packet loss, jitter, latency, and tunnel status.
  4. Review WAN faults for offline and recovery events.
  5. Test during a maintenance window by changing priority order or disconnecting the primary link.
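
When reasoning about which link should be active, it can help to write the expectation down explicitly. The sketch below picks the highest-priority link that passes a set of health limits. The limits, the `WanLink` fields, and the rule that a lower priority number is preferred are all assumptions for illustration; they do not describe how SDX itself decides when to fail over.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical health limits, not values taken from SDX.
MAX_LOSS_PCT = 5.0
MAX_LATENCY_MS = 250.0
MAX_JITTER_MS = 50.0


@dataclass
class WanLink:
    name: str
    priority: int  # assumption: lower number means more preferred
    tunnel_up: bool
    packet_loss_pct: float
    latency_ms: float
    jitter_ms: float


def is_healthy(link: WanLink) -> bool:
    """Return True when the tunnel is up and every metric is within limits."""
    return (
        link.tunnel_up
        and link.packet_loss_pct <= MAX_LOSS_PCT
        and link.latency_ms <= MAX_LATENCY_MS
        and link.jitter_ms <= MAX_JITTER_MS
    )


def expected_active(links: List[WanLink]) -> Optional[WanLink]:
    """Return the most preferred healthy link, or None if none is healthy."""
    healthy = [link for link in links if is_healthy(link)]
    return min(healthy, key=lambda link: link.priority, default=None)
```

If the link this returns differs from the one actually carrying traffic, recheck the saved priority order and the live health values before assuming a platform fault.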

Captive Portal Users Cannot Log In

  1. Confirm the portal instance is attached to the correct site and subnet.
  2. Confirm the user’s device is actually on that subnet.
  3. For OAuth2, confirm the auth integration is valid and that clients can reach the provider before they log in.
  4. For coupons, confirm the code is valid, unexpired, and not already redeemed.
  5. Check the session TTL and whether the user had a previous active session.
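
Steps 2 and 4 are easy to verify mechanically. The sketch below checks subnet membership with the standard `ipaddress` module and reports why a coupon would be rejected. The `Coupon` fields are hypothetical and stand in for whatever the portal actually stores.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def on_portal_subnet(client_ip: str, portal_subnet: str) -> bool:
    """Return True when client_ip falls inside portal_subnet (CIDR notation)."""
    network = ipaddress.ip_network(portal_subnet, strict=False)
    return ipaddress.ip_address(client_ip) in network


@dataclass
class Coupon:  # hypothetical shape, for illustration only
    code: str
    expires_at: datetime
    redeemed: bool


def coupon_problem(coupon: Coupon, now: datetime) -> Optional[str]:
    """Return a reason the coupon would be rejected, or None if it is usable."""
    if coupon.redeemed:
        return "already redeemed"
    if now >= coupon.expires_at:
        return "expired"
    return None
```

For example, `on_portal_subnet("10.0.0.5", "192.168.88.0/24")` returns False, which points to a client on the wrong network rather than a portal fault.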

Workflow Did Not Run

  1. Confirm the workflow is active.
  2. Confirm the trigger matches the expected event type.
  3. Check whether the workflow authorization is still valid.
  4. Check vault secrets used by the failing nodes.
  5. Open the workflow run and inspect node-level logs.
  6. For workflow chaining, check for dependency validation errors or inactive target workflows.
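
For chained workflows, the check in step 6 amounts to walking the chain and flagging targets that are missing or inactive. The sketch below does this over a hypothetical in-memory mapping of workflow names to an `active` flag and a `calls` list. That structure is assumed for the example and is not the SDX data model.

```python
from typing import Dict, List


def chain_problems(workflows: Dict[str, dict], start: str) -> List[str]:
    """List missing or inactive workflows reachable from start."""
    problems: List[str] = []
    seen = set()
    stack = [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already checked; also guards against circular chains
        seen.add(name)
        workflow = workflows.get(name)
        if workflow is None:
            problems.append(f"{name}: workflow not found")
            continue
        if not workflow.get("active", False):
            problems.append(f"{name}: workflow is inactive")
        stack.extend(workflow.get("calls", []))
    return problems
```

An empty result means every workflow in the chain exists and is active. It does not validate triggers, authorization, or vault secrets, so the earlier steps still apply.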

When To Escalate

Escalate with these details ready:
  • Workspace and site name.
  • Time range of the failure.
  • Relevant fault IDs or workflow run IDs.
  • The affected service, such as management VPN, WAN failover, captive portal, or workflows.
  • The last known successful change or run.
  • Whether local router access is available.