# Upstream provider outage: bulk customer comms in one workflow

> Multiple monitoring triggers light up. Confirm the outage is upstream, open a carrier ticket, update the IVR, bulk-SMS subscribers in the affected region, publish a status page update, and keep a Slack war room alive.

At 18:47 Zabbix raises 43 subscriber-facing triggers and two core BGP-session-down alerts. The ISP's NOC needs to prove within three minutes that the problem is upstream, open a trouble ticket with the carrier, warn subscribers before call volume spikes, and run a structured war room until the sessions are back.

## Systems involved

| System                        | Role                                             |
| ----------------------------- | ------------------------------------------------ |
| Zabbix / LibreNMS             | Flood of subscriber and core triggers.           |
| BGP looking glass             | Confirm the peer outage externally.              |
| SSH to core routers           | Local state: `show bgp summary`, neighbor log.   |
| Carrier trouble-ticket portal | Open a formal ticket with the upstream NOC.      |
| Twilio / Bulk SMS             | Outbound SMS to subscribers.                     |
| IVR provider (Asterisk / 3CX) | Update the on-hold message.                      |
| Atlassian Statuspage          | Public status page.                              |
| Slack `#noc-war-room`         | Live operational channel.                        |
| Splynx / Sonar BSS            | Lookup of subscribers in the affected region.    |

## Walkthrough

Copilot groups the Zabbix triggers by root cause: 41 of the 43 subscriber-facing triggers are downstream symptoms of the two core BGP neighbor drops. The remaining two are unrelated customer-side issues.

Copilot runs the SSH procedure: `show bgp summary` on both core routers, then `show log | include bgp` on each. Both routers report the neighbor reset reason as received-from-neighbor, which matches a carrier-side event. The looking glass connector confirms that the carrier's own prefix advertisements are withdrawn.
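The two checks above (collapsing the trigger flood to a root cause, then deciding whether the evidence points upstream) can be sketched roughly as follows. This is an illustration under stated assumptions, not how Copilot is implemented: the `Trigger` shape, the `upstream_peer` field, and both function names are hypothetical, and the peer addresses are documentation-range placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Trigger:
    """Hypothetical, simplified view of one monitoring trigger."""
    host: str
    description: str
    # Assumption: topology data tells us which BGP peer this host depends on.
    upstream_peer: Optional[str]


def group_by_root_cause(
    triggers: Iterable[Trigger], down_peers: Set[str]
) -> Tuple[List[Trigger], List[Trigger]]:
    """Split triggers into symptoms of the down peers and unrelated ones."""
    symptoms: List[Trigger] = []
    unrelated: List[Trigger] = []
    for trigger in triggers:
        if trigger.upstream_peer in down_peers:
            symptoms.append(trigger)
        else:
            unrelated.append(trigger)
    return symptoms, unrelated


def looks_upstream(
    reset_reasons: Dict[str, str], carrier_prefixes_withdrawn: bool
) -> bool:
    """Call the outage upstream only if every core router logged a
    neighbor-initiated reset AND the looking glass shows withdrawn prefixes."""
    if not reset_reasons:
        return False  # no router evidence at all: do not guess
    all_from_neighbor = all(
        reason == "received-from-neighbor" for reason in reset_reasons.values()
    )
    return bool(all_from_neighbor and carrier_prefixes_withdrawn)


if __name__ == "__main__":
    down = {"192.0.2.1", "198.51.100.1"}  # placeholder peer IPs
    flood = [Trigger(f"cpe-{i}", "unreachable", "192.0.2.1") for i in range(41)]
    flood += [Trigger("cpe-x", "low signal", None), Trigger("cpe-y", "power", None)]
    symptoms, unrelated = group_by_root_cause(flood, down)
    print(len(symptoms), len(unrelated))
    print(looks_upstream(
        {"core1": "received-from-neighbor", "core2": "received-from-neighbor"},
        carrier_prefixes_withdrawn=True,
    ))
```

Requiring both the local reset reason and the external looking-glass view keeps a single misleading log line from triggering subscriber-wide communications.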

Through the carrier's trouble-ticket connector (or email if there's no API), Copilot drafts a ticket containing the peer IPs, your ASN, the carrier ASN, timestamps in UTC, the two local log snippets, and a callback phone number. The ticket is opened and its number captured.

Copilot queries Splynx for subscribers whose last mile depends on the affected upstream paths and finds 4,812 subscribers across three regions. It groups them by region and prepares the outbound SMS list.

The Twilio connector sends a terse SMS: cause (upstream carrier), scope (region), ETA (updating), and the status page URL. An approval prompt shows the SMS count and approximate cost before anything is sent.

Over SSH, Copilot edits the Asterisk dialplan to swap the on-hold message for the outage notice. Callers landing in the support queue now hear the same message the SMS carries.

Copilot pushes an Identified incident to Atlassian Statuspage with the affected components, the upstream cause, and a reference to the carrier ticket (not the ticket number itself, for security).

Copilot opens a Slack thread in `#noc-war-room` with the timeline, the carrier ticket, the affected subscriber count, and the last update time. Updates to the thread auto-propagate to Statuspage and the on-hold message.

Copilot polls the BGP sessions every 60 seconds. When both peers come back up and prefixes re-populate, the all-clear fires: the SMS goes out, the IVR reverts, the Statuspage incident closes, and the Slack thread gets the resolution summary and a commitment to a post-mortem.

## Where Studio earns its keep

* The 43-alarm flood becomes one root cause in two minutes, not thirty minutes of triage.
* Subscriber SMS, IVR, and status page update in parallel instead of waiting for whoever remembers each one.
* The carrier ticket references the same timestamps as the local logs, which speeds up the carrier's side of the investigation.
* The all-clear closes every external comms channel the same way it opened them, so no stale status page message lingers at 04:00.

## Related

* Planning mode for the bulk SMS approval: the cost and scope need to be visible.
* `Upstream outage response` with region as an argument.
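
To make the bulk SMS approval concrete, here is a minimal sketch of the kind of summary an approval prompt could show: recipient count per region, segments per message, and an approximate cost. Everything in it is an assumption for illustration: the function name, the `(phone, region)` tuple shape, and the `price_per_segment` parameter are hypothetical, and the segment arithmetic assumes plain GSM-7 text (160 characters for a single message, 153 per segment once a message is split). Actual pricing and encoding rules come from the SMS provider.

```python
from collections import Counter
from decimal import Decimal
from math import ceil
from typing import Dict, Iterable, Tuple


def sms_approval_summary(
    subscribers: Iterable[Tuple[str, str]],
    message: str,
    price_per_segment: Decimal,
) -> Dict[str, object]:
    """Summarize scope and approximate cost of a bulk SMS before sending.

    subscribers: (phone_number, region) pairs, e.g. from a BSS lookup.
    price_per_segment: hypothetical flat price; real pricing varies by route.
    """
    per_region = Counter(region for _phone, region in subscribers)
    recipients = sum(per_region.values())

    # Assumption: GSM-7 text. One segment up to 160 chars, else 153 per segment.
    if len(message) <= 160:
        segments = 1
    else:
        segments = ceil(len(message) / 153)

    return {
        "recipients": recipients,
        "per_region": dict(per_region),
        "segments_per_message": segments,
        "estimated_cost": price_per_segment * segments * recipients,
    }


if __name__ == "__main__":
    sample = [
        ("+15550100", "north"),
        ("+15550101", "north"),
        ("+15550102", "south"),
    ]
    text = "Outage: upstream carrier issue in your region. Updates: https://status.example.net"
    print(sms_approval_summary(sample, text, Decimal("0.01")))
```

Using `Decimal` rather than floats keeps the displayed cost exact, which matters when the figure is what an operator approves.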