# CPE bulk firmware rollout: advisory to verified updates

> A vendor advisory drops mid-week. Stage the firmware through the TR-069 ACS, plan a regional rollout with maintenance windows, notify subscribers, monitor reboot success, and roll back the failures.

Ubiquiti publishes a critical firmware advisory for EdgeRouter X devices. The ISP operates 14,000 of them across its residential and business subscribers. The rollout has to stage firmware centrally, go out region by region with per-region maintenance windows, notify subscribers before each window, watch reboot success rates, and roll back the handful that don't come back cleanly.

## Systems involved

| System                | Role                                    |
| --------------------- | --------------------------------------- |
| Vendor advisory       | Source announcement and firmware image. |
| TR-069 ACS (GenieACS) | Firmware distribution to CPE fleet.     |
| Studio inventory      | CPE records tagged by region.           |
| Twilio SMS            | Subscriber pre-maintenance notice.      |
| Gmail                 | Business-tier subscriber email comms.   |
| Atlassian Statuspage  | Maintenance windows published.          |
| Splynx                | Subscriber region and contact lookup.   |
| LibreNMS              | Reboot and reachability verification.   |
| Slack `#cpe-fleet`    | Operational channel.                    |

## Walkthrough

<Steps>
  <Step title="Verify and stage the image">
    Copilot downloads the firmware image referenced in the vendor advisory, validates its SHA checksum against the vendor's published hash, and uploads it to the ACS repository. The image then appears in the ACS catalogue tagged with the advisory ID.
  </Step>

  <Step title="Plan the rollout">
    Split the fleet by region: 14 regions of roughly 1,000 CPEs each. Each region gets a 2-hour window, one region per night over 14 nights. Business subscribers are scheduled last so any issues surface on residential first.
  </Step>

  <Step title="Pre-window subscriber comms">
    Twilio sends an SMS to residential subscribers 48 hours before each regional window, covering the expected brief outage, the window times, and a self-service URL for status. Business subscribers get a personalised email through Gmail.
  </Step>

  <Step title="Publish maintenance">
    Statuspage publishes all 14 maintenance windows with the affected regions and the advisory reference. The IVR hold message picks up an automated region-aware notice 30 minutes before each window.
  </Step>

  <Step title="Execute the first window">
    The `CPE firmware rollout` procedure targets Region 1, 10 percent of the CPEs at a time. For each batch, the ACS queues the upgrade. Copilot then checks LibreNMS reachability to confirm that each CPE comes back online within the expected reboot interval.
  </Step>

  <Step title="Watch the success rate">
    Target threshold is 99 percent reboot-and-reauth within the window. Region 1 hits 99.3 percent: seven of roughly 1,000 CPEs didn't come back. Copilot flags each one with its last-known state and queues it for individual attention.
  </Step>

  <Step title="Handle the stragglers">
    For each failed CPE, Copilot pulls the last RADIUS accounting record, the ACS session history, and the LibreNMS last-seen time. Five come back after a reboot the next day. Two are dispatched for a field swap.
  </Step>

  <Step title="Subsequent regions">
    Each following night, the rollout procedure runs for the next region. Statuspage and `#cpe-fleet` maintain a running status board. Residential complaints are near-zero because the comms went out ahead.
  </Step>

  <Step title="Close">
    After all 14 regions complete, generate the rollout report: fleet coverage, success rate, rollback count, field-swap count, and advisory closure status. The report is filed to the ISP's security advisory register and the post-mortem is auto-scheduled.
  </Step>
</Steps>
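
The batching in "Execute the first window" and the check in "Watch the success rate" can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual `CPE firmware rollout` procedure or any GenieACS or LibreNMS API: `split_into_batches` and `success_rate` are hypothetical helpers, and the `reachable` set stands in for whatever reachability data LibreNMS reports after the reboot interval.

```python
def split_into_batches(cpe_ids, batch_percent=10):
    """Split a region's CPE IDs into consecutive batches of batch_percent of the region."""
    if not 0 < batch_percent <= 100:
        raise ValueError("batch_percent must be between 1 and 100")
    if not cpe_ids:
        return []
    # Integer ceiling division avoids float rounding and never yields a batch size of zero.
    size = (len(cpe_ids) * batch_percent + 99) // 100
    return [cpe_ids[i:i + size] for i in range(0, len(cpe_ids), size)]


def success_rate(upgraded, reachable):
    """Return (percentage of upgraded CPEs that are reachable again, list of failed IDs).

    `reachable` is assumed to be a set of CPE IDs seen online after the reboot interval.
    """
    if not upgraded:
        raise ValueError("no CPEs were upgraded")
    failed = [cpe for cpe in upgraded if cpe not in reachable]
    rate = 100.0 * (len(upgraded) - len(failed)) / len(upgraded)
    return rate, failed
```

With a region of 1,000 CPEs and the default `batch_percent=10`, `split_into_batches` yields ten batches of 100. The list of failed IDs returned by `success_rate` is what would be handed on for individual follow-up.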

## Where Studio earns its keep

* The rollout is gated — each region only starts when the previous region hits the success threshold, so the first problem is caught on 1,000 subscribers, not 14,000.
* The SMS, the email, and the status page all point at the same regional schedule — there is no gap between "when we said" and "when it happened."
* Failed CPEs are handled individually from the same workspace with the full history available, not marked as errors in a report for someone else to chase next Tuesday.
* The runbook is parameterized by region, so the next advisory from any CPE vendor reuses the same structure.
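
The gating in the first bullet, combined with the region parameter in the last, can be sketched as a loop that stops at the first region that misses the threshold. This is a minimal sketch under assumptions, not Studio's implementation: `gated_rollout` is a hypothetical name, `run_region` is a hypothetical callable standing in for the `CPE firmware rollout` procedure and is assumed to return that region's success rate as a percentage, and the threshold is passed in rather than hard-coded.

```python
from typing import Callable, Iterable, List, Tuple


def gated_rollout(
    advisory_id: str,
    regions: Iterable[str],
    run_region: Callable[[str, str], float],
    threshold: float,
) -> Tuple[List[str], List[str]]:
    """Run regions in order and stop at the first one below `threshold`.

    Returns (completed, held). `held` starts with the region that missed the
    threshold, followed by every region that was never started.
    """
    remaining = list(regions)
    completed: List[str] = []
    while remaining:
        region = remaining.pop(0)
        rate = run_region(advisory_id, region)  # hypothetical procedure call
        if rate < threshold:
            # Hold this region and everything after it for human review.
            return completed, [region] + remaining
        completed.append(region)
    return completed, []
```

Because the advisory ID and the region list are arguments, the same loop could be reused for a later advisory by passing different values; only `run_region` would need to know about the vendor or the ACS.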

## Related

<CardGroup cols={2}>
  <Card title="Procedures" icon="workflow" href="../../procedures" arrow="true" cta="Parameterize">
    `CPE firmware rollout` with advisory ID and region as arguments.
  </Card>

  <Card title="Connectors and MCP" icon="plug" href="../../connectors-and-mcp" arrow="true" cta="ACS">
    GenieACS, Twilio, and Splynx wired as connectors.
  </Card>
</CardGroup>
