# Architecture and Scale

> Understand how ArcRadius uses a global RadSec data plane, management control plane, analytics plane, multi-region data stores, deterministic sharding, and streaming imports.

ArcRadius is the distributed RADIUS service behind Altostrat Radius. It is built as three separate planes: a data plane for live RADIUS traffic, a control plane for management and configuration, and an analytics plane for accounting, logs, triggers, and insights.

This separation matters operationally: authentication must stay fast, while logs, metrics, imports, quota checks, API calls, workflows, and dashboards scale independently of it.

```mermaid
flowchart LR
    NAS["NAS device"] -->|"RadSec mTLS"| GA["Global anycast ingress"]
    GA --> NLB["Regional Network Load Balancer"]
    NLB --> Proxy["RadSec proxy on ECS"]
    Proxy -->|"Access-Request"| Core["RADIUS server tasks on ECS"]
    Core -->|"Internal REST"| API["Region-local API on Lambda"]
    API -->|"PrivateLink"| Data["DynamoDB Global Tables"]
    Proxy -->|"Accounting and post-auth events"| Stream["Kinesis stream"]
    Stream --> Analytics["Timestream analytics"]
    API --> Logs["Logs and auth outcomes"]
    Logs --> UI["Live View and dashboards"]
    UI -->|"Manual disconnect / quota action"| Control["Dynamic authorization sender"]
    Control -->|"PoD / CoA"| NAS
```

## Platform Planes

| Plane           | What it handles                                                                                                                             |
| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| Data plane      | RadSec ingress, TLS termination, NAS identity, Access-Request handling, Accounting-Request handling, and dynamic authorization packet flow. |
| Control plane   | Web UI, REST API, users, folders, groups, realms, NAS devices, certificates, quotas, metadata, workflows, and configuration changes.        |
| Analytics plane | Authentication logs, accounting data, usage metrics, accounting triggers, search, dashboards, and insight queries.                          |

## Request Flow

<Steps>
  <Step title="The NAS connects to the global ingress">
    A NAS device starts a RadSec connection to the global service endpoint. Global routing sends the connection to the nearest healthy regional deployment, so failover does not wait on DNS propagation.
  </Step>

  <Step title="Regional load balancing keeps the data plane fast">
    The connection reaches a regional Network Load Balancer, which forwards the TCP flow to the RadSec proxy layer running on containerized infrastructure.
  </Step>

  <Step title="The edge verifies the client certificate">
    The RadSec edge uses mutual TLS and validates the NAS certificate against the client CA issued for the workspace.
  </Step>

  <Step title="The edge binds traffic to the registered NAS">
    The certificate identifies the workspace, organization, and NAS. The edge rewrites the `NAS-Identifier` to the registered NAS identity, so authorization and logs rely on the trusted certificate identity instead of a mutable packet field.
  </Step>

  <Step title="Authentication requests go to the RADIUS backend">
    Access requests are handled by horizontally scalable RADIUS server tasks and translated into secure, region-local API calls for policy evaluation, password handling, check attributes, realm logic, quota state, and reply attributes.
  </Step>

  <Step title="Accounting is handled without slowing auth">
    Accounting and post-authentication data are streamed into the analytics plane so accounting load does not block authentication throughput.
  </Step>

  <Step title="Metrics and logs are processed asynchronously">
    Authentication results, accounting usage, session markers, quota state, and admin requests are processed by background workers and metrics pipelines.
  </Step>
</Steps>

## Authentication Behavior

The policy service supports common access patterns used by broadband, Wi-Fi, VPN, and network-access devices:

* PAP-style password authentication.
* CHAP authentication when the NAS sends CHAP attributes.
* MS-CHAP and MS-CHAPv2, including NT password material needed by FreeRADIUS.
* EAP challenge handling where the upstream RADIUS flow needs to continue the exchange.
* MAC-based lookup using Calling-Station-Id or MAC-like usernames.
* Optional auto-registration for unknown MAC-based users when the NAS allows it.
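
MAC-based lookup only works if every vendor's MAC format maps to one canonical key. The following is a minimal sketch of that normalization step, not the platform's actual code; the function name `normalize_mac` and the choice of lowercase colon-separated output are assumptions made for illustration.

```python
import re
from typing import Optional

# Twelve hex digits once separators are removed.
_HEX12 = re.compile(r"^[0-9a-f]{12}$")


def normalize_mac(value: str) -> Optional[str]:
    """Return a lowercase colon-separated MAC, or None if value is not MAC-like.

    Hypothetical helper: accepts the separator styles commonly seen in
    Calling-Station-Id (':', '-', and Cisco-style '.').
    """
    compact = re.sub(r"[:\-.]", "", value.strip().lower())
    if not _HEX12.match(compact):
        return None
    return ":".join(compact[i:i + 2] for i in range(0, 12, 2))
```

With this helper, `AA-BB-CC-DD-EE-FF`, `aabb.ccdd.eeff`, and `aabbccddeeff` all resolve to the same key, while an ordinary username such as `alice@example.com` returns `None` and falls through to normal username lookup.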

Access still depends on the user, NAS, customer boundary, account status, realm, check attributes, password, and quota state. A device cannot bypass policy just by sending a different `NAS-Identifier`; RadSec traffic is normalized to the NAS identity from the client certificate.
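
To make the normalization concrete, here is a hedged sketch of how a proxy could overwrite `NAS-Identifier` in a raw RADIUS packet. It relies only on the standard packet layout and attribute type 32 from RFC 2865; the function name `rewrite_nas_identifier` is hypothetical, and the real edge may be implemented quite differently. A production proxy would also need to recompute any integrity fields that cover the packet, such as Message-Authenticator, which this sketch does not do.

```python
import struct

NAS_IDENTIFIER = 32  # RADIUS attribute type for NAS-Identifier (RFC 2865)
HEADER_LEN = 20      # code (1) + identifier (1) + length (2) + authenticator (16)
MAX_PACKET = 4096    # maximum RADIUS packet length (RFC 2865)


def rewrite_nas_identifier(packet: bytes, trusted_id: str) -> bytes:
    """Return a copy of packet carrying exactly one NAS-Identifier: trusted_id."""
    if len(packet) < HEADER_LEN:
        raise ValueError("packet shorter than RADIUS header")
    code, ident, length = struct.unpack("!BBH", packet[:4])
    if length < HEADER_LEN or length > len(packet):
        raise ValueError("invalid RADIUS length field")
    value = trusted_id.encode("utf-8")
    if not 1 <= len(value) <= 253:
        raise ValueError("NAS-Identifier value must be 1 to 253 bytes")

    attrs = bytearray()
    pos = HEADER_LEN
    while pos < length:
        if pos + 2 > length:
            raise ValueError("truncated attribute header")
        attr_type, attr_len = packet[pos], packet[pos + 1]
        if attr_len < 2 or pos + attr_len > length:
            raise ValueError("malformed attribute")
        # Drop whatever NAS-Identifier the device sent; keep everything else.
        if attr_type != NAS_IDENTIFIER:
            attrs += packet[pos:pos + attr_len]
        pos += attr_len

    # Append the identity derived from the client certificate.
    attrs += bytes([NAS_IDENTIFIER, len(value) + 2]) + value
    new_length = HEADER_LEN + len(attrs)
    if new_length > MAX_PACKET:
        raise ValueError("rewritten packet exceeds RADIUS maximum length")
    header = struct.pack("!BBH", code, ident, new_length)
    return header + packet[4:HEADER_LEN] + bytes(attrs)
```

Because the device-supplied attribute is dropped rather than compared, a spoofed value never reaches policy evaluation or logs.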

## Why The Architecture Scales

| Layer                  | Scaling behavior                                                                                                                                                       |
| ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Global ingress         | Routes RadSec traffic to the nearest healthy regional deployment and can fail over across regions without waiting for DNS changes.                                     |
| Network load balancing | Uses high-performance Layer 4 regional load balancing for TCP/RadSec traffic.                                                                                          |
| RadSec edge            | Uses a concurrent Go runtime with a goroutine per NAS connection and per-client UDP backend handling to avoid contention between devices.                              |
| RADIUS core            | Runs as horizontally scalable container tasks so authentication capacity can grow with live traffic.                                                                   |
| Internal API           | Uses serverless regional API workers for elastic policy evaluation and management operations.                                                                          |
| Device identity        | Uses mTLS certificate validation so the platform can trust which workspace, organization, and NAS produced the traffic.                                                |
| Policy lookup          | Caches NAS devices, authentication profiles, and group profiles for short windows to reduce repeated database reads during busy authentication periods.                |
| Data storage           | Uses DynamoDB-backed records, global replication, write sharding, sharded counters, and deterministic username sharding for fast lookups and hot-partition protection. |
| Logs                   | Uses sharded log access patterns so recent operational views can page through high-volume NAS logs without loading broad time ranges into memory.                      |
| Analytics              | Streams accounting and post-auth events independently of the authentication path, then stores time-series data for dashboards, triggers, and insight queries.          |
| Quotas                 | Reads quota status from a DynamoDB quota table during authorization, while scheduled workers refresh quota state from accounting usage data.                           |
| Imports                | Large migration jobs stream CSV rows in chunks instead of loading entire files into memory, with lookup data prefetched once for the batch.                            |
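
The deterministic username sharding mentioned in the Data storage row can be illustrated with a stable hash. This is a sketch under assumptions: the shard count, the SHA-256 choice, and the key layout in `partition_key` are all hypothetical and are not taken from the platform's schema.

```python
import hashlib

SHARD_COUNT = 16  # hypothetical; the real shard count is not documented here


def shard_for(username: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a username to a stable shard index in [0, shard_count)."""
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to a shard.
    return int.from_bytes(digest[:8], "big") % shard_count


def partition_key(customer_id: str, username: str) -> str:
    """Build a hypothetical partition key that spreads one customer across shards."""
    return f"CUSTOMER#{customer_id}#SHARD#{shard_for(username):02d}"
```

The property that matters is determinism: the same username always lands on the same shard, so a lookup needs one keyed read instead of a scatter across all shards, while a large customer's users are still spread over several partitions.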

The current migration worker design is validated for imports of 300,000 or more records, with flat memory usage and a constant number of lookup queries for shared group and tag data regardless of file size. In the reviewed architecture notes, the 300,000-record path moved from multi-gigabyte memory pressure and thousands of repeated lookups to chunked processing, roughly 50 MB of memory, and two shared lookup queries for group and tag data.
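
The chunked import pattern described above can be sketched as follows. The column names (`username`, `group`, `tag`), the `write_batch` callback, and the default chunk size are assumptions for illustration; the point is that group and tag maps are fetched once by the caller and that only one chunk of rows is held in memory at a time.

```python
import csv
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, TextIO


def iter_chunks(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` rows without materializing the input."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def import_users(
    source: TextIO,
    groups: Dict[str, str],   # group name -> group id, prefetched once
    tags: Dict[str, str],     # tag name -> tag id, prefetched once
    write_batch: Callable[[List[dict]], None],
    chunk_size: int = 500,
) -> int:
    """Stream CSV rows in chunks and return the number of records written."""
    reader = csv.DictReader(source)  # reads lazily, one row at a time
    total = 0
    for chunk in iter_chunks(reader, chunk_size):
        records = [
            {
                "username": row["username"],
                # Dictionary lookups replace per-row database queries.
                "group_id": groups.get(row.get("group") or ""),
                "tag_id": tags.get(row.get("tag") or ""),
            }
            for row in chunk
        ]
        write_batch(records)
        total += len(records)
    return total
```

Memory stays proportional to `chunk_size` rather than file size, and the number of lookup queries does not grow with the number of rows.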

## Data Plane Responsibilities

The RadSec edge handles:

* Global and regional ingress for RadSec traffic.
* TLS termination for RadSec.
* Client certificate authentication.
* Registered NAS identity extraction.
* RADIUS packet framing and forwarding.
* `NAS-Identifier` normalization.
* Accounting response generation.
* Accounting metric extraction.
* Session start, stop, usage, and last-IP metric publishing.
* Dynamic authorization metric extraction for Disconnect and CoA packets when observed.

## Control Plane Responsibilities

The RADIUS service handles:

* NAS registration and certificate material.
* User, folder, group, realm, tag, and metadata management.
* Check and reply attribute validation.
* Password storage and reset flows.
* PAP, CHAP, MS-CHAP, and MS-CHAPv2 handling.
* EAP challenge pass-through behavior where applicable.
* Realm matching and optional NAS-to-realm locking.
* MAC-based lookup and optional auto-registration.
* Quota status checks and top-ups.
* Manual session disconnects.
* Authentication metrics and NAS logs.

## Control Plane

The control plane is the management surface used by operators and integrations. The web UI and REST API manage configuration changes through authenticated and authorized API calls.

Use the control plane for:

* Creating and updating users, folders, groups, realms, and NAS devices.
* Managing certificate material and RadSec device configuration.
* Updating check attributes, reply attributes, metadata, tags, quotas, and account status.
* Connecting provisioning, billing, identity management, and workflow systems through the API.
* Searching operational records and reviewing logs.

The control plane scales independently from live RADIUS packet handling. That separation keeps authentication traffic isolated from operator activity, bulk imports, and integration traffic.

## Analytics Plane

The analytics plane receives accounting and post-authentication events from the data plane. It is designed for high-throughput ingestion and fast time-series queries across historical RADIUS events.

Use the analytics plane for:

* Authentication logs and 12-month log retention.
* Accounting data and 12-month accounting retention.
* Usage charts, sessions, quotas, and top-ups.
* Accounting triggers, such as usage-threshold automation.
* Dashboards, Live View, and insight queries.
* Full-text operational search.

Because analytics is decoupled from the authentication path, accounting bursts should not slow down Access-Request processing.

## Quota And Session Control Path

Quota enforcement is deliberately split:

1. Groups define quota attributes such as `X-Octet-Quota` and reset behavior.
2. Accounting packets update usage metrics.
3. Scheduled quota workers calculate current usage, apply active top-ups, and write quota state.
4. Authorization reads the current quota state during login.
5. When a user first crosses quota, the platform can dispatch a disconnect workflow for the active session.

This keeps the authentication path short while still supporting quota-aware replies, top-ups, and Packet of Disconnect workflows for devices that support dynamic authorization.
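
The split between worker-side calculation and auth-side reads can be sketched like this. All names here (`QuotaState`, `compute_quota_state`, `quota_allows_login`, `should_disconnect`) are hypothetical, and what the platform actually replies when a user is over quota depends on policy that this sketch does not model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class QuotaState:
    """Precomputed quota record, as a scheduled worker might store it."""
    limit_octets: int
    used_octets: int

    @property
    def exceeded(self) -> bool:
        return self.used_octets >= self.limit_octets


def compute_quota_state(
    base_quota: int, used_octets: int, active_topups: Iterable[int]
) -> QuotaState:
    """Worker side: combine the group quota, active top-ups, and accounted usage."""
    return QuotaState(base_quota + sum(active_topups), used_octets)


def quota_allows_login(state: QuotaState) -> bool:
    """Auth side: one read of stored state, with no usage aggregation at login."""
    return not state.exceeded


def should_disconnect(previous: QuotaState, current: QuotaState) -> bool:
    """Fire the disconnect workflow only on the transition into 'exceeded'."""
    return current.exceeded and not previous.exceeded
```

Comparing the previous and current state is what keeps a disconnect from being dispatched on every worker run after the user has already crossed quota.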

## Multi-Tenant Isolation

The platform uses several layers of isolation:

* NAS traffic is tied to a certificate identity.
* Authorization rejects unknown NAS devices.
* Users must belong to the same customer as the NAS.
* Realms can limit which group attributes apply for a matching username suffix.
* NAS devices can be locked to a realm when that behavior is configured.

If a request cannot be tied to a known NAS or the user belongs to a different customer, it is rejected.
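
The isolation checks above compose into a short ordered guard. The sketch below is illustrative only: the record shapes, the reason strings, and the case-insensitive realm comparison are assumptions, not the platform's data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nas:
    nas_id: str
    customer_id: str
    locked_realm: Optional[str] = None  # set only when NAS-to-realm locking is on


@dataclass(frozen=True)
class User:
    username: str
    customer_id: str


def realm_of(username: str) -> Optional[str]:
    """Return the suffix after the last '@', or None when there is no realm."""
    _, sep, suffix = username.rpartition("@")
    return suffix.lower() if sep else None


def isolation_reject_reason(nas: Optional[Nas], user: Optional[User]) -> Optional[str]:
    """Return a reject reason, or None when the request passes isolation checks."""
    if nas is None:
        return "unknown_nas"
    if user is None:
        return "unknown_user"
    if user.customer_id != nas.customer_id:
        return "customer_mismatch"
    if nas.locked_realm is not None and realm_of(user.username) != nas.locked_realm.lower():
        return "realm_locked"
    return None
```

The NAS check runs first because every later check depends on the customer boundary that the certificate-bound NAS establishes.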

## Metrics And Observability

RADIUS operations feed the monitoring views with:

* Access-Accept, Access-Reject, and Access-Challenge counters.
* Reject reasons when available.
* Accounting packets by status type.
* Input and output bytes, including 64-bit totals built from the Gigawords attributes.
* Input and output packets.
* Session time.
* Session start and stop timestamps.
* Last observed framed IP address.
* Admin request events for Disconnect and CoA.

Accounting and post-authentication events are streamed independently from live authentication. Authentication outcomes and logs are published by the RADIUS service and background metrics pipeline. Together they power Live View, user dashboards, device dashboards, quota checks, top-ups, disconnect workflows, accounting triggers, search, and troubleshooting.
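
For the byte counters listed above, the standard RADIUS octet attributes are 32-bit and wrap at 4 GiB; the Gigawords attributes defined in RFC 2869 (`Acct-Input-Gigawords`, `Acct-Output-Gigawords`) count those wraps. A minimal sketch of combining the two into a 64-bit total, with a hypothetical function name:

```python
def total_octets(octets: int, gigawords: int = 0) -> int:
    """Combine a 32-bit octet counter with its Gigawords wrap count."""
    if not 0 <= octets < 2**32:
        raise ValueError("octets must fit in 32 bits")
    if not 0 <= gigawords < 2**32:
        raise ValueError("gigawords must fit in 32 bits")
    # Each gigaword represents one full wrap of the 32-bit octet counter.
    return (gigawords << 32) | octets
```

For example, `total_octets(100, 1)` returns `4294967396`, which is one full 32-bit wrap plus 100 bytes. A NAS that omits the Gigawords attributes is treated as reporting zero wraps.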

## Operational Implications

* Use RadSec certificates from the NAS detail page rather than sharing credentials between devices.
* Keep accounting enabled when you rely on usage, sessions, quotas, top-ups, or disconnect workflows.
* Use groups for policy because group profiles are cache-friendly and reusable.
* Use realms when username suffixes should constrain policy.
* Use CoA and PoD only on devices that support dynamic authorization and allow the configured source address.
* Review [Limits and Availability](./limits-and-availability) before large migrations, high-rate authentication deployments, or multi-region planning.
