> ## Documentation Index
> Fetch the complete documentation index at: https://altostrat.io/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# AI provider and data flow

> Where Studio's AI inference actually happens, which models are used, what region they run in, how the prompt cache is structured, what gets redacted before it leaves your machine, and the strict boundary between local context and cloud inference.

The AI surface is the part of Studio that talks to a hosted model on every interaction. This page documents the full data path: which provider, which models, which region, what gets cached, what gets redacted, and what is structurally impossible to send.

## The single provider

All model inference in Studio goes through **AWS Bedrock**. The desktop app does not call Anthropic's hosted API directly. It does not call OpenAI, Google, or any other provider. There is no per-user "bring your own API key" path that bypasses Bedrock.

This is a deliberate architectural choice with three properties:

* **Region locked.** Bedrock calls run in `us-east-1`. There is no path that ships your inference data to another region.
* **AWS contractual scope.** Anthropic's terms with AWS for Bedrock prohibit training on inference data and constrain handling. Direct-to-Anthropic-API would be governed by a different contract surface.
* **One audit point.** Every inference call is signed by your short-term AWS credential and observable in CloudTrail under the same account that hosts your Studio backend.

## The models

Studio's model catalog is defined at build time. The active models are:

| Model             | Bedrock identifier                                 | Context           | Typical use                                                                     |
| ----------------- | -------------------------------------------------- | ----------------- | ------------------------------------------------------------------------------- |
| Claude Opus 4.6   | `us.anthropic.claude-opus-4-6-v1`                  | 200,000 tokens    | Default. Long conversations, complex multi-step reasoning, procedure authoring. |
| Claude Sonnet 4.6 | `us.anthropic.claude-sonnet-4-6`                   | 200,000 tokens    | Faster than Opus, suitable for most operational work.                           |
| Claude Sonnet 4.5 | `global.anthropic.claude-sonnet-4-5-20250929-v1:0` | 200,000 tokens    | Cross-region inference profile.                                                 |
| Claude Haiku 4.5  | `us.anthropic.claude-haiku-4-5-20251001-v1:0`      | 200,000 tokens    | Cheapest, fastest. Background tasks, classification, summary.                   |
| Amazon Nova Micro | `amazon.nova-micro-v1:0`                           | (Bedrock-defined) | Fallback for very small operations.                                             |

The model selection is exposed in [Settings](../settings) as a default, and via the `/model` slash command for per-conversation override.

There is no open-weight or local LLM in the path. Local-only ML in Studio is limited to the [embeddings model](./agent-and-local-runtime#local-embeddings-and-search), used for semantic search; that model is small enough to run on every Studio install.

## The data flow

A single Copilot turn moves through this path:

<Steps>
  <Step title="The user composes a prompt">
    Local. The prompt sits in the Electron renderer process. Attached context (active tab, terminal selection, image, voice, hosts, memories) is gathered locally.
  </Step>

  <Step title="Pre-send redaction">
    Local. A redaction pass scrubs known-secret patterns from the assembled context — Authorization headers, bearer tokens, password-like strings. The redaction is conservative: false positives mean the model sees `[REDACTED]`, false negatives are the [known limit](./known-limits-and-roadmap#ai-context-scrubbing). It does not catch a credential the operator deliberately substituted into a procedure prompt.
  </Step>

  <Step title="Cache assembly">
    Local. The system context is assembled in three tiers: global system prompt, organization-level context, session-level context. Each tier is bounded with a Bedrock cache point so the upstream cache can hit.
  </Step>

  <Step title="Sign and send to Bedrock">
    The request is signed with the user's short-term AWS credential and sent to the Bedrock model invocation endpoint in `us-east-1`. TLS 1.2+ on the wire.
  </Step>

  <Step title="Bedrock returns streaming tokens">
    Bedrock streams tokens back. Studio parses tool calls as they arrive and routes them through the [approval gate](./human-in-the-loop) before execution.
  </Step>

  <Step title="Tool results re-enter the loop">
    Tool outputs become the next turn's user message, looping back to step 2. The same redaction applies to tool output before it joins the next turn.
  </Step>
</Steps>

There is no path from a Studio renderer to a model that does not go through this flow. There is no out-of-band telemetry channel that ships prompt or response content elsewhere.

## Layered prompt caching

Bedrock supports prompt caching: marking blocks of context as cacheable so subsequent calls with the same prefix don't re-tokenize the cached portion. Studio assembles the model's context in layers — broadly, the parts that change rarely sit before the parts that change per conversation, and the parts that change every turn sit at the end. The cacheable layers are marked so the upstream cache hits across calls; the volatile layer at the tail is what gets re-tokenized.

The exact layering and breakpoint placement is a tuning surface we keep refining and don't publish. The user-visible properties are: long conversations stay fast, repeated work in the same session is cheap, and the model's per-call cost stays predictable as the workspace scales.

## What does and doesn't reach the model

| Item                                               | Reaches model context?                                                                                                                                                                         |
| -------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Your prompt text                                   | Yes.                                                                                                                                                                                           |
| Active tab content you attached                    | Yes.                                                                                                                                                                                           |
| Terminal selection you sent to Copilot             | Yes.                                                                                                                                                                                           |
| Image you attached                                 | Yes (multimodal call).                                                                                                                                                                         |
| Voice transcription you sent                       | Yes (transcribed locally via AWS Transcribe).                                                                                                                                                  |
| Host inventory metadata (names, addresses, vendor) | Yes, when the model needs it for a tool call.                                                                                                                                                  |
| Memories you saved                                 | Yes, when retrieval surfaces them.                                                                                                                                                             |
| Procedure body                                     | Yes, when running.                                                                                                                                                                             |
| Tool descriptions and tool argument schemas        | Yes (system prompt).                                                                                                                                                                           |
| Tool **call results**                              | Yes (next user message).                                                                                                                                                                       |
| **Plaintext credentials**                          | **No, by design.** Credentials live in the vault; the model sees a reference, not the secret. The exception is procedure substitution, which is the [known limit](./known-limits-and-roadmap). |
| Cached plaintext DEKs                              | No. Never leaves the OS keychain or sidecar memory.                                                                                                                                            |
| Other organizations' data                          | No. Cryptographic isolation; the model can't see what your AWS credential can't fetch.                                                                                                         |
| Generated artifact source                          | No. Artifacts are stored encrypted; the model receives only what's currently attached.                                                                                                         |

## The boundary between local and cloud

Some computation in Studio is genuinely local. Some is genuinely in the cloud. The boundary is worth being explicit about:

| Computation                           | Where                                                                  |
| ------------------------------------- | ---------------------------------------------------------------------- |
| LLM inference                         | Bedrock (`us-east-1`).                                                 |
| Voice transcription (when used)       | AWS Transcribe (`us-east-1`).                                          |
| Image processing for multimodal calls | Bedrock (multimodal model).                                            |
| Semantic search index (embeddings)    | Local. A small transformer runs inside the Go sidecar.                 |
| Knowledge graph indexing              | Local. SQLite-backed in the agent.                                     |
| Session recording                     | Local. Stored encrypted; uploaded only when explicitly shared.         |
| Packet capture and live diagnostics   | Local. Never leaves the device unless you save and share the artifact. |
| Procedure run state                   | Persisted to your organization's encrypted store.                      |
| Conversation transcripts              | Persisted to your organization's encrypted store.                      |

The pattern is: anything that has to scale or that has to coordinate (LLM, transcription, sync) goes to AWS. Anything that depends on the local network state (capture, discovery, terminal session, embeddings) stays on your device.

## Extended thinking

Anthropic models in Studio support extended thinking — internal reasoning that the model performs before producing a visible response. When extended thinking is enabled (configurable per conversation and per sub-agent), the thinking trace streams to the right-side panel so you can see what the model is reasoning about.

Thinking content is **not** stored in the conversation transcript by default. It is observable while the run happens; the saved record is the visible response and the tool calls.

## Compaction and context management

Conversations grow. When the context window approaches the model limit, Studio runs a compaction pass:

* Old turns are summarized.
* Tool outputs are kept in summary form.
* Pinned context (host inventory snippets, attached artifacts) is preserved.
* The compacted history is the new starting point for subsequent turns.

Compaction is triggered automatically when the context exceeds a threshold below the model limit, so a turn never fails for size. The `/compact` slash command runs the same pipeline manually when you want to free room before a heavy turn.

## What we do not do

* **No fine-tuning on your data.** Studio does not run Anthropic fine-tuning on your conversations, your procedures, your memories, or your tool outputs. Bedrock-side, Anthropic does not train on inference traffic.
* **No off-region failover.** The Bedrock identifier is region-pinned. A regional outage degrades availability; it does not silently route inference to another region.
* **No multi-provider mux.** There is no path that ships prompts to a non-Bedrock provider. If a future Studio version adds one, it will be opt-in, region-disclosed, and documented.
* **No "improve our models with your data" toggle.** It does not exist. Your data is not training material.

## Related

<CardGroup cols={2}>
  <Card title="Human in the loop" icon="hand" href="./human-in-the-loop" arrow="true" cta="Read">
    The classifier and approval gate that decides what tool calls actually run.
  </Card>

  <Card title="Known limits" icon="alert-triangle" href="./known-limits-and-roadmap" arrow="true" cta="Read">
    Including AI context scrubbing of substituted secrets — the most important honest limit on this page.
  </Card>
</CardGroup>