Architecture

How the platform is built.

A reference for engineers and architects. Layered, agent-based, event-driven, audit-first. Below: the components that exist today, what each one does, and where they sit in the stack. For deployment, see /deployment. For security properties, see /security.

For engineers. This page documents internal architecture. If you're evaluating Sonoda Dynamics for procurement, /security, /compliance and /deployment are the relevant pages. This one is for people who want to know how it actually works underneath.

Layered stack · Agent mesh · Workflow runtime · Audit envelope · Policy engine · Action registry · Authority graph · Memory graph · Operational Twin · Workflow Genome · Time & ordering · SLO + rate limiting

01 · Layered stack

Six layers, top to bottom.

Layer 06 · Surfaces

Admin console (HTML/JS served by FastAPI), tenant dashboard, chat hub, Approval Inbox. The pages a human or operator actually clicks.

Layer 05 · API gateway

FastAPI routers under /api/v1/<resource>/{firm_id}/.... Every endpoint enforces tenant scoping + role guards. Webhook ingress lives here too.

Layer 04 · Agent mesh

Pluggable capability registry. Each agent (rag, document_intelligence, crm, communication, scheduling, compliance_officer, consultant, operational_advisor, ...) registers fully-qualified capabilities the workflow engine can dispatch to.

Layer 03 · Workflow runtime

Durable executor with explicit state machine. Versioned definitions per tenant. Background job queue for async steps.

Layer 02 · Trust & control

Policy engine, action registry, authority graph, approval inbox. The substrate that enforces "no irreversible action without authorization".

Layer 01 · Storage & audit

Per-tenant SQLite (WAL) or PostgreSQL. Hash-chained, envelope-encrypted audit log. Bi-temporal memory graph. Event bus.

02 · Agent mesh

Capabilities, not endpoints.

Agents register named capabilities into a process-wide mesh (agents/common/mesh.py). A capability is identified by its fully-qualified name, e.g. rag.ask, document_intelligence.analyze, crm.contact_health, compliance_officer.classify_text. The workflow engine dispatches by name; agents can be swapped, mocked, or hot-replaced without changing call sites.

Adding a vertical-specific agent (e.g. healthcare.deidentify_note) is a matter of registering the capability — the workflow templates and API automatically discover it.
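
A minimal sketch of name-based dispatch, assuming a dictionary-backed registry. The names CapabilityMesh, register and dispatch are illustrative and are not taken from agents/common/mesh.py; the de-identification logic is a placeholder.

```python
from typing import Any, Callable, Dict, List

Capability = Callable[..., Any]


class CapabilityMesh:
    """Hypothetical process-wide registry keyed by fully-qualified name."""

    def __init__(self) -> None:
        self._capabilities: Dict[str, Capability] = {}

    def register(self, fqn: str) -> Callable[[Capability], Capability]:
        """Decorator that binds a callable to a name such as 'rag.ask'."""
        def decorator(fn: Capability) -> Capability:
            # Registering the same name again replaces the handler, which is
            # what allows an agent to be swapped or mocked.
            self._capabilities[fqn] = fn
            return fn
        return decorator

    def dispatch(self, fqn: str, **kwargs: Any) -> Any:
        """Look up a capability by name and call it with keyword arguments."""
        try:
            handler = self._capabilities[fqn]
        except KeyError:
            raise LookupError(f"no capability registered as {fqn!r}") from None
        return handler(**kwargs)

    def names(self) -> List[str]:
        """Sorted list of registered names, usable for discovery."""
        return sorted(self._capabilities)


mesh = CapabilityMesh()


@mesh.register("healthcare.deidentify_note")
def deidentify_note(text: str) -> str:
    # Placeholder logic for illustration only.
    return text.replace("Jane Doe", "[PATIENT]")


result = mesh.dispatch("healthcare.deidentify_note", text="Jane Doe, age 54")
```

Because callers only hold the string name, a test can register a stub under the same name and no call site changes.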

03 · Workflow runtime

Durable, versioned, observable.

Workflows are declarative graphs of step → capability → next-step. Each run gets a unique ID; state transitions are written to the per-tenant store (SQLite or PostgreSQL) and to the audit log. Failed steps are retried with exponential backoff up to a policy-defined ceiling, then escalated via the Approval Inbox.
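
The retry-then-escalate behaviour can be sketched as follows. This is an illustration under assumptions: max_attempts stands in for the policy-defined ceiling, escalate stands in for handing the failure to the Approval Inbox, and the delay values are arbitrary.

```python
import time
from typing import Callable, List, Optional


def run_step_with_retry(
    step: Callable[[], str],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    escalate: Optional[Callable[[Exception], None]] = None,
) -> Optional[str]:
    """Run a step, doubling the wait after each failed attempt.

    Returns the step result, or None once the ceiling is reached and the
    failure has been escalated.
    """
    for attempt in range(max_attempts):
        try:
            return step()
        except Exception as exc:
            if attempt == max_attempts - 1:
                # Ceiling reached: hand the failure to a human.
                if escalate is not None:
                    escalate(exc)
                return None
            sleep(base_delay * (2 ** attempt))
    return None


delays: List[float] = []
escalated: List[Exception] = []


def always_fails() -> str:
    raise RuntimeError("vendor API unavailable")


# Inject list.append in place of time.sleep so the example runs instantly.
outcome = run_step_with_retry(
    always_fails, sleep=delays.append, escalate=escalated.append
)
```

With the defaults, the step is attempted four times with waits of 0.5 s, 1 s and 2 s between attempts before escalation.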

Definitions are versioned. Promoting v3 doesn't delete v2 — operators can revert with one call. Workflow templates ship per vertical: 7 for legal_corporate, 4 for finance_pe, 3 for legal_litigation, etc. — 35 workflow templates across 10 verticals at present.
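
One way to picture append-only versioning with revert is the sketch below. DefinitionStore, its methods and the nda_review definition are hypothetical; the real storage layout is not described on this page.

```python
from typing import Dict, List


class DefinitionStore:
    """Hypothetical append-only store: publishing never overwrites."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[dict]] = {}
        self._active: Dict[str, int] = {}

    def publish(self, name: str, definition: dict) -> int:
        """Append a new version and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(definition)
        return len(versions)

    def promote(self, name: str, version: int) -> None:
        """Make an existing version active; reverting is promoting an older one."""
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise ValueError(f"{name} has no version {version}")
        self._active[name] = version

    def active(self, name: str) -> dict:
        return self._versions[name][self._active[name] - 1]

    def versions(self, name: str) -> List[int]:
        return list(range(1, len(self._versions.get(name, [])) + 1))


store = DefinitionStore()
store.publish("nda_review", {"steps": ["classify"]})
store.publish("nda_review", {"steps": ["classify", "summarize"]})
store.promote("nda_review", 2)
store.promote("nda_review", 1)  # revert with a single call
active_after_revert = store.active("nda_review")
```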

Figure: Workflow Definitions admin view. Workflow library after Bootstrap on a freshly provisioned legal_corporate tenant, showing 7 templates registered, versioned, and runnable · v1 · supervised autonomy

03.b · Operational telemetry (live)

Captured from a real tenant.

Figure: Operational telemetry endpoint output. JSON telemetry from a 30-day window on a real tenant, broken down by agent and capability_fqn, with capability calls, success rate, and average latency.

04 · Audit envelope

Hash-chained. Envelope-encrypted. Crypto-shreddable.

Every state-changing call writes an audit entry: SHA-256 hash of payload + hash of previous entry. Payloads are wrapped in an AEAD envelope (AES-256-GCM) encrypted under a per-tenant Key Encryption Key (KEK).

Destroying the KEK renders the payloads permanently unreadable — the hash chain remains verifiable, but content is opaque. This is the GDPR Article 17 path: erase the data without breaking the cryptographic proof that the historical operations occurred.
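
The chain property can be illustrated with a short sketch. Encryption is deliberately left out: each payload is treated as opaque ciphertext bytes, and hashing the stored ciphertext (rather than the plaintext) is an assumption made here so that verification never needs the key. AuditEntry, append, verify and GENESIS are illustrative names.

```python
import hashlib
from dataclasses import dataclass
from typing import List

GENESIS = "0" * 64  # assumed predecessor hash for the first entry


@dataclass(frozen=True)
class AuditEntry:
    sealed_payload: bytes  # ciphertext as stored; encryption is out of scope
    prev_hash: str
    entry_hash: str


def _digest(sealed_payload: bytes, prev_hash: str) -> str:
    """SHA-256 over the payload hash concatenated with the previous hash."""
    payload_hash = hashlib.sha256(sealed_payload).hexdigest()
    return hashlib.sha256((payload_hash + prev_hash).encode("ascii")).hexdigest()


def append(chain: List[AuditEntry], sealed_payload: bytes) -> AuditEntry:
    prev_hash = chain[-1].entry_hash if chain else GENESIS
    entry = AuditEntry(sealed_payload, prev_hash, _digest(sealed_payload, prev_hash))
    chain.append(entry)
    return entry


def verify(chain: List[AuditEntry]) -> bool:
    """Recompute every link; no decryption key is required."""
    prev_hash = GENESIS
    for entry in chain:
        if entry.prev_hash != prev_hash:
            return False
        if entry.entry_hash != _digest(entry.sealed_payload, prev_hash):
            return False
        prev_hash = entry.entry_hash
    return True


chain: List[AuditEntry] = []
for blob in (b"ciphertext-1", b"ciphertext-2", b"ciphertext-3"):
    append(chain, blob)
intact = verify(chain)

# Swapping a stored payload without recomputing the hashes is detected.
tampered = list(chain)
tampered[1] = AuditEntry(b"forged", chain[1].prev_hash, chain[1].entry_hash)
detected = not verify(tampered)
```

In this sketch, verify only ever reads ciphertext and hashes, which is why the chain would stay checkable after the key is destroyed.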

05 · Policy engine

Declarative, not hard-coded.

Policies live as YAML/JSON documents per tenant: "this action requires approval", "that decision needs two approvers", "this workflow auto-pauses if anomaly score exceeds X". The engine evaluates policy at every gate the workflow runtime hits.

Policies are versioned alongside workflows. A policy change is a change-managed event with its own audit entry.
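
A sketch of gate evaluation against a JSON policy document. The schema (rules, effect, approvers, anomaly_pause_threshold) and the action names are invented for illustration, and treating unlisted actions as requiring approval is an assumption, not documented behaviour.

```python
import json
from typing import Any, Dict

# Hypothetical per-tenant policy document.
POLICY_JSON = """
{
  "version": 3,
  "anomaly_pause_threshold": 0.8,
  "rules": [
    {"action": "payments.send", "effect": "require_approval", "approvers": 2},
    {"action": "crm.add_note", "effect": "allow"}
  ]
}
"""


def evaluate(policy: Dict[str, Any], action: str, anomaly_score: float) -> Dict[str, Any]:
    """Return the decision for one gate: pause, allow, or require_approval."""
    if anomaly_score > policy["anomaly_pause_threshold"]:
        return {"decision": "pause", "approvers": 0}
    for rule in policy["rules"]:
        if rule["action"] == action:
            if rule["effect"] == "require_approval":
                return {
                    "decision": "require_approval",
                    "approvers": rule.get("approvers", 1),
                }
            return {"decision": "allow", "approvers": 0}
    # Assumed default: actions without a rule fail closed.
    return {"decision": "require_approval", "approvers": 1}


policy = json.loads(POLICY_JSON)
payment_gate = evaluate(policy, "payments.send", anomaly_score=0.1)
note_gate = evaluate(policy, "crm.add_note", anomaly_score=0.1)
anomalous_gate = evaluate(policy, "crm.add_note", anomaly_score=0.95)
```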

06 · Action class registry

Three levels of reversibility.

Every action an agent or workflow can perform is registered as read-only, or with one of three reversibility levels:

Read-only: queries, reads, classifications. Cannot mutate anything.
Reversible: internal-only writes (CRM notes, internal state). Undone by reversing the write.
Compensable: external mutations (emails sent, API calls to vendor systems, scheduled meetings). Reversible only by best-effort compensating actions.
Irreversible: payments, contract execution, regulatory filings. Requires explicit human approval, and the approval is recorded in the audit log.

The runtime refuses to execute an irreversible action without an approval token from the Authority Graph.
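
The refusal rule can be sketched with an enum and a guard in the execution path. The names below are illustrative, and the sketch only checks that a token is present; validating the token against the Authority Graph is out of scope here.

```python
from enum import Enum
from typing import Callable, Dict, Optional, Tuple


class ActionClass(Enum):
    READ_ONLY = "read_only"
    REVERSIBLE = "reversible"
    COMPENSABLE = "compensable"
    IRREVERSIBLE = "irreversible"


class ApprovalRequired(Exception):
    """Raised when an irreversible action is attempted without a token."""


# Hypothetical registry: action name -> (class, handler).
REGISTRY: Dict[str, Tuple[ActionClass, Callable[[], str]]] = {}


def register_action(name: str, action_class: ActionClass, handler: Callable[[], str]) -> None:
    REGISTRY[name] = (action_class, handler)


def execute(name: str, approval_token: Optional[str] = None) -> str:
    action_class, handler = REGISTRY[name]
    if action_class is ActionClass.IRREVERSIBLE and not approval_token:
        raise ApprovalRequired(f"{name} is irreversible and needs an approval token")
    return handler()


register_action("crm.add_note", ActionClass.REVERSIBLE, lambda: "note saved")
register_action("payments.send", ActionClass.IRREVERSIBLE, lambda: "payment sent")

note_result = execute("crm.add_note")
try:
    execute("payments.send")
    refused = False
except ApprovalRequired:
    refused = True
approved_result = execute("payments.send", approval_token="token-from-authority-graph")
```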

07 · Authority graph

Multi-approver, role-bound, break-glass.

Who can approve what is encoded as a graph: roles → action classes → approval rules (single, dual, N-of-M). A break-glass override exists for incidents — using it auto-creates a high-priority audit entry and notifies the security contact.

Approvals carry signatures stamped with a Hybrid Logical Clock (HLC) timestamp and are durably stored; an approval is a first-class entity in the audit log, not a UI state.
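
An N-of-M rule check might look like the sketch below. The role names, the rule table and the is_authorized function are hypothetical; break-glass handling and signature verification are omitted.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class ApprovalRule:
    eligible_roles: FrozenSet[str]  # the M side: roles allowed to approve
    required: int                   # the N side: distinct approvers needed


# Hypothetical mapping from action class to approval rule.
RULES: Dict[str, ApprovalRule] = {
    "compensable": ApprovalRule(frozenset({"partner", "associate"}), 1),
    "irreversible": ApprovalRule(frozenset({"partner", "general_counsel"}), 2),
}


def is_authorized(action_class: str, approvals: Iterable[Tuple[str, str]]) -> bool:
    """approvals holds (approver_id, role) pairs; each approver counts once."""
    rule = RULES[action_class]
    distinct_approvers = {
        approver_id
        for approver_id, role in approvals
        if role in rule.eligible_roles
    }
    return len(distinct_approvers) >= rule.required


same_person_twice = is_authorized(
    "irreversible", [("u1", "partner"), ("u1", "partner")]
)
dual_approval = is_authorized(
    "irreversible", [("u1", "partner"), ("u2", "general_counsel")]
)
wrong_role = is_authorized(
    "irreversible", [("u1", "partner"), ("u3", "associate")]
)
```

Counting distinct approver IDs rather than approval records keeps one person from satisfying a dual-approval rule alone.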

08 · Memory graph

Bi-temporal, queryable, reversible.

An observer subscribes to the event bus and writes entities (contacts, matters, deals, documents, workflows) and relationships (assigned_to, depends_on, derived_from, classified_as) into a per-tenant graph. Each edge has two timestamps: valid time (when the relationship held in the world) and transaction time (when we learned it). This is what enables time-travel queries — "what did we believe was true on 2026-03-12, given only the data we had at that time?"
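
A time-travel query over bi-temporal edges can be sketched as a filter on both time axes. The Edge fields, the half-open interval convention and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Edge:
    src: str
    rel: str
    dst: str
    valid_from: date                # valid time: when it held in the world
    valid_to: Optional[date]        # None means still valid
    recorded_at: date               # transaction time: when we learned it
    superseded_at: Optional[date]   # None means this is the current belief


def _covers(start: date, end: Optional[date], point: date) -> bool:
    # Half-open interval [start, end).
    return start <= point and (end is None or point < end)


def as_of(edges: List[Edge], valid_on: date, known_on: date) -> List[Edge]:
    """Edges believed on known_on to have held on valid_on."""
    return [
        e for e in edges
        if _covers(e.valid_from, e.valid_to, valid_on)
        and _covers(e.recorded_at, e.superseded_at, known_on)
    ]


# On 2026-03-20 we learn that the matter moved from alice to bob on 2026-03-10.
edges = [
    Edge("matter-17", "assigned_to", "alice", date(2026, 3, 1), None,
         date(2026, 3, 2), date(2026, 3, 20)),
    Edge("matter-17", "assigned_to", "alice", date(2026, 3, 1), date(2026, 3, 10),
         date(2026, 3, 20), None),
    Edge("matter-17", "assigned_to", "bob", date(2026, 3, 10), None,
         date(2026, 3, 20), None),
]

believed_then = [e.dst for e in as_of(edges, date(2026, 3, 12), date(2026, 3, 12))]
believed_now = [e.dst for e in as_of(edges, date(2026, 3, 12), date(2026, 3, 25))]
```

The first query answers with what was recorded at the time (alice); the second applies the later correction (bob). Nothing is deleted, so both answers stay reproducible.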

09 · Operational Twin

Causal layer over the memory graph.

A directed acyclic graph that captures causal hypotheses ("response time up because backlog up because hire delayed"). Used by the counterfactual simulator and the intervention recommender to answer questions like "what's the smallest action that would have moved metric X by Y%?"

Humans have read-only access to the Twin; only the workflow and analytics pipeline writes to it. Operators inspect it from the admin console.
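
The graph structure alone (not the counterfactual simulator or the recommender) can be sketched with the standard-library graphlib module, which needs Python 3.9 or later. The node names come from the example above; the dictionary layout and helper functions are assumptions.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Each effect maps to the set of its hypothesized direct causes.
CAUSES: Dict[str, Set[str]] = {
    "response_time_up": {"backlog_up"},
    "backlog_up": {"hire_delayed"},
    "hire_delayed": set(),
}


def causal_order(causes: Dict[str, Set[str]]) -> List[str]:
    """Causes before effects; raises CycleError if the graph is not acyclic."""
    return list(TopologicalSorter(causes).static_order())


def root_causes(causes: Dict[str, Set[str]], effect: str) -> Set[str]:
    """Walk upstream from an effect and collect nodes with no further cause."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [effect]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = causes.get(node, set())
        if not parents and node != effect:
            roots.add(node)
        stack.extend(parents)
    return roots


order = causal_order(CAUSES)
roots = root_causes(CAUSES, "response_time_up")

try:
    causal_order({"a": {"b"}, "b": {"a"}})
    cycle_rejected = False
except CycleError:
    cycle_rejected = True
```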

10 · Workflow Genome (federated)

Cross-tenant patterns, k-anonymity floor.

Anonymized workflow performance shapes are contributed to a federated catalog. A tenant can ask "what does the median legal_corporate firm's NDA-review look like at the 95th percentile of throughput?" without exposing any individual tenant's data. A pattern is shared only once it appears in at least N distinct tenants (a k-anonymity floor).

Contribution is opt-in per tenant and revocable. Queries against the genome are themselves auditable.
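
The floor itself is a distinct-count threshold, sketched below. The function name, the (tenant_id, pattern) input shape and the pattern labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def shareable_patterns(contributions: Iterable[Tuple[str, str]], k: int) -> List[str]:
    """Return patterns contributed by at least k distinct tenants."""
    tenants_by_pattern: Dict[str, Set[str]] = defaultdict(set)
    for tenant_id, pattern in contributions:
        # A set ignores repeat contributions from the same tenant.
        tenants_by_pattern[pattern].add(tenant_id)
    return sorted(
        pattern
        for pattern, tenants in tenants_by_pattern.items()
        if len(tenants) >= k
    )


contributions = [
    ("tenant-a", "nda_review:3-step"),
    ("tenant-b", "nda_review:3-step"),
    ("tenant-c", "nda_review:3-step"),
    ("tenant-a", "nda_review:5-step"),
    ("tenant-a", "nda_review:5-step"),  # same tenant again, still one tenant
]

shared = shareable_patterns(contributions, k=3)
```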

11 · Time & ordering

Hybrid Logical Clocks.

Every event, audit entry, approval, and state transition carries an HLC timestamp: physical time paired with a logical counter that orders events sharing the same physical reading. This gives an ordering that respects causality across processes without requiring a single source of truth, and it tolerates clock skew up to a configurable bound.
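
A compact sketch of the standard HLC update rules. The class name, the millisecond resolution, the max_skew_ms parameter and the use of the node ID as a final tie-breaker are assumptions made for illustration.

```python
import time
from typing import Callable, Tuple

Timestamp = Tuple[int, int, str]  # (physical ms, logical counter, node id)


class HybridLogicalClock:
    def __init__(
        self,
        node_id: str,
        now_ms: Callable[[], int] = lambda: int(time.time() * 1000),
        max_skew_ms: int = 60_000,
    ) -> None:
        self.node_id = node_id
        self._now_ms = now_ms
        self._max_skew_ms = max_skew_ms
        self._l = 0  # highest physical time seen so far
        self._c = 0  # counter for events sharing the same _l

    def now(self) -> Timestamp:
        """Timestamp a local or send event."""
        physical = self._now_ms()
        if physical > self._l:
            self._l, self._c = physical, 0
        else:
            self._c += 1
        return (self._l, self._c, self.node_id)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge a received timestamp so the result sorts after it."""
        remote_l, remote_c, _ = remote
        physical = self._now_ms()
        if remote_l - physical > self._max_skew_ms:
            raise ValueError("remote timestamp exceeds the allowed clock skew")
        new_l = max(self._l, remote_l, physical)
        if new_l == self._l and new_l == remote_l:
            new_c = max(self._c, remote_c) + 1
        elif new_l == self._l:
            new_c = self._c + 1
        elif new_l == remote_l:
            new_c = remote_c + 1
        else:
            new_c = 0
        self._l, self._c = new_l, new_c
        return (self._l, self._c, self.node_id)


# Node b's wall clock is 10 ms behind node a's.
a = HybridLogicalClock("a", now_ms=lambda: 100)
b = HybridLogicalClock("b", now_ms=lambda: 90, max_skew_ms=1000)

first = a.now()
second = a.now()
received = b.update(second)
```

Even though b's wall clock is behind, the timestamp it assigns on receipt compares greater than the one it received, so tuple comparison preserves the send-before-receive order.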

12 · SLO instrumentation & rate limiting

Token buckets per route, histograms per endpoint.

Each route has an independent token-bucket rate limiter sized per tier. Latency is recorded into in-memory histograms; p50/p95/p99 are exposed at /metrics in Prometheus exposition format for scraping. Public /status exposes build SHA, uptime, and last-deploy timestamp.
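
A token bucket can be sketched in a few lines. The class name, the parameters and the injectable clock are assumptions; per-tier sizing and the per-route lookup are left out.

```python
import time
from typing import Callable, List


class TokenBucket:
    """Allows bursts up to capacity, then a steady refill_per_sec rate."""

    def __init__(
        self,
        capacity: float,
        refill_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then try to spend cost tokens."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# A fake clock keeps the example deterministic.
fake_time: List[float] = [0.0]
bucket = TokenBucket(capacity=2, refill_per_sec=1, clock=lambda: fake_time[0])

burst = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_time[0] = 1.0  # one second later, one token has been refilled
after_refill = [bucket.allow(), bucket.allow()]
```

One bucket per route means a burst against one endpoint does not consume another endpoint's allowance.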

OpenAPI spec, webhook signing schemas and event payload schemas: /api.