Architecture

How the platform is built.

A reference for engineers and architects. Layered, agent-based, event-driven, audit-first. Below: the components that exist today, what each one does, and where they sit in the stack. For deployment, see /deployment. For security properties, see /security.

For engineers. This page documents internal architecture. If you're evaluating Sonoda Dynamics for procurement, /security, /compliance and /deployment are the relevant pages. This one is for people who want to know how it actually works underneath.

01 · Layered stack

Six layers, top to bottom.

Layer 06 · Surfaces
Admin console (HTML/JS served by FastAPI), tenant dashboard, chat hub, Approval Inbox. The pages a human or operator actually clicks.
Layer 05 · API gateway
FastAPI routers under /api/v1/<resource>/{firm_id}/.... Every endpoint enforces tenant scoping + role guards. Webhook ingress lives here too.
Layer 04 · Agent mesh
Pluggable capability registry. Each agent (rag, document_intelligence, crm, communication, scheduling, compliance_officer, consultant, operational_advisor, ...) registers fully-qualified capabilities the workflow engine can dispatch to.
Layer 03 · Workflow runtime
Durable executor with explicit state machine. Versioned definitions per tenant. Background job queue for async steps.
Layer 02 · Trust & control
Policy engine, action registry, authority graph, approval inbox. The substrate that enforces "no irreversible action without authorization".
Layer 01 · Storage & audit
Per-tenant SQLite (WAL) or PostgreSQL. Hash-chained, envelope-encrypted audit log. Bi-temporal memory graph. Event bus.

02 · Agent mesh

Capabilities, not endpoints.

Agents register named capabilities into a process-wide mesh (agents/common/mesh.py). A capability is identified by its fully-qualified name, e.g. rag.ask, document_intelligence.analyze, crm.contact_health, compliance_officer.classify_text. The workflow engine dispatches by name; agents can be swapped, mocked, or hot-replaced without changing call sites.

Adding a vertical-specific agent (e.g. healthcare.deidentify_note) is a matter of registering the capability — the workflow templates and API automatically discover it.
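To make dispatch-by-name concrete, here is a minimal sketch of a capability registry. It is not the contents of agents/common/mesh.py: the Mesh class, its register/dispatch/discover methods, and the placeholder de-identification logic are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

Capability = Callable[..., Any]


class Mesh:
    """Hypothetical registry keyed by fully-qualified capability name."""

    def __init__(self) -> None:
        self._capabilities: Dict[str, Capability] = {}

    def register(self, fqn: str) -> Callable[[Capability], Capability]:
        # Decorator form, e.g. @mesh.register("rag.ask").
        def decorator(fn: Capability) -> Capability:
            # Registering the same name again replaces the handler,
            # which is what makes swapping or mocking possible.
            self._capabilities[fqn] = fn
            return fn
        return decorator

    def dispatch(self, fqn: str, **kwargs: Any) -> Any:
        # Call sites only know the name, never the implementation.
        try:
            handler = self._capabilities[fqn]
        except KeyError:
            raise LookupError(f"no capability registered as {fqn!r}") from None
        return handler(**kwargs)

    def discover(self, prefix: str = "") -> List[str]:
        # Lets templates or an API layer list what is available.
        return sorted(n for n in self._capabilities if n.startswith(prefix))


mesh = Mesh()


@mesh.register("healthcare.deidentify_note")
def deidentify_note(text: str) -> str:
    # Placeholder logic only; a real de-identifier is far more involved.
    return text.replace("Jane Doe", "[PATIENT]")


result = mesh.dispatch("healthcare.deidentify_note", text="Seen: Jane Doe")
```

Because the workflow engine holds only the string name, replacing the handler with a mock in tests requires no change at any call site.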

03 · Workflow runtime

Durable, versioned, observable.

Workflows are declarative graphs of step → capability → next-step. Each run gets a unique id; state transitions are written to the per-tenant database (SQLite or PostgreSQL) and to the audit log. Failures retry with exponential backoff up to a policy-defined ceiling, then escalate via the Approval Inbox.
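The retry-then-escalate behaviour can be sketched as follows. This is an illustration under assumptions, not the runtime's code: run_step, backoff_delays, the default of 4 attempts, and the 0.5 s base delay are all hypothetical, and the escalate callback stands in for whatever posts to the Approval Inbox.

```python
import time
from typing import Callable, List


def backoff_delays(base: float, factor: float, max_attempts: int) -> List[float]:
    """Delays between attempts: base, base*factor, base*factor**2, ..."""
    return [base * factor ** i for i in range(max_attempts - 1)]


def run_step(
    step: Callable[[], object],
    escalate: Callable[[Exception], None],
    max_attempts: int = 4,          # assumed policy-defined ceiling
    base_delay: float = 0.5,        # assumed, in seconds
    factor: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(base_delay, factor, max_attempts)
    for attempt in range(max_attempts):
        try:
            step()
            return "succeeded"
        except Exception as exc:
            if attempt == max_attempts - 1:
                # Ceiling reached: hand the failure to a human.
                escalate(exc)
                return "escalated"
            sleep(delays[attempt])
    return "escalated"  # not reached; keeps type checkers satisfied
```

Injecting sleep makes the backoff schedule testable without waiting; a production executor would more likely persist the next-attempt time so that retries survive a process restart.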

Definitions are versioned. Promoting v3 doesn't delete v2 — operators can revert with one call. Templates ship per vertical: 7 for legal_corporate, 4 for finance_pe, 3 for legal_litigation, etc. — 35 templates across 10 verticals at present.

[Figure: Workflow Definitions admin view showing 7 legal_corporate templates registered, versioned, and runnable]
Workflow library after Bootstrap on a freshly provisioned legal_corporate tenant · 7 templates · v1 · supervised autonomy

03.b · Operational telemetry (live)

Captured from a real tenant.

[Figure: Operational telemetry endpoint output with per-agent metrics, capability calls, success rate, average duration]
JSON telemetry from a 30-day window on a real tenant · breakdown by agent + capability_fqn + success rate + avg latency

04 · Audit envelope

Hash-chained. Envelope-encrypted. Crypto-shreddable.

Every state-changing call writes an audit entry that records the SHA-256 hash of its payload together with the hash of the previous entry, so altering any entry invalidates every hash after it. Payloads are wrapped in an AEAD envelope (AES-256-GCM) encrypted under a per-tenant Key Encryption Key (KEK).

Destroying the KEK renders the payloads permanently unreadable — the hash chain remains verifiable, but content is opaque. This is the GDPR Article 17 path: erase the data without breaking the cryptographic proof that the historical operations occurred.
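The sketch below shows why the chain stays verifiable after the key is gone. It assumes the payload hash is computed over the stored ciphertext (the page does not say whether plaintext or ciphertext is hashed), treats encryption as already done elsewhere, and uses hypothetical names (AuditEntry, append, verify, GENESIS).

```python
import hashlib
from dataclasses import dataclass, replace
from typing import List

GENESIS = "0" * 64  # assumed sentinel for "no previous entry"


@dataclass(frozen=True)
class AuditEntry:
    payload_ciphertext: bytes  # assumed to be AEAD-encrypted upstream
    payload_hash: str
    prev_hash: str
    entry_hash: str


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def append(chain: List[AuditEntry], ciphertext: bytes) -> None:
    prev = chain[-1].entry_hash if chain else GENESIS
    payload_hash = _sha256(ciphertext)
    entry_hash = _sha256((prev + payload_hash).encode("ascii"))
    chain.append(AuditEntry(ciphertext, payload_hash, prev, entry_hash))


def verify(chain: List[AuditEntry]) -> bool:
    # Verification needs no decryption key, only the stored bytes.
    prev = GENESIS
    for entry in chain:
        if entry.prev_hash != prev:
            return False
        if _sha256(entry.payload_ciphertext) != entry.payload_hash:
            return False
        if _sha256((prev + entry.payload_hash).encode("ascii")) != entry.entry_hash:
            return False
        prev = entry.entry_hash
    return True


chain: List[AuditEntry] = []
append(chain, b"ciphertext-of-event-1")
append(chain, b"ciphertext-of-event-2")
chain_ok = verify(chain)

# Swapping the first payload breaks verification for the whole chain.
tampered = [replace(chain[0], payload_ciphertext=b"forged"), chain[1]]
tampered_ok = verify(tampered)
```

Since verify never touches a key, destroying the KEK leaves chain_ok unchanged while the ciphertext becomes undecryptable.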

05 · Policy engine

Declarative, not hard-coded.

Policies live as YAML/JSON documents per tenant: "this action requires approval", "that decision needs two approvers", "this workflow auto-pauses if anomaly score exceeds X". The engine evaluates policy at every gate the workflow runtime hits.

Policies are versioned alongside workflows. A policy change is a change-managed event with its own audit entry.
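As an illustration of gate evaluation, the sketch below loads a JSON policy and returns a decision for one action. The schema (version, pause_if_anomaly_above, actions, requires_approval, approvers), the action names, and the evaluate function are assumptions; the platform's real policy format is not shown on this page.

```python
import json
from dataclasses import dataclass

POLICY_JSON = """
{
  "version": 3,
  "pause_if_anomaly_above": 0.8,
  "actions": {
    "communication.send_email": {"requires_approval": true, "approvers": 1},
    "payments.execute": {"requires_approval": true, "approvers": 2}
  }
}
"""


@dataclass(frozen=True)
class Decision:
    outcome: str        # "allow", "needs_approval", or "pause"
    approvers: int = 0  # how many approvers the gate requires


def evaluate(policy: dict, action: str, anomaly_score: float = 0.0) -> Decision:
    # The anomaly check runs first so a suspicious run pauses
    # regardless of which action it was about to take.
    threshold = policy.get("pause_if_anomaly_above")
    if threshold is not None and anomaly_score > threshold:
        return Decision("pause")
    rule = policy.get("actions", {}).get(action)
    if rule and rule.get("requires_approval"):
        return Decision("needs_approval", rule.get("approvers", 1))
    return Decision("allow")


policy = json.loads(POLICY_JSON)
payment_gate = evaluate(policy, "payments.execute")
```

Keeping the rules in data rather than code is what allows a policy change to be versioned and audited like any other change.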

06 · Action class registry

Reversibility tiers.

Every action an agent or workflow can perform is registered with a tier:

  • Tier 0 — read only: queries, reads, classifications. Cannot mutate.
  • Tier 1 — soft mutate: internal-only writes (CRM notes, internal state). Reversible by undoing the write.
  • Tier 2 — external mutate: emails sent, API calls to vendor systems, scheduled meetings. Reversible only by best-effort compensating actions.
  • Tier 3 — irreversible: payments, contract execution, regulatory filing. Requires explicit human approval and is auditable as such.

The runtime refuses to execute Tier 3 without an approval token from the Authority Graph.
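The tiers and the Tier 3 refusal can be sketched like this. Apart from crm.contact_health, the action names are hypothetical, as are Tier, ACTION_TIERS, ApprovalRequired, and execute. The sketch checks only that a token is present; validating it against the Authority Graph is out of scope here.

```python
from enum import IntEnum
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


class Tier(IntEnum):
    READ_ONLY = 0
    SOFT_MUTATE = 1
    EXTERNAL_MUTATE = 2
    IRREVERSIBLE = 3


# Illustrative registry; every action must be listed before it can run.
ACTION_TIERS: Dict[str, Tier] = {
    "crm.contact_health": Tier.READ_ONLY,
    "crm.add_note": Tier.SOFT_MUTATE,
    "communication.send_email": Tier.EXTERNAL_MUTATE,
    "payments.execute": Tier.IRREVERSIBLE,
}


class ApprovalRequired(Exception):
    """Raised when a Tier 3 action is attempted without a token."""


def execute(action: str, fn: Callable[[], T], approval_token: Optional[str] = None) -> T:
    tier = ACTION_TIERS[action]  # unregistered actions fail with KeyError
    if tier is Tier.IRREVERSIBLE and approval_token is None:
        raise ApprovalRequired(f"{action} is Tier 3 and needs an approval token")
    return fn()
```

Looking the tier up from a registry rather than trusting the caller means an agent cannot downgrade its own action class.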

07 · Authority graph

Multi-approver, role-bound, break-glass.

Who can approve what is encoded as a graph: roles → action classes → approval rules (single, dual, N-of-M). A break-glass override exists for incidents — using it auto-creates a high-priority audit entry and notifies the security contact.

Approvals carry HLC-stamped signatures and are durably stored; an approval is a first-class entity in the audit log, not a UI state.
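An N-of-M rule check might look like the sketch below. The role names, the action-class keys, and the ApprovalRule and is_authorized names are assumptions; break-glass handling and signature verification are omitted.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ApprovalRule:
    eligible_roles: FrozenSet[str]  # the M: roles allowed to approve
    required: int                   # the N: distinct approvers needed


# Illustrative graph: action class -> rule.
RULES: Dict[str, ApprovalRule] = {
    "tier2": ApprovalRule(frozenset({"associate", "partner"}), 1),
    "tier3": ApprovalRule(frozenset({"partner", "cfo"}), 2),
}


def is_authorized(action_class: str, approvals: Dict[str, str]) -> bool:
    """approvals maps approver id -> role held at approval time."""
    rule = RULES[action_class]
    # Keys of a dict are unique, so one person cannot count twice.
    eligible = [who for who, role in approvals.items() if role in rule.eligible_roles]
    return len(eligible) >= rule.required
```

For example, is_authorized("tier3", {"u1": "partner", "u2": "cfo"}) holds, while a single partner approval does not satisfy the dual rule.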

08 · Memory graph

Bi-temporal, queryable, reversible.

An observer subscribes to the event bus and writes entities (contacts, matters, deals, documents, workflows) and relationships (assigned_to, depends_on, derived_from, classified_as) into a per-tenant graph. Each edge has two timestamps: valid time (when the relationship held in the world) and transaction time (when we learned it). This is what enables time-travel queries — "what did we believe was true on 2026-03-12, given only the data we had at that time?"
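A time-travel query over bi-temporal edges can be sketched as a filter on both time axes. The Edge fields (in particular superseded_at, used here to close a record's transaction-time interval), the as_of function, and the sample data are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Edge:
    src: str
    rel: str
    dst: str
    valid_from: date                     # valid time: start
    valid_to: Optional[date]             # valid time: end, None = open
    recorded_at: date                    # transaction time: when learned
    superseded_at: Optional[date] = None # transaction time: when corrected


def as_of(edges: List[Edge], valid_on: date, known_on: date) -> List[Edge]:
    """What did we believe on known_on about the world on valid_on?"""
    return [
        e for e in edges
        if e.recorded_at <= known_on
        and (e.superseded_at is None or known_on < e.superseded_at)
        and e.valid_from <= valid_on
        and (e.valid_to is None or valid_on < e.valid_to)
    ]


# On 2026-03-20 we learn the matter moved from alice to bob on 2026-03-10.
edges = [
    Edge("matter-42", "assigned_to", "alice", date(2026, 3, 1), None,
         recorded_at=date(2026, 3, 1), superseded_at=date(2026, 3, 20)),
    Edge("matter-42", "assigned_to", "alice", date(2026, 3, 1), date(2026, 3, 10),
         recorded_at=date(2026, 3, 20)),
    Edge("matter-42", "assigned_to", "bob", date(2026, 3, 10), None,
         recorded_at=date(2026, 3, 20)),
]

believed_then = as_of(edges, date(2026, 3, 12), known_on=date(2026, 3, 12))
believed_now = as_of(edges, date(2026, 3, 12), known_on=date(2026, 3, 25))
```

The same valid date yields alice when asked with the knowledge of 2026-03-12 and bob with the knowledge of 2026-03-25; nothing is overwritten, so both answers remain reproducible.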

09 · Operational Twin

Causal layer over the memory graph.

A directed acyclic graph that captures causal hypotheses ("response time up because backlog up because hire delayed"). Used by the counterfactual simulator and the intervention recommender to answer questions like "what's the smallest action that would have moved metric X by Y%?"

The Twin is read-only for humans: only the workflow and analytics pipeline write to it. Operators inspect it from the admin console.
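The data structure itself, a DAG of cause → effect hypotheses, can be sketched with the standard library. This covers only storage, cycle rejection, and upstream traversal; the counterfactual simulator and intervention recommender are not modelled. CausalGraph and its methods are hypothetical names.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Set


class CausalGraph:
    def __init__(self) -> None:
        # effect -> set of direct causes
        self._causes: Dict[str, Set[str]] = {}

    def add_hypothesis(self, cause: str, effect: str) -> None:
        # Build a trial copy and keep it only if it is still acyclic.
        trial = {node: set(parents) for node, parents in self._causes.items()}
        trial.setdefault(effect, set()).add(cause)
        trial.setdefault(cause, set())
        try:
            tuple(TopologicalSorter(trial).static_order())
        except CycleError:
            raise ValueError(f"{cause!r} -> {effect!r} would create a cycle") from None
        self._causes = trial

    def root_causes(self, effect: str) -> Set[str]:
        # Walk upstream until reaching nodes with no recorded cause.
        roots: Set[str] = set()
        seen: Set[str] = set()
        stack = [effect]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            parents = self._causes.get(node, set())
            if not parents and node != effect:
                roots.add(node)
            stack.extend(parents)
        return roots


twin = CausalGraph()
twin.add_hypothesis("hire delayed", "backlog up")
twin.add_hypothesis("backlog up", "response time up")
roots = twin.root_causes("response time up")
```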

10 · Workflow Genome (federated)

Cross-tenant patterns, ε-differentially private.

Anonymized workflow performance shapes are contributed to a federated catalog. A tenant can ask "what does the median legal_corporate firm's NDA-review look like at the 95th percentile of throughput?" without exposing any individual tenant's data — Laplace noise is added per query with a configurable ε budget.

Contribution is opt-in per tenant and revocable. Queries against the genome are themselves auditable.
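The per-query noise and budget accounting can be sketched with the standard Laplace mechanism, where the noise scale is sensitivity / ε. The GenomeQuery class, BudgetExhausted, and the simple additive budget are assumptions; the catalog's actual accounting is not described here.

```python
import random
from typing import Optional


class BudgetExhausted(Exception):
    pass


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class GenomeQuery:
    """Hypothetical wrapper that spends epsilon from a fixed budget."""

    def __init__(self, total_epsilon: float, rng: Optional[random.Random] = None) -> None:
        self.total_epsilon = total_epsilon
        self.spent = 0.0
        self._rng = rng or random.Random()

    def answer(self, true_value: float, sensitivity: float, epsilon: float) -> float:
        if epsilon <= 0 or sensitivity <= 0:
            raise ValueError("epsilon and sensitivity must be positive")
        if self.spent + epsilon > self.total_epsilon:
            raise BudgetExhausted("epsilon budget exceeded")
        self.spent += epsilon
        # Smaller epsilon means a larger scale, hence more noise.
        return true_value + laplace_noise(sensitivity / epsilon, self._rng)
```

A caller with a budget of 1.0 can afford two queries at ε = 0.4 each; a third at the same ε is refused rather than answered with weaker privacy.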

11 · Time & ordering

Hybrid Logical Clocks.

Every event, audit entry, approval, and state transition carries an HLC timestamp: a physical clock reading paired with a logical counter. This yields an ordering consistent with causality across processes without requiring a single source of truth (a total order once ties are broken deterministically, for example by node id), and tolerates clock skew up to a configurable bound.
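For reference, a minimal sketch of the standard HLC update rules. The class name, the (time, counter, node id) tuple layout, and the 500 ms default skew bound are assumptions, not the platform's implementation.

```python
from typing import Callable, Tuple

Stamp = Tuple[int, int, str]  # (time in ms, logical counter, node id)


class HybridLogicalClock:
    def __init__(self, node_id: str, now_ms: Callable[[], int], max_skew_ms: int = 500) -> None:
        self.node_id = node_id
        self._now_ms = now_ms
        self._max_skew_ms = max_skew_ms
        self._l = 0  # highest time seen so far
        self._c = 0  # counter used when time does not advance

    def tick(self) -> Stamp:
        """Stamp a local event or an outgoing message."""
        pt = self._now_ms()
        if pt > self._l:
            self._l, self._c = pt, 0
        else:
            self._c += 1
        return (self._l, self._c, self.node_id)

    def observe(self, remote: Stamp) -> Stamp:
        """Merge a received stamp so the result sorts after it."""
        remote_l, remote_c, _ = remote
        pt = self._now_ms()
        if remote_l - pt > self._max_skew_ms:
            raise ValueError("remote timestamp exceeds allowed clock skew")
        new_l = max(self._l, remote_l, pt)
        if new_l == self._l and new_l == remote_l:
            new_c = max(self._c, remote_c) + 1
        elif new_l == self._l:
            new_c = self._c + 1
        elif new_l == remote_l:
            new_c = remote_c + 1
        else:
            new_c = 0
        self._l, self._c = new_l, new_c
        return (self._l, self._c, self.node_id)


# Node b's wall clock runs 10 ms behind node a's.
clock_a = HybridLogicalClock("a", now_ms=lambda: 100)
clock_b = HybridLogicalClock("b", now_ms=lambda: 90)
sent = clock_a.tick()
received = clock_b.observe(sent)
```

Stamps compare as plain tuples, so the received stamp sorts after the sent one even though node b's physical clock is behind; the node id in the last position breaks ties between concurrent events.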

12 · SLO instrumentation & rate limiting

Token buckets per route, histograms per endpoint.

Each route has an independent token-bucket rate limiter sized per tier. Latency is recorded into in-memory histograms; p50/p95/p99 are scraped at /metrics in Prometheus exposition format. Public /status exposes build SHA, uptime, and last-deploy timestamp.
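A per-route token bucket can be sketched as below. The TokenBucket class, its parameters, and the example route and sizes are assumptions; the clock is injectable so the refill logic can be exercised without waiting.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    def __init__(
        self,
        capacity: float,
        refill_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._clock = clock
        self._tokens = float(capacity)  # start full
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill lazily based on elapsed time, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_sec)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# One independent bucket per route; path and sizes are illustrative.
limiters: Dict[str, TokenBucket] = {
    "/api/v1/workflows": TokenBucket(capacity=20, refill_per_sec=5),
}
```

Capacity sets the burst a route tolerates and refill_per_sec the sustained rate, so the two can be sized separately per tier.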

OpenAPI spec, webhook signing schemas and event payload schemas: /api.