
Chapter 07 - Payment orchestration

Alur - Payment Orchestration

v1 shipped - deploy gated

A Go/PostgreSQL payment orchestration service that drives authorize-to-capture intents through SimPSP failures, leases stuck work, and books terminal money movements into Arus exactly once.

Source: orchestration/backend origin/main e34c23d; PR #3 fixup b74828a; M3 walkthrough records race verification and chaos gate evidence
Figure: Alur payment orchestration flow. Payment intents move from creation through authorize and capture, with SimPSP faults, reaper status polling, same-transaction outbox booking, and Selaras external references.

Nodes: Intent created (idempotent API), Authorize (PSP attempt row), Capture (auto v1), Succeeded (terminal money), Arus booking (same-tx outbox), SimPSP faults (tokens, no RNG), Reaper poll (PSP truth), Selaras ref (alur:...:v1).

Edge labels: claim, authorized, captured, outbox, 5xx / timeout, status, external_ref.

Invariant: terminal money state and booking outbox write happen in one database transaction.

  • 500-intent seeded chaos storm with every SimPSP fault class
  • 0 double charges after reaper convergence
  • 1x booking outbox row and Arus post per successful capture external-ref
  • Reaper: PSP status poll is the source of truth; webhooks are hints
  • ~100/s honest local end-to-end target, not a ledger append benchmark

The conductor layer

Arus records money, WebhookOps delivers operational events, Family Finance consumes the API, Selaras assures settlement, and Alur orchestrates the payment state machine in front of an unreliable PSP.

Source: orchestration/docs/product-requirements/2026-06-12-v01-orchestration-v1-prd.md

Built for the acquiring failure interview

The canonical question is what happens when the PSP charges but the response times out. Alur keeps the attempt record, lets the reaper poll PSP truth, and completes without a second charge.

Source: orchestration/docs/system-analysis/2026-06-12-v01-orchestration-v1-sa.md

State machine

Authorize, capture, recover, then book once.

created -> authorizing

Worker claims due work with `FOR UPDATE SKIP LOCKED`, records the PSP attempt, commits, then calls SimPSP with the persisted idempotency key.

Source: orchestration/docs/system-analysis/2026-06-12-v01-orchestration-v1-sa.md

authorized -> capturing -> succeeded

Auto-capture is v1's only capture mode; terminal capture success enqueues the Arus booking outbox row in the same transaction.

Source: orchestration/backend/internal/domain/transition.go
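
A small in-memory sketch of the idea above: legal transitions are looked up in a table, and reaching `succeeded` stages the booking outbox row in the same unit of work as the state change. The `Tx` struct is a stand-in for one database transaction, the intent ID format is made up, and failure states are left out; none of these names come from `transition.go`.

```go
package main

import "fmt"

type State string

const (
	Created     State = "created"
	Authorizing State = "authorizing"
	Authorized  State = "authorized"
	Capturing   State = "capturing"
	Succeeded   State = "succeeded"
)

// legal lists only the happy-path transitions named in this section.
var legal = map[State]State{
	Created:     Authorizing,
	Authorizing: Authorized,
	Authorized:  Capturing,
	Capturing:   Succeeded,
}

// OutboxRow is a pending Arus booking identified by its external ref.
type OutboxRow struct{ ExternalRef string }

// Tx stands in for one database transaction: everything staged in it
// commits together or not at all.
type Tx struct {
	NewState State
	Outbox   []OutboxRow
}

// transition validates the move and, on terminal capture success, stages
// the booking outbox row alongside the state change.
func transition(intentID string, from, to State) (Tx, error) {
	next, ok := legal[from]
	if !ok || next != to {
		return Tx{}, fmt.Errorf("illegal transition %s -> %s", from, to)
	}
	tx := Tx{NewState: to}
	if to == Succeeded {
		ref := "alur:" + intentID + ":capture:v1"
		tx.Outbox = append(tx.Outbox, OutboxRow{ExternalRef: ref})
	}
	return tx, nil
}
```

For example, `transition("pi_1", Capturing, Succeeded)` yields one staged outbox row, while `transition("pi_1", Created, Succeeded)` is rejected.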

reaper status poll

Past-deadline in-flight work is leased by extending `state_deadline`, then resolved through SimPSP status truth instead of webhook optimism.

Source: orchestration/backend/internal/worker/worker.go

refund requested -> processing -> succeeded

Refunds use the same attempt discipline. The PR #3 review fix (b74828a) makes `processing` retries take a lease before any PSP send.

Source: orchestration/backend/internal/httpapi/m3_acceptance_test.go

Three-layer idempotency

Every replay surface has its own key.

Client -> API

Merchant calls carry `Idempotency-Key`. Replaying the same request returns the original intent or refund, while reusing a key with a different body is rejected as a conflict, detected by comparing request hashes.

Source: orchestration/docs/adr/2026-06-12-v01-adr-002-three-layer-idempotency.md

Worker -> PSP

Every outbound authorize, capture, or refund attempt gets a fresh PSP idempotency key recorded before send, so recovery can replay without blind double effects.

Source: orchestration/docs/system-analysis/2026-06-12-v01-orchestration-v1-sa.md

Service -> Arus

The outbox external ref is unique and the drainer treats Arus existing-entry replay as success, preserving exactly-once booking semantics across retries.

Source: orchestration/backend/internal/worker/outbox.go
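
As a hedged illustration of the drainer rule, the sketch below treats an "already exists" reply as success. The `Poster` interface, `ErrEntryExists`, the row status strings, and `fakeArus` are all hypothetical; the real drainer talks to Arus over HTTP.

```go
package main

import "errors"

// ErrEntryExists models Arus reporting that the external ref is already booked.
var ErrEntryExists = errors.New("arus: external ref already exists")

// OutboxRow is one pending booking; Status is "pending" or "posted".
type OutboxRow struct {
	ExternalRef string
	Status      string
}

// Poster is a hypothetical view of the Arus client.
type Poster interface {
	Post(externalRef string) error
}

// drain posts every unposted row. A nil error and ErrEntryExists both
// mean the entry is in the ledger, so both mark the row posted.
func drain(rows []OutboxRow, arus Poster) {
	for i := range rows {
		if rows[i].Status == "posted" {
			continue
		}
		err := arus.Post(rows[i].ExternalRef)
		if err == nil || errors.Is(err, ErrEntryExists) {
			rows[i].Status = "posted"
		}
		// Any other error leaves the row pending for the next drain.
	}
}

// fakeArus is an in-memory double that rejects duplicate external refs.
type fakeArus struct {
	entries map[string]bool
	calls   int
}

func (f *fakeArus) Post(ref string) error {
	f.calls++
	if f.entries[ref] {
		return ErrEntryExists
	}
	f.entries[ref] = true
	return nil
}
```

This covers the crash window where Arus accepted the post but the drainer died before marking the row: the retry sees the existing entry and closes the row without a second booking.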

ADR callouts

The design keeps recovery visible.

ADR-001: explicit state machine

The portfolio value is in visible payment mechanics, so v1 uses PostgreSQL legal transitions, append-only logs, and outbox rows rather than hiding recovery inside a workflow engine.

Source: orchestration/docs/adr/2026-06-12-v01-adr-001-hand-rolled-state-machine.md

ADR-002: independent replay boundaries

API retries, PSP retries, and Arus post retries each own their own idempotency key or external ref because any layer can fail after a previous layer succeeded.

Source: orchestration/docs/adr/2026-06-12-v01-adr-002-three-layer-idempotency.md

ADR-003: webhooks plus polling

Webhooks are a fast path. The reaper's PSP status poll is the mandatory correctness path for stuck in-flight intents and timeout-after-charge recovery.

Source: orchestration/docs/adr/2026-06-12-v01-adr-003-webhooks-plus-polling.md

Chaos PSP gate

The merge gate attacks every fault class together.

TestM3ChaosGateSeededStorm

The gate creates 500 intents and assigns `ok`, `decline_hard`, `fail_5xx_then_ok:2`, `timeout_after_charge`, `timeout_before`, `webhook_dup:3`, and `webhook_ooo` round-robin.

Source: orchestration/backend/internal/httpapi/m3_acceptance_test.go
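
A short sketch of round-robin assignment, assuming the tokens are cycled in the order listed above (the function names are illustrative, not taken from the test file). Since 500 = 7 × 71 + 3, under that assumption the first three tokens land on 72 intents each and the remaining four on 71.

```go
package main

// faultTokens are the SimPSP fault classes named in the gate description.
var faultTokens = []string{
	"ok",
	"decline_hard",
	"fail_5xx_then_ok:2",
	"timeout_after_charge",
	"timeout_before",
	"webhook_dup:3",
	"webhook_ooo",
}

// assignFaults gives intent i the token at position i modulo the token
// count, so no random number generator is involved.
func assignFaults(n int) []string {
	plan := make([]string, n)
	for i := range plan {
		plan[i] = faultTokens[i%len(faultTokens)]
	}
	return plan
}

// countByToken tallies how many intents received each token.
func countByToken(plan []string) map[string]int {
	counts := make(map[string]int, len(faultTokens))
	for _, tok := range plan {
		counts[tok]++
	}
	return counts
}
```

Deterministic assignment means a failing run can be reproduced exactly from the intent index.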

Converges terminal

The test repeatedly runs worker and reaper with forced deadline expiry, then fails unless zero non-terminal intents remain.

Source: orchestration/backend/internal/httpapi/m3_acceptance_test.go

No double charge

For every created intent, the SimPSP charge ledger must show at most one charge after the storm.

Source: orchestration/backend/internal/httpapi/m3_acceptance_test.go

Exactly-once outbox drain

For every successful capture, the test requires one booking row, one fake-Arus post, and zero pending/failed outbox rows after drain.

Source: orchestration/backend/internal/httpapi/m3_acceptance_test.go
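
The three post-storm assertions above can be expressed as one checker. This is a sketch under assumptions: the `StormResult` fields are invented for illustration and do not mirror the structures in `m3_acceptance_test.go`.

```go
package main

import "fmt"

// StormResult is a hypothetical summary gathered after the storm converges.
type StormResult struct {
	NonTerminal    int            // intents still in flight
	Charges        map[string]int // intent ID -> charges in the PSP ledger
	Captured       []string       // external refs of successful captures
	BookingRows    map[string]int // external ref -> booking outbox rows
	ArusPosts      map[string]int // external ref -> posts seen by the Arus double
	OpenOutboxRows int            // outbox rows still pending or failed
}

// violations returns one message per broken invariant; an empty result
// means the gate passes.
func violations(r StormResult) []string {
	var out []string
	if r.NonTerminal != 0 {
		out = append(out, fmt.Sprintf("%d intents are not terminal", r.NonTerminal))
	}
	for id, n := range r.Charges {
		if n > 1 {
			out = append(out, fmt.Sprintf("intent %s charged %d times", id, n))
		}
	}
	for _, ref := range r.Captured {
		if r.BookingRows[ref] != 1 || r.ArusPosts[ref] != 1 {
			out = append(out, fmt.Sprintf("ref %s: %d booking rows, %d Arus posts",
				ref, r.BookingRows[ref], r.ArusPosts[ref]))
		}
	}
	if r.OpenOutboxRows != 0 {
		out = append(out, fmt.Sprintf("%d outbox rows not posted", r.OpenOutboxRows))
	}
	return out
}
```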

Loop closing

Capture success becomes reconcilable ledger truth.

Arus HTTP boundary only

Alur posts balanced entries through the Arus API and never configures an Arus database connection.

Source: orchestration/docs/security/2026-06-12-v01-alur-m3-threat-model.md

External refs built for reconciliation

`alur:<intent>:capture:v1` and `alur:<intent>:refund:<refund>:v1` are stable references that Selaras can reconcile against SimPSP settlement exports.

Source: orchestration/docs/system-analysis/2026-06-12-v01-orchestration-v1-sa.md
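
The two reference shapes can be built with plain string formatting. The function names and the ID values in the usage note are illustrative assumptions; only the `alur:...:v1` layouts come from the text above.

```go
package main

import "fmt"

// captureRef builds the external ref for a capture booking.
func captureRef(intentID string) string {
	return fmt.Sprintf("alur:%s:capture:v1", intentID)
}

// refundRef builds the external ref for a refund booking.
func refundRef(intentID, refundID string) string {
	return fmt.Sprintf("alur:%s:refund:%s:v1", intentID, refundID)
}
```

With hypothetical IDs, `captureRef("pi_123")` gives `alur:pi_123:capture:v1` and `refundRef("pi_123", "rf_9")` gives `alur:pi_123:refund:rf_9:v1`. Because the ref is a pure function of the IDs, every retry produces the same value.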

Existing-entry replay is success

The drainer marks rows posted when Arus reports the external ref already exists, so outbox retries close cleanly instead of creating duplicates.

Source: orchestration/backend/internal/worker/outbox.go

Honesty line

The limits stay visible.

  • CI chaos booking assertions use an in-repo Arus HTTP boundary double, not live Arus.
  • k6 was unavailable; the load evidence is the Go seeded storm runner.
  • The honest local target is about 100 intents/s end-to-end, below Arus's 600 TPS ledger append result.
  • v1 is IDR + auto-capture only; manual capture, multi-currency, and routing remain roadmap work.
  • SimPSP proves the failure classes, not real-processor authorization.
  • Cloud Run is prepare-only and gated; no cloud command executed.
  • Timeline output includes PSP idempotency keys and is operator-only.

Source: family-finance/docs/docs/system-analysis/2026-06-12-v01-alur-case-study-evidence-sa.md

Roadmap

The next payment work is named, not implied.

Manual capture

The current service ships auto-capture only; explicit authorize-then-capture control is a named v2 workflow extension.

Source: orchestration/docs/product-requirements/2026-06-12-v01-orchestration-v1-prd.md

Multi-currency and routing

v1 is IDR-only with one SimPSP adapter. Multi-currency, FX, processor routing, and failover are intentionally deferred.

Source: orchestration/docs/product-requirements/2026-06-12-v01-orchestration-v1-prd.md

Real PSP adapter

SimPSP proves orchestration failure classes, while processor-specific auth and production certification stay outside v1.

Source: orchestration/docs/security/2026-06-12-v01-alur-m3-threat-model.md

Gated Cloud Run deploy

The prepare script exits before cloud mutation and prints `no cloud command executed`; live deployment requires a separate human gate.

Source: orchestration/backend/scripts/prepare-cloud-run.sh

Platform links

Alur sits between Arus and Selaras.

  • Read Arus ledger

    Alur books terminal money states into Arus through the API and inherits its external-ref replay contract.

    Source: orchestration/docs/system-analysis/2026-06-12-v01-orchestration-v1-sa.md
  • Read Selaras

    Alur external refs are shaped so Selaras can reconcile SimPSP settlement exports against Arus postings.

    Source: orchestration/docs/system-analysis/2026-06-12-v01-orchestration-v1-sa.md

Evidence trail

Every public claim points to a real artifact.

  1. Phase-0 evidence SA: family-finance/docs/docs/system-analysis/2026-06-12-v01-alur-case-study-evidence-sa.md
  2. Portfolio walkthrough: family-finance/docs/docs/runbooks/2026-06-12-v01-alur-case-study-walkthrough.md
  3. PR #3 merge: git:orchestration/backend origin/main e34c23d Merge pull request #3
  4. PR #3 review-fix commit: git:orchestration/backend b74828a fix: lease refund retries and split m3 migration
  5. Alur PRD: orchestration/docs/product-requirements/2026-06-12-v01-orchestration-v1-prd.md
  6. Alur system analysis: orchestration/docs/system-analysis/2026-06-12-v01-orchestration-v1-sa.md
  7. ADR-001 state machine: orchestration/docs/adr/2026-06-12-v01-adr-001-hand-rolled-state-machine.md
  8. ADR-002 idempotency: orchestration/docs/adr/2026-06-12-v01-adr-002-three-layer-idempotency.md
  9. ADR-003 webhooks plus polling: orchestration/docs/adr/2026-06-12-v01-adr-003-webhooks-plus-polling.md
  10. M3 walkthrough: orchestration/docs/runbooks/2026-06-12-v01-alur-m3-walkthrough.md
  11. M3 threat model: orchestration/docs/security/2026-06-12-v01-alur-m3-threat-model.md
  12. Chaos gate acceptance test: orchestration/backend/internal/httpapi/m3_acceptance_test.go
  13. Refund lease regression test: orchestration/backend/internal/httpapi/m3_acceptance_test.go
  14. Worker and reaper lease code: orchestration/backend/internal/worker/worker.go
  15. Outbox drainer: orchestration/backend/internal/worker/outbox.go
  16. Transition and booking helpers: orchestration/backend/internal/domain/transition.go
  17. M3 migration split: orchestration/backend/db/migrations/000002_m3_money_ops.up.sql
  18. M3 chaos load transcript: orchestration/backend/build/20260612T114931Z-m3-chaos-load.txt
  19. Load baseline script: orchestration/backend/scripts/run-load-baseline.sh
  20. Prepare-only Cloud Run script: orchestration/backend/scripts/prepare-cloud-run.sh
  21. Portfolio registry: family-finance/web/src/components/portfolio/portfolioRegistry.ts
  22. Arus roadmap data: family-finance/web/src/components/portfolio/arusLedgerCaseStudyData.ts