ADR-003: Use ElastiCache as Distributed Session Store
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-04-07 |
| Related SAD | SAD-001 |
| Related ADR | ADR-002 |
Context
The Apps layer (External Actions) performs a dual OAuth handshake for each user session, producing two short-lived tokens:
- Signals access token — for write-back calls to the ELN via MuleSoft
- Roche identity token — for calls to internal systems via Janus
These tokens must be available to the application for the duration of the user’s session. The Apps layer runs on Kubernetes (EKS) and must support horizontal scaling.
Decision
Use AWS ElastiCache (Redis) as a distributed session store. Both OAuth tokens are written to ElastiCache after the dual handshake and retrieved on subsequent requests. Any application pod can serve any request.
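The write-then-read flow described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `session:` key prefix, the JSON field names, and the helper names `save_session` and `load_session` are assumptions. `InMemoryStore` is a hypothetical stand-in that lets the sketch run without a cluster; in a deployment, a Redis client exposing `set(name, value, ex=...)` and `get(name)` would be passed in its place.

```python
import json
import time
from typing import Any, Optional


class InMemoryStore:
    """Hypothetical stand-in for a Redis client, for local runs only."""

    def __init__(self) -> None:
        self._data: dict = {}

    def set(self, name: str, value: str, ex: Optional[int] = None) -> bool:
        # Record an absolute expiry time when a TTL (in seconds) is given.
        expires_at = time.monotonic() + ex if ex is not None else None
        self._data[name] = (value, expires_at)
        return True

    def get(self, name: str) -> Optional[str]:
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[name]  # mimic expiry of the key
            return None
        return value


def save_session(store: Any, session_id: str, signals_token: str,
                 identity_token: str, ttl_seconds: int) -> None:
    """Write both tokens under one key so they expire together."""
    payload = json.dumps({
        "signals_access_token": signals_token,   # assumed field name
        "roche_identity_token": identity_token,  # assumed field name
    })
    store.set(f"session:{session_id}", payload, ex=ttl_seconds)


def load_session(store: Any, session_id: str) -> Optional[dict]:
    """Return the stored tokens, or None if the session is missing or expired."""
    raw = store.get(f"session:{session_id}")
    if raw is None:
        return None
    return json.loads(raw)
```

Because neither helper keeps state in the process, any pod holding a client for the same store can call `load_session` for a session that a different pod saved.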
Rationale
Horizontal scalability
By externalising session state to a shared store, the Apps layer can run multiple replicas without sticky sessions. No individual pod is a single point of failure, and Kubernetes can scale the number of pods based on load.
Resilience
If a pod is terminated (rolling update, node failure, scaling event), sessions are not lost. Any surviving or replacement pod reads the session state from ElastiCache and continues serving the user.
Simplicity
Redis provides native TTL support, aligning naturally with the short-lived nature of OAuth tokens. Expired sessions are automatically cleaned up without application-level housekeeping.
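One way to align the key's TTL with the tokens is to derive it from the shorter of the two token lifetimes. The sketch below assumes each token response reports its remaining lifetime in seconds (as the OAuth `expires_in` field does) and that both tokens are stored under one key. The function name and the 30-second safety margin are illustrative assumptions, not values taken from this ADR.

```python
def session_ttl_seconds(signals_expires_in: int,
                        identity_expires_in: int,
                        safety_margin: int = 30) -> int:
    """Return a TTL that expires the session before either token does.

    The session is only usable while both tokens are valid, so the
    shorter lifetime governs. The margin (an assumed value) avoids
    serving a token that expires while a request is in flight.
    """
    ttl = min(signals_expires_in, identity_expires_in) - safety_margin
    if ttl <= 0:
        # Redis rejects non-positive expiry values, and such a session
        # would be unusable anyway.
        raise ValueError("tokens expire too soon to be cached")
    return ttl
```

For example, with lifetimes of 300 and 900 seconds and the default margin, the key would be written with a TTL of 270 seconds.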
Alternatives Considered
In-memory session storage
Rejected. Limits the application to a single replica and introduces a single point of failure. Pod restarts lose all active sessions. Does not meet the availability requirements for a production service used by scientists during active experiments.
Database-backed sessions (PostgreSQL)
Considered but rejected in favour of ElastiCache. Session lookups happen on every request and require sub-millisecond latency. PostgreSQL adds unnecessary write-ahead log overhead for ephemeral data that expires within minutes. Redis is purpose-built for this use case.
Sticky sessions (session affinity)
Rejected. Sticky sessions couple a user to a specific pod, reducing the effectiveness of horizontal scaling and creating uneven load distribution. If the assigned pod is terminated, the session is lost regardless.
Consequences
- Additional infrastructure dependency (ElastiCache cluster must be provisioned and maintained)
- The cluster is managed by the Minerva platform, which handles provisioning, patching, and backups
- Network hop for every session lookup (mitigated by ElastiCache’s sub-millisecond latency within the same VPC)
- Token data in ElastiCache must be treated as sensitive — encryption at rest and in transit is enabled via Minerva’s default ElastiCache configuration
- Application code must handle ElastiCache unavailability gracefully (whether to fail open or fail closed is decided per application)
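The fail-open versus fail-closed choice in the last consequence could be made explicit in code along these lines. This is a sketch under stated assumptions: `FailureMode`, `SessionStoreUnavailable`, and `lookup_session` are hypothetical names, and "fail open" is interpreted here as treating the session as absent so the caller can repeat the handshake. The default `store_errors` tuple uses built-in exception types; the exception classes raised by the actual Redis client would be passed in instead.

```python
from enum import Enum
from typing import Callable, Optional, Tuple, Type


class FailureMode(Enum):
    FAIL_OPEN = "fail_open"      # carry on as if no session were stored
    FAIL_CLOSED = "fail_closed"  # reject the request


class SessionStoreUnavailable(Exception):
    """Raised in fail-closed mode when the session store cannot be reached."""


def lookup_session(
    fetch: Callable[[str], Optional[dict]],
    session_id: str,
    mode: FailureMode,
    store_errors: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> Optional[dict]:
    """Fetch a session, applying the chosen policy if the store is down."""
    try:
        return fetch(session_id)
    except store_errors as exc:
        if mode is FailureMode.FAIL_CLOSED:
            # Surface the outage so the caller can return an error response.
            raise SessionStoreUnavailable(session_id) from exc
        # Fail open: report "no session" so the caller can re-authenticate.
        return None
```

Keeping the policy as a parameter means each application states its choice at the call site rather than inheriting one implicitly from the client library's defaults.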