
ADR-003: Use ElastiCache as Distributed Session Store

Field        Value
Status       Accepted
Date         2026-04-07
Related SAD  SAD-001
Related ADR  ADR-002

Context

The Apps layer (External Actions) performs a dual OAuth handshake for each user session, producing two short-lived tokens:

  1. Signals access token — for write-back calls to the ELN via MuleSoft
  2. Roche identity token — for calls to internal systems via Janus

These tokens must be available to the application for the duration of the user’s session. The Apps layer runs on Kubernetes (EKS) and must support horizontal scaling.

Decision

Use AWS ElastiCache (Redis) as a distributed session store. Both OAuth tokens are written to ElastiCache after the dual handshake and retrieved on subsequent requests. Any application pod can serve any request.
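A minimal sketch of this write-then-read flow, under stated assumptions: the `session:<id>` key scheme, the JSON encoding, and the names `SessionStore`, `SessionTokens`, and `InMemoryClient` are hypothetical and not taken from the Apps layer code. The client interface mirrors the `set(name, value, ex=...)` and `get(name)` calls of the redis-py client, so a real client could be passed in place of the in-memory stand-in.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Protocol


class KeyValueClient(Protocol):
    """Subset of a Redis client used here; redis-py's Redis class offers these calls."""

    def set(self, name: str, value: str, ex: Optional[int] = None) -> object: ...

    def get(self, name: str) -> Optional[str]: ...


@dataclass(frozen=True)
class SessionTokens:
    signals_access_token: str  # write-back calls to the ELN via MuleSoft
    roche_identity_token: str  # calls to internal systems via Janus


class SessionStore:
    """Stores both tokens under one key so they are written and expire together."""

    def __init__(self, client: KeyValueClient, ttl_seconds: int) -> None:
        self._client = client
        self._ttl = ttl_seconds

    @staticmethod
    def _key(session_id: str) -> str:
        return f"session:{session_id}"  # hypothetical key scheme

    def save(self, session_id: str, tokens: SessionTokens) -> None:
        payload = json.dumps(asdict(tokens))
        self._client.set(self._key(session_id), payload, ex=self._ttl)

    def load(self, session_id: str) -> Optional[SessionTokens]:
        raw = self._client.get(self._key(session_id))
        if raw is None:  # unknown or expired session
            return None
        return SessionTokens(**json.loads(raw))


class InMemoryClient:
    """Stand-in for local experiments only; it ignores the expiry argument."""

    def __init__(self) -> None:
        self._data = {}

    def set(self, name, value, ex=None):
        self._data[name] = value
        return True

    def get(self, name):
        return self._data.get(name)
```

Two `SessionStore` instances that share one client behave like two pods: a session saved through one can be loaded through the other, which is the property the decision relies on.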

Rationale

Horizontal scalability

By externalising session state to a shared store, the Apps layer can run multiple replicas without sticky sessions, because every pod reads the same session data. This removes the individual pod as a single point of failure and allows Kubernetes to scale pods based on load.

Resilience

If a pod is terminated (rolling update, node failure, scaling event), its sessions are not lost. Any surviving or replacement pod reads the session state from ElastiCache on the next request and continues serving the user.

Simplicity

Redis provides native TTL support, aligning naturally with the short-lived nature of OAuth tokens. Expired sessions are automatically cleaned up without application-level housekeeping.
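One way to align the key's TTL with the tokens is to expire the session when the earlier of the two tokens expires. The sketch below assumes that policy; the function name, the 30-second safety margin, and the use of epoch-second expiry timestamps are illustrative assumptions and are not specified by this ADR.

```python
import time
from typing import Optional


def session_ttl_seconds(
    signals_expires_at: float,
    identity_expires_at: float,
    now: Optional[float] = None,
    safety_margin: int = 30,
) -> int:
    """Seconds until the earlier token expires, minus a margin, never below 0.

    All timestamps are epoch seconds. The margin (an assumed value) keeps the
    application from reading a token that expires while a request is in flight.
    """
    current = time.time() if now is None else now
    remaining = min(signals_expires_at, identity_expires_at) - current - safety_margin
    return max(0, int(remaining))
```

A result of 0 means the session should not be stored at all: Redis rejects a `SET` with an expiry of zero, so the caller would need to skip the write and trigger a fresh handshake instead.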

Alternatives Considered

In-memory session storage

Rejected. Limits the application to a single replica and introduces a single point of failure. Pod restarts lose all active sessions. Does not meet the availability requirements for a production service used by scientists during active experiments.

Database-backed sessions (PostgreSQL)

Considered but rejected in favour of ElastiCache. Session lookups happen on every request and require sub-millisecond latency. PostgreSQL adds unnecessary write-ahead log overhead for ephemeral data that expires within minutes. Redis is an in-memory key-value store with built-in key expiry, which matches this access pattern directly.

Sticky sessions (session affinity)

Rejected. Sticky sessions couple a user to a specific pod, reducing the effectiveness of horizontal scaling and creating uneven load distribution. If the assigned pod is terminated, the session is lost regardless.

Consequences

  • Additional infrastructure dependency: an ElastiCache cluster must be provisioned and maintained
  • The cluster is managed by the Minerva platform, which handles provisioning, patching, and backups
  • Network hop for every session lookup (mitigated by ElastiCache’s sub-millisecond latency within the same VPC)
  • Token data in ElastiCache must be treated as sensitive — encryption at rest and in transit is enabled via Minerva’s default ElastiCache configuration
  • Application code must handle ElastiCache unavailability gracefully (fail-open vs fail-closed decision is per-application)
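
The last point can be sketched as a small wrapper around the session lookup. This is one possible interpretation, not a prescribed design: here "fail-closed" rejects the request when the store is unreachable, and "fail-open" treats the outage as "no session found" so the caller can start a fresh handshake. The names `FailureMode`, `SessionStoreUnavailable`, and `load_session` are hypothetical. The built-in `ConnectionError` and `TimeoutError` are used as stand-ins; a real Redis client library raises its own exception types, which would need to be caught instead.

```python
from enum import Enum
from typing import Callable, Optional


class FailureMode(Enum):
    FAIL_CLOSED = "fail_closed"  # reject the request when the store is unreachable
    FAIL_OPEN = "fail_open"      # treat the outage as "no session found"


class SessionStoreUnavailable(Exception):
    """Raised when the store is unreachable and the policy is fail-closed."""


def load_session(
    fetch: Callable[[str], Optional[str]],
    session_id: str,
    mode: FailureMode,
) -> Optional[str]:
    """Look up a session through `fetch`, applying the configured failure policy."""
    try:
        return fetch(session_id)
    except (ConnectionError, TimeoutError) as exc:
        if mode is FailureMode.FAIL_CLOSED:
            # Keep the session id out of the message: it may end up in logs.
            raise SessionStoreUnavailable("session store unavailable") from exc
        return None  # fail-open: caller proceeds as if no session exists
```

Passing the policy as a parameter keeps the choice per-application, as the consequence above requires, while the lookup code stays shared.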