ADR-009: Caching Outbound Calls to the Signals API
| Field | Value |
|---|---|
| Status | Open |
| Date | 2026-04-07 |
| Related SAD | SAD-001 |
| Related ADR | ADR-004, ADR-005 |
| Related Research | LEM Platform Requirements |
Context
When xRED integration services (Apps, Jobs) interact with Signals, they make outbound calls to the Signals REST API via the DataRiver Facade or the LEM API Proxy (ADR-005). These calls consume the tenant-wide 1,000 calls/minute quota shared across every consumer in the Roche Signals tenant — not just xRED, but ORCA, LEM, and every other integration team.
Common outbound call patterns:
- Read before write — before writing data back into an experiment, the Apps layer reads the current experiment/worksheet state to determine what to update. Multiple users working on the same experiment trigger identical reads.
- Metadata resolution — looking up user profiles, entity IDs, experiment structure, or template definitions from Signals before performing an action.
- Validation checks — verifying that an experiment is in the right state (open, not signed) before attempting a write-back.
- Background sync — Jobs that periodically scan Signals for new experiments or changes (e.g. ORCA’s Legal Metadata Injection pattern).
Every unnecessary call to Signals:
- Eats into the shared 1,000/min quota (risking rate limiting for all consumers)
- Adds latency (Signals to DataRiver to MuleSoft round-trip)
- Increases load on Signals itself (a shared SaaS platform)
Currently, xRED services make fresh Signals API calls on every request with no caching.
Options Under Consideration
Option A: Redis cache for Signals API responses
Cache Signals API responses in ElastiCache (Redis) with appropriate TTLs.
Arguments for:
- ElastiCache is already provisioned for the Apps layer session store — no new infrastructure
- Fine-grained TTLs per entity type (experiments: 5 min, templates: 1 hour, user profiles: 30 min)
- Shared across all App replicas — one cache miss populates for all pods
- Sub-millisecond reads vs ~200ms+ for a Signals API round-trip
- Native TTL expiry handles cache invalidation automatically
Arguments against:
- Application code complexity — every Signals client call needs cache wrapper logic
- Risk of stale data — if an experiment is modified between cache set and next read, the app operates on outdated state
- Cache key design matters — must account for user context (some Signals responses are user-scoped)
- Need to handle cache stampede on popular experiments
- Shared Redis risk — the same ElastiCache instance currently holds sensitive OAuth session tokens (ADR-003). Mixing API response cache data into the same instance introduces concerns:
  - Memory pressure — a large cache corpus could evict session tokens if the instance runs low on memory (Redis eviction policies). A scientist mid-workflow could lose their session because the cache filled up.
  - Noisy neighbour — a cache stampede (many concurrent misses for the same key) could spike Redis CPU/connections, degrading session store performance.
  - Security boundary — session tokens are sensitive credentials; cached API responses are business data. Co-locating them means a bug in cache key design could theoretically expose tokens.
  - Mitigation options: use separate Redis databases (logical isolation, same instance), use key prefixes with strict naming conventions (session: vs cache:), or provision a second ElastiCache instance dedicated to caching (adds cost but eliminates all shared-resource risks).
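The stampede concern can be illustrated with a small per-key lock: on a miss, only one caller performs the expensive fetch while concurrent callers wait and reuse the result. This is a sketch under our own naming, using an in-process lock; a Redis-backed variant would typically use SET NX with a short lock TTL instead.

```python
import threading
from typing import Any, Callable, Dict

class StampedeGuard:
    """Illustrative stampede protection: serialize cache misses per key so
    N concurrent misses produce one upstream fetch, not N."""

    def __init__(self) -> None:
        self._locks: Dict[str, threading.Lock] = {}
        self._meta = threading.Lock()          # guards the lock registry
        self._values: Dict[str, Any] = {}      # stand-in for the cache

    def _lock_for(self, key: str) -> threading.Lock:
        with self._meta:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        if key in self._values:                # fast path: cache hit
            return self._values[key]
        with self._lock_for(key):              # serialize misses per key
            if key not in self._values:        # re-check after acquiring
                self._values[key] = fetch()    # only one caller pays this
            return self._values[key]
```

Ten pods hammering the same popular experiment would still issue at most one Signals call per cache expiry with this shape, rather than one per pod per request.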
Suggested TTLs:
| Entity type | TTL | Rationale |
|---|---|---|
| Experiment structure | 2–5 min | Changes during active editing |
| Worksheet metadata | 2–5 min | Same as experiments |
| User profiles | 30 min | Rarely change |
| Templates / definitions | 1 hour | Admin-managed, very stable |
| Entity existence checks | 10 min | Just validating something exists |
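A minimal wrapper applying these TTLs could look like the following sketch. The entity-type names and the dict-backed store are illustrative (the dict keeps the example self-contained); a real implementation would back this with redis-py (SETEX on write, GET on read) so the cache is shared across pods.

```python
import time
from typing import Any, Callable, Optional

# TTLs in seconds, mirroring the table above (names are ours).
TTLS = {
    "experiment": 300,      # 5 min — changes during active editing
    "worksheet": 300,       # same as experiments
    "user_profile": 1800,   # 30 min — rarely change
    "template": 3600,       # 1 hour — admin-managed, very stable
    "existence": 600,       # 10 min — just validating something exists
}

class SignalsCache:
    """Sketch of a TTL cache keyed by (entity type, entity id). The clock
    is injectable so expiry is testable without waiting."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: dict = {}
        self._clock = clock

    def _key(self, entity_type: str, entity_id: str) -> str:
        return f"cache:{entity_type}:{entity_id}"  # 'cache:' prefix keeps
                                                   # keys apart from sessions

    def get(self, entity_type: str, entity_id: str) -> Optional[Any]:
        entry = self._store.get(self._key(entity_type, entity_id))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:            # lazy expiry on read
            del self._store[self._key(entity_type, entity_id)]
            return None
        return value

    def set(self, entity_type: str, entity_id: str, value: Any) -> None:
        ttl = TTLS.get(entity_type, 300)           # default to shortest tier
        self._store[self._key(entity_type, entity_id)] = (
            value, self._clock() + ttl)
```

Defaulting unknown entity types to the shortest TTL errs on the side of freshness rather than staleness.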
Option B: Gravitee response caching on the outbound path
Configure Gravitee to cache responses when xRED services call through it to reach the DataRiver/Signals API.
Arguments for:
- Zero application code — caching configured as a Gravitee policy
- Transparent to the adapter
Arguments against:
- Gravitee sits on the inbound path (Signals to xRED), not the outbound path (xRED to Signals). Outbound calls to Signals go through MuleSoft/DataRiver, which xRED does not control.
- Would only work if xRED routed outbound Signals calls back through its own Gravitee, which adds an unnecessary hop
- Gravitee caching is URL-based — poor fit for Signals API calls that include user tokens in headers
Conclusion: Not a natural fit for outbound caching. Gravitee caching is more relevant for inbound request caching (ADR-004).
Option C: Application-level in-memory cache (per-pod)
Cache Signals API responses in process memory (e.g. Python functools.lru_cache, cachetools, or a simple dict with TTL).
Arguments for:
- Simplest implementation — no external dependencies
- No Redis latency (in-process access)
- Good for short-lived, high-frequency lookups within a single request lifecycle
Arguments against:
- Not shared across replicas — each pod maintains its own cache, multiplying Signals calls by the number of pods
- Cache lost on pod restart
- Memory pressure on the application pod
- No visibility into cache hit rates without custom instrumentation
Conclusion: Useful as a complement to Redis for very short-lived, request-scoped caching (e.g. “don’t look up the same user profile 3 times within one request”). Not sufficient as the primary caching strategy.
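The request-scoped deduplication described above fits in a few lines. This is a sketch with hypothetical names: a memo object created at the start of a user action and discarded at the end, so stale-data concerns never outlive a single request.

```python
from typing import Any, Callable, Dict, Tuple

class RequestScope:
    """Illustrative per-request memo: identical Signals lookups within one
    user action hit the API exactly once. Discard the object when the
    request completes — nothing survives it."""

    def __init__(self, fetch: Callable[[str, str], Any]) -> None:
        self._fetch = fetch                 # the real Signals client call
        self._memo: Dict[Tuple[str, str], Any] = {}
        self.api_calls = 0                  # cheap hit-rate visibility

    def lookup(self, entity_type: str, entity_id: str) -> Any:
        key = (entity_type, entity_id)
        if key not in self._memo:
            self.api_calls += 1
            self._memo[key] = self._fetch(entity_type, entity_id)
        return self._memo[key]
```

Looking up the same user profile three times within one request then costs one API call, addressing the example in the conclusion directly.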
Option D: Write-through cache with invalidation on write-back
When the Apps layer writes back to Signals, invalidate or update the relevant cache entries immediately.
Arguments for:
- Eliminates the stale-data-after-write problem
- The write-back response often contains the updated state — can cache it directly
- Natural integration point (we already know we’re modifying the entity)
Arguments against:
- Only helps with data xRED itself modifies — other consumers modifying the same experiment won’t trigger our invalidation
- Adds coupling between the write path and the cache layer
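A write-through sketch, assuming hypothetical `api` and `cache` interfaces (these are not real Signals SDK calls): since the write-back response often contains the updated state, the cache can be refreshed directly rather than merely invalidated, saving the next read entirely.

```python
class WriteThroughClient:
    """Sketch of write-through caching on the write-back path. The `api`
    and `cache` collaborators are assumed interfaces for illustration."""

    def __init__(self, api, cache) -> None:
        self._api = api
        self._cache = cache

    def update_experiment(self, experiment_id: str, payload: dict) -> dict:
        updated = self._api.update(experiment_id, payload)   # write-back
        # Refresh (not just invalidate) with the response state, so the
        # next read is a cache hit instead of a fresh Signals call.
        self._cache.set("experiment", experiment_id, updated)
        return updated
```

As the arguments-against note, this only covers writes xRED itself performs; TTL expiry remains the backstop for modifications made by other consumers.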
Recommended Approach
Option A (Redis) + Option C (in-memory) + Option D (write-through invalidation) as a layered strategy:
- In-memory for request-scoped deduplication (don’t call Signals twice for the same entity within a single user action)
- Redis for cross-pod, cross-request caching with entity-appropriate TTLs
- Write-through invalidation when xRED itself modifies an entity
This reduces Signals API calls without adding infrastructure (Redis is already deployed) and provides a clear invalidation path for data we modify.
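The layered lookup can be sketched as a single read path: per-request memo first, shared cache second, Signals API last. All names are illustrative; a dict stands in for the Redis layer here, and the real wrapper would also carry the TTL and write-through logic from Options A and D.

```python
class LayeredSignalsClient:
    """Sketch of the recommended layered read path. One instance per
    request; the shared cache outlives requests and pods."""

    def __init__(self, shared_cache, api_fetch):
        self._memo = {}               # layer 1: request-scoped, in-process
        self._shared = shared_cache   # layer 2: Redis (dict-like stand-in)
        self._fetch = api_fetch       # layer 3: actual Signals API call

    def get(self, entity_type: str, entity_id: str):
        key = (entity_type, entity_id)
        if key in self._memo:                 # layer 1 hit: free
            return self._memo[key]
        value = self._shared.get(key)         # layer 2: sub-ms Redis read
        if value is None:                     # full miss: pay the ~200ms+
            value = self._fetch(entity_type, entity_id)
            self._shared[key] = value         # populate for all pods
        self._memo[key] = value
        return value
```

Repeated lookups within one request never leave the process; lookups across requests and pods share the Redis layer and only the first miss reaches Signals.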
Open Questions
- What is xRED’s actual Signals API call volume today? (Check Datadog / DataRiver analytics)
- Are we hitting or approaching the 1,000/min quota?
- Do other consumers (ORCA, LEM) cache their Signals calls, or are we all competing on raw API calls?
- Should the cache be opt-in per integration, or a shared Signals client wrapper that all adapters use?
- How do we handle user-scoped responses (where the same endpoint returns different data for different users)?
Decision
Pending. The layered approach (Redis + in-memory + write-through) is the likely direction. Implementation should start with a shared Signals client wrapper that encapsulates caching logic, so individual adapters don’t need to implement it themselves.