ADR-009: Caching Outbound Calls to the Signals API
| Field | Value |
|---|---|
| Status | Open |
| Date | 2026-04-07 |
| Related SAD | SAD-001 |
| Related ADR | ADR-004, ADR-005 |
| Related Research | LEM Platform Requirements |
Context
When xRED integration services (Apps, Jobs) interact with Signals, they make outbound calls to the Signals REST API via the DataRiver Facade or the LEM API Proxy (ADR-005). These calls consume the tenant-wide 1,000 calls/minute quota shared across every consumer in the Roche Signals tenant — not just xRED, but ORCA, LEM, and every other integration team.
Common outbound call patterns:
- Read before write — before writing data back into an experiment, the Apps layer reads the current experiment/worksheet state to determine what to update. Multiple users working on the same experiment trigger identical reads.
- Metadata resolution — looking up user profiles, entity IDs, experiment structure, or template definitions from Signals before performing an action.
- Validation checks — verifying that an experiment is in the right state (open, not signed) before attempting a write-back.
- Background sync — Jobs that periodically scan Signals for new experiments or changes (e.g. ORCA’s Legal Metadata Injection pattern).
Every unnecessary call to Signals:
- Eats into the shared 1,000/min quota (risking rate limiting for all consumers)
- Adds latency (Signals to DataRiver to MuleSoft round-trip)
- Increases load on Signals itself (a shared SaaS platform)
Currently, xRED services make fresh Signals API calls on every request with no caching.
Options Under Consideration
Option A: Redis cache for Signals API responses
Cache Signals API responses in ElastiCache (Redis) with appropriate TTLs.
Arguments for:
- ElastiCache is already provisioned for the Apps layer session store — no new infrastructure
- Fine-grained TTLs per entity type (experiments: 5 min, templates: 1 hour, user profiles: 30 min)
- Shared across all App replicas — one cache miss populates for all pods
- Sub-millisecond reads vs ~200ms+ for a Signals API round-trip
- Native TTL expiry handles cache invalidation automatically
Arguments against:
- Application code complexity — every Signals client call needs cache wrapper logic
- Risk of stale data — if an experiment is modified between cache set and next read, the app operates on outdated state
- Cache key design matters — must account for user context (some Signals responses are user-scoped)
- Need to handle cache stampede on popular experiments
- Shared Redis risk — the same ElastiCache instance currently holds sensitive OAuth session tokens (ADR-003). Mixing API response cache data into the same instance introduces concerns:
  - Memory pressure — a large cache corpus could evict session tokens if the instance runs low on memory (Redis eviction policies). A scientist mid-workflow could lose their session because the cache filled up.
  - Noisy neighbour — a cache stampede (many concurrent misses for the same key) could spike Redis CPU/connections, degrading session store performance.
  - Security boundary — session tokens are sensitive credentials; cached API responses are business data. Co-locating them means a bug in cache key design could theoretically expose tokens.
  - Mitigation options: use separate Redis databases (logical isolation, same instance), use key prefixes with strict naming conventions (session: vs cache:), or provision a second ElastiCache instance dedicated to caching (adds cost but eliminates all shared-resource risks).
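The stampede concern can be illustrated with a small per-key lock: on a miss, only one caller performs the expensive fetch while concurrent callers wait and reuse the result. This is a sketch under our own naming, using an in-process lock; a Redis-backed variant would typically use SET NX with a short lock TTL instead.

```python
import threading
from typing import Any, Callable, Dict

class StampedeGuard:
    """Illustrative stampede protection: serialize cache misses per key so
    N concurrent misses produce one upstream fetch, not N."""

    def __init__(self) -> None:
        self._locks: Dict[str, threading.Lock] = {}
        self._meta = threading.Lock()          # guards the lock registry
        self._values: Dict[str, Any] = {}      # stand-in for the cache

    def _lock_for(self, key: str) -> threading.Lock:
        with self._meta:
            return self._locks.setdefault(key, threading.Lock())

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        if key in self._values:                # fast path: cache hit
            return self._values[key]
        with self._lock_for(key):              # serialize misses per key
            if key not in self._values:        # re-check after acquiring
                self._values[key] = fetch()    # only one caller pays this
            return self._values[key]
```

Ten pods hammering the same popular experiment would still issue at most one Signals call per cache expiry with this shape, rather than one per pod per request.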
Suggested TTLs:
| Entity type | TTL | Rationale |
|---|---|---|
| Experiment structure | 2–5 min | Changes during active editing |
| Worksheet metadata | 2–5 min | Same as experiments |
| User profiles | 30 min | Rarely change |
| Templates / definitions | 1 hour | Admin-managed, very stable |
| Entity existence checks | 10 min | Just validating something exists |
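A minimal wrapper applying these TTLs could look like the following sketch. The entity-type names and the dict-backed store are illustrative (the dict keeps the example self-contained); a real implementation would back this with redis-py (SETEX on write, GET on read) so the cache is shared across pods.

```python
import time
from typing import Any, Callable, Optional

# TTLs in seconds, mirroring the table above (names are ours).
TTLS = {
    "experiment": 300,      # 5 min — changes during active editing
    "worksheet": 300,       # same as experiments
    "user_profile": 1800,   # 30 min — rarely change
    "template": 3600,       # 1 hour — admin-managed, very stable
    "existence": 600,       # 10 min — just validating something exists
}

class SignalsCache:
    """Sketch of a TTL cache keyed by (entity type, entity id). The clock
    is injectable so expiry is testable without waiting."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: dict = {}
        self._clock = clock

    def _key(self, entity_type: str, entity_id: str) -> str:
        return f"cache:{entity_type}:{entity_id}"  # 'cache:' prefix keeps
                                                   # keys apart from sessions

    def get(self, entity_type: str, entity_id: str) -> Optional[Any]:
        entry = self._store.get(self._key(entity_type, entity_id))
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:            # lazy expiry on read
            del self._store[self._key(entity_type, entity_id)]
            return None
        return value

    def set(self, entity_type: str, entity_id: str, value: Any) -> None:
        ttl = TTLS.get(entity_type, 300)           # default to shortest tier
        self._store[self._key(entity_type, entity_id)] = (
            value, self._clock() + ttl)
```

Defaulting unknown entity types to the shortest TTL errs on the side of freshness rather than staleness.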
Option B: Gravitee response caching on the outbound path
Configure Gravitee to cache responses when xRED services call through it to reach the DataRiver/Signals API.
Arguments for:
- Zero application code — caching configured as a Gravitee policy
- Transparent to the adapter
Arguments against:
- Gravitee sits on the inbound path (Signals to xRED), not the outbound path (xRED to Signals). Outbound calls to Signals go through MuleSoft/DataRiver, which xRED does not control.
- Would only work if xRED routed outbound Signals calls back through its own Gravitee, which adds an unnecessary hop
- Gravitee caching is URL-based — poor fit for Signals API calls that include user tokens in headers
Conclusion: Not a natural fit for outbound caching. Gravitee caching is more relevant for inbound request caching (ADR-004).
Option C: Application-level in-memory cache (per-pod)
Cache Signals API responses in process memory (e.g. Python functools.lru_cache, cachetools, or a simple dict with TTL).
Arguments for:
- Simplest implementation — no external dependencies
- No Redis latency (in-process access)
- Good for short-lived, high-frequency lookups within a single request lifecycle
Arguments against:
- Not shared across replicas — each pod maintains its own cache, multiplying Signals calls by the number of pods
- Cache lost on pod restart
- Memory pressure on the application pod
- No visibility into cache hit rates without custom instrumentation
Conclusion: Useful as a complement to Redis for very short-lived, request-scoped caching (e.g. “don’t look up the same user profile 3 times within one request”). Not sufficient as the primary caching strategy.
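The request-scoped deduplication described above fits in a few lines. This is a sketch with hypothetical names: a memo object created at the start of a user action and discarded at the end, so stale-data concerns never outlive a single request.

```python
from typing import Any, Callable, Dict, Tuple

class RequestScope:
    """Illustrative per-request memo: identical Signals lookups within one
    user action hit the API exactly once. Discard the object when the
    request completes — nothing survives it."""

    def __init__(self, fetch: Callable[[str, str], Any]) -> None:
        self._fetch = fetch                 # the real Signals client call
        self._memo: Dict[Tuple[str, str], Any] = {}
        self.api_calls = 0                  # cheap hit-rate visibility

    def lookup(self, entity_type: str, entity_id: str) -> Any:
        key = (entity_type, entity_id)
        if key not in self._memo:
            self.api_calls += 1
            self._memo[key] = self._fetch(entity_type, entity_id)
        return self._memo[key]
```

Looking up the same user profile three times within one request then costs one API call, addressing the example in the conclusion directly.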
Option D: Write-through cache with invalidation on write-back
When the Apps layer writes back to Signals, invalidate or update the relevant cache entries immediately.
Arguments for:
- Eliminates the stale-data-after-write problem
- The write-back response often contains the updated state — can cache it directly
- Natural integration point (we already know we’re modifying the entity)
Arguments against:
- Only helps with data xRED itself modifies — other consumers modifying the same experiment won’t trigger our invalidation
- Adds coupling between the write path and the cache layer
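A write-through sketch, assuming hypothetical `api` and `cache` interfaces (these are not real Signals SDK calls): since the write-back response often contains the updated state, the cache can be refreshed directly rather than merely invalidated, saving the next read entirely.

```python
class WriteThroughClient:
    """Sketch of write-through caching on the write-back path. The `api`
    and `cache` collaborators are assumed interfaces for illustration."""

    def __init__(self, api, cache) -> None:
        self._api = api
        self._cache = cache

    def update_experiment(self, experiment_id: str, payload: dict) -> dict:
        updated = self._api.update(experiment_id, payload)   # write-back
        # Refresh (not just invalidate) with the response state, so the
        # next read is a cache hit instead of a fresh Signals call.
        self._cache.set("experiment", experiment_id, updated)
        return updated
```

As the arguments-against note, this only covers writes xRED itself performs; TTL expiry remains the backstop for modifications made by other consumers.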
Recommended Approach
Option A (Redis) + Option C (in-memory) + Option D (write-through invalidation) as a layered strategy:
- In-memory for request-scoped deduplication (don’t call Signals twice for the same entity within a single user action)
- Redis for cross-pod, cross-request caching with entity-appropriate TTLs
- Write-through invalidation when xRED itself modifies an entity
This reduces Signals API calls without adding infrastructure (Redis is already deployed) and provides a clear invalidation path for data we modify.
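The layered lookup can be sketched as a single read path: per-request memo first, shared cache second, Signals API last. All names are illustrative; a dict stands in for the Redis layer here, and the real wrapper would also carry the TTL and write-through logic from Options A and D.

```python
class LayeredSignalsClient:
    """Sketch of the recommended layered read path. One instance per
    request; the shared cache outlives requests and pods."""

    def __init__(self, shared_cache, api_fetch):
        self._memo = {}               # layer 1: request-scoped, in-process
        self._shared = shared_cache   # layer 2: Redis (dict-like stand-in)
        self._fetch = api_fetch       # layer 3: actual Signals API call

    def get(self, entity_type: str, entity_id: str):
        key = (entity_type, entity_id)
        if key in self._memo:                 # layer 1 hit: free
            return self._memo[key]
        value = self._shared.get(key)         # layer 2: sub-ms Redis read
        if value is None:                     # full miss: pay the ~200ms+
            value = self._fetch(entity_type, entity_id)
            self._shared[key] = value         # populate for all pods
        self._memo[key] = value
        return value
```

Repeated lookups within one request never leave the process; lookups across requests and pods share the Redis layer and only the first miss reaches Signals.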
Open Questions
- What is xRED’s actual Signals API call volume today? (Check Datadog / DataRiver analytics)
- Are we hitting or approaching the 1,000/min quota?
- Do other consumers (ORCA, LEM) cache their Signals calls, or are we all competing on raw API calls?
- Should the cache be opt-in per integration, or a shared Signals client wrapper that all adapters use?
- How do we handle user-scoped responses (where the same endpoint returns different data for different users)?
Decision
Pending. The layered approach (Redis + in-memory + write-through) is the likely direction. Implementation should start with a shared Signals client wrapper that encapsulates caching logic, so individual adapters don’t need to implement it themselves.