Research: ORCA Platform (LEM Team)
| Field | Value |
|---|---|
| Type | Research |
| Status | Active |
| Author | xRED Dev Team |
| Created | 2026-04-07 |
| Source | code.roche.com/orca (96 repositories) |
| Related SAD | SAD-001 |
1. What is ORCA?
ORCA (team name “Lemmings”, group xPET = cross-Pharma ELN Team) is the LEM (Lab Experiment Management) platform team’s integration platform. It builds custom services bridging Revvity Signals ELN with internal Roche data systems — the same problem space as xRED ELN, but from the platform team’s side.
ORCA and xRED ELN share the same Signals platform, the same MuleSoft/DataRiver gateway, and the same integration mechanisms (External Lists, External Tables, External Actions). The key difference is organisational scope and infrastructure:
| Aspect | ORCA (LEM/Platform) | xRED ELN |
|---|---|---|
| Team | Lemmings (xPET) | xRED Dev |
| Infrastructure | CaaS (on-prem K8s, Rancher) | Minerva (AWS EKS) |
| Primary backend | Java 21/25, Spring Boot 3 | Python, FastAPI |
| Frontend | Angular + R Design System | React |
| CI/CD | GitLab CI | GitHub Actions |
| Container registry | registry.code.roche.com | ghcr.io |
| Secrets | Vault (on-prem namespaces) | Vault (Minerva namespaces) |
| API gateway | MuleSoft → custom NGINX | MuleSoft → Gravitee |
| Repo count | 96 repos (many small services) | 3 repos (monorepo approach) |
2. Architecture Overview
ORCA follows a hub-and-spoke adapter pattern.
3. Service Catalogue
3.1 External List Adapters (Dropdowns)
These serve Signals “External Lists” — dropdown data that Signals polls periodically.
| Service | Data Source | Tech | Pattern |
|---|---|---|---|
| projects-gred | MAPS API | Java, SQLite in-memory | HTTP extract → SQLite transform → JSON |
| projects-pred | REDPanda API | Java, SQLite in-memory | Same pattern |
| cost-centers | Organization API (~7k entries) | Java, SQLite in-memory, scheduler | Cached with scheduled refresh |
Key pattern — SQLite as transform engine: Adapters load external API data into in-memory SQLite, then use SQL queries (supplied via K8s ConfigMaps) to transform into Signals-compatible JSON. Lightweight ETL without a dedicated data pipeline.
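The load, query, and serialise steps described above can be sketched with Python's standard sqlite3 module. This is an illustration only, not ORCA's Java implementation: the table layout, the sample rows, the SQL text, and the id/label output shape are all assumptions made for the example.

```python
import json
import sqlite3

# Hypothetical upstream payload; a real adapter would fetch this over HTTP.
UPSTREAM_ROWS = [
    {"code": "P-002", "title": "Beta", "active": 1},
    {"code": "P-001", "title": "Alpha", "active": 1},
    {"code": "P-003", "title": "Gamma", "active": 0},
]

# Hypothetical transform query; in ORCA the SQL is supplied via a ConfigMap.
TRANSFORM_SQL = """
    SELECT code AS id, code || ' - ' || title AS label
    FROM projects
    WHERE active = 1
    ORDER BY code
"""


def transform(rows, sql):
    """Load rows into in-memory SQLite and return the query result as JSON text."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    try:
        conn.execute("CREATE TABLE projects (code TEXT, title TEXT, active INTEGER)")
        conn.executemany(
            "INSERT INTO projects (code, title, active) VALUES (:code, :title, :active)",
            rows,
        )
        # Each result row becomes one JSON object keyed by the SQL column aliases.
        result = [dict(row) for row in conn.execute(sql)]
    finally:
        conn.close()
    return json.dumps(result)
```

Because the query is plain data, changing the output shape means editing the SQL rather than redeploying code, which is presumably why the ConfigMap route is attractive.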
A shared Docker image (signals-external-list-adapter) provides the runtime; each adapter
is configured per-instance via environment variables and ConfigMaps. The external-list-template
repo provides scaffolding for new adapters.
An NGINX API gateway (api-gateway repo) routes requests based on the ?list= query
parameter to the correct K8s service.
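The routing decision itself is small. The sketch below expresses it in Python purely for illustration (the real gateway is NGINX configuration); the service URLs and ports are hypothetical, and only the ?list= parameter name comes from the description above.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from list name to in-cluster service URL.
LIST_SERVICES = {
    "projects-gred": "http://projects-gred:8080",
    "projects-pred": "http://projects-pred:8080",
    "cost-centers": "http://cost-centers:8080",
}


def resolve_upstream(request_url):
    """Return the upstream base URL for a request, or None if it cannot be routed."""
    query = parse_qs(urlsplit(request_url).query)
    names = query.get("list", [])
    if len(names) != 1:
        return None  # missing or ambiguous ?list= parameter
    return LIST_SERVICES.get(names[0])
```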
3.2 External Data Source Services (Table Lookups)
These validate IDs typed into Signals table cells and return metadata.
Generation 2 (simple lookups)
| Service | Repo | Data Source | Auth (outbound) |
|---|---|---|---|
| HITSLIMS Lookup | hitslims-lookup | HITSLIMS project codes | Static header |
| Arvados Lookup | arvados-lookup | Arvados collection UUIDs | Static header |
| SMDI Lookup | smdi-lookup | SMDI G# lot data | OAuth2 client credentials |
Pattern: Single GET /v1/{source}/{id} endpoint returning a flat JSON object. No pagination.
Uses Spring Cloud OpenFeign for upstream calls. Returns markdownLink fields for clickable
links in Signals.
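A minimal sketch of the flat response shape, assuming the upstream record arrives as a dictionary. Only the markdownLink field name comes from the services described above; the other field names, the link base URL, and the function itself are hypothetical.

```python
def build_lookup_response(source, record_id, upstream, link_base):
    """Flatten an upstream record into a single-level, JSON-ready dict."""
    if upstream is None:
        return None  # the HTTP layer would translate this into a 404
    response = {"id": record_id, "source": source}
    for key, value in upstream.items():
        # Keep only scalar values so the response stays flat.
        if value is None or isinstance(value, (str, int, float, bool)):
            response[key] = value
    # Markdown link syntax: [text](url)
    response["markdownLink"] = f"[{record_id}]({link_base}/{record_id})"
    return response
```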
Generation 3 (advanced lookups)
| Service | Repo | Data Source | Extra Features |
|---|---|---|---|
| RCDB Lookup | rcdb-lookup | Roche Catalyst Database | Two-phase search+retrieve, user resolution, admin test mode |
| IRCI Lookup | irci-lookup | Integrated Roche Chemical Inventory | MOL file support, user resolution, admin test mode |
| User Lookup | signals-user-lookup | Signals SCIM API | In-memory cache of all Signals users, hourly refresh |
Pattern: rcdb-lookup and irci-lookup depend on signals-user-lookup to resolve email
addresses to unix IDs. RCDB uses a two-phase search+retrieve pattern. Configurable admin
test IDs let the Signals Admin panel validate an integration without real data access.
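The user cache behind signals-user-lookup can be approximated as below. Note the swap: this sketch refreshes lazily on lookup once the data is an hour old, whereas the service is described as refreshing on an hourly schedule. The class name, the loader callable, and the email-to-unix-ID map shape are assumptions.

```python
import time


class UserCache:
    """In-memory email -> unix ID map that reloads itself when stale."""

    def __init__(self, loader, ttl_seconds=3600, clock=time.monotonic):
        self._loader = loader  # hypothetical callable returning {email: unix_id}
        self._ttl = ttl_seconds
        self._clock = clock
        self._users = {}
        self._loaded_at = None

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            # Build the new map fully, then replace the old one in one assignment.
            self._users = {email.lower(): uid for email, uid in self._loader().items()}
            self._loaded_at = now

    def unix_id_for(self, email):
        """Return the unix ID for an email address, or None if unknown."""
        self._refresh_if_stale()
        return self._users.get(email.lower())
```

Injecting the clock keeps the expiry logic testable without sleeping.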
Unified Python Service (newest approach)
| Service | Repo | Data Sources | Decision |
|---|---|---|---|
| External Sources | signals-external-sources | TaPIR (gRED + pRED), SMDI/ChemLot | ADR 0026: code-first over configuration |
Pattern: Python/FastAPI with dynamic router discovery (discover_and_include_routers).
Modular client architecture with base_client.py. RBAC evaluation — calls User Lookup to
determine user’s group/role, returns FULL/LIMITED/DENIED responses. Rate limited at
1000 req/min per IP. This is now the standard for all new external data integrations.
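The FULL/LIMITED/DENIED evaluation might look roughly like the sketch below. The three level names come from the description above; the group names, the field whitelist, and both functions are hypothetical, and the real service obtains the user's groups from User Lookup rather than from an argument.

```python
from enum import Enum


class Access(Enum):
    FULL = "FULL"
    LIMITED = "LIMITED"
    DENIED = "DENIED"


# Hypothetical group-to-access mapping and limited field set.
GROUP_ACCESS = {"chemists": Access.FULL, "viewers": Access.LIMITED}
LIMITED_FIELDS = ("id", "name")


def evaluate_access(user_groups):
    """Return the most permissive level granted by any of the user's groups."""
    levels = {GROUP_ACCESS.get(group, Access.DENIED) for group in user_groups}
    if Access.FULL in levels:
        return Access.FULL
    if Access.LIMITED in levels:
        return Access.LIMITED
    return Access.DENIED


def shape_response(record, access):
    """Return the view of the record that matches the access level."""
    if access is Access.FULL:
        return {"access": access.value, **record}
    if access is Access.LIMITED:
        visible = {key: record[key] for key in LIMITED_FIELDS if key in record}
        return {"access": access.value, **visible}
    return {"access": access.value}
```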
3.3 External Actions (Interactive Web Apps)
| Service | Repo | Integrations | Tech |
|---|---|---|---|
| Signals-FISH Integration | signals-fish-integration | FISH, MaxSMR, IRCI Score, SMART | Java/Spring Boot + Angular monolith |
| External Action Demo | signals-external-action | Reference implementation | Next.js + React |
Signals-FISH Integration is a modular monolith — rather than building separate apps per integration, ORCA chose to extend a single web app (ADR 0021, 0024). It handles:
- FISH study metadata import via GraphQL against FISH MDS
- IRCI registration and read operations
- MAX/SMR and SMART analytical requests
- LEAP bulk updates
Dual authentication: Roche SSO (PingFederate) for user identity + Signals OAuth2 for API access. Tokens stored in encrypted cookies (not ElastiCache like xRED).
Signals Gateway (signals-gateway repo) provides a JWE token broker — encrypts OAuth2
access tokens with RSA (RSA-OAEP-256 + A256GCM) so they can be safely passed in browser URLs.
The client holds the RSA private key for decryption. Stateless, self-contained pattern.
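For orientation, a compact JWE is five base64url segments separated by dots: protected header, encrypted key, IV, ciphertext, and authentication tag. The sketch below only inspects the header and checks the two algorithms named above; it performs no decryption, and the function names are hypothetical. A client would run a check like this before handing the token to a JOSE library together with its private key.

```python
import base64
import json

EXPECTED_ALG = "RSA-OAEP-256"  # key encryption algorithm
EXPECTED_ENC = "A256GCM"       # content encryption algorithm


def b64url_decode(segment):
    """Decode base64url text whose padding has been stripped."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def read_jwe_header(token):
    """Return the protected header of a compact JWE after validating alg and enc."""
    parts = token.split(".")
    if len(parts) != 5:
        raise ValueError("compact JWE must have five dot-separated segments")
    header = json.loads(b64url_decode(parts[0]))
    if header.get("alg") != EXPECTED_ALG or header.get("enc") != EXPECTED_ENC:
        raise ValueError("unexpected JWE algorithms")
    return header
```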
3.4 Background Services
| Service | Repo | Purpose | Pattern |
|---|---|---|---|
| Legal Metadata Injection | signals-legal-metadata | Auto-populates Unix ID and Cost Center on new experiments | Java cron, PostgreSQL state tracking |
| Reference Builder | signals-reference-builder | Scans experiments for FISH Study IDs, builds reference DB | Java CLI app (Spring Boot, no web), K8s CronJob |
| Reference Presenter | signals-reference-presenter | REST API serving FISH references to DataRiver | Java/Spring Boot, PostgreSQL, atomic table-swap |
| Adoption Metrics ETL | adoption-metrics-dashboard | Extracts usage data from Signals API → PostgreSQL → Tableau | Python Docker, K8s CronJob |
| LEAP Bulk Upload | leap-bulk-initial-upload | Bulk-creates materials/lots/containers from Google Sheets | Python, K8s Job via GitLab pipelines |
| Admin Scripts | signals-admin-scripts | Systematic Signals modifications | GitLab parameterized pipelines, plan-then-execute |
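The atomic table-swap used by Reference Presenter can be illustrated as follows. SQLite stands in for PostgreSQL here so the example is self-contained; the table and column names are hypothetical. The idea is to stage fresh data in a worker table while readers keep using the main table, then rename both inside one transaction.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a connection in autocommit mode so transactions are explicit."""
    return sqlite3.connect(path, isolation_level=None)


def swap_in_new_references(conn, rows):
    """Rebuild the reference data in a worker table, then swap it in atomically."""
    conn.execute("CREATE TABLE IF NOT EXISTS refs (study_id TEXT, experiment_id TEXT)")
    # Stage the fresh data; readers still see the old contents of refs.
    conn.execute("DROP TABLE IF EXISTS refs_worker")
    conn.execute("CREATE TABLE refs_worker (study_id TEXT, experiment_id TEXT)")
    conn.executemany("INSERT INTO refs_worker VALUES (?, ?)", rows)
    # Swap inside one explicit transaction so the renames are all-or-nothing.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("ALTER TABLE refs RENAME TO refs_old")
        conn.execute("ALTER TABLE refs_worker RENAME TO refs")
        conn.execute("DROP TABLE refs_old")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The function expects a connection from open_db: with Python's default implicit transaction handling, the explicit BEGIN would collide with a transaction already opened by the inserts.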
3.5 Integration Shell (Template)
The signals-integrations-shell repo is a clean fork of signals-fish-integration whose
business logic serves only as an example. It is the official template for bootstrapping
new External Action integrations: same multi-module Maven structure (backend, frontend,
bundle, blackbox-tests), same dual-auth pattern, same WireMock test setup.
4. Infrastructure
Docker Image Hierarchy
alpine:3.21
└── orca-base (Roche CA certs, libc6-compat, non-root user)
    ├── java-21 (custom JRE via jlink, minimal modules)
    │   ├── java-21-jdk (full JDK)
    │   │   └── maven-java-21 (Maven 3.9.8)
    │   └── java-25 (newer JRE)
    │       └── maven-java-25
    ├── python3_pandas (Python + pandas + FastAPI)
    └── glab-release (glab + vault CLI)
Environment Map
| Environment | Vault Namespace | K8s Namespace | Signals URL |
|---|---|---|---|
| DEV | orcaid-stg-001 | lem-dev | roche-dia-dev2.signalsresearch.revvitycloud.eu |
| TEST | orcait-prd-002 | lem-test | roche-dia-tst-signalsnotebook.srpste3.revvitycloud.eu |
| gTRAIN | orcaig-prd | lem-gtrain2 | — |
| PROD | orcaip-prd-001 | lem-prod | roche-dia-signalsnotebook.srpe3.revvitycloud.eu |
CI/CD (GitLab CI)
Centralised pipeline templates in base-pipelines repo:
- java-k8s-base.yml: full Java service pipeline (build → package → verify → deploy)
- release-base.yml: release management (Docker retag, git tag, GitLab release)
- K8s cluster: rancher.emea.roche.com (Rancher-managed CaaS, EMEA)
- Vault auth: GitLab JWT tokens with aud: https://code.roche.com
- Deploy image: benchling/devops/k8s-deploy-image
- Container registry: registry.code.roche.com
Vault Bootstrap
The vault-bootstrap repo automates Vault role creation for new GitLab projects — creates
JWT auth roles bound by project_id with gitlab-jwt policy. Run manually via CI pipelines.
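As a rough idea of what such a role contains, the sketch below builds a request body for Vault's JWT auth method. The field names (role_type, user_claim, bound_audiences, bound_claims, token_policies) are the ones that auth method accepts; the choice of user_claim and the function itself are assumptions, and the actual vault-bootstrap code may set different or additional fields.

```python
def build_jwt_role(project_id, audience="https://code.roche.com", policy="gitlab-jwt"):
    """Return a request body for a Vault JWT auth role bound to one GitLab project."""
    return {
        "role_type": "jwt",
        # Assumption: user_email is used as the identity claim; other GitLab
        # ID-token claims would work equally well here.
        "user_claim": "user_email",
        "bound_audiences": [audience],
        # Only tokens issued for this project may log in with the role.
        "bound_claims": {"project_id": str(project_id)},
        "token_policies": [policy],
    }
```

Binding on project_id means a pipeline in any other project cannot assume the role, even though it can obtain a token from the same GitLab instance.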
5. Architecture Decisions (from ORCA’s ADRs)
ORCA maintains formal ADRs in Y-statement format in their architecture repo. Key decisions
relevant to xRED ELN:
| ADR | Decision | Relevance |
|---|---|---|
| 0013 | CaaS (on-prem K8s) despite decommissioning plans — alternatives (Minerva, RAP) not validated yet | xRED chose Minerva; ORCA may eventually migrate |
| 0015 | JWE Virtual Token for secure token relay | Alternative to xRED’s ElastiCache session store |
| 0017 | Rejected Revvity’s pre-built ETL Docker container (security, code opacity) | Confirms build-your-own approach for Signals data extraction |
| 0021 | Monolith-first for External Actions (extend signals-fish-integration) | xRED uses separate apps per integration (monorepo, but independent deployments) |
| 0024 | Extend monolith for MaxSMR and IRCI Score | Reinforces monolith approach |
| 0026 | Code-first over configuration for new LoV integrations | Matches xRED’s FastAPI adapter approach |
| 0030 | Two-step validate + enrich with graceful fallback for TaPIR | Good resilience pattern for xRED lookups |
6. Patterns Worth Adopting
Already in use by xRED (validated by ORCA’s parallel adoption)
- FastAPI for data source adapters: ORCA’s signals-external-sources confirms this is the right direction
- Vault for secrets: same approach, different namespaces
- MuleSoft as external gateway: mandatory per LEM governance
Worth considering for xRED
| Pattern | ORCA Implementation | xRED Applicability |
|---|---|---|
| SQLite as transform engine | Load API data → SQLite → SQL query → Signals JSON | Useful for complex data transformations in Lookups without a database dependency |
| JWE token broker | Encrypt OAuth tokens with RSA, pass as JWE | Alternative to ElastiCache for stateless token passing in External Actions |
| SCIM user cache | In-memory cache of all Signals users, hourly refresh | Useful if xRED needs user resolution for RBAC in lookups |
| Atomic table-swap | Worker table → swap with main table for consistent reads | Good pattern for background data sync jobs |
| Admin test IDs | Configurable IDs return mock responses for Signals Admin validation | Helpful for testing integrations without real upstream access |
| Two-step validate + enrich | Validate entity first, then enrich; return partial data on enrichment failure | Resilient pattern for unreliable upstream APIs |
| GitLab parameterised pipelines for admin ops | Plan-review-execute with protected environments | xRED could use GitHub Actions equivalent for admin Signals operations |
| Dynamic router discovery | discover_and_include_routers in FastAPI | Already natural in FastAPI — confirms the pattern |
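Of the patterns in this table, two-step validate + enrich is the easiest to show in a few lines. The sketch below assumes two hypothetical callables, validate (returns the core record or None) and enrich (returns extra fields or raises); the found and enriched flags are illustrative and not ORCA's response contract.

```python
def lookup(entity_id, validate, enrich):
    """Validate first, then enrich; fall back to the validated core on failure."""
    core = validate(entity_id)
    if core is None:
        return {"found": False, "id": entity_id}
    result = {"found": True, "enriched": False, **core}
    try:
        extra = enrich(entity_id)
    except Exception:
        # Enrichment is best-effort: keep the validated fields and flag the gap.
        return result
    result.update(extra)
    result["enriched"] = True
    return result
```

A caller sees a usable answer whenever the entity exists, and can tell from the enriched flag whether the optional metadata is present.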
Differences to be aware of
| Concern | ORCA Approach | xRED Approach | Notes |
|---|---|---|---|
| Session management | Encrypted cookies (no external store) | ElastiCache (distributed store) | ORCA trades scalability for simplicity; xRED’s approach supports HA |
| External Action architecture | Monolith (one app, many integrations) | Separate apps (monorepo, independent deploys) | ORCA’s monolith adds coupling risk but reduces auth boilerplate |
| Internal gateway | Custom NGINX (routes by ?list= param) | Gravitee (managed by Minerva) | Gravitee provides more features (caching, analytics, rate limiting) |
| Base images | Custom alpine → JRE/JDK chain | Standard python:3.12-slim | ORCA’s chain includes Roche CA certs; xRED handles certs in Dockerfile |
7. Key Repositories Reference
Core Services
| Repo | ID | Purpose | Stack |
|---|---|---|---|
| architecture | 328070 | C4 diagrams, ADRs, SADs, API contracts | PlantUML, Markdown |
| signals-fish-integration | 418009 | Main External Actions monolith (FISH, IRCI, MAX/SMR) | Java/Spring Boot + Angular |
| signals-external-sources | 466430 | Unified data source adapter (TaPIR, SMDI, ChemLot) | Python/FastAPI |
| signals-gateway | 433445 | JWE token broker for OAuth token relay | Java/Spring Boot |
| signals-external-list-adapter | 364528 | Shared External List runtime + http-extractor library | Java/Spring Boot |
| signals-user-lookup | 457004 | User resolution via SCIM cache | Java/Spring Boot |
Lookup Services
| Repo | ID | Data Source |
|---|---|---|
| hitslims-lookup | 408257 | HITSLIMS project codes |
| arvados-lookup | 420510 | Arvados collection UUIDs |
| smdi-lookup | 427883 | SMDI G# lot data |
| rcdb-lookup | 456275 | Roche Catalyst Database |
| irci-lookup | 455447 | Integrated Roche Chemical Inventory |
| irci-chemical-data-source-poc | 449428 | IRCI chemical data (PoC) |
Infrastructure and Templates
| Repo | ID | Purpose |
|---|---|---|
| orca-base | 343496 | Alpine base image with Roche CA certs |
| java-21 | 343162 | Custom JRE via jlink |
| base-pipelines | 528604 | GitLab CI templates |
| vault-bootstrap | 433541 | Automated Vault role setup |
| signals-integrations-shell | 440863 | Template for new External Action integrations |
| external-list-template | 353290 | Template for new External List adapters |
| offerings | 437666 | Integration design guidelines and questionnaires |
Background / ETL
| Repo | ID | Purpose |
|---|---|---|
| signals-legal-metadata | 337275 | Auto-populate compliance fields on new experiments |
| signals-reference-builder | 501600 | Build cross-system reference data from Signals |
| signals-reference-presenter | 501604 | Serve reference data via DataRiver |
| leap-bulk-initial-upload | 438601 | Bulk create materials/lots from Google Sheets |
| adoption-metrics-dashboard | 414346 | Usage analytics ETL → Tableau |
8. Collaboration Notes
The offerings repo contains a branch feature/initial-updates-for-collaboration-with-xRED-vertical,
indicating active collaboration planning between ORCA and xRED teams. The offerings repo
also documents the Integration Design Questionnaire (user view, system view, data view)
which could be useful for standardising how xRED documents new integration requirements.
Integration requests for the ORCA/LEM platform team can be routed through the DataRiver group or sent to the LEM team directly.