Research: ORCA Platform (LEM Team)

| Field | Value |
| --- | --- |
| Type | Research |
| Status | Active |
| Author | xRED Dev Team |
| Created | 2026-04-07 |
| Source | code.roche.com/orca (96 repositories) |
| Related SAD | SAD-001 |

1. What is ORCA?

ORCA (team name “Lemmings”, group xPET = cross-Pharma ELN Team) is the LEM (Lab Experiment Management) platform team’s integration platform. It builds custom services bridging Revvity Signals ELN with internal Roche data systems — the same problem space as xRED ELN, but from the platform team’s side.

ORCA and xRED ELN share the same Signals platform, the same MuleSoft/DataRiver gateway, and the same integration mechanisms (External Lists, External Tables, External Actions). The key difference is organisational scope and infrastructure:

| Aspect | ORCA (LEM/Platform) | xRED ELN |
| --- | --- | --- |
| Team | Lemmings (xPET) | xRED Dev |
| Infrastructure | CaaS (on-prem K8s, Rancher) | Minerva (AWS EKS) |
| Primary backend | Java 21/25, Spring Boot 3 | Python, FastAPI |
| Frontend | Angular + R Design System | React |
| CI/CD | GitLab CI | GitHub Actions |
| Container registry | registry.code.roche.com | ghcr.io |
| Secrets | Vault (on-prem namespaces) | Vault (Minerva namespaces) |
| API gateway | MuleSoft → custom NGINX | MuleSoft → Gravitee |
| Repo count | 96 repos (many small services) | 3 repos (monorepo approach) |

2. Architecture Overview

ORCA follows a hub-and-spoke adapter pattern.

3. Service Catalogue

3.1 External List Adapters (Dropdowns)

These serve Signals “External Lists” — dropdown data that Signals polls periodically.

| Service | Data Source | Tech | Pattern |
| --- | --- | --- | --- |
| projects-gred | MAPS API | Java, SQLite in-memory | HTTP extract → SQLite transform → JSON |
| projects-pred | REDPanda API | Java, SQLite in-memory | Same pattern |
| cost-centers | Organization API (~7k entries) | Java, SQLite in-memory, scheduler | Cached with scheduled refresh |

Key pattern — SQLite as transform engine: Adapters load external API data into in-memory SQLite, then use SQL queries (supplied via K8s ConfigMaps) to transform into Signals-compatible JSON. Lightweight ETL without a dedicated data pipeline.
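The SQLite-as-transform-engine idea can be sketched in a few lines. ORCA's adapters are Java; this is a Python illustration only, and the table layout, column names, and `TRANSFORM_SQL` query are hypothetical stand-ins for whatever a real ConfigMap would supply.

```python
import json
import sqlite3

# Hypothetical transform query; in ORCA the SQL is supplied via a K8s ConfigMap.
TRANSFORM_SQL = """
    SELECT code AS id, code || ' - ' || title AS label
    FROM source
    WHERE active = 1
    ORDER BY code
"""


def transform(records, sql=TRANSFORM_SQL):
    """Load upstream records into in-memory SQLite and reshape them with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    try:
        # Hypothetical schema for the extracted upstream data.
        conn.execute("CREATE TABLE source (code TEXT, title TEXT, active INTEGER)")
        conn.executemany(
            "INSERT INTO source (code, title, active) VALUES (:code, :title, :active)",
            records,
        )
        # Each result row becomes one JSON-ready dict.
        return [dict(row) for row in conn.execute(sql)]
    finally:
        conn.close()


if __name__ == "__main__":
    upstream = [
        {"code": "P-002", "title": "Beta", "active": 1},
        {"code": "P-001", "title": "Alpha", "active": 1},
        {"code": "P-003", "title": "Gamma", "active": 0},
    ]
    print(json.dumps(transform(upstream), indent=2))
```

Because the query is just a string, changing the output shape means changing configuration rather than code, which is presumably what makes the shared runtime image practical.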

A shared Docker image (signals-external-list-adapter) provides the runtime; each adapter is configured per-instance via environment variables and ConfigMaps. The external-list-template repo provides scaffolding for new adapters.

An NGINX API gateway (api-gateway repo) routes requests based on the ?list= query parameter to the correct K8s service.
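The routing rule itself is simple enough to express as a lookup. The real gateway is NGINX configuration; the Python sketch below only illustrates the dispatch logic, and the service URLs in `ROUTES` are hypothetical.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical routing table: list name -> in-cluster service URL.
ROUTES = {
    "projects-gred": "http://projects-gred.lem-dev.svc:8080",
    "projects-pred": "http://projects-pred.lem-dev.svc:8080",
    "cost-centers": "http://cost-centers.lem-dev.svc:8080",
}


def resolve_upstream(request_url, routes=ROUTES):
    """Return the upstream base URL for a request, or None if it cannot be routed."""
    query = parse_qs(urlparse(request_url).query)
    values = query.get("list", [])
    if len(values) != 1:
        return None  # missing or ambiguous ?list= parameter
    return routes.get(values[0])  # None for unknown list names
```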

3.2 External Data Source Services (Table Lookups)

These validate IDs typed into Signals table cells and return metadata.

Generation 2 (simple lookups)

| Service | Repo | Data Source | Auth (outbound) |
| --- | --- | --- | --- |
| HITSLIMS Lookup | hitslims-lookup | HITSLIMS project codes | Static header |
| Arvados Lookup | arvados-lookup | Arvados collection UUIDs | Static header |
| SMDI Lookup | smdi-lookup | SMDI G# lot data | OAuth2 client credentials |

Pattern: Single GET /v1/{source}/{id} endpoint returning a flat JSON object. No pagination. Uses Spring Cloud OpenFeign for upstream calls. Returns markdownLink fields for clickable links in Signals.
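A minimal sketch of the flat response shape such an endpoint might return. The Generation 2 services are Java; this Python version is illustrative, and every field name except `markdownLink` (as well as the exact link format) is an assumption.

```python
def build_lookup_response(source, record_id, record, ui_base_url):
    """Flatten an upstream record into a single-level, JSON-ready dict."""
    link = f"{ui_base_url.rstrip('/')}/{record_id}"
    return {
        "id": record_id,
        "source": source,
        # Hypothetical metadata fields copied from the upstream record.
        "name": record.get("name", ""),
        "status": record.get("status", ""),
        # Markdown link syntax so the table cell can render a clickable link.
        "markdownLink": f"[{record_id}]({link})",
    }
```

Keeping the object flat (no nested structures, no pagination) matches the single-ID lookup use case: one typed ID in, one row of metadata out.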

Generation 3 (advanced lookups)

| Service | Repo | Data Source | Extra Features |
| --- | --- | --- | --- |
| RCDB Lookup | rcdb-lookup | Roche Catalyst Database | Two-phase search+retrieve, user resolution, admin test mode |
| IRCI Lookup | irci-lookup | Integrated Roche Chemical Inventory | MOL file support, user resolution, admin test mode |
| User Lookup | signals-user-lookup | Signals SCIM API | In-memory cache of all Signals users, hourly refresh |

Pattern: rcdb-lookup and irci-lookup depend on signals-user-lookup to resolve email addresses to unix IDs. RCDB uses a two-phase search+retrieve flow. Admin test IDs allow validation from the Signals Admin panel without access to real data.
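The user-resolution cache can be sketched as below. This is a Python illustration of the idea, not the Java implementation in signals-user-lookup: `fetch_all_users` is a hypothetical callable wrapping the SCIM API, the `email` and `unix_id` keys are assumed, and the sketch refreshes lazily on access rather than on a scheduler.

```python
import time


class UserCache:
    """In-memory email -> unix ID map, reloaded in full once the TTL expires."""

    def __init__(self, fetch_all_users, ttl_seconds=3600, clock=time.monotonic):
        self._fetch_all_users = fetch_all_users  # hypothetical SCIM wrapper
        self._ttl = ttl_seconds                  # 3600 s = hourly refresh
        self._clock = clock                      # injectable for testing
        self._users = {}
        self._loaded_at = None

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            users = self._fetch_all_users()
            # Emails are compared case-insensitively.
            self._users = {u["email"].lower(): u["unix_id"] for u in users}
            self._loaded_at = now

    def resolve(self, email):
        """Return the unix ID for an email address, or None if unknown."""
        self._refresh_if_stale()
        return self._users.get(email.lower())
```

Loading all users at once keeps each lookup a dictionary access, at the cost of data being up to one TTL out of date.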

Unified Python Service (newest approach)

| Service | Repo | Data Sources | Decision |
| --- | --- | --- | --- |
| External Sources | signals-external-sources | TaPIR (gRED + pRED), SMDI/ChemLot | ADR 0026: code-first over configuration |

Pattern: Python/FastAPI with dynamic router discovery (discover_and_include_routers). Modular client architecture with base_client.py. RBAC evaluation — calls User Lookup to determine user’s group/role, returns FULL/LIMITED/DENIED responses. Rate limited at 1000 req/min per IP. This is now the standard for all new external data integrations.
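The FULL/LIMITED/DENIED evaluation might look roughly like this. The role names, the `ROLE_ACCESS` mapping, and the set of fields exposed at LIMITED level are all hypothetical; only the three response levels come from the description above.

```python
from enum import Enum


class Access(Enum):
    FULL = "FULL"
    LIMITED = "LIMITED"
    DENIED = "DENIED"


# Hypothetical role -> access mapping; real roles would come from User Lookup.
ROLE_ACCESS = {"scientist": Access.FULL, "viewer": Access.LIMITED}

# Hypothetical subset of fields exposed at LIMITED level.
LIMITED_FIELDS = ("id", "name")


def evaluate_access(roles):
    """Pick the most permissive access level granted by any of the user's roles."""
    levels = {ROLE_ACCESS.get(role, Access.DENIED) for role in roles}
    if Access.FULL in levels:
        return Access.FULL
    if Access.LIMITED in levels:
        return Access.LIMITED
    return Access.DENIED


def shape_response(record, roles):
    """Return the full record, a reduced record, or only a denial marker."""
    access = evaluate_access(roles)
    if access is Access.DENIED:
        return {"access": access.value}
    if access is Access.LIMITED:
        data = {k: record[k] for k in LIMITED_FIELDS if k in record}
    else:
        data = dict(record)
    return {"access": access.value, **data}
```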

3.3 External Actions (Interactive Web Apps)

| Service | Repo | Integrations | Tech |
| --- | --- | --- | --- |
| Signals-FISH Integration | signals-fish-integration | FISH, MaxSMR, IRCI Score, SMART | Java/Spring Boot + Angular monolith |
| External Action Demo | signals-external-action | Reference implementation | Next.js + React |

Signals-FISH Integration is a modular monolith — rather than building separate apps per integration, ORCA chose to extend a single web app (ADR 0021, 0024). It handles:

  • FISH study metadata import via GraphQL against FISH MDS
  • IRCI registration and read operations
  • MAX/SMR and SMART analytical requests
  • LEAP bulk updates

Dual authentication: Roche SSO (PingFederate) for user identity + Signals OAuth2 for API access. Tokens stored in encrypted cookies (not ElastiCache like xRED).

Signals Gateway (signals-gateway repo) provides a JWE token broker — encrypts OAuth2 access tokens with RSA (RSA-OAEP-256 + A256GCM) so they can be safely passed in browser URLs. The client holds the RSA private key for decryption. Stateless, self-contained pattern.

3.4 Background Services

| Service | Repo | Purpose | Pattern |
| --- | --- | --- | --- |
| Legal Metadata Injection | signals-legal-metadata | Auto-populates Unix ID and Cost Center on new experiments | Java cron, PostgreSQL state tracking |
| Reference Builder | signals-reference-builder | Scans experiments for FISH Study IDs, builds reference DB | Java CLI app (Spring Boot, no web), K8s CronJob |
| Reference Presenter | signals-reference-presenter | REST API serving FISH references to DataRiver | Java/Spring Boot, PostgreSQL, atomic table-swap |
| Adoption Metrics ETL | adoption-metrics-dashboard | Extracts usage data from Signals API → PostgreSQL → Tableau | Python Docker, K8s CronJob |
| LEAP Bulk Upload | leap-bulk-initial-upload | Bulk-creates materials/lots/containers from Google Sheets | Python, K8s Job via GitLab pipelines |
| Admin Scripts | signals-admin-scripts | Systematic Signals modifications | GitLab parameterized pipelines, plan-then-execute |
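The atomic table-swap used by Reference Presenter can be illustrated as follows. ORCA runs this against PostgreSQL from Java; the sketch uses in-memory SQLite as a stand-in so it is self-contained, and the table and column names (`refs`, `refs_worker`, `study_id`, `experiment_id`) are hypothetical.

```python
import sqlite3


def open_db(path=":memory:"):
    # isolation_level=None gives explicit control over BEGIN/COMMIT.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS refs (study_id TEXT, experiment_id TEXT)")
    return conn


def rebuild_references(conn, rows):
    """Fill a worker table, then swap it with the live table in one transaction."""
    # Step 1: build the new data set off to the side; readers still see `refs`.
    conn.execute("DROP TABLE IF EXISTS refs_worker")
    conn.execute("CREATE TABLE refs_worker (study_id TEXT, experiment_id TEXT)")
    conn.executemany("INSERT INTO refs_worker VALUES (?, ?)", rows)

    # Step 2: swap names atomically; readers see either the old or the new table.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("ALTER TABLE refs RENAME TO refs_old")
        conn.execute("ALTER TABLE refs_worker RENAME TO refs")
        conn.execute("DROP TABLE refs_old")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The slow part (loading rows) happens outside the swap transaction, so the live table is only locked for the brief rename.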

3.5 Integration Shell (Template)

The signals-integrations-shell repo is a clean fork of signals-fish-integration in which the business logic is kept only as an example. It is the official template for bootstrapping new External Action integrations: same multi-module Maven structure (backend, frontend, bundle, blackbox-tests), same dual-auth pattern, same WireMock test setup.

4. Infrastructure

Docker Image Hierarchy

alpine:3.21
└── orca-base (Roche CA certs, libc6-compat, non-root user)
    ├── java-21 (custom JRE via jlink, minimal modules)
    │   ├── java-21-jdk (full JDK)
    │   │   └── maven-java-21 (Maven 3.9.8)
    │   └── java-25 (newer JRE)
    │       └── maven-java-25
    ├── python3_pandas (Python + pandas + FastAPI)
    └── glab-release (glab + vault CLI)

Environment Map

| Environment | Vault Namespace | K8s Namespace | Signals URL |
| --- | --- | --- | --- |
| DEV | orcaid-stg-001 | lem-dev | roche-dia-dev2.signalsresearch.revvitycloud.eu |
| TEST | orcait-prd-002 | lem-test | roche-dia-tst-signalsnotebook.srpste3.revvitycloud.eu |
| gTRAIN | orcaig-prd | lem-gtrain2 | |
| PROD | orcaip-prd-001 | lem-prod | roche-dia-signalsnotebook.srpe3.revvitycloud.eu |

CI/CD (GitLab CI)

Centralised pipeline templates in base-pipelines repo:

  • java-k8s-base.yml — full Java service pipeline (build → package → verify → deploy)
  • release-base.yml — release management (Docker retag, git tag, GitLab release)
  • K8s cluster: rancher.emea.roche.com (Rancher-managed CaaS, EMEA)
  • Vault auth: GitLab JWT tokens with aud: https://code.roche.com
  • Deploy image: benchling/devops/k8s-deploy-image
  • Container registry: registry.code.roche.com

Vault Bootstrap

The vault-bootstrap repo automates Vault role creation for new GitLab projects — creates JWT auth roles bound by project_id with gitlab-jwt policy. Run manually via CI pipelines.
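As an illustration of what such a role binding involves, the sketch below builds the request body for a Vault JWT auth role. The parameter names follow Vault's JWT/OIDC auth method; the choice of `user_claim` and the example project ID are assumptions, and the actual vault-bootstrap implementation may differ.

```python
def gitlab_jwt_role(project_id, audience="https://code.roche.com", policy="gitlab-jwt"):
    """Build the request body for a Vault JWT auth role bound to one GitLab project."""
    return {
        "role_type": "jwt",
        "bound_audiences": [audience],
        # Assumption: identify the caller by the project path claim of the GitLab token.
        "user_claim": "project_path",
        # Only tokens issued for this GitLab project may use the role.
        "bound_claims": {"project_id": str(project_id)},
        "token_policies": [policy],
    }


if __name__ == "__main__":
    # 123456 is a made-up project ID for illustration.
    print(gitlab_jwt_role(123456))
```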

5. Architecture Decisions (from ORCA’s ADRs)

ORCA maintains formal ADRs in Y-statement format in their architecture repo. Key decisions relevant to xRED ELN:

| ADR | Decision | Relevance |
| --- | --- | --- |
| 0013 | CaaS (on-prem K8s) despite decommissioning plans; alternatives (Minerva, RAP) not validated yet | xRED chose Minerva; ORCA may eventually migrate |
| 0015 | JWE Virtual Token for secure token relay | Alternative to xRED’s ElastiCache session store |
| 0017 | Rejected Revvity’s pre-built ETL Docker container (security, code opacity) | Confirms build-your-own approach for Signals data extraction |
| 0021 | Monolith-first for External Actions (extend signals-fish-integration) | xRED uses separate apps per integration (monorepo, but independent deployments) |
| 0024 | Extend monolith for MaxSMR and IRCI Score | Reinforces monolith approach |
| 0026 | Code-first over configuration for new LoV integrations | Matches xRED’s FastAPI adapter approach |
| 0030 | Two-step validate + enrich with graceful fallback for TaPIR | Good resilience pattern for xRED lookups |
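The two-step validate + enrich decision (ADR 0030) can be sketched as below. The `validate` and `enrich` callables and the `valid`/`enriched` flags are hypothetical; the point is only that an enrichment failure degrades the response instead of failing the lookup.

```python
def lookup(entity_id, validate, enrich):
    """Validate first, then enrich; keep the validated core if enrichment fails."""
    core = validate(entity_id)  # hypothetical: returns a dict, or None if unknown
    if core is None:
        return {"id": entity_id, "valid": False}

    result = {"id": entity_id, "valid": True, "enriched": False, **core}
    try:
        extra = enrich(entity_id)  # hypothetical: may raise on upstream failure
    except Exception:
        # Graceful fallback: return the partial (validated-only) result.
        return result
    result.update(extra)
    result["enriched"] = True
    return result
```

Callers can inspect `enriched` to tell a complete answer from a partial one.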

6. Patterns Worth Adopting

Already in use by xRED (validated by ORCA’s parallel adoption)

  • FastAPI for data source adapters — ORCA’s signals-external-sources confirms this is the right direction
  • Vault for secrets — same approach, different namespaces
  • MuleSoft as external gateway — mandatory per LEM governance

Worth considering for xRED

| Pattern | ORCA Implementation | xRED Applicability |
| --- | --- | --- |
| SQLite as transform engine | Load API data → SQLite → SQL query → Signals JSON | Useful for complex data transformations in Lookups without a database dependency |
| JWE token broker | Encrypt OAuth tokens with RSA, pass as JWE | Alternative to ElastiCache for stateless token passing in External Actions |
| SCIM user cache | In-memory cache of all Signals users, hourly refresh | Useful if xRED needs user resolution for RBAC in lookups |
| Atomic table-swap | Worker table → swap with main table for consistent reads | Good pattern for background data sync jobs |
| Admin test IDs | Configurable IDs return mock responses for Signals Admin validation | Helpful for testing integrations without real upstream access |
| Two-step validate + enrich | Validate entity first, then enrich; return partial data on enrichment failure | Resilient pattern for unreliable upstream APIs |
| GitLab parameterised pipelines for admin ops | Plan-review-execute with protected environments | xRED could use GitHub Actions equivalent for admin Signals operations |
| Dynamic router discovery | discover_and_include_routers in FastAPI | Already natural in FastAPI; confirms the pattern |
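Of these, the admin test ID pattern is probably the cheapest to try in an xRED lookup. A minimal sketch, assuming a hypothetical `ADMIN_TEST_RESPONSES` mapping (which in practice would be loaded from configuration) and a hypothetical `fetch_upstream` callable:

```python
# Hypothetical configured test IDs and their canned responses.
ADMIN_TEST_RESPONSES = {
    "TEST-0001": {"id": "TEST-0001", "name": "Admin test record"},
}


def lookup_with_test_ids(record_id, fetch_upstream, test_responses=ADMIN_TEST_RESPONSES):
    """Serve canned data for configured test IDs; otherwise call the real upstream."""
    if record_id in test_responses:
        # Copy so callers cannot mutate the configured mock response.
        return dict(test_responses[record_id])
    return fetch_upstream(record_id)
```

This lets an administrator validate the data source configuration in Signals without the service ever contacting the upstream system.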

Differences to be aware of

| Concern | ORCA Approach | xRED Approach | Notes |
| --- | --- | --- | --- |
| Session management | Encrypted cookies (no external store) | ElastiCache (distributed store) | ORCA trades scalability for simplicity; xRED’s approach supports HA |
| External Action architecture | Monolith (one app, many integrations) | Separate apps (monorepo, independent deploys) | ORCA’s monolith adds coupling risk but reduces auth boilerplate |
| Internal gateway | Custom NGINX (routes by ?list= param) | Gravitee (managed by Minerva) | Gravitee provides more features (caching, analytics, rate limiting) |
| Base images | Custom alpine → JRE/JDK chain | Standard python:3.12-slim | ORCA’s chain includes Roche CA certs; xRED handles certs in Dockerfile |

7. Key Repositories Reference

Core Services

| Repo | ID | Purpose | Stack |
| --- | --- | --- | --- |
| architecture | 328070 | C4 diagrams, ADRs, SADs, API contracts | PlantUML, Markdown |
| signals-fish-integration | 418009 | Main External Actions monolith (FISH, IRCI, MAX/SMR) | Java/Spring Boot + Angular |
| signals-external-sources | 466430 | Unified data source adapter (TaPIR, SMDI, ChemLot) | Python/FastAPI |
| signals-gateway | 433445 | JWE token broker for OAuth token relay | Java/Spring Boot |
| signals-external-list-adapter | 364528 | Shared External List runtime + http-extractor library | Java/Spring Boot |
| signals-user-lookup | 457004 | User resolution via SCIM cache | Java/Spring Boot |

Lookup Services

| Repo | ID | Data Source |
| --- | --- | --- |
| hitslims-lookup | 408257 | HITSLIMS project codes |
| arvados-lookup | 420510 | Arvados collection UUIDs |
| smdi-lookup | 427883 | SMDI G# lot data |
| rcdb-lookup | 456275 | Roche Catalyst Database |
| irci-lookup | 455447 | Integrated Roche Chemical Inventory |
| irci-chemical-data-source-poc | 449428 | IRCI chemical data (PoC) |

Infrastructure and Templates

| Repo | ID | Purpose |
| --- | --- | --- |
| orca-base | 343496 | Alpine base image with Roche CA certs |
| java-21 | 343162 | Custom JRE via jlink |
| base-pipelines | 528604 | GitLab CI templates |
| vault-bootstrap | 433541 | Automated Vault role setup |
| signals-integrations-shell | 440863 | Template for new External Action integrations |
| external-list-template | 353290 | Template for new External List adapters |
| offerings | 437666 | Integration design guidelines and questionnaires |

Background / ETL

| Repo | ID | Purpose |
| --- | --- | --- |
| signals-legal-metadata | 337275 | Auto-populate compliance fields on new experiments |
| signals-reference-builder | 501600 | Build cross-system reference data from Signals |
| signals-reference-presenter | 501604 | Serve reference data via DataRiver |
| leap-bulk-initial-upload | 438601 | Bulk create materials/lots from Google Sheets |
| adoption-metrics-dashboard | 414346 | Usage analytics ETL → Tableau |

8. Collaboration Notes

The offerings repo contains a branch feature/initial-updates-for-collaboration-with-xRED-vertical, indicating active collaboration planning between ORCA and xRED teams. The offerings repo also documents the Integration Design Questionnaire (user view, system view, data view) which could be useful for standardising how xRED documents new integration requirements.

Contact: integration requests for the ORCA/LEM platform team can be routed through the DataRiver group or sent to the LEM team directly.
