
Research: Minerva / RAP Platform Architecture

| Field | Value |
|---|---|
| Type | Research |
| Status | Active |
| Author | xRED Dev Team |
| Created | 2026-04-07 |
| Source | code.roche.com/iix-science-and-research/architecture (21 ADRs) |
| Related | Secure Application Blueprint, Janus |

1. What is the Platform?

The RISE AWS Platform (also called RAP, Minerva, and most recently Nebula) is Roche’s managed EKS-based developer platform for research informatics. These names refer to overlapping but distinct scopes within the same ecosystem:

  • RAP (Research Architecture Platform) — the broader SnR platform initiative
  • Minerva — specifically the pRED EKS platform
  • Nebula — the latest branding (referenced in ADR 0020)

The platform documentation is served via MkDocs at platform.apps.science.roche.com and the architecture repo contains 21 ADRs covering technology choices, patterns, and operational decisions.

2. Environment Strategy

The platform runs on five static Kubernetes clusters:

| Cluster | Purpose | Developer Access |
|---|---|---|
| DEV | Development | Standard |
| SBX | Sandbox | Full kubectl freedom |
| TST | Testing | Standard |
| UAT | User acceptance testing | Restricted |
| PRD | Production | Read-only dashboards only |

Ephemeral environments (triggered by GitLab MR labels) are being introduced for isolated E2E testing — namespace-based isolation within existing clusters, integrated with ArgoCD, Vault, Helm, and Datadog (ADR 0019).
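Namespace-based isolation needs a predictable, valid namespace name per merge request. The following is a minimal sketch of one way to derive it; the `<app>-mr-<iid>` naming scheme and the function name are assumptions for illustration and are not taken from ADR 0019. The only hard constraint used is that Kubernetes namespace names must be valid DNS-1123 labels (lowercase alphanumerics and hyphens, at most 63 characters).

```python
import re

MAX_LABEL_LEN = 63  # Kubernetes namespace names must be valid DNS-1123 labels


def ephemeral_namespace(app: str, mr_iid: int) -> str:
    """Build a namespace name such as 'xred-eln-mr-142' (hypothetical scheme)."""
    # Lowercase, then replace every run of characters outside [a-z0-9-] with a hyphen.
    slug = re.sub(r"[^a-z0-9-]+", "-", app.lower()).strip("-")
    if not slug:
        raise ValueError("app name contains no usable characters")
    suffix = f"-mr-{mr_iid}"
    # Truncate the app part so the complete name fits within 63 characters.
    slug = slug[: MAX_LABEL_LEN - len(suffix)].rstrip("-")
    return f"{slug}{suffix}"
```

A deterministic name lets the pipeline that creates the environment and the one that tears it down agree on the namespace without sharing state.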

3. Platform Service Catalog

API Gateway: Gravitee.io (ADR 0001)

Gravitee scored 93.44 in evaluation against AWS API Gateway (91.88) and MuleSoft (85.69).

Why Gravitee won:

  • Deployed on the EKS cluster itself — no latency from cross-account hops
  • No cross-account configuration needed
  • Native alerting (Slack/email)
  • Supports Keycloak authentication
  • Both RAP and Minerva (pRED) use Gravitee — standardisation advantage

Capabilities: Developer portal, OpenAPI console, API lifecycle management (versioning, deprecation), subscription management, rate limiting, CORS, multi-endpoint failover, logging, metrics, analytics.

Limitation: All configuration is click-ops — no infrastructure-as-code support.

Authentication: Janus (ADR 0018)

Two architectural approaches are documented for multi-tenant Janus integration:

Istio-based:

NLB → Istio Ingress Gateway → Envoy Proxy → OAuth2-Proxy (per-tenant) → App

Nginx-based:

NLB → Nginx Ingress Controller → OAuth2-Proxy (per-tenant) → App

  • Each tenant gets a separate Cognito user pool
  • OAuth2-Proxy handles OIDC authentication, JWT injection, cookie management
  • Per-tenant OAuth2-Proxy instances provide isolation and independent scaling
  • DNS pattern: tenantA.apps.science.roche.com
  • Redirects unauthenticated users to Amazon Cognito Hosted UI
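The per-tenant routing described above can be sketched as a lookup from the request hostname to the tenant's Cognito user pool. This is an illustration only: the tenant registry, region, and user pool IDs are hypothetical placeholders. The issuer URL format is the standard one for Cognito user pools, which is the value an OAuth2-Proxy instance would typically receive via `--oidc-issuer-url`.

```python
from dataclasses import dataclass

BASE_DOMAIN = "apps.science.roche.com"


@dataclass(frozen=True)
class TenantAuth:
    tenant: str
    region: str        # hypothetical value, for illustration
    user_pool_id: str  # hypothetical value, for illustration

    @property
    def issuer_url(self) -> str:
        # Standard OIDC issuer format for an Amazon Cognito user pool.
        return f"https://cognito-idp.{self.region}.amazonaws.com/{self.user_pool_id}"


# Hypothetical registry: one Cognito user pool per tenant.
TENANTS = {
    "tenanta": TenantAuth("tenanta", "eu-central-1", "eu-central-1_EXAMPLEa"),
    "tenantb": TenantAuth("tenantb", "eu-central-1", "eu-central-1_EXAMPLEb"),
}


def resolve_tenant(host: str) -> TenantAuth:
    """Map a Host header such as 'tenantA.apps.science.roche.com' to its tenant."""
    host = host.lower().split(":")[0]  # drop an optional port
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        raise ValueError(f"{host} is not under {BASE_DOMAIN}")
    tenant = host[: -len(suffix)]
    if tenant not in TENANTS:
        raise KeyError(f"unknown tenant: {tenant}")
    return TENANTS[tenant]
```

In the real setup this mapping is expressed in ingress rules rather than application code; the sketch only makes the one-hostname, one-proxy, one-user-pool relationship explicit.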

Auth Standard (Governance ADR 0001)

AuthService (Keycloak) with OAuth 2.0 is the governance-level standard for authentication across SnR. Role management is handled by gCustoms, a custom RBAC app integrated with Keycloak. Cognito is noted as a potential future alternative. PingFederate and SailPoint were avoided because of their complex GIS onboarding.

Observability: Datadog (ADRs 0003, 0011, 0012)

  • Datadog for monitoring, APM, tracing — universally praised by platform users
  • OpenTelemetry Operator on EKS with W3C Trace Context propagation standard
  • Java services auto-instrumented via annotation: instrumentation.opentelemetry.io/inject-java: observability/otel-instrumentation
  • Dashboards cover: ArgoCD sync/health, K8s pod metrics, app logs, error traces, frontend Core Web Vitals, backend latency/traffic
  • Datadog Synthetic Tests for no-code E2E testing of web apps
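With auto-instrumentation the OpenTelemetry agent handles context propagation, so services do not write this code themselves. The sketch below only illustrates what the W3C Trace Context `traceparent` header looks like and how a service keeps the trace ID while issuing a new span ID; the function names are illustrative.

```python
import re
import secrets

# traceparent = version "-" trace-id "-" parent-id "-" trace-flags
# Version 00: 32 hex chars trace-id, 16 hex chars parent-id, 2 hex chars flags.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace with random trace and span IDs."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def child_traceparent(incoming: str) -> str:
    """Continue an incoming trace: same trace-id and flags, fresh parent-id."""
    match = _TRACEPARENT.match(incoming)
    # All-zero trace or parent IDs are invalid per the specification.
    if not match or set(match.group(1)) == {"0"} or set(match.group(2)) == {"0"}:
        return new_traceparent()
    trace_id, _, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

Because every hop preserves the trace ID, Datadog can stitch spans from different services into a single end-to-end trace.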

Runtime Security: Sysdig (ADR 0012)

  • Sysdig selected as CNAPP (Cloud Native Application Protection Platform)
  • Covers: runtime vulnerability scanning, infrastructure visibility, CIEM (least privilege), compliance, forensics
  • Deployed on DEV, UAT, PRD only (not TST or RPLATFORM)
  • Currently in POC mode with limited licenses
  • Limitation: security data not correlated with observability data

Code Quality: SonarQube (ADRs 0004, 0006)

  • SonarQube at sonarqube.roche.com for static code analysis
  • Integrated into the EKS Base Pipeline (platform CI/CD)
  • Quality gate currently advisory only — does not fail the pipeline (gradual rollout)
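An advisory gate means the pipeline reports the result but always continues. A minimal sketch of that behaviour follows, assuming the CI step has already fetched the JSON from SonarQube's `api/qualitygates/project_status` endpoint; the function name and the `enforce` switch are hypothetical and not part of the EKS Base Pipeline.

```python
def gate_exit_code(project_status: dict, enforce: bool = False) -> int:
    """Return the exit code a CI step would use for a quality gate result.

    `project_status` is assumed to have the shape
    {"projectStatus": {"status": "OK" | "ERROR" | ...}}.
    """
    status = project_status["projectStatus"]["status"]
    if status == "OK":
        return 0
    # Always surface the result so teams see it during the gradual rollout.
    print(f"SonarQube quality gate status: {status}")
    # Advisory mode: report but do not fail. Enforced mode: fail the job.
    return 1 if enforce else 0
```

Moving from advisory to enforced then becomes a single configuration change rather than a pipeline rewrite.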

Deployment: Argo Rollouts (ADR 0016)

  • Argo Rollouts for canary and blue-green deployment strategies
  • Integrated with existing ArgoCD GitOps setup
  • All deployments managed via Helm charts following GitOps practices
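To make the canary strategy concrete, the sketch below builds a Rollout manifest as a Python dictionary. The field names follow the Argo Rollouts `Rollout` resource; the weights, pause duration, replica count, and the helper itself are illustrative assumptions rather than values from ADR 0016. On the platform this would live as YAML inside a Helm chart.

```python
def canary_rollout(name: str, image: str, weights=(20, 50),
                   pause: str = "5m", replicas: int = 4) -> dict:
    """Build a canary Rollout manifest (illustrative values)."""
    steps = []
    for weight in weights:
        # Shift this percentage of traffic to the new version, then wait.
        steps.append({"setWeight": weight})
        steps.append({"pause": {"duration": pause}})
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Rollout",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
            # After the last step the rollout is promoted to 100%.
            "strategy": {"canary": {"steps": steps}},
        },
    }
```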

Feature Flags: GitLab Feature Flags (ADR 0017)

  • GitLab Feature Flags scored 90.19% in evaluation
  • Supports: percentage rollouts, user targeting, environment-specific toggling, RBAC, SSO, audit trails
  • No direct cost for RAP platform
  • Limitation: no OpenFeature standard support yet
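A percentage rollout needs each user to land consistently on the same side of the flag. The sketch below shows the general hashing technique; it is not GitLab's actual implementation, and the function name is illustrative.

```python
import hashlib


def in_rollout(flag: str, user_id: str, percentage: int) -> bool:
    """Return True if this user falls inside the rollout percentage."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Hash flag and user together so different flags get independent buckets.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return bucket < percentage
```

Because the bucket depends only on the flag and the user, raising the percentage only ever adds users; nobody who already has the feature loses it.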

Off-Hours Scaling (ADR 0020)

  • KEDA (Cron Scaler) for workload downscaling during off-hours
  • Karpenter for automatic node management and consolidation
  • Opt-in per application
  • Covers 6am–6pm Mon–Fri across US/Europe/India timezones
  • Goal: lower monthly AWS bills as tenant count grows
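The opt-in could look roughly like the KEDA `ScaledObject` built below, with one cron trigger per region. This is a sketch under stated assumptions: it treats 6am to 6pm Mon–Fri as the window in which the workload is up, and the timezone names, replica counts, and helper name are hypothetical. The trigger fields (`timezone`, `start`, `end`, `desiredReplicas`) are those of the KEDA cron scaler.

```python
def office_hours_scaledobject(deployment: str, timezones, replicas: int = 1) -> dict:
    """Build a KEDA ScaledObject that keeps a Deployment up during office hours."""
    triggers = [
        {
            "type": "cron",
            "metadata": {
                "timezone": tz,
                "start": "0 6 * * 1-5",   # 06:00 Monday to Friday
                "end": "0 18 * * 1-5",    # 18:00 Monday to Friday
                "desiredReplicas": str(replicas),  # KEDA metadata values are strings
            },
        }
        for tz in timezones
    ]
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledObject",
        "metadata": {"name": f"{deployment}-office-hours"},
        "spec": {
            "scaleTargetRef": {"name": deployment},
            # Outside every window the workload scales down to this count.
            "minReplicaCount": 0,
            "triggers": triggers,
        },
    }
```

For example, `office_hours_scaledobject("demo-api", ["America/New_York", "Europe/Zurich", "Asia/Kolkata"])` keeps the workload running whenever any of the three regions is inside its window. Karpenter can then consolidate the nodes that are freed once pods scale down.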

Developer Portal: Backstage (ADRs 0007, 0010, 0014)

  • Backstage (Spotify) as the RAP Developer Portal and Software Catalog
  • Services registered via catalog-info.yaml following the Backstage System Model
  • Multi-tenancy via namespace-driven UI switching
  • Entities: Components (microservices), Systems (logical apps), APIs, Groups (teams)
  • Single Backstage instance recommended for all of Roche (per Spotify engineer feedback)
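A `catalog-info.yaml` entry for a microservice has roughly the shape built below. The field names follow the Backstage descriptor format for a `Component`; the service, owner, and system names in the usage note are hypothetical. The sketch renders JSON, which YAML parsers accept, to stay within the standard library.

```python
import json


def catalog_component(name: str, owner: str, system: str,
                      lifecycle: str = "production") -> dict:
    """Build a Backstage Component entity for a microservice."""
    return {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": name},
        "spec": {
            "type": "service",
            "lifecycle": lifecycle,
            "owner": owner,    # a Group entity (team)
            "system": system,  # the logical application this service belongs to
        },
    }


def render(entity: dict) -> str:
    """Serialise the entity; the JSON output is also valid YAML."""
    return json.dumps(entity, indent=2)
```

Calling `render(catalog_component("sample-service", "team-example", "sample-system"))` yields a document that could be committed as `catalog-info.yaml` at the root of the service repository.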

Infrastructure-as-Code (ADR 0015)

The platform transitioned from AWS CDK to Terraform Cloud, driven by successful Terraform adoption across the Minerva, MLOps, and RAP teams.

4. Networking

Ingress

Two options: Istio Ingress Gateway or Nginx Ingress Controller, both front-ended by AWS Network Load Balancer (NLB) for transparent pass-through.

DNS pattern: <app-name>.<env>.apps.science.roche.com

Security Layers

| Layer | Technology | Scope |
|---|---|---|
| Edge | Cloudflare WAF | DDoS, bot protection |
| Network | RCP Inspection VPC | Traffic flow controls |
| Governance | Service Control Policies | Account-level guardrails |
| Runtime | Sysdig | Container forensics, CIEM |
| Code | SonarQube | Static analysis |

5. Disaster Recovery (ADRs 0005, 0008, 0009)

Strategy: Multi-Region + Pilot Light

| Target | Value |
|---|---|
| RPO | < 4 hours |
| RTO | < 2 hours |
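The two targets translate into simple time comparisons: RPO bounds how old the newest usable recovery point may be at the moment of failure, and RTO bounds how long restoration may take. A minimal sketch for checking a DR drill against them follows; the function names are illustrative.

```python
from datetime import datetime, timedelta

RPO = timedelta(hours=4)  # maximum tolerated data loss window
RTO = timedelta(hours=2)  # maximum tolerated time to restore service


def meets_rpo(last_recovery_point: datetime, failure_time: datetime) -> bool:
    """True if the data lost since the last recovery point is under the RPO."""
    return failure_time - last_recovery_point < RPO


def meets_rto(failure_time: datetime, service_restored: datetime) -> bool:
    """True if service came back within the RTO."""
    return service_restored - failure_time < RTO
```

For snapshot-based replication this implies the interval between snapshots, plus the time to copy them across regions, has to stay under four hours.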

Traffic management: Route 53 Application Recovery Controller (ARC) with routing controls (ON/OFF switches) and safety rules. It provides 5 redundant regional endpoints. ARC was chosen over AWS Global Accelerator for two reasons: no cost and better Nginx integration.
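The interplay of routing controls and safety rules can be modelled in a few lines. This is a conceptual sketch, not the AWS API: the class, the control names, and the single "at least N controls ON" rule are assumptions chosen to show why a failover flips both switches in one change.

```python
class RoutingControlPanel:
    """Toy model of ON/OFF routing controls guarded by an assertion-style safety rule."""

    def __init__(self, states: dict[str, bool], min_on: int = 1):
        if sum(states.values()) < min_on:
            raise ValueError("initial state violates the safety rule")
        self._states = dict(states)
        self._min_on = min_on

    def set_states(self, changes: dict[str, bool]) -> None:
        """Apply several changes as one unit, or reject all of them."""
        unknown = set(changes) - set(self._states)
        if unknown:
            raise KeyError(f"unknown routing controls: {sorted(unknown)}")
        proposed = {**self._states, **changes}
        # Safety rule: never leave fewer than `min_on` regions receiving traffic.
        if sum(proposed.values()) < self._min_on:
            raise ValueError("change rejected: safety rule would be violated")
        self._states = proposed

    def is_on(self, name: str) -> bool:
        return self._states[name]
```

A failover is then `panel.set_states({"primary": False, "secondary": True})`. Turning the primary off on its own would be rejected while the secondary is still off, which is the kind of operator mistake a safety rule is meant to catch.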

Data resilience:

  • EFS: AWS EFS Replication for continuous near-real-time cross-region sync
  • EBS: Snapshot-based replication across regions
  • Velero (open-source, K8s-native) for incremental backups of persistent volumes and K8s objects (Deployments, Services, ConfigMaps)

6. Known Platform Challenges (User Feedback, Nov 2023)

| Issue | Detail |
|---|---|
| Lack of self-service | Onboarding a new app “takes days instead of an hour”; the dominant complaint |
| Environment instability | Non-PRD environments cause frustration |
| Support model | Issues spread across Slack, Jira, ServiceNow, gChat with no aggregation |
| ArgoCD/GitOps friction | Perceived as hindrance for quick MVCs/PoCs |
| AWS login complexity | Multiple accounts across environments |
| Communication overload | Announcements scattered across channels |
| Datadog | Most praised feature, universally valued |
| PDB status | Pod Disruption Budget status unclear to teams |

Key quote: “It is the RAP platform that should know something wrong is happening, but instead the users are informing DevOps about downtimes.”

7. Relevance to xRED ELN

xRED ELN runs on Minerva and inherits its service catalog. Key platform capabilities used by xRED:

| Platform Service | xRED Usage |
|---|---|
| Gravitee | Internal API gateway for all integration traffic |
| ArgoCD | GitOps deployment from infrastructure repo |
| Vault | All secrets via External Secrets Operator |
| Datadog | Monitoring and observability |
| NGINX Ingress | K8s routing and TLS termination |
| RDS PostgreSQL | Persistent storage |
| ElastiCache Redis | Session store |
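As an illustration of the Vault row, the sketch below builds an `ExternalSecret` manifest of the kind the External Secrets Operator consumes. The field names follow the operator's `ExternalSecret` resource; the store name, Vault path, key names, refresh interval, and API version are assumptions and may differ from what xRED actually deploys.

```python
def external_secret(name: str, store: str, vault_path: str, keys) -> dict:
    """Build an ExternalSecret that syncs Vault keys into a Kubernetes Secret."""
    return {
        "apiVersion": "external-secrets.io/v1beta1",  # assumed version
        "kind": "ExternalSecret",
        "metadata": {"name": name},
        "spec": {
            "refreshInterval": "1h",  # illustrative value
            "secretStoreRef": {"name": store, "kind": "ClusterSecretStore"},
            # Name of the Kubernetes Secret the operator creates and keeps in sync.
            "target": {"name": name},
            "data": [
                {
                    "secretKey": key,
                    "remoteRef": {"key": vault_path, "property": key},
                }
                for key in keys
            ],
        },
    }
```

With this pattern the GitOps repository holds only references to secrets, never their values, so ArgoCD can sync it without exposing credentials.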

Not currently used by xRED but available: Backstage, SonarQube, GitLab Feature Flags, Sysdig, KEDA off-hours scaling, Confluent Kafka, Argo Rollouts (Blue/Green/Canary).

The Gravitee click-ops limitation is notable — xRED’s API definitions must be configured manually in the Gravitee console, not via infrastructure-as-code.
