
Research: Minerva Secure Application Blueprint (2026)

| Field | Value |
| --- | --- |
| Type | Research |
| Status | Active |
| Author | CS CoE DCC Solutions / Catalyx Team |
| Created | 2026-04-07 |
| Source | Internal architecture document (January 2026) |
| Related | Minerva Platform Architecture |
| Related SAD | SAD-001 |
| Related | Janus |

1. Executive Summary

This document captures the Secure Application Blueprint, a reference architecture for deploying cloud-native applications on Minerva, Roche’s Internal Developer Platform (IDP) built on AWS EKS. It defines a “Zero Trust”, “Zero Human” operational model with defense-in-depth security, GitOps-driven deployment, and managed infrastructure services.

Minerva is the platform xRED ELN uses for all its integration services. Understanding this blueprint helps contextualise why the xRED architecture is shaped the way it is.

2. Strategic Objectives

  • Absolute Data Integrity — “Zero Data Loss” via managed RDS/S3 and automated cross-region resilience
  • Reduced Attack Surface — eliminate SSH, static passwords; use IAM Identity Center and SSM Break-Glass
  • Operational Excellence — GitOps (ArgoCD) reduces infrastructure toil by ~40%
  • Developer Velocity — cloud-based dev environments mirroring production

3. The Layered Defense Model

The architecture follows defense-in-depth with five distinct security zones.

Key principle: the Data Layer subnets have no route to an Internet Gateway or NAT Gateway, so they are unreachable from the public internet at the network level.

Request Flow

  1. User request hits Global Accelerator Static IP
  2. Minerva Ingress validates against Gravitee security policies
  3. ALB routes to a FastAPI Pod (Compute Layer)
  4. FastAPI makes a gRPC call to the Gatekeeper Pod (Access Layer)
  5. Gatekeeper authenticates via IAM to RDS (Data Layer)

4. The gRPC Gatekeeper Pattern

This is the most significant architectural pattern in the blueprint. It decouples application logic from data storage: the FastAPI web tier never possesses database credentials.

| Action | FastAPI Backend (Web Tier) | gRPC Gatekeeper (Data Tier) |
| --- | --- | --- |
| User Login | Validates JWT; extracts user_id | Verifies user_id exists in DB |
| Update Profile | Validates input format (Pydantic) | Executes the specific UPDATE SQL |
| Delete Data | Checks if user is “Admin” | Performs DELETE with audit log |
| Connectivity | No DB drivers; only gRPC client | Has DB drivers and IAM DB permissions |

Security benefits:

  • Even with root access to a FastAPI pod, an attacker finds no SQL primitives
  • Cannot run DROP TABLE or SELECT * FROM users
  • Confined to calling pre-defined business functions (e.g. GetProfile())
  • Database credentials are 15-minute IAM tokens, not static passwords
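The narrow surface the Gatekeeper exposes can be sketched as follows. This is illustrative only: a plain Python class stands in for the generated gRPC servicer, and the method names (`get_profile`, `update_profile`) and in-memory store are hypothetical, not the actual service definition.

```python
# Sketch of the Gatekeeper's narrow API surface. A plain class stands in
# for the generated gRPC servicer; names and the in-memory store are
# illustrative, not the real service definition.
class GatekeeperService:
    """Exposes only pre-defined business functions; no raw SQL reaches callers."""

    def __init__(self, db):
        self._db = db  # only this tier holds DB drivers and credentials

    def get_profile(self, user_id: str) -> dict:
        # Purpose-built, parameterised lookup; callers cannot alter the query.
        row = self._db.get(user_id)
        if row is None:
            raise KeyError(f"unknown user {user_id}")
        return {"user_id": user_id, "name": row["name"]}

    def update_profile(self, user_id: str, name: str) -> None:
        # Executes only this specific update; no arbitrary SQL primitive exists.
        self._db[user_id]["name"] = name


# The web tier sees only these methods, never a connection or a query string.
svc = GatekeeperService({"u1": {"name": "Ada"}})
```

Even a compromised web-tier pod can only invoke these functions; there is no handle through which to issue `DROP TABLE` or an unrestricted `SELECT`.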

Enforcement:

  • Istio PeerAuthentication (STRICT mTLS) — verifies sender is part of the mesh
  • Istio AuthorizationPolicy (Default-Deny) — only the FastAPI ServiceAccount can call the Gatekeeper
  • Protobuf serialisation — binary, faster than JSON-over-HTTP
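The two Istio resources above might look like the following sketch; the namespace, labels, and service-account names are placeholders, not the actual Minerva configuration.

```yaml
# Sketch only: namespace, labels, and service-account names are placeholders.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: app-ns
spec:
  mtls:
    mode: STRICT            # reject any non-mTLS traffic into the namespace
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: gatekeeper-allow-fastapi
  namespace: app-ns
spec:
  selector:
    matchLabels:
      app: gatekeeper
  action: ALLOW             # with a selector present, all other callers are denied
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/app-ns/sa/fastapi"]
```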

5. Core Data Strategy

Primary: PostgreSQL (Aurora)

| Environment | Configuration |
| --- | --- |
| Production | Provisioned Aurora instances, Multi-AZ, high availability |
| Development | Aurora Serverless v2, minimises cost when idle |

Why PostgreSQL over MongoDB:

  • ACID compliance for strict data integrity
  • Native JOINs and indexing for relational data
  • JSONB for schema-flexible metadata (best of both worlds)
  • Alembic migrations as version-controlled Python scripts
  • Automated self-healing storage (6-way replication across 3 AZs)

When to use DocumentDB (MongoDB):

  • High-volume logging/telemetry data
  • Content management with deeply nested catalogs
  • Truly unpredictable structure where write-speed > relational queries

Vector Data: pgvector

For AI/LLM features (semantic search, RAG), use pgvector within the existing Aurora cluster rather than a standalone vector DB. Eliminates operational overhead, enables hybrid relational + vector queries in single SQL statements.
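A hybrid query of the kind described might look like the sketch below; the `documents` table, its `embedding` column, and the parameter names are hypothetical.

```sql
-- Hypothetical schema: documents(id, title, project_id, embedding vector(1536))
SELECT d.id, d.title
FROM documents AS d
WHERE d.project_id = :project_id               -- relational filter
ORDER BY d.embedding <=> :query_embedding      -- pgvector cosine-distance operator
LIMIT 10;
```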

Caching: ElastiCache (Redis)

  • Sub-millisecond performance for session management
  • JWT blacklisting for instant access revocation
  • Sliding-window rate limiting
  • Query caching to shield the database
  • IAM Authentication + encryption-in-transit (no static passwords)
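The sliding-window rate limit can be sketched as follows. This is a minimal model: a per-key Python list stands in for a Redis sorted set (production would use ZADD/ZREMRANGEBYSCORE against ElastiCache), and all names are illustrative.

```python
import time

# Sketch of sliding-window rate limiting. A per-key list stands in for a
# Redis sorted set (ZADD / ZREMRANGEBYSCORE in production).
class SlidingWindowLimiter:
    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = {}  # key -> list of request timestamps

    def allow(self, key: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        cutoff = now - self.window
        # Drop timestamps that have slid out of the window.
        hits = [t for t in self._hits.get(key, []) if t > cutoff]
        if len(hits) >= self.limit:
            self._hits[key] = hits
            return False
        hits.append(now)
        self._hits[key] = hits
        return True


limiter = SlidingWindowLimiter(limit=3, window_seconds=60)
results = [limiter.allow("user-1", now=t) for t in (0, 1, 2, 3)]
# First three requests pass; the fourth falls inside a full window and is throttled.
```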

6. Networking and Security

Network Isolation (VPC)

| Subnet Type | Contents | Internet Access |
| --- | --- | --- |
| Public Subnets | Gravitee, ALB | Inbound allowed |
| Private App Subnets | EKS nodes (FastAPI, Gatekeeper) | Outbound only (NAT) |
| Isolated Data Subnets | RDS, Redis | None (no IGW, no NAT) |

Service Mesh (Istio Ambient Mode)

  • Namespace isolation — Minerva blocks cross-namespace traffic by default
  • Explicit AuthorizationPolicies — whitelist-based, not just “same namespace”
  • mTLS everywhere — all pod-to-pod communication encrypted and authenticated
  • Traffic shifting — Blue/Green and Canary deployments via Istio

Gravitee Integration

Gravitee acts as the API management layer:

  • Advanced rate limiting
  • Janus OIDC/OAuth2 integration for authentication
  • FAIR data compliance
  • API discovery and catalog

Janus Integration

  • Zero Local Credentials — authentication offloaded entirely to enterprise IdP
  • Unified Authorization — CIDM groups via Janus; access auto-revoked on departure
  • AWS IAM Bridge — Roche AD groups map directly to AWS IAM Roles

7. Deployment Pipeline

CI Phase (GitHub Actions)

  • OIDC Authentication — short-lived STS tokens scoped to repo and branch
  • Roche DevHub Security Scans — dependency scanning, container scanning
  • Fail-fast — pipeline fails on Critical vulnerabilities
  • Images tagged with commit SHA — immutable traceability

CD Phase (ArgoCD GitOps)

Sync Waves ensure ordered rollout:

  1. Database migrations (PreSync Job via Alembic)
  2. Internal gRPC Gatekeeper update
  3. Public-facing FastAPI update
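The ordering above is typically expressed with ArgoCD hook and sync-wave annotations; the resource names below are placeholders for the real manifests.

```yaml
# Sketch only: names are placeholders for the real manifests.
apiVersion: batch/v1
kind: Job
metadata:
  name: alembic-migrate
  annotations:
    argocd.argoproj.io/hook: PreSync            # run before any sync wave
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grpc-gatekeeper
  annotations:
    argocd.argoproj.io/sync-wave: "0"           # internal tier updates first
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-backend
  annotations:
    argocd.argoproj.io/sync-wave: "1"           # public tier after the Gatekeeper
```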

Blue/Green deployments via Argo Rollouts:

  • Green version launched alongside Blue
  • Automated health checks before traffic switch
  • 30-minute bake window during which rollback remains instant

Governance

  • Production branch requires mandatory approval + security scan pass
  • Every deployment is a Git commit — full audit trail
  • No human ever needs kubectl or AWS credentials to deploy

8. Environment Strategy

| Feature | Development | Production |
| --- | --- | --- |
| Compute | ARM64 Graviton, Spot Instances | ARM64 Graviton, On-Demand |
| Availability | Single Pod / Single AZ | Multi-Replica / Multi-AZ |
| Database | Aurora Serverless v2 | Aurora Multi-AZ Cluster |
| Scaling | KEDA event-driven, scale-to-zero | KEDA metric-driven, target tracking |
| WAF | Permissive (debugging) | Strict (managed rules) |
| Data Protection | Standard backups | Immutable Vault Lock |
| Observability | Standard logging | Grafana Cloud (Loki/APM) |
| Human Access | Read-only console via Janus | Disabled (dashboards only) |

9. Observability Stack

| Signal | Source | Goal |
| --- | --- | --- |
| Latency | Datadog / Istio | gRPC overhead < 20 ms |
| Traffic | Gravitee / ALB | Requests-per-second monitoring |
| Errors | Loki / Datadog | Alert on 5XX or gRPC InternalError |
| Saturation | EKS Metrics | Scale pods before CPU hits 80% |

Distributed tracing: Datadog traces requests end-to-end through API Gateway → FastAPI → gRPC Gatekeeper → RDS.

Log aggregation: Grafana Alloy (DaemonSet) → Grafana Cloud Loki. Same labels as K8s metrics for instant correlation.

Dual audit trail:

  • AWS CloudTrail — infrastructure changes (“who and when”)
  • Minerva Loki — application events (“what data was accessed”)

10. Data Resilience

| Failure Scenario | Recovery Mechanism | RPO (Data Loss) | RTO (Downtime) |
| --- | --- | --- | --- |
| AZ Outage | Multi-AZ Failover | 0 seconds | < 60 seconds |
| Data Corruption | PITR Restore (In-Region) | 1 second | 30–60 minutes |
| Regional Disaster | Cross-Region Restore | < 24 hours | 2–4 hours |

11. Encryption Matrix

| State | Technology | Implementation |
| --- | --- | --- |
| At Rest | AWS KMS (CMK) | AES-256 on RDS, DocumentDB, S3 |
| In Flight | Istio mTLS | Mandatory mutual TLS for all pod-to-pod traffic |
| Secrets | HashiCorp Vault | External Secrets Operator (ESO) syncing to K8s Secrets |
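The Vault-to-Kubernetes sync handled by ESO is typically declared as an ExternalSecret resource; the SecretStore name and Vault path below are hypothetical.

```yaml
# Sketch only: the SecretStore name and Vault path are hypothetical.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend         # assumes a configured Vault SecretStore
    kind: SecretStore
  target:
    name: db-credentials        # K8s Secret that ESO keeps in sync
  data:
    - secretKey: password
      remoteRef:
        key: app/db             # path inside the configured Vault mount
        property: password
```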

12. Minerva Service Catalog

| Component | Provider | Purpose |
| --- | --- | --- |
| Orchestration | Minerva EKS | Multi-AZ Kubernetes |
| Ingress | Gravitee.io | API management, rate limiting, Janus integration |
| Service Mesh | Istio | mTLS between services |
| Database | RDS PostgreSQL (Aurora) | Primary transactional store |
| NoSQL | DocumentDB | Optional unstructured storage |
| Cache | ElastiCache Redis | Session management, caching |
| Streaming | Confluent Kafka | Event-driven consistency (Saga Pattern) |
| Secrets | HashiCorp Vault | Dynamic secrets via External Secrets Operator |
| Deployment | ArgoCD | GitOps sync and Sync-Wave ordering |
| Monitoring | Datadog | APM, tracing, anomaly detection |
| Logging | Grafana Cloud (Loki) | Log aggregation and retention |
| DNS/TLS | Route 53 / ACM | Auto-rotating certificates |

13. Frontend Architecture

Standard: S3 + CloudFront (SSG)

  • Static site compiled to immutable assets (HTML, CSS, JS)
  • S3 bucket (private) + CloudFront with Origin Access Control
  • Unified domain routing: /* → S3 (UI), /api/* → EKS ALB (backend)
  • Zero patching, cost based on traffic only
  • Cache invalidation via CI pipeline

Discouraged: Dynamic frontend in EKS — SSR introduces unnecessary complexity for most use cases.

14. Saga Pattern and Event-Driven Consistency

For complex workflows spanning multiple services:

  • Orchestrator: gRPC Gatekeeper manages saga lifecycle via Kafka topics
  • Transient failures: Exponential backoff + retries (Istio + Kafka consumer)
  • Business failures: Compensating transactions via TransactionFailed events
  • Idempotency: Correlation-ID checked against Redis before every write
  • Visibility: Confluent Stream Lineage + Datadog Tracing
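The idempotency check from the list above can be sketched as follows. A Python set stands in for Redis (production would use something like `SET correlation_id NX EX <ttl>`), and all names are illustrative.

```python
# Sketch of the correlation-ID idempotency guard. A set stands in for Redis
# (production would use an atomic SET ... NX EX <ttl> against ElastiCache).
class IdempotencyGuard:
    def __init__(self):
        self._seen = set()

    def first_time(self, correlation_id: str) -> bool:
        """True only on the first call for a given correlation ID."""
        if correlation_id in self._seen:
            return False
        self._seen.add(correlation_id)
        return True


guard = IdempotencyGuard()
writes = []
for corr_id in ["evt-42", "evt-42", "evt-43"]:   # duplicate delivery of evt-42
    if guard.first_time(corr_id):
        writes.append(corr_id)                   # the write happens only once
```

This is what lets Kafka consumers tolerate at-least-once delivery: a redelivered event is recognised by its correlation ID and skipped before any write.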

15. Relevance to xRED ELN

What xRED ELN inherits from Minerva

| Capability | How xRED Uses It |
| --- | --- |
| EKS | All integration services (Lookups, Apps, Jobs) run as K8s deployments |
| RDS PostgreSQL | Persistent storage for integration state |
| ElastiCache Redis | Session store for dual OAuth tokens (ADR-003) |
| Vault | All secrets (API keys, DB credentials, client certs) |
| ArgoCD | GitOps deployment from infrastructure repo |
| Gravitee | Internal API gateway (ADR-004) |
| NGINX Ingress | K8s ingress routing and TLS termination |
| Datadog | Monitoring and observability |

Where xRED ELN diverges from the blueprint

| Blueprint Pattern | xRED ELN Approach | Reason |
| --- | --- | --- |
| gRPC Gatekeeper | Direct DB access from FastAPI | Simpler adapter services don’t warrant the Gatekeeper overhead |
| S3 + CloudFront frontend | React frontend in K8s pod | External Actions require server-side OAuth handling |
| Istio STRICT mTLS | Default Minerva mesh policies | Current service count doesn’t justify custom AuthorizationPolicies |
| Saga/Kafka orchestration | Synchronous request-response | Integration adapters are stateless transformers, not saga participants |

Patterns worth adopting

  • Alembic migrations as ArgoCD PreSync Jobs — safer than manual migration runs
  • KEDA event-driven scaling — scale on gRPC queue depth rather than CPU
  • Immutable image tags (commit SHA) — already using branch tags, SHA would add traceability
  • Grafana Cloud for long-term log retention — better than cluster-local Loki

See Minerva Platform Architecture for the platform’s own ADRs covering Gravitee selection, Janus multi-tenancy, observability stack, disaster recovery, and known operational challenges.
