Research: Minerva Secure Application Blueprint (2026)
| Field | Value |
|---|---|
| Type | Research |
| Status | Active |
| Author | CS CoE DCC Solutions / Catalyx Team |
| Created | 2026-04-07 |
| Source | Internal architecture document (January 2026) |
| Related | Minerva Platform Architecture |
| Related SAD | SAD-001 |
| Related | Janus |
1. Executive Summary
This document captures the Secure Application Blueprint — a reference architecture for deploying cloud-native applications on Minerva, Roche’s Internal Developer Platform (IDP) built on AWS EKS. It defines a “Zero Trust”, “Zero Human” operational model with defense-in-depth security, GitOps-driven deployment, and managed infrastructure services.
Minerva is the platform xRED ELN uses for all its integration services. Understanding this blueprint helps contextualise why the xRED architecture is shaped the way it is.
2. Strategic Objectives
- Absolute Data Integrity — “Zero Data Loss” via managed RDS/S3 and automated cross-region resilience
- Reduced Attack Surface — eliminate SSH, static passwords; use IAM Identity Center and SSM Break-Glass
- Operational Excellence — GitOps (ArgoCD) reduces infrastructure toil by ~40%
- Developer Velocity — cloud-based dev environments mirroring production
3. The Layered Defense Model
The architecture follows defense-in-depth with five distinct security zones.
Key principle: The Data Layer subnets have no route to an Internet Gateway or NAT Gateway — at the network-routing level they are unreachable from the public internet.
Request Flow
- User request hits Global Accelerator Static IP
- Minerva Ingress validates against Gravitee security policies
- ALB routes to a FastAPI Pod (Compute Layer)
- FastAPI makes a gRPC call to the Gatekeeper Pod (Access Layer)
- Gatekeeper authenticates via IAM to RDS (Data Layer)
4. The gRPC Gatekeeper Pattern
This is the most significant architectural pattern in the blueprint. It decouples application logic from data storage: the FastAPI web tier never possesses database credentials.
| Action | FastAPI Backend (Web Tier) | gRPC Gatekeeper (Data Tier) |
|---|---|---|
| User Login | Validates JWT; extracts user_id | Verifies user_id exists in DB |
| Update Profile | Validates input format (Pydantic) | Executes the specific UPDATE SQL |
| Delete Data | Checks if user is “Admin” | Performs DELETE with audit log |
| Connectivity | No DB drivers; only gRPC client | Has DB drivers and IAM DB permissions |
Security benefits:
- Even with root access to a FastAPI pod, an attacker finds no SQL primitives
- Cannot run `DROP TABLE` or `SELECT * FROM users`
- Confined to calling pre-defined business functions (e.g. `GetProfile()`)
- Database credentials are 15-minute IAM tokens, not static passwords
Enforcement:
- Istio PeerAuthentication (STRICT mTLS) — verifies sender is part of the mesh
- Istio AuthorizationPolicy (Default-Deny) — only the FastAPI ServiceAccount can call the Gatekeeper
- Protobuf serialisation — strongly typed binary contract; requests are constrained to the defined message schema, and it is faster than JSON-over-HTTP
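The separation above can be sketched in a few lines. This is an illustrative stand-in, not the blueprint's implementation: the real boundary is a gRPC service behind Istio mTLS, whereas here an in-process class plays the gRPC stub and an in-memory SQLite database plays Aurora. All class and table names are hypothetical.

```python
# Sketch of the Gatekeeper pattern (names hypothetical).
# An in-process class stands in for the gRPC stub, and in-memory
# SQLite stands in for Aurora, to make the tier separation concrete.
import sqlite3


class Gatekeeper:
    """Data tier: the ONLY component holding a DB driver and SQL."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
        self._db.execute("INSERT INTO users VALUES ('u1', 'Ada')")

    # Callers get pre-defined business functions, never raw SQL.
    def get_profile(self, user_id):
        row = self._db.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return {"id": row[0], "name": row[1]} if row else None

    def update_profile(self, user_id, name):
        cur = self._db.execute(
            "UPDATE users SET name = ? WHERE id = ?", (name, user_id)
        )
        self._db.commit()
        return cur.rowcount == 1


class WebTier:
    """Web tier: validates input; holds no DB driver and no SQL strings."""

    def __init__(self, gatekeeper):
        self._gk = gatekeeper  # stands in for a gRPC client stub

    def handle_update(self, user_id, name):
        if not name.strip():  # input validation (Pydantic's role in FastAPI)
            raise ValueError("name must not be empty")
        return self._gk.update_profile(user_id, name)


gk = Gatekeeper()
web = WebTier(gk)
web.handle_update("u1", "Grace")
print(gk.get_profile("u1"))  # {'id': 'u1', 'name': 'Grace'}
```

Even with full control of `WebTier`, an attacker can only invoke the two named business functions — there is no connection object or SQL primitive to abuse, which mirrors the table above.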
5. Core Data Strategy
Primary: PostgreSQL (Aurora)
| Environment | Configuration |
|---|---|
| Production | Provisioned Aurora instances, Multi-AZ, high availability |
| Development | Aurora Serverless v2, minimises costs during idle |
Why PostgreSQL over MongoDB:
- ACID compliance for strict data integrity
- Native JOINs and indexing for relational data
- JSONB for schema-flexible metadata (best of both worlds)
- Alembic migrations as version-controlled Python scripts
- Automated self-healing storage (6-way replication across 3 AZs)
When to use DocumentDB (MongoDB):
- High-volume logging/telemetry data
- Content management with deeply nested catalogs
- Truly unpredictable structure, where write speed matters more than relational query capability
Vector Data: pgvector
For AI/LLM features (semantic search, RAG), use pgvector within the existing Aurora cluster rather than a standalone vector DB. This eliminates the operational overhead of a separate system and enables hybrid relational-plus-vector queries in a single SQL statement.
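A hybrid query of this kind might look like the sketch below. The table and column names (`documents`, `project`, `embedding`) are assumptions, not from the blueprint; the pure-Python `hybrid_rank` mirrors the semantics of pgvector's `<->` operator (Euclidean distance) so the ranking logic can be checked without a live cluster.

```python
# Hedged sketch: a pgvector hybrid query (names assumed), plus a
# pure-Python mirror of the `<->` Euclidean-distance ordering.
import math

# One SQL statement combining a relational filter with vector similarity:
HYBRID_QUERY = """
SELECT id, title
FROM documents
WHERE project = %(project)s            -- relational predicate
ORDER BY embedding <-> %(query_vec)s   -- pgvector L2 distance
LIMIT 5;
"""


def l2_distance(a, b):
    """Euclidean distance: the semantics of pgvector's `<->` operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_rank(rows, project, query_vec, limit=5):
    """In-memory equivalent of HYBRID_QUERY over dict rows."""
    candidates = [r for r in rows if r["project"] == project]
    candidates.sort(key=lambda r: l2_distance(r["embedding"], query_vec))
    return [r["id"] for r in candidates[:limit]]


rows = [
    {"id": "a", "project": "eln", "embedding": [0.0, 0.0]},
    {"id": "b", "project": "eln", "embedding": [1.0, 1.0]},
    {"id": "c", "project": "other", "embedding": [0.1, 0.0]},  # filtered out
]
print(hybrid_rank(rows, "eln", [0.9, 1.0]))  # ['b', 'a']
```

The point of the pattern: the relational `WHERE` clause and the vector `ORDER BY` execute in one engine, one transaction, one round trip — exactly what a standalone vector DB cannot offer.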
Caching: ElastiCache (Redis)
- Sub-millisecond performance for session management
- JWT blacklisting for instant access revocation
- Sliding-window rate limiting
- Query caching to shield the database
- IAM Authentication + encryption-in-transit (no static passwords)
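The sliding-window rate limiting above can be sketched as follows. This is a minimal single-process illustration: in production the window would live in Redis as a sorted set per key (`ZADD` the request timestamp, `ZREMRANGEBYSCORE` to drop entries older than the window, then count), executed atomically; here an in-memory deque stands in for the sorted set.

```python
# Minimal sketch of sliding-window rate limiting. An in-memory deque
# per key stands in for the Redis sorted set used in production.
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps inside the window

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Evict timestamps that have fallen out of the sliding window
        # (Redis equivalent: ZREMRANGEBYSCORE key -inf now-window).
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit for this window
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
print([limiter.allow("user-1", now=t) for t in (0, 1, 2, 3)])
# [True, True, True, False] -- fourth request inside the window is rejected
print(limiter.allow("user-1", now=61))  # True: the t=0 hit has expired
```

Unlike a fixed-window counter, the window slides with each request, so a burst straddling a window boundary cannot double the effective limit.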
6. Networking and Security
Network Isolation (VPC)
| Subnet Type | Contents | Internet Access |
|---|---|---|
| Public Subnets | Gravitee, ALB | Inbound allowed |
| Private App Subnets | EKS nodes (FastAPI, Gatekeeper) | Outbound only (NAT) |
| Isolated Data Subnets | RDS, Redis | None (no IGW, no NAT) |
Service Mesh (Istio Ambient Mode)
- Namespace isolation — Minerva blocks cross-namespace traffic by default
- Explicit AuthorizationPolicies — whitelist-based, not just “same namespace”
- mTLS everywhere — all pod-to-pod communication encrypted and authenticated
- Traffic shifting — Blue/Green and Canary deployments via Istio
Gravitee Integration
Gravitee acts as the API management layer:
- Advanced rate limiting
- Janus OIDC/OAuth2 integration for authentication
- FAIR data compliance
- API discovery and catalog
Janus Integration
- Zero Local Credentials — authentication offloaded entirely to enterprise IdP
- Unified Authorization — CIDM groups via Janus; access auto-revoked on departure
- AWS IAM Bridge — Roche AD groups map directly to AWS IAM Roles
7. Deployment Pipeline
CI Phase (GitHub Actions)
- OIDC Authentication — short-lived STS tokens scoped to repo and branch
- Roche DevHub Security Scans — dependency scanning, container scanning
- Fail-fast — pipeline fails on Critical vulnerabilities
- Images tagged with commit SHA — immutable traceability
CD Phase (ArgoCD GitOps)
Sync Waves ensure ordered rollout:
- Database migrations (PreSync Job via Alembic)
- Internal gRPC Gatekeeper update
- Public-facing FastAPI update
Blue/Green deployments via Argo Rollouts:
- Green version launched alongside Blue
- Automated health checks before traffic switch
- 30-minute bake time, during which rollback to Blue is instant
Governance
- Production branch requires mandatory approval + security scan pass
- Every deployment is a Git commit — full audit trail
- No human ever needs `kubectl` or AWS credentials to deploy
8. Environment Strategy
| Feature | Development | Production |
|---|---|---|
| Compute | ARM64 Graviton, Spot Instances | ARM64 Graviton, On-Demand |
| Availability | Single Pod / Single AZ | Multi-Replica / Multi-AZ |
| Database | Aurora Serverless v2 | Aurora Multi-AZ Cluster |
| Scaling | KEDA event-driven, scale-to-zero | KEDA metric-driven, target tracking |
| WAF | Permissive (debugging) | Strict (managed rules) |
| Data Protection | Standard backups | Immutable Vault Lock |
| Observability | Standard logging | Grafana Cloud (Loki/APM) |
| Human Access | Read-only console via Janus | Disabled — dashboards only |
9. Observability Stack
| Signal | Source | Goal |
|---|---|---|
| Latency | Datadog / Istio | gRPC overhead < 20ms |
| Traffic | Gravitee / ALB | Requests per second monitoring |
| Errors | Loki / Datadog | Alert on 5XX or gRPC InternalError |
| Saturation | EKS Metrics | Scale pods before CPU hits 80% |
Distributed tracing: Datadog traces requests end-to-end through API Gateway → FastAPI → gRPC Gatekeeper → RDS.
Log aggregation: Grafana Alloy (DaemonSet) → Grafana Cloud Loki. Same labels as K8s metrics for instant correlation.
Dual audit trail:
- AWS CloudTrail — infrastructure changes (“who and when”)
- Minerva Loki — application events (“what data was accessed”)
10. Data Resilience
| Failure Scenario | Recovery Mechanism | RPO (Data Loss) | RTO (Downtime) |
|---|---|---|---|
| AZ Outage | Multi-AZ Failover | 0 seconds | < 60 seconds |
| Data Corruption | PITR Restore (In-Region) | 1 second | 30–60 minutes |
| Regional Disaster | Cross-Region Restore | < 24 hours | 2–4 hours |
11. Encryption Matrix
| State | Technology | Implementation |
|---|---|---|
| At Rest | AWS KMS (CMK) | AES-256 on RDS, DocumentDB, S3 |
| In Flight | Istio mTLS | Mandatory mutual TLS for all pod-to-pod traffic |
| Secrets | HashiCorp Vault | External Secrets Operator (ESO) syncing to K8s Secrets |
12. Minerva Service Catalog
| Component | Provider | Purpose |
|---|---|---|
| Orchestration | Minerva EKS | Multi-AZ Kubernetes |
| Ingress | Gravitee.io | API management, rate limiting, Janus integration |
| Service Mesh | Istio | mTLS between services |
| Database | RDS PostgreSQL (Aurora) | Primary transactional store |
| NoSQL | DocumentDB | Optional unstructured storage |
| Cache | ElastiCache Redis | Session management, caching |
| Streaming | Confluent Kafka | Event-driven consistency (Saga Pattern) |
| Secrets | HashiCorp Vault | Dynamic secrets via External Secrets Operator |
| Deployment | ArgoCD | GitOps sync and Sync-Wave ordering |
| Monitoring | Datadog | APM, tracing, anomaly detection |
| Logging | Grafana Cloud (Loki) | Log aggregation and retention |
| DNS/TLS | Route 53 / ACM | Auto-rotating certificates |
13. Frontend Architecture
Standard: S3 + CloudFront (SSG)
- Static site compiled to immutable assets (HTML, CSS, JS)
- S3 bucket (private) + CloudFront with Origin Access Control
- Unified domain routing: `/*` → S3 (UI), `/api/*` → EKS ALB (backend)
- Zero patching, cost based on traffic only
- Cache invalidation via CI pipeline
Discouraged: Dynamic frontend in EKS — SSR introduces unnecessary complexity for most use cases.
14. Saga Pattern and Event-Driven Consistency
For complex workflows spanning multiple services:
- Orchestrator: gRPC Gatekeeper manages saga lifecycle via Kafka topics
- Transient failures: Exponential backoff + retries (Istio + Kafka consumer)
- Business failures: Compensating transactions via `TransactionFailed` events
- Idempotency: Correlation-ID checked against Redis before every write
- Visibility: Confluent Stream Lineage + Datadog Tracing
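The idempotency check above can be sketched as follows. In production the check-and-mark would be a single atomic Redis command (`SET <correlation_id> 1 NX EX <ttl>`) so concurrent consumers cannot race; here a plain in-memory set stands in for Redis, and the writer/event names are illustrative, not from the blueprint.

```python
# Sketch of the correlation-ID idempotency check for saga writes.
# A plain set stands in for Redis keys with a TTL; in production the
# check-and-mark is one atomic `SET cid 1 NX EX ttl` call.
class IdempotentWriter:
    def __init__(self):
        self._seen = set()   # stands in for Redis (keys expire via TTL)
        self.writes = []     # side effects actually performed

    def handle(self, event):
        cid = event["correlation_id"]
        if cid in self._seen:        # duplicate delivery (Kafka at-least-once)
            return "skipped"
        self._seen.add(cid)          # Redis: SET cid 1 NX EX ttl
        self.writes.append(event["payload"])
        return "written"


writer = IdempotentWriter()
event = {"correlation_id": "saga-42-step-1", "payload": {"qty": 3}}
print(writer.handle(event))  # written
print(writer.handle(event))  # skipped -- redelivered by the broker
print(len(writer.writes))    # 1
```

This is what turns Kafka's at-least-once delivery into effectively-once writes: redeliveries of the same saga step are detected by correlation ID and dropped before they touch the database.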
15. Relevance to xRED ELN
What xRED ELN inherits from Minerva
| Capability | How xRED Uses It |
|---|---|
| EKS | All integration services (Lookups, Apps, Jobs) run as K8s deployments |
| RDS PostgreSQL | Persistent storage for integration state |
| ElastiCache Redis | Session store for dual OAuth tokens (ADR-003) |
| Vault | All secrets (API keys, DB credentials, client certs) |
| ArgoCD | GitOps deployment from infrastructure repo |
| Gravitee | Internal API gateway (ADR-004) |
| NGINX Ingress | K8s ingress routing and TLS termination |
| Datadog | Monitoring and observability |
Where xRED ELN diverges from the blueprint
| Blueprint Pattern | xRED ELN Approach | Reason |
|---|---|---|
| gRPC Gatekeeper | Direct DB access from FastAPI | Simpler adapter services don’t warrant the Gatekeeper overhead |
| S3 + CloudFront frontend | React frontend in K8s pod | External Actions require server-side OAuth handling |
| Istio STRICT mTLS | Default Minerva mesh policies | Current service count doesn’t justify custom AuthorizationPolicies |
| Saga/Kafka orchestration | Synchronous request-response | Integration adapters are stateless transformers, not saga participants |
Patterns worth adopting
- Alembic migrations as ArgoCD PreSync Jobs — safer than manual migration runs
- KEDA event-driven scaling — scale on gRPC queue depth rather than CPU
- Immutable image tags (commit SHA) — xRED currently uses branch tags; SHA tags would add traceability
- Grafana Cloud for long-term log retention — better than cluster-local Loki
See Minerva Platform Architecture for the platform’s own ADRs covering Gravitee selection, Janus multi-tenancy, observability stack, disaster recovery, and known operational challenges.