Research: Minerva Secure Application Blueprint (2026)
| Field | Value |
|---|---|
| Type | Research |
| Status | Active |
| Author | CS CoE DCC Solutions / Catalyx Team |
| Created | 2026-04-07 |
| Source | Internal architecture document (January 2026) |
| Related | Minerva Platform Architecture |
| Related SAD | SAD-001 |
| Related | Janus |
1. Executive Summary
This document captures the Secure Application Blueprint — a reference architecture for deploying cloud-native applications on Minerva, Roche’s Internal Developer Platform (IDP) built on AWS EKS. It defines a “Zero Trust”, “Zero Human” operational model with defense-in-depth security, GitOps-driven deployment, and managed infrastructure services.
Minerva is the platform xRED ELN uses for all its integration services. Understanding this blueprint helps contextualise why the xRED architecture is shaped the way it is.
2. Strategic Objectives
- Absolute Data Integrity — “Zero Data Loss” via managed RDS/S3 and automated cross-region resilience
- Reduced Attack Surface — eliminate SSH, static passwords; use IAM Identity Center and SSM Break-Glass
- Operational Excellence — GitOps (ArgoCD) reduces infrastructure toil by ~40%
- Developer Velocity — cloud-based dev environments mirroring production
3. The Layered Defense Model
The architecture follows defense-in-depth with five distinct security zones.
Key principle: The Data Layer subnets have no route to an Internet Gateway or NAT Gateway — at the network-routing level they are unreachable from the public internet.
Request Flow
- User request hits Global Accelerator Static IP
- Minerva Ingress validates against Gravitee security policies
- ALB routes to a FastAPI Pod (Compute Layer)
- FastAPI makes a gRPC call to the Gatekeeper Pod (Access Layer)
- Gatekeeper authenticates via IAM to RDS (Data Layer)
4. The gRPC Gatekeeper Pattern
This is the most significant architectural pattern in the blueprint. It decouples application logic from data storage: the FastAPI web tier never possesses database credentials.
| Action | FastAPI Backend (Web Tier) | gRPC Gatekeeper (Data Tier) |
|---|---|---|
| User Login | Validates JWT; extracts user_id | Verifies user_id exists in DB |
| Update Profile | Validates input format (Pydantic) | Executes the specific UPDATE SQL |
| Delete Data | Checks if user is “Admin” | Performs DELETE with audit log |
| Connectivity | No DB drivers; only gRPC client | Has DB drivers and IAM DB permissions |
Security benefits:
- Even with root access to a FastAPI pod, an attacker finds no SQL primitives
- Cannot run `DROP TABLE` or `SELECT * FROM users`
- Confined to calling pre-defined business functions (e.g. `GetProfile()`)
- Database credentials are 15-minute IAM tokens, not static passwords
Enforcement:
- Istio PeerAuthentication (STRICT mTLS) — verifies sender is part of the mesh
- Istio AuthorizationPolicy (Default-Deny) — only the FastAPI ServiceAccount can call the Gatekeeper
- Protobuf serialisation — strongly typed binary contract; requests are constrained to the defined message schema, and it is faster than JSON-over-HTTP
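The separation above can be sketched in a few lines. This is an illustrative stand-in, not the blueprint's implementation: the real boundary is a gRPC service behind Istio mTLS, whereas here an in-process class plays the gRPC stub and an in-memory SQLite database plays Aurora. All class and table names are hypothetical.

```python
# Sketch of the Gatekeeper pattern (names hypothetical).
# An in-process class stands in for the gRPC stub, and in-memory
# SQLite stands in for Aurora, to make the tier separation concrete.
import sqlite3


class Gatekeeper:
    """Data tier: the ONLY component holding a DB driver and SQL."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
        self._db.execute("INSERT INTO users VALUES ('u1', 'Ada')")

    # Callers get pre-defined business functions, never raw SQL.
    def get_profile(self, user_id):
        row = self._db.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return {"id": row[0], "name": row[1]} if row else None

    def update_profile(self, user_id, name):
        cur = self._db.execute(
            "UPDATE users SET name = ? WHERE id = ?", (name, user_id)
        )
        self._db.commit()
        return cur.rowcount == 1


class WebTier:
    """Web tier: validates input; holds no DB driver and no SQL strings."""

    def __init__(self, gatekeeper):
        self._gk = gatekeeper  # stands in for a gRPC client stub

    def handle_update(self, user_id, name):
        if not name.strip():  # input validation (Pydantic's role in FastAPI)
            raise ValueError("name must not be empty")
        return self._gk.update_profile(user_id, name)


gk = Gatekeeper()
web = WebTier(gk)
web.handle_update("u1", "Grace")
print(gk.get_profile("u1"))  # {'id': 'u1', 'name': 'Grace'}
```

Even with full control of `WebTier`, an attacker can only invoke the two named business functions — there is no connection object or SQL primitive to abuse, which mirrors the table above.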
5. Core Data Strategy
Primary: PostgreSQL (Aurora)
| Environment | Configuration |
|---|---|
| Production | Provisioned Aurora instances, Multi-AZ, high availability |
| Development | Aurora Serverless v2, minimises costs during idle |
Why PostgreSQL over MongoDB:
- ACID compliance for strict data integrity
- Native JOINs and indexing for relational data
- JSONB for schema-flexible metadata (best of both worlds)
- Alembic migrations as version-controlled Python scripts
- Automated self-healing storage (6-way replication across 3 AZs)
When to use DocumentDB (MongoDB):
- High-volume logging/telemetry data
- Content management with deeply nested catalogs
- Truly unpredictable structure, where write speed matters more than relational query capability
Vector Data: pgvector
For AI/LLM features (semantic search, RAG), use pgvector within the existing Aurora cluster rather than a standalone vector DB. This eliminates the operational overhead of a separate system and enables hybrid relational-plus-vector queries in a single SQL statement.
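A hybrid query of this kind might look like the sketch below. The table and column names (`documents`, `project`, `embedding`) are assumptions, not from the blueprint; the pure-Python `hybrid_rank` mirrors the semantics of pgvector's `<->` operator (Euclidean distance) so the ranking logic can be checked without a live cluster.

```python
# Hedged sketch: a pgvector hybrid query (names assumed), plus a
# pure-Python mirror of the `<->` Euclidean-distance ordering.
import math

# One SQL statement combining a relational filter with vector similarity:
HYBRID_QUERY = """
SELECT id, title
FROM documents
WHERE project = %(project)s            -- relational predicate
ORDER BY embedding <-> %(query_vec)s   -- pgvector L2 distance
LIMIT 5;
"""


def l2_distance(a, b):
    """Euclidean distance: the semantics of pgvector's `<->` operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_rank(rows, project, query_vec, limit=5):
    """In-memory equivalent of HYBRID_QUERY over dict rows."""
    candidates = [r for r in rows if r["project"] == project]
    candidates.sort(key=lambda r: l2_distance(r["embedding"], query_vec))
    return [r["id"] for r in candidates[:limit]]


rows = [
    {"id": "a", "project": "eln", "embedding": [0.0, 0.0]},
    {"id": "b", "project": "eln", "embedding": [1.0, 1.0]},
    {"id": "c", "project": "other", "embedding": [0.1, 0.0]},  # filtered out
]
print(hybrid_rank(rows, "eln", [0.9, 1.0]))  # ['b', 'a']
```

The point of the pattern: the relational `WHERE` clause and the vector `ORDER BY` execute in one engine, one transaction, one round trip — exactly what a standalone vector DB cannot offer.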
Caching: ElastiCache (Redis)
- Sub-millisecond performance for session management
- JWT blacklisting for instant access revocation
- Sliding-window rate limiting
- Query caching to shield the database
- IAM Authentication + encryption-in-transit (no static passwords)
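The sliding-window rate limiting above can be sketched as follows. This is a minimal single-process illustration: in production the window would live in Redis as a sorted set per key (`ZADD` the request timestamp, `ZREMRANGEBYSCORE` to drop entries older than the window, then count), executed atomically; here an in-memory deque stands in for the sorted set.

```python
# Minimal sketch of sliding-window rate limiting. An in-memory deque
# per key stands in for the Redis sorted set used in production.
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps inside the window

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Evict timestamps that have fallen out of the sliding window
        # (Redis equivalent: ZREMRANGEBYSCORE key -inf now-window).
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit for this window
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
print([limiter.allow("user-1", now=t) for t in (0, 1, 2, 3)])
# [True, True, True, False] -- fourth request inside the window is rejected
print(limiter.allow("user-1", now=61))  # True: the t=0 hit has expired
```

Unlike a fixed-window counter, the window slides with each request, so a burst straddling a window boundary cannot double the effective limit.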
6. Networking and Security
Network Isolation (VPC)
| Subnet Type | Contents | Internet Access |
|---|---|---|
| Public Subnets | Gravitee, ALB | Inbound allowed |
| Private App Subnets | EKS nodes (FastAPI, Gatekeeper) | Outbound only (NAT) |
| Isolated Data Subnets | RDS, Redis | None (no IGW, no NAT) |
Service Mesh (Istio Ambient Mode)
- Namespace isolation — Minerva blocks cross-namespace traffic by default
- Explicit AuthorizationPolicies — whitelist-based, not just “same namespace”
- mTLS everywhere — all pod-to-pod communication encrypted and authenticated
- Traffic shifting — Blue/Green and Canary deployments via Istio
Gravitee Integration
Gravitee acts as the API management layer:
- Advanced rate limiting
- Janus OIDC/OAuth2 integration for authentication
- FAIR data compliance
- API discovery and catalog
Janus Integration
- Zero Local Credentials — authentication offloaded entirely to enterprise IdP
- Unified Authorization — CIDM groups via Janus; access auto-revoked on departure
- AWS IAM Bridge — Roche AD groups map directly to AWS IAM Roles
7. Deployment Pipeline
CI Phase (GitHub Actions)
- OIDC Authentication — short-lived STS tokens scoped to repo and branch
- Roche DevHub Security Scans — dependency scanning, container scanning
- Fail-fast — pipeline fails on Critical vulnerabilities
- Images tagged with commit SHA — immutable traceability
CD Phase (ArgoCD GitOps)
Sync Waves ensure ordered rollout:
- Database migrations (PreSync Job via Alembic)
- Internal gRPC Gatekeeper update
- Public-facing FastAPI update
Blue/Green deployments via Argo Rollouts:
- Green version launched alongside Blue
- Automated health checks before traffic switch
- 30-minute bake time, during which rollback to Blue is instant
Governance
- Production branch requires mandatory approval + security scan pass
- Every deployment is a Git commit — full audit trail
- No human ever needs `kubectl` or AWS credentials to deploy
8. Environment Strategy
| Feature | Development | Production |
|---|---|---|
| Compute | ARM64 Graviton, Spot Instances | ARM64 Graviton, On-Demand |
| Availability | Single Pod / Single AZ | Multi-Replica / Multi-AZ |
| Database | Aurora Serverless v2 | Aurora Multi-AZ Cluster |
| Scaling | KEDA event-driven, scale-to-zero | KEDA metric-driven, target tracking |
| WAF | Permissive (debugging) | Strict (managed rules) |
| Data Protection | Standard backups | Immutable Vault Lock |
| Observability | Standard logging | Grafana Cloud (Loki/APM) |
| Human Access | Read-only console via Janus | Disabled — dashboards only |
9. Observability Stack
| Signal | Source | Goal |
|---|---|---|
| Latency | Datadog / Istio | gRPC overhead < 20ms |
| Traffic | Gravitee / ALB | Requests per second monitoring |
| Errors | Loki / Datadog | Alert on 5XX or gRPC InternalError |
| Saturation | EKS Metrics | Scale pods before CPU hits 80% |
Distributed tracing: Datadog traces requests end-to-end through API Gateway → FastAPI → gRPC Gatekeeper → RDS.
Log aggregation: Grafana Alloy (DaemonSet) → Grafana Cloud Loki. Same labels as K8s metrics for instant correlation.
Dual audit trail:
- AWS CloudTrail — infrastructure changes (“who and when”)
- Minerva Loki — application events (“what data was accessed”)
10. Data Resilience
| Failure Scenario | Recovery Mechanism | RPO (Data Loss) | RTO (Downtime) |
|---|---|---|---|
| AZ Outage | Multi-AZ Failover | 0 seconds | < 60 seconds |
| Data Corruption | PITR Restore (In-Region) | 1 second | 30–60 minutes |
| Regional Disaster | Cross-Region Restore | < 24 hours | 2–4 hours |
11. Encryption Matrix
| State | Technology | Implementation |
|---|---|---|
| At Rest | AWS KMS (CMK) | AES-256 on RDS, DocumentDB, S3 |
| In Flight | Istio mTLS | Mandatory mutual TLS for all pod-to-pod traffic |
| Secrets | HashiCorp Vault | External Secrets Operator (ESO) syncing to K8s Secrets |
12. Minerva Service Catalog
| Component | Provider | Purpose |
|---|---|---|
| Orchestration | Minerva EKS | Multi-AZ Kubernetes |
| Ingress | Gravitee.io | API management, rate limiting, Janus integration |
| Service Mesh | Istio | mTLS between services |
| Database | RDS PostgreSQL (Aurora) | Primary transactional store |
| NoSQL | DocumentDB | Optional unstructured storage |
| Cache | ElastiCache Redis | Session management, caching |
| Streaming | Confluent Kafka | Event-driven consistency (Saga Pattern) |
| Secrets | HashiCorp Vault | Dynamic secrets via External Secrets Operator |
| Deployment | ArgoCD | GitOps sync and Sync-Wave ordering |
| Monitoring | Datadog | APM, tracing, anomaly detection |
| Logging | Grafana Cloud (Loki) | Log aggregation and retention |
| DNS/TLS | Route 53 / ACM | Auto-rotating certificates |
13. Frontend Architecture
Standard: S3 + CloudFront (SSG)
- Static site compiled to immutable assets (HTML, CSS, JS)
- S3 bucket (private) + CloudFront with Origin Access Control
- Unified domain routing: `/*` → S3 (UI), `/api/*` → EKS ALB (backend)
- Zero patching, cost based on traffic only
- Cache invalidation via CI pipeline
Discouraged: Dynamic frontend in EKS — SSR introduces unnecessary complexity for most use cases.
14. Saga Pattern and Event-Driven Consistency
For complex workflows spanning multiple services:
- Orchestrator: gRPC Gatekeeper manages saga lifecycle via Kafka topics
- Transient failures: Exponential backoff + retries (Istio + Kafka consumer)
- Business failures: Compensating transactions via `TransactionFailed` events
- Idempotency: Correlation-ID checked against Redis before every write
- Visibility: Confluent Stream Lineage + Datadog Tracing
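The idempotency check above can be sketched as follows. In production the check-and-mark would be a single atomic Redis command (`SET <correlation_id> 1 NX EX <ttl>`) so concurrent consumers cannot race; here a plain in-memory set stands in for Redis, and the writer/event names are illustrative, not from the blueprint.

```python
# Sketch of the correlation-ID idempotency check for saga writes.
# A plain set stands in for Redis keys with a TTL; in production the
# check-and-mark is one atomic `SET cid 1 NX EX ttl` call.
class IdempotentWriter:
    def __init__(self):
        self._seen = set()   # stands in for Redis (keys expire via TTL)
        self.writes = []     # side effects actually performed

    def handle(self, event):
        cid = event["correlation_id"]
        if cid in self._seen:        # duplicate delivery (Kafka at-least-once)
            return "skipped"
        self._seen.add(cid)          # Redis: SET cid 1 NX EX ttl
        self.writes.append(event["payload"])
        return "written"


writer = IdempotentWriter()
event = {"correlation_id": "saga-42-step-1", "payload": {"qty": 3}}
print(writer.handle(event))  # written
print(writer.handle(event))  # skipped -- redelivered by the broker
print(len(writer.writes))    # 1
```

This is what turns Kafka's at-least-once delivery into effectively-once writes: redeliveries of the same saga step are detected by correlation ID and dropped before they touch the database.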
15. Relevance to xRED ELN
What xRED ELN inherits from Minerva
| Capability | How xRED Uses It |
|---|---|
| EKS | All integration services (Lookups, Apps, Jobs) run as K8s deployments |
| RDS PostgreSQL | Persistent storage for integration state |
| ElastiCache Redis | Session store for dual OAuth tokens (ADR-003) |
| Vault | All secrets (API keys, DB credentials, client certs) |
| ArgoCD | GitOps deployment from infrastructure repo |
| Gravitee | Internal API gateway (ADR-004) |
| NGINX Ingress | K8s ingress routing and TLS termination |
| Datadog | Monitoring and observability |
Where xRED ELN diverges from the blueprint
| Blueprint Pattern | xRED ELN Approach | Reason |
|---|---|---|
| gRPC Gatekeeper | Direct DB access from FastAPI | Simpler adapter services don’t warrant the Gatekeeper overhead |
| S3 + CloudFront frontend | React frontend in K8s pod | External Actions require server-side OAuth handling |
| Istio STRICT mTLS | Default Minerva mesh policies | Current service count doesn’t justify custom AuthorizationPolicies |
| Saga/Kafka orchestration | Synchronous request-response | Integration adapters are stateless transformers, not saga participants |
Patterns worth adopting
- Alembic migrations as ArgoCD PreSync Jobs — safer than manual migration runs
- KEDA event-driven scaling — scale on gRPC queue depth rather than CPU
- Immutable image tags (commit SHA) — xRED currently uses branch tags; SHA tags would add traceability
- Grafana Cloud for long-term log retention — better than cluster-local Loki
See Minerva Platform Architecture for the platform’s own ADRs covering Gravitee selection, Janus multi-tenancy, observability stack, disaster recovery, and known operational challenges.