Libby Louis

Rambling thoughts from a curious engineer

HIPAA on AWS: Building Compliance Into the Architecture

Healthcare-facing search sits in an awkward place: you might not be building an EHR, but queries and indexed content can still look like PHI — names in a search box, provider directories that resemble patient-adjacent workflows, snippets flowing to logs and third-party APIs. HIPAA’s technical safeguards are about access, audit, integrity, and transmission — not about ranking philosophy.

One thing to note: compliance is bigger than code. Responsibility for Business Associate Agreements, risk analysis, workforce training, and incident response lives with the organization as a whole. On the engineering side, any software used in healthcare must build in controls you can point to in architecture reviews.

Why “pipeline” matters for HIPAA

HIPAA cares about who can get to ePHI, whether you can tell what happened, and whether data is exposed in transit or at rest. A standard application encompasses several moving parts: HTTP APIs, a database, object storage, caches, background jobs, LLM calls, and lots of logging hooks where plaintext can accidentally accumulate.

The design goal is predictable behavior: encrypt by default, scope access narrowly, record events in a disciplined way, and avoid turning observability into a shadow copy of sensitive content.

Application layer: accountability without a second warehouse of secrets

Audit logging ties actions to tenants and actors — ingest, search, document lifecycle, and related operations write structured records (action type, resource hints, timestamps, success/failure). That supports §164.312(b)-style audit control narratives: you can answer “what occurred?” in the application domain.

The tricky part is what lands in the details column. User queries and payloads can contain identifiers or health-related phrases, and storing them verbatim in audit tables for years conflicts with the minimum-necessary principle as applied to the logs themselves. Two patterns help:

  • A metadata label when an event likely involved user-supplied or sensitive content — useful for reporting (“show me audit rows that touched that class of data”).
  • Optional redaction for production: values under known sensitive keys can be replaced with hashes and lengths before insert, so you keep correlation for investigations without keeping plaintext in the audit store.
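The redaction idea can be sketched in a few lines. This is a hypothetical helper, not the system's actual code: the key set and field names are assumptions, and the hash prefix length is arbitrary — the point is that the audit row keeps correlation value (same input, same stub) without keeping plaintext.

```python
import hashlib

# Hypothetical set of payload keys treated as sensitive in audit details.
SENSITIVE_KEYS = {"query", "patient_name", "dob", "notes"}

def redact(details: dict) -> dict:
    """Replace values under known sensitive keys with a hash-and-length
    stub before the audit insert, so investigations can still correlate
    events without the audit store holding plaintext."""
    out = {}
    for key, value in details.items():
        if key in SENSITIVE_KEYS and isinstance(value, str):
            digest = hashlib.sha256(value.encode()).hexdigest()[:16]
            out[key] = {"sha256_prefix": digest, "length": len(value)}
        else:
            out[key] = value
    return out
```

Hashing rather than dropping the value is the design choice worth noting: equal inputs produce equal stubs, so "did the same query appear across these events?" remains answerable.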

That pattern is “minimum necessary for audit content,” not just for the primary database.

Multi-tenant isolation belongs in the same conversation: tenant context from auth and tenant-scoped data access limit cross-customer exposure — the application-side complement to network segmentation.
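As a minimal sketch of that complement (names and shapes are illustrative, not the real data model): the tenant identifier comes from the authenticated context, never from request parameters, and every read path filters on it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantContext:
    tenant_id: str  # derived from the authenticated token, never from the request body

def scoped_search(ctx: TenantContext, documents: list[dict], term: str) -> list[dict]:
    """All reads filter by the tenant from auth context, so a bug in
    query construction cannot return another customer's rows."""
    return [
        d for d in documents
        if d["tenant_id"] == ctx.tenant_id and term in d["text"]
    ]

# Toy corpus spanning two tenants.
docs = [
    {"tenant_id": "a", "text": "cardiology referral"},
    {"tenant_id": "b", "text": "cardiology intake"},
]
```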

Per-tenant usage quotas and graceful degradation

In a multi-tenant healthcare search system, cost containment and blast radius overlap. Per-tenant usage quotas (tracked at the application level against monthly search units) mean a runaway integration or compromised API key can’t generate unbounded LLM spend across the platform.

The degradation model matters too: tenants that hit their overage cap don’t get cut off entirely — they fall back to keyword-only search (no embeddings, no LLM intent extraction, no LLM summarization). That preserves availability for end users while removing the expensive and data-sensitive LLM path. From a compliance perspective, it’s a containment mechanism: the surface area that touches third-party AI services shrinks to zero when a tenant enters degraded mode.
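The quota-plus-degradation logic might look like the following sketch. The field names and the two-tier allowance (plan units plus an overage cap) are assumptions for illustration; the essential property is that hitting the cap switches the pipeline rather than rejecting the request.

```python
from dataclasses import dataclass

@dataclass
class TenantUsage:
    monthly_units: int  # plan allowance of search units
    overage_cap: int    # extra units allowed beyond the allowance
    used: int = 0

def search_mode(usage: TenantUsage) -> str:
    """Choose a pipeline per request: full (embeddings + LLM) while
    under quota, keyword-only once the overage cap is exhausted.
    Keyword-only mode makes no third-party LLM calls at all."""
    if usage.used >= usage.monthly_units + usage.overage_cap:
        return "keyword_only"
    return "full"

def record_search(usage: TenantUsage) -> str:
    mode = search_mode(usage)
    usage.used += 1
    return mode
```

Availability is preserved and the expensive, data-sensitive path is removed — the tenant's end users keep getting results, just without the LLM features.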

Data consistency across stores

When data lives in both a relational database and a vector store, dual-write patterns create integrity risks — a crash between the two writes leaves the stores out of sync. An outbox pattern (write an event row in the same transaction as the primary data, then process it asynchronously) brings eventual consistency without dual-write failure modes. That supports data integrity requirements (§164.312(c)(1)): the system can tell you what the authoritative state is, and the vector store converges to match it.
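A minimal outbox sketch, using SQLite as a stand-in for the relational store and a callback as a stand-in for the vector-store writer (both are assumptions, not the system's actual stack):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id TEXT PRIMARY KEY, body TEXT);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event TEXT, payload TEXT, processed INTEGER DEFAULT 0
    );
""")

def ingest(doc_id: str, body: str) -> None:
    """Primary write and outbox event commit in one transaction:
    either both exist or neither, so the vector store cannot
    silently drift from the authoritative relational state."""
    with conn:
        conn.execute("INSERT INTO documents VALUES (?, ?)", (doc_id, body))
        conn.execute(
            "INSERT INTO outbox (event, payload) VALUES (?, ?)",
            ("doc_upserted", json.dumps({"id": doc_id})),
        )

def drain_outbox(handler) -> int:
    """Async worker: deliver pending events (e.g. re-embed into the
    vector store), then mark them processed. Returns events handled."""
    rows = conn.execute(
        "SELECT id, event, payload FROM outbox WHERE processed = 0 ORDER BY id"
    ).fetchall()
    for row_id, event, payload in rows:
        handler(event, json.loads(payload))
        conn.execute("UPDATE outbox SET processed = 1 WHERE id = ?", (row_id,))
    conn.commit()
    return len(rows)
```

If the process crashes between the transaction and the drain, the event is still in the outbox and gets delivered on the next pass — eventual consistency instead of a silent gap.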

Infrastructure layer: AWS as the enforcement plane

A Terraform-shaped AWS setup is where many HIPAA technical safeguards become non-optional defaults:

  • Encryption at rest — RDS, S3, ECR, DynamoDB, and CloudWatch all use KMS encryption with customer-managed keys. This goes beyond default AWS encryption: customer-managed keys give explicit control over key rotation, key policy, and access grants — important for environments where key lifecycle is part of the compliance narrative.
  • Encryption in transit — The load balancer serves HTTPS-only (TLS 1.3 policy, no HTTP listener). Internal traffic between the ALB and application containers also uses TLS via self-signed certificates generated at container startup, so data is encrypted end-to-end — not just at the edge. The database layer enforces SSL (`force_ssl` parameters) and connection strings use `sslmode=require`, so application↔database traffic is not a weak link.
  • Web Application Firewall — AWS WAF sits in front of the load balancer with rate-based rules: a global per-IP limit and a tighter per-IP limit on search API endpoints. This protects against volumetric abuse and provides a layer of defense before requests reach the application’s own rate limiting.
  • Secrets — Database URLs, signing keys, and app secrets come from Secrets Manager (with customer-managed KMS encryption), not from environment variables baked into task definitions. That reduces “we leaked the task definition” as a credential path.
  • Network — Workloads run in private subnets without public IPs; only the load balancer faces the internet. Security groups narrow who may talk to RDS or other internals. VPC flow logs (encrypted, with multi-year retention) record network-level activity for forensics.
  • Audit at the cloud layer — CloudTrail records management API activity across all regions, with log file validation, KMS encryption, CloudWatch Logs integration, and SNS alerting. S3 server access logging on the documents bucket records who accessed which objects and when — this is distinct from CloudTrail S3 data events (which can optionally be enabled for object-level API call tracking on PHI buckets). Together with application audit rows, you get defense in depth for “who touched infrastructure and storage?”
  • Retention — Production targets multi-year retention: CloudWatch and VPC flow logs at 2,192 days (~6 years) and CloudTrail S3 logs at 2,555 days (~7 years). That aligns with HIPAA’s six-year document retention requirement — with the usual tradeoff that long hot retention is expensive, so archival to S3 Glacier and lifecycle rules is a common follow-on.
  • Operational access — ECS Exec is treated as a sharp tool: configurable per environment, and explicitly off in production so arbitrary shell access to running tasks is not the default. That supports least-privilege and containment stories.
  • S3 policy — Buckets deny unencrypted uploads so “someone forgot the encryption header” does not create a non-compliant object. All buckets use KMS encryption, versioning, and public access blocks.
  • CI/CD IAM — Deploy automation uses split roles: a Terraform operations role (scoped to infrastructure management) and a separate least-privilege deploy role (limited to ECR push and ECS service update). The pipeline is not a second god-role over data-plane resources, and each environment gets its own role to prevent cross-environment escalation.
  • RDS — Non-public access, automated backups (30-day retention in production), deletion protection, IAM database authentication, enhanced monitoring, performance insights, and full query logging exports support availability, recoverability, and monitoring narratives that auditors expect next to encryption and access control.
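To make one of those controls concrete: the "deny unencrypted uploads" bucket policy can be expressed as a small policy document. The sketch below builds it in Python for readability; in practice it would live in Terraform. The bucket name is a placeholder, and the condition key shown (`s3:x-amz-server-side-encryption`) is the standard S3 condition for SSE headers.

```python
def deny_unencrypted_policy(bucket: str) -> dict:
    """Bucket policy that rejects any PutObject not using SSE-KMS, so
    a client that forgets the encryption header cannot create a
    non-compliant object."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {
                "StringNotEquals": {
                    "s3:x-amz-server-side-encryption": "aws:kms"
                }
            },
        }],
    }
```

An explicit Deny wins over any Allow in IAM evaluation, which is why the control is "non-optional" rather than a convention teams must remember.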

How the pieces fit together

Concern | Application | AWS / ops
Who did what | Structured audit rows, optional redaction | CloudTrail (multi-region, KMS, validated), S3 access logs, CloudWatch
Who can reach data | Tenant isolation, authz, per-tenant quotas | Private subnets, security groups, scoped IAM, WAF
Data at rest | Care what you persist in JSON/logs | KMS CMKs on RDS, S3, ECR, DynamoDB, CloudWatch, Secrets Manager
Data in transit | HTTPS clients, SSL DB URLs | ALB TLS 1.3, end-to-end TLS to containers, RDS SSL enforcement
Credentials | Redact logs; don't log secrets | Secrets Manager with CMK, no secrets in task defs
Blast radius | Per-tenant quotas degrade to keyword-only; minimize sensitive fields in traces | ECS Exec off in prod, least-privilege split CI/CD roles, WAF rate limiting

Closing thought

HIPAA in practice is a program — policies, vendors, BAAs, training. HIPAA in infrastructure is the boring part people skip when they move fast: TLS everywhere (edge and internal), customer-managed encryption keys on every data store, secrets that aren’t copy-pasted, audit trails that don’t duplicate PHI, usage controls that degrade gracefully, and networks that assume breach. The goal is making that boring part the default, so when someone asks “how would we show a regulator or a customer that we thought this through?”, the answer starts with architecture — not with a last-minute spreadsheet.
