HIPAA-compliant infrastructure on AWS
Healthcare-facing search sits in an awkward place: you might not be building an EHR, but queries and indexed content can still look like PHI — names in a search box, provider directories that resemble patient-adjacent workflows, snippets flowing to logs and third-party APIs. HIPAA’s technical safeguards are about access, audit, integrity, and transmission — not about ranking philosophy.
One thing to note: compliance is bigger than code. Responsibility for Business Associate Agreements, risk analysis, workforce training, and incident response lives with the organization. On the engineering side, any software used in healthcare needs built-in controls that you can point to in an architecture review.
Why “pipeline” matters for HIPAA
HIPAA cares about who can get to ePHI, whether you can tell what happened, and whether data is exposed in transit or at rest. A standard application encompasses several moving parts: HTTP APIs, a database, object storage, caches, background jobs, LLM calls, and lots of logging hooks where plaintext can accidentally accumulate.
The design goal is predictable behavior: encrypt by default, scope access narrowly, record events in a disciplined way, and avoid turning observability into a shadow copy of sensitive content.
Application layer: accountability without a second warehouse of secrets
Audit logging ties actions to tenants and actors — ingest, search, document lifecycle, and related operations write structured records (action type, resource hints, timestamps, success/failure). That supports §164.312(b)-style audit control narratives: you can answer “what occurred?” in the application domain.
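As a rough illustration of such a record, here is a minimal Python sketch. The class name, field names, and example values are all hypothetical, not taken from any particular codebase; the point is only that each row carries tenant, actor, action, outcome, and a timestamp in a fixed shape.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    """One structured audit row (hypothetical shape)."""
    tenant_id: str
    actor_id: str
    action: str                      # e.g. "search", "ingest", "document.delete"
    success: bool
    resource_type: Optional[str] = None
    resource_id: Optional[str] = None
    details: dict = field(default_factory=dict)
    # UTC timestamp in ISO 8601, assigned when the event object is created.
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # Sorted keys keep serialized rows stable and easy to diff.
        return json.dumps(asdict(self), sort_keys=True)


event = AuditEvent(
    tenant_id="tenant-a",
    actor_id="user-17",
    action="search",
    success=True,
    resource_type="index",
    resource_id="providers",
)
print(event.to_json())
```

Keeping the shape fixed makes it easier to report on the rows later, and it keeps free-form content confined to one field (`details`) that can be handled separately.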
The tricky part is what lands in the details column. User queries and payloads can contain identifiers or health-related phrases. Storing them verbatim in audit tables for years works against the minimum-necessary principle as applied to the logs themselves. Two ways to handle this:
- A metadata label when an event likely involved user-supplied or sensitive content — useful for reporting (“show me audit rows that touched that class of data”).
- Optional redaction for production: values under known sensitive keys can be replaced with hashes and lengths before insert, so you keep correlation for investigations without keeping plaintext in the audit store.
That pattern applies “minimum necessary” to audit content, not just to the primary database.
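The two options above can be sketched together in Python. This is a minimal sketch under stated assumptions: the key names in `SENSITIVE_KEYS`, the `contains_sensitive` label, and the function name are hypothetical, and only top-level keys are inspected. It uses a keyed hash (HMAC-SHA256) rather than a bare hash, because short or guessable values such as names can be recovered from unkeyed hashes by brute force.

```python
import hashlib
import hmac

# Hypothetical set of keys whose values may carry user-supplied or sensitive text.
SENSITIVE_KEYS = {"query", "patient_name", "document_text"}


def redact_details(details: dict, key: bytes, sensitive_keys=SENSITIVE_KEYS) -> dict:
    """Replace sensitive top-level values with a keyed hash and a length.

    The same plaintext under the same key yields the same digest, so rows
    can still be correlated during an investigation without storing the text.
    """
    redacted = {}
    touched = False
    for name, value in details.items():
        if name in sensitive_keys and value is not None:
            text = str(value)
            digest = hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()
            redacted[name] = {"hmac_sha256": digest, "length": len(text)}
            touched = True
        else:
            redacted[name] = value
    # Metadata label: marks rows that involved the sensitive class of data.
    redacted["contains_sensitive"] = touched
    return redacted


# The key here is a placeholder; in practice it would come from a secrets store.
row = redact_details({"query": "jane doe diabetes", "page": 2}, key=b"example-only-key")
print(row["query"]["length"], row["contains_sensitive"])
```

Nested payloads would need a recursive walk, and the HMAC key itself becomes a secret to protect and rotate.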
Multi-tenant isolation belongs in the same conversation: tenant context from auth and tenant-scoped data access limit cross-customer exposure — the application-side complement to network segmentation.
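One way to make tenant scoping hard to forget is to route every query through an object that is constructed with the tenant ID and always filters on it. The sketch below uses an in-memory SQLite database purely for illustration; the table, column names, and class name are assumptions.

```python
import sqlite3


class TenantScopedDocuments:
    """Data access object bound to one tenant; every query filters on it."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        if not tenant_id:
            raise ValueError("tenant_id is required")
        self._conn = conn
        self._tenant_id = tenant_id

    def get(self, doc_id: int):
        # A document owned by another tenant is indistinguishable from a missing one.
        return self._conn.execute(
            "SELECT id, title FROM documents WHERE tenant_id = ? AND id = ?",
            (self._tenant_id, doc_id),
        ).fetchone()

    def list_titles(self) -> list:
        rows = self._conn.execute(
            "SELECT title FROM documents WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        ).fetchall()
        return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, title TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO documents (id, tenant_id, title) VALUES (?, ?, ?)",
    [(1, "tenant-a", "Clinic directory"), (2, "tenant-b", "Intake form")],
)

docs_a = TenantScopedDocuments(conn, "tenant-a")
print(docs_a.list_titles())
print(docs_a.get(2))
```

In a real service the tenant ID would come from the authenticated request context rather than a string literal, and a database-level control such as row-level security could back up the application filter.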
Infrastructure layer: AWS as the enforcement plane
A Terraform-shaped AWS setup is where many HIPAA technical safeguards become non-optional defaults:
Encryption at rest — RDS, S3, ECR (and related stores) use encryption so stolen media or snapshots are not trivially readable. KMS (including optional customer-managed keys for secrets in stricter environments) keeps key lifecycle under explicit policy.
Encryption in transit — Traffic hits the load balancer with TLS; the database layer enforces SSL (force_ssl style parameters) and application connection strings use required SSL, so application↔database traffic is not a weak link.
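On the application side, "required SSL" can be enforced at the point where the connection string is built, so a URL without it never reaches the driver. The following sketch assumes a PostgreSQL-style URL whose driver honors the libpq `sslmode` query parameter; the function name and hostnames are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def require_ssl(database_url: str, mode: str = "require") -> str:
    """Return the URL with sslmode forced to `mode`, replacing any existing value."""
    parts = urlsplit(database_url)
    # Drop any sslmode already present so a stray "disable" cannot survive.
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != "sslmode"
    ]
    params.append(("sslmode", mode))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(require_ssl("postgresql://app@db.internal:5432/search"))
print(require_ssl("postgresql://db.internal/search?sslmode=disable&application_name=api"))
```

With libpq, `require` encrypts the connection but does not verify the server certificate; `verify-full` does, at the cost of distributing the CA bundle to the application.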
Secrets — Database URLs, signing keys, and app secrets come from Secrets Manager, not from environment variables baked into task definitions. That reduces “we leaked the task definition” as a credential path.
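A small check in CI can help keep this rule from eroding. The sketch below scans an ECS task definition document for entries under `environment` (plaintext name/value pairs) whose names look like credentials; entries under `secrets` (which use `valueFrom` references) are left alone. The name pattern, the function name, and the sample task definition are assumptions for illustration.

```python
import re

# Hypothetical heuristic for names that probably hold credentials.
SUSPICIOUS_NAME = re.compile(r"(SECRET|PASSWORD|TOKEN|KEY|DATABASE_URL)", re.IGNORECASE)


def plaintext_secret_names(task_definition: dict) -> list:
    """Return 'container:VAR' for suspicious names found in plaintext environment."""
    findings = []
    for container in task_definition.get("containerDefinitions", []):
        for env in container.get("environment", []):
            name = env.get("name", "")
            if SUSPICIOUS_NAME.search(name):
                findings.append(f"{container.get('name', '?')}:{name}")
    return findings


sample = {
    "containerDefinitions": [
        {
            "name": "api",
            "environment": [
                {"name": "LOG_LEVEL", "value": "info"},
                {"name": "DATABASE_URL", "value": "postgresql://example"},
            ],
            "secrets": [
                # Placeholder ARN; resolved by ECS at task start.
                {
                    "name": "SIGNING_KEY",
                    "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:example",
                },
            ],
        }
    ]
}
print(plaintext_secret_names(sample))
```

A name-based heuristic will miss secrets stored under innocuous names, so it complements review rather than replacing it.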
Network — Workloads run in private subnets without ad hoc public IPs; only intended entry points (e.g. ALB) face the internet. Security groups narrow who may talk to RDS or other internals.
Audit at the cloud layer — CloudTrail records management API activity (with log file validation where configured); S3 server access logging on the documents bucket supports object-level access forensics. Together with app audit rows, you get defense in depth for “who touched infrastructure and storage?”
Retention — Long CloudWatch, VPC flow log, and CloudTrail retention (where prod tfvars target multi-year policy) aligns with “how long might we need to reconstruct history?” — with the usual tradeoff that long hot retention is expensive, so archival to S3 and lifecycle rules is a common follow-on.
Operational access — ECS Exec is treated as a sharp tool: configurable, and off in production so arbitrary shell access to running tasks is not the default. That supports least-privilege and containment stories.
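A periodic check can confirm that the setting has not drifted. This sketch assumes its input has the shape of an ECS `DescribeServices` response, where each service carries an `enableExecuteCommand` boolean; the function name and sample data are hypothetical, and fetching the response (for example with an AWS SDK) is left out.

```python
def services_with_exec_enabled(describe_services_response: dict) -> list:
    """Return names of services that currently allow ECS Exec."""
    return [
        service.get("serviceName", "?")
        for service in describe_services_response.get("services", [])
        if service.get("enableExecuteCommand", False)
    ]


# Hand-written sample in the shape of a DescribeServices response.
response = {
    "services": [
        {"serviceName": "search-api", "enableExecuteCommand": False},
        {"serviceName": "ingest-worker", "enableExecuteCommand": True},
    ]
}
print(services_with_exec_enabled(response))
```

In production the expected result is an empty list; anything else is worth an alert.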
S3 policy — Buckets can deny unencrypted uploads so “someone forgot the encryption header” does not create a non-compliant object.
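A bucket policy of this kind might look like the following, built here as a Python dictionary so it can be serialized to JSON. This is a sketch, not the policy from any specific deployment: the bucket name is a placeholder, and the choice to require `aws:kms` (rather than accepting `AES256`) is an assumption.

```python
import json


def deny_unencrypted_uploads_policy(bucket: str) -> dict:
    """Bucket policy that rejects PutObject calls not requesting SSE-KMS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUploadsWithoutKmsEncryption",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Also matches when the header is absent, because negated
                # string operators evaluate to true for a missing key.
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            }
        ],
    }


policy = deny_unencrypted_uploads_policy("example-documents-bucket")
print(json.dumps(policy, indent=2))
```

One tradeoff: this statement also rejects uploads that omit the header and rely on the bucket's default encryption, so clients must set the header explicitly.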
CI/CD IAM — Deploy automation uses scoped roles (per environment, limited to what Terraform/deploy needs) so the pipeline is not a second god-role over data-plane resources.
RDS — Non-public access, backups, optional multi-AZ / deletion protection, and database logging exports support availability, recoverability, and monitoring narratives that auditors expect next to encryption and access control.
How the pieces fit together
| Concern | Application | AWS / ops |
|---|---|---|
| Who did what | Structured audit rows, optional redaction | CloudTrail, S3 access logs, CloudWatch |
| Who can reach data | Tenant isolation, authz | Private subnets, security groups, scoped IAM |
| Data at rest | Be careful what you persist in JSON/logs | RDS/S3/KMS encryption |
| Data in transit | HTTPS clients, SSL DB URLs | ALB TLS, RDS SSL enforcement |
| Credentials | Redact logs; don’t log secrets | Secrets Manager, no secrets in task defs |
| Blast radius | Minimize sensitive fields in traces | ECS Exec off in prod, least-privilege CI |
Closing thought
HIPAA in practice is a program — policies, vendors, BAAs, training. HIPAA in infrastructure is the boring part people skip when they move fast: TLS everywhere that matters, encryption at rest, secrets that aren’t copy-pasted, trails that don’t duplicate PHI, and networks that assume breach. HIPAA-compliant AWS infrastructure is aimed at making that boring part the default, so when someone asks “how would we show a regulator or a customer that we thought this through?”, the answer starts with architecture — not with a last-minute spreadsheet.