From Edge to LLM: Designing a Serverless AI Pipeline
When building AI-powered applications, infrastructure decisions can have an outsized impact on speed, cost, and maintainability. For one recent project, I designed the system to be serverless-first: a static React frontend hosted on S3 behind CloudFront, and a thin API layer on API Gateway + Lambda. This approach keeps operational overhead minimal, scales automatically, and lets me focus on the product rather than babysitting servers.
High-Level Architecture
The system is structured to keep the edge fast and the core small, with a clear separation of responsibilities:
- Edge and hosting: CloudFront serves the React build from S3, with long TTLs on versioned assets, and forwards API calls to API Gateway. AWS WAF provides basic IP reputation checks and a few path-specific rate limits.
- API layer: API Gateway terminates HTTPS, validates requests, and routes to Lambda functions. Routes are kept coarse-grained to reduce cold starts and deployment sprawl.
- Compute: Lambda (Node.js) orchestrates text, image, and audio requests, along with permalink reads. Shared layers carry SDKs and utilities to keep function bundles lean.
- Data and storage: DynamoDB stores templates, submissions, and permalink metadata, modeled with PK/SK and GSIs to support “by user” and “recent” lookups. Generated audio and optional images are stored in S3 with object-level encryption and presigned URLs.
- Secrets and configuration: Secrets Manager stores API keys for third-party services (OpenAI, ElevenLabs). Lambdas fetch secrets at cold start and cache them briefly in memory.
- Delivery: GitHub Actions builds the React app, syncs assets to S3, invalidates CloudFront, and deploys backend functions via AWS CLI/SAM.
- Observability: CloudWatch collects structured logs and metrics; alarms monitor API 5xx errors, Lambda throttles, DynamoDB performance, and CloudFront anomalies.
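The cold-start secret caching mentioned above can be sketched as a small wrapper. The fetcher callback stands in for a Secrets Manager `GetSecretValue` call, and the five-minute TTL is an illustrative default, not the project's actual setting:

```javascript
// Minimal sketch of a cold-start secret cache. fetchSecret is injected so
// the cache logic stays independent of the AWS SDK; in a real Lambda it
// would wrap a Secrets Manager GetSecretValue call.
function makeSecretCache(fetchSecret, ttlMs = 5 * 60 * 1000) {
  const cache = new Map(); // secretId -> { value, expiresAt }

  return async function getSecret(secretId) {
    const hit = cache.get(secretId);
    if (hit && hit.expiresAt > Date.now()) return hit.value; // warm path: no network call

    const value = await fetchSecret(secretId); // miss: fetch and cache
    cache.set(secretId, { value, expiresAt: Date.now() + ttlMs });
    return value;
  };
}
```

Declaring the cache at module scope means warm invocations of the same Lambda container reuse it, so only cold starts pay the Secrets Manager round trip.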
Request Flow
The happy path is simple by design.
- The browser hits CloudFront: static assets are served immediately; API calls are forwarded to API Gateway.
- API Gateway handles authentication and validation, then invokes the appropriate Lambda.
- The Lambda reads/writes DynamoDB, calls external AI services, and writes generated audio/images to S3 as needed. Responses may include permalinks or signed URLs.
- Safe GETs, like permalink fetches, are cached at the edge with short TTLs and ETags.
LLM Orchestration: Structured Nonsense
A core feature of the application is generating mad-lib style templates using an LLM. This introduces a subtle but important challenge:
- The LLM produces templates with placeholders for nouns, verbs, adjectives, etc.
- It also provides example completions, meant not as correct answers but as signals of the expected part of speech and tense.
- When users submit their completed mad-libs, the LLM lightly corrects minor issues (spelling, tense) without changing the words or making the content logically coherent, preserving the nonsensical intent of a mad lib.
The balance here is critical: too little correction leaves outputs messy, too much correction turns them into coherent sentences, breaking the experience. Careful prompt design ensures the LLM acts as a controlled transformer, not a content improver.
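The "controlled transformer" idea can be sketched as a prompt builder. The exact wording the project uses isn't shown here; the constraints below are a hypothetical illustration of the shape of the instruction, and the placeholder syntax is assumed:

```javascript
// Hypothetical prompt construction for light correction of a completed
// mad lib: fix surface errors, never improve coherence.
function buildCorrectionMessages(template, userWords) {
  const system = [
    'You lightly correct a completed mad lib.',
    'Fix only spelling and tense of the supplied words.',
    'Never replace a word with a different word.',
    'Never rewrite the sentence to make it logically coherent.',
  ].join(' ');

  // Fill {slot} placeholders with the user's words, leaving unknown slots intact.
  const filled = template.replace(/\{(\w+)\}/g, (_, slot) => userWords[slot] ?? `{${slot}}`);
  return [
    { role: 'system', content: system },
    { role: 'user', content: filled },
  ];
}
```

Putting the guardrails in the system message, rather than trusting the model's defaults, is what keeps correction from sliding into rewriting.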
Key Design Choices and Tradeoffs
Serverless over Containers
- Why: Per-request billing fits spiky, IO-heavy workloads; minimal ops overhead.
- Tradeoff: Cold starts and execution time limits. Mitigated with coarse-grained routes and provisioned concurrency for hot paths.
DynamoDB for Metadata
- Why: Single-digit millisecond reads, effortless scaling.
- Tradeoff: Upfront data modeling; fewer ad-hoc queries. Modeled PK/SK and GSIs for permalinks, “by-template,” and “recent” views.
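The key modeling above can be sketched for the permalink access patterns. Attribute and index names (GSI1, GSI2, and so on) are hypothetical, not the production schema:

```javascript
// Hypothetical single-table item shape supporting three access patterns:
// direct permalink fetch, "by user" listing, and "by template" listing,
// each sorted by creation time for "recent" views.
function permalinkItem({ permalinkId, userId, templateId, createdAt }) {
  return {
    PK: `PERMALINK#${permalinkId}`,   // direct fetch by permalink id
    SK: 'META',
    GSI1PK: `USER#${userId}`,         // query GSI1 for "by user"
    GSI1SK: `CREATED#${createdAt}`,   // ISO timestamps sort lexicographically
    GSI2PK: `TEMPLATE#${templateId}`, // query GSI2 for "by template"
    GSI2SK: `CREATED#${createdAt}`,
  };
}
```

Because ISO-8601 timestamps sort lexicographically, a descending query on either GSI yields the "recent" view with no extra index.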
CloudFront + S3 for the SPA
- Why: Cheap, global, resilient delivery.
- Tradeoff: Cache discipline matters. Versioned assets and targeted invalidations handle this.
Secrets Manager
- Why: Automatic rotation, auditable access.
- Tradeoff: Small latency at cold start; mitigated by in-memory caching.
GitHub Actions CI/CD
- Why: Simple, automated build/deploy pipeline.
- Tradeoff: Limited release orchestration; Lambda aliases with weighted routing handle canary deployments.
Strengths Today
- Minimal operational overhead with predictable scalability.
- Low-latency reads via CloudFront and DynamoDB.
- Clear separation between static delivery and dynamic API, simplifying caching and incident triage.
- Secure by default: private data never leaves AWS; secrets aren’t baked into builds.
Scaling Considerations
If traffic spikes tenfold:
- Scale reads at the edge: Leverage CloudFront caching for safe GETs.
- Smooth writes: Keep DynamoDB on-demand; add jittered retries in Lambdas.
- Control latency: Increase provisioned concurrency on hot functions; cap concurrency to protect hot partitions.
- Backpressure/fallbacks: Circuit-break flaky third-party calls and serve last-known-good results where appropriate.
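The jittered-retry point above can be sketched with a full-jitter backoff wrapper. Attempt counts and delays are illustrative defaults, not tuned production values:

```javascript
// Minimal full-jitter retry: each attempt sleeps a random duration in
// [0, min(cap, base * 2^attempt)) so retries from many Lambdas don't
// synchronize into thundering herds against a hot DynamoDB partition.
async function withJitteredRetry(fn, { attempts = 4, baseMs = 100, capMs = 2000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt + 1 >= attempts) throw err; // budget exhausted: surface the error
      const delay = Math.random() * Math.min(capMs, baseMs * 2 ** attempt);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

The same wrapper, paired with a failure-count threshold, is a natural base for the circuit-breaking mentioned above.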
Closing Thought
Serverless-first architecture allows the platform to stay small, secure, and fast without an ops team. Combined with careful control over LLM behavior, it ensures the system can scale with demand while delivering a predictable, playful user experience—even when the AI is intentionally nonsensical.