Building a Real-time Generative UI Application
I recently built a real-time, AI-assisted web app that tailors its responses and generates UI based on what users are actually doing in the app. It streams model output as it’s generated, and it folds in lightweight client telemetry—clicks, scroll velocity, component context—to keep the AI grounded in the moment.
The architecture: simplicity vs. control
At a high level, the system is split into a Next.js frontend and an ASP.NET Core backend. The browser talks to the API over REST for requests that are easy to acknowledge quickly (like “start an AI response” or “record this event”), and then upgrades to real-time over SignalR for the parts that actually matter to the user experience: token streams, structured reasoning JSON, and tool payloads. That division lets the HTTP request finish fast while the interesting work happens over a durable, session-scoped channel.
On the frontend, I keep things fairly lean. A session service bootstraps a UUID and sends a “session started” telemetry event. From there, reusable hooks capture clicks, scroll behavior, and component mounts. Those events aren’t precious individually, so they don’t block the UI—just queue and send them to the backend. When a user asks the AI something, a SignalR connection is spun up, joins a group named after the session, and starts listening for three kinds of messages: token chunks for the chat stream, a bit of structured JSON that describes the model’s reasoning, and tool payloads that the server can generate on the fly (like a carousel of recommended content).
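To make the "queue and send, never block" idea concrete, here is a minimal sketch of a client-side telemetry queue. The names (`TelemetryQueue`, the event shape, the batch size) are illustrative assumptions, not the app's actual API; the point is that `record` does no network I/O on the UI path and batches are flushed separately.

```typescript
// Hypothetical event shape; the real schema is not shown in the article.
type TelemetryEvent = {
  sessionId: string;
  kind: "click" | "scroll" | "mount" | "session_started";
  component?: string;
  at: number; // epoch ms
};

class TelemetryQueue {
  private buffer: TelemetryEvent[] = [];

  constructor(
    private readonly sessionId: string,
    private readonly send: (batch: TelemetryEvent[]) => Promise<void>,
    private readonly maxBatch = 20,
  ) {}

  // Enqueue without awaiting network I/O, so the UI thread never blocks
  // on an event that isn't individually precious.
  record(kind: TelemetryEvent["kind"], component?: string): void {
    this.buffer.push({ sessionId: this.sessionId, kind, component, at: Date.now() });
    if (this.buffer.length >= this.maxBatch) void this.flush();
  }

  // Drain the buffer and hand the batch to the transport; returns batch size.
  async flush(): Promise<number> {
    const batch = this.buffer.splice(0, this.buffer.length);
    if (batch.length > 0) await this.send(batch);
    return batch.length;
  }
}
```

A real implementation would also flush on a timer and on page unload; those details are omitted here.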
The backend is an ASP.NET Core host with a focused orchestration layer. The configuration is strict: non-secret toggles live in app settings; any secret material (LLM keys, Event Hub connections, storage credentials) comes from a secret store at runtime. Prompt variants ride in a central configuration service so I can swap them without redeploying. This helps me iterate on model behavior safely and reversibly.
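The prompt-variant idea can be sketched as a small store that resolves templates at request time. This is an illustrative TypeScript stand-in for the configuration service (the backend is C#, and `PromptStore`, the version field, and the `{placeholder}` syntax are all assumptions), but it shows why swapping a variant never requires a redeploy: the template is data, not code.

```typescript
// Hypothetical variant shape; versioning supports the rollback story.
type PromptVariant = { id: string; version: number; template: string };

class PromptStore {
  private variants = new Map<string, PromptVariant>();

  // Register or hot-swap a variant at runtime, without redeploying the host.
  set(variant: PromptVariant): void {
    this.variants.set(variant.id, variant);
  }

  // Resolve a template and fill {placeholders} from the supplied values.
  render(id: string, values: Record<string, string>): string {
    const v = this.variants.get(id);
    if (!v) throw new Error(`unknown prompt variant: ${id}`);
    return v.template.replace(/\{(\w+)\}/g, (_, key) => values[key] ?? `{${key}}`);
  }
}
```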
The AI pipeline itself runs in a few small, composable stages. First, the model sends a JSON-only “planning” result—a rating and the component it thinks we should render next. That tiny step lets me branch: if the planned component is something I know how to assemble deterministically on the server (say, a carousel), I just build it and push it to the client as a tool payload. In parallel, I start the main chat response as a token stream so the user sees progress immediately. Finally, I run a short audit step that emits structured logs of what happened and why. That separation—plan, stream, tool, audit—adds some complexity, but it buys me predictable outputs, easier debugging, and better headroom for evaluation.
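The plan → (stream ∥ tool) → audit staging can be sketched as a single orchestration function. The model calls here are stand-ins with assumed signatures (the real backend is C#); only the ordering and parallelism are the point.

```typescript
// Hypothetical shape of the JSON-only planning result.
type Plan = { rating: number; component: string | null };

async function runPipeline(
  plan: () => Promise<Plan>,                                  // stage 1: structured planning call
  streamTokens: (emit: (t: string) => void) => Promise<void>, // stage 2: main chat token stream
  buildTool: (component: string) => Promise<object>,          // stage 3: deterministic server-side build
  audit: (entry: object) => void,                             // stage 4: structured audit log
  push: (type: string, payload: unknown) => void,             // SignalR-style push to the session group
): Promise<void> {
  // Stage 1 runs first and alone: a tiny structured decision we can branch on.
  const p = await plan();

  // Stages 2 and 3 run in parallel to cut perceived latency.
  const work: Promise<void>[] = [streamTokens((t) => push("token", t))];
  if (p.component !== null) {
    work.push(buildTool(p.component).then((tool) => push("tool", tool)));
  }
  await Promise.all(work);

  // Stage 4: record what happened and why, after both parallel legs settle.
  audit({ plan: p, finishedAt: Date.now() });
}
```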
Context comes from telemetry. All client events are sent into a streaming pipeline and, on each new AI request, I read just the relevant time window since the last ask. From that slice, lightweight metrics are extracted: average scroll velocity, reversals (did they keep bouncing?), clicks-per-minute, and what component the user is currently engaging with. Time since the last question is also tracked. That summary nudges the prompts so the model can adapt tone, brevity, or focus. It’s a gentle feedback loop rather than a heavy-handed personalization system.
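The metric extraction above is simple enough to sketch directly. Field names and the sample shape are assumptions (the production schema isn't shown), but the three signals match the ones described: average scroll velocity over the window, direction reversals, and clicks-per-minute.

```typescript
// Hypothetical per-event sample inside the time window since the last ask.
type Sample = { at: number; kind: "click" | "scroll"; scrollY?: number };

function summarize(window: Sample[], windowMs: number) {
  const scrolls = window.filter((s) => s.kind === "scroll" && s.scrollY !== undefined);
  const clicks = window.filter((s) => s.kind === "click").length;

  let distance = 0;
  let reversals = 0;
  for (let i = 1; i < scrolls.length; i++) {
    const d = scrolls[i].scrollY! - scrolls[i - 1].scrollY!;
    distance += Math.abs(d); // total pixels travelled, regardless of direction
    // A reversal is a sign change in scroll direction ("did they keep bouncing?").
    if (i >= 2) {
      const prev = scrolls[i - 1].scrollY! - scrolls[i - 2].scrollY!;
      if (prev !== 0 && d !== 0 && Math.sign(prev) !== Math.sign(d)) reversals++;
    }
  }

  return {
    avgScrollVelocity: distance / (windowMs / 1000), // px/s over the window
    reversals,
    clicksPerMinute: clicks / (windowMs / 60000),
  };
}
```

A summary like this is small enough to interpolate straight into a prompt, which is what keeps the feedback loop gentle rather than heavy-handed.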
On the data side is a mix of services chosen for their operational characteristics. A streaming bus is great for telemetry because it decouples bursts in the browser from downstream consumers and makes it easy to add more processors later without changing producers. A blob/data lake is perfect for AI artifacts and session summaries—cheap, append-friendly, and easy to mine offline. Traditional relational storage stays available for reference and business data. The pattern is simple: events for motion, blobs for artifacts, tables for facts.

Real-time delivery is where the experience comes together. SignalR gives a straightforward way to push multiple message types to the right browser tab, with the notion of a session group to keep everything scoped. If the application ever needs to scale horizontally, we can add a backplane or a managed SignalR service so any instance can publish to any client group. Until then, the basic setup keeps operational overhead low.
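Session-group scoping is the load-bearing idea in that delivery model, so here is a tiny in-memory stand-in that mimics what SignalR groups do for us. This is not the SignalR API—`SessionHub`, `join`, and `publish` are illustrative names—but it shows the invariant: a message published to one session's group is never seen by another session's listeners.

```typescript
type Listener = (type: string, payload: unknown) => void;

class SessionHub {
  private groups = new Map<string, Set<Listener>>();

  // A browser tab joins the group named after its session id.
  join(sessionId: string, listener: Listener): void {
    if (!this.groups.has(sessionId)) this.groups.set(sessionId, new Set());
    this.groups.get(sessionId)!.add(listener);
  }

  // Publish to one session group only; other sessions never see the message.
  publish(sessionId: string, type: string, payload: unknown): void {
    for (const l of this.groups.get(sessionId) ?? []) l(type, payload);
  }
}
```

With a backplane, `publish` would fan out through shared infrastructure instead of local memory, which is exactly why the horizontal-scaling story is an add-on rather than a rewrite.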
None of these choices are free. Splitting the AI flow into four stages means more moving parts and more places something could go wrong; the payoff is fine-grained control and cleaner observability. Running the message and tool steps in parallel cuts perceived latency, but it forces me to think hard about idempotency and ordering—what happens if a planned tool arrives before the user’s stream finishes, or if the plan changes mid-flight? Using a streaming bus for telemetry is arguably overkill for a small app, but it future-proofs analytics. And centralizing prompts in a config service is wonderful for iteration, but it needs discipline around versioning and rollout to avoid “surprise” behavior shifts.
Over time, I could introduce a managed backplane for real-time at scale, adopt managed identities to shrink the key footprint, and add an asynchronous analytics job that materializes telemetry summaries to a relational store for dashboards. Prompt versioning with labeled environments would also be a wise addition, allowing A/B changes with confidence.
The through line in all of this is latency and control. We want the user to see something meaningful as fast as possible, and we want to be able to reason about the system when things get weird. That’s why we opted for small, explicit stages, a simple real-time channel, and infrastructure that separates hot paths (streams and prompts) from cold ones (artifacts and analysis). It’s not the only way to build a system like this—but it’s a pragmatic balance that’s been easy to operate and easy to evolve.