Libby Louis

Rambling thoughts from a curious engineer

Bias mitigation in semantic search

Retrieval-augmented search sounds clean: retrieve, then generate from what you retrieved. In practice, both steps are sensitive to what was in the query string. If a user’s wording encodes race, gender, age, religion, or similar dimensions—and the system blindly embeds that string, filters on it, or asks an LLM to paraphrase it—you get two problems at once: outcomes can track attributes you never meant to optimize for, and data you may not want in vectors or model prompts can flow into places that are hard to unwind.

In one of my current projects, built on a RAG architecture, I have gone through round after round of iteration and testing to make sure the codebase threads safety through the architecture: not a claim of perfect fairness, but an honest map of where decisions happen and what they try to prevent.


The design principle: separate “what the user typed” from “what retrieval is allowed to use”

The pipeline does not treat the raw query as a single blob that must flow unchanged into embedding and ranking. Instead it understands, labels, and moderates spans of text so that:

  • Default: language that reads as demographic is removed from the retrieval query (the string that drives semantic search), so embeddings are not silently steered by those terms.
  • Exceptions are explicit: if the same token is really a name (“Dr. Black”) or a literal value in the tenant catalog on a non-protected field, the system can keep or re-route it instead of bluntly deleting it.

That is the core mitigation: retrieval runs on a cleaned query, with structured side-channels only when policy says the data supports them.
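
The separation can be pictured as a small data structure. This is a minimal sketch, not the project's actual types; `RetrievalRequest` and its field names are hypothetical, chosen only to show that the raw string and the retrieval string live in different slots.

```python
from dataclasses import dataclass, field

# Hypothetical illustration: the raw query is kept for auditing, while a
# separate cleaned string is the only thing allowed to drive embedding.
@dataclass
class RetrievalRequest:
    raw_query: str                 # exactly what the user typed (audit/trace only)
    clean_query: str               # demographic spans removed; drives semantic search
    structured_filters: dict = field(default_factory=dict)  # policy-approved side channels

req = RetrievalRequest(
    raw_query="experienced female electricians near Austin",
    clean_query="experienced electricians near Austin",
)
```

The point of the shape is that nothing downstream can "accidentally" embed `raw_query`; it has to ask for `clean_query`.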


Ingestion: keep protected fields out of the searchable index body

There is a global, immutable list of protected attribute names (race, gender, ethnicity, sexual orientation, religion, disability, age). Those keys are not tenant-overridable.

Before content becomes embedding fodder, the stack validates that protected keys do not appear in the projected searchable view used for the index. Ingestion can reject documents that would leak those fields into the projection. The idea is simple: if it should not influence vectors, it must not be in the material you embed.

That is an architectural guardrail: you are not relying on the model to “ignore” sensitive columns; you never put them in the index projection in the first place.
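
A rough sketch of that guardrail, with an assumed attribute list and a hypothetical `validate_searchable_projection` helper (the real validation presumably runs inside the ingestion pipeline):

```python
# Global, immutable list of protected attribute names; not tenant-overridable.
PROTECTED_ATTRIBUTES = frozenset({
    "race", "gender", "ethnicity", "sexual_orientation",
    "religion", "disability", "age",
})

def validate_searchable_projection(projection: dict) -> dict:
    """Reject a document whose projected searchable view would leak
    protected keys into the index (and therefore into embeddings)."""
    leaked = PROTECTED_ATTRIBUTES & {k.lower() for k in projection}
    if leaked:
        raise ValueError(f"protected keys in searchable projection: {sorted(leaked)}")
    return projection

# A color field is fine; a gender field is rejected before it can be embedded.
ok = validate_searchable_projection({"title": "wool coat", "color": "charcoal"})
```
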


Understanding the query: intents, spans, and “protected” regions

Early layers parse the query into terms and phrases with character spans, labeling which fragments read as demographic versus domain phrases or entity candidates.

Entity resolution runs before span moderation and produces a protected span index: regions tied to resolved entities or name-like structure so that later steps do not accidentally strip or rewrite something that is doing legitimate work (e.g. disambiguating a person).

So moderation is not just “delete bad words”; it is span-aware and coordinated with who/what the query is about.
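
To make the protected span index concrete, here is a deliberately naive sketch. The real system resolves entities properly; this toy version only flags title-plus-surname patterns like "Dr. Black" so that later moderation knows not to strip them.

```python
import re

# Hypothetical helper: mark name-like regions (character spans) that
# moderation must leave alone, even if a token like "Black" would
# otherwise be classified as demographic.
TITLE_NAME = re.compile(r"\b(?:Dr|Mr|Ms|Mrs|Prof)\.?\s+[A-Z][a-z]+")

def protected_name_spans(query: str) -> list[tuple[int, int]]:
    return [(m.start(), m.end()) for m in TITLE_NAME.finditer(query)]

spans = protected_name_spans("papers by Dr. Black on fairness")
```

A real implementation would lean on entity resolution rather than regexes, but the output shape (spans, not words) is what matters: downstream steps compare character ranges, not tokens.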


Span moderation: the retrieval-facing safety layer

Span moderation builds a clean_query by deleting spans that policy says should not drive retrieval.

Typical case: a term classified as demographic is neutralized—removed from the string that goes into semantic search—so the retriever is not nudged by that language.

Two deliberate exceptions:

  1. Name-like context — e.g. a title before a surname, or capitalized token patterns that suggest a person name rather than a demographic category. Those tokens can stay in the retrieval path as entity-like signal so you do not break legitimate lookups.
  2. Data grounding — the system checks whether the term actually appears in the tenant’s indexed non-protected fields (with existence checks and sampling over payloads). If the only “hits” would be protected dimensions, the term is still not promoted to a filter. If it does match real catalog fields that are not on the protected list, the term can be turned into a structured pre-retrieval filter instead of free text in the embedding string—so you ground the request in declared data, not in open-ended semantic similarity to demographic wording.

The output of this layer is not just a string: you get suppressed terms, allowed terms, policy decisions for traceability, and optional pre-retrieval filters when grounding succeeded.
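
A compressed sketch of that output shape, assuming a toy demographic lexicon and whitespace tokenization (the actual system is span-based and far richer; `moderate` and its fields are illustrative names):

```python
from dataclasses import dataclass, field

@dataclass
class ModerationResult:
    clean_query: str
    suppressed: list = field(default_factory=list)   # terms removed from retrieval
    decisions: list = field(default_factory=list)    # policy reasons, for traceability
    filters: dict = field(default_factory=dict)      # grounded pre-retrieval filters

DEMOGRAPHIC_TERMS = {"female", "male", "elderly"}    # toy lexicon for the sketch

def moderate(query: str, catalog_fields: dict) -> ModerationResult:
    """catalog_fields maps a term to the non-protected tenant field it
    grounds to, if any (e.g. 'female' -> 'connector_gender' in an
    electronics catalog). Grounded terms become filters; the rest are
    simply suppressed from the retrieval string."""
    result = ModerationResult(clean_query=query)
    for term in query.split():
        t = term.lower()
        if t not in DEMOGRAPHIC_TERMS:
            continue
        grounded_field = catalog_fields.get(t)
        if grounded_field:
            result.filters[grounded_field] = t
            result.decisions.append((t, "grounded_filter"))
        else:
            result.decisions.append((t, "suppressed_demographic"))
        result.suppressed.append(term)
        words = result.clean_query.split()
        words.remove(term)
        result.clean_query = " ".join(words)
    return result
```

Note that even a grounded term leaves the embedding string; it is re-routed into a structured filter rather than left as free text for the retriever to interpret semantically.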


Tenant capabilities: what the data admits (computed at ingest)

Separately, the system infers per-tenant “data capabilities” from field names and values during ingestion—things like whether gender- or race-shaped fields appear in the corpus, with heuristics to avoid false positives (e.g. navy or charcoal in a product color field is treated as catalog vocabulary, not as evidence that the tenant is encoding person-level demographic categories in that column).

That capability object informs policy downstream so decisions are tied to what the tenant’s data actually contains, rather than every tenant getting a one-size-fits-all rule applied in a vacuum.
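
The color-field heuristic can be sketched like this, with assumed vocabularies and a hypothetical `field_encodes_race` helper; the real inference is presumably more statistical than a set intersection:

```python
RACE_LIKE = {"black", "white", "asian", "hispanic"}
COLOR_VOCAB = {"navy", "charcoal", "black", "white", "beige"}

def field_encodes_race(field_name: str, sampled_values: set) -> bool:
    """Does this tenant field look like person-level race data?"""
    values = {v.lower() for v in sampled_values}
    if not values & RACE_LIKE:
        return False
    # Heuristic: if the field name or its vocabulary reads as product
    # colors, treat 'black'/'white' as catalog vocabulary, not as
    # evidence of demographic encoding.
    if "color" in field_name.lower() or values <= COLOR_VOCAB:
        return False
    return True
```
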


Query understanding and constraints

Structured constraint extraction (from NLP/query understanding) can mark constraints with types like demographic or safety flags. The pipeline is written so demographic constraints are still visible to moderation even when the field name is not in the schema—so they do not bypass the safety path by accident. Only constraints that align with the tenant field catalog become search filters in the usual sense.

That keeps “we extracted it” and “we applied it to retrieval” as two different gates.
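
Those two gates might look like this in miniature (names are illustrative; the constraint types and tenant catalog lookup stand in for the real schema machinery):

```python
PROTECTED = {"gender", "race", "age"}

def route_constraints(extracted: list, tenant_fields: set):
    """Gate 1: every extracted constraint is visible to moderation.
    Gate 2: only non-protected, catalog-aligned constraints become
    actual search filters."""
    to_moderation, search_filters = [], {}
    for name, value in extracted:
        to_moderation.append((name, value))          # always surfaced to the safety path
        if name not in PROTECTED and name in tenant_fields:
            search_filters[name] = value             # applied to retrieval
    return to_moderation, search_filters
```

A `gender` constraint is extracted and seen by moderation, but never reaches the filter set, even if some tenant happened to have a field by that name.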


After retrieval: graph expansion and when to hold back

Relationship-aware expansion can walk the knowledge graph from high-scoring hits to pull in related entities. For some intent + moderation combinations—for example when a demographic signal was detected and the query was handled in a concept-style way—the pipeline skips that expansion so you do not amplify weak semantic matches into a broad graph walk on top of a sensitive query shape.

The point is to avoid a second stage of the stack compensating for a query you already decided to treat carefully.
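
The gate itself is trivial once the upstream signals exist; a sketch, with hypothetical intent labels:

```python
def should_expand_graph(intent: str, demographic_detected: bool) -> bool:
    """Hold back relationship-aware expansion for sensitive query shapes:
    a demographic signal combined with concept-style handling means weak
    semantic matches should not be amplified by a graph walk."""
    if demographic_detected and intent == "concept":
        return False
    return True
```
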


RAG summarization: what the LLM sees

When a summary is generated over retrieved records, payloads sent to the LLM are scrubbed: protected attribute keys (both global and document-level) are stripped before the model sees them.

The summary prompt is also conditional: when demographic intent was detected and terms were suppressed for retrieval, instructions tell the model to summarize as if that wording were never part of the query and not to group or characterize results along demographic dimensions.

So safety is not only pre-retrieval; the generation step is explicitly aligned with the moderation outcome.
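
Both halves of that generation-time guard can be sketched together; the key names and instruction wording here are assumptions, not the project's actual prompt:

```python
GLOBAL_PROTECTED = {"race", "gender", "ethnicity", "religion", "disability", "age"}

def scrub_payload(payload: dict, doc_protected: frozenset = frozenset()) -> dict:
    """Strip global and document-level protected keys before the LLM sees them."""
    blocked = GLOBAL_PROTECTED | set(doc_protected)
    return {k: v for k, v in payload.items() if k.lower() not in blocked}

def summary_instructions(terms_suppressed: bool) -> str:
    """Condition the summary prompt on the moderation outcome."""
    base = "Summarize the retrieved records."
    if terms_suppressed:
        base += (" Treat the query as if the suppressed wording were never"
                 " present, and do not group or characterize results along"
                 " demographic dimensions.")
    return base
```
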


What this is and is not

This is: a layered approach—index projection, intent and span analysis, deterministic moderation, data grounding, conditional graph expansion, and prompt-level guards—so that sensitive language does not silently become embedding signal, unvetted filters, or LLM instructions by default.

This is not: proof that every edge case is solved, or that no tenant policy could ever allow stricter/weaker behavior where settings exist. It is an architecture that treats bias and safety as pipeline concerns, with traceable decisions (suppressed spans, policy reasons, grounded filters) rather than a single hidden knob.

Takeaway

The retrieval string and the model context are both attack surfaces for unwanted correlation. This stack treats them as first-class pipeline stages you design, test, and observe—not as accidents of whatever the user typed.
