Mistral Moderation

API to detect harmful text and PII in chat inputs and outputs

About

Mistral Moderation is a classifier API that detects harmful content categories such as hate, violence, and PII in raw text or conversations. It also offers a safe prompt: a system prompt that steers model behavior to reduce unsafe generation.
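
As a rough illustration of how a client might call a classifier API like this for raw text, here is a minimal Python sketch using only the standard library. The endpoint URL, the model name, the MISTRAL_API_KEY environment variable, the request and response shapes, and the 0.5 threshold are all assumptions made for the example, not details taken from this page; check the official API reference before relying on any of them. The helper names (build_payload, flagged_categories, moderate) are hypothetical.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name; verify both against the official API reference.
MODERATION_URL = "https://api.mistral.ai/v1/moderations"
DEFAULT_MODEL = "mistral-moderation-latest"


def build_payload(texts, model=DEFAULT_MODEL):
    """Build the JSON body for a raw-text moderation request (assumed shape)."""
    return {"model": model, "input": list(texts)}


def flagged_categories(result, threshold=0.5):
    """Return the sorted category names whose score meets the threshold.

    `result` is assumed to look like {"category_scores": {"pii": 0.93, ...}}.
    """
    scores = result.get("category_scores", {})
    return sorted(name for name, score in scores.items() if score >= threshold)


def moderate(texts, api_key=None):
    """Send the request and return the parsed JSON response (needs network access)."""
    api_key = api_key or os.environ["MISTRAL_API_KEY"]
    request = urllib.request.Request(
        MODERATION_URL,
        data=json.dumps(build_payload(texts)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping the threshold on the client side lets an application be stricter for some categories (for example PII) than for others, instead of relying on a single fixed cut-off.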