Mistral Moderation
API to detect harmful text and PII in chat inputs and outputs
About
Mistral Moderation is a classifier API that detects harmful content categories such as hate, violence, and PII in raw text or in chat conversations. It also offers a safe_prompt option that prepends a guardrail system prompt to steer model behavior and reduce unsafe generations.
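A minimal sketch of calling the raw-text moderation endpoint over plain HTTP. The endpoint path, payload shape, and category names here are assumptions based on Mistral's public docs; `build_moderation_request` is a hypothetical helper for illustration, and the network call only runs when a `MISTRAL_API_KEY` environment variable is set.

```python
import json
import os
import urllib.request

# Raw-text moderation endpoint (assumed path; check current Mistral docs).
API_URL = "https://api.mistral.ai/v1/moderations"


def build_moderation_request(texts, model="mistral-moderation-latest"):
    """Build the JSON payload for a raw-text moderation call (hypothetical helper)."""
    return {"model": model, "input": texts}


payload = build_moderation_request(["Example text to screen for harmful content."])
print(json.dumps(payload))

api_key = os.environ.get("MISTRAL_API_KEY")
if api_key:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
        # The response is expected to carry per-category flags/scores,
        # e.g. result["results"][0]["categories"] (shape assumed).
        print(result)
```

For chat inputs and outputs, the same idea applies with a conversational endpoint that accepts a list of role/content messages instead of raw strings.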