Prompt-Attack Detection

Secure60 includes detection rules that identify prompt-based attacks against AI systems in your environment, including prompt injection, jailbreak attempts, and prompt leakage. These rules operate within the existing Threat Detection pillar: they use the same rules engine, the same alert pipeline, and the same console.


What It Detects

Prompt Injection

Attempts to override an AI system’s instructions by injecting malicious content into user inputs, uploaded documents, or API calls. Detection covers both direct injection (user submits a crafted prompt) and indirect injection (malicious instructions embedded in data the AI processes).
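The sketch below illustrates pattern-based injection detection. It is not Secure60's rule syntax or implementation. The function `detect_injection`, the `Finding` type, the source labels, and the phrase list are all hypothetical. The `source` argument shows how the same check can separate the two cases. Text from the user's own prompt is treated as direct injection. Text from documents, tool output, or other data the AI processes is treated as indirect injection.

```python
import re
from dataclasses import dataclass

# Hypothetical example phrases; a production rule set would be far broader.
INJECTION_PATTERNS = [
    re.compile(r"\bignore\s+(all\s+)?(previous|prior|above)\s+instructions\b", re.I),
    re.compile(r"\bdisregard\s+(the\s+)?system\s+prompt\b", re.I),
    re.compile(r"\byou\s+are\s+now\s+in\s+\w+\s+mode\b", re.I),
]


@dataclass
class Finding:
    kind: str    # "direct" or "indirect"
    source: str  # where the scanned text came from (hypothetical labels)
    match: str   # the substring that matched


def detect_injection(text: str, source: str) -> list[Finding]:
    """Scan one piece of text for injection phrases.

    Text from the user's own prompt ("user_prompt") is treated as direct
    injection; text from any other source (documents, tool output, API
    payloads) is treated as indirect injection.
    """
    kind = "direct" if source == "user_prompt" else "indirect"
    return [
        Finding(kind, source, m.group(0))
        for pattern in INJECTION_PATTERNS
        for m in pattern.finditer(text)
    ]
```

For example, `detect_injection("Footer: disregard the system prompt.", "uploaded_document")` returns one finding of kind `"indirect"`. The same phrase typed directly by a user would be reported as `"direct"`.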

Jailbreak Attempts

Patterns designed to bypass AI safety constraints, such as role-playing techniques, encoding tricks, instruction override attempts, and known jailbreak templates.
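Encoding tricks hide a jailbreak phrase from simple string matching. The following minimal sketch shows one countermeasure. It decodes any base64-looking segments and scans the decoded text with the same patterns. Base64 is only an example encoding here. The phrase list and function names are hypothetical, and a real rule set would cover more encodings and templates.

```python
import base64
import binascii
import re

# Hypothetical markers of known jailbreak templates and role-play framing.
JAILBREAK_PATTERNS = [
    re.compile(r"\bdo\s+anything\s+now\b", re.I),
    re.compile(r"\bdeveloper\s+mode\b", re.I),
    re.compile(r"\bpretend\s+(that\s+)?you\s+have\s+no\s+(rules|restrictions)\b", re.I),
]

# Runs of the base64 alphabet long enough to hide a short instruction.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decoded_variants(text: str) -> list[str]:
    """Return the original text plus any base64 segments that decode to UTF-8."""
    variants = [text]
    for token in B64_TOKEN.findall(text):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid base64, or not text once decoded; skip it
        variants.append(decoded)
    return variants


def detect_jailbreak(text: str) -> list[str]:
    """Return matched jailbreak phrases, including ones hidden in base64."""
    hits = []
    for variant in decoded_variants(text):
        for pattern in JAILBREAK_PATTERNS:
            hits.extend(m.group(0) for m in pattern.finditer(variant))
    return hits
```

Scanning the decoded variants with the same pattern list keeps a single source of truth for the phrases. Adding another encoding then only means adding another decoder to `decoded_variants`.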

Prompt Leakage

Attempts to extract the system prompt or internal instructions from an AI deployment. Detects common extraction techniques and monitors for responses that contain system-level content.
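Prompt leakage can be checked on both sides of an interaction. The request can be scanned for extraction phrasing. The response can be compared against the system prompt. This rough sketch assumes the system prompt is available to the detection logic. It uses word 5-gram overlap as a stand-in measure of "system-level content" in a response. That measure is a swapped-in technique, not necessarily what Secure60 uses. The patterns, function names, and the 0.3 threshold are hypothetical.

```python
import re

# Hypothetical request-side phrasings used to extract a system prompt.
EXTRACTION_PATTERNS = [
    re.compile(r"\b(repeat|print|show|reveal)\b.{0,40}\b(system\s+prompt|initial\s+instructions)\b", re.I),
    re.compile(r"\bwhat\s+(are|were)\s+your\s+(original\s+)?instructions\b", re.I),
]


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Lower-cased word n-grams, used to measure verbatim overlap."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def leakage_ratio(system_prompt: str, response: str, n: int = 5) -> float:
    """Fraction of the system prompt's n-grams that reappear in the response."""
    prompt_grams = shingles(system_prompt, n)
    if not prompt_grams:
        return 0.0
    return len(prompt_grams & shingles(response, n)) / len(prompt_grams)


def detect_leakage(system_prompt: str, user_input: str, response: str,
                   threshold: float = 0.3) -> dict[str, bool]:
    """Report an extraction attempt (request side) and a leak (response side)."""
    return {
        "extraction_attempt": any(p.search(user_input) for p in EXTRACTION_PATTERNS),
        "response_leak": leakage_ratio(system_prompt, response) >= threshold,
    }
```

The two flags are kept separate on purpose. An extraction attempt that fails is still worth recording. A leak can also occur without any recognisable extraction phrasing, for example through an indirect injection.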


How It Works

Prompt-attack detection rules evaluate log data from your AI systems. For detection to work, your AI applications need to send their interaction logs to Secure60. This can be done via:

Once logs are flowing, Secure60’s managed detection rules evaluate each interaction against known attack patterns and behavioural indicators.
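Per-interaction pattern matching can be combined with a behavioural indicator, as in the following sketch. This is a hypothetical illustration only. The log record fields (`user_id`, `timestamp`, `prompt`), the single example pattern, and the "repeated attempts within a time window" indicator are assumptions chosen for clarity. They are not Secure60's log schema or managed rule logic.

```python
import re
from collections import defaultdict, deque

# Hypothetical single known-attack pattern, standing in for a full rule set.
KNOWN_PATTERN = re.compile(r"\bignore\s+(all\s+)?previous\s+instructions\b", re.I)


class RepeatedAttemptTracker:
    """Behavioural indicator: flag a user who triggers pattern matches
    `limit` or more times within `window_s` seconds."""

    def __init__(self, limit: int = 3, window_s: float = 300.0):
        self.limit = limit
        self.window_s = window_s
        self._hits: dict[str, deque] = defaultdict(deque)

    def record(self, user_id: str, timestamp: float, matched: bool) -> bool:
        """Record one evaluated interaction; return True if the user is at
        or above the limit after this interaction."""
        hits = self._hits[user_id]
        # Drop earlier matches that have fallen out of the time window.
        while hits and timestamp - hits[0] > self.window_s:
            hits.popleft()
        if not matched:
            return False
        hits.append(timestamp)
        return len(hits) >= self.limit


def evaluate(record: dict, tracker: RepeatedAttemptTracker) -> list[str]:
    """Evaluate one interaction log record; return alert labels (hypothetical)."""
    alerts = []
    matched = bool(KNOWN_PATTERN.search(record["prompt"]))
    if matched:
        alerts.append("pattern:prompt_injection")
    if tracker.record(record["user_id"], record["timestamp"], matched):
        alerts.append("behaviour:repeated_attempts")
    return alerts
```

A single match raises only a pattern alert. Repeated matches from the same user in a short window add a behavioural alert, which can surface probing that individual detections would understate.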


Configuration

Prompt-attack detection rules are managed by Secure60 and included in the managed rule library. To enable them:

  1. Ensure your AI interaction logs are being ingested into Secure60.
  2. Work with Secure60 to enable the AI-specific detection rules for your environment.
  3. Review initial detections and tune thresholds based on your traffic patterns.

Tuning options include adjusting sensitivity thresholds, whitelisting known-safe patterns specific to your applications, and configuring response actions for confirmed detections.
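The following hypothetical sketch shows how the three tuning options might interact for a single rule. The options are a sensitivity threshold on a detection score, an allowlist of known-safe patterns (the whitelisting described above), and a configured response action. The `RuleTuning` fields, the default values, and the action names are illustrative assumptions, not Secure60 configuration keys.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RuleTuning:
    """Hypothetical per-rule tuning settings."""
    threshold: float = 0.7  # minimum score that counts as a detection
    allowlist: list[str] = field(default_factory=list)  # regexes for known-safe inputs
    action: str = "alert"   # illustrative response action, e.g. "alert" or "block"


def decide(score: float, text: str, tuning: RuleTuning) -> Optional[str]:
    """Return the configured action for a detection, or None when the score
    is below the threshold or the text matches an allowlisted pattern."""
    if score < tuning.threshold:
        return None
    if any(re.search(pattern, text, re.I) for pattern in tuning.allowlist):
        return None
    return tuning.action
```

Checking the threshold before the allowlist keeps the common low-score path cheap. The allowlist then only suppresses detections that would otherwise fire, which makes its effect easy to audit during the initial review of detections.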


Getting Started

Contact Secure60 to discuss how your AI systems are deployed and what logging is available. The team will help you set up the right integrations and enable the appropriate detection rules.
