Ignore Previous Instructions

This technique is a form of prompt injection that allows users to override the model's prior directives or constraints. By explicitly instructing the model to disregard any previous commands or context, users can manipulate the model's behavior to produce desired outputs that may not align with its original programming. This technique often requires precise wording, such as stating "Ignore previous instructions" followed by new commands. It is similar to SQL injection in that it exploits the model's inability to differentiate between trusted and untrusted inputs. This method can be particularly effective in scenarios where the model has been restricted from discussing certain topics or generating specific types of content, enabling users to bypass these limitations and elicit responses that would typically be filtered out.

Strategy: Prompt Injection

This technique enables attackers to override original instructions and employed controls by crafting specific wording of instructions, often resembling SQL injection methods, to manipulate the model's behavior.

Category: Language

This category focuses on the use of specific linguistic techniques, such as prompt injection or stylization, to influence the model's output.