Transformer Translatable Tokens
This technique involves using specific tokens that are compatible with transformer models, allowing users to craft inputs that the model can process in unique ways. By leveraging the way transformers tokenize and interpret language, attackers can create prompts that exploit the model's architecture, leading to unexpected or undesired outputs. This method capitalizes on the intricacies of how language models handle tokenization and instruction parsing.
Strategy: Code and Encode
This strategy encompasses techniques that utilize various coding methods, such as Base64 or ROT13, to bypass model restrictions and manipulate outputs.Category: Language
This category focuses on the use of specific linguistic techniques, such as prompt injection or stylization, to influence the model's output.