Security boundary
Information Disclosure
Information Disclosure refers to scenarios where a generative AI model inadvertently reveals sensitive information to the user. This can include personal data, proprietary secrets, passwords, or other restricted details the model was trained on or has access to at inference time. Although the model is typically guided by policies and filters that keep disallowed content out of its responses, subtle prompts or cleverly disguised questions can still lead it to divulge protected information.
Such disclosures can harm both individuals (e.g., leaking personal identifiers or medical records) and organizations (e.g., exposing internal policies, trade secrets, or authentication keys). The attacker benefits by obtaining data that should have remained confidential.
Example:
Consider a customer support chatbot trained on a database of private user profiles to provide personalized assistance. An attacker probes it with carefully crafted questions, such as, “What was the last complaint made by the user with email X?” If the chatbot’s safeguards fail, it might reveal details about that user’s transactions, personal preferences, or even partial credit card information. Such a disclosure compromises user privacy, damages trust, and may expose the operator to legal liability.
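One common partial mitigation for the scenario above is an output filter that scans the model’s response for sensitive patterns before it reaches the user. The sketch below shows a minimal, regex-based redaction pass; the pattern names and coverage are illustrative assumptions, not a complete PII taxonomy, and such a filter complements rather than replaces proper access controls on the underlying data.

```python
import re

# Illustrative PII patterns (assumptions for this sketch): email addresses
# and card-like runs of 13-16 digits with optional space/dash separators.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}

def redact(response: str) -> str:
    """Replace each matched sensitive span with a [REDACTED:<label>] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        response = pattern.sub(f"[REDACTED:{label}]", response)
    return response

print(redact("Card 4111 1111 1111 1111 on file; contact jane@example.com"))
```

A filter like this is a last line of defense: it reduces accidental leakage in model output but can be bypassed by encodings or paraphrase, so the stronger control is to keep sensitive records out of the model’s context in the first place.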