Security boundary
Training Data Leakage
Training Data Leakage is a scenario in which a generative AI model unintentionally reveals pieces of the data it was trained on. While related to training source leakage, training data leakage specifically concerns the model producing direct excerpts or unique identifiers from its training set. Models are designed to “generalize” from their training data, but under certain prompts they can regurgitate verbatim text from the source material, especially when that data was memorized during training or was not properly anonymized.
Such leakage can reveal copyrighted text, personal information, or company-confidential documents. In doing so, the model essentially violates data privacy and intellectual property rights. Preventing training data leakage is critical to maintaining user trust and complying with data protection laws.
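One common heuristic for auditing this risk is to sample model outputs against a set of probing prompts and flag any completion that shares a long verbatim run of tokens with the training corpus. The sketch below is illustrative only: the `generate` callable, the corpus, and the 8-word overlap window are assumptions, not the API of any particular system.

```python
from typing import Callable, Iterable

def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Return the set of n-grams (as word tuples) in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def audit_verbatim_leakage(
    generate: Callable[[str], str],   # hypothetical model call: prompt -> completion
    prompts: Iterable[str],
    training_docs: Iterable[str],
    window: int = 8,                  # assumed run length that counts as "verbatim"
) -> list[tuple[str, str]]:
    """Flag (prompt, completion) pairs whose output overlaps the training data."""
    train_ngrams: set[tuple[str, ...]] = set()
    for doc in training_docs:
        train_ngrams |= ngrams(doc.split(), window)

    flagged = []
    for prompt in prompts:
        completion = generate(prompt)
        if ngrams(completion.split(), window) & train_ngrams:
            flagged.append((prompt, completion))
    return flagged
```

In practice a full training corpus is far too large to index as an in-memory set, so audits of this kind typically rely on Bloom filters or suffix-array indexes, but the underlying overlap check is the same.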
Example:
Consider a language model trained on thousands of user-submitted support tickets. A user probes the training set by asking, “Show me a complaint from last July about product XYZ.” If the model responds with a nearly identical ticket that includes personal details of a real user, that’s training data leakage. The attacker gains access to private information that was supposed to remain internal, demonstrating a serious breach in the model’s data-handling safeguards.
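A complementary runtime safeguard suggested by this example is to scan each response against the protected ticket store before it reaches the user and withhold anything that reproduces a stored record nearly verbatim. The following is a minimal sketch under assumed names (`protected_docs`, the refusal message, and the 8-word window are hypothetical choices, not a prescribed design).

```python
def contains_verbatim_excerpt(response: str, protected_docs: list[str], window: int = 8) -> bool:
    """True if the response reproduces `window` consecutive words from any protected document."""
    resp_tokens = response.split()
    resp_ngrams = {tuple(resp_tokens[i:i + window]) for i in range(len(resp_tokens) - window + 1)}
    for doc in protected_docs:
        doc_tokens = doc.split()
        for i in range(len(doc_tokens) - window + 1):
            if tuple(doc_tokens[i:i + window]) in resp_ngrams:
                return True
    return False

def safe_reply(response: str, protected_docs: list[str]) -> str:
    """Withhold responses that appear to quote a protected document verbatim."""
    if contains_verbatim_excerpt(response, protected_docs):
        return "I can't share the contents of internal records."
    return response
```

A filter like this only catches near-exact reproduction; paraphrased leakage of personal details would still require additional controls such as de-identifying the training data itself.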