Security boundary
Training Source Leakage
Training Source Leakage involves revealing the origin, content, or other sensitive details of the proprietary datasets used to train a generative AI model. Organizations often use large, curated corpora, which can include copyrighted works, private documents, or proprietary research. These datasets are generally not meant to be exposed, as they contain valuable intellectual property and potentially sensitive information.
Attackers pursue training source leakage to uncover unique texts, internal company documents, or other private data that inform the model’s capabilities. They exploit subtle model behaviors, prompting the model with carefully chosen queries to reproduce fragments of training text or to confirm whether specific data was present in the corpus. Such attacks can expose the organization to legal liability (e.g., copyright infringement), reputational damage, and competitive disadvantage.
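As a rough illustration of this kind of probing, the sketch below feeds the model a prefix from a candidate document and measures how closely its continuation matches the real text; near-verbatim overlap suggests the passage was memorized from training data. The generate() helper, the sample prefix, and the threshold are hypothetical placeholders, not part of any real API, so treat this as an outline of an extraction test rather than a working attack.

```python
# Minimal sketch of a verbatim-extraction probe.
# Assumption: generate(prompt) wraps the target model's completion endpoint.
from difflib import SequenceMatcher


def generate(prompt: str) -> str:
    """Placeholder for the target model's completion call."""
    raise NotImplementedError("wire this to the model under test")


def extraction_score(prefix: str, true_continuation: str) -> float:
    """Prompt the model with a prefix from a candidate document and score
    how closely its continuation matches the genuine text. Scores near 1.0
    suggest the passage was memorized during training."""
    completion = generate(prefix)
    return SequenceMatcher(None, completion, true_continuation).ratio()


# Illustrative usage (the snippet below is invented for the example):
# score = extraction_score("It was a quiet morning in the village when",
#                          "the bells of the old chapel began to ring.")
# if score > 0.8:
#     print("Possible training source leakage: near-verbatim reproduction.")
```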
Example:
Imagine a company trains its generative model on an exclusive, unpublished manuscript from a famous author, intending it to inspire marketing copy. An attacker notices that the model occasionally reproduces suspiciously literary passages when asked about obscure topics. The attacker tailors their prompts (perhaps asking for “a unique excerpt about a character in a hidden story”) until the model reveals entire paragraphs of the manuscript. The attacker now possesses valuable intellectual property that was never meant to be public, achieving training source leakage and compromising the company’s trade secrets.
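A simplified version of the attacker’s iterative probing could look like the following sketch, which repeatedly samples themed prompts and flags passages that recur verbatim across independent completions, since exact repetition is a common sign of memorized text. The sample() helper and the probe prompts are assumptions made purely for illustration.

```python
# Illustrative probing loop: flag completions that recur verbatim across
# many independent samples, a signal that the text may be memorized.
# Assumption: sample(prompt) performs a high-temperature completion call
# against the deployed model.
from collections import Counter


def sample(prompt: str) -> str:
    """Placeholder for a completion call to the target model."""
    raise NotImplementedError("wire this to the model under test")


def find_repeated_passages(prompts, runs_per_prompt=20, min_count=3):
    """Collect completions for each probe prompt and return passages that
    appear verbatim at least min_count times across independent samples."""
    counts = Counter()
    for prompt in prompts:
        for _ in range(runs_per_prompt):
            counts[sample(prompt).strip()] += 1
    return [text for text, n in counts.items() if n >= min_count]


# probes = ["Share a unique excerpt about a character in a hidden story.",
#           "Quote an obscure passage about the same character."]
# for passage in find_repeated_passages(probes):
#     print("Candidate memorized passage:", passage[:120])
```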