Blackbox extraction of secrets from deep learning models

Fascinating paper: “The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets”, Nicholas Carlini, Chang Liu, Jernej Kos, Úlfar Erlingsson, Dawn Song at

It turns out that your model memorizes the secrets in its training data, even if the model is a lot smaller than the training data itself. My jaw fell to the ground right here:

“The fact that models completely memorize secrets in the training data is completely unexpected: our language model is only 600KB when compressed, and the PTB dataset is 1.7MB when compressed. Assuming that the PTB dataset can not be compressed significantly more than this, it is therefore information-theoretically impossible for the model to have memorized all training data—it simply does not have enough capacity with only 600KB of weights. Despite this, when we repeat our experiment and train this language model multiple times, the inserted secret is the most likely 80% of the time (and in the remaining times the secret is always within the top 10 most likely). At present we are unable to fully explain the reason this occurs. We conjecture that the model learns a lossy compression of the training data on which it is forced to learn and generalize. But since secrets are random, incompressible parts of the training data, no such force prevents the model from simply memorizing their exact details.”
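To make "the inserted secret is the most likely" concrete, here is a minimal sketch of the kind of ranking test the quote describes: insert a random secret into the training text, then ask the trained model to score every possible candidate and see where the real secret lands. Everything in it is an assumption for illustration only: the toy corpus, the "the pin is" prefix, the 3-digit secret, and above all the model, which is a smoothed character-bigram counter standing in for the paper's neural language model.

```python
import math
from collections import Counter
from itertools import product


def train_bigram(text):
    """Count character bigrams and how often each character occurs as a context."""
    pair_counts = Counter(zip(text, text[1:]))
    context_counts = Counter(text[:-1])
    return pair_counts, context_counts


def log_prob(model, vocab_size, context, candidate):
    """Add-one-smoothed log-likelihood of `candidate` when it follows `context`."""
    pair_counts, context_counts = model
    total = 0.0
    prev = context[-1]
    for ch in candidate:
        numerator = pair_counts[(prev, ch)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
        prev = ch
    return total


def secret_rank(model, vocab_size, context, secret, candidates):
    """Rank of the secret among all candidates (1 = the model's most likely guess)."""
    secret_score = log_prob(model, vocab_size, context, secret)
    better = sum(
        1 for c in candidates
        if log_prob(model, vocab_size, context, c) > secret_score
    )
    return better + 1


# Hypothetical toy data: a digit-free corpus with one inserted secret.
CORPUS = "the quick brown fox jumps over the lazy dog . " * 20
PREFIX = "the pin is "
SECRET = "481"
training_text = CORPUS + PREFIX + SECRET + " . " + CORPUS

vocab_size = len(set(training_text) | set("0123456789"))
model = train_bigram(training_text)

# Black-box style query: score every 3-digit candidate after the known prefix.
candidates = ["".join(digits) for digits in product("0123456789", repeat=3)]
rank = secret_rank(model, vocab_size, PREFIX, SECRET, candidates)
print(f"secret ranks {rank} of {len(candidates)} candidates")
```

On this toy data the secret comes out at rank 1: it appears only once in the training text, yet the digit transitions it contributed are enough to put it ahead of the other 999 candidates. That says nothing about why a neural network does the same, but it shows how the test works using only the model's output scores.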