ContextLeak: Auditing Leakage in Private In-Context Learning
ContextLeak is an auditing framework for empirically measuring information leakage in private in-context learning methods.
The project studies whether sensitive information contained in in-context examples can be leaked through model outputs, even when privacy-preserving mechanisms or heuristic defenses are applied. We use canary insertion and targeted adversarial queries to evaluate leakage across different mechanisms and model settings.
Keywords: Large language models, privacy auditing, in-context learning, differential privacy, trustworthy AI
Links: arXiv
