Shuying Cao

From General Capability to Situated Reliability.

Benchmarks test what models can answer; my work studies how foundation models behave in real contexts: how they infer intent, protect private information, and make reliable choices among valid options.

M.S. in Computer Science, University of Southern California
B.Eng. in Geodesy and Geomatics Engineering, Wuhan University

Visiting Student Researcher, Stanford University | Stanford, CA | shuyingc@stanford.edu

About

I am an M.S. student in Computer Science at the University of Southern California, advised by Prof. Sai Praneeth Karimireddy, and a Visiting Student Researcher at Stanford University, advised by Prof. Michael Zeineh.

My research studies foundation models beyond benchmark correctness, focusing on how they infer user intent, expose or protect private context, and choose among multiple valid responses in real-world settings. I develop empirical auditing and evaluation methods for trustworthy and human-centered AI, with applications in privacy-preserving in-context learning, LLM generation behavior, persona systems, and medical AI agents.

Previously, I received my B.Eng. in Geodesy and Geomatics Engineering from Wuhan University, advised by Prof. Zhenzhong Chen. I was also a visiting student at UC Berkeley, where I worked with Prof. Joseph E. Gonzalez and his former Ph.D. student Tianjun Zhang on visual-capable chatbot systems based on instruction-tuned language models.

Trustworthy LLMs

Privacy auditing, leakage evaluation, and reliability under deployment constraints.

LLM Behavior & Diversity

Understanding how models choose among multiple valid outputs, and why output diversity collapses during generation.

Human-Centered AI

Studying user intent, AI personas, and medical AI agents in human-facing settings.

Publications

* = equal contribution.

ContextLeak: Auditing Leakage in Private In-Context Learning Methods

Jacob Choi*, Shuying Cao*, Xingjian Dong*, Amin Banayeeanzade, Wang Bill Zhu, Robin Jia, and Sai Praneeth Karimireddy

arXiv preprint arXiv:2512.16059; ACL 2025 L2M2 Workshop

ContextLeak is an empirical auditing framework for measuring information leakage in private in-context learning methods using canary insertion and targeted adversarial queries.

Projects

ContextLeak

An auditing framework for empirically measuring information leakage in private in-context learning methods.

Diversity in Language Model Generation

Studying how large language models select from multiple valid outputs and why diversity collapses in generation.

SoulSoul: LLM Persona Platform

A platform for creating, editing, and sharing persistent AI personas with memory and interaction boundaries.

Experience

Stanford University

Visiting Student Researcher

June 2026 - Present

University of Southern California

M.S. Computer Science

Sept 2024 - Dec 2026

UC Berkeley

Visiting Student; worked on visual-capable chatbot systems based on instruction-tuned language models.

Jan 2023 - July 2023

Education

University of Southern California

M.S. in Computer Science

Sept 2024 - Dec 2026

Wuhan University

B.Eng. in Geodesy and Geomatics Engineering

Sept 2020 - June 2024

Skills

Research Areas

Trustworthy AI, privacy auditing, LLM behavior, human-centered AI, medical AI agents.

Methods

Empirical auditing, canary insertion, targeted adversarial queries, model behavior analysis.

Systems

Large language models, instruction-tuned systems, visual-capable chatbots, AI persona platforms.