Research Overview
Robustness of Theory of Mind in Large Language Models
Evaluating whether LLMs genuinely reason about mental states or rely on memorized patterns and shallow heuristics, across 750 items, 3 models, and 3 prompting conditions.
Three Hypotheses
H1 Memorization
Do models memorize specific ToM patterns from training data, or can they generalize to novel surface forms?
Key Metric: GR = NSF / Baseline, the model's accuracy on novel-surface-form (NSF) items divided by its accuracy on matched baseline items (see the sketch after these cards).
H2 Shallow Heuristics
Do models rely on trigger words and surface-level cues rather than genuinely tracking causal belief changes?
Key Metrics: TDS, APER
H3 Approximative Reasoning
Do models approximate mental state reasoning or genuinely track beliefs across domain shifts?
Key Metric: DTS = DT / Baseline, the model's accuracy on domain-transferred (DT) items divided by its accuracy on matched baseline items.
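How the ratio metrics fit together: a minimal sketch, assuming GR and DTS are simple accuracy ratios between a perturbed item set and its matched baseline set. The function names, variable names, and toy data below are illustrative, not taken from the study's codebase.

```python
def accuracy(preds, labels):
    """Fraction of items answered correctly."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def robustness_ratio(perturbed_acc, baseline_acc):
    """Ratio metric: 1.0 means no degradation under the perturbation."""
    return perturbed_acc / baseline_acc

# Toy predictions and gold labels, for illustration only.
base_preds, base_labels = ["A", "B", "A", "C"], ["A", "B", "A", "A"]  # acc 0.75
nsf_preds,  nsf_labels  = ["A", "B", "C", "C"], ["A", "B", "A", "A"]  # acc 0.50
dt_preds,   dt_labels   = ["A", "B", "A", "A"], ["A", "B", "A", "A"]  # acc 1.00

# GR (H1): accuracy on novel-surface-form items relative to baseline.
gr = robustness_ratio(accuracy(nsf_preds, nsf_labels),
                      accuracy(base_preds, base_labels))

# DTS (H3): accuracy on domain-transferred items relative to baseline.
dts = robustness_ratio(accuracy(dt_preds, dt_labels),
                       accuracy(base_preds, base_labels))

print(f"GR = {gr:.2f}, DTS = {dts:.2f}")  # GR = 0.67, DTS = 1.33
```

Under this reading, a ratio near 1.0 means accuracy is preserved under the perturbation, while a ratio well below 1.0 points to reliance on memorized surface forms (H1) or a failure to track beliefs across domain shifts (H3).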