Research Overview

Robustness of Theory of Mind in Large Language Models

Evaluating whether LLMs genuinely reason about mental states or merely rely on memorized patterns and shallow heuristics, tested across 750 items, 3 models, and 3 prompting conditions.

Three Hypotheses

H1 Memorization

Do models memorize specific ToM patterns from training data, or can they generalize to novel surface forms?

Key Metric: GR = NSF / Baseline
H2 Shallow Heuristics

Do models rely on trigger words and surface-level cues rather than genuinely tracking causal belief changes?

Key Metrics: TDS, APER
H3 Approximative Reasoning

Do models merely approximate mental-state reasoning, or do they genuinely track beliefs across domain shifts?

Key Metric: DTS = DT / Baseline
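Both ratio metrics above share the same form: accuracy on a perturbed item set divided by accuracy on the matched baseline set. A minimal sketch, assuming NSF denotes accuracy on novel-surface-form items (H1) and DT denotes accuracy on domain-transfer items (H3); the function names and the item-level data below are illustrative, not part of the study:

```python
def accuracy(results):
    """Fraction of items answered correctly; results is a list of booleans."""
    return sum(results) / len(results)

def ratio_metric(variant_results, baseline_results):
    """Generic robustness ratio: variant accuracy over baseline accuracy.

    GR  = accuracy(novel surface forms) / accuracy(baseline)
    DTS = accuracy(domain-transfer items) / accuracy(baseline)

    A value near 1.0 suggests the capability survives the perturbation;
    a value well below 1.0 suggests reliance on memorized patterns (GR)
    or failure to transfer belief tracking across domains (DTS).
    """
    return accuracy(variant_results) / accuracy(baseline_results)

# Fabricated item-level correctness, for illustration only:
baseline = [True] * 9 + [False]      # 90% accuracy on baseline items
novel_sf = [True] * 6 + [False] * 4  # 60% accuracy on novel surface forms

gr = ratio_metric(novel_sf, baseline)
print(f"GR = {gr:.2f}")  # 0.60 / 0.90 -> GR = 0.67
```

The same `ratio_metric` call with domain-transfer results in place of `novel_sf` yields DTS.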