IntentionNav: A Benchmark for Intent-Driven Object Navigation from Implicit Human Instruction

Lin Qian1, Shijie Li2, Sihao Lin3, Xuan Zhang4, Bangya Liu4, Yanran Li5, Hujun Yin1
1University of Manchester, 2A*STAR, 3University of Adelaide, 42077, 5University of Bedfordshire

Abstract

Existing object navigation benchmarks usually provide an embodied agent with an explicit target object category, such as a microwave or a chair. In contrast, human-facing embodied AI systems are often given indirect instructions, for example, “I need something to warm this food” or “the room feels stuffy.” In these cases, the agent must infer which object satisfies the underlying intent, locate a scene-grounded instance of that object, and determine whether the navigation goal has been achieved. We introduce IntentionNav, a diagnostic benchmark for intent-driven object navigation from implicit human instructions. IntentionNav contains 500 intents across 176 Isaac Sim scenes and 64 target object categories; each intent is rewritten into four controlled instruction styles and annotated with one of four intent modes. This structure supports fine-grained analysis of target inference, language robustness, neighborhood reachability, and terminal success, rather than reporting only aggregate navigation metrics.
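
To make the benchmark structure concrete, the following is a minimal sketch, in Python, of how a single IntentionNav episode record could be represented. All class, field, and label names here (IntentEpisode, the style and mode strings, the example target) are illustrative assumptions for exposition, not the benchmark's released schema.

```python
# Illustrative sketch of one IntentionNav episode record.
# Every name below is a hypothetical choice, not the released format.
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical labels for the four controlled instruction styles
# and the four intent modes described in the abstract.
INSTRUCTION_STYLES = ("direct", "functional", "situational", "elliptical")
INTENT_MODES = ("single_target", "ambiguous", "negated", "compositional")

@dataclass
class IntentEpisode:
    """One intent, paired with its scene grounding and style rewrites."""
    intent_id: str                      # unique episode identifier
    scene_id: str                       # one of the 176 Isaac Sim scenes
    target_category: str                # one of the 64 object categories
    intent_mode: str                    # one of INTENT_MODES
    instructions: Dict[str, str] = field(default_factory=dict)
    # keyed by instruction style, e.g. {"situational": "..."}

# Example record for the abstract's "stuffy room" intent
# (the window target is an assumed grounding for illustration):
episode = IntentEpisode(
    intent_id="ep_0001",
    scene_id="scene_042",
    target_category="window",
    intent_mode="single_target",
    instructions={
        "direct": "Go to the window.",
        "situational": "The room feels stuffy.",
    },
)
```

A record in this shape would make the per-axis analyses in the abstract straightforward: grouping episodes by intent_mode isolates target inference, while comparing results across the keys of instructions isolates robustness to instruction style.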