IntentionNav: A Benchmark for Intent-Driven Object Navigation from Implicit Human Instruction

Lin Qian1, Shijie Li2, Sihao Lin3, Xuan Zhang4, Bangya Liu4, Yanran Li5, Hujun Yin1
1University of Manchester, 2A*STAR, 3University of Adelaide, 42077, 5University of Bedfordshire

Abstract

Existing object navigation benchmarks usually provide an embodied agent with an explicit target object category, such as a microwave or a chair. In contrast, human-facing embodied AI systems are often given indirect instructions, for example, “I need something to warm this food” or “the room feels stuffy.” In these cases, the agent must infer which object satisfies the underlying intent, locate a scene-grounded instance of that object, and determine whether the navigation goal has been achieved. We introduce IntentionNav, a diagnostic benchmark for intent-driven object navigation from implicit human instructions. IntentionNav contains 500 intents across 176 Isaac Sim scenes and 64 target object categories; each intent is rewritten into four controlled instruction styles and annotated with one of four intent modes. This structure supports fine-grained analysis of target inference, language robustness, neighborhood reachability, and terminal success, rather than reporting only aggregate navigation metrics.
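
To make the benchmark structure concrete, the following is a minimal sketch, in Python, of how a single IntentionNav episode record could be represented. All class, field, and label names here (IntentEpisode, the style and mode strings, the example target) are illustrative assumptions for exposition, not the benchmark's released schema.

```python
# Illustrative sketch of one IntentionNav episode record.
# Every name below is a hypothetical choice, not the released format.
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical labels for the four controlled instruction styles
# and the four intent modes described in the abstract.
INSTRUCTION_STYLES = ("direct", "functional", "situational", "elliptical")
INTENT_MODES = ("single_target", "ambiguous", "negated", "compositional")

@dataclass
class IntentEpisode:
    """One intent, paired with its scene grounding and style rewrites."""
    intent_id: str                      # unique episode identifier
    scene_id: str                       # one of the 176 Isaac Sim scenes
    target_category: str                # one of the 64 object categories
    intent_mode: str                    # one of INTENT_MODES
    instructions: Dict[str, str] = field(default_factory=dict)
    # keyed by instruction style, e.g. {"situational": "..."}

# Example record for the abstract's "stuffy room" intent
# (the window target is an assumed grounding for illustration):
episode = IntentEpisode(
    intent_id="ep_0001",
    scene_id="scene_042",
    target_category="window",
    intent_mode="single_target",
    instructions={
        "direct": "Go to the window.",
        "situational": "The room feels stuffy.",
    },
)
```

A record in this shape would make the per-axis analyses in the abstract straightforward: grouping episodes by intent_mode isolates target inference, while comparing results across the keys of instructions isolates robustness to instruction style.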