A robot that could tidy a house without explicit instructions would be useful in everyday life. A recent paper on arXiv.org proposes the Housekeep task, which benchmarks the ability of embodied AI agents to use physical commonsense reasoning to infer rearrangement goals that mimic human-preferred placements of objects in indoor environments.
The researchers collected a dataset of human preferences for object placements in tidy and untidy homes. They propose a modular baseline and demonstrate that embodied commonsense extracted from large language models (LLMs) serves as an effective planner for the proposed task. The method generalizes to rearranging unseen objects without access to explicit instructions.
Further research is needed on two fronts: the exploration module, so that the agent visits areas that get cluttered more often, and the reasoning module, to increase its precision in identifying misplaced objects.
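To make the reasoning step concrete, here is a minimal sketch of how misplaced objects might be flagged and matched to better placements. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 0.5 threshold, and the toy score table are all hypothetical, and the scoring function merely stands in for whatever model supplies object-placement compatibility (the paper uses an LLM for that role).

```python
from typing import Callable, Dict, List, Tuple

# A placement is a (room, receptacle) pair. This representation is an
# assumption made for the sketch, not taken from the paper.
Placement = Tuple[str, str]
ScoreFn = Callable[[str, Placement], float]


def rank_placements(obj: str, candidates: List[Placement], score: ScoreFn) -> List[Placement]:
    """Order candidate placements from most to least compatible with obj."""
    return sorted(candidates, key=lambda p: score(obj, p), reverse=True)


def find_misplaced(
    current: Dict[str, Placement],
    candidates: List[Placement],
    score: ScoreFn,
    threshold: float = 0.5,  # hypothetical cutoff for "misplaced"
) -> Dict[str, Placement]:
    """Return {object: target placement} for objects scoring below threshold."""
    plan: Dict[str, Placement] = {}
    for obj, placement in current.items():
        if score(obj, placement) < threshold:
            best = rank_placements(obj, candidates, score)[0]
            if best != placement:  # skip if no better spot exists
                plan[obj] = best
    return plan


# Invented scores for illustration only; these are not from the Housekeep dataset.
CANDIDATES: List[Placement] = [
    ("kitchen", "cabinet"),
    ("bathroom", "sink"),
    ("bedroom", "bed"),
]
TOY_SCORES: Dict[Tuple[str, Placement], float] = {
    ("mug", ("kitchen", "cabinet")): 0.9,
    ("mug", ("bathroom", "sink")): 0.1,
    ("mug", ("bedroom", "bed")): 0.05,
    ("toothbrush", ("bathroom", "sink")): 0.85,
    ("toothbrush", ("kitchen", "cabinet")): 0.1,
    ("toothbrush", ("bedroom", "bed")): 0.05,
}


def toy_score(obj: str, placement: Placement) -> float:
    """Look up a toy compatibility score; unknown pairs score 0."""
    return TOY_SCORES.get((obj, placement), 0.0)


current_state = {"mug": ("bedroom", "bed"), "toothbrush": ("bathroom", "sink")}
plan = find_misplaced(current_state, CANDIDATES, toy_score)
print(plan)
```

In this toy setup only the mug is flagged, since the toothbrush already sits in a high-scoring spot. Raising the threshold trades precision for recall: more objects get flagged, including ones a person would consider correctly placed.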
We introduce Housekeep, a benchmark to evaluate commonsense reasoning in the home for embodied AI. In Housekeep, an embodied agent must tidy a house by rearranging misplaced objects without explicit instructions specifying which objects need to be rearranged. Instead, the agent must learn from and is evaluated against human preferences of which objects belong where in a tidy house. Specifically, we collect a dataset of where humans typically place objects in tidy and untidy houses constituting 1799 objects, 268 object categories, 585 placements, and 105 rooms. Next, we propose a modular baseline approach for Housekeep that integrates planning, exploration, and navigation. It leverages a fine-tuned large language model (LLM) trained on an internet text corpus for effective planning. We show that our baseline agent generalizes to rearranging unseen objects in unknown environments.
Research article: Kant, Y., et al., “Housekeep: Tidying Virtual Households using Commonsense Reasoning”, 2022. Link: https://arxiv.org/abs/2205.10712