Semantic Scene Understanding

We work on semantic scene understanding and representations for robotic applications.

Robots today need a semantically rich yet compact 3D representation of their environment. Such a representation enables efficient perception and planning, tailored to specific tasks at varying levels of detail, such as navigating to a particular room or locating a specific object. Comprehensive semantic scene understanding requires reasoning about the functional properties and interrelationships of entities within the environment. For a robot to operate fully autonomously, it must function in unfamiliar environments and comprehend scenes even in domains with limited data. This makes it difficult to pre-determine the semantic categories needed for classification and stresses the importance of open-vocabulary solutions.
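As a rough illustration of the open-vocabulary idea, the sketch below scores an observed object crop against an arbitrary, user-supplied label set by cosine similarity in a shared vision-language embedding space. The `embed_text` and `embed_image` functions are hypothetical placeholders standing in for a pretrained vision-language model such as CLIP; the deterministic random embeddings exist only so the example runs end to end.

```python
import numpy as np

_DIM = 512

def embed_text(prompt: str) -> np.ndarray:
    # Deterministic random vector keyed on the prompt: a stand-in for a
    # real text encoder (e.g. CLIP), used only so this sketch runs.
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).standard_normal(_DIM)

def embed_image(image_crop) -> np.ndarray:
    # A real system would encode pixels; here we reuse the fake text
    # encoder on a stand-in identifier purely for the demo.
    return embed_text(str(image_crop))

def classify_open_vocabulary(image_crop, candidate_labels):
    """Score an object crop against an arbitrary label set via cosine
    similarity in the shared embedding space."""
    img = embed_image(image_crop)
    img = img / np.linalg.norm(img)
    scores = {}
    for label in candidate_labels:
        txt = embed_text(f"a photo of a {label}")
        txt = txt / np.linalg.norm(txt)
        scores[label] = float(img @ txt)
    return max(scores, key=scores.get), scores

# The label set is not fixed at training time; any phrase can be queried:
best, scores = classify_open_vocabulary("crop_042", ["office chair", "watering can", "robot arm"])
```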

Traditional methods for robotic navigation often rely on high-precision metric mapping techniques, such as simultaneous localization and mapping (SLAM), which support intricate navigation and manipulation based on geometric reconstructions. Recent advances, however, make it possible to fuse dense geometric maps with vision-language features, enabling semantic, open-vocabulary indexing of environments. At the Socially Intelligent Robotics Lab, we are interested in building 3D Scene Graphs: expressive hierarchical data structures that represent large-scale 3D environments efficiently while enriching objects with semantically meaningful properties. This is fundamental for modern hierarchical path planning, RL-based object search, multi-robot communication, long-term motion prediction, goal navigation, manipulation, and many other downstream tasks in robotics.
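As a minimal sketch of what such a structure might look like (the node types, attributes, and lookup below are illustrative assumptions, not our actual implementation), a 3D scene graph can be organized as layered nodes, with rooms containing objects and each object annotated with geometry and a semantic feature:

```python
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    name: str            # possibly an open-vocabulary label, e.g. "mug"
    position: tuple      # (x, y, z) centroid in the map frame
    embedding: list = field(default_factory=list)  # vision-language feature

@dataclass
class RoomNode:
    name: str            # e.g. "kitchen"
    objects: list = field(default_factory=list)

@dataclass
class BuildingNode:
    name: str
    rooms: list = field(default_factory=list)

def find_object(building: BuildingNode, query: str):
    """Hierarchical lookup: iterate rooms, then the objects within.
    A planner can exploit the same hierarchy, navigating to a room
    before resolving the object's exact pose."""
    for room in building.rooms:
        for obj in room.objects:
            if obj.name == query:
                return room, obj
    return None

# Example: a two-room scene
kitchen = RoomNode("kitchen", objects=[ObjectNode("mug", (1.2, 0.4, 0.9))])
office = RoomNode("office", objects=[ObjectNode("keyboard", (3.0, 1.1, 0.7))])
house = BuildingNode("house", rooms=[kitchen, office])
print(find_object(house, "mug"))  # -> (kitchen RoomNode, mug ObjectNode)
```

Because the hierarchy mirrors how tasks decompose, the same compact structure serves both coarse queries ("which room?") and fine-grained ones ("where exactly is the mug?").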
