Communications of the ACM

ACM TechNews

Using Language to Give Robots a Better Grasp of Open-Ended World

The system's 3D feature fields could be helpful in environments that contain thousands of objects, such as warehouses.

Feature Fields for Robotic Manipulation (F3RM) enables robots to interpret open-ended text prompts using natural language, helping the machines manipulate unfamiliar objects.

Credit: Ge Yang et al.

The Feature Fields for Robotic Manipulation (F3RM) method designed by Massachusetts Institute of Technology researchers helps robots identify and grasp nearby objects by forming three-dimensional (3D) scenes from two-dimensional (2D) images and vision foundation models.

Because it interprets open-ended natural-language text prompts from humans, F3RM can be applied to real-world settings that contain thousands of objects.

A camera mounted on a selfie stick captures 50 2D images from different poses, which are used to build a neural radiance field; the result renders a 360-degree "digital twin" of the environment.

F3RM uses the Contrastive Language-Image Pre-training (CLIP) vision foundation model to enhance geometry with semantic data, reassembling the 2D CLIP features for the camera-shot images into a 3D representation.
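One way to picture how per-image 2D features can be reassembled into a 3D representation is the standard neural-radiance-field compositing rule, applied to feature vectors instead of colors. The sketch below assumes that approach; it is an illustration, not the researchers' implementation. The function name `render_feature_along_ray` and its inputs (per-sample densities, sample spacings, and feature vectors along one camera ray) are hypothetical.

```python
import numpy as np

def render_feature_along_ray(densities, deltas, features):
    """Composite per-sample features along one ray, NeRF-style.

    densities: (N,) non-negative volume density at each sample (assumed input)
    deltas:    (N,) distance between consecutive samples (assumed input)
    features:  (N, D) feature vector stored at each 3D sample (assumed input)
    Returns the rendered (D,) feature and the (N,) compositing weights.
    """
    densities = np.asarray(densities, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    features = np.asarray(features, dtype=float)

    # Opacity of each sample: how much of the ray it absorbs.
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: fraction of the ray that reaches each sample unblocked.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = transmittance * alphas
    # The rendered feature is the weighted sum of the sample features.
    return weights @ features, weights
```

In a setup like this, the 3D features would be trained so that the rendered feature for each pixel matches the 2D feature extracted from the corresponding camera image.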

After a few demonstrations, the robot can, when prompted, grasp objects it has not previously encountered: it applies its geometric and semantic knowledge to score the possible grasps and chooses the highest-scoring option.
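The "highest-scoring option" step can be sketched as a similarity ranking. The example below assumes each candidate grasp is summarized by a feature vector and the text prompt by an embedding in the same space, and it scores candidates by cosine similarity. The names `score_candidates` and `pick_best` are hypothetical, the vectors are toy values rather than real CLIP embeddings, and the actual F3RM scoring may combine more signals than this.

```python
import numpy as np

def score_candidates(candidate_features, query_embedding):
    """Cosine similarity between each candidate's feature and the query.

    candidate_features: (N, D) one feature vector per candidate (assumed input)
    query_embedding:    (D,) embedding of the text prompt (assumed input)
    """
    c = np.asarray(candidate_features, dtype=float)
    q = np.asarray(query_embedding, dtype=float)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    q = q / np.linalg.norm(q)
    return c @ q

def pick_best(candidate_features, query_embedding):
    """Return the index of the highest-scoring candidate and all scores."""
    scores = score_candidates(candidate_features, query_embedding)
    return int(np.argmax(scores)), scores
```

For example, with candidates `[[1, 0], [0, 1], [1, 1]]` and query `[0, 2]`, the second candidate points in the same direction as the query and is selected.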

From MIT News


Abstracts Copyright © 2023 SmithBucklin, Washington, D.C., USA
