
Communications of the ACM

ACM TechNews

Helping Computer Vision, Language Models Understand What They See


MIT researchers created a new annotated synthetic dataset of images that depict a wide range of scenarios, which can be used to help machine-learning models understand the concepts in a scene.

Credit: Khaled Shehada et al.

Massachusetts Institute of Technology researchers were part of a team that developed a technique that uses computer-generated data to help vision and language models better understand concepts.

The researchers used an annotated synthetic dataset to fine-tune popular vision and language models, increasing their accuracy in concept understanding by up to 10%.

They produced close to 800,000 photorealistic images using computer-generated synthetic videos of diverse three-dimensional environments and objects, with human avatars added to interact with them.

A detailed caption was added to each image, covering object attributes, positional relationships, and human-object interactions.
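Because the images are computer-generated, each caption can be assembled programmatically from the scene's own metadata rather than written by human annotators. The sketch below illustrates the idea with a toy scene description; the field names (`objects`, `relations`, `interactions`) and the phrasing templates are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical sketch: building a dense caption from synthetic scene metadata.
# Covers the three caption components the article mentions: object attributes,
# positional relationships, and human-object interactions.

def build_caption(scene):
    parts = []
    # Object attributes, e.g. "a red mug", "a wooden table"
    for obj in scene["objects"]:
        parts.append(f"a {' '.join(obj['attributes'])} {obj['name']}")
    # Positional relationships, e.g. "the mug is on the table"
    for subj, rel, target in scene["relations"]:
        parts.append(f"the {subj} is {rel} the {target}")
    # Human-object interactions, e.g. "a person is holding the mug"
    for verb, target in scene["interactions"]:
        parts.append(f"a person is {verb} the {target}")
    caption = "; ".join(parts)
    return caption[0].upper() + caption[1:] + "."

scene = {
    "objects": [
        {"name": "mug", "attributes": ["red"]},
        {"name": "table", "attributes": ["wooden"]},
    ],
    "relations": [("mug", "on", "table")],
    "interactions": [("holding", "mug")],
}
print(build_caption(scene))
# → A red mug; a wooden table; the mug is on the table; a person is holding the mug.
```

Pairing such automatically generated captions with the rendered images yields image-text training examples at scale, which is what makes synthetic data cheaper to annotate than real photographs.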

Synthetic data allowed the researchers to create more diverse images at a lower cost than generating real data while preserving privacy through the use of avatars.

From MIT News
View Full Article


Abstracts Copyright © 2023 SmithBucklin, Washington, D.C., USA


