Communications of the ACM

ACM TechNews

Toward an ML Model That Can Reason About Everyday Actions

The model picked out the video in each set that conceptually didn't belong and highlighted it in red.

Researchers at the Massachusetts Institute of Technology, Columbia University, and IBM trained a model to reach human-level performance at recognizing abstract concepts in video.

Credit: Allen Lee

Researchers at the Massachusetts Institute of Technology (MIT), Columbia University, and IBM have trained a hybrid language-vision machine learning model to recognize abstract concepts in video.

The researchers used the WordNet word-meaning database to map how each action-class label in MIT's Multi-Moments in Time and DeepMind's Kinetics datasets relates to the other labels in both datasets.
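The idea of relating action labels through a word-meaning hierarchy can be illustrated with a minimal sketch. The taxonomy below is a hand-made toy, not real WordNet data, and the names `HYPERNYMS`, `ancestors`, and `shared_abstraction` are hypothetical; the researchers' actual mapping procedure is not described here.

```python
# Toy hypernym table (child -> parent). Hypothetical data for illustration
# only; it is not taken from WordNet or from either video dataset.
HYPERNYMS = {
    "sculpting": "crafting",
    "carving": "crafting",
    "crafting": "making",
    "baking": "cooking",
    "cooking": "making",
    "making": "acting",
    "jogging": "exercising",
    "exercising": "acting",
}


def ancestors(label):
    """Return the label followed by its chain of hypernyms up to the root."""
    chain = [label]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain


def shared_abstraction(a, b):
    """Return the most specific concept both labels fall under, or None."""
    seen = set(ancestors(a))
    for concept in ancestors(b):  # walks from specific to general
        if concept in seen:
            return concept
    return None


print(shared_abstraction("sculpting", "carving"))
print(shared_abstraction("sculpting", "baking"))
```

In this toy hierarchy, two closely related labels meet at a specific concept, while more distant labels meet only at a broader one.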

The model was trained on this graph of abstract classes to generate a numerical representation for each video that aligns with word representations of the depicted actions. It then combines the per-video representations into a new set of representations used to identify the abstractions common to all the videos.
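A rough way to picture how per-video representations might be compared is an odd-one-out test on vectors. The sketch below is an assumption-laden illustration, not the researchers' model: the three-dimensional vectors are invented by hand, and the functions `cosine`, `mean_vector`, and `odd_one_out` are hypothetical helpers. It flags the vector least similar to the average of the others.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def odd_one_out(embeddings):
    """Index of the vector least similar to the mean of the others.

    Requires at least two embeddings.
    """
    scores = []
    for i, vec in enumerate(embeddings):
        others = embeddings[:i] + embeddings[i + 1:]
        scores.append(cosine(vec, mean_vector(others)))
    return min(range(len(scores)), key=scores.__getitem__)


# Hand-made stand-ins for video representations (hypothetical values):
# the first three point in a similar direction, the fourth does not.
videos = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.85, 0.15, 0.05],
    [0.10, 0.10, 0.90],
]

print(odd_one_out(videos))
```

Averaging is only one simple way to combine a set of vectors; a trained model would learn its own combination.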

When compared with humans performing the same visual reasoning tasks online, the model performed as well as they did in many situations.

MIT's Aude Oliva said, "A model that can recognize abstract events will give more accurate, logical predictions and be more useful for decision-making."

From MIT News


Abstracts Copyright © 2020 SmithBucklin, Washington, DC, USA

