THE SYNTAX OF ACTION
Abstract: Understanding human activity from video is a fundamental problem in today's Computer Vision and Imitation Learning. The video discusses the issue of the syntax of human activity and advances the viewpoint that perceived human activity first needs to be parsed, just as in the case of language. Building on these ideas, the video proposes the Ego-OMG framework. Egocentric Object Manipulation Graphs are graphs extracted from a basic parsing of a video of human activity; they represent the contacts of the left and right hands with objects in the scene and can be used for action prediction.
Paper: https://arxiv.org/abs/2006.03201
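To make the idea of a hand-object contact graph concrete, here is a minimal, illustrative Python sketch. It is not the paper's implementation: it assumes per-frame contact labels for each hand are already available, builds a simple transition graph over (left-hand, right-hand) contact states, and predicts the next state from observed transition counts. All function names and the counting-based predictor are assumptions made for illustration only.

# Minimal sketch of a hand-object contact graph (illustrative only, not Ego-OMG itself).
# Assumes each frame is annotated with the object touched by each hand (None = no contact).
from collections import defaultdict


def build_contact_graph(contact_sequence):
    """Collapse consecutive duplicate contact states and count state transitions."""
    # Remove per-frame repetition so each node is a distinct (left, right) contact state.
    states = [contact_sequence[0]]
    for s in contact_sequence[1:]:
        if s != states[-1]:
            states.append(s)

    # Directed graph stored as transition counts: state -> {next_state: count}.
    graph = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(states, states[1:]):
        graph[current][nxt] += 1
    return graph


def predict_next_state(graph, current_state):
    """Return the most frequently observed successor of the current contact state."""
    successors = graph.get(current_state)
    if not successors:
        return None
    return max(successors, key=successors.get)


if __name__ == "__main__":
    # Hypothetical per-frame annotations: (left-hand object, right-hand object).
    frames = [
        (None, None),
        ("knife", None), ("knife", None),
        ("knife", "tomato"), ("knife", "tomato"),
        (None, "tomato"),
        (None, None),
        ("knife", None),
        ("knife", "tomato"),
    ]
    g = build_contact_graph(frames)
    print(predict_next_state(g, ("knife", None)))  # -> ('knife', 'tomato')

This toy predictor only counts transitions; the actual Ego-OMG framework learns representations over such graphs for action anticipation, as described in the linked paper.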
Yiannis Aloimonos is a Professor of Computational Vision and Intelligence at the Department of Computer Science, University of Maryland, College Park, and the Director of the Computer Vision Laboratory at the Institute for Advanced Computer Studies (UMIACS). He is also affiliated with the Institute for Systems Research and the Neural and Cognitive Science Program. He was born in Sparta, Greece, and studied Mathematics in Athens and Computer Science at the University of Rochester, NY (PhD 1990). He is interested in Active Perception and the modeling of vision as an active, dynamic process for real-time robotic systems. For the past five years, he has been working on bridging signals and symbols, specifically on the relationship of vision to reasoning, action, and language.