IMUTube: Automatic Extraction of Virtual on-body Accelerometry from Video for Human Activity Recognition

Gregory D. Abowd, Thomas Ploetz
Hyeokhyen Kwon

The lack of large-scale, labeled data sets impedes progress in developing robust and generalized predictive models for on-body sensor-based human activity recognition (HAR). Labeled data in human activity recognition is scarce and hard to come by, as sensor data collection is expensive, and the annotation is time-consuming and error-prone. To address this problem, we introduce IMUTube, an automated processing pipeline that integrates existing computer vision and signal processing techniques to convert videos of human activity into virtual streams of IMU data. These virtual IMU streams represent accelerometry at a wide variety of locations on the human body.We show how the virtually-generated IMU data improves the performance of a variety of models on known HAR datasets. Our initial results are very promising, but the greater promise of this work lies in a collective approach by the computer vision, signal processing, and activity recognition communities to extend this work in ways that we outline. This should lead to on-body, sensor-based HAR becoming yet another success story in large-dataset breakthroughs in recognition.


We are interested in ubiquitous computing and the research issues involved in building and evaluating ubicomp applications and services that impact our lives. Much of our work is situated in settings of everyday activity, such as the classroom, the office and the home. Our research focuses on several topics including, automated capture and access to live experiences, context-aware computing, applications and services in the home, natural interaction, software architecture, technology policy, security and privacy issues, and technology for individuals with special needs.