Sep 28, 2015
Researchers from the School of Interactive Computing and the Institute for Robotics and Intelligent Machines developed a new method that teaches computers to “see” and understand what humans do in a typical day.
The technique used more than 40,000 pictures, taken every 30 to 60 seconds by a wearable camera over a six-month period, and predicted with 83 percent accuracy what activity the wearer was doing. Researchers taught the computer to categorize images across 19 activity classes. The test subject wearing the camera could review and annotate the photos at the end of each day (deleting any as needed for privacy) to ensure that they were correctly categorized.
“It was surprising how the method’s ability to correctly classify images could be generalized to another person after just two more days of annotation,” said Steven Hickson, a Ph.D. candidate in Computer Science and a lead researcher on the project.
“This work is about developing a better way to understand people's activities, and build systems that can recognize people's activities at a finely-grained level of detail,” said Edison Thomaz, co-author and graduate research assistant in the School of Interactive Computing. “Activity tracking devices like the Fitbit can tell how many steps you take per day, but imagine being able to track all of your activities – not just physical activities like walking and running. This work is moving toward full activity intelligence. At a technical level, we are showing that it's becoming possible for computer vision techniques alone to be used for this.”
The group believes they have gathered the largest annotated dataset of first-person images to demonstrate that deep learning can understand human behavior and the habits of a specific person.
Daniel Castro, a Ph.D. candidate in Computer Science and a lead researcher on the project, helped present the method earlier this month at UBICOMP 2015 in Osaka, Japan. He says the reaction from conference-goers was positive.
“People liked that we had a method that combines time and images,” Castro says. “Time (of activity) can be especially important for some activity classes. This system learned how relevant images were because of people’s schedules. What does it think the image is showing? It sees both time and image probabilities and makes a better prediction.”
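To make the idea of combining "time and image probabilities" concrete, here is a minimal sketch in Python. It is an illustration only, not the researchers' implementation: the three activity names, the probability values, and the helper `fuse_probabilities` are all hypothetical, and the simple multiply-and-renormalize rule is an assumed stand-in for whatever fusion step the actual system uses.

```python
# Hypothetical subset of the 19 activity classes mentioned in the article.
ACTIVITIES = ["working", "eating", "driving"]


def fuse_probabilities(image_probs, time_probs):
    """Combine per-class image and time-of-day probabilities.

    Assumed rule (not taken from the paper): multiply the two
    probabilities for each class, then renormalize so they sum to 1.
    """
    joint = [p_img * p_time for p_img, p_time in zip(image_probs, time_probs)]
    total = sum(joint)
    if total == 0:
        # The two sources fully disagree; fall back to the image scores.
        return list(image_probs)
    return [p / total for p in joint]


# Made-up example: the image alone slightly favors "eating" ...
image_probs = [0.40, 0.45, 0.15]
# ... but at this time of day the wearer is usually "working".
time_probs = [0.70, 0.10, 0.20]

fused = fuse_probabilities(image_probs, time_probs)
best = ACTIVITIES[fused.index(max(fused))]
print(best)
```

In this made-up case the image scores alone would pick "eating," while the fused scores pick "working," which is the kind of correction a person's schedule can provide when an image is ambiguous.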
The ability to literally see and recognize human activities has implications in a number of areas – from developing improved personal assistant applications like Siri to helping researchers explain links between health and behavior, Thomaz says.
Castro and Hickson believe that someday within the next decade we will have ubiquitous devices that can improve our personal choices throughout the day.
“Imagine if a device could learn what I would be doing next – ideally predict it – and recommend an alternative?” Castro says. “Once it builds your own schedule by knowing what you are doing, it might tell you there is a traffic delay and you should leave sooner or take a different route.”
The research, “Predicting Daily Activities From Egocentric Images Using Deep Learning,” can be found at http://www.cc.gatech.edu/cpl/projects/dailyactivities/. The authors are Castro, Hickson, Vinay Bettadapura and Thomaz, along with School of Interactive Computing Professors Gregory Abowd, Henrik Christensen and Irfan Essa.