Faceted Exploration of Large Text Document Collections

Faculty: John Stasko

Analysts routinely encounter large collections of text documents accompanied by many numeric or categorical data fields. For example, consider a collection of wine reviews: the main text of each document is the review narrative, and the accompanying fields record data such as the wine's variety, color, age, rating, producer, region, and reviewer. In this project, we are exploring techniques that let analysts investigate a document collection by "slicing" it along the different attributes associated with each document (we call them "facets"). These techniques should help analysts examine common trends, patterns, outliers, and peculiarities in the collection and so gain a better understanding of it.

Lab: 
Director: Brian D. Jones
Students: Graduate: Adviti Atluri, Avery Ao; Undergraduate: Ana Herrera, Aditya Kabu, Matthew Perry, Shayar Shah

People spend much of their time at home performing everyday activities such as sleeping, eating, cooking, relaxing, and entertaining, so it is no surprise that the home plays a key role in our health, lifestyle, and well-being. The Aware Home Research Initiative (AHRI) at the Georgia Institute of Technology is an interdisciplinary research endeavor aimed at addressing the fundamental technical, design, and social challenges that people face in a home setting. Central to this research is the Aware Home, a three-story, 5,040-square-foot facility designed to facilitate research while providing an authentic home environment. Research domains include (1) Health and Well-being, (2) Sustainability, (3) Entertainment, and (4) Connected Living / Home Management.