Information Extraction in Data Journalism

Faculty: 
Alex Endert
Students: 
Rachel Chen

Data journalists play an important role in shaping how the public engages with quantitative information in the news. Through interactive visualizations, journalists can present complex data in a way that is tailored to their audience and easier to understand. Intrigued by the potential of this interdisciplinary field, we conducted semi-structured interviews with journalists to understand how they work. We found many inefficiencies in data extraction, specifically in transforming raw report files (e.g., PDF) into an analysis-ready format (e.g., CSV or JSON). We therefore explore how to improve their workflow by designing a prototype that addresses these data extraction issues. The tool has two goals: 1) allow journalists to extract data tables from PDFs through an easy-to-use GUI, and 2) provide a one-stop shop that supports cross-referencing between the raw reports and the analysis-ready files.

Lab: 
Director: 
Alex Endert
Faculty: 
Alex Endert
Our goal is to help people make sense of data. We research and develop interactive visualizations that couple machine learning with visual interfaces to support data exploration and sensemaking.