Beieler created the map as a trainee in Penn State’s National Science Foundation-funded Big Data Social Science Integrative Graduate Education and Research Traineeship (IGERT) program. He used data culled from the Global Database of Events, Language, and Tone (GDELT), an enormous repository chronicling every documented social event accessible on the Internet. This includes protests, bombings, speeches, peace agreements and myriad others.
In the world of political science, the data set is a big deal. Although similar databases have been created, Beieler said what makes the GDELT data set stand out is its scale.
“The scope of the data set is what really makes it amazing,” Beieler said. “It doesn’t just tell you there was a protest in Egypt on a specific day, it also specifies who did what to whom.”
For example, a GDELT entry wouldn’t just note that a bombing happened in Iran. Each entry has 57 fields, storing such information as the date, the event, the perpetrator (including their ethnicity, race and political standing), the location and the tone of the coverage (on a scale from positive to negative). This level of detail throughout the database makes for a comprehensive picture of the world.
The GDELT has been more than 20 years in the making. Philip Schrodt, a professor at the University of Kansas in the 1980s, laid the early foundations of the database, with his interests lying mainly in Middle Eastern and Asian countries. Years later, Kalev Leetaru (then a graduate student at the University of Illinois, and now the Yahoo! Fellow at Georgetown University) helped bring the project to fruition, creating the technical infrastructure and workflows to scale it up to a global database that monitored tens of thousands of news sources on a daily basis. The completed data set was announced last April.
“I was very interested in teasing apart and exploring the connection between emotions and physical behavior,” Leetaru said. “I think the GDELT has succeeded in capturing people’s imaginations, and it’s telling that something like this has gone mainstream — and now even people outside of political science have begun exploring event data.”
Because Leetaru had added so much to the database, it was difficult for Beieler to extract the information he needed to create his animated map. At 85 gigabytes, the GDELT is too big to run on one computer, so he had to use a computer cluster to open the data set across multiple systems. Beieler then had to separate the data about protest events from millions of other entries.
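For readers curious what that separation step might look like, here is a minimal Python sketch. It is not Beieler's actual code. It assumes the data arrives as tab-delimited text with one event per line, that protests are marked with root event code "14" (the protest category in the CAMEO coding scheme), and that this code sits in the 29th field. The column position and the function name filter_protests are illustrative assumptions and should be checked against the GDELT documentation.

```python
import csv
import io

# Assumption: protests carry CAMEO root event code "14".
PROTEST_ROOT_CODE = "14"
# Assumption: zero-based position of the root event code in each record.
EVENT_ROOT_CODE_COLUMN = 28


def filter_protests(lines, column=EVENT_ROOT_CODE_COLUMN, code=PROTEST_ROOT_CODE):
    """Yield only the records whose root event code marks a protest.

    `lines` is any iterable of tab-delimited text lines, so a file can be
    streamed one record at a time instead of being loaded into memory.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip short or malformed records rather than failing on them.
        if len(row) > column and row[column] == code:
            yield row


def make_row(event_id, root_code, width=57):
    """Build a made-up 57-field record for demonstration purposes only."""
    row = [""] * width
    row[0] = event_id
    row[EVENT_ROOT_CODE_COLUMN] = root_code
    return "\t".join(row)


# Three invented records: only the second is coded as a protest.
sample = io.StringIO(
    "\n".join([make_row("1", "04"), make_row("2", "14"), make_row("3", "19")]) + "\n"
)
protests = list(filter_protests(sample))
print([row[0] for row in protests])
```

Because the function reads one line at a time, the same approach could be run separately on each piece of a data set that has been split across several machines.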
Once he had sorted the data, Beieler compiled it in a spreadsheet, did some minor coding and worked with Josh Stevens, a doctoral student and fellow Big Data Social Science-IGERT trainee, to map the data using software called CartoDB. Since its completion, the map has been featured by such news sites as The Guardian, Slate, Foreign Policy and Wired Japan.
But even more exciting than the press the map has been getting are the possibilities for future research opened by the GDELT data, Beieler explains.