This week, the GW Libraries software development librarians who work on Social Feed Manager (SFM), Laura Wrubel, Justin Littman, and Dan Kerchner, participated in the Archives Unleashed 2.0 hackathon hosted at the Library of Congress. Hackathon organizers selected the participants from a pool of applicants from around the world. The hackathon brought together leading coding-savvy researchers from a variety of disciplines to dig into web archive data. Participants formed nine project teams and spent two days working intensively to derive interesting results from data sets such as web archives collected around the .gov domain, tweets related to the upcoming election, Cuban web archives, and more.
Of the nine teams, three used data collected by SFM. Laura’s highly interdisciplinary team visualized connections between presidential campaign account tweets harvested by SFM and Federal Election Commission data, using named entity extraction, OpenRefine, a few sentiment analysis approaches, and some easily accessible visualization tools, including d3plus and TimeMappr. Justin’s team used election 2016 Twitter data from SFM, along with the web resources linked from those tweets, and attempted to examine the relationship between the topics of those resources and when they were tweeted. Dan collaborated with a political science doctoral student from the University of Washington to extract and visualize relationships among U.S.-based ISIS sympathizers from tweet mentions, and to try to correlate relationship strength with ideological similarity. They used Python, R, Gephi, an ideology metric developed at the Naval Postgraduate School, and the Twitter data that we have been collecting with SFM on behalf of the GW Program on Extremism.
At the end of the hackathon, teams evaluated one another’s projects, and Dan’s project received the highest score. Dan presented his results and participated in a panel discussion at the following day’s Saving The Web symposium. The symposium featured Internet pioneers such as Library of Congress scholar-in-residence Dame Wendy Hall and “father of the Internet” and Google VP Vint Cerf, along with many key researchers working on the problems of preserving the Internet and making web archives accessible and useful to future researchers. The hackathon also gave Justin, Dan, and Laura an opportunity to present lightning talks on different aspects of our work with SFM and to engage in many substantive discussions with peers who are doing complementary work and are interested in collaborating around SFM. Appropriately, tweets from the event can be searched (for a limited time) under #hackarchives and #SaveTheWeb.