SFM for Archivists: Establishing a Policy Basis for Access

Editor’s note: In June, we published the report Social Feed Manager: Guide for Building Social Media Archives, written by Christopher Prom of the University of Illinois with support from our grant from NHPRC. To highlight this work and the insights it provides to the community, we’re sharing excerpts of the report as blog posts. This is the fourth in a series of posts we’re calling “SFM for Archivists”.

Establishing a Policy Basis for Access

The SFM team has developed and released social media collection development guidelines. Rather than offering prescriptive advice or specific policy recommendations, the guidelines introduce a set of questions that archives and libraries can use in guiding conversations about how to capture, preserve, and provide access to social media data, focusing on the following areas:

Ethics
API Terms of Service
Harvest Scoping
Documenting collecting decisions
Access

When reading through the guidelines and thinking about the various issues they raise, it is easy to become paralyzed. But the questions can and should be framed in light of overall archival objectives and with an understanding that risks can be mitigated by developing, implementing, and documenting policies and procedures that govern the way social media records will be accessed. In other words, by thinking through the end goals, repositories will be able to shape and preserve collections that meet projected access needs.

To help archives begin the process of developing these policies, I would like to describe three basic access scenarios and list some potential use cases for each one. These scenarios are not prescriptive. Repositories may wish to pursue strategies that are less (or more) risk-averse, in line with local interpretations of the questions discussed in SFM’s collection development guidelines. But here are some examples of potential access scenarios:

Scenario One: The archives preserves and provides access ONLY to tweet IDs. In this scenario, not much is being preserved. Tweet IDs have no inherent meaning. Data preservation and the ability to contextualize tweets depend wholly on Twitter. This scenario would be useful if the collecting repository cared only about the current use of the data. Such a strategy might also be useful if the collecting organization is harvesting data that implicates the privacy rights of many third parties, such as a repository seeking to document health issues, drug abuse, domestic violence, political resistance in an authoritarian country, or some other topic where there is a potential risk to the safety or privacy of individuals who have not consented to have information about them collected and distributed.

Scenario Two: In other cases, archives may wish to preserve the full content returned by the API, but allow access to the full data only under very strict controls, such as signed agreement to a condition-of-use form. Such a scenario may be warranted when a large number of ‘third party’ social media records have been collected and when they speak to a topic of public importance, but the records are perceived as bearing risk to the institution, should deleted social media records be distributed or should the privacy of third parties be compromised. In this case, the signed condition-of-use form would inform users of their responsibility to use the social media records in a way that meets legal, institutional, and ethical requirements.

Scenario Three: In many cases, a collecting repository will want to capture and preserve the full complement of data the social media service provider supplies, post tweet IDs online, and provide on-request access to the full data under the terms of use policy. Researchers would access social media records in a search room or remotely after being provided a copy by the archivist. The archivist would not closely review the content of the tweets, take steps to remove deleted tweets, or require that the user sign a condition-of-use form. Such a scenario might be particularly warranted when it is known that all of the tweets in the dataset were generated by the parent institution of the archives that collected them or where copyright to the tweets has been granted to the organization by the owner of the underlying copyrights. In this case, there is low risk to the repository. Even if a tweet has been deleted and is unavailable from the service provider, the underlying copyright is owned by the institution. Such a scenario might also be warranted in cases where the tweets involve prominent public figures whose activities via Twitter and other social media services speak to their public role, for example, politicians or celebrities.
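
Scenarios One and Three both depend on producing a list of tweet IDs that can be shared separately from the full harvested data. As a rough illustration only, the Python sketch below shows one way such a list might be derived. It assumes the harvested tweets are available as line-delimited JSON (one tweet object per line) and that each object carries an "id_str" or "id" field, as Twitter's tweet objects do. The function names and file names are hypothetical and are not part of SFM.

```python
import json
from typing import Iterable, Iterator


def extract_tweet_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the tweet ID found on each line of line-delimited tweet JSON.

    Assumption: every non-blank line is one JSON tweet object.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Prefer the string form of the ID; fall back to the numeric field.
        tweet_id = tweet.get("id_str") or tweet.get("id")
        if tweet_id is not None:
            yield str(tweet_id)


def write_id_list(src_path: str, dest_path: str) -> int:
    """Write one tweet ID per line to dest_path and return how many were written."""
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dest_path, "w", encoding="utf-8") as dest:
        for tweet_id in extract_tweet_ids(src):
            dest.write(tweet_id + "\n")
            count += 1
    return count
```

A repository might call write_id_list("tweets.jsonl", "tweet_ids.txt") (both file names are placeholders) and post only the resulting ID file online, keeping the full JSON under whatever access controls its policy requires.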