The SFM team at GW Libraries, like a number of other institutions and individuals, publishes datasets containing tweet ids for some of our collections.[1] (Many of these public datasets are listed in the DocNow Catalog.) We publish our datasets to GW Libraries’ dataverse on Harvard’s Dataverse instance.
Soomin Park, a data science student and member of the SFM team, authored instructions for exporting a dataset from SFM and publishing it to Dataverse. The instructions are based on her experience releasing the 7-million-tweet Women’s March dataset. Feedback is welcome.
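To give a rough sense of what a tweet-id dataset amounts to, here is a minimal Python sketch that reduces a file of tweets to a file of ids. It is not SFM's exporter and not the procedure from Soomin's instructions; it assumes the tweets are stored one JSON object per line and carry the `id_str` (or `id`) field from Twitter's tweet JSON. The function names and file paths are hypothetical.

```python
import json
from typing import Iterable, Iterator


def extract_tweet_ids(json_lines: Iterable[str]) -> Iterator[str]:
    """Yield the tweet id, as a string, from each non-blank line of tweet JSON."""
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Prefer id_str (the string form of the id); fall back to the numeric id.
        yield tweet.get("id_str") or str(tweet["id"])


def write_ids(json_path: str, ids_path: str) -> int:
    """Write one tweet id per line to ids_path and return how many were written.

    Both paths are hypothetical; the input is assumed to be line-oriented JSON.
    """
    count = 0
    with open(json_path, encoding="utf-8") as src, \
            open(ids_path, "w", encoding="utf-8") as dst:
        for tweet_id in extract_tweet_ids(src):
            dst.write(tweet_id + "\n")
            count += 1
    return count
```

Calling `write_ids("tweets.jsonl", "tweet_ids.txt")` on such a file would leave a plain text file with one id per line, which is the kind of content a tweet-id dataset holds.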
[1] As background: Twitter’s Developer Policy only permits public datasets to include tweet ids. Publishing the full JSON for tweets is prohibited. As has been pointed out, this is problematic for quality, reproducible research.