Web Archives - Collection Development Policy

Purpose of the Web Archives Collection

The purposes of the George Washington University Web Archives Program are to:

Serve as the official repository for GW websites.
Capture continuously changing web content created by GW’s departments, offices, and organizations.
Capture websites that complement or are part of the GW LAI’s Special Collections.
Make archived web content available to be used as primary sources for reference, research, and historical purposes.

Collecting Scope

The Web Archives Program documents George Washington University websites, as well as websites that complement or relate to GW LAI’s specialized collections. For the purposes of the Web Archives Program, GW websites are defined as websites hosted by and about the university. University websites reflect university functions and events; examples include websites for the different colleges, programs, and departments at GW. Collected websites may also include sites dedicated to student organizations on campus and any other websites reflecting student life on campus.

Subject Coverage

The primary subjects to be documented by the Web Archives Program are:

Websites for GW’s colleges, departments, and programs.
Websites dedicated to campus life and student organizations.
Academic and non-academic organizations, programs, and events related to or sponsored by GW.
Websites in the following collecting areas:
- Corcoran Gallery of Art and Corcoran College of Arts and Design
- Global Resources Center Web Archives
- International Brotherhood of Teamsters Labor History Research Center Web Archives
- National Education Association Web Archives

Materials and Formats

The Web Archives Program collects websites that fall within its established subject focus and can be accessed on the live web without login credentials.

Dates of Coverage

The GW Web Archives Program began crawling websites in 2014. Most of the archived websites will be from 2014 forward; however, because any live website can be crawled, the collection may contain websites that were created before 2014.

Collecting Frequency

Websites can be crawled once or on a monthly, quarterly, semi-annual, or annual schedule, depending on need. Websites may also be crawled on request via the Request Form for Archiving GW-Affiliated Websites.

Geographic Scope

The collecting scope is not limited by geographical region.

Exclusions

The Web Archives Program will generally not collect faculty or student personal blogs or websites that are not hosted under the main gwu.edu domain or whose focus is not GW.
The Web Archives Program will not crawl or collect social media sites or other sites that require login credentials to access. Data from Twitter, Tumblr, Flickr, and Sina Weibo can instead be collected using the university’s Social Feed Manager.
Archive-It’s crawling technology may have difficulty capturing pages with videos or photographs. Likewise, it may have difficulty capturing certain types of sites, such as Wix sites, and some website features, such as embedded Vimeo video players or embedded Google features like Google Calendar.

Accessibility

A 2022 survey by Tori Maches, Digital Archivist, UC San Diego; Lydia Tang, PhD, Outreach and Engagement Coordinator, Lyrasis; and Tanya Ulmer, Web Archivist, Archive-It, found that many older websites meet accessibility standards by default.¹ However, websites are captured in their original state and may not always meet accessibility standards. If you come across an archived page in one of our collections that does not meet accessibility standards, please use the Accessibility Feedback Form to let us know.

¹ Tori Maches, Lydia Tang, and Tanya Ulmer, “A Brief Look at Web Accessibility in Archive-It,” Archive-It Blog (blog), 2022, https://archive-it.org/post/a-brief-look-at-web-accessibility/.