Metadata for Web Archives
Metadata for web archives exists in two primary places: within Archive-It, the system used by GWLAI to collect and manage its web archives collection, and within ArchivesSpace, the system that GWLAI uses to describe its archives collections.
Archive-It uses 15 Dublin Core elements. Additional custom metadata fields may also be applied. These metadata fields may be applied at any level of a web archive: collection, seed, or document. SCRC primarily applies metadata at the collection and seed levels. See the Add, edit, and manage your metadata article in the Archive-It Help Center for more information about metadata and Archive-It.
This page synthesizes local application recommendations based upon the Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. 1
Metadata Profile for Web Archives
Element | Required/Optional |
---|---|
Title | Required |
Description | Required |
Collector | Required |
Date | Required |
Language | Required |
Creator | Strongly Encouraged |
Identifier_Collection | Strongly Encouraged |
Subject | Strongly Encouraged |
Source of Description | Optional |
Genre/Form | Optional |
Contributor | Optional |
This metadata profile is primarily designed for individual seeds within web archive collections, rather than entire collections of web archive content. While it shares similarities with metadata used in finding aids/EAD, it focuses specifically on metadata creation for individual seeds.
Metadata should be applied to seeds in Archive-It when they are created. It is strongly encouraged that all new seeds receive all required fields from this profile.
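As a rough illustration of how the profile above could be checked before a seed is saved, the following Python sketch flags missing elements. The `PROFILE` dictionary, the `check_seed_metadata` function, and the record layout are all hypothetical; they are not part of Archive-It or ArchivesSpace and only mirror the table in this section.

```python
# Hypothetical encoding of the metadata profile table; not an Archive-It API.
PROFILE = {
    "Title": "required",
    "Description": "required",
    "Collector": "required",
    "Date": "required",
    "Language": "required",
    "Creator": "strongly encouraged",
    "Identifier_Collection": "strongly encouraged",
    "Subject": "strongly encouraged",
    "Source of Description": "optional",
    "Genre/Form": "optional",
    "Contributor": "optional",
}


def check_seed_metadata(record):
    """Return profile elements that are missing or blank, grouped by level."""
    missing = {"required": [], "strongly encouraged": []}
    for element, level in PROFILE.items():
        if level == "optional":
            continue  # optional elements are never reported
        values = record.get(element, [])
        # Accept either a single string or a list of repeated values.
        if isinstance(values, str):
            values = [values]
        if not any(value.strip() for value in values):
            missing[level].append(element)
    return missing


# Example seed record using the sample values from this page.
seed = {
    "Title": "Office of the President website",
    "Description": "Website of the Office of the President, George Washington University.",
    "Collector": "Special Collections Research Center. The George Washington University",
    "Date": "Captured 2024-ongoing",
    "Language": ["English"],
    "Creator": "George Washington University. Office of the President",
}
print(check_seed_metadata(seed))
```

For this sample record, all required elements are present and the two absent strongly encouraged elements (Identifier_Collection and Subject) are reported.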
Title
Required | Explanation |
---|---|
Definition | The name by which an archived website or collection is known. |
Standard Usage for SCRC | Office of the President website |
Standards | DACS 2.3 |
Guidance | Transcribe directly from the head of the homepage (website) or inspect the homepage for a relevant metatag or other related element. |
Description
Required | Explanation |
---|---|
Definition | One or more notes explaining the content, context, and other aspects of an archived website or collection. |
Example Usage for SCRC | Website of the Office of the President, George Washington University. Contains information about the President’s office and community messages published by the President. |
Standards | DACS 3.1 |
Guidance | Briefly describe the scope and content of the archived website: what the website is about, its purpose, and who created it. |
Collector
Required | Explanation |
---|---|
Definition | Organization responsible for collecting the archived content. |
Example Usage for SCRC | - Special Collections Research Center. The George Washington University - George Washington University Libraries |
Standards | DACS 2.2 |
Guidance | Identify the institution responsible for selecting websites for archiving, crawling the websites, and creating and maintaining the metadata that describes the content. |
Note | Use SCRC when content falls under SCRC collecting scope. Use GW Libraries when content falls outside of SCRC collecting scope. |
Date
Required | Explanation |
---|---|
Definition | A single date or span of dates associated with the capture of an archived website or collection. |
Example Usage for SCRC | - Date first crawled: 2024-11-01 - Captured 2024-ongoing - Captured 2021-2024 |
Standards | DACS 2.4 |
Guidance | For non-scheduled crawls, use single dates. For seeds that are scheduled, use an ongoing date. If a scheduled seed becomes inactive, use an end date as part of a date range. |
Note | Do not include dates outside the range of the archived content. For example, if a website was first crawled in 2024, but the website was initially created in 2004, only use the 2024 date. |
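
The date guidance above can be sketched as a small Python helper. The function name `format_capture_date` and its parameters are assumptions made for illustration; the output strings follow the example usage in the Date table, and only crawl dates (never the website's creation date) are passed in.

```python
from datetime import date
from typing import Optional


def format_capture_date(
    first_crawl: date,
    scheduled: bool,
    last_crawl: Optional[date] = None,
) -> str:
    """Build a Date value from crawl dates only (hypothetical helper)."""
    if last_crawl is not None and last_crawl < first_crawl:
        raise ValueError("last_crawl cannot be earlier than first_crawl")
    if not scheduled:
        # Non-scheduled (one-time) crawl: use a single date.
        return f"Date first crawled: {first_crawl.isoformat()}"
    if last_crawl is None:
        # Scheduled seed that is still active: use an ongoing date.
        return f"Captured {first_crawl.year}-ongoing"
    # Scheduled seed that has become inactive: close the range with an end date.
    return f"Captured {first_crawl.year}-{last_crawl.year}"


print(format_capture_date(date(2024, 11, 1), scheduled=False))
print(format_capture_date(date(2024, 3, 5), scheduled=True))
print(format_capture_date(date(2021, 1, 15), scheduled=True, last_crawl=date(2024, 6, 30)))
```

How to handle a scheduled seed whose first and last crawls fall in the same year is not covered by the guidance, so the sketch does not special-case it.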
Language
Required | Explanation |
---|---|
Definition | The language(s) of the archived content. |
Standard Usage for SCRC | - English - Spanish - Chinese |
Standards | DACS 4.5 |
Guidance | This field may be repeated as many times as necessary to capture the languages used throughout the archived content. Use the English name of the language from ISO 639-2 (not the ISO codes). |
Creator
Strongly Encouraged | Explanation |
---|---|
Definition | An organization or person principally responsible for creating the intellectual content of an archived website or collection. |
Standard Usage for SCRC | George Washington University. Office of the President |
Standards | DACS 2.6 |
Guidance | The creator of a single website, such as an institutional home page, blog, or Twitter feed, is usually easy to identify unless purposely anonymous, while a collection of websites focused on a current event or topic rarely has an overall creator. See also: Contributor. |
Identifier_Collection
Strongly Encouraged | Explanation |
---|---|
Definition | The collection identifier associated with the archived website. Use the resource record that |
Example Usage for SCRC | - RG002 - MS2285 - NEA1011-RG |
Standards | DACS 2.1 |
Guidance | Identify the creator of the website first, then see if there is a collection for that creator. |
Note | May use multiple collection identifiers if the archived website is associated with multiple collections. |
Subject
Strongly Encouraged | Explanation |
---|---|
Definition | Primary topic(s) describing the content of an archived website or collection. |
Example Usage for SCRC | - George Washington University - Education, Higher – United States - Cross-country running |
Guidance | Identify the creator of the website first, then see if there is a collection for that creator. |
Note | Identify topical subjects, geographic locations, and people and organizations relevant to the content of the collection. Use subjects already present in ArchivesSpace. If not present, use FAST or LCSH. |
Source of Description
Optional | Explanation |
---|---|
Definition | Information about the gathering or creation of the metadata itself, such as sources of data or the date on which source data was obtained. |
Example Usage for SCRC | Description based on archived webpage captured on November 6, 2024. |
Standards | DACS 7.1.8 |
Guidance | Added value. Use when the website has been crawled over a long period of time and has undergone many changes. |
Genre/Form
Optional | Explanation |
---|---|
Definition | The type of content in an archived website or collection. |
Example Usage for SCRC | - Website - News article - Social Media |
Note | At present, do not use for individual seeds/websites unless the format is uncommon. This is mostly relevant to collection-level metadata or description for web archives in ArchivesSpace (finding aids). |
Contributor
Optional | Explanation |
---|---|
Definition | An organization or person secondarily responsible for the content of an archived website or collection. |
Example Usage for SCRC | Knapp, Steven, 1951- |
Standards | DACS 2.6 |
Guidance | If two or more entities share principal responsibility, place them all in the Creator field. Otherwise, place one in the Contributor element. Use Contributor for all entities that have secondary responsibility. |
Note | Use agent records from ArchivesSpace. If an agent record is not present, please reach out to the archivist responsible for the relevant collection area to discuss adding one. When no agent record is present in ArchivesSpace, use LCNAF. |
1. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.2