(Left) Occupy Wall Street, Image Source: Paul Stein, (Middle) Venn diagram of intersecting URLs in three Occupy Wall Street web archives, (Right) Domain breakdown of URLs in three Occupy Wall Street web archives.

Archives Unleashed 4.0, British Library

Collaborators:
  • Sawood Alam, Computer Science, Old Dominion University
  • Gil Hoggarth, Web Archiving, British Library
  • Mat Kelly, Computer Science, Old Dominion University
  • Jessica Ogden, Web Science, University of Southampton
  • Shawn Walker, Information Science, University of Washington
  • Dawn Walker, Information Studies, University of Toronto

Our group came together around a shared interest in how to assess what’s missing from web archives. We’re an interdisciplinary mix of researchers and web archivists with a methodological interest in quickly assessing the presence or absence of URLs and domains within web archive collections.

Given the availability of data, we chose Occupy Wall Street as a case study for assessing this across multiple collections. Occupy Wall Street (OWS) is the name given to a protest movement that began on September 17, 2011, in Zuccotti Park, in New York City’s Wall Street financial district. It received global attention and spawned a movement against economic inequality worldwide (source).

The Goal

Produce a set of recipes and metrics for assessing and calculating URL and domain coverage across web archives. This is relevant to questions of archival staffing, labour and efficiency; redundancy (and its ramifications for resources); and the presence or absence of domains, along with the inherent issues of selection and representativeness in the preservation of web resources.
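
To make the idea of a coverage metric concrete, here is a minimal Python sketch of one possible recipe. It is an illustration under stated assumptions, not the code we wrote at the datathon: the function names, the crude URL normalization rule and the sample URLs are all hypothetical, and it assumes each collection has already been reduced to a plain list of absolute URLs.

```python
from collections import Counter
from urllib.parse import urlsplit


def normalize(url):
    """Crude, illustrative canonicalization (an assumption, not a standard):
    drop the scheme, a leading 'www.', the fragment and any trailing slash.
    Expects absolute URLs that include a scheme."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def domain(url_key):
    """Return the host portion of a normalized URL key."""
    return url_key.split("/", 1)[0].split("?", 1)[0]


def coverage_report(collections):
    """Compare URL sets across named collections.

    `collections` maps a collection name to an iterable of URLs."""
    keys = {name: {normalize(u) for u in urls} for name, urls in collections.items()}
    union = set().union(*keys.values())
    shared = set.intersection(*keys.values()) if keys else set()
    per_collection = {}
    for name, own in keys.items():
        others = set().union(*(v for n, v in keys.items() if n != name))
        per_collection[name] = {
            "urls": len(own),
            # URLs held by this collection and no other
            "unique": len(own - others),
            # share of all known URLs that this collection holds
            "coverage": len(own) / len(union) if union else 0.0,
            "domains": Counter(domain(k) for k in own),
        }
    return {
        "union": len(union),
        "shared_by_all": len(shared),
        "per_collection": per_collection,
    }


# Hypothetical sample data, not drawn from the actual OWS collections.
sample = {
    "archive_a": ["http://www.example.org/", "http://example.org/news/", "http://example.com/a"],
    "archive_b": ["https://example.org", "http://example.org/news"],
    "archive_c": ["http://example.org/", "http://example.net/x?id=1"],
}
report = coverage_report(sample)
print(report["union"], report["shared_by_all"])
```

The "unique" and "shared_by_all" counts correspond to the outer and central regions of a Venn diagram of the collections, and the per-collection "domains" counter gives a domain breakdown. How URLs are normalized strongly affects every number, so any real comparison should state its canonicalization rule explicitly.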

More on the methodology, code and results of our project can be found on the project pages on GitHub.