Internet Histories
Digital Technology, Culture and Society
Volume 7, 2023 - Issue 4
Research Articles

Sorting URLs out: seeing the web through infrastructural inversion of archival crawling

Pages 386-401 | Received 13 Apr 2023, Accepted 11 Sep 2023, Published online: 16 Sep 2023
 

Abstract

Web archive collections have become important sources for Internet scholars by documenting past versions of web resources. Understanding how these collections are created and curated is of increasing concern, and recent web archives scholarship has studied how the artefacts stored in archives represent specific curatorial choices and collecting practices. This paper takes a novel approach to studying web archiving practice, by focusing on the challenges encountered in archival web crawling and what they reveal about the web itself. Inspired by foundational work in infrastructure studies, infrastructural inversion is applied to study how crawler interactions surface otherwise invisible, background or taken-for-granted aspects of the web. This framework is applied to study three examples selected from interviews and ethnographic fieldwork observations of web archiving practices at the Danish Royal Library, with findings demonstrating how the challenges of archival crawling illuminate the web’s varied actors, as well as their changing relationships, power differentials and politics. Ultimately, analysis through infrastructural inversion reveals how collection via crawling positions archives as active participants in web infrastructure, both shaping and shaped by the needs and motivations of other web actors.

Acknowledgements

Many thanks to all the participants at the Netarchive for their time, to Zoe LeBlanc, Katie Mackinnon and Karen Wickett for their feedback on an early draft of this article, and to the anonymous reviewers for their helpful comments and suggestions throughout the review process.

Disclosure statement

No potential conflict of interest was reported by the author.

Notes

1 For a more thorough account of the Netarchive’s processes and collecting history, see Schostag and Fønss-Jørgensen (2012), and Laursen and Møldrup-Dalum (2017).

2 An average of two to three event harvests are conducted each year, including both predictable events like regional and national elections, national celebrations or sporting events, as well as unpredictable events such as the financial crisis of 2008, the swine flu outbreak in 2009, a national teacher lockout in 2013, and terrorist attacks in Copenhagen in 2015.

3 See W3C’s historic document on HTTP status codes (https://www.w3.org/Protocols/http/HTRESP.html) and RFC 1945 HTTP/1.0 (https://www.ietf.org/rfc/rfc1945.txt).

4 IANA maintains a registry of current codes and their descriptions: https://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml

5 CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart,” and Justie (2021) presents an in-depth history of various CAPTCHA technologies and their implementation.

Additional information

Funding

Social Sciences and Humanities Research Council of Canada, Canada Graduate Scholarship 767-2015-2217 and Michael Smith Foreign Study Supplement.

Notes on contributors

Emily Maemura

Emily Maemura is Assistant Professor in the School of Information Sciences at the University of Illinois Urbana-Champaign. Her research focuses on data practices and the activities of curation, description, characterization, and re-use of archived web data. She is interested in approaches and methods for working with archived web data in the form of large-scale research collections, considering diverse perspectives of the internet as an object and site of study.
