4 - Collection methods

Published online by Cambridge University Press: 08 June 2018

Summary

Introduction

This chapter describes the methods available for collecting websites for archival purposes. The variety of approaches is dictated by the nature of web technology itself, so the chapter begins with a summary of website technology before describing each collection method in detail, together with its strengths and limitations. The design of a website can be an important factor in determining both the ease with which it can be collected and the range of methods appropriate. The chapter therefore also considers how webmasters can create ‘archive-friendly’ websites.

The technology of the web

The experience of using the world wide web arises from the interplay between two fundamental components – the web server and the web client, such as a web browser. A web server stores content, such as HTML pages and images, which it delivers, or ‘serves’, to a web browser in response to requests from that browser. A web browser requests content from web servers, and renders that received content for the user. The interaction between these two components is therefore as significant as the components themselves. Some form of communications protocol provides the mechanism by which this interaction takes place. The protocol defines a standard format for communications between the server and the browser. The most commonly used protocol on the web is the hypertext transfer protocol (HTTP). Thus, when a browser sends a request to a server, that request takes the form of an HTTP ‘message’, as does the reply from the server.
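The exchange of HTTP messages described above can be illustrated with a minimal sketch. It is not taken from the chapter: the function names `build_request` and `parse_response`, the host `www.example.org` and the sample reply are all assumptions chosen for illustration, and the code only builds and reads message text rather than contacting a real server.

```python
def build_request(host, path="/"):
    """Compose a minimal HTTP/1.1 GET request message as text."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, target, protocol version
        f"Host: {host}",         # HTTP/1.1 requires a Host header
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response(message):
    """Split a response message into status code, headers and body."""
    head, _, body = message.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    _version, code, *_reason = status_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


# Hypothetical exchange: what a browser might send, and a reply a server might give.
request = build_request("www.example.org", "/index.html")
reply = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
status, headers, body = parse_response(reply)
```

The point of the sketch is that both directions of the conversation use the same kind of structured text message, which is why a collection tool can stand in for a browser simply by sending well-formed requests.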

All the content available on a web server is identified using a uniform resource locator (URL) – a reference which describes where on the web that content is located (see Chapter 3, Selection, for a more detailed discussion of URLs). The nature of URLs is one of the defining characteristics of the web, and creates a very indirect relationship between browsers and servers. Neither the browser nor the server needs to know anything about the other, beyond the information contained within the HTTP message. Thus, a browser requests content by sending an HTTP request containing the relevant URL.
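As a rough illustration of how a URL supplies everything needed for a request, the sketch below uses Python's standard `urllib.parse.urlsplit` to separate a URL into the server to contact and the resource to ask for. The helper name `request_target` and the example URLs are hypothetical, not drawn from the chapter.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return (host, target): which server to contact and what to ask it for."""
    parts = urlsplit(url)
    # An empty path means the root of the site.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The fragment (the part after '#') is used only by the browser
    # and is not included in the request sent to the server.
    return parts.netloc, target


host, target = request_target("http://www.example.org/archive/page.html?year=2006#top")
# host is the server named in the URL; target is what goes on the request line.
```

Nothing else about the client needs to be known to the server, which is the indirect relationship the paragraph above describes.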

Type: Chapter
Information: Archiving Websites: a practical guide for information management professionals, pp. 42-68
Publisher: Facet
Print publication year: 2006