
Targets

Flow diagram: https://raw.github.com/wiki/ukwa/w3act/flows/flow-targets.png

Adding An Entry

This is a crucial user workflow - adding an entry to W3ACT. In the prototype, this is a single-page form, but in this version we need to break the workflow down into two stages:

  1. Look up a URI.
  2. If appropriate, add a new entry.

Stage 1 - Look up a URI

The user starts by looking up the URL of interest. This gives a status check, and allows the user to decide whether they need to create an entry for this item.

Here is an example from the current prototype:

http://www.webarchive.org.uk/act/websites/url-search?url=http%3A//www.bbc.co.uk/news/

This page allows the user to look up a URL in a way that is compatible with a bookmarklet or other RESTful services.
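
As an illustration, a bookmarklet or any other RESTful client only needs to construct this lookup URL with the target URL passed as an encoded url query parameter. A minimal sketch in Java, using the prototype endpoint shown above (the path in the new version may differ):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlLookup {

    // Lookup endpoint taken from the prototype example above; the path in
    // the new version may differ.
    static final String LOOKUP_ENDPOINT =
            "http://www.webarchive.org.uk/act/websites/url-search";

    // Builds the lookup URL for a given target URL, e.g. from a bookmarklet
    // or another RESTful client.
    static String lookupUrlFor(String targetUrl) {
        return LOOKUP_ENDPOINT + "?url="
                + URLEncoder.encode(targetUrl, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(lookupUrlFor("http://www.bbc.co.uk/news/"));
    }
}
```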

From here, they can find out:

  • If the URI, or a closely related URI (e.g. same domain), already has an entry in ACT.
  • If the URI has been crawled (hook into Monitrix - see Data sources).
  • If the URI is available from any of the Wayback instances (i.e. hooks to the Wayback API - see Data sources); a query sketch follows this list.
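
For the Wayback check, here is a minimal sketch of how a lookup against one instance might work, assuming a CDX-style query endpoint (the placeholder endpoint below stands in for whatever is configured via Data sources):

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class WaybackCheck {

    static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Placeholder CDX endpoint for one Wayback instance; the real endpoints
    // are those listed on the Data sources page.
    static final String CDX_ENDPOINT =
            "http://www.webarchive.org.uk/wayback/cdx";

    // Returns true if the given URL has at least one capture in this
    // Wayback instance (a CDX response lists one capture per line).
    static boolean hasCapture(String url) throws Exception {
        String query = CDX_ENDPOINT + "?url="
                + URLEncoder.encode(url, StandardCharsets.UTF_8) + "&limit=1";
        HttpRequest request = HttpRequest.newBuilder(URI.create(query)).GET().build();
        HttpResponse<String> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200 && !response.body().isBlank();
    }
}
```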

What this page should also do, but does not yet do:

Stage 2 - Adding an entry

This uses the same form as the entry editor (below). However, the URL is passed in and should appear pre-filled in the URL field, and a number of other parameters should also be set.

Adding/Editing A Target

The same form is used to add and edit entries (aiming to avoid user confusion), but some fields may not be editable after the entry has been created. (TO BE SPECIFIED)

In the prototype, this was one long form. In the new version, this should be simplified as much as possible. The editor should present a horizontal array of tabs, one for each section of the editor:

  • Basic information
    • Title
    • URL(s)
    • Live Site status
    • Overall QA status
    • Key Site status
    • WCT/SPT IDs (shown but not editable)
    • Notes
  • Metadata
    • Description
    • Subject
    • Collections
    • Nominating Organisation
  • Crawl permission
    • Parameters relating to crawl legal scope.
    • [Open License Request](#permissions-flow) button.
  • Crawl policy
    • Crawl frequency, start date and end date (to be generalised to allow multiple entries for this set of fields, i.e. a crawl schedule composed of multiple crawl directives; see the data-model sketch after this list)
    • Crawl scope
    • Crawl depth (or rather, crawl cap)
    • Whether to ignore Robots.txt
    • FUTURE: whitelist and/or blacklist URLs/regexes.
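
To make that generalisation concrete, here is a minimal sketch of what a crawl schedule composed of multiple crawl directives might look like; the field names and enum values are illustrative only, not the actual schema:

```java
import java.time.LocalDate;
import java.util.List;

// Illustrative values only; the real options are to be specified.
enum CrawlFrequency { DAILY, WEEKLY, MONTHLY, QUARTERLY, ANNUAL }

enum CrawlScope { ROOT_AND_SUBDOMAINS, SUBDIRECTORY, SINGLE_PAGE }

// One crawl directive: how and when to crawl a Target over a given period.
record CrawlDirective(
        CrawlFrequency frequency,
        LocalDate startDate,
        LocalDate endDate,
        CrawlScope scope,
        int crawlCap,                   // crawl depth, or rather a cap
        boolean ignoreRobotsTxt,
        List<String> whitelistRegexes,  // FUTURE
        List<String> blacklistRegexes   // FUTURE
) {}

// A Target's crawl schedule is composed of multiple crawl directives.
record CrawlSchedule(List<CrawlDirective> directives) {}
```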

Along the bottom, a field should be presented for an optional comment on each revision.

The highest priority is to establish permission to crawl based on the crawl scope policy. A label on the 'Crawl permission' tab should therefore make it clear whether an item currently falls within scope or whether additional permissions are needed.
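
A minimal sketch of the logic that could drive such a label, assuming (purely for illustration) that scope is decided by whether the host falls under a .uk domain; the real decision must follow the crawl scope policy:

```java
import java.net.URI;

public class ScopeCheck {

    // Illustrative rule only: treat .uk hosts as falling within scope and
    // everything else as needing an explicit licence request.
    static boolean fallsWithinScope(String url) {
        String host = URI.create(url).getHost();
        return host != null && host.toLowerCase().endsWith(".uk");
    }

    // Text for the label on the 'Crawl permission' tab.
    static String permissionLabel(String url) {
        return fallsWithinScope(url)
                ? "Within crawl scope - no additional permission required"
                : "Outside crawl scope - licence request needed";
    }
}
```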

TBA

Viewing A Target

This should use the same layout as the editor, but with static labels instead of form elements.

Beneath the tabbed pane, the current set of known Instances of this Target should be shown. The system should check the known [Wayback endpoints](Data sources) whenever a Target page is viewed, and update the list of known Instances based on the currently available data.
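
A minimal sketch of that refresh step, assuming a configured list of Wayback endpoints and a CDX-style capture query along the lines of the Stage 1 sketch (endpoint URLs and response parsing depend on the Data sources configuration):

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class InstanceSync {

    // Placeholder endpoint list; the real list comes from Data sources.
    static final List<String> WAYBACK_ENDPOINTS = List.of(
            "http://www.webarchive.org.uk/wayback");

    // Refreshes the set of known Instances (identified here simply by
    // capture timestamp) whenever a Target page is viewed.
    static Set<String> refreshInstances(String targetUrl, Set<String> knownTimestamps) {
        Set<String> updated = new TreeSet<>(knownTimestamps);
        for (String endpoint : WAYBACK_ENDPOINTS) {
            // Assumed to query the endpoint's CDX interface and return one
            // timestamp per capture, along the lines of the Stage 1 sketch.
            updated.addAll(fetchCaptureTimestamps(endpoint, targetUrl));
        }
        return updated;
    }

    static List<String> fetchCaptureTimestamps(String endpoint, String targetUrl) {
        return List.of(); // placeholder - see the Wayback query sketch above
    }
}
```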

Viewing An Instance

As indicated above, a summary of an Instance may be viewed inline (within the Target page).

Annotating An Instance

Individual Instances of a Target are annotated in order to QA them and to add them to Collections.
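
A minimal sketch of what such an annotation might carry; the field names are illustrative, as the annotation model has not yet been specified:

```java
import java.util.List;

// Illustrative only: the QA outcome and Collection references an
// annotation might record against a single Instance.
enum QaStatus { NOT_CHECKED, PASSED, FAILED }

record InstanceAnnotation(
        QaStatus qaStatus,
        List<String> collections,
        String note
) {}
```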

Permissions Flow

TBA