B2 Ingest: creation of the AIP (BA)

B2.1 Repository has an associated, printable definition for each AIP or class of AIPs preserved by the repository that is adequate to fit long-term preservation needs.

Supporting Text
This is necessary to ensure that the AIP and its associated definition can always be found and managed within the archives.

B2.1.1 The repository must be able to demonstrate which definition applies to which AIP.

Supporting Text
This is necessary to ensure each AIP can be properly parsed/interpreted.

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement

Documentation clearly linking each definition to a specific AIP or class of AIPs and linking that AIP(s) back to the SIP(s) from which it was created.

Supporting Text

It is necessary to create definitions for each AIP and to demonstrate the clear persistent link between the definition, the SIP(s), and the AIP(s) that were created from the SIP(s).

B2.1.2 The repository must be able to demonstrate that the definition of each AIP is adequate for long term preservation by demonstrating that it has all the required components, each of which can be maintained over time.

Supporting Text
This is necessary in order to explicitly show that the AIPs are fit for purpose, that each component of an AIP has been adequately thought through and the plans for the maintenance of each AIP are in place. (See B3 Preservation planning, below.)

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Documentation identifying each class of AIP and describing how each is implemented within the repository. Implementations may, for example, involve some combination of files, databases, and/or documents. Documentation that relates the AIP component's contents to the related preservation needs of the repository, with enough detail for the repository's providers and consumers to be confident that the significant properties of AIPs will be preserved. It should be clear how AIP components such as Representation Information and Provenance can be managed and kept up to date. It should be clear when new versions of AIPs need to be created in order to keep them fit for purpose. The external dependencies of the AIP should also be recorded.


Discussion
It is necessary that definitions exist for each AIP, or class of AIP if there are many instances of the same type. Repositories that store a wide variety of object types may need a specific definition for each AIP they hold, but it is expected that most repositories will establish class descriptions that apply to many AIPs. It must be possible to determine which definition applies to which AIP. It may also be necessary for the definitions to say something about the semantics or intended use of the AIPs if this could affect long-term preservation decisions.

For example, say two repositories both only preserve digital still images, both using multi-image TIFF files as their preservation format. Repository 1 consists entirely of real-world photographic images intended for viewing by people and has a single definition covering all of its AIPs. (The definition may refer to a local or external definition of the TIFF format.) Repository 2 contains some images, such as medical x-rays, that are intended for computer analysis rather than viewing by the human eye, and other images that are like those in Repository 1. Repository 2 should perhaps define two classes of AIPs, even though it only uses one storage format for both. A future preservation action may depend on the intended use of the image: an action that changes the bit-depth of the image in a way that is not perceivable to the human eye may be satisfactory for real-world photographs but not for medical images, for example.

An AIP contains these key components: the primary data object to be preserved, its supporting Representation Information (format and meaning of the format elements), and the various categories of Preservation Description Information (PDI) that also need to be associated with the primary data object: Fixity, Provenance, Context, and Reference. There should be a definition of how these categories of information are linked.
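
As an illustration only, the following Python sketch shows one way a repository might record which definition applies to which AIP (B2.1.1) and check that a definition covers every key component listed above (B2.1.2). The class name, field names, component labels, and identifiers are all hypothetical; nothing here is prescribed by this document.

```python
from dataclasses import dataclass

# Key AIP components named in the discussion above (labels are illustrative).
REQUIRED_COMPONENTS = frozenset({
    "data_object",
    "representation_information",
    "fixity",
    "provenance",
    "context",
    "reference",
})


@dataclass(frozen=True)
class AIPDefinition:
    """Hypothetical definition for one class of AIPs."""
    name: str
    intended_use: str
    components: frozenset


def missing_components(definition):
    """Return the required components the definition does not cover, sorted."""
    return sorted(REQUIRED_COMPONENTS - definition.components)


def definition_for(aip_id, aip_to_class, definitions):
    """Look up which definition applies to a given AIP."""
    return definitions[aip_to_class[aip_id]]


# Example mirroring Repository 2: one storage format, two AIP classes.
definitions = {
    "photo-tiff": AIPDefinition("photo-tiff", "human viewing", REQUIRED_COMPONENTS),
    "xray-tiff": AIPDefinition(
        "xray-tiff", "computer analysis", REQUIRED_COMPONENTS - {"context"}
    ),
}
aip_to_class = {"aip-0001": "photo-tiff", "aip-0002": "xray-tiff"}
```

Here the second definition deliberately omits Context, so missing_components reports the gap before any AIP of that class is created.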

B2.2 Repository has a description of how AIPs are constructed from SIPs.

Supporting Text
The repository must be able to show how the AIP(s) is constructed from the SIP(s). This is necessary in order to ensure that the AIP(s) adequately represents the information in the SIP(s).

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Process description documents; documentation of SIP relationship to AIP; clear documentation of how AIPs are derived from SIPs; documentation of standard/process against which normalization occurs; documentation of normalization outcome and how outcome is different from SIP.


Discussion
In some cases, the AIP and SIP will be almost identical apart from packaging and location, and the repository need only state this. In other cases, complex transformations (e.g., data normalization) may be applied to objects during the ingest process, and a precise description of these actions may be necessary to reflect how the AIP(s) has been adequately transformed from the information in the SIP(s). The AIP construction description should include documentation that gives a detailed description of the ingest process for each SIP to AIP transformation, typically consisting of an overview of general processing being applied to all such transformations, augmented with description of different classes of such processing and, when applicable, with special transformations that were needed.

Some repositories may need to produce these complex descriptions case by case, in which case diaries or logs of actions taken to produce each AIP will be needed. In these cases, documentation needs to be mapped to individual AIPs, and the mapping needs to be available for examination. Other repositories that can run a more production-line approach may have a description for how each class of incoming object is transformed to produce the AIP. It must be clear which definition applies to which AIP. If, to take a simple example, two separate processes each produce a TIFF file, it must be clear which process was applied to produce a particular TIFF file.

B2.3 Repository can demonstrate that all accepted SIPs are either incorporated into one or more AIPs or otherwise disposed of in a recorded fashion.

In particular the following aspect must be checked.

B2.3.1 The repository must follow documented procedures in the event that a SIP is discarded.

Supporting Text
This is necessary in order to ensure that the SIPs received have been dealt with appropriately, and in particular have not been accidentally lost.


Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
System processing files; disposal records; donor or depositor agreements/deeds of gift; provenance tracking system; system log files. Process description documents; documentation of SIP relationship to AIP; clear documentation of how AIPs are derived from SIPs; documentation of standard/process against which normalization occurs; documentation of normalization outcome and how the resulting AIP is different from the SIP(s).


Discussion
The timescale of this process will vary between repositories from seconds to many months, but SIPs must not remain in an unprocessed limbo-like state forever. The accessioning procedures and the internal processing and audit logs should maintain records of all internal transformations of SIPs to demonstrate that they either become AIPs (or part of AIPs) or are disposed of. Appropriate descriptive information should also document the provenance of all digital objects.
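
A minimal sketch of such a check, assuming the repository can export three simple structures from its own logs: the list of accepted SIPs, a mapping from each AIP to the SIPs it was built from, and its disposal records. All names and identifiers below are hypothetical.

```python
def unaccounted_sips(accepted_sips, aip_sources, disposal_records):
    """Return accepted SIP ids that are neither part of an AIP nor recorded as disposed."""
    incorporated = set()
    for source_sips in aip_sources.values():
        incorporated.update(source_sips)
    accounted = incorporated | set(disposal_records)
    return sorted(set(accepted_sips) - accounted)


# Illustrative data: sip-4 was accepted but never processed or disposed of.
accepted = ["sip-1", "sip-2", "sip-3", "sip-4"]
aip_sources = {"aip-A": ["sip-1", "sip-2"]}
disposals = {"sip-3": "duplicate of sip-1; discarded under documented procedure"}

print(unaccounted_sips(accepted, aip_sources, disposals))
```

Any identifier the function returns is a SIP still in the unprocessed state described above and needs either ingest or a recorded disposal.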

B2.4 Repository has and uses a convention that generates persistent, unique identifiers for all AIPs.

In particular the following aspects must be checked.

B2.4.1 The repository must be able to show how any AIP can be uniquely identified within the repository.

B2.4.1.1 The repository must be able to demonstrate that the identifiers are unique.

B2.4.1.2 Documentation must show how the persistent identifiers of the AIP and its components are assigned and maintained so as to be unique within the context of the repository.

B2.4.1.3 Documentation must also describe any processes used for changes to such identifiers.

B2.4.1.4 The repository must be able to provide a complete list of all such identifiers and do spot checks for duplications.

B2.4.1.5 The system of identifiers must be seen to fit the repository's current and foreseeable future requirements for things like numbers of objects.

Supporting Text
These requirements are necessary in order to ensure that each AIP can be unambiguously found in the future. They also are necessary to ensure that each AIP can be distinguished from all other AIPs in the repository.

B2.4.2 The repository must have a system of reliable linking/resolution services in order to find the uniquely identified object, regardless of its physical location.

Supporting Text
This is so that actions relating to AIPs can be traced over time, over system changes, and over storage changes.


Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Documentation describing naming convention and physical evidence of its application (e.g., logs).


Discussion
A repository needs to ensure that an accepted, standard naming convention is in place that identifies its materials uniquely and persistently for use both in and outside the repository. The "visibility" requirement here means "visible" to repository managers and auditors. It does not imply that these unique identifiers need to be visible to end users or that they serve as the primary means of access to digital objects. Ideally, the unique ID lives as long as the AIP; if it does not, there must be traceability. Note that B2.1 requires that the components of an AIP be suitably bound and identified for long-term management, but places no restrictions on how AIPs are identified with files. Thus, in the general case, an AIP may be distributed over many files, or a single file may contain more than one AIP. Therefore, identifiers and filenames may not necessarily correspond to each other. Documentation must represent these relationships.
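
The duplicate check and spot check asked for in B2.4.1.4, together with the resolution requirement in B2.4.2, could be sketched as follows. This is an illustration under stated assumptions: the identifier list is available as a plain Python list, and "resolve" is a hypothetical callable that returns a storage location or None. The identifiers and paths are invented.

```python
from collections import Counter
import random


def duplicate_identifiers(identifiers):
    """Return identifiers that occur more than once in the complete list."""
    counts = Counter(identifiers)
    return sorted(i for i, n in counts.items() if n > 1)


def spot_check(identifiers, resolve, sample_size, seed=None):
    """Sample distinct identifiers and return those that fail to resolve."""
    rng = random.Random(seed)
    unique = sorted(set(identifiers))
    sample = rng.sample(unique, min(sample_size, len(unique)))
    return sorted(i for i in sample if resolve(i) is None)


# Illustrative data: one duplicated identifier and one that resolves nowhere.
ids = ["aip-001", "aip-002", "aip-002", "aip-003"]
locations = {"aip-001": "/store/a", "aip-002": "/store/b"}

print(duplicate_identifiers(ids))
print(spot_check(ids, locations.get, sample_size=10, seed=0))
```

Because identifiers and filenames need not correspond, the resolver is kept separate from the identifier list; the mapping between the two is what the documentation must describe.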

B2.5 Repository demonstrates that it has access to necessary tools and resources to provide authoritative Representation Information for all of the digital objects it contains.

In particular the following aspects must be checked.

B2.5.1 The repository must have tools or methods to identify the file type of all submitted Data Objects.

B2.5.2 The repository must have tools or methods to determine what Representation Information is necessary to make each Data Object understandable to the Designated Communit(ies).

B2.5.3 The repository must ensure that it has access to the requisite Representation Information.

B2.5.4 The repository must have tools or methods to ensure that the requisite Representation Information is persistently associated with the relevant Data Objects.

Supporting Text
These are necessary in order to ensure that the repository's digital objects are understandable to the Designated Communit(ies).


Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Subscription or access to such registries; association of unique identifiers to registries of Representation Information (including format registries); viewable records in local registries (with persistent links to digital objects); database records that include Representation Information and a persistent link to relevant digital objects.


Discussion
These tools and resources can be held internally or can be shared via, for example, a trusted set of registries. However, this requirement does not demand that each repository has such tools and resources, merely that it has access to them. The Global Digital Format Registry (GDFR), the UK National Archives' file format registry PRONOM, and the UK Digital Curation Centre's Registry Repository of Representation Information (RRORI) are three emerging examples of external registries a repository might adopt. Any such registry is a specialized type of repository, which itself must be certified/trusted. The repository may use these types of standardized, authoritative information sources to identify and/or verify the Representation Information components of Content Information and PDI. This will reduce the long-term maintenance costs to the repository and improve quality control. Sometimes there is both general Representation Information (e.g., format information) and also specific Representation Information (e.g., meanings of individual fields within a dataset). Often the general information will be available in an external repository, but the local repository may need to maintain the instance-specific information. It is likely that many repositories would wish to keep local copies of relevant Representation Information; however, this may not be practical in all cases. Even where a repository strives to keep all such information locally there may be, for example, a schedule of updates which means that until an update is performed, the local Representation Information is incomplete. This may be regarded as a kind of local caching of, for example, the Representation Information held in registries. Alternatively, one may say that in these cases, the use of international registries is not meant to replace local registries but instead serve as a resource to verify or obtain independent, authoritative information about any and all Representation Information.
Good practice suggests that any locally held Representation Information should also be made available to other repositories via a trusted registry. In addition any item of Representation Information should itself have adequate Representation Information to ensure that the Designated Community can understand and use the data object being preserved.
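
As a rough illustration of B2.5.1, a file type can often be recognised from the leading bytes of a Data Object. The sketch below uses a tiny hand-written signature table; it is an assumption for illustration, not a substitute for the signature sets maintained by registries such as those named above, and the function name is hypothetical.

```python
# Leading-byte signatures for a few common formats (illustrative subset only).
SIGNATURES = [
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
]


def identify(data):
    """Return a format label for the given bytes, or None if unrecognised."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None


print(identify(b"II*\x00\x08\x00\x00\x00"))
print(identify(b"plain text with no known signature"))
```

An unrecognised object (a None result) is the case in which the repository would need to consult further tools or registries before it can decide what Representation Information is required.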

B2.6 Repository has documented processes for acquiring Preservation Description Information (PDI) for its associated Content Information and acquires PDI in accordance with the documented processes.

In particular the following aspects must be checked.

B2.6.1 The repository has documented processes for acquiring PDI.

B2.6.2 The repository must execute its documented processes for acquiring PDI.

B2.6.3 The repository must ensure that the PDI is persistently associated with the relevant Content Information.

Supporting Text
These requirements are necessary in order to ensure that an auditable trail to support claims of authenticity is available, that unauthorized changes to the digital holdings can be detected, and that the digital objects can be identified and placed in their appropriate context.


Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Standard operating procedures; manuals describing ingest procedures; viewable documentation on how the repository acquires and manages Preservation Description Information (PDI); creation of checksums or digests; consultation with the Designated Community about Context.


Discussion
PDI is needed not only by the repository to help ensure the Content Information is not corrupted (Fixity) and is findable (Reference Information), but to help ensure the Content Information is adequately understandable by providing a historical perspective (Provenance Information) and by providing relationships to other information (Context Information). The extent of such information needs is best addressed by members of the Designated Communit(ies). The PDI must be permanently associated with Content Information.
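
For the Fixity category, one common approach is to compute a cryptographic digest when the Content Information is received and to store it alongside the object. The sketch below uses SHA-256 from Python's standard hashlib module; the choice of algorithm, the record layout, and the function names are assumptions made for illustration.

```python
import hashlib
import io


def sha256_of(stream, chunk_size=65536):
    """Compute the SHA-256 hex digest of a binary stream, reading in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def make_fixity_record(object_id, stream):
    """Build a hypothetical Fixity record to be stored with the object."""
    return {"object_id": object_id, "algorithm": "sha256", "digest": sha256_of(stream)}


def verify_fixity(record, stream):
    """Return True if the stream still matches the recorded digest."""
    return sha256_of(stream) == record["digest"]


# Illustrative use with in-memory content standing in for a stored file.
record = make_fixity_record("obj-1", io.BytesIO(b"example content"))
print(verify_fixity(record, io.BytesIO(b"example content")))
print(verify_fixity(record, io.BytesIO(b"altered content")))
```

Keeping the object identifier inside the record is one simple way to keep the PDI associated with the Content Information it describes.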

B2.7 Repository ensures that Content Information of the AIPs is understandable for their Designated Communities at the time of creation of the AIP.

In particular the following aspects must be checked.

B2.7.1 Repository has a documented process for testing understandability for their Designated Communities of the Content Information of the AIPs at their creation.

B2.7.2 The repository must execute the testing process for each class of Content Information of the AIPs.

B2.7.3 The Repository must bring the Content Information up to the required level of understandability when the AIP fails understandability testing.

Supporting Text
These requirements are necessary in order to ensure that one of the primary tests of preservation, namely that the digital holdings are understandable by their Designated Communit(ies), can be met. See B3 for additional requirements for understandability beyond ingest.


Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Test procedures to be run against the digital holdings to ensure their understandability to the defined Designated Communities and their knowledge bases; records of such tests being performed and evaluated; evidence of gathering or identifying Representation Information to fill any intelligibility gaps which have been found; retention of individuals with the discipline expertise.


Discussion
These requirements are concerned with the understandability of the AIP. If the ingested material is not understandable, the repository needs to ingest or make available additional information to make sure that the AIPs are understandable to the Designated Communit(ies). For example, if documents are written in a dying language and the Designated Communit(ies) are no longer able to understand the language the documents are written in, the repository would need to provide additional documentation that would allow the Designated Communit(ies) to understand the documents (e.g., translations of the documents in a language the Designated Communit(ies) could understand or dictionaries that would allow the Designated Communit(ies) to translate the documents into a language its members understand).

B2.8 Repository verifies each AIP for completeness and correctness at the point it is created.

Supporting Text
The repository must be sure that the AIPs it creates are as they are expected to be by checking them against the associated definition for each AIP or class of AIP (see B2.1) and the description of how AIPs are constructed from SIPs (see B2.2). This is necessary in order to ensure that what is maintained over the long term is as it should be and can be traced to the information provided by the Producers.


Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Description of the procedure that verifies completeness and correctness of the AIPs; logs of the procedure.


Discussion
If the repository has a standard process to verify SIPs for both completeness and correctness and a demonstrably correct process for transforming SIPs into AIPs (see B2.2), then it simply needs to demonstrate that the initial checks were carried out successfully and that the transformation process was carried out without indicating errors. On the other hand, repositories that must create unique processes for many of their AIPs will also need to generate unique methods for validating the completeness and correctness of AIPs. This may include performing tests of some sort on the content of the AIP that can be compared with tests on the SIP. Such tests might be simple (counting the number of records in a file, or performing some simple statistical measure), but they might be complex. Documentation should describe how the completeness and correctness of AIPs is ensured, starting with receipt from the producer and continuing through AIP creation and supporting long-term preservation. Example approaches include the use of checksums, testing that checksums are still correct at various points during ingest and preservation, logs that such checks have been made, and any special tests that may be required for a particular AIP instance or class.
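
The simple tests mentioned above (a record count and a simple statistical measure) could look like the following sketch. It assumes, purely for illustration, that the SIP is a semicolon-delimited table and the normalized AIP is a comma-delimited table with the same columns; the column name "amount" and the sample data are invented.

```python
import csv
import io


def summarize(text, delimiter, value_column):
    """Return (number of data records, sum of one integer column)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    count = 0
    total = 0
    for row in reader:
        count += 1
        total += int(row[value_column])
    return count, total


def compare_summaries(sip_summary, aip_summary):
    """Return a list of discrepancies between SIP and AIP (empty if none)."""
    problems = []
    if sip_summary[0] != aip_summary[0]:
        problems.append(
            f"record count differs: SIP {sip_summary[0]}, AIP {aip_summary[0]}"
        )
    if sip_summary[1] != aip_summary[1]:
        problems.append(
            f"column total differs: SIP {sip_summary[1]}, AIP {aip_summary[1]}"
        )
    return problems


# Illustrative data: the same two records before and after normalization,
# plus a truncated AIP that lost a record during transformation.
sip_text = "id;amount\n1;10\n2;32\n"
aip_text = "id,amount\n1,10\n2,32\n"
truncated_aip_text = "id,amount\n1,10\n"

print(compare_summaries(summarize(sip_text, ";", "amount"),
                        summarize(aip_text, ",", "amount")))
```

A checksum cannot be compared across a format change, which is why content-level measures of this kind are useful when the AIP is not byte-identical to the SIP.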

B2.9 Repository provides a mechanism for verifying the integrity of the repository collection/content.


Supporting Text
The repository must provide a way to independently demonstrate the completeness and correctness of its collections and their contents. This is necessary to enable the audit of the integrity of the collection as a whole. It is the responsibility of the repository to choose the appropriate form of such a mechanism.


Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Documentation provided for B2.1 through B2.4; documented agreements negotiated between the producer and the repository (see B1.1 through B1.9); logs of material received and associated action (receipt, action, etc.) dates; logs of periodic checks.


Discussion
In general, it is likely that a repository that meets all the previous criteria will satisfy this one without needing to demonstrate anything more. As a separate requirement, it demonstrates the importance of being able to audit the integrity of the collection as a whole. For example, if a repository claims to have all e-mail sent or received by The Yoyodyne Corporation between 1985 and 2005, the preceding requirements already oblige it to show that:

  • The content it holds came from Yoyodyne's e-mail servers.
  • It is all correctly transformed into a preservation format.
  • Each monthly SIP of e-mail has been correctly preserved, including original unique identifiers such as Message-IDs.
However, it may still have no way of showing whether this really represents all of Yoyodyne's e-mail. For example, if there is a three-day period with no messages in the repository, is this because Yoyodyne was shut down for those three days, or was the e-mail lost before the SIP was constructed? This case could be resolved by the repository amending its description of the collection, but other cases may not be so straightforward. A familiar mechanism from the world of traditional materials in libraries and archives is an accessions or acquisitions register that is independent of other catalog metadata. A repository should be able to show, for each item in its accessions register, which AIP(s) contain content from that item. Alternatively, it may need to show that there is no AIP for an item, either because ingest is still in progress, or because the item was rejected for some reason. Conversely, any AIP should be able to be related to an entry in the acquisitions register.
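
The two-way reconciliation between an accessions register and the AIPs can be sketched as below. The register layout (item id mapped to a status string), the status values, and the identifiers are hypothetical; a real register would carry far more detail.

```python
def reconcile(register, aip_items):
    """Cross-check an accessions register against AIP contents.

    register:  item id -> status ("ingested", "in_progress" or "rejected")
    aip_items: AIP id -> list of register item ids whose content it holds
    Returns a list of problems; an empty list means both directions agree.
    """
    held = {item for items in aip_items.values() for item in items}
    problems = []
    # Every register item must be in some AIP or have a recorded reason.
    for item in sorted(register):
        if item not in held and register[item] not in ("in_progress", "rejected"):
            problems.append(f"{item}: no AIP holds this item and no reason is recorded")
    # Conversely, every AIP must relate to at least one register entry.
    for aip in sorted(aip_items):
        if not any(item in register for item in aip_items[aip]):
            problems.append(f"{aip}: cannot be related to any register entry")
    return problems


# Illustrative data: acc-3 has gone missing and aip-B has no known origin.
register = {"acc-1": "ingested", "acc-2": "rejected", "acc-3": "ingested"}
aip_items = {"aip-A": ["acc-1"], "aip-B": ["acc-9"]}

for problem in reconcile(register, aip_items):
    print(problem)
```

Keeping the register independent of the catalog metadata, as described above, is what gives a check like this its value as an audit of the collection as a whole.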

B2.10 Repository has contemporaneous records of actions and administration processes that are relevant to AIP creation.


Supporting Text
The repository must create records of its preservation related activities essentially as they happen. This is necessary in order to be sure that nothing relevant is omitted from the record that might be needed to provide an independent means to verify that all AIPs have been properly created in accord with the documented procedures (see B2.1 through B2.9). It is the responsibility of the repository to justify its practice in this respect.


Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Written documentation of decisions and/or action taken; preservation metadata logged, stored, and linked to pertinent digital objects.


Discussion
These records must be created on or about the time of the actions they refer to and are related to actions associated with AIP creation. The records may be automated or may be written by individuals, depending on the nature of the actions described. Where community or international standards are used, the repository must demonstrate that all relevant actions are carried through.

Topic revision: r3 - 2009-04-20 - BruceAmbacher
 