B2.1 Repository has an associated, written definition for each AIP or class of AIP preserved by the repository that is adequate to fit long-term preservation needs.

Supporting Text

The repository must have an associated, written definition for each AIP or class of information preserved by the repository. An AIP contains these key components: the primary data object to be preserved, its supporting Representation Information (format and meaning of the format elements), and the various categories of Preservation Description Information (PDI) that also need to be associated with the primary data object: Fixity, Provenance, Context, and Reference. There should be a definition of how these categories of information are linked.

This is necessary to ensure that the AIP and its associated definition can always be found and managed within the archive.

It must be possible to determine which definition applies to which AIP.

This is necessary to ensure each AIP can be properly parsed/interpreted.

The repository must be able to demonstrate that the definition of each AIP is adequate for long-term preservation by demonstrating that it has all the required components, each of which can be maintained over time.

This is necessary in order to explicitly show that the AIPs are fit for purpose, that each component of an AIP has been adequately thought through and the plans for the maintenance of each AIP are in place. (See B.3 Preservation planning, below)

-- MarkConrad - 03 Apr 2008 Should this say, "or class of AIP"? -- JohnGarrett - 07 Apr 2008 That seems a good change to me for the Requirement. I would also like to see something more here in the supporting text that is not just a repeat of the requirement. Perhaps the list of key components of an AIP that are needed in the definition. What does the word "identifiable" add by being included here?

This is necessary in order to ensure that the definitions are explicitly, rather than implicitly (e.g. via software), available.

-- JohnGarrett - 07 Apr 2008 I think at least a source software definition might in some cases be a pretty good definition or significant portion of it. -- MarkConrad - 03 Apr 2008 Is this really the reason it is necessary to have a written definition?

Isn't it more along the lines of, "An AIP contains these key components: the primary data object to be preserved, its supporting Representation Information (format and meaning of the format elements), and the various categories of Preservation Description Information (PDI) that also need to be associated with the primary data object: Fixity, Provenance, Context, and Reference. There should be a definition of how these categories of information are bound together and/or related in such a way that they can always be found and managed within the archive."?

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Documentation identifying each class of AIP and describing how each is implemented within the repository. Implementations may, for example, involve some combination of files, databases, and/or documents. Documentation that relates the AIP component’s contents to the related preservation needs of the repository, with enough detail for the repository's providers and consumers to be confident that the significant properties of AIPs will be preserved. It should be clear how AIP components such as Representation Information and Provenance can be managed and kept up to date. It should be clear when new versions of AIPs need to be created in order to keep them fit for purpose. The external dependencies of the AIP should also be recorded.

Discussion
An AIP contains these key components: the primary data object to be preserved, its supporting Representation Information (format and meaning of the format elements), and the various categories of Preservation Description Information (PDI) that also need to be associated with the primary data object: Fixity, Provenance, Context, and Reference. There should be a definition of how these categories of information are bound together and/or related in such a way that they can always be found and managed within the archive.

It is merely necessary that definitions exist for each AIP, or class of AIP if there are many instances of the same type. Repositories that store a wide variety of object types may need a specific definition for each AIP they hold, but it is expected that most repositories will establish class descriptions that apply to many AIPs. It must be possible to determine which definition applies to which AIP.

It may also be necessary for the definitions to say something about the semantics or intended use of the AIPs if this could affect long-term preservation decisions. For example, say two repositories both only preserve digital still images, both using multi-image TIFF files as their preservation format. Repository 1 consists entirely of real-world photographic images intended for viewing by people and has a single definition covering all of its AIPs. (The definition may refer to a local or external definition of the TIFF format.) Repository 2 contains some images, such as medical x-rays, that are intended for computer analysis rather than viewing by the human eye, and other images that are like those in Repository 1. Repository 2 should perhaps define two classes of AIPs, even though it only uses one storage format for both. A future preservation action may depend on the intended use of the image—an action that changes the bit-depth of the image in a way that is not perceivable to the human eye may be satisfactory for real-world photographs but not for medical images, for example.

-- MarkConrad - 03 Apr 2008 I believe the last sentence of the previous paragraph should be moved to the "Supporting Text" section. It should be followed by a sentence that says something like, "This is necessary to ensure each AIP can be properly parsed/interpreted." -- JohnGarrett - 07 Apr 2008 Is this what is meant by "identifiable" in the requirement, i.e. the definitions are linked to the AIPs?

While this requirement is primarily concerned with issues of identifying and binding key components of the AIP, B2.2 places more stringent conditions on the content of the key components to ensure that they are fit for the intended purpose. Separating the two criteria is important, particularly if a repository does not satisfy one of them. It is important to know whether some or all AIPs are not defined, or whether the definitions exist but are not adequate.


-- MarkConrad - 03 Apr 2008 I deleted the previous paragraph because it is a duplicate.
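As an illustration of the B2.1 points that it must be possible to determine which definition applies to which AIP, and that missing definitions should be distinguishable from inadequate ones, the following is a minimal Python sketch of a definition registry. All class, method and component names are hypothetical and assume a simple in-memory model; a real repository might keep the same information in a database or catalogue. The usage lines model Repository 2 from the discussion, which uses one storage format for two classes of AIP.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the components named in B2.1: the primary data
# object, its Representation Information, and the four PDI categories.
REQUIRED_COMPONENTS = frozenset({
    "data_object", "representation_information",
    "fixity", "provenance", "context", "reference",
})


@dataclass(frozen=True)
class AIPClassDefinition:
    name: str
    storage_format: str
    intended_use: str       # may justify separate classes for one format
    components: frozenset   # component categories the definition covers

    def missing_components(self) -> set:
        """Required categories this definition does not yet cover."""
        return set(REQUIRED_COMPONENTS - self.components)


@dataclass
class DefinitionRegistry:
    definitions: dict = field(default_factory=dict)  # class name -> definition
    assignments: dict = field(default_factory=dict)  # AIP id -> class name

    def register(self, definition: AIPClassDefinition) -> None:
        self.definitions[definition.name] = definition

    def assign(self, aip_id: str, class_name: str) -> None:
        if class_name not in self.definitions:
            raise KeyError(f"no definition registered for {class_name!r}")
        self.assignments[aip_id] = class_name

    def definition_for(self, aip_id: str) -> AIPClassDefinition:
        """Which definition applies to this AIP? (KeyError if none.)"""
        return self.definitions[self.assignments[aip_id]]

    def undefined(self, aip_ids) -> list:
        """AIPs for which no definition exists at all."""
        return [a for a in aip_ids if a not in self.assignments]


# Repository 2 from the discussion: one storage format, two AIP classes.
registry = DefinitionRegistry()
registry.register(AIPClassDefinition(
    "photograph", "multi-image TIFF", "human viewing", REQUIRED_COMPONENTS))
registry.register(AIPClassDefinition(
    "x-ray", "multi-image TIFF", "computer analysis", REQUIRED_COMPONENTS))
registry.assign("aip-001", "photograph")
registry.assign("aip-002", "x-ray")
print(registry.definition_for("aip-002").intended_use)  # computer analysis
print(registry.undefined(["aip-001", "aip-002", "aip-003"]))  # ['aip-003']
```

Keeping `undefined()` separate from `missing_components()` mirrors the distinction drawn in the B2.1 discussion between AIPs that have no definition and definitions that exist but are not adequate.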

B2.2 (Removed)

-- MarkConrad - 02 Jun 2008 At our meeting of this date we decided to combine B.2.2 with B.2.1.

Requirement

Repository has a definition of each AIP (or class of AIP) that is adequate to fit long-term preservation needs.

-- BruceAmbacher - 17 Apr 2008 I would prefer "description" or "preservation plan" throughout B2.2 over "definition". Definition seems too narrow a term since the intent is to show what is the repository's preservation program for this AIP. I do not see how a "definition of each AIP" can prove the repository has an adequate preservation plan for that AIP.

Supporting Text

The repository must be able to demonstrate that the definition of each AIP is adequate for long-term preservation by demonstrating that it has all the required components, each of which can be maintained over time.

This is necessary in order to explicitly show that the AIPs are fit for purpose, that each component of an AIP has been adequately thought through and the plans for the maintenance of each AIP are in place.

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Documentation that relates the AIP component’s contents to the related preservation needs of the repository, with enough detail for the repository's providers and consumers to be confident that the significant properties of AIPs will be preserved. It should be clear how AIP components such as Representation Information and Provenance can be managed and kept up to date. It should be clear when new versions of AIPs need to be created in order to keep them fit for purpose. The external dependencies of the AIP should also be recorded.

-- JohnGarrett - 07 Apr 2008 The first sentence seems to be evidence. Maybe the others should be moved to Discussion. -- BruceAmbacher - 17 Apr 2008 I agree. Although it could be worded better. "Documentation that relates each AIP to the preservation plans of the repository, with sufficient detail to demonstrate how the significant properties of each AIP will be preserved."

Discussion
In many cases, if the definitions required by B2.1 exist, this requirement is also satisfied, but it may also be necessary for the definitions to say something about the semantics or intended use of the AIPs if this could affect long-term preservation decisions. For example, say two repositories both only preserve digital still images, both using multi-image TIFF files as their preservation format. Repository 1 consists entirely of real-world photographic images intended for viewing by people and has a single definition covering all of its AIPs. (The definition may refer to a local or external definition of the TIFF format.) Repository 2 contains some images, such as medical x-rays, that are intended for computer analysis rather than viewing by the human eye, and other images that are like those in Repository 1. Repository 2 should perhaps define two classes of AIPs, even though it only uses one storage format for both. A future preservation action may depend on the intended use of the image—an action that changes the bit-depth of the image in a way that is not perceivable to the human eye may be satisfactory for real-world photographs but not for medical images, for example.

-- KatiaThomaz - 20 Mar 2008 - Considering AIP = Content Information + Preservation Description Information (PDI), I wonder if it wouldn't be better to change AIP to Content Information in B2.1.

B2.3 Repository has a description of how AIPs are constructed from SIPs.

Supporting Text

The repository must be able to show how the preserved object(s) (i.e., AIP(s)) are constructed from the object(s) initially submitted (i.e., SIP(s)).

This is necessary in order to ensure that the preserved object(s) (i.e., AIP(s)) adequately represent the information in the submitted object(s) (i.e., SIP(s)).

The repository must be able to show how an AIP is traceable back to the SIPs.

This is necessary in order to be able to audit the correctness, Provenance and Authenticity of the information which is being preserved in any specific AIP.

-- BruceAmbacher 17 Apr 2008 Should this also/primarily address Content? Content is implicit but not explicit. Parts of one AIP may be traceable to one or more SIPs.

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Process description documents; documentation of SIP relationship to AIP; clear documentation of how AIPs are derived from SIPs; documentation of standard/process against which normalization occurs; documentation of normalization outcome and how outcome is different from SIP

Discussion
The repository must be able to show how the preserved object is constructed from the object initially submitted for preservation. In some cases, the AIP and SIP will be almost identical apart from packaging and location, and the repository need only state this. In other cases, complex transformations (e.g., data normalization) may be applied to objects during the ingest process, and a precise description of these actions (i.e., preservation metadata) may be necessary to reflect how the preserved object(s) have been adequately transformed from the information in the submitted object(s).

-- MarkConrad - 03 Apr 2008 The previous sentence does not make sense. How does capturing preservation metadata ensure that the transformations were properly executed?

-- BruceAmbacher - 17 Apr 2008 I agree. Perhaps we just delete "(i.e., preservation metadata)" and make the last word "objects" to reinforce that one AIP may come from parts of one or more SIPs.

-- BruceAmbacher - 06 Apr 2008 I do not agree that complex transformations are more common than SIP to AIP transformations. I suggest deleting "commonly" so it reads: "More complex transformations . . ",

The AIP construction description should include documentation that gives a detailed description of the ingest process for each SIP-to-AIP transformation, typically consisting of an overview of the general processing applied to all such transformations, augmented with descriptions of different classes of such processing and, when applicable, with any special transformations that were needed.

-- MarkConrad - 03 Apr 2008 I do not believe we want the "provenance" of the ingest process. I believe we want a "detailed description" of the ingest process.

Some repositories may need to produce these complex descriptions case by case, in which case diaries or logs of actions taken to produce each AIP will be needed. In these cases, documentation needs to be mapped to individual AIPs, and the mapping needs to be available for examination. Other repositories that can run a more production-line approach may have a description of how each class of incoming object is transformed to produce the AIP. It must be clear which definition applies to which AIP. If, to take a simple example, two separate processes each produce a TIFF file, it must be clear which process was applied to produce a particular TIFF file.
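A minimal Python sketch of the kind of record that makes this traceable, assuming a simple in-memory log; `IngestRecord`, `IngestLog` and the process names are hypothetical. Each AIP records the SIPs it was built from and the identifier and version of the documented process that produced it, so two TIFF-producing processes can be told apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestRecord:
    aip_id: str
    sip_ids: tuple          # an AIP may be built from parts of several SIPs
    process_id: str         # documented transformation that was applied
    process_version: str


class IngestLog:
    """Hypothetical record of which process turned which SIPs into which AIP."""

    def __init__(self):
        self._by_aip = {}

    def record(self, aip_id, sip_ids, process_id, process_version):
        if aip_id in self._by_aip:
            raise ValueError(f"{aip_id!r} already has an ingest record")
        self._by_aip[aip_id] = IngestRecord(
            aip_id, tuple(sip_ids), process_id, process_version)

    def process_for(self, aip_id):
        """Which process (and version) produced this AIP?"""
        rec = self._by_aip[aip_id]
        return rec.process_id, rec.process_version

    def sources_of(self, aip_id):
        """The SIPs this AIP can be traced back to."""
        return self._by_aip[aip_id].sip_ids

    def aips_from(self, sip_id):
        """The AIPs that a given SIP contributed to."""
        return sorted(a for a, rec in self._by_aip.items() if sip_id in rec.sip_ids)


# Two separate processes that both produce TIFF files.
log = IngestLog()
log.record("aip-tiff-1", ["sip-7"], "scan-to-tiff", "2.1")
log.record("aip-tiff-2", ["sip-8", "sip-9"], "jpeg-to-tiff", "1.0")
print(log.process_for("aip-tiff-2"))  # ('jpeg-to-tiff', '1.0')
print(log.aips_from("sip-9"))         # ['aip-tiff-2']
```

Refusing a second record for the same AIP identifier keeps the answer to "which process produced this AIP?" unambiguous.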

-- JohnGarrett - 07 Apr 2008 Even if a pipeline process is defined, the archive will need to record which SIP instances were converted into which AIP instances unless perhaps there are naming conventions of both SIP and AIP instances that allow a generic mapping that can be described.

-- MarkConrad - 23 Apr 2008 John, Doesn't B.2.4. cover your concern?

B2.4 Repository can demonstrate that all submitted objects (i.e., SIPs) are either accepted as whole or part of an eventual archival object (i.e., AIP), or otherwise disposed of in a recorded fashion.

Supporting Text

The repository must be able to show that each SIP has either been used in creating one or more AIPs or else has been discarded (and if so why).

-- MarkConrad - 23 Apr 2008 Should the previous sentence read, "either been used in creating one or more AIPs..."?

This is necessary in order to ensure that the SIPs received have been dealt with appropriately, and in particular have not been accidentally lost.

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
System processing files; disposal records; donor or depositor agreements/deeds of gift; provenance tracking system; system log files; process description documents; documentation of SIP relationship to AIP; clear documentation of how AIPs are derived from SIPs; documentation of standard/process against which normalization occurs; documentation of normalization outcome and how outcome is different from SIP.

-- BruceAmbacher - 06 Apr 2008 All of the examples used in B2.3 also apply as examples here.

-- MarkConrad - 23 Apr 2008 I agree with Bruce.

Discussion
The timescale of this process will vary between repositories from seconds to many months, but SIPs must not remain in a limbo-like state forever. The accessioning procedures and the internal processing and audit logs should maintain records of all internal transformations of SIPs to demonstrate that they either become AIPs (or part of AIPs) or are disposed of. Appropriate descriptive information should also document the provenance of all digital objects.
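The reconciliation described above could be sketched in Python as follows. The record layout (three dictionaries keyed by SIP identifier) and the `max_pending_days` threshold are assumptions; since the acceptable timescale varies by repository, the threshold stands in for a local policy. A SIP counts as disposed only if a non-empty reason was recorded.

```python
from datetime import date


def reconcile_sips(received, accepted, disposed, today, max_pending_days):
    """Account for every received SIP (hypothetical record layout).

    received: {sip_id: date received}
    accepted: {sip_id: [aip_id, ...]}  SIP used in creating one or more AIPs
    disposed: {sip_id: reason}         SIP discarded, with the recorded reason
    """
    report = {"accepted": [], "disposed": [], "pending": [], "overdue": []}
    for sip_id, received_on in sorted(received.items()):
        if accepted.get(sip_id):
            report["accepted"].append(sip_id)
        elif disposed.get(sip_id):            # an empty reason is not a record
            report["disposed"].append(sip_id)
        elif (today - received_on).days > max_pending_days:
            report["overdue"].append(sip_id)  # left in limbo too long
        else:
            report["pending"].append(sip_id)  # still within the expected timescale
    # Outcomes recorded for SIPs that were never logged as received.
    report["unknown"] = sorted((set(accepted) | set(disposed)) - set(received))
    return report


received = {
    "sip-1": date(2008, 1, 10), "sip-2": date(2008, 2, 1),
    "sip-3": date(2008, 4, 20), "sip-4": date(2008, 1, 5),
}
report = reconcile_sips(
    received,
    accepted={"sip-1": ["aip-1"]},
    disposed={"sip-2": "duplicate of sip-1"},
    today=date(2008, 5, 1),
    max_pending_days=90,
)
print(report["overdue"])  # ['sip-4']
```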

-- JohnGarrett - 07 Apr 2008 Should there be a discussion of a case where multiple SIPs are used to form a single AIP and only some of the needed SIPs arrive at the archives?

-- MarkConrad - 28 Apr 2008 At our meeting on 28 April 2008 the group decided that John's comment above did not apply to this particular requirement, but might be the basis for an additional requirement. Discussion of such a future requirement was tabled for a later meeting.

B2.5 Repository has and uses a naming convention that generates visible, persistent, unique identifiers for all archived objects (i.e., AIPs).

Supporting Text

The repository must be able to show how any AIP can be uniquely identified. It must be possible to demonstrate that the identifiers are unique. Documentation must show how the persistent identifiers of the AIP and its components are assigned and maintained so as to be unique within the context of the repository. The documentation must also describe any processes used for changes to such identifiers. It must be possible to obtain a complete list of all such identifiers and do spot checks for duplications.

This is necessary in order to ensure that each AIP can be unambiguously found in the future. This is also necessary to ensure that each AIP can be distinguished from all other AIPs in the repository.

Equally important is a system of reliable linking/resolution services in order to find the uniquely named object, no matter its physical location.

This is so that actions relating to AIPs can be traced over time, over system changes, and over storage changes.
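Because a complete list of identifiers can be obtained, the duplication check can be exhaustive rather than a sample. A minimal Python sketch follows; `find_duplicates` is a hypothetical helper, and whether identifiers that differ only in case or surrounding whitespace count as the same depends on the repository's naming convention, so the `normalize` function is an assumption the caller must choose.

```python
from collections import defaultdict


def find_duplicates(identifiers, normalize=str.strip):
    """Group identifiers that collide after normalization.

    Returns {normalized_id: [original forms, ...]} for every collision.
    """
    groups = defaultdict(list)
    for ident in identifiers:
        groups[normalize(ident)].append(ident)
    return {key: forms for key, forms in groups.items() if len(forms) > 1}


ids = ["aip-0001", "aip-0002", "aip-0001 ", "AIP-0003", "aip-0003"]
print(find_duplicates(ids))
# {'aip-0001': ['aip-0001', 'aip-0001 ']}
print(find_duplicates(ids, normalize=lambda s: s.strip().lower()))
# {'aip-0001': ['aip-0001', 'aip-0001 '], 'aip-0003': ['AIP-0003', 'aip-0003']}
```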

-- MarkConrad - 03 Apr 2008 This is also necessary to ensure that each AIP can be distinguished from all other AIPs in the repository.

-- JohnGarrett - 07 Apr 2008 Should there be something here also to emphasize the persistent part, to ensure long-term access?

-- BruceAmbacher 17 Apr 2008 Should this requirement encourage use of external persistent id developers such as those listed in B2.7 to ensure the persistent ID can continue if objects are transferred from one repository to another?

-- MarkConrad - 23 Apr 2008 Bruce, I do not understand what it is you are referencing in B.2.7.

-- MarkConrad - 23 Apr 2008 John, See the two sentences below in the Discussion section that begin, "Equally important..." I believe these two sentences should be moved to the Supporting Text section. I believe they address your concern.

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Documentation describing naming convention and physical evidence of its application (e.g., logs)

Discussion
A repository needs to ensure that an accepted, standard naming convention is in place that identifies its materials uniquely and persistently for use both in and outside the repository. The “visibility” requirement here means “visible” to repository managers and auditors. It does not imply that these unique identifiers need to be visible to end users or that they serve as the primary means of access to digital objects.

Equally important is a system of reliable linking/resolution services in order to find the uniquely named object, no matter its physical location. This is so that actions relating to AIPs can be traced over time, over system changes, and over storage changes.

-- MarkConrad - 03 Apr 2008 Shouldn't the previous two sentences be moved to the "Supporting Text" section? This appears to be a mandatory sentence pair.

Ideally, the unique ID lives as long as the AIP; if it does not, there must be traceability. The ID system must be seen to fit the repository’s current and foreseeable future requirements for things like numbers of objects.
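The following Python sketch illustrates location-independent resolution together with traceability when an identifier is changed. It is a hypothetical in-memory design (`IdentifierResolver` and its methods are illustrative names), not the interface of any particular resolution service: moving an object updates only its location, while renaming retires the old identifier and records its successor, so the old identifier can still be followed.

```python
class IdentifierResolver:
    """Hypothetical resolver: persistent identifier -> current storage location."""

    def __init__(self):
        self._location = {}   # live identifier -> storage location
        self._renamed = {}    # retired identifier -> identifier that replaced it

    def _in_use(self, ident):
        return ident in self._location or ident in self._renamed

    def register(self, ident, location):
        if self._in_use(ident):
            raise ValueError(f"identifier {ident!r} has already been issued")
        self._location[ident] = location

    def current_id(self, ident):
        """Follow the recorded chain of identifier changes to the live one."""
        while ident in self._renamed:
            ident = self._renamed[ident]
        return ident

    def move(self, ident, new_location):
        """Storage change: the identifier is unchanged, only the location moves."""
        current = self.current_id(ident)
        if current not in self._location:
            raise KeyError(ident)
        self._location[current] = new_location

    def rename(self, old, new):
        """Identifier change: retire `old` but keep it traceable to `new`."""
        if self._in_use(new):
            raise ValueError(f"identifier {new!r} has already been issued")
        self._location[new] = self._location.pop(old)  # KeyError if old is not live
        self._renamed[old] = new

    def resolve(self, ident):
        return self._location[self.current_id(ident)]


r = IdentifierResolver()
r.register("aip-0001", "tape-07/0001")
r.move("aip-0001", "disk-02/0001")           # storage migration
r.rename("aip-0001", "repo:aip/2008/0001")   # identifier scheme change
print(r.resolve("aip-0001"))     # disk-02/0001
print(r.current_id("aip-0001"))  # repo:aip/2008/0001
```

Retired identifiers are never reissued, which keeps every identifier unique over the life of the repository even after changes.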

It must be possible to demonstrate that the identifiers are unique.

-- MarkConrad - 03 Apr 2008 Shouldn't the previous sentence be moved to the "Supporting Text" section? This appears to be a mandatory requirement.

Note that B2.1 requires that the components of an AIP be suitably bound and identified for long-term management, but places no restrictions on how AIPs are identified with files. Thus, in the general case, an AIP may be distributed over many files, or a single file may contain more than one AIP. Therefore identifiers and filenames may not necessarily correspond to each other. Documentation must represent these relationships.

Documentation must show how the persistent identifiers of the AIP and its components are assigned and maintained so as to be unique within the context of the repository. The documentation must also describe any processes used for changes to such identifiers. It must be possible to obtain a complete list of all such identifiers and do spot checks for duplications.

-- MarkConrad - 03 Apr 2008 Shouldn't the sentences in the previous paragraph be moved to the "Supporting Text" section? These appear to be mandatory requirements.

B2.6 (Removed)

B2.7 Repository demonstrates that it has access to necessary tools and resources to establish authoritative Representation Information of the digital objects it contains.

These tools and resources can be held internally or can be shared via, for example, a trusted set of registries.

-- MarkConrad - 03 Apr 2008 Shouldn't the previous sentence be moved to the "Discussion" section?

Supporting Text

The repository must be able to create and maintain authoritative Representation Information adequate for its Designated Community, and therefore needs to have access to the appropriate tools and resources.

This is necessary in order to ensure that the repository has the ability to ensure that its digital objects are, and will continue to be, understandable to the Designated Community. However, this does not demand that each repository has such tools and resources, merely that it has access to them.

-- MarkConrad - 03 Apr 2008 Shouldn't the previous sentence be moved to the "Discussion" section?

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Subscription or access to such registries; association of unique identifiers to registries of Representation Information (including format registries); viewable records in local registries (with persistent links to digital objects); database records that include Representation Information and a persistent link to relevant digital objects.

Discussion
These tools and resources can be held internally or can be shared via, for example, a trusted set of registries. However, this requirement does not demand that each repository has such tools and resources, merely that it has access to them. The Global Digital Format Registry (GDFR), the UK National Archives’ file format registry PRONOM, and the UK Digital Curation Centre’s Registry Repository of Representation Information (RRORI) are three emerging examples of potential external registries a repository might adopt. Any such registry is a specialised type of repository, which itself must be certified/trusted. The repository may use these types of standardized, authoritative information sources to identify and/or verify the Representation Information components of Content Information and PDI. This will reduce the long-term maintenance costs to the repository and improve quality control.

Sometimes there is both general Representation Information (e.g., format information) and also specific Representation Information (e.g., meanings of individual fields within a dataset). Often the general information will be available in an external repository, but the local repository may need to maintain the instance-specific information.

It is likely that many repositories would wish to keep local copies of relevant Representation Information; however, this may not be practical in all cases. Even where a repository strives to keep all such information locally, there may be, for example, a schedule of updates which means that until an update is performed, the local Representation Information is incomplete. This may be regarded as a kind of local caching of, for example, the Representation Information held in registries. Alternatively, one may say that in these cases, the use of international registries is not meant to replace local registries but instead serves as a resource to verify or obtain independent, authoritative information about any and all Representation Information.
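The caching view described above can be sketched in Python as follows. `fetch_from_registry` is a hypothetical stand-in for whatever interface an external registry provides (no real registry API is assumed), and `max_age` represents the repository's update schedule; when the registry cannot be reached, the sketch falls back to the local copy but flags it as stale.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CachedEntry:
    text: str             # the Representation Information held locally
    fetched_at: datetime  # when it was last obtained from the registry


class RepInfoCache:
    """Local copy of registry-held Representation Information (a sketch)."""

    def __init__(self, fetch_from_registry, max_age: timedelta):
        # fetch_from_registry: hypothetical callable(format_id) -> str, assumed
        # to raise ConnectionError when the registry is unreachable.
        self._fetch = fetch_from_registry
        self._max_age = max_age
        self._entries = {}

    def get(self, format_id: str, now: datetime):
        """Return (text, is_stale); refresh from the registry when out of date."""
        entry = self._entries.get(format_id)
        if entry is not None and now - entry.fetched_at <= self._max_age:
            return entry.text, False
        try:
            text = self._fetch(format_id)
        except ConnectionError:
            if entry is None:
                raise                    # nothing held locally to fall back on
            return entry.text, True      # local copy is out of date but usable
        self._entries[format_id] = CachedEntry(text, now)
        return text, False


registry_state = {"online": True}


def fake_registry(format_id):
    # Stand-in for an external registry, for demonstration only.
    if not registry_state["online"]:
        raise ConnectionError("registry unreachable")
    return f"format description for {format_id}"


cache = RepInfoCache(fake_registry, max_age=timedelta(days=30))
t0 = datetime(2008, 5, 1)
print(cache.get("example/tiff", t0))
# ('format description for example/tiff', False)
registry_state["online"] = False
print(cache.get("example/tiff", t0 + timedelta(days=45)))
# ('format description for example/tiff', True)
```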

Good practice suggests that any locally held Representation Information should also be made available to other repositories via a trusted registry. In addition, any item of Representation Information should itself have adequate Representation Information to ensure that the Designated Community can understand and use the data object being preserved.

B2.8 (Removed)

B2.9 (Proposed Revision) Repository has documented processes for acquiring preservation metadata (i.e., PDI) for its associated Content Information and acquires preservation metadata in accordance with the documented processes. The repository must maintain viewable documentation on how the repository acquires and manages Preservation Description Information (PDI).

Supporting Text

The repository must be able to capture Provenance information, maintain Fixity information in a secure fashion and keep adequate Context and Reference Information.

This is necessary in order to ensure that an auditable trail supporting claims of authenticity is available, that unauthorised changes to the digital holdings can be detected, and that the digital objects can be identified and placed in their appropriate context.
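Detecting unauthorised changes typically relies on recorded checksums. A minimal Python sketch using the standard `hashlib` module; SHA-256 is an assumed choice, since the requirement does not prescribe an algorithm, and keeping the recorded value secure (for example, stored apart from the object it describes) is outside what this sketch shows.

```python
import hashlib
import os
import tempfile


def compute_fixity(path, algorithm="sha256", chunk_size=1 << 20):
    """Hex digest of a file, read in chunks so large objects need little memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded, algorithm="sha256"):
    """True if the file still matches the Fixity value recorded at ingest."""
    return compute_fixity(path, algorithm) == recorded


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"preserved content")
recorded = compute_fixity(tmp.name)       # keep this apart from the object
print(verify_fixity(tmp.name, recorded))  # True
with open(tmp.name, "ab") as f:
    f.write(b" altered")                  # simulate an unauthorised change
print(verify_fixity(tmp.name, recorded))  # False
os.remove(tmp.name)
```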

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Viewable records in local format registry (with persistent links to digital objects); local metadata registry(ies); database records that include Representation Information and a persistent link to relevant digital objects. Proposed Revision: Viewable documentation on how the repository acquires and manages Preservation Description Information (PDI) for Reference, Provenance, Fixity, and Context, and viewable PDI records that have been acquired and managed for the Content Information in accordance with the documentation.

-- MarkConrad - 23 Apr 2008 The previous sentence does not make sense. There appear to be extra words or words missing from the sentence. I am not sure which is the case.

Discussion
Preservation metadata (PDI) is needed not only by the repository to help ensure the Content Information is not corrupted (Fixity) and is findable (Reference Information), but to help ensure the Content Information is adequately understandable by providing a historical perspective (Provenance Information) and by providing relationships to other information (Context Information). The extent of such information needs is best addressed by members of the designated community(ies). The PDI must be permanently associated with Content Information.

B2.10 Repository has a documented process for testing understandability of the AIP at ingest for its Designated Communities; the repository must bring the AIP up to the agreed level of understandability.

-- MarkConrad - 05 May 2008 At our meeting on this date we agreed to a few changes to this requirement. We also agreed to table discussion on this requirement until the next meeting. At that time we will discuss the possibility of moving this requirement to B.3. since it is concerned with activities beyond ingest. I also agreed to draft an example of how a repository might meet this requirement for the discussion section.

-- MarkConrad - 12 May 2008 At our meeting on this date we agreed to modify the text of the requirement so that it applies at the time of ingest - not at subsequent points in time. We did not complete that discussion by the end of the meeting. We also discussed the possibility of needing a similar requirement elsewhere in the document. We did not complete that discussion either.

Supporting Text

The repository must have a way of testing that its digital holdings are understandable to their Designated Communities at the time of ingest, and, if this is found not to be the case, then it must be able to be corrected, for example by adding Representation Information.

-- MarkConrad - 30 Apr 2008 I would suggest removing the words, "able to be" from the previous sentence. I would also suggest moving the phrase, "for example by adding Representation Information" to the Discussion section since this text is not mandatory.

This is necessary in order to ensure that one of the primary tests of preservation, namely that the digital holdings are understandable by their Designated Communities, can be met.

See B.3.x. for additional requirements for understandability beyond ingest.

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement

Test suites and definitions of Designated Communities for its digital holdings; test procedures to be run against the digital holdings to ensure their continued understandability to the defined Designated Communities and their knowledge bases; records of such tests being performed and evaluated; evidence of gathering or identifying Representation Information to fill any intelligibility gaps which have been found; retention of individuals with the discipline expertise.

periodic assembly of designated or outside community members to evaluate and identify additional required metadata.

-- MarkConrad - 19 May 2008 The deleted phrase above will be moved to the new B.3.x.

-- MarkConrad - 30 Apr 2008 I do not understand what is being proposed by the phrase, "Test suites and definitions of Designated Communities for its digital holdings;"

Discussion

If the AIP is not directly usable by the current application tools of the designated community(ies), the repository needs to have a defined process for giving it usable form or for making additional Representation Information available (see B3.2).

This requirement is concerned with the understandability of the AIP. If the ingested material is not understandable, the repository needs to ingest or make available additional information to make sure that the AIPs are understandable to the designated community(ies).

-- MarkConrad - 30 Apr 2008 It seems to me that we need to make a distinction between this requirement and B.3.2. My reading of the two requirements is that this requirement is concerned with the understandability of the content information and that B.3.2. is concerned with the understandability of the data formats (i.e., Representation Information). I believe the sentence above muddies the water rather than making a distinction between the two requirements. I would also say that the phrase, "the current application tools of, " should be removed from this sentence.

Repositories that share the burden of ensuring that adequate metadata or documentation is captured or generated to meet a required degree of understandability can implement any number of procedures to address this requirement. Such repositories typically have a narrowly defined designated community, such as a particular science discipline.

-- MarkConrad - 30 Apr 2008 I do not understand what the paragraph above adds to the discussion. I would recommend removing it.

For example, if documents are written in a dying language and the Designated Communities are no longer able to understand the language the documents are written in, the repository would need to provide additional documentation that would allow the Designated Communities to understand the documents (e.g., translations of the documents in a language the Designated Communities could understand or dictionaries that would allow the Designated Communities to translate the documents into a language its members understand.)

-- MarkConrad - 05 May 2008 "For example, if documents are written in a dying language and the Designated Communities are no longer able to understand the language the documents are written in, the repository would need to provide additional documentation that would allow the Designated Communities to understand the documents (e.g., translations of the documents in a language the Designated Communities could understand or dictionaries that would allow the Designated Communities to translate the documents into a language its members understand.)"

B2.11 Repository verifies each AIP for completeness and correctness at the point it is generated.

Supporting Text

The repository must be sure that the AIPs it generates are as they are expected to be and that none are missing, given the known receipt of SIPs.The repository must be sure that the AIPs it generates are as they are expected to be by checking them against the associated written definition for each AIP or class of information (see B2.1 and B2.2.) and the description of how AIPs are constructed from SIPs (see B.2.3.).

-- MarkConrad - 30 Apr 2008 I do not believe the sentence above reflects what the requirement is calling for. The requirement talks about verifying EACH (i.e., individual) AIP for completeness and correctness. The second half of the first sentence in the Supporting Text refers to ensuring that no AIPs are missing. This does not jibe with my understanding of this requirement. I would expect something more along the lines of, "The repository must be sure that the AIPs it generates are as they are expected to be by checking them against the associated written definition for each AIP or class of information (see B2.1 and B2.2.) and the description of how AIPs are constructed from SIPs (see B.2.3.)."

This is necessary in order to ensure that what is maintained over the long term is as it should be and can be traced to the information provided by the Producers.

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement

Description of the procedure that verifies completeness and correctness of the AIPs; logs of the procedure.

Discussion

The AIP can be verified by comparison to its definition (see B2.2).

-- MarkConrad - 30 Apr 2008 I would argue that the previous sentence is a mandatory requirement and should be moved to the Supporting Text section.
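As a minimal sketch of such a comparison, the Python below checks an AIP against a written definition for its class. It assumes, for illustration only, that an AIP is held as a dictionary and that the definition simply lists the key components named under B2.1 (the content data object, Representation Information, and the Fixity, Provenance, Context and Reference PDI). The component names, the `AipDefinition` class and `verify_aip` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical names for the key AIP components listed under B2.1.
REQUIRED_COMPONENTS = frozenset({
    "content_data_object",
    "representation_information",
    "fixity",
    "provenance",
    "context",
    "reference",
})


@dataclass
class AipDefinition:
    """Written definition for one class of AIP (hypothetical structure)."""
    aip_class: str
    required: frozenset = REQUIRED_COMPONENTS


def verify_aip(aip: dict, definition: AipDefinition) -> list:
    """Return a list of problems; an empty list means the AIP matches its definition."""
    problems = []
    # The AIP must declare which definition applies to it.
    if aip.get("aip_class") != definition.aip_class:
        problems.append(
            f"class {aip.get('aip_class')!r} does not match definition {definition.aip_class!r}"
        )
    components = aip.get("components", {})
    # Every required component must be present and non-empty.
    for name in sorted(definition.required):
        if name not in components:
            problems.append(f"missing component: {name}")
        elif not components[name]:
            problems.append(f"empty component: {name}")
    return problems
```

The returned list is the kind of output that could be kept in the logs of the verification procedure that this requirement asks for.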

An AIP may be constructed from the parts or the whole of one or more SIPs.

-- MarkConrad - 30 Apr 2008 I do not understand why the previous sentence is necessary.

If the repository has a standard process to verify SIPs for either or both completeness and correctness and a demonstrably correct process for transforming SIPs into AIPs (see B2.3), then it simply needs to demonstrate that the initial checks were carried out successfully and that the transformation process was carried out without indicating errors.

-- MarkConrad - 30 Apr 2008 I do not believe that the sentence above describes a procedure that would meet this requirement in all cases. If the SIP is not checked for BOTH completeness and correctness, the requirement would not be met.

-- RobertDowns - 04 May 2008 In the sentence, above, removing the words "either or" is recommended to ensure that the SIP is checked for both completeness and correctness.

On the other hand, repositories that must create unique processes for many of their AIPs will also need to generate unique methods for validating the completeness and correctness of AIPs. This may include performing tests of some sort on the content of the AIP that can be compared with tests on the SIP. Such tests might be simple (counting the number of records in a file, or performing some simple statistical measure such as calculating the brightness histogram of an original and preserved image), but they might be complex or contain some subjective elements.

-- MarkConrad - 30 Apr 2008 If the validation of an AIP contains "some subjective elements" how does this ensure a trustworthy repository?
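The two simple tests mentioned above can be sketched as follows, assuming (for illustration) that tabular content is a list of text records, one per line, and that an image is available as rows of 8-bit grayscale values. The same test is run on the SIP and on the AIP and the results compared; exact equality is only appropriate where the transformation is meant to be lossless. All function names are hypothetical.

```python
from collections import Counter


def record_count(records: list) -> int:
    """Count non-blank records (assumes one record per line)."""
    return sum(1 for record in records if record.strip())


def brightness_histogram(pixels: list) -> Counter:
    """Count how many pixels have each 8-bit brightness value."""
    return Counter(value for row in pixels for value in row)


def compare_sip_aip(sip_records: list, aip_records: list,
                    sip_image: list, aip_image: list) -> dict:
    """Run identical tests on SIP and AIP content and report whether each matches."""
    return {
        "record_count_matches": record_count(sip_records) == record_count(aip_records),
        "histogram_matches": brightness_histogram(sip_image) == brightness_histogram(aip_image),
    }
```

For lossy transformations a tolerance would have to be chosen instead of exact equality, and that choice is one place where subjective elements can enter.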

Documentation should describe how completeness and correctness of SIPs and AIPs are ensured, starting with ensuring receipt from the producer and continuing through AIP creation and supporting long-term preservation. Example approaches include the use of checksums, testing that checksums are still correct at various points during ingest and preservation, logs that such checks have been made, and any special tests that may be required for a particular SIP/AIP instance or class.

-- MarkConrad - 30 Apr 2008 This requirement is concerned with the completeness and correctness of the AIPs - not the SIPs. SIPs are covered by another requirement.
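The checksum approach can be sketched as below: a checksum is recorded when content is received from the producer, recomputed at later points during ingest and preservation, and every check is logged. SHA-256 (via Python's standard `hashlib`) is an assumption, as the text does not prescribe an algorithm; `check_fixity`, the stage labels and the log fields are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of the given content."""
    return hashlib.sha256(data).hexdigest()


def check_fixity(object_id: str, data: bytes, expected: str, stage: str, log: list) -> bool:
    """Recompute the checksum, compare it with the value recorded on receipt, and log the check."""
    actual = sha256_of(data)
    ok = actual == expected
    log.append({
        "object_id": object_id,
        "stage": stage,  # e.g. "receipt", "after AIP creation" (hypothetical labels)
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "expected": expected,
        "actual": actual,
        "result": "pass" if ok else "FAIL",
    })
    return ok
```

A typical use records `sha256_of(content)` at receipt and then calls `check_fixity` with that stored value at each later stage, so the log itself becomes the evidence that the checks were made.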

B2.12 Repository provides an independent mechanism for audit ofinventorying the integrity of the repository collection/content.

Supporting Text

The repository must provide a way of checking that it has everything it should have, other than simply detecting losses.The repository must provide a way to independently demonstrate the completeness and correctness of its collections and their contents.

-- MarkConrad - 02 May 2008 I would suggest replacing the sentence above with one like, "The repository must provide a way to conduct an independent audit of the completeness and correctness of its collections and their contents."

-- RobertDowns - 04 May 2008 I agree with the replacement sentence, immediately above, which was suggested by Mark on 2 May 2008 and with the replacement sentence, below, which also was suggested by Mark on 2 May 2008, for replacing the sentence immediately below.

-- MarieWaltz - 04 May 2008 I agree too, what Mark says.

-- BruceAmbacher - 11 May 2008 I suggest using another word for "audit" in the requirement and the supporting Text as it has a different meaning/connotation than the formal audit process being proposed for the entire document.

This is necessary in order to have confidence that some occurrence outside the repository has limited the information it has ingested.This is necessary to show that the repository ingested everything that was in the relevant SIP(s).

-- MarkConrad - 02 May 2008 I do not understand the previous sentence. I would suggest replacing it with a sentence like, "This is necessary to assure the designated communities that the repository is adequately preserving its collections and their content."

-- BruceAmbacher - 11 May 2008 Mark, while poorly worded, the intent was to show that any missing information resulted from something/some action external to the repository. Your suggested sentence does not convey that purpose. Would you accept: "This is necessary to show that any incomplete data objects are not the results of actions taken by the repository."

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement

Documentation provided for B2.1 through B2.6B2.5; documented agreements negotiated between the producer and the repository (see B1.1-B1.9); logs of material received and associated action (receipt, action, etc.) dates; logs of periodic checks. ...

-- MarkConrad - 02 May 2008 We should remove the reference to B.2.6. since we removed this requirement.

Discussion

In general, it is likely that a repository that meets all the previous criteria will satisfy this one without needing to demonstrate anything more. As a separate requirement, it demonstrates the importance of being able to audit the integrity of the collection as a whole.

For example, if a repository claims to have all e-mail sent or received by The Yoyodyne Corporation between 1985 and 2005, the previous criteria will already have required it to show that:

  • The content it holds came from Yoyodyne's e-mail servers.
  • It is all correctly transformed into a preservation format.
  • Each monthly SIP of e-mail has been correctly preserved, including original unique identifiers such as Message-IDs.
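For the third point, one way to show that original identifiers survived is to compare the Message-IDs in each monthly SIP with those in the resulting AIP. This minimal sketch assumes the Message-IDs have already been extracted from both and that each monthly SIP becomes exactly one AIP; the function names are hypothetical.

```python
def missing_message_ids(sip_ids, aip_ids) -> set:
    """Message-IDs in the SIP that do not appear in the AIP built from it."""
    return set(sip_ids) - set(aip_ids)


def unexpected_message_ids(sip_ids, aip_ids) -> set:
    """Message-IDs in the AIP that cannot be traced back to the SIP."""
    return set(aip_ids) - set(sip_ids)


def sip_fully_preserved(sip_ids, aip_ids) -> bool:
    """True when the SIP and AIP carry exactly the same set of Message-IDs."""
    return (not missing_message_ids(sip_ids, aip_ids)
            and not unexpected_message_ids(sip_ids, aip_ids))
```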

However, it may still have no way of showing whether this really represents all of Yoyodyne's email. For example, if there is a three-day period with no messages in the repository, is this because Yoyodyne was shut down for those three days, or was the e-mail lost before the SIP was constructed? This case could be resolved by the repository amending its description of the collection, but other cases may not be so straightforward.
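Finding such periods can be automated, even though explaining them cannot. The sketch below assumes the repository can list the date of every preserved message, and reports the runs of consecutive days with none; `empty_days` and `gap_runs` are hypothetical names.

```python
from datetime import date, timedelta


def empty_days(message_dates, start: date, end: date) -> list:
    """Every day from start to end (inclusive) on which no message is held."""
    present = set(message_dates)
    gaps = []
    day = start
    while day <= end:
        if day not in present:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


def gap_runs(days: list) -> list:
    """Group a sorted list of days into (first, last) runs of consecutive days."""
    runs = []
    for day in days:
        if runs and day == runs[-1][1] + timedelta(days=1):
            runs[-1] = (runs[-1][0], day)  # extend the current run
        else:
            runs.append((day, day))        # start a new run
    return runs
```

Each run reported is a question to resolve with the producer or to record in the collection description, not evidence of loss by itself.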

A familiar mechanism from the world of traditional materials in libraries and archives is an accessions or acquisitions register that is independent of other catalog metadata. A repository should be able to show, for each item in its accessions register, which AIP(s) contain content from that item. Alternatively, it may need to show that there is no AIP for an item, either because ingest is still in progress, or because the item was rejected for some reason. Conversely, it should be possible to relate any AIP to an entry in the accessions register.
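The cross-check between an accessions register and the AIPs can be sketched as follows. It assumes, hypothetically, that the register maps each accession item to a status (`"ingested"`, `"in_progress"` or `"rejected"`) and that the repository can list, for each AIP, the accession items whose content it contains. It reports items with no AIP and no accepted reason, and AIPs that cannot be related to any register entry; `reconcile` and `ACCEPTED_REASONS` are illustrative names.

```python
# Hypothetical register statuses that explain why an item has no AIP.
ACCEPTED_REASONS = {"in_progress", "rejected"}


def reconcile(register: dict, aip_sources: dict) -> dict:
    """Cross-check an accessions register against AIPs.

    register:    accession item ID -> status, e.g. "ingested", "in_progress", "rejected"
    aip_sources: AIP ID -> set of accession item IDs whose content the AIP contains
    """
    items_in_aips = set()
    for sources in aip_sources.values():
        items_in_aips |= set(sources)

    # Register entries with no AIP and no recorded reason for not having one.
    unaccounted_items = sorted(
        item for item, status in register.items()
        if item not in items_in_aips and status not in ACCEPTED_REASONS
    )
    # AIPs with no source item, or with a source item missing from the register.
    orphan_aips = sorted(
        aip for aip, sources in aip_sources.items()
        if not sources or any(item not in register for item in sources)
    )
    return {"unaccounted_items": unaccounted_items, "orphan_aips": orphan_aips}
```

Because the register is kept independently of catalog metadata, an empty result from both lists is meaningful evidence rather than a check of the catalog against itself.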

B2.13 Repository has contemporaneous records of actions and administration processes that are relevant to preservation (AIP creation).

Supporting Text

The repository must create records of its preservation-related activities essentially as they happen.

This is necessary in order to be sure that nothing relevant is omitted from the record.

Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Written documentation of decisions and/or action taken; preservation metadata logged, stored, and linked to pertinent digital objects.

Discussion

These records must be created on or about the time of the actions they refer to and are related to actions associated with AIP creation. The records may be automated or may be written by individuals, depending on the nature of the actions described. Where community or international standards are used, such as PREMIS (2005), the repository must demonstrate that all relevant actions are carried through.

-- MarkConrad - 02 May 2008 I would remove the reference to PREMIS. PREMIS is not a standard. PREMIS often does not list what the "relevant action" should be.
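A contemporaneous record can be as simple as appending an event, with a UTC timestamp and the identifier of the object concerned, at the moment an action completes. The sketch below writes one JSON object per line through any write callable, such as a file's `write` method. The field names and event vocabulary are hypothetical; they are not taken from PREMIS or any other standard.

```python
import json
from datetime import datetime, timezone
from typing import Callable


def record_event(write: Callable[[str], object], object_id: str,
                 event_type: str, agent: str, outcome: str) -> dict:
    """Write one preservation event record at the time the action happens."""
    event = {
        "object_id": object_id,    # links the record to the pertinent digital object
        "event_type": event_type,  # e.g. "aip_creation" (hypothetical vocabulary)
        "agent": agent,            # software or person that performed the action
        "outcome": outcome,
        # Timestamp taken when the record is written, not reconstructed later.
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    write(json.dumps(event) + "\n")  # one JSON object per line (JSON Lines)
    return event
```

For example, inside `with open("events.jsonl", "a", encoding="utf-8") as log_file:` (a hypothetical file name), the AIP creation step could call `record_event(log_file.write, "aip-0001", "aip_creation", "ingest-service", "success")` immediately after building the AIP.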

Topic revision: r1 - 2009-02-10 - DavidGiaretta
 
Copyright © 2008-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.