Section B: Digital Object Management

B1 Ingest: acquisition of content (RF)

B1.1 Repository identifies the Content Information and the properties of that information that the repository will preserve.


Supporting Text
The repository must define explicitly what Content Information, and which properties of that information, must be preserved over the long term. This is necessary in order to make it clear to funders, depositors and users what responsibilities the repository is taking on and what aspects are excluded. It is also a necessary step in defining the information which is needed from the information producers or depositors.
Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Mission statement; submission agreements/deposit agreements/deeds of gift; workflow and Preservation Policy documents, including written definition of properties as agreed in the deposit agreement/deed of gift; written processing procedures; documentation of properties to be preserved.
Discussion
This process begins in general with the repository's mission statement and may be further specified in pre-accessioning agreements with producers or depositors (e.g., producer-archive agreements) and made very specific in deposit or transfer agreements for specific digital objects and their related documentation. For example, one repository may only commit to preserving the textual content of a document and not its exact appearance on a screen. Another may wish to preserve the exact appearance and layout of textual documents, while others may choose to keep the units of measurement of data fields and to normalize the data during the ingest process. If unique identifiers are associated with digital objects before ingest, they may also be significant properties that need to be preserved.

B1.1.1 The repository has a procedure(s) for identifying those properties that it will preserve.


Supporting Text
The repository must explicitly define the procedure(s) it uses to determine the properties of Content Information that it will preserve. This is necessary to establish a clear understanding with depositors, funders, and the repository's designated communities of how the repository determines what the characteristics and properties of preserved items will be over the long term. These procedures will be necessary to confirm provenance or to identify erroneous provenance of the preserved digital record.
Examples of Ways the Repository Can Demonstrate It Is Meeting This Requirement
Submission agreements/deposit agreements, Preservation Policies, written processing procedures, workflow documentation.
Discussion
These procedure(s) document the methods and factors a repository uses to determine the aspects of different types of Content Information for which it accepts preservation responsibility to its designated communities. For example, a repository's practice may be to use file formats in order to determine the properties it will preserve unless otherwise specified in a deposit agreement. In this case, the repository would be able to demonstrate provenance for objects that may have been in the same file format when received but are preserved differently over the long term.

B1.1.2 The repository has a record of the Content Information and the properties of that information that it will preserve.


Supporting Text
The repository identifies in writing the Content Information of the records for which it has taken preservation responsibility and the properties it has committed to preserve for those records based on their Content Information.
Examples of Ways the Repository Can Demonstrate It Is Meeting This Requirement
Preservation Policies, processing manuals, collection inventories or surveys, logs of Content Information types acquired, preservation strategies and action plans.
Discussion
The repository must demonstrate that it establishes and maintains an understanding of its digital collections sufficient to carry out the preservation necessary to persist the properties to which it has committed. The repository can use this information to determine the effectiveness of its preservation activities over time.

B1.2 Repository clearly specifies the information that needs to be associated with specific Content Information at the time of its deposit.


Supporting Text
The repository must explicitly specify what information is needed from the content provider.
Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Transfer requirements; producer-archive agreements. Workflow plans to produce the AIP.
Discussion
For most types of digital objects to be ingested, the repository should have written criteria, prepared by the repository on its own or in conjunction with other parties, that specify exactly what digital object(s) are transferred, what documentation is associated with the object(s), and any restrictions on access, whether technical, regulatory, or donor-imposed. These criteria document what information the repository and its designated communities may expect for digital object(s) upon deposit. Note that the depositor may be a harvesting process created by the repository. The level of precision in these specifications will vary with the nature of the repository's collection policy and its relationship with creators. For instance, repositories engaged in Web harvesting, or those that rescue digital materials long after their creators have abandoned them, cannot impose conditions on the creators of material, since they are not "depositors" in the usual sense of the word. But Web harvesters can, for instance, decide which metadata elements from the HTTP transactions that captured a site are to be preserved along with the site's files, and this still constitutes "information associated with the digital material." They may also choose to record the information or decisions (whether taken by humans or by automated algorithms) that led to the site being captured. The repository can check what it receives from the producer based on the specifications.
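The Web harvesting case above can be illustrated with a small sketch. It assumes a hypothetical policy list, PRESERVED_HEADERS, naming the HTTP header fields a harvester has decided to keep; neither the list nor the function name comes from this document, and a real repository would define its own.

```python
# Hypothetical policy: HTTP header names the harvester has chosen to preserve.
PRESERVED_HEADERS = ("content-type", "last-modified", "etag", "date")


def select_capture_metadata(url, status, headers, preserved=PRESERVED_HEADERS):
    """Keep only the HTTP transaction elements named in the policy.

    headers is a mapping of header name to value as seen during capture.
    Header names are compared case-insensitively.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    kept = {name: lowered[name] for name in preserved if name in lowered}
    return {"url": url, "status": status, "headers": kept}
```

The returned record could then be stored alongside the captured files as part of the information associated with the digital material.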

B1.3 Repository has specifications enabling recognition and parsing of the SIPs.


Supporting Text
The repository has explicit written specifications for the file types and/or object types that are transferred so that it can identify the construction of ingested SIPs and verify the correctness and completeness of the SIP components and verify that the SIP overall contains the Content Information declared by the producer/depositor.
Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Documented file format specifications; published data standards; documentation of valid object construction.
Discussion
The repository must be able to determine what the contents of a SIP are with regard to the technical construction of its components. For example, the repository needs to be able to recognize a TIFF file and confirm that it is not simply a file with a filename ending in "TIFF." Another example would be a website, for which the repository would need to be able to recognize and test the validity of the variety of file types (e.g., HTML, images, audio, video, CSS, etc.) that are part of the website. This is necessary in order to confirm: 1) the SIP is what the repository expected; 2) the Content Information is correctly identified; and 3) the properties of the Content Information to be preserved have been appropriately selected.
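As an illustration of the TIFF example, the following minimal sketch tells a file that carries a classic TIFF signature apart from one that only has a TIFF filename. The function names are hypothetical. A signature check is recognition only: it does not validate the rest of the file, and it does not cover BigTIFF, which uses a different version number.

```python
from pathlib import Path

# Classic TIFF signatures: a byte-order mark ("II" or "MM") followed by 42.
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*")


def looks_like_tiff(path):
    """Return True if the file starts with a classic TIFF signature."""
    with open(path, "rb") as handle:
        header = handle.read(4)
    return header in TIFF_SIGNATURES


def extension_claims_tiff(path):
    """Return True if the filename merely claims to be a TIFF."""
    return Path(path).suffix.lower() in {".tif", ".tiff"}
```

A file for which extension_claims_tiff is true but looks_like_tiff is false is a candidate for closer inspection with a full format validation tool.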

B1.4 Repository has mechanisms to appropriately verify the depositor of all materials.


Supporting Text
The repository must ensure that the sources of the objects it intends to preserve are who/what they claim to be. This is necessary in order to avoid providing erroneous provenance to the information which is preserved.
Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Legally binding submission agreements/deposit agreements/deeds of gift; evidence of appropriate technological measures; logs from procedures and authentications.
Discussion
The repository's written standard operating procedures and actual practices must ensure the digital objects are obtained from the expected depositor. Examples of a depositor include persons, organizations, corporate entities, or harvesting processes. Written procedures and practices are necessary to demonstrate that provenance has been maintained prior to submission. Confirmation can use various means including, but not limited to, digital processing and data verification and validation, and the exchange of appropriate instruments of ownership (e.g., legally binding submission agreements/deposit agreement/deed of gift). Different repositories will adopt different levels of proof needed; the Designated Community should have the opportunity to review the evidence.
-- KatiaThomaz - 20 Mar 2008 - What does "source of all materials" really mean? A physical or corporate person responsible for issuing the materials?

B1.5 Repository's ingest process verifies each SIP for completeness and correctness.


Supporting Text
The repository must verify, as far as it can, that each SIP is complete and correct. This is necessary in order to detect and correct potential transmission errors between the depositor and the repository.
Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Evidence that the repository checks the information that needs to be associated with digital material at the time of its deposit against the SIP. Appropriate Preservation Policy and Preservation Implementation Plan documents and system log files from system(s) performing ingest procedure(s); formal or informal "acquisitions register" logs or registers of files received during the transfer and ingest process; documentation of standard operating procedures, detailed procedures, and/or workflows; format registries; definitions of completeness and correctness, probably incorporated in Preservation Policy documents.
Discussion
Information collected during the ingest process must be compared with information from some other source (the producer or the repository's own expectations) to verify the correctness of the data transfer and ingest process. Other sources will include technical and descriptive metadata obtained prior to ingest and may also include expectations set by the depositor, the object producer, a format registry, or the repository's own expectations. The extent to which a repository can determine correctness will depend on what it knows about the SIP and what tools are available for verifying correctness. It can mean simply checking that file formats are what they claim to be (TIFF files are valid TIFF format, for instance), or can imply checking the content. This might involve human checking in some cases, such as confirming that the description of a picture matches the image. This allows the repository to demonstrate that its preserved objects have completeness and correctness, having originated from complete and correct SIPs. It also allows the repository to document reasons for other SIP-related actions. Repositories should have established procedures for handling incomplete SIPs. These can range from rejecting the transfer, to suspending processing until the missing information is received, or to simply reporting the errors. Similarly, the definition of "completeness" should be appropriate to a repository's activities. If an inventory of files was provided by a producer as part of pre-ingest negotiations, one would expect checks to be carried out against that inventory. But for some activities such as Web harvesting, "complete" may simply mean "whatever we could capture in the harvest session." Whatever checks are carried out must be consistent with the repository's own documented definition and understanding of completeness and correctness.
One thing that a repository might want to do is check for network drop out or other corruption during the transmission process.
Note: B1.2 does not specify everything about completeness - only what needs to accompany the deposited info.
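One way to picture a check against a producer inventory, including detection of corruption in transmission, is the sketch below. It assumes the producer supplied an inventory mapping relative file paths to SHA-256 digests; that inventory format and the names sha256_of and check_sip are assumptions made for illustration, not requirements of this document.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_sip(sip_dir, inventory):
    """Compare the files received in sip_dir with a producer inventory.

    inventory maps relative POSIX-style paths to expected SHA-256 hex digests
    (an assumed format). Returns files that are missing, unexpected, or
    present but with a digest that does not match.
    """
    root = Path(sip_dir)
    received = {
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    }
    expected = set(inventory)
    corrupted = sorted(
        name
        for name in received & expected
        if sha256_of(root / name) != inventory[name].lower()
    )
    return {
        "missing": sorted(expected - received),
        "unexpected": sorted(received - expected),
        "corrupted": corrupted,
    }
```

What the repository does with a non-empty report (reject, suspend, or simply record the errors) remains a matter for its own documented procedures.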
-- JohnGarrett - 31 Mar 2008 Should this note be here or in B1.2?
---Bruce Ambacher - Does this function check the SIP against the metadata? The Evidence Discussion includes other aspects beyond those stated.

B1.6 Repository obtains sufficient control over the Digital Objects to preserve them.


Supporting Text
The repository must have adequate control over the bits which make up the digital objects. This is necessary in order to ensure that the most basic type of preservation, namely bit preservation, is assured.

**Note: We might want to come back to this. (First discussed 3/29/08 meeting)
Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Documents showing the level of physical control the repository actually has. A separate database/metadata catalog listing all of the digital objects in the repository and metadata sufficient to validate the integrity of those objects (file size, checksum, hash, location, number of copies, etc.). Documentation of the level(s) of access staff, contractors, and/or similar persons/organizations have to storage media and systems containing the Digital Objects.
Discussion
The repository must obtain complete control of the bits of the digital objects conveyed with each SIP. For example, some SIPs may only reference digital objects and in such cases the repository must get the referenced digital objects if they constitute part of the object that the repository has committed to preserve. It is not always the case that referenced digital objects are preserved. For example, a decision needs to be made if just an email message is to be the preserved object or if it is the email message with the attachments. In the latter case, the repository might, for example, need to go to a separate directory and pick up the attachment also.
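The catalog mentioned in the examples for B1.6 (file size, checksum, location, number of copies) might be sketched as follows. The CatalogEntry fields and the function names are hypothetical, and the sketch reads each file fully into memory, which would not suit very large objects.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical catalog record holding basic fixity metadata."""
    location: str
    size_bytes: int
    sha256: str
    copies: int = 1


def make_entry(path, copies=1):
    """Build a catalog entry from the bits currently stored at path."""
    with open(path, "rb") as handle:
        data = handle.read()
    return CatalogEntry(path, len(data), hashlib.sha256(data).hexdigest(), copies)


def is_intact(entry):
    """Return True if the stored file still matches its catalog entry."""
    if not os.path.isfile(entry.location):
        return False
    if os.path.getsize(entry.location) != entry.size_bytes:
        return False
    with open(entry.location, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest() == entry.sha256
```

Keeping such entries in a store separate from the objects themselves is what lets a later check detect that the bits have changed or gone missing.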

Bruce Ambacher - Katia's question is very pertinent. Under Examples we should consider adding the crossed out examples to "documents . . .such as ..." In the Discussion the insert should read "It is not always the case"
Ricc Ferrante - I wonder if Katia's question got lost in reformatting...

B1.7 Repository provides producer/depositor with appropriate responses at agreed points during the ingest processes.


Supporting Text
The repository must provide responses to the producer/depositor at agreed points. This is necessary in order to ensure that the producer can verify that there are no inadvertent lapses in communication which might otherwise allow loss of SIPs.
Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Submission agreements/deposit agreements/deeds of gift; workflow documentation; standard operating procedures; evidence of "reporting back" such as reports, correspondence, memos, or emails.
Discussion
Based on the initial processing plan and agreement between the repository and the producer/depositor, the repository must provide the producer/depositor, if it is appropriate to have such a plan, with progress reports at agreed points throughout the ingest process. Responses can include initial ingest receipts, or receipts that confirm that the AIP has been created and stored. Repository responses can range from nothing at all to predetermined, periodic reports of the ingest completeness and correctness, error reports and any final transfer of custody document. Producers/Depositors can request further information on an ad hoc basis when the previously agreed upon reports are insufficient.

B1.8 (was B1.9) Repository has contemporaneous records of actions and administration processes that are relevant to content acquisition.


Supporting Text
The repository must document relevant events as they happen. This is necessary to ensure that documentation which may be needed as evidence in an audit is captured and accurate, rather than omitted, erroneous, or of questionable authenticity.
Examples of Ways the Repository can Demonstrate it is Meeting this Requirement
Written documentation of decisions and/or action taken; preservation metadata logged, stored, and linked to pertinent digital objects, confirmation receipts sent back to providers.
Discussion
These records must be created on or about the time of the actions they refer to and are related to actions taken during the Ingest: content acquisition process. The records may be automated or may be written by individuals, depending on the nature of the actions described. Where community or international standards are used, the repository must demonstrate that all relevant actions are carried through.
BA 16April2009 - I suggest inserting "e.g."
DG 20090418 - OK
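For automated records of the kind described in the B1.8 Discussion, one possible shape is an append-only log written at the moment each action occurs. The sketch below is only an illustration: the field names and the JSON Lines layout are assumptions, not drawn from this document or from any particular preservation metadata standard.

```python
import json
from datetime import datetime, timezone


def record_event(log_path, object_id, action, agent, outcome):
    """Append one timestamped event record to log_path as a JSON line.

    The timestamp is taken when the function is called, so the record is
    created at the time of the action rather than reconstructed later.
    """
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "object_id": object_id,
        "action": action,
        "agent": agent,
        "outcome": outcome,
    }
    # Mode "a" only ever adds to the end of the log; earlier lines are untouched.
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")
    return event
```

A call such as record_event("ingest.log", "sip-0001", "checksum verified", "ingest service", "success") would add one line to the log; the identifiers in that call are placeholders.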

Topic revision: r7 - 2009-04-20 - DavidGiaretta
 
Copyright © 2008-2018 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.