Technologies, Technical Infrastructure, and Security

DavidGiaretta - I assume that somewhere here we should state that these metrics are designed to be compatible with ISO 27001. ChrisRusbridge - 18 Jul 2007 - Are they? Not absolutely clear to me!

C1. System infrastructure

Without a secure and trusted [ChrisRusbridge - 18 Jul 2007 - trustworthy?] infrastructure, the functions carried out on AIPs cannot be trusted—they are built on a house of cards. Actions specified here are general enough to apply to systems other than repositories and archives.

ChrisRusbridge - 18 Jul 2007 - We might want to avoid the "trusted" word; don't want to get into Orange Book territory!

-- KatiaThomaz - 10 May 2007 - New item for “the IT infrastructure implements a procedure for documentation”; (see NESTOR CCTDR 5.2)

-- KatiaThomaz - 10 May 2007 - New item for “the IT infrastructure implements the object management demands”; (see NESTOR CCTDR 13.1)

-- KatiaThomaz - 10 May 2007 - New item for “the IT infrastructure monitors performance and service level”; (see DCC/DPE DRAMBORA R78).

MarkConrad - 24 May 2007 The repository system should be scalable. That is, it must be able to handle anticipated future volume (bytes and number of files) without a major disruption of the system.

MarkConrad - 24 May 2007 The repository system should be evolvable. That is, the system must be designed in such a way that major components of the system can be replaced with newer technologies without major disruption of the system as a whole.

MarkConrad - 24 May 2007 The repository system should be extensible. That is, the system should be designed to accommodate future formats (media and files) without major disruption of the system as a whole.

ChrisRusbridge - 18 Jul 2007 - I worry about the next two sections, C1.1 and C1.2 (and in fact much of this whole chapter). Aren't they really saying "This should be a really well-managed IT operation"? This could perhaps be better accomplished partly by referring to something like ITIL V3, or ISO 20000... I worry because these sections select out two specific parts of a well-managed IT operation, but ignore the rest. We could try to add lots more parts, but then we would end up attempting to duplicate (and worse, to keep up to date), these other standards. Just like OAIS Chapter 5 (I think it is!), we may be better off saying very little than making a hash of saying a few selective things. What I DO think we should say is anything that is significantly ABOVE the normal requirements for a well-managed IT operation.

C1.1 Repository functions on well supported operating systems and other core infrastructural software.

The requirement specifies “well-supported” as opposed to manufacturer-supported or other similar phrases. The level of support for these elements of the infrastructure must be appropriate to their uses; the repository must show that it understands where the risks lie. The degree of support required relates to the criticality of the subsystem involved. A repository may deliberately have an old system using out-of-date software to support some aspects of its ingest function. If this system fails, it may take some time to replace it, if it can be replaced at all. As long as its failure does not affect mission-critical functions, this is acceptable. Systems used for internal development may not be protected or supported to the same level as those for end-user service.

MarkConrad - 22 May 2007 - I do not understand the last 4 sentences of the paragraph above. Isn't the ingest function usually a mission-critical function?

ChrisRusbridge - 18 Jul 2007 - Mark, I assume that under the circumstances envisaged, this only applies to part of the ingest flow, ie specific older material, and so if the obsolete OS fails for a while that is not too critical!

Evidence: Software inventory; system documentation; support contracts; use of strongly community supported software (i.e., Apache).

MarkConrad - 22 May 2007 Should the above read (e.g., Apache) or is the document prescribing the use of Apache software? ChrisRusbridge - 18 Jul 2007 - Agreed re "eg".

MarkConrad - 22 May 2007 I believe that the Evidence: should include periodic market analyses.

C1.2 Repository ensures that it has adequate hardware and software support for backup functionality sufficient for the repository’s services and for the data held, e.g., metadata associated with access controls, repository main content.

The repository needs to be able to demonstrate the adequacy of the processes, hardware and software for its backup systems. Some will need much more elaborate backup plans than others.

Evidence: Documentation of what is being backed up and how often; audit log/inventory of backups; validation of completed backups; disaster recovery plan (policy and documentation); “firedrills” (testing of backups); support contracts for hardware and software for backup mechanisms.

MarkConrad - 23 May 2007 I believe that we need a clear statement that backup alone is not sufficient for long term preservation.

MarkConrad - 23 May 2007 Do we also need a pointer to C3.4?

C1.3 Repository manages the number and location of copies of all digital objects.

The repository system must be able to identify the number of copies of all stored digital objects, and the location of each object and its copies. This applies to what are intended to be identical copies, not versions of objects or copies.

ChrisRusbridge - 18 Jul 2007 - Presumably this is OK on the aggregate, does not have to be per item! But if so, with the exception of Mark's next point, isn't it trivial for a well-managed IT operation? If you don't know where your backups and mirrors are, you are in trouble! Then again, does this require that you have to know at all times where every single digital copy of every object is? Because that is not possible! MarkConrad - 10 Sep 2007 This is not impossible. In fact it is necessary if we are going to be able to assert that we are providing an authentic copy of a particular digital object.

MarkConrad - 23 May 2007 Somewhere in this document we need an explicit statement that the repository must be able to distinguish between versions of objects or copies and identical copies so that a repository can assert that it is providing an authentic copy of the correct version of an object. ChrisRusbridge - 18 Jul 2007 - Agreed; version management is critical, although it should presumably be part of provenance management...

The location must be described such that the object can be located precisely, without ambiguity. It can be an absolute physical location or a logical location within a storage medium or a storage subsystem. One way to test this would be to look at a particular object and ask how many copies there are, what they are stored on, and where they are.
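
The test described above can be sketched in code. This is only an illustration under stated assumptions: the register layout, the object identifiers, the storage names, and the per-object copy policy are all hypothetical and not taken from any particular repository system.

```python
from collections import defaultdict

# Hypothetical location register: one (object_id, storage_system, location)
# row per copy that is intended to be identical.
REGISTER = [
    ("obj-001", "disk-array-A", "/store/a/obj-001"),
    ("obj-001", "tape-library-B", "tape 0042, file 17"),
    ("obj-002", "disk-array-A", "/store/a/obj-002"),
]

# Hypothetical policy: minimum number of identical copies per object.
REQUIRED_COPIES = {"obj-001": 2, "obj-002": 2}


def copies_by_object(register):
    """Group register rows by object identifier."""
    grouped = defaultdict(list)
    for object_id, system, location in register:
        grouped[object_id].append((system, location))
    return dict(grouped)


def under_replicated(register, required):
    """Return {object_id: (found, expected)} for objects with too few copies."""
    grouped = copies_by_object(register)
    shortfalls = {}
    for object_id, expected in required.items():
        found = len(grouped.get(object_id, []))
        if found < expected:
            shortfalls[object_id] = (found, expected)
    return shortfalls
```

With the sample data, under_replicated(REGISTER, REQUIRED_COPIES) flags obj-002, which has one registered copy where the assumed policy expects two. A register check like this says nothing about whether the copies are actually identical; that needs a separate comparison of the retrieved copies.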

MarkConrad - 23 May 2007 This test would be incomplete without retrieving all of the copies of the object and comparing them to ensure that they are identical copies. ChrisRusbridge - 18 Jul 2007 - Wouldn't a sample be enough?

A repository can have different policies for different classes of objects, depending on factors such as the producer, the information type, or its value. Some repositories may have only one copy (excluding backups) of everything, stored in one place, though this is definitely not recommended. There may be additional identification requirements if the data integrity mechanisms use alternative copies to replace failed copies.

ChrisRusbridge - 18 Jul 2007 - Surely the point is that a repository should have a policy!

Evidence: random retrieval tests; system test; location register/log of digital objects compared to the expected number and location of copies of particular objects.

MarkConrad - 23 May 2007 What is meant by system test?

MarkConrad - 23 May 2007 In the first paragraph of this item a distinction is made between a version or copy of an object and an identical copy of the object. Do we need to make that distinction here?

C1.4 Repository has mechanisms in place to ensure any/multiple copies of digital objects are synchronized.

If multiple copies exist, there has to be some way to ensure that intentional changes to an object are propagated to all copies of the object. There must be an element of timeliness to this. It must be possible to know when the synchronization has completed, and ideally to have some estimate beforehand as to how long it will take. Depending on whether it is automated or requires manual action (such as the retrieval of copies from off-site storage), the time involved may be seconds or weeks. The duration itself is immaterial; what is important is that there is understanding of how long it will take.

MarkConrad - 23 May 2007 I would suggest changing the word intentional in the first sentence to authorized. You do not want the repository to replicate an unauthorized, but intentional change to a digital object. This could potentially result in the destruction of all authentic copies of the digital object in the repository.

ChrisRusbridge - 18 Jul 2007 - I think the paragraph above is either confusing or wrong. We need to be able to make changes to digital objects, but we will also want to preserve un-changed versions. The paragraph should really be replaced with something relating to provenance tracking of changes.

There must also be something that addresses what happens while the synchronization is in progress. This has an impact on disaster recovery: what happens if a disaster and an update coincide? If one copy of an object is altered and a disaster occurs while other copies are being updated, it is essential to be able to ensure later that the update is successfully propagated.
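
One possible way to make an interrupted synchronization recoverable is to keep a record of which copies have confirmed each update. The sketch below is a hypothetical illustration, not a description of any existing repository software; the class name, the copy names, and the use of simple version numbers are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SyncTracker:
    """Tracks which copies have received each update (hypothetical sketch)."""

    copies: tuple
    current_version: dict = field(default_factory=dict)  # object_id -> version
    applied: dict = field(default_factory=dict)  # (object_id, copy) -> version

    def record_update(self, object_id, version):
        """Register a new authoritative version that must reach every copy."""
        self.current_version[object_id] = version

    def confirm(self, object_id, copy, version):
        """Record that a copy now holds the given version."""
        self.applied[(object_id, copy)] = version

    def pending(self, object_id):
        """Copies that do not yet hold the authoritative version."""
        target = self.current_version[object_id]
        return [
            copy
            for copy in self.copies
            if self.applied.get((object_id, copy)) != target
        ]

    def is_synchronized(self, object_id):
        """True once every copy has confirmed the authoritative version."""
        return not self.pending(object_id)
```

If this record is itself stored safely, the list returned by pending() after a disaster shows which copies still need the update. Timestamps on record_update and confirm calls, which are not shown here, would give the measured synchronization times mentioned under Evidence.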

Evidence: Workflows; system analysis of how long it takes for copies to synchronize; procedures/documentation of operating procedures related to updates and copy synchronization; procedures/documentation related to whether changes lead to the creation of new copies and how those copies are propagated and/or linked to previous versions.

C1.5 Repository has effective mechanisms to detect bit corruption or loss.

The repository must detect data loss accurately to ensure that any losses fall within the tolerances established by policy (see A3.6). Data losses must be detected and detectable regardless of the source of the loss. This applies to all forms and scope of data corruption, including missing objects and corrupt or incorrect or imposter objects, corruption within an object, and copying errors during data migration or synchronization of copies. Ideally, the repository will demonstrate that it has all the AIPs it is supposed to have and no others, and that they and their metadata are uncorrupted.

MarkConrad - 24 May 2007 I do not understand what is being referenced at A3.6.

The approach must be documented and justified and include mechanisms for mitigating such common hazards as hardware failure, human error, and malicious action. Repositories that use well-recognized mechanisms such as MD5 signatures need only recognize their effectiveness and role within the overall approach. But to the extent the repository relies on homegrown schemes, it must provide convincing justification that data loss and corruption are detected within the tolerances established by policy.
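
A minimal sketch of a checksum-based audit follows. It assumes a manifest that maps relative file paths to expected digests, which is a hypothetical layout. It uses SHA-256 from Python's hashlib rather than the MD5 named above; any algorithm name that hashlib supports can be passed instead. It reports missing, corrupt, and unexpected files, in the spirit of showing that the repository has the objects it is supposed to have and no others.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest, root, algorithm="sha256"):
    """Compare files under root with a manifest {relative_path: digest}.

    Returns three sorted lists: missing, corrupt, unexpected.
    """
    root = Path(root)
    present = {
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file()
    }
    expected = set(manifest)
    missing = sorted(expected - present)
    unexpected = sorted(present - expected)
    corrupt = sorted(
        name
        for name in present & expected
        if file_digest(root / name, algorithm) != manifest[name]
    )
    return missing, corrupt, unexpected
```

The sketch does not address where the manifest is kept. If the manifest sits alongside the data, whatever damages or alters one can also affect the other, so the storage and protection of the digests needs its own justification.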

ChrisRusbridge - 18 Jul 2007 - It's not enough to use MD5 signatures etc; you would have to have a trustable system for managing the MD5 signatures separately from the data files, so that deliberate attack on one cannot be accompanied by deliberate attack on the other. Isn't this really authenticity management?

Data losses must be detected promptly enough that routine systemic sources of failure, such as hardware failures, are unlikely to accumulate and cause data loss beyond the tolerances established by the repository’s policy or specified in any relevant deposit agreement. For example, consider a repository that maintains a collection on identical primary and backup copies with no other data redundancy mechanism. If the media of the two copies have a measured failure rate of 1% per year and failures are independent, then there is a 0.01% chance that both copies will fail in the same year. If a repository’s policy limits loss to no more than 0.001% of the collection per year, with a goal, of course, of losing 0%, then the repository would need to confirm media integrity at least every 72 days to achieve an average time to recover of 36 days, or about one tenth of a year. This simplified example illustrates the kind of issues a repository should consider, but the objective is a comprehensive treatment of the sources of data loss and their real-world complexity. Any data that is (temporarily) lost should be recoverable from backups.
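
The arithmetic in this example can be reproduced with a short calculation. The model is one reading of the example and is an assumption: two copies fail independently at the same annual rate, a failed copy waits on average half a check interval before it is repaired, and data is lost only if the second copy fails during that wait. The function name and parameters are hypothetical.

```python
def max_check_interval_days(annual_failure_rate, max_annual_loss, days_per_year=365):
    """Longest interval between integrity checks under a simple two-copy model.

    Annual loss is approximated as rate * (rate * window), where window is the
    average time, in years, that a failed copy stays unrepaired. Solving for
    the window gives max_annual_loss / rate**2. The check interval is twice
    the window because a failure waits half an interval on average.
    Returns (check_interval_days, mean_recovery_days).
    """
    window_years = max_annual_loss / annual_failure_rate ** 2
    mean_recovery_days = window_years * days_per_year
    return 2 * mean_recovery_days, mean_recovery_days


# Figures from the example: 1% annual failure rate, 0.001% tolerated loss.
interval, recovery = max_check_interval_days(0.01, 0.00001)
print(f"check at least every {interval:.0f} days; "
      f"mean time to recover {recovery:.1f} days")
```

With a 365-day year this gives a window of about one tenth of a year, which agrees with the rounded 36-day and 72-day figures quoted above. The model ignores correlated failures, so it should be treated as a lower bound on how often checks are needed.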

MarkConrad - 24 May 2007 I do not understand what is meant by, maintains a collection on identical primary and backup copies.

ChrisRusbridge - 18 Jul 2007 - Note that independence of failures is a weak assumption; a stronger assumption is to assume that many failures will be correlated, see Rosenthal et al's papers on this... and also that for extremely large datasets (eg petabytes) the prompt assertion of authenticity is problematic!

Evidence: Documents that specify bit error detection and correction mechanisms used; risk analysis; error reports; threat analyses.

C1.6 Repository reports to its administration all incidents of data corruption or loss, and steps taken to repair/replace corrupt or lost data.

Having effective mechanisms to detect bit corruption and loss within a repository system is critical, but is only one important part of a larger process. As a whole, the repository must record, report, and, where possible, repair all violations of data integrity. This means the system should be able to notify system administrators of any logged problems. These incidents, recovery actions, and their results must be reported to administrators and should be available.

MarkConrad - 24 May 2007 For the last sentence in the paragraph above, what should be available and to whom should it be available?

ChrisRusbridge - 18 Jul 2007 - Good management again, eh ISO 20000-2:2005 sections 8.3.5 and 8.3.6...

For example, the repository should document procedures to take when loss or corruption is detected, including standards for measuring the success of recoveries. Any actions taken to repair objects as part of these procedures must be recorded. The nature of this recording must be documented by the repository, and the information must be retrievable when required. This documentation plays a critical role in the measurement of the authenticity and integrity of the data held by the repository.
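
As a rough illustration of what a retrievable incident record could contain, the sketch below stores each incident as one JSON line. The field names and the log format are assumptions made for this example; the text above does not prescribe them.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class IntegrityIncident:
    """One detected integrity problem and what was done about it (hypothetical)."""

    object_id: str
    detected_at: str  # ISO 8601 timestamp supplied by the caller
    problem: str
    repair_action: str = ""
    repair_verified: bool = False
    reported_to_admin: bool = False


def to_log_line(incident):
    """Serialize an incident as one JSON line for an append-only log."""
    return json.dumps(asdict(incident), sort_keys=True)


def unreported(log_lines):
    """Return object ids of logged incidents not yet reported to administration."""
    records = (json.loads(line) for line in log_lines)
    return [rec["object_id"] for rec in records if not rec["reported_to_admin"]]
```

A function such as unreported() corresponds to the evidence item that compares error logs with reports to administration: any incident that it returns was logged but never reported.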

MarkConrad - 24 May 2007 For the next to last sentence in the paragraph above, what information must be retrievable? required for what purpose(s)?

Evidence: Preservation metadata (e.g., PDI) records; comparison of error logs to reports to administration; escalation procedures related to data loss.

C1.7 Repository has defined processes for storage media and/or hardware change (e.g., refreshing, migration).

The repository should have triggers for initiating action and understanding of how long it will take for storage media migration, or “refreshing”—copying between media without reformatting the bitstream. Will it finish before the media is dead, for instance? Copying large quantities of data can take a long time and can affect other system performance. It is important that the process includes a check that the copying has happened correctly. (See B4.2.)
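
A refresh step with a built-in check might look like the following sketch. It assumes that source and destination are ordinary file paths and uses a SHA-256 digest for the comparison. The function names are hypothetical, and the time estimate assumes a constant sustained transfer rate.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source, destination):
    """Copy a bitstream unchanged to new media and confirm the copy matches."""
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise OSError(f"refresh of {source} failed verification")
    return after


def estimated_days(total_bytes, bytes_per_second):
    """Rough duration of a refresh at a constant sustained transfer rate."""
    return total_bytes / bytes_per_second / 86400
```

Comparing estimated_days() for the whole holding with the expected remaining life of the old media gives a first answer to the question of whether the copy will finish in time. Measured throughput under normal load is a better input than a nominal device speed.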

Repositories should also consider the obsolescence of any/all hardware components within the repository system as potential trigger events for migration. Increasingly, long-term, appropriate support for system hardware components is difficult to obtain, exposing repositories to risks and liabilities should they choose to continue to operate the hardware beyond the period of manufacturer or third-party support.

Evidence: Documentation of processes; policies related to hardware support, maintenance, and replacement; documentation of hardware manufacturers’ expected support life cycles.

C1.8 Repository has a documented change management process that identifies changes to critical processes that potentially affect the repository’s ability to comply with its mandatory responsibilities.

Examples of this would include changes in processes in data management, access, archival storage, ingest, and security. The really important thing is to be able to know what changes were made and when they were made. Traceability makes it possible to understand what was affected by particular changes to the systems.

Evidence: Documentation of change management process; comparison of logs of actual system changes to processes versus associated analyses of their impact and criticality.

MarkConrad - 22 May 2007 I believe that this requirement is actually referring to configuration management rather than change management. There are several relevant sets of best practices/standards. For example:

ISO 10007:2003 gives guidance on the use of configuration management within an organization. It is applicable to the support of products from concept to disposal. It first outlines the responsibilities and authorities before describing the configuration management process that includes configuration management planning, configuration identification, change control, configuration status accounting and configuration audit. Since ISO 10007:2003 is a guidance document, it is not intended to be used for certification/registration purposes.

See also:


ANSI/EIA 649A, Configuration Management

IEEE Std 828-1990, IEEE Standard for Software Configuration Management Plans. This plan establishes the minimum required contents of a Software Configuration Management Plan and defines the specific activities to be addressed and their requirements for any portion of a software product's life cycle.

IEEE1042-1987, Guide to Software Configuration Management

ISO/IEC 12207-1995 "Information technology -- Software life cycle processes"

ChrisRusbridge - 18 Jul 2007 - See also ISO 20000-2:2005 section 9.2.

MarkConrad - 22 May 2007 I believe we need a separate item that states, Repository has the capability to identify critical processes that affect its ability to comply with its mandatory responsibilities.

MarkConrad - 22 May 2007 I believe we need a separate item that says, Configuration Management efforts should result in a complete audit trail of decisions and design modifications.

C1.9 Repository has a process for testing the effect of critical changes to the system.

Changes to critical systems should be, where possible, pre-tested separately, the expected behaviors documented, and roll-back procedures prepared. After changes, the systems should be monitored for unexpected and unacceptable behavior. If such behavior is discovered the changes and their consequences should be reversed.

Whole-system testing or unit testing can address this requirement; complex safety-type tests are not required. Testing can be very expensive, but there should be some recognition of the fact that a completely open regime where no changes are ever evaluated or tested will have problems.

Evidence: Documented testing procedures; documentation of results from prior tests and proof of changes made as a result of tests.

MarkConrad - 22 May 2007 Again, Repository must have the capability to identify what is and is not a critical change to the system.

C1.10 Repository has a process to react to the availability of new software security updates based on a risk-benefit assessment.

Decisions to apply security updates are likely to be the outcome of a risk-benefit assessment; security patches are frequently responsible for upsetting other aspects of system functionality or performance. It may not be necessary for a repository to implement all software patches, and the application of any must be carefully considered. Each security update implemented by the repository must be documented with details about how it was completed; both automated and manual updates are acceptable. Significant security updates might pertain to software other than core operating systems, such as database applications and Web servers, and these should also be documented.

Evidence: Risk register (list of all patches available and risk documentation analysis); evidence of update processes (e.g., server update manager daemon); documentation related to the update installations.

MarkConrad - 22 May 2007 I do not believe that this item should be limited to software security updates. It should include all software updates and any updates to a hardware system's firmware. Any of these updates has the potential to affect the repository's ability to carry out its responsibilities. ChrisRusbridge - 18 Jul 2007 - Yes, it's all part of change management...

C2. Appropriate technologies

A repository should use strategies and standards relevant to its designated community(ies) and its digital technologies.

C2.1 Repository has hardware technologies appropriate to the services it provides to its designated communities and has procedures in place to receive and monitor notifications, and evaluate when hardware technology changes are needed.

The repository needs to be aware of the types of access services expected by its designated community(ies), including, where applicable, the types of media to be delivered, and needs to make sure its hardware capabilities can support these services. For example, it may need to improve its networking bandwidth over time to meet growing access data volumes and expectations.

Evidence: Technology watch; documentation of procedures; designated community profiles; user needs evaluation; hardware inventory.

-- KatiaThomaz - 10 May 2007 - It could be separated in two items: “repository has hardware technologies appropriate to the services” and “repository has procedures to receive and monitor”

MarkConrad - 24 May 2007 I agree with Katia that this should be separated into multiple items, but I think that there should be 4 items. 1. repository has hardware technologies appropriate to the services 2. repository has procedures to receive and monitor notifications 3. repository has procedures to evaluate when changes are needed 4. repository has procedures to replace hardware when evaluation indicates the need to do so.

ChrisRusbridge - 18 Jul 2007 - Not sure what notifications are in this context! Mark's 3 & 4 are surely part of change and/or configuration management

C2.2 Repository has software technologies appropriate to the services it provides to its designated community(ies) and has procedures in place to receive and monitor notifications, and evaluate when software technology changes are needed.

The repository needs to be aware of the types of access services expected by its designated community(ies), and to make sure its software capabilities can support these services. For example, it may need to add format translations to meet the needs of currently widely used application tools, or it may need to add a data subsetting service for very large data objects.

Evidence: Technology watch; documentation of procedures; designated community profiles; user needs evaluation; software inventory.

-- KatiaThomaz - 10 May 2007 - It could be separated in two items: “repository has software technologies appropriate to the services” and “repository has procedures to receive and monitor”

MarkConrad - 24 May 2007 Again, I think that this item should be split into four items in the same manner proposed for C2.1

C3. Security

“System” here refers to more than IT systems, such as servers, firewalls, or routers. Fire protection and flood detection systems are also significant, as are systems that involve actions by people. The first two requirements here are general and the third addresses internal security, while the remainder address disaster recovery.

C3.1 Repository maintains a systematic analysis of such factors as data, systems, personnel, physical plant, and security needs.

Regular risk assessment should address external threats and denial of service attacks. These analyses are likely to be documented in several different places, and need not be comprehensively contained in a single document.

Evidence: ISO 17799 certification; documentation describing analysis and risk assessments undertaken and their outputs; logs from environmental recorders; confirmation of successful staff vetting.

-- KatiaThomaz - 10 May 2007 - It lacks “repository maintains a systematic analysis of such factors as third parties”; (see ISO27001 A.10.2).

ChrisRusbridge - 18 Jul 2007 - Better surely to say that the repository maintains adequate security protection for the task in hand, following codes of practice such as ISO 27000 etc (plus perhaps the FIPS and other equivalents), with evidence being relevant certification...

C3.2 Repository has implemented controls to adequately address each of the defined security needs.

The repository must show how it has dealt with its security requirements. If some types of material are more likely to be attacked, the repository will need to provide more protection, for instance.

Evidence: ISO 17799 certification; system control list; risk, threat, or control analyses; addition of controls based on ongoing risk detection and assessment.

C3.3 Repository staff have delineated roles, responsibilities, and authorizations related to implementing changes within the system.

Authorizations are about who can do what—who can add users, who has access to change metadata, who can get at audit logs. It is important that authorizations are justified, that staff understand what they are authorized to do, and that there is a consistent view of this across the organization.
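
One simple way to keep a consistent, checkable view of authorizations is a single table that maps roles to permitted actions. The roles and action names below are hypothetical examples, loosely based on the questions above; a real repository would define its own.

```python
# Hypothetical roles and actions, for illustration only.
AUTHORIZATIONS = {
    "system_administrator": {"add_user", "read_audit_log"},
    "metadata_editor": {"change_metadata"},
    "auditor": {"read_audit_log"},
}


def is_authorized(role, action):
    """True if the role is explicitly granted the action."""
    return action in AUTHORIZATIONS.get(role, set())


def roles_allowed(action):
    """List, in sorted order, every role that may perform the action."""
    return sorted(
        role for role, actions in AUTHORIZATIONS.items() if action in actions
    )
```

A table of this kind can be reviewed against the organizational chart, and roles_allowed() answers the "who can do what" question for any single action. Unknown roles and unlisted actions are denied by default.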

Evidence: ISO 17799 certification; organizational chart; system authorization documentation.

C3.4 Repository has suitable written disaster preparedness and recovery plan(s), including at least one off-site backup of all preserved information together with an offsite copy of the recovery plan(s).

The repository must have a written plan with some approval process for what happens in specific types of disaster (fire, flood, system compromise, etc.) and for who has responsibility for actions. The level of detail in a disaster plan, and the specific risks addressed need to be appropriate to the repository’s location and service expectations. Fire is an almost universal concern, but earthquakes may not require specific planning at all locations. The disaster plan must, however, deal with unspecified situations that would have specific consequences, such as lack of access to a building.

ChrisRusbridge - 18 Jul 2007 - See also ISO 20000-2:2005 section 6.3...

Evidence: ISO 17799 certification; disaster and recovery plans; information about and proof of at least one off-site copy of preserved information; service continuity plan; documentation linking roles with activities; local geological, geographical, or meteorological data or threat assessments.

-- DavidGiaretta - 09 May 2007

-- KatiaThomaz - 10 May 2007

-- SimonLambert - 21 Jun 2007

Topic revision: r11 - 2007-09-10 - MarkConrad