Technical Metadata Elements: Discussion and Description

Background

A number of preservation metadata sets have been developed by initiatives such as CEDARS, NEDLIB, and the NLA. The report A Recommendation for Content Information, released in October 2001 by the OCLC/RLG Working Group on Preservation Metadata (12), compiles elements drawn from those three initiatives, adds other elements, and produces a preservation metadata set that describes the content data object (or the current instantiation of the original record) and its technical environment. This group of elements is called Environment Description.

An explanation of the Environment Description contained in the OCLC report states,

"...Environment Description is broken down into two components: Software Environment and Hardware Environment. A software environment is the collection of digital objects - e.g., Internet Explorer and Windows 95 - that when combined enable access to the content of the archived object. The hardware environment, on the other hand, consists of physical objects - primarily computer-related equipment such as monitors, microprocessors, and memory chips - that are necessary to operate the software environment." (12)

As can be observed, this set focuses on rendering and displaying the content data object kept in the digital repository. It assumes that the content data can be stored in a compressed format. Thus, if the content data object is not in a format compatible with the display software, a transformation has to take place before the object can be rendered and displayed. Conversely, if the archived bit stream is compatible with the display/access software or application, no transformation is needed. The preservation metadata initiatives named above have devised sets that record separately the technical information needed to access the current manifestation of the digital object and the information about the processes and equipment used to preserve the digital object and keep it current over time.

The technical metadata subset presented in this portfolio aims to record the technical environment changes that result from both access and preservation transformations. Other preservation metadata, such as the date of the transformation process, permission to transform the object, and the authority in charge of the transformation, will be recorded within the Administrative Metadata Set.

The rationale for proposing a multifunctional metadata set is that it will provide consistency in documenting the technical devices involved in any transformation process that a digital object goes through. Since, for the most part, preservation transformation processes are the ones that affect the display and hardware devices, it seems logical to use the same elements to describe both instances. It can also be assumed that the same hardware that supports the operating system and software used in a preservation strategy will support access to a digital record (NEDLIB supports this assumption and, in its change history element, records only software/application and operating system; 6, pp. 20-21). It is also logical to assume that many objects that are seldom used will need to be transformed when they are accessed. In those cases, the transformation processes needed for access and preservation will take place at the same time.

The Technical_Metadata Subset draws from the OCLC/RLG report

The goals of the technical metadata subset focus on both preservation and access functions. The majority of the elements that compose this set have been taken from the OCLC/RLG report. The addition of new elements, the reinterpretation of others, and changes in the set structure have made it possible to accommodate the preservation function within the access/display one.

The OCLC/RLG set was chosen as the basis for the technical metadata subset because it provides a thorough and flexible structure that supports information for both simple and complex digital objects. In addition, its elements provide the thorough description needed if a process has to be reversed, or replicated in an emulation case.

This technical_metadata subset assumes that at certain points in the object's continuum, transformations will need to take place for the content information to remain accessible. The kind of transformation will be dictated by the preservation strategy (migration or emulation) adopted by the repository. Since in this set the elements that describe an access transformation process are the same as the ones that describe a preservation process, a purpose element has been added to indicate whether the transformation was for access or for preservation purposes.

Most transformation processes undertaken for access reasons, such as unzipping a file to give a user access to it, will need to be recorded only until the object has been zipped again and relocated in the repository. Unlike preservation transformation information, this kind of information can be kept in a log for a certain amount of time and then discarded. The preservation transformation elements, in contrast, can be permanently recorded in a database that will make up the Change History of the object. (6)
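The difference in retention between the two kinds of record can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the field names, and the 30-day retention window are hypothetical and do not come from the OCLC/RLG or NEDLIB documents.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List

# Hypothetical retention window; the source only says "a certain amount of time".
ACCESS_LOG_RETENTION = timedelta(days=30)


@dataclass
class TransformationRecord:
    object_id: str
    purpose: str          # "access" or "preservation"
    description: str
    recorded_at: datetime


@dataclass
class Repository:
    # Temporary log for access transformations (may be purged).
    access_log: List[TransformationRecord] = field(default_factory=list)
    # Permanent Change History for preservation transformations.
    change_history: List[TransformationRecord] = field(default_factory=list)

    def record(self, rec: TransformationRecord) -> None:
        """Route a record to the log or the Change History by its purpose."""
        if rec.purpose == "preservation":
            self.change_history.append(rec)
        elif rec.purpose == "access":
            self.access_log.append(rec)
        else:
            raise ValueError(f"unknown purpose: {rec.purpose!r}")

    def purge_access_log(self, now: datetime) -> None:
        """Discard access records older than the retention window."""
        cutoff = now - ACCESS_LOG_RETENTION
        self.access_log = [r for r in self.access_log if r.recorded_at >= cutoff]
```

Purging never touches the Change History, which mirrors the distinction drawn above between temporary access information and permanent preservation information.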

As this subset forms part of a Preservation Metadata Set, it is complemented by the description of the object and of the changes that the object itself goes through on account of transformations. It is also complemented by the corresponding administrative and rights management information.

What follows is the definition of the elements in table and in descriptive format. Since the majority of the elements have been extensively defined in the OCLC/RLG document, only the ones that have been added or modified are described in detail.

Since this technical metadata set has already been converted into a DTD, the elements are presented in exactly the structure/hierarchy that yields a valid XML structure. Within this set, except for the preservation instance identifier, all the elements are repeatable, because one object might be rendered and displayed through different technologies and each element may need to record several values (12, p. 8).
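As a rough illustration of what a record following this structure might look like, the Python sketch below builds a small XML fragment with the standard library. The tag names are assumptions derived from the element names in the table by replacing spaces with underscores; the actual DTD is not reproduced in this document, and the identifier and application values are made up.

```python
import xml.etree.ElementTree as ET


def build_technical_metadata(instance_id, display_apps):
    """Build a technical metadata record with one identifier and
    one display_access_application element per application given."""
    root = ET.Element("technical_metadata")
    # Per the text above, the identifier is the only non-repeatable element.
    ET.SubElement(root, "preservation_instance_identifier").text = instance_id
    # Repeatable element: one object may be displayed by several applications.
    for name_version in display_apps:
        app = ET.SubElement(root, "display_access_application")
        child = ET.SubElement(app, "display_access_application_name_version")
        child.text = name_version
    return root


def has_single_identifier(root):
    """Check that exactly one preservation instance identifier is present."""
    return len(root.findall("preservation_instance_identifier")) == 1


# Hypothetical values, for illustration only.
record = build_technical_metadata("pi-0001", ["ViewerA 1.0", "ViewerB 2.3"])
xml_text = ET.tostring(record, encoding="unicode")
```

A real implementation would validate the serialized record against the DTD itself; ElementTree does not perform DTD validation, so the check here is limited to the single-identifier rule.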

Technical Metadata Elements: a Table

Sources: The sources from which the definitions have been extracted are linked in the source box. Some of the definitions have been changed to accommodate the new functions of the set. When an element was not modified, its definition remains the same as in the source from which it was taken.

Level of Granularity: Given the diversity of formats that can exist in a collection of digital objects and within a single digital object, it was considered appropriate to allow the metadata elements to be recorded at all levels, and specifically at the object level. This level of granularity will facilitate the management of collections, files, and individual objects within the repository.

| ELEMENT NAME / SUB-ELEMENTS | DEFINITION | SOURCE | MANDATORY (M) / OPTIONAL (O) / DESIRABLE (D) | GRANULARITY | REPEATABLE | RECORDER OF ELEMENT |
|---|---|---|---|---|---|---|
| technical metadata | Includes all the elements that compose a technical metadata set | | M | all levels | yes | preservation |
| link | Relationship between this instance and other external objects | 10 | O | any level | yes | preservation |
| preservation instance identifier | Identification given to each instantiation or manifestation of the original digital object | | M | all levels | yes | preservation |
| transformation process | Description of implementation, mechanisms, and technical tools used to automatically transform a digital object | 2 12 | M if applicable | all levels | yes | preservation |
| transformation process description | Description of the process that takes place to automatically transform the digital object into a preservation instance, or on a particular computer platform | 2 12 | M if applicable | all levels | yes | preservation |
| purpose | States the purpose of the transformation process | | M if applicable | all levels | yes | preservation |
| transformer engine name version | Name and version of the software capable of carrying out the transformation process | 2 9 | M if applicable | all levels | yes | preservation |
| parameters | Configuration of parameters to achieve successful operation | 2 12 | M if applicable | object level | yes | preservation |
| input format | Description of the format of the digital object that the transformer engine works on | 2 12 | M if applicable | object level | yes | preservation |
| output format | Description of the format produced by processing the data object with the transformation engine | 2 12 | M if applicable | object level | yes | preservation |
| location of transformer engine | Location of the required transformer engine | 12 | M if applicable | all levels | yes | preservation |
| documentation of transformer engine | Supporting documentation necessary to use the transformer engine | 12 | D if applicable | all levels | yes | preservation |
| reverse | Points to the original version of the digital object, or to the tool that can reverse the transformation process | 6 | M if applicable | all levels | yes | preservation |
| display access application | Description of the software environment capable of displaying the content data object | 12 | M if applicable | all levels | yes | preservation |
| display access application name version | Name and version of software capable of displaying the content data object | 12 | M | all levels | yes | preservation |
| display access application input format | Description of the format of the digital object that the display access application worked on | 2 12 | M | all levels | yes | preservation |
| display access application output format | Description of the output expected from the display access application | 2 12 | M | all levels | yes | preservation |
| display access application location | Location of the display/access application needed to display and/or access the object's content | 12 | M | all levels | yes | preservation |
| display access application documentation | Supporting documentation necessary/useful for operation/use of the display access application | 12 | D | all levels | yes | preservation |
| operating system | Description of the platform upon which the rendering programs operate | 6 12 | M | all levels | yes | preservation |
| operating system name version | Name and version of the platform upon which the rendering programs operate | 6 12 | M | all levels | yes | preservation |
| location operating system | Location of a working copy of the operating system described in the operating system name and version | 12 | M | all levels | yes | preservation |
| documentation operating system | Supporting documentation necessary/useful for operation/use of the operating system | 12 | D | all levels | yes | preservation |
| hardware environment | Physical objects necessary to operate the software environment | 12 | M | all levels | yes | preservation |
| hardware name model | Name and model of the hardware necessary to operate the software environment | | M | all levels | yes | preservation |
| microprocessor requirements | Description of microprocessor specifications necessary to operate the content data object's software environment | 6 12 | M | all levels | yes | preservation |
| memory requirements | Description of memory resources necessary to operate the content data object's software environment | 12 | M | all levels | yes | preservation |
| storage information | Description of permanent storage resources necessary for the operation of the software environment and/or rendering of the content data object | 12 | M | all levels | yes | preservation |
| peripheral requirements | Description of additional equipment needed to render/display the content data object | 6 12 | M | all levels | yes | preservation |
| location hardware environment | Location of the physical devices needed to render the content data object | 12 | M | all levels | yes | preservation |
| documentation hardware environment | Supporting documentation necessary/useful for operation/use of the hardware environment | 12 | D | all levels | yes | preservation |
| compiler information | Description of compilers or retargetable compiler | 6 | M if applicable | all levels | yes | preservation |
| compiler name version | Name and version of the program that can convert another program from source language to machine language | 6 | M if applicable | all levels | yes | preservation |
| compiler documentation | Supporting documentation useful for operation/use of the compiler | 6 | M if applicable | all levels | yes | preservation |
| retargetable compiler name version | Name and version of the retargetable compiler that can convert another program and its target hardware from source language to machine language | 6 | M if applicable | all levels | yes | preservation |
| retargetable compiler documentation | Supporting documentation useful for operation/use of the retargetable compiler | 6 | M if applicable | all levels | yes | preservation |

Technical Metadata Elements: Definitions and Discussion

ELEMENT: technical metadata

Definition: Describes the software and hardware environments needed to access the current preservation instance of the digital object as it is maintained in the digital repository. Records the hardware and software used to perform the preservation transformation needed to access the current instance of the digital object.

Source: 9 http://www.oclc.org/research/pmwg

Purpose: To contain all the elements needed to display/render the instantiation of the original digital object, and to document the technical environment that supports preservation processes.

Clarification: This element has been included to contain all the other elements, as a single containing element is needed to construct a DTD against which XML marked-up metadata can be validated.

ELEMENT: link

Definition: Establishes relationships between this metadata subset and other objects or information.

Source: 10 http://www.nla.gov.au/preserve/pmeta.html

Purpose: Allows linking to standards, documents, or any external resource that might be relevant to the technical metadata.

ELEMENT: preservation instance identifier

Definition: Identification given to each instantiation or new manifestation of a digital object.

Source: 9 http://www.oclc.org/research/pmwg

Purpose: Makes it possible to locate and identify the information needed to render and display the instantiation of a digital object, and to keep track of the current instance of the content data object. The identifier will change every time the object is migrated or emulated, at which point a new set of data will have to be recorded. It will not change when the object is transformed for display/access purposes. It is like placing a number on a preservation master copy.
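The rule for when the identifier changes can be expressed as a small Python sketch. The identifier format (object id plus a zero-padded sequence number) and all names are hypothetical; the set itself does not prescribe how identifiers are formed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationInstance:
    object_id: str
    sequence: int  # hypothetical counter of preservation instances

    @property
    def identifier(self) -> str:
        # Assumed format, for illustration only.
        return f"{self.object_id}-{self.sequence:04d}"


def apply_transformation(instance: PreservationInstance, purpose: str) -> PreservationInstance:
    """Return the instance that is current after a transformation.

    A preservation transformation (migration or emulation) yields a new
    instance with a new identifier; an access transformation leaves the
    current instance, and therefore its identifier, unchanged.
    """
    if purpose == "preservation":
        return PreservationInstance(instance.object_id, instance.sequence + 1)
    if purpose == "access":
        return instance
    raise ValueError(f"unknown purpose: {purpose!r}")
```

For example, starting from `PreservationInstance("obj42", 1)`, an access transformation keeps the identifier `obj42-0001`, while a preservation transformation produces `obj42-0002`.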

ELEMENT: transformation process

Definition: Description of implementation, mechanism, and technical tools used to automatically transform a digital object into a preservation instantiation, or for display/access purposes.

Source: 9 http://www.oclc.org/research/pmwg

Purpose: Contains the elements that describe the purpose, implementation, mechanisms, and tools needed to transform the original object into a preservation instantiation, or to render/display the content data in a human-understandable way.

Clarification: According to the definition provided in the OCLC/RLG document, this element is used to describe the transformation process that takes place when the original byte stream is taken from the archive and transformed in order to be accessed/displayed. It assumes that within the archive the object is compressed and needs to be decompressed before it is displayed.

In the technical metadata set, another element called transformation process description was developed to describe this process, on the interpretation that such a description only exists when a transformation process needs to occur in the first place. The interpretation of this element in the preservation metadata set covers all the mechanisms and tools that transform the original byte stream into a preservation instantiation of the digital byte stream. In that sense, this element would, for example, support a migration instance.

In the technical metadata DTD, the elements enclosed in the transformation process are mandatory only when they are applicable.
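The "mandatory only when applicable" rule can be checked mechanically. The Python sketch below assumes, hypothetically, that the sub-elements marked "M if applicable" in the table are direct children of a transformation_process element and that tag names are the table names with underscores; neither assumption is confirmed by the DTD, which is not reproduced here. The reverse element is left out because its position in the hierarchy is not stated.

```python
import xml.etree.ElementTree as ET

# Assumed child tags, taken from the table rows marked "M if applicable".
REQUIRED_IF_PRESENT = [
    "transformation_process_description",
    "purpose",
    "transformer_engine_name_version",
    "parameters",
    "input_format",
    "output_format",
    "location_of_transformer_engine",
]


def missing_transformation_fields(root):
    """Return (index, tag) pairs for required children that are absent or
    empty in each transformation_process element. A record with no
    transformation_process element has nothing to check."""
    problems = []
    for index, process in enumerate(root.findall("transformation_process")):
        for tag in REQUIRED_IF_PRESENT:
            child = process.find(tag)
            if child is None or not (child.text or "").strip():
                problems.append((index, tag))
    return problems
```

A record that describes no transformation passes trivially, while a record that opens a transformation_process element must supply every listed sub-element.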

ELEMENT: transformation process description

Definition: Description of process that takes place to automatically transform the digital object into an instantiation of the original digital object.

Source: 9 http://www.oclc.org/research/pmwg

Clarification: This element calls for a narrative of how a piece of software interacts with a computer platform to achieve the transformation of the digital object. In the future, a standardized language will need to be in place to record this information precisely. Further experience in marking up documents and using preservation metadata will determine whether this element is really important or could be omitted.

ELEMENT: purpose

Definition: Records the reason for the transformation process.

Purpose: Makes it possible to determine whether the transformation process results from the implementation of a preservation strategy or is performed to access/display the current instantiation of the digital object.

ELEMENT: reverse

Definition: Points to the original version of the digital object, or to the tool that can reverse the transformation process.

Source: 6 http://www.kb.nl/coop/nedlib/results/D4.2/D4.2.htm

Purpose: Makes it possible to reverse processes or to get to older versions of the digital object's manifestation.

Clarification: This element can be further developed to include sub-elements that define the software and operating system needed to reverse the transformation process.

ELEMENT: compiler information

Definition: Description of the compiler that allows analyzing and executing each statement in a source program that will render a specific digital object, or of the retargetable compiler that generates assembly code for different architectures while reusing the largest part of the compiler's source code. (5)

Source: 6 http://www.kb.nl/coop/nedlib/results/D4.2/D4.2.htm

Purpose: Makes it possible to render a digital object created or supported in a source-coded program. The retargetable compiler goes even further, because it takes the source code of the software and a model of the target hardware to render the digital object.