The 15 guiding FAIR Principles refer to ‘(meta-)data’. What does that mean? First, the principles apply to the actual data elements (bit sequences). Examples include a text, a file, an image, the source code of software, a 3D model, a service, or a time series (such as an audio signal). They can also apply to an aggregate of many units that can itself be individually addressed, such as a database, a digitized book, the digitized materials of an estate, a recording system, software, or a research data publication with multiple components.
Metadata describes the properties of other data. It provides information about their content, properties, or structure, indicating the contexts in which they are found or how they can be used.
In the process of implementing the FAIR Principles, data and metadata take on different functions. The methods used to make data and the corresponding metadata FAIR can differ, and the degree of FAIRness can vary between the two components. Therefore, it is first necessary to identify the two components and structure them as digital objects (DO).
A digital object is initially ‘an object composed of a sequence of bit sequences’. This means that any file can be considered a digital object. Some digital objects can be structured in a simple way, such as a text file. A video, which consists of multiple elements (video track, audio track, container file, and possibly others), can be considered a complex digital object.
For a digital object to be interpreted by machine agents, it must be addressable, structured, and typed. To achieve this, the bit sequence is assigned an identifier, preferably a globally unique and persistent identifier (PID), along with a description of its properties in the form of a metadata unit. The bit sequence, PID, and metadata are linked through distinct relations to form an extended digital object, which can be addressed and processed as a unit of knowledge.
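The relationship between bit sequence, PID, and metadata can be sketched as a small data structure. This is a minimal illustration, not a reference implementation: the class name `ExtendedDigitalObject`, the relation names `identifies` and `isDescribedBy`, and the example PID are all assumptions made for this sketch.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExtendedDigitalObject:
    """A bit sequence bundled with an identifier and a metadata unit."""

    pid: str                 # persistent identifier (hypothetical value below)
    bit_sequence: bytes      # the actual data
    metadata: dict = field(default_factory=dict)  # description of its properties

    def checksum(self) -> str:
        # A fixity value that ties the metadata to one exact bit sequence.
        return hashlib.sha256(self.bit_sequence).hexdigest()

    def relations(self) -> list:
        # Relations as (subject, predicate, object) triples.
        # The predicate names are illustrative, not taken from a standard.
        return [
            (self.pid, "identifies", "sha256:" + self.checksum()),
            (self.pid, "isDescribedBy", self.pid + "#metadata"),
        ]


example = ExtendedDigitalObject(
    pid="https://example.org/pid/0001",  # placeholder, not a real PID
    bit_sequence=b"A simple text file.",
    metadata={"type": "text/plain", "title": "Example text"},
)

for triple in example.relations():
    print(triple)
```

Keeping the three parts in one immutable unit reflects the idea that a machine agent resolves the PID and obtains both the data and its description together.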
In the field of memory institutions (libraries, archives, museums), the term ‘metadata’ is generally used for the supplementary information with which the mostly physical objects, collections, and resources kept there are organized, described, and managed. It is thus quite common for metadata to refer to non-digital entities, which may be material or immaterial, concrete or abstract. Instead of referring to a bit sequence, metadata can relate, for example, to a painting, a libretto, a source text, a person, a geographic location, an event, or a concept from the history of ideas. If authority data for these non-digital entities are available, they can be addressed via authority data PIDs. In this case, the digital object contains a bit sequence of a dataset in a structured format, which itself contains metadata about a non-digital entity.
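As a rough sketch of this case, the bit sequence of the digital object can be a structured record whose content describes a non-digital entity and points to it through an authority data PID. The field names, the function `describe_entity`, and the `example.org` identifier are hypothetical and chosen only for illustration.

```python
import json

# Placeholder authority-data PID for a non-digital entity (a painting).
AUTHORITY_PID = "https://example.org/authority/painting-0001"


def describe_entity(authority_pid: str, label: str, entity_type: str) -> bytes:
    """Serialize metadata about a non-digital entity into a bit sequence."""
    record = {
        "about": authority_pid,  # the entity is addressed via its authority PID
        "label": label,
        "type": entity_type,
    }
    # The resulting bytes are the bit sequence of the digital object.
    return json.dumps(record, sort_keys=True).encode("utf-8")


bit_sequence = describe_entity(AUTHORITY_PID, "Example painting", "Painting")
print(bit_sequence.decode("utf-8"))
```

Here the digital object does not contain the painting; it contains structured statements about it.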
To better understand the function of metadata, it is helpful to divide them into different categories. All of these categories play a role in ensuring the FAIRness of the data.
The metadata relevant to a digital object can vary depending on the data format and the intended usage context. It is generally assumed that metadata can be found not only in documentation created for the data (e.g. in a database or table) but also at the level of software and system configurations or process control (e.g. log files), from which they must be extracted.
Administrative Metadata
Descriptive Metadata
The PID of the digital object typically leads to a landing page, an HTML page that displays metadata about the digital object. This page must provide enough information for both humans and machines to identify the digital object. Additionally, the landing page must offer access to the bit sequence itself (text, image, video), as well as to any other available metadata. Machine interpretability of the metadata can be supported by embedding it in the head of the HTML page as structured markup that uses the schema.org vocabulary.
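One common way to embed schema.org metadata in a landing page is a JSON-LD script element in the HTML head. The following sketch assumes that approach. The function names and all `example.org` URLs are placeholders, and the choice of the schema.org types `Dataset` and `DataDownload` is an assumption that would need to be adapted to the object in question.

```python
import json
from html import escape


def landing_page_jsonld(pid_url: str, name: str, content_url: str,
                        encoding_format: str) -> dict:
    """Build a schema.org description of a digital object."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "@id": pid_url,
        "identifier": pid_url,   # the PID that resolves to this landing page
        "name": name,
        "distribution": {        # access to the bit sequence itself
            "@type": "DataDownload",
            "contentUrl": content_url,
            "encodingFormat": encoding_format,
        },
    }


def render_head(record: dict) -> str:
    """Render an HTML head element that carries the record as JSON-LD."""
    # Escape "</" so the payload cannot close the script element early.
    payload = json.dumps(record, indent=2).replace("</", "<\\/")
    return (
        "<head>\n"
        f"<title>{escape(record['name'])}</title>\n"
        '<script type="application/ld+json">\n'
        f"{payload}\n"
        "</script>\n"
        "</head>"
    )


record = landing_page_jsonld(
    pid_url="https://example.org/pid/0001",
    name="Example text",
    content_url="https://example.org/files/0001.txt",
    encoding_format="text/plain",
)
print(render_head(record))
```

The human-readable part of the page can show the same fields, so that people and machine agents see consistent information.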