The number of publications in the digital space is steadily increasing, along with a wide range of contemporary digital tools and processing and analysis methods that enable new ways of discovering and analysing relevant data sets. As a result, humans are increasingly reliant on computer support to find and select the data most relevant to them. The FAIR Principles therefore aim to ensure optimal reusability for humans and machines alike. ‘Machine-readability’ here refers to the ability of computer systems to find, access, integrate, and reuse data with little or no human intervention.
A digital object is ‘machine-actionable’ when it provides information that enables an autonomously operating, computer-assisted algorithm to identify the type of object, assess whether it is useful for the task at hand, and interpret and reuse it appropriately without human intervention.
Good research data management in accordance with the FAIR Principles enables a network of data and services that can find each other, communicate, and remain available for reuse. This is fundamentally based on Linked Data technologies. These technologies rely on the globally unique identification of digital objects and their associated resources through Uniform Resource Identifiers (URIs). This is the prerequisite for classifying and categorising them via links to identification systems (e.g. ontologies), allowing them to be ‘understood’ by machines. Relationships between resources are likewise expressed using URIs. The leading overarching standards for encoding semantics are RDF (Resource Description Framework), which defines the syntax for data exchange; OWL (Web Ontology Language), a formal description language for creating, publishing, and exchanging ontologies; and SKOS (Simple Knowledge Organization System), a formal language for encoding documentation languages such as thesauri, classifications, and other controlled vocabularies. The semantics embedded in and connected with the data offer significant advantages for the qualified evaluation of data and for handling data sources with heterogeneous content.
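As a minimal sketch of how this looks in practice, the following Python snippet uses the rdflib library to identify a collection object with a URI, classify it against an ontology class, and state a relationship to another globally identified resource. All URIs, class names, and labels here are invented for illustration; a real project would use the persistent identifiers minted by its repository and the classes of a published domain ontology.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

# Hypothetical namespaces for illustration only.
EX = Namespace("https://example.org/objects/")
ONT = Namespace("https://example.org/ontology/")

g = Graph()
painting = URIRef(EX["obj-0001"])

# Classify the resource by linking it to an ontology class ...
g.add((painting, RDF.type, ONT.Painting))
# ... state a relationship to another resource identified by a URI ...
g.add((painting, ONT.createdBy, URIRef(EX["person-0042"])))
# ... and attach a human-readable label.
g.add((painting, RDFS.label, Literal("View of a Harbour", lang="en")))

# Serialise the statements as Turtle for exchange.
print(g.serialize(format="turtle"))
```

Because the statements rest on shared identifiers rather than local conventions, they can be merged with data from other sources that reference the same URIs, which is what allows heterogeneous data sources to be evaluated together by machines.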
The primary function of data standards is to make information more analysable through consistent encoding and uniformity of description. Their effectiveness is rooted not in formal prescription but in shared use: it is the most widely adopted systems and conventions within a user community that determine what types of information are recorded for each information object in a data collection, and in what manner. These standards are typically well documented and have active user communities that continuously work on their content and technical development, as well as on the software systems that adapt them to current and future challenges. An example of such an adaptation is the document format of the Text Encoding Initiative (TEI), which has grown from its initial application in specialised research libraries into a format used internationally in a wide variety of text edition projects across diverse disciplines.
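To illustrate why a shared, well-documented standard makes data machine-analysable, the sketch below parses a minimal TEI document with Python's standard library. The document content is invented, but the header structure (teiHeader/fileDesc/titleStmt) follows the TEI Guidelines, so a generic tool can locate the bibliographic description without any project-specific knowledge.

```python
import xml.etree.ElementTree as ET

# Minimal TEI P5 document; title and content are invented for illustration.
tei_xml = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample Letter, 1823</title></titleStmt>
      <publicationStmt><p>Unpublished draft</p></publicationStmt>
      <sourceDesc><p>Born-digital example</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Dear colleague, ...</p></body></text>
</TEI>"""

ns = {"tei": "http://www.tei-c.org/ns/1.0"}
root = ET.fromstring(tei_xml)

# Every TEI document records its bibliographic description in the same place,
# so the title can be extracted generically.
title = root.find(".//tei:titleStmt/tei:title", ns)
print(title.text)  # -> Sample Letter, 1823
```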
The consistent use of standards is particularly important when creating metadata, including their sources and the conditions under which they were produced. This is a prerequisite for making metadata analysable by machines. Standards ensure the medium- and long-term comprehensibility and reusability of data, enabling people to work with data they did not create themselves. If rarely used or poorly documented formats, schemas, or models are employed, or if the data is embedded in software that is proprietary or no longer accessible due to a lack of maintenance or documentation, the comprehensibility of the data is often no longer guaranteed, even for humans, after just a few years.
Which standards should be followed depends on the practices of the specific domain in handling the respective type of material, the documentation goals of the research project, and the type of data being generated. Collection cataloguing projects in cultural heritage institutions usually follow what is widely used in their respective sector — whether library, archive, museum, or monument preservation. For digitisation projects in institutions with collections, the DFG Practical Guidelines on Digitisation have, for many years, served as a widely recognised and proven good practice recommendation for quality assurance through the use of standards, extending far beyond the original context of DFG grant applications. The latest version of these guidelines was released in 2023.
With regard to the future reuse of research data, data producers should always consider standards that may not yet be widely used in a particular domain but are well suited to the research question and promise good interoperability and reusability of the data. This is especially true for vocabulary standards.
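A vocabulary standard such as SKOS can be adopted even where it is not yet common in a domain. The sketch below (again using rdflib; the vocabulary and its concepts are invented for illustration) encodes a small controlled vocabulary with multilingual preferred labels and a broader-term relation. Records tagged with such concept URIs can later be retrieved and aggregated across languages and collections, regardless of which label a cataloguer happened to type.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, SKOS

# Hypothetical vocabulary namespace; in practice one would reuse concepts
# from an established vocabulary service wherever possible.
VOC = Namespace("https://example.org/vocabulary/")

g = Graph()

# Declare the concept scheme the vocabulary belongs to.
scheme = URIRef(VOC["materials"])
g.add((scheme, RDF.type, SKOS.ConceptScheme))

# One concept with multilingual labels and a broader term.
oil_paint = URIRef(VOC["oil-paint"])
g.add((oil_paint, RDF.type, SKOS.Concept))
g.add((oil_paint, SKOS.inScheme, scheme))
g.add((oil_paint, SKOS.prefLabel, Literal("oil paint", lang="en")))
g.add((oil_paint, SKOS.prefLabel, Literal("Ölfarbe", lang="de")))
g.add((oil_paint, SKOS.broader, URIRef(VOC["paint"])))

print(g.serialize(format="turtle"))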
Data standards relevant to research data cover several areas of application. Recommendations for the fields represented by NFDI4Culture can be found on the linked pages within this guide.