Understanding digital objects requires describing them in a common knowledge representation language. The chosen language should have a formal specification of its syntax and grammar, and these specifications should be documented and openly accessible.
To support interoperability, the language must be usable in multiple scenarios. This can be achieved by using recognised, preferably open, formats and software, and by working with the community-recognised standards for metadata schemas, terminologies, and ontologies.
All data stored in a repository should be in open, internationally recognised, standardised file formats to ensure long-term interoperability as well as usability, accessibility, and sustainability.
The following recommendations concern file formats in the fields of cultural research and cultural heritage.
At the beginning of your research project, identify which future-proof file formats are suitable for your project. Use preferred formats recommended by your repository or data platform that can be utilised independently of specific software, developers, or providers. If you are not working with these data formats from the outset, make sure that the data can be converted to the recommended formats without any loss of content. Publish the (meta-)data in multiple formats if necessary.
Promote the use of formats that are suitable for long-term preservation. Provide a clear and detailed overview of the accepted or recommended file formats. If you perform format conversions, present this as part of your services. If functional long-term archiving is not part of your services, state this clearly (e.g. if you offer bitstream preservation only).
The Rijksmuseum Amsterdam publishes its metadata via an OAI-PMH interface in the formats LIDO, EDM, and Dublin Core.
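As an illustration, metadata can be harvested from such an OAI-PMH interface with a few lines of code. The sketch below assumes a generic endpoint URL and the standard oai_dc metadata prefix; the actual base URL and supported formats are documented by the respective provider.

```python
# Minimal sketch: harvesting Dublin Core records from an OAI-PMH endpoint.
# BASE_URL is a placeholder; consult the provider's documentation for the
# actual base URL and the supported metadataPrefix values (e.g. lido, edm, oai_dc).
import requests
import xml.etree.ElementTree as ET

BASE_URL = "https://example.org/oai"  # hypothetical OAI-PMH base URL
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

response = requests.get(
    BASE_URL,
    params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
    timeout=30,
)
response.raise_for_status()

root = ET.fromstring(response.content)
for record in root.iter(OAI + "record"):
    identifier = record.findtext(OAI + "header/" + OAI + "identifier")
    titles = [t.text for t in record.iter(DC + "title")]
    print(identifier, titles)
```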
Using a data standard supported by a large community increases the likelihood of sharing, reusing, and combining data collections. Use standards that are commonly employed in your community or cross-cutting standards if possible. This section of the guideline elaborates on the importance of data standards.
Both the syntax and semantics of data models and formats used for (meta-)data in digital objects should be easily identifiable, analysable, processable, and translatable by machines. As with identification schemes and controlled vocabularies, various data formats can be considered FAIR in principle.
Any software-based process for aligning, analysing, and converting data is prone to errors, so it would be ideal to limit the publication of FAIR data to specific community-accepted formats and standards. However, if a researcher can demonstrate that an alternative data model or format can be processed in a manner equivalent to one of the accepted FAIR formats, there is no reason why it should not be considered FAIR.
The following recommendations concern frameworks, reference models, and metadata structures (element sets, schemas) in the fields of cultural research and cultural heritage.
In the early stages of your project, ask the repository where you plan to deposit your data which data standards it supports. Check whether these standards are appropriate for your research subject, and structure your project data according to them from the beginning.
If your research project presents specific challenges that cannot be adequately addressed by existing data standards, keep your own extensions to a minimum. Seek advice from experts on the relevant standard; they can often suggest suitable ways of modelling the data within the existing framework. If you still need to implement local extensions, document them so that they are themselves reusable in a FAIR manner, and check with your chosen repository whether and how these extensions can be incorporated and processed.
Provide machine-readable data and metadata using a well-established framework as much as possible. Clearly define which data standards your institution supports and publish guidelines on how data should be prepared for optimal reuse, for example, by filling out certain core fields.
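As a minimal sketch of what machine-readable metadata in a well-established framework can look like, the following example uses the rdflib library to describe a digital object with Dublin Core terms and serialise it as Turtle; all URIs and values are illustrative placeholders, not prescribed by any particular repository.

```python
# Minimal sketch: a machine-readable object description with Dublin Core terms,
# serialised as Turtle with rdflib. URIs and values are illustrative placeholders.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
obj = URIRef("https://example.org/objects/12345")  # hypothetical object PID

g.add((obj, DCTERMS.title, Literal("Stillleben mit Blumen", lang="de")))
g.add((obj, DCTERMS.creator, URIRef("https://example.org/persons/0001")))  # in practice a GND/VIAF URI
g.add((obj, DCTERMS.created, Literal("1886")))
g.add((obj, DCTERMS.isPartOf, URIRef("https://example.org/collections/paintings")))

print(g.serialize(format="turtle"))
```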
Support professional initiatives related to the maintenance, development, and dissemination of these standards, especially with regard to interoperability.
See the guidelines of the Deutsche Digitale Bibliothek (German Digital Library) on core elements of delivery formats in its Teilnahmekriterien (participation criteria), section Metadaten.
Clear documentation of metadata schemas helps developers compare metadata and map them to one another. This process involves associating data elements from two different data models for the purpose of information integration.
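A minimal sketch of such a mapping (crosswalk) is shown below; the local field names and the Dublin Core targets are illustrative assumptions, not an official crosswalk.

```python
# Minimal sketch of a documented crosswalk between two metadata schemas.
# The field names are illustrative; real mappings should follow the published
# documentation of the schemas involved (e.g. LIDO, Dublin Core).
CROSSWALK = {
    # local element -> target element (here: Dublin Core terms)
    "objectTitle": "dcterms:title",
    "creatorName": "dcterms:creator",
    "productionDate": "dcterms:created",
    "currentRepository": "dcterms:isPartOf",
}

def map_record(local_record: dict) -> dict:
    """Translate a record from the local schema into the target schema,
    keeping unmapped fields separate so gaps in the crosswalk stay visible."""
    mapped, unmapped = {}, {}
    for field, value in local_record.items():
        target = CROSSWALK.get(field)
        (mapped if target else unmapped)[target or field] = value
    return {"mapped": mapped, "unmapped": unmapped}

print(map_record({"objectTitle": "Teacup", "inventoryNumber": "1923-004"}))
```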
Use existing metadata schemas, where possible, especially those commonly used in your field or required by your repository. Reference the models and their documentation clearly, preferably with a PID, and specify the version used.
Local extensions or custom developments should be avoided. If they have to be created, they should be carefully documented, licensed under open licenses, and published alongside the other research data.
Publish the metadata schemas supported by your research infrastructure. Document technical specifications and define classes and properties. Specify which components are mandatory, optional, or recommended.
Ensure that all used data elements are defined and unambiguous to prevent misinterpretations. For example, an element labelled ‘Location’ could refer to a place of origin, current or previous storage location, or a person’s birthplace.
If metadata was originally created according to a different data schema, this documentation makes it easier for the data producer to map it. Where carefully created mappings between metadata schemas already exist, they should be published to promote greater convergence of other datasets that need to be mapped accordingly.
To enhance the quality of metadata and thus improve interoperability, (automated) processes for cleaning, generating, and enriching metadata should be implemented. These processes can improve data quality in an ongoing project as well as significantly enhance the reusability of existing data.
Implement procedures that minimise the risk of errors during data collection. Use features in database software that assist in identifying correct terms and entities from a controlled vocabulary and automatically transfer their PIDs into your data. For example, select labels and PIDs using an integrated module in the software, or transfer a date from a calendar rather than entering it manually. Utilise configurable validation checks in your software to ensure data entry meets defined requirements, providing alerts or preventing saving of incorrect entries. Conduct post-entry consistency and completeness checks for core data fields by validating data against a suitable metadata schema or using editorial software tools.
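One possible way to implement such a post-entry check is to validate records against the XML schema of the chosen standard. The sketch below uses the lxml library; the file names are placeholders.

```python
# Minimal sketch of a post-entry consistency check: validating a metadata
# record against an XML schema with lxml. File names are placeholders.
from lxml import etree

schema = etree.XMLSchema(etree.parse("lido-v1.1.xsd"))  # schema file of the target standard
record = etree.parse("object-record.xml")               # metadata record to be checked

if schema.validate(record):
    print("Record is valid against the schema.")
else:
    for error in schema.error_log:
        print(f"Line {error.line}: {error.message}")
```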
For post-entry data quality improvement, the tool OpenRefine has proven effective for organising, structuring, and transforming data, and for referencing controlled vocabularies. Tools like Cocoda and the Europeana Archaeology Vocabulary Service also support subsequent referencing of controlled vocabularies and authority data.
When creating structured metadata for entities addressed in your research (e.g. works, objects, people, places, events) or when cataloguing collections in libraries, museums, or archives, plan the level of detail at which the objects will be described. Follow the recommendations of the relevant metadata schemas used in your domain. Elements marked as ‘core field’, ‘mandatory’, or ‘recommended if available’ should be populated using FAIR controlled vocabularies (see I.2), where this is contextually appropriate and the relevant information is available. This ensures that an object can be found via the core information users typically search for, regardless of the specific focus of the documentation, and it enhances the consistency of datasets and thus their interoperability in broader contexts.
Describe the collection, or specific subsets of it, in its own right, and explain and index its relevance from a content perspective. This allows collections to be found in search systems that target more general levels of cataloguing information.
Minimise the likelihood of errors in metadata entry through automated validation rules or checks for completeness.
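A completeness check of this kind can be very simple; the following sketch assumes an illustrative set of mandatory and recommended fields, which in practice would be taken from the relevant metadata schema or repository guidelines.

```python
# Minimal sketch of an automated completeness check for core fields.
# Which fields count as mandatory or recommended depends on the metadata
# schema and the repository's guidelines; the sets below are illustrative.
MANDATORY = {"title", "object_type", "identifier"}
RECOMMENDED = {"creator", "date", "repository"}

def check_completeness(record: dict) -> dict:
    """Return the missing mandatory and recommended fields of a record."""
    present = {key for key, value in record.items() if value not in (None, "", [])}
    return {
        "missing_mandatory": sorted(MANDATORY - present),
        "missing_recommended": sorted(RECOMMENDED - present),
    }

record = {"title": "Teacup with floral decoration", "identifier": "1923-004", "creator": ""}
print(check_completeness(record))
# {'missing_mandatory': ['object_type'], 'missing_recommended': ['creator', 'date', 'repository']}
```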
Establish fixed and transparent business processes that incorporate quality control whenever possible. This can be achieved, for example, through a role concept that assigns different rights to various users to represent internal responsibilities and support curation workflows while allowing review options by third parties (external reviewers). Both options are implemented at RADAR.
Consolidate efforts to develop workflows and software solutions for such automated processes, for example, by using machine learning tools. If it significantly improves the discoverability and reusability of the data, invest in tools for cleaning (meta-)data and converting data into standardised and interoperable data formats.
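An automated cleaning step might, for example, normalise free-text dates to ISO 8601 before conversion into a standardised exchange format. The following sketch covers only a few illustrative patterns and is not a complete solution.

```python
# Minimal sketch of an automated cleaning step: normalising free-text date
# entries to ISO 8601 before conversion into a standardised exchange format.
import re

def normalise_date(value: str):
    """Return an ISO 8601 date (YYYY-MM-DD or YYYY), or None if not recognised."""
    value = value.strip()
    match = re.fullmatch(r"(\d{1,2})\.(\d{1,2})\.(\d{4})", value)  # e.g. 24.3.1886
    if match:
        day, month, year = match.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    if re.fullmatch(r"\d{4}", value):                              # e.g. 1886
        return value
    return None

for raw in ["24.3.1886", "1886", "um 1900"]:
    print(raw, "->", normalise_date(raw))
```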
If the data producer continues to produce and curate the data and periodically redeposits it in the repository in modified form, offer to feed the quality improvements achieved on the data platform back into the original data.
The OpenRefine website offers a user manual, plus a large number of external recommendations for using the tool in various languages. The online workshops of the State Archives of Baden-Württemberg are also recommended.
Data sources that can be reconciled via the OpenRefine Service API (including GND, VIAF, Getty Vocabularies, Wikidata, GeoNames) are listed here.
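These services implement the Reconciliation Service API used by OpenRefine, which can also be queried directly from scripts. The sketch below assumes the public Wikidata reconciliation endpoint as an example; the exact URL, supported entity types, and response details should be taken from the documentation of the service you use.

```python
# Minimal sketch of querying a reconciliation service that implements the
# Reconciliation Service API used by OpenRefine. The endpoint URL and the
# type identifier are examples; check the documentation of your service.
import json
import requests

ENDPOINT = "https://wikidata.reconci.link/en/api"  # example Wikidata reconciliation endpoint

queries = {"q0": {"query": "Rembrandt van Rijn", "type": "Q5"}}  # Q5 = human (Wikidata)
response = requests.post(ENDPOINT, data={"queries": json.dumps(queries)}, timeout=30)
response.raise_for_status()

for candidate in response.json()["q0"]["result"][:3]:
    print(candidate["id"], candidate["name"], candidate.get("score"))
```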
The DFG Practical Guidelines on Digitisation (updated version 2022) recommend creating the collection description with the same metadata schema that is used for the inventory objects (e.g. MODS, TEI Header, EAD, LIDO), or using DCAT (a minimal DCAT sketch follows the examples below).
Further examples: DARIAH Collection Registry; collections of the Library of Congress with EAD and METS
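As a minimal sketch of the DCAT option mentioned above, the following example uses the rdflib library to describe a collection as a dcat:Dataset; all URIs and values are illustrative placeholders.

```python
# Minimal sketch of a collection description using DCAT, serialised as Turtle
# with rdflib. URIs and values are illustrative placeholders.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
collection = URIRef("https://example.org/collections/prints")  # hypothetical collection PID

g.add((collection, RDF.type, DCAT.Dataset))
g.add((collection, DCTERMS.title, Literal("Print collection, 16th-18th century", lang="en")))
g.add((collection, DCTERMS.description, Literal("Digitised prints with LIDO object records.", lang="en")))
g.add((collection, DCTERMS.publisher, URIRef("https://example.org/institution")))
g.add((collection, DCAT.landingPage, URIRef("https://example.org/collections/prints/search")))

print(g.serialize(format="turtle"))
```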