4. Formats and methods for data provision supported in the future

If you have data in a format that is not listed in the previous chapter, the Culture Knowledge Graph team encourages community work on further transformation routines and provides tools for it. The list below previews routines that the community is already working on. Please note, however, that XML formats such as MEI and TEI can already be integrated if they are converted to NFDIcore/CTO via services such as XTriples.

Data formats

MEI/XML

The Music Encoding Initiative (MEI) is a community-driven framework for musicology data. Many data portals of the musicological community already provide MEI/XML. The main challenges in the automated transformation are to retrieve and transform incipit data (if available) and to retrieve full authority file IRIs without the need to parse additional look-up files that cannot be discovered automatically.

TEI/XML

In the humanities, XML files structured according to the Text Encoding Initiative (TEI) are commonly found in text-centric editions. As with MEI, the primary challenge in the automated retrieval of these data is that full authority IRIs are often provided only via additional look-up files that cannot be discovered automatically. In addition, TEI/XML often allows the same metadata to be encoded in several different ways.
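To illustrate the look-up problem, the following minimal sketch resolves local pointers in an edition file against a separate person register. It assumes that the register location is already known, that persons carry an `xml:id` and an `idno` holding the IRI, and that names in the edition point to them via `ref`. The sample documents, the example.org IRIs and the function names are hypothetical; real editions may encode this differently.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical register file (the "look-up file") with one person entry.
REGISTER = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><listPerson>
  <person xml:id="p1">
    <persName>Jane Example</persName>
    <idno type="URI">https://example.org/authority/p1</idno>
  </person>
</listPerson></body></text></TEI>"""

# Hypothetical edition text mixing local pointers and full IRIs.
EDITION = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
  <persName ref="#p1">J. Example</persName> wrote to
  <persName ref="https://example.org/authority/p2">John Sample</persName> and
  <persName ref="#p9">an unidentified person</persName>.
</p></body></text></TEI>"""


def build_index(register_xml):
    """Map each xml:id in the register to the IRI given in its idno."""
    root = ET.fromstring(register_xml)
    index = {}
    for person in root.iterfind(".//tei:person", TEI_NS):
        iri = person.findtext("tei:idno", namespaces=TEI_NS)
        if person.get(XML_ID) and iri:
            index[person.get(XML_ID)] = iri.strip()
    return index


def resolve_refs(edition_xml, index):
    """Return one IRI (or None) per persName that carries a ref attribute."""
    root = ET.fromstring(edition_xml)
    resolved = []
    for name in root.iterfind(".//tei:persName[@ref]", TEI_NS):
        ref = name.get("ref")
        if ref.startswith(("http://", "https://")):
            resolved.append(ref)  # already a full IRI
        else:
            # Local pointer such as "#p1" or "register.xml#p1".
            resolved.append(index.get(ref.rsplit("#", 1)[-1]))
    return resolved
```

Pointers that cannot be resolved come back as `None`, so they can be reported instead of being dropped silently.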

MARCXML

The MAchine-Readable Cataloging (MARC) standards are commonly used in library catalogs and repositories to describe their resources. MARCXML defines an XML serialisation of MARC 21 data for use on the web. The main challenge in harvesting this format for the Culture Knowledge Graph is to retrieve authority data, which is usually provided as a loosely standardised ID instead of an IRI. In addition, some library database systems encourage the use of combined free-text fields instead of cleanly separated data fields.
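As a rough sketch of the ID problem, the snippet below reads the `0` subfields of a MARCXML record and tries to turn prefixed control numbers such as `(DE-588)118540238` into IRIs. The prefix table `PREFIX_TO_IRI`, the sample record and the function name are assumptions for illustration only; a real routine would need a maintained mapping for every authority file it encounters.

```python
import re
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Assumed mapping from control-number prefixes to IRI templates.
# Only the GND prefix is listed here; extend as needed.
PREFIX_TO_IRI = {"DE-588": "https://d-nb.info/gnd/{id}"}

# Hypothetical sample record with one resolvable and one local ID.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Goethe, Johann Wolfgang von</subfield>
    <subfield code="0">(DE-588)118540238</subfield>
  </datafield>
  <datafield tag="700" ind1="1" ind2=" ">
    <subfield code="a">Unknown, Person</subfield>
    <subfield code="0">local-4711</subfield>
  </datafield>
</record>"""


def authority_iris(marcxml):
    """Return (raw ID, IRI or None) for every subfield 0 in the record."""
    root = ET.fromstring(marcxml)
    results = []
    path = ".//marc:datafield/marc:subfield[@code='0']"
    for sub in root.iterfind(path, MARC_NS):
        raw = (sub.text or "").strip()
        # Expected shape: "(PREFIX)identifier"; anything else stays unresolved.
        match = re.fullmatch(r"\(([^)]+)\)(\S+)", raw)
        template = PREFIX_TO_IRI.get(match.group(1)) if match else None
        iri = template.format(id=match.group(2)) if template else None
        results.append((raw, iri))
    return results
```

IDs without a known prefix are returned with `None` so that they can be reviewed manually.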

EAD/XML

Encoded Archival Description (EAD) is an XML format commonly used to compile archive records. As with MARCXML, authority data that is provided without IRIs is difficult to extract from this format. Additionally, EAD features a number of date fields that do not follow ISO formats and cannot always be parsed automatically.
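The date problem can be sketched as follows. This is a minimal example under the assumption that a machine-readable value (for instance from a `normal` attribute) is preferred when present and that only a few textual patterns are attempted otherwise. The function name `to_iso` and the list of patterns are illustrative, not part of any existing routine.

```python
import re
from datetime import datetime

# Assumed textual date patterns; real finding aids use many more.
TEXT_FORMATS = ("%d.%m.%Y", "%Y-%m-%d")


def to_iso(text, normal=None):
    """Return an ISO 8601 string for a date field, or None if unparseable."""
    if normal:
        return normal  # trust a normalised value supplied by the archive
    cleaned = text.strip()
    if re.fullmatch(r"\d{4}", cleaned):
        return cleaned  # a bare year is already valid ISO 8601
    for fmt in TEXT_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # e.g. "ca. 1850": leave for manual review
```

Returning `None` for vague expressions keeps uncertain dates out of the graph instead of guessing a value.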

Methods for data provision

CMIF

The Correspondence Metadata Interchange Format (CMIF) provides information on letters and other correspondence using a subset of TEI/XML. Many German letter editions aggregate these index files with the community service correspSearch. A core challenge in automated CMIF transformations is to obtain a link to the TEI/XML of the annotated letter when not all relevant information is included in the CMIF file itself.
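A minimal sketch of this check could look as follows. It assumes that each `correspDesc` may carry a `ref` attribute pointing to the letter and that senders are recorded in a `correspAction` of type `sent`. The sample file, the example.org URL and the function name are hypothetical; whether `ref` leads to TEI/XML or to an HTML page depends on the edition.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical CMIF excerpt: one letter with a link, one without.
SAMPLE_CMIF = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc>
    <correspDesc ref="https://example.org/letters/1.xml">
      <correspAction type="sent">
        <persName ref="https://d-nb.info/gnd/118540238">Goethe</persName>
        <date when="1794-06-24"/>
      </correspAction>
    </correspDesc>
    <correspDesc>
      <correspAction type="sent">
        <persName>Unknown sender</persName>
      </correspAction>
    </correspDesc>
  </profileDesc></teiHeader>
</TEI>"""


def letter_links(cmif_xml):
    """List sender, sender IRI and letter link for every correspDesc."""
    root = ET.fromstring(cmif_xml)
    rows = []
    for desc in root.iterfind(".//tei:correspDesc", TEI_NS):
        sender = desc.find(
            "tei:correspAction[@type='sent']/tei:persName", TEI_NS
        )
        rows.append({
            "sender": sender.text if sender is not None else None,
            "sender_iri": sender.get("ref") if sender is not None else None,
            "letter_ref": desc.get("ref"),  # None if no link is given
        })
    return rows
```

Entries whose `letter_ref` is `None` mark the letters for which further information cannot be fetched automatically.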

OAI-PMH

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a standardised API commonly provided by library and archive software. The API works well in combination with various XML formats, which can be included directly in an API endpoint’s responses. Due to the flexibility of OAI-PMH, a central harvesting challenge for the Culture Knowledge Graph is to obtain data in a transformable format or to identify URLs that lead to such a serialisation.
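The following sketch shows how one page of a `ListRecords` response might be inspected to find out which format the embedded metadata uses and whether more pages follow. It works on a hypothetical sample response rather than a live endpoint; the base URL and the function names are assumptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical response page with one MARCXML record and one deleted record.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata><record xmlns="http://www.loc.gov/MARC21/slim"/></metadata>
    </record>
    <record>
      <header status="deleted"><identifier>oai:example.org:2</identifier></header>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def parse_page(xml_text):
    """Return ([(identifier, metadata root tag)], resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", OAI_NS):
        metadata = rec.find("oai:metadata", OAI_NS)
        if metadata is None or len(metadata) == 0:
            continue  # deleted records carry no metadata
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        # The namespaced tag of the first child reveals the embedded format.
        records.append((identifier, metadata[0].tag))
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=OAI_NS)
    return records, (token or None)


def next_url(base_url, token):
    """Build the follow-up request; the token replaces all other arguments."""
    query = urlencode({"verb": "ListRecords", "resumptionToken": token})
    return f"{base_url}?{query}"
```

A harvester would call `parse_page` on each response and request `next_url(...)` until no token is returned; the namespaced tag then indicates which transformation routine applies.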