The following chapter provides an overview of the data formats and provisioning methods that the ingest pipeline currently ‘understands’. Routines to create data feeds and integrate them into the Culture Knowledge Graph have been built by members of the NFDI4Culture community together with the Culture Knowledge Graph team. In addition, chapter 4 lists further formats and provisioning methods for which such routines are in development but not yet published.
Metadata on digital representations and related research data about material and immaterial cultural heritage may be provided directly as RDF that adheres to the NFDIcore ontology and, more specifically, its NFDI4Culture Ontology (CTO) module. This is the target format used within the Culture Knowledge Graph. RDF can be provided in various serialisation formats, e.g. N-Triples, Turtle, or RDF/XML.
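To illustrate what ‘various serialisation formats’ means in practice, the following minimal sketch states the same single statement in Turtle and in N-Triples; the example.org IRI is a placeholder, and real data should of course use the class and property IRIs defined by NFDIcore/CTO.

```turtle
# Turtle: prefixes keep the statement compact (placeholder IRI)
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<https://example.org/objects/123> rdfs:label "Example object"@en .
```

```ntriples
# N-Triples: the same statement as one fully written-out triple per line
<https://example.org/objects/123> <http://www.w3.org/2000/01/rdf-schema#label> "Example object"@en .
```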
The Culture Graph Interchange Format (CGIF) is a small subset of schema.org. It is not as feature-rich as the target ontology used in the Culture Knowledge Graph, but may be an easier target to provide or convert your data to. CGIF-compatible data can be embedded in regular websites, which makes periodic harvests simple and improves your website’s search engine optimization (SEO) at the same time, because schema.org markup is used by many major search engines to understand and index website content.
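As a sketch of such embedded markup, the following hypothetical JSON-LD block describes a single object on its detail page; the IRIs are placeholders, and the exact set of properties a CGIF record requires is defined in the CGIF specification.

```html
<!-- Embedded schema.org markup for one object (placeholder IRIs) -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "CreativeWork",
  "@id": "https://example.org/objects/123",
  "name": "Example object",
  "url": "https://example.org/objects/123"
}
</script>
```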
Lightweight Information Describing Objects (LIDO) is an XML standard that is well established in cultural-heritage software. LIDO files describe individual museum or collection objects or object groups. The main challenge of transforming this format for the Culture Knowledge Graph is the extraction of IRIs, since providing them is not a LIDO requirement; this applies especially to the recordInfoLink element, which identifies the object.
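For orientation, a minimal fragment of the administrative metadata of a LIDO record might look as follows; the IRI is a placeholder, and the surrounding record structure is abbreviated.

```xml
<!-- Abbreviated LIDO fragment: a stable IRI in recordInfoLink identifies the record -->
<lido:recordWrap xmlns:lido="http://www.lido-schema.org">
  <lido:recordInfoSet>
    <lido:recordInfoLink>https://example.org/objects/123</lido:recordInfoLink>
  </lido:recordInfoSet>
</lido:recordWrap>
```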
A simple way to provide your data is to submit it as a data dump in one of the supported data formats (see above). Instead of providing each file individually, such a data dump can be made available as a single file on the web, e.g. in a format such as JSON or as a ZIP archive. These dumps can be reloaded and their content reintegrated periodically, which allows you to rebuild the dump whenever your data changes (or when there is little load on your server). If you need a place to store your data dump, please get in contact with the NFDI4Culture Helpdesk.
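A minimal sketch of such a rebuild step, assuming your feed elements are individual Turtle files in a records/ folder and the dump is published as dump.zip (both names are hypothetical):

```python
import pathlib
import zipfile

# Collect the individual feed element files (hypothetical layout)
records = sorted(pathlib.Path("records").glob("*.ttl"))

# Rebuild the single data dump that is fetched periodically
with zipfile.ZipFile("dump.zip", "w", compression=zipfile.ZIP_DEFLATED) as dump:
    for record in records:
        dump.write(record, arcname=record.name)
```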
If your data is available via a SPARQL endpoint, you can implement so-called CONSTRUCT queries that map the entities and properties in your graph to the ones required by the NFDIcore/CTO ontology. Depending on the complexity of the query and the amount of data, however, the harvesting process may be simpler and less of a strain on your server if you run the query in a local environment and provide the Culture Knowledge Graph team with a link to the (periodically updated) data dump, as outlined above.
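The following sketch shows the general shape of such a mapping; all ex: terms are placeholders for your local vocabulary, and the constructed terms must be replaced with the class and property IRIs that NFDIcore/CTO actually requires.

```sparql
PREFIX ex:   <https://example.org/vocab/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

CONSTRUCT {
  ?object a ex:TargetClass ;      # placeholder for the required NFDIcore/CTO class
          rdfs:label ?title .
}
WHERE {
  ?object a ex:LocalRecord ;      # placeholder for your local class
          ex:title ?title .
}
```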
[Figure: CONSTRUCT query created by data provider]

As outlined above, the Culture Graph Interchange Format (CGIF) is a lightweight data exchange format based on schema.org. It includes an option to provide the data of an entire feed, including pagination. This allows for reliable and fast harvesting without straining your server: if your website has a list view of all feed elements, adding the required markup to this template may be a very simple way of adapting your research data for the Culture Knowledge Graph. Alternatively, CGIF/schema.org data may also be provided via dedicated APIs.
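At the feed level, such markup might look like the following hypothetical JSON-LD block on a list-view page; the IRIs are placeholders, and the exact pagination mechanism and required properties are defined in the CGIF specification.

```html
<!-- One page of a feed exposed as a schema.org DataFeed (placeholder IRIs) -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "DataFeed",
  "@id": "https://example.org/feed?page=1",
  "dataFeedElement": [
    {
      "@type": "DataFeedItem",
      "item": {
        "@type": "CreativeWork",
        "@id": "https://example.org/objects/123",
        "name": "Example object"
      }
    }
  ]
}
</script>
```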
Another starting point for harvesting your data can be a simple text file containing the URLs of the files to transform and ingest. This can be useful if, for example, you have individual feed element files but no API or ZIP archive to bind them together. The list of URLs may, in theory, resemble the Beacon format, but it should list all resources you want to be harvested rather than, for example, only links relating to a single authority file.
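Such a file could be as simple as the following sketch, with one harvestable resource per line (all addresses are placeholders):

```text
https://example.org/objects/123.xml
https://example.org/objects/124.xml
https://example.org/objects/125.xml
```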
If you have already passed your research data on to an aggregator, another harvesting option is to retrieve the aggregated data. An existing aggregator routine works with data stored in the Deutsche Digitale Bibliothek (DDB) and only requires knowledge of your provider identifier (ID) at the DDB. Harvesting these versions of your data, however, should only be your first choice if you are confident about their currency, completeness, and quality.
If your data is available via a custom REST API, you might find it worthwhile to look into (and adapt) one of the following routines that produce RDF data according to NFDIcore/CTO: