2. How metadata is added to the graph

Because data providers in the domains of NFDI4Culture make research data available in widely varying ways, there is no single, universal workflow for metadata integration. Instead, metadata about research data is integrated into the Culture Knowledge Graph via a flexible Extract-Transform-Load (ETL) environment, the Culture Knowledge Graph Kitchen (PID E5877). This pipeline can incorporate various tools adapted to the needs of the data providers organised in NFDI4Culture, and it allows for periodic reharvesting when data changes over time.

Data Structure

Within the Culture Knowledge Graph, all metadata is structured into data feeds with clear information about who created or provided them. Each data feed consists of various data feed elements that contain information about digital representations of material and immaterial cultural heritage and related metadata and authority data. It is important to note, however, that only metadata will be ingested: the full data record remains with the data provider.

Chapters 3 and 4 outline the data formats and provisioning methods that the ingest pipeline currently supports or will support in the near future. Once a data provider and the Culture Knowledge Graph team have agreed on what to ingest and how, the data feed needs to be registered in the Culture Information Portal. If a data feed belongs to a data portal that has not yet been registered in the NFDI4Culture Registry (PID E2392), the metadata record of the data portal will also be created during the registration of the data feed. The same applies to the metadata record about the publisher of a data feed.

Basic integration requirements

In order for data to be made accessible via the Culture Knowledge Graph, a number of requirements must be met:

  • Data providers
    • must have a formalised relationship with NFDI4Culture (i.e. through involvement as a partner or any other form of written cooperation agreement between the consortium and the data provider)
  • Research data
    • must have resolvable and persistent IRIs on the web
    • should have external identifiers and classifiers from authority files or controlled vocabularies, such as the GND, VIAF, Iconclass, GeoNames, Wikidata, etc.
    • must have a clear license or rights statement, ideally a copyleft license. Where research data is rights-protected, it is recommended to consult the Legal Helpdesk Team of the NFDI4Culture Helpdesk about how metadata about such data can be made available.

If you are interested in making research data findable via the Culture Knowledge Graph, but your institution is not yet formally involved, you are very welcome to contact the NFDI4Culture Helpdesk and select “General Request”. Your request will then be forwarded to the Co-Spokespersons of NFDI4Culture and you will be contacted with suggestions for a possible cooperation.

Integration workflow

Generally, the NFDI4Culture Helpdesk (PID E2409) should be contacted before creating a data feed. Please select the request type “Culture Knowledge Graph Data Integration”. We will then discuss your specific use case and draw up a plan for the integration of the data feed. Prior to contacting the Helpdesk, data providers should consult the checklist (see appendix). It also helps to have a clear idea of which formats and data provision methods a data provider could offer.

Minimum set of required properties for each metadata item

Metadata property | Comment
Label | The label of the metadata item.
Resolvable IRI | The resolvable IRI to the source data record on the provider's side.
License / rights statement | A statement about the license of the source data record.
Publisher | A statement about the publisher of the resource using the Culture IRI of the publisher. You will be provided with this IRI when registering your data feed.
Data feed IRI | The Culture IRI of the data feed, which is provided through registering the data feed in the Culture Information Portal.
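As an illustration of the required properties listed above, the following minimal sketch checks a single metadata item before it is submitted. The dictionary keys (`label`, `iri`, `license`, `publisher`, `data_feed`) and the function names are hypothetical and chosen only for this example; they are not the property names used by the ingest pipeline. The IRI check is purely syntactic and does not test whether an IRI actually resolves.

```python
from urllib.parse import urlparse

# Hypothetical keys: the guideline names the properties, not their
# serialisation, so these names are assumptions for illustration only.
REQUIRED_ITEM_FIELDS = ("label", "iri", "license", "publisher", "data_feed")
IRI_FIELDS = ("iri", "publisher", "data_feed")


def looks_like_http_iri(value):
    """Syntactic check only: http(s) scheme plus a host part."""
    if not isinstance(value, str):
        return False
    try:
        parts = urlparse(value)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_item(item):
    """Return a list of problems for one metadata item (empty if none)."""
    problems = []
    # Every required property must be present as a non-empty string.
    for field in REQUIRED_ITEM_FIELDS:
        value = item.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing required property: {field}")
    # Properties that carry an IRI must at least look like an http(s) IRI.
    for field in IRI_FIELDS:
        value = item.get(field)
        if isinstance(value, str) and value.strip() and not looks_like_http_iri(value):
            problems.append(f"not an http(s) IRI: {field}")
    return problems
```

Calling `check_item` on an item that carries all five properties with well-formed IRIs returns an empty list; any other result lists what still needs to be fixed before contacting the Helpdesk.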

Beyond the minimum set of required properties for each metadata item, it is highly recommended to provide identifiers from authority files (e.g. Getty AAT, Wikidata, GND) for related persons, organisations, works, events, locations and classifiers (such as Iconclass notations). It is also good practice to provide time information, such as the creation period of the cultural artefact described by the metadata. Furthermore, if possible, provide URLs to digital representations (e.g. images, audio, video, 3D models), including their bylines (if required), rights information in the form of license statements, and their type (e.g. schema:ImageObject, schema:AudioObject, schema:VideoObject, schema:3DModel). If URLs to digital representations are provided, a license statement and a digital representation type are required for each of them. Additionally, providing the type of the real-world entity for each item (e.g. schema:Sculpture, schema:MusicComposition) is highly recommended. Please refer to the NFDIcore Ontology and the NFDI4Culture Ontology module for more details on optional properties and background information.
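The conditional rule for digital representations (a URL may be omitted, but once it is given, a license statement and a type become mandatory) can be sketched as follows. The keys `url`, `license` and `type` and the function name are assumptions made for this example only.

```python
def check_representation(rep):
    """Check one digital representation given as a dict.

    Hypothetical keys: 'url', 'license', 'type'. Returns a list of
    problems; an empty list means the conditional rule is satisfied.
    """
    problems = []
    if not rep.get("url"):
        # No URL given: license and type are not required in this case.
        return problems
    if not rep.get("license"):
        problems.append("license statement required when a URL is provided")
    if not rep.get("type"):
        problems.append("representation type required when a URL is provided")
    return problems
```

For example, a representation with only a `url` yields two problems, while one that also carries a license statement and a type such as `schema:ImageObject` yields none.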

Minimum set of required properties of the data feed

Metadata property | Comment
Modification date | The date the data feed was modified (YYYY-MM-DD). In case of the first provision, this field corresponds to the creation date.
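A modification date in the YYYY-MM-DD form can be checked with a few lines of code before a feed is delivered. The following is a minimal sketch; the function name is hypothetical, and the pipeline may apply its own validation.

```python
import re
from datetime import date

# Exactly four, two and two ASCII digits separated by hyphens.
_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_modification_date(value):
    """Parse a YYYY-MM-DD string into a date, or raise ValueError."""
    if not isinstance(value, str) or not _DATE_PATTERN.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    # fromisoformat also rejects impossible calendar dates such as 2024-02-30.
    return date.fromisoformat(value)
```

The explicit pattern check keeps the accepted form strict, because `date.fromisoformat` alone accepts additional ISO 8601 variants in recent Python versions.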

The Culture IRI of the data portal from which the data feed is provided, its label, description text and further metadata information are not themselves part of the required metadata fields. Instead, this information is gathered when a data feed is registered in the Culture Information Portal.