Appendix

Checklist for data providers

  • Resources recorded in the data feed are available via resolvable IRIs
  • An overview is provided of the external identifiers and classifiers (e.g. GND, Wikidata, GeoNames, Iconclass, Getty AAT) used in the data to identify entities (such as places, persons, organizations or concepts)
  • If the data is part of an aggregator portal or search index for third-party data, the original data providers were contacted about the planned integration of their data into the Culture Knowledge Graph
  • The publisher of the data portal or its parent institution is already formally involved in NFDI4Culture (see partner list) (yes/no)
  • The data portal is registered in the Culture Information Portal (yes/no)
  • General information required per data feed:
    • Name and, if applicable, Culture IRI of the data portal the feed belongs to
    • Information about the type of the data feed elements
    • Whether resources of the data feed are listed in an aggregator portal, e.g. Deutsche Digitale Bibliothek (DDB) (PID E1183), heidICON (PID E2944), prometheus (PID E2428), Bildindex der Kunst und Architektur of the Deutsches Dokumentationszentrum für Kunstgeschichte – Bildarchiv Foto Marburg (DDK) (PID E2916)
    • Whether the data portal itself is an aggregator portal (yes/no)
    • Short description of the data feed in English and German, focusing on the number, content and data formats of the research data (up to 100 words)
    • Relevant subject areas
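
The per-feed information above lends itself to a simple pre-submission check. The following is a minimal sketch in Python, not part of any NFDI4Culture tooling: the class `FeedInfo`, its field names and the function `check_feed_info` are hypothetical and chosen for illustration only. It assumes that the 100-word limit applies to each language version separately, and it only checks that an IRI is syntactically an http(s) IRI; it does not test whether the IRI actually resolves.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import urlsplit

MAX_DESCRIPTION_WORDS = 100  # limit stated in the checklist


@dataclass
class FeedInfo:
    """Hypothetical container for the per-feed checklist answers."""
    portal_name: str
    element_type: str
    description_en: str
    description_de: str
    subject_areas: List[str]
    portal_iri: Optional[str] = None  # Culture IRI, if applicable
    portal_is_aggregator: bool = False
    listed_in_aggregators: List[str] = field(default_factory=list)


def is_http_iri(value: str) -> bool:
    """Syntactic check only; it does not test whether the IRI resolves."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def check_feed_info(info: FeedInfo) -> List[str]:
    """Return human-readable problems; an empty list means no findings."""
    problems = []
    if not info.portal_name.strip():
        problems.append("portal name is missing")
    if info.portal_iri is not None and not is_http_iri(info.portal_iri):
        problems.append("portal IRI is not an http(s) IRI")
    if not info.element_type.strip():
        problems.append("element type is missing")
    # Assumption: the word limit applies to each language separately.
    for lang, text in (("en", info.description_en), ("de", info.description_de)):
        words = len(text.split())
        if words == 0:
            problems.append(f"description ({lang}) is missing")
        elif words > MAX_DESCRIPTION_WORDS:
            problems.append(
                f"description ({lang}) has {words} words, "
                f"limit is {MAX_DESCRIPTION_WORDS}"
            )
    if not info.subject_areas:
        problems.append("no subject areas given")
    return problems


# Illustrative values only; example.org is a placeholder domain.
example = FeedInfo(
    portal_name="Example Portal",
    element_type="schema:VisualArtwork",
    description_en="About 1,200 digitised prints, delivered as JSON-LD.",
    description_de="Rund 1.200 digitalisierte Druckgrafiken, geliefert als JSON-LD.",
    subject_areas=["Art history"],
    portal_iri="https://example.org/portal",
)
print(check_feed_info(example))
```

The yes/no items of the checklist (formal involvement in NFDI4Culture, registration in the Culture Information Portal, contact with original data providers) are deliberately left out, since they cannot be verified from the data alone.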

Glossary

Term Definition
Culture Knowledge Graph Kitchen The Culture Knowledge Graph Kitchen is a versatile ETL (Extract-Transform-Load) environment designed for efficient consumption, processing, integration, and analysis of data feeds into the Culture Knowledge Graph. It comprises several Python components, each serving a unique purpose within the "kitchen" to consume, clean, version control, publish and analyze data.
Culture Graph Interchange Format The Culture Graph Interchange Format (CGIF) is an easy-to-use, lightweight interchange format based on schema.org for the harvesting of data. CGIF has the added benefit of automatically making the data eligible for Google Dataset Search and of significantly improving the findability of websites and datasets through search engine optimization.
Data feed Following schema.org, a data feed provides structured information about one or more entities or topics (see https://schema.org/DataFeed). A data feed may also be known as a dataset.
Data feed element Following schema.org, a data feed element is an item within a data feed. Several elements can be contained in a data feed (see https://schema.org/dataFeedElement).
Data portal Following the definition of the NFDIcore Ontology, a data portal is a website that serves as a centralised platform for accessing, managing, and sharing datasets, information, or resources related to a specific topic, theme, or domain (see https://nfdi.fiz-karlsruhe.de/ontology/NFDI_0000123).
Hydra Scraper The Hydra Scraper is a command-line tool to pull data from various sources, such as Hydra-paginated APIs, Beacon-like URL lists, ZIP files, or local file dumps. It can transform files in RDF-compatible formats such as JSON-LD or Turtle, but it can also handle, for example, LIDO files. Command-line calls can be combined and adapted to build fully-fledged scraping mechanisms, including the ability to output a set of triples, which is used to harvest data for the Culture Knowledge Graph.
NFDIcore Ontology The NFDIcore ontology (prefix nfdicore) describes resources such as datasets, data providers, persons, projects and other entities in the data domain of NFDI. It serves as the basis for further domain-specific ontology modules. Mappings to numerous external vocabularies and ontologies are provided in an extra file.
NFDI4Culture Ontology The NFDI4Culture ontology module (prefix cto) handles the representation and categorization of various resources within the domains of NFDI4Culture. It encompasses a wide array of entities, including individual source items and properties for, e.g., related persons, organizations, locations, data concepts, classifiers and time information. CTO builds upon the NFDIcore Ontology.
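
To make the terms "data feed" and "data feed element" more concrete, the following minimal sketch builds a generic schema.org DataFeed as JSON-LD in Python. It only illustrates the schema.org vocabulary referenced in the glossary (DataFeed, dataFeedElement, DataFeedItem, item) and should not be read as a complete or authoritative CGIF document; the function name `build_data_feed`, the example IRI and the example values are assumptions made for illustration.

```python
import json


def build_data_feed(name, items):
    """Build a generic schema.org DataFeed as a JSON-LD dictionary.

    `items` is a list of (iri, schema_type, label) tuples. This helper is
    hypothetical and not part of CGIF or any NFDI4Culture tool.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "DataFeed",
        "name": name,
        # Each data feed element wraps one described resource.
        "dataFeedElement": [
            {
                "@type": "DataFeedItem",
                "item": {"@id": iri, "@type": schema_type, "name": label},
            }
            for iri, schema_type, label in items
        ],
    }


# Placeholder values; example.org is not a real data portal.
feed = build_data_feed(
    "Example feed",
    [("https://example.org/objects/1", "VisualArtwork", "Example painting")],
)
print(json.dumps(feed, indent=2, ensure_ascii=False))
```

Each resource carries its own resolvable IRI in `@id`, which corresponds to the first item of the checklist for data providers.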

List of abbreviations

Abbreviation Definition
AAT Getty Art & Architecture Thesaurus
API Application Programming Interface
CIDOC CRM CIDOC Conceptual Reference Model
CGIF Culture Graph Interchange Format
CMIF Correspondence Metadata Interchange Format
CTO NFDI4Culture Ontology
DDB Deutsche Digitale Bibliothek
DDK Deutsches Dokumentationszentrum für Kunstgeschichte – Bildarchiv Foto Marburg
EAD Encoded Archival Description
ETL Extract-Transform-Load
GND Gemeinsame Normdatei
ID Identifier
IRI Internationalised Resource Identifier
ISO International Organization for Standardization
JSON-LD JSON for Linking Data
KG Knowledge Graph
LIDO Lightweight Information Describing Objects
MARC MAchine-Readable Cataloging
MEI Music Encoding Initiative
NFDI Nationale Forschungsdateninfrastruktur (national research data infrastructure)
NFDI4Culture Consortium for research data on material and immaterial cultural heritage
OAI-PMH Open Archives Initiative Protocol for Metadata Harvesting
PID Persistent Identifier
REST-API Representational State Transfer-Application Programming Interface
RDF Resource Description Framework
SEO Search Engine Optimization
SPARQL SPARQL Protocol And RDF Query Language
TEI Text Encoding Initiative
URL Uniform Resource Locator
VIAF Virtual International Authority File
XML Extensible Markup Language
ZIP File format for lossless data compression and archiving