OpenRefine is a free data wrangling tool that can be used to clean tabular data, reconcile data entities (i.e. identify matching entities across data services) and connect these with external knowledge bases. It is a community-supported open source project (licensed under the BSD license). OpenRefine is used by diverse communities, including librarians, researchers, data scientists and the NFDI4Culture community. It is also used in Task Areas 1 and 5 as part of the data enrichment and semantic infrastructure services offered by NFDI4Culture. In 2022, a small Flex Funds Tools grant from NFDI4Culture supported extended connectivity between OpenRefine and the linked open data tool suite Wikibase, in particular the development of a reconciliation service that can work with media files as well as text-based data.
While extending OpenRefine’s capabilities, the OpenRefine team carried out user testing sessions and identified a number of improvements to the reconciliation process that can significantly benefit the overall user experience. These concern: 1) how users interact with the reconciliation dialog window in OpenRefine; 2) how the interface displays reconciliation results from different services, including Wikidata and Wikibase, but also other standard terminology services such as the GND, Getty Vocabularies, VIAF and more; and 3) how users perform data enrichment on their own data via externally linked services. Work towards these improvements was supported by a renewed Flex Funds Tools grant in 2023. A complete overview of related issues that were completed within the scope of the grant, or that remain under discussion and continuous development, can be reviewed in this GitHub Project.
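For readers unfamiliar with what a reconciliation service exchanges with a client such as OpenRefine, the following is a minimal sketch, assuming a service that follows the batch query format of the Reconciliation Service API (version 0.2). It is not OpenRefine's own implementation. The helper names build_queries and pick_matches, the sample response and the identifier EXAMPLE-ID-1 are hypothetical and chosen for illustration only; no network request is made.

```python
import json


def build_queries(names, type_id=None, limit=3):
    """Build a batch of reconciliation queries keyed q0, q1, ... (hypothetical helper)."""
    queries = {}
    for i, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_id is not None:
            # Restrict candidates to one entity type, e.g. Q5 ("human") on Wikidata.
            query["type"] = type_id
        queries[f"q{i}"] = query
    return queries


def pick_matches(response):
    """Map each query key to the id of the first candidate flagged as a match, else None."""
    matches = {}
    for key, body in response.items():
        candidates = body.get("result", [])
        confident = [c for c in candidates if c.get("match")]
        matches[key] = confident[0]["id"] if confident else None
    return matches


queries = build_queries(["Albrecht Dürer", "Unknown Painter"], type_id="Q5")

# Assumption: the batch is sent form-encoded under the "queries" key.
form_data = {"queries": json.dumps(queries)}

# Illustrative response only, not real service output.
sample_response = {
    "q0": {"result": [{"id": "EXAMPLE-ID-1", "name": "Albrecht Dürer",
                       "score": 100, "match": True}]},
    "q1": {"result": []},
}

matches = pick_matches(sample_response)
print(matches)
```

Cells whose query yields no confident match (None above) are the ones a user would typically review by hand in the reconciliation interface.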
This deliverable has been completed to the stage of mockup designs for all parts of the dialog interface, iteratively refined through community discussions. The following interface design improvements were first released as part of OpenRefine v3.8-beta1 and are now also available in the stable version 3.8.0:
After several design iterations, the following improvements have been implemented and released as part of OpenRefine v3.8-beta1:
A large part of the development work also focused on providing more informative and actionable error messages at various stages of the reconciliation and enrichment workflows. Errors previously remained invisible to users, which made reconciliation a much slower and less efficient process. The following improvements have been implemented and released as part of OpenRefine v3.8-beta1:
Future developments related to the ongoing improvements of the reconciliation dialog design (Deliverable 1) and the enrichment visualizations (Deliverable 2) can be tracked via the GitHub Project. In particular, the inclusion of service logos in column headers is almost fully implemented (#6156) and can be expected to be part of the next release (3.9).
Lozana Rossenova, Antonin Delpeuch