Central to any piece of sustainable software is documentation. Without documentation, the likelihood of a project's research output being reused is greatly reduced, for without instructions, even the simplest piece of software can be incomprehensible to someone who has not used it. This is especially true for experimental software, where an intuitive graphical interface is often less a priority than core functionality. Documentation is the sine qua non of sustainability. Documentation does not guarantee that software is reusable, but without it, no digital research output can claim to be sustainable.
The NFDI4Culture FlexFunds initiative was intended to contribute to the sustainability and usability of digital research tools and their results. A grant was awarded in December 2023 to develop text and video documentation for the Annotator App developed by the Beethoven in the House project – a three-year collaborative research project between Edirom, the Oxford e-Research Centre, and the Beethoven Archive Research Center in Bonn. The project's website now features tutorials that illustrate the use of the app in five short lessons.
The project's core development question was how to create an app that could support annotation of the same passage of music as it appears in different arrangements. The data model created for this task was therefore multilayered, so that musical structures could be captured at varying degrees of scale and at varying levels of abstraction. When it came time to develop a user interface for the prototype app, a certain level of complexity was unavoidable: the researchers needed to be able to confirm quickly that all components were working together as intended. The user interface, therefore, is perhaps not quite as intuitive as one might expect from an app ready for public use. Hence the need for tutorials.
Work began by seeking out examples and best practices in creating online tutorials, especially those that make use of screencasts. This background research confirmed that multimodal learning is extremely helpful for users learning new software and that users prefer short, targeted videos explaining the specific step they are seeking help with. Recommendations for maximum length range from 30 or 60 seconds up to 3 to 6 minutes. The organization and layout of video materials is also important. Users need to grasp the overall structure of the tutorial material and to see that it can be followed in either a linear or non-linear fashion. They should also feel as though they can solve the current task successfully. In addition, the less formal register of speech that often accompanies a screencast can aid learning. Using a conversational style of speaking emulates a dialog, positioning the user as addressee and thus as a participant in the narrative. "Using the self as a reference point increases the learner's interest," engaging a greater portion of the learner's cognitive capacity and facilitating the processing of information.
Following these recommendations, the operation of the app was broken down into a series of basic functions. These functions provided the structure for a table of contents that could also serve as a summary of the accompanying tutorials. Each function was then divided into simple steps that would be easy to convey to a user. The functions were assigned chapter headings, each linked to a separate tutorial on the site, and each chapter in the table of contents was summarized with three or four bullet points.
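As a rough sketch, such a table of contents might look like the following in Markdown. The chapter titles, file names, and bullet points here are hypothetical examples, not the actual lesson names used on the site:

```markdown
## Tutorial Contents

<!-- All chapter titles, file names, and steps below are hypothetical examples -->

1. [Getting Started](chapter-1.md)
   - Open the Annotator App in your browser
   - Register for a Solid pod
   - Log in to the app
2. [Creating an Annotation](chapter-2.md)
   - Select a passage in the score
   - Enter your annotation text
   - Save the annotation to your Solid pod
```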
Each page contained a series of steps with instructions; screengrabs were then made to illustrate each step. Lastly, screencasts were recorded to accompany each of the chapters. The instructional materials, then, consisted of text, screen captures, graphics, and screencasts. The screencasts are intended to be used either as a supplement to the text explanations or on their own as stand-alone video tutorials.
Technically, the tutorials were created by writing the text instructions for each chapter in Markdown. Arrows were added to screenshots using PowerPoint, with single slides then exported as JPG images; a standard PDF reader was used to perform additional cropping. (Arrows could also have been added with a PDF reader. Using presentation software made the task simpler, though it did introduce an extra step, since each exported image then had to be cropped. Ultimately, the decision was made to use PowerPoint because of its ability to create a drop shadow on the arrow. This helped the arrow stand out in illustrations and was therefore deemed worth the extra step.) For screencasts, OBS (Open Broadcaster Software), a free, open-source screencasting program, was used. Scripts were first written out completely and then read verbatim as the screencast was being recorded. Writing out the scripts ensured that all the desired information was conveyed in the tutorial and made the resulting video shorter and more dynamic. (Pro-tip: don't do anything too complex on the screen while speaking. It is fine to move the mouse pointer to direct attention, but if you need to click on buttons and text boxes, plan for ample rehearsal time.)
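To illustrate how these elements come together, a chapter page written this way might look something like the following Markdown sketch. All headings, interface labels, and file paths are invented for the example and do not reflect the app's actual interface:

```markdown
# Creating an Annotation

<!-- Heading, interface labels, and file paths are hypothetical examples -->

1. Select the passage of music you want to annotate.
   ![The score view with the selected passage marked by an arrow](images/select-passage.jpg)
2. Enter your annotation text in the annotation panel.
   ![The annotation panel with the text field marked by an arrow](images/annotation-panel.jpg)
3. Save the annotation to your Solid pod.

[Watch the screencast for this chapter](videos/creating-an-annotation.mp4)
```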
Screencasts ended up being around 1 minute, 15 seconds long; the longest ran 90 seconds. On average, it took users approximately 15 minutes to work through the tutorials and successfully create a web annotation. This included registering for a Solid pod, a necessary part of the process.
The tutorials were then tested on new users of the app. Feedback was generally positive, with users noting the clear presentation and instructions in the screencasts. Some users chose to follow only the screencasts, others only the text instructions; no one was observed using both, and all managed to complete the tutorial successfully. A suggestion to place screencasts and text instructions together on the same chapter page was incorporated into the current version, as were many smaller suggestions for added clarity. Another useful suggestion, regarding the size of the screencast on the page, will have to await a further round of funding before it can be implemented.
These tutorials and screencasts have not yet been fully evaluated for accessibility. However, images do include alt text, and scripts were written beforehand for each screencast to ensure that sufficient description was included. Both of these are recommended measures for increasing accessibility.
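In Markdown, alt text is supplied in the square brackets of an image reference, and a short description of what the screenshot shows is all that is needed. The paths and wording here are again illustrative:

```markdown
<!-- Descriptive alt text tells screen-reader users what the screenshot shows -->
![The annotation panel with the Save button marked by a red arrow](images/save-annotation.jpg)

<!-- Empty alt text leaves screen-reader users with no description at all -->
![](images/save-annotation.jpg)
```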
In addition to making the app and its code reusable, we hope this tutorial format can serve as an example of documentation for other research data software projects. There is little training available to researchers when it comes to making a web tutorial. Researchers are very often pedagogues as well, and are probably already adept at scaffolding knowledge to impart specific skills to learners; even so, they may welcome examples of tutorial design based on empirical research and verified through user testing. The Annotator App tutorial shows one straightforward possibility for improved online documentation.
In the future, it would be desirable to add a section to the tutorial explaining what goes on under the hood. That is, since the app is a prototype and in many ways a demonstration of the data model developed by the project, it would be helpful to show where the model's classes are used in the app and what roles they play. Screenshots of the app could be juxtaposed with a graphic visualization of the model. Mockups of this sort proved very useful when presenting the model at conferences and in journal articles and other publications. This would further increase the model's reusability, making it attractive for use in other projects and, ultimately, contributing to standardization in digital musicology.
Mark Saccomano