Training Transkribus on Middle Dutch: Are AI Transcriptions of Handwritten Text Here to Stay?

By Jake Erlewine

While the digitisation of memory institutions’ handwritten collections continues to increase, a tool that enables researchers to decipher and search through these inventories easily remains elusive. For digital humanities scholars, the labour-intensive process of manual transcription is a major obstacle to creating user-friendly, accessible online archival collections. Transcription initiatives, whether using employed researchers or crowdsourcing, require funding and manpower that many non-profits do not have at their disposal. For example, UCL’s Transcribe Bentham project, which sought to digitise the 32,000 manuscripts written by Jeremy Bentham that had not already been transcribed, cost upwards of 600,000 GBP and relied on over 6,000 volunteers to mark his letters up in XML. For smaller institutions and individuals who lack access to these resources, automating this process through Handwritten Text Recognition (HTR) technology represents a cost-effective and efficient way to make their manuscripts more available to researchers and the general public.

For medievalists, however, the diversity of languages and letterforms in illuminated manuscripts makes accurate transcriptions from a generalised algorithm near-impossible to obtain without an inordinate amount of human training. Bespoke projects, such as the University of Groningen’s Monk system, are usually confined to one language or script, and broadening an algorithm’s chronological applicability requires the human digitisation and transcription of several books’ worth of material. Corporation-led projects, while having access to reliable funding, do not normally publish their algorithms, creating problems for researchers around the long-term storage of copyrighted images, documents, and other digital assets on private servers. In the middle of this spectrum sits Transkribus, developed by and for the academic community at the University of Innsbruck and managed by a cooperative of scholars (READ-COOP). By empowering users to train tailored HTR models for their respective transcription projects under one neural network, the platform strikes a balance between ensuring accurate targeted transcriptions and creating a feedback loop wherein the activities of independent scholars increase the efficacy of Transkribus for the entire user base. In what follows, I will spotlight my own experience creating a model to transcribe late Middle Dutch handwritten texts, trained on the Gruuthuse Manuscript at the Royal Library of the Netherlands (KB). A key work in Dutch history and musicology, the poems and songs of the Gruuthuse Manuscript acquired an unexpected afterlife in this project, but it is one that will allow researchers to add to the wealth of information about the late medieval Low Countries that the manuscript already contains.

The Gruuthuse Manuscript is primarily a songbook, with 147 songs included along with their respective melodies. It contains contributions spanning 1395 to 1408, and it entered the collection of the Gruuthuse family around 1460. The texts and melodies represent notable developments in Dutch culture, with songs and poems covering not only religious topics but also secular and illicit themes such as drunkenness and sex. Consequently, its vocabulary is more diversified than that of a psalter or a Book of Hours, which allows for a broader application of the trained model. The notation of the melodies, which lacks rhythm markings and employs a rudimentary form of stroke notation, has been the target of several studies, and the specifics of the relationship between the texts and their melodies remain a topic of debate. While the text itself is spaced relatively clearly and consistently, it does contain scribal abbreviations (such as a line over the last letter of a word to represent an n or an m) and ligatures that I thought would challenge Transkribus’ handwritten text recognition engine.

However, I mainly selected pages from the manuscript’s eighteen poems due to their cleaner and more consistent mise-en-page. Each leaf contained fifty ruled lines of text split across two columns, whereas the layout of the songbook is much more varied (cf. figs. 1 and 2). The ‘Universal Lines’ model, by and large the most comprehensive model for independent layout analysis on Transkribus, also routinely confused the musical notation for textual regions and lines, adding to the difficulty of training the model on the Gruuthuse codex’s musical material. The layout analysis also struggled in areas of environmental and targeted wear, which in some cases separated lines and compromised entire regions of text. Transkribus was not able to identify layouts clearly on heavily worn sections, illustrating that a digital representation of a manuscript can never be fully separated from its physical source. All of this matters for the transcription process: while layout analysis and text recognition can be run on Transkribus as a single job, I found that running them independently of each other produced much more satisfactory results in terms of text region and line identification. As this analysis forms the basis for matching ground truth transcriptions with their respective lines on the manuscript, it was paramount that it be as accurate as possible.

Fig. 1 (left): Folio 12v, a song leaf, compared with Fig. 2 (right): Folio 41v, containing a poem. Gruuthuse Manuscript, c. 1395-1408, Brugge. The Hague: KB, KW79 K10.
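Because the layout analysis underpins everything that follows, it is worth spot-checking its output before entering any ground truth. Transkribus can export its layout analysis as PAGE XML, and a short script can flag leaves where automatic line detection deviates from the ruled layout described above. The sketch below is only an illustration under stated assumptions: the expected count of fifty lines per leaf follows the poem layout discussed here, while the function name and the example file name are hypothetical rather than anything produced by Transkribus itself.

```python
import xml.etree.ElementTree as ET

# Namespace used by PAGE XML files, the layout/transcription format Transkribus exports.
PAGE_NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

def check_layout(page_xml_path: str, expected_lines: int = 50) -> bool:
    """Count the text regions and lines that layout analysis found on one leaf."""
    root = ET.parse(page_xml_path).getroot()
    regions = root.findall(".//pc:TextRegion", PAGE_NS)
    lines = root.findall(".//pc:TextLine", PAGE_NS)
    ok = len(lines) == expected_lines
    print(f"{page_xml_path}: {len(regions)} regions, {len(lines)} lines"
          f" ({'as expected' if ok else 'check this leaf manually'})")
    return ok

# Hypothetical usage on an exported poem leaf with fifty ruled lines:
# check_layout("gruuthuse_folio_41v.xml", expected_lines=50)
```

A check like this would not catch every error (a line split in two and a missing line cancel out), but it quickly surfaces the heavily worn or musically notated leaves that need manual correction.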

In total, the initial training set comprised twenty-three leaves from the manuscript, containing 12,967 words. For handwritten material, Transkribus recommends around 10,000 words to provide sufficient coverage for the first training run. Two of these pages were held back by Transkribus to form a validation set: a dataset that simulates the transcription of a leaf for which no ground truth material has been provided. This, in turn, gives the trainer an estimate of the Character Error Rate (CER) as the model is tuned. To generate ground truth material, I used the transcriptions of the Gruuthuse Manuscript that had been produced manually by the research staff at the KB and input them line by line into the text layout generated by Transkribus. Helpfully, these transcriptions did away with ligatures and scribal abbreviations, which was reflected in the output of the model. As of now, manual transcription, while time-consuming, is by far the most accurate way to generate ground truth material and engenders a more accurate model in the long run. For reference, the automatic transcriptions I generated using a generalised model for Dutch averaged a CER of about 80%, rendering them illegible. Transkribus’ layout software was highly accurate, perfectly identifying the layout on nineteen of twenty-three pages. Common errors in this step included splitting one line of text into two or leaving out the first letter of a line of poetry that had been spaced apart from the line itself. The platform has a very user-friendly interface, which made deleting wrongly identified lines and reinstating the correct layout easy through a click-and-drag process (cf. fig. 3). With the ground truth and layout analysis solidified, I was ready to create my model through Transkribus’ neural network interface.

Fig. 3: UI for Transkribus’ layout analysis and ground truth entry. Author’s own image created during his time at the Royal Library of the Netherlands.
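Since the CER recurs throughout the rest of this post, it may help to spell out how it is calculated: the edit (Levenshtein) distance between the model’s transcription and the ground truth, divided by the length of the ground truth. The pure-Python sketch below is only an illustration; the sample line is invented, and a real project would more likely rely on an established library such as jiwer than on hand-rolled code.

```python
# Minimal illustration of the Character Error Rate (CER): the edit distance
# between a predicted transcription and the ground truth, divided by the
# ground-truth length. The example strings are invented, not from the manuscript.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]

def cer(prediction: str, ground_truth: str) -> float:
    return edit_distance(prediction, ground_truth) / max(len(ground_truth), 1)

# One wrong character in an eleven-character line gives a CER of roughly 0.09 (9%).
print(cer("niet en gan", "niet en can"))
```

Read this way, the 80% CER produced by the generalised Dutch model means that roughly four characters in five differed from the ground truth, which is why those transcriptions were illegible.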

I opted to have my model run through 200 epochs in its training run, which in hindsight was overkill; 50-100 epochs would have been sufficient, but having not used the system before, I wanted to ensure that I obtained a satisfactory result. Instead of basing my model on Transkribus’ baseline model for modern Dutch, I opted to use Jesse Dijkshoorn’s model for fourteenth-century Dutch Charters (the only other public Middle Dutch model) as a starting point. Perhaps because this model was trained on cursive, rather than the textura script of the Gruuthuse Manuscript, the CER began at around 60% during the first epoch but declined steeply from the outset. As seen in the graph below (fig. 4), the validation CER was below 10% by the eighth epoch and remained relatively stable from then on. While I did not stipulate that the training run should stop if the model ceased to improve, this was an option and may have helped to mitigate the job’s computational and environmental footprint.

Fig. 4: Chart showing the results of training, with CER on the y-axis and number of epochs on the x-axis. Author’s own image created during his time at the Royal Library of the Netherlands.
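Transkribus offers that stopping option through its training interface; purely to illustrate the idea, the sketch below shows early stopping driven by the validation CER. The patience value and the training callback are hypothetical placeholders, not Transkribus internals.

```python
# Illustrative sketch of early stopping on validation CER (not Transkribus code).
def train_with_early_stopping(run_epoch, max_epochs=200, patience=10):
    """run_epoch(epoch) should train one epoch and return the validation CER."""
    best_cer, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        val_cer = run_epoch(epoch)
        if val_cer < best_cer:
            best_cer, best_epoch = val_cer, epoch  # new best model so far
        elif epoch - best_epoch >= patience:
            print(f"Stopping at epoch {epoch}: no improvement since epoch {best_epoch}")
            break
    return best_cer, best_epoch
```

With a mechanism like this, a run whose validation CER plateaus early, as mine did after roughly the eighth epoch, would likely halt long before epoch 200, saving both compute and energy.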

By the end of the run, the CER was 4.04%, with a low of 3.92%, which is very accurate for an HTR model. Of course, this figure benefited from the fact that I trained the Transkribus model on only one scribal hand, and more ground truth material from different hands would be needed to build a model with broader applicability. As an example of the limitations of a model trained on a small dataset, when I attempted to transcribe a leaf from a manuscript of Jacob van Maerlant’s Der naturen bloeme (c. 1340-50) at the KB, the model’s CER jumped to 10% (fig. 5). While this is still readable, the decrease in transcription quality is noticeable. The issue could be rectified by training on more diverse ground truth material, but it speaks to the sheer volume of training required to overcome the stylistic gaps created by chronological or hand differences. In this sense, Transkribus is better suited to smaller or more focused projects that target stylistically similar manuscripts, and I am sceptical that training an HTR model capable of deciphering the entirety of a national memory institution’s handwritten documents would be technically possible, let alone financially feasible.

Fig. 5: Automatic transcription of a leaf from Der naturen bloeme generated with the model. Author’s own image created during his time at the Royal Library of the Netherlands.

As helpful as Transkribus may seem, it is important to consider the pedagogical, epistemological, and environmental consequences of relying so heavily on an AI-driven Virtual Research Environment (VRE). The separation of ‘the text’ from its physical manuscript raises several problems relating to the true representation of the object. Unlike printed works, manuscripts can have words squeezed above or below a line as an addition, or comments in the margins that are in some cases needed to understand the text. Transkribus, by doing away with ligatures and sigla, does make the text easier to read and comprehend. The cost of this, however, is that medieval and early modern orthography may not be as visible to researchers in the future. The same can also be said for paleography. In a 2022 survey of researchers using HTR conducted by Melissa Terras, 38% of respondents said that HTR improved their own paleography skills, and only 10% said that it reduced their need to use them. The surveyed researchers were united in the belief that paleographic teaching and study remain important, since at a minimum “a critical review of machine performance will always be necessary.” Due to the vast amount of training that would be required for a model to cover the orthographic and stylistic nuances of the medieval manuscript without performance decreasing, a pedagogical situation where paleographic studies are conceived simply as a means to train HTR models appears fanciful. On the other hand, if scholarly reliance on Transkribus and other VREs continues to increase, a long-term decline in the number of paleographic research projects, due to the improved accessibility of transcriptions, seems much more likely. More important in the short run, however, are the environmental consequences of an increased reliance on AI by the academic community. Transkribus, hosted on 100% clean-energy servers in Innsbruck, has a limited footprint because of its location, but data on the servers’ cooling systems is not made publicly available. As the demand for generative AI – and its environmental footprint – continues to rise, digital humanities scholars must examine the environmental impact of VREs and determine for themselves whether the environmental cost of AI-powered HTR transcription outweighs the epistemological benefits of using such a system.

In summary, while medieval manuscript studies are set to venture further into the digital realm, it is imperative that the scholarly community tread carefully when separating the text of a manuscript from the vellum and ink of which it is composed. The Gruuthuse Manuscript model I created brings both the promise and the limitations of Transkribus to the fore. The transparency, sustainability, and accountability of READ-COOP serve as a model that other VREs should follow. If similar initiatives continue to prioritise the sustainable integration of AI into humanistic studies alongside technological performance, as the current cooperative has done, then HTR has the potential to effect monumental change in the storage, analysis, and dissemination of handwritten cultural material. This change, however, must remain grounded in a critical awareness of AI’s role as an interpretive tool, rather than in a false belief that the digital presence of a manuscript is equal to the book itself.

Bibliography:

https://app.transkribus.org/models/public/57490

Bashir, Noman, Priya Donti, James Cuff, Sydney Sroka, Marija Ilic, Vivienne Sze, Christina Delimitrou, and Elsa Olivetti. 2024. “The Climate and Sustainability Implications of Generative AI.” An MIT Exploration of Generative AI, March. https://doi.org/10.21428/e4baedd9.9070dfe7.

Biezen, Jan van. “The Music Notation of the Gruuthuse Manuscript and Related Notations.” Tijdschrift van de Vereniging Voor Nederlandse Muziekgeschiedenis 22, no. 4 (1972): 231–51. https://doi.org/10.2307/938816.

Causer, Tim. “‘Many Hands Make Light Work. Many Hands Together Make Merry Work’: Transcribe Bentham and Crowdsourcing Manuscript Collections.” In Crowdsourcing Our Cultural Heritage, edited by Mia Ridge, 57–88. Farnham, 2014. https://doi-org.ezproxy.st-andrews.ac.uk/10.4324/9781315575162.

https://help.transkribus.org/model-setup-and-training

https://www.ai.rug.nl/~lambert/Monk-collections-english.html

https://galerij.kb.nl/kb.html#/en/gruuthuse/page/6/zoom/2/lat/-81.3612872605706…

Hughes, Lorna M. Digitizing Collections: Strategic Issues for the Information Manager. London: Facet, 2004.

Lit, L.W.C. van. “Paleography: Between Erudition and Computation.” In Among Digitized Manuscripts. Philology, Codicology, Paleography in a Digital World, 102–31. Brill, 2020. http://www.jstor.org/stable/10.1163/j.ctv2gjwzrd.8.

Terras, Melissa. “Inviting AI into the Archives: The Reception of Handwritten Recognition Technology into Historical Manuscript Transcription.” In Archives, Access and Artificial Intelligence: Working with Born-Digital and Digitized Archival Collections, edited by Lise Jaillant, 179–204. transcript Verlag, 2022. http://www.jstor.org/stable/jj.11425482.10.
