Unlocking colonial records with Artificial Intelligence. Achieving the automated transcription of large-scale 16th and 17th-century Latin American historical collections - Lancaster EPrints

Published: November 26, 2025 at 01:11 PM

News Article

Content

Between the 16th and 18th centuries, millions of documents were produced across Latin America, authored by both Spanish colonizers and Indigenous peoples. These colonial records were written in a variety of complex calligraphic styles and often included Indigenous languages alongside Spanish. This diversity, together with the age of the documents, has made their transcription and interpretation a significant challenge that requires specialized palaeographic expertise and knowledge of historical languages. As a result, many valuable historical insights remain locked away in archives, inaccessible to most researchers and the public.

Recent advances in Machine Learning and Artificial Intelligence have opened new pathways to address these transcription challenges. In this research, the team developed two primary computational tools to help unlock the historical data held in colonial archives. The first is a historical document classifier that uses Convolutional Neural Networks (CNNs) and Support Vector Machines (SVMs) to identify and categorize the calligraphic styles found in the documents. The classifier achieved F1 scores above 90% for most script types, indicating both high precision and high recall.

The second tool addresses Handwritten Text Recognition (HTR). Using the Transkribus platform, the researchers trained models specifically on 16th- and 17th-century Spanish manuscripts, enabling the automated transcription of handwritten texts. The models reached competitive Character Error Rates (CER): 5.25% for Redonda script, 8.92% for Itálica Cursiva, and 14.15% for Procesal Simple. These error rates are noteworthy given the complexity and historical variation of the handwriting styles. Together, the two tools turn previously “unreadable” or inaccessible archival documents into digitized, searchable data.

This work should significantly improve the accessibility of Latin American historical records, helping libraries, archives, and researchers unlock centuries of information that had been difficult or impossible to decipher. The project represents a major step forward in the digital humanities, bridging historical scholarship and current AI technology.

The implications extend beyond transcription. By automating the reading and classification of vast colonial document collections, the research opens new avenues for historical analysis, linguistic studies, and cultural preservation. It also creates opportunities for collaboration between historians, computer scientists, and Indigenous communities to reclaim and reinterpret their shared past. As these AI-powered techniques continue to improve, they promise to change how historical documents are studied and to help preserve cultural heritage for future generations.
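
For readers unfamiliar with the metric, a Character Error Rate is commonly defined as the number of character insertions, deletions, and substitutions needed to turn the automatic transcription into the expert reference, divided by the length of the reference. The Python sketch below illustrates that common definition only. The article does not state the exact normalization used in the study, and the function names and the sample word pair are invented for illustration.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character edits turning hypothesis into reference."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by reference length (assumed definition of CER)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in a nine-character word.
cer = character_error_rate("escribano", "escrivano")
print(f"{cer:.2%}")  # prints 11.11%
```

Under this definition, a CER of 5.25% corresponds to roughly one character error in every nineteen characters of reference text.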

Key Insights

This study centers on the development of AI-driven tools to transcribe and classify Spanish and Indigenous colonial documents from Latin America. The models target 16th- and 17th-century manuscripts, part of a wider body of records produced between the 16th and 18th centuries.

Key facts include the creation of a historical document classifier using CNNs and SVMs that achieves F1 scores above 90% for most script types, and Handwritten Text Recognition models trained on period-specific manuscripts with Character Error Rates ranging from 5.25% (Redonda) to 14.15% (Procesal Simple).
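
The F1 score cited here is the harmonic mean of precision and recall, computed separately for each script type. As a minimal sketch of how such per-class scores are typically derived from a classifier's predictions, the Python below uses invented labels and a hypothetical `per_class_f1` helper. It does not reproduce the study's evaluation code or data.

```python
from collections import Counter


def per_class_f1(true_labels, predicted_labels):
    """Return a dict mapping each label to its F1 score."""
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            true_pos[truth] += 1
        else:
            false_pos[guess] += 1   # predicted this class wrongly
            false_neg[truth] += 1   # missed the true class

    scores = {}
    for label in set(true_labels) | set(predicted_labels):
        predicted_total = true_pos[label] + false_pos[label]
        actual_total = true_pos[label] + false_neg[label]
        precision = true_pos[label] / predicted_total if predicted_total else 0.0
        recall = true_pos[label] / actual_total if actual_total else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Hypothetical page-level labels, for illustration only.
truth = ["redonda", "redonda", "italica", "procesal"]
guess = ["redonda", "italica", "italica", "procesal"]
scores = per_class_f1(truth, guess)
```

In this toy case one Redonda page is mistaken for Itálica, so both classes score about 0.67 while Procesal scores 1.0. A reported F1 above 90% means a classifier makes few errors of either kind for that script.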

Direct stakeholders include historians, archivists, and Indigenous communities; academic researchers and cultural institutions more broadly may benefit indirectly.

Immediate consequences involve improved access to previously inaccessible archival data, facilitating enhanced historical research and preservation.

This effort builds on earlier digitization initiatives in colonial archives but goes further by adding automated script classification and transcription, and it parallels projects such as the digitization of European medieval manuscripts.

Optimistically, this innovation can drive new interdisciplinary studies and democratize access to cultural heritage; however, risks include potential model biases or misinterpretations needing mitigation through continuous expert oversight.

From a technical perspective, three priorities are recommended: (1) expanding the training datasets to cover more scripts for broader applicability (medium complexity, high impact); (2) involving Indigenous language specialists in model refinement to improve accuracy and cultural sensitivity (high complexity, significant impact); and (3) developing user-friendly archival interfaces to maximize accessibility for non-technical users (low complexity, moderate impact).
