Unlocking colonial records with Artificial Intelligence. Achieving the automated transcription of large-scale 16th- and 17th-century Latin American historical collections - Lancaster EPrints

Key Insights
This study develops AI-driven tools to transcribe and classify Latin American colonial documents from the 16th and 17th centuries, working with Spanish and Indigenous archival materials that span the 16th to the 18th century.
Key results include a historical document classifier, built with convolutional neural networks (CNNs) and support vector machines (SVMs), that achieves F1 scores above 90%, and Handwritten Text Recognition (HTR) models trained on period-specific manuscripts that reach Character Error Rates (CER) as low as 5.25%.
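To make the two reported metrics concrete, here is a minimal Python sketch of how CER and F1 are conventionally computed. It is not the study's evaluation code. Assumptions: CER is taken as character-level Levenshtein distance divided by the length of the reference transcription, and F1 is macro-averaged over document classes (the summary does not say which averaging was used). The function names and the document-type labels in the example are hypothetical.

```python
from collections import Counter


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, rc in enumerate(ref, start=1):
        curr = [i]
        for j, hc in enumerate(hyp, start=1):
            cost = 0 if rc == hc else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in set(y_true) | set(y_pred):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Early modern spelling "escriuano" vs. modern "escrivano": one substitution.
    print(round(cer("escriuano", "escrivano"), 3))
    # Hypothetical document-type labels for four pages.
    truth = ["carta", "carta", "pleito", "testamento"]
    pred = ["carta", "pleito", "pleito", "testamento"]
    print(round(macro_f1(truth, pred), 3))
```

Run as a script, this prints `0.111` (1 edit over 9 reference characters) and `0.778` (per-class F1 of 2/3, 2/3 and 1). On this definition, the reported CER of 5.25% corresponds to `cer(...) == 0.0525`, roughly five character edits per hundred characters of the reference transcription.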
Direct stakeholders include historians, archivists, and Indigenous communities; academic researchers in other fields and cultural institutions are likely to be affected indirectly.
The immediate outcome is improved access to previously inaccessible archival material, which supports both historical research and preservation.
The effort builds on earlier digitization initiatives in colonial archives but goes beyond them by using AI to transcribe and classify the documents, an approach that parallels projects on European medieval manuscripts.
On the positive side, the approach could drive new interdisciplinary studies and democratize access to cultural heritage; the risks include model bias and misinterpretation, which need to be mitigated through continuous expert oversight.
From a technical perspective, the recommended priorities are: (1) expanding the training datasets to cover more scripts, for broader applicability (medium complexity, high impact); (2) involving Indigenous-language specialists in model refinement, to improve accuracy and cultural sensitivity (high complexity, significant impact); and (3) developing user-friendly archival interfaces that make the collections accessible to non-technical users (low complexity, moderate impact).