Historical document libraries hold large collections of ancient texts whose contents are often interrelated, and these relationships can reveal previously unnoticed links between historical events. Recognizing such connections is important for advancing historical research. This project develops a tool that identifies and analyzes similar historical texts within a large collection.
Using natural language processing (NLP) techniques such as n-grams, the tool aims to uncover content-based relationships among diverse historical texts. Because ancient documents are heterogeneous and complex, the project also incorporates advanced preprocessing methods to make the text analysis more robust and accurate. By connecting scattered historical narratives, the project aims to give researchers a resource for exploring the connections among historical sources.
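As an illustration of how n-grams can expose content-based similarity, the following is a minimal sketch and not the project's actual implementation. It assumes character trigrams, a simple normalization step (lowercasing, removing diacritics and punctuation) standing in for the preprocessing mentioned above, and Jaccard overlap as the similarity measure. All function names are hypothetical.

```python
from __future__ import annotations

import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace.

    Assumption: a stand-in for the project's preprocessing, which is not
    described in detail in the source.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_marks)
    return " ".join(no_punct.split())


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of the normalized text."""
    cleaned = normalize(text)
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(text_a: str, text_b: str, n: int = 3) -> float:
    """Score two texts between 0.0 (no shared n-grams) and 1.0 (identical sets)."""
    return jaccard(char_ngrams(text_a, n), char_ngrams(text_b, n))


if __name__ == "__main__":
    original = "In the year of our Lord the king came to the city"
    variant = "in the yeare of our lord the kyng came to the citie"
    unrelated = "grain prices rose sharply after the long drought"
    print(similarity(original, variant))
    print(similarity(original, unrelated))
```

Character n-grams are chosen here because they tolerate spelling variation, which is common in historical sources; word-level n-grams or weighted schemes would be equally plausible choices.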