Hackathon in the Arava
Workshop organized by Nachum Dershowitz, resident at the IEA de Paris (Paris Institute for Advanced Study)
The objective of the hackathon is to develop digital tools that facilitate the identification of shared passages, that is, passages found in two or more texts as a result of borrowing or citation, and thereby enable a comprehensive study of the evolution of canonical Buddhist corpora, including scriptures, their commentaries, and related treatises. The corpus to be studied is the Tibetan Buddhist canon, which consists of Indic Buddhist literature in Tibetan translation. This corpus was formed over a period of more than a thousand years and contains various layers of material; its translation into Tibetan spanned several centuries.
We will explore methods for finding inexact quotations and borrowed text within the corpus, comparing many textual units simultaneously, with the aim of improving our understanding of how individual texts emerged and evolved. The tools are also intended to shed light on the processes of translation and of revising translations, as well as on editorial policies. Particular challenges that must be taken into account are the monosyllabic nature of the Tibetan language, the omission or addition of grammatical particles without a change in meaning, varying orthographies, and an abundance of homophones.
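To make the challenges listed above concrete, here is a minimal sketch of one generic way to flag candidate shared passages: normalize each text into syllables, unify spelling variants, discard grammatical particles, and then measure how many syllable n-grams of one text also occur in another (an n-gram containment score). This is a standard shingling technique chosen for illustration, not the method developed at the hackathon. It assumes input in Extended Wylie transliteration (syllables separated by spaces, "/" as the shad); the particle set, the variant table, and the function names `normalize`, `ngrams`, and `containment` are hypothetical and deliberately incomplete. Homophones are not handled here.

```python
# Minimal sketch (illustrative only): candidate shared-passage detection for
# Tibetan texts, ASSUMING Extended Wylie input with space-separated syllables.

# Hypothetical, non-exhaustive set of grammatical particles that may be added
# or dropped without changing the meaning. Ambiguous forms such as "las"
# (also a noun, "action") are deliberately left out.
PARTICLES = {
    "gi", "kyi", "gyi", "yi", "gis", "kyis", "gyis", "yis",
    "la", "na", "ru", "su", "tu", "du", "dang", "ni",
}

# Hypothetical mapping of older spellings to a standard form.
VARIANTS = {"myi": "mi", "gyurd": "gyur"}


def normalize(text: str) -> list[str]:
    """Split into syllables, drop shad marks, unify variants, remove particles."""
    syllables = text.replace("/", " ").split()
    syllables = [VARIANTS.get(s, s) for s in syllables]
    return [s for s in syllables if s not in PARTICLES]


def ngrams(syllables: list[str], n: int) -> set[tuple[str, ...]]:
    """All contiguous runs of n syllables (empty if the text is shorter than n)."""
    return {tuple(syllables[i:i + n]) for i in range(len(syllables) - n + 1)}


def containment(a: str, b: str, n: int = 3) -> float:
    """Fraction of the normalized syllable n-grams of a that also occur in b."""
    grams_a = ngrams(normalize(a), n)
    if not grams_a:
        return 0.0
    return len(grams_a & ngrams(normalize(b), n)) / len(grams_a)


if __name__ == "__main__":
    # Toy example: an older spelling with a genitive particle vs. a
    # standard spelling without it.
    quoted = "myi dge ba'i las kyi 'bras bu /"
    source = "mi dge ba'i las 'bras bu /"
    print(containment(quoted, source))  # prints 1.0
```

Because particles are removed and variants unified before n-grams are built, the two toy strings match fully even though a raw string comparison would not. A real system would need a much richer normalization (including homophones) and an alignment step to locate the matching passage boundaries.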
The importance of such an application lies first and foremost in the fact that it will allow scholars to uncover the emergence and transmission history of individual texts and textual corpora, and to better understand the intellectual history of Buddhism.
The intertextual tools developed for Tibetan should have wide applicability to other languages and corpora.
Machine Learning Tools for Historical Documents, 1 October 2015 - 30 June 2016