A dependency treebank of Chinese Buddhist texts

John Lee, Yin Hei Kong

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)


We present a dependency treebank of Buddhist Chinese texts, containing more than 50,000 characters drawn from four sutras in the Chinese Buddhist Canon. Because their dates of composition span almost five centuries, these sutras bear witness to the evolution of the Chinese language. The treebank has been annotated with the part-of-speech tagset of the Penn Chinese Treebank and, with slight modifications, the Stanford Dependencies for Chinese. The article first discusses the texts and the annotation framework of this treebank, and reports on inter-annotator agreement. It then describes the search platform into which the treebank has been imported, and applies the treebank to an open question in Chinese historical linguistics: the emergence of the Chinese copula.
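The annotation scheme pairs Penn Chinese Treebank POS tags (in which `VC` marks the copula 是) with dependency relations. A minimal sketch of why a tag-based query matters for the copula question follows, assuming a simplified tab-separated token layout (index, form, POS, head, label). This is not the treebank's actual file format or search platform; the names `Token`, `parse_rows`, `find_copulas` and `TOY_SENTENCE` are hypothetical, and the sentence and its annotation (including attaching the predicate noun as `attr`) are made-up toy data whose conventions may differ from the treebank's modified scheme. Because 是 also served as a demonstrative in earlier Chinese, searching on the character alone over-counts, while filtering on the `VC` tag isolates copular uses.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    form: str    # word form (one or more Chinese characters)
    pos: str     # Penn Chinese Treebank POS tag; "VC" marks the copula 是
    head: int    # index of the governing token; 0 means root
    deprel: str  # dependency label (Stanford-style names assumed here)


def parse_rows(block: str) -> list[Token]:
    """Parse hypothetical tab-separated rows: idx, form, pos, head, deprel."""
    tokens = []
    for line in block.strip().splitlines():
        idx, form, pos, head, deprel = line.split("\t")
        tokens.append(Token(int(idx), form, pos, int(head), deprel))
    return tokens


def find_copulas(tokens: list[Token]) -> list[tuple[int, str, list[tuple[str, str]]]]:
    """Return (index, form, [(deprel, dependent form), ...]) for each VC-tagged token."""
    hits = []
    for tok in tokens:
        if tok.pos == "VC":
            deps = [(d.deprel, d.form) for d in tokens if d.head == tok.idx]
            hits.append((tok.idx, tok.form, deps))
    return hits


# Hypothetical toy annotation of 是人是菩薩 ("this person is a bodhisattva").
TOY_SENTENCE = "\n".join(
    "\t".join(row)
    for row in [
        ("1", "是", "DT", "2", "det"),    # demonstrative 'this', not a copula
        ("2", "人", "NN", "3", "nsubj"),
        ("3", "是", "VC", "0", "root"),   # copular 是
        ("4", "菩薩", "NN", "3", "attr"),
    ]
)

tokens = parse_rows(TOY_SENTENCE)
print(find_copulas(tokens))
# → [(3, '是', [('nsubj', '人'), ('attr', '菩薩')])]
print(sum(t.form == "是" for t in tokens))
# → 2 (a character-only search also picks up the demonstrative)
```

Returning dependents as a list of pairs, rather than a dict keyed by label, keeps every dependent even when two share the same relation.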

Original language: English
Pages (from-to): 140-151
Number of pages: 12
Journal: Digital Scholarship in the Humanities
Issue number: 1
Publication status: Published, 1 Apr 2016
Externally published: Yes


