Abstract
We present a dependency treebank of Buddhist Chinese texts, containing more than 50K characters drawn from four sutras in the Chinese Buddhist Canon. With dates of composition that span almost five centuries, these sutras bear witness to the evolution of the Chinese language. The treebank has been annotated using the part-of-speech tagset of the Penn Chinese Treebank, and the Stanford Dependencies for Chinese with slight modifications. The article first discusses the texts and the annotation framework of this treebank, and reports on inter-annotator agreement. It then describes the search platform, to which the treebank has been imported, and applies the treebank to an open question in Chinese historical linguistics: the emergence of the Chinese copula.
Original language | English |
---|---|
Pages (from-to) | 140-151 |
Number of pages | 12 |
Journal | Digital Scholarship in the Humanities |
Volume | 31 |
Issue number | 1 |
Publication status | Published - 1 Apr 2016 |
Externally published | Yes |