Toiling with the Pāli Canon
The paper describes how a Buddhist corpus in the Middle Indo-Aryan language Pāli, available only in a flat TEI format, is prepared for content-based analysis. This task includes transforming the file into a hierarchical TEI P5 representation, followed by tokenisation (including sandhi re...
Authors: | ; ; ; ; ; |
---|---|
Format: | Electronic Article |
Language: | English |
Published: | Institute of Computer Science, Polish Academy of Sciences, 2015 |
In: | Proceedings of the Workshop on Corpus-Based Research in the Humanities (CRH), Year: 2015, Pages: 39-48 |
Online Access: | Full text (free of charge) |