Toiling with the Pāli Canon
| Authors: | ; ; ; ; ; |
|---|---|
| Format: | Electronic Article |
| Language: | English |
| Published: | 2015 |
| In: | Proceedings of the Workshop on Corpus-Based Research in the Humanities (CRH), Year: 2015, Pages: 39-48 |
| Online Access: | Full text (free of charge) |
| Summary: | The paper describes the preparation of a Buddhist corpus in the Middle Indo-Aryan language Pāli, which is available only in a flat TEI format, for content-based analysis. This task includes transforming the file into a hierarchical TEI P5 representation, followed by tokenisation (including sandhi resolution), lemmatisation, and POS tagging. |
|---|---|
| ISBN: | 9788363159191 |
| Contained in: | Corpus-based Research in the Humanities (1st : 2015 : Warsaw), Proceedings of the Workshop on Corpus-Based Research in the Humanities (CRH) |
| Persistent identifiers: | DOI: 10.15496/publikation-52722 HDL: 10900/111346 |



