Toiling with the Pāli Canon
Authors: | ; ; ; ; ; |
---|---|
Format: | Electronic Article |
Language: | English |
Published: | Institute of Computer Science, Polish Academy of Sciences, 2015 |
In: | Proceedings of the Workshop on Corpus-Based Research in the Humanities (CRH), Year: 2015, Pages: 39-48 |
Online Access: | Full text (free of charge) |
Summary: | The paper describes the preparation of a Buddhist corpus in the Middle Indo-Aryan language Pāli, which is available only in a flat TEI format, for content-based analysis. This task includes transforming the file into a hierarchical TEI P5 representation, followed by tokenisation (including sandhi resolution), lemmatisation, and POS tagging. |
---|---|
ISBN: | 8363159190 |
Contains: | Contained in: Corpus-based Research in the Humanities (1st : 2015 : Warsaw), Proceedings of the Workshop on Corpus-Based Research in the Humanities (CRH) |
Persistent identifiers: | DOI: 10.15496/publikation-52722 HDL: 10900/111346 |