Toiling with the Pāli Canon

Bibliographic Details
Published in: Proceedings of the Workshop on Corpus-Based Research in the Humanities (CRH)
Authors: Elwert, Frederik (Author) ; Sellmer, Sven 1969- (Author) ; Wortmann, Sven (Author) ; Pachurka, Manuel (Author) ; Knauth, Jürgen (Author) ; Alfter, David (Author)
Format: Electronic Article
Language: English
Published: Institute of Computer Science, Polish Academy of Sciences 2015
Online Access: Full text (free of charge)
Description
Summary: The paper describes the preparation of a Buddhist corpus in the Middle Indo-Aryan language Pāli, which is available only in a flat TEI format, for content-based analysis. This task includes transforming the file into a hierarchical TEI P5 representation, followed by tokenisation (including sandhi resolution), lemmatisation, and POS tagging.
ISBN: 8363159190
Contains: Contained in: Corpus-based Research in the Humanities (1st : 2015 : Warsaw), Proceedings of the Workshop on Corpus-Based Research in the Humanities (CRH)
Persistent identifiers: DOI: 10.15496/publikation-52722
HDL: 10900/111346