Toiling with the Pāli Canon

Elwert, Frederik; Sellmer, Sven; Wortmann, Sven; Pachurka, Manuel; Knauth, Jürgen; Alfter, David

doi:10.15496/publikation-52722

Bibliographic Details
Authors:	Elwert, Frederik (Author) ; Sellmer, Sven 1969- (Author) ; Wortmann, Sven (Author) ; Pachurka, Manuel (Author) ; Knauth, Jürgen (Author) ; Alfter, David (Author)
Format:	Electronic Article
Language:	English
Published:	Institute of Computer Science, Polish Academy of Sciences 2015
In:	Proceedings of the Workshop on Corpus-Based Research in the Humanities (CRH) Year: 2015, Pages: 39-48
Online Access:	Full text (free of charge)

Description
Summary:	The paper describes the preparation of a Buddhist corpus in the Middle Indo-Aryan language Pāli, which is available only in a flat TEI format, for content-based analysis. This task includes transforming the file into a hierarchical TEI P5 representation, followed by tokenisation (including sandhi resolution), lemmatisation, and POS tagging.
ISBN:	8363159190
Contains:	Contained in: Corpus-based Research in the Humanities (1st : 2015 : Warsaw), Proceedings of the Workshop on Corpus-Based Research in the Humanities (CRH)
Persistent identifiers:	DOI: 10.15496/publikation-52722 HDL: 10900/111346
