Morph tagging


Morphological attributes are extracted from the Tanl morphed POS tags produced by the POS tagger and added to each token as a separate attribute.

This is done by the script morpho-splitter.py, located in the PosTagger directory.

It splits clitics and morphology, creating a CoNLL-X file with the following columns:

  id   form   lemma   cpos   pos   morph

The input should be a file with three columns:

  form   pos   lemma

where pos is a combined POS+Morph tag and lemma contains a representation of clitics, when present, separated by dashes.
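
The column mapping described above can be illustrated with a minimal Python sketch. It is not the actual script: the tag decomposition rule (leading uppercase letters form the POS, its first letter the coarse POS, and the remainder the morphology), the tab-separated input, the blank line between sentences, and the function names split_tag and to_conll are all assumptions made for illustration. Clitic splitting is left out, so the lemma column is copied unchanged.

```python
import re

# Hypothetical rule: leading uppercase letters are the POS,
# whatever follows is treated as the morphology string.
TAG_RE = re.compile(r"^([A-Z]+)(.*)$")


def split_tag(tag):
    """Return (cpos, pos, morph) for a combined POS+Morph tag."""
    m = TAG_RE.match(tag)
    if not m:
        # Unrecognised tag: keep it as both cpos and pos, no morphology.
        return tag, tag, "_"
    pos, morph = m.group(1), m.group(2)
    return pos[0], pos, morph or "_"


def to_conll(lines):
    """Turn 'form<TAB>pos<TAB>lemma' lines into six-column rows.

    A blank input line is assumed to end a sentence and restarts the ids.
    """
    rows = []
    tok_id = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            tok_id = 0
            rows.append("")
            continue
        form, tag, lemma = line.split("\t")
        tok_id += 1
        cpos, pos, morph = split_tag(tag)
        rows.append("\t".join([str(tok_id), form, lemma, cpos, pos, morph]))
    return rows


if __name__ == "__main__":
    sample = ["casa\tSfs\tcasa", "", "Roma\tSP\tRoma"]
    for row in to_conll(sample):
        print(row)
```

The sample tags Sfs and SP are only illustrative inputs for the hypothetical rule; the real script may decompose tags differently and may format the morph column in another way.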

Usage:

 morph-splitter.py [options] [file]

Options:

 -h, --help            display this help and exit
 --usage               display script usage