LingPy

This documentation is for version 2.0.dev, which is not released yet.

lingpy.compare.lexstat.LexStat.tokenize

LexStat.tokenize(ortho_profile='', source='counterpart', target='tokens', **keywords)

Tokenize the data with the help of orthography profiles.

Parameters:

ortho_profile : str (default='')

Path to the orthographic profile used to convert and tokenize the input data into IPA tokens. If not specified, a simple Unicode grapheme parsing is carried out.
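As a rough illustration of the fallback behaviour, simple Unicode grapheme parsing can be approximated by attaching combining marks to the preceding base character. This is a minimal sketch, not LingPy's actual implementation, and the `graphemes` helper name is hypothetical:

```python
import unicodedata

def graphemes(word):
    """Split a word into rough grapheme units by attaching
    combining marks to the preceding base character."""
    units = []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and units:
            units[-1] += ch  # combining mark joins the previous base
        else:
            units.append(ch)
    return units

print(graphemes("khā"))  # → ['k', 'h', 'ā'] (macron stays attached to 'a')
```

With an orthography profile supplied instead, multi-character sequences (e.g. digraphs) can be mapped to single IPA tokens, which this naive fallback cannot do.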

source : str (default="counterpart")

The column containing the source data to be tokenized.

target : str (default="tokens")

The name of the target column that will be added to the wordlist.

Notes

This method is a shortcut to the extended Wordlist class, which loads the data and tokenizes it automatically.
