LingPy

lingpy.algorithm.misc.ipa2tokens

lingpy.algorithm.misc.ipa2tokens(seq, diacritics=None, vowels=None, merge_vowels=True)

Tokenize IPA-encoded strings.

Parameters :

seq : string or unicode

The input sequence to be tokenized.

diacritics : unicode

A string containing all diacritics to be recognized during tokenization. When set to None, the default diacritic string is used.

vowels : unicode

A string containing all vowel symbols to be recognized during tokenization. When set to None, the default vowel string is used.

merge_vowels : bool

Indicate whether adjacent vowels should be merged into diphthongs (default=True) or whether each vowel symbol should be treated as a separate token.

Returns :

tokens : list

A list of IPA tokens.

Examples

>>> from lingpy import *
>>> myseq = 't͡sɔyɡə'
>>> ipa2tokens(myseq)
[u't\u0361s', u'\u0254y', u'\u0261', u'\u0259']
>>> for t in ipa2tokens(myseq): print(t)
t͡s
ɔy
ɡ
ə
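
The role of the diacritics, vowels and merge_vowels parameters can be illustrated with a small stand-alone sketch. This is not LingPy's implementation: the function name simple_ipa_tokens, the short default diacritic and vowel strings, and the tie-bar handling are all assumptions made for illustration, and the real defaults in LingPy are considerably larger.

```python
def simple_ipa_tokens(seq,
                      diacritics="\u02d0\u02b0\u02b2\u0303",
                      vowels="aeiouy\u0254\u0259\u025b\u026a\u028a",
                      merge_vowels=True):
    """Toy tokenizer (hypothetical helper, not part of LingPy)."""
    tie_bars = "\u0361\u035c"  # assumed: tie bars join two symbols
    tokens = []
    join_next = False  # True directly after a tie bar
    for ch in seq:
        if tokens and (join_next or ch in tie_bars or ch in diacritics):
            # Attach tie bars, diacritics and the symbol following a
            # tie bar to the current token.
            tokens[-1] += ch
            join_next = ch in tie_bars
        elif (tokens and merge_vowels
              and ch in vowels and tokens[-1][0] in vowels):
            # Merge a vowel into a preceding vowel token (diphthong).
            tokens[-1] += ch
        else:
            # Any other symbol starts a new token.
            tokens.append(ch)
    return tokens


seq = "t\u0361s\u0254y\u0261\u0259"  # same sequence as myseq above
merged = simple_ipa_tokens(seq)
split = simple_ipa_tokens(seq, merge_vowels=False)
```

Under these assumptions, merged groups the two adjacent vowels into one token, as in the example output above, while split keeps each vowel symbol as its own token.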