LingPy

This documentation is for version 2.0.dev, which is not released yet.

lingpy.sequence.sound_classes.ipa2tokens

lingpy.sequence.sound_classes.ipa2tokens(istring, diacritics=None, vowels=None, tones=None, combiners='͜͡', breaks='.-', stress="ˈˌ'", merge_vowels=True)

Tokenize IPA-encoded strings.

Parameters :

istring : str

The input sequence that shall be tokenized.

diacritics : {str, None} (default=None)

A string containing all diacritics which shall be considered in the respective analysis. When set to None, the default diacritic string will be used.

vowels : {str, None} (default=None)

A string containing all vowel symbols which shall be considered in the respective analysis. When set to None, the default vowel string will be used.

tones : {str, None} (default=None)

A string containing all tone letter symbols which shall be considered in the respective analysis. When set to None, the default tone string will be used.

combiners : str (default='͜͡')

A string with characters that are used to combine two separate characters (compare affricates such as t͡s).

breaks : str (default='.-')

A string containing the characters that indicate that a new token starts right after them. These can be used to indicate that two consecutive vowels should not be treated as diphthongs, or to mark diacritics that are placed before the letter they modify.

stress : str (default="ˈˌ'")

A string containing the characters that mark stress.

merge_vowels : bool (default=True)

Indicates whether consecutive vowels should be merged into diphthongs (True) or whether each vowel symbol should be treated as a separate token (False).

Returns :

tokens : list

A list of IPA tokens.

Examples

>>> from lingpy import *
>>> myseq = 't͡sɔyɡə'
>>> ipa2tokens(myseq)
['t͡s', 'ɔy', 'ɡ', 'ə']
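
To make the roles of combiners, breaks and merge_vowels more concrete, the following is a minimal pure-Python sketch of the general idea. It is not LingPy's implementation: the function name sketch_tokenize and the small VOWELS and DIACRITICS inventories are assumptions made for illustration only, and stress and tone handling are left out entirely.

```python
# Simplified illustration only; this is NOT how LingPy implements ipa2tokens.
# VOWELS and DIACRITICS are small, assumed sample inventories.
VOWELS = "aeiouyɔəɛɪʊ"
DIACRITICS = "ːʰʲʷ"


def sketch_tokenize(istring, vowels=VOWELS, diacritics=DIACRITICS,
                    combiners="\u035c\u0361",  # the two tie bars
                    breaks=".-", merge_vowels=True):
    """Split an IPA string into tokens using a few simple rules."""
    tokens = []
    force_new = True   # True at the start and right after a break character
    attach = False     # True right after a combiner (tie bar)
    for char in istring:
        if char in breaks:
            # A break is dropped and forces the next symbol into a new token.
            force_new = True
            attach = False
            continue
        if tokens and not force_new:
            prev = tokens[-1]
            joins_previous = (
                attach                      # symbol following a tie bar
                or char in combiners        # the tie bar itself
                or char in diacritics       # trailing diacritic
                or (merge_vowels and char in vowels and prev[-1] in vowels)
            )
            if joins_previous:
                tokens[-1] += char
                attach = char in combiners
                continue
        tokens.append(char)
        force_new = False
        attach = False
    return tokens


print(sketch_tokenize("t\u0361sɔyɡə"))                      # affricate and diphthong merged
print(sketch_tokenize("t\u0361sɔyɡə", merge_vowels=False))  # vowels kept apart
print(sketch_tokenize("ai"))    # ['ai']
print(sketch_tokenize("a.i"))   # ['a', 'i']
```

For the input used in the example above, this sketch yields the same four tokens. With merge_vowels=False, or with a break character between the vowels, each vowel ends up in its own token.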
