Load the library:
>>> import lingpy
Retrieve basic information about the library:
>>> print(lingpy.__doc__)
LingPy --- A Python library for quantitative historical linguistics
===================================================================
Documentation is available in the docstrings. Online documentation is available
at http://lingulist.de/lingpy/
Submodules
-----------
sequence --- Sequence Modelling
compare --- Sequence Comparison
lexstat --- Language Comparison
Subpackages
-----------
algorithm --- Basic Algorithms for Sequence Comparison
align --- Specific Algorithms for PSA and MSA
data --- Data Handling
output --- Output Handling
test --- Tests and Evaluation
Load all important packages:
>>> from lingpy import *
Load the data given in the file test.psq:
>>> pairs = Pairwise(get_file('test.psq'))
Inspect the sequence pairs:
>>> pairs.seqs
[('waldemar', 'vladimir'), ('woldemort', 'vladimir'), ('woldemort', 'waldemar')]
Align all sequences pairwise:
>>> pairs.align()
Print the data to the screen:
>>> print(pairs)
w a l - d e m a r
v - l a d i m i r
31.0
w o l - d e m o r t
v - l a d i m i - r
31.0
w o l d e m o r t
w a l d e m a - r
53.0
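The pairwise scores above come from dynamic-programming alignment. As a rough illustration of the idea only (LingPy scores segments with its sound-class model, not the toy match/mismatch/gap values assumed here), a minimal Needleman-Wunsch sketch:

```python
# Minimal Needleman-Wunsch global alignment (illustrative sketch;
# the toy scores below are assumptions, not LingPy's scoring scheme).
def nw_align(a, b, match=1, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    # fill the dynamic-programming score matrix
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    # trace back from the bottom-right corner to recover the alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + \
                (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b)), S[n][m]

al_a, al_b, score = nw_align("waldemar", "vladimir")
```

With sound-class-sensitive scores instead of these flat ones, the same procedure produces the gapped alignments shown above.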
Load the file test.msq and align the sequences progressively:
>>> mult = Multiple(get_file('test.msq'))
>>> mult.prog_align()
Print the data to the screen:
>>> print(mult)
w - o l d e m o r t
w - a l d e m a - r
v l a - d i m i - r
Align the data using the library method:
>>> mult.lib_align()
>>> print(mult)
w o l - d e m o r t
w a l - d e m a - r
v - l a d i m i - r
Carry out a check for swapped sites:
>>> mult.swap_check()
True
Get the percentage identity of the data:
>>> mult.get_pid()
0.43333333333333335
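Definitions of percentage identity vary. One simple variant counts identical non-gap segments per aligned pair and averages over all pairs; the sketch below applies it to the library-aligned sequences printed above (this is not necessarily the exact formula behind the 0.433... result, so the numbers need not match):

```python
from itertools import combinations

# The library-aligned sequences from above, one string per taxon.
msa = ["wol-demort",
       "wal-dema-r",
       "v-ladimi-r"]

def pid(x, y):
    # identical non-gap segments divided by alignment length
    # (one of several common PID definitions)
    matches = sum(1 for a, b in zip(x, y) if a == b and a != '-')
    return matches / len(x)

scores = [pid(x, y) for x, y in combinations(msa, 2)]
avg = sum(scores) / len(scores)
```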
Get the sum-of-pairs score of the data:
>>> mult.sum_of_pairs()
7.2166666666666668
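The sum-of-pairs score adds up the pairwise scores of all sequence pairs column by column. A sketch with a toy column scorer (LingPy scores columns with its sound-class scoring matrix, so the resulting number differs from 7.217 above):

```python
from itertools import combinations

# The library-aligned sequences from above, one string per taxon.
msa = ["wol-demort",
       "wal-dema-r",
       "v-ladimi-r"]

def column_score(col, match=1.0, mismatch=0.0, gap=-0.5):
    # score every pair of segments in one alignment column
    # (toy values; assumed for illustration only)
    total = 0.0
    for a, b in combinations(col, 2):
        if a == '-' or b == '-':
            total += gap
        else:
            total += match if a == b else mismatch
    return total

# sum over all columns of the alignment
sop = sum(column_score(col) for col in zip(*msa))
```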
Load the file SLV.lxs from the test sets:
>>> lex = LexStat(get_file('SLV.lxs'))
Conduct an automatic search for cognates:
>>> lex.analyze(0.6)
[i] Loaded and calculated all essential values.
[i] Calculating scores for Russian and Russian ...
[i] Calculating scores for Russian and Polish ...
[i] Calculating scores for Russian and Bulgarian ...
[i] Calculating scores for Russian and Czech ...
[i] Calculating scores for Polish and Polish ...
[i] Calculating scores for Polish and Bulgarian ...
[i] Calculating scores for Polish and Czech ...
[i] Calculating scores for Bulgarian and Bulgarian ...
[i] Calculating scores for Bulgarian and Czech ...
[i] Calculating scores for Czech and Czech ...
[i] Created the library.
[i] Calculated pairwise scores.
[i] Calculated cognates.
Calculate the pairwise distances:
>>> lex.pairwise_distances()
array([[ 0. , 0.21818182, 0.28181818, 0.24770642],
[ 0.21818182, 0. , 0.29090909, 0.1559633 ],
[ 0.28181818, 0.29090909, 0. , 0.31192661],
[ 0.24770642, 0.1559633 , 0.31192661, 0. ]])
Cluster the data using the Neighbor-Joining algorithm:
>>> neighbor(lex.pairwise_distances(),lex.taxa)
'(((Russian:0.11,Bulgarian:0.18):0.05,Polish:0.07):0.09,Czech:0.09);'
Cluster the data using the UPGMA algorithm:
>>> upgma(lex.pairwise_distances(),lex.taxa)
'(Bulgarian:0.15,(Russian:0.12,(Polish:0.08,Czech:0.08):0.12):0.15);'
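Both trees are produced by agglomerative clustering of the distance matrix. As an illustration of what upgma() does internally, here is a bare-bones UPGMA sketch on the matrix printed above; it tracks cluster sizes for average linkage but omits branch lengths and tie-breaking, so the Newick string is not identical to lingpy's output (the topology is, though):

```python
# Bare-bones UPGMA on the pairwise distances printed above
# (illustrative sketch; no branch lengths, unlike lingpy's upgma()).
D = {
    ('Russian', 'Polish'): 0.21818182,
    ('Russian', 'Bulgarian'): 0.28181818,
    ('Russian', 'Czech'): 0.24770642,
    ('Polish', 'Bulgarian'): 0.29090909,
    ('Polish', 'Czech'): 0.1559633,
    ('Bulgarian', 'Czech'): 0.31192661,
}

def dist(d, a, b):
    # symmetric lookup: distances are stored under one key order only
    return d.get((a, b), d.get((b, a)))

def upgma_sketch(dists, taxa):
    clusters = {t: (t, 1) for t in taxa}   # name -> (newick, size)
    d = dict(dists)
    while len(clusters) > 1:
        # find the closest pair of clusters
        a, b = min(((x, y) for x in clusters for y in clusters if x < y),
                   key=lambda p: dist(d, *p))
        na, sa = clusters.pop(a)
        nb, sb = clusters.pop(b)
        new = a + '+' + b
        # size-weighted average-linkage distance to every other cluster
        for c in clusters:
            d[(new, c)] = (dist(d, a, c) * sa + dist(d, b, c) * sb) / (sa + sb)
        clusters[new] = ('(%s,%s)' % (na, nb), sa + sb)
    return next(iter(clusters.values()))[0] + ';'

tree = upgma_sketch(D, ['Russian', 'Polish', 'Bulgarian', 'Czech'])
```

Polish and Czech merge first (distance 0.156), then Russian joins them, and Bulgarian attaches last, which is the same topology as the upgma() tree above.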