Galician word2vec embeddings trained on OpenSubtitles
Description
This dataset contains the subs2vec embeddings for Galician, as presented in https://zenodo.org/records/17243814. The embeddings were trained on the OpenSubtitles 2018 corpus (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles), a large-scale collection of film and television subtitles, and represent semantic vector spaces derived from naturalistic language use.
For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:
- Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
- Window size: varying context windows (e.g., 2, 5, 10, …)
Each file corresponds to a unique configuration (dimension × window size). Within a file, column 1 holds the vocabulary and columns 2 through d + 1 hold the embedding values, where d is the dimensionality of that configuration.
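The column layout described above can be read with a few lines of Python. This is a minimal sketch, not the authors' loader: the function name `load_embeddings` is hypothetical, and the field delimiter and the presence of a header row are not stated in this record, so both are exposed as parameters (whitespace-delimited with no header is assumed by default).

```python
def load_embeddings(lines, delimiter=None, skip_header=False):
    """Parse rows of `word v1 v2 ... vD` into a dict of word -> list of floats.

    Assumptions (not confirmed by the dataset record):
    - delimiter=None splits on any whitespace; pass "," or "\t" if needed.
    - skip_header=True drops the first row if the file has a header line.
    """
    vectors = {}
    dimension = None
    for index, line in enumerate(lines):
        if skip_header and index == 0:
            continue
        parts = line.rstrip("\n").split(delimiter)
        if len(parts) < 2:
            continue  # blank or malformed row
        word, values = parts[0], [float(x) for x in parts[1:]]
        if dimension is None:
            dimension = len(values)
        elif len(values) != dimension:
            raise ValueError(f"row {index} has {len(values)} values, expected {dimension}")
        vectors[word] = values
    return vectors


# Tiny in-memory stand-in for a 3-dimensional file (made-up values).
sample = ["casa 0.1 0.2 0.3\n", "can 0.4 0.5 0.6\n"]
vectors = load_embeddings(sample)
print(len(vectors), len(vectors["casa"]))
```

For a downloaded file, the same function accepts an open file handle, for example `load_embeddings(open(path, encoding="utf-8"))`, where `path` is whichever file you downloaded.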
If you use this dataset, please cite:
- Manuscript: https://doi.org/10.5281/zenodo.17243812
- Data: This Zenodo dataset (using the DOI provided here)
Files
(13.3 GB)
MD5 checksum | Size |
---|---|
md5:e24affd84c75ec0ad862f8ac6d9f27b7 | 96.8 MB |
md5:51beaac5e2c95993c916734928f8248e | 96.9 MB |
md5:ecbfced20517cd9edccd51aeb5e453c9 | 96.9 MB |
md5:49d4965d9b1bb5cce8cfc404d359c2ee | 96.9 MB |
md5:c557e864685c22fe61cd351392db6b4b | 96.9 MB |
md5:32273e0c0d907c34ca58075eac230781 | 97.0 MB |
md5:218da9353d4b8a0f7c027b2f4bad1fbc | 96.8 MB |
md5:d65b514a34e1196642c563b19c2b03f3 | 97.0 MB |
md5:0c855c91e57eba17365336c4b0dbe31d | 96.8 MB |
md5:b73da14aab58ab09e9235cc0aadb5472 | 97.0 MB |
md5:f0d8e7a7dc4bf2a933649476b2025d64 | 96.8 MB |
md5:d89f893d90eff03c933d7ec6dbe37118 | 97.0 MB |
md5:24ab61e4408f3d352f9a3a93fa56a257 | 192.8 MB |
md5:46c7061640347c8a2bb14b16ab4505f8 | 193.2 MB |
md5:9438b56e8dcb77ff2f86f21882939392 | 192.7 MB |
md5:002519b9857d292921768f5e8a941673 | 193.2 MB |
md5:23708ddcd48cc45d5d4c45b91abec702 | 192.7 MB |
md5:3cac77d4b9c8a621e2c410df6b3ac01f | 193.2 MB |
md5:813d35b501beeca781c19e43ab3b70b0 | 192.6 MB |
md5:262c2225110cbf3872e5d7057276a593 | 193.3 MB |
md5:5a094431140d69010c34a299f40df725 | 192.5 MB |
md5:d14f2c431b6ba206e8413b12fcd7456f | 193.3 MB |
md5:13d6672d1a0157692f3adfd2a994277f | 192.5 MB |
md5:764dbd86a9d3aea14a6b76e6d6aa6128 | 193.2 MB |
md5:fa12a22253223705339665007951044e | 288.9 MB |
md5:bc175ce0f072dfa321c901d2473c8aa9 | 289.7 MB |
md5:1a6269d52a342b2fa550ec08f8f3365a | 288.6 MB |
md5:2b2f482675c0650464731bb2dcc3ec53 | 289.7 MB |
md5:83e8f978dca6445f62d35802c5faa00d | 288.4 MB |
md5:983ef8738b78c9065dee99741a2b5060 | 289.7 MB |
md5:5aee9a09df724628e6d279dd5fe94ca3 | 288.4 MB |
md5:25527d341bc74f09a3d2f5aa7c9635c8 | 289.7 MB |
md5:c30d428b3c2fda8e8417a62d5e702237 | 288.3 MB |
md5:5b798ee80b562aad17498465305eb6c9 | 289.8 MB |
md5:bf27595e334c14fa192439463371c645 | 288.2 MB |
md5:1ad9f4ba5f89222c1f1f107bd6df8fe2 | 289.6 MB |
md5:a27da249f6a4f1ee89a0da3023ba8a2b | 482.0 MB |
md5:df535f23839e02b9900da5267f4d69c8 | 483.2 MB |
md5:0f63bc583f39e95b08c71b33da218bea | 481.0 MB |
md5:9fad1367c30216a25a6c8cb0b2692eda | 483.4 MB |
md5:9fb117746a72c3c73965575e6d79c5e1 | 480.5 MB |
md5:98022a90948804bffc3e915a80f30d90 | 483.4 MB |
md5:ad241efd9817db753b50f792bda32da9 | 480.1 MB |
md5:e1280772545513e102ddf9461b75c0b9 | 483.4 MB |
md5:faadd99fad306d6df31a1c78ed9a63f5 | 479.9 MB |
md5:825aa7153da20a8cd6b7aa72e73feb6f | 483.3 MB |
md5:58ddefea4bdc5963a3109b5397160aab | 479.7 MB |
md5:56e9649ca68c0117bfa44efc42caf656 | 483.2 MB |
md5:6757a57511e1eb82f7f2ed6086356dfd | 49.0 MB |
md5:49407c6541fdc4fb46c1fafe23400678 | 49.0 MB |
md5:68b8a3a054be78dfb2dfb49b11236bfd | 49.0 MB |
md5:165e98f4c06f61080e594a21e67f5671 | 49.0 MB |
md5:f1087a2cbc8941078540cd6a19b9255b | 49.1 MB |
md5:518af4e8a62ba46eed8a0dd5a824c43b | 49.1 MB |
md5:d6e6e3e2d65c34a9c2cd144d945d0a5e | 49.0 MB |
md5:680fff02571111532cc11f33c4b818cd | 49.1 MB |
md5:aaff87b48cc67ed76bf4fce7e54583e0 | 49.0 MB |
md5:4af9aecf135e6321699dea8168c0db0e | 49.1 MB |
md5:4fc013dd02d53d7b82af94f3795d0624 | 49.0 MB |
md5:4d1bcc5476a2a0de4e6d63dcd7c86cf3 | 49.1 MB |
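Each file in the listing carries an MD5 checksum, which can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical and not part of the dataset or its repository.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed value such as 'md5:e24a...'.

    The optional 'md5:' prefix is stripped before comparing.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listed_md5(path, "md5:e24affd84c75ec0ad862f8ac6d9f27b7")` returns `True` only when the file at `path` (whichever file you downloaded) hashes to the listed value.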
Additional details
Related works
- Is supplement to
- Standard: 10.5281/zenodo.17243812 (DOI)
Software
- Repository URL
- https://github.com/SemanticPriming/word2manylanguages
- Programming language
- Python, R