Published October 13, 2025 | Version v1.0.0
Dataset | Open access

Galician word2vec embeddings trained on OpenSubtitles

  • 1. Harrisburg University of Science and Technology

Description

This dataset contains the subs2vec embeddings for Galician presented in https://zenodo.org/records/17243814. The embeddings were trained on the OpenSubtitles 2018 corpus (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles), a large-scale collection of film and television subtitles. The resulting semantic vector spaces therefore reflect naturalistic language use in film and television.

For this language, we provide every embedding variant explored in the study. The vectors were generated under different combinations of:

  • Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
  • Window size: multiple context window sizes (e.g., 2, 5, 10, …)

Each file corresponds to one unique configuration (dimension × window size).

Each file lists one word per row. Column 1 holds the word, and columns 2 through d + 1 hold its d embedding values, where d is the file's dimensionality.
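As a minimal sketch of reading one of these files, the Python below parses rows in the layout just described into a dictionary of vectors and compares two words by cosine similarity. It assumes plain UTF-8 text with whitespace-separated fields and no spaces inside words. The description does not say whether the files begin with a word2vec-style `<vocab_size> <dim>` header line, so the parser skips such a line only if it finds one. The names `parse_embeddings` and `cosine` are illustrative and are not part of the word2manylanguages repository.

```python
import math
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def parse_embeddings(lines: Iterable[str]) -> Tuple[Dict[str, List[float]], int]:
    """Parse rows of the form 'word v1 v2 ... vd' (assumed whitespace-separated).

    Column 1 is the word; columns 2..d+1 are its d embedding values.
    Returns (vectors, d). A first line that looks like a word2vec text
    header ('<vocab_size> <dim>') is skipped; whether these files carry
    one is an assumption we do not rely on.
    """
    vectors: Dict[str, List[float]] = {}
    dim: Optional[int] = None
    header_checked = False
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if not header_checked:
            header_checked = True
            if len(fields) == 2 and all(f.isdigit() for f in fields):
                continue  # looks like a '<vocab_size> <dim>' header
        word, values = fields[0], [float(x) for x in fields[1:]]
        if dim is None:
            dim = len(values)  # infer d from the first data row
        elif len(values) != dim:
            raise ValueError(f"line {lineno}: expected {dim} values, got {len(values)}")
        vectors[word] = values
    return vectors, dim or 0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Usage (the filename is hypothetical; substitute the file you downloaded):
# with open("gl_vectors.txt", encoding="utf-8") as fh:
#     vectors, d = parse_embeddings(fh)
# print(d, cosine(vectors["can"], vectors["gato"]))
```

For the larger files, a library loader such as gensim's `KeyedVectors.load_word2vec_format` (with `no_header=True` if the files lack a header line) will be faster and use far less memory than pure-Python lists. The sketch favors transparency over speed.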

If you use this dataset, please cite:

Files (13.3 GB)

MD5 checksum                       Size
e24affd84c75ec0ad862f8ac6d9f27b7   96.8 MB
51beaac5e2c95993c916734928f8248e   96.9 MB
ecbfced20517cd9edccd51aeb5e453c9   96.9 MB
49d4965d9b1bb5cce8cfc404d359c2ee   96.9 MB
c557e864685c22fe61cd351392db6b4b   96.9 MB
32273e0c0d907c34ca58075eac230781   97.0 MB
218da9353d4b8a0f7c027b2f4bad1fbc   96.8 MB
d65b514a34e1196642c563b19c2b03f3   97.0 MB
0c855c91e57eba17365336c4b0dbe31d   96.8 MB
b73da14aab58ab09e9235cc0aadb5472   97.0 MB
f0d8e7a7dc4bf2a933649476b2025d64   96.8 MB
d89f893d90eff03c933d7ec6dbe37118   97.0 MB
24ab61e4408f3d352f9a3a93fa56a257   192.8 MB
46c7061640347c8a2bb14b16ab4505f8   193.2 MB
9438b56e8dcb77ff2f86f21882939392   192.7 MB
002519b9857d292921768f5e8a941673   193.2 MB
23708ddcd48cc45d5d4c45b91abec702   192.7 MB
3cac77d4b9c8a621e2c410df6b3ac01f   193.2 MB
813d35b501beeca781c19e43ab3b70b0   192.6 MB
262c2225110cbf3872e5d7057276a593   193.3 MB
5a094431140d69010c34a299f40df725   192.5 MB
d14f2c431b6ba206e8413b12fcd7456f   193.3 MB
13d6672d1a0157692f3adfd2a994277f   192.5 MB
764dbd86a9d3aea14a6b76e6d6aa6128   193.2 MB
fa12a22253223705339665007951044e   288.9 MB
bc175ce0f072dfa321c901d2473c8aa9   289.7 MB
1a6269d52a342b2fa550ec08f8f3365a   288.6 MB
2b2f482675c0650464731bb2dcc3ec53   289.7 MB
83e8f978dca6445f62d35802c5faa00d   288.4 MB
983ef8738b78c9065dee99741a2b5060   289.7 MB
5aee9a09df724628e6d279dd5fe94ca3   288.4 MB
25527d341bc74f09a3d2f5aa7c9635c8   289.7 MB
c30d428b3c2fda8e8417a62d5e702237   288.3 MB
5b798ee80b562aad17498465305eb6c9   289.8 MB
bf27595e334c14fa192439463371c645   288.2 MB
1ad9f4ba5f89222c1f1f107bd6df8fe2   289.6 MB
a27da249f6a4f1ee89a0da3023ba8a2b   482.0 MB
df535f23839e02b9900da5267f4d69c8   483.2 MB
0f63bc583f39e95b08c71b33da218bea   481.0 MB
9fad1367c30216a25a6c8cb0b2692eda   483.4 MB
9fb117746a72c3c73965575e6d79c5e1   480.5 MB
98022a90948804bffc3e915a80f30d90   483.4 MB
ad241efd9817db753b50f792bda32da9   480.1 MB
e1280772545513e102ddf9461b75c0b9   483.4 MB
faadd99fad306d6df31a1c78ed9a63f5   479.9 MB
825aa7153da20a8cd6b7aa72e73feb6f   483.3 MB
58ddefea4bdc5963a3109b5397160aab   479.7 MB
56e9649ca68c0117bfa44efc42caf656   483.2 MB
6757a57511e1eb82f7f2ed6086356dfd   49.0 MB
49407c6541fdc4fb46c1fafe23400678   49.0 MB
68b8a3a054be78dfb2dfb49b11236bfd   49.0 MB
165e98f4c06f61080e594a21e67f5671   49.0 MB
f1087a2cbc8941078540cd6a19b9255b   49.1 MB
518af4e8a62ba46eed8a0dd5a824c43b   49.1 MB
d6e6e3e2d65c34a9c2cd144d945d0a5e   49.0 MB
680fff02571111532cc11f33c4b818cd   49.1 MB
aaff87b48cc67ed76bf4fce7e54583e0   49.0 MB
4af9aecf135e6321699dea8168c0db0e   49.1 MB
4fc013dd02d53d7b82af94f3795d0624   49.0 MB
4d1bcc5476a2a0de4e6d63dcd7c86cf3   49.1 MB
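Each file is published with an MD5 checksum, so a download can be checked before use. One way to do this is with Python's standard `hashlib`, as sketched below. The file is read in 1 MiB chunks so that even the roughly 480 MB files are never loaded into memory at once. The function names and the example path are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks (default 1 MiB)."""
    digest = hashlib.md5()
    with open(Path(path), "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path], expected: str) -> bool:
    """Check a file against a listed checksum, with or without an 'md5:' prefix."""
    expected = expected.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected


# Usage (hypothetical path; use the checksum listed next to the file you downloaded):
# verify("downloaded_file.txt", "md5:e24affd84c75ec0ad862f8ac6d9f27b7")
```

`str.removeprefix` requires Python 3.9 or later.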

Additional details

Related works

Is supplement to
Standard: 10.5281/zenodo.17243812 (DOI)

Software

Repository URL: https://github.com/SemanticPriming/word2manylanguages
Programming language: Python, R