3346007
doi
10.5281/zenodo.3346007
oai:zenodo.org:3346007
Altman, Russ B.
Stanford University
A global network of biomedical relationships derived from text
Percha, Bethany
Icahn School of Medicine at Mount Sinai
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
natural language processing
Medline
text mining
relation extraction
unsupervised learning
<p><strong>*** Note: Due to an issue with the 7/12/19 version of PubTator, this version of GNBR is missing some recent Medline citations. Please revert to Version 5 until we have a chance to publish Version 7. Apologies for the inconvenience.</strong></p>
<p>This repository contains labeled, weighted networks of chemical-gene, gene-gene, gene-disease, and chemical-disease relationships based on single sentences in PubMed abstracts. All raw dependency paths are provided in addition to the labeled relationships.</p>
<p>PART I: Connects dependency paths to labels, or "themes". Each record contains a dependency path followed by its score for each theme, and indicators of whether or not the path is part of the flagship path set for each theme (meaning that it was manually reviewed and determined to reflect that theme). The themes themselves are listed below and are in our paper (reference below).</p>
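<p>As a rough illustration of how a Part I file might be loaded, the sketch below reads a path-to-theme table into a dictionary. It assumes, without confirmation from this record, that the file is tab-separated, starts with a header row, stores the dependency path in the first column, and holds only numeric values (theme scores and 0/1 flagship indicators) in the remaining columns. The function names and the column names in the sample are hypothetical.</p>

```python
import csv
import gzip
from typing import Dict, Iterable


def read_theme_table(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Map each dependency path to its per-column numeric values.

    ASSUMPTION: tab-separated text, one header row, path in the first
    column, numeric scores and flagship indicators in the other columns.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    columns = header[1:]  # everything after the path column
    table: Dict[str, Dict[str, float]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = {name: float(value) for name, value in zip(columns, row[1:])}
    return table


def read_theme_file(filename: str) -> Dict[str, Dict[str, float]]:
    """Read a gzip-compressed Part I file from disk."""
    with gzip.open(filename, "rt", encoding="utf-8", newline="") as handle:
        return read_theme_table(handle)


# Invented two-line sample; the column names are placeholders, not the real header.
sample = ["path\tT\tT.ind\n", "start_entity|nmod|end_entity\t12.0\t1\n"]
themes = read_theme_table(sample)
```

<p>Holding a whole Part I table in memory is a design choice made here for simplicity; it is the smaller of the two file families in this release.</p>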
<p>PART II: Connects sentences to dependency paths. It consists of sentences and associated metadata, entity pairs found in the sentences, and dependency paths connecting those entity pairs. Each record contains the following information:</p>
<ul>
<li>PubMed ID</li>
<li>Sentence number (0 = title)</li>
<li>First entity name, formatted</li>
<li>First entity name, location (characters from start of abstract)</li>
<li>Second entity name, formatted</li>
<li>Second entity name, location</li>
<li>First entity name, raw string</li>
<li>Second entity name, raw string</li>
<li>First entity name, database ID(s)</li>
<li>Second entity name, database ID(s)</li>
<li>First entity type (Chemical, Gene, Disease)</li>
<li>Second entity type (Chemical, Gene, Disease)</li>
<li>Dependency path</li>
<li>Sentence, tokenized</li>
</ul>
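<p>The field list above can be turned into a small record type. This is a minimal sketch that assumes each Part II record is one tab-separated line with no header and exactly the fourteen fields in the order listed; the delimiter, the class name, and the field names are assumptions rather than documented facts. Entity locations are kept as strings because their exact encoding is not specified here.</p>

```python
from typing import NamedTuple


class GnbrRecord(NamedTuple):
    pmid: str
    sentence_number: int  # 0 means the title
    entity1_formatted: str
    entity1_location: str
    entity2_formatted: str
    entity2_location: str
    entity1_raw: str
    entity2_raw: str
    entity1_ids: str
    entity2_ids: str
    entity1_type: str  # Chemical, Gene, or Disease
    entity2_type: str
    dependency_path: str
    sentence: str


def parse_part_ii_line(line: str) -> GnbrRecord:
    """Split one line into the fourteen documented fields.

    ASSUMPTION: fields are tab-separated and appear in the listed order.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GnbrRecord._fields):
        raise ValueError(f"expected {len(GnbrRecord._fields)} fields, got {len(fields)}")
    fields[1] = int(fields[1])
    return GnbrRecord(*fields)


# Invented example line, for illustration only; not taken from the dataset.
example = "\t".join([
    "12345", "0", "drug_x", "0,6", "gene_y", "16,22", "Drug X", "Gene Y",
    "ID:1", "ID:2", "Chemical", "Gene",
    "START_ENTITY|nsubj|inhibits|dobj|END_ENTITY", "Drug X inhibits Gene Y .",
]) + "\n"
record = parse_part_ii_line(example)
```

<p>Because the Part II files are large, processing them line by line with a function like this is likely preferable to loading them whole.</p>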
<p>The "with-themes.txt" files only contain dependency paths with corresponding theme assignments from Part I. The plain ".txt" files contain all dependency paths.</p>
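<p>The file names in this release follow a regular pattern, so the name for a given entity pair can be built programmatically. The sketch below infers that pattern from the file list attached to this record; the helper names are hypothetical, and the pattern may not hold for other versions of the repository.</p>

```python
ENTITY_PAIRS = ("chemical-disease", "chemical-gene", "gene-disease", "gene-gene")
BASE_URL = "https://zenodo.org/records/3346007/files/"


def part_i_filename(pair: str) -> str:
    """Name of the path-to-theme file (Part I) for an entity pair."""
    if pair not in ENTITY_PAIRS:
        raise ValueError(f"unknown entity pair: {pair}")
    return f"part-i-{pair}-path-theme-distributions.txt.gz"


def part_ii_filename(pair: str, with_themes: bool = False) -> str:
    """Name of the sentence-to-path file (Part II) for an entity pair.

    with_themes=True selects the subset restricted to paths that have
    theme assignments in Part I; False selects the file with all paths.
    """
    if pair not in ENTITY_PAIRS:
        raise ValueError(f"unknown entity pair: {pair}")
    suffix = "-with-themes" if with_themes else ""
    return f"part-ii-dependency-paths-{pair}-sorted{suffix}.txt.gz"


def download_url(filename: str) -> str:
    """Full download URL for a file in this version of the record."""
    return BASE_URL + filename


url = download_url(part_ii_filename("chemical-gene", with_themes=True))
```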
<p>This release contains the annotated network for the <strong>July 12, 2019 version of PubTator</strong>. The version discussed in our paper, below, is an older one, from April 30, 2016. If you're interested in that network, it can be found in Version 1 of this repository. We will release updated networks periodically, as the PubTator community releases new versions of its named entity annotations for Medline roughly every month.</p>
<p>------------------------------------------------------------------------------------<br>
REFERENCES</p>
<p>Percha B, Altman RB (2018) A global network of biomedical relationships derived from text. <em>Bioinformatics,</em> 34(15): 2614-2624.<br>
Percha B, Altman RB (2015) Learning the structure of biomedical relationships from unstructured text. <em>PLoS Computational Biology,</em> 11(7): e1004216.</p>
<p>This project depends on named entity annotations from the PubTator project:<br>
https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/</p>
<p>Reference:<br>
Wei CH et al. (2013) PubTator: a web-based text mining tool for assisting biocuration. <em>Nucleic Acids Research,</em> 41(W1): W518-W522.</p>
<p>Dependency parsing was provided by the Stanford CoreNLP toolkit (<strong>version 3.9.1</strong>):<br>
https://stanfordnlp.github.io/CoreNLP/index.html</p>
<p>Reference:<br>
Manning, Christopher D., Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP Natural Language Processing Toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55-60.</p>
<p>------------------------------------------------------------------------------------<br>
THEMES</p>
<p><strong>chemical-gene</strong><br>
(A+) agonism, activation<br>
(A-) antagonism, blocking<br>
(B) binding, ligand (esp. receptors)<br>
(E+) increases expression/production<br>
(E-) decreases expression/production<br>
(E) affects expression/production (neutral)<br>
(N) inhibits</p>
<p><strong>gene-chemical</strong><br>
(O) transport, channels<br>
(K) metabolism, pharmacokinetics<br>
(Z) enzyme activity</p>
<p><strong>chemical-disease</strong><br>
(T) treatment/therapy (including investigatory)<br>
(C) inhibits cell growth (esp. cancers)<br>
(Sa) side effect/adverse event<br>
(Pr) prevents, suppresses<br>
(Pa) alleviates, reduces<br>
(J) role in disease pathogenesis</p>
<p><strong>disease-chemical</strong><br>
(Mp) biomarkers (of disease progression)</p>
<p><strong>gene-disease</strong><br>
(U) causal mutations<br>
(Ud) mutations affecting disease course<br>
(D) drug targets<br>
(J) role in pathogenesis<br>
(Te) possible therapeutic effect<br>
(Y) polymorphisms alter risk<br>
(G) promotes progression</p>
<p><strong>disease-gene</strong><br>
(Md) biomarkers (diagnostic)<br>
(X) overexpression in disease<br>
(L) improper regulation linked to disease</p>
<p><strong>gene-gene</strong><br>
(B) binding, ligand (esp. receptors)<br>
(W) enhances response<br>
(V+) activates, stimulates<br>
(E+) increases expression/production<br>
(E) affects expression/production (neutral)<br>
(I) signaling pathway<br>
(H) same protein or complex<br>
(Rg) regulation<br>
(Q) production by cell population</p>
<p>------------------------------------------------------------------------------------<br>
FORMATTING NOTE</p>
<p>A few users have mentioned that the dependency paths in the "part-i" files are all lowercase text, whereas those in the "part-ii" files maintain the case of the original sentence. This complicates mapping between the two sets of files.</p>
<p>We kept the part-ii files in the same case as the original sentence to facilitate downstream debugging: it is easier to tell which words in a sentence contribute to the dependency path when their original case is preserved. When working with the part-ii "with-themes" files, converting the dependency path to lowercase is guaranteed to yield a match with one of the paths in the corresponding part-i file, from which you can read the theme scores.</p>
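<p>The lowercase mapping described above might look like the following sketch. It assumes the part-i table has already been loaded into a dictionary keyed by lowercase dependency path; the function name, the sample path, and the sample column names are invented for illustration.</p>

```python
from typing import Dict, Optional

ThemeTable = Dict[str, Dict[str, float]]


def lookup_theme_scores(part_ii_path: str, theme_table: ThemeTable) -> Optional[Dict[str, float]]:
    """Return the part-i scores for a part-ii dependency path.

    The part-ii path keeps the sentence's original case, so it is
    lowercased before the lookup. Returns None when no entry exists,
    which can happen for paths taken from the plain ".txt" files.
    """
    return theme_table.get(part_ii_path.lower())


# Invented data: one lowercase part-i entry and a mixed-case part-ii path.
theme_table = {
    "start_entity|nsubj|inhibits|dobj|end_entity": {"N": 42.0, "N.ind": 1.0},
}
scores = lookup_theme_scores("START_ENTITY|nsubj|Inhibits|dobj|END_ENTITY", theme_table)
missing = lookup_theme_scores("START_ENTITY|dep|END_ENTITY", theme_table)
```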
<p>Apologies for the additional complexity, and please reach out to us if you have any questions (see correspondence information in the <em>Bioinformatics</em> manuscript, above).</p>
Zenodo
2019-07-22
info:eu-repo/semantics/other
1035252
1579893914.584581
408845671
md5:45361876c61aa7bb8d9fdf29d7946c4b
https://zenodo.org/records/3346007/files/part-ii-dependency-paths-chemical-disease-sorted-with-themes.txt.gz
875596770
md5:3e2e021cab9a5c686c544a41607bf00a
https://zenodo.org/records/3346007/files/part-ii-dependency-paths-chemical-gene-sorted.txt.gz
155095510
md5:76e8d583bcbb4ab93a58890aa75ab292
https://zenodo.org/records/3346007/files/part-ii-dependency-paths-chemical-gene-sorted-with-themes.txt.gz
1121870814
md5:0d42fe2a13cd92d5caab0144711aa96d
https://zenodo.org/records/3346007/files/part-ii-dependency-paths-gene-disease-sorted.txt.gz
325360921
md5:34a5c72d0cae79f0f7f5d7cd644d6a64
https://zenodo.org/records/3346007/files/part-ii-dependency-paths-gene-disease-sorted-with-themes.txt.gz
72301879
md5:0a3eb8da2baf14b1d716d59be785f661
https://zenodo.org/records/3346007/files/part-i-chemical-disease-path-theme-distributions.txt.gz
396481380
md5:01df449d27020002168b8e33505b3159
https://zenodo.org/records/3346007/files/part-ii-dependency-paths-gene-gene-sorted-with-themes.txt.gz
66321595
md5:da90d026a14fe27307a72ae3c25686f2
https://zenodo.org/records/3346007/files/part-i-gene-disease-path-theme-distributions.txt.gz
1471012994
md5:8575ba81c5be54764d37a0f7b936fcb6
https://zenodo.org/records/3346007/files/part-ii-dependency-paths-chemical-disease-sorted.txt.gz
2579242017
md5:869a6c389972cf6b3c0e8602d8526aa7
https://zenodo.org/records/3346007/files/part-ii-dependency-paths-gene-gene-sorted.txt.gz
25579790
md5:725efc926b63e52abd37ee755eee0642
https://zenodo.org/records/3346007/files/part-i-chemical-gene-path-theme-distributions.txt.gz
54693465
md5:604f26d79d21f697fb51547133257b9e
https://zenodo.org/records/3346007/files/part-i-gene-gene-path-theme-distributions.txt.gz
public
10.5281/zenodo.1035252
isVersionOf
doi