BIP4COVID19: Impact metrics and indicators for coronavirus related publications
Creators
- 1. Athena Research Center
Description
This dataset contains impact metrics and indicators for a set of publications related to the COVID-19 infectious disease and the coronavirus that causes it. It is based on (a) the CORD-19 dataset released by the team of Semantic Scholar [1] and (b) the curated data provided by the LitCovid hub [2].
These data have been cleaned and integrated with data from other sources (e.g., PMC). The result was a subset of the COVID-19 dataset (34,248 unique articles). We constructed the underlying citation network and used it to produce, for each article, the values of the following impact measures, using the PaperRanking (https://github.com/diwis/PaperRanking) library [3]:
- Citation-based influence: This is based on the PageRank [3] network analysis method. In the context of citation networks, it estimates the importance of each article based on its centrality in the network. Since it considers the whole network, it is an indicator of long-term impact.
- Citation-based popularity: This is based on the RAM [4] citation network analysis method. Methods like PageRank are biased against recently published articles, because new articles need time to receive their first citations. RAM alleviates this problem using an approach known as "time-awareness", which makes it more suitable for capturing the short-term impact of a publication.
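The difference between the two measures can be illustrated on a toy citation network. The following is a minimal sketch, not the PaperRanking library's code or API: the function names, the damping factor `alpha`, the decay factor `gamma`, and the four-paper network are all illustrative assumptions. The popularity function follows the general idea of RAM as described by Ghosh et al., where each citation is weighted by how recent the citing article is.

```python
# Toy sketch of the two measures; all names and parameter values are
# illustrative assumptions, not those used to build the dataset.

def all_papers(citations):
    """Collect every paper that cites or is cited."""
    papers = set(citations)
    for refs in citations.values():
        papers.update(refs)
    return sorted(papers)

def pagerank(citations, alpha=0.5, iterations=100):
    """Power iteration for PageRank. citations maps paper -> papers it cites."""
    papers = all_papers(citations)
    n = len(papers)
    score = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new = {p: (1.0 - alpha) / n for p in papers}
        for p in papers:
            refs = citations.get(p, [])
            if refs:
                # A paper passes its score evenly to the papers it cites.
                share = alpha * score[p] / len(refs)
                for r in refs:
                    new[r] += share
            else:
                # A paper with no references spreads its score over all papers.
                share = alpha * score[p] / n
                for q in papers:
                    new[q] += share
        score = new
    return score

def ram(citations, year, current_year, gamma=0.5):
    """Time-aware citation count: a citation made by a paper published
    `age` years ago contributes gamma ** age instead of 1."""
    score = {p: 0.0 for p in all_papers(citations)}
    for citing, refs in citations.items():
        weight = gamma ** (current_year - year[citing])
        for r in refs:
            score[r] += weight
    return score

# Hypothetical network: D cites C, C cites A and B, B cites A.
citations = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["C"]}
year = {"A": 2015, "B": 2018, "C": 2019, "D": 2020}

influence = pagerank(citations)
popularity = ram(citations, year, current_year=2020)

print(max(influence, key=influence.get))    # oldest, most central paper
print(max(popularity, key=popularity.get))  # paper with the most recent citation
```

In this toy network the oldest article A ranks first by influence, while C, which was cited most recently, ranks first by popularity.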
We provide two CSV files containing the same information: one has its entries ordered by influence score, the other by popularity score. Both files are tab-separated and have the same columns (DOI, PMC_id, PubMed_id, popularity_score, influence_score).
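A file in this layout could be read with Python's standard `csv` module, as in the sketch below. It assumes that the first line of each file is a header row (the description does not say so) and that both score columns hold plain decimal numbers; the helper name `read_scores` and the two sample records are made up for illustration.

```python
import csv
import io

# Column order as given in the dataset description.
COLUMNS = ["DOI", "PMC_id", "PubMed_id", "popularity_score", "influence_score"]

def read_scores(handle, has_header=True):
    """Read tab-separated rows into dicts, converting both scores to float.

    has_header=True is an assumption; pass False if the file has no header.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header line
    rows = []
    for fields in reader:
        record = dict(zip(COLUMNS, fields))
        record["popularity_score"] = float(record["popularity_score"])
        record["influence_score"] = float(record["influence_score"])
        rows.append(record)
    return rows

# Made-up sample standing in for a downloaded file.
sample = (
    "DOI\tPMC_id\tPubMed_id\tpopularity_score\tinfluence_score\n"
    "10.1000/example-a\tPMC0000001\t00000001\t0.25\t0.002\n"
    "10.1000/example-b\tPMC0000002\t00000002\t0.75\t0.001\n"
)
rows = read_scores(io.StringIO(sample))

# Re-sort by either score, mirroring the two orderings of the provided files.
by_popularity = sorted(rows, key=lambda r: r["popularity_score"], reverse=True)
by_influence = sorted(rows, key=lambda r: r["influence_score"], reverse=True)
print(by_popularity[0]["DOI"], by_influence[0]["DOI"])
```

For a real download, the same function can be given a file opened with `open(path, newline="", encoding="utf-8")`.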
The work is based on the following publications:
- COVID-19 Open Research Dataset (CORD-19). 2020. Version 2020-03-23. Retrieved from https://pages.semanticscholar.org/coronavirus-research. Accessed 2020-03-23. doi:10.5281/zenodo.3715506
- Chen Q, Allot A, & Lu Z. (2020) Keep up with the latest coronavirus research, Nature 579:193 (version 2020-03-25)
- I. Kanellos, T. Vergoulis, D. Sacharidis, T. Dalamagas, Y. Vassiliou: Impact-Based Ranking of Scientific Publications: A Survey and Experimental Evaluation. TKDE 2019
- Rumi Ghosh, Tsung-Ting Kuo, Chun-Nan Hsu, Shou-De Lin, and Kristina Lerman. 2011. Time-Aware Ranking in Dynamic Citation Networks. In Data Mining Workshops (ICDMW). 373–380
- L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report. Stanford InfoLab.
Terms of use: These data are provided "as is", without any warranties of any kind. The data are provided under the Creative Commons Attribution 4.0 International license.
Files
Two files, 4.7 MB in total, including articles_by_influence.csv.
Checksum | Size |
---|---|
md5:a81d4969d0721389722c50e45c672715 | 2.3 MB |
md5:18996396c23ab78a3cc1f634c21ff6bd | 2.3 MB |
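The md5 values listed for the files can be used to check a download for corruption. The sketch below uses Python's standard `hashlib`; the helper names and the local file path in the usage note are hypothetical.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("articles_by_influence.csv", "md5:a81d4969d0721389722c50e45c672715")` would return True if the local copy matches that listed checksum; which of the two checksums belongs to which file is not stated here, so a file may need to be tried against both.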