These files include the raw data and some analysis for the paper:

Piwowar H, Priem J, Larivière V, Alperin JP, Matthias L, Norlander B, Farley A, West J, Haustein S. (2017) The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles. PeerJ Preprints 5:e3119v1 https://doi.org/10.7287/peerj.preprints.3119

The remainder of the analysis can be run using the R scripts here: https://github.com/Impactstory/oadoi-paper1

For more details on oaDOI, including how to use its API to get more data and updated data, see http://oadoi.org

Analysis files:
- accuracy_analysis.xlsx
- wos_analysis.xlsx

Raw data files:
- crossref_100k.csv.gz
- wos_100k.csv
- unpaywall_100k.csv.gz

Columns for raw data files:

doi (string)
The DOI for the article, from Crossref.

evidence (string)
How we found this OA location. Example values:
* "oa journal (via journal title in doaj)": We found the name of the journal that publishes this article in the DOAJ database.
* "oa repository (via pmcid lookup)": We found this article in an index of PubMed Central articles.

oa_color_long (string)
The OA "color" of the open copy we found. See the paper for details.

best_open_url (string)
The URL of the open copy we found. Although this URL points to full text of some kind, there is no format normalization: it could be PDF, HTML, or even Word or TeX.

year (int)
The year of the article, from Crossref.

found_green (bool)
True if we found a green copy, even if we also found a hybrid, gold, or bronze copy. See the paper for details.

journal (string)
The name of the journal publishing this article, from Crossref. The same journal may have multiple name strings (e.g., "J. Foo", "Journal of Foo", "JOURNAL OF FOO"). These have not been fully normalized within our database, so use with care.

publisher (string)
The name of this paper's publisher, from Crossref. Keep in mind that publisher name strings change over time, particularly as publishers are acquired or split up.
license (string)
The license under which this copy is published; null when not found. We return several types of licenses:
* Creative Commons licenses are uniformly abbreviated and lowercased. Example: cc-by-nc
* Publisher-specific licenses are normalized using this format: acs-specific: authorchoice/editors choice usage agreement
* When we have evidence that an OA license of some kind was used, but it’s not reported directly on the webpage at this location, this field returns implied-oa

random (float)
A random number.

Sample: first few lines of crossref_100k.csv

doi,evidence,oa_color_long,best_open_url,year,found_green,journal,publisher,license,random
10.1121/1.4806593,oa repository (via BASE title and first author match),green,https://eprints.soton.ac.uk/353400/1/ICA_2013_Exploration%2520of%2520the%2520differences%2520between%2520the%2520PU%2520and%2520reverberant%2520room%2520method.pdf,2013,TRUE,J. Acoust. Soc. Am.,Acoustical Society of America (ASA),,0
10.1121/1.1318900,oa repository (via BASE doi match),green,http://orbit.dtu.dk/files/5417578/Tarnow.pdf,2000,TRUE,J. Acoust. Soc. Am.,Acoustical Society of America (ASA),,0
10.1063/1.2357929,oa repository (via BASE title and first author match),green,http://arxiv.org/pdf/cond-mat/0606081,2006,TRUE,The Journal of Chemical Physics,AIP Publishing,,0
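The columns above can be read into typed records with only the Python standard library. This is a minimal sketch, not part of the released analysis (which uses R). It assumes the files are UTF-8 CSV with a header row as in the sample, that found_green is written as TRUE/FALSE, and that missing values are empty cells; treating "NA" as missing is also a guess, since the files may have been written from R. The helper names parse_row, read_rows, load_csv_gz and green_share_by_year are hypothetical.

```python
import csv
import gzip
import io
from collections import defaultdict

# Three rows copied from the sample of crossref_100k.csv above.
SAMPLE = """doi,evidence,oa_color_long,best_open_url,year,found_green,journal,publisher,license,random
10.1121/1.4806593,oa repository (via BASE title and first author match),green,https://eprints.soton.ac.uk/353400/1/ICA_2013_Exploration%2520of%2520the%2520differences%2520between%2520the%2520PU%2520and%2520reverberant%2520room%2520method.pdf,2013,TRUE,J. Acoust. Soc. Am.,Acoustical Society of America (ASA),,0
10.1121/1.1318900,oa repository (via BASE doi match),green,http://orbit.dtu.dk/files/5417578/Tarnow.pdf,2000,TRUE,J. Acoust. Soc. Am.,Acoustical Society of America (ASA),,0
10.1063/1.2357929,oa repository (via BASE title and first author match),green,http://arxiv.org/pdf/cond-mat/0606081,2006,TRUE,The Journal of Chemical Physics,AIP Publishing,,0
"""

# Assumption: empty cells (and possibly "NA") mean "not found".
MISSING = {"", "NA"}


def _text(value):
    """Return the string value, or None if the cell looks missing."""
    return None if value is None or value.strip() in MISSING else value


def parse_row(raw):
    """Convert one CSV row (all strings) into typed values."""
    year = _text(raw["year"])
    rand = _text(raw["random"])
    return {
        "doi": raw["doi"],
        "evidence": _text(raw["evidence"]),
        "oa_color_long": _text(raw["oa_color_long"]),
        "best_open_url": _text(raw["best_open_url"]),
        "year": int(year) if year else None,
        # Assumption: booleans are written as TRUE/FALSE, as in the sample.
        "found_green": (raw["found_green"] or "").strip().upper() == "TRUE",
        "journal": _text(raw["journal"]),
        "publisher": _text(raw["publisher"]),
        "license": _text(raw["license"]),  # None when no license was found
        "random": float(rand) if rand else None,
    }


def read_rows(text_stream):
    """Parse every data row of an open text stream with a header line."""
    return [parse_row(r) for r in csv.DictReader(text_stream)]


def load_csv_gz(path):
    """Read a gzip-compressed file such as crossref_100k.csv.gz."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return read_rows(f)


def green_share_by_year(rows):
    """Fraction of rows per year whose found_green flag is True."""
    totals = defaultdict(int)
    green = defaultdict(int)
    for row in rows:
        if row["year"] is None:
            continue
        totals[row["year"]] += 1
        if row["found_green"]:
            green[row["year"]] += 1
    return {year: green[year] / totals[year] for year in sorted(totals)}


rows = read_rows(io.StringIO(SAMPLE))
print(green_share_by_year(rows))
```

On the three sample rows every article has a green copy, so each year maps to 1.0. For the full file, call load_csv_gz("crossref_100k.csv.gz") instead of reading SAMPLE; wos_100k.csv is uncompressed and can be opened with the built-in open(path, newline="", encoding="utf-8").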
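Because journal name strings are not fully normalized, any per-journal counts need some grouping key. The sketch below is a crude, hypothetical normalization (it is not what oaDOI does): it folds case and accents, strips punctuation, expands a tiny hand-written abbreviation table, and drops a few stopwords. The ABBREVIATIONS and STOPWORDS tables and the journal_key name are illustrative assumptions; real ISO 4 abbreviations such as "Acoust." or "Soc." would need a much larger mapping, so "J. Acoust. Soc. Am." still will not match its full title.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical, deliberately small tables for illustration only.
ABBREVIATIONS = {"j": "journal"}
STOPWORDS = {"of", "the", "and", "for"}


def journal_key(name):
    """Build a rough grouping key for a journal name string."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and turn punctuation into spaces.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    words = []
    for word in text.split():
        word = ABBREVIATIONS.get(word, word)
        if word not in STOPWORDS:
            words.append(word)
    return " ".join(words)


def group_journal_names(names):
    """Map each key to the set of raw name strings that produced it."""
    groups = defaultdict(set)
    for name in names:
        groups[journal_key(name)].add(name)
    return dict(groups)


print(group_journal_names(["J. Foo", "Journal of Foo", "JOURNAL OF FOO"]))
```

The three example spellings from the journal column description all collapse to the key "journal foo". Any such heuristic can also merge distinct journals, so spot-check the groups before relying on them.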