Published October 27, 2025 | Version v1
Poster | Open access

HardToCache: An Automated Framework for Quantifying and Enhancing Reproducibility Risks in Computational Research

  • 1. Elizabeth City State University
  • 2. Morehouse College
  • 3. Mississippi Valley State University

Description

Computational research faces a significant reproducibility crisis: an estimated 70% of studies fail to replicate because of undocumented dependencies, environmental drift, and fragmented workflows. To address this problem, we introduce HardToCache, a novel framework for systematically quantifying reproducibility risks across research artifacts.

Our methodology uses an automated pipeline that (1) collects code and data from publications with scraping tools, (2) scores each artifact on an 8-dimensional scorecard, rating every dimension from 0 to 5, and (3) shares the results through visualizations and open GitHub repositories. Our evaluation reveals critical gaps: few papers provide fully executable code or detailed environment specifications.
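The scoring step in (2) can be pictured as a small data structure. The following is a minimal sketch under stated assumptions, not HardToCache's actual implementation: the eight dimension names are hypothetical placeholders (this record does not list them), the example ratings are made up, and treating "risk" as the complement of the normalized score is likewise an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder names: the scorecard has eight dimensions rated 0-5,
# but this record does not name them.
DIMENSIONS = (
    "code_availability",
    "data_availability",
    "environment_spec",
    "dependency_pinning",
    "documentation",
    "executability",
    "workflow_automation",
    "licensing",
)
MIN_SCORE, MAX_SCORE = 0, 5


@dataclass
class Scorecard:
    """One artifact's ratings on the eight (hypothetical) dimensions."""
    artifact_id: str
    scores: dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Require exactly the eight dimensions, each rated with an integer in [0, 5].
        missing = set(DIMENSIONS) - set(self.scores)
        extra = set(self.scores) - set(DIMENSIONS)
        if missing or extra:
            raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
        for name, value in self.scores.items():
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or not MIN_SCORE <= value <= MAX_SCORE:
                raise ValueError(f"{name} must be an integer in [0, 5], got {value!r}")

    def total(self) -> int:
        # Raw sum across all dimensions: 0 (worst) to 40 (best).
        return sum(self.scores.values())

    def normalized(self) -> float:
        # Total rescaled to the range [0, 1].
        return self.total() / (MAX_SCORE * len(DIMENSIONS))

    def risk(self) -> float:
        # Assumed definition: reproducibility risk is 1 minus the normalized score.
        return 1.0 - self.normalized()

    def gaps(self, threshold: int = 2) -> list[str]:
        # Dimensions rated at or below the threshold, listed in DIMENSIONS order.
        return [d for d in DIMENSIONS if self.scores[d] <= threshold]


# Illustrative, made-up ratings for a single artifact.
card = Scorecard("paper-001", {
    "code_availability": 5, "data_availability": 4, "environment_spec": 1,
    "dependency_pinning": 0, "documentation": 3, "executability": 2,
    "workflow_automation": 1, "licensing": 5,
})
print(card.total())                 # 21
print(round(card.normalized(), 3))  # 0.525
print(card.gaps())                  # ['environment_spec', 'dependency_pinning', 'executability', 'workflow_automation']
```

Validating every rating when a scorecard is created keeps malformed entries out of later aggregation, and `gaps()` surfaces the low-rated dimensions, which is one way to make the kind of gaps noted in the evaluation explicit per artifact.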

The framework builds on Python (including the pandas and matplotlib libraries), Google Colab, and GitHub to streamline data collection, analysis, and dissemination. Future work will extend the framework to the biomedical and climate domains and develop browser extensions that score reproducibility in real time.
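As an illustration of how the pandas and matplotlib part of this toolchain might fit together, the sketch below tabulates scorecards, ranks papers by total score, and plots the mean rating per dimension. It is an assumption-laden example, not the authors' code: the dimension names are hypothetical placeholders, and the three rows of ratings are made-up values, not findings from the evaluation. The figure is written to an in-memory buffer to avoid file-system side effects; in Colab one could pass a file path to `plot_dimension_means` instead.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical placeholder names for the eight scorecard dimensions.
DIMENSIONS = [
    "code_availability", "data_availability", "environment_spec",
    "dependency_pinning", "documentation", "executability",
    "workflow_automation", "licensing",
]

# Made-up example ratings (0-5) for three papers; illustrative only.
records = [
    {"paper": "paper-A", "code_availability": 5, "data_availability": 4,
     "environment_spec": 1, "dependency_pinning": 0, "documentation": 3,
     "executability": 2, "workflow_automation": 1, "licensing": 5},
    {"paper": "paper-B", "code_availability": 3, "data_availability": 2,
     "environment_spec": 0, "dependency_pinning": 0, "documentation": 2,
     "executability": 1, "workflow_automation": 0, "licensing": 4},
    {"paper": "paper-C", "code_availability": 5, "data_availability": 5,
     "environment_spec": 4, "dependency_pinning": 3, "documentation": 4,
     "executability": 4, "workflow_automation": 3, "licensing": 5},
]


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a per-paper total (0-40), sorted best first."""
    out = df.copy()
    out["total"] = out[DIMENSIONS].sum(axis=1)
    return out.sort_values("total", ascending=False)


def plot_dimension_means(df: pd.DataFrame, target) -> pd.Series:
    """Bar-chart the mean rating per dimension and save it as a PNG to `target`."""
    means = df[DIMENSIONS].mean()
    fig, ax = plt.subplots(figsize=(8, 4))
    means.plot.bar(ax=ax)
    ax.set_ylim(0, 5)
    ax.set_ylabel("Mean score (0-5)")
    ax.set_title("Mean rating per dimension")
    fig.tight_layout()
    fig.savefig(target, format="png", dpi=150)
    plt.close(fig)  # release the figure's memory
    return means


df = pd.DataFrame(records).set_index("paper")
summary = summarize(df)
buffer = io.BytesIO()
means = plot_dimension_means(df, buffer)
csv_text = summary.to_csv()  # plain-text table of scores and totals
```

Called without a path, `to_csv()` returns the table as a string, which could be saved and committed to a GitHub repository next to the exported figure.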

HardToCache establishes a measurable foundation for auditing and improving reproducibility and for promoting transparency in computational science.

Files

Gateways2025_paper_13 (1).pdf (1.8 MB)
md5:b72a6d9ca75e5ee7baca4c3763712d26