What is Reproducible Research?
Research is considered reproducible when its exact results can be regenerated by anyone given access to the original data, software, and code. Reproducible research is also referred to as reproducibility, reproducible statistical analysis, reproducible data analysis, reproducible reporting, and literate programming.
What needs to be reproduced
What needs to be reproduced is typically:
- The actual results themselves, including:
  - Values reported in the text
  - The statistical evidence in support of the findings (e.g., p-values, confidence intervals, credible intervals)
Requirements for demonstrating reproducibility
There is widespread agreement that research can only be reproducible when:
- The "raw" data is available, where "raw" refers to the data prior to any manipulation by the researcher (e.g., prior to any data cleaning and transformation).
- A complete set of instructions is provided, explaining every step used in processing and analyzing the data.
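One way to satisfy both conditions is to leave the raw data untouched and express every cleaning and transformation step as code, so the code itself is the complete set of instructions. The following is a minimal sketch under stated assumptions: the inlined data, the column names, and the function names are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical raw data, inlined so the sketch is self-contained.
# In practice this would be an untouched file shared alongside the code.
RAW_CSV = """id,age,score
1,34,7.5
2,,6.0
3,29,8.25
"""


def load_raw(text):
    """Step 1: read the raw records exactly as supplied."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_missing_age(rows):
    """Step 2 (cleaning): exclude records with no recorded age."""
    return [row for row in rows if row["age"] != ""]


def to_numeric(rows):
    """Step 3 (transformation): convert text fields to numbers."""
    return [
        {"id": int(r["id"]), "age": int(r["age"]), "score": float(r["score"])}
        for r in rows
    ]


def mean_score(rows):
    """Step 4 (analysis): compute the reported statistic."""
    return sum(r["score"] for r in rows) / len(rows)


raw = load_raw(RAW_CSV)                     # raw data is never modified
clean = to_numeric(drop_missing_age(raw))   # every manipulation is a named step
result = mean_score(clean)
print(f"Records: {len(raw)} raw, {len(clean)} after cleaning; mean score = {result}")
```

Because each manipulation is a named function applied to the raw records, a reader can see exactly which records were excluded and why, and can rerun the whole chain from the raw data.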
In practice, when organizations (e.g., publishers of journals) require that research be demonstrated to be reproducible, they typically impose some or all of the following additional requirements:
- A set of files is provided containing the data and code, and it is possible to create the tables and any data-derived charts/graphics/visualizations by running the code.
- Details about the system used to run the analysis are listed: operating system, patches, random number seeds, and the specific versions of all software/packages/libraries.
- The code is written in a way that can be readily understood.
- Open/transparent. All the data and materials are available (as opposed to "available upon request") -- e.g., posted on GitHub, or in an international data repository.
- Verified. That is, either:
  - Another party (e.g., a reviewer) has successfully reproduced the results and certified them as such.
  - Logs demonstrate that the key results were successfully created from the inputs.
  - The key results are linked to the data and code, so the relationship can be directly inspected.
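Several of the requirements above (listing system details and random number seeds, and keeping logs that tie results to inputs) can be met by having the analysis write a small machine-readable record each time it runs. The following is a minimal sketch, not a standard format: the seed value, the field names, the toy analysis, and the inlined input are all assumptions made for illustration.

```python
import hashlib
import json
import platform
import random
import sys

SEED = 12345  # hypothetical seed; any fixed, reported value would do


def sha256_of(data: bytes) -> str:
    """Fingerprint the input so the log identifies exactly which data was used."""
    return hashlib.sha256(data).hexdigest()


def run_analysis(values, seed):
    """Toy analysis: mean of one bootstrap resample drawn with a seeded generator."""
    rng = random.Random(seed)
    resample = [rng.choice(values) for _ in values]
    return sum(resample) / len(resample)


def build_log(input_bytes, result, seed):
    """Collect the details a reviewer would need in order to rerun the analysis."""
    return {
        "python_version": sys.version.split()[0],
        "operating_system": platform.platform(),
        "random_seed": seed,
        "input_sha256": sha256_of(input_bytes),
        "result": result,
    }


# Hypothetical input, inlined so the sketch is self-contained.
input_bytes = b"4.0,5.5,6.0,7.25\n"
values = [float(x) for x in input_bytes.decode().strip().split(",")]

result = run_analysis(values, SEED)
log = build_log(input_bytes, result, SEED)
print(json.dumps(log, indent=2))
```

Rerunning the script with the same input and seed on the same Python version should produce the same result and the same input hash, which is what lets the log serve as evidence. A real project would also record the versions of any third-party packages, for example with `importlib.metadata.version` from the standard library.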
A final requirement, which is sometimes known as literate programming, is that:
- The entire report is written using code. That is, a file or files are provided which, when run, import the data, produce all the results, insert the results into the text of the report, and format the report.
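The idea can be illustrated with a very small sketch in which the numbers in a sentence are computed and inserted by code rather than typed by hand. This is an illustration only, using Python's standard library; the data, the template wording, and the variable names are hypothetical, and dedicated literate programming tools go much further by also producing tables, charts, and formatting.

```python
import statistics
from string import Template

# Hypothetical data, inlined so the sketch is self-contained;
# a real report would import it from the shared data files.
data = [12.0, 15.5, 11.0, 14.5, 13.0]

# Produce the results.
mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation

# The report text holds placeholders rather than hand-typed numbers.
REPORT_TEMPLATE = Template(
    "We analyzed $n observations. "
    "The mean was $mean (SD = $sd)."
)

# Insert the computed results into the text of the report.
report = REPORT_TEMPLATE.substitute(
    n=len(data),
    mean=f"{mean:.2f}",
    sd=f"{sd:.2f}",
)
print(report)
```

If the data changes, rerunning the file updates every reported value, so the text cannot drift out of step with the analysis.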
Benefits of reproducible research
Reproducibility has the following benefits:
- Increased likelihood that the research is correct
- Easier to check the research
- Easier to reproduce the research independently
- Easier to extend the research
- Reusable code and instructions, resulting in increased efficiency
Reproducible versus replicable
A closely related concept is replicability: the idea that independent researchers, using different methods, can arrive at the same research results.