What is Reproducible Research?
Research is considered reproducible when its exact results can be regenerated by anyone given access to the original data, software, or code. Reproducible research is also referred to as reproducibility, reproducible statistical analysis, reproducible data analysis, reproducible reporting, or literate programming.
What needs to be reproduced
What typically needs to be reproduced is the actual results themselves, including:
- Values reported in the text.
- The statistical evidence in support of the findings (e.g., p-values, confidence intervals, credible intervals).
Requirements for demonstrating reproducibility
There is widespread agreement that research can only be reproducible when:
- The "raw" data is available, where "raw" refers to the data prior to any manipulation by the researcher (e.g., prior to any data cleaning and transformation).
- A complete set of instructions is provided explaining all steps used in processing and analyzing the data.
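One way to satisfy both conditions is to leave the raw data untouched and express every processing step as code, so that the code itself serves as the complete set of instructions. The following is a minimal sketch under stated assumptions: the embedded data, the column names, and the functions `load_raw`, `clean`, and `analyze` are hypothetical and chosen only for illustration.

```python
import csv
import io
import statistics

# Hypothetical raw data, embedded so the sketch is self-contained.
# In practice this would be a read-only file that is never edited by hand.
RAW_CSV = """respondent,age,score
1,34,7.5
2,,6.0
3,29,8.0
4,41,not answered
"""


def load_raw(text):
    """Step 1: read the raw data exactly as collected."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Step 2: drop rows with a missing age or a non-numeric score."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"age": int(row["age"]), "score": float(row["score"])})
        except ValueError:
            continue  # the exclusion rule is recorded in code, not applied by hand
    return cleaned


def analyze(rows):
    """Step 3: compute the reported statistic."""
    return statistics.mean(row["score"] for row in rows)


raw = load_raw(RAW_CSV)
cleaned = clean(raw)
result = analyze(cleaned)
print(f"n = {len(cleaned)}, mean score = {result:.2f}")
```

Because the cleaning rule lives in `clean` rather than in a manually edited copy of the data, anyone holding the raw data can rerun the same steps and compare the output.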
In practice, when organizations (e.g., publishers of journals) require that research be demonstrated to be reproducible, they impose some or all of the following additional requirements:
- A set of files is provided containing the data and code, and it is possible to create the tables and any data-derived charts/graphics/visualizations by running the code.
- Details of the system used to run the analysis are listed: operating system, patches, random number seeds, and the specific versions of all software, packages, and libraries.
- The code is written in a way that can be readily understood.
- The research is open and transparent: all the data and materials are publicly available (as opposed to "available upon request"), e.g., posted on GitHub or in an international data repository.
- Reproducibility has been demonstrated. That is, one of the following holds:
  - Another party (e.g., a reviewer) has successfully reproduced the results and certified them as such.
  - Logs demonstrate that key results were successfully created from the inputs.
  - The key results are linked to the data and code, so the relationship can be directly inspected.
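Several of these requirements (system details, random number seeds, software versions, and a log tying results to inputs) can be captured in a small machine-readable record written at the end of each run. The sketch below is one possible approach, not a standard format: the manifest fields, the seed value, the in-memory input data, and the package names passed to `package_versions` are all assumptions made for illustration.

```python
import hashlib
import json
import platform
import random
import sys
from importlib import metadata

SEED = 20240101  # hypothetical seed; any fixed, recorded value works


def sha256_of(data: bytes) -> str:
    """Fingerprint some bytes so inputs and outputs can be matched later."""
    return hashlib.sha256(data).hexdigest()


def package_versions(names):
    """Look up installed versions of the named packages."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def build_manifest(input_bytes, output_text, packages):
    """Collect the details a reviewer would need to rerun the analysis."""
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "seed": SEED,
        "packages": package_versions(packages),
        "input_sha256": sha256_of(input_bytes),
        "output_sha256": sha256_of(output_text.encode("utf-8")),
    }


# Hypothetical analysis: a seeded random subsample of the input values.
random.seed(SEED)
input_bytes = b"1.0\n2.0\n3.0\n4.0\n"
values = [float(x) for x in input_bytes.decode().split()]
sample = random.sample(values, k=2)
output_text = f"sample mean = {sum(sample) / len(sample):.3f}"

manifest = build_manifest(input_bytes, output_text, ["numpy", "pandas"])
print(json.dumps(manifest, indent=2))
```

Rerunning the script with the same seed and the same input should yield the same `output_sha256`, which gives a reviewer a quick check that the key result was regenerated from the stated inputs.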
A final requirement, which is sometimes known as literate programming, is that:
- The entire report is written using code. That is, a file or files are provided which, when run, import the data, produce all the results, insert the results into the text of the report, and format the report.
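The idea can be illustrated at a very small scale: the report text is a template, and every number in it is computed from the data when the script runs. This is only a sketch using Python's standard library; the data values, the template wording, and the `render_report` function are hypothetical, and real projects typically rely on dedicated tools for this purpose.

```python
import statistics
from string import Template

# Hypothetical measurements; in practice these would be imported from the raw data file.
DATA = [4.1, 3.8, 5.0, 4.6, 4.4]

# The report text contains placeholders rather than hand-typed numbers.
REPORT = Template(
    "We collected $n measurements. "
    "The mean was $mean (standard deviation $sd)."
)


def render_report(data):
    """Compute the results and insert them into the report text."""
    return REPORT.substitute(
        n=len(data),
        mean=f"{statistics.mean(data):.2f}",
        sd=f"{statistics.stdev(data):.2f}",
    )


report = render_report(DATA)
print(report)
```

If the data change, rerunning the script updates every reported value, so the text cannot drift out of step with the analysis.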
Benefits of reproducibility
Reproducibility has the following benefits:
- It is believed to increase the likelihood that the research will be correct.
- It makes it easier for the research to be checked.
- It makes it easier for the research to be reproduced independently.
- It makes it easier for the research to be extended.
- It allows the code/instructions to be reused, which makes other research more efficient (e.g., if updating results with new data).
Reproducible versus replicable
A closely related concept is replicability: whether the research results can be reproduced by independent researchers using different methods.