Packaging the case study

Throughout these tutorials we have been working on a case study about the multiresistant bacterium MRSA. Here we will build and run a Docker container that contains all the work we've done so far.

  • We've set up a GitHub repository for version control and for hosting our project.
  • We've defined a Conda environment that specifies the packages we're depending on in the project.
  • We've constructed a Snakemake workflow that performs the data analysis and keeps track of files and parameters.
  • We've written an R Markdown document that takes the results from the Snakemake workflow and summarizes them in a report.

The workshop-reproducible-research/tutorials/containers directory contains the final versions of all the files we've generated in the other tutorials: environment.yml, Snakefile, config.yml, code/header.tex, and code/supplementary_material.Rmd. The only difference compared to the other tutorials is that we have also included the rendering of the Supplementary Material HTML file into the Snakemake workflow as the rule make_supplementary. Running all of these steps will take some time to execute (around 20 minutes or so), in particular if you're on a slow internet connection.

Now take a look at Dockerfile. Everything should look quite familiar to you, since it's basically the same steps as in the image we constructed in the previous section, although some sections have been moved around. The main difference is that we add the project files needed for executing the workflow (mentioned in the previous paragraph) and install the Conda packages listed in environment.yml. If you look at the CMD instruction you can see that the container will run the whole Snakemake workflow by default.

Now run docker build as before, and tag the image as my_docker_project:

docker build -t my_docker_project -f Dockerfile .

Go get a coffee while the image builds (or you could use docker pull nbisweden/workshop-reproducible-research which will download the same image).
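If you take the docker pull shortcut, the downloaded image is stored under its repository name rather than my_docker_project. A minimal sketch of pulling it and giving it the local name used in this section follows; the retagging step is an assumption added for convenience and is not part of the original instructions.

```shell
# Download the prebuilt image instead of building it locally
docker pull nbisweden/workshop-reproducible-research

# Assumption: add the local name used elsewhere in this section,
# so later commands can refer to my_docker_project
local_name="my_docker_project"
docker tag nbisweden/workshop-reproducible-research "$local_name"
```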

Check that the image is listed with docker image ls. Now all that remains is to run the whole thing with docker run. We just want to get the results, so mount the container directory /course/results/ onto a directory on your own machine, say mrsa_results in your current directory.
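One way to do this is sketched below. It assumes the image was tagged my_docker_project as above and that you are happy for the container to be removed once it exits (the --rm flag); the variable name results_dir is only illustrative. Bind mounts given with -v need an absolute host path, hence the use of $(pwd).

```shell
# Host directory that will receive the workflow results
results_dir="$(pwd)/mrsa_results"
mkdir -p "$results_dir"

# Run the default CMD (the whole Snakemake workflow) and bind-mount
# the host directory onto /course/results inside the container
docker run --rm -v "$results_dir":/course/results my_docker_project

# Inspect what the workflow produced
ls "$results_dir"
```

When the run has finished, the files written to /course/results inside the container should be available in mrsa_results on the host.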

Well done! You now have an image that allows anyone to exactly reproduce your analysis workflow (provided that you first share it, for example with docker push to Docker Hub).
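As a rough sketch of that last step: pushing requires that the image name includes your Docker Hub account. The username below is a hypothetical placeholder, the latest tag is an assumption, and you need to have authenticated with docker login beforehand.

```shell
# Hypothetical placeholder: replace with your own Docker Hub username
dockerhub_user="your_dockerhub_username"
remote_image="$dockerhub_user/my_docker_project:latest"

# Give the local image a name that points at your Docker Hub repository
docker tag my_docker_project "$remote_image"

# Upload the image (assumes you have already run docker login)
docker push "$remote_image"
```

Others could then fetch the image with docker pull followed by the same repository name.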