Skip to content

Extra material

Here are some useful resources if you want to read more about Jupyter in general:

Running jupyter notebooks on a cluster#

  • Login to Uppmax, making sure to use a specific login node, e.g. rackham1:
ssh <your-user-name>@rackham1.uppmax.uu.se
  • Create/activate a conda environment containing jupyter then run:
jupyter notebook

When the Jupyter server starts up you should see something resembling:

[I 11:00:00.000 NotebookApp] Serving notebooks from local directory: <path-to-your-local-dir>
[I 11:00:00.000 NotebookApp] Jupyter Notebook 6.4.6 is running at:
[I 11:00:00.000 NotebookApp] http://localhost:8889/?token=357d65100058efa40a0641fce7005addcff339876c5e8000
[I 11:00:00.000 NotebookApp]  or http://127.0.0.1:8889/?token=357d65100058efa40a0641fce7005addcff339876c5e8000
[I 11:00:00.000 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Now a Jupyter notebook server is running on the Uppmax end. The line that says:

[I 11:00:00.000 NotebookApp] http://localhost:8889/?token=357d65100058efa40a0641fce7005addcff339876c5e8000

contains information on the port used on the server side (8889 in this case) and the token required to use the server (357d65100058efa40a0641fce7005addcff339876c5e8000).

Next step is to use this information to login to the server from your local computer.

On your local computer

In a terminal, run the following command to start port forwarding of port 8080 on your local computer to the remote port on the Uppmax side. Replace with the port given when you started the server on Uppmax. Also replace with your user name on Uppmax.

ssh -N -L localhost:8080:localhost:<remote-port> <your-user-name>@rackham1.uppmax.uu.se

As long as this process is running the port forwarding is running. To disable it simply interrupt it with CTRL + C.

Connect to the jupyter server by opening localhost:8080 in your browser. When prompted, paste the token you got when starting the server on Uppmax.

You are now (hopefully) accessing the jupyter server that's running on Uppmax, via your local browser.

Integrating notebooks with Snakemake workflows#

In the case study section of this tutorial we created a Jupyter notebook that used output from a Snakemake workflow and produced some summary results and plots. Wouldn't it be nice if this was actually part of the workflow itself? To generate a HTML version of the notebook we can use what we learned in the section about converting notebooks. The command to execute the notebook and save it in HTML format in a file results/supplementary.html would be:

jupyter nbconvert --to HTML --output-dir results --output supplementary.html --execute supplementary_material.ipynb

This command could be used in a rule, e.g. make_supplementary, the input of which would be results/tables/counts.tsv, intermediate/multiqc_general_stats.txt, and results/rulegraph.png. See if you can work out how to implement such a rule at the end of the Snakefile found in the jupyter/ directory. You can find an example in the code chunk below:

rule make_supplementary:
    input:
        counts = "results/tables/counts.tsv",
        summary = "results/tables/counts.tsv.summary",
        multiqc_file = "intermediate/multiqc_general_stats.txt",
        rulegraph = "results/rulegraph.png"
    output:
        "results/supplementary.html"
    params:
        base = lambda wildcards, output: os.path.basename(output[0]),
        dir = lambda wildcards, output: os.path.dirname(output[0])
    shell:
        """
        jupyter nbconvert --to HTML --output-dir {params.dir} --output {params.base} \
            --execute supplementary_material.ipynb
        """

Note

The Conda enivronment for the jupyter tutorial does not contain packages required to run the full snakemake workflow. So if you wish to test jupyter integration fully you should update the conda environment by running conda install snakemake-minimal fastqc sra-tools multiqc bowtie2 tbb samtools htseq bedtools wget graphviz

More integrations#

Snakemake actually supports the execution of notebooks via the notebook: rules directive. See more about Jupyter integration in the snakemake docs. In the notebook: directive of such a rule you specify the path to a jupyter notebook (relative to the Snakefile) which is then executed when the rule is run.

So how is this useful?

In the notebook itself this gives you access to a snakemake object containing information about input and output files for the rule via snakemake.input and snakemake.output. Similarly, you can access rule wildcards with snakemake.wildcards, params with snakemake.params, and config settings with snakemake.config.

When snakemake runs the rule with the notebook: directive jupyter-nbconvert is used to execute the notebook. No HTML output is generated here but it is possible to store a version of the notebook in its final processed form by adding the following to the rule:

log:
    notebook="<path>/<to>/<processed>/<notebook.ipynb>"

Because you won't get the notebook in full HTML glory, this type of integration is better suited if you want to use a notebook to generate figures and store these in local files (e.g. pdf/svg/png formats).

We'll use the supplementary_material.ipynb notebook as an example! Let's say that instead of exporting the entire notebook to HTML we want a rule that outputs pdf versions of the barplot and heatmap figures we created.

Let's start by setting up the rule. For simplicity we'll use the same input as when we edited the notebook in the first place. The output will be results/barplot.pdf and results/heatmap.pdf. Let's also output a finalized version of the notebook using the log: notebook= directive:

rule make_supplementary_plots:
    input:
        counts = "results/tables/counts.tsv",
        summary = "results/tables/counts.tsv.summary",
        multiqc = "intermediate/multiqc_general_stats.txt",
        rulegraph = "results/rulegraph.png"
    output:
        barplot = "results/barplot.pdf",
        heatmap = "results/heatmap.pdf"
    log:
        notebook = "results/supplementary.ipynb"

The notebook will now have access to snakemake.input.counts, snakemake.output.barplot and snakemake.output.heatmap when executed from within the workflow. Let's go ahead and edit the notebook! In the cell where we defined notebook parameters edit the code so that it looks like this:

counts_file=snakemake.input.counts
summary_file=snakemake.input.summary
multiqc_file=snakemake.input.multiqc
rulegraph_file=snakemake.input.rulegraph

SRR_IDs=snakemake.params.SRR_IDs
GSM_IDs=snakemake.params.GSM_IDs
GEO_ID=snakemake.params.GEO_ID

Notice that we set the SRR_IDs, GSM_IDs and GEO_ID variables using variables in snakemake.params? However, we haven't defined these in our rule yet so let's go ahead and do that now. Add the params section so that the make_supplementary_plots in the Snakefile looks like this:

rule make_supplementary_plots:
    input:
        counts = "results/tables/counts.tsv",
        multiqc = "intermediate/multiqc_general_stats.txt",
        rulegraph = "results/rulegraph.png"
    output:
        barplot = "results/barplot.pdf",
        heatmap = "results/heatmap.pdf"
    log:
        notebook = "results/supplementary.ipynb"
    params:
        SRR_IDs = ["SRR935090","SRR935091","SRR935092"],
        GSM_IDs = ["GSM1186459", "GSM1186460", "GSM1186461"],
        GEO_ID = "GSE48896"
    notebook: "supplementary_material.ipynb"

Tip

One way to further generalize this rule could be to define the SRR_IDs, GSM_IDs and GEO_ID parameters in a config file instead, in which case they would be directly accessible from within the notebook using snakemake.config['SRR_IDs'] etc.

Now the rule contains everything needed, but we still need to edit the notebook to save the plots to the output files. First, edit the cell that generates the barplot so that it looks like this:

# Create a stacked barplot
ax = summary_plot_data.T.plot(kind="bar", stacked=True, color=colors)
# Move legend and set legend title
ax.legend(bbox_to_anchor=(1,1), title="Category");
plt.savefig(snakemake.output.barplot, dpi=300, bbox_inches="tight") ## <-- Add this line!

Finally, edit the cell that generates the heatmap so that it looks like this:

count_data = counts.loc[:, SRR_IDs]
heatmap_data = count_data.loc[(count_data.std(axis=1).div(count_data.mean(axis=1))>1.2)&(count_data.max(axis=1)>5)]
heatmap_data = heatmap_data.rename(columns = name_dict['title'])
with sns.plotting_context("notebook", font_scale=0.7):
    ax = sns.clustermap(data=np.log10(heatmap_data+1), cmap="RdYlBu_r",
                        method="complete", yticklabels=True, linewidth=.5,
                        cbar_pos=(.7, .85, .05, .1), figsize=(3,9))
    plt.setp(ax.ax_heatmap.get_xticklabels(), rotation=270)
    plt.savefig(snakemake.output.heatmap, dpi=300, bbox_inches="tight") ## <-- Add this line!

Now you can run the following to generate the plots:

snakemake -j 1 make_supplementary_plots

Presentations with Jupyter#

As if all the above wasn't enough you can also create presentations/slideshows with Jupyter! Simply use conda to install the RISE extension to your jupyter environment:

conda install -c conda-forge rise

then open up a notebook of your choice. In the menu click View -> Cell Toolbar -> Slideshow. Now every cell will have a drop-down in the upper right corner allowing you to set the cell type:

  • Slide: a regular slide
  • Sub-Slide: a regular slide that will be displayed below the previous
  • Fragment: these cells split up slides so that content (fragments) are added only when you press Space
  • Skip: these cells will not appear in the presentation
  • Notes: these cells act as notes, shown in the speaker view but not in the main view

The presentation can be run directly from the notebook by clicking the 'Enter/Exit RISE Slideshow' button (looks like a bar chart) in the toolbar, or by using the keyboard shortcut Alt-r. Running it directly from a notebook means you can also edit and run cells during your presentation. The downside is that the presentation is not as portable because it may rely on certain software packages that other people are not comfortable with installing.

You can also export the notebook to an HTML-file with jupyter nbconvert --execute --to SLIDES <your-notebook.ipynb>. The resulting file, with the slideshow functionality included, can be opened in any browser. However, in this format you cannot run/edit cells.

An example ?

JupyterLab#

JupyterLab is a cutting-edge web-based development environment that offers an interactive and dynamic interface for working with notebooks, code, and data. Its adaptable design empowers users to customize and organize their workflows, catering to various applications such as data science, scientific computing, and machine learning. Moreover, its modular architecture is conducive to adding extensions that augment and diversify the functionality of the platform.

image

Want to try? You can go to https://jupyter.org/try or https://jupyterhub.cluster.france-bioinformatique.fr/ (if you have an account on the IFB cluster).

Jupyter book#

Jupyter Book

With Jypter book, build an online book using a collection of Jupyter Notebooks and Markdown files with

  • Interactivity
  • Citations
  • Build and host it online with GitHub/GitHub Pages...
  • or locally on your own laptop

An example ? You can find here a Linux introduction create with Jupyter Book.