Ignoring files

Git is aware of all files within the repository. However, it is not uncommon to have files that we don't want Git to track. For instance, our analysis might produce several intermediate files and results. We typically don't track such files. Rather, we want to track the actual code and other related files (e.g. configuration files) that produce the intermediate and result files, given the raw input data.

  • Let's make some mock-up intermediate and result files. These are some of the files that the Snakemake workflow would have generated if it had been run.
mkdir intermediate
mkdir results
touch intermediate/multiqc_general_stats.txt
touch results/supplementary.pdf
touch log.tmp
  • Run git status. Git reports the new files as untracked, but we don't want Git to track them at all. To tell Git which files to ignore, we use a file called .gitignore. Let's create it:
touch .gitignore
  • Open the .gitignore file in a text editor and add the following lines to it:
# Ignore these directories:
results/
intermediate/

# Ignore temporary files:
*.tmp
  • Run git status again. Now there is no mention of the results and intermediate directories or the log.tmp file, although .gitignore itself shows up as untracked until we commit it. Notice that we can use wildcards (*) to ignore files matching a given pattern, e.g. a specific file extension.
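
To find out which rule is hiding a given file, git check-ignore with the -v flag can help. The following is a minimal sketch that rebuilds the mock files in a throwaway repository; the demo_dir, matched and untracked variables are names chosen for illustration and are not part of the tutorial repository.

```shell
# Work in a scratch repository so the real project is left untouched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# Recreate the mock files and the ignore rules from the steps above.
mkdir intermediate results
touch intermediate/multiqc_general_stats.txt results/supplementary.pdf log.tmp
printf '%s\n' 'results/' 'intermediate/' '*.tmp' > .gitignore

# -v reports the source file, line number and pattern that matched each path.
matched="$(git check-ignore -v log.tmp results/supplementary.pdf)"
echo "$matched"

# List what is still untracked; ignored paths are left out.
untracked="$(git status --porcelain --untracked-files=all)"
echo "$untracked"
```

In your own repository, only the git check-ignore -v line is needed, followed by the paths you are curious about.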

  • Sometimes you want to ignore all files in a directory with one or two exceptions. For example, you don't want to track all your huge raw data files, but there may be a smaller data file that you do want to track, e.g. metadata or a list of barcodes used in your experiment. Let's add some mock data:

mkdir data
touch data/huge.fastq.gz
touch data/metadata.txt
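  • Git allows you to ignore all files in a directory using the aforementioned wildcard, and then exempt certain files from that ignore rule. Note that the pattern has to be data/* rather than data/: Git does not look inside a directory that is itself ignored, so an exception for a file in it would have no effect. Open the .gitignore file again and add the following: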
# Ignore all files in the data/ directory
data/*

# Exclude the metadata file by prefixing it with an exclamation mark
!data/metadata.txt
  • Finish up by adding the .gitignore and data/metadata.txt files to the staging area and committing them:
git add .gitignore
git commit -m "Add .gitignore file"
git add data/metadata.txt
git commit -m "Add metadata file"
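
The difference between ignoring the contents of a directory (data/*) and ignoring the directory itself (data/) can be checked in a scratch repository. This is a small sketch under the assumption that Git is installed; demo_dir, with_wildcard and with_directory are illustrative names only.

```shell
# Work in a scratch repository so the real project is left untouched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet
mkdir data
touch data/huge.fastq.gz data/metadata.txt

# Variant A: ignore the contents of data/, then re-include one file.
printf '%s\n' 'data/*' '!data/metadata.txt' > .gitignore
with_wildcard="$(git status --porcelain --untracked-files=all)"

# Variant B: ignore the directory itself. Git never descends into it,
# so the exception on the second line has no effect.
printf '%s\n' 'data/' '!data/metadata.txt' > .gitignore
with_directory="$(git status --porcelain --untracked-files=all)"

echo "With data/* :"
echo "$with_wildcard"
echo "With data/  :"
echo "$with_directory"
```

Only the first variant lists data/metadata.txt as an untracked file that can be added.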

Tip

It is common for operating systems, programming languages or text editors to leave e.g. swap files or hidden data files in the working directory, which you don't want to track using Git. Instead of manually adding these to every single project you have, you can use a global ignore file, here called .gitignore_global and placed in your home directory. It uses the same syntax as a normal .gitignore file, but is applied to all Git repositories that you use on your machine. Some common patterns to put in the global ignore file are .DS_Store if you're working on macOS or *.swp if you're coding in Vim. To configure Git to use the .gitignore_global file you can run git config --global core.excludesfile ~/.gitignore_global.
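
The setup described in this tip could look roughly like the sketch below. To keep the demonstration from altering a real configuration, it points HOME and XDG_CONFIG_HOME at a temporary directory; those lines, and the demo_home, repo, configured and status variables, are assumptions made for illustration.

```shell
# Demo only: redirect the home directory so the real config is untouched.
demo_home="$(mktemp -d)"
export HOME="$demo_home"
export XDG_CONFIG_HOME="$demo_home/.config"

# Append the patterns to the global ignore file (created if missing).
printf '%s\n' '.DS_Store' '*.swp' >> ~/.gitignore_global

# Tell Git where the global ignore file lives.
git config --global core.excludesfile ~/.gitignore_global
configured="$(git config --global --get core.excludesfile)"
echo "$configured"

# A fresh repository without its own .gitignore picks up the global rules.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
touch .DS_Store .script.py.swp script.py
status="$(git status --porcelain)"
echo "$status"
```

To apply this for real, leave out the three demo lines at the top and run only the printf and git config commands.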

Quick recap

We have now learned how to ignore certain files and directories:

  • The .gitignore file controls which files and directories Git should ignore, if any.
  • Specific files can be exempted from an ignore pattern such as data/* using the exclamation mark (!) prefix.