Ignoring files
Git is aware of all files within the repository. However, it is not uncommon to have files that we don't want Git to track. For instance, our analysis might produce several intermediate files and results. We typically don't track such files. Rather, we want to track the actual code and other related files (e.g. configuration files) that produce the intermediate and result files, given the raw input data.
- Let's make some mock-up intermediate and result files. These are some of the files that would have been generated by the Snakemake workflow if it was run.
mkdir intermediate
mkdir results
touch intermediate/multiqc_general_stats.txt
touch results/supplementary.pdf
touch log.tmp
- Run git status. You will see that Git reports several untracked files. We don't want Git to track these files, however. To tell Git which files to ignore we use a file called .gitignore. Let's create it:
touch .gitignore
- Open the .gitignore file in a text editor and add the following lines to it:
# Ignore these directories:
results/
intermediate/
# Ignore temporary files:
*.tmp
- Run git status again. Now there is no mention of the results and intermediate directories or the log.tmp file. Notice that we can use wildcards (*) to ignore files matching a given pattern, e.g. a specific file extension.
- Sometimes you want to ignore all files in a directory with one or two exceptions. For example, you don't want to track all your huge raw data files, but there may be a smaller data file that you do want to track, e.g. metadata or a list of barcodes used in your experiment. Let's add some mock data:
mkdir data
touch data/huge.fastq.gz
touch data/metadata.txt
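- Git allows you to ignore all files in a directory using the aforementioned wildcard, and then exempt certain files from that ignore rule. Note that the pattern has to be data/* rather than data/, because Git cannot re-include a file whose parent directory is itself ignored. Open the .gitignore file again and add the following: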
# Ignore all files in the data/ directory
data/*
# Exclude the metadata file by prefixing it with an exclamation mark
!data/metadata.txt
- Finish up by adding the .gitignore and data/metadata.txt files to the staging area and committing them:
git add .gitignore
git commit -m "Add .gitignore file"
git add data/metadata.txt
git commit -m "Add metadata file"
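To double-check which files the rules in this section actually ignore, the listing commands below can help. This is only a sketch: it rebuilds the mock files in a scratch repository created with mktemp (an assumption made here so that your real project is left untouched, not part of the tutorial itself), and the variable names demo, untracked and ignored are purely illustrative.

```shell
# Assumption: work in a scratch repository so your real project is untouched.
demo=$(mktemp -d)
cd "$demo"
git init --quiet

# Recreate the mock files from the steps above.
mkdir intermediate results data
touch intermediate/multiqc_general_stats.txt results/supplementary.pdf log.tmp
touch data/huge.fastq.gz data/metadata.txt

# The same rules that were added to .gitignore.
cat > .gitignore <<'EOF'
results/
intermediate/
*.tmp
data/*
!data/metadata.txt
EOF

# Untracked files that Git would still offer to add.
untracked=$(git ls-files --others --exclude-standard)
# Untracked files that match an ignore rule.
ignored=$(git ls-files --others --ignored --exclude-standard)

printf 'Not ignored:\n%s\n\nIgnored:\n%s\n\n' "$untracked" "$ignored"

# Show which file, line and pattern is responsible for ignoring a path.
git check-ignore -v log.tmp data/huge.fastq.gz
```

In this sketch data/metadata.txt should show up in the first list and data/huge.fastq.gz in the second, which is a quick way to confirm that the exclamation mark rule works as intended.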
Tip
It is common for operating systems, programming languages or text editors to leave e.g. swap files or hidden data files in the working directory, which you don't want to track using Git. Instead of manually adding these to the .gitignore of every single project you have, you can use a global ignore file such as .gitignore_global, placed in your home directory. It works exactly like a normal .gitignore file, but is applied to all Git repositories that you use on your machine. Some common entries in the global gitignore are .DS_Store (created by Finder if you're working on macOS) or *.swp (swap files created if you're coding in vim). To configure Git to use the .gitignore_global file you can run git config --global core.excludesfile ~/.gitignore_global.
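As a rough illustration of the tip, the sketch below sets up a global ignore file and checks that it takes effect in a fresh repository. It redirects HOME to a scratch directory so that your real global configuration is not changed; that redirection, the file notes.txt and the variable names repo and visible are assumptions made for this example only.

```shell
# Assumption: point HOME at a scratch directory so this sketch does not
# modify your real global Git configuration. Skip these lines to apply it for real.
HOME=$(mktemp -d)
export HOME
unset XDG_CONFIG_HOME

# Patterns that should be ignored in every repository on this machine.
printf '%s\n' '.DS_Store' '*.swp' > ~/.gitignore_global

# Tell Git where the global ignore file lives.
git config --global core.excludesFile ~/.gitignore_global

# A new repository without any .gitignore of its own.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
touch .DS_Store notes.txt notes.txt.swp

# List untracked files that are not ignored; the global rules apply here too.
visible=$(git ls-files --others --exclude-standard)
printf '%s\n' "$visible"
```

Only notes.txt should be listed, since the other two files match patterns in the global ignore file.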
Quick recap
We have now learned how to ignore certain files and directories:
- The .gitignore file controls which files and directories Git should ignore, if any.
- Specific files can be exempted from an ignore pattern using the exclamation mark (!) prefix, as long as their parent directory is not itself ignored.