Working remotely
So far we've only been working on files present on our own computer, i.e. locally. While Git is an amazing tool for reproducibility even if you're working alone, it really starts to shine in collaborative work. This entails working with remote repositories, i.e. repositories that are stored somewhere online; some of the most common places to store your repositories are GitHub, Bitbucket and GitLab. GitHub is the most popular of these, and is what we'll be using for this tutorial.
An important thing to keep in mind here is the difference between Git (the version control system) and online hosting of Git repositories (such as GitHub): the former is the core of keeping track of your code's history, while the latter is how to store and share that history with others.
GitHub setup
If you have not done so already, go to github.com and create an account. You can also create an account on another online hosting service for version control, e.g. Bitbucket or GitLab. The exercises below are written with examples from GitHub (as that is the most popular platform with the most extensive features), but the same thing can be done on alternative services, although the exact menu structure and link placements differ a bit.
Uploading to and downloading from GitHub requires you to authenticate yourself. GitHub used to allow authentication with your account name and password, but this is no longer the case; using SSH keys is favoured instead. Knowing exactly what these are is not necessary to get them working, but we encourage you to read the box below to learn more about them! GitHub has excellent, platform-specific instructions on how to generate SSH keys and add them to your account, so please follow them before moving on!
SSH keys and authentication
Using SSH (Secure Shell) for authentication basically entails setting up a pair of keys: one private and one public. You keep the private key on your local computer and give the public key to anywhere you want to be able to connect to, e.g. GitHub. The public key can be used to encrypt messages that only the corresponding private key can decrypt. A simplified description of how SSH authentication works goes like this:
- The client (i.e. the local computer) sends the ID of the SSH key pair it would like to use for authentication to the server (e.g. GitHub)
- If that ID is found, the server generates a random number and encrypts this with the public key and sends it back to the client
- The client decrypts the random number with the private key and sends it back to the server
Notice that the private key always remains on the client's side and is never transferred over the connection; the ability to decrypt messages encrypted with the public key is enough to ascertain the client's authenticity. This is in contrast with using passwords, which are themselves sent across a connection (albeit encrypted). It is also important to note that even though the keys come in pairs, it is practically impossible to derive the private key from the public key. If you want to read more details about how SSH authentication works you can check out this website, which has more in-depth information than we provide here.
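A minimal sketch of creating such a key pair with OpenSSH's ssh-keygen could look like the following. The throwaway directory, the file name id_ed25519_demo and the e-mail comment are placeholders made up for illustration, and the empty passphrase is only there to keep the sketch non-interactive; for your real key, follow GitHub's own instructions and consider protecting it with a passphrase.

```shell
# Create a throwaway directory so that no existing key in ~/.ssh is touched
# (hypothetical location, for demonstration only)
keydir="$(mktemp -d)"

# Generate an Ed25519 key pair: -C sets a comment, -f the output file,
# -N "" an empty passphrase (demo only) and -q silences progress output
ssh-keygen -t ed25519 -C "you@example.com" -f "$keydir/id_ed25519_demo" -N "" -q

# The private key has no extension; the public key ends in .pub
ls "$keydir"

# Only the public half is ever shared, e.g. pasted into your GitHub settings
cat "$keydir/id_ed25519_demo.pub"
```

Once the public key has been added to your GitHub account, running ssh -T git@github.com is a common way to check that authentication works.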
Create a remote repository
Log in to your GitHub account and press the New button:
- Make sure you are listed as the owner
- Add a repository name, e.g. git_tutorial
- You can keep the repo private or make it public, as you wish
- Skip including a README, a .gitignore and a license
Note
When creating a new repository, the best practice is normally to add the README, the .gitignore and the license right away. In our case we must not, because we have already initialised a repository locally that we want to push to the remote. If we put anything in the remote repository before linking it to the local one, the remote will get a history of its own that diverges from the local history, and you will end up with a fatal: refusing to merge unrelated histories error.
You will now be redirected to the repository page which will list several ways for you to start adding content (files) to the repository. What we will do is to connect the local repository we've been working on so far to the remote GitHub server using SSH:
- Add a remote SSH address to your local repository (make sure you change user to your GitHub username and git_tutorial to your repository name):
git remote add origin git@github.com:user/git_tutorial.git
- Run git remote -v. This will show you what remote location is connected to your local Git repository. By convention, the short name of the default remote is usually "origin".
Note
Make sure you've used an SSH address (i.e. starting with git@github.com) rather than an HTTPS address (starting with https://github.com)!
- We have not yet synced the local and remote repositories, though; we've simply connected them. Let's sync them now:
git push origin main
The push command sends our local history of the main branch to the same branch on the remote (origin). Our Git repository is now stored on GitHub!
- Run git status. This should tell you that:
On branch main
nothing to commit, working tree clean
By default you always need to spell out git push origin main, but you can avoid this by telling Git that you always want to push to origin/main when you're on your local main branch. To do this, use the command git branch --set-upstream-to origin/main. Try it out now.
- Now run git status again. You should see that Git now additionally tells you that your local branch is up to date with the remote branch.
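The whole connect-and-push sequence can also be rehearsed without touching GitHub. The sketch below is only an illustration under stated assumptions: a local bare repository (remote.git) stands in for the GitHub repository, and the directory names, the demo identity and the one-line config.yml are made up for the purpose.

```shell
# Work in a throwaway directory (all names below are hypothetical)
workdir="$(mktemp -d)"
cd "$workdir"

# A bare repository plays the role of the GitHub remote
git init -q --bare remote.git

# A small local repository with one commit on main
git init -q local
cd local
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "genome_id: ST398" > config.yml
git add config.yml
git commit -q -m "Add config file"

# Connect the remote, inspect it and push main
git remote add origin "$workdir/remote.git"
git remote -v
git push -q origin main

# Record origin/main as the upstream of the current branch
git branch --set-upstream-to origin/main

# The short status now also reports the upstream branch
git status -sb
```

With a real GitHub repository the only difference is the address given to git remote add, which would be the SSH address of your repository instead of a local path.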
If you go to the repository's GitHub page you should now be able to see all your files and your code there! It should look something like this:
You can see a lot of things there, such as each file and the latest commit that changed them, the repository's branches and a message from GitHub at the bottom: "Help people interested in this repository understand your project by adding a README." This refers to GitHub's built-in functionality of automatically rendering any markdown document named README or README.md in the repository's root directory and displaying it along with what you can already see. Let's try it out!
Info
You can find a nice README template on the recherche.data.gouv.fr website.
- Let's create a README.md file and fill it with the following text:
# A Git tutorial
This repository contains tutorial information related to the **SouthGreen** course
*Tools for Reproducible Research* based on the **NBIS/ELIXIR** material, specifically the session on using the `git`
software for version control.
## Links
You can find the latest stable version of the Git tutorial for the course
[here](https://uppsala.instructure.com/courses/73110/pages/git-1-introduction?module_item_id=367079).
- Add, commit and push these changes to GitHub.
git add README.md
git commit -m "Add README.md"
git push origin main
You should now be able to see the rendered markdown document, which looks a bit different from the text you copied in from above. Note that there are two different header levels, which come from the number of hash signs (#) used. You can also see bold text (which was surrounded by two asterisks), italic text (surrounded by one asterisk), in-line code (surrounded by backticks) and a link (link text inside square brackets followed by link address inside parentheses).
It is important to add README-files to your repositories so that they are better documented and more easily understood by others and, more likely, by your future self. In fact, documentation is an important part of reproducible research! While the tools that you are introduced to by this course are all directly related to making science reproducible, you will also need good documentation. Make it a habit to always add README-files to your repositories, fully explaining the ideas and rationale behind the project. You can even add README-files to sub-directories, giving you the opportunity to go more in-depth where you so desire.
Tip
There are a lot more things you can do with markdown than what we show here. Indeed, this entire course is mostly written in markdown! You can read more about markdown here.
Quick recap
We learned how to connect local Git repositories to remote locations such as GitHub and how to upload commits using git push. We also learned the basics of markdown and how it can be used to document Git repositories.
Browsing GitHub
GitHub and the other websites that offer remote hosting of Git repositories all have numerous features, which can be somewhat difficult to navigate in the beginning. Here we go through some of the basics of what you can do with GitHub.
- Go to your GitHub repository in your browser again and click on Code to the left. Click on config.yml. You will see the contents of the file. Notice that it is the latest version, where we previously changed the genome_id variable:
- Click on History. You will see an overview of the commits involving changes made to this file:
- Click on the Change to ST398 for alignment commit. You will see the changes made to the config.yml file compared to the previous commit.
- Go back to the repository's main page and click on the commit tracker on the right above the list of files, which will give you an overview of all commits made. Clicking on a specific commit lets you see the changes introduced by that commit. Click on the commit that was the initial commit, where we added all the files.
You will now see the files as they were when we first added them. Specifically you can see that the Dockerfile is back, even though we deleted it! Click on the Code tab to the left to return to the overview of the latest repository version.
Quick recap
We learned some of the most important features of the GitHub interface and how repositories can be viewed online.
Working with remote repositories
While remote repositories are extremely useful as backups and for collaborating with others, that's not their only use: remotes also help when you are working from different computers, a computer cluster or a cloud service.
- Let's pretend that you want to work on this repository from a different computer. First, create a different directory (e.g. git_remote_tutorial) in a separate location that is not already tracked by Git and cd into it. Now we can download the repository we just uploaded using the following:
git clone git@github.com:user/git_tutorial.git .
Again, make sure to replace user with your GitHub user name.
Notice the dot at the end of the command above, which will put the clone into the current directory, instead of creating a new directory with the same name as the remote repository. You will see that all your files are here, identical to the original git_tutorial repository!
- Since you already gave the address to Git when you cloned the repository, you don't have to add it manually as before. Verify this with git remote -v.
- Let's say that we now want to change the multiqc software to an earlier version: open the environment.yml file in the second local repo and change multiqc=1.12 to multiqc=1.7; add and commit the change.
- We can now use push again to sync our remote repository with the new local changes. Refresh your web page again and see that the changes have taken effect.
Since we have now updated the remote repository with code that came from the second local repository, the first local repository is now outdated. We thus need to update the first local repo with the new changes. This can be done with the pull command.
- cd back into the first local repository (e.g. git_tutorial) and run the git pull command. This will download the newest changes from the remote repository and merge them locally automatically.
- Check that everything is up-to-date with git status.
Another command is git fetch, which will download remote changes without merging them. This can be useful when you want to see if there are any remote changes that you may want to merge, without actually doing it, such as in a collaborative setting. In fact, git pull in its default mode is just a shorthand for git fetch followed by git merge FETCH_HEAD (where FETCH_HEAD points to the tip of the branch that was just fetched).
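The round trip between two local repositories and a remote, including the difference between fetching and pulling, can be sketched as follows. This is an illustration only: a local bare repository (remote.git) stands in for GitHub, the directory names first and second and the demo identity are hypothetical, and git push -u is used as a shortcut that pushes and sets the upstream branch in one go.

```shell
# Throwaway working area (all names below are hypothetical)
workdir="$(mktemp -d)"
cd "$workdir"

# Bare stand-in for the GitHub repository, with main as its default branch
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

# First local repository: one commit, pushed to the remote
git init -q first
cd first
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "multiqc=1.12" > environment.yml
git add environment.yml
git commit -q -m "Add environment file"
git remote add origin "$workdir/remote.git"
git push -q -u origin main           # -u also sets the upstream branch

# Second local repository: clone, change, commit and push
cd "$workdir"
git clone -q remote.git second
cd second
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "multiqc=1.7" > environment.yml
git commit -q -am "Use an earlier multiqc version"
git push -q origin main

# Back in the first repository: fetch only downloads the new commit ...
cd "$workdir/first"
git fetch -q origin
git log --oneline main..origin/main  # commits on the remote that main lacks

# ... and merging FETCH_HEAD completes what git pull would have done
git merge -q FETCH_HEAD
cat environment.yml
```

Inspecting main..origin/main between the fetch and the merge is what makes git fetch useful in a collaborative setting: you get to look at the incoming commits before they change your branch.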
That's quite a few concepts and commands you've just learnt! It can be a bit hard to keep track of everything and the connections between local and remote Git repositories and how you work with them, but hopefully the following figure will give you a short visual summary:
Quick recap
We have learned the difference between local and remote copies of Git repositories and how to sync them:
- git push uploads commits to a remote repository
- git pull downloads commits from a remote repository and merges them to the local branch
- git fetch downloads commits from a remote repository without merging them to the local branch
- git clone makes a local copy of a remote repository
Remote branches
Remote branches work in much the same way as local branches, but you have to push them separately; you might have noticed that GitHub only listed our repository as having one branch (you can see this by going to the Code tab). This is because we only pushed our main branch to the remote. Let's create a new local branch and add some changes that we'll push as a separate branch to our remote. You should do this in the original git_tutorial repository, so move back into that directory.
- Create a new branch named trimming and add the --trim5 5 flag to the bowtie2-command part of the Snakefile, which should now look like this:
bowtie2 --trim5 5 --very-sensitive-local -x $indexBase -U {input.fastq} > {output} 2> {log}
- Add and commit the change to your local repository.
- Instead of doing what we previously did, i.e. merge the trimming branch into the main branch, we'll push trimming straight to our remote:
git push origin trimming
- Go to the repository at GitHub and see if the new branch has appeared. Just above the file listing click the Branch drop-down and select the new branch to view it. Can you see the difference in the Snakefile depending on which branch you choose?
We now have two branches both locally and remotely: main and trimming. We can continue working on our trimming branch until we're satisfied (all the while pushing to the remote branch with the same name), at which point we want to merge it into main.
- Check out your local main branch and merge the trimming branch into it.
- Push your main branch to your remote and subsequently delete your local trimming branch.
The above command only deleted the local branch. If you want to remove the branch from the remote repository as well, run:
git push origin --delete trimming
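The full life cycle of a remote branch described in this section can be sketched as below. As before, this is only an illustration under stated assumptions: a local bare repository (remote.git) stands in for GitHub, and the one-line Snakefile, the directory names and the demo identity are simplified placeholders rather than the tutorial's real files.

```shell
# Throwaway working area (all names below are hypothetical)
workdir="$(mktemp -d)"
cd "$workdir"
git init -q --bare remote.git        # stand-in for the GitHub repository

# Local repository with main pushed to the remote
git init -q local
cd local
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "bowtie2 --very-sensitive-local" > Snakefile
git add Snakefile
git commit -q -m "Add Snakefile"
git remote add origin "$workdir/remote.git"
git push -q origin main

# Create the branch, commit a change on it and push it as a separate branch
git checkout -q -b trimming
echo "bowtie2 --trim5 5 --very-sensitive-local" > Snakefile
git commit -q -am "Add trimming flag"
git push -q origin trimming
git branch -r                        # lists origin/main and origin/trimming

# Merge into main, push main, then delete the branch locally and remotely
git checkout -q main
git merge -q trimming
git push -q origin main
git branch -d trimming
git push -q origin --delete trimming
git branch -r                        # only origin/main remains
```

Note that git branch -d only succeeds here because trimming has already been merged into main; Git refuses to delete an unmerged branch with this flag.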
Quick recap
We learned how to push local branches to a remote with git push origin <branch> and how to delete remote branches with git push origin --delete <branch>.
Sharing tags
Your local repository tags are not included when you do a normal push. To push tags to the remote you need to supply the --tags flag to the git push command:
git push --tags
- Go to the repository overview page on GitHub. You will see that the repository now has three tags! If you click on Tags you will be given an overview of the existing tags for your repository; if you click Releases you will see more or less the same information. Confusing? Well, a tag is a Git concept while a release is a GitHub concept that is based on Git tags. Releases add some extra features that can be useful for distributing software and are created manually from the repository's GitHub page.
- Click on one of the tags. Here users can download a compressed file containing the repository at the version specified by the tag.
Alternatively, Git users who want to reproduce your analysis with the code used for the publication can clone the GitHub repository and then run git checkout publication.
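How tags travel to a remote and back can be sketched as follows. The sketch makes several assumptions: a local bare repository (remote.git) stands in for GitHub, the file analysis.txt, the directory names and the demo identity are hypothetical, and Git is running with default settings (i.e. no push.followTags configuration).

```shell
# Throwaway working area (all names below are hypothetical)
workdir="$(mktemp -d)"
cd "$workdir"

# Bare stand-in for the GitHub repository, with main as its default branch
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/main

# Local repository with a tagged commit followed by a later commit
git init -q local
cd local
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "version used for the paper" > analysis.txt
git add analysis.txt
git commit -q -m "Add analysis"
git tag -a publication -m "Version used for the publication"
echo "later work" >> analysis.txt
git commit -q -am "Continue after publication"
git remote add origin "$workdir/remote.git"

# A normal push transfers the branch but not the tag ...
git push -q origin main
git ls-remote --tags origin          # no tags on the remote at this point

# ... whereas --tags sends the tags as well
git push -q origin --tags
git ls-remote --tags origin

# Someone else can now clone the repository and check out the tagged version
cd "$workdir"
git clone -q remote.git reader
cd reader
git checkout -q publication
cat analysis.txt
```

Checking out a tag puts the clone in a "detached HEAD" state, which is fine for reproducing an analysis; switching back with git checkout main returns to the latest version.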
Quick recap
We learned how to push Git tags to a remote by using the --tags flag.