Description | Hands On Lab Exercises for Linux |
---|---|
Related-course materials | Linux for Dummies |
Authors | Christine Tranchant-Dubreuil (christine.tranchant@ird.fr) |
Creation Date | 26/02/2018 |
Last Modified Date | 04/04/2023 |
Modified by | G. Sarah, G. Sempere, N. Tando, C. Tranchant |
Summary
- Preamble: Software to install before connecting to a remote Linux server
- Practice 1: Transferring files with FileZilla `sftp`
- Practice 2: Connecting to a Linux server with `ssh`
- Practice 3: First steps: prompt & `pwd` command
- Practice 4: List the files using the `ls` command
- Practice 5: List the files using the `ls` command and the metacharacter `*`
- Practice 6: Moving around the file system using the `cd` and `ls` commands
- Practice 7: Manipulating files and folders
- Practice 8: Searching with `grep`
- Practice 9: Blast analysis
- Practice 10: Redirecting a command output to a file with `>`
- Practice 11: Sending data from one command to another (piping) with `|`
- Practice 12: Dealing with VCF files
- Practice 13: Filtering VCF files
- Practice 14: Getting basic stats
- Tips
- Links
- License
Preamble
Connecting to a Linux server from Windows with the SSH (Secure Shell) protocol
Software | Description | url |
---|---|---|
mobaXterm | An enhanced terminal for Windows with an X11 server and a tabbed SSH client | More |
Transferring and copying files from your computer to a Linux server with the SFTP (SSH File Transfer Protocol) protocol
Software | Description | url |
---|---|---|
FileZilla | FTP and SFTP client | Download |
Viewing and editing files on your computer before transferring them to the Linux server, or directly on the remote server
Type | Software | url |
---|---|---|
Remote, console mode | nano | Tutorial |
Remote, console mode | vi | Tutorial |
Remote, graphic mode | Komodo Edit | Download |
Linux & windows based editor | Notepad++ | Download |
Practice 1 : Transferring files with filezilla sftp
Download and install FileZilla
Open FileZilla and save the cluster address in the Site Manager
In the FileZilla menu, go to File > Site Manager. Then go through these 5 steps:
- Click New Site.
- Add a CUSTOM NAME for this site such as IRD_HPC.
- Add the HOSTNAME (see table below).
- Set the Logon Type to “Normal” and enter the username and password you use to connect to the IRD HPC.
- Press the “Connect” button.
Cluster HPC | hostname |
---|---|
IRD HPC | bioinfo-nas.ird.fr |
Transferring files
- From your computer to the cluster: click and drag a text file from the left column to the right column
- From the cluster to your computer: click and drag a text file from the right column to the left column
Practice 2 : Connecting to a Linux server with `ssh`
In mobaXterm:
- Click the session button, then click SSH.
- In the remote host text box, type: HOSTNAME (see table below)
- Check the specify username box and enter your user name
- In the console, enter the password when prompted.
Cluster HPC | hostname |
---|---|
IRD HPC | bioinfo-nas.ird.fr |
Practice 3 : First steps : prompt & pwd
- What is the current/working directory, judging only from the prompt?
- Check the name of your working directory with the `pwd` command.
- On the console, type your first two Linux commands to get the data needed for the next practices (we will explain the two commands later):
- Check through filezilla the content of your home directory on the server now (cf. filetree just below)
- Delete through filezilla the file LINUX-TP.tar.gz on the server
Practice 4 : List the files using the `ls` command
- List the content of your home directory
- List the content of the directory `Fasta` using its absolute path first, then its relative path - `ls` command
- List the content of the directory `Data` with the `ls` command and the option `-R`
- List the content of the directory `Bank` with the `ls` command and the option `-al` or `-a -l`
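The difference between absolute and relative paths, and the effect of the options above, can be tried on a throwaway tree first. This is only a sketch: the temporary directory and the file names in it are made up for illustration and are not the course data.

```shell
# Toy tree standing in for the course data (all names below are invented).
top=$(mktemp -d)
mkdir -p "$top/Data/Fasta" "$top/Bank"
touch "$top/Data/Fasta/seq1.fasta" "$top/Bank/.hidden" "$top/Bank/AllEst.fasta"
cd "$top"

# Absolute path: starts from the root of the file system (/).
ls "$top/Data/Fasta"
# Relative path: starts from the current working directory.
ls Data/Fasta
# -R lists sub-directories recursively.
ls -R Data
# -a shows hidden files (names starting with a dot), -l uses the long format.
ls -al Bank
```

Both `ls` calls on `Fasta` print the same listing; only the way the directory is designated changes.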
Practice 5 : List the files using the `ls` command and the metacharacter `*`
- List the content of the directory `T-coffee`. Are there only fasta files? - `ls` command
- List only the files starting with sample (in the directory `T-coffee`) - `ls` command & `*`
- List only the files with the fasta extension (in the directory `T-coffee`) - `ls` command & `*`
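As a rough illustration of how `*` behaves, the sketch below creates a few empty files in a temporary directory (the file names are invented) and lists them with two different patterns.

```shell
work=$(mktemp -d)
cd "$work"
touch sample1.fasta sample2.fasta sampleNotes.txt other.fasta

# * matches any run of characters (including none) in a file name.
ls sample*     # names starting with "sample", whatever the extension
ls *.fasta     # names ending with ".fasta", whatever the beginning
```

The shell expands the pattern before `ls` runs, so `ls` simply receives the list of matching names.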
Practice 6 : Moving around the file system using the `cd` and `ls` commands
- Go to the directory `Script` and check in the prompt that you have correctly changed your working directory (`pwd`).
- List the directory content with `ls`.
- Go to the `Fasta` directory using `../`
- Go to the `Fastq` directory. From this directory, and without changing your working directory, list what is in the `samBam` directory.
- List the `vcf` directory using the `-R` option. What is in this directory?
- Come back to the home directory.
NOTE
Test the command tree
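The moves practised above can be rehearsed on a disposable tree. The sketch below assumes nothing about the real course layout: the directories are created in a temporary location and only reuse names from the exercise for readability.

```shell
top=$(mktemp -d)
mkdir -p "$top/Data/Script" "$top/Data/Fasta" "$top/Data/Fastq" "$top/Data/samBam"
touch "$top/Data/samBam/sample.bam"

cd "$top/Data/Script"   # absolute path
pwd                     # confirms the new working directory
cd ../Fasta             # .. is the parent directory, so this reaches a sibling
pwd
cd ../Fastq
ls ../samBam            # list another directory without leaving Fastq
cd                      # cd without argument goes back to the home directory
pwd
```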
Practice 7 : Manipulating Files and Folders
We will prepare the blast analysis performed later by creating directories and moving files as shown in the image just below:
- Create a subdirectory called `BlastAnalysis` in the directory `LINUX-TP` with the `mkdir` command.
- Move `transcritsAssembly.fasta` into this new directory with the `mv` command.
- List the content of `LINUX-TP` and `BlastAnalysis` with the `ls` command.
- Copy `AllEst.fasta` into the directory `Bank` with the `cp` command.
- List the content of the `LINUX-TP` and `Bank` directories. What are the differences between `mv` and `cp`?
- Remove the file `AllEst.fasta` from the directory `LINUX-TP` with the `rm` command.
- Copy the whole directory `T-coffee` under the name `T-coffee-copy` into the directory `LINUX-TP`.
- After checking the content of the directory `LINUX-TP`, remove the directory `T-coffee-copy`. How do you remove a directory?
- Remove all the files in the directory `T-coffee-copy` with the `rm *` command.
- Remove the directory `T-coffee-copy`.
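Before touching the course data, the same sequence of operations can be tried on throwaway files. The sketch below is only an illustration: `Analysis`, `query.fasta` and `bank.fasta` are invented names, and everything lives in a temporary directory.

```shell
top=$(mktemp -d)
cd "$top"
mkdir Bank T-coffee
touch query.fasta bank.fasta T-coffee/a.fasta T-coffee/b.fasta

mkdir Analysis                 # create a directory
mv query.fasta Analysis/       # move: the file no longer exists at its old place
cp bank.fasta Bank/            # copy: the original stays where it was
ls . Analysis Bank
rm bank.fasta                  # remove the original; the copy in Bank remains

cp -r T-coffee T-coffee-copy   # -r copies a directory together with its content
rm T-coffee-copy/*             # empty the directory...
rmdir T-coffee-copy            # ...then remove it (rm -r would do both at once)
```

`rmdir` refuses to delete a directory that still contains files, which is why the directory is emptied first.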
Practice 8 : Searching with grep
- Go to the following page: https://plants.ensembl.org/Oryza_sativa/Info/Index using your web browser
- Copy the URL of the rice genome annotation file (gff format, all chromosomes); we will use it to download the file directly on the server
- Go to the `Bank` directory and type the following command :
- After checking the content of your current directory, what have you done with the `wget` command?
- Decompress the gff with the command `gzip -d file.gz`
- Display the first and last lines of the gff file
- Print the lines with the word `gene` in the gff file
- Count the number of genes
- Search for the nbs-lrr genes
- Count lines without the word “putative”
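The `grep` options needed for these questions can be explored on a tiny hand-made annotation file. This is a sketch under stated assumptions: `toy.gff` and its three lines are invented, and the real rice annotation is far larger and richer.

```shell
work=$(mktemp -d)
cd "$work"
# Tab-separated toy annotation with the 9 columns of a gff feature line.
printf 'chr1\tsrc\tgene\t1\t900\t.\t+\t.\tID=g1;description=NBS-LRR protein\n' > toy.gff
printf 'chr1\tsrc\tmRNA\t1\t900\t.\t+\t.\tID=t1;Parent=g1\n' >> toy.gff
printf 'chr2\tsrc\tgene\t5\t700\t.\t-\t.\tID=g2;description=putative kinase\n' >> toy.gff
gzip toy.gff                  # produces toy.gff.gz, like a downloaded archive

gzip -d toy.gff.gz            # decompress: toy.gff.gz is replaced by toy.gff
head -n 2 toy.gff             # first lines
tail -n 1 toy.gff             # last lines
grep "gene" toy.gff           # lines containing the word gene
grep -c "gene" toy.gff        # -c counts matching lines instead of printing them
grep -i "nbs-lrr" toy.gff     # -i ignores upper/lower case
grep -vc "putative" toy.gff   # -v inverts the match: lines WITHOUT the word
# Stricter count: only lines whose third column is exactly "gene".
cut -f 3 toy.gff | grep -cx "gene"
```

On a real annotation, `grep "gene"` also matches lines where the word appears in an attribute, so the count restricted to the third column is usually the more reliable one.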
Practice 9 : Blast analysis
Connection to bioinfo-inter.ird.fr
Open another terminal or mobaxterm session but this time choose the bioinfo-inter.ird.fr server.
Preparing working environment
Before launching your blast, you have to prepare your working environment (even though we will not use slurm):
- Go into the directory `/scratch2`
- Create a directory called ‘formation_YOUR_ID’ in the directory `/scratch2` and go into this new directory
- Download the archive with the data that will be used to perform a blast - `wget http://itrop.ird.fr/LINUX-TP/BlastAnalysis.tar.gz`
- Decompress the gzip file - `tar -xzvf BlastAnalysis.tar.gz`
- After listing the content of the current directory, remove the archive `BlastAnalysis.tar.gz`
- Go into the directory `BlastAnalysis`
- Load the module blast; we will use the program `makeblastdb` to create a local `blast` database, then the program `blastn`.
Creating a custom database with `makeblastdb`
As we use a custom database for the first time, we have to create the database from our fasta format file `AllEst.fasta` with the `makeblastdb` command.
- Go into the `Bank` directory and list the content of this directory
- Create a nucleotide database by typing:
- List the content of the directory to check that the database has been indexed
BLASTing against our remote database
- Go into the `BlastAnalysis` directory
- Print the blast manual - `blastn -help`
- Perform the blast by typing the following command, using transcritsAssembly.fasta as a query file:
- Display the result file with the command `less`
- Perform the blast adding the option `-outfmt 6` and display the result file
- Perform the blast adding the option `-outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore'`
Output formats
Output tabular format (6 or 7): one line per result, split into 12 fields.
Parsing the results file
- Display the first 10 lines of the file - `head`
- Display the first 15 lines of the file - `head`
- Display its last 15 lines - `tail`
- Count the number of lines - `wc`
- Sort the lines using the second field (subject id) in alphabetical order, ascending then descending - `sort`
- Sort lines by e-value (ascending) and by “alignment length” (descending) - `sort`
- Extract the first 4 fields - `cut`
- Extract query id, subject id, evalue, alignment length - `cut`
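The options behind these questions can be checked on a small tabular file before using the real blast output. In the sketch below, `blast.tsv` and its four hits are invented; the columns follow the 12-field order given above (subject id in field 2, alignment length in field 4, e-value in field 11). The `-g` key modifier is assumed to be available, as it is in GNU `sort`.

```shell
work=$(mktemp -d)
cd "$work"
# Four invented hits, 12 tab-separated fields each.
printf 'q1\test3\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-50\t350\n'  > blast.tsv
printf 'q1\test1\t90.0\t150\t15\t0\t1\t150\t5\t154\t1e-20\t180\n' >> blast.tsv
printf 'q2\test2\t99.0\t300\t3\t0\t1\t300\t1\t300\t1e-80\t520\n'  >> blast.tsv
printf 'q3\test1\t95.0\t160\t8\t0\t1\t160\t1\t160\t1e-20\t200\n'  >> blast.tsv

head -n 2 blast.tsv                # first 2 lines (default is 10)
tail -n 2 blast.tsv                # last 2 lines
wc -l blast.tsv                    # number of lines
sort -k2,2 blast.tsv               # by subject id, ascending
sort -k2,2r blast.tsv              # by subject id, descending
# g = general numeric, needed for values such as 1e-50; n = numeric, r = reverse.
sort -k11,11g -k4,4nr blast.tsv
cut -f 1-4 blast.tsv               # first 4 fields
cut -f 1,2,4,11 blast.tsv          # cut always prints fields in file order
```

A plain `-n` sort would read `1e-50` as `1`, which is why the e-value key uses `-g` here.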
Practice 10 : Redirecting a command output to a File with >
- Extract all EST identifiers and print them in the file ESTs_accession.list - `cut`, `>`
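A quick way to see what redirection does is to try it on a two-column toy file. The names `hits.tsv` and `ids.list` are made up for this sketch.

```shell
work=$(mktemp -d)
cd "$work"
printf 'q1\test3\nq1\test1\nq2\test2\n' > hits.tsv   # > creates the file

cut -f 2 hits.tsv > ids.list     # > sends the output to a file, overwriting it
cut -f 1 hits.tsv >> ids.list    # >> appends to the file instead
cat ids.list
```

Nothing is printed by the two `cut` commands themselves, because their output went to `ids.list` rather than to the screen.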
Practice 11 : Sending data from one command to another (piping) with |
- How many sequences does the file transcritsAssembly.fasta contain?
- How many sequences have a homology with EST sequences? (Tips: `cut` command with `sort -u` (unique) or the `uniq` command)
- Extract the EST sequences from the database (or “bank”) with `seqtk` by typing :
- Count the number of sequences extracted - `grep -c ">"`
- Get the help of the program `seqtk comp` - `seqtk comp`
- Run the program `seqtk comp` on the fasta file created just before
- Display only the accession and the length with the command `cut`, directly from the output of the command `seqtk comp`
- What is the shortest sequence (accession and length)?
- What is the longest sequence (accession and length)?
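The pipe patterns useful for these questions can be tried without `seqtk`. In the sketch below all files are invented, and `lengths.tsv` is only a stand-in for a two-column name/length table (assumed here to look like the first two columns printed by `seqtk comp`).

```shell
work=$(mktemp -d)
cd "$work"
printf '>seqA\nACGT\n>seqB\nACGTACGT\n>seqC\nAC\n' > toy.fasta
printf 'q1\test3\nq1\test1\nq2\test1\n' > hits.tsv
printf 'seqA\t4\nseqB\t8\nseqC\t2\n' > lengths.tsv   # hypothetical name/length table

grep -c ">" toy.fasta                     # one header line per sequence
cut -f 1 hits.tsv | sort -u | wc -l       # number of distinct query ids
cut -f 1 hits.tsv | sort | uniq | wc -l   # same result; uniq needs sorted input
sort -k2,2n lengths.tsv | head -n 1       # shortest sequence
sort -k2,2nr lengths.tsv | head -n 1      # longest sequence
```

Each `|` hands the output of the command on its left to the command on its right, so no intermediate file is written.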
Practice 12 : Dealing with vcf Files
- List the content of the directory `/scratch2/VCF_LINUX`
- Before creating your directory `/scratch2/VCF_LINUX_FORMATIONX`, display the amount of disk space available on the file system with the command `df`
- Create your directory `/scratch2/VCF_LINUX_FORMATIONX` and go into it.
- Create a shortcut (symbolic link) to each vcf file of the directory `/scratch2/VCF_LINUX` with the command `ln -s source_file myfile`
For example
Thus, OgOb-all-MSU7-CHR6.GATKVARIANTFILTRATION.LINK.vcf is the name of the new file containing the reference to the file named OgOb-all-MSU7-CHR6.GATKVARIANTFILTRATION.vcf.
- Repeat the same operation with the other vcf files
- List the content of the directory `VCF_LINUX_FORMATIONX` with `ls -l`
- Display the size of each vcf file in the directory `/scratch2/VCF_LINUX`, then in your directory `/scratch2/VCF_LINUX_FORMATIONX` - `du`
- Display the size of the directory `/scratch2/VCF_LINUX` and of the directory `/scratch2/VCF_LINUX_FORMATIONX` - `du`
- Display the first lines of the vcf files - `zcat`, `head` commands
- Display the last lines of the vcf files - `zcat`, `tail` commands
- Count the lines of the vcf files - `zcat`, `wc -l` commands
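To see how links, disk usage and compressed files behave, the same commands can be run on a tiny invented file. The sketch assumes nothing about `/scratch2`: the `shared` and `mine` directories below are temporary stand-ins.

```shell
top=$(mktemp -d)
mkdir "$top/shared" "$top/mine"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\nchr6\t100\nchr6\t200\n' > "$top/shared/toy.vcf"
gzip -c "$top/shared/toy.vcf" > "$top/shared/toy.vcf.gz"   # -c keeps the original

cd "$top/mine"
df -h .                                    # free space on this file system
ln -s "$top/shared/toy.vcf" toy.LINK.vcf   # symbolic link: a pointer, not a copy
ls -l                                      # the link is shown as name -> target
du -h "$top/shared/toy.vcf" toy.LINK.vcf   # the link itself takes almost no space
du -sh "$top/shared" "$top/mine"           # -s prints one total per directory

zcat "$top/shared/toy.vcf.gz" | head -n 2  # read a gzipped file without unpacking it
zcat "$top/shared/toy.vcf.gz" | tail -n 1
zcat "$top/shared/toy.vcf.gz" | wc -l
```

Reading through the link gives the content of the original file, which is why links are a cheap way to work on large shared data.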
Practice 13 : Filtering VCF files - `|`, `zgrep`
To get some basic stats on the output VCF files, let’s use Linux commands!
- How many polymorphisms were detected in the different vcf files (display all the lines that do not start with #, i.e. that are not header lines)?
- How many polymorphisms were considered “good” after the filtering steps by GATK VARIANTFILTRATION (i.e. marked `PASS`)?
- How many polymorphisms were considered “bad” and filtered out (display all the lines without the `PASS` tag)?
- Save only the polymorphisms that were considered “good” in a new file called `OgOb-all-MSU7-CHR6.GATKVARIANTFILTRATION.GOOD.vcf`
- Display the size of this new vcf file
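The filtering logic can be tested on a minimal compressed VCF first. In this sketch `toy.vcf.gz` and its three variants are invented, and keeping the header lines in the output file is a choice made here so that the result stays a readable VCF, not something the exercise imposes.

```shell
work=$(mktemp -d)
cd "$work"
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr6\t100\t.\tA\tT\t50\tPASS\t.\n'
  printf 'chr6\t200\t.\tG\tC\t10\tLowQual\t.\n'
  printf 'chr6\t300\t.\tT\tG\t60\tPASS\t.\n'
} | gzip > toy.vcf.gz

zgrep -v "^#" toy.vcf.gz | wc -l             # all variants: non-header lines
zgrep -v "^#" toy.vcf.gz | grep -c "PASS"    # variants marked PASS
zgrep -v "^#" toy.vcf.gz | grep -vc "PASS"   # variants filtered out
zgrep -E "^#|PASS" toy.vcf.gz > toy.GOOD.vcf # header lines plus PASS variants
ls -lh toy.GOOD.vcf                          # size of the new file
```

Removing the header before counting `PASS` avoids counting header lines that may themselves mention the word.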
Practice 14 : Getting basic stats
- Go into the directory `LINUX-TP/Data/fastq/pairedTwoIndividusGzippedIrigin` - `cd`
- List the directory content
- Run the fastq-stats program to get stats about the fastq file `irigin1_1.fastq.gz`
- BONUS: use a `for` loop to run fastq-stats on every fastq file in the directory
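The shape of such a `for` loop can be sketched without fastq-stats. Below, the per-file command is replaced by a simple read count (a fastq record is 4 lines), and the two gzipped files are invented; swapping in the real program is left to the exercise.

```shell
work=$(mktemp -d)
cd "$work"
printf '@r1\nACGT\n+\nIIII\n' | gzip > ind1_1.fastq.gz
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip > ind1_2.fastq.gz

# The loop variable fq takes the name of each matching file in turn.
for fq in *.fastq.gz; do
    # Placeholder for the real per-file command: count reads (4 lines per read).
    lines=$(zcat "$fq" | wc -l)
    echo "$fq: $((lines / 4)) reads"
done > read_counts.txt
cat read_counts.txt
```

Redirecting after `done` collects the output of every iteration in a single file.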
Tips
How to convert between Unix and Windows text files?
The format of Windows and Unix text files differs slightly. In Windows, lines end with both a carriage return and a line feed ASCII character, but Unix uses only a line feed. As a consequence, some Windows applications will not show the line breaks in Unix-format files. Likewise, Unix programs may display the carriage returns in Windows text files as Ctrl-m (^M) characters at the end of each line.
There are several ways to solve this problem: using a compatible text editor, the unix2dos / dos2unix commands, or vi to do the conversion. To use the last two, the files to convert must be on a Linux computer.
Use Notepad++ as file editor on Windows
When using Unix files on Windows, it is useful to convert the line endings so that text files display correctly in other Windows-based or Linux-based editors.
In Notepad++: Edit > EOL Conversion > Windows Format
`unix2dos` & `dos2unix`
# Checking if my fileformat is dos
[tranchant@master0 ~]$ cat -v test.txt
jeidjzdjzd^M
djzoidjzedjzed^M
ndzndioezdnezd^M
# Converting from dos to linux format
[tranchant@master0 ~]$ dos2unix test.txt
dos2unix: converting file test.txt to Unix format ...
[tranchant@master0 ~]$ cat -v test.txt
jeidjzdjzd
djzoidjzedjzed
ndzndioezdnezd
# Converting from linux to dos format
[tranchant@master0 ~]$ unix2dos test.txt
unix2dos: converting file test.txt to DOS format ...
[tranchant@master0 ~]$ cat -v test.txt
jeidjzdjzd^M
djzoidjzedjzed^M
ndzndioezdnezd^M
[tranchant@master0 ~]$
vi
- In vi, you can remove carriage return (^M) characters with the following command: `:1,$s/^M//g`
- To input the ^M character, press Ctrl-v, and then press Enter or Return.
- In vim, use `:set ff=unix` to convert to Unix; use `:set ff=dos` to convert to Windows.
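When neither dos2unix nor vi is at hand, the carriage returns can also be stripped with `tr`. This is a sketch on an invented two-line file; it deletes every carriage return in the file, which is normally what is wanted for a plain text file.

```shell
work=$(mktemp -d)
cd "$work"
printf 'line one\r\nline two\r\n' > dos.txt   # \r\n is the Windows line ending

cat -v dos.txt                    # carriage returns show up as ^M
tr -d '\r' < dos.txt > unix.txt   # delete every carriage return
cat -v unix.txt                   # no ^M left
```

The conversion writes to a new file because `tr` reads its input and writes its output as streams and cannot edit a file in place.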
How to open and read a file through a text editor on a remote Linux server?
vi
nano
Komodo Edit
After installing Komodo Edit, open it and click on Edit –> Preferences
Select Servers from the left and enter sftp account information, then save it.
To edit a remote file, click on File –> Open –> Remote File
Getting Help on any command-line
with the option `--help`
Most commands understand the `--help` option (and often `-h` as well), which produces a short usage description of the command and its options.
[tranchant@master0 ~]$ ls --help
Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of the options -cftuvSUX or --sort
is specified.
Mandatory arguments to long options are mandatory for short options
too.
-a, --all do not ignore entries starting with .
-A, --almost-all do not list . and ..
--author with -l, print the author of each file
-b, --escape print nongraphic characters with
C-style escapes
--block-size=SIZE scale sizes by SIZE before printing
them. For example, '--block-size=M' prints
sizes in units of 1,048,576 bytes;
see the SIZE format below
-B, --ignore-backups do not list entries ending with ~
-c with -lt: show and sort by ctime (time of
last modification of file status
information);
with -l: show ctime and sort by name;
otherwise: sort by ctime
-C list entries by columns
--color[=WHEN] colorize the output; WHEN can be
'never', 'auto' or
'always' (the default); more
information is given below
-d, --directory list directory names, not their contents
...
with the `man` command
Most commands and nearly every application in Linux have a man (manual) page: typing `man command` brings up a longer manual entry for the specified command.
# Type man ls to display the related manual
LS(1) Linux User's Manual LS(1)
NAME
ls, dir, vdir - list directory contents
SYNOPSIS
ls [options] [file...]
dir [file...]
vdir [file...]
POSIX options: [-CFRacdilqrtu1] [--]
GNU options (short form): [-1abcdfgiklmnopqrstuvwxABCDFGHLNQRSUX] [-w cols] [-T cols] [-I pattern]
[--full-time] [--show-control-chars] [--block-size=size] [--format={long,verbose,commas,across,vertical,single-column}]
[--sort={none,time,size,extension}] [--time={atime,access,use,ctime,status}]
[--color[={none,auto,always}]] [--help] [--version] [--]
DESCRIPTION
The ls command first lists all of its file arguments that are not directories. Then ls
lists all the files contained in each specified directory. If no argument other than an
option is given, the argument "." (current directory) is used by default. With the -d option,
directories given as arguments are not treated as directories (their names are listed, not
their contents). A file is listed only if its name does not begin with a dot, or if the -a option is
given.
Each list of files (files other than directories, and the contents of each directory) is
sorted separately according to the collating sequence of the current locale. When the -l option is
.....
Some helpful tips for using the `man` command:
- Arrow keys: move up and down the man page by using the arrow keys.
- q: quit back to the command prompt by typing q.
Links
- Related courses : Linux for Dummies
- Tutorials : Linux Command-Line Cheat Sheet