
South Green Trainings pages

Description: Hands On Lab Exercises for Linux
Related-course materials: Linux for Dummies
Authors: Christine Tranchant-Dubreuil (christine.tranchant@ird.fr)
Creation Date: 26/02/2018
Last Modified Date: 04/04/2023
Modified by: G. Sarah, G. Sempere, N. Tando, C. Tranchant

Summary


Preamble

Getting connected to a Linux server from Windows with the SSH (Secure Shell) protocol

| Software | Description | URL |
|---|---|---|
| mobaXterm | An enhanced terminal for Windows with an X11 server and a tabbed SSH client | More |

Transferring and copying files from your computer to a Linux server with the SFTP (SSH File Transfer Protocol) protocol

| Software | Description | URL |
|---|---|---|
| filezilla | FTP and SFTP client | Download |

Viewing and editing files on your computer before transferring them to the Linux server, or directly on the remote server

| Type | Software | URL |
|---|---|---|
| Remote, console mode | nano | Tutorial |
| Remote, console mode | vi | Tutorial |
| Remote, graphic mode | komodo edit | Download |
| Linux & Windows based editor | Notepad++ | Download |

Practice 1 : Transferring files with filezilla sftp

Download and install FileZilla
Open FileZilla and save the cluster address in the Site Manager

In the FileZilla menu, go to File > Site Manager. Then go through these 5 steps:

  1. Click New Site.
  2. Add a CUSTOM NAME for this site such as IRD_HPC.
  3. Add the HOSTNAME (see table below).
  4. Set the Logon Type to “Normal” and enter the username and password you use to connect to the IRD HPC.
  5. Press the “Connect” button.

| Cluster HPC | hostname |
|---|---|
| IRD HPC | bioinfo-nas.ird.fr |

Transferring files

  1. From your computer to the cluster: click and drag a text file from the left column to the right column.
  2. From the cluster to your computer: click and drag a text file from the right column to the left column.

Practice 2 : Connecting to a Linux server with SSH

In mobaXterm:

  1. Click the session button, then click SSH.
    • In the remote host text box, type: HOSTNAME (see table below)
    • Check the specify username box and enter your user name
  2. In the console, enter the password when prompted.

| Cluster HPC | hostname |
|---|---|
| IRD HPC | bioinfo-nas.ird.fr |

Practice 3 : First steps : prompt & pwd

# get the file from the web
wget http://itrop.ird.fr/LINUX-TP/LINUX-TP.tar.gz

# decompress the gzip file
tar -xzvf LINUX-TP.tar.gz
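
It can help to check where you are and what an archive contains before extracting it. The following is a minimal sketch that builds a small throwaway archive on the spot so that it runs without network access; the names demo and demo.tar.gz are hypothetical stand-ins for LINUX-TP and LINUX-TP.tar.gz.

```shell
# Work in a temporary directory so nothing in your home is touched
workdir=$(mktemp -d)
cd "$workdir"

# Build a tiny archive standing in for LINUX-TP.tar.gz (hypothetical content)
mkdir demo
echo ">seq1" > demo/test.fasta
tar -czf demo.tar.gz demo
rm -r demo

# pwd prints the absolute path of the current directory
pwd

# -t lists the archive content without extracting anything
tar -tzf demo.tar.gz

# -x extracts, -z handles gzip, -v prints each file, -f names the archive
tar -xzvf demo.tar.gz
```

The same options apply to LINUX-TP.tar.gz once it has been downloaded with wget.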


Practice 4 : List the files using ls command


Practice 5 : List the files using ls command and metacharacter *
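
As a rough illustration of ls combined with the * metacharacter, here is a minimal sketch run on empty files created in a temporary directory. The file names echo those of the LINUX-TP tree, but the files themselves and notes.txt are assumptions made for the example.

```shell
# Create a few empty files to play with
workdir=$(mktemp -d)
cd "$workdir"
touch ebola1.fastq ebola2.fastq ebola1.fq notes.txt

# List everything in the current directory
ls

# * matches any run of characters: all files ending in .fastq
ls *.fastq

# All files starting with ebola1, whatever the extension
ls ebola1.*

# Long listing (permissions, size, date) restricted to .fq files
ls -l *.fq

# Count the files matched by the pattern
n_fastq=$(ls *.fastq | wc -l)
echo "$n_fastq"
```

Note that the shell expands the pattern before ls runs, so ls only ever sees the matching file names.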


Practice 6 : Moving into file system using cd and ls command
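
Moving around with cd can be sketched as follows, on an empty directory skeleton created in a temporary location. The directory names mirror part of the LINUX-TP tree; the skeleton itself is an assumption made so the example is self-contained.

```shell
# Recreate a small part of the LINUX-TP layout
workdir=$(mktemp -d)
mkdir -p "$workdir/LINUX-TP/Data/fastq" "$workdir/LINUX-TP/Fasta"

# Absolute path: works from anywhere
cd "$workdir/LINUX-TP/Data/fastq"
pwd

# .. is the parent directory: go up two levels, back to LINUX-TP
cd ../..
ls

# Relative path: go down into Fasta from the current directory
cd Fasta
here=$(basename "$(pwd)")
echo "$here"

# cd - returns to the previous directory (and prints it)
cd -
```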


NOTE

Test the command tree

[tranchant@node6 LINUX-TP]$ tree
.
├── AllEst.fasta
├── Bank
│   ├── referenceArcad.fasta
│   ├── referenceIrigin.dict
│   ├── referenceIrigin.fasta
│   ├── referenceIrigin.fasta.fai
│   ├── referencePindelChr1.fasta
│   ├── referencePindelChr1.fasta.fai
│   ├── referenceRnaseq.fa
│   └── referenceRnaseqGFF.gff3
├── Data
│   ├── fastq
│   │   ├── assembly
│   │   │   ├── ebolaAssembly
│   │   │   │   ├── ebola1.fastq
│   │   │   │   ├── ebola1.fq
│   │   │   │   ├── ebola2.fastq
│   │   │   │   └── ebola2.fq
│   │   │   └── pairedOneIndivuPacaya
│   │   │       ├── g02L5Mapped_R1.fq
│   │   │       └── g02L5Mapped_R2.fq
...
│   │   ├── tlara_tRNA_aln10.output.gz
│   │   ├── tlara_tRNA_aln50.output.gz
│   │   ├── tlara_tRNA_aln51.output.gz
│   │   ├── two_profiles.template_file
│   │   └── x.gz
│   └── vcf
│       ├── duplicVCF
│       │   ├── smallDuplic-filtered.vcf
│       │   └── smallDuplic.vcf
│       ├── singleVCF
│       │   └── GATKVARIANTFILTRATION.vcf
│       ├── testsnmf.geno
│       ├── vcfForRecalibration
│       │   └── control.vcf
│       └── vcfForSNiPlay
│           └── testsnmf.vcf
├── Fasta
│   ├── C_AllContigs.fasta
│   ├── contig_tgicl.fasta
│   ├── enterobacteries.fasta
│   ├── sequence.fasta
│   └── uniprot_sprot.fasta
├── Script
│   ├── array.pl
│   ├── codon_usage.pl
│   ├── hash.pl
│   ├── helloWorld.pl
│   ├── loops-for.pl
│   ├── matching.pl
│   ├── readFasta.pl
│   ├── retrieve-accession.pl
│   ├── sorting-array.pl
│   ├── string-array.pl
│   └── transliterate.pl
└── transcritsAssembly.fasta

29 directories, 253 files
[tranchant@node6 LINUX-TP]$ 


Practice 7 : Manipulating Files and Folders

We will prepare for the BLAST analysis performed later by creating directories and moving files as shown in the image below:
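
The kind of commands involved can be sketched as below. The target layout (blastAnalysis, bank, results) is hypothetical and is not taken from the course image; only the file names AllEst.fasta and transcritsAssembly.fasta come from the LINUX-TP tree, and they are recreated here as one-line dummy files.

```shell
# Dummy input files in a temporary directory
workdir=$(mktemp -d)
cd "$workdir"
echo ">est1" > AllEst.fasta
echo ">tr1" > transcritsAssembly.fasta

# -p creates parent directories as needed and does not fail if they exist
mkdir -p blastAnalysis/bank blastAnalysis/results

# cp keeps the original in place; mv moves (or renames) it
cp AllEst.fasta blastAnalysis/bank/
mv transcritsAssembly.fasta blastAnalysis/

# Check the result recursively
ls -R blastAnalysis
```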

Practice 8 : Searching with grep

wget gff_url
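
Once a GFF file is available, typical grep searches look like the sketch below. It runs on a three-line, tab-separated stand-in file (demo.gff3, with made-up features) rather than on the file downloaded above.

```shell
# Tiny GFF3 stand-in (hypothetical content)
workdir=$(mktemp -d)
cd "$workdir"
printf 'Chr1\tMSU7\tgene\t100\t900\t.\t+\t.\tID=g1\n'  > demo.gff3
printf 'Chr1\tMSU7\tmRNA\t100\t900\t.\t+\t.\tID=m1\n' >> demo.gff3
printf 'Chr2\tMSU7\tgene\t50\t400\t.\t-\t.\tID=g2\n'  >> demo.gff3

# Lines containing the word "gene" (-w matches whole words only)
grep -w "gene" demo.gff3

# -c counts matching lines instead of printing them
n_genes=$(grep -c -w "gene" demo.gff3)

# -v inverts the match; ^ anchors the pattern at the start of the line
n_not_chr1=$(grep -v -c "^Chr1" demo.gff3)

# -i ignores case, -n prefixes each hit with its line number
grep -i -n "mrna" demo.gff3

echo "$n_genes genes, $n_not_chr1 lines outside Chr1"
```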

Practice 9 : Blast analysis

Connection to bioinfo-inter.ird.fr

Open another terminal or mobaxterm session but this time choose the bioinfo-inter.ird.fr server.

Preparing working environment

Before launching BLAST, you have to prepare your working environment (even though we will not use Slurm):

module load bioinfo/blast/2.12.0+
Creating a custom database with makeblastdb

Because we are using a custom database for the first time, we first have to build it from our FASTA file AllEst.fasta with the makeblastdb command.

makeblastdb -in AllEst.fasta -dbtype nucl -parse_seqids
BLASTing against our remote database
blastn -query [fastaFile] -db [databaseFile] -out [resultFile]
blastn -query [fastaFile] -db [databaseFile] -outfmt [0-11] -out [resultFile]
blastn -query [fastaFile] -db [databaseFile] -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore' -out [resultFile]

Output formats

The output format is set with the -outfmt flag followed by a number that denotes the requested format:

0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1,
10 = Comma-separated values,
11 = BLAST archive format (ASN.1)

Tabular output format (6 or 7): one line per result, split into 12 fields.

1. query id
2. subject id
3. percent identity
4. alignment length
5. number of mismatches
6. number of gap openings
7. query start
8. query end
9. subject start
10. subject end
11. expect value
12. bit score
Parsing the results file
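
Because the tabular format has one hit per line and fixed field positions, standard tools such as awk, sort and cut are enough for a first pass. The sketch below works on three made-up hits written in the 12-field layout listed above; the file names and the thresholds (95 % identity, 100 bp) are assumptions chosen for illustration.

```shell
# Three hypothetical hits in -outfmt 6 layout (tab-separated, 12 fields)
workdir=$(mktemp -d)
cd "$workdir"
printf 'q1\ts1\t98.50\t200\t3\t0\t1\t200\t10\t209\t1e-100\t350\n'  > demo.blast.tsv
printf 'q1\ts2\t85.00\t150\t20\t2\t1\t150\t5\t154\t1e-30\t120\n'  >> demo.blast.tsv
printf 'q2\ts3\t99.00\t300\t3\t0\t1\t300\t1\t300\t1e-150\t540\n' >> demo.blast.tsv

# Keep hits with percent identity (field 3) >= 95 and length (field 4) >= 100
awk -F'\t' '$3 >= 95 && $4 >= 100' demo.blast.tsv > filtered.tsv

# Sort by bit score (field 12), highest first, and keep the best subject id
tab=$(printf '\t')
best_hit=$(sort -t "$tab" -k12,12nr filtered.tsv | head -n 1 | cut -f2)
echo "$best_hit"

# Number of distinct queries that still have at least one hit
n_queries=$(cut -f1 filtered.tsv | sort -u | wc -l)
echo "$n_queries"
```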

Practice 10 : Redirecting a command output to a File with >
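
A minimal sketch of the redirection operators, using empty files created for the occasion (all file names here are hypothetical):

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch a.fasta b.fasta c.txt

# > writes the output to a file, replacing any previous content
ls *.fasta > fasta.list

# >> appends to the end of the file instead of overwriting it
echo "c.txt" >> fasta.list

# 2> redirects error messages (here from a file that does not exist);
# || true only keeps the failing ls from stopping a script
ls missing.file 2> errors.log || true

cat fasta.list
```

Since > silently overwrites an existing file, it is worth double-checking the target name before pressing Enter.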


Practice 11 : Sending data from one command to another (piping) with |

module load bioinfo/seqtk/1.3-r106
seqtk 
seqtk subseq
seqtk subseq [bank.fasta] [ests.id] | head
seqtk subseq [bank.fasta] [ests.id] > ests.fasta
ests.id: the file containing the names of the sequences to extract
bank.fasta: the file containing the sequences that we want to extract from
seqtk comp  FASTA_FILE | head
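
Pipes also work with standard tools alone. The sketch below assumes a two-column, tab-separated file of query and subject names (hits.tsv, with made-up content) and shows one possible way to produce a list of unique names such as ests.id; it is not necessarily how the course builds that file.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf 'q1\tEST_A\nq2\tEST_B\nq3\tEST_A\nq4\tEST_C\nq5\tEST_A\n' > hits.tsv

# cut extracts column 2, sort groups identical names, uniq -c counts them,
# and the last sort ranks the counts from highest to lowest
cut -f2 hits.tsv | sort | uniq -c | sort -k1,1nr | head -n 3

# Same idea to build a list of unique sequence names, one per line
cut -f2 hits.tsv | sort -u > ests.id

n_ids=$(wc -l < ests.id)
top_id=$(cut -f2 hits.tsv | sort | uniq -c | sort -k1,1nr | head -n 1 | awk '{print $2}')
echo "$n_ids unique names, most frequent: $top_id"
```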

Practice 12 : Dealing with vcf Files

For example

ln -s /scratch2/VCF_LINUX_FORMATIONX/OgOb-all-MSU7-CHR6.GATKVARIANTFILTRATION.vcf.gz OgOb-all-MSU7-CHR6.GATKVARIANTFILTRATION.LINK.vcf.gz

Thus, OgOb-all-MSU7-CHR6.GATKVARIANTFILTRATION.LINK.vcf.gz is the name of the new symbolic link, which points to the file named OgOb-all-MSU7-CHR6.GATKVARIANTFILTRATION.vcf.gz.
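
To see how a symbolic link behaves without access to /scratch2, here is a minimal sketch on a throwaway compressed file; the names shared, demo.vcf.gz and demo.LINK.vcf.gz are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir shared
echo "##fileformat=VCFv4.2" | gzip > shared/demo.vcf.gz

# ln -s TARGET LINK_NAME: the link is a small pointer, not a copy
ln -s "$workdir/shared/demo.vcf.gz" demo.LINK.vcf.gz

# ls -l shows the link and the file it points to (link -> target)
ls -l demo.LINK.vcf.gz

# readlink prints the target stored in the link
target=$(readlink demo.LINK.vcf.gz)

# Reading through the link reads the original file
first_line=$(gzip -dc demo.LINK.vcf.gz | head -n 1)
echo "$first_line"
```

Deleting the link with rm removes only the pointer; the original file is left untouched.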


Practice 13 : Filtering VCF files | - zgrep

To get some basic stats on the output VCF files, let’s use Linux commands!
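
A few such counts can be sketched with zgrep, which searches gzip-compressed files directly. The example below runs on a made-up three-variant file (demo.vcf.gz) and assumes the standard VCF column order, in which FILTER is the seventh field.

```shell
workdir=$(mktemp -d)
cd "$workdir"
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'Chr6\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf 'Chr6\t250\t.\tC\tT\t12\tLowQual\t.\n'
  printf 'Chr6\t900\t.\tG\tA\t80\tPASS\t.\n'
} | gzip > demo.vcf.gz

# Header lines start with #; -c counts, -v inverts the match
n_header=$(zgrep -c '^#' demo.vcf.gz)
n_variants=$(zgrep -c -v '^#' demo.vcf.gz)

# Variants whose FILTER column (field 7) is PASS
n_pass=$(zgrep -v '^#' demo.vcf.gz | awk -F'\t' '$7 == "PASS"' | wc -l)

echo "$n_header header lines, $n_variants variants, $n_pass PASS"
```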


Practice 14 : Getting basic stats

fastq-stats -D irigin1_1.fastq.gz
# Run fastq-stats on every file ending in fastq; quoting "$file" protects names with spaces
for file in *fastq; do
  fastq-stats -D "$file" > "$file.fastq-stats"
done
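
When fastq-stats is not available, the same loop pattern works with standard tools. The sketch below counts reads by dividing the number of lines by four; it assumes uncompressed FASTQ files with exactly four lines per read, and the sample files are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' > sample1.fastq
printf '@r1\nGGCC\n+\nIIII\n' > sample2.fastq

# The redirection after "done" collects the output of the whole loop
for file in *.fastq; do
  n_lines=$(wc -l < "$file")
  # Assumes 4 lines per read (no wrapped sequences)
  echo "$file $((n_lines / 4))"
done > read_counts.txt

cat read_counts.txt
```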

Tips

How to convert between Unix and Windows text files?

The format of Windows and Unix text files differs slightly. In Windows, lines end with both the line feed and carriage return ASCII characters, but Unix uses only a line feed. As a consequence, some Windows applications will not show the line breaks in Unix-format files. Likewise, Unix programs may display the carriage returns in Windows text files with Ctrl-m (^M) characters at the end of each line.

There are several ways to solve this problem: using a text editor that handles both formats, the unix2dos / dos2unix commands, or vi. To use the last two, the files to convert must be on a Linux computer.

Using Notepad++ as a file editor on Windows

When using Unix files on Windows, it is useful to convert the line endings so that text files display correctly in other Windows-based or Linux-based editors.

In Notepad++: Edit > EOL Conversion > Windows Format

unix2dos & dos2unix

# Checking if my fileformat is dos
[tranchant@master0 ~]$ cat -v test.txt
jeidjzdjzd^M
djzoidjzedjzed^M
ndzndioezdnezd^M

# Converting from dos to linux format
[tranchant@master0 ~]$ dos2unix test.txt
dos2unix: converting file test.txt to Unix format ...
[tranchant@master0 ~]$ cat -v test.txt
jeidjzdjzd
djzoidjzedjzed
ndzndioezdnezd

# Converting from linux to dos format
[tranchant@master0 ~]$ unix2dos test.txt
unix2dos: converting file test.txt to DOS format ...
[tranchant@master0 ~]$ cat -v test.txt
jeidjzdjzd^M
djzoidjzedjzed^M
ndzndioezdnezd^M
[tranchant@master0 ~]$
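
On a machine where dos2unix is not installed, the carriage returns can also be removed with tr. This is a minimal sketch on a two-line file created for the example (dos.txt and unix.txt are hypothetical names); note that tr deletes every carriage return in the file, not only those at the end of a line.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Create a two-line file with Windows (CRLF) line endings
printf 'line one\r\nline two\r\n' > dos.txt

# cat -v shows each carriage return as ^M
cat -v dos.txt

# tr -d '\r' deletes the carriage returns; the result goes to a new file
tr -d '\r' < dos.txt > unix.txt

# Count the lines that still contain a carriage return
cr=$(printf '\r')
n_before=$(grep -c "$cr" dos.txt)
n_after=$(grep -c "$cr" unix.txt || true)
echo "before: $n_before, after: $n_after"
```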
vi

How to open and read a file with a text editor on a remote Linux server?
vi

Manual

nano

Manual

Komodo Edit

After installing Komodo Edit, open it and click on Edit –> Preferences

Select Servers from the left and enter sftp account information, then save it.

To edit a remote file, click on File –> Open –> Remote File


Getting Help on any command-line

with the option --help

Most commands understand the --help option, which produces a short usage description of the command and its options. Many also accept -h as a shortcut, but not all: for ls, for example, -h means "human-readable sizes".


[tranchant@master0 ~]$ ls --help
Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of the options -cftuvSUX nor --sort
is used.

Mandatory arguments to long options are mandatory for short options too.
  -a, --all                  do not ignore entries starting with .
  -A, --almost-all           do not list . or ..
      --author               with -l, print the author of each file
  -b, --escape               print nongraphic characters with C-style
                               escapes
      --block-size=SIZE      scale sizes by SIZE before printing
                               them. For example, '--block-size=M' prints
                               sizes in units of 1,048,576 bytes;
                               see the SIZE format below
  -B, --ignore-backups       do not list entries ending with ~
  -c                         with -lt: show and sort by ctime (time of
                               last modification of file status
                               information);
                               with -l: show ctime and sort by name;
                               otherwise: sort by ctime
  -C                         list entries by columns
      --color[=WHEN]         colorize the output; WHEN can be
                               'never', 'auto' or
                               'always' (the default); more
                               information is given below
  -d, --directory            list directory names, not their contents
...
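
Because help texts are often long, it is common to pipe them through head or grep. A minimal sketch, assuming a GNU version of ls (other implementations may not support --help or may word the text differently):

```shell
# Print only the first lines of the help text
ls --help 2>&1 | head -n 5

# Search the help text for one option; -- ends grep's own options
ls --help 2>&1 | grep -- '--all'

# Keep the help text in a variable (or a file) for later reading
help_text=$(ls --help 2>&1)
```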


with the man command

Most commands and applications in Linux come with a man (manual) page: typing man followed by the command name brings up a longer manual entry for that command.


# Type man ls to display the related manual

LS(1)                                      Linux User's Manual                                      LS(1)

NAME
       ls, dir, vdir - list directory contents

SYNOPSIS
       ls [options] [file...]
       dir [file...]
       vdir [file...]

       POSIX options: [-CFRacdilqrtu1] [--]

       GNU options (short form): [-1abcdfgiklmnopqrstuvwxABCDFGHLNQRSUX]  [-w  cols]  [-T  cols] [-I pattern]
       [--full-time]  [--show-control-chars]   [--block-size=size]   [--format={long,verbose,commas,across,verti‐
       cal,single-column}]       [--sort={none,time,size,extension}]       [--time={atime,access,use,ctime,status}]
       [--color[={none,auto,always}]] [--help] [--version] [--]

DESCRIPTION
       The ls command first lists all of its file arguments that are not directories. Then ls lists all the
       files contained in each specified directory. If no argument other than an option is given, the
       argument "." (current directory) is used by default. With the -d option, directories given as
       arguments are not treated as directories (their names are shown, not their contents). A file is
       listed only if its name does not begin with a dot, or if the -a option is given.

       Each list of files (files other than directories, and the contents of each directory) is sorted
       separately according to the collation sequence of the current locale. When the -l option is
       .....

Some helpful tips for using the man command :



License

The resource material is licensed under the Creative Commons Attribution 4.0 International License.