Description | Hands On Lab Exercises for Metabarcoding |
---|---|
Authors | J Orjuela (julie.orjuela@ird.fr), A Dereeper (alexis.dereeper@ird.fr), F Constancias (florentin.constancias@cirad.fr), J Reveillaud (JR) (julie.reveillaud@inra.fr), M Simonin (marie.simonin@ird.fr), F Mahé (frederic.mahe@cirad.fr), A Comte (aurore.comte@ird.fr) |
Creation Date | 18/04/2018 |
Last Modified Date | 22/05/2019 |
Summary
- Practice 1: Obtaining an OTU table with FROGS in Galaxy
- Practice 1.1: Preprocessing
- Practice 1.2: Clustering
- Practice 1.3: Stats on clustering (optional)
- Practice 1.4: Remove chimera
- Practice 1.5: OTU Filtering
- Practice 1.6: Stats on clustering (optional)
- Practice 1.7: Taxonomic affiliation
- Practice 1.8: Affiliation stats
- Practice 1.9: BIOM format standardization
- Practice 1.10: Building a Tree
- Practice 1.11: Phyloseq stats in FROGSSTAT
- Practice 1.12: Workflow in Galaxy
- Practice 2: FROGs in command line
- Practice 3: Handling and visualizing OTU table using PhyloSeq R package
- Links
- License
Practice 1 : Obtaining an OTU table with FROGS in Galaxy
In this training we will first perform a metabarcoding analysis with the FROGS v3.1 pipeline (https://github.com/geraldinepascal/FROGS) in the Galaxy environment. We will then run a similar analysis on the command line on the i-Trop HPC cluster.
- Connect to Galaxy i-Trop with your formationN account.
- Create a new history and import the metabarcoding sample datasets (paired-end fastq files compressed with tar) from Shared Data / Data libraries / formation Galaxy 2019 / Metabarcoding. Retrieve DATA_s.tar.gz and Summary.txt.
- The fastq files used here are a subset of the reads obtained in the metagenomic study of Edwards et al. (2015), covering 4 soil compartments of a rice culture: Rhizosphere, Rhizoplane, Endosphere and Bulk_Soil.
We will launch every step of a metabarcoding analysis as follows:
1.1 Preprocess
- Merge paired reads and dereplicate using the Preprocessing tool with FLASH as merge software: FROGS Pre-process
- => Read size is 250 bp; the expected, minimum and maximum amplicon sizes are 250, 100 and 350 bp respectively. Use the custom sequencing protocol and a mismatch rate of 0.15.
- How many sequences have been overlapped?
- How many sequences remain after dereplication?
- What amplicon size is obtained in the majority of merged sequences?
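To make the dereplication step concrete, here is a minimal Python sketch of the idea: identical merged sequences are collapsed into one entry that keeps its abundance. This is only an illustration, not the code FROGS runs; the `dereplicate` function and the example reads are hypothetical.

```python
from collections import Counter

def dereplicate(sequences):
    """Collapse identical sequences and return (sequence, count) pairs,
    most abundant first (ties broken alphabetically)."""
    counts = Counter(sequences)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical merged reads, not taken from the training dataset.
merged_reads = ["ACGT", "ACGT", "ACGA", "ACGT", "TTTT", "ACGA"]
unique_reads = dereplicate(merged_reads)
print(len(merged_reads), "reads ->", len(unique_reads), "unique sequences")
```

The number of unique sequences is what the question about sequences remaining after dereplication refers to.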
1.2 Clustering
- Build the clustering using swarm: FROGS Clustering swarm
- => Use an aggregation distance of 1. Do not use the denoising option.
- The biom file shows the abundance of each cluster.
- The fasta file contains the cluster (OTU) representative sequences.
- The tsv file shows what sequences are contained in each cluster.
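The effect of an aggregation distance of 1 can be sketched in Python as follows. This is a deliberately simplified illustration and not swarm's actual implementation: it grows each cluster from its most abundant sequence by repeatedly attaching any sequence that differs by at most one substitution, insertion or deletion, and it ignores swarm's abundance-based refinements. The function names and the example abundances are hypothetical.

```python
def within_one_edit(a, b):
    """True if a and b differ by at most one substitution, insertion or deletion."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution
    return a[i:] == b[i + 1:]          # one insertion in b

def cluster(abundances):
    """abundances: dict sequence -> count. Returns a list of clusters,
    each a list of sequences starting with the seed."""
    remaining = sorted(abundances, key=lambda s: (-abundances[s], s))
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        members, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            linked = [s for s in remaining if within_one_edit(current, s)]
            for s in linked:
                remaining.remove(s)
            members.extend(linked)
            frontier.extend(linked)
        clusters.append(members)
    return clusters

# Hypothetical dereplicated sequences with their abundances.
abundances = {"ACGT": 10, "ACGA": 3, "ACTA": 1, "TTTT": 5}
clusters = cluster(abundances)
print(clusters)
```

In this toy example ACTA is two differences away from the seed ACGT, yet it joins the same cluster because it is one difference away from ACGA. This chaining is why a local distance of 1 can still produce large clusters.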
1.3 Stats on clustering (optional)
- Obtain statistics about the abundance of sequences in clusters: FROGS Clusters stat
- How many clusters were obtained by swarm?
- How many sequences are contained in the biggest cluster?
- How many clusters contain only one sequence?
- Observe the cumulative sequence proportion by cluster size
- Observe cluster sharing between samples through hierarchical clustering tree
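The quantities asked about above (number of clusters, size of the biggest cluster, number of singletons, cumulative sequence proportion) can be computed from a simple list of cluster sizes. The sketch below uses hypothetical sizes and a hypothetical `cluster_stats` helper; it does not read the FROGS outputs.

```python
def cluster_stats(cluster_sizes):
    """Summarise a list of cluster sizes (number of sequences per cluster)."""
    total = sum(cluster_sizes)
    ordered = sorted(cluster_sizes, reverse=True)
    cumulative = []
    running = 0
    for size in ordered:
        running += size
        cumulative.append(running / total)
    return {
        "n_clusters": len(ordered),
        "largest": ordered[0],
        "singletons": sum(1 for size in ordered if size == 1),
        # Share of all sequences held by the k largest clusters, for k = 1, 2, ...
        "cumulative_proportion": cumulative,
    }

# Hypothetical cluster sizes.
sizes = [1, 50, 1, 30, 1, 17]
stats = cluster_stats(sizes)
print(stats)
```

A typical pattern in real data is that a few large clusters hold most of the sequences while singletons make up most of the clusters.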
1.4 Remove chimera
- Remove chimera using the biom obtained from swarm: FROGS Remove chimera
- What proportion of clusters were kept in this step?
1.5 OTU Filtering
- Filter OTUs on several criteria: FROGS Filters
- Eliminate OTUs with a low number of sequences (minimum abundance of 0.005%) and keep OTUs present in at least two samples.
- How many OTUs were removed in this step?
- How many OTUs were removed because of low abundance?
- Relaunch OTU Filtering but using abundance at 0.01%. How many OTUs were removed because of low abundance?
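The two filtering criteria can be sketched in Python as below. The sketch assumes that the abundance threshold is expressed as a proportion of the total number of sequences in the table (so 0.005% is written 0.00005); the `filter_otus` function and the counts are hypothetical and only illustrate the logic.

```python
def filter_otus(table, min_proportion, min_samples):
    """table: dict OTU name -> list of per-sample counts.
    Keep OTUs that reach min_proportion of all sequences
    and are present in at least min_samples samples."""
    total = sum(sum(counts) for counts in table.values())
    kept = {}
    for otu, counts in table.items():
        abundant = sum(counts) / total >= min_proportion
        prevalent = sum(1 for c in counts if c > 0) >= min_samples
        if abundant and prevalent:
            kept[otu] = counts
    return kept

# Hypothetical OTU table with three samples (100000 sequences in total).
otu_table = {
    "Cluster_1": [60000, 39000, 0],
    "Cluster_2": [500, 0, 0],   # abundant, but present in one sample only
    "Cluster_3": [2, 2, 0],     # present in two samples, but too rare
    "Cluster_4": [300, 196, 0],
}
kept = filter_otus(otu_table, min_proportion=0.00005, min_samples=2)
print(sorted(kept))
```

Raising the threshold to 0.01% (0.0001) requires twice as many sequences per OTU, which is why the second filtering run is expected to remove at least as many OTUs as the first.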
1.6 Stats on clustering (optional)
- Rerun the cluster statistics after filtering: FROGS Clusters stat
- Look at the effect of filtering on the cumulative proportion by cluster size.
1.7 Taxonomic affiliation
- Perform taxonomic affiliation of each OTU by BLAST: FROGS Affiliation OTU
- Use the SILVA 132 16S database for taxonomic assignment by BLAST.
- Activate RDP assignment.
- How many OTUs were taxonomically assigned to species?
- Visualize the biom file enriched with taxonomic information.
1.8 Affiliation stats
- Obtain affiliation statistics: FROGS Affiliation stat
- Use rarefaction ranks: Family Genus Species
- Observe the global distribution of taxonomies by sample.
- Look at the rarefaction curve, which shows how the observed diversity grows with the number of sequences sampled.
1.9 BIOM format standardization
- Retrieve a standardized BIOM file using FROGS BIOM to std BIOM
- You now have a standard BIOM file for phyloseq analysis.
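For readers curious about what such a file contains, the sketch below parses a tiny hand-written table in the BIOM 1.0 JSON layout, where a sparse matrix is stored as `[row, column, value]` triplets. It assumes a sparse BIOM 1.0 JSON file; the table content and the `biom_to_dense` helper are hypothetical, and a real analysis would use a dedicated BIOM reader instead.

```python
import json

# Hypothetical, hand-written BIOM 1.0 table (2 OTUs x 3 samples).
biom_text = """
{
  "id": "example",
  "format": "Biological Observation Matrix 1.0.0",
  "type": "OTU table",
  "matrix_type": "sparse",
  "matrix_element_type": "int",
  "shape": [2, 3],
  "rows": [{"id": "Cluster_1", "metadata": null},
           {"id": "Cluster_2", "metadata": null}],
  "columns": [{"id": "S1", "metadata": null},
              {"id": "S2", "metadata": null},
              {"id": "S3", "metadata": null}],
  "data": [[0, 0, 12], [0, 2, 5], [1, 1, 7]]
}
"""

def biom_to_dense(biom):
    """Expand the sparse [row, column, value] triplets into a dense matrix."""
    n_rows, n_cols = biom["shape"]
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in biom["data"]:
        dense[row][col] = value
    return dense

biom = json.loads(biom_text)
dense = biom_to_dense(biom)
otu_ids = [row["id"] for row in biom["rows"]]
sample_ids = [column["id"] for column in biom["columns"]]
print(otu_ids, sample_ids, dense)
```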
1.10 Building a Tree
- Build a tree with MAFFT and FastTree using FROGS Tree on filter.fasta and filter.biom
1.11 Phyloseq stats in FROGSSTAT
- Import data in R with FROGSSTAT Phyloseq Import, using the standard BIOM file and the summary.txt file, without normalisation.
- Make taxonomic barcharts (kingdom level) with FROGSSTAT Phyloseq Composition Visualisation, using env_material as grouping variable and the R data object.
- Compute alpha diversity with FROGSSTAT Phyloseq Alpha Diversity. Calculate the Observed, Chao1 and Shannon diversity indices. Use env_material as environment variable.
- Compute beta diversity with FROGSSTAT Phyloseq Beta Diversity. Use env_material as grouping variable, the R data object and 'Other methods': cc, unifrac.
- Build a heat map plot and ordination with FROGSSTAT Phyloseq Structure Visualisation. Use env_material as grouping variable, the R data object and the beta-diversity unifrac.tsv output.
- Perform hierarchical clustering of samples on the UniFrac distance matrix with FROGSSTAT Phyloseq Sample Clustering. Use env_material as grouping variable, the R data object and the beta-diversity unifrac.tsv output.
- Calculate an ANOVA using the UniFrac distance matrix with FROGSSTAT Phyloseq Anova.
1.12 Workflow in Galaxy
Import a preformatted FROGS workflow from Galaxy: go to Shared Data / Workflows / FROGS and import it. This workflow contains all the steps used above. Feel free to modify it and launch it if you want.
Practice 2 : Launch FROGs in command line
Pipeline in bash format (for command line use):
https://github.com/SouthGreenPlatform/trainings/blob/gh-pages/files/run_frogs_pipelinev3.sh
- Connect to your account on the IRD i-Trop cluster over ssh: ssh formationX@bioinfo-master.ird.fr
- Input data: DATA_s.tar.gz and summary.txt are accessible from the nas:/data2/formation/TPMetabarcoding/FROGS/ folder.
- Create a TP-FROGS directory in your $HOME and go inside.
- Download the LaunchFROGs_v3.sh script and give it execution rights.
- Visualise the LaunchFROGs_v3.sh script.
- Launch LaunchFROGs_v3.sh in qsub mode. Give your user name to this script as parameter.
- Retrieve the output directory and transfer it to your local machine using FileZilla or scp.
  - If you use scp, run the transfer from a terminal on your local machine.
Otherwise, use this download link: https://elwe.rhrk.uni-kl.de/outgoing/OUTPUT_FROGSV3.zip (valid until 2019-06-01)
Practice 3 : Phyloseq tutorial for the metabarcoding training
3.1 Setup your environment
Start with a clean session
Load the packages
Let’s define the working directory on your local computer
3.2 Building a Phyloseq object
… and load the data generated using FROGS
Data is a phyloseq object
We can access the ‘OTU’ / sample occurrence table with the following command
You can also use tidyr syntax to make your code neat and tidy
In R, type ? followed by the function name to get some help
Question1 : What is the sequencing depth of the samples ?
Phyloseq has some built-in functions to explore the data
Let’s plot the sorted sequencing depth
Question2 : How many reads represent each of the first 10 OTUs (i.e., swarm’s clusters) ?
We can access the taxonomic information of the different OTUs with the following command
Metadata are also stored in the data phyloseq object
Phyloseq has some built-in functions
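The two questions in this section boil down to row and column sums of the OTU table. As a language-neutral illustration, the Python sketch below computes them on a small hypothetical table; it mirrors what phyloseq's `sample_sums()` and `taxa_sums()` report but does not use phyloseq, and all names and counts are made up.

```python
# Hypothetical OTU table: OTU name -> reads in samples S1, S2, S3.
sample_names = ["S1", "S2", "S3"]
otu_counts = {
    "Cluster_1": [120, 80, 100],
    "Cluster_2": [30, 0, 45],
    "Cluster_3": [0, 20, 5],
}

def reads_per_sample(table, names):
    """Sequencing depth of each sample (sum over OTUs)."""
    return {name: sum(counts[i] for counts in table.values())
            for i, name in enumerate(names)}

def reads_per_otu(table):
    """Total number of reads of each OTU (sum over samples)."""
    return {otu: sum(counts) for otu, counts in table.items()}

depth = reads_per_sample(otu_counts, sample_names)
otu_totals = reads_per_otu(otu_counts)
print(sorted(depth.items(), key=lambda item: item[1]))  # sorted sequencing depth
print(otu_totals)
```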
3.3 Distribution per OTU and per sample
Let’s plot the sequence distribution per OTU and per sample. First, create a dataframe with three columns: nreads (the sorted number of reads per OTU), sorted (the index of the sorted OTU) and type (set to OTU).
These are the first rows of our dataframe
We can plot this dataframe using ggplot
Now we are going to create another dataframe with the sequencing depth per sample, using sample_sums()
Let’s bind the two tables
Check the first rows
Check the last rows
We can plot the data using ggplot and wrap the data according to type column (that’s why we specified OTU and Samples )
3.4 Rarefaction curves
Let’s explore the rarefaction curves i.e., OTU richness vs sequencing depth
We can do something nicer with ggplot
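To see where the shape of a rarefaction curve comes from, the expected OTU richness in a random subsample of n reads can be computed exactly with the classical hypergeometric formula E[S_n] = Σ_i (1 − C(N − N_i, n) / C(N, n)), where N_i is the read count of OTU i and N the total. The Python sketch below applies it to hypothetical counts; it is an illustration of the concept, not the code used in this tutorial.

```python
from math import comb

def expected_richness(counts, depth):
    """Expected number of OTUs seen when drawing `depth` reads
    without replacement from a sample with the given OTU counts."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth cannot exceed the number of reads in the sample")
    # comb(total - c, depth) is 0 when fewer than `depth` reads belong to other OTUs.
    return sum(1 - comb(total - c, depth) / comb(total, depth)
               for c in counts if c > 0)

# Hypothetical read counts of three OTUs in one sample.
counts = [5, 3, 2]
curve = [expected_richness(counts, n) for n in range(sum(counts) + 1)]
print(curve)
```

The curve starts at 0, rises quickly while abundant OTUs are being discovered, and flattens as it approaches the observed richness of the sample.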
3.5 OTU table Filtering
We are now going to filter the OTU table
Explore the Taxonomy at the Kingdom level
Remove untargeted OTUs (we consider OTUs that are Unclassified at the Kingdom level as noise) using subset_taxa
Remove low-occurrence / low-abundance OTUs, i.e., keep only OTUs with more than 10 sequences in total that appear in more than 1 sample
3.6 Normalisation to minimum sequencing depth
Rarefy to an even sequencing depth (i.e., min(colSums(otu_table(data))))
Rarefaction curves on filtered data
One can export the filtered OTU table
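Rarefying to an even depth means randomly subsampling every sample down to the size of the smallest one. The Python sketch below does this without replacement on a hypothetical table with a fixed seed; this is an assumption made for the illustration, and the sampling settings used with phyloseq in the tutorial may differ.

```python
import random

def rarefy(counts, depth, seed=0):
    """counts: dict OTU -> reads in one sample.
    Draw `depth` reads at random, without replacement."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    drawn = random.Random(seed).sample(pool, depth)
    rarefied = {otu: 0 for otu in counts}
    for otu in drawn:
        rarefied[otu] += 1
    return rarefied

# Hypothetical table: sample -> OTU -> reads.
table = {
    "S1": {"C1": 50, "C2": 30, "C3": 20},
    "S2": {"C1": 10, "C2": 25, "C3": 5},
}
min_depth = min(sum(counts.values()) for counts in table.values())
rarefied = {sample: rarefy(counts, min_depth) for sample, counts in table.items()}
print(min_depth, rarefied)
```

After this step every sample has the same number of reads, so richness and diversity values are no longer driven by differences in sequencing depth.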
3.7 alpha-diversity
We can now explore the alpha-diversity on the filtered and rarefied data
That plot could be nicer. The data to plot are stored in p$data
Boxplot using ggplot
More Complex
Export the alpha div values into a dataframe in short format
Alpha-div Stats using TukeyHSD on ANOVA
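For reference, the three indices used earlier in this training (Observed, Chao1 and Shannon) can be written in a few lines. The Python sketch below uses hypothetical counts, the natural logarithm for Shannon, and the bias-corrected form of Chao1, S_obs + F1(F1 − 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs seen exactly once and twice; treat the choice of formula variant as an assumption of this sketch.

```python
from math import log

def observed(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index H' = -sum(p * ln p) over the OTUs present."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness estimate."""
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return observed(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical read counts per OTU in two samples.
even_sample = [10, 10, 10, 10]
skewed_sample = [5, 1, 1, 2, 0]
for name, counts in [("even", even_sample), ("skewed", skewed_sample)]:
    print(name, observed(counts), chao1(counts), round(shannon(counts), 3))
```

Chao1 exceeds the observed richness only when singletons are present, which is one reason why removing rare OTUs before computing it changes the result.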
3.8 beta-diversity
Compute dissimilarity
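As an example of what a dissimilarity between two samples measures, the Python sketch below computes the Bray-Curtis dissimilarity, one of the measures mentioned in the exercises, on hypothetical counts: the sum of absolute count differences divided by the total number of reads in both samples. Which dissimilarity the tutorial computes at this step is not shown here, so take Bray-Curtis as an illustrative choice.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples given as
    lists of counts over the same OTUs (0 = identical, 1 = no shared OTU)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))

# Hypothetical samples: reads of three OTUs.
samples = {
    "S1": [10, 0, 5],
    "S2": [5, 5, 5],
    "S3": [0, 8, 0],
}
names = sorted(samples)
matrix = [[bray_curtis(samples[i], samples[j]) for j in names] for i in names]
for name, row in zip(names, matrix):
    print(name, [round(value, 3) for value in row])
```

The resulting square matrix is the kind of object that an ordination such as PCoA takes as input.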
Ordination
run PCoA ordination on the generated distance
Sample coordinates on the PCoA vectors are stored in the ord object, and plot_ordination can use the ord object directly
Let’s see if the observed pattern is significant using PERMANOVA i.e., adonis function from vegan
Dispersion
Let’s see if there are differences in dispersion (i.e., variance)
ANOSIM
The ANOSIM test can also test for differences among groups
3.9 Composition plot
Now, we would like to plot the distribution of phyla, expressed as percentages
We can generate a nicer plot using plot_composition function
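Behind a composition plot there are two simple operations: summing the reads of all OTUs that share a phylum, then converting the sums to percentages. The Python sketch below shows them for one hypothetical sample; the OTU names, phylum assignments and the `phylum_percentages` helper are made up for the illustration.

```python
def phylum_percentages(otu_reads, otu_phylum):
    """Sum reads per phylum and express each phylum as a percentage
    of all reads in the sample."""
    totals = {}
    for otu, reads in otu_reads.items():
        phylum = otu_phylum[otu]
        totals[phylum] = totals.get(phylum, 0) + reads
    grand_total = sum(totals.values())
    return {phylum: 100 * reads / grand_total for phylum, reads in totals.items()}

# Hypothetical reads and taxonomy for one sample.
otu_reads = {"C1": 50, "C2": 30, "C3": 20}
otu_phylum = {"C1": "Proteobacteria", "C2": "Proteobacteria", "C3": "Firmicutes"}
percentages = phylum_percentages(otu_reads, otu_phylum)
print(percentages)
```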
3.10 Some exercises
- 1 - How many OTUs belong to Archaea (in two commands using %>%) ?
Answer :
- 2 - Plot OTU richness (and only richness = ‘Observed’ in phyloseq) of Alphaproteobacteria among samples
Answer :
- 3 - Explore beta diversity of Alphaproteobacteria using “morisita” distance without data transformation and without considering endosphere samples (subset_samples). Are samples from Bulk_Soil and Rhizosphere different in terms of beta-diversity? (Use %in% c(“Soil”, “Prank”) in order to subset from several categories.)
Answer :
- 4 - Plot the proportion of Chloroplasts
Answer :
- 5 - Plot the proportion of OTUs belonging to Mitochondria and facet the plot according to Site (i.e., env_material).
Any surprises?
Answer:
- 6 - beta-diversity
6a. Plot the beta-diversity of Mitochondria and Chloroplasts OTUs using Bray-Curtis distance on the untransformed table
6b. What is the percentage of Mitochondria and Chloroplasts OTUs?
6c. Plot a basic barplot of it
Answer:
- 7 - Do the filtered-out OTUs display alpha / beta diversity patterns?
Links
- Related courses :