Metabarcoding using Qiime2 and DADA2
This page describes how to analyse metabarcoding data using Qiime2 and DADA2, from fastq files to SV table generation and Phyloseq analysis.
Authors | Marie Simonin and Julie Orjuela
Institute | IRD
Keywords | dada2, silva, vsearch, metabarcoding, 16S, 18S, ITS, Qiime2, denoising, SVs
File formats | fastq, SV tables, OTU tables
Practice 2: Obtaining an OTU table with QIIME: microbiome denoising and pre-processing
Connect to the cluster via SSH using the training accounts.
1. Import raw sequence data (demultiplexed fastQ files) into Qiime2.
- Option with a manifest file: create a manifest file that links the sample names to the fastq files. The manifest is a CSV file in which the first column is the “sample-id”, the second column is the “absolute-filepath” to the fastq.gz file, and the third column is the “direction” of the reads (forward or reverse). These column names are mandatory. This tutorial uses paired-end sequences with Phred 33 quality scores. Note: the CSV file must use commas as the separator; replace “;” with “,” if needed.
Create the manifest file to import the fastq files into Qiime2
Go to the folder that contains the fastq.gz files
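The manifest can be written by hand, but a small loop avoids typing errors. The following is only a sketch: the function name `make_manifest` is hypothetical, and it assumes that the files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, which may not match your own naming scheme.

```shell
# Write a CSV manifest for paired-end reads found in a directory.
# ASSUMPTION: files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
# make_manifest is a hypothetical helper, not a Qiime2 command.
make_manifest() {
    local fastq_dir manifest fwd rev sample
    # Resolve the directory to an absolute path, as the manifest requires.
    fastq_dir=$(cd "$1" && pwd) || return 1
    manifest="$2"
    # Mandatory column names.
    printf 'sample-id,absolute-filepath,direction\n' > "$manifest"
    for fwd in "$fastq_dir"/*_R1.fastq.gz; do
        [ -e "$fwd" ] || continue          # no forward reads found
        sample=$(basename "$fwd" _R1.fastq.gz)
        rev="$fastq_dir/${sample}_R2.fastq.gz"
        [ -e "$rev" ] || { echo "missing reverse reads: $rev" >&2; return 1; }
        printf '%s,%s,forward\n' "$sample" "$fwd" >> "$manifest"
        printf '%s,%s,reverse\n' "$sample" "$rev" >> "$manifest"
    done
}
```

For example, `make_manifest . manifest.csv` run from the fastq folder writes one forward and one reverse line per sample.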
Load Qiime2 on the server
Import the fastq files into Qiime2; they are stored as a qza file. The qza format is the Qiime2 format for data (fastq, txt, fasta).
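A minimal sketch of the import step, assuming a comma-separated manifest and Phred 33 paired-end reads. The wrapper name `import_paired_reads` and the file names passed to it are hypothetical. `PairedEndFastqManifestPhred33` is the CSV-style manifest format; more recent Qiime2 releases also offer tab-separated "V2" manifest formats.

```shell
# Import demultiplexed paired-end reads listed in a CSV manifest.
# import_paired_reads is a hypothetical wrapper; file names are up to you.
import_paired_reads() {
    local manifest="$1"   # CSV manifest (sample-id,absolute-filepath,direction)
    local output="$2"     # artifact to create, e.g. a .qza file
    qiime tools import \
        --type 'SampleData[PairedEndSequencesWithQuality]' \
        --input-path "$manifest" \
        --input-format PairedEndFastqManifestPhred33 \
        --output-path "$output"
}
```

A call could look like `import_paired_reads manifest.csv 16S-demux.qza`, where `16S-demux.qza` is an illustrative output name.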
2. Verification of sequence quality and number of sequences per sample.
Visualize the qzv file on the Qiime2 View website; qzv is the visualization format in Qiime2.
- If you are working locally (not on the server), the qiime tools view command opens the qzv file in your browser.
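The quality check can be sketched as follows. The two wrapper names and the file names passed to them are hypothetical; `qiime demux summarize` and `qiime tools view` are the underlying Qiime2 commands.

```shell
# Summarize read counts and quality profiles of the imported reads.
# summarize_demux and view_locally are hypothetical wrappers.
summarize_demux() {
    local demux="$1"   # imported reads (.qza)
    local viz="$2"     # visualization to create (.qzv)
    qiime demux summarize \
        --i-data "$demux" \
        --o-visualization "$viz"
}

# Open a visualization in a local browser (only when Qiime2 runs on your own machine).
view_locally() {
    qiime tools view "$1"
}
```

For example, `summarize_demux 16S-demux.qza 16S-demux.qzv` followed by `view_locally 16S-demux.qzv` (both file names are illustrative).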
3. Denoising with DADA2
Based on the quality information and the presence of primers, the p-trim and p-trunc parameters need to be adjusted; they are specific to each study and primer set. Here the forward primers are 21 bp long and the reverse primers 20 bp. The total amplicon length is 291 bp. Based on the qzv visualization, we choose the truncation lengths (p-trunc-len) of the forward and reverse reads. You can change the number of threads used on the server with p-n-threads. This command generates 3 files: the SV table (16S-table.qza), the representative sequence fasta file (16S-rep-seqs.qza) and the denoising statistics file (16S-denoising-stats.qza).
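The denoising step could look like the sketch below. It assumes that the primers are removed by trimming their lengths (21 bp forward, 20 bp reverse) from the start of the reads with `--p-trim-left-f` and `--p-trim-left-r`. The truncation lengths are left as arguments because they must be chosen from your own quality plots. The wrapper name `denoise_paired` is hypothetical; the three output names are those given above.

```shell
# Denoise paired-end reads with DADA2.
# denoise_paired is a hypothetical wrapper.
# ASSUMPTION: primers are trimmed by length from the 5' end of each read.
denoise_paired() {
    local demux="$1"     # imported reads (.qza)
    local trunc_f="$2"   # truncation length of forward reads
    local trunc_r="$3"   # truncation length of reverse reads
    local threads="$4"   # number of threads to use
    qiime dada2 denoise-paired \
        --i-demultiplexed-seqs "$demux" \
        --p-trim-left-f 21 \
        --p-trim-left-r 20 \
        --p-trunc-len-f "$trunc_f" \
        --p-trunc-len-r "$trunc_r" \
        --p-n-threads "$threads" \
        --o-table 16S-table.qza \
        --o-representative-sequences 16S-rep-seqs.qza \
        --o-denoising-stats 16S-denoising-stats.qza
}
```

The truncated forward and reverse reads must still overlap for DADA2 to merge them, so avoid truncating more than the amplicon length allows.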
4. Make summary files and visualize the outputs of DADA2.
This step requires a metadata file with the treatment information (provided). The first column must be “sample-id”; the other columns contain treatment, site and other information. Go to the Qiime2 View website to visualize the qzv files.
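One way to produce the three summaries is sketched below. The wrapper name `summarize_dada2` and the qzv output names are hypothetical; the qza inputs are the DADA2 outputs named in step 3.

```shell
# Build visualizations for the three DADA2 outputs.
# summarize_dada2 and the .qzv names are hypothetical.
summarize_dada2() {
    local metadata="$1"   # sample metadata file (first column: sample-id)
    # Per-sample and per-feature counts of the SV table.
    qiime feature-table summarize \
        --i-table 16S-table.qza \
        --m-sample-metadata-file "$metadata" \
        --o-visualization 16S-table.qzv
    # Representative sequences, one row per SV.
    qiime feature-table tabulate-seqs \
        --i-data 16S-rep-seqs.qza \
        --o-visualization 16S-rep-seqs.qzv
    # Reads kept at each DADA2 stage, per sample.
    qiime metadata tabulate \
        --m-input-file 16S-denoising-stats.qza \
        --o-visualization 16S-denoising-stats.qzv
}
```

The table summary is also where a sampling depth for a later rarefaction step can be read off.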
5. Assign taxonomy to the SVs.
Download the pre-trained classifier for the V4 region (Silva 132 99% OTUs from the 515F/806R region of sequences), based on the SILVA database.
To create a classifier based on your own parameters (fragment size, region), follow this tutorial. For now we will use the pre-trained classifier for the V4 region (515F-806R) at 99% similarity:
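Assuming the pre-trained classifier has already been downloaded as a qza file, the assignment itself can be sketched like this. The wrapper name `assign_taxonomy` and the file names passed to it are hypothetical. The classifier should have been trained with a scikit-learn version compatible with your Qiime2 release.

```shell
# Assign taxonomy to the representative sequences with a pre-trained classifier.
# assign_taxonomy is a hypothetical wrapper.
assign_taxonomy() {
    local classifier="$1"   # pre-trained classifier (.qza)
    local rep_seqs="$2"     # representative sequences (.qza)
    local taxonomy="$3"     # taxonomy artifact to create (.qza)
    qiime feature-classifier classify-sklearn \
        --i-classifier "$classifier" \
        --i-reads "$rep_seqs" \
        --o-classification "$taxonomy"
}
```

For example, `assign_taxonomy classifier.qza 16S-rep-seqs.qza 16S-taxonomy.qza`, where the classifier and taxonomy file names are illustrative.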
Visualization of the taxonomy output
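Two common ways to look at the taxonomy are a table of assignments and a bar plot per sample. The sketch below shows both; the wrapper names and all file names passed to them are hypothetical.

```shell
# Table of taxonomic assignments, one row per SV.
# tabulate_taxonomy and taxonomy_barplot are hypothetical wrappers.
tabulate_taxonomy() {
    local taxonomy="$1" viz="$2"
    qiime metadata tabulate \
        --m-input-file "$taxonomy" \
        --o-visualization "$viz"
}

# Interactive bar plot of taxonomic composition per sample.
taxonomy_barplot() {
    local table="$1" taxonomy="$2" metadata="$3" viz="$4"
    qiime taxa barplot \
        --i-table "$table" \
        --i-taxonomy "$taxonomy" \
        --m-metadata-file "$metadata" \
        --o-visualization "$viz"
}
```

Both functions write qzv files that can be opened on the Qiime2 View website.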
6. Remove SVs from the table that are assigned to Chloroplast or Mitochondria (not bacterial or archaeal taxa)
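A possible form of this filter uses `qiime taxa filter-table` with an exclusion list. The wrapper name `remove_organelles` and the file names passed to it are hypothetical.

```shell
# Drop SVs whose taxonomy string contains "mitochondria" or "chloroplast".
# remove_organelles is a hypothetical wrapper.
remove_organelles() {
    local table="$1"      # SV table (.qza)
    local taxonomy="$2"   # taxonomy assignments (.qza)
    local output="$3"     # filtered table to create (.qza)
    qiime taxa filter-table \
        --i-table "$table" \
        --i-taxonomy "$taxonomy" \
        --p-exclude mitochondria,chloroplast \
        --o-filtered-table "$output"
}
```

Re-running the table summary afterwards shows how many SVs and reads were removed.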
7. Possible filtering/cleaning steps
Rarefy in Qiime2
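Rarefaction subsamples every sample to the same number of reads and drops samples below that depth. A sketch, with a hypothetical wrapper name and the depth left as an argument because it depends on your own read counts:

```shell
# Subsample every sample to the same sequencing depth.
# rarefy_table is a hypothetical wrapper.
rarefy_table() {
    local table="$1"    # SV table (.qza)
    local depth="$2"    # reads to keep per sample
    local output="$3"   # rarefied table to create (.qza)
    qiime feature-table rarefy \
        --i-table "$table" \
        --p-sampling-depth "$depth" \
        --o-rarefied-table "$output"
}
```

The depth is usually chosen from the per-sample counts in the feature table summary.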
Remove SVs that are present only in 1 sample
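This filter can be expressed by requiring each SV to occur in at least 2 samples. The wrapper name `drop_single_sample_svs` and the file names are hypothetical.

```shell
# Keep only SVs observed in at least 2 samples.
# drop_single_sample_svs is a hypothetical wrapper.
drop_single_sample_svs() {
    local table="$1"    # SV table (.qza)
    local output="$2"   # filtered table to create (.qza)
    qiime feature-table filter-features \
        --i-table "$table" \
        --p-min-samples 2 \
        --o-filtered-table "$output"
}
```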
Filter the rep-seq.qza to keep only SVs that are present in the final SV table (remove SVs that were Chloroplast, Mitochondria or found in only one sample…)
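The sequences can be matched to the final table with `qiime feature-table filter-seqs`, as in this sketch. The wrapper name `filter_rep_seqs` and the file names passed to it are hypothetical.

```shell
# Keep only the representative sequences whose SVs remain in the table.
# filter_rep_seqs is a hypothetical wrapper.
filter_rep_seqs() {
    local rep_seqs="$1"   # representative sequences (.qza)
    local table="$2"      # final, cleaned SV table (.qza)
    local output="$3"     # filtered sequences to create (.qza)
    qiime feature-table filter-seqs \
        --i-data "$rep_seqs" \
        --i-table "$table" \
        --o-filtered-data "$output"
}
```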
Summary after cleaning steps
8. Export the SV table (biom file) and representative sequences (fasta file) from Qiime2 for analyses in RStudio (structure and diversity analyses)
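A sketch of the export, with a hypothetical wrapper name and an output directory of your choice. `qiime tools export` writes the table as `feature-table.biom` and the sequences as `dna-sequences.fasta` inside that directory.

```shell
# Export the SV table and representative sequences out of Qiime2.
# export_for_r is a hypothetical wrapper.
export_for_r() {
    local table="$1"      # final SV table (.qza)
    local rep_seqs="$2"   # final representative sequences (.qza)
    local outdir="$3"     # directory that receives the exported files
    # Writes feature-table.biom
    qiime tools export --input-path "$table" --output-path "$outdir"
    # Writes dna-sequences.fasta
    qiime tools export --input-path "$rep_seqs" --output-path "$outdir"
}
```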
9. Make a phylogenetic tree
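One possible way to build the tree is the MAFFT and FastTree pipeline of the phylogeny plugin, which aligns the sequences, masks the alignment and produces unrooted and rooted trees. This is a sketch and not necessarily the method used in the original practice; the wrapper name `build_tree` and the output naming scheme are hypothetical.

```shell
# Align representative sequences and build a tree (MAFFT + FastTree pipeline).
# build_tree and the output naming scheme are hypothetical.
build_tree() {
    local rep_seqs="$1"   # final representative sequences (.qza)
    local prefix="$2"     # prefix for the four output artifacts
    qiime phylogeny align-to-tree-mafft-fasttree \
        --i-sequences "$rep_seqs" \
        --o-alignment "${prefix}-aligned.qza" \
        --o-masked-alignment "${prefix}-masked-aligned.qza" \
        --o-tree "${prefix}-unrooted-tree.qza" \
        --o-rooted-tree "${prefix}-rooted-tree.qza"
}
```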
Export trees
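Tree artifacts are exported with the same `qiime tools export` command as other artifacts; the result is a Newick file named `tree.nwk` in the output directory. The wrapper name `export_tree` and its arguments are hypothetical.

```shell
# Export a tree artifact as a Newick file (tree.nwk) for use in R.
# export_tree is a hypothetical wrapper.
export_tree() {
    local tree="$1"     # tree artifact (.qza)
    local outdir="$2"   # directory that receives tree.nwk
    qiime tools export --input-path "$tree" --output-path "$outdir"
}
```

Use a separate output directory per tree, since each export writes a file with the same name.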