Name | Commands to manipulate VCF files. |
---|---|
Description | This page describes a series of tools and Linux commands used to manipulate VCF files. |
Authors | Christine Tranchant-Dubreuil (christine.tranchant@ird.fr) |
Creation Date | 10/03/2017 |
Last Modified Date | 25/03/2018 |
For this tutorial, we need:
- one VCF file
- GATK tools
- bcftools
Keywords
gatk, bcftools
Summary
- Extracting list of samples from a vcf file
- Extracting a subset of samples from a multigenome vcf file
- Select two samples out of a vcf with many samples with GATK SelectVariants
- Select genotypes from a file containing a list of samples to include with GATK SelectVariants
- Select genotypes from a file containing a list of samples to exclude with GATK SelectVariants
- Select genotypes from a file containing a list of samples to include with bcftools
- Select two samples out of a vcf with many samples with
- Calculating the nucleotide diversity from a vcf file with vcftools
Extracting list of samples from a vcf file
One line with all samples, using grep
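As a minimal sketch of this step: the sample names of a VCF file are stored on its single column-header line, which starts with `#CHROM`, so `grep` alone can print them on one line. The file name `example.vcf` and the sample names below are hypothetical and only serve to make the example runnable as is.

```shell
# Build a tiny example VCF (hypothetical content) so the command can be tried directly.
VCF="example.vcf"
printf '##fileformat=VCFv4.2\n' > "$VCF"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_A\tSAMPLE_B\tSAMPLE_C\n' >> "$VCF"
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/0\n' >> "$VCF"

# Only the column-header line starts with "#CHROM" (meta lines start with "##").
# The sample names are the fields after FORMAT, i.e. from column 10 onwards.
HEADER=$(grep '^#CHROM' "$VCF")
echo "$HEADER"
```

The printed line still contains the nine fixed VCF columns (`#CHROM` to `FORMAT`) before the sample names.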
One line per sample, using grep | cut | xargs
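A possible form of this pipeline, given as a sketch: `grep` isolates the `#CHROM` header line, `cut` drops the nine fixed columns, and `xargs -n 1` prints each remaining field on its own line. The file name and sample names are hypothetical.

```shell
# Build a tiny example VCF (hypothetical content) so the pipeline can be tried directly.
VCF="example.vcf"
printf '##fileformat=VCFv4.2\n' > "$VCF"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_A\tSAMPLE_B\tSAMPLE_C\n' >> "$VCF"
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\t0/0\n' >> "$VCF"

# grep : keep the column-header line
# cut  : keep tab-separated fields 10 and up (the sample names)
# xargs: echo one name per line
SAMPLES=$(grep '^#CHROM' "$VCF" | cut -f 10- | xargs -n 1)
echo "$SAMPLES"
```

Redirecting the output to a file (for example `> samples.txt`) gives a one-name-per-line list that can be reused as a sample list for other tools. Note that `xargs` interprets quotes and backslashes, so this sketch assumes plain sample names.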
Extracting a subset of samples from a multigenome vcf file
Select two samples out of a vcf with many samples with GATK SelectVariants
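The following sketch assumes the GATK 3.x command-line syntax (`-T SelectVariants`), where `-sn` (`--sample_name`) is repeated once per sample to keep. The jar location, the file names and the sample names are placeholders, not values from this page; the command is only launched if the jar is found.

```shell
# Placeholder paths and names (assumptions): adapt them to your own data.
GATK_JAR="GenomeAnalysisTK.jar"   # GATK 3.x jar
REF="reference.fasta"             # reference used to call the variants
VCF_IN="input.vcf"                # multi-sample VCF
VCF_OUT="two_samples.vcf"         # VCF restricted to the two samples

# -sn is given once for each sample to keep (paths must not contain spaces here).
CMD="java -jar $GATK_JAR -T SelectVariants -R $REF -V $VCF_IN -o $VCF_OUT -sn SAMPLE_A -sn SAMPLE_B"
echo "$CMD"

# Run the command only when the GATK jar is actually present.
if [ -f "$GATK_JAR" ]; then
    $CMD
fi
```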
Note: if you get the error message “Fasta dict file … for reference … does not exist”, please see https://www.broadinstitute.org/gatk/guide/article?id=1601
Select genotypes from a file containing a list of samples to include with GATK SelectVariants
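A sketch of this selection, again assuming GATK 3.x syntax: the samples to keep are written one per line in a text file, which is passed with `--sample_file` (short form `-sf`). All file names and sample names below are hypothetical, and the command is only launched if the jar is found.

```shell
# Placeholder paths and names (assumptions): adapt them to your own data.
GATK_JAR="GenomeAnalysisTK.jar"   # GATK 3.x jar
REF="reference.fasta"
VCF_IN="input.vcf"
VCF_OUT="kept_samples.vcf"
SAMPLE_FILE="samples_to_keep.txt"

# Sample list: one sample name per line (hypothetical names).
printf 'SAMPLE_A\nSAMPLE_B\nSAMPLE_C\n' > "$SAMPLE_FILE"

# --sample_file keeps only the samples listed in the file.
CMD="java -jar $GATK_JAR -T SelectVariants -R $REF -V $VCF_IN -o $VCF_OUT --sample_file $SAMPLE_FILE"
echo "$CMD"

# Run the command only when the GATK jar is actually present.
if [ -f "$GATK_JAR" ]; then
    $CMD
fi
```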
Select genotypes from a file containing a list of samples to exclude with GATK SelectVariants
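The exclusion case can be sketched the same way, assuming GATK 3.x syntax: the file lists the samples to drop, one per line, and is passed with `--exclude_sample_file` (short form `-xl_sf`). File names and sample names are hypothetical, and the command is only launched if the jar is found.

```shell
# Placeholder paths and names (assumptions): adapt them to your own data.
GATK_JAR="GenomeAnalysisTK.jar"   # GATK 3.x jar
REF="reference.fasta"
VCF_IN="input.vcf"
VCF_OUT="without_excluded_samples.vcf"
EXCLUDE_FILE="samples_to_exclude.txt"

# Exclusion list: one sample name per line (hypothetical names).
printf 'SAMPLE_B\nSAMPLE_C\n' > "$EXCLUDE_FILE"

# --exclude_sample_file removes the listed samples and keeps all the others.
CMD="java -jar $GATK_JAR -T SelectVariants -R $REF -V $VCF_IN -o $VCF_OUT --exclude_sample_file $EXCLUDE_FILE"
echo "$CMD"

# Run the command only when the GATK jar is actually present.
if [ -f "$GATK_JAR" ]; then
    $CMD
fi
```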
Note: if you get the error message “Bad input: Samples entered on command line (through -sf or -sn)) that are not present in the VCF”, rerun the command with the option --ALLOW_NONOVERLAPPING_COMMAND_LINE_SAMPLES