- Goal of the exercise
- Prerequisites
- EGNEP run
- Understanding EGNEP run
- Understanding EGNEP results
- Busco run
- MyGenomeBrowser
- EGNEP errors
- EGNEP kill
- How to install EGNEP
Goal of the exercise
Annotate the genes of the whole Arabidopsis genome from the following dataset, using a Eugene-EP appliance (VM) on the IFB cloud:
- 116M TAIR_genome.fasta (135 Mbp, 5 chr)
- 30M TAIR_est2.fasta: RIKEN Arabidopsis full-length cDNA clones (RAFL clones) http://epd.brc.riken.jp/en/pdna/rafl_clones
- 8.3M uniprot_sp_viridiplantae_not_camelineae_short_header.fna: UniProtKB Swiss-Prot, taxonomy:"Viridiplantae [33090]" NOT taxonomy:"Camelineae [980083]" => 23706
- 63M uniprot_trembl_brassiceae_short_header.fna: UniProtKB TrEMBL, taxonomy:"Brassiceae [981071]" => 171467
Prerequisites
- You need an account on the IFB cloud with an ACTIVE SSH key. If you don't have one, please refer to the following documentation:
Create a user account on the IFB cloud & ACTIVATE YOUR SSH-KEY (if not already done)
- You need to be connected to a EuGène appliance with 8 CPUs and 32 GB of RAM. If that is not the case, please refer to the following documentation:
Launch a Eugene-EP appliance on the IFB cloud
EGNEP run
From a Eugene IFB cloud appliance
1) Preparing your data
From the root terminal of your appliance, index the databanks:
cd /root/bank_tair/
makeblastdb -in TAIR_est2.fasta -dbtype nucl
makeblastdb -in repbase20.05_aaSeq_cleaned_TE.fa -dbtype prot
makeblastdb -in uniprot_sp_viridiplantae_not_camelineae_short_header.fna -dbtype prot -parse_seqids
makeblastdb -in uniprot_trembl_brassiceae_short_header.fna -dbtype prot -parse_seqids
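Before launching the pipeline it can be worth confirming that the indexing step above produced its files. The following is a minimal sketch, not part of EGN-EP or BLAST+: check_blast_index is a hypothetical helper name, and it only looks for the index file extensions that makeblastdb writes next to the input FASTA.

```shell
# Sketch: report whether a FASTA databank has makeblastdb index files next to it.
# check_blast_index is a hypothetical helper, not an EGN-EP or BLAST+ command.
check_blast_index() {
    fasta=$1    # path given to "makeblastdb -in"
    dbtype=$2   # "nucl" or "prot", as given to "makeblastdb -dbtype"
    case $dbtype in
        nucl) p=n ;;
        prot) p=p ;;
        *) echo "unknown dbtype: $dbtype" >&2; return 2 ;;
    esac
    # Single-volume index: .nhr/.nin/.nsq (nucl) or .phr/.pin/.psq (prot).
    # Multi-volume databases are described by an alias file (.nal or .pal).
    if [ -s "$fasta.${p}hr" ] && [ -s "$fasta.${p}in" ] && [ -s "$fasta.${p}sq" ]; then
        echo "indexed: $fasta"
    elif [ -s "$fasta.${p}al" ]; then
        echo "indexed: $fasta"
    else
        echo "NOT indexed: $fasta"
        return 1
    fi
}
```

Possible use, from /root/bank_tair/: check_blast_index TAIR_est2.fasta nucl, then the same call with prot for the three protein databanks.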
2) Running & checking
Run EGNEP (without the --no_red option)
cd
nohup time $EGNEP/bin/int/egn-euk.pl --indir /root/input_dir/ --outdir /root/output_dir/ --cfg /root/bank_tair/egnep-test.cfg --workingdir /root/work_dir/ >& pipeline.txt &
Check the progress of the run
less pipeline.txt
tail -f pipeline.txt
################################################################################
# /usr/bin/egnep-1.4/bin/int/egn-euk.pl --indir /root/input_dir/ --outdir /root/output_dir/ --cfg /root/bank_tair/egnep-test.cfg --workingdir /root/work_dir/
# EuGene Pipeline EUK - version 1.4
# EUGENEDIR /usr/bin/eugene-4.2a
# EGNEP /usr/bin/egnep-1.4
# Log file /root/work_dir/logger.1536944185.2045.txt
################################################################################
################################################################################
Create tree.....................................................................started
Create tree.....................................................................done
######################### Protein database cleaning ##########################
##################### Protein sequence similarity search #####################
BlastX uniprot_sp_viridiplantae_not_camelineae_short_header.fna uniprot_trembl_brassiceae_short_header.fnastarted
BLASTX PARAMETERS=-outfmt 6 -evalue 0.01 -gapopen 9 -gapextend 2 -max_target_seqs 500000 -max_intron_length 15000 -seg yes
UBLAST PARAMETERS=-threads 8 -evalue 1 -lopen 9 -lext 2 -accel
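The log pads each step name with dots and ends the line with "started" or "done". A small sketch to list the steps that have finished so far, assuming that layout holds for every step; egnep_steps_done is a hypothetical helper name, not an EGN-EP command.

```shell
# Sketch: list the pipeline steps whose log line ends in "done".
# egnep_steps_done is a hypothetical helper; $1 is the log (pipeline.txt).
egnep_steps_done() {
    # Keep lines ending in "done", then strip that word and the dot padding.
    grep -E '\.*done[[:space:]]*$' "$1" | sed -E 's/\.*done[[:space:]]*$//'
}
```

Possible use: egnep_steps_done pipeline.txt. A step marked "done" has finished, which is not the same as having succeeded, so the log should still be read for error messages.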
Understanding EGNEP run
1) What are the values of the environment variables $EGNEP and $EUGENEDIR?
echo $EGNEP
/usr/bin/egnep-1.4
echo $EUGENEDIR
/usr/bin/eugene-4.2a
2) Where is the EGN-EP configuration file, and how do you set the data parameters?
gedit bank_tair/egnep-test.cfg &
blastx_db_list=1 2
blastx_db_1_file=/root/bank_tair/uniprot_sp_viridiplantae_not_camelineae_short_header.fna
blastx_db_2_file=/root/bank_tair/uniprot_trembl_brassiceae_short_header.fna
est_list=1
est_1_file=/root/bank_tair/TAIR_est2.fasta
repeat_sequence_db=/root/bank_tair/repbase20.05_aaSeq_cleaned_TE.fa
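Every databank named in the configuration must exist (and be indexed) before the run. A minimal sketch to pull the paths out of the file, assuming the key=value layout shown above and assuming that lines starting with "#" are comments; cfg_databanks is a hypothetical helper name.

```shell
# Sketch: print the databank paths declared in an EGN-EP configuration file.
# cfg_databanks is a hypothetical helper; "#" comment lines are an assumption.
cfg_databanks() {
    awk -F= '
        /^[ \t]*#/ { next }
        $1 ~ /_file$/ || $1 == "repeat_sequence_db" {
            sub(/^[^=]*=/, "")   # drop "key=", keep the whole value
            print
        }' "$1"
}
```

Possible use: cfg_databanks /root/bank_tair/egnep-test.cfg | xargs ls -l, to see at a glance whether any declared file is missing.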
3) Where is the EGN-EP executable?
/usr/bin/egnep-1.4/bin/int/egn-euk.pl
4) Where is the Log file?
/root/work_dir/logger.1536944185.2045.txt
Understanding EGNEP results
EGN-EP
1) Where are the analysis results of Chr4?
/root/work_dir/0001/Chr4
ls *.gff3
Chr4.blast1.gff3 Chr4.est1.gff3 Chr4.masked.blastrep.gff3 Chr4.repet_noexpressed_nosimprot.gff3 Chr4.rnammer.gff3
Chr4.blast2.gff3 Chr4.ltrharvest.gff3 Chr4.red.gff3 Chr4.rfamscan.gff3 Chr4.trnascan.gff3
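Each of these files carries one kind of evidence, so an empty one points at a step that found nothing. A short sketch counting the feature lines per file; gff3_feature_counts is a hypothetical helper name, and the count simply ignores blank lines and lines starting with "#".

```shell
# Sketch: count feature lines in every *.gff3 file of a directory.
# gff3_feature_counts is a hypothetical helper; $1 is e.g. /root/work_dir/0001/Chr4
gff3_feature_counts() {
    for f in "$1"/*.gff3; do
        [ -e "$f" ] || continue   # no match: the glob stays literal
        # Count lines that are neither blank nor "#" comments/directives.
        n=$(grep -c -v -E '^(#|[[:space:]]*$)' "$f" || true)
        printf '%s\t%s\n' "$n" "$(basename "$f")"
    done
}
```

Possible use: gff3_feature_counts /root/work_dir/0001/Chr4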
2) Where are eugenev0.par and eugenev1.par, and what are the differences between them?
/root/work_dir/egn_param
diff eugenev0.par eugenev1.par
< Sensor.AnnotaStruct.use 3
---
> AnnotaStruct.FileExtension[3] repet_noexpressed_nosimprot
> AnnotaStruct.TranscriptFeature[3] transcript
> AnnotaStruct.Start*[3] 0
> AnnotaStruct.StartType[3] s
> AnnotaStruct.Stop*[3] 0
> AnnotaStruct.StopType[3] s
> AnnotaStruct.Acc*[3] 0
> AnnotaStruct.AccType[3] s
> AnnotaStruct.Don*[3] 0
> AnnotaStruct.DonType[3] s
> AnnotaStruct.TrStart*[3] 0
> AnnotaStruct.TrStartType[3] s
> AnnotaStruct.TrStop*[3] 0
> AnnotaStruct.TrStopType[3] s
> AnnotaStruct.TrStartNpc*[3] 0
> AnnotaStruct.TrStartNpcType[3] s
> AnnotaStruct.TrStopNpc*[3] 0
> AnnotaStruct.TrStopNpcType[3] s
> AnnotaStruct.Exon*[3] 0
> AnnotaStruct.Intron*[3] 1
> AnnotaStruct.CDS*[3] 0
> AnnotaStruct.npcRNA*[3] 0
> AnnotaStruct.Intergenic*[3] 2
> AnnotaStruct.format[3] GFF3
> Sensor.AnnotaStruct.use 4
3) Where are the sensor priorities in eugenev1.par?
Sensor.MarkovIMM 1
Sensor.SignalWAM 10
Sensor.AnnotaStruct 30
Sensor.BlastX 20
Sensor.Est 20
Sensor.MarkovIMM.use 1
Sensor.SignalWAM.use 2 (acceptor, donor)
Sensor.AnnotaStruct.use 4 (trna, rrna, ncrna, repeat)
Sensor.Est.use 1
Sensor.BlastX.use 1
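The two groups of lines above can be merged into one table. This is a sketch only, assuming the parameter file uses the "Sensor.Name value" and "Sensor.Name.use value" layout shown here; sensor_priorities is a hypothetical helper name.

```shell
# Sketch: list the sensors in use as "priority <TAB> name <TAB> use count".
# sensor_priorities is a hypothetical helper; $1 is e.g. eugenev1.par
sensor_priorities() {
    awk '
        $1 ~ /^Sensor\.[A-Za-z0-9_]+$/      { prio[substr($1, 8)] = $2 }
        $1 ~ /^Sensor\.[A-Za-z0-9_]+\.use$/ { n = substr($1, 8); sub(/\.use$/, "", n); use[n] = $2 }
        END {
            for (s in use)
                if (use[s] > 0) printf "%s\t%s\t%s\n", prio[s], s, use[s]
        }' "$1" | sort -n
}
```

Possible use: sensor_priorities /root/work_dir/egn_param/eugenev1.par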
4) Where is the report file?
/root/output_dir
more report.1536944185.2045.txt
## Transcriptome mapping information
Nb transcriptome seq_number mapped_sequence_number(raw gmap result) mapped_filtered_sequence_number(after filtering) mapped_filtered_sequence__percentage
1 /root/bank_tair/TAIR_est2.fasta 20683 20668 20625 99.7
## Splicing sites read in the training dataset
Canonical acceptor AG 72636 sites
Canonical donor GT 71768 sites
Non canonical donor GC 771 sites 1.1% of the canonical site number
## Arabidopsis Thaliana specific repeat domains
File=/root/work_dir/db/SpeciesRepeatDomain.fa
Repeat domain number=1274 Repeat domain length=684374 nt (0.6% of genomic sequences)
## LTR masking
LTR region length=2054544 nt (1.7% of genomic sequences)
## Red repeat predictions
Red region length=23443746 nt (19.6% of genomic sequences)
## Repeat regions (LTR + species specific repeat domains, where no expression and no protein similarity)
Repeat region number=10735 Repeat region length=21406343 nt (17.9% of genomic sequences)
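The percentages in the report can be recomputed by hand. The sketch below assumes that "genomic sequences" means the 119482012 nucleotides without 'N' listed in sequences.general_statistics.xls; that denominator is an assumption, although it reproduces the figures above. pct is a hypothetical helper name.

```shell
# Sketch: percentage of a part over a total, rounded to one decimal.
# pct is a hypothetical helper, not an EGN-EP command.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}
```

For instance, pct 23443746 119482012 gives 19.6 (Red regions), pct 21406343 119482012 gives 17.9 (repeat regions), and pct 771 71768 gives 1.1 (GC donors relative to GT donors).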
5) Where are the general statistics?
/root/output_dir
root@machine068d5c96-f666-4443-aea3-7c6d0c83170a:~/output_dir# more sequences.general_statistics.xls
Number of nucleotides (without 'N') 119482012
Per cent GC 36.06
Total number of genes 25968
Total nucleotides (bp) 51117321
** Protein coding genes
Number of protein coding genes 23786
Mean gene length (bp) 2132.27
Coding nucleotides (bp) 29924318
Per cent genes with introns 78
Per cent genes with five UTR 59
Per cent genes with three UTR 65
Exons
Mean number per gene 5.22
Mean length (bp) 280.80
GC per cent 42.85
Introns
Mean number per gene 4.22
Mean length (bp) 157.66
GC per cent 32.52
CDS
Mean length (bp) 1258.06
Min length (bp) 123.00
Max length (bp) 15234.00
GC per cent 44.14
five_prime_UTR
Mean length (bp) 131.84
GC per cent 38.45
three_prime_UTR
Mean length (bp) 201.22
GC per cent 33.01
** Non protein coding genes
Number of non protein coding genes 2182
Mean ncRNA gene length (bp) 182.93
Min length (bp) 39
Max length (bp) 7615
GC per cent 46.45
Per cent ncRNA genes with introns 0
Mean exon number per ncRNA gene 1.00
** Intergenic (inter protein-coding genes)
Mean length 2639.79
GC per cent 33.26
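Two quick sanity checks can be read off this table: the total number of genes should be the sum of the protein coding and non protein coding genes, and the mean number of introns per gene should be the mean number of exons minus one (5.22 - 1 = 4.22). A minimal sketch of the first check; check_gene_totals is a hypothetical helper name.

```shell
# Sketch: check that coding + non-coding gene counts add up to the reported total.
# check_gene_totals is a hypothetical helper, not an EGN-EP command.
check_gene_totals() {
    # $1 protein coding genes, $2 non protein coding genes, $3 reported total
    if [ $(( $1 + $2 )) -eq "$3" ]; then
        echo "consistent"
    else
        echo "mismatch: $1 + $2 != $3"
        return 1
    fi
}
```

With the values above, check_gene_totals 23786 2182 25968 prints "consistent".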
6) Where are the gene annotation file and the polypeptide sequence file?
/root/output_dir
root@machine068d5c96-f666-4443-aea3-7c6d0c83170a:~/output_dir# more sequences.gff3
##gff-version 3
##sequence-region Chr1 1 30427671
Chr1 EuGene gene 3634 5894 . + . ID=gene:Chr1g0000001;Name=Chr1g0000001
Chr1 EuGene mRNA 3634 5894 . + . ID=mRNA:Chr1g0000001;Name=Chr1g0000001;Parent=gene:Chr1g0000001
Chr1 EuGene exon 3634 3913 . + . ID=exon:Chr1g0000001.1;Parent=mRNA:Chr1g0000001
Chr1 EuGene exon 3996 4276 . + 2 ID=exon:Chr1g0000001.2;Parent=mRNA:Chr1g0000001
Chr1 EuGene exon 4486 4605 . + 0 ID=exon:Chr1g0000001.3;Parent=mRNA:Chr1g0000001
Chr1 EuGene exon 4706 5095 . + 0 ID=exon:Chr1g0000001.4;Parent=mRNA:Chr1g0000001
Chr1 EuGene exon 5174 5326 . + 0 ID=exon:Chr1g0000001.5;Parent=mRNA:Chr1g0000001
Chr1 EuGene exon 5439 5894 . + 0 ID=exon:Chr1g0000001.6;Parent=mRNA:Chr1g0000001
Chr1 EuGene five_prime_UTR 3634 3759 . + . ID=five_prime_UTR:Chr1g0000001.0;Parent=mRNA:Chr1g0000001;est_cons=100.0;est_incons=0.0
Chr1 EuGene CDS 3760 3913 . + 0 ID=CDS:Chr1g0000001.1;Parent=mRNA:Chr1g0000001;est_cons=100.0;est_incons=0.0
Chr1 EuGene CDS 3996 4276 . + 2 ID=CDS:Chr1g0000001.2;Parent=mRNA:Chr1g0000001;est_cons=100.0;est_incons=0.0
Chr1 EuGene CDS 4486 4605 . + 0 ID=CDS:Chr1g0000001.3;Parent=mRNA:Chr1g0000001;est_cons=100.0;est_incons=0.0
Chr1 EuGene CDS 4706 5095 . + 0 ID=CDS:Chr1g0000001.4;Parent=mRNA:Chr1g0000001;est_cons=100.0;est_incons=0.0
Chr1 EuGene CDS 5174 5326 . + 0 ID=CDS:Chr1g0000001.5;Parent=mRNA:Chr1g0000001;est_cons=100.0;est_incons=0.0
Chr1 EuGene CDS 5439 5630 . + 0 ID=CDS:Chr1g0000001.6;Parent=mRNA:Chr1g0000001;est_cons=100.0;est_incons=0.0
Chr1 EuGene three_prime_UTR 5631 5894 . + . ID=three_prime_UTR:Chr1g0000001.12;Parent=mRNA:Chr1g0000001;est_cons=100.0;est_incons=0.0
grep -c '>' sequences_prot.fna
23786
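To compare the annotation file with the polypeptide file, the features can be tallied by type. A sketch, assuming the GFF3 columns are tab-separated as the format requires (the listing above shows them with spaces); gff3_type_counts and fasta_count are hypothetical helper names.

```shell
# Sketch: tally GFF3 features by type and count FASTA records.
# gff3_type_counts and fasta_count are hypothetical helpers.
gff3_type_counts() {
    # Column 3 of a GFF3 line is the feature type; skip "#" and empty lines.
    grep -v -e '^#' -e '^$' "$1" | cut -f3 | LC_ALL=C sort | uniq -c |
        awk '{ print $2 "\t" $1 }'
}

fasta_count() {
    # Count header lines, i.e. lines that start with ">".
    grep -c '^>' "$1"
}
```

Possible use from /root/output_dir: gff3_type_counts sequences.gff3, then fasta_count sequences_prot.fna, which should agree with the number of protein coding genes in the general statistics.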
Eugene
7) Where can you find the command line used to run EuGene, and what is it?
/root/work_dir
more logger.1536944185.2045.txt
export PARALOOP=/usr/bin/egnep-1.4/bin/ext/paraloop ; /usr/bin/egnep-1.4/bin/ext/paraloop/bin/paraloop.pl --clean --wait --ncpus=7 --interleaved --program=Shell --input /root/work_dir/annotationV1//raw_eugene/EGN_ANNOT_1536944185.2045/eugene.cmd.paraloop --output /root/work_dir/annotationV1//raw_eugene/EGN_ANNOT_1536944185.2045/paraloop.output --clean > /dev/null 2>&1
more /root/work_dir/annotationV1//raw_eugene/EGN_ANNOT_1536944185.2045/eugene.cmd.paraloop
export EUGENEDIR=/usr/bin/eugene-4.2a; /usr/bin/eugene-4.2a/bin/eugene -A /root/work_dir/egn_param//eugenev1.par -m /root/work_dir/egn_param//eugene.mat -pg -O /root/work_dir/annotationV1//raw_eugene/ /root/work_dir/0001/Chr1/Chr1 > /root/work_dir/annotationV1//raw_eugene/Chr1.eugene.stdout 2> /root/work_dir/annotationV1//raw_eugene/Chr1.eugene.stderr
export EUGENEDIR=/usr/bin/eugene-4.2a; /usr/bin/eugene-4.2a/bin/eugene -A /root/work_dir/egn_param//eugenev1.par -m /root/work_dir/egn_param//eugene.mat -pg -O /root/work_dir/annotationV1//raw_eugene/ /root/work_dir/0001/Chr5/Chr5 > /root/work_dir/annotationV1//raw_eugene/Chr5.eugene.stdout 2> /root/work_dir/annotationV1//raw_eugene/Chr5.eugene.stderr
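Each EuGene command redirects its error stream to a per-chromosome .eugene.stderr file, so those files are a natural place to look after a run. A sketch listing the ones that are not empty; eugene_nonempty_stderr is a hypothetical helper name, and a non-empty file is only a hint, since the source does not say whether EuGene also writes ordinary messages there.

```shell
# Sketch: list the *.eugene.stderr files that contain at least one byte.
# eugene_nonempty_stderr is a hypothetical helper; $1 is the raw_eugene directory.
eugene_nonempty_stderr() {
    find "$1" -maxdepth 1 -type f -name '*.eugene.stderr' -size +0c | sort
}
```

Possible use: eugene_nonempty_stderr /root/work_dir/annotationV1/raw_eugene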
8) Where are the intron parameters ?
/usr/bin/eugene-4.2a/models
root@machine068d5c96-f666-4443-aea3-7c6d0c83170a:/usr/bin/eugene-4.2a/models# more intron.dist
40 0.0
41 0.0
If you change it, you need to recompile EuGene (make; make install).
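Judging from the two lines shown, intron.dist holds one "length value" pair per line; that reading is an assumption. Under it, the shortest intron length with a non-zero value can be found as follows; first_nonzero_length is a hypothetical helper name.

```shell
# Sketch: print the first length whose second column is greater than zero.
# first_nonzero_length is a hypothetical helper; the two-column layout is assumed.
first_nonzero_length() {
    awk '$2 > 0 { print $1; exit }' "$1"
}
```

Possible use: first_nonzero_length /usr/bin/eugene-4.2a/models/intron.dist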
9) Where are the splice signal WAM files?
/usr/bin/eugene-4.2a/models/WAM/plant
Busco run
run_BUSCO.py -i output_dir/sequences_prot.fna -o BUSCO_output -sp arabidopsis -c 4 -l bank_tair/embryophyta_odb9 -m proteins
10) What are the BUSCO summary results?
INFO ****************** Start a BUSCO 3.0.2 analysis, current time: 09/20/2018 08:54:17 ******************
INFO Configuration loaded from /usr/bin/BUSCO/config/config.ini
INFO Init tools...
INFO Check dependencies...
INFO Check input file...
INFO To reproduce this run: python /usr/bin/BUSCO/scripts/run_BUSCO.py -i output_dir/sequences_prot.fna -o BUSCO_output -l /usr/bin/BUSCO/scripts/LINEAGE/embryophyta_odb9/ -m proteins -c 4
INFO Mode is: proteins
INFO The lineage dataset is: embryophyta_odb9 (eukaryota)
INFO Temp directory is ./tmp/
INFO Running HMMER on the proteins:
INFO [hmmsearch] 144 of 1440 task(s) completed at 09/20/2018 08:56:25
INFO [hmmsearch] 288 of 1440 task(s) completed at 09/20/2018 08:59:06
INFO [hmmsearch] 432 of 1440 task(s) completed at 09/20/2018 09:00:54
INFO [hmmsearch] 576 of 1440 task(s) completed at 09/20/2018 09:02:27
INFO [hmmsearch] 720 of 1440 task(s) completed at 09/20/2018 09:03:28
INFO [hmmsearch] 864 of 1440 task(s) completed at 09/20/2018 09:03:58
INFO [hmmsearch] 1008 of 1440 task(s) completed at 09/20/2018 09:04:12
INFO [hmmsearch] 1152 of 1440 task(s) completed at 09/20/2018 09:04:26
INFO [hmmsearch] 1296 of 1440 task(s) completed at 09/20/2018 09:04:36
INFO [hmmsearch] 1440 of 1440 task(s) completed at 09/20/2018 09:04:44
INFO Results:
INFO C:93.2%[S:91.9%,D:1.3%],F:3.2%,M:3.6%,n:1440
INFO 1342 Complete BUSCOs (C)
INFO 1324 Complete and single-copy BUSCOs (S)
INFO 18 Complete and duplicated BUSCOs (D)
INFO 46 Fragmented BUSCOs (F)
INFO 52 Missing BUSCOs (M)
INFO 1440 Total BUSCO groups searched
INFO BUSCO analysis done with WARNING(s). Total running time: 628.484469891 seconds
INFO Results written in /root/run_BUSCO_output/
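The one-line summary packs all scores together (here 1342 complete BUSCOs out of 1440, i.e. 93.2%). A sketch to extract a single score from that line, for example to compare several runs; busco_field is a hypothetical helper name and relies only on the "C:..%[S:..%,D:..%],F:..%,M:..%,n:.." layout shown above.

```shell
# Sketch: extract one percentage from the BUSCO one-line summary.
# busco_field is a hypothetical helper.
# $1: one-letter field (C, S, D, F or M); stdin: BUSCO output text.
busco_field() {
    grep -E 'C:[0-9.]+%\[S:' | head -n 1 |
        grep -o -E "$1:[0-9.]+%" | head -n 1 |
        sed -E 's/^[A-Z]:([0-9.]+)%$/\1/'
}
```

Possible use: save the BUSCO screen output to a file and run busco_field C < that file.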
MyGenomeBrowser
If you need to launch myGenomeBrowser, run the script script_myGenomeBrowser.sh (in /root); it generates a login and password.
TO COMPLETE SSB
EGNEP errors
1) Variables
The environment variables should already be set:
# export EUGENEDIR=/usr/bin/eugene-4.2a
# export EGNEP=/usr/bin/egnep-1.4
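A quick way to confirm both variables before a run is sketched below; check_egnep_env is a hypothetical helper name, and it only checks that each variable is set and names an existing directory.

```shell
# Sketch: verify that EUGENEDIR and EGNEP are set and point to directories.
# check_egnep_env is a hypothetical helper, not an EGN-EP command.
check_egnep_env() {
    status=0
    for name in EUGENEDIR EGNEP; do
        eval "value=\${$name:-}"   # read the variable whose name is in $name
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory"
            status=1
        else
            echo "$name=$value"
        fi
    done
    return $status
}
```

If it reports a problem, set the variables with the two export lines above and run it again.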
2) Index databanks
The database “/root/bank_tair/uniprot-thaliana_swiss2.fasta” does not exist or isn’t indexed. (Use ‘makeblastdb’ program to index).
The database “/root/bank_tair/uniprot-thaliana_trembl2.fasta” does not exist or isn’t indexed. (Use ‘makeblastdb’ program to index).
For software licence reasons, you have to transfer the transposable element polypeptide file to the appliance yourself, for instance:
Downloads SIDIBEBOCS$ scp repbase20.05_aaSeq_cleaned_TE.fa root@134.158.247.40:/root/bank_tair/
3) Program missing
=>The value of the parameter prg_rnammer is >/usr/bin/egnep-1.4/bin/ext/rnammer< which is not a name of an existing and non empty file at /usr/bin/egnep-1.4/bin/int/egn-euk.pl line 2207. Command exited with non-zero status 25
cd /usr/bin/egnep-1.4/bin/ext/
lrwxrwxrwx 1 root root 17 Aug 27 15:14 bedtools2 -> bedtools2-2.24.0/
drwxrwxr-x 11 root root 4096 Aug 27 15:12 bedtools2-2.24.0
drwxr-xr-x 4 root root 4096 Aug 27 15:12 bin
-rwxrwxr-x 1 339 ubuntu 55318 Feb 24 2017 BioFileConverter.pl
drwxrwxr-x 5 339 ubuntu 4096 Feb 24 2017 blast-2.2.26
-rwxrwxr-x 1 339 ubuntu 2509 Feb 24 2017 convert_rfam2gff3.pl
-rwxrwxr-x 1 339 ubuntu 2280 Feb 24 2017 convert_rnammer2gff3.pl
-rwxrwxr-x 1 339 ubuntu 2813 Feb 24 2017 convert_trnascan2gff3.pl
lrwxrwxrwx 1 root root 17 Aug 27 15:12 genometools -> genometools-1.5.6
drwxrwxr-x 15 root root 4096 Aug 27 15:12 genometools-1.5.6
lrwxrwxrwx 1 root root 15 Aug 27 15:22 gmap -> gmap-2017-02-15
drwxr-xr-x 7 11414 1279 4096 Aug 27 15:22 gmap-2017-02-15
drwxrwxr-x 13 990287 93203 4096 Mar 4 2015 hmmer-3.1b2-linux-intel-x86_64
drwxr-xr-x 3 root root 4096 Aug 27 15:12 include
drwxr-xr-x 10 41650 93203 4096 Aug 27 15:08 infernal-1.1.1
drwxr-xr-x 3 2314 cdrom 4096 Jul 4 2006 lib
-rwxr-xr-x 1 2314 cdrom 0 Jul 24 2007 LICENSE
-rwxrwxr-x 1 339 ubuntu 9931 Feb 24 2017 lipm_bed_filter.pl
-rwxrwxr-x 1 339 ubuntu 3351 Feb 24 2017 lipm_bed_split_by_sequence.pl
-rwxrwxr-x 1 339 ubuntu 26402 Feb 24 2017 lipm_bed_to_expr.pl
-rwxrwxr-x 1 339 ubuntu 6591 Feb 24 2017 lipm_bed_to_gff3.pl
-rwxrwxr-x 1 339 ubuntu 4284 Feb 24 2017 lipm_dbprot_remove_repbase.pl
-rwxrwxr-x 1 339 ubuntu 1600 Feb 24 2017 lipm_fasta2overlappingwins.pl
-rwxrwxr-x 1 339 ubuntu 15028 Feb 24 2017 lipm_fasta2tree.pl
-rwxrwxr-x 1 339 ubuntu 10965 Feb 24 2017 lipm_fastafilter.pl
-rwxrwxr-x 1 339 ubuntu 3700 Feb 24 2017 lipm_fastasplitter.pl
-rwxrwxr-x 1 339 ubuntu 18852 Feb 24 2017 lipm_genome_statistics.pl
-rwxrwxr-x 1 339 ubuntu 12125 Feb 24 2017 lipm_m8_to_gff3.pl
-rwxrwxr-x 1 339 ubuntu 5885 Feb 24 2017 lipm_m8tom8plus.pl
-rwxrwxr-x 1 339 ubuntu 7770 Feb 24 2017 lipm_N50.pl
-rwxrwxr-x 1 339 ubuntu 1047496 Feb 24 2017 lipm_nrdb
-rwxrwxr-x 1 339 ubuntu 1172 Feb 24 2017 lipm_smp.pl
-rwxrwxr-x 1 339 ubuntu 13886 Feb 24 2017 lipm_transfer_gff3_attributes.pl
-rwxrwxr-x 1 339 ubuntu 10551 Feb 24 2017 lipm_wig_to_expr.pl
-rwxrwxr-x 1 339 ubuntu 9792 Feb 24 2017 MapWithBlast.pl
lrwxrwxrwx 1 root root 18 Aug 27 15:14 ncbi-blast -> ncbi-blast-2.2.31+
drwxr-xr-x 4 11236 13030 4096 Jun 2 2015 ncbi-blast-2.2.31+
lrwxrwxrwx 1 339 ubuntu 13 Feb 24 2017 paraloop -> paraloop-1.3/
drwxrwxr-x 7 339 ubuntu 4096 Feb 24 2017 paraloop-1.3
lrwxrwxrwx 1 root root 9 Aug 27 15:22 red -> redUnix64
drwxr-x--- 2 10194 10194 4096 Jun 18 2015 redUnix64
-rwxr-xr-x 1 2314 cdrom 0 Aug 27 15:22 rnammer
-rw-r--r-- 1 root root 0 Feb 5 2015 rnammer-1.2.src.tar.Z
-rwxr-xr-x 1 2314 cdrom 8849 Aug 27 15:22 rnammere
drwxr-xr-x 4 root root 4096 Aug 27 15:10 share
lrwxrwxrwx 1 339 ubuntu 17 Feb 24 2017 tRNAscan-SE -> tRNAscan-SE-1.3.1
drwxr-x--- 5 2841 users 4096 Aug 27 15:08 tRNAscan-SE-1.3.1
-rw-r--r-- 1 root root 740960 Jun 16 07:54 tRNAscan-SE.tar.gz
lrwxrwxrwx 1 root root 24 Aug 27 15:22 usearch -> usearch9.2.64_i86linux32
-rwxr-xr-x 1 root root 0 Aug 27 15:22 usearch9.2.64_i86linux32
-rwxr-xr-x 1 2314 cdrom 0 Feb 6 2007 xml2fsa
-rwxr-xr-x 1 2314 cdrom 0 May 22 2007 xml2gff
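In the listing above, rnammer is an empty file (0 bytes) while rnammere (8849 bytes) holds the script, so replace the former with the latter:
# rm rnammer
# mv rnammere rnammer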
=>The value of the parameter prg_usearch is >/usr/bin/egnep-1.4/bin/ext/usearch< which is not a name of an existing and non empty file at /usr/bin/egnep-1.4/bin/int/egn-euk.pl line 2307. Command exited with non-zero status 25
scp sidibebocs@cc2-login.cirad.fr:/homedir/sidibebocs/work/ganoderma/egnep-1.4/bin/ext/usearch9.2.64_i86linux32 .
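Both errors come from files of size 0 in the ext directory (rnammer and usearch9.2.64_i86linux32 in the listing above). A sketch to spot every such file in one go; empty_programs is a hypothetical helper name.

```shell
# Sketch: list the regular files of size 0 directly inside a directory.
# empty_programs is a hypothetical helper; symbolic links are not followed,
# so an empty link target is reported under its own name.
empty_programs() {
    find "$1" -maxdepth 1 -type f -size 0 | sort
}
```

Possible use: empty_programs /usr/bin/egnep-1.4/bin/ext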
4) Red error
You can try the --no_red argument, which disables the repeat masking and therefore requires less memory. It is not recommended, however, because it may have a negative effect on gene prediction.
nohup $EGNEP/bin/int/egn-euk.pl --no_red --indir /root/input_dir/ --outdir /root/output_dir/ --cfg /root/bank_tair/egnep-test.cfg --workingdir /root/work_dir >& pipeline.txt &
5) Empty result file?
# more pipeline.txt
nohup: ignoring input
################################################################################
################################################################################
# /usr/bin/egnep-1.4/bin/int/egn-euk.pl --indir /root/input_dir/ --outdir /root/output_dir/ --cfg /root/bank_tair/egnep-test.cfg --workingdir /root/work_dir/
# EuGene Pipeline EUK - version 1.4
# EUGENEDIR /usr/bin/eugene-4.2a
# EGNEP /usr/bin/egnep-1.4
# Log file /root/work_dir/logger.1535964229.7865.txt
################################################################################
################################################################################
Create tree.....................................................................started
Create tree.....................................................................done
######################### Protein database cleaning ##########################
##################### Protein sequence similarity search #####################
BlastX uniprot-thaliana_swiss2.fasta uniprot-thaliana_trembl2.fasta.............started
BLASTX PARAMETERS=-outfmt 6 -evalue 0.01 -gapopen 9 -gapextend 2 -max_target_seqs 500000 -max_intron_length 15000 -seg yes
UBLAST PARAMETERS=-threads 8 -evalue 1 -lopen 9 -lext 2 -accel 1
/usr/bin/egnep-1.4/bin/ext/gmap/bin/gmap_build -d sequences -D /root/work_dir/db/GMAP_INDEX /root/work_dir/sequences 2> /root/work_dir/db/GMAP_INDEX/gmap_idx.7865.stde
BlastX uniprot-thaliana_swiss2.fasta uniprot-thaliana_trembl2.fasta.............done
########################### Transcriptome mapping ############################
Gmap TAIR_est2.fasta............................................................started
PARAMETERS=-n0 -B 5 -t 8 -L 100000 --min-intronlength=35 -K 25000 --trim-end-exons=25
FILTERS=EST length percentage > 50, identity percentage > 95
Gmap TAIR_est2.fasta............................................................done
############################# IMM model building #############################
Build IMM models................................................................started
BlastX TAIR_est2.fasta.filterlen300 uniprot-thaliana_swiss2.fasta...........started
PARAMETERS=-outfmt 6 -evalue 0.01 -gapopen 9 -gapextend 2 -max_target_seqs 500000 -max_intron_length 15000 -seg yes
BlastX TAIR_est2.fasta.filterlen300 uniprot-thaliana_swiss2.fasta...........done
BLASTX FILTERS= HSP_length > 100 AA, identity percentage > 50, e-value > 0.0001
Gmap TAIR_est2.fasta.filterlen300...........................................started
PARAMETERS=-n0 -B 5 -t 8 -L 100000 --min-intronlength=35 -K 25000 --trim-end-exons=25
FILTERS=EST length percentage > 95, identity percentage > 95
ERROR: no data to train eugene IMM (because no result for mapping of the reference transcriptome to the genomic sequence): choose an other reference transcriptome and launch again.
Gmap TAIR_est2.fasta.filterlen300...........................................done
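In the log above the ERROR line is followed by a step marked "done", so an error is easy to miss when only the end of the file is read. A sketch that prints the error lines with their line numbers; egnep_errors is a hypothetical helper name and assumes errors start with the word ERROR, as in this log.

```shell
# Sketch: print the lines of a pipeline log that start with "ERROR".
# egnep_errors is a hypothetical helper; returns non-zero when none is found.
egnep_errors() {
    grep -n -E '^[[:space:]]*ERROR' "$1"
}
```

Possible use: egnep_errors pipeline.txt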
EGNEP kill
The process identifier (PID) is printed when the job is started in the background (here PID = 26448):
# nohup $EGNEP/bin/int/egn-euk.pl --no_red --indir /root/input_dir/ --outdir /root/output_dir/ --cfg /root/bank_tair/egnep-test.cfg --workingdir /root/work_dir >& pipeline.txt &
[1] 26448
You can see all the subprocesses
ps -edf | grep egn
root 4507 26367 0 16:11 pts/0 00:00:00 grep --color=auto egn
root 26448 26367 0 15:47 pts/0 00:00:01 /usr/bin/perl /usr/bin/egnep-1.4/bin/int/egn-euk.pl --no_red --indir /root/input_dir/ --outdir /root/output_dir/ --cfg /root/bank_tair/egnep-test.cfg --workingdir /root/work_dir
root 31805 26697 0 15:55 pts/0 00:00:00 /usr/bin/perl /usr/bin/egnep-1.4/bin/int/get_BlastX.pl --sequence /root/work_dir/0001/Chr1/Chr1 --cfg /root/bank_tair/egnep-test.cfg --db /root/work_dir/db/uniprot-thaliana_trembl2.fasta --outfile /root/work_dir/0001/Chr1/Chr1.blast2 --workingdir /root/work_dir/0001/Chr1/work.1536421660.26448/
root 31806 31805 0 15:55 pts/0 00:00:00 /usr/bin/perl /usr/bin/egnep-1.4/bin/ext/MapWithBlast.pl --sequence /root/work_dir/0001/Chr1/Chr1 --db /root/work_dir/db/uniprot-thaliana_trembl2.fasta --output /root/work_dir/0001/Chr1/Chr1.blast2 --workingdir /root/work_dir/0001/Chr1/work.1536421660.26448/ --cfg /root/bank_tair/egnep-test.cfg
root 31807 31806 0 15:55 pts/0 00:00:00 sh -c export PARALOOP=/usr/bin/egnep-1.4/bin/ext/paraloop ; /usr/bin/egnep-1.4/bin/ext/paraloop/bin/paraloop.pl --clean --wait --ncpus=7 --interleaved --program=Shell --input /root/work_dir/0001/Chr1/work.1536421660.26448//BlastX.31806.1536422142/Chr1_cmd.31806.1536422142 --output /root/work_dir/0001/Chr1/work.1536421660.26448//BlastX.31806.1536422142/Chr1_cmd.31806.1536422142.output --clean
root 31808 31807 0 15:55 pts/0 00:00:00 /usr/bin/perl /usr/bin/egnep-1.4/bin/ext/paraloop/bin/paraloop.pl --clean --wait --ncpus=7 --interleaved --program=Shell --input /root/work_dir/0001/Chr1/work.1536421660.26448//BlastX.31806.1536422142/Chr1_cmd.31806.1536422142 --output /root/work_dir/0001/Chr1/work.1536421660.26448//BlastX.31806.1536422142/Chr1_cmd.31806.1536422142.output --clean
You need to kill at least
# kill -9 26448
# kill -9 26697
Before rerunning
# rm pipeline.txt
# rm -fr work_dir
# mkdir work_dir
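The listing above shows that the pipeline spawns a chain of subprocesses, and that some of them (31805 and its children) hang under 26697 rather than directly under 26448. A sketch to list everything that descends from a given PID before killing it; descendants is a hypothetical helper name, and it reads "PID PPID" pairs on standard input.

```shell
# Sketch: print every PID that descends from a given root PID.
# descendants is a hypothetical helper; stdin: one "PID PPID" pair per line.
descendants() {
    awk -v root="$1" '
        NF >= 2 { parent[$1] = $2 }
        END {
            for (pid in parent) {
                p = pid; hops = 0
                # Walk up the parent chain until the root or an unknown PID.
                while ((p in parent) && p != root && hops++ < 1000) p = parent[p]
                if (p == root && pid != root) print pid
            }
        }' | sort -n
}
```

Possible use: ps -e -o pid= -o ppid= | descendants 26448, and the same for 26697, to review the PIDs before passing them to kill.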