-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathblastall_formatdb
52 lines (47 loc) · 3.23 KB
/
blastall_formatdb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
For instance,we should mapping the given seq into stranscipts(assembly or three generate data),then on the Linux,blast software
must be installed.
first and foremost,we should find the non-redundancy .fa file and read your query seq file(fa).
when above work alread done,we can begin the maping with blast software.
Firstly,we should use the formatdb software to build the database with .fa file,command show folow:
$/home/hui/formatdb -i <fasta file> -p F -o T #F was nucleic acid, T represent protein database ##note:short seq created database easier successufully,on the conversely,failure.
then,command will generated some binary file in your local work dir.
so,we should use blastall software.
$/home/hui/blastall -p blastn -d <prefix database> -i <query fasta> -e 1e-10 -v 1 -m 8 -b 10 -o output_file.txt
$blastall -p tblastx -d database/Demo.Unigene.fa -i ../Analysis_antifreezing_protein/antifreeze_protein.fa -e 1e-10 -v 1 -m 8 -b 10 -o output_file_protein.txt
Note:tblastx represented that inputting nucleotide seq should be translate into protein to mapping to the database translate into protein yet.
Comment:-p program type :blastn(nucleotide mapping nucleotide database) blastp(protein sequence) blastx(nucleotide maping protein database)
and so on.
-d database
-i query sequence(fasta)
-e expectation value(mean value)
-v one-line description(default 500)
-m display format,8 was tabular(table)
-b output tabular result counts,record numbers
-o output file
-m 0 output the specific information that map the database.
-m 8 output format annotation:
left to right
query id,subject id,identity,hit length,missing counts,gap counts,query begin,query end,subject begin,subject end,E-value,Score
remark:high score and low E must be selected.
all information regarding blast software,you should go to see the home page with blast.
Additional comments:
Formatdb to create database for the sake of blast.
step1:blast seq
blastx -num_descriptions 100 -num_alignments 2 -evalue 1e-5 -db ~/ncbi/Nt_Nr_division/nr_PLN -query ~/query/Q686.fa -num_threads 2 -out ~/Result/blastx_table1.blast
step2:convert specific file,blastx_table1.blast, into tab format.
perl ~/blast/bin/blast_parser.pl -nohead -eval 1 -tophit 1 -m 0 -topmatch 1 blastx_table1.blast > blastx_table1.tab.best
Note: -num_descriptions 100 means showing 100 descriptions in blastx_table1.blast file as:
ref|XP_008238972.1| PREDICTED: uncharacterized protein LOC103337... 60.8 6e-09
ref|XP_008373782.1| PREDICTED: uncharacterized protein LOC103437... 48.5 6e-06 here display two description.
-num_alignments 2 means showing two concrete mapping information as:
Query 260 LQCFYCHGFGHIEAVYpskassvppppSHLQGMIAQYPSHGGNQQWVADTGANTHITNDL 81
++C C +GH ++ + P HL M AQ S WV DTGAN+HITNDL
Sbjct 353 IRCQICGRYGHSALDCYNRLNMSNEGPQHLTAMTAQQSSSPRPPNWVVDTGANSHITNDL 412
###latest blast 2.9.0+ 2021-1-11
makeblastdb -in genome.fasta -dbtype nucl -parse_seqids -out ./index
database sequence:genome.fasta
dbtype:'nucl' was nuclear; 'prot' was protein
the database name -out
###mapping:
## mapping the query sequence into nuclear database
blastn -query input.fa -db ./index -evalue 1e-6 -outfmt 6 -num_threads 6 -out out_file