The executable takes four parameters:
1. The first parameter is the filename of a FASTA file of DNA sequences.
2. The second parameter is the filename of the motif candidate descriptions (see "File Format of motif candidate").
3. The third parameter is the filename of the 1st order Markov model description (see "File Format of descriptions of 1st order Markov model").
4. The fourth parameter is 0 or 1.
0 means double strands are not considered. 1 means double strands are considered.
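As an illustration of what considering double strands usually involves, the sketch below counts occurrences of a motif word on the given strand and, optionally, on its reverse complement. This is an assumption about the usual meaning of the flag, not a description of the tool's internals, and the names reverse_complement and count_occurrences are hypothetical.

```python
# Hypothetical helpers for illustration only; not the tool's own code.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_occurrences(seq: str, word: str, double_strand: int) -> int:
    """Count (possibly overlapping) occurrences of word in seq.

    double_strand mirrors the fourth parameter: 0 scans seq only,
    1 also scans the reverse complement of seq.
    """
    def count(s: str) -> int:
        # startswith(word, i) tests for a match beginning at index i.
        return sum(1 for i in range(len(s) - len(word) + 1)
                   if s.startswith(word, i))

    total = count(seq)
    if double_strand == 1:
        total += count(reverse_complement(seq))
    return total
```

Note that with this simple scheme a word that equals its own reverse complement is counted once per strand.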
File Format of motif candidate
Line 1: an integer, the number of motifs to rank.

Line 2: an integer, the type of the motifs.
    0 means consensus motifs
    1 means a set of motif words
    3 means motifs given as a PWM (position weight matrix; see example3)

Line 3 to the end: the descriptions of the motifs.

For type 0, a typical description is
    @ID
    m1 ACCGTCC
"m1" is the name of the motif and can be omitted, but the token "@ID" cannot be omitted.

For type 1, a typical description is
    @ID
    m1 @SIZE
    3 TTTGTCAA ATGAGATT TCAAATCG
"m1" is the name of the motif and can be omitted, but the token "@ID" cannot be omitted. In this example the token "@SIZE" is followed by the number of words in the set (3) and then by the words themselves.

For type 3, a typical description is
    @ID
    GCR1 A 0 2 0 0 0 T 0 4 6 0 0 G 0 0 0 0 0 C 6 0 0 6 6
"GCR1" is the name of the motif and can be omitted, but the token "@ID" cannot be omitted. Each nucleotide letter (A, T, G, C) is followed by that nucleotide's value at each position of the motif (five positions here).
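The type 3 example can be read into a per-nucleotide table with a few lines of Python. This is only a sketch: it assumes the matrix tokens are separated by whitespace and that every nucleotide row has the same length, and the function name parse_pwm_tokens is hypothetical.

```python
def parse_pwm_tokens(tokens):
    """Group a flat token list (letter, values, letter, values, ...) into rows.

    Returns a dict mapping each nucleotide letter to its list of values.
    """
    pwm = {}
    current = None
    for tok in tokens:
        if tok in ("A", "C", "G", "T"):
            current = tok          # start a new row
            pwm[current] = []
        elif current is None:
            raise ValueError("value found before any nucleotide letter")
        else:
            pwm[current].append(float(tok))
    # Assumed sanity checks: all four rows present, all of equal length.
    if set(pwm) != set("ACGT") or len({len(row) for row in pwm.values()}) != 1:
        raise ValueError("expected four rows of equal length")
    return pwm


# The matrix part of the GCR1 example above (name and @ID token left out).
tokens = "A 0 2 0 0 0 T 0 4 6 0 0 G 0 0 0 0 0 C 6 0 0 6 6".split()
pwm = parse_pwm_tokens(tokens)
width = len(pwm["A"])

# Column totals and the highest-scoring nucleotide at each position.
column_sums = [sum(pwm[b][i] for b in "ACGT") for i in range(width)]
consensus = "".join(max("ACGT", key=lambda b: pwm[b][i]) for i in range(width))
```

Every column of this example sums to the same total, which is consistent with the values being counts over a fixed number of sites.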
File Format of descriptions of 1st order Markov model
Let R be the Markov region and R[i] the i-th character in the chain. The parameters of an order-1 Markov model
consist of two parts: the initial probability, which is a 1*4 vector, and the transition
probability, which is a 4*4 matrix.
A typical description of an order-1 Markov model lists these values in the following order:
Pr(R[0]=A)
Pr(R[0]=C)
Pr(R[0]=G)
Pr(R[0]=T)
Pr(R[1]=A|R[0]=A)
Pr(R[1]=C|R[0]=A)
Pr(R[1]=G|R[0]=A)
Pr(R[1]=T|R[0]=A)
Pr(R[1]=A|R[0]=C)
Pr(R[1]=C|R[0]=C)
Pr(R[1]=G|R[0]=C)
Pr(R[1]=T|R[0]=C)
Pr(R[1]=A|R[0]=G)
Pr(R[1]=C|R[0]=G)
Pr(R[1]=G|R[0]=G)
Pr(R[1]=T|R[0]=G)
Pr(R[1]=A|R[0]=T)
Pr(R[1]=C|R[0]=T)
Pr(R[1]=G|R[0]=T)
Pr(R[1]=T|R[0]=T)
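
To make the layout concrete, the sketch below reads such a description into an initial vector and a transition matrix and scores a sequence under the model. It assumes the file holds 20 whitespace-separated numbers in the order listed above, with the initial probabilities ordered A, C, G, T; the function names are hypothetical and this is not the tool's own parser.

```python
import math

BASES = "ACGT"


def parse_markov_model(text):
    """Split 20 numbers into initial probabilities and a transition table."""
    values = [float(tok) for tok in text.split()]
    if len(values) != 20:
        raise ValueError("expected 4 initial + 16 transition probabilities")
    initial = dict(zip(BASES, values[:4]))
    # Rows follow the previous character (A, C, G, T); within a row the
    # entries follow the next character in the same order.
    transition = {
        prev: dict(zip(BASES, values[4 + 4 * i: 8 + 4 * i]))
        for i, prev in enumerate(BASES)
    }
    return initial, transition


def log_probability(seq, initial, transition):
    """Log-probability of seq under the order-1 model.

    Raises ValueError for an empty sequence or a zero-probability step.
    """
    if not seq:
        raise ValueError("empty sequence")
    logp = math.log(initial[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        logp += math.log(transition[prev][cur])
    return logp


# Usage with a uniform model: every probability is 0.25.
initial, transition = parse_markov_model(" ".join(["0.25"] * 20))
score = log_probability("ACGT", initial, transition)
```

Working in log space avoids underflow when long sequences are scored.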