06.04.2006, 08:54  #1 
Moderator Join Date: 09 2001 Location: South Korea, Gumi
Posts: 7,699
Downloads: 102 Uploads: 34 Blog Entries: 16
Reputation: 561  6  KnuthMorrisPratt algorithm Main features
The design of the KnuthMorrisPratt algorithm follows a tight analysis of the Morris and Pratt algorithm. Let us look more closely at the MorrisPratt algorithm. It is possible to improve the length of the shifts. Consider an attempt at a left position j, that is when the the window is positioned on the text factor y[j .. j+m1]. Assume that the first mismatch occurs between x[i] and y[i+j] with 0 < i < m. Then, x[0 .. i1] = y[j .. i+j1] =u and a = x[i] y[i+j]=b. When shifting, it is reasonable to expect that a prefix v of the pattern matches some suffix of the portion u of the text. Moreover, if we want to avoid another immediate mismatch, the character following the prefix v in the pattern must be different from a. The longest such prefix v is called the tagged border of u (it occurs at both ends of u followed by different characters in x). This introduces the notation: let kmpNext[i] be the length of the longest border of x[0 .. i1] followed by a character c different from x[i] and 1 if no such tagged border exits, for 0 < i m. Then, after a shift, the comparisons can resume between characters x[kmpNext[i]] and y[i+j] without missing any occurrence of x in y, and avoiding a backtrack on the text (see figure 7.1). The value of kmpNext[0] is set to 1. Figure 7.1: Shift in the KnuthMorrisPratt algorithm (v border of u and c b). The searching phase can be performed in O(m+n) time. The KnuthMorrisPratt algorithm performs at most 2n1 text character comparisons during the searching phase. The delay (maximal number of comparisons for a single text character) is bounded by log(m) where is the golden ratio ( ). The C code void preKmp(char *x, int m, int kmpNext[]) { int i, j; i = 0; j = kmpNext[0] = 1; while (i < m) { while (j > 1 && x[i] != x[j]) j = kmpNext[j]; i++; j++; if (x[i] == x[j]) kmpNext[i] = kmpNext[j]; else kmpNext[i] = j; } } void KMP(char *x, int m, char *y, int n) { int i, j, kmpNext[XSIZE]; /* Preprocessing */ preKmp(x, m, kmpNext); /* Searching */ i = j = 0; while (j < n) { while (i > 1 && x[i] != y[j]) i = kmpNext[i]; i++; j++; if (i >= m) { OUTPUT(j  i); i = kmpNext[i]; } } } The example Preprocessing phase The kmpNext table First attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 2 3 4 G C A G A G A G Shift by: 4 (ikmpNext[i]=3 1) Second attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 G C A G A G A G Shift by: 1 (ikmpNext[i]=0 1) Third attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 2 3 4 5 6 7 8 G C A G A G A G Shift by: 7 (ikmpNext[i]=81) Fourth attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 G C A G A G A G Shift by: 1 (ikmpNext[i]=10) Fifth attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 G C A G A G A G Shift by: 1 (ikmpNext[i]=0 1) Sixth attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 G C A G A G A G Shift by: 1 (ikmpNext[i]=0 1) Seventh attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 G C A G A G A G Shift by: 1 (ikmpNext[i]=0 1) Eighth attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 G C A G A G A G Shift by: 1 (ikmpNext[i]=0 1) The KnuthMorrisPratt algorithm performs 18 character comparisons on the example. http://wwwigm.univmlv.fr/~lecroq/string/examples/exp8.html 
Thread Tools  

На правах рекламы:  
