AKB Forums

Go Back   AKB Forums > Technical sections > Algorithms
Home Register Blogs FAQ Members List Calendar Downloads Arcade Mark Forums Read

Algorithms The source of algorithms for your project

Troubles when posting message? Click here! :: Проблемы с отправлением сообщения? Нажмите сюда!

Reply
 
LinkBack Thread Tools Display Modes
Old Apr 6, 2006, 07:54   #1
Administrator
 
acid's Avatar
 
Join Date: Sep 2001
Location: Yerevan, Armenia
Posts: 6,995
Blog Entries: 14
Rep Power: 10
Reputation: 183
Knuth-Morris-Pratt algorithm

Main features
  • performs the comparisons from left to right;
  • preprocessing phase in O(m) space and time complexity;
  • searching phase in O(n+m) time complexity (independent from the alphabet size);
  • delay bounded by log(m) where is the golden ratio ( ).
Description


The design of the Knuth-Morris-Pratt algorithm follows a tight analysis of the Morris and Pratt algorithm. Let us look more closely at the Morris-Pratt algorithm. It is possible to improve the length of the shifts.
Consider an attempt at a left position j, that is when the the window is positioned on the text factor y[j .. j+m-1]. Assume that the first mismatch occurs between x[i] and y[i+j] with 0 < i < m. Then, x[0 .. i-1] = y[j .. i+j-1] =u and a = x[i] y[i+j]=b.
When shifting, it is reasonable to expect that a prefix v of the pattern matches some suffix of the portion u of the text. Moreover, if we want to avoid another immediate mismatch, the character following the prefix v in the pattern must be different from a. The longest such prefix v is called the tagged border of u (it occurs at both ends of u followed by different characters in x).
This introduces the notation: let kmpNext[i] be the length of the longest border of x[0 .. i-1] followed by a character c different from x[i] and -1 if no such tagged border exits, for 0 < i m. Then, after a shift, the comparisons can resume between characters x[kmpNext[i]] and y[i+j] without missing any occurrence of x in y, and avoiding a backtrack on the text (see figure 7.1). The value of kmpNext[0] is set to -1.

Figure 7.1: Shift in the Knuth-Morris-Pratt algorithm (v border of u and c b).
The table kmpNext can be computed in O(m) space and time before the searching phase, applying the same searching algorithm to the pattern itself, as if x=y.
The searching phase can be performed in O(m+n) time. The Knuth-Morris-Pratt algorithm performs at most 2n-1 text character comparisons during the searching phase. The delay (maximal number of comparisons for a single text character) is bounded by log(m) where is the golden ratio ( ).



The C code
void preKmp(char *x, int m, int kmpNext[]) {
int i, j;

i = 0;
j = kmpNext[0] = -1;
while (i < m) {
while (j > -1 && x[i] != x[j])
j = kmpNext[j];
i++;
j++;
if (x[i] == x[j])
kmpNext[i] = kmpNext[j];
else
kmpNext[i] = j;
}
}


void KMP(char *x, int m, char *y, int n) {
int i, j, kmpNext[XSIZE];

/* Preprocessing */
preKmp(x, m, kmpNext);

/* Searching */
i = j = 0;
while (j < n) {
while (i > -1 && x[i] != y[j])
i = kmpNext[i];
i++;
j++;
if (i >= m) {
OUTPUT(j - i);
i = kmpNext[i];
}
}
}


The example


Preprocessing phase

The kmpNext table

First attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 2 3 4 G C A G A G A G Shift by: 4 (i-kmpNext[i]=3- -1)
Second attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 G C A G A G A G Shift by: 1 (i-kmpNext[i]=0- -1)
Third attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 2 3 4 5 6 7 8 G C A G A G A G Shift by: 7 (i-kmpNext[i]=8-1)
Fourth attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 G C A G A G A G Shift by: 1 (i-kmpNext[i]=1-0)
Fifth attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 G C A G A G A G Shift by: 1 (i-kmpNext[i]=0- -1)
Sixth attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 G C A G A G A G Shift by: 1 (i-kmpNext[i]=0- -1)
Seventh attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 G C A G A G A G Shift by: 1 (i-kmpNext[i]=0- -1)
Eighth attempt G C A T C G C A G A G A G T A T A C A G T A C G 1 G C A G A G A G Shift by: 1 (i-kmpNext[i]=0- -1)
The Knuth-Morris-Pratt algorithm performs 18 character comparisons on the example.


http://www-igm.univ-mlv.fr/~lecroq/string/examples/exp8.html
__________________
Chat with acid


acid is offline   Reply With Quote Quote selected
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
plss help...booth's multiplication algorithm ayaksah Algorithms 1 Sep 10, 2005 11:22
Wired Equivalent Privacy (WEP) algorithm (part of the 802.11) greka Software Security 0 May 29, 2004 14:06


All times are GMT. The time now is 12:19.


Powered by vBulletin® Version 3.6.8
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
This board was founded on September 29, 2001
Powered by Viper Internet

Affordable Web Hosting | ParevNet

Buy text link