Thursday, May 24, 2018

Levenshtein Distance Algorithm (R code)

Levenshtein Distance represents the minimum number of letter changes required to convert a word to another word.

The below R language code gives the Levenshtein Distance (letter changes required) to convert "Saturday" to "Sunday".



str1 <- c("Saturday")
str2 <- c("Sunday")
str1knt <- nchar(str1)
str2knt <- nchar(str2)
str11<-substring(str1,1:str1knt,1:str1knt)
m2<-matrix(c(" ","",str11),nrow=1,ncol=str1knt+2,byrow=FALSE)
sq<-seq(0,str1knt)
m2<-rbind(m2,c(" ",sq))

i=1
k=str2knt
while(k>0){ 
m2=rbind(m2,c(substring(str2,i,i),i,replicate(str1knt,0)))
k=k-1
i=i+1
}

k=3
while(k<=str2knt+2){
i=3
                while(i<=str1knt+2){
                apex<-as.numeric(m2[k-1,i-1])
                horiz<-as.numeric(m2[k-1,i])
                verti<-as.numeric(m2[k,i-1])
                s2char<-m2[k,1]
                s1char<-m2[1,i]

                if(s2char==s1char){ m2[k,i] <- min(apex,horiz,verti)} else { m2[k,i] <- min(apex,horiz,verti)+1}
                i=i+1
                }
k=k+1
}

LevenshteinDistance <- as.numeric(m2[k-1,i-1])
LevenshteinDistance

No comments:

Post a Comment