Volume 15 - Issue 3
A Fuzzy Approach to Approximate String Matching for Text Retrieval in NLP
Abstract
Approximate string matching has many applications in Natural Language Processing. This paper compares various algorithms for approximate string matching, most of which are based on the edit distance between the characters of two strings. It also covers the challenges of using these algorithms for text retrieval. The authors propose an alternative approach to approximate string matching that is better suited to text retrieval. In this study, two strings are compared using a matrix to identify similarities. The matrix is updated for each character that overlaps between the two strings. An overlap counter is incremented at each overlapping character position and reset to 0 when a position does not overlap. The maximum counter value is then used in a ratio to calculate the degree of similarity. The algorithm is implemented in Python. The results indicate that the proposed approach can be used to identify lexically similar words. This type of approach will find use in lemmatization, text summarization, topic modelling and data mining solutions.
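The matrix-and-counter procedure described in the abstract can be illustrated with a minimal Python sketch. This is one reading of the abstract, not the authors' published code: the function name `overlap_similarity` is hypothetical, and the abstract does not state which ratio is used, so dividing the longest overlap run by the length of the longer string is an assumption made here for illustration.

```python
def overlap_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] from the longest run of overlapping characters.

    Assumption: the ratio is (longest overlap run) / (length of the longer string).
    The abstract does not specify the denominator.
    """
    if not a or not b:
        return 0.0

    # matrix[i][j] is the overlap counter for the run ending at a[i-1] and b[j-1].
    matrix = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    longest = 0

    for i, ch_a in enumerate(a, start=1):
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Characters overlap: extend the run from the previous diagonal cell.
                matrix[i][j] = matrix[i - 1][j - 1] + 1
                longest = max(longest, matrix[i][j])
            # Otherwise the cell stays 0, which resets the counter for this position.

    return longest / max(len(a), len(b))


if __name__ == "__main__":
    # "running" and "runner" share the contiguous run "runn" (4 characters).
    print(overlap_similarity("running", "runner"))
```

Under the assumed ratio, identical strings score 1.0 and strings with no characters in common score 0.0; a different denominator (for example, the shorter length or the mean length) would shift the scores without changing the matrix logic.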
Paper Details
PaperID: 191044
Author's Name: Krishna Prakash Kalyanathaya, Dr. D. Akila and Dr. G. Suseendren
Volume: Volume 15
Issues: Issue 3
Keywords: Natural Language Processing, Data mining, Semantic Similarity, WordNet.
Year: 2019
Month: May
Pages: 26-32