International Journal of Advanced Research in Computer and Communication Engineering
A monthly peer-reviewed and refereed journal
ISSN Online 2278-1021 | ISSN Print 2319-5940 | Since 2012
IJARCCE adheres to the suggestive parameters outlined by the University Grants Commission (UGC) for peer-reviewed journals, upholding high standards of research quality, ethical publishing, and academic excellence.
Volume 3, Issue 10, October 2014

Document-Document similarity matrix and Multiple-Kernel Fuzzy C-Means Algorithm-based web document clustering for information retrieval

Abstract: With the continuous growth of the World Wide Web, web databases are expanding massively, and the automatic grouping of web documents so that information can be retrieved easily poses a new challenge for researchers. The literature presents various web document clustering algorithms that are useful for information retrieval. In this work, a web document clustering method based on a document-document (D-D) similarity matrix and the Multiple-Kernel Fuzzy C-Means algorithm is developed for information retrieval. First, the web documents are read and initial pre-processing is applied to extract the important words. Then, a feature space is constructed from the keywords and their frequencies. Subsequently, a document-to-document similarity matrix is constructed using a similarity measure called the semantic retrieval (SR) measure. The measure considers four criteria: the probability of occurrence in the document, the probability of occurrence in the first document, the probability of occurrence in the second document, and the probability of occurrence in both synonym sets. Based on this measure, the D-D matrix is computed and used for the final grouping with the Multiple-Kernel Fuzzy C-Means algorithm. Experiments are conducted with 100 web documents, and the results are evaluated in terms of accuracy and entropy.
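
The feature-space and D-D matrix steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the SR measure, so plain cosine similarity over keyword frequencies is used here purely as a stand-in. The stop-word list, the function names, and the three sample documents are all assumptions made for illustration and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual pre-processing is not specified.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "the", "to"}


def term_frequencies(text):
    """Lowercase, tokenize, drop stop words, and count the remaining keywords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity of two keyword-frequency vectors (stand-in for SR)."""
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(documents):
    """Build the symmetric document-to-document (D-D) similarity matrix."""
    features = [term_frequencies(doc) for doc in documents]
    n = len(features)
    return [
        [cosine_similarity(features[i], features[j]) for j in range(n)]
        for i in range(n)
    ]


# Made-up sample documents, for illustration only.
docs = [
    "Clustering of web documents for information retrieval",
    "Information retrieval with web document clustering",
    "Fuzzy kernels in image segmentation",
]
dd_matrix = similarity_matrix(docs)
```

In this sketch, `dd_matrix[i][j]` holds the similarity between documents `i` and `j`. A clustering algorithm such as the one named in the abstract would take this matrix as its input in place of the raw text.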

Keywords: Information retrieval, Similarity measure, web document clustering, Entropy, Accuracy.
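
The two evaluation measures named above, entropy and accuracy, could be computed as in the sketch below. The abstract does not state the exact definitions used in the paper, so this assumes common ones: size-weighted class entropy per cluster, and a purity-style accuracy in which each cluster is credited with its majority class. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def group_labels_by_cluster(cluster_ids, true_labels):
    """Collect the ground-truth labels of the documents in each cluster."""
    groups = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, true_labels):
        groups[cluster_id].append(label)
    return groups


def clustering_entropy(cluster_ids, true_labels):
    """Size-weighted average of per-cluster class entropy (lower is better)."""
    groups = group_labels_by_cluster(cluster_ids, true_labels)
    total = len(true_labels)
    entropy = 0.0
    for labels in groups.values():
        size = len(labels)
        cluster_entropy = -sum(
            (count / size) * math.log2(count / size)
            for count in Counter(labels).values()
        )
        entropy += (size / total) * cluster_entropy
    return entropy


def clustering_accuracy(cluster_ids, true_labels):
    """Purity-style accuracy: fraction of documents in their cluster's majority class."""
    groups = group_labels_by_cluster(cluster_ids, true_labels)
    correct = sum(Counter(labels).most_common(1)[0][1] for labels in groups.values())
    return correct / len(true_labels)


# Toy assignment of six documents to two clusters (illustrative data only).
cluster_ids = [0, 0, 0, 1, 1, 1]
true_labels = ["sports", "sports", "news", "news", "news", "news"]
accuracy = clustering_accuracy(cluster_ids, true_labels)
entropy = clustering_entropy(cluster_ids, true_labels)
```

Under these assumed definitions, a perfect clustering gives an entropy of 0 and an accuracy of 1, so the two measures move in opposite directions as cluster quality improves.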

How to Cite:

[1] , "Document-Document similarity matrix and Multiple-Kernel Fuzzy C-Means Algorithm-based web document clustering for information retrieval," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE)

This work is licensed under a Creative Commons Attribution 4.0 International License.