International Journal of Advanced Research in Computer and Communication Engineering
A monthly peer-reviewed and refereed journal
ISSN Online 2278-1021 | ISSN Print 2319-5940 | Since 2012
IJARCCE adheres to the suggestive parameters outlined by the University Grants Commission (UGC) for peer-reviewed journals, upholding high standards of research quality, ethical publishing, and academic excellence.
Volume 3, Issue 3, March 2014

Main Content Extraction From Web Page Using Dom

MS. PRANJALI G.GONDSE, PROFESSOR ANJALI B.RAUT

Abstract: People today depend heavily on the Internet, and almost anything can be searched for online. The World Wide Web has grown rapidly in recent years. With so much information available, web pages have become an important source for information retrieval and data mining technologies such as commercial search engines and web mining applications. Web pages contain several items that cannot be classified as informative content, e.g., search and filtering panels, navigation links, and advertisements; these are called the noisy parts. Most clients and end users look for the informative content and largely do not seek the non-informative content. A tool that helps an end user or application search and process information from Web pages automatically must separate the "primary or informative content sections" from the other content sections. These sections are known as "Web page blocks" or just "blocks." First, the tool must segment the Web page into Web page blocks; second, it must separate the primary content blocks from the non-informative content blocks. The main focus is on the review and evaluation of algorithms capable of extracting the main content from a web page. The proposed algorithms outperform several existing algorithms with respect to runtime and/or accuracy. Furthermore, it will be shown that a Web cache system that applies the proposed algorithms to remove non-informative content blocks and to identify similar blocks across Web pages can achieve significant storage savings.
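
The two steps named in the abstract (segment the page into blocks, then separate primary content blocks from non-informative ones) can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper, which the abstract does not specify. The block-delimiting tags, the link-density heuristic, the thresholds, and the names `BlockSegmenter` and `extract_main_content` are all assumptions made for illustration. The sketch also walks the tag stream with Python's standard `html.parser` rather than building a full DOM tree.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit "blocks"; the abstract does not define them.
BLOCK_TAGS = {"div", "p", "td", "li", "section", "article",
              "nav", "aside", "header", "footer"}


class BlockSegmenter(HTMLParser):
    """Step 1: split a page into text blocks and count link text per block."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # finished blocks: {"text": str, "link_chars": int}
        self._pieces = []       # text pieces of the block being built
        self._link_chars = 0    # characters of the current block inside <a>
        self._in_link = 0       # nesting depth of <a> elements
        self._skip = 0          # nesting depth of <script>/<style>

    def flush_block(self):
        """Close the current block and start a new, empty one."""
        text = " ".join(self._pieces)
        if text:
            self.blocks.append({"text": text, "link_chars": self._link_chars})
        self._pieces, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush_block()

    def handle_data(self, data):
        if self._skip:
            return                          # ignore script and style bodies
        piece = " ".join(data.split())      # normalise whitespace
        if not piece:
            return
        self._pieces.append(piece)
        if self._in_link:
            self._link_chars += len(piece)


def extract_main_content(html, max_link_density=0.5, min_chars=40):
    """Step 2: keep blocks that are long enough and not dominated by links.

    Both thresholds are illustrative guesses, not values from the paper.
    """
    segmenter = BlockSegmenter()
    segmenter.feed(html)
    segmenter.close()
    segmenter.flush_block()                 # emit any trailing block
    return [
        block["text"]
        for block in segmenter.blocks
        if len(block["text"]) >= min_chars
        and block["link_chars"] / len(block["text"]) <= max_link_density
    ]
```

Calling `extract_main_content(html_string)` returns the text of the blocks that pass both filters. Navigation bars and advertisement blocks tend to be short and link-heavy, so this heuristic drops them, but a real system would need a more robust criterion.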

Keywords: DOM Tree, information extraction, web mining

How to Cite:

[1] MS. PRANJALI G.GONDSE, PROFESSOR ANJALI B.RAUT, "Main Content Extraction From Web Page Using Dom," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE)

This work is licensed under a Creative Commons Attribution 4.0 International License.