Web Text Mining for news by Classification

Sarika Y. Pabalkar

Volume 1, Issue 6, August 2012

Pad. Dr. D.Y. Patil Institute of Engineering and Technology, Pimpri, Pune, Maharashtra, India.

Abstract: Most information resources on the World Wide Web are published as HTML or XML pages, and the number of web pages is increasing rapidly as the web expands. To make better use of this information, researchers are pursuing technologies that can automatically reorganize and manipulate web pages, such as web information retrieval, web page classification, and other web mining techniques. Web text mining, in both research and application, is an important branch of data mining. Today people mainly rely on search engines to look up web information, but a search engine like Google can hardly provide individualized service tailored to the different needs of different users. Web text mining aims to resolve this problem. In web text mining, text extraction and the feature representation of the extracted content form the foundation of the mining work, and text classification is the most important and basic mining method. Classification means assigning each text in a text set to a class according to the definition of the classification system. The challenge is therefore not only to find every occurrence of a subject, but also to filter for just those occurrences that carry the desired meaning. General-purpose search engines such as Google and Yahoo cover so wide a range of content, at so low a level of intelligence, that further mining of the data is very difficult. The development of techniques for mining unstructured, semi-structured, and fully structured textual data has become increasingly important in industry.

Keywords: Text Mining, Extraction, Classification, Stemming, Stopword Removal
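
The steps named in the abstract and keywords (stopword removal, stemming, classification of each text into a predefined class) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the stopword list, the crude suffix-stripping stemmer, the multinomial Naive Bayes classifier with add-one smoothing, and the four training headlines are all assumptions chosen only to make the idea concrete.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stopword list (an assumption; real lists are far longer).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "for",
             "is", "are", "by", "with", "after", "as"}


def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, then stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def train(labeled_docs):
    """Collect per-class document counts, per-class term counts, and the vocabulary."""
    class_docs = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_docs[label] += 1
        for term in preprocess(text):
            term_counts[label][term] += 1
            vocab.add(term)
    return class_docs, term_counts, vocab


def classify(text, model):
    """Return the class with the highest Naive Bayes log-probability."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    # Terms never seen in training carry no evidence, so they are skipped.
    terms = [t for t in preprocess(text) if t in vocab]
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)  # class prior
        denom = sum(term_counts[label].values()) + len(vocab)
        for term in terms:
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((term_counts[label][term] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training headlines, invented for this sketch.
TRAINING = [
    ("The team scored twice and won the match", "sports"),
    ("Striker injured during the final game", "sports"),
    ("Markets rallied as banks reported profits", "business"),
    ("Shares dropped after the company cut its forecast", "business"),
]

model = train(TRAINING)
print(classify("Bank profits lift the markets", model))
```

A production system would more likely use an established stemmer (for example Porter's) and a much larger labeled corpus; the sketch only shows how preprocessing feeds the classifier.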

How to Cite:

[1] Ms. Sarika Y. Pabalkar, “Web Text Mining for news by Classification,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE)

This work is licensed under a Creative Commons Attribution 4.0 International License.