Abstract: Phishing is a fraudulent activity in which attackers (phishers) try to obtain users' sensitive information online, such as user IDs, passwords, and credit card details. Users fall victim to these attacks because phishing web pages look very similar to the real ones, which makes it difficult to distinguish a fake website from a legitimate one. Detecting such pages is also hard because identification depends on several attributes that a typical user may not know about. Existing phishing detection systems depend heavily on databases and are time consuming. In the proposed system, Hadoop MapReduce is used for fast retrieval of URL attributes, which play a major role in identifying phishing web pages; MapReduce is chosen for its time efficiency and throughput. The PART algorithm is used to classify and predict phishing pages, and it is more efficient and accurate than the algorithms used in existing systems. The main goal is to secure the user’s data while browsing.

Keywords: Phishing, Anti-Phishing, Hadoop, MapReduce, Information Retrieval, Data Mining.
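
As a rough illustration of the URL attribute retrieval step mentioned in the abstract, the following sketch mimics the map, shuffle, and reduce phases in plain Python. It does not use Hadoop and is not the implementation described in this paper. The attribute set (URL length, presence of "@", IP address as host, dot count, HTTPS) and all function names are assumptions chosen for illustration only.

```python
from collections import defaultdict
from urllib.parse import urlparse
import ipaddress


def is_ip(host):
    """Return True if host is a literal IP address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def map_url(url):
    """Map step: emit one (host, attributes) pair for a URL.

    The attributes are illustrative assumptions, not the paper's list.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    attrs = {
        "length": len(url),
        "has_at": "@" in url,
        "ip_host": is_ip(host),
        "dots": host.count("."),
        "https": parsed.scheme == "https",
    }
    yield host, attrs


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key (the host)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_host(host, attr_list):
    """Reduce step: summarize the attributes of all URLs seen for one host."""
    return host, {
        "urls": len(attr_list),
        "max_length": max(a["length"] for a in attr_list),
        "any_at": any(a["has_at"] for a in attr_list),
        "ip_host": any(a["ip_host"] for a in attr_list),
        "max_dots": max(a["dots"] for a in attr_list),
        "all_https": all(a["https"] for a in attr_list),
    }


def run_job(urls):
    """Run map, shuffle, and reduce in sequence on an in-memory URL list."""
    pairs = (pair for url in urls for pair in map_url(url))
    return dict(reduce_host(h, v) for h, v in shuffle(pairs).items())
```

Calling `run_job` on a list of URLs returns one summary record per host. In a real Hadoop job the map and reduce functions would run in parallel across the cluster, and the per-host records would then be passed to a classifier such as PART.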