Abstract: Anomaly detection is one of the important domains in data mining. Both supervised and unsupervised learning methods are used for the anomaly detection task. This paper emphasizes distance-based prediction of anomalies. We studied the traditional methods, which involve index-based, nested-loop, and cell-based approaches to anomaly detection. As datasets become very large, the task of detecting anomalies becomes computationally expensive. With the push towards big data mining, it will become increasingly necessary to adapt existing anomaly detection algorithms to various distributed computing platforms. This paper surveys the different strategies that can be adopted for anomaly analysis using distributed computing techniques. First, we studied the concept of the anomaly detection solving set, a subset of the input data set representing a model that can be used to predict anomalies. The solving set is defined using the number of points necessary to detect the top anomalies while taking into consideration only a subset of all the pairwise distances in the data set. Then we analysed the possibility of using the MapReduce framework for performing anomaly analysis. A MapReduce-based solving set algorithm for anomaly detection using the Hadoop framework is also proposed.
Keywords: Anomaly, Distributed Computing, MapReduce, Hadoop.
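
To make the distance-based setting concrete, the following is a minimal sketch of a nested-loop baseline for finding the top anomalies. It is not the solving set algorithm or its MapReduce variant. The abstract does not define the anomaly score, so the sketch assumes a common choice: a point's score is the sum of the distances to its k nearest neighbours. The function names `knn_weight` and `top_anomalies` and the parameters `k` and `n` are hypothetical and chosen only for illustration.

```python
import math

def knn_weight(index, data, k):
    """Assumed score: sum of Euclidean distances from data[index]
    to its k nearest neighbours among the other points."""
    dists = sorted(
        math.dist(data[index], other)
        for j, other in enumerate(data)
        if j != index
    )
    return sum(dists[:k])

def top_anomalies(data, k, n):
    """Return the indices of the n points with the largest score.
    Nested-loop baseline: every pairwise distance is computed, which is
    the quadratic cost that a solving set approach aims to avoid."""
    scored = [(knn_weight(i, data, k), i) for i in range(len(data))]
    scored.sort(reverse=True)
    return [i for _, i in scored[:n]]

if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(top_anomalies(points, k=2, n=1))
```

Because this baseline evaluates all pairwise distances, its cost grows quadratically with the number of points, which illustrates why restricting the computation to a subset of distances and distributing the work become attractive for large data sets.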