Abstract: Computer forensic analysis is the discipline that combines elements of law and computer science and is used to analyze seized computers in a forensics department. Clustering algorithms are typically used for exploratory data analysis, where there is little or no prior knowledge about the data. This is precisely the case in a number of applications of computer forensics, including the one addressed in our work. In particular, algorithms for clustering documents can facilitate the discovery of new and useful knowledge from the documents under analysis. We present an approach that applies document clustering algorithms to the forensic analysis of computers seized in police investigations. The approach is carried out with six well-known clustering algorithms (K-means, K-medoids, Single Link, Complete Link, Average Link, and CSPA) applied to five real-world datasets obtained from computers seized in real-world investigations. Automatically labeling document clusters with words that identify their topics is difficult to do well. To address this problem, we present two methods of labeling document clusters, motivated by the model that words are generated by a hierarchy of mixture components of varying generality. The first method assumes the existence of a document hierarchy (manually constructed or resulting from a hierarchical clustering algorithm) and uses a chi-squared test of significance to detect different word usage across categories in the hierarchy. The second method selects words that both occur frequently in a cluster and effectively differentiate the given cluster from the other clusters. We compare these methods on abstracts of documents selected from a subset of the hierarchy of the Cora search engine for computer science research papers. Labels produced by our methods were superior to those produced by commonly employed methods.


Keywords: Data mining, Forensic Analysis, Clustering, Fuzzy c-means, EM Algorithm
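
As an illustration of the chi-squared labeling idea summarized in the abstract, the following minimal sketch scores each word by a 2x2 chi-squared statistic over document counts (in the cluster versus outside it, containing the word versus not) and keeps the highest-scoring words that are over-represented in the cluster. It is a simplified, flat (non-hierarchical) variant, not the method evaluated in this work; the function names, the whitespace tokenization, and the toy documents are assumptions made only for the example.

```python
from collections import Counter


def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for a 2x2 contingency table.

    a: in-cluster documents containing the word
    b: in-cluster documents without the word
    c: out-of-cluster documents containing the word
    d: out-of-cluster documents without the word
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def label_cluster(docs, assignments, cluster, top_k=3):
    """Return up to top_k candidate label words for one cluster (hypothetical helper)."""
    # Assumption: lowercase whitespace tokenization; a real pipeline would
    # also remove stop words and apply stemming.
    in_docs = [set(text.lower().split())
               for text, c in zip(docs, assignments) if c == cluster]
    out_docs = [set(text.lower().split())
                for text, c in zip(docs, assignments) if c != cluster]

    # Document frequencies inside and outside the cluster.
    in_df = Counter(word for doc in in_docs for word in doc)
    out_df = Counter(word for doc in out_docs for word in doc)

    scores = {}
    for word, a in in_df.items():
        c = out_df.get(word, 0)
        b = len(in_docs) - a
        d = len(out_docs) - c
        # Keep only words that are over-represented in the cluster.
        if a * d > b * c:
            scores[word] = chi_squared_2x2(a, b, c, d)

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Toy data, invented for illustration only.
docs = [
    "bank fraud transfer",
    "fraud invoice bank",
    "holiday photo beach",
    "beach photo family",
]
assignments = [0, 0, 1, 1]
labels = label_cluster(docs, assignments, cluster=0, top_k=2)
```

In a hierarchical setting, the same statistic could be computed between a node and its siblings or parent rather than between one cluster and all the others; that extension is not shown here.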